Full-Time Staff Software Engineer, Science
Job Description
The Team
Founded by Priscilla Chan and Mark Zuckerberg in 2015, the Chan Zuckerberg Initiative (CZI) is a new kind of philanthropy that’s leveraging technology to help solve some of society’s toughest challenges – from eradicating disease to improving education. Across our core initiatives of Science and Education, we’re pairing engineering with grantmaking, impact investing, policy work, and advocacy to advance our mission of building an inclusive, just, and healthy future for everyone.
Our Values
- We believe we can help build a future for everyone.
- We aim to be daring, but humble: We look for bold ideas — regardless of structure and stage — and help them scale by pairing engineers with subject matter experts to build tools that accelerate the pace of social progress.
- We want to learn fast, but build for the long-term: We want to iterate fast and help bring new solutions to the table, but we also realize that important breakthroughs often take decades, or even centuries.
- We stay close to the real problems: We engage directly in the communities we serve because no one understands our society’s challenges like those who live them every day.
Our success is dependent on building teams that include people from different backgrounds and experiences who can challenge each other’s assumptions with fresh perspectives. To that end, we look for a diverse pool of applicants including those from historically marginalized groups — women, people with disabilities, people of color, formerly incarcerated people, people who are lesbian, gay, bisexual, transgender, and/or gender nonconforming, first and second generation immigrants, veterans, and people from different socioeconomic backgrounds.
CZI also supports multiple work options. Learn more about our philosophy and approach here.
The Opportunity
The Data Engineering team manages and processes scientific datasets specifically designed to enable biological modeling. It is responsible for data validation, wrangling, testing, storage, and retrieval. We handle single-cell transcriptomic data covering over 89 million unique cells and over 15 thousand cryoET tomograms, in imaging datasets as large as 20TB and counting, and we will be expanding to support larger scale and additional imaging, sequencing, and literature modalities. Our resources provide access to structured, open source data that tens of thousands of scientists use each month to quickly query and form hypotheses about how genetic variants in cells impact disease risk, to define drug toxicities, and eventually to discover better therapies.
As a staff software engineer on the Data Engineering team, you will design and implement solutions for all the data needs of our platforms: CELLxGENE Discover, CryoET, and the new platform we are building, which focuses on data for AI and the virtual cell. The goal is to enable scientists to further interrogate our very large and growing corpus of data without needing to download the data or have any computational expertise. You will work on a collaborative, multidisciplinary team to develop solutions that speed up our scientist users’ workflows and accelerate the pace of scientific discovery. You will be responsible for setting the direction of how our teams ingest, transform, validate, process, store, monitor, and utilize petabytes of data for ease of use, search, and modeling. You will also be responsible for upskilling the engineers around you and championing technical best practices and data design for efficient and effective delivery.
No prior biology experience is required for this role. You will have the opportunity to pair with Computational Biologists to develop solutions for our users and be able to learn about biology from experts on our team.
Our tech stack: Python, Terraform, AWS infrastructure, TileDB.
What You’ll Do
- Develop end-to-end, robust data pipeline architectures that seamlessly integrate data ingestion, preprocessing, feature engineering, model training, and deployment.
- Implement scalable data warehousing solutions to handle massive volumes of single-cell transcriptomics data and imaging data.
- Ensure data security and compliance with industry standards and regulations.
- Implement optimization strategies such as data partitioning, indexing, and compression to enhance query performance and reduce computational costs.
- Create user-friendly APIs to enable researchers and scientists to easily access and explore the curated data.
- Develop scalable, maintainable, and testable software systems and participate in team conversations and efforts on engineering excellence.
- Collaborate with product managers, computational biologists, UX designers, and other software engineers to deliver constant incremental value for scientists without compromising on software quality.
- Have opportunities to learn about scientific data and technologies, though no prior experience is required!
What You’ll Bring
- 5+ years of relevant software experience.
- Strong fundamentals in systems design, data structures, algorithms, and object oriented programming principles.
- Past experience with data processing and orchestration pipelines, such as Argo Workflows or Databricks.
- Solid experience with object oriented programming languages and scripting languages, such as Java, C++, Python, or Golang.
- Past experience with big data.
- Some experience with infrastructure and automation tools, such as Kubernetes, Terraform, AWS.
- Excellent written and verbal communication skills.
- Enthusiasm to ramp up on technologies and learn a new science domain.
- Experience working in a multidisciplinary environment (engineering, product, design).
- Desirable but not required: experience with scientific computing libraries, such as NumPy and SciPy.
How to Apply
https://grnh.se/127dd22f1us