Software Engineer, Data Infrastructure

New York, NY // San Francisco, CA, United States

EvolutionaryScale

New York (NY)

On-site

USD 90,000 - 150,000

Full time

9 days ago

Job summary

An innovative company focused on leveraging artificial intelligence to revolutionize biology is seeking a Data Infrastructure Engineer. This role involves designing and developing large-scale data processing systems, ensuring reliability and efficiency in handling biology datasets. Collaborating closely with bioinformatics and research teams, you'll implement best practices and integrate cutting-edge technologies to drive continuous improvements. If you are passionate about data infrastructure and eager to contribute to transformative projects in health and society, this is the perfect opportunity for you.

Qualifications

5+ years of experience in large-scale data processing systems.
Strong problem-solving skills to debug complex technical issues.

Responsibilities

Design and maintain large-scale batch processing pipelines.
Optimize data ingestion and retrieval processes.

Skills

Large-scale data processing

Spark

Ray

Hadoop

Kafka Streams

Problem-solving

Cloud providers (AWS, GCP, Azure)

Biology knowledge

Software Engineer, Data Infrastructure Lead

Who we are

EvolutionaryScale’s mission is to develop artificial intelligence to understand biology for the benefit of human health and society, through open, safe, and responsible research, and in partnership with the scientific community. Over the next ten years, AI will transform biological design, making molecules and entire cells programmable. We will develop the foundation models for biology that enable this.

The EvolutionaryScale team is based in San Francisco and New York. We believe in flexibility around work schedules and locations, but expect team members to work from one of our offices on at least half the days of most weeks.

What you’ll do

As a Data Infrastructure Engineer, you will work closely with bioinformatics and research teams to ensure our data jobs are reliable, efficient, and scalable. You'll implement best practices for handling large-scale data processing, select and integrate the right technologies, and drive continuous improvements in the performance and quality of our datasets.

The role

Design, develop, and maintain large-scale batch processing pipelines for acquiring biology datasets, using tools such as Spark and Ray.
Manage data infrastructure components to ensure robust and fault-tolerant operations.
Optimize data ingestion, storage, and retrieval processes for acquiring large and growing biology datasets, and for efficient pre- and post-training data ingestion.
Create systems for easy and reproducible data evaluation and experiments.
Integrate modern ML-based data curation technologies with data processing pipelines.
Work with researchers and other engineering teams to understand their data needs and create solutions that meet modeling requirements.

Preferred qualifications

Apply even if you don’t meet all of these!

Proven experience with large-scale data processing systems using technologies such as Hadoop, Spark, or Ray.
Knowledge of streaming data frameworks like Kafka Streams, Spark Streaming, or Flink.
Understanding of data processing principles and best practices.
Strong problem-solving skills, including the ability to research, debug, and resolve complex technical problems.
Experience with a major cloud provider (AWS, GCP, or Azure), including familiarity with data warehousing tools, is a plus.
Knowledge of biology and biology datasets is a big plus but not required.
Experience with large-scale distributed systems or machine learning is also a plus but not required.
5+ years of experience with the systems above.

Apply for this job

* indicates a required field

First Name *

Last Name *

Email *

Phone

Resume/CV *

Accepted file types: pdf, doc, docx, txt, rtf

Are you legally authorized to work in the United States? *

Do you now, or will you in the future, require sponsorship to work in the U.S. (e.g., H-1B visa status)? *

When would you be available to start a new position? *

Can you work at the specified job location? *

New York City preferred

Open to either

Could you provide the contact details of one or two colleagues, collaborators, or managers who could serve as references for your work? Take your time with this request if needed; we only call references after you pass the full interview panel. Feel free to email this to us later as well.

If resources were not a problem and the sky were the limit, what is the first project you would like to work on at EvolutionaryScale?

What is the project from your resume that you are most proud of, and why? Share any news articles, publications, or open-source repos if available.