Remote Senior Data Engineer @ Varwise
Warsaw, Masovian Voivodeship, Poland
Role Overview
We are looking for Data Engineers to work remotely for an adtech company that uses machine learning and data science to build an identity graph, enabling brands to reach millions of users across programmatically selected households. The work centers on scaling our big-data asset, which combines billions of transaction data points (intent, conversions, and first-party data) into an identity graph built for a cookie-less future.
This is a 100% remote position. You will be working with team members in NYC.
We value technical excellence: you will have both the resources and the time to deliver world-class code. If you enjoy solving hard, technically challenging problems, join us and apply those skills to building real-time, concurrent, globally distributed applications and services.
Responsibilities
- Create and maintain reliable, scalable distributed data processing systems.
- Become a core maintainer of our data lake, building searchable data sets for broader business use.
- Scale, troubleshoot, and fix existing applications and services.
- Own a complex set of services and applications.
- Ensure our data pipelines run 24/7.
- Lead technical discussions that drive improvements in tools, processes, and projects.
- Scale our identity graph to deliver impactful advertising campaigns.
- Work with data sets exceeding billions of records.
- Build and operate AWS-based infrastructure.
- Scale our MLOps platform to support both traditional ML and LLM/generative AI applications.
Qualifications
- 8+ years of professional software engineering experience, with a focus on data engineering in big‑data environments.
- 4+ years of experience developing and delivering production-grade Scala-based systems, plus familiarity with Python and at least one other high-level programming language (e.g., Java, C++, C#).
- Proficiency in all aspects of the SDLC, from concept to running production systems.
- Proficiency with Spark (PySpark) or TensorFlow.
- Proven experience building and optimizing large‑scale data pipelines using Databricks and Spark.
- Experience participating in ETL and ML pipeline projects based on Airflow, Kubeflow, MLeap, SageMaker, or similar.
- Hands‑on experience developing and deploying data solutions in a major cloud platform (AWS, GCP, or Azure).
- Experience working with AI, LLMs, agents, and/or generative AI technologies, both in product applications and for development productivity.
- Large-scale database experience with both SQL and NoSQL systems, such as PostgreSQL, Cassandra, Neo4j, or Neptune.
- Experience with large-scale data management formats and frameworks such as Parquet, ORC, Delta Lake (Databricks), Iceberg, or Hudi.
- Bachelor’s degree in Computer Science or a related discipline.
Required Technical Skills
- Apache Spark and PySpark
- AWS (and/or GCP, Azure)
- Linux
- NoSQL and SQL databases
- Kafka
- Scala
- Neo4j
- Databricks
- TensorFlow
- Parquet, Delta Lake
- Airflow, Jenkins, Kinesis, MLeap, SageMaker, Kubeflow, LLMs/generative AI
- GitHub, Jira, Agile, Kanban
Benefits & Extras
- Small teams, international projects, and a flat structure.
- 100% remote work, always.
- International team collaboration.
- Free coffee, bike parking, playroom, free snacks, and free beverages.
- Modern office, no dress code, in‑house trainings.