
Junior Data Engineer - (Python/Web Scraping/Data Quality)

Madfish

Remote

GBP 30,000 - 45,000

Full time

Today

Job summary

A data startup is looking for a motivated Junior Data Engineer to join their team. In this role, you will develop Python-based web scrapers to gather data, utilize various technologies for efficient data extraction and normalization, and ensure the reliability of data through validation pipelines. The ideal candidate is experienced in Python and web scraping, has a problem-solving attitude, and is eager to work in a collaborative and innovative remote environment.

Qualifications

  • Solid experience in Python, especially in building web scrapers.
  • Familiarity with libraries like Selenium, BeautifulSoup, or Scrapy.
  • Basic understanding of data validation, data cleaning, and monitoring best practices.

Responsibilities

  • Develop and maintain Python-based web scrapers to collect structured and unstructured data.
  • Package scrapers as Docker containers and deploy them to Kubernetes.
  • Set up Grafana dashboards to monitor pipeline health and data quality metrics.

Skills

Python
Web scraping
Data validation
Problem-solving

Tools

Selenium
Docker
Airflow
Grafana
Kubernetes
Pandas

Job description

Junior Data Engineer – (Python/Web Scraping/Data Quality)

We’re looking for a sharp, curious, and driven Junior Data Engineer to join our team at Forecasa, a U.S.-based data startup focused on delivering high-quality real estate data and analytics to lenders and investors.

In this role, you’ll be part of our Data Acquisition & Quality team, helping us scale and improve the systems that collect, validate, and monitor the data that powers our platform.

What You’ll Do
  • Develop and maintain Python-based web scrapers to collect structured and unstructured data from various sources.
  • Use tools like Selenium, BeautifulSoup, Pandas, and PySpark to extract and normalize data efficiently.
  • Package scrapers as Docker containers and deploy them to Kubernetes.
  • Create and manage Airflow DAGs to orchestrate and schedule scraping pipelines.
  • Build data validation pipelines to catch anomalies, missing values, and data inconsistencies.
  • Set up Grafana dashboards to monitor pipeline health and data quality metrics.
  • Collaborate with senior engineers to continuously improve scraper reliability, performance, and coverage.
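The scrape-then-validate flow described above can be sketched in a few lines. This is a hypothetical illustration, not Forecasa's actual pipeline: the HTML fixture, field names, and validation rule are all invented for the example; only the tools (BeautifulSoup, Pandas) come from the posting.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for a scraped page; real scrapers would fetch this via
# Selenium or an HTTP client.
HTML = """
<table id="transactions">
  <tr><th>address</th><th>price</th></tr>
  <tr><td>12 Oak St</td><td>$350,000</td></tr>
  <tr><td>9 Elm Ave</td><td></td></tr>
</table>
"""

def scrape_transactions(html: str) -> pd.DataFrame:
    """Extract table rows into a DataFrame, keeping empty cells as None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.select("#transactions tr")[1:]  # skip the header row
    records = []
    for tr in rows:
        address, price = (td.get_text(strip=True) for td in tr.find_all("td"))
        records.append({"address": address, "price": price or None})
    return pd.DataFrame(records)

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Strip currency formatting and coerce price to a numeric dtype."""
    df = df.copy()
    df["price"] = pd.to_numeric(
        df["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )
    return df

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Flag rows with missing values instead of silently dropping them."""
    return df[df.isna().any(axis=1)]

df = normalize(scrape_transactions(HTML))
anomalies = validate(df)
```

In a production setup of this kind, each stage would typically run as a task in an orchestrated pipeline (e.g. an Airflow DAG), with the anomaly counts exported as metrics for dashboarding.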
Our Tech Stack

Python • PySpark • Selenium • Airflow • Pandas • Postgres • S3 • Docker • Kubernetes • GitLab • Grafana

What We're Looking For
  • Solid experience in Python, especially in building web scrapers.
  • Familiarity with libraries like Selenium, BeautifulSoup, or Scrapy.
  • Some experience with Docker, Airflow, or other workflow orchestration tools.
  • Basic understanding of data validation, data cleaning, and monitoring best practices.
  • A resourceful, problem-solving mindset — you’re not afraid to dig into a messy site or debug a flaky scraper.
Bonus Points For
  • Experience working with Grafana or Prometheus for monitoring.
  • Exposure to cloud platforms (AWS preferred) and managing scrapers at scale.
  • Familiarity with CI/CD and Git workflows (we use GitLab).
About Us

Forecasa is a U.S.-based startup delivering enriched real estate transaction data to private lenders and investors. We’re a small, fast-moving team with a strong engineering culture and a mission to bring clarity and transparency to a fragmented market.

Location

Remote – we welcome candidates from anywhere in the world.

NOTE: Please send all e-mails and communications through the djinni website. Thank you.
