
Data Engineer (m/f/d)

Cyber Insight GmbH

Leipzig

Hybrid

Confidential

Full-time

10 days ago

Summary

A growing cybersecurity startup is seeking a hands-on Data Engineer to build reliable and secure data systems. In this role, you will design data pipelines, process cybersecurity data sources, and collaborate with AI teams to enhance risk assessment models. Ideal candidates have 3+ years in data engineering and strong Python skills, along with a security-focused mindset. This role offers flexible hours and exposure to cutting-edge technologies in AI and cybersecurity.

Benefits

Flexible working hours
Remote-friendly setup
Cutting-edge technology exposure
Collaborative environment

Qualifications

  • 3+ years of experience in data engineering or cybersecurity data processing.
  • Strong Python skills and experience with pandas or PySpark.
  • Proven experience with data orchestration frameworks.
  • Solid understanding of data modeling and SQL optimization.
  • Familiarity with CVE data structures or vulnerability databases.

Tasks

  • Design, build, and maintain data pipelines and ETL workflows.
  • Ingest and process cybersecurity-relevant data sources.
  • Develop and maintain transformation logic and data models.
  • Implement and automate data validation and quality assurance.
  • Collaborate with AI teams to prepare data for analytics models.

Skills

Python
Data Engineering
Data Orchestration
ETL pipelines
Data Modeling
SQL Optimization
BigQuery
Data Testing
Security Awareness

Tools

Pandas
Airflow
Docker
Terraform
Prometheus

Job Description

At Cyber Insight, we are building the next generation of AI-driven platforms for IT security and risk management. Our mission is to empower companies to gain deep insights into their IT landscapes and proactively mitigate risks in an increasingly complex digital world.

As a fast-growing startup, we combine expertise in cybersecurity, data engineering, and artificial intelligence to deliver solutions that automate risk assessments, predict potential threats, and help organizations stay ahead of evolving cyber risks. Our team thrives on innovation, collaboration, and a shared passion for making a real impact in the cybersecurity space.

We are looking for a hands-on Data Engineer who is passionate about building reliable, scalable, and secure data systems. You’ll help shape our data architecture and pipelines that feed our AI models and risk assessment engines — including the crucial task of mapping vulnerabilities (CVEs) to specific software and system components.

Tasks
  • Design, build, and maintain data pipelines and ETL/ELT workflows across GCP and on-prem environments.
  • Ingest and process cybersecurity-relevant data sources such as CVE feeds, software inventories, vulnerability databases, and event logs.
  • Develop and maintain transformation logic and data models linking vulnerabilities (CVEs) to affected software and assets.
  • Implement and automate data validation, consistency checks, and quality assurance using tools like Great Expectations or Deequ.
  • Collaborate with AI and graph modeling teams to structure and prepare data for threat intelligence and risk quantification models.
  • Manage and optimize data storage using BigQuery, PostgreSQL, and Cloud Storage, ensuring scalability and performance.
  • Automate data workflows and testing through CI/CD pipelines (GitHub Actions, GCP Cloud Build, Jenkins).
  • Implement monitoring and observability for pipelines using Prometheus, Grafana, and OpenTelemetry.
  • Apply a security-focused mindset in data handling, ensuring safe ingestion, processing, and access control of sensitive datasets.
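A core task above is linking CVEs to affected software. As a rough illustration of what that transformation logic might look like, here is a minimal Python sketch that extracts vulnerable CPE identifiers from a simplified NVD-style CVE record; the field names loosely follow the NVD 2.0 JSON schema, and real feeds would need fuller handling of version ranges and nested configuration operators:

```python
# Hypothetical sketch: extract affected-product (CPE) entries from a
# simplified NVD-style CVE record. Real NVD data needs version-range
# handling (versionStartIncluding etc.) and AND/OR node logic.

def affected_cpes(cve_record):
    """Return (cve_id, cpe_uri) pairs for configurations flagged vulnerable."""
    cve_id = cve_record["id"]
    pairs = []
    for config in cve_record.get("configurations", []):
        for node in config.get("nodes", []):
            for match in node.get("cpeMatch", []):
                if match.get("vulnerable"):
                    pairs.append((cve_id, match["criteria"]))
    return pairs

# Toy record modeled on the Log4Shell advisory.
sample = {
    "id": "CVE-2021-44228",
    "configurations": [
        {"nodes": [{"cpeMatch": [
            {"vulnerable": True,
             "criteria": "cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*"},
            {"vulnerable": False,
             "criteria": "cpe:2.3:a:apache:log4j:2.15.0:*:*:*:*:*:*:*"},
        ]}]}
    ],
}

print(affected_cpes(sample))
# [('CVE-2021-44228', 'cpe:2.3:a:apache:log4j:2.14.1:*:*:*:*:*:*:*')]
```

In a production pipeline this extraction step would typically run inside an orchestrated task (e.g. Airflow) and write the resulting pairs into a warehouse table for downstream risk models.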
Requirements

  • 3+ years of experience in data engineering, backend data systems, or cybersecurity data processing.
  • Strong Python skills and experience with pandas, PySpark, or Dask for large-scale data manipulation.
  • Proven experience with data orchestration and transformation frameworks (Airflow, dbt, or Dagster).
  • Solid understanding of data modeling, data warehousing, SQL optimization, and streaming ETL pipelines (e.g. Kafka).
  • Familiarity with CVE data structures, vulnerability databases (e.g. NVD, CPE, CWE), or security telemetry.
  • Experience integrating heterogeneous data sources (APIs, CSV, JSON, XML, or event streams).
  • Knowledge of GCP data tools (BigQuery, Pub/Sub, Dataflow, Cloud Functions) or equivalent in Azure/AWS.
  • Experience with containerized environments (Docker, Kubernetes) and infrastructure automation (Terraform or Pulumi).
  • Understanding of data testing, validation, and observability practices in production pipelines.
  • A structured and security-aware approach to building data products that support AI-driven risk analysis.
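The data testing and validation practices mentioned above can be illustrated with a minimal hand-rolled sketch; frameworks such as Great Expectations or Deequ express the same kinds of checks declaratively and at scale, so this is only a stand-in for the idea, with made-up row fields:

```python
# Minimal validation sketch for CVE-to-CPE mapping rows: checks ID format,
# CPE prefix, and duplicates. Tools like Great Expectations automate this.

CPE_PREFIX = "cpe:2.3:"

def validate_rows(rows):
    """Return (row_index, problem) tuples; an empty list means clean data."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        cve, cpe = row.get("cve_id"), row.get("cpe")
        if not cve or not cve.startswith("CVE-"):
            problems.append((i, "bad cve_id"))
        if not cpe or not cpe.startswith(CPE_PREFIX):
            problems.append((i, "bad cpe"))
        if (cve, cpe) in seen:
            problems.append((i, "duplicate"))
        seen.add((cve, cpe))
    return problems

rows = [
    {"cve_id": "CVE-2023-0001", "cpe": "cpe:2.3:a:vendor:tool:1.0:*:*:*:*:*:*:*"},
    {"cve_id": "2023-0001", "cpe": "cpe:2.3:a:vendor:tool:1.0:*:*:*:*:*:*:*"},
]
print(validate_rows(rows))  # [(1, 'bad cve_id')]
```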

Nice to Have

  • Experience working with graph databases (Neo4j, ArangoDB) or ontology-based data modeling.
  • Familiarity with ML pipelines (Vertex AI Pipelines, MLflow, or Kubeflow).
  • Understanding of software composition analysis (SCA) or vulnerability scanning outputs (e.g. Trivy, Syft).
  • Background in threat intelligence, risk scoring, or cyber risk quantification.
  • Experience in multi-cloud or hybrid setups (GCP, Azure, on-prem).
Benefits
  • Freedom to design and shape a modern, secure data platform from the ground up.
  • A collaborative startup environment where your work directly supports AI and cybersecurity products.
  • Flexible working hours and remote-friendly setup.
  • Exposure to cutting-edge technologies in AI, data engineering, and cyber risk analytics.
  • Competitive salary and benefits tailored to your experience.

We look forward to meeting you!
