Core Skills & Knowledge Areas
- Python
  - Data manipulation: Proficiency with libraries like pandas, NumPy, and PyArrow.
  - Scripting & automation: Writing reusable, modular scripts for data ingestion and transformation.
  - APIs: Consuming and building RESTful APIs for data exchange.
  - Testing: Unit testing with pytest or unittest.
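To illustrate the kind of scripting and testing described above, here is a minimal sketch of an ingestion helper with a pytest-style unit test. The function name and CSV schema are hypothetical, chosen only for the example; only the standard library is used.

```python
import csv
import io

def parse_orders(raw_csv: str) -> list[dict]:
    """Parse a CSV of orders into typed records (hypothetical schema: id, amount)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [{"id": row["id"], "amount": float(row["amount"])} for row in reader]

def test_parse_orders():
    # pytest discovers test_* functions automatically; plain asserts suffice.
    raw = "id,amount\nA1,10.5\nA2,3.0\n"
    rows = parse_orders(raw)
    assert len(rows) == 2
    assert rows[0] == {"id": "A1", "amount": 10.5}
```

Keeping parsing in a small pure function like this makes it reusable across pipelines and trivial to unit-test without touching real files.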
- Cloud Platforms
  - AWS / Azure / GCP: Familiarity with services like:
    - AWS: S3, Lambda, Glue, Redshift, EMR
    - Azure: Data Factory, Blob Storage, Synapse
    - GCP: BigQuery, Cloud Functions, Dataflow
  - Infrastructure as Code (IaC): Tools like Terraform or CloudFormation.
  - Security & IAM: Managing access and permissions.
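As a sketch of the serverless side of the cloud skills above, here is a minimal AWS Lambda-style handler reacting to an S3 object-created notification. The event shape follows the standard S3 notification format; the processing step is a hypothetical placeholder (a real function would typically read the object with boto3 and load it downstream).

```python
def handle_s3_event(event, context=None):
    """Sketch of a Lambda handler for an S3 put event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # Placeholder: a real handler might fetch the object and load it
    # into a warehouse such as Redshift or BigQuery.
    return {"bucket": bucket, "key": key, "status": "processed"}
```

Because the handler is a plain function of its event, it can be exercised locally with a hand-built event dict before any cloud deployment.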
- Back-End Development
  - Databases: SQL (PostgreSQL, MySQL) and NoSQL (MongoDB, DynamoDB).
  - APIs: Building data services using frameworks like Flask, FastAPI, or Django.
  - CI/CD: Familiarity with Git, Docker, Jenkins, or GitHub Actions.
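For the SQL side of the database skills above, a minimal sketch of an aggregation query, run against in-memory SQLite purely for illustration (the same SQL would work on PostgreSQL or MySQL; table and column names are invented for the example):

```python
import sqlite3

def total_per_user(rows):
    """Aggregate amounts per user with plain SQL (in-memory SQLite for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user TEXT, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT user, SUM(amount) FROM events GROUP BY user ORDER BY user"
    ).fetchall()
    conn.close()
    return result
```

Parameterized `executemany` inserts (the `?` placeholders) also demonstrate the habit of never interpolating values into SQL strings directly.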
- ETL / ELT Pipelines
  - Pipeline orchestration: Tools like Apache Airflow, Prefect, or Luigi.
  - Data transformation: Using SQL, dbt, or Python scripts.
  - Batch vs. streaming: Understanding of Kafka, Spark Streaming, or Flink.
  - Monitoring & logging: Ensuring data quality and pipeline reliability.
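The extract-transform-load pattern above can be sketched in a few pure-Python functions. The sources, sinks, and the data-quality rule here are hypothetical stand-ins; in practice an orchestrator like Airflow or Prefect would schedule each step as a task.

```python
def extract(source):
    """Extract: read raw records (a list stands in for an API or file source)."""
    return list(source)

def transform(rows):
    """Transform: enforce a basic data-quality rule and cast types."""
    clean = []
    for row in rows:
        if row.get("amount") in (None, ""):
            continue  # skip incomplete records rather than load bad data
        clean.append({"id": row["id"], "amount": float(row["amount"])})
    return clean

def load(rows, sink):
    """Load: append records to the target store (a list stands in for a table)."""
    sink.extend(rows)
    return len(rows)

def run_pipeline(source, sink):
    """Wire the steps together; returns the number of rows loaded."""
    return load(transform(extract(source)), sink)
```

Returning the loaded-row count from `run_pipeline` gives a cheap hook for the monitoring and data-quality checks mentioned above.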
Tools & Technologies
- Programming: Python, SQL.
- Cloud: AWS, Azure, GCP.
- Orchestration: Airflow, Prefect.
- Databases: PostgreSQL, BigQuery, Redshift.
- Data Lakes: S3, Azure Data Lake.
- Containers: Docker, Kubernetes.
- Version Control: Git, GitHub/GitLab.
Soft Skills & Other Requirements
- Problem-solving: Ability to debug and optimize data workflows.
- Teamwork: Collaborating with Data Scientists, Analysts, and DevOps.
Some of Our Perks
- Fresh fruit sometimes, spoiled fruit all the time.
- Work from anywhere, including from home.
- Flexible hours.
- Team lunches, birthday celebrations, and happy hours.
- Wellness program and company retreats.
- English lessons.
- Courses and training.