Description
Data System Reliability Engineer (dSRE)
Role Overview:
A crucial role in CME's cloud data transformation, the Data SRE will be aligned to data product pods, ensuring that our data infrastructure remains reliable, scalable, and efficient as the GCP data footprint expands rapidly.
Accountabilities:
- Automate data tasks on GCP
- Work with data domain owners, data scientists, and other stakeholders to ensure effective data consumption on GCP
- Design, build, secure, and maintain data infrastructure, including data pipelines, databases, data warehouses, and data processing platforms on GCP
- Measure and monitor the quality of data on GCP data platforms
- Implement robust monitoring and alerting systems to proactively identify and resolve issues in data systems. Respond to incidents promptly to minimize downtime and data loss.
- Develop automation scripts and tools to streamline data operations and make them scalable to accommodate growing data volumes and user traffic.
- Optimize data systems to ensure efficient data processing, reduce latency, and improve overall system performance.
- Collaborate with data and infrastructure teams to forecast data growth and plan for future capacity requirements.
- Ensure data security and compliance with data protection regulations. Implement best practices for data access controls and encryption.
- Collaborate with data engineers, data scientists, and software engineers to understand data requirements, troubleshoot issues, and support data-driven initiatives.
- Continuously assess and improve data infrastructure and data processes to enhance reliability, efficiency, and performance.
- Maintain clear and up-to-date documentation related to data systems, configurations, and standard operating procedures.
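The data-quality accountability above can be illustrated with a minimal sketch. The rule set, field names, and in-memory records here are hypothetical; a production check would run against tables in BigQuery or another GCP data store rather than plain Python lists:

```python
# Minimal data-quality check sketch (hypothetical rules and data;
# a production version would query a GCP data platform instead).

def check_quality(rows, required_fields, max_null_rate=0.05):
    """Return quality metrics for a list of record dicts.

    A field "passes" if the fraction of records where it is missing
    or None does not exceed max_null_rate.
    """
    total = len(rows)
    null_counts = {f: 0 for f in required_fields}
    for row in rows:
        for f in required_fields:
            if row.get(f) is None:
                null_counts[f] += 1
    null_rates = {f: (null_counts[f] / total if total else 0.0)
                  for f in required_fields}
    passed = all(rate <= max_null_rate for rate in null_rates.values())
    return {"total_rows": total, "null_rates": null_rates, "passed": passed}
```

A check like this would typically feed the monitoring and alerting systems mentioned above, firing an alert whenever `passed` is false for a scheduled run.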
Qualifications:
- Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science or related field, or equivalent practical experience
- Experience as a Data Site Reliability Engineer or in a similar role, with a focus on data infrastructure management
- Proficiency in data technologies, such as relational databases, data warehousing, big data platforms (e.g., Hadoop, Spark), data streaming (e.g., Kafka), and cloud services (e.g., AWS, GCP, Azure)
- Programming skills in Python, Java, or Scala, with automation and scripting experience
- Experience with containerization and orchestration tools like Docker and Kubernetes is a plus
- Experience with data governance, security, and compliance best practices
- Understanding of software development methodologies, version control (e.g., Git), and CI/CD pipelines
- Analytical and problem-solving skills with a proactive approach to issues
- Excellent communication and collaboration skills
- Background in cloud computing and data-intensive applications, especially Google Cloud Platform
- 3+ years of experience in data engineering or data science
- Experience with data quality assurance and testing
- Knowledge of GCP data services (BigQuery, Dataflow, Data Fusion, Dataproc, Cloud Composer, Pub/Sub, Cloud Storage)
- Understanding of logging and monitoring tools such as Cloud Logging and the ELK Stack
- Ability to learn new technologies, including open source and cloud-native offerings
- Knowledge of AI and ML tools is a plus
- Google Associate Cloud Engineer or Data Engineer certification is a plus
Responsibilities:
- Automate infrastructure provisioning using tools like Terraform and KCC
- Build self-service capabilities for development teams to improve time-to-market and promote cloud adoption
- Create CI/CD pipelines with Jenkins, Bitbucket/Git
- Create monitoring dashboards using Splunk, Prometheus, Grafana
- Write unit tests in Go/Python/Java
- Collaborate with a globally distributed team
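To give a flavor of the unit-testing responsibility above, here is a minimal pytest-style sketch in Python. The pipeline step (`normalize_symbols`) and its behavior are illustrative assumptions, not a real CME function:

```python
# Pytest-style unit tests for a hypothetical pipeline step.
# normalize_symbols is an assumed example, not a real CME function.

def normalize_symbols(records):
    """Upper-case ticker symbols and drop records without one."""
    return [dict(r, symbol=r["symbol"].upper())
            for r in records if r.get("symbol")]

def test_uppercases_symbols():
    out = normalize_symbols([{"symbol": "es", "qty": 1}])
    assert out[0]["symbol"] == "ES"

def test_drops_records_missing_symbol():
    out = normalize_symbols([{"qty": 1}, {"symbol": "nq"}])
    assert len(out) == 1 and out[0]["symbol"] == "NQ"
```

In a CI/CD pipeline such as the Jenkins setup mentioned above, tests like these would typically run on every commit before a build is promoted.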
CME Group: Where Futures Are Made
CME Group (www.cmegroup.com) is the world's leading derivatives marketplace. Here, you can impact markets worldwide, transform industries, and build a career shaping tomorrow. We invest in your success and you own it, working alongside leading experts. We embrace diversity and are an equal opportunity employer.