About Dow Jones
Global Provider of News and Business Information:
We deliver high-quality content to consumers and organizations around the world across multiple formats.
Our news-gathering operations are among the largest in the world and have produced unrivaled quality content for over 130 years.
Job Role: Data Scientist on the AI Engineering Team
You will design, construct, and maintain machine learning pipelines tailored for various AI applications, focusing on Natural Language Processing.
You will also collaborate with our engineering team to integrate machine learning models, optimize their performance, and ensure that machine learning solutions work effectively in real-world applications.
Responsibilities
- Maintain robust data pipelines supporting various ML models, focusing on information retrieval applications.
- Analyze and clean large datasets to optimize reusable ML models.
- Partner with stakeholders across the organization to translate business requirements into technical solutions.
- Apply analytical skills to NLP modeling, algorithm selection, and proof-of-concept (POC) development.
- Develop data enrichment pipelines to enhance insights aligned with strategic objectives.
- Collaborate with cross-functional teams to address the organization's data-driven needs, managing significant volumes of structured and unstructured data.
- Integrate diverse ML models into systems, ensuring interoperability and performance optimization.
- Lead efforts to optimize ML model performance through data analysis and validation in real-world applications.
Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related STEM field.
- At least 3 years of industry experience in a data science or machine learning engineering role.
- Strong programming skills in Python and/or another high-level language commonly used in machine learning.
- Experience with NLP and machine learning frameworks and libraries (e.g., PyTorch, HuggingFace, LangChain, spaCy, NLTK, scikit-learn).
- Demonstrated understanding of techniques for extracting structured data from unstructured sources, reflecting expertise in information retrieval.
- Familiarity with LLM APIs for pre-processing, fine-tuning, and deploying models on cloud-based infrastructure.
- Familiarity with cloud-based infrastructure and services (e.g., AWS, GCP).
- Experience with version control systems such as Git.