Overview
Job Summary
The Data Scientist will extract and analyze large datasets from a range of heterogeneous business applications, most importantly the electronic medical record (EMR). Using statistical, artificial intelligence (AI), and machine learning (ML) tools and algorithms, the Data Scientist will uncover insights in the data and share them with the business. They will design, develop, build, and validate predictive, machine learning, and AI models based on business requirements and communicate the results clearly. Once approved by the business, these algorithms and models will be deployed into the production environment.
Responsibilities
- Collaborate with the business stakeholders to document the business requirements, anticipated benefits, and success criteria for analytics initiatives
- Collaborate with subject matter experts to define the use case and to prepare an exhaustive set of potential features for analysis
- Perform feature engineering to optimize models and reduce dimensionality
- Perform advanced descriptive and diagnostic analytics (exploratory data analysis)
- Perform data profiling
- Extract, process, clean, and validate datasets to ensure data quality
- Assist data engineers in implementing data pipelines
- Design experiments
- Identify multiple predictive, machine learning and AI algorithms that potentially fit the business case
- Write general, reusable code for repeatable analysis
- Evaluate and tune models using performance and error metrics, hyperparameter tuning, and cross-validation
- Select the final model based on performance
- Document and present model findings to the business, explaining the model's assumptions and validation methodology
- Deploy the model by integrating it with the EMR or other business applications/systems
- Monitor the model for data and concept drift and retrain or tune the model as required
- Build visual interfaces with “what-if” functionality for business user interaction
Role Level Accountabilities
- Adheres to CCAD’s standards as they appear in the Code of Conduct and Conflict of Interest policies
- In view of the evolving needs and opportunities within CCAD, the holder of this position may be required to perform other duties as assigned, and reporting relationships may vary.
Organization-Wide Competency Assessment Requirements
All employees will embrace the CCAD mission, vision, and values and are responsible for adhering to the core values of the institution, including: Patients First, Collaboration, Mutual Respect, Quality, Patient Safety, Integrity, Cultural Sensitivity, and Compassion. All employees are also expected to meet the standards of performance outlined in the Organization-Wide Competencies listed below, as applied to the position:
- Customer Service Orientation includes the attitude, behavior, interpersonal skills, and problem solving that enable an employee to respond positively to internal and external customer needs and expectations.
- Adaptability includes teamwork, the flexibility needed to fulfill job responsibilities, adapting to changes in the work environment, and accepting supervisory feedback.
- Efficiency and Effectiveness includes the quantity and quality of desired work, as well as the organizational skills necessary to perform successfully.
- Essential Job Requirements includes adherence to all relevant policies, procedures, and guidelines affecting the work environment, as well as maintenance of required competencies and communication skills.
- Managerial Responsibilities includes overall accountability for the assigned work group relative to operational goals, personnel requirements, and budgetary constraints.
Education and Experience
- Education: Bachelor's degree in a relevant field such as Statistics, Computer Science, Data Science, Mathematics, or Physics; or Master's degree in a relevant field such as Statistics, Computer Science, Data Science, Mathematics, or Physics
- Experience: 5+ years with a Bachelor's degree; 3+ years with a Master's degree
Job Specific Skills and Abilities
- Excellent oral and written communication skills
- Experience with Principal Component Analysis and Linear Discriminant Analysis
- Experience with visual analytics tools
- Experience with correlation analysis, clustering, imputation, de-skewing, normalization, and outlier analysis
- Experience with statistical analysis, including inferential statistics, hypothesis testing, Bayesian statistics, probability theory, exploratory data analysis, correlation and regression analysis, design of experiments, and sample size calculation
- Experience with extracting data from database management systems
- Experience with transformation and cleaning of both structured and unstructured data
- Experience with Agile software development methodologies
- Experience with software engineering
- Experience in predictive modeling, time series analysis, machine learning, neural networks, deep learning, and reinforcement learning
- Deep knowledge of traditional ML concepts such as GLMs, GMMs, SVMs, random forests, regression, decision trees, and boosting
- Experience with AI models and tools such as TensorFlow, OpenAI models, and large language models (LLMs)
- Experience with MLOps and LLMOps, including deploying models into production
- Experience with User Interface (UI) development
- Experience with the following software:
  - Cloud Platforms: Microsoft Fabric, Azure, AWS, GCP
  - Database Management Systems: MS SQL Server, Azure Data Lake, Azure Synapse
  - Data Wrangling: MS SSIS, Azure Data Factory
  - Statistical Analysis Software: SAS, SPSS, Minitab
  - Version Control: GitHub
  - Platforms: Databricks, Azure ML
  - Scripting Languages: Python / R and libraries (Pandas, NumPy, Seaborn, Matplotlib)
  - Query Language: SQL
  - Other Languages: Spark SQL
  - AI & ML Frameworks: TensorFlow, Keras, PyTorch, Caffe, scikit-learn, Spark ML (including GAN and CNN architectures)
  - Deployment Software: Docker
  - Visual Analytics: Power BI, Tableau, Minitab, Excel
  - UI Tools: Shiny, Power Apps