Dubai
Remote
AED 300,000 - 400,000
Full time
Job summary
A healthcare technology company is seeking a Data Architect with extensive AWS experience to lead its data architecture strategy. The ideal candidate has 8–12 years of experience in data architecture, with a strong focus on healthcare data standards. Responsibilities include overseeing the data lake architecture and collaborating with data science teams on AI initiatives. This is a full-time, remote, independent contractor position; candidates must be available during standard US business hours.
Qualifications
- 8–12+ years of experience in data architecture with 3–5 years in a technical leadership role.
- Proven experience architecting AWS-based data lakes and analytics pipelines.
- Deep understanding of healthcare data standards (FHIR, HL7).
Responsibilities
- Define and implement an enterprise-wide data architecture strategy.
- Lead the evolution of AWS-based data lake architecture.
- Collaborate with data science to implement AI/ML solutions.
Skills
Data architecture
AWS
Data security
Healthcare data standards
Team leadership
Education
Bachelor's or Master's in Computer Science or Data Engineering
Key Responsibilities
- Strategic Data Platform Leadership: Define and implement an enterprise-wide data architecture strategy that supports interoperability, AI/ML readiness, and regulatory compliance
- Lead the evolution of our AWS-based data lake architecture, supporting structured, semi-structured, and unstructured data types—especially FHIR-formatted JSON healthcare data
- Cloud Data Lake & Storage Optimization: Design and maintain scalable, secure, and cost-effective data lakes using Amazon S3, AWS Glue, Athena, Redshift, and Lake Formation
- Leverage Mountpoint for S3 to enable high-throughput, file-system-style (POSIX-like) access to S3 objects, including vectorized data files
- Optimize data storage and retrieval strategies for performance and cost-efficiency, including partitioning, file formats (e.g., Parquet, ORC), and compression techniques
- AI/ML Enablement and Vector Infrastructure: Collaborate with data science teams to implement embedding models, vectorization pipelines, and real-time inference architectures
- Design and manage vector storage systems (e.g., S3-based, FAISS, Pinecone, or Amazon OpenSearch) to support semantic search, retrieval-augmented generation (RAG), and intelligent data access
- Ensure vectorized data pipelines are aligned with model training, evaluation, and deployment strategies
- Healthcare Data Architecture & Interoperability: Architect systems to ingest, process, and store FHIR-compliant JSON data from EHRs, APIs, and HL7 sources
- Ensure conformance with healthcare interoperability standards and optimize for queryability and downstream analytics
- Implement data normalization and enrichment pipelines for use in both clinical and operational contexts
- Security, Compliance & Governance: Lead efforts to ensure data security at rest and in transit using AWS-native encryption, IAM, VPC controls, and bucket policies
- Implement and manage data access controls, audit logging, and role-based security models across AWS environments
- Oversee data governance including lineage, cataloging, and stewardship with tools such as AWS Glue Data Catalog, Lake Formation, or third-party platforms
- Team Leadership & Cross-Functional Collaboration: Build and lead a high-performing team of data architects and engineers
- Work closely with stakeholders from engineering, data science, product, and compliance teams to deliver data initiatives
Qualifications
- Bachelor’s or Master’s in Computer Science, Data Engineering, or related field
- 8–12+ years of experience in data architecture with 3–5 years in a technical leadership role
- Proven experience architecting AWS-based data lakes and analytics pipelines
- Deep understanding of healthcare data standards (FHIR, HL7) and working with FHIR JSON objects in large-scale systems
- Expertise with embedding and vectorization models, semantic search, and managing vector storage solutions
- Hands-on experience with Amazon S3, Mountpoint for S3, and optimizing S3-based workloads for performance and cost
- Strong background in data security, encryption, access control, and compliance frameworks (HIPAA, HITRUST)
Preferred Qualifications
- AWS certifications (e.g., AWS Certified Big Data or Data Analytics – Specialty)
- Familiarity with open-source vector databases (e.g., FAISS, Weaviate) and MLOps pipelines
- Experience in clinical systems integration, claims processing, or population health analytics
Other
- This is an independent contractor position.
- Job Type: Full-time
- Location: Remote
- Hours: Available during standard US business hours (9am-5pm EST or 8:30am-4:30pm EST)
- This job description is intended to describe the general requirements for the position.
- It is not a complete statement of duties, responsibilities or requirements.
- Other duties not listed here may be assigned as necessary to ensure proper operations of the department.