About the job
Job Title
AI Model Quality Engineer – Healthcare (Wellness & Digital Health)
Location: Bangalore
Experience
6-12+ years of overall experience
3+ years in ML / AI model testing, validation, or governance
Role Overview
We are seeking an AI Model Quality Engineer to ensure the trustworthiness, safety, reliability, and regulatory compliance of the AI/ML models used in our platform.
This role sits at the intersection of Quality Engineering, Data Science, Responsible AI, and Healthcare Compliance.
You will serve as the independent validation authority within the Model Development Life Cycle (MDLC). As a strategic partner to the Model Development team, but distinct from it, you will maintain a strict separation of concerns by owning the objective validation gates between "ML Model Testing" and "Clinical Handoff". You will be responsible for certifying that models meet all defined acceptance criteria before they are exposed to clinical evaluators or production traffic.
Key Responsibilities
1. Model Quality, Validation, & "Shift Left" Strategy
- Requirements Definition: Partner with Clinical and Product teams during the Requirement Gathering phase to translate clinical goals into quantifiable technical acceptance criteria (e.g., defining specific thresholds for False Negatives, bias limits, or hallucination rates).
- Golden Set Management: Create, maintain, and secure "Golden Datasets" and "Adversarial Test Sets" that are isolated from the training process, ensuring true out-of-sample validation.
- Test Execution: Design and execute test strategies for AI/ML models (predictive, classification, recommendation, and GenAI/LLM-based systems).
- Comprehensive Validation: Validate model behavior across accuracy, bias, robustness, stability, and safety using independent test pipelines.
2. Engineering & Clinical Handoffs
- API Contract Validation: Validate model artifacts (containers, APIs) during the Non-production Model Deploy phase to ensure latency, throughput, and error handling meet Engineering Service Level Agreements (SLAs) before handoff.
- Clinical Pre-Screening: Execute automated "Red Teaming" and safety checks (e.g., PII leakage, hallucination triggers) to sanitize models before they enter the Clinical Testing phase, maximizing the efficiency of Clinical SME time.
- Release Gating: Own the final "Go/No-Go" decision for model promotion based on pre-defined quality gates.
3. Drift Detection, Monitoring & Feedback Loops
- Automated Feedback Loops: Design and implement automated pipelines that trigger Retraining Feedback Loops when post-deployment monitoring detects drift or performance degradation.
- Drift Mechanisms: Design and test data drift, concept drift, and prediction drift detection mechanisms.
- Data Sanity Checks: Implement upstream validation to catch data quality issues (missing values, schema changes) before model training begins.
- Alerting: Define thresholds, alerts, and runbooks for drift remediation and rollback strategies.
4. Explainability & Transparency
- Validate model explainability using tools such as SHAP, LIME, Integrated Gradients, or equivalent.
- Ensure explanations are clinically meaningful, auditable, and regulator-ready.
- Partner with Data Science teams to define explainability acceptance criteria.
5. Hallucination & GenAI Safety (if LLMs are used)
- Design test scenarios to detect hallucinations, unsafe outputs, and medical misinformation.
- Validate grounding mechanisms (RAG, citations, confidence scoring).
- Ensure models do not generate diagnostic or treatment advice beyond the approved scope.
6. Healthcare Compliance & Responsible AI
- Ensure AI systems align with HIPAA (PHI protection), FDA SaMD principles, and Responsible AI guidelines.
- Participate in model risk reviews, audits, and governance forums.
- Maintain documentation for model cards, data sheets, and audit artifacts.
Required Qualifications
Core Skills
- Strong background in Quality Engineering / Validation / Test Automation.
- Hands-on experience testing ML models and/or LLM-based systems.
- Solid understanding of the model lifecycle (training → validation → deployment → monitoring).
- Solid understanding of bias, fairness, drift, overfitting, and calibration.
- Experience with Python, notebooks, and ML testing frameworks.
AI / ML Knowledge
- Familiarity with supervised & unsupervised ML models.
- Familiarity with model explainability techniques (SHAP, LIME, etc.).
- Familiarity with evaluation metrics for ML and GenAI systems.
- Familiarity with ML testing frameworks.
- Experience with model monitoring tools, or custom