About Us
Steer Health helps healthcare organizations improve patient access, reduce operational burden, and recover revenue through AI-native workflow automation. Our lead product, Luna AI, acts as a voice-based digital workforce, handling patient access workflows such as scheduling, intake, and follow-up. We sit on top of existing EHR infrastructure and focus on measurable operational outcomes.
About The Role
We are looking for a Senior QA Engineer who thrives at the intersection of AI, voice automation, and cloud-native systems. You will own quality across our platform — from testing LLM-powered features and voice pipelines to ensuring robust end-to-end coverage on GCP infrastructure. You will work closely with product, engineering, and AI teams to embed quality from the ground up.
Responsibilities
- Design, build, and maintain automated test suites using Playwright for web and API surfaces, including AI-generated content flows.
- Lead QA strategy for voice automation pipelines built on ElevenLabs — developing test cases for synthesis quality, latency, and failure modes.
- Validate Claude (Anthropic) integrations: prompt-response accuracy, edge case handling, safety behaviors, and output consistency across builds.
- Build and maintain Node.js-based test tooling, harnesses, and custom reporters for CI/CD pipelines.
- Deploy, monitor, and triage test infrastructure on Google Cloud Platform — leveraging Cloud Run, GCS, and Pub/Sub for scalable test execution.
- Define and track quality metrics: test coverage, flakiness rates, mean-time-to-detect, and regression velocity.
- Collaborate with engineers during design reviews to surface testability gaps and advocate for observable, fault-tolerant system design.
- Mentor junior QA engineers and establish team-wide standards for test authoring, review, and maintenance.
Required Qualifications
- 5+ years of QA engineering experience, with at least 2 years on systems that include LLMs, AI APIs, or speech/audio pipelines.
- Expert-level Playwright skills — authoring resilient selectors, managing parallel workers, and debugging flaky tests at scale.
- Proficient Node.js developer — comfortable writing custom test runners, CLI tooling, and service mocks in TypeScript/JavaScript.
- Hands-on GCP experience: deploying workloads to Cloud Run or GKE, querying logs in Cloud Logging, configuring artifact storage in GCS.
- Familiarity with ElevenLabs or comparable TTS/voice APIs — understanding synthesis parameters, webhook flows, and audio quality evaluation.
- Practical experience testing Claude or other LLMs — designing determinism-aware test strategies, evaluating prompt regressions, and building evals.
- Strong understanding of REST, WebSocket, and gRPC protocols for API-level testing.
- Experience integrating test suites into CI/CD pipelines (GitHub Actions, Cloud Build, or similar).
Nice to Have
- Experience writing custom LLM evals or using evaluation frameworks such as PromptFoo or Braintrust.
- Background in audio signal quality assessment or speech intelligibility testing.
- Familiarity with observability tooling: OpenTelemetry, Datadog, or GCP Cloud Monitoring.
- Knowledge of accessibility testing standards (WCAG 2.1) and assistive technology compatibility.
Core Technology Stack
- Google Cloud Platform (GCP): Cloud Run, GCS, Pub/Sub, Cloud Logging, and GKE for scalable test infrastructure
- ElevenLabs (Voice Automation): TTS pipeline testing, synthesis quality evaluation, webhook and latency validation
- Node.js / TypeScript: Custom test runners, service mocks, CLI tooling, and CI/CD integration
- Playwright: End-to-end and API-level browser automation with parallel execution
- Claude (Anthropic): LLM integration QA, prompt regression testing, and output evaluation
Benefits
- Competitive base salary commensurate with experience
- High-autonomy environment with direct access to executive leadership
- Structured operating cadence with clear goals, metrics, and career growth targets
- Work that touches 19M+ patients — the mission is real
- Flexible PTO policy