About the job
Location: Bangalore (Office)
Experience: 4–5 years
Role Type: Full-time
About TestZeus
TestZeus is pioneering the next generation of AI-powered software testing. We’re the team behind Hercules, the world’s first open-source testing agent. By combining large language models, multi-agent orchestration, and state-of-the-art retrieval pipelines, we deliver autonomous, zero-maintenance testing for web and API workloads.
We are seeking an AI Engineer (4–5 years experience) who can design, build, and scale production-grade LLM systems. This is a hands-on product role where you’ll work closely with backend, frontend, and product teams to ship quickly, test with real users, and iterate in production. If you enjoy building RAG systems, prompt workflows, and agentic evaluation that solve real business problems, this role is for you.
Key Responsibilities
1. LLM Workflow & Prompt Engineering
- Design, build, and maintain LLM workflows that score freeform answers, generate contextual feedback, and assist users in real time.
- Develop and refine prompt templates, chains, and tools to improve relevance, reduce token usage, and mitigate hallucinations.
- Implement multi-step prompt workflows for tasks like mock interviews, code reviews, and automated test guidance.
2. Retrieval & RAG Pipelines
- Build and optimise retrieval-augmented generation (RAG) pipelines using vector stores (e.g., Pinecone, Weaviate, Elasticsearch).
- Implement embedding generation, similarity search, and dynamic context selection that reduce hallucinations and improve answer quality.
- Ensure low-latency, high-accuracy retrieval combined with LLM generation for highly personalised user experiences.
3. LLM Evaluation & Analytics
- Define and implement evaluation frameworks for LLM outputs: accuracy, consistency, bias, interpretability, and robustness.
- Build automated evaluation pipelines that monitor performance over time, detect regressions, and flag failure modes.
- Instrument systems with metrics and logging to understand model behaviour in production and drive data-informed decisions.
4. Agent-Based System Development
- Build tool-augmented agents capable of evaluating coding, system design, or reasoning questions using frameworks like LangChain, AutoGen, LlamaIndex, or similar.
- Design and experiment with agent orchestration patterns (multi-agent workflows, planners, evaluators) to improve multi-step reasoning and reliability.
- Integrate agents with external tools and APIs (code execution, documentation search, test runners, etc.) to extend capabilities.
5. Cross-Functional Collaboration & Product Impact
- Partner with backend engineers (Go, FastAPI) and frontend engineers (React) to ship features end-to-end.
- Work closely with product managers and designers to shape user journeys, gather feedback, and iterate quickly in production.
- Participate in agile ceremonies (standups, sprint planning, retrospectives) and provide clear, consistent status updates.
6. Research, Experimentation & Innovation
- Stay current with state-of-the-art LLM and retrieval research, benchmarks, and open-source tools.
- Rapidly prototype new ideas (e.g., advanced retrieval strategies, custom fine-tuning flows, new evaluation methods) and demonstrate feasibility.
- Contribute to internal best practices, playbooks, and reusable components for LLM and agent development.
Qualifications & Skills
Education
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent professional experience.
Experience
- 4–5 years of professional software engineering experience, with a strong focus on Python.
- At least 2 years of hands-on experience building and deploying LLM-powered applications in production (beyond toy projects).