About the job
Position: AI Model Evaluator (LLM & Agent Systems)
Type: Hourly contract
Compensation: $20–$30/hour
Location: Remote
Commitment: 10–40 hours/week
Role Responsibilities
- Evaluate outputs from large language models and autonomous agent systems using defined rubrics and quality standards.
- Review multi-step agent workflows, including screenshots and reasoning traces, to assess accuracy and completeness.
- Apply benchmarking criteria consistently while identifying edge cases and recurring failure patterns.
- Provide structured, actionable feedback to support model refinement and product improvements.
- Participate in calibration sessions to ensure scoring consistency across reviewers.
- Adapt to evolving guidelines and ambiguous scenarios with sound judgment.
- Document findings clearly and communicate insights to relevant stakeholders.
Requirements
- Strong experience in LLM evaluation, AI output analysis, QA/testing, UX research, or similar analytical roles.
- Proficiency in rubric-based scoring, benchmarking frameworks, and AI quality assessment.
- Excellent attention to detail with strong decision-making skills in ambiguous cases.
- Proficient English communication skills (written and verbal).
- Ability to work independently in a remote environment.
- Comfortable committing to structured evaluation workflows and evolving guidelines.
Application Process (takes ~20 minutes)
- Upload resume
- Interview (15 min)
- Submit form