AI Quality Analyst (LLM)

Crossing Hurdles
Full-time
Remote
Worldwide

About the job

Position: AI Model Evaluator (LLM & Agent Systems)

Type: Hourly contract

Compensation: $20–$30/hour

Location: Remote

Commitment: 10–40 hours/week

Role Responsibilities

  • Evaluate outputs from large language models and autonomous agent systems using defined rubrics and quality standards.
  • Review multi-step agent workflows, including screenshots and reasoning traces, to assess accuracy and completeness.
  • Apply benchmarking criteria consistently while identifying edge cases and recurring failure patterns.
  • Provide structured, actionable feedback to support model refinement and product improvements.
  • Participate in calibration sessions to keep evaluations aligned and consistent across reviewers.
  • Adapt to evolving guidelines and ambiguous scenarios with sound judgment.
  • Document findings clearly and communicate insights to relevant stakeholders.

Requirements

  • Strong experience in LLM evaluation, AI output analysis, QA/testing, UX research, or similar analytical roles.
  • Proficiency in rubric-based scoring, benchmarking frameworks, and AI quality assessment.
  • Excellent attention to detail with strong decision-making skills in ambiguous cases.
  • Strong written and verbal English communication skills.
  • Ability to work independently in a remote environment.
  • Comfortable committing to structured evaluation workflows and evolving guidelines.

Application Process (Takes ~20 Minutes)

  • Upload resume
  • Interview (15 min)
  • Submit form