This listing was taken from the employer's website, and HelloWorld does not guarantee that it is up to date.

AI Engineer - Agent Evaluation Platform (Ona)

Hyperskill

Remote

January 16, 2026

intermediate

About The Product

Everyone's building AI agents now, but here's the problem: nobody really knows if their agents are actually working well.

Sure, you can see that your agent completed a task, but did it solve the user's actual problem? Did it deliver real business value, or just go through the motions? Right now, most people test their agents manually, which doesn't scale and isn't reliable.

The Agent Evaluation Platform (name to be determined) will automatically evaluate agent performance: not just "did it finish the task" but "did it achieve the outcome the user actually wanted." Think of it like Langfuse, but instead of testing individual prompts, we're evaluating entire agent workflows, complex chains of actions, and multi-agent systems.

This is especially important as companies start paying for agents based on outcomes rather than just usage. You need to know your agent is actually delivering value, not just burning through API calls.

What You'll Do

As an AI Engineer, you'll build the technical infrastructure for comprehensive agent evaluation. This means creating systems that automatically test agent performance, building tools for managing evaluation datasets, and implementing both deterministic tests and non-deterministic evaluations. You'll also work on making this scale: evaluation systems that can handle enterprise workloads and provide reliable insights into agent performance.

Who You Are

  • You have deep AI engineering experience — you've built AI systems, deployed them in production, and dealt with the challenge of measuring their real-world performance
  • You understand evaluation platforms — you've worked with tools like Langfuse and know the current limitations of AI testing
  • You've built evaluation systems — you've created tools that measure AI system quality and can distinguish between technical functionality and user value
  • You thrive in uncertainty — you'll need to build a lot, figure things out on the go, experiment constantly, and handle multiple different tasks across various areas simultaneously.

What We Offer

  • Contractor agreement with a US-registered legal entity.
  • 100% remote — work from anywhere in the world
  • Competitive salary in USD plus options in the product you're working on. We target market rates and are ready to hear your expectations and prepare an offer that matches your expertise.
  • Resources — budget for tools, learning, and whatever you need to succeed
  • Fast-moving environment — we ship fast, learn fast, and iterate based on real customer feedback

Upload your resume and tell us a few words about yourself — we’d love to hear from you!
