Level 7
Evaluator
Designs evals; ships only when quality bar is met.
Agnostic
0/11 live · 1 mapped artefact
Golden test sets - build, version, maintain
soon · 20 min · must
Offline evals - accuracy, precision, win-rate
soon · 15 min · must
Online evals - A/B, thumbs, telemetry
soon · 12 min · must
Regression testing - when the model updates
soon · 15 min · must
The cost-latency-quality triangle
soon · 10 min · must
Red teaming your LLM app
soon · 15 min · must
Practical bias and fairness audits
soon · 12 min · compete
Fine-tuning - when and how
soon · 15 min · compete
LLMOps - prompt versioning and CI
soon · 12 min · compete
Evaluating agents (not just chat)
soon · 12 min · compete
Building a golden dataset from real traffic
soon · 10 min · compete
Vendor
0/14 live · 0 mapped artefacts
Evals on Microsoft - Foundry, Vertex Eval, Anthropic console, OpenAI evals, OSS
soon · 15 min · must · Microsoft
Evals on Google - Foundry, Vertex Eval, Anthropic console, OpenAI evals, OSS
soon · 15 min · must · Google
Evals on Anthropic - Foundry, Vertex Eval, Anthropic console, OpenAI evals, OSS
soon · 15 min · must · Anthropic
Evals on OpenAI - Foundry, Vertex Eval, Anthropic console, OpenAI evals, OSS
soon · 15 min · must · OpenAI
Evals on AWS - Foundry, Vertex Eval, Anthropic console, OpenAI evals, OSS
soon · 15 min · must · AWS
Evals on Salesforce - Foundry, Vertex Eval, Anthropic console, OpenAI evals, OSS
soon · 15 min · must · Salesforce
Evals on open-source - Foundry, Vertex Eval, Anthropic console, OpenAI evals, OSS
soon · 15 min · must · open-source
Eval tools of the month for Microsoft
soon · 8 min · stay-ahead · Microsoft
Eval tools of the month for Google
soon · 8 min · stay-ahead · Google
Eval tools of the month for Anthropic
soon · 8 min · stay-ahead · Anthropic
Eval tools of the month for OpenAI
soon · 8 min · stay-ahead · OpenAI
Eval tools of the month for AWS
soon · 8 min · stay-ahead · AWS
Eval tools of the month for Salesforce
soon · 8 min · stay-ahead · Salesforce
Eval tools of the month for open-source
soon · 8 min · stay-ahead · open-source
Industry
0/20 live · 0 mapped artefacts
Eval rubrics that matter in financial services
soon · 12 min · must · financial services
Eval rubrics that matter in healthcare
soon · 12 min · must · healthcare
Eval rubrics that matter in manufacturing
soon · 12 min · must · manufacturing
Eval rubrics that matter in retail and e-commerce
soon · 12 min · must · retail and e-commerce
Eval rubrics that matter in professional services
soon · 12 min · must · professional services
Eval rubrics that matter in construction
soon · 12 min · must · construction
Eval rubrics that matter in legal
soon · 12 min · must · legal
Eval rubrics that matter in education
soon · 12 min · must · education
Eval rubrics that matter in public sector
soon · 12 min · must · public sector
Eval rubrics that matter in small and mid-sized businesses
soon · 12 min · must · small and mid-sized businesses
Bias-audit template for financial services
soon · 10 min · compete · financial services
Bias-audit template for healthcare
soon · 10 min · compete · healthcare
Bias-audit template for manufacturing
soon · 10 min · compete · manufacturing
Bias-audit template for retail and e-commerce
soon · 10 min · compete · retail and e-commerce
Bias-audit template for professional services
soon · 10 min · compete · professional services
Bias-audit template for construction
soon · 10 min · compete · construction
Bias-audit template for legal
soon · 10 min · compete · legal
Bias-audit template for education
soon · 10 min · compete · education
Bias-audit template for public sector
soon · 10 min · compete · public sector
Bias-audit template for small and mid-sized businesses
soon · 10 min · compete · small and mid-sized businesses
Custom
2/2 live · 0 mapped artefacts