Resources for "Measure what Matters: Psychometric Evaluation of AI with Situational Judgment Tests" (https://arxiv.org/abs/2510.22170)
AI & ML interests
We work with you to develop a high-impact AI strategy for your industry, refine your data foundations, and design meaningful human-AI interactions. We also empower you to develop, integrate, and test the latest AI technologies responsibly.
models 16
thoughtworks/arithmetic-sorl
Updated • 1
thoughtworks/Qwen3-Coder-Next-Eagle3-exp-e
Text Generation • Updated • 1
thoughtworks/MiniMax-M2.5-Eagle3
Text Generation • 0.2B • Updated • 480
thoughtworks/GLM-4.7-FP8-Eagle3-exp-e
Text Generation • Updated
thoughtworks/GLM-4.7-Flash-Eagle3
Text Generation • 0.1B • Updated • 372 • 2
thoughtworks/Gemma-4-31B-Eagle3
Text Generation • 0.6B • Updated • 908 • 2
thoughtworks/arithmetic-sorl-saes
Updated
thoughtworks/DeepSeek-R1-Distill-Qwen-14B-Eagle3
Text Generation • Updated • 308
thoughtworks/DeepSeek-R1-Distill-Qwen-7B-Eagle3
Text Generation • Updated • 351
thoughtworks/Qwen2.5-7B-Instruct-Eagle3
Text Generation • Updated • 318
datasets 13
thoughtworks/arithmetic-sorl-data
Viewer • Updated • 1.02M • 87
thoughtworks/ablation_psychometrics_personas
Viewer • Updated • 500 • 21
thoughtworks/gemma_psychometrics_personas_responses
Viewer • Updated • 3.98M • 169 • 1
thoughtworks/psychometric_personas
Viewer • Updated • 23.6k • 55
thoughtworks/psychometric_sjts_analysis
Viewer • Updated • 1.85k • 26
thoughtworks/psychometric_personas_responses
Viewer • Updated • 4.57M • 40 • 1
thoughtworks/CulturalCounterfactuals
Updated • 7
thoughtworks/psychometric_human_annotations
Viewer • Updated • 55 • 10
thoughtworks/parliamentary_personas
Viewer • Updated • 2.2k • 6
thoughtworks/psychometric_personas_temp
Viewer • Updated • 50 • 14