Text from them:
Calling all model makers, or would-be model creators! Chai asked me to tell you all about their open source LLM leaderboard:
Chai is running a fully open LLM competition. Anyone is free to submit a Llama-based LLM via our Python package 🐍 It gets deployed to users on our app, we collect the metrics, and we rank the models! If you place high enough on our leaderboard, you'll win money 🥇
We’ve paid out over $10,000 in prizes so far. 💰
Come to our discord and check it out!
Link to the latest board, for people who don't feel like joining a random Discord just to see results:
https://cdn.discordapp.com/attachments/1134163974296961195/1138833170838589471/image1.png
Yeah, it's a step in the right direction at least. Though now that you mention it, doesn't LMSYS or someone do the same thing with human evals and side-by-side comparisons?
It's such a tricky line to walk between deterministic questions (repeatable but cheatable) and user questions (real-world but potentially unfair).