Text from them:
Calling all model makers, or would-be model creators! Chai asked me to tell you all about their open source LLM leaderboard:
Chai is running a fully open LLM competition. Anyone is free to submit a Llama-based LLM via our Python package 🐍 It gets deployed to users on our app, we collect the metrics, and we rank the models! If you place high enough on our leaderboard, you'll win money 🥇
We’ve paid out over $10,000 in prizes so far. 💰
Come to our discord and check it out!
Link to the latest board, for people who don't feel like joining a random Discord just to see results:
https://cdn.discordapp.com/attachments/1134163974296961195/1138833170838589471/image1.png
Yeah, it's a step in the right direction at least. Though now that you mention it, doesn't LMSYS or someone do the same thing with human evals and side-by-side comparisons?
It's such a tricky line to walk between deterministic questions (repeatable but cheatable) and user questions (real-world but potentially unfair).