Chatbot Arena: Elo Rating Calculation (July 17, 2023) - Colab
Janos Haits on 19 Apr 24: "In this notebook, we will employ the Elo rating system to evaluate the performance of large language models (LLMs). The analysis is based on the pairwise battle results we collected from https://arena.lmsys.org between April 24 and July 17, 2023. This crowdsourcing way of data collection represents some use cases of LLMs in the wild. Below, we present the calculation procedure along with some basic analyses. To view the latest leaderboard, see https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard."
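For readers unfamiliar with Elo, here is a minimal sketch of how ratings can be updated from pairwise battle results. It is not taken from the notebook: the function name `compute_elo`, the tuple layout of `battles`, the winner labels, the model names, and the default parameters (`k=4`, `scale=400`, `base=10`, `init_rating=1000`) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def compute_elo(battles, k=4, scale=400, base=10, init_rating=1000):
    """Sequential Elo update over an iterable of (model_a, model_b, winner).

    `winner` is assumed to be "model_a", "model_b", or anything else for a tie.
    All parameter defaults are illustrative assumptions.
    """
    ratings = defaultdict(lambda: float(init_rating))
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the logistic Elo model.
        ea = 1 / (1 + base ** ((rb - ra) / scale))
        eb = 1 - ea
        # Actual score: 1 for a win, 0 for a loss, 0.5 for a tie.
        if winner == "model_a":
            sa = 1.0
        elif winner == "model_b":
            sa = 0.0
        else:
            sa = 0.5
        # Move each rating toward the observed outcome by at most k points.
        ratings[model_a] = ra + k * (sa - ea)
        ratings[model_b] = rb + k * ((1 - sa) - eb)
    return dict(ratings)


# Hypothetical battles between two made-up models.
example = [
    ("model-x", "model-y", "model_a"),
    ("model-x", "model-y", "tie"),
]
print(compute_elo(example))
```

Because each update adds to one rating exactly what it subtracts from the other, the sum of all ratings stays constant. The result depends on the order in which battles are processed, which is a known property of sequential Elo updates.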