The way we measure progress in AI is terrible

November 26, 2024
Every time a new AI model is released, it’s typically touted as acing its performance against a series of benchmarks. OpenAI’s GPT-4o, for example, was launched in May with a compilation of results that showed its performance topping every other AI company’s latest model in several tests. The problem is that these benchmarks are poorly…