LLM Coding Benchmarks
A Folder from Kurt
Intro
OpenAI
Anthropic
Google
DeepSeek
Meta
Llama 3.1
HumanEval
openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"
HumanEval Benchmark (Code Generation) | Papers With Code
openai/openai_humaneval · Datasets at Hugging Face
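HumanEval scores models with pass@k: the probability that at least one of k sampled completions per problem passes the unit tests. The paper linked above estimates this without bias from n samples per problem, of which c pass. A minimal sketch of that estimator (using math.comb rather than the repo's NumPy version):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one problem.

    n: completions sampled, c: completions that passed the tests, k: budget.
    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 of which pass the tests.
print(pass_at_k(n=200, c=37, k=1))   # ≈ 0.185
print(pass_at_k(n=200, c=37, k=10))  # ≈ 0.88
```

The per-problem estimates are then averaged over the benchmark's problems to get the reported pass@k score.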
Agents
Debug like a Human
AgentCoder Code Generation
Copilot Workspace
BigCodeBench
bigcode-project/bigcodebench: BigCodeBench: The Next Generation of HumanEval
BigCodeBench: The Next Generation of HumanEval
BigCodeBench Leaderboard - a Hugging Face Space by bigcode
bigcode/bigcodebench · Datasets at Hugging Face
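The Hugging Face dataset linked above can be pulled directly with the datasets library. A minimal sketch for browsing it; the split and column names vary by release and are not assumed here, so inspect the loaded object rather than hard-coding them:

```python
from datasets import load_dataset

# Dataset id taken from the bigcode/bigcodebench link above.
ds = load_dataset("bigcode/bigcodebench")
print(ds)  # lists the available splits and their columns

# Peek at one task; field names (prompts, tests, entry points) depend on the version.
split = next(iter(ds.values()))
print(split[0])
```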
LLM API Price and Perf.xlsx
Legal Protection
DevDay Announcements
Anthropic Terms Dec 2023
Microsoft Copilot Commitment
Generative AI Protection
Meta won't release advanced AI in the EU due to stronger user data protections - UPI.com
DeepSeek User Agreement
Examples
ChatGPT
Claude
DeepSeek
Meta AI
Gemini
tf–idf - Wikipedia
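For reference alongside the chatbot examples above, a minimal sketch of the standard tf–idf weighting described in the Wikipedia article: term frequency (how often a term appears in a document) scaled by inverse document frequency (how rare the term is across the corpus). Real systems typically use a library implementation such as scikit-learn's TfidfVectorizer; this toy version is only illustrative:

```python
import math
from collections import Counter

def tf_idf(docs):
    """tf-idf weights for a list of tokenized documents.

    tf(t, d) = count of t in d / len(d)
    idf(t)   = log(N / df(t)), where df(t) = number of documents containing t
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (count / len(doc)) * math.log(n_docs / df[t])
         for t, count in Counter(doc).items()}
        for doc in docs
    ]

# "coding" appears in only one of the two documents, so it is the only term
# with a non-zero weight; terms in every document get idf = log(1) = 0.
print(tf_idf([["llm", "coding", "benchmark"], ["llm", "benchmark"]]))
```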