LLM Coding Benchmarks
A Folder from Kurt
Intro
OpenAI
Anthropic
Google
DeepSeek
Meta
Llama 3.1
HumanEval
openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"
HumanEval Benchmark (Code Generation) | Papers With Code
openai/openai_humaneval · Datasets at Hugging Face
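HumanEval scores models with pass@k: the probability that at least one of k sampled completions per problem passes the unit tests. The paper linked above estimates this without bias from n samples per problem, of which c pass. A minimal sketch of that estimator (using math.comb rather than the repo's NumPy version):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one problem.

    n: completions sampled, c: completions that passed the tests, k: budget.
    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 of which pass the tests.
print(pass_at_k(n=200, c=37, k=1))   # ≈ 0.185
print(pass_at_k(n=200, c=37, k=10))  # ≈ 0.88
```

The per-problem estimates are then averaged over the benchmark's problems to get the reported pass@k score.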
Agents
Debug like a Human
AgentCoder Code Generation
Copilot Workspace
BigCodeBench
bigcode-project/bigcodebench: BigCodeBench: The Next Generation of HumanEval
BigCodeBench: The Next Generation of HumanEval
BigCodeBench Leaderboard - a Hugging Face Space by bigcode
bigcode/bigcodebench · Datasets at Hugging Face
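The Hugging Face dataset linked above can be pulled directly with the datasets library. A minimal sketch for browsing it; the split and column names vary by release and are not assumed here, so inspect the loaded object rather than hard-coding them:

```python
from datasets import load_dataset

# Dataset id taken from the bigcode/bigcodebench link above.
ds = load_dataset("bigcode/bigcodebench")
print(ds)  # lists the available splits and their columns

# Peek at one task; field names (prompts, tests, entry points) depend on the version.
split = next(iter(ds.values()))
print(split[0])
```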
LLM API Price and Perf.xlsx
Legal Protection
DevDay Announcements
Anthropic Terms Dec 2023
Microsoft Copilot Commitment
Generative AI Protection
Meta won't release advanced AI in the EU due to stronger user data protections - UPI.com
DeepSeek User Agreement
Examples
ChatGPT
Claude
DeepSeek
Meta AI
Gemini
tf–idf - Wikipedia
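For reference alongside the chatbot examples above, a minimal sketch of the standard tf–idf weighting described in the Wikipedia article: term frequency (how often a term appears in a document) scaled by inverse document frequency (how rare the term is across the corpus). Real systems typically use a library implementation such as scikit-learn's TfidfVectorizer; this toy version is only illustrative:

```python
import math
from collections import Counter

def tf_idf(docs):
    """tf-idf weights for a list of tokenized documents.

    tf(t, d) = count of t in d / len(d)
    idf(t)   = log(N / df(t)), where df(t) = number of documents containing t
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (count / len(doc)) * math.log(n_docs / df[t])
         for t, count in Counter(doc).items()}
        for doc in docs
    ]

# "coding" appears in only one of the two documents, so it is the only term
# with a non-zero weight; terms in every document get idf = log(1) = 0.
print(tf_idf([["llm", "coding", "benchmark"], ["llm", "benchmark"]]))
```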