research_reads

A Folder from Bhagyesh

Emergent Misalignment & Realignment — LessWrong

Mechanistically Analyzing the Effects of Finetuning on Procedurally Defined Tasks

Self-Fulfilling Misalignment Data Might Be Poisoning Our AI Models

Did Claude 3 Opus align itself via gradient hacking? — LessWrong

Alien Mind Study

Anthropic Safety Research

Active Inference Tutorial

Ambitious Language Model Evaluations

Streetlight to Shadows

Measurement Misconceptions

base-GPT-4 Impressions

Science of Evals

Tiling Agents Draft

Collaborators Needed

Tiling Agents AI

Löb's Theorem Cartoon

Pando AI Individuality

LLM Coherentization

Do you need consciousness to matter? On LLMs and moral relevance — LessWrong

The Case for Suffering-Focused Ethics – Center on Long-Term Risk

Gains from Trade through Compromise – Center on Long-Term Risk

janus — LessWrong

Cooperationism Moral Framework

Chalmers Status Update