research_reads
A Folder from Bhagyesh
Emergent Misalignment & Realignment — LessWrong
Mechanistically Analyzing the Effects of Finetuning on Procedurally Defined Tasks
Self-Fulfilling Misalignment Data Might Be Poisoning Our AI Models
Did Claude 3 Opus align itself via gradient hacking? — LessWrong
Alien Mind Study
Anthropic Safety Research
Active Inference Tutorial
Ambitious Language Model Evaluations
Streetlight to Shadows
Measurement Misconceptions
base-GPT-4 Impressions
Science of Evals
Tiling Agents Draft
Collaborators Needed
Tiling Agents AI
Simulators
Löb's Theorem Cartoon
Pando AI Individuality
LLM Coherentization
Do you need consciousness to matter? On LLMs and moral relevance — LessWrong
The Case for Suffering-Focused Ethics – Center on Long-Term Risk
Gains from Trade through Compromise – Center on Long-Term Risk
janus — LessWrong
Cooperationism Moral Framework
Chalmers Status Update