research_reads
A Folder from Bhagyesh
Emergent Misalignment & Realignment — LessWrong
MECHANISTICALLY ANALYZING THE EFFECTS OF FINETUNING ON PROCEDURALLY DEFINED TASKS
Self-Fulfilling Misalignment Data Might Be Poisoning Our AI Models
Did Claude 3 Opus align itself via gradient hacking? — LessWrong
Alien Mind Study
Anthropic Safety Research
Active Inference Tutorial
Ambitious Language Model Evaluations
Streetlight to Shadows
Measurement Misconceptions
base-GPT-4 Impressions
Science of Evals
Tiling Agents Draft
Collaborators Needed
Tiling Agents AI
Simulators
Löb's Theorem Cartoon
Pando AI Individuality
LLM Coherentization
Do you need consciousness to matter? On LLMs and moral relevance — LessWrong
The Case for Suffering-Focused Ethics – Center on Long-Term Risk
Gains from Trade through Compromise – Center on Long-Term Risk
janus — LessWrong
Cooperationism Moral Framework
Chalmers Status Update