Shannon's Scaling Laws
A Folder from David
Attention Is All You Need
A Mathematical Theory of Communication
Prediction and Entropy of Printed English
Scaling Laws for Neural Language Models
Scaling Laws and Interpretability of Learning
A Mathematical Framework for Transformer Circuits
Training Compute-Optimal Large Language Models
The Entropy of Words—Learnability and Expressivity across More than 1000 Languages
Human languages order information efficiently
Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche
Towards a universal model of reading | Behavioral and Brain Sciences
The Collected Papers of Charles Sanders Peirce
Deep Learning and the Information Bottleneck Principle