2022 (and All Time) Posts by Pingback Count
post by Raemon · 2023-12-16T21:17:00.572Z · 14 comments
For the past couple of years I've wished LessWrong had a "sort posts by number of pingbacks, or, ideally, by total karma of pingbacks" feature. I particularly wished for this during the Annual Review, where "which posts got cited the most?" seemed like a useful thing to track for surfacing potential hidden gems.
We still haven't built a full-fledged feature for this, but I just ran a query against the database and turned the results into a spreadsheet, which you can view here:
LessWrong 2022 Posts by Pingbacks
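The spreadsheet columns are simple aggregates: for each post, count its pingbacks and sum the karma of the posts that cite it. Here is a minimal Python sketch of that aggregation, using made-up post names and karma values rather than the actual LessWrong data or query:

```python
from collections import defaultdict

# Hypothetical pingback records: (cited post, karma of the citing post).
# Illustrative values only -- the real numbers come from the LessWrong database.
pingbacks = [
    ("Post A", 120),
    ("Post A", 45),
    ("Post B", 80),
]

pingback_count = defaultdict(int)   # "Pingback Count" column
total_karma = defaultdict(int)      # "Total Pingback Karma" column

for cited_post, citing_karma in pingbacks:
    pingback_count[cited_post] += 1
    total_karma[cited_post] += citing_karma

# Sort by Total Pingback Karma (descending) and derive "Avg Pingback Karma".
for post in sorted(total_karma, key=total_karma.get, reverse=True):
    avg = total_karma[post] / pingback_count[post]
    print(f"{post}: {pingback_count[post]} pingbacks, "
          f"{total_karma[post]} total karma, {avg:.0f} avg")
```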
Here are the top 100 posts, sorted by Total Pingback Karma:
| Title/Link | Post Karma | Pingback Count | Total Pingback Karma | Avg Pingback Karma |
|---|---:|---:|---:|---:|
| AGI Ruin: A List of Lethalities | 870 | 158 | 12,484 | 79 |
| MIRI announces new "Death With Dignity" strategy | 334 | 73 | 8,134 | 111 |
| A central AI alignment problem: capabilities generalization, and the sharp left turn | 273 | 96 | 7,704 | 80 |
| Simulators | 612 | 127 | 7,699 | 61 |
| Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover | 367 | 83 | 5,123 | 62 |
| Reward is not the optimization target | 341 | 62 | 4,493 | 72 |
| A Mechanistic Interpretability Analysis of Grokking | 367 | 48 | 3,450 | 72 |
| How To Go From Interpretability To Alignment: Just Retarget The Search | 167 | 45 | 3,374 | 75 |
| On how various plans miss the hard bits of the alignment challenge | 292 | 40 | 3,288 | 82 |
| [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering | 79 | 36 | 3,023 | 84 |
| How likely is deceptive alignment? | 101 | 47 | 2,907 | 62 |
| The shard theory of human values | 238 | 42 | 2,843 | 68 |
| Mysteries of mode collapse | 279 | 32 | 2,842 | 89 |
| [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain | 57 | 30 | 2,731 | 91 |
| Why Agent Foundations? An Overly Abstract Explanation | 285 | 42 | 2,730 | 65 |
| A Longlist of Theories of Impact for Interpretability | 124 | 26 | 2,589 | 100 |
| How might we align transformative AI if it’s developed very soon? | 136 | 32 | 2,351 | 73 |
| A transparency and interpretability tech tree | 148 | 31 | 2,343 | 76 |
| Discovering Language Model Behaviors with Model-Written Evaluations | 100 | 19 | 2,336 | 123 |
| A note about differential technological development | 185 | 20 | 2,270 | 114 |
| Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | 195 | 35 | 2,267 | 65 |
| Supervise Process, not Outcomes | 132 | 25 | 2,262 | 90 |
| Shard Theory: An Overview | 157 | 28 | 2,019 | 72 |
| Epistemological Vigilance for Alignment | 61 | 21 | 2,008 | 96 |
| A shot at the diamond-alignment problem | 92 | 23 | 1,848 | 80 |
| Where I agree and disagree with Eliezer | 862 | 27 | 1,836 | 68 |
| Brain Efficiency: Much More than You Wanted to Know | 201 | 27 | 1,807 | 67 |
| Refine: An Incubator for Conceptual Alignment Research Bets | 143 | 21 | 1,793 | 85 |
| Externalized reasoning oversight: a research direction for language model alignment | 117 | 28 | 1,788 | 64 |
| Humans provide an untapped wealth of evidence about alignment | 186 | 19 | 1,647 | 87 |
| Six Dimensions of Operational Adequacy in AGI Projects | 298 | 20 | 1,607 | 80 |
| How "Discovering Latent Knowledge in Language Models Without Supervision" Fits Into a Broader Alignment Scheme | 240 | 16 | 1,575 | 98 |
| Godzilla Strategies | 137 | 17 | 1,573 | 93 |
| (My understanding of) What Everyone in Technical Alignment is Doing and Why | 411 | 23 | 1,530 | 67 |
| Two-year update on my personal AI timelines | 287 | 18 | 1,530 | 85 |
| [Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA | 90 | 16 | 1,482 | 93 |
| [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL | 66 | 25 | 1,460 | 58 |
| Human values & biases are inaccessible to the genome | 90 | 14 | 1,450 | 104 |
| You Are Not Measuring What You Think You Are Measuring | 350 | 21 | 1,449 | 69 |
| Open Problems in AI X-Risk [PAIS #5] | 59 | 14 | 1,446 | 103 |
| [Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now? | 146 | 25 | 1,407 | 56 |
| Conditioning Generative Models | 24 | 11 | 1,362 | 124 |
| Conjecture: Internal Infohazard Policy | 132 | 14 | 1,340 | 96 |
| A challenge for AGI organizations, and a challenge for readers | 299 | 18 | 1,336 | 74 |
| Superintelligent AI is necessary for an amazing future, but far from sufficient | 132 | 11 | 1,335 | 121 |
| Optimality is the tiger, and agents are its teeth | 288 | 14 | 1,319 | 94 |
| Let’s think about slowing down AI | 522 | 17 | 1,273 | 75 |
| Niceness is unnatural | 121 | 12 | 1,263 | 105 |
| Announcing the Alignment of Complex Systems Research Group | 91 | 11 | 1,247 | 113 |
| [Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts | 67 | 23 | 1,243 | 54 |
| ELK prize results | 135 | 17 | 1,235 | 73 |
| Abstractions as Redundant Information | 64 | 18 | 1,216 | 68 |
| [Link] A minimal viable product for alignment | 53 | 12 | 1,184 | 99 |
| Acceptability Verification: A Research Agenda | 50 | 11 | 1,182 | 107 |
| What an actually pessimistic containment strategy looks like | 647 | 16 | 1,168 | 73 |
| Let's See You Write That Corrigibility Tag | 120 | 10 | 1,161 | 116 |
| chinchilla's wild implications | 403 | 18 | 1,151 | 64 |
| Worlds Where Iterative Design Fails | 185 | 17 | 1,122 | 66 |
| why assume AGIs will optimize for fixed goals? | 138 | 14 | 1,103 | 79 |
| Gradient hacking: definitions and examples | 38 | 11 | 1,079 | 98 |
| Contra shard theory, in the context of the diamond maximizer problem | 101 | 6 | 1,073 | 179 |
| We Are Conjecture, A New Alignment Research Startup | 197 | 8 | 1,050 | 131 |
| Circumventing interpretability: How to defeat mind-readers | 109 | 11 | 1,047 | 95 |
| Evolution is a bad analogy for AGI: inner alignment | 73 | 7 | 1,043 | 149 |
| Refining the Sharp Left Turn threat model, part 1: claims and mechanisms | 82 | 8 | 1,042 | 130 |
| MATS Models | 86 | 8 | 1,035 | 129 |
| Common misconceptions about OpenAI | 239 | 11 | 1,028 | 93 |
| Prizes for ELK proposals | 143 | 20 | 1,022 | 51 |
| Current themes in mechanistic interpretability research | 88 | 9 | 1,014 | 113 |
| Discovering Agents | 71 | 13 | 994 | 76 |
| [Intro to brain-like-AGI safety] 12. Two paths forward: “Controlled AGI” and “Social-instinct AGI” | 42 | 15 | 992 | 66 |
| What's General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? | 118 | 24 | 988 | 41 |
| Inner and outer alignment decompose one hard problem into two extremely hard problems | 115 | 17 | 959 | 56 |
| Threat Model Literature Review | 73 | 13 | 953 | 73 |
| Language models seem to be much better than humans at next-token prediction | 172 | 11 | 952 | 87 |
| Will Capabilities Generalise More? | 122 | 7 | 952 | 136 |
| Pivotal outcomes and pivotal processes | 91 | 8 | 938 | 117 |
| Conditioning Generative Models for Alignment | 56 | 9 | 934 | 104 |
| Training goals for large language models | 28 | 9 | 930 | 103 |
| It’s Probably Not Lithium | 441 | 5 | 929 | 186 |
| Latent Adversarial Training | 40 | 11 | 914 | 83 |
| “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments | 129 | 11 | 913 | 83 |
| Conditioning Generative Models with Restrictions | 18 | 5 | 913 | 183 |
| The alignment problem from a deep learning perspective | 97 | 8 | 910 | 114 |
| Instead of technical research, more people should focus on buying time | 100 | 15 | 904 | 60 |
| By Default, GPTs Think In Plain Sight | 84 | 9 | 903 | 100 |
| [Intro to brain-like-AGI safety] 4. The “short-term predictor” | 64 | 16 | 890 | 56 |
| Don't leave your fingerprints on the future | 109 | 11 | 890 | 81 |
| Strategy For Conditioning Generative Models | 31 | 5 | 883 | 177 |
| Call For Distillers | 204 | 19 | 878 | 46 |
| Thoughts on AGI organizations and capabilities work | 102 | 5 | 871 | 174 |
| Optimization at a Distance | 87 | 9 | 868 | 96 |
| [Intro to brain-like-AGI safety] 5. The “long-term predictor”, and TD learning | 52 | 17 | 859 | 51 |
| What does it take to defend the world against out-of-control AGIs? | 180 | 11 | 853 | 78 |
| Monitoring for deceptive alignment | 135 | 11 | 851 | 77 |
| Late 2021 MIRI Conversations: AMA / Discussion | 119 | 8 | 849 | 106 |
| How to Diversify Conceptual Alignment: the Model Behind Refine | 87 | 27 | 845 | 31 |
| wrapper-minds are the enemy | 103 | 8 | 833 | 104 |
| But is it really in Rome? An investigation of the ROME model editing technique | 102 | 8 | 833 | 104 |
| An Open Agency Architecture for Safe Transformative AI | 74 | 12 | 831 | 69 |
Comments sorted by top scores.
comment by jessicata (jessica.liu.taylor) · 2023-12-17T17:48:44.752Z
I have to look for a while before finding any non-AI posts. Seems LW is mainly an AI / alignment discussion forum at this point.
Replies from: ryan_greenblatt, habryka4
↑ comment by ryan_greenblatt · 2023-12-17T19:39:51.721Z
It seems more informative to just look at top (inflation-adjusted) karma for 2022 (similar to what habryka noted in the sibling). AI posts in bold.
AGI Ruin: A List of LethalitiesΩ
Where I agree and disagree with EliezerΩ
SimulatorsΩ
What an actually pessimistic containment strategy looks like
Let’s think about slowing down AIΩ
Luck based medicine: my resentful story of becoming a medical miracle
Counter-theses on Sleep
Losing the root for the tree
The Redaction Machine
It Looks Like You're Trying To Take Over The WorldΩ
(My understanding of) What Everyone in Technical Alignment is Doing and WhyΩ
Counterarguments to the basic AI x-risk caseΩ
It’s Probably Not Lithium
Reflections on six months of fatherhood
chinchilla's wild implicationsΩ
You Are Not Measuring What You Think You Are Measuring [AI related]
Lies Told To Children
What DALL-E 2 can and cannot do
Staring into the abyss as a core life skill
DeepMind alignment team opinions on AGI ruin argumentsΩ
Accounting For College Costs
A Mechanistic Interpretability Analysis of GrokkingΩ
Models Don't "Get Reward"Ω
Why I think strong general AI is coming soon
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeoverΩ
Why Agent Foundations? An Overly Abstract ExplanationΩ
MIRI announces new "Death With Dignity" strategy
Beware boasting about non-existent forecasting track records [AI related]
A challenge for AGI organizations, and a challenge for readersΩ
I count 18/29 about AI. A few AI posts are technically more general. A few non-AI posts seem to indirectly be about AI.
comment by Raemon · 2023-12-16T21:56:01.285Z
I just updated the spreadsheet to include All Time posts. List of Lethalities is still the winner by Total Pingback Karma, although not by Pingback Count (and this seems at least partially explained by karma inflation).
comment by Steven Byrnes (steve2152) · 2023-12-16T22:04:34.704Z
The list would look pretty different if self-cites were excluded. E.g. my posts would probably all be gone 😂
Replies from: Raemon, Raemon
↑ comment by Raemon · 2023-12-16T22:13:42.938Z
Yeah, if I have time today I'll make an "exclude self-cites" column, although fwiw I think the "total pingback karma" is fairly legit even if including self-cites. If your followup work got a lot of karma, I think that's a useful signal about your original post even if you quote yourself liberally.
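For concreteness, an "exclude self-cites" variant of the totals could be computed roughly as below. This is a hypothetical Python sketch; the record fields and values are assumptions for illustration, not the actual LessWrong schema or query.

```python
# Hypothetical pingback records; field names are assumptions for illustration.
pingbacks = [
    {"cited_post": "Post A", "cited_author": "alice",
     "citing_author": "alice", "citing_karma": 50},   # a self-cite
    {"cited_post": "Post A", "cited_author": "alice",
     "citing_author": "bob", "citing_karma": 120},
]

# Treat a pingback as a self-cite when the citing author is the cited post's author.
external = [p for p in pingbacks if p["citing_author"] != p["cited_author"]]

total_karma_excl_self = sum(p["citing_karma"] for p in external)
print(len(external), total_karma_excl_self)  # 1 pingback, 120 total karma
```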