LessWrong 2.0 Reader

1 What If We Rebuild Motivation with the Fermi ESTIMATion?
P. João (gabriel-brito) · 2024-12-17T07:46:40.547Z · comments (0)

[link] Example of GPU-accelerated scientific computing with PyTorch
Tahp · 2025-01-01T23:01:04.606Z · comments (0)

[link] Streamlining my voice note process
Vlad Sitalo (harcisis) · 2024-12-26T06:04:01.990Z · comments (1)

Replaceable Axioms give more credence than irreplaceable axioms
Yoav Ravid · 2024-12-20T00:51:13.578Z · comments (2)

Reduce AI Self-Allegiance by saying "he" instead of "I"
Knight Lee (Max Lee) · 2024-12-23T09:32:29.947Z · comments (4)

Generating Cognateful Sentences with Large Language Models
vkethana (vijay-k) · 2025-01-06T18:40:09.564Z · comments (0)

[link] What is compute governance?
Vishakha (vishakha-agrawal) · 2024-12-23T06:32:25.588Z · comments (0)

A proposal for iterated interpretability with known-interpretable narrow AIs
Peter Berggren (peter-berggren) · 2025-01-11T14:43:05.423Z · comments (0)

Marx and the Machine
DAL · 2025-01-15T18:33:16.789Z · comments (2)

Speedrunning Rationality: Day I
aproteinengine · 2025-01-04T14:28:49.220Z · comments (0)

[link] World Models I'm Currently Building
temporary · 2024-12-15T16:29:08.287Z · comments (1)

[link] How to Edit an Essay into a Solstice Speech?
Czynski (JacobKopczynski) · 2024-12-15T04:30:50.545Z · comments (1)

No, the Polymarket price does not mean we can immediately conclude what the probability of a bird flu pandemic is. We also need to know the interest rate!
Christopher King (christopher-king) · 2024-12-28T16:05:47.037Z · comments (8)

Towards mutually assured cooperation
mikko (morrel) · 2024-12-22T20:46:21.965Z · comments (0)

[link] Inescapably Value-Laden Experience—a Catchy Term I Made Up to Make Morality Rationalisable
James Stephen Brown (james-brown) · 2024-12-19T04:45:37.906Z · comments (0)

Logic vs intuition <=> algorithm vs ML
pchvykov · 2025-01-04T09:06:51.822Z · comments (0)

Using LLM Search to Augment (Mathematics) Research
kaleb (geomaturge) · 2024-12-19T18:59:34.391Z · comments (0)

Printable book of some rationalist creative writing (from Scott A. & Eliezer)
CounterBlunder · 2024-12-23T15:44:31.437Z · comments (0)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis
Matt Levinson · 2025-01-10T06:53:02.228Z · comments (0)

[link] Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude
rife (edgar-muniz) · 2025-01-06T17:34:01.505Z · comments (18)

[link] Better antibodies by engineering targets, not engineering antibodies (Nabla Bio)
Abhishaike Mahajan (abhishaike-mahajan) · 2025-01-13T15:05:35.261Z · comments (0)

Grokking revisited: reverse engineering grokking modulo addition in LSTM
Nikita Khomich (nikitoskh) · 2024-12-16T18:48:43.533Z · comments (0)

ARC-AGI is a genuine AGI test but o3 cheated :(
Knight Lee (Max Lee) · 2024-12-22T00:58:05.447Z · comments (6)

[question] Has Anthropic checked if Claude fakes alignment for intended values too?
Maloew (maloew-valenar) · 2024-12-23T00:43:07.490Z · answers+comments (1)

Linkpost: Look at the Water
J Bostock (Jemist) · 2024-12-30T19:49:04.107Z · comments (3)

Vision of a positive Singularity
RussellThor · 2024-12-23T02:19:35.050Z · comments (0)

Dishbrain and implications.
RussellThor · 2024-12-29T10:42:43.912Z · comments (0)

Governance Course - Week 1 Reflections
Alice Blair (Diatom) · 2025-01-09T04:48:27.502Z · comments (1)

3. Improve Cooperation: Better Technologies
Allison Duettmann (allison-duettmann) · 2025-01-02T19:03:16.588Z · comments (2)

5. Uphold Voluntarism: Digital Defense
Allison Duettmann (allison-duettmann) · 2025-01-02T19:05:33.963Z · comments (0)

[question] How do we quantify non-philanthropic contributions from Buffet and Soros?
Philosophistry (philip-dhingra) · 2024-12-20T22:50:32.260Z · answers+comments (0)

Some implications of radical empathy
MichaelStJules · 2025-01-07T16:10:16.755Z · comments (0)

Thoughts on the In-Context Scheming AI Experiment
ExCeph · 2025-01-09T02:19:09.558Z · comments (0)

[question] How should I optimize my decision making model for 'ideas'?
CstineSublime · 2024-12-18T04:09:58.025Z · answers+comments (0)

LLMs are really good at k-order thinking (where k is even)
charlieoneill (kingchucky211) · 2025-01-15T20:43:00.623Z · comments (0)

Have frontier AI systems surpassed the self-replicating red line?
nsage (wheelspawn) · 2025-01-11T05:31:31.672Z · comments (0)

AI models inherently alter "human values." So, alignment-based AI safety approaches must better account for value drift
bfitzgerald3132 · 2025-01-13T19:22:41.195Z · comments (1)

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

[question] 2025 Alignment Predictions
anaguma · 2025-01-02T05:37:36.912Z · answers+comments (3)

Reminder: AI Safety is Also a Behavioral Economics Problem
zoop · 2024-12-20T01:40:53.847Z · comments (0)

[link] bending light
Recurrented (rachel-farley) · 2024-12-17T22:40:06.950Z · comments (4)

Towards a Unified Interpretability of Artificial and Biological Neural Networks
jan_bauer · 2024-12-21T23:10:45.842Z · comments (0)

[link] The Economics & Practicality of Starting Mars Colonization
Zero Contradictions · 2024-12-26T10:56:26.019Z · comments (1)

A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities
Tom DAVID (tom-david) · 2025-01-09T00:18:04.608Z · comments (0)

Gothenburg LW / ACX meetup
Stefan (stefan-1) · 2025-01-08T21:39:18.309Z · comments (0)

Introducing Avatarism: A Rational Framework for Building actual Heaven
ratiba ro (ratiba-ro) · 2024-12-15T17:17:45.440Z · comments (2)

I Recommend More Training Rationales
Gianluca Calcagni (gianluca-calcagni) · 2024-12-31T14:06:44.007Z · comments (0)

Walking Sue
Matthew McRedmond (matthew-mcredmond) · 2024-12-18T13:19:41.575Z · comments (5)

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning
rife (edgar-muniz) · 2025-01-15T22:59:46.321Z · comments (0)

Recent comments

gwern on Implications of the inference scaling paradigm for AI safety

I think this is missing a major piece of the self-play scaling paradigm: much of the point of a model like o1 is not to deploy it, but to generate training data for the next model. Every problem that an o1 solves is now a training data point for an o3 (e.g. any o1 session which finally stumbles into the right answer can be refined to drop the dead ends and produce a clean transcript to train a more refined intuition).

I am actually mildly surprised OA has bothered to deploy o1-pro at all, instead of keeping it private and investing the compute into more bootstrapping of o3 training etc. (This is apparently what happened with Anthropic and Claude-3.6-opus - it didn't 'fail', they just chose to keep it private and distill it down into a small cheap but weirdly smart Claude-3.6-sonnet.) If you're wondering why OAers are suddenly weirdly, almost euphorically optimistic, watching the improvement from the original 4o model to o3 (and wherever it is now!) may be why. It's like watching the AlphaGo Elo curves: it just keeps going up... and up... and up... And then you get to have your cake and eat it too: the final AlphaGo/Zero model is not just superhuman but very cheap to run too. (Just searching out a few plies gets you to superhuman strength; even the forward pass alone is around pro human strength!)

If you look at the relevant scaling curves - may I yet again recommend reading Jones 2021? - the reason for this becomes obvious. Inference-time search is a stimulant drug that juices your score immediately, but asymptotes hard. Quickly, you have to use a smarter model to improve the search itself, instead of doing more. (If simply searching could work so well, chess would've been solved back in the 1960s. It's not hard to search more than the handful of positions a grandmaster human searches per second. If you want a text which reads 'Hello World', a bunch of monkeys on a typewriter may be cost-effective; if you want the full text of Hamlet before all the protons decay, you'd better start cloning Shakespeare.) Fortunately, you have the training data & model you need right at hand to create a smarter model...

Sam Altman (@sama, 2024-12-20):

seemingly somewhat lost in the noise of today:

on many coding tasks, o3-mini will outperform o1 at a massive cost reduction!

i expect this trend to continue, but also that the ability to get marginally more performance for exponentially more money will be really strange

So, it is interesting that you can spend money to improve model performance in some outputs... but you may simply be spending that money to improve the model itself, not just a one-off output.

This means that outsiders may never see the intermediate models (any more than Go players got to see random checkpoints from a third of the way through AlphaZero training). And to the extent that it is true that 'deploying costs 1000x more than now', that is a reason to not deploy at all. Why bother wasting that compute on serving external customers, when you can instead keep training, and distill that back in, and soon have a deployment cost of a superior model which is only 100x, and then 10x, and then 1x, and then <1x...?

gwern on Where should one post to get into the training data?

Reddit now aggressively blocks scrapers because it charges a fortune for access, and The Pile could no longer be created today (Pushshift is down). Reddit is not the worst place to post, but it's also not the best.

trevorone on How do you deal w/ Super Stimuli?

What worked for me was Pavlovian training: putting my metabolism/blood sugar into lows/withdrawals and then timing the superstimulus habit during those lows for negative reinforcement.

For example: eating nothing but spaghetti and water for breakfast (or for both breakfast and lunch), on an empty stomach, and nothing else, since other food interferes. In most people this causes a longer, weaker rush than a sugar rush for ~2-4 hours, then a pretty bad crash (e.g. depression powerful enough that you should not take this lightly and should keep people nearby, but not nearly enough to be dangerous). If you time the superstimulus activity correctly, so that it starts shortly before the crash, the negative reinforcement will be powerful enough to put an ugh field around it after 1-3 successful timings (even 3 is a stretch, and I don't endorse this for anything that would get you to put yourself through it a fourth time).

You abort the crash by eating protein and waiting 20 minutes to an hour for your metabolism to return to normal. Exiting the crash provides positive reinforcement, and I don't know whether the velocity or the acceleration of mood drives the positive/negative reinforcement (only that looking at it in terms of position alone is probably grossly inadequate). So it's important to get the timing right and to be conservative: switch away from the targeted activity early, so that most of the velocity and acceleration of mood change happen during the targeted activity and minimal velocity/acceleration in the wrong direction happens during it.

Using caffeine/stimulants or anything else that raises mood or blood sugar during a superstimulus activity is deranged and dangerous. If you're not comfortable doing this in the helpful, taking-control direction I described here (which is quite reasonable; I might have been mistaken to play with this, and I don't intend to do it or anything like it again now that I've finished my packet of spaghetti and won't buy another), then you're probably not comfortable doing it in the unhelpful, losing-control direction either (I'm referring to comfort upon reflection, not in the moment). Our civilization gives kids Adderall et al. and tells them to use it during classes/homework, and responds to any other use with finger-wagging instead of explaining the risks, so no adjacent societal or cultural norms can be trusted, let alone left unexamined.

algon on How do fictional stories illustrate AI misalignment?

That's a good film! A friend of mine absolutely loves it.

Do you think the Forbin Project illustrates some aspect of misalignment that isn't covered by this article?

seth-herd on Heritability: Five Battles

Wow! That is a hell of a comprehensive writeup.

The bio-determinist child-rearing rule of thumb [but see caveats below!]: Things you do as a parent will have generally small or zero effects on what the kid will be like as an adult—their personality, their intelligence and competence, their mental health, etc.

I found it pretty interesting, both here and back when I was reading about it, that this list does not include happiness. This fits a larger societal disinterest in happiness. But I do wonder whether parents might influence it nontrivially by seeding their children with a life philosophy and a set of cognitive habits about how to think about life.

I also noticed that the data they tracked about how parents treat children included no efforts to determine how much parents actually loved their children, or how much they fought with their children. While most parents, particularly the middle-classers that take part in studies, love their children, how much and whether that prevents ongoing feuds with their children does seem to vary a good bit.

Of course that would be problematic, because how much parents love and feud with their children is also clearly influenced by how much said children are acting like little shits. :)

I doubt any of these would show large effects, I'm just noting their absence.

I used to care about genetics for reasons 2 (what effect do parents have) and 3 (do your adult decisions like attending therapy really matter), back when I planned to use my PhD in cognitive psychology and neuroscience to write about "free will" (really self-determination; do our decisions matter for our outcomes) in ourselves and our society. My thesis was that we have substantial self-determination but also substantial limitations in it; and that liberal American philosophies tend to emphasize the extent to which we don't, while conservative American philosophies emphasize the extent to which we do. Neither is entirely correct, causing strong adherents of either to make no sense and therefore be super irritating to talk to.

But those ambitions ended when I first read Yudkowsky, and decided free will was small potatoes in the face of an onrushing intelligence explosion and alignment crisis. Thus, my ideas related to genetics have never before been published, and probably won't be. Thanks for the excuse to rant.

Again, wow. I'll be referring anyone to this writeup if they express more than the vaguest interest in what we know about genetics.

purplehermann on We probably won't just play status games with each other after AGI

Reading novels with ancient powerful beings is probably the best direction you have for imagining how status games amongst creatures which are only loosely human look.

Resources being bounded, there will tend to always be larger numbers of smaller objects (given that those objects are stable).

There will be tiers of creatures (in a society where this is all relevant).

While a romantic relationship skipping multiple tiers wouldn't make sense, a single tier might.

The rest of this is my imagination :)

Base humans will be F tier, the lowest fully sentient category. (I suppose dolphins and similar would get a special G tier.)

Basic AGIs (capable of everything a standard human is, plus all the spiky capabilities) and enhanced humans will be E tier.

Most creatures will be here.

D tier:

Basic ASIs and super-enhanced humans (gene modding for 180+ IQ plus SOTA cyborg implants) will be the next tier. There will be a lot of these in absolute terms, but they will be rarer than the tier below.

C tier:

Then come Alien Intelligences: massive compute resources supporting ASIs trained on immense amounts of ground-reality data, and biological creatures that have been fundamentally redesigned to function at higher levels and to synergize optimally through neural connections (whether with other carbon-based or silicon-based lifeforms).

B tier:

Planet sized clusters running ASIs will be a higher tier.

A, S tiers:

Then you might get entire stars, then galaxies.

There will be far fewer at each level.

Most tiers will have a -, neutral, or + grade.

- : prototype, first or early version. Qualitatively smarter than the tier below, but with non-optimized use of resources; often not a large gap from the + of the tier below.

Neutral: most low-hanging optimizations and improvements, and some harder ones, at this tier are implemented.

+: highly optimized by iteratively improved intelligences or groups of intelligences at this level, perhaps even by a tier above.

zane on Everywhere I Look, I See Kat Woods

Some of the memes you referenced do seem "cringe" to me, but people have different senses of humor. I'm not sure what the issue is with someone posting memes they personally find funny.

If you disagree with the point that the memes are making, that's different, but can you give an example of something in one of the memes she posted that you thought was invalid reasoning? You called her content "dark arts tactics" and said:

"It feels like it is trying to convince me of something rather than make me smarter about something. It feels like it is trying to convey feelings at me rather than facts."

but you've only explained how it's making you feel instead of what message it's conveying.

sharmake-farah on Marx and the Machine

I enjoyed reading this post; thank you for writing it. LessWrong has an allergy to basically every category Marx is a member of - "armchair" philosophers, socialist theorists, pop humanities idols - in my view, all entirely unjustified.

To be fair here, Marx was kind of way overoptimistic about what could be achieved with central economic planning in the 20th century, because he way overestimated how far machines/robots could go. There is also the part where he says communist countries don't need a plan because natural laws would favor communism, which was bullshit.

More here:

In his review of Peter Singer's commentary on Marx, Scott Alexander writes:

[...] Marx was philosophically opposed, as a matter of principle, to any planning about the structure of communist governments or economies. He would come out and say it was irresponsible to talk about how communist governments and economies will work. He believed it was a scientific law, analogous to the laws of physics, that once capitalism was removed, a perfect communist government would form of its own accord. There might be some very light planning, a couple of discussions, but these would just be epiphenomena of the governing historical laws working themselves out.

keltan on We probably won't just play status games with each other after AGI

Sex is fun and awesome. Though it doesn’t feel fun and awesome to have sex all day everyday. You could probably do transhuman meth and make sex fun all the time. But a Pleasure Cube/Super Happy scenario makes me sad.

I’m also wondering who you’re talking about when you say “most people” here? I have the opposite model of most people.

gwern on Passages I Highlighted in The Letters of J.R.R.Tolkien

Tolkien invented their exact usage, but he didn't invent the words. "Elf", obviously, goes way back, but "orc" also goes way back, with meanings similar to the Tolkien usage.

"Zerg", "Protoss", & "SCV" are all neologisms; notably, the least weird ones, "Kerrigan" and "Terran", are quite ordinary words. ('Hydralisk' is a bit in between. 'Hydra' is familiar, albeit increasingly hopelessly overloaded with SF/comic connotations, but 'lisk' is very unfamiliar: 'obelisk' is the only word with it that comes to mind, and that word appears to get its 'lisk' from a butchering of Greek and then French.)

An interesting comparison here would be Gene Wolfe's Book of the New Sun, which does something similar: it uses old words in place of neologisms. For that reason, despite being stuffed with weird terms (so many that you can publish a dictionary of them), words like 'pelagic argosy' or 'fuligin' or 'capote' work as well now as they did in the 1980s, even though they never achieved the cultural currency of 'elves' or 'orcs'. This demonstrates that the 'use old words' trick works in its own right and not simply through mere familiarity.

(But conversely, writing old-timey is no surefire solution. Wolfe's dying-earth fiction was influenced by Hodgson's The Night Land, which is imaginative and influential... and the style is almost ludicrously unreadable, whether in 1912 or 2025.)

Now, why is that? I suspect that it's a mix of unrealized familiarity (you may not have seen 'destrier' often enough to consciously recognize it as a real word, much less define or use it*, but unconsciously you do) and linguistic 'dark knowledge' in recognizing that somehow, the word 'autarch' is valid and a plausible word which could exist, in a way that 'Zerg' or 'Protoss' could not exist. It somehow respects the laws of languages and etymology and spelling, and you recognize that by not immediately rejecting it like most neologisms. (And to some extent, Tolkien's own conlangs, by having their long fictional history to justify various twists & turns, gain a hidden realism that a tidy hobbyist conlang will not.)

* This is why vocab can be a good IQ test: word-use frequency is the original power law, and you have been exposed to many more words than you consciously know, so how many of those words 'stick' will reflect your intelligence's efficiency at learning from 1 or 2 uses of a word, and thus provide a good proxy.