LessWrong 2.0 Reader

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

Motivation control
Joe Carlsmith (joekc) · 2024-10-30T17:15:50.881Z · comments (7)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

Australian AI Safety Forum 2024
Liam Carroll (liam-carroll) · 2024-09-27T00:40:11.451Z · comments (0)

MATS AI Safety Strategy Curriculum v2
DanielFilan · 2024-10-07T22:44:06.396Z · comments (6)

Time Efficient Resistance Training
romeostevensit · 2024-10-07T15:15:44.950Z · comments (8)

Startup Success Rates Are So Low Because the Rewards Are So Large
AppliedDivinityStudies (kohaku-none) · 2024-10-10T20:22:01.557Z · comments (6)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (4)

[link] Point of Failure: Semiconductor-Grade Quartz
Annapurna (jorge-velez) · 2024-09-30T15:57:40.495Z · comments (8)

Reflections on the Metastrategies Workshop
gw · 2024-10-24T18:30:46.255Z · comments (5)

[link] IAPS: Mapping Technical Safety Research at AI Companies
Zach Stein-Perlman · 2024-10-24T20:30:41.159Z · comments (12)

D&D Sci Coliseum: Arena of Data
aphyer · 2024-10-18T22:02:54.305Z · comments (23)

[link] An Interactive Shapley Value Explainer
James Stephen Brown (james-brown) · 2024-09-28T05:01:21.169Z · comments (9)

Monthly Roundup #23: October 2024
Zvi · 2024-10-16T13:50:05.869Z · comments (12)

2025 Color Trends
sarahconstantin · 2024-10-07T21:20:03.962Z · comments (7)

[Linkpost] Play with SAEs on Llama 3
Tom McGrath · 2024-09-25T22:35:44.824Z · comments (2)

Winners of the Essay competition on the Automation of Wisdom and Philosophy
AI Impacts (AI Imacts) · 2024-10-28T17:10:04.272Z · comments (3)

[question] Implications of China's recession on AGI development?
Eric Neyman (UnexpectedValues) · 2024-09-28T01:12:36.443Z · answers+comments (3)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (0)

Signaling with Small Orange Diamonds
jefftk (jkaufman) · 2024-11-07T20:20:08.026Z · comments (1)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

Are we dropping the ball on Recommendation AIs?
Charbel-Raphaël (charbel-raphael-segerie) · 2024-10-23T17:48:00.000Z · comments (16)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane (ckkissane) · 2024-10-27T18:46:21.316Z · comments (1)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (7)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (16)

Is the Power Grid Sustainable?
jefftk (jkaufman) · 2024-10-26T02:30:06.612Z · comments (37)

[link] Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims
garrison · 2024-11-13T17:00:01.005Z · comments (7)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

AI #89: Trump Card
Zvi · 2024-11-07T16:30:05.684Z · comments (12)

Video and transcript of presentation on Otherness and control in the age of AGI
Joe Carlsmith (joekc) · 2024-10-08T22:30:38.054Z · comments (1)

Book Review: On the Edge: The Gamblers
Zvi · 2024-09-24T11:50:06.065Z · comments (1)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures (Workshop @ EA Hotel!)
Sahil · 2024-11-01T17:24:09.957Z · comments (2)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)

(Maybe) A Bag of Heuristics is All There Is & A Bag of Heuristics is All You Need
Sodium · 2024-10-03T19:11:58.032Z · comments (17)

Recent comments

sharmake-farah on Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims

I definitely agree that people are overupdating on this training run, and we will need to wait.

(I also made the mistake of overupdating.)

stormykat on Model of psychosis, take 2

This post is really interesting!

Do you have any thoughts on why psychosis then typically "kicks in" suddenly in late adolescence or early adulthood? (And why does trauma correlate with it and tend to act as that "kickstarter"?)

Also, any thoughts about delusions? For example, why do schizophrenic people occasionally believe not just impossible things but, very occasionally, even random things like "I am Jesus Christ" or "I am Napoleon"?

vladimir_nesov on Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims

turns out that Ilya Sutskever was misinterpreted

That's not exactly my claim. If he said more to the reporters than his words quoted in the article[1], then it might've been justified to interpret him as saying that pretraining is plateauing. The article isn't clear on whether he said more. If he said nothing more, then the interpretation about plateauing doesn't follow, but could in principle still be correct.

Another point is that Sutskever left OpenAI before they trained their first model on 100K H100s, and in any case one datapoint from a single training run isn't much evidence. The experiment that could convincingly demonstrate plateauing hasn't been performed yet. Give it at least a few months, for multiple labs to try and fail.

[1] “The 2010s were the age of scaling, now we're back in the age of wonder and discovery once again. Everyone is looking for the next thing,” Sutskever said. “Scaling the right thing matters more now than ever.”

deepthoughtlife on Basics of Handling Disagreements with People

I have a lot of disagreements with this piece, and just wrote these notes as I read it. I don't know if this will even be a useful comment. I didn't write it with a throughline. "You" and "your" are often used nonspecifically, about people in general.

The usefulness of things like real world examples seems to vary wildly.

Rephrasing is often terrible; careless rephrasing often amounts to lying about what your conversation partner is saying, especially since many people will double down on the rephrasing when told that they are wrong, which obviously infuriates many people (including me, of course). People often forget that just because they rephrased something doesn't mean they got the rephrasing right. Remember the whole thing about how you don't understand by default?

This leads into one of the primary sins of discussion, mindreading. You think you know what the other party is thinking, and you just don't. When corrected, many don't update and just keep insisting. (Of course, the corrections aren't always true either.)

A working definition may or may not be better than a theoretical one. Oftentimes there really isn't a working definition that the person you are talking to can express (which is obviously true of theoretical ones at times too). People may have to argue about subjects where the definitions are inexpressible in any reasonable amount of time, or otherwise can't be shared.

Your suggestion for attacking personal experience seems very easy to do very badly. Personal experience is what we bootstrap literally every bit of our understanding of the world from. If that's not reliable, we have nothing to talk about. You have to build on some part of their personal experience or the conversation just won't work. (Luckily, a lot of our personal experiences are very similar.) It reminds me of games people play to win/look good, not to actually have a real discussion.

People don't generally use Bayes' rule! Keep that in mind. When you are discussing something with someone, they aren't doing probability theory! (Perhaps very rarely.) Bayes' rule can describe what they do only by analogy.

Stories need to actually be short, clear, and to the point, or they just confuse the matter more. If you spend fifty paragraphs on the life story of some random person that I don't care about, I'm just going to tune it out (despite the fact that I am super long-winded). (This is a problem with many articles, for instance.) Even if I didn't, I'm still going to miss your point, so get to the point. Can you tell this story in a couple hundred words? Then you can use it. No? Rethink the idea.

Caring about their underlying values is useful, but it needs to be preceded by curiosity about and understanding of those values, or it does no good.

I do agree that understanding why someone wants something is obviously the best way to find out what you can offer that might be better than what they currently want to do, though I do think understanding what they want to do is useful too.

Something said in point 8 seems like the key: "Empathy isn't just a series of scripted responses." You need to adapt to the actual argument you are having. This isn't just true of empathy, but of any kind of understanding. The thing itself is the key, and the approach will have to change for each individual part. This isn't true just once in attempting understanding; it is recursively true of every subpart.

zoop on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy

I think you've made a motte-and-bailey argument:

Motte: The payoff structure of the cosmic flip/St. Petersburg Paradox applied to the real world is actually much better than double-or-nothing, and therefore you should play the game.
Bailey: SBF was correct in saying you should play the double-or-nothing St. Petersburg Paradox game.

Your motte is definitely defensible. Obviously, you can alter the payoff structure of the game to a point where you should play it.

That does not mean "there's no real paradox"; it just means you are no longer talking about the paradox. SBF literally said he would take the game in the specific case where the game was double-or-nothing. Totally different!

This ends my issue with your argument, but I'll also share my favorite anti-St. Petersburg Paradox argument since you didn't really touch on any of the issues it connects to. In short: the definition of expected value as the mean outcome is inappropriate in this scenario and we should instead use the median outcome.

This paper makes the argument better than I can if you're curious, but here's my concise summary:

Mean values are perhaps appropriate if we play the game many (or infinitely many) times. In these situations, through the law of large numbers, the mean outcome of the games played will approach the mean interpretation of expected value.
For a single play-through (as in the thought experiment) the mean is not appropriate, as the law of large numbers does not apply. Instead, we should value the game by its median outcome: the outcome one should reasonably expect.
Indeed, if you have people actually play this game, their betting behavior is more consistent with an intuition of median expected value (this is tested in the paper).
There's an argument that median EV is the better interpretation even when playing multiple times. In these situations you can think of the game as "playing the game multiple times, once." This resolves the paradox in all but the infinite cases.
If you use the median interpretation of EV for finite trials of the game, there is no paradox.
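To make the mean/median contrast concrete, here is a minimal Python sketch of a repeated double-or-nothing bet. It assumes a per-round win probability of 51% (an illustrative figure, not one stated in this comment) and computes the exact outcome distribution with `fractions.Fraction`, so there is no simulation noise. The helper names `outcome_distribution`, `mean_value`, and `median_value` are hypothetical and not taken from the paper mentioned above; "median" here means the lower median, the smallest value v with P(X ≤ v) ≥ 1/2.

```python
from fractions import Fraction


def outcome_distribution(n_rounds: int, p_win: Fraction) -> dict[int, Fraction]:
    """Exact distribution of the final value after n_rounds of double-or-nothing.

    Starts from a value of 1. Each round doubles the value with probability
    p_win and otherwise wipes it out, so the only possible outcomes are
    2**n_rounds (won every round) and 0 (lost at least once).
    """
    survive = p_win ** n_rounds
    return {0: 1 - survive, 2 ** n_rounds: survive}


def mean_value(dist: dict[int, Fraction]) -> Fraction:
    """Mean (the usual 'expected value'): sum of value * probability."""
    return sum((value * prob for value, prob in dist.items()), Fraction(0))


def median_value(dist: dict[int, Fraction]) -> int:
    """Lower median: the smallest value v with P(X <= v) >= 1/2."""
    cumulative = Fraction(0)
    for value in sorted(dist):
        cumulative += dist[value]
        if cumulative >= Fraction(1, 2):
            return value
    raise ValueError("probabilities do not sum to 1")


if __name__ == "__main__":
    p = Fraction(51, 100)  # assumed, illustrative per-round win probability
    for n in (1, 2, 10):
        dist = outcome_distribution(n, p)
        print(f"n={n:>2}  mean={float(mean_value(dist)):.4f}  median={median_value(dist)}")
```

The mean is (2p)^n, which keeps growing for any p > 1/2, while the median is already 0 at n = 2 because the chance of surviving both rounds (about 26%) is below one half. Exact fractions are used instead of random sampling because the point concerns the distribution itself, not any particular draw.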

A personal gripe: I find it more than a little stupid that the "expected value" is a value you don't actually "expect" to observe very frequently when sampling highly skewed distributions.

Mathematicians and economists have taken issue with the mean definition of EV for basically as long as it has existed. Regardless of whether or not you agree with it, it seems pretty obvious to me that it is inappropriate to use the mean to value single-trial outcomes.

So maybe in the real world we should play the game, but I firmly believe we should value the game using medians and not means. Do we get to play the world outcome optimization game multiple/infinite times? Obviously not.

nathan-helm-burger on Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims

https://www.lesswrong.com/posts/NRZfxAJztvx2ES5LG/a-path-to-human-autonomy

Vladimir makes an excellent point that it's simply too soon to tell whether the next generation of LLMs (e.g. GPT-5) will fizzle. I do think there's reasonable evidence for suspecting that the generation after that (e.g. GPT-6) won't be a straightforward scale-up of GPT-4. I think we're in a compute and data overhang for AGI, and that further parameter, compute, and data scaling beyond the GPT-5 level would be a waste of money.

The real question is whether GPT-5-generation models will be enough more capable than current ones to substantially increase the rate of the true limiting factor: algorithmic improvement.

lorec on I’m confused about innate smell neuroanatomy

Update: My best current theory [it hasn't changed in a few months, but I figured it might be worth posting] is that composite smell data [i.e. the better part of smell processing] is passed directly from the olfactory bulb to somewhere in the entorhinal-amygdalar-temporal area, while there are a few scents that function as pheromones in the sense that we have innate responses to the scents themselves, as opposed to their associated experiences [so: skunk and feces, as well as the scent of eligible mates], and data about these scents is relayed by thin, almost invisible projections to the hypothalamus or other nuclei in the "emotional motor system" so the behavioral responses can bootstrap.

spade on Spade's Shortform

You offer a really interesting point. I don't think I feel as sharply bad about having to context switch as you do, but it very well could be that I still register a similar bad feeling, and simply react to it by doing nothing as opposed to being productive and then having to go through a context switch.

I hadn't really thought about it as a response to a stimulus like that, but I guess that's because I have a more subtly bad feeling when switching contexts, so there wasn't as obvious a thing to associate my behavior with.

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

Yes intuitions can be wrong welcome to reality

But these ways of looking at the world are not factually wrong; they're just perverted in a sense.
I agree that schools are quite terrible in general.

how could I have come up with this myself?

That helps for learning facts, but one can teach the same things in many different ways. A math book from 80 years ago may be confusing now, even if the knowledge it covers is something that you know already, because the terms, notation and ideas are slightly different.

we need wisdom because people cannot think

In a way. But some people who have never learned psychology have great social skills, and some people who are excellent with psychology are poor socializers. Some people also dislike "nerdy" subjects, and they are much more likely to listen to a TED talk on body language than to read a book on evolutionary psychology and non-verbal communication. Having an "easy version" of knowledge available, one which requires 20 fewer IQ points than the hard version, seems like a good idea.
Some of the wisest and most psychologically healthy people I have met have been non-intellectual and non-ideological, and even teenagers or young adults. Remember your "Things to unlearn from school" post? Some people may have less knowledge than the average person, and thus fewer errors, which makes them clear-sighted in a way that makes them seem well-read. Teaching these people philosophy could very well ruin their beautiful worldviews rather than improve on them.

if you know enough rationality you can easily get past all that.

I don't think "rationality" is required. Somebody who has never heard about the concept of rationality, but who is highly intelligent and thinks things through for himself, will be alright (outside of existential issues and infohazards, which have killed or ruined a fair share of actual geniuses).
But we're both describing conditions which apply to less than 2% of the population, so at best we have to suffer from the errors of the 98%.

I'm not sure what you mean by "when you dissent when you have an overwhelming reason". The article you linked to worded it "only when", as if one should dissent more often, but it also warns against dissenting since it's dangerous.
By the way, I don't like most rational communities very much, and one of the reasons is that they have a lot of snobs who will treat you badly if you disagree with them. The social mockery I've experienced is also quite strong, which is strange, since you'd expect intelligence to correlate with openness, and the high rate of autistic people to counteract some of the conformity.

I also don't like activism, and the only reason I care about the stupid ideas of the world is that all the errors are making life harder for me and the people that I care about. Like I said, not being an egoist is impossible, and there's no strong evidence that all egoism is bad, only that egoism can be bad. The same goes for money and power, I think they're neutral and both potentially good/bad. But being egoistic can make other people afraid of me if I don't act like I don't realize what I'm doing.

It's more optimal to be passionate about a field

I think this is mostly correct. But optimization can kill passion (since you're just following the meta and not your own desires). And common wisdom says "Follow your dreams" which is sort of naive and sort of valid at the same time.

Believing false things purposefully is impossible

I think believing something you think is false, intentionally, may be impossible. But false beliefs exist, so believing in false things is possible. For something where you're between 10% and 90% sure, you can choose whether you want to believe in it or not, and then use the following algorithm:
Say "X is true because" and then allow your brain to search through your memory for evidence. It will find some.

The articles you posted on beliefs are about the rules of linguistics (belief in belief is a valid string) and logic, but how belief works psychologically may be different. I agree that real beliefs are internalized (they exist in system 1) to the point that they're just part of how you anticipate reality. But some beliefs are situational and easy to consciously manipulate (for example, self-esteem: you can improve or harm your own self-esteem in about 5 minutes if you try, since you just pick a perspective and a set of standards by which you appear to be doing well or badly). Self-esteem is subjective, but I don't think the brain differentiates between subjective and objective things; it doesn't even know the difference.

And it doesn't seem like you value truth itself, but that you value the utility of some truths, and only because they help you towards something you value more?

Ethically yes, epistemically no

You may believe this because a worldview has to be formed through interactions with the territory, which means that a worldview cannot be totally unrelated to reality? You may also mean this: that if somebody has both knowledge and value judgements about life, then the knowledge is either true or false, while the value judgements are a function of the person. A happy person might say "Life is good" and a depressed person might say "Life is cruel", and they might even know the same facts.

Online "black pills" are dangerous, because the truth value of the knowledge doesn't imply that the negative worldview of the person sharing it is justified. Somebody reading the Yoga Vasistha might become depressed because he cannot refute it, but this is quite an advanced error in thinking, as you don't need to refute it for its negative tone to be false.

Rationality is about having cognitive algorithms which have higher returns

But then it's not about maximizing truth, virtue, or logic.
If reality operates by different axioms than logic, then one should not be logical.
The word "virtue" is overloaded, so people write as if the word were related to morality, but it's really just about thinking in ways which make one more clear-sighted. So people who tell me to have "humility" are "correct" in that being open to changing my beliefs makes it easier for me to learn, which is rational, but they often act as if they're better people than me (as if I've made an ethical/moral mistake in being stubborn or certain of myself).
By truth, one means "reality" and not the concept "truth" as the result of a logic expression. This concept is overloaded too, so that it's easy for people to manipulate a map with logical rules and then tell another person "You're clearly not seeing the territory right".

physics is more accurate than intuitive world models

Physics is our own constructed reality, which seems to act a lot like the actual reality. But I think an infinite number of physics could exist which predict reality with high accuracy. In other words, "There's no one true map". We reverse-engineer experiences into models, but experience can give rise to multiple models, and multiple models can predict experiences.
One of the limitations is that "there's no universal truth", but this is not even a problem, as the universe is finite. But "universal" in mathematics is assumed to be truly universal, covering all things, and it's precisely this which is not possible. But we don't notice, and thus come up with the illusion of uniqueness. And it's this illusion which creates conflict between people, because they disagree with each other about what the truth is, claiming that conflicting things cannot both be true. I dislike the consensus because it's the consensus and not a consensus.

A good portion of hardcore rationalists tend to have something to protect, a humanistic cause

My bad for misrepresenting your position. Though I don't agree that many hardcore rationalists care about humanistic causes. I see them as placing rationality above humanity, and thus preferring robots, cyborgs, and AIs to humanity. They think they prefer an "improvement" of humanity, but this functionally means the destruction of humanity. If you remove negative emotions (or all emotions entirely; after all, these are the source of mistakes, right?), subjectivity, and flaws from humans, and align them with each other by giving them the same personality, or get rid of the ego (it's also a source of errors and unhappiness), what you're left with is not human. It's at best a sentient robot. And this robot can achieve goals, but it cannot enjoy them.
I just remembered seeing the quote "Rationality is winning", and I'll admit this idea sounds appealing. But a book I really like (EST: Playing the game the new way, by Carl Frederick) is precisely about winning, and its main point is this: You need to give up on being correct. The human brain wants to have its beliefs validated, that's all. So you let other people be correct, and then you ask them for what you want, even if it's completely unreasonable.

Rationality doesn't necessarily have nature as a terminal value

I meant nature as its source (of evidence/truth/wisdom/knowledge), "nature" meaning reality/the dao/the laws of physics/the universe/GNON. I think most schools of thought draw their conclusions from reality itself. The only kind of worldview which seems disconnected from reality is religion, which creates ideals out of what's lacking in life and makes those out to be virtue and the will of God.

None of that is incompatible with rationality

What I dislike might not be rationality, but how people apply it, and the psychological tendencies of the people who apply it. But upvotes and downvotes seem very biased in favor of consensus and verifiability, rather than simply being about getting what you want out of life. People also don't seem to like being told accurate heuristics which seem immoral or irrational (in the colloquial sense that regular people use) even if they predict reality well. There's also an implicit bias towards altruism which cannot be derived from objective truth.

About my values: they already exist even if I'm not aware of them; they're just unconscious until I make them conscious. But if system 1 functions well, then you don't really need to train system 2 to function well, and it's a pain to force system 2 rationality onto system 1 (your brain resists most attempts at self-modification). I like the topic of self-modification, but that line of study doesn't come up on LW very often, which is strange to me. I still believe that the LW community downplays the importance of human nature and psychology. It may even undervalue system 1 knowledge (street smarts and personal experiences) and overvalue system 2 knowledge (authority, book smarts, and reasoning).

sharmake-farah on Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever's Recent Claims

I just want to provide one important piece of information:

It turns out that Ilya Sutskever's remarks were misinterpreted as a claim that models are plateauing; he was instead saying that other directions work out better:

https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/?commentId=JFNZ5MGZnzKRtFFMu