LessWrong 2.0 Reader

A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team
Lee Sharkey (Lee_Sharkey) · 2024-07-18T14:15:50.248Z · comments (18)

[question] Which skincare products are evidence-based?
Vanessa Kosoy (vanessa-kosoy) · 2024-05-02T15:22:12.597Z · answers+comments (47)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (46)

[link] My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi · 2024-09-08T14:30:40.456Z · comments (18)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Anthropic's Certificate of Incorporation
Zach Stein-Perlman · 2024-06-12T13:00:30.806Z · comments (4)

Why I funded PIBBSS
Ryan Kidd (ryankidd44) · 2024-09-15T19:56:33.018Z · comments (21)

[link] Anthropic release Claude 3, claims >GPT-4 Performance
LawrenceC (LawChan) · 2024-03-04T18:23:54.065Z · comments (41)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)

The LessWrong 2022 Review
habryka (habryka4) · 2023-12-05T04:00:00.000Z · comments (43)

Talent Needs of Technical AI Safety Teams
yams (william-brewer) · 2024-05-24T00:36:40.486Z · comments (64)

Current AIs Provide Nearly No Data Relevant to AGI Alignment
Thane Ruthenis · 2023-12-15T20:16:09.723Z · comments (155)

Mapping the semantic void: Strange goings-on in GPT embedding spaces
mwatkins · 2023-12-14T13:10:22.691Z · comments (31)

[link] Gender Exploration
sapphire (deluks917) · 2024-01-14T18:57:32.893Z · comments (25)

[link] introduction to cancer vaccines
bhauth · 2024-05-05T01:06:16.972Z · comments (19)

The Pareto Best and the Curse of Doom
Screwtape · 2024-02-21T23:10:01.359Z · comments (21)

Rationality Research Report: Towards 10x OODA Looping?
Raemon · 2024-02-24T21:06:38.703Z · comments (21)

Four visions of Transformative AI success
Steven Byrnes (steve2152) · 2024-01-17T20:45:46.976Z · comments (22)

[link] Please support this blog (with money)
Elizabeth (pktechgirl) · 2024-08-17T15:30:05.641Z · comments (3)

The Pearly Gates
lsusr · 2024-05-30T04:01:14.198Z · comments (6)

Simple versus Short: Higher-order degeneracy and error-correction
Daniel Murfet (dmurfet) · 2024-03-11T07:52:46.307Z · comments (6)

[link] Practically A Book Review: Appendix to "Nonlinear's Evidence: Debunking False and Misleading Claims" (ThingOfThings)
tailcalled · 2024-01-03T17:07:13.990Z · comments (25)

The Parable Of The Fallen Pendulum - Part 1
johnswentworth · 2024-03-01T00:25:00.111Z · comments (32)

DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (23)

Social status part 1/2: negotiations over object-level preferences
Steven Byrnes (steve2152) · 2024-03-05T16:29:07.143Z · comments (15)

The case for more ambitious language model evals
Jozdien · 2024-01-30T00:01:13.876Z · comments (30)

Ten arguments that AI is an existential risk
KatjaGrace · 2024-08-13T17:00:03.397Z · comments (41)

Introduction to French AI Policy
Lucie Philippon (lucie-philippon) · 2024-07-04T03:39:45.273Z · comments (12)

You should go to ML conferences
Jan_Kulveit · 2024-07-24T11:47:52.214Z · comments (13)

Being nicer than Clippy
Joe Carlsmith (joekc) · 2024-01-16T19:44:23.893Z · comments (32)

Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (31)

[link] A primer on the current state of longevity research
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-22T17:14:57.990Z · comments (6)

Please stop using mediocre AI art in your posts
Raemon · 2024-08-25T00:13:52.890Z · comments (24)

What I Would Do If I Were Working On AI Governance
johnswentworth · 2023-12-08T06:43:42.565Z · comments (32)

A Selection of Randomly Selected SAE Features
CallumMcDougall (TheMcDouglas) · 2024-04-01T09:09:49.235Z · comments (2)

' petertodd'’s last stand: The final days of open GPT-3 research
mwatkins · 2024-01-22T18:47:00.710Z · comments (16)

Repeal the Jones Act of 1920
Zvi · 2024-11-27T15:00:06.801Z · comments (18)

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda (neel-nanda-1) · 2023-12-23T02:44:24.270Z · comments (8)

Attitudes about Applied Rationality
Camille Berger (Camille Berger) · 2024-02-03T14:42:22.770Z · comments (18)

The Leopold Model: Analysis and Reactions
Zvi · 2024-06-14T15:10:03.480Z · comments (19)

OthelloGPT learned a bag of heuristics
jylin04 · 2024-07-02T09:12:56.377Z · comments (10)

"AI Alignment" is a Dangerously Overloaded Term
Roko · 2023-12-15T14:34:29.850Z · comments (100)

Clarifying METR's Auditing Role
Beth Barnes (beth-barnes) · 2024-05-30T18:41:56.029Z · comments (1)

[link] My techno-optimism [By Vitalik Buterin]
habryka (habryka4) · 2023-11-27T23:53:35.859Z · comments (17)

2023 in AI predictions
jessicata (jessica.liu.taylor) · 2024-01-01T05:23:42.514Z · comments (35)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks (samuel-marks) · 2024-04-18T16:17:39.136Z · comments (10)

[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)

[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)

[link] Most smart and skilled people are outside of the EA/rationalist community: an analysis
titotal (lombertini) · 2024-07-12T12:13:56.215Z · comments (36)

[question] How do you feel about LessWrong these days? [Open feedback thread]
jacobjacob · 2023-12-05T20:54:42.317Z · answers+comments (281)

Recent comments

thomas-kwa on Eli's shortform feed

Whether or not it would happen by default, this would be the single most useful LW feature for me. I'm often really unsure whether a post will get enough attention to be worth making it a longform, and sometimes even post shortforms like "comment if you want this to be a longform".

cstinesublime on Raemon's Shortform

Feedback loops, I think, are the principal bottleneck in my skill development, aside from the fact that if you're a novice you don't even know what you should be noticing (even if you have enough awareness to be cognizant of all signs and outputs of an act).

To give an example, I'm currently trying to learn how to generate client leads through video content for Instagram. Unless someone actually tells me about a video they liked and what they liked about it, figuring out how to please the algorithm to generate more engagement is hard. The only thing that "works" is tagging other people. Nothing about the type of content, the framing of the shots, the subject matter, the audio... nope... just whether or not one or more other Instagram accounts are tagged in it. (Of course, since the end objective is "get commissioned", perhaps Instagram engagement is not even the thing I should be optimizing for at all... how would I know?)
Feedback loops are hard. A desirable metaskill to have would be developing tight feedback loops.

cstinesublime on sarahconstantin's Shortform

It's been a while since I've read Plato's Republic, but isn't the Myth of Er just an abstraction of the way people make decisions based on (perceived) justice and injustice in their everyday lives? Just as Socrates says it is easier to read large print than small print, and so scales up justice from the individual to the titular Kallipolis, the day-to-day determinism of choices motivated by what we consider "fair" or "just" is easier to see when multiplied over endless cycles of lives than over days and nights.

Is it possible that Plato was saying that we experience this homeostatic mechanism day to day (if we are rational enough to observe the patterns of how our choices affect our personal circumstances)?

An example from the Republic itself: if I remember correctly, the entire dialogue starts because Socrates is in effect kidnapped after the end of a festival, because his interlocutors find him so darn entertaining. This would appear to be unjust, but not unexpected, because he is Socrates, who has a reputation for being engaging and wise, even if this is not the "right" or "just" way to treat him. How, then, should he behave in future, knowing that this is the potential cost of his social behavior? And the Myth of Er says that Odysseus chose a life that kept to itself, seeking neither virtue nor tyranny. That's probably the wrong reading. It's been a while since I've read it.

mo-putera on AI #92: Behind the Curve

One frustrating conversation was about persuasion. Somehow there continue to be some people who can at least somewhat feel the AGI, but also genuinely think humans are at or close to the persuasion possibilities frontier – that there is no room to greatly expand one’s ability to convince people of things, or at least of things against their interests.
This is sufficiently absurd to me that I don’t really know where to start, which is one way humans are bad at persuasion. Obviously, to me, if you started with imitations of the best human persuaders (since we have an existence proof for that), and on top of that could correctly observe and interpret all the detailed signals, have limitless time to think, a repository of knowledge, the chance to do Monte Carlo tree search of the conversation against simulated humans, never make a stupid or emotional tactical decision, and so on, you’d be a persuasion monster. It’s a valid question ‘where on the tech tree’ that shows up how much versus other capabilities, but it has to be there. But my attempts to argue this proved, ironically, highly unpersuasive.

Scott tried out an intuition pump in responding to nostalgebraist's skepticism:

Nostalgebraist: ... it’s not at all clear that it is possible to be any better at cult-creation than the best historical cult leaders — to create, for instance, a sort of “super-cult” that would be attractive even to people who are normally very disinclined to join cults. (Insert your preferred Less Wrong joke here.) I could imagine an AI becoming L. Ron Hubbard, but I’m skeptical that an AI could become a super-Hubbard who would convince us all to become its devotees, even if it wanted to.
Scott: A couple of disagreements. First of all, I feel like the burden of proof should be heavily upon somebody who thinks that something stops at the most extreme level observed. Socrates might have theorized that it’s impossible for it to get colder than about 40 F, since that’s probably as low as it ever gets outside in Athens. But when we found the real absolute zero, it was with careful experimentation and theoretical grounding that gave us a good reason to place it at that point. While I agree it’s possible that the best manipulator we know is also the hard upper limit for manipulation ability, I haven’t seen any evidence for that so I default to thinking it’s false.
(lots of fantasy and science fiction does a good job intuition-pumping what a super-manipulator might look like; I especially recommend R. Scott Bakker’s Prince Of Nothing)
But more important, I disagree that L. Ron Hubbard is our upper limit for how successful a cult leader can get. L. Ron Hubbard might be the upper limit for how successful a cult leader can get before we stop calling them a cult leader.
The level above L. Ron Hubbard is Hitler. It’s difficult to overestimate how sudden and surprising Hitler’s rise was. Here was a working-class guy, not especially rich or smart or attractive, rejected from art school, and he went from nothing to dictator of one of the greatest countries in the world in about ten years. If you look into the stories, they’re really creepy. When Hitler joined, the party that would later become the Nazis had a grand total of fifty-five members, and was taken about as seriously as modern Americans take Stormfront. There are records of conversations from Nazi leaders when Hitler joined the party, saying things like “Oh my God, we need to promote this new guy, everybody he talks to starts agreeing with whatever he says, it’s the creepiest thing.” There are stories of people who hated Hitler going to a speech or two just to see what all the fuss was about and ending up pledging their lives to the Nazi cause. Even while he was killing millions and trapping the country in a difficult two-front war, he had what historians estimate as a 90% approval rating among his own people and rampant speculation that he was the Messiah. Yeah, sure, there was lots of preexisting racism and discontent he took advantage of, but there’s been lots of racism and discontent everywhere forever, and there’s only been one Hitler. If he’d been a little bit smarter or more willing to listen to generals who were, he would have had a pretty good shot at conquering the world. 100% with social skills.
The level above Hitler is Mohammed. I’m not saying he was evil or manipulative, just that he was a genius’ genius at creating movements. Again, he wasn’t born rich or powerful, and he wasn’t particularly scholarly. He was a random merchant. He didn’t even get the luxury of joining a group of fifty-five people. He started by converting his own family to Islam, then his friends, got kicked out of his city, converted another city and then came back at the head of an army. By the time of his death at age 62, he had conquered Arabia and was its unquestioned, God-chosen leader. By what would have been his eightieth birthday his followers were in control of the entire Middle East and good chunks of Africa. Fifteen hundred years later, one fifth of the world population still thinks of him as the most perfect human being ever to exist and makes a decent stab at trying to conform to his desires and opinions in all things.
The level above Mohammed is the one we should be worried about.

gwern on Rationality Quotes July 2014

I think Alistair might have mangled the story there. There does seem to be a Charles II/fish/weight story, but about a completely different weight - in water, not postmortem: https://gwern.net/doc/philosophy/epistemology/1948-oesper.pdf

(Although the sourcing here is still thinner than I'd like and may not be the original: no date is given, but Schönbein was born in 1799 and Charles II died in 1685, and an 1842 publication still leaves at least 157 years between the latest the story could've happened and this exact publication. But I'll leave it to someone else to try to track it further back.)

daniel-kokotajlo on Why Don't We Just... Shoggoth+Face+Paraphraser?

Thanks!

I suspect that one reason why OpenAI doesn't expose all the thinking of O1 is that this thinking would upset some users, especially journalists and such. It's hard enough making sure that the final outputs are sufficiently unobjectionable to go public at a large scale. It seems harder to make sure the full set of steps is also unobjectionable.

I suspect the same thing; they almost come right out and say it (emphasis mine):

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users.

I think this is a bad reason to hide the CoT from users. I am not particularly sympathetic to your argument, which amounts to 'the public might pressure them to train away the inconvenient thoughts, so they shouldn't let the public see the inconvenient thoughts in the first place.' I think the benefits of letting the public see the CoT are pretty huge, but even if they were minor, it would be kinda patronizing and an abuse of power to hide them preemptively.

rhollerith_dot_com on keltan's Shortform

new cities like Los Alamos or Hanover

You mean Hanford.

denkenberger on Repeal the Jones Act of 1920

The thing is, there really are not all that many of them. Even if you counted every job at every shipyard, and every job aboard every Jones Act ship, and assumed all of them would be completely lost, it simply is not that many union workers.

But the Jones Act is massively benefiting truck and rail staff (and to some extent, pipelines), so I think there are a lot more workers you would need to compensate. Also, I would expect the truck and rail lobbies to try to save the Jones Act.

gwern on keltan's Shortform

Which was not terribly secret. The details of the Project were indeed super-secret, to the point where most of the politicians hadn't known anything, but despite the massive global-scale censorship & secrecy, many had observed the signs of a major project of some sort and some got as far as a nuclear bomb specifically. Also, there were no commercial satellites with meter resolution that could quantify major facilities or new cities like Los Alamos or Hanford (but overflights, and then satellites, now exist and have helped reveal later nuclear bomb programs). I'm sure you can find plenty more about secrecy breaches in Rhodes.

This was not necessarily all that useful in the context of WWII: of course America had some big secret projects going; everyone did. It was a total world war. Everyone was aware there was a war on. The devil was in the details of what the program was: a failure like the V-2s, or a success like the Enigma decrypts and Manhattan? But a binary exists/does-not-exist signal is useful in a peacetime context and in the current discussion.

(If nothing else, the fact that DeepSeek keeps publishing is a signal. I would note here BTW that you cannot argue, without tying yourself into some pretzel knots explaining 4-D chess logic, that Chinese AI is about to catch up to and surpass the West because the best Chinese AI group, DeepSeek, just released a model or published this-or-that revealing the secrets of OA, and argue that there is already a secret all-out Chinese Manhattan Project going on which will potentially reach AGI first - because the first thing the latter would have done is stop the former from publishing anything which might help Western AI and then devour it for researchers.)