LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Bridging the VLM and mech interp communities for multimodal interpretability
Sonia Joseph (redhat) · 2024-10-28T14:41:41.969Z · comments (5)

[link] Fragile, Robust, and Antifragile Preference Satisfaction
adamShimi · 2024-11-02T17:25:55.986Z · comments (0)

[link] Chess As The Model Game
criticalpoints · 2024-11-17T19:45:26.499Z · comments (0)

Review: “The Case Against Reality”
David Gross (David_Gross) · 2024-10-29T13:13:29.643Z · comments (9)

Estimates of GPU or equivalent resources of large AI players for 2024/5
CharlesD · 2024-11-28T23:01:58.522Z · comments (1)

How likely is brain preservation to work?
Andy_McKenzie · 2024-11-18T16:58:54.632Z · comments (3)

[link] Update on the Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-11-04T19:22:06.540Z · comments (9)

Advisors for Smaller Major Donors?
jefftk (jkaufman) · 2024-11-06T14:30:06.187Z · comments (2)

In the Name of All That Needs Saving
pleiotroth · 2024-11-07T15:26:12.252Z · comments (2)

[link] AI & Liability Ideathon
Kabir Kumar (kabir-kumar) · 2024-11-26T13:54:01.820Z · comments (2)

Announcing the CLR Foundations Course and CLR S-Risk Seminars
JamesFaville (elephantiskon) · 2024-11-19T01:18:10.085Z · comments (0)

Using Dangerous AI, But Safely?
habryka (habryka4) · 2024-11-16T04:29:20.914Z · comments (2)

Heresies in the Shadow of the Sequences
Cole Wyeth (Amyr) · 2024-11-14T05:01:11.889Z · comments (12)

Long Live the Usurper
pleiotroth · 2024-11-27T12:10:51.025Z · comments (0)

[link] GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
ChengCheng (ccstan99) · 2024-11-01T00:10:50.718Z · comments (0)

Proposal to increase fertility: University parent clubs
Fluffnutt (Pear) · 2024-11-18T04:21:26.346Z · comments (3)

[link] A Little Depth Goes a Long Way: the Expressive Power of Log-Depth Transformers
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-20T11:48:14.170Z · comments (0)

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)
spencerg · 2024-10-27T17:34:50.479Z · comments (0)

[link] Every niche event should also be a meetup
DMMF · 2024-11-19T20:47:50.053Z · comments (0)

[question] Is there a CFAR handbook audio option?
FinalFormal2 · 2024-10-26T17:08:36.480Z · answers+comments (0)

[question] Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?
SpectrumDT · 2024-11-04T15:20:14.822Z · answers+comments (49)

[question] What epsilon do you subtract from "certainty" in your own probability estimates?
Dagon · 2024-11-26T19:13:46.795Z · answers+comments (6)

Evolutionary prompt optimization for SAE feature visualization
neverix · 2024-11-14T13:06:49.728Z · comments (0)

Current Attitudes Toward AI Provide Little Data Relevant to Attitudes Toward AGI
Seth Herd · 2024-11-12T18:23:53.533Z · comments (2)

Electric Grid Cyberattack: An AI-Informed Threat Model
moonlightmaze · 2024-11-11T21:34:17.190Z · comments (0)

LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction
Tristan Tran (tristan-tran) · 2024-11-09T20:58:09.182Z · comments (5)

New Funding Category Open in Foresight's AI Safety Grants
Allison Duettmann (allison-duettmann) · 2024-11-06T22:59:41.065Z · comments (0)

2024 NYC Secular Solstice & Megameetup
Joe Rogero · 2024-11-12T17:46:18.674Z · comments (0)

Chaos Theory in Ecology
Elizabeth (pktechgirl) · 2024-11-09T17:50:01.727Z · comments (2)

[link] Levers for Biological Progress - A Response to "Machines of Loving Grace"
Niko_McCarty (niko-2) · 2024-11-01T16:35:08.221Z · comments (0)

Two arguments against longtermist thought experiments
momom2 (amaury-lorin) · 2024-11-02T10:22:11.311Z · comments (5)

[link] AI & wisdom 2: growth and amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:07:39.449Z · comments (0)

Secular Solstice Songbook Update
jefftk (jkaufman) · 2024-11-17T17:30:07.404Z · comments (2)

[link] What if muscle tension is sometimes signal jamming?
Chipmonk · 2024-11-04T21:08:47.800Z · comments (1)

[link] AI & wisdom 3: AI effects on amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:08:56.604Z · comments (0)

What can we learn from insecure domains?
Logan Zoellner (logan-zoellner) · 2024-11-01T23:53:30.066Z · comments (21)

[question] How can we prevent AGI value drift?
Dakara (chess-ice) · 2024-11-20T18:19:24.375Z · answers+comments (6)

AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
DanielFilan · 2024-11-14T07:00:06.977Z · comments (0)

Dance Differentiation
jefftk (jkaufman) · 2024-11-15T02:30:07.694Z · comments (0)

[link] I, Token
Ivan Vendrov (ivan-vendrov) · 2024-11-25T02:20:35.629Z · comments (2)

Aligning AI Safety Projects with a Republican Administration
Deric Cheng (deric-cheng) · 2024-11-21T22:12:27.502Z · comments (0)

[link] Disentangling Representations through Multi-task Learning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-11-24T13:10:26.307Z · comments (1)

Paraddictions: unreasonably compelling behaviors and their uses
Michael Cohn (michael-cohn) · 2024-11-22T20:53:59.479Z · comments (0)

Registrations Open for 2024 NYC Secular Solstice & Megameetup
Joe Rogero · 2024-11-12T17:50:10.827Z · comments (0)

[link] [Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms
Gunnar_Zarncke · 2024-11-04T10:15:35.550Z · comments (0)

Crosspost: Developing the middle ground on polarized topics
juliawise · 2024-11-25T14:39:53.041Z · comments (15)

Curriculum of Ascension
andrew sauer (andrew-sauer) · 2024-11-07T23:54:18.983Z · comments (0)

Goal: Understand Intelligence
Johannes C. Mayer (johannes-c-mayer) · 2024-11-03T21:20:02.900Z · comments (19)

[question] Why is Gemini telling the user to die?
Burny · 2024-11-18T01:44:12.583Z · answers+comments (1)

[link] The lying p value
kqr · 2024-11-12T06:12:59.934Z · comments (6)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

cstinesublime on sarahconstantin's Shortform

It's been a while since I've read Plato's Republic, but isn't the Myth of Er just a abstraction of the way people make decision based on (perceived) justice and injustice in their everyday life? Just in the same way that Socrates says it is easier to read large print than small print, so he scales up justice from an individual to the titular Kallipolis, so too the day to day determinism of choices motivated by what we consider is 'fair' or 'just' is easier seen if multiplied over endless cycles of lives, than days and nights.

Is it possible that Plato was saying that day to day we experience this homeostatic mechanism? (if you are rational enough to observe the patterns of how your choices affect your personal circumstances?).

An example from the Republic itself: if I remember correctly the entire dialogue starts because Socrates is in effect kidnapped after the end of a festival because his interlocutors find him so darn entertaining. This would appear to be unjust - but not unexpected because he is Socrates which he has this reputation for being engaging and wise even if it is not the 'right' or 'just' way to treat him. How then should he behave in future, knowing that this is the potential cost of his social behavior? And the Myth of Er says that Odysseus kept to himself, sought neither virtue nor tyranny. That's probably the wrong reading. It's been a while since I've read it.

mo-putera on AI #92: Behind the Curve

One frustrating conversation was about persuasion. Somehow there continue to be some people who can at least somewhat feel the AGI, but also genuinely think humans are at or close to the persuasion possibilities frontier – that there is no room to greatly expand one’s ability to convince people of things, or at least of things against their interests.
This is sufficiently absurd to me that I don’t really know where to start, which is one way humans are bad at persuasion. Obviously, to me, if you started with imitations of the best human persuaders (since we have an existence proof for that), and on top of that could correctly observe and interpret all the detailed signals, have limitless time to think, a repository of knowledge, the chance to do Monty Carlo tree search of the conversation against simulated humans, never make a stupid or emotional tactical decision, and so on, you’d be a persuasion monster. It’s a valid question ‘where on the tech tree’ that shows up how much versus other capabilities, but it has to be there. But my attempts to argue this proved, ironically, highly unpersuasive.

Scott tried out an intuition pump in responding to nostalgebraist's skepticism:

Nostalgebraist: ... it’s not at all clear that it is possible to be any better at cult-creation than the best historical cult leaders — to create, for instance, a sort of “super-cult” that would be attractive even to people who are normally very disinclined to join cults. (Insert your preferred Less Wrong joke here.) I could imagine an AI becoming L. Ron Hubbard, but I’m skeptical that an AI could become a super-Hubbard who would convince us all to become its devotees, even if it wanted to.
Scott: A couple of disagreements. First of all, I feel like the burden of proof should be heavily upon somebody who thinks that something stops at the most extreme level observed. Socrates might have theorized that it’s impossible for it to get colder than about 40 F, since that’s probably as low as it ever gets outside in Athens. But when we found the real absolute zero, it was with careful experimentation and theoretical grounding that gave us a good reason to place it at that point. While I agree it’s possible that the best manipulator we know is also the hard upper limit for manipulation ability, I haven’t seen any evidence for that so I default to thinking it’s false.
(lots of fantasy and science fiction does a good job intuition-pumping what a super-manipulator might look like; I especially recommend R. Scott Bakker’s Prince Of Nothing)
But more important, I disagree that L. Ron Hubbard is our upper limit for how successful a cult leader can get. L. Ron Hubbard might be the upper limit for how successful a cult leader can get before we stop calling them a cult leader.
The level above L. Ron Hubbard is Hitler. It’s difficult to overestimate how sudden and surprising Hitler’s rise was. Here was a working-class guy, not especially rich or smart or attractive, rejected from art school, and he went from nothing to dictator of one of the greatest countries in the world in about ten years. If you look into the stories, they’re really creepy. When Hitler joined, the party that would later become the Nazis had a grand total of fifty-five members, and was taken about as seriously as modern Americans take Stormfront. There are records of conversations from Nazi leaders when Hitler joined the party, saying things like “Oh my God, we need to promote this new guy, everybody he talks to starts agreeing with whatever he says, it’s the creepiest thing.” There are stories of people who hated Hitler going to a speech or two just to see what all the fuss was about and ending up pledging their lives to the Nazi cause. Even while he was killing millions and trapping the country in a difficult two-front war, he had what historians estimate as a 90% approval rating among his own people and rampant speculation that he was the Messiah. Yeah, sure, there was lots of preexisting racism and discontent he took advantage of, but there’s been lots of racism and discontent everywhere forever, and there’s only been one Hitler. If he’d been a little bit smarter or more willing to listen to generals who were, he would have had a pretty good shot at conquering the world. 100% with social skills.
The level above Hitler is Mohammed. I’m not saying he was evil or manipulative, just that he was a genius’ genius at creating movements. Again, he wasn’t born rich or powerful, and he wasn’t particularly scholarly. He was a random merchant. He didn’t even get the luxury of joining a group of fifty-five people. He started by converting his own family to Islam, then his friends, got kicked out of his city, converted another city and then came back at the head of an army. By the time of his death at age 62, he had conquered Arabia and was its unquestioned, God-chosen leader. By what would have been his eightieth birthday his followers were in control of the entire Middle East and good chunks of Africa. Fifteen hundred years later, one fifth of the world population still thinks of him as the most perfect human being ever to exist and makes a decent stab at trying to conform to his desires and opinions in all things.
The level above Mohammed is the one we should be worried about.

gwern on Rationality Quotes July 2014

I think Alistair might have mangled the story there. There does seem to be a Charles II/fish/weight story, but about a completely different weight - in water, not postmortem: https://gwern.net/doc/philosophy/epistemology/1948-oesper.pdf

(Although the sourcing here is still thinner than I'd like and may not be the original: no date is given, but Schönbein was born in 1799 and Charles II died in 1685, and an 1842 publication still leaves at least 157 years between the latest the story could've happened and this exact publication. But I'll leave it to someone else to try to track it further back.)

daniel-kokotajlo on Why Don't We Just... Shoggoth+Face+Paraphraser?

Thanks!

I suspect that one reason why OpenAI doesn't expose all the thinking of O1 is that this thinking would upset some users, especially journalists and such. It's hard enough making sure that the final outputs are sufficiently unobjectionable to go public at a large scale. It seems harder to make sure the full set of steps is also unobjectionable.

I suspect the same thing, they almost come right out and say it: (emphasis mine)

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users.

I think this is a bad reason to hide the CoT from users. I am not particularly sympathetic to your argument, which amounts to 'the public might pressure them to train away the inconvenient thoughts, so they shouldn't let the public see the inconvenient thoughts in the first place.' I think the benefits of letting the public see the CoT are pretty huge, but even if they were minor, it would be kinda patronizing and an abuse of power to hide them preemptively.

daniel-kokotajlo on Why Don't We Just... Shoggoth+Face+Paraphraser?

Thanks!

I suspect that one reason why OpenAI doesn't expose all the thinking of O1 is that this thinking would upset some users, especially journalists and such. It's hard enough making sure that the final outputs are sufficiently unobjectionable to go public at a large scale. It seems harder to make sure the full set of steps is also unobjectionable.

I suspect the same thing, they almost come right out and say it: (emphasis mine)

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users.

I think this is a bad reason to hide the CoT from users. I am not particularly sympathetic to your argument, which amounts to 'the public might pressure them to train away the inconvenient thoughts, so they shouldn't let the public see the inconvenient thoughts in the first place.' I think the benefits of letting the public see the CoT are pretty huge, but even if they were minor, it would be kinda patronizing and an abuse of power to hide them preemptively.

rhollerith_dot_com on keltan's Shortform

new cities like Los Alamos or Hanover

You mean Hanford.

denkenberger on Repeal the Jones Act of 1920

The thing is, there really are not all that many of them. Even if you counted every job at every shipyard, and every job aboard every Jones Act ship, and assumed all of them would be completely lost, it simply is not that many union workers.

But the Jones Act is massively benefiting truck and rail staff (and to some extent, pipelines), so I think there are a lot more workers you would need to compensate. Also, I would expect the truck and rail lobbies to try to save the Jones Act.

gwern on keltan's Shortform

Which was not terribly secret. The details of the Project were indeed super-secret, to the point where most of the politicians hadn't known anything, but despite the massive global-scale censorship & secrecy, many had observed the signs of a major project of some sort and some got as far as a nuclear bomb specifically. Also, commercial satellites with meter resolution did not exist which could quantify major facilities or new cities like Los Alamos or Hanford (but overflights, and then satellites, now exist and have helped reveal later nuclear bomb programs). I'm sure you can find plenty more about secrecy breaches in Rhodes.

This was not necessarily all that useful in the context of WWII - of course America had some big secret projects going, everyone did. It was a total world war. Everyone was aware there was a war on. The devil was in the details of what the program was - a failure like the V2-s, or a success like Enigma decrypts and Manhattan? But a binary exists/does-not-exist is useful in a peacetime context and the current discussion.

(If nothing else, the fact that DeepSeek keeps publishing is a signal. I would note here BTW that you cannot argue, without tying yourself into some pretzel knots explaining 4-D chess logic, that Chinese AI is about to catch up to and surpass the West because the best Chinese AI group, DeepSeek, just released a model or published this-or-that revealing the secrets of OA, and argue that there is already a secret all-out Chinese Manhattan Project going on which will potentially reach AGI first - because the first thing the latter would have done is stop the former from publishing anything which might help Western AI and then devour it for researchers.)

quila on quila's Shortform

this could have been noise, but i noticed an increase in text fearing spies, in the text i've seen in the past few days^[1]. i actually don't know how much this concern is shared by LW users, so i think it might be worth writing that, in my view:

(AFAIK) both governments^[2] are currently reacting inadequately to unaligned optimization risk. as a starting prior, there's not strong reason to fear more one government {observing/spying on} ML conferences/gatherings over the other, absent evidence that one or the other will start taking unaligned optimization risks very seriously, or that one or the other is prone to race towards ASI.
- (AFAIK, we have more evidence that the U.S. government may try to race, e.g. this [LW · GW], but i could have easily missed evidence as i don't usually focus on this)
- tangentially, a more-pervasively-authoritarian government could be better situated to prevent unilaterally-caused risks (cf a similar argument in 'The Vulnerable World Hypothesis'), if it sought to. (edit: andif the AI labs closest to causing those risks were within its borders, which they are not atm)
  - this argument feels sad (or reflective of a sad world?) to me to be clear, but it seems true in this case

that said i don't typically focus on governance or international-AI-politics, so have not put much thought into this.

^{^}
examples: yesterday, saw this twitter/x post (via this quoting post)
today, opened lesswrong and saw this shortform about two uses of the word spy [LW(p) · GW(p)] and this shortform about how it's hard to have evidence against the existence of manhattan projects [LW(p) · GW(p)]
this was more than usual, and i sense that it's part of a pattern
^{^}
of those of US/china

vladimir_nesov on Estimates of GPU or equivalent resources of large AI players for 2024/5

Llama-3-405B is an important anchor for compute of other models. With 4e25 FLOPs and conservative training techniques it's about as capable, so the other models probably don't use much more. If they have better techniques, they need less compute to get similar performance, not more. And they probably didn't train for more than 6 months. At $2 per H100-hour^[1], $3 billion buys 6 months of time on 300K H100s. There are no publicly known training systems this large, the first 100K H100s systems started appearing in the later part of this year. Thus the training cost figures must include smaller experiments that in aggregate eat more compute than the largest training runs, through the now-ubiquitous smaller clusters also used for inference.

So anchoring to total number of GPUs is misleading about frontier model training because most GPUs are used for inference and smaller experiments, and the above estimate shows that figures like $3 billion for training are also poor anchors. If instead we look at 20K H100s as the typical scale of largest clusters in mid 2023 to early 2024, and 4 months as a typical duration of frontier model training, we get $120 million at $2 per H100-hour or 8e25 dense BF16 FLOPs at 40% compute utilization, only about 2x Llama-3-405B compute. This agrees with how Dario Amodei claimed that in Jun 2024 the scale of deployed models is about $100 million.

For what it's worth, since training the largest models requires building the training system yourself, which makes the market price of renting fewer GPUs from much smaller clusters not that relevant. ↩︎