LessWrong 2.0 Reader

Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)
[link] Perplexity wins my AI race
Elizabeth (pktechgirl) · 2024-08-24T19:20:10.859Z · comments (12)
Backdoors as an analogy for deceptive alignment
Jacob_Hilton · 2024-09-06T15:30:06.172Z · comments (2)
Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)
I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)
AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)
A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)
What happens if you present 500 people with an argument that AI is risky?
KatjaGrace · 2024-09-04T16:40:03.562Z · comments (7)
LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)
LLM Applications I Want To See
sarahconstantin · 2024-08-19T21:10:03.101Z · comments (5)
Refactoring cryonics as structural brain preservation
Andy_McKenzie · 2024-09-11T18:36:30.285Z · comments (14)
Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (15)
[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)
Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)
[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (0)
Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)
MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)
Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (23)
[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)
You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)
The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (15)
The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)
[link] the Giga Press was a mistake
bhauth · 2024-08-21T04:51:24.150Z · comments (26)
[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)
[question] What are the best arguments for/against AIs being "slightly 'nice'"?
Raemon · 2024-09-24T02:00:19.605Z · answers+comments (58)
[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)
2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)
[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)
[link] Executable philosophy as a failed totalizing meta-worldview
jessicata (jessica.liu.taylor) · 2024-09-04T22:50:18.294Z · comments (40)
Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)
[question] Am I confused about the "malign universal prior" argument?
nostalgebraist · 2024-08-27T23:17:22.779Z · answers+comments (33)
SB 1047: Final Takes and Also AB 3211
Zvi · 2024-08-27T22:10:07.647Z · comments (11)
2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (42)
Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)
Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (11)
LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)
Defining alignment research
Richard_Ngo (ricraz) · 2024-08-19T20:42:29.279Z · comments (23)
A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)
The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (4)
Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (13)
Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)
(Salt) Water Gargling as an Antiviral
Elizabeth (pktechgirl) · 2024-11-22T18:00:02.765Z · comments (6)
Singular learning theory: exercises
Zach Furman (zfurman) · 2024-08-30T20:00:03.785Z · comments (5)
[Intuitive self-models] 1. Preliminaries
Steven Byrnes (steve2152) · 2024-09-19T13:45:27.976Z · comments (20)
[link] Self-Help Corner: Loop Detection
adamShimi · 2024-10-02T08:33:23.487Z · comments (6)
Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort (stanislavfort) · 2024-08-29T17:17:47.136Z · comments (8)
Research update: Towards a Law of Iterated Expectations for Heuristic Estimators
Eric Neyman (UnexpectedValues) · 2024-10-07T19:29:29.033Z · comments (2)
Parable of the vanilla ice cream curse (and how it would prevent a car from starting!)
Mati_Roy (MathieuRoy) · 2024-12-08T06:57:45.783Z · comments (21)
There is a globe in your LLM
jacob_drori (jacobcd52) · 2024-10-08T00:43:40.300Z · comments (4)