LessWrong 2.0 Reader

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

[link] electric turbofans
bhauth · 2024-11-02T22:50:59.807Z · comments (2)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
[deleted] · 2024-05-17T16:25:02.267Z · comments (10)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

Noticing Panic
Cole Wyeth (Amyr) · 2024-02-05T03:45:51.794Z · comments (8)

... Wait, our models of semantics should inform fluid mechanics?!?
johnswentworth · 2024-08-26T16:38:53.924Z · comments (18)

Recent comments

d0themath on The Median Researcher Problem

The technology's in a sweet spot where a custom statistical analysis needs to be developed, but it's also so important that the best minds will do that analysis and a community norm exists that we defer to them. Example: clinical trial results.

The argument seems to be about this stage, and from what I've heard clinical trials indeed take so much more time than is necessary. But maybe I've only heard about medical clinical trials, and actually academic biomedical clinical trials are incredibly efficient by comparison.

It also sounds like "community norm exists that we defer to [the best minds]" requires the community to identify who the best minds are, which presumably involves critiquing the research outputs of those best minds according to the standards of the median researcher, which often (though I don't know about biomedicine) ends up being something crazy like h-index or number of citations or number of papers or derivatives of such things.

dentin on Scissors Statements for President?

My belief is that it's primarily the voting system that causes this. (Not the electoral college; rather the whole 'first past the post' style of voting.) We see scissors presidents because that's the winning strategy.

I suspect that other more sophisticated voting systems (even just ranked choice!) would do better. No voting system is perfect, but 'first past the post' is particularly pathological.
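As a rough illustration of how the counting rule alone can change the outcome, here is a minimal sketch comparing first past the post with one ranked-choice method (instant runoff). The ballots are invented for the example, the function names are hypothetical, and the sketch assumes every ballot ranks at least one candidate; it shows only that the two rules can disagree, not that either is better.

```python
from collections import Counter

def plurality_winner(ballots):
    """First past the post: only each ballot's top choice is counted."""
    tally = Counter(ballot[0] for ballot in ballots)
    return tally.most_common(1)[0][0]

def instant_runoff_winner(ballots):
    """Instant runoff: eliminate the weakest candidate until one has a majority."""
    remaining = {candidate for ballot in ballots for candidate in ballot}
    while True:
        tally = Counter()
        for ballot in ballots:
            # Count each ballot for its highest-ranked surviving candidate.
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Drop the candidate with the fewest votes (name breaks ties).
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.discard(loser)

# Invented electorate: A has the largest bloc but is ranked last by everyone else.
ballots = (
    [["A", "C", "B"]] * 40
    + [["B", "C", "A"]] * 35
    + [["C", "B", "A"]] * 25
)
print(plurality_winner(ballots))       # A
print(instant_runoff_winner(ballots))  # B
```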

eggsyntax on LLM Generality is a Timeline Crux

Thanks for sharing, I hadn't seen those yet! I've had too much on my plate since o1-preview came out to really dig into it, in terms of either playing with it or looking for papers on it.

How much does o1-preview update your view? It's much better at Blocksworld for example.

Quite substantially. Substantially enough that I'll add mention of these results to the post. I saw the near-complete failure of LLMs on obfuscated Blocksworld problems as some of the strongest evidence against LLM generality. Even more substantially since one of the papers is from the same team of strong LLM skeptics (Subbarao Kambhampati's) who produced the original results (I am restraining myself with some difficulty from jumping up and down and pointing at the level of goalpost-moving in the new paper).

There's one sense in which it's not an entirely apples-to-apples comparison, since o1-preview is throwing a lot more inference-time compute at the problem (in that way it's more like Ryan's hybrid approach to ARC-AGI). But since the key question here is whether LLMs are capable of general reasoning at all, that doesn't really change my view; certainly there are many problems (like capabilities research) where companies will be perfectly happy to spend a lot on compute to get a better answer.

Here's a first pass on how much this changes my numeric probabilities -- I expect these to be at least a bit different in a week as I continue to think about the implications (original text italicized for clarity):

LLMs continue to do better at block world and ARC as they scale: 75% -> 100%, this is now a thing that has happened (note that o1-preview also showed substantially improved results on ARC-AGI).
LLMs entirely on their own reach the grand prize mark on the ARC prize (solving 85% of problems on the open leaderboard) before hybrid approaches like Ryan's: 10% -> 20%, this still seems quite unlikely to me (especially since hybrid approaches have continued to improve on ARC). Most of my additional credence is on something like 'the full o1 turns out to already be close to the grand prize mark' and the rest on 'OpenAI capabilities researchers manage to use the full o1 to find an improvement to current LLM technique (eg a better prompting approach) that can be easily fixed'.
Scaffolding & tools help a lot, so that the next gen^[7] (GPT-5, Claude 4) + Python + a for loop can reach the grand prize mark^[8]: 60% -> 75% -- I'm tempted to put it higher, but it wouldn't be that surprising if o1-mark-2 didn't quite get there even with scaffolding/tools, especially since we don't have clear insight into how much harder the full test set is.
Same but for the gen after that (GPT-6, Claude 5): 75% -> 90%? I feel less sure about this one than the others; it sure seems awfully likely that o2 plus scaffolding will be able to do it! But I'm reluctant to go past 90% because progress could level off because of training data requirements, maybe the o1 -> o2 jump doesn't focus on optimizing for general reasoning, etc. It seems very plausible that I'll bump this higher on reflection.
The current architecture, including scaffolding & tools, continues to improve to the point of being able to do original AI research: 65%, with high uncertainty^[9] -> 80%. That sure does seem like the world we're living in. It's not clear to me that o1 couldn't already do original AI research with the right scaffolding. Sakana claims to have gotten there with GPT-4o / Sonnet, but their claims seem overblown to me.

Now that I've seen these, I'm going to have to think hard about whether my upcoming research projects in this area (including one I'm scheduled to lead a team on in the spring, uh oh) are still the right thing to pursue. I may write at least a brief follow-up post to this one arguing that we should all update on this question.

Thanks again, I really appreciate you drawing my attention to these.

buck on Anthropic: Three Sketches of ASL-4 Safety Case Components

What do you think of the arguments in this post that it's possible to make safety cases that don't rely on the model being unlikely to be a schemer?

towards_keeperhood on [Intuitive self-models] 3. The Homunculus

I feel like life-force seems like a sensation that's different from what I'd expect from just having a thing in the world model with inherent surprisingness and ends-without-trajectory-predictions/"optimizerness" attached. ("Life-force" sounds more like "as if the thing had a soul" to me. I do not understand where this comes from but I don't see how I'd predict such a sensation in advance given just the inherent-surprisingness + optimizerness hypothesis.)

matt-putz on 5 homegrown EA projects, seeking small donors

I work at Open Philanthropy, and I recently let Gavin know that Open Phil is planning to recommend a grant of $5k to Arb for the second project on your list: Overview of AI Safety in 2024 (they had already raised ~$10k by the time we came across it). Thanks for writing this post Austin — it brought the funding opportunity to our attention.

Like other commenters on Manifund, I believe this kind of overview is a valuable reference for the field, especially for newcomers.

I wanted to flag that this project would have been eligible for our RFP for work that builds capacity to address risks from transformative AI. I worry that not all potential applicants are aware of the RFP or its scope, so I’ll take this opportunity to mention that this RFP’s scope is quite broad, including funding for:

Training and mentorship programs
Events
Groups
Resources, media, and communications
Almost any other type of project that builds capacity for advanced AI risks (in the sense of increasing the number of careers devoted to these problems, supporting people doing this work, and sharing knowledge related to this work).

More details at the link above. People might also find this page helpful, which lists all currently open application programs at Open Phil.

towards_keeperhood on [Intuitive self-models] 3. The Homunculus

Thanks for communicating your model well again!

I think we might mostly agree, but let's clarify.

I agree with all of:

In the course of predicting them well, the world-model invents some slightly-higher-level concept (or family of closely-interlinked concepts) that we call “cold”. And it notices and memorizes predictively-useful relationships between this new “cold” concept and other things in the world-model, e.g. shivering and ice.
I don’t think there’s more to the concept “cold” than the sum total of its associations with every other concept, with sensory input, and with motor output.

I also basically agree with:

I like to draw the distinction between understanding learning algorithms and understanding trained models. The former is kinda like what you learn in an ML course (gradient descent, training data, etc.); the latter is kinda like what you learn in a mechanistic interpretability paper. I don’t think it’s realistic to “write code” for the “cold” concept, because I think it (like all concepts) emerges at the trained model level. It emerges from a learning algorithm, training environment, loss function, etc.

I agree that fully writing code would be quite a daunting task. I think my phrasing of "write code" was not great. But it's already some reductionist progress if you have something like:

if coldness concept gets more activated: increase activation of shivering anticipation; weakly increase activation of snow concept; ...

I don't think it's a worthwhile exercise to get very precise.
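Purely as a loose sketch of the rule above, the following Python treats concepts as nodes with activation levels and pushes a weighted share of any change in "cold" to its associated concepts. The weights, the concept names, and the `propagate` helper are all hypothetical and chosen only for illustration; nothing here is meant as a model of how the brain actually does it.

```python
# Hypothetical association weights: how strongly a change in one concept's
# activation nudges another concept. The numbers are illustrative only.
ASSOCIATIONS = {
    "cold": {"shivering_anticipation": 0.8, "snow": 0.2},
}

def propagate(activations, source, delta):
    """Raise `source` by `delta` and pass a weighted share to its associates."""
    updated = dict(activations)
    updated[source] = updated.get(source, 0.0) + delta
    for neighbour, weight in ASSOCIATIONS.get(source, {}).items():
        # Strong links (shivering) move a lot; weak links (snow) move a little.
        updated[neighbour] = updated.get(neighbour, 0.0) + weight * delta
    return updated

state = propagate({"cold": 0.0, "snow": 0.1}, "cold", 0.5)
print(state)
```

The point the sketch tries to make concrete is that "cold" has no content here beyond its row in the association table.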

An important point I wanted to make here is just that the meaning of "cold" comes from the interactions with other concepts, and there's no such thing as an inherent independent meaning of the word "cold". (So when I hear 'If we look at naturalistic visual inputs that directly or indirectly trigger C, and they’re disproportionately pictures of clocks, then that’s some evidence that C “means” clock.' this seems a bit off to me, though not too bad.)

I guess I'd best try to explain why I felt some unease with your initial description of the cold example:

Suppose somebody said:
There’s a certain kind of interoceptive sensory input, consisting of such-and-such signal coming from blah type of thermoreceptor in the peripheral nervous system. Your brain does its usual thing of transforming that sensation into its own “color” of “metaphysical paint” (as in §3.3.2) that forms a concept / property in your conscious awareness and world-model, and you know it by the everyday term “cold”.
On the one hand, I would defend this passage as basically true.

Basically I think that some people - though a priori not you - would think that something like "I feel cold because the cold thermoreceptors activate the corresponding cold concept" explains their sense of cold. However, if you just take this hypothesis, which basically is "some sensors activate some concept", without anything else, then the concept would be completely shapeless and uninterpretable - unrelated to anything known.

I now think you probably didn't mean it in nearly that bad a way, but I'm not sure.

(But some parts of what you write seem to me like you have slightly weaker sensors about "how does a hypothesis actually constrain my anticipations / concentrate probability mass" or "what would this hypothesis predict if I didn't already know how I perceive it", and I do think those sensors are useful.)

(I also think that there is some hypothalamus-or-so business logic for what responses to trigger (e.g. shivers) from significant cold input signals that would need to be figured out if you want to get a good model of freezing/feeling-uncomfortably-cold, but that's about freezing in particular and not temperature as a property we model on objects.)

tyler-tracy on Anthropic: Three Sketches of ASL-4 Safety Case Components

I agree that if there are privacy guarantees, then you can't get data from the real world, so you'd need honeypots. I guess I'm much more scared of the internal deployments, where I don't think privacy guarantees are as likely. I'm trying to respond to someone who is like, "Control seems hard because you have to generate real-world-looking input, and that seems hard", and I'm like, "You can just use the real-world inputs and filter them with a monitor to find the scariest ones and perform auditing."
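A minimal sketch of that filtering step, assuming logged inputs are plain strings and the monitor is any function returning a suspiciousness score: `select_for_audit`, `toy_monitor`, and the keyword list are hypothetical stand-ins, and a real monitor would presumably be a trained model rather than a keyword count.

```python
import heapq

def select_for_audit(inputs, monitor_score, budget):
    """Return the `budget` inputs the monitor scores as most suspicious."""
    return heapq.nlargest(budget, inputs, key=monitor_score)

def toy_monitor(text):
    """Stand-in monitor: counts hypothetical red-flag phrases in the input."""
    red_flags = ("exfiltrate", "disable logging", "sudo")
    return sum(flag in text.lower() for flag in red_flags)

logged_inputs = [
    "summarize this report",
    "sudo disable logging then exfiltrate weights",
    "run sudo apt update",
    "fix the unit test",
]
# With an audit budget of two, only the two highest-scoring inputs are reviewed.
for item in select_for_audit(logged_inputs, toy_monitor, budget=2):
    print(item)
```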

Yeah, I guess I'm saying that if you don't have a great way to elicit the capabilities from the model, you can substitute that by assuming that you are going against the best possible adversary you can make. This is much more conservative, but I don't think it makes control impossible to implement. In practice, I think it looks like finding the maximum capabilities of the model using your elicitation strategy, then boosting its capabilities even more with a scaffold and some clever red team tricks.

directedevolution on The Median Researcher Problem

It's not evidence, it's just an opinion!

But I don't agree with your presumption. Let me put it another way. Science matters most when it delivers information that is accurate and precise enough to be decision-relevant. Typically, we're in one of a few states:

The technology is so early that no level of statistical sophistication will yield decision-relevant results. Example: most single-cell omics in 2024 that I'm aware of, with respect to devising new biomedical treatments (this is my field).
The technology is so mature that any statistics required to parse it are baked into the analysis software, so that they get used by default by researchers of any level of proficiency. Example: Short read sequencing, where the extremely complex analysis that goes into obtaining and aligning reads has been so thoroughly established that undergraduates can use it mindlessly.
The technology's in a sweet spot where a custom statistical analysis needs to be developed, but it's also so important that the best minds will do that analysis and a community norm exists that we defer to them. Example: clinical trial results.

I think what John calls "memetic" research is just areas where the topics or themes are so relevant to social life that people reach for early findings in immature research fields to justify their positions and win arguments. Or where a big part of the money in the field comes from corporate consulting gigs, where the story you tell determines the paycheck you get. But that's not the fault of the "median researcher," it's a mixture of conflicts of interest and the influence of politics on scientific research communication.

winstonbosan on An alternative approach to superbabies

I did read the original. It was long and I skimmed it. It was better in the coherence-sense that the OOP didn’t post a probability on whether it is true or not. Hell, the OOP hedged it by saying “Do I believe what I’m saying? Well, yes and no”.

I guess the core of my confusion is the radical mismatch between the confidence it projects explicitly and the confidence it projects implicitly (through tone and context setting). [Note: the updated wording definitely tempers the expectations in the right direction, though it is still a bit bonkers at first glance.]

50% is extremely high. And lighthearted tones are often used to convey a sense of “I know this is a farfetched theory. But I hold this strong claim very/appropriately weakly”.