LessWrong 2.0 Reader

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

The OODA Loop -- Observe, Orient, Decide, Act
Davis_Kingsley · 2025-01-01T08:00:27.979Z · comments (2)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

Correct my H5N1 research ($reward)
Elizabeth (pktechgirl) · 2024-12-09T19:07:03.277Z · comments (24)

I Finally Worked Through Bayes' Theorem (Personal Achievement)
keltan · 2024-12-05T02:04:16.547Z · comments (6)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)

[link] Just one more exposure bro
Chipmonk · 2024-12-12T21:37:07.069Z · comments (6)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)

DeekSeek v3: The Six Million Dollar Model
Zvi · 2024-12-31T15:10:06.924Z · comments (6)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (2)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (29)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (4)

[link] Careless thinking: A theory of bad thinking
Nathan Young · 2024-12-17T18:23:16.140Z · comments (17)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)

Toy Models of Feature Absorption in SAEs
chanind · 2024-10-07T09:56:53.609Z · comments (8)

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes
Anna Gajdova (anna-gajdova) · 2024-10-09T12:56:24.856Z · comments (14)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Greedy-Advantage-Aware RLHF
sej2020 · 2024-12-27T19:47:25.562Z · comments (15)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures
Sahil · 2024-11-01T17:24:09.957Z · comments (3)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft
Andrew_Critch · 2024-12-03T09:29:49.745Z · comments (2)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

[link] A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld (Caspar42) · 2024-12-16T22:42:03.763Z · comments (1)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

Book a Time to Chat about Interp Research
Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · comments (3)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

AI #91: Deep Thinking
Zvi · 2024-11-21T14:30:06.930Z · comments (10)

D&D.Sci Dungeonbuilding: the Dungeon Tournament
aphyer · 2024-12-14T04:30:55.656Z · comments (16)

AI #88: Thanks for the Memos
Zvi · 2024-10-31T15:00:07.412Z · comments (5)

Anthropic rewrote its RSP
Zach Stein-Perlman · 2024-10-15T14:25:12.518Z · comments (19)

What Indicators Should We Watch to Disambiguate AGI Timelines?
snewman · 2025-01-06T19:57:43.398Z · comments (7)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (5)

AI as a powerful meme, via CGP Grey
TheManxLoiner · 2024-10-30T18:31:58.544Z · comments (8)

Considerations on orca intelligence
Towards_Keeperhood (Simon Skade) · 2024-12-29T14:35:16.445Z · comments (5)

Minimal Motivation of Natural Latents
johnswentworth · 2024-10-14T22:51:58.125Z · comments (14)

~80 Interesting Questions about Foundation Model Agent Safety
RohanS · 2024-10-28T16:37:04.713Z · comments (4)

[link] The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)
Eneasz · 2024-12-24T22:45:50.065Z · comments (4)

Detection of Asymptomatically Spreading Pathogens
jefftk (jkaufman) · 2024-12-05T18:20:02.473Z · comments (7)

Recent comments

aynonymousprsn123 on Is my distinctiveness evidence for being in a simulation?

My argument didn't even make those assumptions. Nothing in my argument "falsified" reality, nor did I "prove" the existence of something outside my immediate senses. It was merely a probabilistic, anthropic argument. Are you familiar with anthropics? I want to hear from someone who knows anthropics well.

Indeed, your video game scenario is not even really qualitatively different from my own situation. Because if I were born with 1000 HP, you could still argue "data from within the 'simulation'...is not proof of something 'without'." And you could update your "scientific" understanding of the distribution of HP to account for the fact that precisely one character has 1000 HP.

The difference between my scenario and the video game one is merely quantitative: Pr(1000 HP | I'm not in a video game) < Pr(I'm a superlative | I'm not in a simulation), though both probabilities are very low.
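
The argument above is a Bayesian update. The inequality alone does not fix the conclusion, because the posterior also depends on the prior and on how likely being a superlative is if you are in a simulation. Below is a minimal Python sketch of Bayes' rule in odds form. Every number in it is a hypothetical placeholder, not an estimate from the comment.

```python
def posterior_odds(prior_odds: float, p_obs_if_h: float, p_obs_if_not_h: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_obs_if_h / p_obs_if_not_h)


def odds_to_prob(odds: float) -> float:
    """Convert odds in favour of H into a probability of H."""
    return odds / (1.0 + odds)


# Hypothetical placeholders. H = "I am in a simulation", obs = "I am a superlative".
prior_p = 1e-3              # assumed prior probability of H
p_obs_if_sim = 1e-2         # assumed Pr(superlative | simulation)
p_obs_if_not_sim = 1e-6     # assumed Pr(superlative | not a simulation)

odds = posterior_odds(prior_p / (1 - prior_p), p_obs_if_sim, p_obs_if_not_sim)
print(f"posterior Pr(simulation) = {odds_to_prob(odds):.3f}")  # 0.909
```

With these placeholders, a 0.1% prior becomes roughly 0.909. Shrinking the assumed likelihood ratio by a factor of 100 drops it to about 0.09, so the conclusion is only as strong as the assumed likelihoods.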

vladimir_nesov on (My) self-referential reason to believe in free will

An algorithm that computes 22+117 or something like that is free to compute it correctly, even as it's running on a physical computer that might be broken in a subtle way, possibly producing a different result. Identifying with an algorithm that your brain currently implements when making a decision doesn't seem different, you are just a more complicated algorithm, producing some result. What the physical world does with that result is a separate issue, but for purposes of this argument the algorithm is selected to be in tune with the world, it's an algorithm that the brain is currently simulating in detail.

deepthoughtlife on What Indicators Should We Watch to Disambiguate AGI Timelines?

Note: I wrote this comment as notes while reading, to work out what I thought of your arguments, rather than as a polished piece.

I think your calibration on the 'slow scenario' is off. What you call the slowest plausible scenario is fairly clearly the median one, given that it pretty much just follows current trends, and slower-than-present progress is clearly plausible. Things have already slowed way down, with advances in very narrow areas being the only real change. There is a reason OpenAI hasn't dared to name anything GPT-5, for instance. Even o3 isn't really an improvement on general LLM tasks, and that is the 'exciting' new thing, as you more or less say.

Advancement has been disappointingly slow in the AI I personally use (mostly image generation, where over the past year or so new, larger models often haven't been better overall, and the newer ones mostly use LLM-style architectures), and it is plausible that there will be barely any clear quality improvement in general uses over the next couple of years. Image generation should also be easier to improve than general LLMs, because it should be earlier on the curve of diminishing returns to scale (its scale is much smaller). Note that since most of these are diffusion models, they already use an image equivalent of the trick o1 and o3 introduced, in what I would argue is effectively chain of thought. For some reason, all the advancements I hear about these days seem like uninspired copies of things that already happened in image generation.

The one exception is 'agents', but those show no signs of present-day usefulness. Who knows how quickly such things will become useful, but historical trends for new tech, especially in AI, say 'not soon' for real use. Many people and companies are very interested in the idea for obvious reasons, but that doesn't mean it will be fast. See also self-driving cars, which have taken many times longer than expected despite seeming like a probable success story in the making (for the distant future). In fact, self-driving cars are the real-world equivalent of a narrow agent, and the enormous difficulty they have had is strong evidence against agents becoming transformatively useful soon.

I do think that AI as it currently is will have a transformative impact in the near term for certain activities (image generation for non-artists like me is already one of them), but I think the smartphone comparison is a good one; I still don't bother to use a smartphone (though it has many significant uses). I would be surprised if AI had as big an impact as the World Wide Web has had, on a year-for-year basis, counting from the start of the web (supposedly 1989) for the one and from 2017, when transformers were invented (or even 2018, when GPT-1 came out), for AI. I like the comparison to the web because I think AI going especially well would change our information capacities in a way similar to an Internet 3.0 (assuming you count the web as 2.0).

As for the fast scenario, it does seem like the fastest scenario that isn't completely ridiculous, but I think your estimate of its probability is dramatically too high. I do agree that if self-play (in the AlphaGo sense) can generate good data for poorly defined problems, that would alleviate the data shortage we face in large parts of the space, but it is unlikely to actually improve the quality of the data in the near term, and there are already many data quality issues. I personally do not believe that o1 and o3 have at all 'shown' that synthetic data is a solved problem, and it won't be for quite a while, if ever.

Note that image generation models have already been using synthetic data from teacher models for a while now, with 'SDXL Turbo' and later adversarial distillation schemes. These achieved a several-fold speedup, but at some cost in quality, as all such schemes do. Crucially, no one has managed to increase quality this way, because the teacher sets a maximum quality level you can't go beyond (except by pure luck).

Speculatively, you could perhaps improve quality by having a third model select the very best outputs of the teacher and training only on those until you have a student better than the teacher, then promoting that student to teacher and automatically starting to train a new student (or perhaps retraining the old teacher?). The problem is: how do you get a selection model that, through its own self-play-style learning, stays actually better than the models you are trying to improve, rather than just getting them to fit a static model of a good output? Human data creation cannot be replaced in general without massive advances in the field. You might, however, be able to restrict human data generation to training the selection model.
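
The teacher/selector/student loop speculated about above can be sketched as a toy simulation. This is a minimal Python sketch under strong, hypothetical assumptions: each output's quality is a single number drawn from a normal distribution around the teacher's mean, the student exactly matches the average true quality of the outputs it was trained on, and the selector's judgement is true quality plus Gaussian noise (`judge_noise`). The function and parameter names are illustrative only and do not come from any real distillation pipeline.

```python
import random
import statistics


def distill_generations(generations=5, samples=1000, keep=100,
                        teacher_mean=0.0, spread=1.0, judge_noise=0.0, seed=0):
    """Return the mean output quality of each successive teacher (toy model).

    Each generation: the teacher produces `samples` outputs, the selector keeps
    the `keep` outputs it judges best, and the student trained on them (assumed
    to reach their mean true quality) becomes the next teacher.
    """
    rng = random.Random(seed)
    means = [teacher_mean]
    for _ in range(generations):
        outputs = [rng.gauss(means[-1], spread) for _ in range(samples)]
        # The selector sees true quality plus noise; judge_noise=0.0 is a perfect judge.
        judged = sorted(outputs, key=lambda q: q + rng.gauss(0.0, judge_noise), reverse=True)
        means.append(statistics.fmean(judged[:keep]))
    return means


print(distill_generations())                 # perfect selector
print(distill_generations(judge_noise=3.0))  # noisy selector
```

Comparing the two runs illustrates the point made above: all of the gain comes from the selector, and a noisy selector yields much smaller gains per generation. The toy also ignores the quality loss that real distillation schemes incur.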

In some areas, you could perhaps train the AI directly on automatically generated data from sensors in the real world, but that seems like it would reduce the speed of progress to that of the real world unless you have that exponential increase in sensor data instead.

I do agree that in a fast scenario, it would clearly be algorithmic improvements rather than scale leading to it.

Also, o1 and o3 are only 'better' because of a willingness to use vastly more compute at inference time, and given that people already can't afford them, that route seems likely to be played out after not too many generations of scaling, especially since hardware is improving so slowly these days. Chain of thought should probably be largely replaced with something more like what image generation models currently use, where each step iterates on the current results. The two could be combined, of course.

Diffusion models build a latent picture of many different areas, and each area influences every other area at later steps, so in text generation you could analogously have a chain of thought that is used in its entirety to create a new chain of thought. For example, you could use a ten-step chain of thought to create another ten-step chain of thought, nine times over, instead of a hundred different options (with the first ten steps generated from just the input, of course). If you're crazy, it could literally be exponential: you generate one in the first step, two in the second, ... 16 in the fifth, and so on.
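
The iterative scheme described above, where each round rewrites the whole previous chain much as a diffusion step revises the whole image, can be sketched as a loop. This is a minimal Python sketch: `generate` stands in for a hypothetical model call, and `toy_generate` is a placeholder that only tags each step with the round that produced it. The doubling schedule covers the exponential variant: starting from one chain and doubling each step, step n produces 2^(n-1) chains.

```python
from typing import Callable

Chain = list[str]
# Hypothetical model interface: given the prompt and the previous chain
# (empty on the first round), return a new chain of reasoning steps.
Generator = Callable[[str, Chain], Chain]


def iterative_refine(prompt: str, generate: Generator,
                     depth: int = 10, rounds: int = 10) -> tuple[Chain, int]:
    """Rewrite a `depth`-step chain `rounds` times; return the final chain and total steps."""
    chain: Chain = []
    total_steps = 0
    for _ in range(rounds):
        # Each round conditions on the entire previous chain, not just its last step.
        chain = generate(prompt, chain)[:depth]
        total_steps += len(chain)
    return chain, total_steps


def doubling_schedule(steps: int) -> list[int]:
    """Chains generated at each step when the count doubles: 1, 2, 4, ..."""
    return [2 ** k for k in range(steps)]


def toy_generate(prompt: str, previous: Chain) -> Chain:
    # Placeholder for a real model: tags each step with the round that produced it.
    round_no = int(previous[0].split(":")[0]) + 1 if previous else 1
    return [f"{round_no}:step{i}" for i in range(10)]


chain, total = iterative_refine("question", toy_generate)
print(chain[0], total)        # 10:step0 100
print(doubling_schedule(6))   # [1, 2, 4, 8, 16, 32]
```

With ten rounds of ten steps, the total step budget is the same 100 steps either way; what changes is that every round sees the whole previous chain.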

"Identifying The Requirements for a Short Timeline"
I think you are missing an interesting way to tell whether AI is accelerating AI research. A lot of normal research is eventually integrated into the next generation of products. If AI really were accelerating the process, you would see integration happening much more quickly, with a shorter lag between 'new idea first published' and 'new idea integrated into a fully formed product that is actually good'. A human might take several months to test an idea, but if an AI could do the research, it could also replicate other research incredibly quickly and see how it works when combined with that research.

(I ran out of steam when my computer crashed during the above paragraph, though I don't seem to have lost any of what I wrote, since I write in Notepad.)
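
The lag indicator proposed above can be made concrete as a statistic to track over time. This is a minimal Python sketch. It assumes you have collected pairs of (first publication date, first date the idea shipped in a good product). The function names and the example dates are hypothetical placeholders, not real data. If AI were accelerating research, the median lag should shrink from one publication year to the next.

```python
from datetime import date
from statistics import median

# One record per idea: (first published, first integrated into a good product).
Event = tuple[date, date]


def integration_lags(events: list[Event]) -> list[int]:
    """Lag in days between publication and product integration for each idea."""
    return [(shipped - published).days for published, shipped in events]


def median_lag_by_year(events_by_year: dict[int, list[Event]]) -> dict[int, float]:
    """Median lag per publication year, in chronological order."""
    return {year: median(integration_lags(events))
            for year, events in sorted(events_by_year.items())}


# Hypothetical placeholder dates, for illustration only.
example = {
    2023: [(date(2023, 1, 10), date(2023, 9, 1)), (date(2023, 3, 5), date(2024, 1, 20))],
    2024: [(date(2024, 2, 1), date(2024, 5, 1))],
}
print(median_lag_by_year(example))  # {2023: 277.5, 2024: 90}
```

The median is used rather than the mean so that a few ideas that take years to ship do not dominate the trend.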

I would say the best way to tell you are in a shorter timeline is if it seems like gains from each advancement start broadening rather than narrowing. If each advancement applies narrowly, you need a truly absurd number of advancements, but if they are broad, far fewer.

Honestly, I see very little likelihood of what I consider AGI in the next couple of decades at least (if you want it to have surpassed humanity), and if we don't break out of the current paradigm, not for much, much longer than that, if ever. You do make some interesting points and seem reasonable, but I really can't agree that we are at all close to it. Also, your fast scenario seems more like 20 years than 4. Four years isn't the 'fast' scenario; it is the 'miracle' scenario. The 'slow' scenario reads like 'this might be the work of centuries, or maybe half of one if we are lucky'. We disagree so strongly about how long these scenarios would take because the point we are at now is far, far below where you seem to believe it is. We aren't even vaguely close.

As for your writing, I think it was fairly well structured and somewhat interesting, and I even agree that large parts of the 'fast' scenario as you laid it out make sense, but since you are wrong about the amount of time to associate with the scenarios, the overall analysis is very far off. I did find it worth my time to read.

roman-malov on Is "hidden complexity of wishes problem" solved?

I meant to imply that we do not have a robot capable of performing tasks of a similar level of difficulty to the 'saving grandma' task, with safety properties comparable to those that a human firefighter can provide when performing the 'saving grandma' task.

Thanks for pointing that out, I will adjust the post.

hunterjay on My AI Predictions 2023 - 2026

I agree, I definitely underestimated video. Before publishing, I had a friend review my predictions and they called out video as being too low, and I adjusted upward in response and still underestimated it.

I'd now agree with 2026 or 2027 for coherent feature film length video, though I'm not sure if it would be at feature film artistic quality (including plot). I also agree with Her-like products in the next year or two!

Personally, I would still expect cloud compute to be used for robotics, but only in ways where latency doesn't matter (like a planning and reasoning system on top of a smaller local model, doing deeper analysis like "There's a bag on the floor by the door. Ordinarily it should be put away, but given that it wasn't there 5 minutes ago, it might be actively used right now, so I should leave it..."). I'm not sure the privacy concerns will trump convenience, like with phones.

I also now think virtual agents will start to become a big thing in 2025 and 2026, doing some kinds of remote work, or sizable chunks of existing jobs autonomously (while still not being able to automate most jobs end to end)!

sohaib-imran on quila's Shortform

I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?

Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace of mind designs that do not destroy the world? If so, why? Is it a bigger target? Is it more stable?

Can't you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?

No, because the you who can ask (the people in power) are themselves misaligned with the 1 alignment target that perfectly captures all our preferences.

raemon on Dmitry Vaintrob's Shortform

FYI, I think by the time I wrote Optimistic Assumptions, Longterm Planning, and "Cope", I had updated on the things you criticize about it here (but I had started writing it a while ago from a different frame, and there is something disjointed about it).

But, like, I did mean both halves of this seriously:

I think you should be scared about this, if you're the sort of theoretical researcher who's trying to cut at the hardest parts of the alignment problem (whose feedback loops are weak or nonexistent)
I think you should be scared about this, if you're the sort of Prosaic ML researcher who does have a bunch of tempting feedback loops for current generation ML, but a) it's really not clear whether or how those apply to aligning superintelligent agents, b) many of those feedback loops also basically translate into enhancing AI capabilities and moving us toward a more dangerous world.

...

Re:

For the last few weeks, I’ve been working on trying to find plans for AI safety. They should cover the whole problem, including the major hurdles after intent alignment.
I strongly disagree with this being a good thing to do! We're not going to have a good, end-to-end plan about how to save the world from AGI.

I think in some sense I agree with you – the actual real plans won't be end-to-end. And I think I agree with you about some kind of neuroticism that unhelpfully bleeds through a lot of rationalist work. (Maybe in particular: actual real solutions to things tend to be a lot messier than the beautiful code/math/coordination-frameworks an autistic idealist dreams up)

But, there's still something like "plans are worthless, but planning is essential." I think you should aim for the standard of "you have a clear story for how your plan fits into something that solves the hard parts of the problem." (or, we need way more people doing that sort of thing, since most people aren't really doing it at all)

Some ways that I think about End to End Planning (and, metastrategy more generally)

Because there are multiple failure modes, I treat myself as having multiple constraints I have to satisfy:

My plans should backchain from solving the key problems I think we ultimately need to solve
My plans should forward chain through tractable near goals with at least okay-ish feedback loops. (If the okay-ish feedback loops don't exist yet, try to be inventing them. Although don't follow that off a cliff either – I was intrigued by Wentworth's recent note that overly focusing on feedback loops led him to predictably waste some time)
Ship something to external people, fairly regularly
Be Wholesome (that is to say, when I look at the whole of what I'm doing it feels healthy, not like I've accidentally min-maxed my way into some brittle extreme corner of optimization space)

And for end to end planning, have a plan for...

up through the end of my current OODA loop
(maybe up through a second OODA loop if I have a strong guess for how the first OODA loop goes)
as concrete a plan as I can, assuming no major updates from the first OODA loop, up through the end of the agenda.
as concrete a visualization of the followup steps after my plan ends, for how it goes on to positively impact the world.

End-to-end plans don't mean you don't need to find better feedback loops or pivot. You should plan that into the plan (and also expect to be surprised by it anyway). But I think if you don't concretely visualize how it fits together, you're likely to go down some predictably wasteful paths.

benquo on Oppression and production are competing explanations for wealth inequality.

You don't think an exceptional magnitude of recognition for doing useful things is evidence for exceptional capacity and willingness to make that capacity useful to others? Why not?

winstonbosan on Meal Replacements in 2025?

p=1, Soylent still seems to be the top choice. (They have been running into some supply chain problems recently.)

(Huel seemed fine too, from personal experience. If you care about refined oil/canola oil and protein sources, it could be a decent alternative.)

daniel-kokotajlo on What Indicators Should We Watch to Disambiguate AGI Timelines?

That makes sense -- I should have mentioned, I like your post overall & agree with the thesis that we should be thinking about what short vs. long timelines worlds will look like and then thinking about what the early indicators will be, instead of simply looking at benchmark scores. & I like your slow vs. fast scenarios, I guess I just think the fast one is more likely. :)