LessWrong 2.0 Reader

Circuits in Superposition: Compressing many small neural networks into one
Lucius Bushnaq (Lblack) · 2024-10-14T13:06:14.596Z · comments (8)

[link] OpenAI's CBRN tests seem unclear
LucaRighetti (Error404Dinosaur) · 2024-11-21T17:28:30.290Z · comments (6)

"The Solomonoff Prior is Malign" is a special case of a simpler argument
David Matolcsi (matolcsid) · 2024-11-17T21:32:34.711Z · comments (44)

BIG-Bench Canary Contamination in GPT-4
Jozdien · 2024-10-22T15:40:48.166Z · comments (13)

Why Don't We Just... Shoggoth+Face+Paraphraser?
Daniel Kokotajlo (daniel-kokotajlo) · 2024-11-19T20:53:52.084Z · comments (51)

[link] The Dangers of Mirrored Life
Niko_McCarty (niko-2) · 2024-12-12T20:58:32.750Z · comments (7)

A bird's eye view of ARC's research
Jacob_Hilton · 2024-10-23T15:50:06.123Z · comments (12)

Passages I Highlighted in The Letters of J.R.R.Tolkien
Ivan Vendrov (ivan-vendrov) · 2024-11-25T01:47:59.071Z · comments (10)

[link] Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded
garrison · 2024-10-23T23:40:57.180Z · comments (1)

The Dream Machine
sarahconstantin · 2024-12-05T00:00:05.796Z · comments (6)

What o3 Becomes by 2028
Vladimir_Nesov · 2024-12-22T12:37:20.929Z · comments (10)

The o1 System Card Is Not About o1
Zvi · 2024-12-13T20:30:08.048Z · comments (5)

Scissors Statements for President?
AnnaSalamon · 2024-11-06T10:38:21.230Z · comments (31)

Should CA, TX, OK, and LA merge into a giant swing state, just for elections?
Thomas Kwa (thomas-kwa) · 2024-11-06T23:01:48.992Z · comments (35)

You should consider applying to PhDs (soon!)
bilalchughtai (beelal) · 2024-11-29T20:33:12.462Z · comments (19)

DeepSeek beats o1-preview on math, ties on coding; will release weights
Zach Stein-Perlman · 2024-11-20T23:50:26.597Z · comments (23)

Sorry for the downtime, looks like we got DDosd
habryka (habryka4) · 2024-12-02T04:14:30.209Z · comments (13)

The Big Nonprofits Post
Zvi · 2024-11-29T16:10:06.938Z · comments (10)

[link] Announcing turntrout.com, my new digital home
TurnTrout · 2024-11-17T17:42:08.164Z · comments (24)

Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke (Paulawurm) · 2024-12-17T23:58:19.222Z · comments (1)

Hierarchical Agency: A Missing Piece in AI Alignment
Jan_Kulveit · 2024-11-27T05:49:04.241Z · comments (20)

AIs Will Increasingly Attempt Shenanigans
Zvi · 2024-12-16T15:20:05.652Z · comments (2)

A shortcoming of concrete demonstrations as AGI risk advocacy
Steven Byrnes (steve2152) · 2024-12-11T16:48:41.602Z · comments (27)

I turned decision theory problems into memes about trolleys
Tapatakt · 2024-10-30T20:13:29.589Z · comments (20)

LLMs can learn about themselves by introspection
Felix J Binder (fjb) · 2024-10-18T16:12:51.231Z · comments (38)

Why comparative advantage does not help horses
Sherrinford · 2024-09-30T22:27:57.450Z · comments (15)

Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming
Buck · 2024-10-10T13:36:53.810Z · comments (4)

[link] Advice for journalists
Nathan Young · 2024-10-07T16:46:40.929Z · comments (53)

[link] How to replicate and extend our alignment faking demo
Fabien Roger (Fabien) · 2024-12-19T21:44:13.059Z · comments (0)

MIRI’s 2024 End-of-Year Update
Rob Bensinger (RobbBB) · 2024-12-03T04:33:47.499Z · comments (2)

Bigger Livers?
sarahconstantin · 2024-11-08T21:50:09.814Z · comments (13)

[link] Seven lessons I didn't learn from election day
Eric Neyman (UnexpectedValues) · 2024-11-14T18:39:07.053Z · comments (33)

You can, in fact, bamboozle an unaligned AI into sparing your life
David Matolcsi (matolcsid) · 2024-09-29T16:59:43.942Z · comments (171)

The "Think It Faster" Exercise
Raemon · 2024-12-11T19:14:10.427Z · comments (13)

The case for unlearning that removes information from LLM weights
Fabien Roger (Fabien) · 2024-10-14T14:08:04.775Z · comments (15)

[link] Anthropic: Three Sketches of ASL-4 Safety Case Components
Zach Stein-Perlman · 2024-11-06T16:00:06.940Z · comments (33)

[link] Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi (mtrazzi) · 2024-10-28T20:17:47.465Z · comments (5)

Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith (joekc) · 2024-12-18T18:22:34.059Z · comments (8)

[link] Sabotage Evaluations for Frontier Models
David Duvenaud (david-duvenaud) · 2024-10-18T22:33:14.320Z · comments (55)

2024 Petrov Day Retrospective
Ben Pace (Benito) · 2024-09-28T21:30:14.952Z · comments (25)

Hire (or become) a Thinking Assistant / Body Double
Raemon · 2024-12-23T03:58:42.061Z · comments (23)

Science advances one funeral at a time
Cameron Berg (cameron-berg) · 2024-11-01T23:06:19.381Z · comments (9)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (11)

LLMs Look Increasingly Like General Reasoners
eggsyntax · 2024-11-08T23:47:28.886Z · comments (45)

Zvi’s Thoughts on His 2nd Round of SFF
Zvi · 2024-11-20T13:40:08.092Z · comments (2)

2024 Unofficial LessWrong Census/Survey
Screwtape · 2024-12-02T05:30:53.019Z · comments (42)

A very strange probability paradox
notfnofn · 2024-11-22T14:01:36.587Z · comments (26)

The nihilism of NeurIPS
charlieoneill (kingchucky211) · 2024-12-20T23:58:11.858Z · comments (4)

Three Notions of "Power"
johnswentworth · 2024-10-30T06:10:08.326Z · comments (44)

Anvil Problems
Screwtape · 2024-11-13T22:57:41.974Z · comments (13)


Recent comments

habryka4 on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

We have met the first of our three fundraising goals! Thank you all so much! Seeing all the outpouring of support from so many different people has been very heartening.

braydenm on AI Control: Improving Safety Despite Intentional Subversion

This was a widely impactful piece of work, beyond the bounds of the LessWrong community.

tetraspace-grouping on shortplav

Dominance/submission dynamics in relationships

In Act I outputs Claudes do a lot of this, e.g. this screenshot of Sonnet 3.6

hide on Hire (or become) a Thinking Assistant / Body Double

It’s true any job can find unqualified applicants. What I’m saying is that this in particular relies on an untenably small niche of feasible candidates that will take an enormous amount of time to find/filter through on average.

Sure, you might get lucky immediately, but without a reliable way to find the “independently wealthy guy who’s an intellectual and is sufficiently curious about you specifically that he wants to sit silently and watch you for 8 hours a day for a nominal fee”, your recruitment time will, on average, be very long, especially in comparison to what would likely be a very short average tenure given the many countervailing opportunities that would be presented to such a candidate.

Yes, it’s possible in principle to articulate the perfect candidate, but my point is more about real-world feasibility.

tsvibt on What are the strongest arguments for very short timelines?

The burden is on you because you're saying "we have gone from not having the core algorithms for intelligence in our computers, to yes having them".

https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce#The__no_blockers__intuition

And I think you're admitting that your argument is "if we mush all capabilities together into one dimension, AI is moving up on that one dimension, so things will keep going up".

Would you say the same thing about the invention of search engines? That was a huge jump in the capability of our computers. And it looks even more impressive if you blur out your vision--pretend you don't know that the text that comes up on your screen is written by a human, and pretend you don't know that search is a specific kind of task distinct from a lot of other activity that would be involved in "True Understanding, woooo"--and just say "wow! previously our computers couldn't write a poem, but now with just a few keystrokes my computer can literally produce Billy Collins level poetry!".

Blurring things together at that level works for, like, macroeconomic trends. But if you look at macroeconomic trends it doesn't say singularity in 2 years! Going to 2 or 10 years is an inside-view thing to conclude! You're making some inference like "there's an engine that is very likely operating here, that takes us to AGI in xyz years".

nim on Hire (or become) a Thinking Assistant / Body Double

Oops! I only realized in your reply that you're considering "reliability" the load-bearing element. Yes, the hiring pipeline will look like a background noise of consistent interest from the unqualified, and sporadic hits from excellent candidates. You're approaching it from the perspective that the background noise of incompetents is the more important part, whereas I think that the availability of an adequate candidate eventually is the important part.

I think this because basically anywhere that hires can reliably find unqualified applicants. For a role where people stay in the job for 6 months, for instance, you only need to find a suitable replacement once every 6 months... so "reliably" being able to find an excellent candidate every day seems simply irrelevant.

ulrik-horn on Last Line of Defense: Minimum Viable Shelters for Mirror Bacteria

Just a note that I intend to answer this comment, but it might be a couple of days.

cakubilo on People aren't properly calibrated on FrontierMath

For example, if the statement to be proved is say independent of ZFC, then no computer that can be computed from a Turing Machine (which includes all LLMs) can resolve the conjecture, and due to independent statements, you can make conjectures that are arbitrarily hard to solve, and even the non-independent conjectures may in practice be unsolvable by any human or AI for a long time, which means the benchmark is less useful for real AIs.

I don't believe this is true, actually! What do you mean by "resolve the conjecture"? If you mean write up a proof of it, then of course you can write a Turing machine that will write a proof of the conjecture; it's just infinite monkeys. ZFC is best thought of as the "minimal set of axioms to do most math". It's not anything particularly special. You can have various foundations such as ETCS, NF, type theory, etc. If we have a model that can genuinely reason mathematically, then the set of axioms the model uses should be immaterial to its mathematical ability. In fact, it should certainly be able to handle more or fewer axioms, like replacing full choice with countable choice. Maybe I misunderstood your point here.
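To make the "infinite monkeys" point concrete, here is a minimal sketch of brute-force proof search by enumeration. It is an illustration only: `all_strings`, `brute_force_search`, and the toy checker are hypothetical names, and the checker is a stand-in for a real proof verifier, which is assumed to exist and to be decidable. The search halts only if some valid proof exists in the chosen system, so the sketch includes an optional cap on the number of candidates.

```python
from itertools import count, product
from typing import Callable, Iterator, Optional


def all_strings(alphabet: str) -> Iterator[str]:
    """Yield every finite string over `alphabet`, shortest first."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def brute_force_search(
    is_valid_proof: Callable[[str], bool],
    alphabet: str,
    max_candidates: Optional[int] = None,
) -> Optional[str]:
    """Return the first enumerated string accepted by the checker.

    Without `max_candidates`, this loops forever when no string is accepted,
    which is what happens if the statement has no proof in the system.
    """
    for i, candidate in enumerate(all_strings(alphabet)):
        if max_candidates is not None and i >= max_candidates:
            return None
        if is_valid_proof(candidate):
            return candidate
    return None


# Toy stand-in for a proof checker (assumption): accepts exactly one string.
def toy_checker(s: str) -> bool:
    return s == "abba"


found = brute_force_search(toy_checker, "ab")
gave_up = brute_force_search(toy_checker, "ab", max_candidates=5)
```

The enumeration is exhaustive but says nothing about how long the search takes, which is why it does not bear on whether a model can reason mathematically.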

While I am not experienced at all in formalizing math, and thus am willing to update and be corrected by any expert on mathematics, especially those who formalize mathematics in proof assistants, I'd expect two language-independent reasons why formalizing mathematics in proof assistants is difficult:

But my point was that there are things that should be extremely easy, like proving lemmas about elementary row transformations, that have not been done in Lean yet. That is not due to a lack of people formalizing, but due to fundamental limitations with the proof assistant. The point that I'm failing to make explicit is that this seems like a copout. The ultimate naturalistic benchmark for an LLM's math ability is being able to formalize the undergraduate math curriculum! But it starts with having a proof assistant that is amenable to the formalization project, which seems to be the bottleneck today.

cole-wyeth on Cole Wyeth's Shortform

Most ordinary people don't know that no one understands how neural networks work (or even that modern "Generative A.I." is based on neural networks). This might be an underrated message since the inferential distance here is surprisingly high.

It's hard to explain the more sophisticated models that we often use to argue that human disempowerment is the default outcome, but it is perhaps much higher leverage to explain these three points:

1) No one knows how A.I. models / LLMs / neural nets work (with some explanation of how this is conceptually possible).

2) We don't know how smart they will get how soon.

3) We can't control what they'll do once they're smarter than us.

At least under my state of knowledge, this is also a particularly honest messaging strategy, because it emphasizes the fundamental ignorance of A.I. researchers.

cole-wyeth on Cole Wyeth's Shortform

A "Christmas edition" of the new book on AIXI is freely available in pdf form at http://www.hutter1.net/publ/uaibook2.pdf