LessWrong 2.0 Reader

Toward Safety Cases For AI Scheming
Mikita Balesni (mykyta-baliesnyi) · 2024-10-31T17:20:06.019Z · comments (1)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (24)

Why our politicians aren't Median
Yair Halberstadt (yair-halberstadt) · 2024-11-03T14:03:33.779Z · comments (15)

[link] Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Nisan · 2024-10-25T04:37:00.828Z · comments (2)

Intricacies of Feature Geometry in Large Language Models
7vik (satvik-golechha) · 2024-12-07T18:10:51.375Z · comments (0)

Checking in on Scott's composition image bet with imagen 3
Dave Orr (dave-orr) · 2024-12-22T19:04:17.495Z · comments (0)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

[question] Could orcas be (trained to be) smarter than humans? 
Towards_Keeperhood (Simon Skade) · 2024-11-04T23:29:26.677Z · answers+comments (20)

Seeking Collaborators
abramdemski · 2024-11-01T17:13:36.162Z · comments (15)

An Illustrated Summary of "Robust Agents Learn Causal World Model"
Dalcy (Darcy) · 2024-12-14T15:02:44.828Z · comments (2)

[link] The Alignment Trap: AI Safety as Path to Power
crispweed · 2024-10-29T15:21:26.545Z · comments (17)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

Measuring whether AIs can statelessly strategize to subvert security measures
Alex Mallen (alex-mallen) · 2024-12-19T21:25:28.555Z · comments (0)

Reading RFK Jr so that you don’t have to
braces · 2024-11-22T00:59:19.583Z · comments (1)

U.S.-China Economic and Security Review Commission pushes Manhattan Project-style AI initiative
Phib · 2024-11-19T18:42:43.296Z · comments (7)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (2)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

Neuroscience of human social instincts: a sketch
Steven Byrnes (steve2152) · 2024-11-22T16:16:52.552Z · comments (0)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Win/continue/lose scenarios and execute/replace/audit protocols
Buck · 2024-11-15T15:47:24.868Z · comments (2)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (13)

Luck Based Medicine: No Good Very Bad Winter Cured My Hypothyroidism
Elizabeth (pktechgirl) · 2024-12-08T20:10:02.651Z · comments (3)

o1 Turns Pro
Zvi · 2024-12-10T17:00:08.036Z · comments (3)

Estimates of GPU or equivalent resources of large AI players for 2024/5
CharlesD · 2024-11-28T23:01:58.522Z · comments (7)

A Conflicted Linkspost
Screwtape · 2024-11-21T00:37:54.035Z · comments (0)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (10)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

I Finally Worked Through Bayes' Theorem (Personal Achievement)
keltan · 2024-12-05T02:04:16.547Z · comments (6)

[link] Just one more exposure bro
Chipmonk · 2024-12-12T21:37:07.069Z · comments (6)

Correct my H5N1 research ($reward)
Elizabeth (pktechgirl) · 2024-12-09T19:07:03.277Z · comments (23)

[link] A toy evaluation of inference code tampering
Fabien Roger (Fabien) · 2024-12-09T17:43:40.910Z · comments (0)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (1)

[link] Ideas for benchmarking LLM creativity
gwern · 2024-12-16T05:18:55.631Z · comments (10)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (2)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

[link] Review: Breaking Free with Dr. Stone
TurnTrout · 2024-12-18T01:26:37.730Z · comments (4)

[link] cancer rates after gene therapy
bhauth · 2024-10-16T15:32:53.949Z · comments (0)

AI #94: Not Now, Google
Zvi · 2024-12-12T15:40:06.336Z · comments (3)

[link] Active Recall and Spaced Repetition are Different Things
Saul Munn (saul-munn) · 2024-11-08T20:14:56.092Z · comments (2)

Dave Kasten's AGI-by-2027 vignette
davekasten · 2024-11-26T23:20:47.212Z · comments (8)

An alternative approach to superbabies
Towards_Keeperhood (Simon Skade) · 2024-11-05T22:56:15.740Z · comments (19)

A Solution for AGI/ASI Safety
Weibing Wang (weibing-wang) · 2024-12-18T19:44:29.739Z · comments (21)

Looking back on the Future of Humanity Institute - Asterisk
jakeeaton · 2024-11-19T00:44:40.928Z · comments (0)

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams · 2024-11-07T15:39:06.854Z · comments (6)

[link] What Ketamine Therapy Is Like
Sable · 2024-11-11T11:09:08.602Z · comments (8)

[link] Epistemic status: poetry (and other poems)
Richard_Ngo (ricraz) · 2024-11-21T18:13:17.194Z · comments (5)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (4)

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer · 2024-10-29T01:21:03.075Z · comments (12)


Recent comments

dr_s on What Goes Without Saying

On this issue specifically, I feel like the bar for what counts as an actually sane and non-dysfunctional organization to the average user of this website is probably way too lofty for 95% of workplaces out there (to be generous!), so it's not even that strange that this would be the case.

error on What Have Been Your Most Valuable Casual Conversations At Conferences?

I suspect this varies by event, and also what you think of as "value". At LessOnline I got a large fraction of the value out of side conversations, but that value mostly wasn't in the form of practical benefits; the kinds of conversations on offer were simply extremely scarce in the rest of my personal life.

OTOH, at Dragoncon I get most of the value from structured events and the general sense of being-among-one's-tribe. It's crowded and anonymous, making private conversations difficult, and I know plenty of other fans in my everyday life, so there's not that sense of "suddenly having a badly-needed outlet". Two decades ago, when fandom conventions were smaller and local geeks were (for me) rare-to-nonexistent, that was less true.

anthony-digiovanni on What are the strongest arguments for very short timelines?

(I might misunderstand you. My impression was that you're saying it's valid to extrapolate from "model XYZ does well at RE-Bench" to "model XYZ does well at developing new paradigms and concepts." But maybe you're saying that the trend of LLM success at various things suggests we don't need new paradigms and concepts to get AGI in the first place? My reply below assumes the former:)

I'm not saying LLMs can't develop new paradigms and concepts, though. The original claim you were responding to was that success at RE-Bench in particular doesn't tell us much about success at developing new paradigms and concepts. "LLMs have done various things some people didn't expect them to be able to do" doesn't strike me as much of an argument against that.

More broadly, re: your burden of proof claim, I don't buy that "LLMs have done various things some people didn't expect them to be able to do" determinately pins down an extrapolation to "the current paradigm(s) will suffice for AGI, within 2-3 years." That's not a privileged reference class forecast, it's a fairly specific prediction.

sharmake-farah on Noosphere89's Shortform

Counterfactual worlds are just other real/simulated worlds where we don't have the level of compute and specification/details to simulate that world, so we have to discuss counterfactuals more abstractly than full simulation.

d0themath on What Have Been Your Most Valuable Casual Conversations At Conferences?

If conversations are heavy tailed then we should in fact expect people to have singular & likely memorable high-value conversations.

sharmake-farah on People aren't properly calibrated on FrontierMath

I don't believe this is true, actually! What do you mean by "resolve the conjecture"? If you mean write up a proof of it, then of course you can write a Turing machine that will write a proof of the conjecture; it's just infinite monkeys. ZFC is best thought of as the "minimal set of axioms to do most math". It's not anything particularly special. You can have various foundations such as ETCS, NF, type theory, etc. If we have a model that can genuinely reason mathematically, then the set of axioms the model uses should be immaterial to its mathematical ability. In fact, it should certainly be able to handle more or fewer axioms, like replacing full choice with countable choice etc. Maybe I misunderstood your point here.

My point was to gesture at the fact that open conjectures can be so hard that neither humans nor AI can solve them in reasonable time, and that it's too hard to determine whether a conjecture is even provable by an AI or human in reasonable time, so a "prove open conjectures" benchmark is not a great benchmark. I was using independence from ZFC as an example of such conjectures.

Also, a Turing machine in that case (where a statement is independent but not recursively independent) will only produce a proof of independence from ZFC, not a proof or disproof of the conjecture.

More here:

https://x.com/ElliotGlazer/status/1871121633018331499

But my point was that there are things that should be extremely easy, like proving lemmas about elementary row transformations, that have not been done in Lean yet. That is not due to a lack of people formalizing, but due to fundamental limitations with the proof assistant. The point that I'm failing to make explicit is that this seems like a copout. The ultimate naturalistic benchmark for an LLM's math ability is being able to formalize the undergraduate math curriculum! But it starts with having a proof assistant that is amenable to the formalization project, which seems to be the bottleneck today.

I definitely would think that formalizing an entire undergraduate math curriculum would be a great benchmark, but it would have to be designed very carefully to ensure the LLM doesn't break the benchmark's usefulness.

Also, when you state that it's extremely easy to formalize certain lemmas/theorems, are you saying that it's easy to formalize them flat out, including all necessary details, or are you saying that it's easy to give a paper that would be accepted by mathematicians, and if necessary (though the journey may be painful) you could formalize the argument, including all necessary details?

Bonus question: What do you think are the current limitations of proof assistants such as Lean, more specifically?

xpostah on Why is neuron count of human brain relevant to AI timelines?

Thanks for the links. Might go through when I find time.

Even if the papers prove that there are similarities, I don't see how this proves anything about evolution versus within-lifetime learning.

But there's decent evidence that there's not much more initialization than that, and that that huge fraction of the brain has to slowly pick up knowledge within the human lifetime before it starts being useful, e.g. https://pmc.ncbi.nlm.nih.gov/articles/PMC9957955/

This seems like your strongest argument. I will have to study more to understand this.

our DNA has on the order of a megabyte to spend on the brain

That's it? Really? That is new information for me.

Tbh your arguments might end up being persuasive to me. So thank you for writing them.

The problem is that building a background in neuroscience to the point where I'm confident I'm not being fooled will take time. And I'm interested in neuroscience, but not that interested in studying it just for AI safety reasons. If you have a post that covers this argument well (about initialisation not storing a lot of information), that would be nice. (But not necessary of course, that's up to you.)

nadav-brandes on AGI with RL is Bad News for Safety

I totally agree. I think it's another framing for why open-ended RL is much more dangerous than pure LLMs. Models trained with open-ended RL are rewarded based on their final results, and will produce any series of tokens that helps with that. They are incentivized to produce CoTs that do not logically represent their true thinking.

Pure LLMs, on the other hand, have no desire other than making the next token as likely as possible, given whatever character they are currently simulating. Whether they are "good" or "bad", I can't see anything in their training that would incentivize them to develop an efficient secret language to communicate with their future selves in a way that produces misleading CoTs.

nadav-brandes on AGI with RL is Bad News for Safety

Before I attempt to respond to your objections, I want to first make sure that I understand your reasoning.

I think you're saying that in theory it would be better to have CoT systems based on pure LLMs, but you don't expect these to be powerful enough without open-ended RL, so this approach won't be incentivized and will die out in competition with AI labs that do use open-ended RL. Is that a faithful summary of (part of) your view?

You are also saying that if done right, open-ended RL discourages models from learning to reason strongly in the forward pass. Can you explain what you mean exactly and why you think that?

I think you are also saying that models trained with open-ended RL are easier to align than pure LLMs. Is it because you expect them to be overall more capable (and therefore easier to do anything with, including alignment), or for another reason?

In case it helps to clarify our crux, I'd like to add that I agree with you that AI systems without open-ended RL would likely be much weaker than those with it, so I'm definitely expecting incentives to push more and more AI labs to use this technique. I just wish we could somehow push against these incentives. Pure LLMs producing weaker AI systems is in my opinion a feature, not a bug. I think our society would benefit from slower progress in frontier AGI.

florian-habermacher on Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility

Would you personally answer "Should we be concerned about eating too much soy?" with "Nope, definitely not", or do you just find it's a reasonable gamble to take?

Btw, thanks a lot for the post; MANY parallels with my past as a more-serious-but-careless vegan, until my body showed clear signs of issues that I recognized only late, as I'd never have believed anyone who told me that a healthy vegan diet is that tricky.