LessWrong 2.0 Reader

AI companies aren't really using external evaluators
Zach Stein-Perlman · 2024-05-24T16:01:21.184Z · comments (15)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (51)

Refusal in LLMs is mediated by a single direction
Andy Arditi (andy-arditi) · 2024-04-27T11:13:06.235Z · comments (93)

[link] Explore More: A Bag of Tricks to Keep Your Life on the Rails
Shoshannah Tekofsky (DarkSym) · 2024-09-28T21:38:52.256Z · comments (13)

The Online Sports Gambling Experiment Has Failed
Zvi · 2024-11-11T14:30:04.371Z · comments (25)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (30)

SAE feature geometry is outside the superposition hypothesis
jake_mendel · 2024-06-24T16:07:14.604Z · comments (17)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (31)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work
Rohin Shah (rohinmshah) · 2024-08-20T16:22:45.888Z · comments (33)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (88)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

The ‘strong’ feature hypothesis could be wrong
lewis smith (lsgos) · 2024-08-02T14:33:58.898Z · comments (17)

LLM Generality is a Timeline Crux
eggsyntax · 2024-06-24T12:52:07.704Z · comments (119)

AI Control: Improving Safety Despite Intentional Subversion
Buck · 2023-12-13T15:51:35.982Z · comments (14)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

Superbabies: Putting The Pieces Together
sarahconstantin · 2024-07-11T20:40:05.036Z · comments (37)

"Slow" takeoff is a terrible term for "maybe even faster takeoff, actually"
Raemon · 2024-09-28T23:38:25.512Z · comments (69)

[link] "How could I have thought that faster?"
mesaoptimizer · 2024-03-11T10:56:17.884Z · comments (32)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (27)

Towards more cooperative AI safety strategies
Richard_Ngo (ricraz) · 2024-07-16T04:36:29.191Z · comments (132)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense
So8res · 2023-11-24T17:37:43.020Z · comments (83)

Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack (andrew-mack) · 2024-04-30T18:51:13.493Z · comments (40)

OpenAI: Fallout
Zvi · 2024-05-28T13:20:04.325Z · comments (25)

The Sun is big, but superintelligences will not spare Earth a little sunlight
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2024-09-23T03:39:16.243Z · comments (141)

[link] What TMS is like
Sable · 2024-10-31T00:44:22.612Z · comments (22)

Toward A Mathematical Framework for Computation in Superposition
Dmitry Vaintrob (dmitry-vaintrob) · 2024-01-18T21:06:57.040Z · comments (18)

Funny Anecdote of Eliezer From His Sister
Noah Birnbaum (daniel-birnbaum) · 2024-04-22T22:05:31.886Z · comments (6)

Pay Risk Evaluators in Cash, Not Equity
Adam Scholl (adam_scholl) · 2024-09-07T02:37:59.659Z · comments (19)

[link] Jaan Tallinn's 2023 Philanthropy Overview
jaan · 2024-05-20T12:11:39.416Z · comments (5)

Maybe Anthropic's Long-Term Benefit Trust is powerless
Zach Stein-Perlman · 2024-05-27T13:00:47.991Z · comments (21)

[link] Sam Altman’s Chip Ambitions Undercut OpenAI’s Safety Strategy
garrison · 2024-02-10T19:52:55.191Z · comments (52)

The Hopium Wars: the AGI Entente Delusion
Max Tegmark (MaxTegmark) · 2024-10-13T17:00:29.033Z · comments (55)

This might be the last AI Safety Camp
Remmelt (remmelt-ellen) · 2024-01-24T09:33:29.438Z · comments (34)

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes (steve2152) · 2023-12-01T17:30:52.720Z · comments (58)

The impossible problem of due process
mingyuan · 2024-01-16T05:18:33.415Z · comments (64)

[question] Examples of Highly Counterfactual Discoveries?
johnswentworth · 2024-04-23T22:19:19.399Z · answers+comments (100)

Response to Aschenbrenner's "Situational Awareness"
Rob Bensinger (RobbBB) · 2024-06-06T22:57:11.737Z · comments (27)

Optimistic Assumptions, Longterm Planning, and "Cope"
Raemon · 2024-07-17T22:14:24.090Z · comments (46)

Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu (Marc-Everin Carauleanu) · 2024-07-30T16:22:29.561Z · comments (43)

Making a conservative case for alignment
Cameron Berg (cameron-berg) · 2024-11-15T18:55:40.864Z · comments (39)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage
orthonormal · 2024-08-06T02:32:41.364Z · comments (26)

What's Going on With OpenAI's Messaging?
ozziegooen · 2024-05-21T02:22:04.171Z · comments (13)

My AI Model Delta Compared To Christiano
johnswentworth · 2024-06-12T18:19:44.768Z · comments (73)

[link] The Compendium, A full argument about extinction risk from AGI
adamShimi · 2024-10-31T12:01:51.714Z · comments (49)

Two easy things that maybe Just Work to improve AI discourse
jacobjacob · 2024-06-08T15:51:18.078Z · comments (35)

My Interview With Cade Metz on His Reporting About Slate Star Codex
Zack_M_Davis · 2024-03-26T17:18:05.114Z · comments (187)

On Not Pulling The Ladder Up Behind You
Screwtape · 2024-04-26T21:58:29.455Z · comments (21)

OMMC Announces RIP
Adam Scholl (adam_scholl) · 2024-04-01T23:20:00.433Z · comments (5)

A basic systems architecture for AI agents that do autonomous research
Buck · 2024-09-23T13:58:27.185Z · comments (15)

Recent comments

kvmanthinking on Have we seen any "ReLU instead of sigmoid-type improvements" recently

How were these discovered? Slow, deliberate thinking, or someone trying some random thing to see what it does and suddenly the AI is a zillion times smarter?

signer on Ethical Implications of the Quantum Multiverse

Things like lions, and chairs are other examples.

And counted branches.

This is how Wallace defines it (he in turn defines macroscopically indistinguishable in terms of providing the same rewards). It’s his term in the axiomatic system he uses to get decision theory to work. There’s not much to argue about here?

His definition leads to contradiction with informal intuition that motivates consideration of macroscopical indistinguishability in the first place.

We should care about low-measure instances in proportion to the measure, just as in classical decision theory we care about low-probability instances in proportion to the probability.

Why? Wallace's argument is just "you don't care about some irrelevant microscopic differences, so let me write this assumption that is superficially related to that preference, and here - it implies the Born rule". Given MWI, there is nothing wrong physically or rationally in valuing your instances equally whatever their measure is. Their thoughts and experiences don't depend on measure the same way they don't depend on thickness or mass of a computer implementing them. You can rationally not care about irrelevant microscopic differences and still care about number of your thin instances.

christiankl on Are the majority of your ancestors farmers or non-farmers?

If you disagree with the question, why answer deep in the comments of one answer rather than at the top level?

abstractapplic on Lessons I've Learned from Self-Teaching

compute

I don't remember the equations for integration by parts and haven't used them in years. However, when I saw this, I immediately started scribbling on the whiteboard by my bed, thinking:

"Okay, so start with (x^2)log(x). Differentiating that gives two times the target, but also gives us a spare x we'd need to get rid of. So the answer is (0.5)(x^2)log(x) - (x^2)/4."

So I actually think you're right in general but wrong on this specific example: getting a deep sense for what you're doing when you're doing integration-by-parts would be a more robust help than rote memorization.

(Though rote memorization and regular practice absolutely have their place; if I'd done more of those I'd have remembered to stick a "+c" on the end.)
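A quick numeric sanity check of the antiderivative scribbled above may help. This is only a sketch: it assumes the target integrand is x·log(x) (inferred from "differentiating that gives two times the target", since the original expression is not shown here), and the function names are made up for illustration.

```python
import math

def antiderivative(x):
    # Candidate from the comment: (0.5)(x^2)log(x) - (x^2)/4 (the "+c" is omitted)
    return 0.5 * x**2 * math.log(x) - x**2 / 4

def integrand(x):
    # Assumed target, inferred from the derivation: x * log(x)
    return x * math.log(x)

def central_difference(f, x, h=1e-6):
    # Symmetric finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def max_error(points):
    # Largest gap between the numeric derivative of the candidate and the integrand
    return max(
        abs(central_difference(antiderivative, x) - integrand(x))
        for x in points
    )

if __name__ == "__main__":
    print(max_error([0.5, 1.0, 2.0, 5.0]))
```

If the printed error is tiny, the candidate differentiates back to the assumed integrand, which is all the "spare x" bookkeeping was meant to guarantee.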

sen on Slack matters more than any outcome

I think you're missing an important edge case where all of your resolved subsystems are in agreement that their collective desires are simultaneously compatible and unattainable without enormous amounts of motivation, which is something that an arms race can provide. Adaptation isn't just about spinning cycles and causing stress. It does have actual tangible outcomes, and not all of those outcomes are bad. Though I think for most people, your advice is probably close enough to the right advice.

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

(Also, what Thane Ruthenis commented below.)

gianluca-calcagni on LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

Hi Andrew, your post is very interesting and it made me think more carefully about the definition of consciousness and how it applies to LLMs. I'd be curious to get your feedback about a post of mine that, in my opinion, is related to yours - I am keen to receive even harsh judgement if you have any!
https://www.lesswrong.com/posts/e9zvHtTfmdm3RgPk2/all-the-following-are-distinct

green_leaf on LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

Are you saying that after it has generated the tokens describing what the answer is, the previous thoughts persist, and it can then generate tokens describing them?

(I know that it can introspect on its thoughts during the single forward pass.)

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

I think the general impression of people on LW is that multipolar scenarios and concerns over "which monkey finds the radioactive banana and drags it home" [LW(p) · GW(p)] are in large part a driver of AI racing instead of being a potential impediment/solution to it. Individuals, companies, and nation-states justifiably believe that whichever one of them accesses potentially superhuman AGI first will have the capacity to flip the gameboard at-will, obtain power over the entire rest of the Earth, and destabilize the currently-existing system. Standard game theory explains the final inferential step for how this leads to full-on racing (see the recent U.S.-China Commission's report [LW · GW] for a representative example of how this plays out in practice).

At the risk of being overly spicy/unnuanced/uncharitable: I think quite a few MIRI [agent foundations] memes ("which monkey finds the radioactive banana and drags it home", "automating safety is like having the AI do your homework", etc.) seem very lazy/un-truth-tracking and probably net-negative at this point, and I kind of wish they'd just stop propagating them (Eliezer being probably the main culprit here).

Perhaps even more spicily, I similarly think that the old MIRI threat model of Consequentialism is looking increasingly 'tired'/un-truth-tracking, and there should be more updating away from it (and more so with every single increase in capabilities without 'proportional' increases in 'Consequentialism'/egregious misalignment).

(Especially) In a world where the first AGIs are not egregiously misaligned, it very likely matters enormously who builds the first AGIs and what they decide to do with them. While this probably creates incentives towards racing in some actors (probably especially the ones with the best chances to lead the race), I suspect better informing more actors (especially more of the non-leading ones, who might especially see themselves as more on the losing side in the case of AGI and potential destabilization) should also create incentives for (attempts at) more caution and coordination, which the leading actors might at least somewhat take into consideration, e.g. for reasons along the lines of https://aiprospects.substack.com/p/paretotopian-goal-alignment.

I get that we'd like to all recognize this problem and coordinate globally on finding solutions, by "mak[ing] coordinated steps away from Nash equilibria in lockstep" [LW · GW]. But I would first need to see an example, a prototype, of how this can play out in practice on an important and highly salient issue. Stuff like the Montreal Protocol banning CFCs doesn't count [LW(p) · GW(p)] because the ban only happened once comparably profitable/efficient alternatives had already been designed; totally disanalogous to the spot we are in right now, where AGI will likely be incredibly economically profitable [LW · GW], perhaps orders of magnitude more so than the second-best alternative.

I'm not particularly optimistic about coordination, especially the more ambitious kinds of plans (e.g. 'shut it all down', long pauses like in 'A narrow path...', etc.), and that's to a large degree (combined with short timelines and personal fit) why I'm focused on automated safety research. I'm just saying: 'if you feel like coordination is the best plan you can come up with/you're most optimistic about, there are probably more legible and likely also more truth-tracking arguments than superintelligence misalignment and loss of control'.

This is in large part why Eliezer often used to challenge readers and community members to ban [LW(p) · GW(p)] gain-of-function research, as a trial run [LW(p) · GW(p)] of sorts for how global coordination on pausing/slowing AI might go.

This seems quite reasonable; might be too late as a 'trial run' at this point though, if taken literally.

papetoast on papetoast's Shortforms

Joe Rogero [LW · GW]: Buying something more valuable with something less valuable should never feel like a terrible deal. If it does, something is wrong.
clone of saturn [LW · GW]: It's completely normal to feel terrible about being forced to choose only one of two things you value very highly.

https://www.lesswrong.com/posts/dRTj2q4n8nmv46Xok/cost-not-sacrifice?commentId=zQPw7tnLzDysRcdQv