LessWrong 2.0 Reader

They are made of repeating patterns
quetzal_rainbow · 2023-11-13T18:17:43.189Z · comments (4)

[link] How to Eradicate Global Extreme Poverty [RA video with fundraiser!]
aggliu · 2023-10-18T15:51:22.073Z · comments (5)

Paper in Science: Managing extreme AI risks amid rapid progress
JanB (JanBrauner) · 2024-05-23T08:40:40.678Z · comments (2)

On Overhangs and Technological Change
Roko · 2023-11-05T22:58:51.306Z · comments (19)

Goal-Completeness is like Turing-Completeness for AGI
Liron · 2023-12-19T18:12:29.947Z · comments (26)

When to Get the Booster?
jefftk (jkaufman) · 2023-10-03T21:00:12.813Z · comments (15)

Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)

Gemini 1.0
Zvi · 2023-12-07T14:40:05.243Z · comments (7)

Apply to the Conceptual Boundaries Workshop for AI Safety
Chipmonk · 2023-11-27T21:04:59.037Z · comments (0)

On Complexity Science
Garrett Baker (D0TheMath) · 2024-04-05T02:24:32.039Z · comments (19)

Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)

AI #52: Oops
Zvi · 2024-02-22T21:50:07.393Z · comments (9)

n of m ring signatures
DanielFilan · 2023-12-04T20:00:06.580Z · comments (7)

[link] Finding Backward Chaining Circuits in Transformers Trained on Tree Search
abhayesian · 2024-05-28T05:29:46.777Z · comments (1)

[link] A starter guide for evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-08T18:24:23.913Z · comments (2)

The Shortest Path Between Scylla and Charybdis
Thane Ruthenis · 2023-12-18T20:08:34.995Z · comments (8)

AI #58: Stargate AGI
Zvi · 2024-04-04T13:10:06.342Z · comments (9)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)

[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)

An issue with training schemers with supervised fine-tuning
Fabien Roger (Fabien) · 2024-06-27T15:37:56.020Z · comments (12)

[link] in defense of Linus Pauling
bhauth · 2024-06-03T21:27:43.962Z · comments (8)

The Dunning-Kruger of disproving Dunning-Kruger
kromem · 2024-05-16T10:11:33.108Z · comments (0)

[link] Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 2024-05-21T22:30:52.638Z · comments (4)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (10)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)

Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)

The Broken Screwdriver and other parables
bhauth · 2024-03-04T03:34:38.807Z · comments (1)

Should rationalists be spiritual / Spirituality as overcoming delusion
Kaj_Sotala · 2024-03-25T16:48:08.397Z · comments (57)

[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)

Wrong answer bias
lukehmiles (lcmgcd) · 2024-02-01T20:05:38.573Z · comments (24)

Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)

Bounty: Diverse hard tasks for LLM agents
Beth Barnes (beth-barnes) · 2023-12-17T01:04:05.460Z · comments (31)

Childhood Roundup #3
Zvi · 2023-10-10T14:30:04.287Z · comments (3)

[link] Chapter 1 of How to Win Friends and Influence People
gull · 2024-01-28T00:32:52.865Z · comments (5)

[link] The point of a game is not to win, and you shouldn't even pretend that it is
mako yass (MakoYass) · 2023-09-28T15:54:27.990Z · comments (27)

[question] why did OpenAI employees sign
bhauth · 2023-11-27T05:21:28.612Z · answers+comments (23)

Job listing: Communications Generalist / Project Manager
Gretta Duleba (gretta-duleba) · 2023-11-06T20:21:03.721Z · comments (7)

Basic Mathematics of Predictive Coding
Adam Shai (adam-shai) · 2023-09-29T14:38:28.517Z · comments (6)

AI #67: Brief Strange Trip
Zvi · 2024-06-06T18:50:03.514Z · comments (6)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter · 2023-11-08T11:37:43.997Z · comments (0)

Public Weights?
jefftk (jkaufman) · 2023-11-02T02:50:18.095Z · comments (19)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

Experiments as a Third Alternative
Adam Zerner (adamzerner) · 2023-10-29T00:39:31.399Z · comments (21)

Competitive, Cooperative, and Cohabitive
Screwtape · 2023-09-28T23:25:52.723Z · comments (12)

[link] Urging an International AI Treaty: An Open Letter
Olli Järviniemi (jarviniemi) · 2023-10-31T11:26:25.864Z · comments (2)

On ‘Responsible Scaling Policies’ (RSPs)
Zvi · 2023-12-05T16:10:06.310Z · comments (3)

AISC 2024 - Project Summaries
NickyP (Nicky) · 2023-11-27T22:32:23.555Z · comments (3)


Recent comments

benito on Sabotage Evaluations for Frontier Models

Unfortunately a fair chunk of my information comes from non-online sources, so I do not have links to share.

I do think that in order for a government department to blatantly approve an unsafe model, it would take a lot of people to have secret agreements with.

Corruption is rarely blatant. See this thread for what I believe to be an example of the CEO of RAND misleading a Senate committee about his beliefs about the existential threat posed by AI. See this discussion about a time when an AI company (Conjecture) attempted to get critical comments about another AI company (OpenAI) taken down from LessWrong. I am not proposing a large conspiracy; I am describing lots of small bits of corruption and failures of integrity summing to system failure.

There will be millions of words of regulatory documents, and it is easy for things to slip such that some particular model class is not considered worth evaluating, or such that the consequences of a failed evaluation are pretty weak.

jchan on If I care about measure, choices have additional burden (+AI generated LW-comments)

However, in Many-Worlds Interpretation (MWI), I split my measure between multiple variants, which will be functionally different enough to regard my future selves as different minds. Thus, the act of choice itself lessens my measure by a factor of approximately 10. If I care about this, I'm caring about something unobservable.

If we're going to make sense of living in a branching multiverse, then we'll need to adopt a more fluid concept of personal identity.

Scenario: I take a sleeping pill that will make me fall asleep in 30 minutes. However, the person who wakes up in my bed the next morning will have no memory of that 30-minute period; his last memory will be of taking the pill.

If I imagine myself experiencing that 30-minute interval, intuitively it doesn't at all feel like "I have less than 30 minutes to live." Instead, it feels like I'd be pretty much indifferent to being in this situation - maybe the person who wakes up tomorrow is not "me" in the artificial sense of having a forward-looking continuity of consciousness with my current self, but that's not really what I care about anyway. He is similar enough to current-me that I value his existence and well-being to nearly the same degree as I do my own; in other words, he "is me" for all practical purposes.

The same is true of the versions of me in nearby world branches. I can no longer observe or influence them, but they still "matter" to me. Of course, the degree of self-identification will decrease over time as they diverge, but then again, so does my degree of identification with the "me" many decades in the future, even assuming a single timeline.

askwho on OpenAI Email Archives (from Musk v. Altman)

I've turned this into a full cast recording with ElevenLabs, with individual voices for all the players:
https://open.substack.com/pub/askwhocastsai/p/openai-email-archives-from-musk-v

sharmake-farah on johnswentworth's Shortform

While I'm not a believer in the "scaling has died" meme yet, I'm glad you do have a plan for what happens if AI scaling does stop.

benito on Sabotage Evaluations for Frontier Models

But your story lacks any mechanism for accountability if the King is behaving badly. It is important to design systems of power that do not rely on the people in power being good and right, but instead make it so that if they behave badly, they are held to account. I don't think I have to explain why incentives and accountability matter for how the powerful wield their powers. The central thing I am talking about is basic measures for accountability, of which I consider very high up to be engaging with criticism, dialogue, and argument (as is somewhat natural given my background philosophy from growing up on LessWrong).

As far as I recall as I write this, I do not believe that the CEOs of the companies building potentially omnicidal machines are committed to any particular plan that public intellectuals are debating and defending, nor have they any commitment mechanism to any such plan that has been put out publicly. The plan is something of their own choosing, the details of which are almost entirely inaccessible to those whose lives are being risked. If you believe otherwise, I request that you provide links; I would welcome specifics.

elityre on Lao Mein's Shortform

I don't think that's a valid inference.

jrockwar on The Best Software For Every Need

On the topic of ffmpeg - additional shoutout to Handbrake, which is essentially ffmpeg with a GUI on top.

maelstrom on Alexander Gietelink Oldenziel's Shortform

One needs only to read 4 or so papers on category theory applied to AI to understand the problem. None of them share a common foundation on what type of constructions to use or formalize in category theory. The core issue is that category theory is a general language for all of mathematics, and as commonly used it just exponentially increases the search space for useful mathematical ideas.

I want to be wrong about this, but I have yet to find category theory uniquely useful outside of some subdomains of pure math.

cata on When will computer programming become an unskilled job (if ever)?

One and a half years later it seems like AI tools are able to sort of help humans with very rote programming work (e.g. changing or writing code to accomplish a simple goal, implementing versions of things that are well-known to the AI like a textbook algorithm or a browser form to enter data, answering documentation-like questions about a system) but aren't much help yet on the more skilled labor parts of software engineering.

yonge on The Foraging (Ex-)Bandit [Ruleset & Reflections]

I was expecting earlier choices of foraging location to have a much stronger impact, and mistook some of the randomness for effects of earlier choices. In retrospect it would have been better to spend longer exploring various possibilities rather than settling on an exploit strategy so soon. Adding an explicit target was a big improvement as it gave some idea of "how good a strategy" we should be searching for.