LessWrong 2.0 Reader

What is Randomness?
martinkunev · 2024-09-27T17:49:42.704Z · comments (2)

Can startups be impactful in AI safety?
Esben Kran (esben-kran) · 2024-09-13T19:00:33.306Z · comments (0)

[link] Intention-to-Treat (Re: How harmful is music, really?)
kqr · 2024-09-18T18:44:41.128Z · comments (0)

Self location for LLMs by LLMs: Self-Assessment Checklist.
weightt an (weightt-an) · 2024-09-26T19:57:31.707Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, September '24
gasteigerjo · 2024-10-02T09:49:00.357Z · comments (0)

Keyboard Gremlins
jefftk (jkaufman) · 2024-09-20T02:30:07.140Z · comments (0)

[question] I want a good multi-LLM API-powered chatbot
rotatingpaguro · 2024-09-08T09:40:52.736Z · answers+comments (3)

Just How Good Are Modern Chess Computers?
nem · 2024-09-19T18:57:21.254Z · comments (1)

[link] How harmful is music, really?
dkl9 · 2024-09-17T14:53:25.426Z · comments (6)

The Other Existential Crisis
James Stephen Brown (james-brown) · 2024-09-21T01:16:38.011Z · comments (24)

Contextual Constitutional AI
aksh-n · 2024-09-28T23:24:43.529Z · comments (0)

A Policy Proposal
phdead · 2024-09-29T20:45:34.745Z · comments (4)

[question] Does life actually locally *increase* entropy?
tailcalled · 2024-09-16T20:30:33.148Z · answers+comments (27)

[question] Doing Nothing Utility Function
k64 · 2024-09-26T22:05:18.821Z · answers+comments (9)

Electric Mandola
jefftk (jkaufman) · 2024-09-21T13:40:04.772Z · comments (0)

[question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal
doomyeser · 2024-09-11T18:07:19.385Z · answers+comments (9)

Will AI and Humanity Go to War?
Simon Goldstein (simon-goldstein) · 2024-10-01T06:35:22.374Z · comments (4)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (7)

[link] AISafety.info: What are Inductive Biases?
Algon · 2024-09-19T17:26:24.581Z · comments (4)

[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)

A Dialogue on Deceptive Alignment Risks
Rauno Arike (rauno-arike) · 2024-09-25T16:10:12.294Z · comments (0)

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

Keeping it (less than) real: Against ℶ₂ possible people or worlds
quiet_NaN · 2024-09-13T17:29:44.915Z · comments (0)

Becket First
jefftk (jkaufman) · 2024-09-22T17:10:04.304Z · comments (0)

[link] When to join a respectability cascade
B Jacobs (Bob Jacobs) · 2024-09-24T07:54:16.051Z · comments (1)

Three main arguments that AI will save humans and one meta-argument
avturchin · 2024-10-02T11:39:08.910Z · comments (8)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

[link] Comparing Forecasting Track Records for AI Benchmarking and Beyond
ChristianWilliams · 2024-09-25T21:01:15.975Z · comments (0)

Steering LLMs' Behavior with Concept Activation Vectors
Ruixuan Huang (sprout_ust) · 2024-09-28T09:53:19.658Z · comments (0)

[question] Any Trump Supporters Want to Dialogue?
k64 · 2024-09-28T19:41:55.370Z · answers+comments (45)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (7)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

Recent comments

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

What exactly did you mean in 2016 by the scaling hypothesis?

Having past-2035 timelines and believing in the pure scaling maximalist hypothesis (which fwiw I don't believe in, for reasons I have explained elsewhere) are in direct conflict, so I'd be curious if you could detail more exactly what your beliefs were back then.

cubefox on [deleted]

Intersubjective agreement on what "is an alignment technique" means will be far worse than on "is a machine learning technique", and many implications of the first claim are far more contentious than of the second.

I think it is highly uncontroversial and even trivial to call RLHF an alignment technique, given that it is literally used to nudge the model away from "bad" responses and toward "good" responses. It seems the label "alignment technique" could only be considered inappropriate here by someone who has a nebulous science fiction idea of alignment as a technology that doesn't currently exist at all, as it was seen when Eliezer originally wrote the Sequences. I think it's obvious that this view is outdated now.

bogdan-ionut-cirstea on Alexander Gietelink Oldenziel's Shortform

Fwiw, in 2016 I would have put something like 20% probability on what became known as 'the scaling hypothesis'. I still had past-2035 median timelines, though.

towards_keeperhood on Creating a truly formidable Art

This is a great post.

It's a good pointer to a part of the art which I am aiming for.

Some instructions are a bit vague, and I'd have found it even nicer if you had written more precisely what repeated practices (e.g. what sorts of meditation) one can do to e.g. better embody the void. I think if you had, I would probably disagree with some parts of the suggested path, but it'd still be nice to have something more concrete to take what is good from.

I feel like the "creating a rationality dojo" part sorta distracts from the really important pieces in this post. It might've been nicer to just do the main post on "towards becoming truly formidable in the art" and then a second independentish post on "creating a rationality dojo".

(It feels a bit like you're saying "if you want to create a good rationality dojo, you first need to become great in the art, and then only create a dojo when the art calls for you to do it". Might've been nicer if you had just been like "here are some pointers to what I can sense which might be useful for aspiring rationalists". I think for many aspiring beisutsukai it might be the case that their purpose won't call for them to create a dojo for quite a while. Also, beisutsukai usually don't become beisutsukai because they just want to become great in the art, but rather as instrumental to their actual purpose ("the art must have a purpose other than its own").)

It's amusing because I doubt this is my gift to give the world. I'm doing something closely related, but different. Far too mystical for the right aesthetic.

I'm curious, what do you think is your gift to give to the world? (Feel free to answer by PM.)

foyle on If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?

I think there is far too much focus on technical approaches, when what is needed is a more socio-political focus: raising money, convincing deep pockets of the risks to leverage smaller sums, and buying politicians, influencers and perhaps other groups that can be co-opted and convinced of existential risk to put a halt to AI development.

It amazes me that there are huge, well-financed and well-coordinated campaigns for climate, social and environmental concerns, trivial issues next to AI risk, and yet AI risk remains strictly academic/fringe. What is on paper a very smart community, embedded in perhaps the richest metropolitan area the world has ever seen, has not been able to create the political movement needed to slow things down. I think this is precisely because they are pitching to the wrong crowd.

Dumb it down. Identify large, easily influenceable demographics with a strong tendency to anxiety that can be most readily converted - most obviously teenagers, particularly girls - and focus on convincing them of the dangers, perhaps also teachers as a community, with their huge influence. But maybe also the elderly - the other stalwart group we see so heavily involved in environmental causes. It would have orders of magnitude more impact than the current cerebral elite focus, and history is replete with revolutions born out of targeting the conversion of teenagers to drive them.

cubefox on Mark Xu's Shortform

I don't see a significant difference in your distinction between alignment and control. If you say alignment is about doing what you want (which I strongly disagree with in its generality, e.g. when someone might want to murder or torture people or otherwise act unethically), that obviously includes your wanting to "be OK" when the AI didn't do exactly what you want. Alignment comes in degrees, and you merely seem to equate control with non-perfect alignment and alignment with perfect alignment. Or I might be misunderstanding what you have in mind.

rogerdearnaley on A basic systems architecture for AI agents that do autonomous research

I work for a startup that builds agents, and yes, we use the architecture described here, with the additional feature that we don't own the machines that the inference or execution mechanisms run on: they're in separate datacenters owned and operated by other companies (for the inference servers, generally foundation model companies behind an API).

alexander-gietelink-oldenziel on Vladimir_Nesov's Shortform

Cutting edge AI research seems remarkably and surprisingly easy compared to other forms of cutting edge science. Most things work on the first try, clever insights aren't required, it's mostly an engineering task of scaling compute.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Oh no uh-oh I think I might have confused Shane Legg with Jan Leike

rogerdearnaley on Success without dignity: a nearcasting story of avoiding catastrophe by luck

I basically agree, for three reasons:

1. The level of understanding of and caring about human values required to not kill everyone, and to be able to keep many humans alive, is actually pretty low (especially on the knowledge side).
2. That's also basically sufficient to motivate wanting to learn more about human values, and being able to, so the Value Learning process then kicks in: a competent and caring alien zookeeper would want to learn more about their charges' needs.
3. We have entire libraries half of whose content is devoted to "how to make humans happy", and we already fed most of them into our LLMs as training material. On a factual basis, knowing how to make humans happy in quite a lot of detail (and for a RAG agent, looking up details they don't already have memorized) is clearly well within their capabilities. The part that concerns me is the caring side, and that's not conceptually complicated: roughly speaking, the question is how to ensure an agent's selfless caring for humans is consistently a significantly stronger motivation than various bad habits like ambition, competitiveness, and power-seeking that it picked up from us during the "distillation" of the base model, learnt during RL training, or both.

Also, a question about this quote: what's the assumed capability/compute level used in this thought experiment?
E.g. if there was a guide for alien zookeepers (ones already familiar with Terran biochemistry) on how to keep humans, how long would it need to be for the humans to mostly survive?

ASI, or high AGI: capable enough that we've lost control and alignment is an existential risk.