LessWrong 2.0 Reader

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

Grounding self-reference paradoxes in reality
Fiora from Rosebloom · 2024-09-29T05:50:30.559Z · comments (3)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

Seeking mentorship
Kevin Afachao (kevin-afachao) · 2024-09-21T16:54:58.353Z · comments (0)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

Biasing LLM Response with Visual Stimuli
Jaehyuk Lim (jason-l) · 2024-10-03T18:04:31.474Z · comments (0)

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

[link] Should we abstain from voting? (In nondeterministic elections)
B Jacobs (Bob Jacobs) · 2024-10-02T10:07:43.167Z · comments (5)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

[link] In-Context Learning: An Alignment Survey
alamerton · 2024-09-30T18:44:28.589Z · comments (0)

Developmental Stages in Multi-Problem Grokking
James Sullivan · 2024-09-29T18:58:22.954Z · comments (0)

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)

Toy Models of Superposition: Simplified by Hand
Axel Sorensen (axel-sorensen) · 2024-09-29T21:19:52.475Z · comments (0)

[question] How do you follow AI (safety) news?
PeterH · 2024-09-24T13:58:48.916Z · answers+comments (2)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

Likelihood calculation with duobels
Martin Gerdes (martin-gerdes) · 2024-10-01T16:21:01.268Z · comments (0)

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke (antonio-clarke) · 2024-09-29T18:48:23.308Z · comments (0)

[question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Double · 2024-09-05T00:35:39.504Z · answers+comments (6)

San Francisco ACX Meetup “First Saturday”
Nate Sternberg (nate-sternberg) · 2024-09-29T03:13:34.615Z · comments (0)

On Measuring Intellectual Performance - personal experience and several thoughts
Alexander Gufan (alexander-gufan) · 2024-09-20T17:21:19.747Z · comments (2)

[question] Calibration training for 'percentile rankings'?
david reinstein (david-reinstein) · 2024-09-14T21:51:55.705Z · answers+comments (0)

For Limited Superintelligences, Epistemic Exclusion is Harder than Robustness to Logical Exploitation
Lorec · 2024-09-15T20:49:06.370Z · comments (9)

Collapsing “Collapsing the Belief/Knowledge Distinction”
Jeremias (jeremias-sur) · 2024-09-20T16:11:33.558Z · comments (0)

Apply to the Cooperative AI PhD Fellowship by October 14th!
Lewis Hammond (lewis-hammond-1) · 2024-10-05T12:41:24.093Z · comments (0)

[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)

Amoeba roles in tech
Sindhu Shivaprasad (sindhu-shivaprasad) · 2024-10-04T17:25:46.568Z · comments (0)

[link] Climate Change And Global Warming
Zero Contradictions · 2024-09-25T19:13:09.508Z · comments (0)

Endogenous Growth and Human Intelligence
Nicholas D. (nicholas-d) · 2024-09-18T14:05:54.567Z · comments (0)

MIT FutureTech are hiring for a Technical Associate role
peterslattery · 2024-09-09T20:16:49.299Z · comments (0)

[question] Searching for Impossibility Results or No-Go Theorems for provable safety.
Maelstrom · 2024-09-27T20:12:25.515Z · answers+comments (1)

What bootstraps intelligence?
invertedpassion · 2024-09-10T07:11:21.819Z · comments (2)

A Psychoanalytic Explanation of Sam Altman's Irrational Actions
Gabe · 2024-09-29T18:58:13.511Z · comments (3)

Can AI Quantity beat AI Quality?
Gianluca Calcagni (gianluca-calcagni) · 2024-10-02T15:21:45.711Z · comments (0)

Survey - Psychological Impact of Long-Term AI Engagement
Manuela García (manuela-garcia) · 2024-09-17T17:31:38.397Z · comments (0)


Recent comments

alexander-gietelink-oldenziel on Vladimir_Nesov's Shortform

China is currently producing research in a number of areas that surpasses the West and is arguably more impressive scientifically than producing top LLMs.

A big reason China is lagging a little bit might be political interference at major tech companies. Xi Jinping instigated a major crackdown recently. There is also significantly less Chinese text data. I am not a China or tech expert, so these are just guesses.

In any case, I wouldn't assign it too much significance. The AI space is moving so quickly that even a minor delay of a year can seem like light-years. But that doesn't mean that Chinese companies can't do it, or that a country-continent with 1.4 billion people and a history of many technological firsts can't scale up a transformer.

dan-braun on Dan Braun's Shortform

In which worlds would AI Control prevent significant harm?

When I bring up the issue of AI model security to people working in AI safety, I’m often met with something of the form “yes, this is a problem. It’s important that people work hard on securing AI models. But it doesn’t really affect my work”.

Using AI Control (an area which has recently excited many in the field) as an example, I lay out an argument for why it might not be as effective an agenda as one might think after considering the realities of our cyber security situation.

1. AI Control concerns itself with models that intentionally try to subvert their developers.
2. These models are likely to be very generally capable and capable of causing significant harm without countermeasures.
3. Leading cyber-capable institutions would likely expend significant resources and political capital to steal these models or steal enough insights to reproduce such models.
4. If the weights or insights are stolen, work on AI control will not prevent these models from causing significant harm.
5. Current AI developers are not on track to be able to defend against high-priority operations from leading cyber-capable institutions in the coming years.
6. Therefore, AI control will only be useful in the coming years under one (or more) of these conditions:
   1. Models that scheme are unlikely to be generally capable/dangerous enough to be a high-priority target for leading cyber-capable institutions.
   2. Models that scheme are only developed by actors that can thwart high-priority operations from leading cyber-capable institutions (which precludes current AI developers for at least several years).
   3. AI Control won’t be directly useful in the coming years but it will be indirectly useful to progress the field for when models are developed by actors capable of thwarting top cyber operations.
   4. Even if the model was stolen and caused significant harm, there would still be less harm overall than if the model undergoing AI control measures also broke out. Of course, this argument is much weaker for models capable of existential harm.
   5. Actors that steal the model would also implement strong-enough AI control measures and would not misuse the AI to cause significant harm.

There are of course other arguments against working on AI control. E.g. it may encourage the development and use of models that are capable of causing significant harm. This is an issue if the AI control methods fail or if the model is stolen. So one must be willing to eat this cost or argue that it’s not a large cost when advocating for AI Control work.

This isn’t to say that AI Control isn’t a promising agenda; I just think people need to carefully consider the cases in which their agenda falls down for reasons that aren’t technical arguments about the agenda itself.

I’m also interested to hear takes from those excited by AI Control on which of the conditions listed in #6 above they expect to hold (or to otherwise poke holes in the argument).

bohaska on Vladimir_Nesov's Shortform

This seems like the sort of R&D that China is good at: research that doesn't need superstar researchers and that is mostly made of incremental improvements. Yet they don't seem to be producing top LLMs. Why is that?

niplav on Dalcy's Shortform

The way I do this is to use the Print as PDF functionality in the browser on every single post, and then concatenate the resulting files using pdfunite.
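
The concatenation step can be scripted. What follows is only a minimal sketch, not niplav's actual setup: it assumes the printed PDFs sit in one directory, that their filenames sort into the intended reading order, and that the `pdfunite` binary (shipped with poppler-utils) is on the PATH. The names `build_pdfunite_command`, `merge_posts`, `posts` and `sequence.pdf` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pdfunite_command(pdf_paths, output_path):
    """Build the argv list for pdfunite: input PDFs in order, output file last."""
    inputs = [str(p) for p in pdf_paths]
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdfunite", *inputs, str(output_path)]


def merge_posts(pdf_dir, output_path):
    """Concatenate every PDF in pdf_dir, sorted by filename, into output_path.

    Assumption: filenames such as 01-intro.pdf, 02-body.pdf encode the order.
    """
    output = Path(output_path).resolve()
    # Skip the output file itself in case it lives in the same directory.
    pdfs = sorted(p for p in Path(pdf_dir).glob("*.pdf") if p.resolve() != output)
    # check=True raises CalledProcessError if pdfunite exits with an error.
    subprocess.run(build_pdfunite_command(pdfs, output_path), check=True)


# Example call (requires pdfunite to be installed):
# merge_posts("posts", "sequence.pdf")
```

Keeping the command construction separate from the subprocess call makes the file order easy to inspect before anything is written.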

nutrition-capsule on The Sun is big, but superintelligences will not spare Earth a little sunlight

I interpreted Eliezer as writing from the assumption that the superintelligence(s) in question are in fact not already aligned to maximize whatever it is that humanity needs to survive, but some other goal(s), which diverge from humanity's interests once implemented.

He explicitly states that the essay's point is to shoot down a clumsy counterargument (along the lines of "it wouldn't cost the ASI a lot to let us live, so we should assume they'd let us live"). So the context (as I interpret it) is that such requests, however sympathetic, have not been ingrained into the ASI's goals. Using a different example would mean he was discussing something different.

That is, "just because it would make a trivial difference from the ASI's perspective to let humanity thrive, whereas it would make an existential difference from humanity's perspective, doesn't mean ASIs will let humanity thrive", assuming such conditions aren't already baked into their decision-making.

I think Eliezer spends so much time on working from these premises because he believes 1) an unaligned ASI to be the default outcome of current developments, and 2) that all current attempts at alignment will necessarily fail.

bogdan-ionut-cirstea on Alexander Gietelink Oldenziel's Shortform

What exactly did you mean in 2016 by the scaling hypothesis?

Something like 'we could have AGI just by scaling up deep learning / deep RL, without any need for major algorithmic breakthroughs'.

Having past-2035 timelines and believing in the pure scaling maximalist hypothesis (which fwiw I don't believe in, for reasons I have explained elsewhere) are in direct conflict, so I'd be curious if you could detail more exactly what your beliefs were back then.

I'm not sure this is strictly true, though I agree with the 'vibe'. I think there were probably a couple of things in play:

- I still only had something like 20% on scaling, and I expected much more compute would likely be needed, especially in that scenario, but also more broadly (e.g. maybe something like the median in 'bioanchors' - 35 OOMs of pretraining-equivalent compute, if I don't misremember; though I definitely hadn't thought very explicitly about how many OOMs of compute at that time) - so I thought it would probably take decades to get to the required amount of compute.
- I very likely hadn't thought hard and long enough to necessarily integrate/make coherent my various beliefs.
- Probably at least partly because there seemed to be a lot of social pressure from academic peers against even something like '20% on scaling', and even against taking AGI and AGI safety seriously at all. This likely made it harder to 'viscerally feel' what some of my beliefs might imply, and especially that it might happen very soon (which also had consequences in delaying when I'd go full-time into working on AI safety; along with thinking I'd have more time to prepare for it, before going all in).

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

What exactly did you mean in 2016 by the scaling hypothesis?

Having past-2035 timelines and believing in the pure scaling maximalist hypothesis (which fwiw I don't believe in, for reasons I have explained elsewhere) are in direct conflict, so I'd be curious if you could detail more exactly what your beliefs were back then.

cubefox on [deleted]

Intersubjective agreement on what "is an alignment technique" means will be far worse than on "is a machine learning technique", and many implications of the first claim are far more contentious than of the second.

I think it is highly uncontroversial and even trivial to call RLHF an alignment technique, given that it is literally used to nudge the model away from "bad" responses and toward "good" responses. It seems the label "alignment technique" could only be considered inappropriate here by someone who has a nebulous science fiction idea of alignment as a technology that doesn't currently exist at all, as it was seen when Eliezer originally wrote the Sequences. I think it's obvious that this view is outdated now.

bogdan-ionut-cirstea on Alexander Gietelink Oldenziel's Shortform

Fwiw, in 2016 I would have put something like 20% probability on what became known as 'the scaling hypothesis'. I still had past-2035 median timelines, though.

towards_keeperhood on Creating a truly formidable Art

This is a great post.

It's a good pointer to a part of the art which I am aiming for.

Some instructions are a bit vague, and I'd have found it even nicer if you had written more precisely what repeated practices (e.g. what sorts of meditation) one can do to, e.g., better embody the void. I think if you had, I would probably disagree with some parts of the suggested path, but it'd still be nice to have something more concrete to take what is good from.

I feel like the "creating a rationality dojo" part sorta distracts from the really important pieces in this post. It might've been nicer to just do the main post on "towards becoming truly formidable in the art" and then a second independentish post on "creating a rationality dojo".

(It feels a bit like you're saying "if you want to create a good rationality dojo, you first need to become great in the art, and then only create a dojo when the art calls for you to do it". It might've been nicer if you had just said "here are some pointers to what I can sense which might be useful for aspiring rationalists". I think for many aspiring beisutsukai it might be the case that their purpose won't call for them to create a dojo for quite a while. Also, beisutsukai usually don't become beisutsukai because they just want to become great in the art, but rather as instrumental to their actual purpose. ("the art must have a purpose other than its own".))

It's amusing because I doubt this is my gift to give the world. I'm doing something closely related, but different. Far too mystical for the right aesthetic.

I'm curious, what do you think is your gift to give to the world? (Feel free to answer by PM.)