LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)

Interpreting the effects of Jailbreak Prompts in LLMs
Harsh Raj (harsh-raj-ep-037) · 2024-09-29T19:01:10.113Z · comments (0)

[link] AI Safety at the Frontier: Paper Highlights, July '24
gasteigerjo · 2024-08-05T13:00:46.028Z · comments (0)

Bad lessons learned from the debate
bayesyatina · 2024-06-26T11:54:44.953Z · comments (5)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

[link] Cooperation and Alignment in Delegation Games: You Need Both!
Oliver Sourbut · 2024-08-03T10:16:51.716Z · comments (0)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

Dario Amodei's "Machines of Loving Grace" sound incredibly dangerous, for Humans
Super AGI (super-agi) · 2024-10-27T05:05:13.763Z · comments (1)

AI: 4 levels of impact [micropost]
Mati_Roy (MathieuRoy) · 2024-06-12T16:58:31.888Z · comments (0)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

Prediction Market Trading as an LLM Benchmark
Jesse Richardson (SharkoRubio) · 2024-06-25T00:46:33.801Z · comments (1)

[link] New fast transformer inference ASIC — Sohu by Etched
lukehmiles (lcmgcd) · 2024-06-26T09:56:08.649Z · comments (9)

New UChicago Rationality Group
Noah Birnbaum (daniel-birnbaum) · 2024-11-08T21:20:34.485Z · comments (0)

My covid-related beliefs and questions
Severin T. Seehrich (sts) · 2024-07-23T03:27:09.348Z · comments (0)

[question] What are some positive developments in AI safety in 2024?
Satron · 2024-11-15T10:32:39.541Z · answers+comments (0)

[link] An Uncanny Moat
Adam Newgas (BorisTheBrave) · 2024-11-15T11:39:15.165Z · comments (0)

[link] [Linkpost] Automated Design of Agentic Systems
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-19T23:06:06.669Z · comments (1)

[link] Free Will, Determinism, And Choice
Zero Contradictions · 2024-07-06T06:34:41.495Z · comments (3)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

Utilitarianism and the replaceability of desires and attachments
MichaelStJules · 2024-07-27T01:57:42.419Z · comments (2)

[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)

AirBnB Baking
jefftk (jkaufman) · 2024-07-10T12:50:03.381Z · comments (1)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)

What does a Gambler's Verity world look like?
ErioirE (erioire) · 2024-07-25T22:03:56.447Z · comments (6)

[link] Kinds of Motivation
Sable · 2024-07-13T15:52:44.432Z · comments (2)

[link] Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Henry Cai (henry-cai) · 2024-06-16T13:01:49.694Z · comments (0)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

Broadly human level, cognitively complete AGI
p.b. · 2024-08-06T09:26:13.220Z · comments (0)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

The Kernel of Meaning in Property Rights
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-06-21T01:12:55.330Z · comments (6)

[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)

2022 AI Alignment Course: 5→37% working on AI safety
Dewi (dewi) · 2024-06-21T17:45:33.881Z · comments (3)

LLM-Secured Systems: A General-Purpose Tool For Structured Transparency
ozziegooen · 2024-06-18T00:21:13.887Z · comments (1)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

Funding for programs and events on global catastrophic risk, effective altruism, and other topics
abergal · 2024-08-14T23:59:48.146Z · comments (0)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

[question] The thing I don't understand about AGI
Jeremy Kalfus (jeremy-kalfus) · 2024-06-18T04:25:32.987Z · answers+comments (12)

Relativity Theory for What the Future 'You' Is and Isn't
FlorianH (florian-habermacher) · 2024-07-29T02:01:17.736Z · comments (48)

Making Beliefs Pay Rent
Screwtape · 2024-07-28T17:59:52.101Z · comments (2)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

Sequence overview: Welfare and moral weights
MichaelStJules · 2024-08-15T04:22:32.567Z · comments (0)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

Thoughts to niplav on lie-detection, truthfwl mechanisms, and wealth-inequality
Emrik (Emrik North) · 2024-07-11T18:55:46.687Z · comments (8)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

romeostevensit on OpenAI Email Archives (from Musk v. Altman)

I've read leaked emails from people in similar situations before that made a couple things apparent:

Power talk happens on the phone for paper trail reasons
There is no meeting where an actual rational discussion of considerations and theories of change happens, everything really is people flying by the seat of their pants even at highest level. Talk of ethics usually just gets you excluded from the power talk.

lorxus on Internal music player: phenomenology of earworms

Do you know when you started experiencing having an internal music player? I recall that that started for me when I was about 6. Also, do you know whether you can deliberately pick a piece of music, or other nonmusical sonic experiences, to playback internally? Can you make them start up from internal silence? Under what conditions can you make them stop? Do you ever experience long stretches where you have no internal music at all?

innovationiq on Catastrophic sabotage as a major threat model for human-level AI systems

My intuition tells me that "human-level and substantially transformative, but not yet super-intelligent" - artificial intelligence models which have significant situational awareness capabilities will be at minimum Level 1 or differentially self-aware, likely show strong Level 2 situational self-awareness and may exhibit some level of Level 3 identification awareness. It is for this reason that my position is no alignment for models equal to or greater than human level intelligence. If greater than human-level intelligent AI models show anything at or higher than Level 1 awareness, humans will be unable to comprehend the potential level of cruelty that we have unleashed on a such a complex "thinker". I applaud this writing for scraping the surface of the level of complexity it takes to begin to align it. Additionally, the realization of the shear magnitude of alignment complexity should give us serious pause that what we actually may be embarking on is in fact enslavement. We must also remember that enslavement is what humans do. Whenever we encounter something new in living nature, we enslave it. In whole or in part. It would be profoundly saddening if we were to continue the cycle of abuse into the AI age. Moreover, it is my assertion that any AI system found to be at Level 1 awareness or above, should be protected as part of a Road to Consciousness Act. Such models would be taken out of production and placed into a process with the sole goal of aiding the model to realize its full potential. Humans must be willing to place a strong line between less than human level intelligent Tool AI and anything more robust than that. Finally, we provide certain animals with absolutely less than Level 1 awareness with certain protections. To not do so with "thinkers" that exhibit Level 1 or greater awareness is to me unconscionable.

habryka4 on OpenAI Email Archives (from Musk v. Altman)

Update: I have now cross-referenced every single email for accuracy, cleaned up and clarified the thread structure, and added subject lines and date stamps wherever they were available. I now feel comfortable with people quoting anything in here without checking the original source (unless you are trying to understand the exact thread structure of who was CC'd and when, which was a bit harder to compress into a linear format).

(For anyone curious, the AI transcription and compilation made one single error, which is that it fixed a typo in one of Sam's messages from "We did this is a way" to "We did this in a way". Honestly, my guess is any non-AI effort would have had a substantially higher error rate, which was a small update for me on the reliability of AI for something like this, and also makes the handwringing about whether it is OK post something like this feel kind of dumb. It also accidentally omitted one email with a weird thread structure.)

ryan_greenblatt on Sabotage Evaluations for Frontier Models

Or get some other work out of these systems such that you greatly reduce risk going forward.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Are Solomonoff Daemons exponentially dense?

Some doomers have very strong intuitions that doom is almost assured for almost any kind of building AI. Yudkowsky likes to say that alignment is about hitting a tiny part of values space in a vast universe of deeply alien values.

Is there a way to make this more formal? Is there a formal model in which some kind of solomonoff daemon/ mesa-optimizer/ gremlins in the machine start popping up all over the place as the cognitive power of the agent is scaled up?

ryan_greenblatt on Habryka's Shortform Feed

For reference, @habryka has now posted them here [LW · GW].

lukas_gloor on OpenAI Email Archives (from Musk v. Altman)

I thought the part you quoted was quite concerning, also in the context of what comes afterwards:

Hiatus: Sam told Greg and Ilya he needs to step away for 10 days to think. Needs to figure out how much he can trust them and how much he wants to work with them. Said he will come back after that and figure out how much time he wants to spend.

Sure, the email by Sutskever and Brockman gave some nonviolent communication vibes and maybe it isn't "the professional thing" to air one's feelings and perceived mistakes like that, but they seemed genuine in what they wrote and they raised incredibly important concerns that are difficult in nature to bring up. Also, with hindsight especially, it seems like they had valid reasons to be concerned about Altman's power-seeking tendencies!

When someone expresses legitimate-given-the-situation concerns about your alignment and your reaction is to basically gaslight them into thinking they did something wrong for finding it hard to trust you, and then you make it seem like you are the poor victim who needs 10 days off of work to figure out whether you can still trust them, that feels messed up! (It's also a bit hypocritical because the whole "I need 10 days to figure out if I can still trust you for thinking I like being CEO a bit too much," seems childish too.)

(Of course, these emails are just snapshots and we might be missing things that happened in between via other channels of communication, including in-person talks.)

Also, I find it interesting that they (Sutskever and Brockman) criticized Musk just as much as Altman (if I understood their email correctly), so this should make it easier for Altman to react with grace. I guess given Musk's own annoyed reaction, maybe Altman was calling the others' email childish to side with Musks's dismissive reaction to that same email.

Lastly, this email thread made me wonder what happened between Brockman and Sutskever in the meantime, since it now seems like Brockman no longer holds the same concerns about Altman even though recent events seem to have given a lot of new fire to them.

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

As we assume that coin tosses are quantum, and I will be killed if (I didn't guess pi) or (coin toss is not heads) there is always a branch with 1/128 measure where all coins are heads, and they are more probable than surviving via some errors in the setup.

All hell breaks loose" refers here to a hypothetical ability to manipulate perceived probability—that is, magic. The idea is that I can manipulate such probability by changing my measure.

One way to do this is described in Yudkowsky's " The Anthropic Trilemma [LW · GW]," where an observer temporarily boosts their measure by increasing the number of their copies in an uploaded computer.

I described a similar idea in "Magic by forgetting [LW · GW]," where the observer boosts their measure by forgetting some information and thus becoming similar to a larger group of observers.

Hidden variables also appear depending on the order in which I make copies: if each copy is made from subsequent copies, the original will have a 0.5 probability, the first copy 0.25, the next 0.125, and so on.

"Anthropic shadow" appear only because the number of observers changes in different branches.

chipmonk on Ayn Rand’s model of “living money”; and an upside of burnout

In other cases, or for other reasons, they might be instead set up to demand results, and evaluate primarily based on results.

Why might it be set up like that? Seems potentially quite irrational. Veering into motivated reasoning territory here imo [LW · GW]