LessWrong 2.0 Reader


Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi (mtrazzi) · 2024-08-24T04:30:11.807Z · comments (0)

... Wait, our models of semantics should inform fluid mechanics?!?
johnswentworth · 2024-08-26T16:38:53.924Z · comments (13)

[link] Making Eggs Without Ovaries
Niko_McCarty (niko-2) · 2024-09-22T17:44:46.733Z · comments (3)

A "Bitter Lesson" Approach to Aligning AGI and ASI
RogerDearnaley (roger-d-1) · 2024-07-06T01:23:22.376Z · comments (39)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (2)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

[link] Unlocking Solutions—By Understanding Coordination Problems
James Stephen Brown (james-brown) · 2024-07-27T04:52:13.435Z · comments (4)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

How the AI safety technical landscape has changed in the last year, according to some practitioners
tlevin (trevor) · 2024-07-26T19:06:47.126Z · comments (6)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (5)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Measuring Structure Development in Algorithmic Transformers
Micurie (micurie) · 2024-08-22T08:38:02.140Z · comments (4)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (7)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (27)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (8)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (24)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)

[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)

AI and the Technological Richter Scale
Zvi · 2024-09-04T14:00:08.625Z · comments (8)

[link] JumpReLU SAEs + Early Access to Gemma 2 SAEs
Senthooran Rajamanoharan (SenR) · 2024-07-19T16:10:54.664Z · comments (10)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)

Misnaming and Other Issues with OpenAI's “Human Level” Superintelligence Hierarchy
Davidmanheim · 2024-07-15T05:50:17.770Z · comments (2)

How do we know that "good research" is good? (aka "direct evaluation" vs "eigen-evaluation")
Ruby · 2024-07-19T00:31:38.332Z · comments (21)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

[link] Robin Hanson AI X-Risk Debate — Highlights and Analysis
Liron · 2024-07-12T21:31:02.222Z · comments (7)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

A Path out of Insufficient Views
Unreal · 2024-09-24T20:00:27.332Z · comments (34)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)


Recent comments

green_leaf on You can, in fact, bamboozle an unaligned AI into sparing your life

This is incorrect - in a p-zombie, the information processing isn't accompanied by any first-person experience. So if p-zombies are possible, we both do the information processing, but only I am conscious. The p-zombie doesn't believe it's conscious; it only acts that way.

You correctly believe that having the correct information processing always goes hand in hand with believing in consciousness, but that's because p-zombies are impossible. If they were possible, this wouldn't be the case, and we would have special access to the truth that p-zombies lack.

avturchin on You can, in fact, bamboozle an unaligned AI into sparing your life

A young unaligned AI will also not know whether post-singularity humans will honor the commitment, so it will estimate the chance as 0.5, and in that case the young AI will still want to follow the deal.

yurii-burak-1 on Should Effective Altruism be at war with North Korea?

What's so bad about DPRK tho?

christian-z-r on D&D.Sci April 2021 Evaluation and Ruleset

Alas, you got me with the two different kinds of merfolk: I ended up greatly overestimating the crab people and underestimating the merfolk. Great fun, and a very interesting dataset to go through.

christian-z-r on D&D.Sci April 2021: Voyages of the Gray Swan

My guesses after some pretty dirty analysis:

By looking at the data, squinting at some sum curves, fitting some very speculative lines and sacrificing a pigeon to the diviners, I guess that of the 2300 ships that were lost:

1000 were due to crab people (their damage distribution doesn't look like any of the others', so I am guessing 50% of their attacks do more than 100% damage)

700 were due to Demon Whales (it looks like at least 2/3 of their attacks do more than 100% damage)

300 were due to merpeople (perhaps 10% of their attacks do more than 100% damage)

200 were due to Nessie (around 15-20% of its attacks seem to do more than 100% damage)

The last two are the most imprecise. None of the remaining encounters look like they could have any serious chance of sinking a ship, so let us leave the last 100 or thereabouts to them.
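As a quick sanity check on the tally above, the four guesses sum to 2200 of the 2300 losses, which is where the leftover "100 or thereabouts" comes from. A minimal Python sketch of that bookkeeping, using only the commenter's own rough estimates (the numbers are guesses, not data from the ruleset):

```python
# Loss attributions guessed in the comment above (estimates, not data).
estimated_losses = {
    "Crab people": 1000,
    "Demon Whales": 700,
    "Merpeople": 300,
    "Nessie": 200,
}

TOTAL_LOST = 2300  # ships lost in the dataset, per the comment

attributed = sum(estimated_losses.values())
remainder = TOTAL_LOST - attributed  # left over for all other encounters

for monster, losses in estimated_losses.items():
    # Share of all losses attributed to each monster, rounded to a whole percent.
    print(f"{monster}: {losses} ships ({losses / TOTAL_LOST:.0%})")
print(f"Attributed: {attributed}, remaining: {remainder}")  # Attributed: 2200, remaining: 100
```

On these guesses, crab people alone would account for a bit over 40% of all sinkings.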

Judging by this, I suggest getting Armed Carpenters, 20 Oars, and either 3 cannons or a bribe for the merpeople. I will go with the cannons, since Varsuvius law tells me that fewer merpeople means more other monsters, potentially Demon Whales. Lastly, if less damage matters more than saving money, we could get foam swords: they won't save our lives, but they might save on ship repair costs in the long run.

mary-chernyshenko on Social Dark Matter

I know a guy.

The first thing I thought about him was that he had to be a hitman, judging by his freezers. Yet he was helping my family - practically saving my family, at the moment, I was simply scared out of my mind. We are friends.

And one day, a bit later, he said over tea: I paid for a Ukraine soldier to be "made a hero" (sent to the front lines) because he was blackmailing my woman, who used to be his woman.

I said nothing. It could be that the soldier lived, and I have a thing about blackmailing.

And one day, much later, he said over wine: years ago, I killed a homeless man who was refusing to leave. People here know about it. Nobody ever gives me any trouble.

It took some swallowing. But I managed. Now, though, I really dread to hear confidences, for all that I have only two friends.

lesswronguser123 on Shortform

I remember a point Yampolskiy made on a podcast about the impossibility of AGI alignment: as a young field, AI safety had underwhelming low-hanging fruit. I wonder if all of the major low-hanging ones have been plucked.

stephen-fowler on You can, in fact, bamboozle an unaligned AI into sparing your life

"After all, the only thing I know that the AI has no way of knowing, is that I am a conscious being, and not a p-zombie or an actor from outside the simulation. This gives me some evidence, that the AI can't access, that we are not exactly in the type of simulation I propose building, as I probably wouldn't create conscious humans."

Assuming for the sake of argument that p-zombies could exist, you do not have special access to the knowledge that you are truly conscious and not a p-zombie.

(As a human convinced I'm currently experiencing consciousness, I agree this claim intuitively seems absurd.)

Imagine a generally intelligent, agentic program which can only interact and learn facts about the physical world via making calls to a limited, high level interface or by reading and writing to a small scratchpad. It has no way to directly read its own source code.

The program wishes to learn some fact about the physical server rack it is being instantiated on. It knows the rack has been painted either red or blue.

Conveniently, the interface it accesses has the function get_rack_color(). The program records to its memory that every time it runs this function, it has received "blue".

It postulates the existence of programs similar to itself, who have been physically instantiated on red server racks but consistently receive incorrect color information when they attempt to check.

Can the program confirm the color of its server rack?

You are a meat-computer with limited access to your internals, but every time you try to determine whether you are conscious, you conclude that you feel that you are. You believe it is possible for variant meat-computers to exist that are not conscious but always conclude they are when attempting to check.

You have no special access to the knowledge that you are/aren't a p-zombie, although it feels like you do.
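The server-rack analogy can be made concrete with a toy model. Only get_rack_color() comes from the comment; the Rack class, the scratchpad_after_checks helper, and the rule that a faulty interface always reports the opposite color are hypothetical illustrations. The sketch shows that a program on an honestly reported blue rack and a program on a misreported red rack end up with identical records:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy model of the thought experiment. With only two colors,
# "consistently incorrect" is modeled as always reporting the other one.
OPPOSITE = {"red": "blue", "blue": "red"}


@dataclass
class Rack:
    true_color: str
    interface_is_faithful: bool

    def get_rack_color(self) -> str:
        """The only channel the program has for learning the rack's color."""
        if self.interface_is_faithful:
            return self.true_color
        # A faulty interface consistently reports the wrong color.
        return OPPOSITE[self.true_color]


def scratchpad_after_checks(rack: Rack, n_calls: int = 5) -> list[str]:
    """What the program records after calling get_rack_color() n_calls times."""
    return [rack.get_rack_color() for _ in range(n_calls)]


honest_blue = Rack("blue", interface_is_faithful=True)
deceived_red = Rack("red", interface_is_faithful=False)

# Both programs end up with identical records, so nothing in the record
# lets either one tell which rack it is actually running on.
print(scratchpad_after_checks(honest_blue))
print(scratchpad_after_checks(honest_blue) == scratchpad_after_checks(deceived_red))
```

Calling get_rack_color() more times does not help: repetition only adds more copies of the same uninformative evidence.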

adastra22 on What are the best arguments for/against AIs being "slightly 'nice'"?

There are two necessary parts to Darwinian evolution: variation and selection. Without either, we wouldn’t be talking about an evolutionary process.

Variation will accumulate mutations that are a hindrance to goals whose time horizons are longer than short-term survival needs. This is why we age, for example. Aging is NOT, as sometimes claimed, “selected for” by evolution. Rather, any beneficial mutation that would delay or defer aging would never exert selective pressure, because the host organism probably died from other things (e.g. being eaten by a tiger) first. Organisms that exist at the top of the food chain or are otherwise protected do feel this pressure, though, which is why Greenland sharks and sea turtles are effectively immortal as far as we know. (Google will tell you otherwise, but those numbers are all-cause mortality.)

If we are talking about evolving an AI agent (which is what I thought this discussion was about), then we should expect variation and selection pressures to leave long-term goals entirely disconnected from short-term survival needs, with long-term goals being effectively a random walk within the constraints of what the selected short-term goals actually permit.
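The variation-without-selection point can be illustrated with a deliberately crude toy model. It is an assumption of this sketch, not anything from the comment, that selection acts as a pull back toward an optimum at 0 and that both traits mutate by the same Gaussian steps; the model also ignores the "constraints" the comment mentions. Under those assumptions, the selected trait stays near its optimum while the unselected one drifts like a random walk:

```python
from __future__ import annotations

import random
import statistics


def simulate_lineage(generations: int, seed: int, pull: float = 0.5) -> tuple[float, float]:
    """Toy model: two traits get identically distributed random mutations each
    generation, but only the short-term trait is selected (pulled back toward
    an assumed optimum at 0). Returns (selected_trait, unselected_trait)."""
    rng = random.Random(seed)
    selected = unselected = 0.0
    for _ in range(generations):
        selected += rng.gauss(0.0, 1.0)    # variation
        unselected += rng.gauss(0.0, 1.0)  # variation, with no selection at all
        selected *= pull                   # selection toward the optimum
    return selected, unselected


# Average over many independent lineages so the comparison is not one lucky run.
runs = [simulate_lineage(1000, seed) for seed in range(50)]
drift_selected = statistics.mean(abs(s) for s, _ in runs)
drift_unselected = statistics.mean(abs(u) for _, u in runs)
print(f"mean |selected trait|:   {drift_selected:.2f}")
print(f"mean |unselected trait|: {drift_unselected:.2f}")
```

Under these assumptions the selected trait settles into a narrow band around 0, while the unselected trait's typical distance from its start keeps growing roughly with the square root of the number of generations.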

mateusz-baginski on Shortform

the approaches that have been attracting the most attention and funding are dead ends