LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Thoughts on SB-1047
ryan_greenblatt · 2024-05-29T23:26:14.392Z · comments (1)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

[link] Are There Examples of Overhang for Other Technologies?
Jeffrey Heninger (jeffrey-heninger) · 2023-12-13T21:48:08.954Z · comments (50)

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

[link] An Opinionated Evals Reading List
Marius Hobbhahn (marius-hobbhahn) · 2024-10-15T14:38:58.778Z · comments (0)

Measuring Coherence of Policies in Toy Environments
dx26 (dylan-xu) · 2024-03-18T17:59:08.118Z · comments (9)

New paper shows truthfulness & instruction-following don't generalize by default
joshc (joshua-clymer) · 2023-11-19T19:27:30.735Z · comments (0)

[link] Towards shutdownable agents via stochastic choice
EJT (ElliottThornley) · 2024-07-08T10:14:24.452Z · comments (7)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (22)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

tailcalled on Three Notions of "Power"

Slavery was abolished and remains abolished through dominance:

first by getting outlawed by the Northern US and Great Britain, who drew strong economic benefit from higher labor prices due to them industrializing earlier for geographic reasons,
secondly by leveraging state dominance during the great depression to demand massive increases in quality and quantity of production, to make it feasible to maintain a non-slave-holding society without having excess labor forces being forced to starve,
thirdly, endless policies that use state violence and reduce the total fertility rate as a side-effect,

Throughout most of history, there has been excess labor, making the value of work fall down close to the cost of subsistence, being only sustainable because landowners see natural fluctuations in their production and therefore desire to keep people around even if it doesn't make short-term economic sense. This naturally creates serfdom and indentured servitude.

It's only really prisoners of war (e.g. African-American chattel slaves) who are slaves due to dominance; ordinary slavery is just poor bargaining power.

tapatakt on I turned decision theory problems into memes about trolleys

Death in Damascus is easy, but boring.

Doomsday Argument is not a decision theory problem... but it can be turned into one... I think the trolley version would be too big and complicated, though.

Obviously, only problems with discrete choice can be expressed as a Trolley problems.

quetzal_rainbow on I turned decision theory problems into memes about trolleys

Wow, XOR-trolley really made me think

going-durden on Three Notions of "Power"

Dominance underlies the things that can be done most efficiently with dominance. The moment dominance is no longer the most efficient force, it collapses, because in the vast majority of cases, dominating others takes a lot of time, energy and effort. This is actually how and why slavery (pretty much the most powerful example of dominance) was abolished: it started to make less economic sense than Bargaining (paid employment of freemen) and just Getting Things Done (through better tools and ultimately machines), so even its most ardent supporters became dispirited.

templarrr on Occupational Licensing Roundup #1

Two things to note.

First - I feel like putting every occupation in the same pile and deciding are you for or against licensing isn't helpful? I personally don't need licensed lawnmower, but I would very much prefer licensed doctor. The cost of mistake in two occupations differs a lot and can be used for a threshold which jobs should require a license.

Second - there should be a difference between doing a thing to yourself (argument can be made even that here we shouldn't have any limits), doing things for free to your friends/relatives with their full knowledge of your skill level and experience (most of the non life-threatening things can probably be allowed here) and selling your craft for money.

cubefox on What TMS is like

Interesting, I really hope TMS gains more acceptance. By the way, according to studies, ECT (the precursor of TMS) is even more effective, though it does have more side effects, due to the required anesthesia, and it is gatekept even more strongly. In my youth I suffered from depression for several years, and all of this likely would have been avoidable with a few ECT sessions (TMS wasn't a thing yet), if it wasn't for the medical system's irrational bias in favor of exclusively using SSRIs and CBT. I think this happens because most medical staff have no idea how terrible depression can be, so they don't get the sense of urgency they'd get from more visible diseases.

vgillioz on vgillioz's Shortform

Pre-registering bcbe94590cf03bb7fbcb1ef04e0f779b1e39037d247ebb83b82ee18d5bcbd59f

going-durden on When do "brains beat brawn" in Chess? An experiment

A related thought: an intelligence can only work on the information that it has, regardless of its veracity, and it can only work on information that actually exists.

My hunch is that the plan of "AI boostraps itself to superintelligence, then superpower, then wipes out humanity" relies on it having access to information that is too well hidden to divine through sheer calculation and infogathering, regardless of its intelligence (ex: the location of all the military bunkers, and nuclear submarines humanity has), or simply does not exist (ex: future Human strategic choices based on coin-flips).

Most AI Apocalypse scenarios depend not only on the AI being superhumanly smart, but being inexplicably Omniscient about things that nobody could be plausibly Omniscient about.

lone17 on Refusal in LLMs is mediated by a single direction

Thanks for the insight on the locality check experiment.

For inducing refusal, I used the code from the demo notebook provided in your post. It doesn't have a section on inducing refusal but I just invert the difference-in-means vector and set the intervention layer to the single layer where said vector was extracted. I believe this has the same effect as what you described, which is to apply the intervention to every token at a single layer. Will checkout your repo to see if I missed something. Thank you for the discussion.

going-durden on Bitter lessons about lucid dreaming

this might not actually be always beneficial. Lucid dreaming also means you remember much more from the dreams, which can extend the lifespan of your recurring nightmares. Not to mention, if you dream lucidly, your consciousness is not resting, and intrusive thoughts will pile up.