LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

AI #81: Alpha Proteo
Zvi · 2024-09-12T13:00:07.958Z · comments (3)

Some negative steganography results
Fabien Roger (Fabien) · 2023-12-09T20:22:52.323Z · comments (5)

Understanding SAE Features with the Logit Lens
Joseph Bloom (Jbloom) · 2024-03-11T00:16:57.429Z · comments (0)

D&D.Sci: The Mad Tyrant's Pet Turtles
abstractapplic · 2024-03-29T16:22:13.732Z · comments (18)

Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled · 2024-08-17T09:57:19.370Z · comments (9)

[link] Pacing Outside the Box: RNNs Learn to Plan in Sokoban
Adrià Garriga-alonso (rhaps0dy) · 2024-07-25T22:00:55.398Z · comments (8)

On the Latest TikTok Bill
Zvi · 2024-03-13T18:50:05.398Z · comments (7)

Please do not use AI to write for you
Richard_Kennaway · 2024-08-21T09:53:34.425Z · comments (34)

Woods’ new preprint on object permanence
Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z · comments (1)

Apply to ESPR & PAIR, Rationality and AI Camps for Ages 16-21
Anna Gajdova (anna-gajdova) · 2024-05-03T12:36:37.610Z · comments (5)

[question] Shane Legg's necessary properties for every AGI Safety plan
jacquesthibs (jacques-thibodeau) · 2024-05-01T17:15:41.233Z · answers+comments (12)

Consider the humble rock (or: why the dumb thing kills you)
pleiotroth · 2024-07-04T13:54:15.593Z · comments (11)

A hermeneutic net for agency
TsviBT · 2024-01-01T08:06:30.289Z · comments (4)

[link] Announcing the $200k EA Community Choice
Austin Chen (austin-chen) · 2024-08-14T00:39:37.350Z · comments (8)

Memorizing weak examples can elicit strong behavior out of password-locked models
Fabien Roger (Fabien) · 2024-06-06T23:54:25.167Z · comments (5)

The LessWrong 2022 Review: Review Phase
RobertM (T3t) · 2023-12-22T03:23:49.635Z · comments (7)

Paper out now on creatine and cognitive performance
Fabienne · 2023-11-26T10:58:29.745Z · comments (2)

How you can help pass important AI legislation with 10 minutes of effort
ThomasW · 2024-09-14T22:10:50.386Z · comments (2)

Managing catastrophic misuse without robust AIs
ryan_greenblatt · 2024-01-16T17:27:31.112Z · comments (17)

[link] "Why I Write" by George Orwell (1946)
Arjun Panickssery (arjun-panickssery) · 2024-04-25T16:02:28.668Z · comments (2)

[link] Talk: "AI Would Be A Lot Less Alarming If We Understood Agents"
johnswentworth · 2023-12-17T23:46:32.814Z · comments (3)

The Problem With the Word ‘Alignment’
peligrietzer · 2024-05-21T03:48:26.983Z · comments (8)

Aligned AI is dual use technology
lc · 2024-01-27T06:50:10.435Z · comments (31)

[link] Against Nonlinear (Thing Of Things)
tailcalled · 2024-01-18T21:40:00.369Z · comments (18)

Mira Murati leaves OpenAI/ OpenAI to remove non-profit control
Sodium · 2024-09-25T21:15:17.315Z · comments (4)

Against empathy-by-default
Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · comments (22)

[link] Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Ozyrus · 2023-11-20T08:23:00.791Z · comments (15)

[link] microwave drilling is impractical
bhauth · 2024-06-12T22:16:00.199Z · comments (14)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To
robertzk (Technoguyrob) · 2024-03-06T05:03:09.639Z · comments (0)

Occupational Licensing Roundup #1
Zvi · 2024-10-30T11:00:04.516Z · comments (8)

[question] What's the theory of impact for activation vectors?
Chris_Leong · 2024-02-11T07:34:48.536Z · answers+comments (12)

John Schulman leaves OpenAI for Anthropic
Sodium · 2024-08-06T01:23:15.427Z · comments (0)

Voting Results for the 2022 Review
Ben Pace (Benito) · 2024-02-02T20:34:59.768Z · comments (3)

Medical Roundup #1
Zvi · 2024-01-16T20:30:35.802Z · comments (9)

The Bitter Lesson for AI Safety Research
adamk · 2024-08-02T18:39:36.884Z · comments (5)

[link] Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun (Daniel Braun) · 2024-05-17T16:25:02.267Z · comments (10)

So What's Up With PUFAs Chemically?
J Bostock (Jemist) · 2024-04-27T13:32:52.159Z · comments (23)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0
James Fox · 2024-07-06T11:34:57.227Z · comments (7)

[link] Defending against hypothetical moon life during Apollo 11
eukaryote · 2024-01-07T04:49:42.628Z · comments (9)

Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik (sustrik) · 2024-08-25T08:30:01.901Z · comments (2)

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt · 2023-12-23T00:05:55.357Z · comments (10)

On the UBI Paper
Zvi · 2024-09-03T14:50:08.647Z · comments (6)

Transfer Learning in Humans
niplav · 2024-04-21T20:49:42.595Z · comments (1)

Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth · 2024-08-08T18:58:56.046Z · comments (6)

AI #86: Just Think of the Potential
Zvi · 2024-10-17T15:10:06.552Z · comments (8)

Dual Wielding Kindle Scribes
mesaoptimizer · 2024-02-21T17:17:58.743Z · comments (18)

Now THIS is forecasting: understanding Epoch’s Direct Approach
Elliot Mckernon (elliot) · 2024-05-04T12:06:48.144Z · comments (4)

[link] Congressional Insider Trading
Maxwell Tabarrok (maxwell-tabarrok) · 2024-08-30T13:32:57.264Z · comments (6)

[link] This is Water by David Foster Wallace
Nathan Young · 2024-04-24T21:21:09.445Z · comments (16)

[link] [EAForum xpost] A breakdown of OpenAI's revenue
dschwarz · 2024-07-10T18:09:20.017Z · comments (5)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

nutrition-capsule on Overcoming Bias Anthology

Hanson seems to treat the global civilization as a cultural melting pot, but he does distinguish insular subcultures from that. I intuit he sees contemporary cultures on a gradient relative to global, hegemonic trends (which correlate with technological progress, increasing wealth and education) and thereby drifting pressures.

thane-ruthenis on The Compendium, A full argument about extinction risk from AGI

Sure. But:

and then they can do a bunch of the work of generalizing

This is the step which is best made unnecessary if you're crafting a message for a broad audience, I feel.

Most people are not going to be motivated to put this work in. Why would they? They get bombarded with a hundred credible-ish messages claiming high-importance content on a weekly basis. They don't have the time nor stamina to do a deep dive into each of them.

Which means any given subculture would generate its own "inferential bridge" between itself and your message, artefacts that do this work for the median member (consisting of reviews by any prominent subculture members, the takes that go viral, the entire shape of the discourse around the topic, etc.). The more work is needed, the longer these inferential bridges will be. The longer they are, the bigger the opportunity to willfully or accidentally mistranslate your message.

Like I said, it doesn't seem wise or even fair to your potential audience, to act as if those dynamics don't take place. As if the only people that deserve consideration are those that would put in the work themselves (despite the fact it may be a locally suboptimal way to distribute resources under their current world-model), and everyone else are lost causes.

cousin_it on Dentistry, Oral Surgeons, and the Inefficiency of Small Markets

Tragedy of capitalism in a nutshell. The best action is to dismantle the artificial scarcity of doctors. But the most profitable action is to build a company that will profit from that scarcity - and, when it gets big enough, lobby to perpetuate it.

richard_kennaway on JargonBot Beta Test

You and the LW team are indirectly responsible, but only for the general feature. You are not standing behind each individual statement the AI makes. If the author of the post does not vet it, no-one stands behind it. The LW admins can be involved only in hindsight, if the AI does something particularly egregious.

bogdan-ionut-cirstea on johnswentworth's Shortform

how recent reports of OpenAI’s o1 being deceptive have been questioned [LW(p) · GW(p)].

This seems to be confusing a dangerous capability eval (of being able to 'deceive' in a visible scratchpad) with an assessment of alignment, which seems like exactly what the 'questioning' was about.

adam_scholl on JargonBot Beta Test

I also use them rarely, fwiw. Maybe I'm missing some more productive use, but I've experimented a decent amount and have yet to find a way to make regular use even neutral (much less helpful) for my thinking or writing.

rom on MichaelDickens's Shortform

I agree with the claim you're making: that if FHI still existed and they applied for a grant from OP it would be rejected. This seems true to me.

I don't mean to nitpick, but it still feels misleading to claim "FHI could not get OP funding" when they did in fact get lots of funding from OP. It implies that FHI operated without any help from OP, which isn't true.

everydaybought on [deleted]

Preventing the onslaught of spam on the internet using digital ID's:

As LLM's start passing the turing test and beating CAPTCHA's, spammers will soon be able to pass as humans. Right now, people often draw conclusions and whole worldviews from interactions and consensus they observe online. But when bots are indistinguishable from humans, whoever has the most computing power will have the most representation online, and will be able to skew our perception of the world.

To prevent this, I think it's crucial for our sanity and epistemics that we have strong private digital identities so you can see next to a profile whether it's is a person or a bot. In order to protect anonymity, the system could use clever cryptography allowing people to prove that they are a real person but without revealing who they are (for things like whistleblowers etc). Alternatively, these systems could be limited to only knowing that you haven't spammed a certain number of requests in the past few minutes, still while protecting your anonymity.

The internet needs to be conducive to people forming consensus around facts since so many people nowadays base their opinions based on what they see online. I hope people lobby for digital ID systems to keep the internet from devolving.

tailcalled on What can we learn from insecure domains?

In crypto, a lot of people just HODL instead of using it for stuff in practice. I'd guess the more people use it, the more likely they are to run into one of the 99.9% of projects that are scams. (Though... if we count the people who've been hit by ransomware, it is non-obvious to me that the majority of users are HODLers rather than ransomeware victims.) To prevent losing one's crypto, there have also been developed techniques like "cold storage", which are extremely secure.

The HTTP server logs you posted aren't based on insecurity of most webservers, they are based on the insecurity of particular programs (or versions of programs or setups of programs). Important systems (e.g. online banking) almost always use different systems than the ones that are currently getting attacked. Attacks roll the dice in the hope that maybe they'll find someone with a known vulnerability to exploit, but presumably such exploits are extremely temporary.

Copilot is general instructed via the user of the program, and the user and is relatively trusted. I mean, people are still trying to "align" to be robust against the user, but 99.9% of the time that doesn't matter, and the remaining time is often stuff like internet harassment which is definitely not existentially risky, even if it is bad.

Some people are trying to introduce LLM agents into more general places, e.g. shops automatically handling emails from businesses. I'm pretty skeptical about this being secure, but if it turns out to be hopelessly insecure, I'd expect the shops to just decline using them.

Nuclear weapons were used twice when only the US had them. They only became existentially dangerous as multiple parties built up enormous stockpiles of them, but at the same time people understood that they were existentially dangerous and therefore avoided using them in war. More recently they've agreed that keeping such things around is bad and have been disassembling them under mutual surveillance. And they have systems set up to prevent other, less-stable countries from developing them.

habryka4 on The Compendium, A full argument about extinction risk from AGI

Like, here's a sanity-check: suppose you must convince a specific Creationist that the AGI Risk is real. Do you need to argue them out of Creationism in order to do so?

My guess is no, but also, my guess is we will probably still have better comms if I err on the side of explaining things how they come naturally to me, and entangled with the way I came to adopt a position, and then they can do a bunch of the work of generalizing. Of course, if something is deeply triggering or mindkilly to someone, then it's worth routing, but it's not like any analogy with evolution is invalid from the perspective of someone who believes in Creationism. Yes, some of the force of such an analogy would be lost, but most of it comes from the logical consistency, not the empirical evidence.