LessWrong 2.0 Reader

some thoughts on LessOnline
Raemon · 2024-05-08T23:17:41.372Z · comments (5)

[link] Datasets that change the odds you exist
dynomight · 2024-06-29T18:45:14.385Z · comments (4)

AI things that are perhaps as important as human-controlled AI
Chi Nguyen · 2024-03-03T18:07:24.291Z · comments (4)

Spatial attention as a “tell” for empathetic simulation?
Steven Byrnes (steve2152) · 2024-04-26T15:10:58.040Z · comments (12)

Introducing the WeirdML Benchmark
Håvard Tveit Ihle (havard-tveit-ihle) · 2025-01-16T11:38:17.056Z · comments (13)

Towards a formalization of the agent structure problem
Alex_Altair · 2024-04-29T20:28:15.190Z · comments (6)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

Toward Safety Case Inspired Basic Research
Lucas Teixeira · 2024-10-31T23:06:32.854Z · comments (3)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

[link] a space habitat design
bhauth · 2024-11-25T17:28:48.481Z · comments (13)

Vegans need to eat just enough Meat - emperically evaluate the minimum ammount of meat that maximizes utility
Johannes C. Mayer (johannes-c-mayer) · 2024-12-22T22:08:31.971Z · comments (35)

Safe Predictive Agents with Joint Scoring Rules
Rubi J. Hudson (Rubi) · 2024-10-09T16:38:16.535Z · comments (10)

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi (mtrazzi) · 2024-08-24T04:30:11.807Z · comments (0)

Calculating Natural Latents via Resampling
johnswentworth · 2024-06-06T00:37:42.127Z · comments (4)

[link] Building intuition with spaced repetition systems
Jacob G-W (g-w1) · 2024-05-12T15:49:04.860Z · comments (8)

[link] Demis Hassabis — Google DeepMind: The Podcast
Zach Stein-Perlman · 2024-08-16T00:00:04.712Z · comments (8)

[link] Questions are usually too cheap
Nathan Young · 2024-05-11T13:00:54.302Z · comments (19)

How the AI safety technical landscape has changed in the last year, according to some practitioners
tlevin (trevor) · 2024-07-26T19:06:47.126Z · comments (6)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (26)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

AI #99: Farewell to Biden
Zvi · 2025-01-16T14:20:05.768Z · comments (5)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

Luck Based Medicine: No Good Very Bad Winter Cured My Hypothyroidism
Elizabeth (pktechgirl) · 2024-12-08T20:10:02.651Z · comments (3)

[link] OpenAI releases GPT-4o, natively interfacing with text, voice and vision
Martín Soto (martinsq) · 2024-05-13T18:50:52.337Z · comments (23)

On Deliberative Alignment
Zvi · 2025-02-11T13:00:07.683Z · comments (1)

Provably Safe AI: Worldview and Projects
Ben Goldhaber (bgold) · 2024-08-09T23:21:02.763Z · comments (43)

On DeepSeek’s r1
Zvi · 2025-01-22T19:50:17.168Z · comments (2)

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

AI Safety as a YC Startup
Lukas Petersson (lukas-petersson-1) · 2025-01-08T10:46:29.042Z · comments (9)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Monthly Roundup #17: April 2024
Zvi · 2024-04-15T12:10:03.126Z · comments (4)

[question] What Have Been Your Most Valuable Casual Conversations At Conferences?
johnswentworth · 2024-12-25T05:49:36.711Z · answers+comments (21)

Predict 2025 AI capabilities (by Sunday)
Jonas V (Jonas Vollmer) · 2025-01-15T00:16:05.034Z · comments (3)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

Math-to-English Cheat Sheet
nahoj · 2024-04-08T09:19:40.814Z · comments (5)

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning
rife (edgar-muniz) · 2025-01-15T22:59:46.321Z · comments (31)

Extended Interview with Zhukeepa on Religion
Ben Pace (Benito) · 2024-08-18T03:19:05.625Z · comments (59)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

Tax Price Gouging?
jefftk (jkaufman) · 2025-01-17T14:10:03.395Z · comments (20)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (34)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

Book Review: Righteous Victims - A History of the Zionist-Arab Conflict
Yair Halberstadt (yair-halberstadt) · 2024-06-24T11:02:03.490Z · comments (8)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

Recent comments

bernhard on Export Surplusses

Well please do derive it then, because to me it seems you just focused on one aspect and then concluded that that aspect definitely is the correct answer.

If the goal was to reward the best and the brightest, then why does China make some of them disappear from time to time? Why reeducate the odd billionaire who misbehaves? The idea was to get him in power because he knows better and generates riches, no?

On giving away stuff for 'free': what would be good examples in your opinion? Steel? Silicon or finished solar cells? Electric cars and batteries? Masks useful during pandemics?

Seriously?

I agree with you on the network effects and winner-takes-all mechanics. But to me that is not related at all to exports and their subsidies. Just making good stuff at a reasonable price is enough. If desired, production can be subsidized, sure, but that has positive effects in your country as well. Chinese people own lots of real estate, top-of-the-line electronics and electric cars, more so than we do.

norimori1992 on [NSFW] The BDSM Path to No-Mind

For what it's worth, I still prefer the original title, even after seeing the rationale for changing it. Oh well.

bernhard on Export Surplusses

They cash in politically.

Imagine some Middle Eastern country, rich in oil and money, going abroad to help develop a sub-Saharan economy and extract its resources. Imagine China doing the same. Who will have more success?

purplehermann on xpostah's Shortform

Our crux is whether the investment needed to build one has a positive expected return, which breaks down into:

1. Whether you could populate such a city
2. Whether this is a "try everything regardless of cost" issue, given that a replacement is being developed for other reasons.

I suggest focusing on 1, as it's pretty fundamental to your idea and easier to get traction on.

davidmanheim on Alignment can be the ‘clean energy’ of AI

That is completely fair, and I was being uncharitable (which is evidently what happens when I post before I have my coffee, apologies.)

I do worry that we're not being clear enough that we don't have solutions for this worryingly near-term problem, and think that there's far too little public recognition that this is a hard or even unsolvable problem.

chipmonk on Prizes for ML Safety Benchmark Ideas

what came of this? (doing research on bounties, prizes, and retroactive funding rn)

kaj_sotala on Kaj's shortform feed

I dreamt that you could donate LessWrong karma to other LW users. LW was also an airport, and a new user had requested donations because to build a new gate at the airport, your post needed to have at least 60 karma and he had a plan to construct a series of them. Some posts had exactly 60 karma, with titles like "Gate 36 done, let's move on to the next one - upvote the Gate 37 post!".

(If you're wondering what the karma donation mechanism was needed for if users could just upvote the posts normally - I don't know.)

Apparently the process of constructing gates was separate from connecting them to the security control, and things had stopped at gate 36/37 because it needed to be connected up with security first. I got the impression that this was waiting for the security people to get it done.

seth-herd on Dream, Truth, & Good

I really like this general direction of work: suggestions for capabilities that would also help with understanding and controlling network behavior. That would in turn be helpful for real alignment of network-based AGI. Proposing dual-use capability advances seems like a way to get alignment ideas actually implemented. That's what I've done in System 2 Alignment [LW · GW], although that's also a prediction about what developers might try for alignment by default.

Whether the approach you outline here would work is an empirical question, but it sounds likely enough that teams might actually put some effort into it. Preprocessing data to identify authors and similar categories wouldn't be that hard.

This helps with the problem Nate Soares characterized as making cognition aimable at all [LW · GW] - having AI pursue one coherent goal (separately from worrying about whether you can direct that "goal slot" toward something that actually works). I think that's the alignment issue you're addressing (along with slop potentially leading to bad AI-assisted alignment). I briefly describe the LLM agent alignment part of that issue in Seven sources of goals in LLM agents [LW · GW].

I hope I'm reading you right about why you think reducing AI slop would help with alignment.

legionnaire on Have LLMs Generated Novel Insights?

It's hard to see what a novel insight is exactly. Any example can be argued against. Can you give an example of one? Or of one you've personally had?

Various LLMs can spot issues in code bases that are not public. Do all of these count?

yair-halberstadt on what an efficient market feels from inside

I think a classic example of an efficient market is one where goods are mostly fungible, e.g. the market for grain, or screws of a particular specification, or copper.

I imagine that inside those markets it feels a lot less like there are any good deals to sniff out. There are definitely bad ones, like fraudsters or subpar quality, or someone selling holy screws for 10 times the price, or someone just preying on newcomers to the market who aren't yet calibrated to the standard price, but these are fairly easy to filter out with a bit of due diligence.