LessWrong 2.0 Reader

[link] Transformer Debugger
Henk Tillman (henk-tillman) · 2024-03-12T19:08:56.280Z · comments (0)

Clipboard Filtering
jefftk (jkaufman) · 2024-04-14T20:50:02.256Z · comments (1)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

No Electricity in Manchuria
winstonBosan · 2024-11-19T01:11:58.661Z · comments (0)

[link] Linkpost: "Imagining and building wise machines: The centrality of AI metacognition" by Johnson, Karimi, Bengio, et al.
Chris_Leong · 2024-11-11T16:13:26.504Z · comments (6)

[link] The Best Essay (Paul Graham)
Chris_Leong · 2024-03-11T19:25:42.176Z · comments (2)

[link] Was a Subway in New York City Inevitable?
Jeffrey Heninger (jeffrey-heninger) · 2024-03-30T00:53:21.314Z · comments (4)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

[question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?
corruptedCatapillar · 2024-11-01T07:29:49.262Z · answers+comments (2)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

Best-of-n with misaligned reward models for Math reasoning
Fabien Roger (Fabien) · 2024-06-21T22:53:21.243Z · comments (0)

Thoughts after the Wolfram and Yudkowsky discussion
Tahp · 2024-11-14T01:43:12.920Z · comments (13)

Twin Peaks: under the air
KatjaGrace · 2024-05-31T01:20:04.624Z · comments (2)

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)

Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)

How likely is brain preservation to work?
Andy_McKenzie · 2024-11-18T16:58:54.632Z · comments (3)

Abstractions are not Natural
Alfred Harwood · 2024-11-04T11:10:09.023Z · comments (21)

[link] A Theory of Equilibrium in the Offense-Defense Balance
Maxwell Tabarrok (maxwell-tabarrok) · 2024-11-15T13:51:33.376Z · comments (6)

[link] SB 1047 gets vetoed
ryan_b · 2024-09-30T15:49:38.609Z · comments (1)

Beta Tester Request: Rallypoint Bounties
lukemarks (marc/er) · 2024-05-25T09:11:11.446Z · comments (4)

How to put California and Texas on the campaign trail!
Yair Halberstadt (yair-halberstadt) · 2024-11-06T06:08:25.673Z · comments (4)

AXRP Episode 30 - AI Security with Jeffrey Ladish
DanielFilan · 2024-05-01T02:50:04.621Z · comments (0)

The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (9)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

[link] Report: Evaluating an AI Chip Registration Policy
Deric Cheng (deric-cheng) · 2024-04-12T04:39:45.671Z · comments (0)

[link] Conventional footnotes considered harmful
dkl9 · 2024-10-01T14:54:01.732Z · comments (16)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)

[link] Let's Design A School, Part 2.3 School as Education - The Curriculum (Phase 2, Specific)
Sable · 2024-05-15T20:58:50.981Z · comments (0)

[link] Tokyo AI Safety 2025: Call For Papers
Blaine (blaine-rogers) · 2024-10-21T08:43:38.467Z · comments (0)

[link] The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit · 2024-06-24T10:05:55.101Z · comments (0)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

Population ethics and the value of variety
cousin_it · 2024-06-23T10:42:21.402Z · comments (11)

How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)

Boring & straightforward trauma explanation
lemonhope (lcmgcd) · 2024-11-08T09:45:19.486Z · comments (7)

[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)

Good Reasons for Alts
jefftk (jkaufman) · 2024-12-21T01:30:03.113Z · comments (2)

[link] The Alignment Simulator
Yair Halberstadt (yair-halberstadt) · 2024-12-22T11:45:55.220Z · comments (3)

Second-Time Free
jefftk (jkaufman) · 2024-12-11T03:30:01.289Z · comments (4)

Visual demonstration of Optimizer's curse
Roman Malov · 2024-11-30T19:34:07.700Z · comments (3)

A few questions about recent developments in EA
Peter Berggren (peter-berggren) · 2024-11-23T02:36:25.728Z · comments (12)

How to bet on AI, without helping AGI?
Nicholas / Heather Kross (NicholasKross) · 2024-11-29T22:46:03.109Z · comments (0)

The Queen’s Dilemma: A Paradox of Control
Daniel Murfet (dmurfet) · 2024-11-27T10:40:14.346Z · comments (11)

[link] Genetically edited mosquitoes haven't scaled yet. Why?
alexey · 2024-12-30T21:37:32.942Z · comments (0)

UDT1.01: Local Affineness and Influence Measures (2/10)
Diffractor · 2024-03-31T07:35:52.831Z · comments (0)

[link] Extinction Risks from AI: Invisible to Science?
VojtaKovarik · 2024-02-21T18:07:33.986Z · comments (7)

Even if we lose, we win
Morphism (pi-rogers) · 2024-01-15T02:15:43.447Z · comments (17)

[link] Cellular respiration as a steam engine
dkl9 · 2024-02-25T20:17:38.788Z · comments (1)

Distinctions when Discussing Utility Functions
ozziegooen · 2024-03-09T20:14:03.592Z · comments (7)


Recent comments

cstinesublime on CstineSublime's Shortform

What about the incentives? PWC is apparently OpenAI's largest enterprise customer. I don't know how much PWC actually use the tools in-house and how much they use to on-sell "Digital Transformation" onto their own and new customers. How might this be affecting the way that OpenAI develop their products?

ozziegooen on Building AI Research Fleets

It's possible that from the author's perspective, the specific semantic meanings I took from terms like "automated alignment research" and "fleets" weren't implied. But if I made the mistake, I'm sure other readers will as well, so I'd like to encourage changes here before these phrases take off much further (if others agree with my take).

ozziegooen on Building AI Research Fleets

I'm happy this area is getting more attention.

I feel nervous about the terminology. I think the terminology can presuppose some specific assumptions about how this should or will play out that I don't think are likely.

"automating alignment research" -> I know this has been used before, but it sounds very high-level to me. It's like saying that all software used as part of financial trading workflows is "automating financial trading." I think it's much easier to say that software is augmenting financial trading or similar. There's not one homogeneous thing called "financial trading"; the term typically emphasises the parts that aren't yet automated. The specific ways software is integrated sometimes involve it replacing entire people, sometimes involve it helping people, and often do both in complex ways.

"Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists."
In software, the word fleet sometimes refers to specific deployment strategies. A whole lot of the automation doesn't look like "bots" - rather it's a lot of regular tools, plug-ins, helpers, etc.

"vast digital fleets of specialized AI agents working in concert"
This is one architecture we can choose, but I'm not sure how critical/significant it will be. I very much agree that AI will be a big deal, but this makes it sound like you're assuming a specific way for AI to be used.

All that said, I'm very much in favor of us taking a lot of advantage of AI systems for all the things we want in the world, including AI safety. I imagine that for AI safety, we'll probably use a very eccentric and complex mix of AI technologies. Some will directly replace some existing researchers, we'll have specific scripts for research experiments, maybe agent-like things that do ongoing oversight, etc.

jacob_cannell on How will we update about scheming?

Abilities/intelligence come almost entirely from pretraining, so all the situation awareness and scheming capability that current (and future similar) frontier models possess is thus also mostly present in the base model.

Yes, but for scheming, we care about whether the AI can self-locate itself as an AI using its knowledge. The fact that (at a minimum) sampling from the system is required for it to self-locate as an AI might make a big difference here.

So if your 'yes' above is agreeing that capabilities - including scheming - come mostly from pretraining, then I don't see how relevant it is whether or not that ability is actually used/executed much in pretraining, as the models we care about will go through post-training and I doubt you are arguing post-training will reliably remove scheming.

I also think it seems probably very hard to train a system capable of obsoleting top human experts which doesn't understand that it is an AI even if you're willing to take a big competitiveness hit.

Indeed but that is entirely the point - by construction!

Conceptually we have a recipe R (arch, algorithms, compute, etc), and a training dataset which we can parameterize by time cutoff T. Our objective (for safety research) is not to train a final agent, but instead to find a safe/good R with minimal capability penalty. All important results we care about vary with R independently of T, but competitiveness/dangerousness does vary strongly with T.

Take the same R but vary the time cutoff T of the training dataset: the dangerousness of the AI will depend heavily on T, but not the relative effectiveness of various configurations of R. That is simply a restatement of the ideal requirements for a safe experimental regime. Models/algos that work well for T of 1950 will also work for T of 2020 etc.

dakara on Rebuttals for ~all criticisms of AIXI

Since this post is about rebutting criticisms of AIXI, I feel it would be only fair to include Rob Bensinger's criticism. I considered it to be the strongest criticism of AIXI by a mile. Do you have any rebuttals for that post?

just_browsing on Everywhere I Look, I See Kat Woods

Well put and I agree.

Karma is tricky as a measure because subreddits are non-stationary. In particular, I feel like the "vibes" of all the subreddits I listed were different 6+ months ago, and they are becoming more homogenous (in part due to power users such as Kat Woods). I don't know of a way to view what the "hot" page of any given subreddit would have looked like at some previous point in time, so it's hard to find data to understand subreddit culture drift. Anyway, the high karma is also consistent with selection effects, where the users who do not like this content bounce off, and only the users that do stick around those subreddits in the long term.

just_browsing on Everywhere I Look, I See Kat Woods

Typically I agree with the underlying facts behind her memes! For example I also think AI safety is a pressing issue. If her memes were funny I would instead be writing a post about how awesome it is that Kat Woods is everywhere. My main objection is that I do not like the packaging of the ideas she is spreading. For example the memes are not funny. (See the outline of this post: content, vibes, conduct.)

You asked for an example of Kat Woods content that aims to convince rather than educate. Here is one recent example. I feel like the packaging of this meme conveys: "all of the objections you might have to the idea of X-risk via AI can actually be easily debunked, therefore you would be stupid to not believe X-risk via AI".

In reality, questions regarding the likelihood of x-risk via AI are really tricky. Many thoughtful people have thought about these problems at great length and declared them to be hard and full of uncertainty. I feel like this meme doesn't convey this at all. Therefore, I'm not sure whether it is good for people's brains to consume this content. I will certainly say it's not good for my brain to consume this content.

jamie-joyce on The Difference Between Prediction Markets and Debate (Argument) Maps

I asked a leading question of our "Perspectives" system, and it gave a few hundred hypothetical reasons (here)

cstinesublime on Shortform

I have my own theories about the intentions which I do not feel comfortable discussing, so I'll focus on the practicalities and case studies which show why this is complex and difficult to execute:
Some hostages have been killed by the IDF during rescue operations. This isn't uncommon: the lone hostage was killed during a French raid in Somalia, and in the Lindt Cafe siege in Sydney a pregnant hostage was killed by ricocheting police bullet fire when they finally stormed in, while three other hostages and a policeman were injured. That was a lone gunman, and I can imagine that the Hamas hostage takers are well-organized groups. A hostage during the Gladbeck crisis in Germany was also injured by police fire.

Kidnapping someone who "knows" the location of some hostages would, I guess, be highly ineffective for many reasons. Torture is a notoriously inaccurate source of information, hence the propensity for false admissions or telling interrogators what they want to hear. On top of that, I suspect that there is an intentional system of moving hostages from place to place, and never explicitly sharing locations with others, to minimize the risk of locations leaking.

If someone who knows the exact location of a hostage has not been heard from for 24 hours, it is probably a good idea to move to a new location anyway.

Finally there is the incredible danger to the IDF soldiers themselves going into a dynamic environment where they don't know how much resistance they will encounter, being expected to minimize the harm to hostages while almost certainly coming under fire. It's probable suicide.

jamie-joyce on The Difference Between Prediction Markets and Debate (Argument) Maps

Thank you. So what do you think the cause of that is, and why do you think that cause exists and will it always exist?