LessWrong 2.0 Reader

Atlantis: Berkeley event venue available for rent
Jonas V (Jonas Vollmer) · 2023-11-22T01:47:12.026Z · comments (0)

A Model-based Approach to AI Existential Risk
Sammy Martin (SDM) · 2023-08-25T10:32:16.817Z · comments (9)

AI #36: In the Background
Zvi · 2023-11-02T18:00:01.803Z · comments (5)

Is Light Drinking Protective?
jefftk (jkaufman) · 2023-07-31T03:00:02.658Z · comments (8)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (12)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] AI Rights for Human Safety
Simon Goldstein (simon-goldstein) · 2024-08-01T23:01:07.252Z · comments (6)

[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)

[link] NYT on the Manifest forecasting conference
Austin Chen (austin-chen) · 2023-10-09T21:40:16.732Z · comments (14)

Startup Roundup #2
Zvi · 2024-08-06T13:30:06.554Z · comments (0)

Dating Roundup #3: Third Time’s the Charm
Zvi · 2024-05-08T13:30:03.232Z · comments (26)

[link] Paper: Tell, Don't Show- Declarative facts influence how LLMs generalize
Owain_Evans · 2023-12-19T19:14:26.423Z · comments (4)

[link] Book review: Everything Is Predictable
PeterMcCluskey · 2024-05-27T03:33:53.857Z · comments (0)

Principled Satisficing To Avoid Goodhart
JenniferRM · 2024-08-16T19:05:27.204Z · comments (2)

[link] Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez (ethan-perez) · 2023-11-16T20:18:51.730Z · comments (3)

A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)

On Tapping Out
Screwtape · 2023-11-17T03:23:55.880Z · comments (13)

AI #53: One More Leap
Zvi · 2024-02-29T16:10:04.049Z · comments (0)

The Gemini Incident Continues
Zvi · 2024-02-27T16:00:05.648Z · comments (6)

Monthly Roundup #18: May 2024
Zvi · 2024-05-13T12:30:04.863Z · comments (10)

AI #72: Denying the Future
Zvi · 2024-07-11T15:00:05.865Z · comments (8)

Some open-source dictionaries and dictionary learning infrastructure
Sam Marks (samuel-marks) · 2023-12-05T06:05:21.903Z · comments (7)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

AI #32: Lie Detector
Zvi · 2023-10-05T13:50:05.030Z · comments (19)

Quick thoughts on the implications of multi-agent views of mind on AI takeover
Kaj_Sotala · 2023-12-11T06:34:06.395Z · comments (14)

AI #38: Let’s Make a Deal
Zvi · 2023-11-16T19:50:05.442Z · comments (2)

On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche
Zack_M_Davis · 2024-01-09T23:12:20.349Z · comments (31)

D&D.Sci Long War: Defender of Data-mocracy Evaluation & Ruleset
aphyer · 2024-05-14T03:35:10.586Z · comments (3)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (9)

[link] Level up your spreadsheeting
angelinahli · 2024-05-25T14:57:19.730Z · comments (11)

On Trust
johnswentworth · 2023-12-06T19:19:07.680Z · comments (26)

[link] LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery (arjun-panickssery) · 2024-04-17T21:09:12.007Z · comments (1)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (8)

Truthseeking, EA, Simulacra levels, and other stuff
Elizabeth (pktechgirl) · 2023-10-27T23:56:49.198Z · comments (12)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

Auditing failures vs concentrated failures
ryan_greenblatt · 2023-12-11T02:47:35.703Z · comments (0)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

[link] Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby · 2024-02-02T05:49:11.189Z · comments (1)

[link] Fluent dreaming for language models (AI interpretability method)
tbenthompson (ben-thompson) · 2024-02-06T06:02:59.296Z · comments (4)

D&D.Sci Long War: Defender of Data-mocracy
aphyer · 2024-04-26T22:30:15.780Z · comments (20)

Higher-Order Forecasts
ozziegooen · 2024-05-22T21:49:42.802Z · comments (1)

What does davidad want from «boundaries»?
Chipmonk · 2024-02-06T17:45:42.348Z · comments (1)

Commonsense Good, Creative Good
jefftk (jkaufman) · 2023-09-27T19:50:07.486Z · comments (11)

[link] Amazon to invest up to $4 billion in Anthropic
Davis_Kingsley · 2023-09-25T14:55:35.983Z · comments (8)

Announcing Atlas Computing
miyazono · 2024-04-11T15:56:31.241Z · comments (4)

[link] Manifund: What we're funding (weeks 2-4)
Austin Chen (austin-chen) · 2023-08-04T16:00:33.227Z · comments (2)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

Userscript to always show LW comments in context vs at the top
Vlad Sitalo (harcisis) · 2023-11-21T17:53:30.418Z · comments (8)

AI #60: Oh the Humanity
Zvi · 2024-04-18T14:10:02.281Z · comments (7)

Recent comments

screwtape on Open Thread Fall 2024

From my observations it's fairly common for post-rationalists to go to rationalist events and vice versa, so there's at least engagement on the level of waving hello in the lunchroom. There's enough overlap in identification that some people in both categories read each other's blogs, and the essays that wind up at the intersection of both interests will have some back and forth in the comments. Are you looking for something more substantial than that?

I can't think of any reverting rationalists off the top of my head, though they might well be out there.

nina-panickssery on Alexander Gietelink Oldenziel's Shortform

Isn't this already the commonly-accepted reason why sunglasses are cool?

Anyway, Claude agrees with you (see 1 and 3)

screwtape on Open Thread Fall 2024

I think the best Less Wrong Census for mental illness would be 2016, though 2012 did ask about autism. You're probably going to have better luck using the 2024 SSC/ACX survey data, as it's more recent and bigger.

Have fun!

queelius on Bitter lessons about lucid dreaming

It may be that what we call lucid dreaming is just an instance of dreaming that “you” are aware that “you” are dreaming. Consider the following:

(1) It must be possible for humans to have dreams where the narrative includes the notion that you are aware of dreaming—where the dream constructs a scenario that mimics lucidity, but this awareness is merely part of the dream’s content.

(2) It is not necessarily possible for humans to engage in lucid dreaming, as traditionally understood.

Distinguishing between lucid dreaming and (1) seems challenging, though evidence does exist that increases the posterior probability of true lucid dreaming (e.g., eye movement tests and neurological correlates of consciousness).

I'm just a layman on this topic, so my views on this may be somewhat naive and uninformed.

remmelt-ellen on [deleted]

Donation opportunities for restricting AI companies

Pause AI: protests and lobbying
Stop AI: barricading OpenAI
FoxGlove: a legal non-profit targeting big tech scaling.
Disruption Network Institute: whistleblowers and researchers reporting on military misuses of AI.
Distributed AI Research Institute: AI ethics researchers giving general AI advocates hell and advocating for specialised models serving communities.
European Guild for AI Regulation: lobbying in the EU against the data scraping that makes large models possible.
Concept Art Association: lobbying in the EU against the data scraping that makes large models possible.

In my pipeline:

funding a 'horror documentary' against AI by an award-winning documentary maker (got a speculation grant of $50k)
funding lawyers in the EU for some high-profile lawsuits and targeted consultations with the EU AI Office.

If you're a donor, I can give you details on their current activities. I worked with staff in each of these organisations. DM me.

thane-ruthenis on LLMs can learn about themselves by introspection

One is introspecting on your current mental state ("I feel a headache starting")

That's mostly what I had in mind as well. It still implies the ability to access a hierarchical model of your current state.

You're not just able to access low-level facts like "I am currently outputting the string 'disliked'", you also have access to high-level facts like "I disliked the third scene because it was violent", "I found the plot arcs boring", "I hated this movie", from which the low-level behaviors are generated.

Or using your example, "I feel a headache starting" is itself a high-level claim. The low-level claim is "I am experiencing a negative-valence sensation from the sensory modality A of magnitude X", and the concept of a "headache" is a natural abstraction over a dataset of such low-level sensory experiences.

scarcegreengrass on A Narrow Path: a plan to deal with AI extinction risk

It sounds like the core idea is a variant of the Intelligence Manhattan Project idea, but with a focus on long term international stability & a ban on competitors.

Perhaps the industry would be more likely to adopt this plan if GUARD could seek revenue the way corporations currently do: by selling stock & API subscriptions. This would also increase productivity for GUARD & shorten the dangerous arms race interval.

david-johnston on A brief theory of why we think things are good or bad

I have a pedantic and a non-pedantic answer to this. Pedantic: you say X is "usually considered good" if it increases welfare. Perhaps you mean to imply that if X is usually considered good then it is good. In this case, I refer you to the rest of the paragraph you quote.

Non-pedantic: yes, it's true that once you accept some fundamental assumptions about goodness and badness you can go about theorising and looking for evidence. I'm suggesting that motivated reasoning is the mechanism that makes those fundamental assumptions believable.

I added a paragraph mentioning this, because I think your reaction is probably common.

gordon-seidoh-worley on Information vs Assurance

The information/assurance split feels quite familiar to me as an engineering manager.

My work life revolves around projects, especially big projects that take months to complete. Other parts of the business depend on when these projects will be done. In some cases, the entire company's growth plans may hinge on my team completing a project by a certain time. And so everyone wants as much assurance as possible about when projects will complete.

This makes it really hard to share information, because people are so hungry for assurance they will interpret almost any sharing of information as assurance. A typical conversation I used to have when I was naive to this fact:

Sales manager: Hey, Gordon, when do you think that project will be done?
Me: Oh, if things go according to plan, probably next month.
Sales manager: Cool, thanks for the update!

If the project ships next month, no problem. But as often happens in software engineering, if the project gets delayed, now the sales manager is upset:

Them: Hey, you said it would be ready next month. What gives?
Me: I said if things went according to plan, but there were surprises, so it took us longer than we initially thought it would.
Them: Dammit. I sold a customer on the assumption that the project was shipping this month! What am I supposed to tell them now?
Me: I don't know, why did you do that? I was giving you an internal estimate, not a promise of delivery.
Them: You said this month. I'm tired of Engineering always having some excuse about why stuff is delayed.

What did I do wrong? I failed to understand that Sales, and most other functions in a software business, are so dependent on and hungry for information from Engineering that they saw the assurance they wanted to see rather than the information I was giving.

I've (mostly) learned my lesson. I have to carefully control how much I say to anyone not directly involved in the project, lest they get the wrong idea.

Someone: Hey, Gordon, when do you think that project will be done?
Me: We're working on it. We set a goal of having it complete by end of next quarter.

Do I actually expect it to take all the way to next quarter? No. Most likely it'll be done next month. But if anything unexpected happens, now I've given a promise I can keep.

This isn't exactly just "underpromise, overdeliver". That's part of it, but it's also about noticing when you're accidentally making a promise even though you think you're not. Even if you say really explicitly that you're not making a promise, someone will interpret it as one, and now you'll have to deal with that.

roko on If far-UV is so great, why isn't it everywhere?

Yes, certain places like preschools might benefit even from an isolated install.

But that is kind of exceptional.

The world isn't an efficient market, especially because people are kind of set in their ways and like to stick to the defaults unless there is strong social pressure to change.