LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

flowing like water; hard like stone
lsusr · 2024-02-20T03:20:46.531Z · comments (4)

[link] Link Collection: Impact Markets
Saul Munn (saul-munn) · 2023-12-26T09:01:48.815Z · comments (0)

Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes (steve2152) · 2024-07-08T15:27:07.402Z · comments (7)

Probably Not a Ghost Story
George Ingebretsen (george-ingebretsen) · 2024-06-12T22:55:26.264Z · comments (4)

How to develop a photographic memory 2/3
PhilosophicalSoul (LiamLaw) · 2023-12-30T20:18:14.255Z · comments (7)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)

On the 2nd CWT with Jonathan Haidt
Zvi · 2024-04-05T17:30:05.223Z · comments (3)

Without Fundamental Advances, Rebellion and Coup d'État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko · 2024-01-31T10:14:02.042Z · comments (34)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

Reprograming the Mind: Meditation as a Tool for Cognitive Optimization
Jonas Hallgren · 2024-01-11T12:03:41.763Z · comments (3)

NYU Code Debates Update/Postmortem
David Rein (david-rein) · 2024-05-24T16:08:06.151Z · comments (4)

Appraising aggregativism and utilitarianism
Cleo Nardo (strawberry calm) · 2024-06-21T23:10:37.014Z · comments (10)

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra
aphyer · 2024-01-16T22:44:52.424Z · comments (1)

Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

Talk: AI safety fieldbuilding at MATS
Ryan Kidd (ryankidd44) · 2024-06-23T23:06:37.623Z · comments (2)

[link] Attention on AI X-Risk Likely Hasn't Distracted from Current Harms from AI
Erich_Grunewald · 2023-12-21T17:24:16.713Z · comments (2)

[question] How are you preparing for the possibility of an AI bust?
Nate Showell · 2024-06-23T19:13:45.247Z · answers+comments (16)

Essaying Other Plans
Screwtape · 2024-03-06T22:59:06.240Z · comments (4)

[question] Thoughts on Francois Chollet's belief that LLMs are far away from AGI?
O O (o-o) · 2024-06-14T06:32:48.170Z · answers+comments (17)

[link] Let's Design A School, Part 2.1 School as Education - Structure
Sable · 2024-05-02T22:04:30.435Z · comments (2)

Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)

AI #57: All the AI News That’s Fit to Print
Zvi · 2024-03-28T11:40:05.435Z · comments (14)

How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots
Owain_Evans · 2024-03-28T02:34:21.799Z · comments (0)

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)

[link] Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles
Zack_M_Davis · 2024-03-02T22:05:49.553Z · comments (22)

Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)

[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)

Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)

A list of all the deadlines in Biden's Executive Order on AI
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-11-01T17:14:31.074Z · comments (2)

Losing Metaphors: Zip and Paste
jefftk (jkaufman) · 2023-11-29T20:31:07.464Z · comments (6)

Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)

What is the best argument that LLMs are shoggoths?
JoshuaFox · 2024-03-17T11:36:23.636Z · comments (22)

Am I going insane or is the quality of education at top universities shockingly low?
ChrisRumanov (pseudonymous-ai) · 2023-11-20T03:53:30.056Z · comments (30)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

Agent membranes/boundaries and formalizing “safety”
Chipmonk · 2024-01-03T17:55:21.018Z · comments (46)

[link] my favourite Scott Sumner blog posts
DMMF · 2024-06-11T14:40:43.093Z · comments (0)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

[link] Forecasting future gains due to post-training enhancements
elifland · 2024-03-08T02:11:57.228Z · comments (2)

AI debate: test yourself against chess 'AIs'
Richard Willis · 2023-11-22T14:58:10.847Z · comments (35)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

habryka4 on is it possible to comment anonymously on a post?

Yeah, the officially approved way is to just generate an alt.

(But please don't vote with multiple accounts, we will probably notice, and we will ban you if we do)

lsusr on is it possible to comment anonymously on a post?

No, but you can create an alt account.

alenoach on Are we dropping the ball on Recommendation AIs?

What do you think is the main issue preventing companies from making more ethical recommendation algorithms? Is it the difficulty of determining objectively what is accurate and ethical? Or is it more about the incentives, like an unwillingness to sacrifice addictiveness and part of their audience?

zack_m_davis on Claude Sonnet 3.5.1 and Haiku 3.5

The next major update can be Claude 4.0 (and Gemini 2.0) and after that we all agree to use actual normal version numbering rather than dating?

Date-based versions aren't the most popular, but it's not an unheard of thing that Anthropic just made up: see CalVer, as contrasted to SemVer. (For things that change frequently in small ways, it's convenient to just slap the date on it rather than having to soul-search about whether to increment the second or the third number.)

nathan-helm-burger on Claude Sonnet 3.5.1 and Haiku 3.5

For raw IQ, sure. I just mean "conversational flavor".

alenoach on Are we dropping the ball on Recommendation AIs?

Good recommendation engines are really important for our epistemic environment, in my opinion more than for example prediction markets. Because it indeed affects so much of the content that people ingest in their daily lives, on a large scale.

The tough question is how tractable it is. Tournesol has some audience, but also seems to struggle to scale it up despite pretty mature software. I really don't know how effective it would be to pressure companies like Facebook or TikTok, or to push for regulation, or to conduct more research on how to improve recommendation algorithms. Seems worth investigating whether there are cost-effective opportunities, whether through grants or job recommendations.

nisan on Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence

Section 3.3(f)(iii):

Within 120 days of the date of this memorandum, DOE, acting primarily through the National Nuclear Security Administration (NNSA) and in close coordination with AISI and NSA, shall seek to develop the capability to perform rapid systematic testing of AI models’ capacity to generate or exacerbate nuclear and radiological risks. This initiative shall involve the development and maintenance of infrastructure capable of running classified and unclassified tests, including using restricted data and relevant classified threat information. This initiative shall also feature the creation and regular updating of automated evaluations, the development of an interface for enabling human-led red-teaming, and the establishment of technical and legal tooling necessary for facilitating the rapid and secure transfer of United States Government, open-weight, and proprietary models to these facilities.

It sounds like the plan is for AI labs to transmit models to government datacenters for testing. I anticipate at least one government agency will quietly keep a copy for internal use.

brasslion on Hell is wasted on the evil

This is my favorite exchange I have ever read on LessWrong.

mo-putera on Adverse Selection by Life-Saving Charities

To be clear, GiveWell won’t be shocked by anything I’ve said so far. They’ve commissioned work and published reports on this. But as you might expect, these quality of life adjustments wouldnt feature in GiveWell’s calculations anyway, since the pitch to donors is about the price paid for a life, or a DALY.

Can you clarify what you mean by these quality of life adjustments not featuring in GiveWell's calculations?

To be more concrete, let's take their CEA of HKI's vitamin A supplementation (VAS) program in Burkina Faso. They estimate that a $1M grant would avert 553 under-5 deaths (~80% of total program benefit) and incrementally increase future income for the ~560,000 additional children receiving VAS (~20%) (these figures vary considerably by location by the way, from 60 deaths averted in Anambra, Nigeria to 1,475 deaths averted in Niger) then they convert this to 81,811 income-doubling equivalents (their altruistic common denominator — they don't use DALYs in any of their CEAs, so I'm always befuddled when people claim they do), make a lot of leverage- and funging-related adjustments which reduces this to 75,272 income doublings, then compare it with the 3,355 income doublings they estimate would be generated by donating that $1M to GiveDirectly to get their 22.4x cash multiplier for HKI VAS in Burkina Faso.

So: are you saying that GiveWell should add a "QoL discount" when converting lives saved and income increase, like what Happier Lives Institute suggests [EA · GW] for non-Epicurean accounts of badness of death?

baometrus on Universal Basic Income and Poverty

I have known some rich people. They don't act as a coordinated group almost ever; and the group they don't form, is flatly not capable of accurately predicting and deliberately directing world-historical equilibria over centuries.

Beg to differ. Rich people congregate, populate, and settle in areas, and form orgs and egregore that push away and hide the poor and the problems that make them uncomfortable, and this happens largely unconsciously - phenomenologically - yet there is organizational agency in it. And this kind of wealth phenomena has certainly steered world-historical equilibria over centuries. My point is: it's not a "conspiracy" of rich people, it's a phenomenon of rich people, and I think you should include the phenomenology of wealthy populations in your modeling.