LessWrong 2.0 Reader

[link] Linkpost: A Post Mortem on the Gino Case
Linch · 2023-10-24T06:50:42.896Z · comments (7)
South Bay SSC Meetup, San Jose, November 5th
David Friedman (david-friedman) · 2023-10-24T04:50:50.974Z · comments (1)
AI Pause Will Likely Backfire (Guest Post)
jsteinhardt · 2023-10-24T04:30:02.113Z · comments (6)
Human wanting
TsviBT · 2023-10-24T01:05:39.374Z · comments (1)
[link] Towards Understanding Sycophancy in Language Models
Ethan Perez (ethan-perez) · 2023-10-24T00:30:48.923Z · comments (0)
Manifold Halloween Hackathon
Austin Chen (austin-chen) · 2023-10-23T22:47:18.462Z · comments (0)
Open Source Replication & Commentary on Anthropic's Dictionary Learning Paper
Neel Nanda (neel-nanda-1) · 2023-10-23T22:38:33.951Z · comments (12)
[link] The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
EJT (ElliottThornley) · 2023-10-23T21:00:48.398Z · comments (22)
[link] AI Alignment [Incremental Progress Units] this Week (10/22/23)
Logan Zoellner (logan-zoellner) · 2023-10-23T20:32:37.998Z · comments (0)
z is not the cause of x
hrbigelow · 2023-10-23T17:43:59.563Z · comments (2)
Some of my predictable updates on AI
Aaron_Scher · 2023-10-23T17:24:34.720Z · comments (8)
Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation
Fabien Roger (Fabien) · 2023-10-23T16:37:45.611Z · comments (3)
Machine Unlearning Evaluations as Interpretability Benchmarks
NickyP (Nicky) · 2023-10-23T16:33:04.878Z · comments (2)
[link] VLM-RM: Specifying Rewards with Natural Language
ChengCheng (ccstan99) · 2023-10-23T14:11:34.493Z · comments (2)
Contra Dance Dialect Survey
jefftk (jkaufman) · 2023-10-23T13:40:08.294Z · comments (0)
[question] Which LessWrongers are (aspiring) YouTubers?
Mati_Roy (MathieuRoy) · 2023-10-23T13:21:49.004Z · answers+comments (13)
[question] What is an "anti-Occamian prior"?
Zane · 2023-10-23T02:26:10.851Z · answers+comments (22)
AI Safety is Dropping the Ball on Clown Attacks
trevor (TrevorWiesinger) · 2023-10-22T20:09:31.810Z · comments (72)
The Drowning Child
Tomás B. (Bjartur Tómas) · 2023-10-22T16:39:53.016Z · comments (8)
Announcing Timaeus
Jesse Hoogland (jhoogland) · 2023-10-22T11:59:03.938Z · comments (15)
[link] Into AI Safety - Episode 0
jacobhaimes · 2023-10-22T03:30:57.865Z · comments (1)
Thoughts On (Solving) Deep Deception
Jozdien · 2023-10-21T22:40:10.060Z · comments (2)
Best effort beliefs
Adam Zerner (adamzerner) · 2023-10-21T22:05:59.382Z · comments (9)
How toy models of ontology changes can be misleading
Stuart_Armstrong · 2023-10-21T21:13:56.384Z · comments (0)
Soups as Spreads
jefftk (jkaufman) · 2023-10-21T20:30:08.320Z · comments (0)
Which COVID booster to get?
Sameerishere · 2023-10-21T19:43:04.273Z · comments (0)
Alignment Implications of LLM Successes: a Debate in One Act
Zack_M_Davis · 2023-10-21T15:22:23.053Z · comments (50)
How to find a good moving service
Ziyue Wang (VincentWang25) · 2023-10-21T04:59:07.814Z · comments (0)
Apply for MATS Winter 2023-24!
Rocket (utilistrutil) · 2023-10-21T02:27:34.350Z · comments (6)
[question] Can we isolate neurons that recognize features vs. those which have some other role?
Joshua Clancy (joshua-clancy) · 2023-10-21T00:30:11.758Z · answers+comments (2)
Muddling Along Is More Likely Than Dystopia
Jeffrey Heninger (jeffrey-heninger) · 2023-10-20T21:25:15.459Z · comments (10)
What's Hard About The Shutdown Problem
johnswentworth · 2023-10-20T21:13:27.624Z · comments (31)
Holly Elmore and Rob Miles dialogue on AI Safety Advocacy
jacobjacob · 2023-10-20T21:04:32.645Z · comments (30)
TOMORROW: the largest AI Safety protest ever!
Holly_Elmore · 2023-10-20T18:15:18.276Z · comments (25)
The Overkill Conspiracy Hypothesis
ymeskhout · 2023-10-20T16:51:20.308Z · comments (8)
I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines
307th · 2023-10-20T16:37:46.541Z · comments (32)
Internal Target Information for AI Oversight
Paul Colognese (paul-colognese) · 2023-10-20T14:53:00.284Z · comments (0)
On the proper date for solstice celebrations
jchan · 2023-10-20T13:55:02.999Z · comments (0)
Are (at least some) Large Language Models Holographic Memory Stores?
Bill Benzon (bill-benzon) · 2023-10-20T13:07:02.041Z · comments (4)
[link] Mechanistic interpretability of LLM analogy-making
Sergii (sergey-kharagorgiev) · 2023-10-20T12:53:26.550Z · comments (0)
[link] How To Socialize With Psycho(logist)s
Sable · 2023-10-20T11:33:46.066Z · comments (11)
Revealing Intentionality In Language Models Through AdaVAE Guided Sampling
jdp · 2023-10-20T07:32:28.749Z · comments (15)
Features and Adversaries in MemoryDT
Joseph Bloom (Jbloom) · 2023-10-20T07:32:21.091Z · comments (6)
[link] AI Safety Hub Serbia Soft Launch
DusanDNesic · 2023-10-20T07:11:48.389Z · comments (1)
Announcing new round of "Key Phenomena in AI Risk" Reading Group
DusanDNesic · 2023-10-20T07:11:09.360Z · comments (2)
Unpacking the dynamics of AGI conflict that suggest the necessity of a preemptive pivotal act
Eli Tyre (elityre) · 2023-10-20T06:48:06.765Z · comments (2)
[link] Genocide isn't Decolonization
robotelvis · 2023-10-20T04:14:07.716Z · comments (19)
Trying to understand John Wentworth's research agenda
johnswentworth · 2023-10-20T00:05:40.929Z · comments (11)
Boost your productivity, happiness and health with this one weird trick
ajc586 (Adrian Cable) · 2023-10-19T23:30:54.734Z · comments (9)
[link] A Good Explanation of Differential Gears
Johannes C. Mayer (johannes-c-mayer) · 2023-10-19T23:07:46.354Z · comments (4)