LessWrong 2.0 Reader

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

Instrumental deception and manipulation in LLMs - a case study
Olli Järviniemi (jarviniemi) · 2024-02-24T02:07:01.769Z · comments (13)

Logical Line-Of-Sight Makes Games Sequential or Loopy
StrivingForLegibility · 2024-01-19T04:05:44.782Z · comments (0)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

Prepsgiving, A Convergently Instrumental Human Practice
JenniferRM · 2023-11-23T17:24:56.784Z · comments (0)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

Nitric oxide for covid and other viral infections
Elizabeth (pktechgirl) · 2024-02-07T21:30:03.774Z · comments (6)

Forget Everything (Statistical Mechanics Part 1)
J Bostock (Jemist) · 2024-04-22T13:33:35.446Z · comments (6)

[link] Legalize butanol?
bhauth · 2023-12-20T14:24:33.849Z · comments (20)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (9)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

From Finite Factors to Bayes Nets
J Bostock (Jemist) · 2024-01-23T20:03:51.845Z · comments (7)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

[link] An AI Manhattan Project is Not Inevitable
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-06T16:42:35.920Z · comments (25)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

I’m confused about innate smell neuroanatomy
Steven Byrnes (steve2152) · 2023-11-28T20:49:13.042Z · comments (1)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

[link] Generative ML in chemistry is bottlenecked by synthesis
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-16T16:31:34.801Z · comments (2)

Metastatic Cancer Treatment Since 2010: The Success Stories
sarahconstantin · 2024-11-04T22:50:09.386Z · comments (0)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

[link] Characterizing stable regions in the residual stream of LLMs
Jett Janiak (jett) · 2024-09-26T13:44:58.792Z · comments (4)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes (steve2152) · 2024-10-29T13:36:16.325Z · comments (2)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

Recent comments

startattheend on Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?

Yeah, I'm asking because downvotes are far too ambiguous. I think they're ambiguous to the point that they don't make for useful feedback (you can't update a worldview for the better if you don't know what's wrong with it). I don't think downvotes are necessarily bad as a concept, though. And about humanity: sure, and on any other website I'd largely have agreed with your view, but when I talk about intellectual things I largely push my own humanity to the side. And even if somebody downvotes because of irrational feelings, I'm interested in what those feelings are.

But I know that people on here frequently value truth, and I'm quite brutal to those values, as I think truth is about as valid a concept as a semicolon (the language is just math/logic rather than English). And if we are to talk about Truth with a capital T, then we're speaking about reality, which is more fundamental than language. The territory, reality, is important, but I rarely see any good maps, even on this website. So when Taoists seem to suggest throwing the map away entirely, I do think that's a good idea for everyday life. It's only for science, research and tech that I value maps. That makes me an outlier though, haha.

vladimir_nesov on In the Name of All That Needs Saving

and makes it satisfy the desires of all the things which ought be happy

Some things don't endorse themselves being happy as an important consideration.

christopher-king on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Believing QI is the same as a Bayesian update on the event "I will become immortal".

Imagine you are a prediction market trader, and a genie appears. You ask the genie "will I become immortal" and the genie answers "yes" and then disappears.

Would you buy shares on a Taiwan war happening?

If the answer is yes, the same thing should apply if a genie told you QI is true (unless the prediction market already priced QI in). No weird anthropics math necessary!

notfnofn on Should CA, TX, OK, and LA merge into a giant swing state, just for elections?

Merging states would be far superior incentive-wise, as the influence of every signatory would increase and there would be essentially no expected net shift in the election results.

Suppose I am an Oklahoman who will always vote for the farther right candidate or a Californian who will always vote for the farther left candidate. Wouldn't I only support this if my odds on this increasing the chance of the farther right (resp. farther left) candidate winning were at least 50%? Wouldn't most people fall in one of these two categories?

seth-herd on The Case Against Moral Realism

I think this issue has been discussed at length and repeatedly on LW, leading to a weak consensus that at least strong moral realism isn't true.

Can anyone supply links to some other good posts on the topic?

alex-mallen on Anthropic: Three Sketches of ASL-4 Safety Case Components

I think that if they:
- Find a feature that is necessary for defection in all of the cases of defection in a diverse and realistic set of model organisms
- Locate that feature with very little optimization on the defection cases
I'd be fairly confident (>~80%) that it would catch any particular deployment-time defection attempt from an ASL-4 schemer. This would cut risk from the deployment by >5x when paired with compelling incrimination and a shutdown. This is a substantial risk reduction, and I might even buy this safety case if there weren't better options for this threat model (e.g. untrusted monitoring). My main concerns are:
- It seems unlikely we would get a feature that meets the above two criteria
- The feature might have too many false positives
- The possibility that the model organisms were sabotaging your eval by coordinating to route their defection attempts through a particular feature (this seems hard at ASL-4, though that claim would need an eval).
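
The 80% and 5x figures in this comment are linked by simple arithmetic: if a monitor catches a fraction p of defection attempts, only 1 - p get through, so risk falls by a factor of 1 / (1 - p). A minimal sketch of that relationship follows. It assumes, which the comment does not state explicitly, that every caught attempt is fully neutralized (the "compelling incrimination and a shutdown" condition) and that nothing else about the deployment changes. The function name `risk_reduction_factor` is hypothetical.

```python
def risk_reduction_factor(catch_prob: float) -> float:
    """Factor by which risk falls if a fraction `catch_prob` of attempts is caught.

    Assumption (not from the comment): a caught attempt contributes zero risk.
    """
    if not 0.0 <= catch_prob < 1.0:
        raise ValueError("catch_prob must be in [0, 1)")
    # Only the uncaught fraction (1 - catch_prob) still contributes to risk.
    return 1.0 / (1.0 - catch_prob)


if __name__ == "__main__":
    for p in (0.5, 0.8, 0.9):
        factor = risk_reduction_factor(p)
        print(f"catch probability {p:.0%} -> risk cut by about {factor:.1f}x")
```

Under these assumptions a catch rate above 80% corresponds to a risk reduction above 5x, which matches the figures quoted in the comment.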

zach-stein-perlman on Anthropic: Three Sketches of ASL-4 Safety Case Components

See [V1.1]

shoshannah-tekofsky on Explore More: A Bag of Tricks to Keep Your Life on the Rails

This made me unreasonably happy. Thank you :D

sarahconstantin on sarahconstantin's Shortform

links 11/07/2024: https://roamresearch.com/#/app/srcpublic/page/11-07-2024

on Donna Karan
- https://www.vogue.com/slideshow/donna-karan-seven-career-highlights-cold-shoulder
- https://www.vogue.com/article/donna-karan-vintage
https://du42p.r.a.d.sendibm1.com/mk/mr/sh/1f8JAEjGcfF85pENVqcuM6hh5D/tiVMw3KFimvC?fbclid=IwY2xjawGZnpRleHRuA2FlbQIxMQABHdHszEMzS3x1Xda2tTh60KMighJEJKDe30rsduQmFydSSyfTpK8mwG50vg_aem_IjjUQWdiwZf2AdgdLo9azA opinion on how hormonal contraception should be done differently -- I'm intrigued but I haven't yet checked these claims out
http://esr.ibiblio.org/?p=8720 Eric S. Raymond on "user stories" done right and wrong
https://endpts.com/biotech-industry-worries-over-potential-for-rfk-jr-ally-as-fda-pick/ Casey Means has been floated as the new pick for FDA head; apparently she's expressed concerns about vaccines and over-medication on the Joe Rogan podcast and has written a book about how most chronic diseases can be prevented by healthy lifestyles (which probably overstates the case)
https://www.slowboring.com/p/the-tyranny-of-climate-targets Matt Yglesias on why a lot of aggressive climate targets are impossible to actually meet.
- why do people try anyway? if it's "cheap talk", why is there so much costly, substantive follow-through? incentive misalignment, I suppose?
https://www.gordian.bio/blog/the-in-vivo-screening-revolution/ Martin Borch Jensen on in-vivo screening

buck on Anthropic: Three Sketches of ASL-4 Safety Case Components

Note that you can make untrusted monitoring work without the synthetic inputs too; you just have to do the honeypots during deployment instead of before deployment.