LessWrong 2.0 Reader


[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (7)

OpenAI defected, but we can take honest actions
Remmelt (remmelt-ellen) · 2024-10-21T08:41:25.728Z · comments (15)

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)
Linda Linsefors · 2024-08-23T14:18:24.327Z · comments (2)

[link] some questionable space launch guns
bhauth · 2024-10-13T22:52:26.418Z · comments (0)

[link] Four Levels of Voting Methods
hive · 2024-09-26T18:15:00.565Z · comments (3)

Is Text Watermarking a lost cause?
egor.timatkov · 2024-10-01T16:20:51.113Z · comments (13)

[link] Instruction Following without Instruction Tuning
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-24T13:49:09.078Z · comments (0)

[link] College technical AI safety hackathon retrospective - Georgia Tech
yix (Yixiong Hao) · 2024-11-15T00:22:53.159Z · comments (0)

[link] GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
ChengCheng (ccstan99) · 2024-11-01T00:10:50.718Z · comments (0)

[link] Why good things often don’t lead to better outcomes
DMMF · 2024-09-19T16:37:07.778Z · comments (1)

Slave Morality: A place for every man and every man in his place
Martin Sustrik (sustrik) · 2024-09-19T04:20:04.491Z · comments (7)

Interview with Robert Kralisch on Simulators
WillPetillo · 2024-08-26T05:49:15.543Z · comments (0)

[link] My lukewarm take on GLP-1 agonists
George3d6 · 2024-08-26T12:34:27.929Z · comments (0)

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)
Declan Molony (declan-molony) · 2024-09-10T05:54:47.000Z · comments (12)

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

Appealing to the Public
jefftk (jkaufman) · 2024-10-23T19:00:07.669Z · comments (0)

[question] Is there a CFAR handbook audio option?
FinalFormal2 · 2024-10-26T17:08:36.480Z · answers+comments (0)

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)
spencerg · 2024-10-27T17:34:50.479Z · comments (0)

[question] Does the "ancient wisdom" argument have any validity? If a particular teaching or tradition is old, to what extent does this make it more trustworthy?
SpectrumDT · 2024-11-04T15:20:14.822Z · answers+comments (49)

Review: Dr Stone
ProgramCrafter (programcrafter) · 2024-09-29T10:35:53.175Z · comments (5)

2024 NYC Secular Solstice & Megameetup
Joe Rogero · 2024-11-12T17:46:18.674Z · comments (0)

[link] Pronouns are Annoying
ymeskhout · 2024-09-18T13:30:04.620Z · comments (21)

Electric Grid Cyberattack: An AI-Informed Threat Model
moonlightmaze · 2024-11-11T21:34:17.190Z · comments (0)

Join a LessWrong Team for the Unaging System Challenge
Crissman · 2024-10-23T06:01:08.018Z · comments (5)

New Funding Category Open in Foresight's AI Safety Grants
Allison Duettmann (allison-duettmann) · 2024-11-06T22:59:41.065Z · comments (0)

Announcing the Ultimate Jailbreaking Championship
InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z · comments (1)

[link] Where is the Learn Everything System?
Shoshannah Tekofsky (DarkSym) · 2024-09-27T21:30:16.379Z · comments (8)

[link] Levers for Biological Progress - A Response to "Machines of Loving Grace"
Niko_McCarty (niko-2) · 2024-11-01T16:35:08.221Z · comments (0)

Current Attitudes Toward AI Provide Little Data Relevant to Attitudes Toward AGI
Seth Herd · 2024-11-12T18:23:53.533Z · comments (2)

LifeKeeper Diaries: Exploring Misaligned AI Through Interactive Fiction
Tristan Tran (tristan-tran) · 2024-11-09T20:58:09.182Z · comments (5)

Two arguments against longtermist thought experiments
momom2 (amaury-lorin) · 2024-11-02T10:22:11.311Z · comments (5)

[question] Any Trump Supporters Want to Dialogue?
k64 · 2024-09-28T19:41:55.370Z · answers+comments (80)

[link] What if muscle tension is sometimes signal jamming?
Chipmonk · 2024-11-04T21:08:47.800Z · comments (1)

[link] Runner's High On Demand: A Story of Luck & Persistence
Shoshannah Tekofsky (DarkSym) · 2024-09-29T17:15:29.494Z · comments (6)

Humans are (mostly) metarational
Yair Halberstadt (yair-halberstadt) · 2024-10-09T05:51:16.644Z · comments (6)

[link] The Ap Distribution
criticalpoints · 2024-08-24T21:45:35.029Z · comments (3)

[question] Looking to interview AI Safety researchers for a book
jeffreycaruso · 2024-08-24T19:57:33.119Z · answers+comments (0)

The deepest atheist: Sam Altman
Trey Edwin (Paolo Vivaldi) · 2024-10-10T03:27:34.465Z · comments (2)

[link] AI x Human Flourishing: Introducing the Cosmos Institute
Brendan McCord (brendan-mccord) · 2024-09-05T18:23:32.690Z · comments (5)

Against Explosive Growth
c.trout (ctrout) · 2024-09-04T21:45:03.120Z · comments (1)

Inverse Problems In Everyday Life
silentbob · 2024-10-15T11:42:30.276Z · comments (2)

[link] Verification methods for international AI agreements
Akash (akash-wasil) · 2024-08-31T14:58:10.986Z · comments (1)

What can we learn from insecure domains?
Logan Zoellner (logan-zoellner) · 2024-11-01T23:53:30.066Z · comments (21)

Thoughts after the Wolfram and Yudkowsky discussion
Tahp · 2024-11-14T01:43:12.920Z · comments (8)

Are LLMs on the Path to AGI?
Davidmanheim · 2024-08-30T03:14:04.710Z · comments (2)

Chaos Theory in Ecology
Elizabeth (pktechgirl) · 2024-11-09T17:50:01.727Z · comments (2)

[link] AI & wisdom 2: growth and amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:07:39.449Z · comments (0)

[link] AI & wisdom 3: AI effects on amortised optimisation
L Rudolf L (LRudL) · 2024-10-28T21:08:56.604Z · comments (0)

[link] Benefits of Psyllium Dietary Fiber in Particular
Brendan Long (korin43) · 2024-08-28T18:13:23.891Z · comments (7)

My hopes for YouCongress.com
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-22T03:20:20.939Z · comments (3)

Recent comments

clone-of-saturn on Heresies in the Shadow of the Sequences

Any agent that makes decisions has an implicit decision theory; it just might not be a very good one. I don't think anyone ever said advanced decision theory was required for AGI, only for robust alignment.

rauno-arike on Rauno's Shortform

[Link] Something weird is happening with LLMs and chess by dynomight

dynomight stacked up 13 LLMs against Stockfish on the lowest difficulty setting and found a huge difference between the performance of GPT-3.5 Turbo Instruct and any other model:

[Figure: all]

People already noticed last year that RLHF-tuned models are much worse at chess than base/instruct models, so this isn't a completely new result. The gap between models from the GPT family could also perhaps be (partially) closed through better prompting: Adam Karvonen has created a repo for evaluating LLMs' chess-playing abilities and found that many of GPT-4's losses against 3.5 Instruct were caused by GPT-4 proposing illegal moves. However, dynomight notes that there isn't nearly as big a gap between base and chat models from other model families:

[Figure: instruct comparison]

This is a surprising result to me; I had assumed that base models are now generally decent at chess after seeing the news last year about 3.5 Instruct playing at an 1800 Elo level. dynomight proposes the following four explanations for the results:

1. Base models at sufficient scale can play chess, but instruction tuning destroys it.
2. GPT-3.5-instruct was trained on more chess games.
3. There’s something particular about different transformer architectures.
4. There’s “competition” between different types of data.

anthonyc on The Case Against Moral Realism

If I'm understanding you correctly, then I strongly disagree about what ethics and meta-ethics are for, as well as what "individual selfishness" means. The questions I care about flow from "What do I care about, and why?" and "How much do I think others should or will care about these things, and why?" Moral realism and amoral nihilism are far from the only options, and neither are ones I'm interested in accepting.

anthonyc on If I care about measure, choices have additional burden (+AI generated LW-comments)

I'm not saying it improves decision making. I'm saying it's an argument for improving our decision making in general, if mundane decisions we wouldn't normally think are all that important have much larger and long-lasting consequences. Each mundane decision affects a large number of lives that parts of me will experience, in addition to the effects on others.

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

My point was that only 3 is relevant. How does it improve average decision making?

neil-warren on Politics are not serious by default

I've been at Sciences Po for a few months now. Do you have any general advice? I seem to have trouble taking the subjects seriously enough to put any real effort into them, which you seem to point out as a failure mode you skirted. I'm asking as many people as I can about this, as I'm going through a minor existential crisis. Thanks!

harfe on D0TheMath's Shortform

insofar as the simplest & best internal logical-induction market traders have strong beliefs on the subject, they may very well be picking up on something metaphysically fundamental. It's simply the simplest explanation consistent with the facts.

Theorem 4.6.2 in logical induction says that the "probability" of independent statements does not converge to $1$ or $0$, but to something in-between. So even if a mathematician says that some independent statement feels true (e.g. some objects are "really out there"), logical induction will tell him to feel uncertain about that.
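
For reference, the non-dogmatism property that this claim appears to rely on can be sketched as follows. The notation is an assumption and is not taken from the comment: $\Gamma$ is the underlying theory, $\phi$ is a sentence, and $\mathbb{P}_\infty$ denotes the limiting beliefs of the logical inductor.

$$
\Gamma \nvdash \phi \;\Longrightarrow\; \mathbb{P}_\infty(\phi) < 1,
\qquad
\Gamma \nvdash \neg\phi \;\Longrightarrow\; \mathbb{P}_\infty(\phi) > 0 .
$$

If $\phi$ is independent of $\Gamma$, both premises hold, so $0 < \mathbb{P}_\infty(\phi) < 1$.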

shreyans-jain on Effects of Non-Uniform Sparsity on Superposition in Toy Models

Hey, I'm controlling the sparsity when I'm creating the batch of data, so at that point I sample according to the probability I'm assigning to each feature.

Re: features getting baked into the bias: yeah, that might be one of the intuitions we can develop, but to me the interesting part is that this kind of behaviour didn't happen in any of the other cases, where the importance was varying; it only happened when the feature importance was equal for all of them. I don't have a concrete intuition for why that might be the case; I'm still thinking about it.
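
A minimal sketch of what per-feature sparsity control at batch-creation time could look like. This is not the post's actual code: the function name `sample_batch`, the parameter `feature_probs`, and the choice of uniform values for active features are all assumptions made for illustration.

```python
import random


def sample_batch(batch_size, feature_probs, rng):
    """Build a batch where feature j is active with probability feature_probs[j].

    Hypothetical helper: active features get a value in (0, 1], inactive
    features are exactly 0.0. Lower probabilities mean sparser features.
    """
    batch = []
    for _ in range(batch_size):
        row = []
        for p in feature_probs:
            if rng.random() < p:
                # 1.0 - random() lies in (0, 1], so active values are never 0.
                row.append(1.0 - rng.random())
            else:
                row.append(0.0)
        batch.append(row)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    # Non-uniform sparsity: each feature has its own activation probability.
    probs = [0.9, 0.5, 0.1, 0.01]
    batch = sample_batch(1000, probs, rng)
    for j, p in enumerate(probs):
        active = sum(1 for row in batch if row[j] != 0.0) / len(batch)
        print(f"feature {j}: target p={p}, empirical={active:.3f}")
```

Sampling each feature independently per example keeps the sparsity of one feature from affecting the others, which is what makes non-uniform sparsity easy to vary.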

romeostevensit on The Third Fundamental Question

One operationalization is splitting out positive and negative predictions/models in all three questions (or cost-benefit, etc.).

anthonyc on If I care about measure, choices have additional burden (+AI generated LW-comments)

I don't see #1 affecting decision making because it happens no matter what, and therefore shouldn't differ based on our own choices or values. I guess you could argue it implies an absurdly high discount rate if you see the resulting branches as sufficiently separate from one another, but if the resulting worlds are ones I care about, then the measure dilution is just the default baseline I start from in my reasoning. Unless there is some way we can or could meaningfully increase the multiplication rate in some sets of branches but not others? I don't think that's likely with any methods or tech I can foresee.

#2 seems like an argument for improving ourselves to be more mindful in our choices, so that we are more coherent on average, and #3 an argument for improving our average decision making. The main difference I can think of for how measure affects things is maybe in which features of the outcome distribution/probabilities among choices I care about.