LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[link] Book review: Cuisine and Empire
eukaryote · 2024-01-21T06:15:12.969Z · comments (2)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas (gytis-daujotas) · 2024-08-01T21:08:38.800Z · comments (6)

instruction tuning and autoregressive distribution shift
nostalgebraist · 2024-09-05T16:53:41.497Z · comments (5)

[link] you should probably eat oatmeal sometimes
bhauth · 2024-08-25T14:50:37.570Z · comments (29)

[link] What's important in "AI for epistemics"?
Lukas Finnveden (Lanrian) · 2024-08-24T01:27:06.771Z · comments (0)

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.
Jessica Rumbelow (jessica-cooper) · 2024-08-03T12:07:46.302Z · comments (2)

Californians, tell your reps to vote yes on SB 1047!
Holly_Elmore · 2024-08-12T19:50:09.817Z · comments (24)

You're a Space Wizard, Luke
lsusr · 2024-08-18T05:35:39.238Z · comments (6)

[link] Adverse Selection by Life-Saving Charities
vaishnav92 · 2024-08-14T20:46:23.662Z · comments (14)

Instrumental deception and manipulation in LLMs - a case study
Olli Järviniemi (jarviniemi) · 2024-02-24T02:07:01.769Z · comments (13)

Forget Everything (Statistical Mechanics Part 1)
J Bostock (Jemist) · 2024-04-22T13:33:35.446Z · comments (6)

Logical Line-Of-Sight Makes Games Sequential or Loopy
StrivingForLegibility · 2024-01-19T04:05:44.782Z · comments (0)

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij (teun-van-der-weij) · 2024-01-29T00:24:27.706Z · comments (5)

[link] Understanding Gödel’s completeness theorem
jessicata (jessica.liu.taylor) · 2024-05-27T18:55:02.079Z · comments (0)

Nitric oxide for covid and other viral infections
Elizabeth (pktechgirl) · 2024-02-07T21:30:03.774Z · comments (6)

[link] [Paper] Language Models Don't Learn the Physical Manifestation of Language
Bruce W. Lee (bruce-lee) · 2024-02-22T18:52:32.237Z · comments (23)

[link] Linear infra-Bayesian Bandits
Vanessa Kosoy (vanessa-kosoy) · 2024-05-10T06:41:09.206Z · comments (5)

[link] Conflict in Posthuman Literature
Martín Soto (martinsq) · 2024-04-06T22:26:04.051Z · comments (1)

Apply to the PIBBSS Summer Research Fellowship
Nora_Ammann · 2024-01-12T04:06:58.328Z · comments (1)

Medical Roundup #3
Zvi · 2024-07-09T13:10:06.862Z · comments (4)

[link] Eight Magic Lamps
Richard_Ngo (ricraz) · 2023-10-14T04:10:02.040Z · comments (0)

Stitching SAEs of different sizes
Bart Bussmann (Stuckwork) · 2024-07-13T17:19:20.506Z · comments (12)

Prepsgiving, A Convergently Instrumental Human Practice
JenniferRM · 2023-11-23T17:24:56.784Z · comments (0)

Individually incentivized safe Pareto improvements in open-source bargaining
Nicolas Macé (NicolasMace) · 2024-07-17T18:26:43.619Z · comments (2)

[link] Legalize butanol?
bhauth · 2023-12-20T14:24:33.849Z · comments (20)

[Valence series] 5. “Valence Disorders” in Mental Health & Personality
Steven Byrnes (steve2152) · 2023-12-18T15:26:29.970Z · comments (12)

Aspiration-based Q-Learning
Clément Dumas (butanium) · 2023-10-27T14:42:03.292Z · comments (5)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

[Interim research report] Evaluating the Goal-Directedness of Language Models
Rauno Arike (rauno-arike) · 2024-07-18T18:19:04.260Z · comments (4)

Whiteboard Pen Magazines are Useful
Johannes C. Mayer (johannes-c-mayer) · 2024-07-12T17:15:33.200Z · comments (6)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

Box inversion revisited
Jan_Kulveit · 2023-11-07T11:09:36.557Z · comments (3)

The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)

When Are Results from Computational Complexity Not Too Coarse?
Dalcy (Darcy) · 2024-07-03T19:06:44.953Z · comments (7)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

Natural abstractions are observer-dependent: a conversation with John Wentworth
Martín Soto (martinsq) · 2024-02-12T17:28:38.889Z · comments (13)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

From Finite Factors to Bayes Nets
J Bostock (Jemist) · 2024-01-23T20:03:51.845Z · comments (7)

How To Do Patching Fast
Joseph Miller (Josephm) · 2024-05-11T20:13:52.424Z · comments (6)

Mud and Despair (Part 4 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-07T00:14:23.975Z · comments (0)

[link] Jailbreak steering generalization
Sarah Ball · 2024-06-20T17:25:24.110Z · comments (2)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

viliam on Search 5000 books, speed up your research and personal growth

Also, it can be spam today and malware tomorrow, if the file changes.

richard_kennaway on Search 5000 books, speed up your research and personal growth

This one is sufficiently egregious that it should be deleted and the author banned. It's at best spam, at worst malware. Fortunately, the obfuscated URL does not actually work.

evan_gaensbauer on Defining alignment research

Do you mean Evan Hubinger, Evan R. Murphy, or a different Evan? (I would be surprised and humbled if it was me, though my priors on that are low.)

tailcalled on What are the best arguments for/against AIs being "slightly 'nice'"?

The big problem is excess aggregation inherent in the "AI" concept.

The world has a simple backbone of entities and ways to interact with them, and you can make software that unreflectingly propagates activity from one part of the backbone to another. Most currently addressed tasks can be solved by such software, but they haven't yet been. This software can be nice but is also extremely exploitable by adversaries. Let's call this an opportunity propagator.

Because it is exploitable, one task it cannot solve is providing security. To make something less exploitable, it needs to not just propagate things along the backbone, but also do wildly deep searches to find the most effective and robust methods. To search deeply, you need some guiding principle for the search, i.e. a utility function. Utility maximizers have all the standard AI safety issues.

Human society currently cares about human well-being because the opportunity propagators that have been arranged into an approximate utility maximizer to provide security (e.g. human military personnel arranged into NATO) depends on human thriving (even something as generous as liberty and equality allows military units to respond more dynamically to threats than traditional top-down structures do), which is then generalized in various ways to all of society. Artificial intelligence provides value by making it unnecessary to rely on humans for opportunity propagation, which breaks the natural attractor to corrigibility and promotion of human thriving that current systems have.

People intuit that there's something wrong with the utility maximizer framing because current AI seems to be evolving in a different way. That's true in the sense that opportunity propagators are a thing and constitute ~the fundamental atoms of agency. But it doesn't actually solve the alignment problem because we need utility maximizers.

tsvibt on Struggling like a Shadowmoth

Sometimes yes, but also this is a great and common excuse to be eaten.

evan_gaensbauer on Habryka's Shortform Feed

How do you square encouraging others to weigh in on EA fundraising, and presumably the assumption that anyone in the EA community can trust you as a collaborator of any sort, with your intentions, as you put it in July, to probably seek to shut down at some point in the future?

thomas-kwa on ASIs will not leave just a little sunlight for Earth

Personal communication (sorry). Not that I know him well, this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.

stefan_schubert on whestler's Shortform

Cf this Bostrom quote.

Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization - a niche we filled because we got there first, not because we are in any sense optimally adapted to it.

Re this:

In evolutionary timescales, virtually no time has elapsed since hominids began trading, utilizing complex symbolic thinking, making art, hunting large animals etc, and here we are, a blip later in high technology.

A bit nit-picky, but a recent paper studying West Eurasia found significant evolution over the last 14,000 years.

sodium on How LLMs are and are not myopic

Now that o1 explicitly does RL on CoT, next token prediction for o1 is definitely not consequence blind. The next token it predicts enters into its input and can be used for future computation.
This type of outcome based training makes the model more consequentialist. It also makes using a single next token prediction as the natural "task" to do interpretability on even less defensible [AF · GW].

Anyways, I thought I should revisit this post after o1 comes out. I can't help noticing that it's stylistically very different from all of the janus writing I've encountered in the past, then I got to the end

The ideas in the post are from a human, but most of the text was written by Chat GPT-4 with prompts and human curation using Loom.

Ha, I did notice I was confused (but didn't bother thinking about it further)

lsusr on What are the best arguments for/against AIs being "slightly 'nice'"?

Noted. The problem remains—it's just less obvious. This phrasing still conflates "intelligent system" with "optimizer", a mistake that goes all the way back to Eliezer Yudkowsky's 2004 paper on Coherent Extrapolated Volition.

For example, consider a computer system that, given a number can (usually) produce the shortest computer program that will output $N$ . Such a computer system is undeniably superintelligent, but it's not a world optimizer at all.

"Far away, in the Levant, there are yogis who sit on lotus thrones. They do nothing, for which they are revered as gods," said Socrates.

―The Teacup Test [LW · GW]