LessWrong 2.0 Reader

Announcing the PIBBSS Symposium '24!
DusanDNesic · 2024-09-03T11:19:47.568Z · comments (0)

Looking for Goal Representations in an RL Agent - Update Post
CatGoddess · 2024-08-28T16:42:19.367Z · comments (0)

[link] Anthropic is being sued for copying books to train Claude
Remmelt (remmelt-ellen) · 2024-08-31T02:57:27.092Z · comments (4)

"Real AGI"
Seth Herd · 2024-09-13T14:13:24.124Z · comments (18)

[link] How to choose what to work on
jasoncrawford · 2024-09-18T20:39:12.316Z · comments (4)

My career exploration: Tools for building confidence
lynettebye · 2024-09-13T11:37:55.843Z · comments (0)

Why I'm bearish on mechanistic interpretability: the shards are not in the network
tailcalled · 2024-09-13T17:09:25.407Z · comments (37)

[question] Is this voting system strategy proof?
Donald Hobson (donald-hobson) · 2024-09-06T20:44:46.691Z · answers+comments (9)

[link] Why Swiss watches and Taylor Swift are AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T13:23:27.033Z · comments (11)

[link] AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-14T23:23:26.296Z · comments (1)

What program structures enable efficient induction?
Daniel C (harper-owen) · 2024-09-05T10:12:14.058Z · comments (4)

[link] My lukewarm take on GLP-1 agonists
George3d6 · 2024-08-26T12:34:27.929Z · comments (0)

[link] Pronouns are Annoying
ymeskhout · 2024-09-18T13:30:04.620Z · comments (19)

[link] Jonathan Gorard: The territory is isomorphic to an equivalence class of its maps
Daniel C (harper-owen) · 2024-09-07T10:04:47.840Z · comments (18)

Reducing global AI competition through the Commerce Control List and Immigration reform: a dual-pronged approach
Ben Smith (ben-smith) · 2024-09-03T05:28:24.549Z · comments (2)

[link] Why good things often don’t lead to better outcomes
DMMF · 2024-09-19T16:37:07.778Z · comments (1)

[question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2024-09-04T12:40:07.678Z · answers+comments (6)

Interview with Robert Kralisch on Simulators
WillPetillo · 2024-08-26T05:49:15.543Z · comments (0)

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)
Declan Molony (declan-molony) · 2024-09-10T05:54:47.000Z · comments (12)

[link] Foundations - Why Britain has stagnated [crosspost]
Nathan Young · 2024-09-23T10:43:20.411Z · comments (1)

Automating LLM Auditing with Developmental Interpretability
htlou · 2024-09-04T15:50:04.337Z · comments (0)

Announcing the Ultimate Jailbreaking Championship
InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z · comments (1)

[link] AI x Human Flourishing: Introducing the Cosmos Institute
Brendan McCord (brendan-mccord) · 2024-09-05T18:23:32.690Z · comments (5)

Slave Morality: A place for every man and every man in his place
Martin Sustrik (sustrik) · 2024-09-19T04:20:04.491Z · comments (7)

Against Explosive Growth
c.trout (ctrout) · 2024-09-04T21:45:03.120Z · comments (1)

[link] Benefits of Psyllium Dietary Fiber in Particular
Brendan Long (korin43) · 2024-08-28T18:13:23.891Z · comments (6)

[link] Verification methods for international AI agreements
Akash (akash-wasil) · 2024-08-31T14:58:10.986Z · comments (1)

Are LLMs on the Path to AGI?
Davidmanheim · 2024-08-30T03:14:04.710Z · comments (2)

[question] Building an Inexpensive, Aesthetic, Private Forum
Aaron Graifman (aaron-graifman) · 2024-09-09T17:10:42.677Z · answers+comments (15)

A bet for Samo Burja
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-05T16:01:35.440Z · comments (2)

[link] Should Sports Betting Be Banned?
Maxwell Tabarrok (maxwell-tabarrok) · 2024-09-21T14:13:35.404Z · comments (1)

[link] How to Fake Decryption
ohmurphy · 2024-09-05T09:18:41.586Z · comments (0)

Avoiding the Bog of Moral Hazard for AI
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-13T21:24:34.137Z · comments (12)

Can Large Language Models effectively identify cybersecurity risks?
emile delcourt (emile-delcourt) · 2024-08-30T20:20:21.345Z · comments (0)

[question] Has Anyone Here Consciously Changed Their Passions?
Spade · 2024-09-09T01:36:26.197Z · answers+comments (12)

[link] Intention-to-Treat (Re: How harmful is music, really?)
kqr · 2024-09-18T18:44:41.128Z · comments (0)

Switching to a 4GB SD
jefftk (jkaufman) · 2024-09-23T11:20:05.432Z · comments (1)

My hopes for YouCongress.com
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-22T03:20:20.939Z · comments (2)

Can startups be impactful in AI safety?
Esben Kran (esben-kran) · 2024-09-13T19:00:33.306Z · comments (0)

[link] A primer on ML in antibody engineering
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-23T17:03:07.628Z · comments (0)

Just How Good Are Modern Chess Computers?
nem · 2024-09-19T18:57:21.254Z · comments (1)

[question] I want a good multi-LLM API-powered chatbot
rotatingpaguro · 2024-09-08T09:40:52.736Z · answers+comments (3)

[link] In Praise of the Beatitudes
robotelvis · 2024-09-24T05:08:21.133Z · comments (0)

[link] How harmful is music, really?
dkl9 · 2024-09-17T14:53:25.426Z · comments (6)

[question] Does life actually locally *increase* entropy?
tailcalled · 2024-09-16T20:30:33.148Z · answers+comments (23)

On agentic generalist models: we're essentially using existing technology the weakest and worst way you can use it
Yuli_Ban · 2024-08-28T01:57:17.387Z · comments (2)

Keyboard Gremlins
jefftk (jkaufman) · 2024-09-20T02:30:07.140Z · comments (0)

Becket First
jefftk (jkaufman) · 2024-09-22T17:10:04.304Z · comments (0)

[link] Physics of Language models (part 2.1)
Nathan Helm-Burger (nathan-helm-burger) · 2024-09-19T16:48:32.301Z · comments (2)

[link] Virtue is a Vector
robotelvis · 2024-09-10T03:02:45.737Z · comments (1)

Recent comments

richard_kennaway on Search 5000 books, speed up your research and personal growth

This one is sufficiently egregious that it should be deleted and the author banned. It's at best spam, at worst malware. Fortunately, the obfuscated URL does not actually work.

evan_gaensbauer on Defining alignment research

Do you mean Evan Hubinger, Evan R. Murphy, or a different Evan? (I would be surprised and humbled if it was me, though my priors on that are low.)

tailcalled on What are the best arguments for/against AIs being "slightly 'nice'"?

The big problem is excess aggregation inherent in the "AI" concept.

The world has a simple backbone of entities and ways to interact with them, and you can make software that unreflectingly propagates activity from one part of the backbone to another. Most currently addressed tasks can be solved by such software, but they haven't yet been. This software can be nice but is also extremely exploitable by adversaries. Let's call this an opportunity propagator.

Because it is exploitable, one task it cannot solve is providing security. To make something less exploitable, it needs to not just propagate things along the backbone, but also do wildly deep searches to find the most effective and robust methods. To search deeply, you need some guiding principle for the search, i.e. a utility function. Utility maximizers have all the standard AI safety issues.

Human society currently cares about human well-being because the opportunity propagators that have been arranged into an approximate utility maximizer to provide security (e.g. human military personnel arranged into NATO) depend on human thriving (even something as generous as liberty and equality allows military units to respond more dynamically to threats than traditional top-down structures do), which is then generalized in various ways to all of society. Artificial intelligence provides value by making it unnecessary to rely on humans for opportunity propagation, which breaks the natural attractor to corrigibility and promotion of human thriving that current systems have.

People intuit that there's something wrong with the utility maximizer framing because current AI seems to be evolving in a different way. That's true in the sense that opportunity propagators are a thing and constitute ~the fundamental atoms of agency. But it doesn't actually solve the alignment problem because we need utility maximizers.

tsvibt on Struggling like a Shadowmoth

Sometimes yes, but also this is a great and common excuse to be eaten.

evan_gaensbauer on Habryka's Shortform Feed

How do you square encouraging others to weigh in on EA fundraising, and presumably the assumption that anyone in the EA community can trust you as a collaborator of any sort, with your intentions, as you put it in July, to probably seek to shut down at some point in the future?

thomas-kwa on ASIs will not leave just a little sunlight for Earth

Personal communication (sorry). Not that I know him well; this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.

stefan_schubert on whestler's Shortform

Cf this Bostrom quote.

Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization - a niche we filled because we got there first, not because we are in any sense optimally adapted to it.

Re this:

In evolutionary timescales, virtually no time has elapsed since hominids began trading, utilizing complex symbolic thinking, making art, hunting large animals etc, and here we are, a blip later in high technology.

A bit nit-picky, but a recent paper studying West Eurasia found significant evolution over the last 14,000 years.

sodium on How LLMs are and are not myopic

Now that o1 explicitly does RL on CoT, next token prediction for o1 is definitely not consequence blind. The next token it predicts enters into its input and can be used for future computation.
This type of outcome-based training makes the model more consequentialist. It also makes treating a single next-token prediction as the natural "task" for interpretability even less defensible.

Anyway, I thought I should revisit this post now that o1 has come out. I couldn't help noticing that it's stylistically very different from all of the janus writing I've encountered in the past; then I got to the end:

The ideas in the post are from a human, but most of the text was written by Chat GPT-4 with prompts and human curation using Loom.

Ha, I did notice I was confused (but didn't bother thinking about it further).

lsusr on What are the best arguments for/against AIs being "slightly 'nice'"?

Noted. The problem remains—it's just less obvious. This phrasing still conflates "intelligent system" with "optimizer", a mistake that goes all the way back to Eliezer Yudkowsky's 2004 paper on Coherent Extrapolated Volition.

For example, consider a computer system that, given a number $N$, can (usually) produce the shortest computer program that will output $N$. Such a computer system is undeniably superintelligent, but it's not a world optimizer at all.

"Far away, in the Levant, there are yogis who sit on lotus thrones. They do nothing, for which they are revered as gods," said Socrates.

―The Teacup Test

raemon on What are the best arguments for/against AIs being "slightly 'nice'"?

I realize this isn’t your main point here, but I do want to flag that I put ‘nice’ in quotes because I don’t mean the colloquial definition. The question here is ‘would a superintelligent system with control over the solar system spend a billionth or trillionth of its resources helping beings too weak to usefully trade with it, if it didn’t benefit directly from it?’

As I see it the question is agnostic to what sort of mind the AI is.