LessWrong 2.0 Reader

The Pragmatic Side of Cryptographically Boxing AI
Bart Jaworski (bart-jaworski) · 2024-08-06T17:46:21.754Z · comments (0)

[Aspiration-based designs] A. Damages from misaligned optimization – two more models
Jobst Heitzig · 2024-07-15T14:08:15.716Z · comments (0)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

Establishing a Connection (Ch 17-20)
a littoral wizard · 2024-07-23T21:56:48.122Z · comments (2)

The Xerox Parc/ARPA version of the intellectual Turing test: Class 1 vs Class 2 disagreement
hamishtodd1 · 2024-06-30T15:34:53.729Z · comments (3)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

[link] Memorising molecular structures
dkl9 · 2024-07-12T22:40:42.307Z · comments (0)

[link] Redundant Attention Heads in Large Language Models For In Context Learning
skunnavakkam · 2024-09-01T20:08:48.963Z · comments (0)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (0)

[question] Request for AI risk quotes, especially around speed, large impacts and black boxes
Nathan Young · 2024-08-02T17:49:48.898Z · answers+comments (0)

[link] Solutions to problems with Bayesianism
B Jacobs (Bob Jacobs) · 2024-07-31T14:18:27.910Z · comments (0)

How can I get over my fear of becoming an emulated consciousness?
James Dowdell (james-dowdell) · 2024-07-07T22:02:43.520Z · comments (8)

Introduction to Modern Dating: Strategic Dating Advice for beginners
Jesper Lindholm · 2024-07-20T15:45:25.705Z · comments (5)

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom
Ghdz (gal-hadad) · 2024-08-05T18:27:20.709Z · comments (2)

A gentle introduction to sparse autoencoders
Nick Jiang (nick-jiang) · 2024-09-02T18:11:47.086Z · comments (0)

[Research log] The board of Alphabet would stop DeepMind to save the world
Lucie Philippon (lucie-philippon) · 2024-07-16T04:59:14.874Z · comments (0)

[link] Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions
James Stephen Brown (james-brown) · 2024-09-11T09:53:07.474Z · comments (0)

Activation Engineering Theories of Impact
kubanetics (jakub-nowak) · 2024-07-18T16:44:33.656Z · comments (1)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

Limitations on the Interpretability of Learned Features from Sparse Dictionary Learning
Tom Angsten (tom-angsten) · 2024-07-30T16:36:06.518Z · comments (0)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

Establishing a Connection (Ch 13-16)
a littoral wizard · 2024-07-17T23:56:23.069Z · comments (4)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[link] Labelling, Variables, and In-Context Learning in Llama2
Joshua Penman (joshua-penman) · 2024-08-03T19:36:34.721Z · comments (0)

[link] Risk Overview of AI in Bio Research
J Bostock (Jemist) · 2024-07-15T00:04:41.818Z · comments (0)

Toy Models of Superposition: what about BitNets?
Alejandro Tlaie (alejandro-tlaie-boria) · 2024-08-08T16:29:02.054Z · comments (1)

[link] Why People in Poverty Make Bad Decisions
James Stephen Brown (james-brown) · 2024-07-15T23:40:32.116Z · comments (8)

A Taxonomy Of AI System Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:07:45.224Z · comments (0)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

[link] The AI regulator’s toolbox: A list of concrete AI governance practices
Adam Jones (domdomegg) · 2024-08-10T21:15:09.265Z · comments (1)

[link] Yet Another Critique of "Luxury Beliefs"
ymeskhout · 2024-07-18T18:37:28.703Z · comments (10)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

Mentorship in AGI Safety: Applications for mentorship are open!
Valentin2026 (Just Learning) · 2024-06-28T14:49:48.501Z · comments (0)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

Notes on Tuning Metacognition
JoNeedsSleep (joanna-j-1) · 2024-07-03T19:54:59.732Z · comments (0)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

Freedom and Privacy of Thought Architectures
JohnBuridan · 2024-07-20T21:43:11.419Z · comments (2)

Ethical Deception: Should AI Ever Lie?
Jason Reid (jason-reid) · 2024-08-02T17:53:38.744Z · comments (2)

Meta: On viewing the latest LW posts
quiet_NaN · 2024-08-25T19:31:39.008Z · comments (2)

Grass Valley USA - ACX Meetups Everywhere Fall 2024
Raelifin · 2024-08-29T18:39:57.229Z · comments (0)

[link] Launching the AI Forecasting Benchmark Series Q3 | $30k in Prizes
ChristianWilliams · 2024-07-08T17:20:54.717Z · comments (0)

[question] Can agents coordinate on randomness without outside sources?
Mikhail Samin (mikhail-samin) · 2024-07-06T13:43:44.633Z · answers+comments (16)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

That which can be destroyed by the truth, should be assumed to should be destroyed by it
Thac0 · 2024-07-09T19:39:57.887Z · comments (0)

The Carnot Engine of Economics
StrivingForLegibility · 2024-08-09T15:59:40.458Z · comments (0)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (0)

Recent comments

richard_kennaway on Search 5000 books, speed up your research and personal growth

This one is sufficiently egregious that it should be deleted and the author banned. It's at best spam, at worst malware. Fortunately, the obfuscated URL does not actually work.

evan_gaensbauer on Defining alignment research

Do you mean Evan Hubinger, Evan R. Murphy, or a different Evan? (I would be surprised and humbled if it was me, though my priors on that are low.)

tailcalled on What are the best arguments for/against AIs being "slightly 'nice'"?

The big problem is excess aggregation inherent in the "AI" concept.

The world has a simple backbone of entities and ways to interact with them, and you can make software that unreflectingly propagates activity from one part of the backbone to another. Most currently addressed tasks can be solved by such software, but they haven't yet been. This software can be nice but is also extremely exploitable by adversaries. Let's call this an opportunity propagator.

Because it is exploitable, one task it cannot solve is providing security. To make something less exploitable, it needs to not just propagate things along the backbone, but also do wildly deep searches to find the most effective and robust methods. To search deeply, you need some guiding principle for the search, i.e. a utility function. Utility maximizers have all the standard AI safety issues.

Human society currently cares about human well-being because the opportunity propagators that have been arranged into an approximate utility maximizer to provide security (e.g. human military personnel arranged into NATO) depend on human thriving (even something as generous as liberty and equality allows military units to respond more dynamically to threats than traditional top-down structures do), and this dependence is then generalized in various ways to all of society. Artificial intelligence provides value by making it unnecessary to rely on humans for opportunity propagation, which breaks the natural attractor to corrigibility and promotion of human thriving that current systems have.

People intuit that there's something wrong with the utility maximizer framing because current AI seems to be evolving in a different way. That's true in the sense that opportunity propagators are a thing and constitute ~the fundamental atoms of agency. But it doesn't actually solve the alignment problem because we need utility maximizers.

tsvibt on Struggling like a Shadowmoth

Sometimes yes, but also this is a great and common excuse to be eaten.

evan_gaensbauer on Habryka's Shortform Feed

How do you square encouraging others to weigh in on EA fundraising, and presumably the assumption that anyone in the EA community can trust you as a collaborator of any sort, with your intentions, as you put it in July, to probably seek to shut down at some point in the future?

thomas-kwa on ASIs will not leave just a little sunlight for Earth

Personal communication (sorry). Not that I know him well, this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.

stefan_schubert on whestler's Shortform

Cf this Bostrom quote.

Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization - a niche we filled because we got there first, not because we are in any sense optimally adapted to it.

Re this:

In evolutionary timescales, virtually no time has elapsed since hominids began trading, utilizing complex symbolic thinking, making art, hunting large animals etc, and here we are, a blip later in high technology.

A bit nit-picky, but a recent paper studying West Eurasia found significant evolution over the last 14,000 years.

sodium on How LLMs are and are not myopic

Now that o1 explicitly does RL on CoT, next token prediction for o1 is definitely not consequence blind. The next token it predicts enters into its input and can be used for future computation.
This type of outcome-based training makes the model more consequentialist. It also makes using a single next token prediction as the natural "task" to do interpretability on even less defensible.

Anyway, I thought I should revisit this post now that o1 has come out. I couldn't help noticing that it's stylistically very different from all of the janus writing I've encountered in the past. Then I got to the end:

The ideas in the post are from a human, but most of the text was written by Chat GPT-4 with prompts and human curation using Loom.

Ha, I did notice I was confused (but didn't bother thinking about it further).

lsusr on What are the best arguments for/against AIs being "slightly 'nice'"?

Noted. The problem remains—it's just less obvious. This phrasing still conflates "intelligent system" with "optimizer", a mistake that goes all the way back to Eliezer Yudkowsky's 2004 paper on Coherent Extrapolated Volition.

For example, consider a computer system that, given a number $N$, can (usually) produce the shortest computer program that will output $N$. Such a computer system is undeniably superintelligent, but it's not a world optimizer at all.
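To make the idea concrete, here is a toy sketch of "shortest program that outputs $N$". It is not the system described above: finding the true shortest program in a general-purpose language is uncomputable, so this version assumes a tiny "language" of arithmetic expressions built from digits, `+` and `*` (with `**` for powers), and brute-forces it by length. The function name `shortest_program`, the alphabet, and the length cap are all illustrative assumptions.

```python
from itertools import product

# Hypothetical toy "language": decimal digits, '+', and '*' ('**' gives powers).
ALPHABET = "0123456789+*"

def shortest_program(n, max_len=5):
    """Return the shortest expression over ALPHABET that evaluates to n, or None.

    max_len is kept small on purpose: longer strings allow towers such as
    9**9**9, which are far too expensive to evaluate.
    """
    for length in range(1, max_len + 1):
        # Enumerate every string of this length, shortest lengths first.
        for chars in product(ALPHABET, repeat=length):
            candidate = "".join(chars)
            try:
                value = eval(candidate, {"__builtins__": {}})
            except SyntaxError:
                continue  # most strings are not valid expressions
            if value == n:
                return candidate
    return None

print(shortest_program(7))        # → 7
print(shortest_program(1000000))  # → 10**6
```

The search only looks for a description and returns it; nothing in it acts on the world, which is the distinction the example is pointing at.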

"Far away, in the Levant, there are yogis who sit on lotus thrones. They do nothing, for which they are revered as gods," said Socrates.

―The Teacup Test

raemon on What are the best arguments for/against AIs being "slightly 'nice'"?

I realize this isn’t your main point here, but I do want to flag I put ‘nice’ in quotes because I don’t mean the colloquial definition. The question here is ‘would a super intelligent system with control over the solar system spend a billionth or trillionth of its resources helping beings too weak to usefully trade with it, if it didn’t benefit directly from it?’

As I see it the question is agnostic to what sort of mind the AI is.