LessWrong 2.0 Reader

Apply to LASR Labs: a London-based technical AI safety research programme
Erin Robertson · 2024-04-09T17:34:06.847Z · comments (1)

When Does Altruism Strengthen Altruism?
jefftk (jkaufman) · 2024-01-21T18:50:05.424Z · comments (2)

[link] Against Student Debt Cancellation From All Sides of the Political Compass
Maxwell Tabarrok (maxwell-tabarrok) · 2024-05-13T14:55:57.525Z · comments (16)

Back to Basics: Truth is Unitary
lsusr · 2024-03-29T21:10:33.399Z · comments (13)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (3)

In defense of technological unemployment as the main AI concern
tailcalled · 2024-08-27T17:58:01.992Z · comments (36)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson (Rubi) · 2024-07-16T22:44:17.128Z · comments (27)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (3)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

[question] "Deception Genre" What Books are like Project Lawful?
Double · 2024-08-28T17:19:52.172Z · answers+comments (20)

[link] rapid growth
Chipmonk · 2024-06-05T00:43:51.501Z · comments (0)

Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd · 2024-08-12T18:10:56.543Z · comments (10)

Economics Roundup #3
Zvi · 2024-09-10T13:50:06.955Z · comments (5)

Job Listing: Managing Editor / Writer
Gretta Duleba (gretta-duleba) · 2024-02-21T23:41:26.818Z · comments (2)

[link] Non-alignment project ideas for making transformative AI go well
Lukas Finnveden (Lanrian) · 2024-01-04T07:23:13.658Z · comments (1)

Locating My Eyes (Part 3 of "The Sense of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-02-29T03:09:25.810Z · comments (4)

Why does generalization work?
Martín Soto (martinsq) · 2024-02-20T17:51:10.424Z · comments (16)

[question] Where is the Town Square?
Gretta Duleba (gretta-duleba) · 2024-02-13T03:53:18.205Z · answers+comments (8)

AXRP Episode 25 - Cooperative AI with Caspar Oesterheld
DanielFilan · 2023-10-03T21:50:07.552Z · comments (0)

[link] Jacob on the Precipice
Richard_Ngo (ricraz) · 2023-09-26T21:16:39.590Z · comments (8)

An Introduction to AI Sandbagging
Teun van der Weij (teun-van-der-weij) · 2024-04-26T13:40:00.126Z · comments (8)

[link] An EPUB of Arbital's AI Alignment section
mesaoptimizer · 2023-10-16T19:36:29.109Z · comments (1)

[link] EPUBs of MIRI Blog Archives and selected LW Sequences
mesaoptimizer · 2023-10-26T14:17:11.538Z · comments (6)

[link] How bad is chlorinated water?
bhauth · 2023-12-13T18:00:12.640Z · comments (18)

Understanding Positional Features in Layer 0 SAEs
bilalchughtai (beelal) · 2024-07-29T09:36:40.701Z · comments (0)

[link] Why Georgism Lost Its Popularity
Zero Contradictions · 2024-07-20T15:08:41.469Z · comments (50)

Incidental polysemanticity
Victor Lecomte (victor-lecomte) · 2023-11-15T04:00:00.000Z · comments (7)

2023 LessWrong Community Census, Request for Comments
Screwtape · 2023-11-01T16:32:19.102Z · comments (37)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

My intellectual journey to (dis)solve the hard problem of consciousness
Charbel-Raphaël (charbel-raphael-segerie) · 2024-04-06T09:32:41.612Z · comments (41)

Childhood and Education Roundup #4
Zvi · 2024-01-30T13:50:06.033Z · comments (10)

Sci-Fi books micro-reviews
Yair Halberstadt (yair-halberstadt) · 2024-06-24T09:49:28.523Z · comments (27)

The Case for Predictive Models
Rubi J. Hudson (Rubi) · 2024-04-03T18:22:20.243Z · comments (7)

The need for multi-agent experiments
Martín Soto (martinsq) · 2024-08-01T17:14:16.590Z · comments (3)

The Next ChatGPT Moment: AI Avatars
kolmplex (luke-man) · 2024-01-05T20:14:10.074Z · comments (10)

New Executive Team & Board — PIBBSS
Nora_Ammann · 2024-07-01T19:30:45.261Z · comments (1)

[question] Does reducing the amount of RL for a given capability level make AI safer?
Chris_Leong · 2024-05-05T17:04:01.799Z · answers+comments (22)

Case studies on social-welfare-based standards in various industries
HoldenKarnofsky · 2024-06-20T13:33:44.780Z · comments (0)

Housing Roundup #7
Zvi · 2024-03-04T15:00:08.192Z · comments (1)

Concrete empirical research projects in mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:07:21.502Z · comments (0)

US Presidential Election: Tractability, Importance, and Urgency
kuhanj · 2024-05-29T23:52:22.420Z · comments (2)

[link] Project ideas: Epistemics
Lukas Finnveden (Lanrian) · 2024-01-05T23:41:23.721Z · comments (4)

[link] Post series on "Liability Law for reducing Existential Risk from AI"
Nora_Ammann · 2024-02-29T04:39:50.557Z · comments (1)

Wholesomeness and Effective Altruism
owencb · 2024-02-28T20:28:22.175Z · comments (3)

Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Chi Nguyen · 2024-02-28T18:58:25.688Z · comments (5)

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders
Evan Anders (evan-anders) · 2024-02-27T02:43:22.446Z · comments (16)

Deep and obvious points in the gap between your thoughts and your pictures of thought
KatjaGrace · 2024-02-23T07:30:07.461Z · comments (6)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger (jeffrey-heninger) · 2024-07-09T16:50:05.776Z · comments (2)

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
Sonia Joseph (redhat) · 2024-03-13T17:09:17.027Z · comments (13)

When fine-tuning fails to elicit GPT-3.5's chess abilities
Theodore Chapman · 2024-06-14T18:50:52.855Z · comments (3)

Recent comments

drocta on A Nonconstructive Existence Proof of Aligned Superintelligence

Not if the point of the argument is to establish that a superintelligence is compatible with achieving the best possible outcome.

Here is a parody of the issue, which is somewhat unfair and leaves out almost all of your argument, but which I hope makes clear the issue I have in mind:

"Proof that a superintelligence can lead to the best possible outcome: Suppose by some method we achieved the best possible outcome. Then, there are no properties we would want a superintelligence to have beyond that, so let's call however we achieved the best possible outcome, 'a superintelligence'. Then, it is possible to have a superintelligence produce the best possible outcome, QED."

In order for an argument to be compelling for the conclusion "It is possible for a superintelligence to lead to good outcomes", you need to use a meaning of "a superintelligence" in the argument such that the statement "It is possible for a superintelligence to lead to good outcomes", when interpreted with that meaning of "a superintelligence", has the meaning you want that sentence to have. If I argue "it is possible for a superintelligence, by which I mean a computer with a clock speed faster than N, to lead to good outcomes", then, even if I convincingly argue that a computer with a clock speed faster than N can lead to good outcomes, that shouldn't convince people that a superintelligence, in the sense that they have in mind (presumably not defined as "a computer with a clock speed faster than N"), is compatible with good outcomes.

Now, in your argument you say that a superintelligence would presumably be some computational process. True enough! If you then showed that some predicate is true of every computational process, you would then be justified in concluding that that predicate is (presumably) true of every possible superintelligence. But instead, you seem to have argued that a predicate is true of some computational process, and then concluded that it is therefore true of some possible superintelligence. This does not follow.

daniel-murfet on yanni's Shortform

It might be worth knowing that some countries are participating in the "network" without having formal AI safety institutes.

gordon-seidoh-worley on The Other Existential Crisis

What will I do when I grow up, if AI can do everything?

One interesting thing about this question is that it comes from an implicit frame in which humans must do something to support their survival.

This is deeply ingrained in our biology and culture. As animals, we carry in us the well-worn drives to survive and reproduce, which, if we did not possess them, we would not exist, because our ancestors would never have created the unbroken chain of billions of years that led to us. And with those drives comes the need to do something useful to those ends.

As humans, we are enmeshed in a culture that exists at the frontier of a long process of becoming ever better at working together to get better at surviving, because those cultures that did it better outcompeted those that were worse at it. And so we approach our entire lives with this question in our minds: what actions will I take that contribute to my survival and the survival of my society?

Transformative AI stands to break the survival frame, where the problem of our survival is put into the hands of beings more powerful than ourselves. And so then the question becomes, what do we do if we don't have to do anything to survive?

I imagine quite a lot of things! Consider what it is like to be a pet kept by humans. They have all their survival needs met for them. Some of them are so inexperienced at surviving that they'd probably die if their human caretakers disappeared, and others would make it but without the experience of years of caring for their own survival to make them experts at it. What do they do given they don't have to fight to survive? They live in luxury and happiness, if their caretakers love them and are skillful, or suffering and sorrow, if their caretakers don't or aren't.

So perhaps like a dog who lives to chase a ball or a cat who lives for napping in the sun, we will one day live to tell stories, to play games, or to simply enjoy the pleasures of being alive. Let us hope that's the world we manage to create!

drocta on A Nonconstructive Existence Proof of Aligned Superintelligence

Yes, I knew the cardinalities in question were finite. The point applies regardless though. For any set X, there is no injection from 2^X to X. In the finite case, this is 2^n > n for all natural numbers n.

If there are N possible states, then the number of functions from possible states to {0,1} is 2^N, which is more than N, so there is some function from the set of possible states to {0,1} which is not implemented by any state.
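The counting argument above can be made concrete with a small diagonal construction. This is only an illustrative sketch, not part of the original comment: the three-state setup, the `table` of predicates, and the helper name `diagonal_escape` are all hypothetical, and "a state implements a function" is modelled simply as a lookup from each state to one 0/1-valued function on states.

```python
from itertools import product


def diagonal_escape(states, implements):
    """Build a 0/1 function on states that no state implements.

    For each state s, the result disagrees with implements(s) at the
    input s itself, so it cannot equal implements(s) for any s.
    """
    return {s: 1 - implements(s)[s] for s in states}


# Hypothetical toy setup: 3 states, each assigned one function states -> {0, 1}.
states = [0, 1, 2]
table = {
    0: {0: 0, 1: 1, 2: 0},
    1: {0: 1, 1: 1, 2: 1},
    2: {0: 0, 1: 0, 2: 0},
}
implements = table.__getitem__

# A function that differs from every implemented one.
missing = diagonal_escape(states, implements)
assert all(missing != implements(s) for s in states)

# Sanity check on the count: there are 2^N functions but only N states.
all_predicates = [
    dict(zip(states, bits)) for bits in product((0, 1), repeat=len(states))
]
assert len(all_predicates) == 2 ** len(states) > len(states)
```

The same construction works for any assignment of functions to states, which is why the conclusion does not depend on N being finite.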

nathan-helm-burger on If I wanted to spend WAY more on AI, what would I spend it on?

Personally, I have some long lists of ideas for things I haven't got time for including: research projects in AI, research projects in other subjects which could be advanced entirely by work on a computer (e.g. collecting and summarizing relevant facts from papers, running physical simulations of potential designs, etc), games, books, productivity tools, etc.

I've tried some of the current AI agent stuff, and nothing I've tried is quite good enough with the current set of models to automate enough of actualizing my ideas to make it worth my time. I'm prioritizing saving the lives of everyone on Earth, including everyone I love, by attempting to reduce the risk of AI catastrophe. Maybe next year, the critical point will be reached where spending a lot on inference to make many tries at each necessary step will become effective. If I could just dump a couple thousand dollars a month into AI agent inference working on my ideas, and get a handful of mostly complete projects out, then I'd be making tons of money even if my success rate for the ideas taking off were 1 in 1000.

If you aren't the sort of person who does have lists of potentially valuable projects sitting around waiting for intelligent workers to breathe life into them... I dunno. Maybe the next generation of models will be good enough to also help you with the ideation phase?

kenoubi on "Wanting" and "liking"

Thank you for writing this. It has a lot of stuff I haven't seen before (I'm only really interested in neurology insofar as it's the substrate for literally everything I care about, but that's still plenty for "I'd rather have a clue than treat the whole area as spooky stuff that goes bump in the night").

As I understand it, you and many scientists are treating energy consumption by anatomical part of the brain (as proxied by blood flow) as the main way to see "what the brain is doing". It seems possible to me that there are other ways that specific thoughts could be kept compartmentalized, e.g. which neurotransmitters are active (although I guess this correlates pretty strongly with brain region anyway) or microtemporal properties of neural pulses; but the fact that we've found any kind of reasonably consistent relationship between [brain region consuming energy] and [mental state as reported or as predicted by the situation] means that brain region is a factor used for separating / modularizing cognition, if not that it's the only such factor. So, I'll take brain region = mental module for granted for now and get to my actual question:

Do you know whether anyone has compiled data, across a wide variety of experiments or other data-gathering opportunities, of which brain regions have which kinds of correlations with one another? E.g. "these two tend to be active simultaneously", "this one tends to become active just after this one", etc.

I'm particularly interested in this for the brain regions you mention in this article, those related in various senses to good and/or to bad. If one puts both menthol and capsaicin in one's mouth at the same time, the menthol will stimulate cold receptors and the capsaicin will stimulate heat receptors, and one will have an experience out of range of what the sensors usually encounter: hot and cold, simultaneously in the same location. What I actually want to know is: are good and bad (or some forms of them, anyway) also represented in a way where one isn't actually the opposite of the other, neurologically speaking? If so, are there actual cases that are clearly best described as "good and bad", where to pick a single number instead would inevitably miss the intensity of the experience?

lao-mein on Glitch Token Catalog - (Almost) a Full Clear

It does!

'What is \'████████\'?\n\nThis term comes from the Latin for "to know". It'
'What is \'████████\'?\n\n"████████" is a Latin for "I am not",'

Putting it in the middle of code sometimes causes it to spontaneously switch to an SCP story:

' for i in █████.\n\n"I\'m not a scientist!"\n\n- Dr'

' for i in █████,\n\n[REDACTED]\n\n[REDACTED]\n\n[REDACTED] [REDACTED]\n\n[REDACTED]'

yanni-kyriacos on yanni's Shortform

Big AIS news imo: “The initial members of the International Network of AI Safety Institutes are Australia, Canada, the European Union, France, Japan, Kenya, the Republic of Korea, Singapore, the United Kingdom, and the United States.”

https://www.commerce.gov/news/press-releases/2024/09/us-secretary-commerce-raimondo-and-us-secretary-state-blinken-announce

H/T @shakeel

dagon on How Often Does Taking Away Options Help?

This seems really dependent on the distribution of games and choices faced by participants. Also the specifics of why external limits are possible but normal commitments aren’t.

rogerdearnaley on Avoiding the Bog of Moral Hazard for AI

On your categories:

As simulator theory makes clear, a base model is a random generator, per query, of members of your category 2. I view using instruction and safety training to turn that into a pretty consistent generator of a member of category 1 or 3 as inherently hard, especially 1, since it's a larger change. My guess would thus be that the personality of Claude 3.5 is closer to your category 3 than 1 (modulo philosophical questions about whether there is any meaningful difference, e.g. for ethical purposes, between "actually having" an emotion versus just successfully simulating the output of the same token stream as a person who has an emotion).