LessWrong 2.0 Reader

Self location for LLMs by LLMs: Self-Assessment Checklist.
weightt an (weightt-an) · 2024-09-26T19:57:31.707Z · comments (0)

[link] A primer on ML in antibody engineering
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-23T17:03:07.628Z · comments (0)

Updating the NAO Simulator
jefftk (jkaufman) · 2024-10-30T13:50:06.908Z · comments (0)

[link] Intention-to-Treat (Re: How harmful is music, really?)
kqr · 2024-09-18T18:44:41.128Z · comments (0)

Switching to a 4GB SD
jefftk (jkaufman) · 2024-09-23T11:20:05.432Z · comments (1)

Conversational Signposts—An Antidote to Dull Social Interactions
Declan Molony (declan-molony) · 2024-10-22T05:37:56.175Z · comments (6)

Spooky Recommendation System Scaling
phdead · 2024-10-31T22:00:51.728Z · comments (0)

Sample Prevalence vs Global Prevalence
jefftk (jkaufman) · 2024-07-08T21:00:03.809Z · comments (0)

Motte-and-Bailey: a Short Explanation
Lorec · 2024-10-23T22:29:55.074Z · comments (0)

[link] The Computational Complexity of Circuit Discovery for Inner Interpretability
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-17T13:18:46.378Z · comments (2)

Beyond Defensive Technology
ejk64 · 2024-10-14T11:34:24.595Z · comments (1)

Tall tales and long odds
Solenoid_Entity · 2024-08-10T15:22:16.958Z · comments (0)

Using Dangerous AI, But Safely?
habryka (habryka4) · 2024-11-16T04:29:20.914Z · comments (2)

[link] Mechanistic Anomaly Detection Research Update
Nora Belrose (nora-belrose) · 2024-08-06T10:33:26.031Z · comments (0)

Palisade is hiring: Exec Assistant, Content Lead, Ops Lead, and Policy Lead
Charlie Rogers-Smith (charlie.rs) · 2024-10-09T00:04:03.837Z · comments (0)

Controlled Creative Destruction
Martin Sustrik (sustrik) · 2024-07-08T04:36:52.274Z · comments (0)

[question] Has Anyone Here Consciously Changed Their Passions?
Spade · 2024-09-09T01:36:26.197Z · answers+comments (12)

Switching to a Yamaha P-121 Keyboard
jefftk (jkaufman) · 2024-10-02T02:20:02.284Z · comments (0)

[question] Pondering how good or bad things will be in the AGI future
Sherrinford · 2024-07-09T22:46:31.874Z · answers+comments (9)

[link] Comparing Forecasting Track Records for AI Benchmarking and Beyond
ChristianWilliams · 2024-09-25T21:01:15.975Z · comments (0)

Organisation for Program Equilibrium reading group
Smaug123 · 2024-07-25T19:11:02.332Z · comments (14)

On passing Complete and Honest Ideological Turing Tests (CHITTs)
Aryeh Englander (alenglander) · 2024-07-10T04:01:33.567Z · comments (2)

[link] AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?
Corin Katzke (corin-katzke) · 2024-08-21T18:09:33.284Z · comments (0)

Which evals resources would be good?
Marius Hobbhahn (marius-hobbhahn) · 2024-11-16T14:24:48.012Z · comments (0)

Substituting Talkbox for Breath Controller
jefftk (jkaufman) · 2024-10-27T19:10:03.768Z · comments (0)

[link] AISafety.info: What are Inductive Biases?
Algon · 2024-09-19T17:26:24.581Z · comments (4)

We Don't Just Let People Die—So What Next?
James Stephen Brown (james-brown) · 2024-08-03T01:04:49.756Z · comments (8)

Restructuring Pop Songs for Contra
jefftk (jkaufman) · 2024-08-18T14:10:04.029Z · comments (0)

[link] OpenAI’s cybersecurity is probably regulated by NIS Regulations
Adam Jones (domdomegg) · 2024-10-25T11:06:38.392Z · comments (2)

On epistemic autonomy
sanyer (santeri-koivula) · 2024-08-31T18:50:43.377Z · comments (0)

[link] [Linkpost] Building Altruistic and Moral AI Agent with Brain-inspired Affective Empathy Mechanisms
Gunnar_Zarncke · 2024-11-04T10:15:35.550Z · comments (0)

A Policy Proposal
phdead · 2024-09-29T20:45:34.745Z · comments (4)

Krona Compare
jefftk (jkaufman) · 2024-07-20T01:10:03.994Z · comments (0)

Request for advice: Research for Conversational Game Theory for LLMs
Rome Viharo (rome-viharo) · 2024-10-16T17:53:30.243Z · comments (0)

Apply now: Get "unstuck" with the New IFS Self-Care Fellowship Program
Inga G. (inga-g) · 2024-07-16T08:18:11.436Z · comments (3)

[question] What's a good book for a technically-minded 11-year old?
Martin Sustrik (sustrik) · 2024-10-19T06:05:12.178Z · answers+comments (32)

Analysis of key AI analogies
Kevin Kohler (KevinKohler) · 2024-06-29T10:55:21.925Z · comments (2)

A “Scaling Monosemanticity” Explainer
latterframe · 2024-06-29T17:50:49.855Z · comments (0)

[question] Using hex to get murder advice from GPT-4o
Laurence Freeman (laurence-freeman) · 2024-11-13T18:30:23.475Z · answers+comments (5)

Festival Stats 2024
jefftk (jkaufman) · 2024-11-12T02:00:04.831Z · comments (0)

Review of METR’s public evaluation protocol
nahoj · 2024-06-30T22:03:08.945Z · comments (0)

Crafting Polysemantic Transformer Benchmarks with Known Circuits
Evan Anders (evan-anders) · 2024-08-23T22:03:15.288Z · comments (0)

[link] Book Review: Replacing Guilt - On Having Something to Fight For
Cole Killian (cole-killian) · 2024-11-03T19:47:35.093Z · comments (0)

On agentic generalist models: we're essentially using existing technology the weakest and worst way you can use it
Yuli_Ban · 2024-08-28T01:57:17.387Z · comments (2)

[question] Where should I look for information on gut health?
FinalFormal2 · 2024-08-20T19:44:30.632Z · answers+comments (10)

Book Review: Safe Enough? A History of Nuclear Power and Accident Risk
ErickBall · 2024-07-09T01:12:28.730Z · comments (0)

[question] I want a good multi-LLM API-powered chatbot
rotatingpaguro · 2024-09-08T09:40:52.736Z · answers+comments (3)

Summer Tour Stops
jefftk (jkaufman) · 2024-07-09T19:10:05.659Z · comments (0)

Pleasure and suffering are not conceptual opposites
MichaelStJules · 2024-08-11T18:32:30.359Z · comments (0)

[question] Does life actually locally *increase* entropy?
tailcalled · 2024-09-16T20:30:33.148Z · answers+comments (27)

Recent comments

notfnofn on D0TheMath's Shortform

Let's back up here and clarify definitions before invoking any theorems. In the language of set theory, we have a countably infinite set of finite statements. Some statements imply other statements. A subset of these statements is said to be consistent if they can all be assigned the value true such that, when following the basic rules of logic, one does not arrive at a contradiction.

The compactness theorem is helpful when $A$ is an infinite set. ZFC is a finite set of axioms, so let's ignore everything about finite subsets of $A$ and the compactness theorem; it's not relevant.

I'll now rewrite your last sentence as:

ZFC + not Consistent(ZFC) has no model <-> not Consistent(ZFC + not Consistent(ZFC))

This is true but irrelevant. Assuming ZFC is consistent, ZFC will not be able to prove its own consistency, so [not Consistent(ZFC)] can be added as an axiom without affecting its consistency. This means that ZFC + [not Consistent(ZFC)] would indeed have a model; I forget how this goes but I think it's something like "start with a model of ZFC, throw in a $c$ that's treated as a natural number and corresponds to the contradiction found in ZFC, then close". I think $c$ is automatically treated as greater than every "actual" natural number, and I think the way to show that it can be added without issue involves the compactness theorem.
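
A sketch of the standard argument behind this, assuming ZFC is consistent. The notation $\mathrm{Con}(\cdot)$, $M$, and $c$ is introduced here for illustration only, and the model is obtained from Gödel's completeness theorem rather than by an explicit construction, so it may not match the "start with a model, then close" route recalled above.

$$
\begin{aligned}
&\text{(1) Second incompleteness theorem:} && \mathrm{ZFC} \nvdash \mathrm{Con}(\mathrm{ZFC}) \\
&\text{(2) For any sentence } \varphi: && \mathrm{ZFC} \nvdash \varphi \iff \mathrm{ZFC} + \neg\varphi \text{ is consistent} \\
&\text{(3) Taking } \varphi = \mathrm{Con}(\mathrm{ZFC}): && \mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC}) \text{ is consistent} \\
&\text{(4) Completeness theorem:} && \exists M,\; M \models \mathrm{ZFC} + \neg\mathrm{Con}(\mathrm{ZFC}) \\
&\text{(5) Inside } M: && \exists c \in \omega^{M} \text{ coding a ``proof'' of } 0 = 1 \text{ from } \mathrm{ZFC}
\end{aligned}
$$

Since ZFC is assumed consistent, no standard natural number codes such a proof, so $c$ must be nonstandard, and hence larger (in $M$'s ordering) than every standard natural number.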

sharmake-farah on Lao Mein's Shortform

Maybe there's a case there, but I doubt it would get past a jury, let alone result in any guilty verdicts.

sharmake-farah on o1 is a bad idea

Oh, now I understand.

And AIs have already been superhuman at chess for a very long time, yet that domain gives very little incentive for very strong instrumental convergence.

I am claiming that for practical AIs, the results of training them in the real world with goals will give them instrumental convergence, but without further incentives, will not give them so much instrumental convergence that it leads to power-seeking to disempower humans by default.

jbash on OpenAI Email Archives (from Musk v. Altman)

I used AI assistance to generate this, which might have introduced errors.

Resulting in a strong downvote and, honestly, outright anger on my part.

Check the original source to make sure it's accurate before you quote it: https://www.courtlistener.com/docket/69013420/musk-v-altman/

If other people have to check it before they quote it, why is it OK for you not to check it before you post it?

bogdan-ionut-cirstea on johnswentworth's Shortform

Would the prediction also apply to inference scaling (laws) - and maybe more broadly various forms of scaling post-training, or only to pretraining scaling?

mondsemmel on Lao Mein's Shortform

What if whistleblowers and internal documents corroborated that they think what they're doing could destroy the world?

sharmake-farah on Lao Mein's Shortform

Notably, no law I know of allows you to take legal action on a hunch that they might destroy the world, based only on your assigning a high probability to that outcome, before they have taken any harmful actions (and no, building AI doesn't count here).

mondsemmel on Lao Mein's Shortform

Ilya is demonstrably not in on that mission, since his step immediately after leaving OpenAI was to found an additional AGI company and thus increase x-risk.

mondsemmel on Lao Mein's Shortform

I don't understand the reference to assassination. Presumably there are already laws on the books that outlaw trying to destroy the world (?), so it would be enough to apply those to AGI companies.

joe-rogero on What are Emotions?

What happens then when a non-thinking thing feels happy? Is that happiness valued? To whom? Or do you think this is impossible?

When a baby feels happy, it feels happy. Nothing else happens.

There are differences among wanting, liking, and endorsing something.

A happy blob may like feeling happy, and might even feel a desire to experience more of it, but it cannot endorse things if it doesn't have agency. Human fulfillment and wellbeing typically involve some element of all three.

An unthinking being cannot value even its own happiness, because the concept traditionally meant by "values" refers to the goals that an agent points itself at, and an unthinking being isn't agentic - it does not make plans to steer the world in any particular direction.

Then if you also say that happiness is good, and that good implies value, one must ask, who or what is valuing the happiness? The rock? The universe?

I am. When I say "happiness is good", this is isomorphic with "I value happiness". It is a statement about the directions in which I attempt to steer the world.

Like there must be some physical process by which happiness is valued. Maybe a dimension by which emotional value is expressed?

The physical process that implements "valuing happiness" is the firing of neurons in a brain. It could in theory be implemented in silicon as well, but it's near-certainly not implemented by literal rocks.

something that is challenging, and requires a certain kind of problem solving, where the solution is beautiful in some way

Yep, that makes sense. I notice, however, that these things do not appear to be emotions. And that's fine! It is okay to innately value things that are not emotions! Like "having a model of the world that is as accurate as possible", i.e. truth-seeking. Many people (especially here on LW) value knowledge for its own sake. There are emotions associated with this goal, but the emotions are ancillary. There are also instrumental reasons to seek truth, but they don't always apply. The actual goal is "improving one's world-model" or something similar. It bottoms out there. Emotions need not apply.

The key piece though is that regardless, as tslarm says, "emotions are accompanied by (or identical with, depending on definitions) valenced qualia". They always have some value.

First off, I'm not wholly convinced this is true. I think emotions are usually accompanied by valenced qualia, but (as with my comments about curiosity) not necessarily always. Sure, if you define "emotion" so that it excludes all possible counterexamples, then it will exclude all possible counterexamples, but also you will no longer be talking about the same concept as other people using the word "emotion".

Second, there is an important difference between "accompanied by valenced qualia" and "has value". There is no such thing as "inherent value", absent a thinking being to do the evaluation. Again, things like values and goals are properties of agents; they reflect the directions in which those agents steer.

Finally, more broadly, there's a serious problem with terminally valuing only the feeling of emotions. Imagine a future scenario: all feeling beings are wired to an enormous switchboard, which is in turn connected to their emotional processors. The switchboard causes them to feel an optimal mixture of emotions at all times (whatever you happen to think that means) and they experience nothing else. Is this a future you would endorse? Does something important seem to be missing?