LessWrong 2.0 Reader

[link] AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary
Corin Katzke (corin-katzke) · 2024-10-01T20:35:32.399Z · comments (0)

New UChicago Rationality Group
Noah Birnbaum (daniel-birnbaum) · 2024-11-08T21:20:34.485Z · comments (0)

[link] Triangulating My Interpretation of Methods: Black Boxes by Marco J. Nathan
adamShimi · 2024-10-09T19:13:26.631Z · comments (0)

[link] Approval-Seeking ⇒ Playful Evaluation
Jonathan Moregård (JonathanMoregard) · 2024-08-28T21:03:51.244Z · comments (0)

[link] Contagious Beliefs—Simulating Political Alignment
James Stephen Brown (james-brown) · 2024-10-13T00:27:08.084Z · comments (0)

[link] Models of life
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-29T19:24:40.060Z · comments (0)

[link] Jailbreaking language models with user roleplay
loops (smitop) · 2024-09-28T23:43:10.870Z · comments (0)

An open response to Wittkotter and Yampolskiy
Donald Hobson (donald-hobson) · 2024-09-24T22:27:21.987Z · comments (0)

[question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion
Wenitte Apiou (wenitte-apiou) · 2024-11-01T18:56:06.900Z · answers+comments (25)

[link] Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Jonathan N (derpyplops) · 2024-11-05T01:01:08.083Z · comments (0)

LLMs are likely not conscious
research_prime_space · 2024-09-29T20:57:26.111Z · comments (8)

On Intentionality, or: Towards a More Inclusive Concept of Lying
Cornelius Dybdahl (Kalciphoz) · 2024-10-18T10:37:32.201Z · comments (0)

[link] Universal dimensions of visual representation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-08-28T10:38:58.396Z · comments (0)

[link] What is autonomy? Why boundaries are necessary.
Chipmonk · 2024-10-21T17:56:33.722Z · comments (1)

MIT FutureTech are hiring for a Head of Operations role
peterslattery · 2024-10-02T17:11:42.960Z · comments (0)

Meta AI (FAIR) latest paper integrates system-1 and system-2 thinking into reasoning models.
happy friday (happy-friday) · 2024-10-24T16:54:15.721Z · comments (0)

[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)

The Great Bootstrap
KristianRonn · 2024-10-11T19:46:51.752Z · comments (0)

[question] What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented?
Roko · 2024-10-19T06:11:12.602Z · answers+comments (16)

[link] Consciousness As Recursive Reflections
Gunnar_Zarncke · 2024-10-05T20:00:53.053Z · comments (3)

Of Birds and Bees
RussellThor · 2024-09-30T10:52:15.069Z · comments (9)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

[link] [Linkpost] Hawkish nationalism vs international AI power and benefit sharing
jakub_krys (kryjak) · 2024-10-18T18:13:19.425Z · comments (5)

Fake Blog Posts as a Problem Solving Device
silentbob · 2024-08-31T09:22:54.513Z · comments (0)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

[question] What makes one a "rationalist"?
mathyouf · 2024-10-08T20:25:21.812Z · answers+comments (5)

Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)

Deception and Jailbreak Sequence: 2. Iterative Refinement Stages of Jailbreaks in LLM
Winnie Yang (winnie-yang) · 2024-08-28T08:41:38.967Z · comments (2)

[link] Taking nonlogical concepts seriously
Kris Brown (kris-brown) · 2024-10-15T18:16:01.226Z · comments (5)

[question] Does a time-reversible physical law/Cellular Automaton always imply the First Law of Thermodynamics?
Noosphere89 (sharmake-farah) · 2024-08-30T15:12:28.823Z · answers+comments (11)

Denver USA - ACX Meetups Everywhere Fall 2024
Eneasz · 2024-08-29T18:40:53.332Z · comments (0)

Foresight Vision Weekend 2024
Allison Duettmann (allison-duettmann) · 2024-10-01T21:59:55.107Z · comments (0)

[link] Thinking LLMs: General Instruction Following with Thought Generation
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-10-15T09:21:22.583Z · comments (0)

[question] On the subject of in-house large language models versus implementing frontier models
Annapurna (jorge-velez) · 2024-09-23T15:00:32.811Z · answers+comments (1)

One person's worth of mental energy for AI doom aversion jobs. What should I do?
Lorec · 2024-08-26T01:29:01.700Z · comments (16)

[link] Boons and banes
dkl9 · 2024-09-23T06:18:38.335Z · comments (0)

[link] Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries (alexander-de-vries) · 2024-09-05T10:23:08.958Z · comments (20)

If I care about measure, choices have additional burden (+AI generated LW-comments)
avturchin · 2024-11-15T10:27:15.212Z · comments (7)

[link] Validating / finding alignment-relevant concepts using neural data
Bogdan Ionut Cirstea (bogdan-ionut-cirstea) · 2024-09-20T21:12:49.267Z · comments (0)

A brief theory of why we think things are good or bad
David Johnston (david-johnston) · 2024-10-20T20:31:26.309Z · comments (10)

Moral Trade, Impact Distributions and Large Worlds
Larks · 2024-09-20T03:45:56.273Z · comments (0)

The Personal Implications of AGI Realism
xizneb · 2024-10-20T16:43:37.870Z · comments (7)

Piling bounded arguments
momom2 (amaury-lorin) · 2024-09-19T22:27:41.534Z · comments (0)

[question] If I ask an LLM to think step by step, how big are the steps?
ryan_b · 2024-09-13T20:30:50.558Z · answers+comments (1)

A Brief Explanation of AI Control
Aaron_Scher · 2024-10-22T07:00:56.954Z · comments (1)

[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

[link] Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal
david reinstein (david-reinstein) · 2024-09-11T20:25:27.845Z · comments (4)


Recent comments

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

Wei · 3h

This post touches on several issues I've been thinking about since my early work on anthropic decision theory and UDT. Let me break this down:

1. The measure-decline problem is actually more general than just quantum mechanics. It appears in any situation where your decision algorithm gets instantiated multiple times, including classical copying, simulation, or indexical uncertainty. See my old posts on anthropic probabilities and probability-as-preference.

2. The "functional identity" argument being used here to dismiss certain types of splitting is problematic. What counts as "functionally identical" depends on your decision theory's level of grain. UDT1.1 would treat seemingly identical copies differently if they're in different computational states, while CDT might lump them together.

Some relevant questions that aren't addressed:

- How do we handle preference aggregation across different versions of yourself with different measures?
- Should we treat quantum branching differently from other forms of splitting? (I lean towards "no" these days)
- How does this interact with questions of personal identity continuity?
- What happens when we consider infinite branches? (This relates to my work on infinite ethics)

The real issue here isn't about measure per se, but about how to aggregate preferences across different instances of your decision algorithm. This connects to some open problems in decision theory:

1. The problem of preference aggregation across copies
2. How to handle logical uncertainty in the context of anthropics
3. Whether "caring about measure" can be coherently formalized

I explored some of these issues in my paper on UDT, but I now think the framework needs significant revision to handle these cases properly.

Stuart · 2h
> The problem of preference aggregation across copies

This seems key. Have you made any progress on formalizing this since your 2019 posts?

Wei · 2h
Some progress on the math, but still hitting fundamental issues with infinity. Might post about this soon.

Abram · 1h
Curious about your current thoughts on treating decision-theoretically identical copies differently. Seems related to logical causation?

Wei · 45m
Yes - this connects to some ideas about logical coordination I've been developing. The key insight is that even "identical" copies might have different logical roles...

[Edit: For those interested in following up, I recommend starting with my sequence on decision theory and anthropics, then moving to the more recent work on logical uncertainty.]

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

Vladimir_N · 3h

(This is a rather technical comment that attempts to clarify some decision-theoretic confusions.)

Your treatment of measure requires more formal specification. Let's be precise about what we mean by "caring about measure" in decision-theoretic terms.

Consider a formalization where we have:
1. A space of possible outcomes Ω
2. A measure μ on this space
3. A utility function U: Ω → ℝ
4. A decision function D that maps available choices to distributions over Ω

The issue isn't about "spending" measure, but about how we aggregate utility across branches. The standard formulation already handles this correctly through expected utility:

E[U] = ∫_Ω U(ω)dμ(ω)

Your concern about "measure decline" seems to conflate the measure μ with the utility U. These are fundamentally different mathematical objects serving different purposes in the formalism.

If we try to modify this to "care about measure directly," we'd need something like:

U'(ω) = U(ω) * f(μ(ω))

But this leads to problematic decision-theoretic behavior, violating basic consistency requirements like dynamic consistency. It's not clear how to specify f in a way that doesn't lead to contradictions.
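A minimal discrete sketch of the two aggregation rules above, assuming a finite outcome space so that the integral becomes a sum. The outcome names, the numbers, and the choice f(μ) = μ are illustrative assumptions, not part of the original comment. It shows one concrete symptom of the modified rule: its value changes when a single outcome is merely re-described as two equally good branches. That is related to, but not a demonstration of, dynamic inconsistency.

```python
from typing import Callable, Dict

Measure = Dict[str, float]  # outcome -> measure (probability weight)
Utility = Dict[str, float]  # outcome -> utility


def expected_utility(mu: Measure, u: Utility) -> float:
    """Discrete analogue of E[U]: the measure only weights the utility."""
    return sum(mu[w] * u[w] for w in mu)


def measure_weighted_value(
    mu: Measure, u: Utility, f: Callable[[float], float]
) -> float:
    """Aggregate the modified utility U'(w) = U(w) * f(mu(w)) under the same measure."""
    return sum(mu[w] * u[w] * f(mu[w]) for w in mu)


def identity(m: float) -> float:
    """Hypothetical choice of f, used only for illustration."""
    return m


# The same situation described at two levels of grain (illustrative numbers).
coarse_mu = {"good": 0.5, "bad": 0.5}
coarse_u = {"good": 1.0, "bad": 0.0}
fine_mu = {"good_a": 0.25, "good_b": 0.25, "bad": 0.5}
fine_u = {"good_a": 1.0, "good_b": 1.0, "bad": 0.0}

print(expected_utility(coarse_mu, coarse_u))                  # 0.5
print(expected_utility(fine_mu, fine_u))                      # 0.5
print(measure_weighted_value(coarse_mu, coarse_u, identity))  # 0.25
print(measure_weighted_value(fine_mu, fine_u, identity))      # 0.125
```

Under these assumptions, plain expected utility is unchanged by splitting "good" into two branches, while the measure-weighted value halves even though nothing about the outcomes themselves changed.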

The apparent paradox dissolves when we properly separate:
1. Measure as probability measure (μ)
2. Utility as preference ordering over outcomes (U)
3. Decision-theoretic aggregation (E[U])

[Technical note: This relates to my work on logical uncertainty and reflection principles. See my 2011 paper on decision theory in anthropic contexts.]

orthonormal · 2h
> U'(ω) = U(ω) * f(μ(ω))

This is a very clean way of showing why "caring about measure" leads to problems.

Vladimir_N · 2h
Yes, though there are even deeper issues with updateless treatment of anthropic measure that I haven't addressed here for brevity.

Wei_D · 1h
Interesting formalization. How would this handle cases where the agent's preferences include preferences over the measure itself?

Vladimir_N · 45m
That would require extending the outcome space Ω to include descriptions of measures, which brings additional technical complications...

[Note: This comment assumes familiarity with measure theory and decision theory fundamentals.]

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

Eli · 2h

*sigh*

I feel like I need to step in here because people are once again getting confused about measure, identity, and decision theory in ways I thought we cleared up circa 2008-2009.

First: The whole "measure declining by choice" framing is confused. You're not "spending" measure like some kind of quantum currency. The measure *describes* the Born probabilities; it's not something you optimize for directly any more than you should optimize for having higher probabilities in your belief distribution.

Second: The apparent "splitting" of worlds isn't fundamentally different between quantum events, daily choices, and life-changing decisions. It's all part of the same unified wavefunction evolving according to the same physics. The distinction being drawn here is anthropocentric and not particularly meaningful from the perspective of quantum mechanics.

What *is* relevant is how you handle subjective anticipation of future experiences. But note that "caring about measure" in the way described would lead to obviously wrong decisions - like refusing to make any choices at all to "preserve measure," which would itself be a choice (!).

If you're actually trying to maximize expected utility across the multiverse (which is what you should be doing), then the Born probabilities handle everything correctly without need for additional complexity. The framework I laid out in Quantum Ethics handles this cleanly.

And please, can we stop with the quantum suicide thought experiments? They're actively harmful to clear thinking about decision theory and anthropics. I literally wrote "Don't Un-think the Quantum" to address exactly these kinds of confusions.

(Though I suppose I should be somewhat grateful that at least nobody in this thread has brought up p-zombies or consciousness crystals yet...)

[Edit: To be clear, this isn't meant to discourage exploration of these ideas. But we should build on existing work rather than repeatedly discovering the same confusions.]

RationalSkeptic · 1h
> like refusing to make any choices at all to "preserve measure,"

This made me laugh out loud. Talk about Pascal's Mugging via quantum mechanics...

Eli · 45m
Indeed. Though I'd note that proper handling of Pascal's Mugging itself requires getting anthropics right first...

viliam on Heresies in the Shadow of the Sequences

> Stop using LLMs to write. It burns the commons by allowing you to share takes on topics you don't care enough to write about yourself, while also introducing insidious (and perhaps eventually malign) errors.

Yeah, someone just started doing this in ACX comments, and it's annoying.

When I read texts written by humans, there is some relation between the human and the text. If I trust the human, I will trust the text. If the text is wrong, I will stop trusting the human. In short, I hold humans accountable for their texts.

But if you just copy-paste whatever the LLM has vomited out, I don't know... did you at least do some sanity check, in other words, are you staking your personal reputation on these words? Or if I spend my time finding an error, will you just shrug and say "not my fault, we all know that LLMs hallucinate sometimes"? In other words, will feedback improve your writing in the future? If not... then the only reason to give feedback is to warn other humans who happen to read that text.

The same thing applies when someone uses an LLM to generate code. Yes, it is often a far more efficient way to write the code. But did you review the code? Or are you just copying it blindly? We already had a smaller version of this problem with people blindly copying code from Stack Exchange. An LLM is like Stack Exchange on steroids, both the good and the bad parts.

> there do exist fairly coherent moral projects such as religions

I am not sure how coherent they are. For example, I was reading on ACX about Christianity, and... it has the message of loving your neighbor and turning the other cheek... but also the recommendation not to cast pearls before swine... and I am not sure whether it makes clear when exactly you are supposed to treat your neighbors with love and when as swine.

It also doesn't provide an answer to whom you should give your coat if two people are trying to steal your shirt, etc.

Plus, there were historical situations when Christians didn't turn the other cheek (Crusades, Inquisition, etc.), and maybe without those situations Christianity would not exist today.

What I am saying is that there is a human judgment involved (which sometimes results in breaking the rules), and maybe the projects are not going to work without that.

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

In replies to this comment I will post other Sonnet3.5-generated replies by known LW people. If it is against the rules, please let me know and I will delete them. I will slightly change the names so they do not contaminate future search and AI training.

lao-mein on Lao Mein's Shortform

Sam Altman has made many enemies in his tenure at OpenAI. One of them is Elon Musk, who feels betrayed by OpenAI, and has filed failed lawsuits against the company. I previously wrote this off as Musk considering the org too "woke", but Altman's recent behavior has made me wonder if it was more of a personal betrayal. Altman has taken Musk's money, intended for an AI safety non-profit, and is currently converting it into enormous personal equity. All the while de-emphasizing AI safety research.

Musk now has the ear of the President-elect. Vice-President-elect JD Vance is also associated with Peter Thiel, whose ties with Musk go all the way back to PayPal. Has there been any analysis on the impact this may have on OpenAI's ongoing restructuring? What might happen if the DOJ turns hostile?

viliam on Why would ASI share any resources with us?

> In this scenario, why would ASI not do either one of the following things: 1) Exploit humans in pursuit of its own goals, while giving us the barest minimum to survive (effectively making us slaves) or 2) Take over the resources of the entire solar system for itself and leave us starving without any resources?

The ASI will do what it is programmed to do. If it means helping humans, it will help humans. If there is a bug in the program, it will... do something that is difficult to predict (and that sounds scary, because most random things are not good).

Make us slaves? We probably wouldn't be useful slaves, compared to alternatives, such as robots, or human bodies with brains replaced by computers.

Taking over the resources probably means killing us in the process, if those resources include e.g. water or oxygen of Earth.

atillayasar on AtillaYasar's Shortform

When is philosophy useful?

Meta

This post is useful to me because 1) it helped me think more clearly about whether and how exactly philosophy is useful, 2) I can read it later and again get the benefits of (1).

The problem

Doing philosophy and reading/writing is often so impractical, people do it just for the sake of doing it. When you write or read a bunch of stuff about X, get categories and lists and definitions, it feels like you're making progress on X, but are you really?

Joseph Goldstein (meditation teacher) at the beginning of his lecture about Mindfulness, jokes that they'll do an hour and a half of meditation, then after pausing for laughter points out that that would actually be more useful than anything he could say on the subject.

Criteria

The way to tell whether philosophy is useful is whether it actually influences the future, i.e. whether you:
- directly use the information for an action or decision
- use the material OR the wordless intuitions gained from the material in your future thinking (times the probability that you'll use it for a future action or decision)
- refute the bad ideas that you read or write (pruning is progress too!)

(slight caveat: reading, writing, and thinking make you better at those things and create positive habits, even if they're not "object level useful". But still! I want to train my skills while attempting to be productive -- I don't take my mental energy for granted.)

Lost in time / deeper mental models / information bottlenecks

One really simplified way to measure utility is whether you can remember the philosophy that you did. But there's something way deeper to this. If you force yourself to do useful philosophy, given that you can't remember a lot, a solution that arises naturally is that you create deeper, or higher level, representations of things that you know.

The more abstract they get, the more information they can capture.

A way to view it (basically compression and indexing lol):
You replace the "main table" that previously stored the information with an index that points to the information (because the "main table" runs out of space), and later with an index to the indices, and so on. The main table gets increasingly abstract as the number of "leaves" at the ends of the nodes (meaning explicit ideas, pieces of content, and pieces of reasoning) keeps increasing. But the "main table" is what you're mostly working with, in terms of what you perceive as your thoughts. So from your perspective, as you learn, you simply get increasingly abstract representations, and it takes longer to retrieve information or to put into words what you think you know, because you spend more cycles traversing and indexing the graph and translating between forms of representation. (Memory being reconstructive is loosely related to this, but I haven't dug into that topic at all. Also, isn't this analogous to how auto-encoders work?)
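A rough sketch of the "index to the indices" picture above, assuming notes are plain strings and each level of the index is a labelled dictionary. The structure, labels, and the `lookup` helper are hypothetical illustrations, not a claim about how memory actually works.

```python
from typing import Dict, List, Tuple, Union

# A node is either a list of explicit notes (a leaf) or a dict of labelled sub-indices.
Node = Union[List[str], Dict[str, "Node"]]


def lookup(index: Node, path: List[str]) -> Tuple[List[str], int]:
    """Follow `path` through nested indices; return the notes and the number of hops."""
    node = index
    hops = 0
    for label in path:
        if not isinstance(node, dict):
            raise KeyError(f"{label!r}: reached a leaf before the path ended")
        node = node[label]
        hops += 1
    if isinstance(node, dict):
        raise KeyError("path ends at an index, not at notes")
    return node, hops


# Flat "main table": every note is one hop away, but the table grows with every topic.
flat: Node = {
    "habits": ["write daily"],
    "memory": ["memory is reconstructive"],
}

# Nested version: the top level holds one abstract label, and the detail moves down a level.
nested: Node = {
    "learning": {
        "habits": ["write daily"],
        "memory": ["memory is reconstructive"],
    },
}

print(lookup(flat, ["memory"]))                # (['memory is reconstructive'], 1)
print(lookup(nested, ["learning", "memory"]))  # (['memory is reconstructive'], 2)
```

In this toy version the top-level table shrinks from two entries to one, at the cost of an extra hop per retrieval, which is the trade-off the paragraph describes.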

(Balaji Srinivasan about memory management, paraphrase: "if you just have a giant mental tree that you attach concepts to over time, you can have compounding learning and you can remember everything because it is all connected" -- similar to memory palace)

Format / readability / retrievability

This is subtle and hard to put into words, but in practice it is very impactful and keeps surprising me: how easy things are to find, and how easy they are to read once you find them, matters a lot. If you have a diary in Notepad++, it's essentially a flat list, which is super annoying to retrieve things from, since you can only scroll or do ctrl-f. Fancier systems can make retrieval easier and they allow formatting, but require more initial energy to start writing notes in.

YouTube comments contain almost zero back-and-forth, Twitter has more, Reddit has a lot more, 4chan has longer chains than Reddit but is weaker in terms of structure.

LessWrong is actually really good for this, because you can read a preview on-hover, and do it recursively for preview panels -- Gwern-style. It has great formatting and is pleasant to edit. Ease of discussion via comments is better than all the above, except Reddit.

About retrieval on LW: this is my first time seeing it, but this is actually pretty cool -- though it's still not super great, because retrieval is very much a UI/UX problem that is heavily limited by living in the browser. Notice how you can't use the keyboard to affect how results are presented, change the filter/sort, or change page -- this goes for any search engine I've ever seen.

It's hard to make dynamically changing and editable UIs. You can design nice UIs, but you can't really design a superset of UIs that is traversable with hotkeys. Because it has to work on, like, 9000 different browsers and devices and screen sizes.

Retrievability transformed programming

I believe that Stack Overflow+Google deeply transformed what it means to be a programmer. Think of the expectation-vs-reality meme that programmers spend almost all their time Googling things. The search engine is so powerful that it outsourced almost all of the memorization requirement to Googling skills + the ability to parse Stack Overflow posts + trying out suggestions + figuring out whether a suggestion will work for you and how to reshape it to fit your codebase.

clone-of-saturn on Heresies in the Shadow of the Sequences

Any agent that makes decisions has an implicit decision theory, it just might not be a very good one. I don't think anyone ever said advanced decision theory was required for AGI, only for robust alignment.

rauno-arike on Rauno's Shortform

[Link] Something weird is happening with LLMs and chess by dynomight

dynomight stacked up 13 LLMs against Stockfish on the lowest difficulty setting and found a huge difference between the performance of GPT-3.5 Turbo Instruct and any other model:

[Figure: all models]

People noticed already last year that RLHF-tuned models are much worse at chess than base/instruct models, so this isn't a completely new result. The gap between models from the GPT family could also perhaps be (partially) closed through better prompting: Adam Karvonen has created a repo for evaluating LLMs' chess-playing abilities and found that many of GPT-4's losses against 3.5 Instruct were caused by GPT-4 proposing illegal moves. However, dynomight notes that there isn't nearly as big of a gap between base and chat models from other model families:

[Figure: instruct comparison]

This is a surprising result to me—I had assumed that base models are now generally decent at chess after seeing the news about 3.5 Instruct playing at an 1800 Elo level last year. dynomight proposes the following four explanations for the results:

1. Base models at sufficient scale can play chess, but instruction tuning destroys it.
2. GPT-3.5-instruct was trained on more chess games.
3. There’s something particular about different transformer architectures.
4. There’s “competition” between different types of data.