LessWrong 2.0 Reader

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (6)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

Winning isn't enough
JesseClifton · 2024-11-05T11:37:39.486Z · comments (14)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (5)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

Bay Winter Solstice 2024: song leading auditions
tcheasdfjkl · 2024-11-10T23:59:08.199Z · comments (0)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

Concrete Methods for Heuristic Estimation on Neural Networks
Oliver Daniels (oliver-daniels-koch) · 2024-11-14T05:07:55.240Z · comments (0)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (3)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

the Daydication technique
chaosmage · 2024-10-18T21:47:46.448Z · comments (0)

Why is there Nothing rather than Something?
Logan Zoellner (logan-zoellner) · 2024-10-26T12:37:50.204Z · comments (3)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde (kola-ayonrinde) · 2024-10-30T22:50:45.642Z · comments (0)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (8)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

Thinking in 2D
sarahconstantin · 2024-10-20T19:30:05.842Z · comments (0)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

[link] Care Doesn't Scale
stavros · 2024-10-28T11:57:38.742Z · comments (1)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI · 2024-10-28T14:48:16.744Z · comments (0)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

[link] A brief history of the automated corporation
owencb · 2024-11-04T14:35:04.906Z · comments (1)

Inferential Game: The Foraging (Ex-)Bandit
abstractapplic · 2024-11-11T16:59:42.058Z · comments (4)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

Option control
Joe Carlsmith (joekc) · 2024-11-04T17:54:03.073Z · comments (0)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

A suite of Vision Sparse Autoencoders
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-10-27T04:05:20.377Z · comments (0)

You're Playing a Rough Game
jefftk (jkaufman) · 2024-10-17T19:20:06.251Z · comments (2)

[question] When can I be numerate?
FinalFormal2 · 2024-09-12T04:05:27.710Z · answers+comments (3)

Recent comments

adam_scholl on Untrusted smart models and trusted dumb models

I'm curious if "trusted" in this sense basically just means "aligned"—or like, the superset of that which also includes "unaligned yet too dumb to cause harm" and "unaligned yet prevented from causing harm"—or whether you mean something more specific? E.g., are you imagining that some powerful unconstrained systems are trusted yet unaligned, or vice versa?

jeremy-gillen on Thoughts after the Wolfram and Yudkowsky discussion

I get the feeling that I’m still missing the point somehow and that Yudkowsky would say we still have a big chance of doom if our algorithms were created by hand with programmers whose algorithms always did exactly what they intended even when combined with their other algorithms.

I would bet against Eliezer being pessimistic about this, if we are assuming the algorithms are deeply-understood enough that we are confident that we can iterate on building AGI. I think there's maybe a problem with the way Eliezer communicates that gives people the impression that he's a rock with "DOOM" written on it.

I think the pessimism comes from there being several currently-unsolved problems that get in the way of "deeply-understood enough". In principle it's possible to understand these problems and hand-build a safe and stable AGI, it just looks a lot easier to hand-build an AGI without understanding them all, and even easier than that to train an AGI without even thinking about them.

I call most of these "instability" problems: the AI might, for example, learn more, or think more, or self-modify, and each of these can shift the context in a way that causes an imperfectly designed AI to pursue unintended goals.

Here are some descriptions of problems in that cluster: optimization daemons, ontology shifts, translating between our ontology and the AI's internal ontology in a way that generalizes, Pascal's mugging, reflectively stable preferences & decision algorithms, reflectively stable corrigibility, and correctly estimating future competence under different circumstances.

Some may be resolved by default along the way to understanding how to build AGI by hand, but it isn't clear. Some are kinda solved already in some contexts.

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

Wei · 3h

This post touches on several issues I've been thinking about since my early work on anthropic decision theory and UDT. Let me break this down:

1. The measure-decline problem is actually more general than just quantum mechanics. It appears in any situation where your decision algorithm gets instantiated multiple times, including classical copying, simulation, or indexical uncertainty. See my old posts on anthropic probabilities and probability-as-preference.

2. The "functional identity" argument being used here to dismiss certain types of splitting is problematic. What counts as "functionally identical" depends on your decision theory's level of grain. UDT1.1 would treat seemingly identical copies differently if they're in different computational states, while CDT might lump them together.

Some relevant questions that aren't addressed:

- How do we handle preference aggregation across different versions of yourself with different measures?
- Should we treat quantum branching differently from other forms of splitting? (I lean towards "no" these days)
- How does this interact with questions of personal identity continuity?
- What happens when we consider infinite branches? (This relates to my work on infinite ethics)

The real issue here isn't about measure per se, but about how to aggregate preferences across different instances of your decision algorithm. This connects to some open problems in decision theory:

1. The problem of preference aggregation across copies
2. How to handle logical uncertainty in the context of anthropics
3. Whether "caring about measure" can be coherently formalized

I explored some of these issues in my paper on UDT, but I now think the framework needs significant revision to handle these cases properly.

Stuart · 2h
> The problem of preference aggregation across copies

This seems key. Have you made any progress on formalizing this since your 2019 posts?

Wei · 2h
Some progress on the math, but still hitting fundamental issues with infinity. Might post about this soon.

Abram · 1h
Curious about your current thoughts on treating decision-theoretic identical copies differently. Seems related to logical causation?

Wei · 45m
Yes - this connects to some ideas about logical coordination I've been developing. The key insight is that even "identical" copies might have different logical roles...

[Edit: For those interested in following up, I recommend starting with my sequence on decision theory and anthropics, then moving to the more recent work on logical uncertainty.]

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

Vladimir_N · 3h

(This is a rather technical comment that attempts to clarify some decision-theoretic confusions.)

Your treatment of measure requires more formal specification. Let's be precise about what we mean by "caring about measure" in decision-theoretic terms.

Consider a formalization where we have:
1. A space of possible outcomes Ω
2. A measure μ on this space
3. A utility function U: Ω → ℝ
4. A decision function D that maps available choices to distributions over Ω

The issue isn't about "spending" measure, but about how we aggregate utility across branches. The standard formulation already handles this correctly through expected utility:

E[U] = ∫_Ω U(ω)dμ(ω)

Your concern about "measure decline" seems to conflate the measure μ with the utility U. These are fundamentally different mathematical objects serving different purposes in the formalism.
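
To make the separation concrete, here is a minimal sketch of the discrete case, where the integral becomes a sum over a finite outcome space. The branch names and numbers are made up purely for illustration; only the formula E[U] = Σ U(ω)·μ(ω) comes from the formalization above.

```python
def expected_utility(measure, utility):
    """Discrete expected utility: sum of U(w) * mu(w) over all outcomes w."""
    total = sum(measure.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("measure must sum to 1")
    return sum(utility[w] * p for w, p in measure.items())


# Hypothetical outcome space with three branches.
mu = {"branch_a": 0.5, "branch_b": 0.25, "branch_c": 0.25}  # measure (weights)
U = {"branch_a": 10.0, "branch_b": 0.0, "branch_c": 4.0}    # utility (preferences)

print(expected_utility(mu, U))  # 6.0
```

Note that μ and U enter as separate inputs: changing how much you value an outcome never changes its weight, and vice versa.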

If we try to modify this to "care about measure directly," we'd need something like:

U'(ω) = U(ω) * f(μ(ω))

But this leads to problematic decision-theoretic behavior, violating basic consistency requirements like dynamic consistency. It's not clear how to specify f in a way that doesn't lead to contradictions.

The apparent paradox dissolves when we properly separate:
1. Measure as probability measure (μ)
2. Utility as preference ordering over outcomes (U)
3. Decision-theoretic aggregation (E[U])

[Technical note: This relates to my work on logical uncertainty and reflection principles. See my 2011 paper on decision theory in anthropic contexts.]

orthonormal · 2h
> U'(ω) = U(ω) * f(μ(ω))

This is a very clean way of showing why "caring about measure" leads to problems.

Vladimir_N · 2h
Yes, though there are even deeper issues with updateless treatment of anthropic measure that I haven't addressed here for brevity.

Wei_D · 1h
Interesting formalization. How would this handle cases where the agent's preferences include preferences over the measure itself?

Vladimir_N · 45m
That would require extending the outcome space Ω to include descriptions of measures, which brings additional technical complications...

[Note: This comment assumes familiarity with measure theory and decision theory fundamentals.]

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

Eli · 2h

*sigh*

I feel like I need to step in here because people are once again getting confused about measure, identity, and decision theory in ways I thought we cleared up circa 2008-2009.

First: The whole "measure declining by choice" framing is confused. You're not "spending" measure like some kind of quantum currency. The measure *describes* the Born probabilities; it's not something you optimize for directly any more than you should optimize for having higher probabilities in your belief distribution.

Second: The apparent "splitting" of worlds isn't fundamentally different between quantum events, daily choices, and life-changing decisions. It's all part of the same unified wavefunction evolving according to the same physics. The distinction being drawn here is anthropocentric and not particularly meaningful from the perspective of quantum mechanics.

What *is* relevant is how you handle subjective anticipation of future experiences. But note that "caring about measure" in the way described would lead to obviously wrong decisions - like refusing to make any choices at all to "preserve measure," which would itself be a choice (!).

If you're actually trying to maximize expected utility across the multiverse (which is what you should be doing), then the Born probabilities handle everything correctly without need for additional complexity. The framework I laid out in Quantum Ethics handles this cleanly.

And please, can we stop with the quantum suicide thought experiments? They're actively harmful to clear thinking about decision theory and anthropics. I literally wrote "Don't Un-think the Quantum" to address exactly these kinds of confusions.

(Though I suppose I should be somewhat grateful that at least nobody in this thread has brought up p-zombies or consciousness crystals yet...)

[Edit: To be clear, this isn't meant to discourage exploration of these ideas. But we should build on existing work rather than repeatedly discovering the same confusions.]

RationalSkeptic · 1h
> like refusing to make any choices at all to "preserve measure,"

This made me laugh out loud. Talk about Pascal's Mugging via quantum mechanics...

Eli · 45m
Indeed. Though I'd note that proper handling of Pascal's Mugging itself requires getting anthropics right first...

viliam on Heresies in the Shadow of the Sequences

Stop using LLMs to write. It burns the commons by allowing you to share takes on topics you don't care enough to write about yourself, while also introducing insidious (and perhaps eventually malign) errors.

Yeah, someone just started doing this in ACX comments, and it's annoying.

When I read texts written by humans, there is some relation between the human and the text. If I trust the human, I will trust the text. If the text is wrong, I will stop trusting the human. In short, I hold humans accountable for their texts.

But if you just copy-paste whatever the LLM has vomited out, I don't know... did you at least do some sanity check, in other words, are you staking your personal reputation on these words? Or if I spend my time finding an error, will you just shrug and say "not my fault, we all know that LLMs hallucinate sometimes"? In other words, will feedback improve your writing in the future? If not... then the only reason to give feedback is to warn other humans who happen to read that text.

The same thing applies when someone uses an LLM to generate code. Yes, it is often a far more efficient way to write the code. But did you review the code? Or are you just copying it blindly? We already had a smaller version of this problem with people blindly copying code from Stack Exchange. An LLM is like Stack Exchange on steroids, both the good and the bad parts.

there do exist fairly coherent moral projects such as religions

I am not sure how coherent they are. For example, I was reading on ACX about Christianity, and... it has the message of loving your neighbor and turning the other cheek... but also the recommendation not to cast pearls before swine... and I am not sure whether it makes clear when exactly you are supposed to treat your neighbors with love and when as swine.

It also doesn't provide an answer to whom you should give your coat if two people are trying to steal your shirt, etc.

Plus, there were historical situations when Christians didn't turn the other cheek (Crusades, Inquisition, etc.), and maybe without those situations Christianity would not exist today.

What I am saying is that there is a human judgment involved (which sometimes results in breaking the rules), and maybe the projects are not going to work without that.

avturchin on If I care about measure, choices have additional burden (+AI generated LW-comments)

In replies to this comment I will post other Sonnet3.5-generated replies by known LW people. If it is against the rules, please let me know and I will delete them. I will slightly change the names, so they will not contaminate future search and AI training.

lao-mein on Lao Mein's Shortform

Sam Altman has made many enemies in his tenure at OpenAI. One of them is Elon Musk, who feels betrayed by OpenAI, and has filed failed lawsuits against the company. I previously wrote this off as Musk considering the org too "woke", but Altman's recent behavior has made me wonder if it was more of a personal betrayal. Altman has taken Musk's money, intended for an AI safety non-profit, and is currently converting it into enormous personal equity, all the while de-emphasizing AI safety research.

Musk now has the ear of the President-elect. Vice-President-elect JD Vance is also associated with Peter Thiel, whose ties with Musk go all the way back to PayPal. Has there been any analysis on the impact this may have on OpenAI's ongoing restructuring? What might happen if the DOJ turns hostile?

viliam on Why would ASI share any resources with us?

In this scenario, why would ASI not do either one of the following things: 1) Exploit humans in pursuit of its own goals, while giving us the barest minimum to survive (effectively making us slaves) or 2) Take over the resources of the entire solar system for itself and leave us starving without any resources?

The ASI will do what it is programmed to do. If it means helping humans, it will help humans. If there is a bug in the program, it will... do something that is difficult to predict (and that sounds scary, because most random things are not good).

Make us slaves? We probably wouldn't be useful slaves, compared to alternatives, such as robots, or human bodies with brains replaced by computers.

Taking over the resources probably means killing us in the process, if those resources include e.g. water or oxygen of Earth.

atillayasar on AtillaYasar's Shortform

When is philosophy useful?

Meta

This post is useful to me because 1) it helped me think more clearly about whether and how exactly philosophy is useful, 2) I can read it later and again get the benefits of (1).

The problem

Doing philosophy and reading/writing is often so impractical, people do it just for the sake of doing it. When you write or read a bunch of stuff about X, get categories and lists and definitions, it feels like you're making progress on X, but are you really?

Joseph Goldstein (a meditation teacher), at the beginning of his lecture about mindfulness, jokes that they'll do an hour and a half of meditation, then, after pausing for laughter, points out that that would actually be more useful than anything he could say on the subject.

Criteria

The way to tell whether philosophy is useful is whether it actually influences the future, that is, whether you:
- directly use the information for an action or decision
- use the material OR the wordless intuitions gained from the material in your future thinking (times the probability that you'll use it for a future action or decision)
- refute the bad ideas that you read or write (pruning is progress too!)

(slight caveat: reading, writing, and thinking make you better at those things and create positive habits, even if they're not "object level useful". But still! I want to train my skills while attempting to be productive -- I don't take my mental energy for granted.)

Lost in time / deeper mental models / information bottlenecks

One really simplified way to measure utility is whether you can remember the philosophy that you did. But there's something way deeper to this. If you force yourself to do useful philosophy, given that you can't remember a lot, a solution that arises naturally is that you create deeper, or higher level, representations of things that you know.

The more abstract they get, the more information they can capture.

A way to view it (basically compression and indexing lol):
you simply replace the "main table" that was previously storing the information with an index that points to the information (because the "main table" runs out of space), and then later with an index to the indices, etc. The main table gets increasingly abstract as the number of "leaves" at the ends of the nodes (meaning explicit ideas, pieces of content, and pieces of reasoning) keeps increasing. But the "main table" is what you're mostly working with, in terms of what you perceive as your thoughts. So from your perspective, as you learn, you are simply getting increasingly abstract representations, and it takes longer to retrieve information or to put into words what you think you know, because you are spending more cycles on traversing and indexing the graph and translating between forms of representation. (Memory being reconstructive is loosely related to this, but I haven't dug into that topic at all. Also, isn't this analogous to how auto-encoders work?)
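
A toy sketch of that picture, under my own assumptions (a fixed `CAPACITY` for the "main table", leaves kept in insertion order, and the hypothetical helper names `build_index` and `lookup`): whenever the top level no longer fits, it is replaced by an index over chunks of itself, and retrieval costs one extra hop per index level.

```python
CAPACITY = 4  # assumed size limit of the "main table"


def build_index(leaves):
    """Nest the leaves into chunks until the top level fits in CAPACITY.

    Returns (main_table, depth), where depth is the number of index levels.
    """
    table, depth = list(leaves), 0
    while len(table) > CAPACITY:
        # Replace the table with an index over chunks of the old table.
        table = [table[i:i + CAPACITY] for i in range(0, len(table), CAPACITY)]
        depth += 1
    return table, depth


def lookup(table, depth, position):
    """Follow the indices down to the leaf at `position`, counting hops."""
    node, hops = table, 0
    for level in range(depth, 0, -1):
        span = CAPACITY ** level  # leaves covered by one entry at this level
        node = node[position // span]
        position %= span
        hops += 1
    return node[position], hops + 1


ideas = [f"idea-{i}" for i in range(20)]
main_table, depth = build_index(ideas)
print(len(main_table), depth)         # 2 2
print(lookup(main_table, depth, 17))  # ('idea-17', 3)
```

With 20 leaves the main table shrinks to 2 entries, but reaching any single idea now takes 3 hops instead of 1, which is the retrieval slowdown described above.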

(Balaji Srinivasan about memory management, paraphrase: "if you just have a giant mental tree that you attach concepts to over time, you can have compounding learning and you can remember everything because it is all connected" -- similar to memory palace)

Format / readability / retrievability

This is subtle and hard to put into words, but in practice it is very impactful and keeps surprising me: how easy things are to find, and how easy it is to read them once you find them, matters a lot. If you have a diary in Notepad++, it's essentially a flat list, which is super annoying to retrieve things from, since you can only scroll or do ctrl-f. Fancier systems can make retrieval easier and they allow formatting, but require more initial energy to start writing notes in.

YouTube comments contain almost zero back-and-forth, Twitter has more, Reddit has a lot more, 4chan has longer chains than Reddit but is weaker in terms of structure.

LessWrong is actually really good for this, because you can read a preview on-hover, and do it recursively for preview panels -- Gwern-style. It has great formatting and is pleasant to edit. Ease of discussion via comments is better than all the above, except Reddit.

About retrieval on LW: this is my first time seeing it, but this is actually pretty cool -- though it's still not super great, because retrieval is very much a UI/UX problem that is super limited by living in the browser. Notice how you can't use the keyboard to affect how results are presented, change the filter/sort, or change page -- this goes for any search engine I've ever seen.

It's hard to make dynamically changing and editable UIs. You can design nice UIs, but you can't really design a superset of UIs that is traversable with hotkeys. Because it has to work on, like, 9000 different browsers and devices and screen sizes.

Retrievability transformed programming

I believe that Stack Overflow+Google deeply transformed what it means to be a programmer. Think of the meme about expectation vs reality, that programmers spend almost all their time Googling things. It's because the search engine is so powerful that it outsourced almost all of the memorization requirement to Googling skills + the ability to parse Stack Overflow posts + trying out suggestions + figuring out whether a suggestion will work for you and how to reshape it to your codebase.