LessWrong 2.0 Reader

[link] Come to Manifest 2024 (June 7-9 in Berkeley)
Saul Munn (saul-munn) · 2024-03-27T21:30:17.306Z · comments (2)

[Closed] PIBBSS is hiring in a variety of roles (alignment research and incubation program)
Nora_Ammann · 2024-04-09T08:12:59.241Z · comments (0)

Dating Roundup #2: If At First You Don’t Succeed
Zvi · 2024-01-02T16:00:04.955Z · comments (29)

[link] Google Gemini Announced
Jacob G-W (g-w1) · 2023-12-06T16:14:07.192Z · comments (22)

[link] On the Role of Proto-Languages
adamShimi · 2024-09-22T16:50:34.720Z · comments (1)

[link] How Likely Are Various Precursors of Existential Risk?
NunoSempere (Radamantis) · 2024-10-28T13:27:31.620Z · comments (4)

Safe Stasis Fallacy
Davidmanheim · 2024-02-05T10:54:44.061Z · comments (2)

AI #44: Copyright Confrontation
Zvi · 2023-12-28T14:30:10.237Z · comments (13)

Thiel on AI & Racing with China
Ben Pace (Benito) · 2024-08-20T03:19:18.966Z · comments (10)

On “first critical tries” in AI alignment
Joe Carlsmith (joekc) · 2024-06-05T00:19:02.814Z · comments (8)

AI #50: The Most Dangerous Thing
Zvi · 2024-02-08T14:30:13.168Z · comments (4)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

[link] S-Risks: Fates Worse Than Extinction
aggliu · 2024-05-04T15:30:36.666Z · comments (2)

Causal Graphs of GPT-2-Small's Residual Stream
David Udell · 2024-07-09T22:06:55.775Z · comments (7)

AI #76: Six Shorts Stories About OpenAI
Zvi · 2024-08-08T13:50:04.659Z · comments (10)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (10)

[link] Breaking Circuit Breakers
mikes · 2024-07-14T18:57:20.251Z · comments (13)

2022 (and All Time) Posts by Pingback Count
Raemon · 2023-12-16T21:17:00.572Z · comments (14)

Acting Wholesomely
owencb · 2024-02-26T21:49:16.526Z · comments (64)

AI #40: A Vision from Vitalik
Zvi · 2023-11-30T17:30:08.350Z · comments (12)

We are headed into an extreme compute overhang
devrandom · 2024-04-26T21:38:21.694Z · comments (33)

[question] Can we get an AI to "do our alignment homework for us"?
Chris_Leong · 2024-02-26T07:56:22.320Z · answers+comments (33)

Fat Tails Discourage Compromise
niplav · 2024-06-17T09:39:16.489Z · comments (5)

[link] LLMs seem (relatively) safe
JustisMills · 2024-04-25T22:13:06.221Z · comments (24)

Per protocol analysis as medical malpractice
braces · 2024-01-31T16:22:21.367Z · comments (8)

Trading off Lives
jefftk (jkaufman) · 2024-01-03T03:40:05.603Z · comments (12)

Be More Katja
Nathan Young · 2024-03-11T21:12:14.249Z · comments (0)

AI #87: Staying in Character
Zvi · 2024-10-29T07:10:08.212Z · comments (3)

AI #71: Farewell to Chevron
Zvi · 2024-07-04T13:40:05.905Z · comments (9)

Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask (patrickleask) · 2024-08-17T01:16:53.764Z · comments (0)

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)

The case for stopping AI safety research
catubc (cat-1) · 2024-05-23T15:55:18.713Z · comments (38)

Announcing the Double Crux Bot
sanyer (santeri-koivula) · 2024-01-09T18:54:15.361Z · comments (8)

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

AI #45: To Be Determined
Zvi · 2024-01-04T15:00:05.936Z · comments (4)

The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (23)

Was Releasing Claude-3 Net-Negative?
Logan Riggs (elriggs) · 2024-03-27T17:41:56.245Z · comments (5)

Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (14)

On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)

On Lex Fridman’s Second Podcast with Altman
Zvi · 2024-03-25T12:20:08.780Z · comments (10)


Recent comments

themanxloiner on Scattered thoughts on what it means for an LLM to believe

But in this Eiffel Tower example, I’m not sure what is correlating with what

The physical object Eiffel Tower is correlated with itself.

However, I think the basic ability of an LLM to correctly complete the sentence “the Eiffel Tower is in the city of…” is not very strong evidence of having the relevant kinds of dispositions.

It is highly predictive of the ability of the LLM to book flights to Paris, when I create an LLM-agent out of it and ask it to book a trip to see the Eiffel Tower.

I think the question about whether current AI systems have real goals and beliefs does indeed matter

I don't think we disagree here. To clarify, my belief is that there are threat models / solutions that are not affected by whether the AI has 'real' beliefs, and there are other threats/solutions where it does matter.

I think CGP Grey perspective puts more weight on Definition 3.

I actually do not understand the distinction between Definition 2 and Definition 3. We don't need to resolve it here. I've edited the post to include my uncertainty on this.

algon on Announcing turntrout.com, my new digital home

It's a beautiful website. I'm sad to see you go. I'm excited to see you write more.

d0themath on Alexander Gietelink Oldenziel's Shortform

I have found that they mirror you. If you talk to them like a real person, they will act like a real person. Call them (at least Claude) out on their corporate-speak and cheesy stereotypes in the same way you would a person scared to say what they really think.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Neural networks have a bias towards highly decomposable functions.

tl;dr Neural networks favor functions that can be "decomposed" into a composition of simple pieces in many ways - "highly decomposable functions".

Degeneracy = bias under uniform prior

Consider a space of parameters used to implement functions, where each element $w \in W$ specifies a function $f_{w} : X \to Y$ via some map $π$. Here, the set $W$ is our parameter space, and we can think of each $w$ as representing a specific configuration of the neural network that yields a particular function $f_{w}$.

The mapping $π$ assigns each point $w \in W$ to a function $f_{w}$. Due to redundancies and symmetries in parameter space, multiple configurations $w$ might yield the same function $f$, forming what we call the fiber of $f$, or its "set of degenerates":

$π^{-1}(f) = \{w \in W \mid π(w) = f_{w} = f\}$

This fiber is the set of ways in which the same functional behavior can be achieved by different parameterizations. If we sample parameters uniformly, the degeneracy of a function $f$ (the size of its fiber) determines how likely it is to be sampled.

The Bias Toward Decomposability

Consider a neural network architecture built out of $l$ layers. Mathematically, we can decompose the parameter space $W$ as a product:

$W = W_{1} \times W_{2} \times \cdots \times W_{l},$

where each $W_{i}$ represents parameters for a particular layer. The function implemented by the network, $f_{w}$, is then a composition:

$f_{w} = f_{w_{1}} \circ f_{w_{2}} \circ \cdots \circ f_{w_{l}}$

For a function $f$, its degeneracy (or the number of ways to parameterize it) is

$|π^{-1}(f)| = \sum_{(f_{1}, \ldots, f_{l}) \in V(f)} |π^{-1}(f_{1})| \cdot |π^{-1}(f_{2})| \cdots |π^{-1}(f_{l})|$.

Here, $V(f)$ is the set of all possible decompositions $f = f_{1} \circ f_{2} \circ \cdots \circ f_{l}$ of $f$.

That means that functions that have many such decompositions are more likely to be sampled.

In summary, the layered design of neural networks introduces an implicit bias toward highly decomposable functions.
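The counting formula above can be checked by brute force on a toy example. The sketch below is only an illustration under stated assumptions, not part of the original argument: the input space $X = \{0, 1\}$, the per-layer parameter grid $(a, b) \in \{-1, 0, 1\}^2$, and the threshold layer $x \mapsto [a x + b > 0]$ are all hypothetical choices. It counts fibers by enumerating every parameter tuple, then recomputes the same counts as a sum over decompositions weighted by per-layer degeneracies.

```python
from collections import Counter
from itertools import product

# Toy setup (all of this is an assumption made for illustration):
# inputs and outputs live in X = {0, 1}; each layer has parameters (a, b)
# drawn from a small grid and computes x -> 1 if a*x + b > 0 else 0.
X = (0, 1)
LAYER_PARAMS = list(product((-1, 0, 1), repeat=2))

def layer_fn(w):
    """Return the layer function for parameters w as a lookup tuple (f(0), f(1))."""
    a, b = w
    return tuple(1 if a * x + b > 0 else 0 for x in X)

def compose(g, h):
    """Return g o h (apply h first, then g) as a lookup tuple."""
    return tuple(g[h[x]] for x in X)

def compose_all(fs):
    """Return fs[0] o fs[1] o ... o fs[-1], matching the ordering in the text."""
    f = fs[-1]
    for g in reversed(fs[:-1]):
        f = compose(g, f)
    return f

# Per-layer degeneracy: how many parameter settings give each layer function.
layer_deg = Counter(layer_fn(w) for w in LAYER_PARAMS)

def brute_force_degeneracy(num_layers):
    """Count fibers directly by enumerating every point of W_1 x ... x W_l."""
    counts = Counter()
    for ws in product(LAYER_PARAMS, repeat=num_layers):
        counts[compose_all([layer_fn(w) for w in ws])] += 1
    return counts

def formula_degeneracy(num_layers):
    """Sum over decompositions (f_1, ..., f_l) of the product of layer degeneracies."""
    counts = Counter()
    for fs in product(layer_deg, repeat=num_layers):
        weight = 1
        for g in fs:
            weight *= layer_deg[g]
        counts[compose_all(list(fs))] += weight
    return counts

brute = brute_force_degeneracy(2)
formula = formula_degeneracy(2)
for f in sorted(brute):
    print(f, brute[f], formula[f])
```

In this toy grid the two counts agree for every function, and the functions reachable through more (and more degenerate) decompositions take up a larger share of parameter space. How far this carries over to real architectures with continuous parameters is not something the sketch can show.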

martin-randall on The Point of Trade

Them: The point of trade is that there are increasing marginal returns to production and diminishing marginal returns to consumption. We specialize in producing different goods, then trade to consume a diverse set of goods that maximizes utility.

Myself: Suppose there were no production possible, just some cosmic endowment of goods that are gradually consumed until everyone dies. Have we gotten rid of the point of trade?

Them: Well if people had different cosmic endowments then they would still trade to get a more balanced set to consume, due to diminishing marginal returns to consumption.

Myself: What if everyone has exactly the same cosmic endowment? And for good measure there are no diminishing returns, the tenth apple produces as much utility as the first.

Them: Well then there's no trade, what's the point? We just consume our cosmic endowment until we run out and die.

Myself: What if I like oranges more than apples, and you like apples more than oranges?

Them: Oh. I can trade one of my oranges for one of your apples, and we will both be better off. Darn it.
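The last exchange can be made concrete with a small arithmetic sketch. The numbers are hypothetical and not from the dialogue: both parties start with the same endowment of 5 apples and 5 oranges, utilities are linear (no diminishing returns), "Myself" values an orange at 2 and an apple at 1, and "Them" values them the other way round.

```python
# Hypothetical linear utilities: each extra unit of a good is worth the same.
ME_WEIGHTS = {"apple": 1, "orange": 2}    # "Myself" prefers oranges
THEM_WEIGHTS = {"apple": 2, "orange": 1}  # "Them" prefers apples
ENDOWMENT = {"apple": 5, "orange": 5}     # identical starting bundles (assumed)

def utility(bundle, weights):
    """Linear utility: weighted count of goods in the bundle."""
    return sum(weights[good] * count for good, count in bundle.items())

def swap_one(me, them):
    """Myself gives one apple to Them and receives one orange in return."""
    me_after = dict(me)
    them_after = dict(them)
    me_after["apple"] -= 1
    me_after["orange"] += 1
    them_after["apple"] += 1
    them_after["orange"] -= 1
    return me_after, them_after

me_before = utility(ENDOWMENT, ME_WEIGHTS)
them_before = utility(ENDOWMENT, THEM_WEIGHTS)
me_bundle, them_bundle = swap_one(ENDOWMENT, ENDOWMENT)
me_after = utility(me_bundle, ME_WEIGHTS)
them_after = utility(them_bundle, THEM_WEIGHTS)

print("Myself:", me_before, "->", me_after)
print("Them:  ", them_before, "->", them_after)
```

Under these assumed weights both utilities rise after the swap, even with identical endowments, no production, and no diminishing returns: differing preferences alone are enough to make the trade worthwhile.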

themanxloiner on Are we dropping the ball on Recommendation AIs?

Zvi's latest newsletter has a section on this topic! https://thezvi.substack.com/i/151331494/good-advice

dakara on AI Control: Improving Safety Despite Intentional Subversion

I have thought about the issue that you are outlining here and I can't see any possible solutions. Paired with steganography, that strategy indeed undermines the authors' strategy as far as I can see. I haven't been able to come up with any honeytraps either. This seems like a really big problem.

Am I missing something? Maybe one of the authors of the paper can clarify this issue?

daniel-tan on Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data

I don't have a strong take haha. I'm just expressing my own uncertainty.

Here's my best reasoning: Under Bayesian reasoning, a sufficiently small posterior probability would be functionally equivalent to impossibility (for downstream purposes anyway). If models reason in a Bayesian way then we wouldn't expect the deductive and abductive experiments discussed above to be that different (assuming the abductive setting gave the model sufficient certainty over the posterior).

But I guess this could still be a good indicator of whether models do reason in a Bayesian way. So maybe still worth doing? Haven't thought about it much more than that, so take this w/ a pinch of salt.

sohaib-imran on Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data

Ah I see ur point. Yh I think that’s a natural next step. Why do you think it's not very interesting to investigate? Being able to make very accurate inferences given the evidence at hand seems important for capabilities, including alignment-relevant ones?

turntrout on Announcing turntrout.com, my new digital home

Historically, I've found that LW comments have been a source of anxious and/or irritated rumination. That's why I mostly haven't commented this year. I'll write more about this in another post.

If I write these days, I generally don't read replies. (Again, excepting certain posts; and I'm always reachable via email and enjoy thoughtful discussions :) )