LessWrong 2.0 Reader

[link] Vacuum: Theory and Technologies
ethanmorse · 2024-01-21T17:23:49.257Z · comments (0)

How good are LLMs at doing ML on an unknown dataset?
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-07-01T09:04:03.687Z · comments (4)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Arjun Panickssery (arjun-panickssery) · 2024-01-15T21:21:03.962Z · comments (0)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

One way violinists fail
Solenoid_Entity · 2024-05-29T04:08:17.675Z · comments (5)

Boston Solstice 2023 Retrospective
jefftk (jkaufman) · 2024-01-02T03:10:05.694Z · comments (0)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

One True Love
Zvi · 2024-02-09T15:10:05.298Z · comments (7)

Experimentation (Part 7 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-18T21:25:56.527Z · comments (0)

Monthly Roundup #16: March 2024
Zvi · 2024-03-19T13:10:05.529Z · comments (4)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)

5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)

[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)

2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)

Love, Reverence, and Life
Elizabeth (pktechgirl) · 2023-12-12T21:49:04.061Z · comments (7)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)

Basics of Handling Disagreements with People
Camille Berger (Camille Berger) · 2024-11-12T17:55:08.143Z · comments (4)

[question] Feedback request: what am I missing?
Nathan Helm-Burger (nathan-helm-burger) · 2024-11-02T17:38:39.625Z · answers+comments (5)

What AI companies should do: Some rough ideas
Zach Stein-Perlman · 2024-10-21T14:00:10.412Z · comments (10)

Empathy/Systemizing Quotient is a poor/biased model for the autism/sex link
tailcalled · 2024-11-04T21:11:57.788Z · comments (0)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (5)

[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)

[link] NAO Updates, Fall 2024
jefftk (jkaufman) · 2024-10-18T00:00:04.142Z · comments (2)

[link] Concrete benefits of making predictions
Jonny Spicer (jonnyspicer) · 2024-10-17T14:23:17.613Z · comments (5)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (112)

Intent alignment as a stepping-stone to value alignment
Seth Herd · 2024-11-05T20:43:24.950Z · comments (4)

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)

A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (12)

DunCon @Lighthaven
Duncan Sabien (Deactivated) (Duncan_Sabien) · 2024-09-29T04:56:27.205Z · comments (0)

An argument that consequentialism is incomplete
cousin_it · 2024-10-07T09:45:12.754Z · comments (27)

Housing Roundup #10
Zvi · 2024-10-29T13:50:09.416Z · comments (2)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (30)

[link] Talking With People Who Speak to Congressional Staffers about AI risk
Eneasz · 2023-12-14T17:55:50.606Z · comments (0)

[link] the subreddit size threshold
bhauth · 2024-01-23T00:38:13.747Z · comments (3)

A quick experiment on LMs’ inductive biases in performing search
Alex Mallen (alex-mallen) · 2024-04-14T03:41:08.671Z · comments (2)

Monthly Roundup #13: December 2023
Zvi · 2023-12-19T15:10:08.293Z · comments (5)

[link] Why you, personally, should want a larger human population
jasoncrawford · 2024-02-23T19:48:10.526Z · comments (32)

Recent comments

notfnofn on notfnofn's Shortform

source seems genuine: https://old.reddit.com/r/artificial/comments/1gq4acr/gemini_told_my_brother_to_die_threatening/lwv84fr/?context=3 but I'm less sure now

lc on Shortform

The greatest strategy for organizing vast conspiracies is usually failing to realize that what you're doing is illegal.

vladimir_nesov on O O's Shortform

for anything related to human judgement, in theory this isn’t why it’s not doing well

The facts are in there, but not in the form of a sufficiently good reward model that can tell as well as human experts which answer is better or whether a step of an argument is valid. In the same way, RLHF still works better with human raters on some queries; it hasn't been fully automated to superior results by replacing humans with models in all cases.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

The Padding Argument or Simplicity = Degeneracy

[I learned this argument from Lucius Bushnaq and Matthias Dellago. It is also latent already in Solomonoff's original work]

Consider binary strings of a fixed length $L$.

Imagine feeding these strings into some Turing machine; we think of strings as codes for a function. Suppose we have a function that can be coded by a short compressed string $s$ of length $k \ll L$. That is, the function is computable by a small program.

Imagine uniformly sampling a random code from $\{0, 1\}^{L}$. What fraction of the codes implement the same function as the string $s$? It's close to $2^{- k}$, i.e. about $2^{L - k}$ of the $2^{L}$ codes. Indeed, given the string $s$ of length $k$ we can 'pad' it to a string of length $L$ by writing the code

"run $s$ skip $t$ "

where $t$ is an arbitrary string of length $L - k - c$ where $c$ is a small constant accounting for the overhead. There are approximately $2^{L - k}$ such binary strings. If our programming language has a simple skip / commenting-out functionality, then we expect approximately $2^{L - k}$ codes encoding the same function as $s$.
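The counting can be checked by brute force in a toy setting. The sketch below is only an illustration under stated assumptions: the four prefix-free "programs" and their function names are hypothetical, the interpreter halts as soon as it has read one of them (so every later bit is ignored padding), and the overhead constant $c$ is zero.

```python
from collections import Counter
from itertools import product

# Hypothetical prefix-free "programs": once the interpreter has read one of
# these codewords it halts, so every remaining bit acts as ignored padding.
PROGRAMS = {"0": "constant", "10": "identity", "110": "negate", "111": "double"}

def run(code: str) -> str:
    """Return the name of the function implemented by a padded code."""
    for k in range(1, len(code) + 1):
        if code[:k] in PROGRAMS:
            return PROGRAMS[code[:k]]
    raise ValueError("no program found")

def degeneracy(L: int) -> Counter:
    """Count how many of the 2**L codes of length L implement each function."""
    return Counter(run("".join(bits)) for bits in product("01", repeat=L))

L = 10
counts = degeneracy(L)
for prefix, name in PROGRAMS.items():
    k = len(prefix)
    # A program of length k is implemented by 2**(L - k) codes, a 2**(-k) fraction.
    print(name, counts[name], 2 ** (L - k), counts[name] / 2 ** L)
```

In this toy language the count for a program of length $k$ is exactly $2^{L - k}$; in a real language the overhead $c$ and other equivalent programs would change the constant.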

I find this truly remarkable: the degeneracy or multiplicity is inversely exponentially proportional to the minimum description length of the function!

Just by sampling codes uniformly at random we get the Simplicity prior!!

Why do Neural Networks work? Why do polynomials not work?

It is sometimes claimed that neural networks work well because they are 'Universal Approximators'. There are multiple problems with this explanation, see e.g. here [LW · GW], but a very basic problem is that being a universal approximator is very common. Polynomials are universal approximators!

Many different neural network architectures work. In the limit of large data and compute, the differences between architectures start to vanish and very general scaling laws dominate. This is not the case for polynomials.

Degeneracy=Simplicity explains why: polynomials are uniquely tied down by their coefficients, so a learning machine that tries to fit polynomials does not have a 'good' simplicity bias that approximates the Solomonoff prior.

The lack of degeneracy applies to any set of functions that form an orthogonal basis. This is because the decomposition is unique. So there is no multiplicity and no implicit regularization/ simplicity bias.

[I learned this elegant argument from Lucius Bushnaq.]

The Singular Learning Theory and Algorithmic Information Theory crossover

I described the padding argument as an argument not a proof. That's because technically it only gives a lower bound on the number of codes equivalent to the minimal description code. The problem is there are pathological examples where the programming language (e.g. the UTM) hardcodes that all small codes $s$ encode a single function $f$ .

When we take this problem into account the Padding Argument is already in Solomonoff's original work. There is a theorem that states that the Solomonoff prior is equivalent to taking a suitable Universal Turing Machine and feeding in a sequence of (uniformly) random bits and taking the resulting distribution. To account for the pathological examples above everything is asymptotic and up to some constant like all results in algorithmic information theory. This means that like all other results in algorithmic information theory it's unclear whether it is at all relevant in practice.

However, while this gives a correct proof I think this understates the importance of the Padding argument to me. That's because I think in practice we shouldn't expect the UTM to be pathological in this way. In other words, we should heuristically expect the simplicity $K (f)$ to be basically proportional to the fraction of codes yielding $f$ for a large enough (overparameterized) architecture.

The bull case for SLT is now: there is a direct equality between algorithmic complexity and the degeneracy. This has always been SLT dogma of course but until I learned about this argument it wasn't so clear to me how direct this connection was. The algorithmic complexity can be usefully approximated by the (local) learning coefficient $λ$ !

The bull case for algorithmic information: the theory of algorithmic information, Solomonoff induction, AIXI etc is very elegant and in some sense gives answers to fundamental questions we would like to answer. The major problem was that it is both uncomputable and seemingly intractable. Uncomputability is perhaps not such a problem - uncomputability often arises from measure zero highly adversarial examples. But tractability is very problematic. We don't know how tractable compression is, but it's likely intractable. However, the Padding argument suggests that we should heuristically expect the simplicity $K (f)$ to be basically proportional to the fraction of codes yielding $f$ for a large enough (overparameterized) architecture - in other words it can be measured by the

Do Neural Networks actually satisfy the Padding argument?

Short answer: No.

Long answer: Unclear. Maybe... sort of... and the difference might itself be very interesting...!

Stay tuned.

themanxloiner on Scattered thoughts on what it means for an LLM to believe

But in this Eiffel Tower example, I’m not sure what is correlating with what

The physical object Eiffel Tower is correlated with itself.

However, I think the basic ability of an LLM to correctly complete the sentence “the Eiffel Tower is in the city of…” is not very strong evidence of having the relevant kinds of dispositions.

It is highly predictive of the ability of the LLM to book flights to Paris, when I create an LLM-agent out of it and ask it to book a trip to see the Eiffel Tower.

I think the question about whether current AI systems have real goals and beliefs does indeed matter

I don't think we disagree here. To clarify, my belief is there are threat models / solutions that are not affected by whether the AI has 'real' beliefs, and there are other threats/solutions where it does matter.

I think CGP Grey perspective puts more weight on Definition 3.

I actually do not understand the distinction between Definition 2 and Definition 3. Don't need to resolve it here. I've edited the post to include my uncertainty on this.

algon on Announcing turntrout.com, my new digital home

It's a beautiful website. I'm sad to see you go. I'm excited to see you write more.

d0themath on Alexander Gietelink Oldenziel's Shortform

I have found that they mirror you. If you talk to them like a real person, they will act like a real person. Call them (at least Claude) out on their corporate-speak and cheesy stereotypes in the same way you would a person scared to say what they really think.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

Neural Networks have a bias towards Highly Decomposable Functions.

tl;dr Neural networks favor functions that can be "decomposed" into a composition of simple pieces in many ways - "highly decomposable functions".

Degeneracy = bias under uniform prior

[see here [LW(p) · GW(p)] for why I think bias under the uniform prior is important]

Consider a space of parameters used to implement functions, where each element $w \in W$ specifies a function $f_{w} : X \to Y$ via some map $π$ . Here, the set $W$ is our parameter space, and we can think of each $w$ as representing a specific configuration of the neural network that yields a particular function $f_{w}$ .

The mapping $π$ assigns each point $w \in W$ to a function $f_{w}$. Due to redundancies and symmetries in parameter space, multiple configurations $w$ might yield the same function, forming what we call a fiber, or the "set of degenerates", of $f$:

$π^{-1}(f) = \{ w \in W \mid π(w) = f_{w} = f \}$

This fiber is the set of ways in which the same functional behavior can be achieved by different parameterizations. If we uniformly sample from codes, the degeneracy of a function $f$ counts how likely it is to be sampled.
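A fiber can be enumerated directly in a toy case. The sketch below is an illustration under assumptions that are not in the original: a two-parameter "network" $f_{w}(x) = a \cdot b \cdot x$ with $w = (a, b)$, each parameter drawn from a small integer grid, and functions identified by their outputs on a finite input set.

```python
from collections import defaultdict
from itertools import product

VALUES = (-2, -1, 0, 1, 2)   # allowed values for each parameter (assumed)
INPUTS = (-1, 0, 1)          # finite input set X (assumed)

def pi(w):
    """Map parameters w = (a, b) to x -> a*b*x, stored as its outputs on INPUTS."""
    a, b = w
    return tuple(a * b * x for x in INPUTS)

# Group every parameter setting by the function it implements: these are the fibers.
fibers = defaultdict(list)
for w in product(VALUES, repeat=2):
    fibers[pi(w)].append(w)

total = len(VALUES) ** 2
for f, ws in sorted(fibers.items(), key=lambda kv: -len(kv[1])):
    # Fiber size divided by |W| is the probability of f under uniform sampling.
    print(f, len(ws), len(ws) / total)
```

Here the zero function has the largest fiber (any setting with $a = 0$ or $b = 0$), so it is the function most likely to be drawn when sampling parameters uniformly.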

The Bias Toward Decomposability

Consider a neural network architecture built out of $l$ layers. Mathematically, we can decompose the parameter space $W$ as a product:

$W = W_{1} \times W_{2} \times \cdots \times W_{l},$

where each $W_{i}$ represents parameters for a particular layer. The function implemented by the network, $f_{w}$ , is then a composition:

$f_{w} = f_{w_{1}} \circ f_{w_{2}} \circ \cdots \circ f_{w_{l}}$

For a function $f$ its degeneracy (or the number of ways to parameterize it) is

$| π^{-1}(f) | = \sum_{(f_{1}, \dots, f_{l}) \in V(f)} | π^{-1}(f_{1}) | \cdot | π^{-1}(f_{2}) | \cdots | π^{-1}(f_{l}) |$.

Here, $V(f)$ is the set of all possible decompositions $f = f_{1} \circ f_{2} \circ \cdots \circ f_{l}$ of $f$.

That means that functions that have many such decompositions are more likely to be sampled.
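The sum over decompositions can be checked numerically on a small example. The sketch below is a toy under assumptions that are not in the original: two layers, each a hypothetical "clamp at $w$" map on the set $\{0, 1, 2\}$, with the same five parameter values per layer. It counts the degeneracy of each composed function directly over $W_{1} \times W_{2}$ and again via the sum of products of per-layer degeneracies.

```python
from collections import Counter
from itertools import product

X = (0, 1, 2)         # finite input/output set (assumed)
W_LAYER = range(5)    # parameter values for one layer (assumed)

def layer(w):
    """One layer: clamp the input at w, stored as a tuple of outputs on X."""
    return tuple(min(x, w) for x in X)

def compose(f, g):
    """Return f after g, i.e. x -> f(g(x)), as a tuple of outputs on X."""
    return tuple(f[g[x]] for x in X)

# Degeneracy of each single-layer function.
layer_deg = Counter(layer(w) for w in W_LAYER)

# Direct count over the full parameter space W = W_1 x W_2.
direct = Counter(
    compose(layer(w1), layer(w2)) for w1, w2 in product(W_LAYER, repeat=2)
)

# The same count via the sum over decompositions f = f_1 o f_2.
via_sum = Counter()
for f1, f2 in product(layer_deg, repeat=2):
    via_sum[compose(f1, f2)] += layer_deg[f1] * layer_deg[f2]

assert direct == via_sum
for f, n in direct.most_common():
    print(f, n)
```

In this toy, the clamp-at-1 function has only 7 parameterizations while the identity and the zero function each have 9, because they either have more decompositions or use layer functions with larger fibers.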

In summary, the layered design of neural networks introduces an implicit bias toward highly decomposable functions.

martin-randall on The Point of Trade

Them: The point of trade is that there are increasing marginal returns to production and diminishing marginal returns to consumption. We specialize in producing different goods, then trade to consume a diverse set of goods that maximizes utility.

Myself: Suppose there were no production possible, just some cosmic endowment of goods that are gradually consumed until everyone dies. Have we gotten rid of the point of trade?

Them: Well if people had different cosmic endowments then they would still trade to get a more balanced set to consume, due to diminishing marginal returns to consumption.

Myself: What if everyone has exactly the same cosmic endowment? And for good measure there are no diminishing returns, the tenth apple produces as much utility as the first.

Them: Well then there's no trade, what's the point? We just consume our cosmic endowment until we run out and die.

Myself: What if I like oranges more than apples, and you like apples more than oranges?

Them: Oh. I can trade one of my oranges for one of your apples, and we will both be better off. Darn it.

themanxloiner on Are we dropping the ball on Recommendation AIs?

Zvi's latest newsletter has a section on this topic! https://thezvi.substack.com/i/151331494/good-advice