LessWrong 2.0 Reader

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (9)

[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)

Reflections on my first year of AI safety research
Jay Bailey · 2024-01-08T07:49:08.147Z · comments (3)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (26)

On Lex Fridman’s Second Podcast with Altman
Zvi · 2024-03-25T12:20:08.780Z · comments (10)

On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (13)

Will 2024 be very hot? Should we be worried?
A.H. (AlfredHarwood) · 2023-12-29T11:22:50.200Z · comments (12)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (14)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (9)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith (joekc) · 2024-10-28T21:57:12.063Z · comments (5)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Polysemantic Attention Head in a 4-Layer Transformer
Jett Janiak (jett) · 2023-11-09T16:16:35.132Z · comments (0)

The Assumed Intent Bias
silentbob · 2023-11-05T16:28:03.282Z · comments (13)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

[link] The Good Balsamic Vinegar
jenn (pixx) · 2024-01-26T19:30:57.435Z · comments (4)

[link] how birds sense magnetic fields
bhauth · 2024-06-27T18:59:35.075Z · comments (4)

Claude Sonnet 3.5.1 and Haiku 3.5
Zvi · 2024-10-24T14:50:06.286Z · comments (9)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset
aphyer · 2024-06-17T21:29:08.778Z · comments (11)

[link] The Evals Gap
Marius Hobbhahn (marius-hobbhahn) · 2024-11-11T16:42:46.287Z · comments (7)

Does literacy remove your ability to be a bard as good as Homer?
Adrià Garriga-alonso (rhaps0dy) · 2024-01-18T03:43:14.994Z · comments (19)

Book Review: Righteous Victims - A History of the Zionist-Arab Conflict
Yair Halberstadt (yair-halberstadt) · 2024-06-24T11:02:03.490Z · comments (8)

[link] Bed Time Quests & Dinner Games for 3-5 year olds
Gunnar_Zarncke · 2024-06-22T07:53:38.989Z · comments (0)

Cooperating with aliens and AGIs: An ECL explainer
Chi Nguyen · 2024-02-24T22:58:47.345Z · comments (8)

OpenAI-Microsoft partnership
Zach Stein-Perlman · 2023-10-03T20:01:44.795Z · comments (19)

[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (20)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

So you want to work on technical AI safety
gw · 2024-06-24T14:29:57.481Z · comments (3)

[link] Finding Backward Chaining Circuits in Transformers Trained on Tree Search
abhayesian · 2024-05-28T05:29:46.777Z · comments (1)

Why you should learn a musical instrument
cata · 2024-05-15T20:36:16.034Z · comments (23)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

[link] How to Eradicate Global Extreme Poverty [RA video with fundraiser!]
aggliu · 2023-10-18T15:51:22.073Z · comments (5)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

Changes in College Admissions
Zvi · 2024-04-24T13:50:03.487Z · comments (11)

Toy models of AI control for concentrated catastrophe prevention
Fabien Roger (Fabien) · 2024-02-06T01:38:19.865Z · comments (2)

When to Get the Booster?
jefftk (jkaufman) · 2023-10-03T21:00:12.813Z · comments (15)

n of m ring signatures
DanielFilan · 2023-12-04T20:00:06.580Z · comments (7)

They are made of repeating patterns
quetzal_rainbow · 2023-11-13T18:17:43.189Z · comments (4)

Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)

Gemini 1.0
Zvi · 2023-12-07T14:40:05.243Z · comments (7)

Sherlockian Abduction Master List
Cole Wyeth (Amyr) · 2024-07-11T20:27:00.000Z · comments (63)

GPT-2030 and Catastrophic Drives: Four Vignettes
jsteinhardt · 2023-11-10T07:30:06.480Z · comments (5)

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation
Benjamin Sturgeon (benjamin-sturgeon) · 2024-03-21T12:32:22.475Z · comments (8)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

[link] Announcing Human-aligned AI Summer School
Jan_Kulveit · 2024-05-22T08:55:10.839Z · comments (0)

Recent comments

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

Under them the chance for you to find yourself in a branch where all coins are Heads is 1/128, but your overall chance to survive is 100%. Therefore the low chance of failed execution doesn't matter, quantum immortality will "increase" the probability to 1.

You are right, and it's a serious counterargument to consider. Actually, I invented path-dependent identity as a counterargument to Miller's thought experiment.

You are also right that the Anthropic Trilemma and Magic by Forgetting do not work with path-dependent identity.

However, we can almost recreate the magic machine from the Anthropic Trilemma using path-based identity:

Imagine that I want to guess in which room I will be if there are two copies of me in the future, red or green.

I go into a dream. A machine creates my copy and then one more copy of that copy, which will result in 1/4 and 1/4 chances each. The second copy then merges with the first one, so we end up with only two copies, but I have a 3/4 chance to be the first one and 1/4 to be the second. So we've basically recreated a machine that can manipulate probabilities and got magic back.
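The arithmetic here can be checked with a minimal sketch. It assumes one particular reading of path-based identity: each copying step splits the current branch's weight equally, and a merge adds the weights of the merged branches. The branch names and the `split` / `merge` helpers are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

# Assumption: a copying step halves the weight of the branch being copied,
# and merging two branches adds their weights.
weights = {"original": Fraction(1)}

def split(weights, source, new):
    """Copy `source`: source and the new copy each get half of source's weight."""
    half = weights[source] / 2
    weights[source] = half
    weights[new] = half

def merge(weights, keep, absorbed):
    """Merge `absorbed` into `keep`, adding their weights."""
    weights[keep] += weights.pop(absorbed)

split(weights, "original", "copy")          # original 1/2, copy 1/2
split(weights, "copy", "copy_of_copy")      # copy 1/4, copy_of_copy 1/4
merge(weights, "original", "copy_of_copy")  # original 3/4, copy 1/4

print(weights)
```

Under these assumptions two copies remain, with weights 3/4 and 1/4, even though a plain count of copies would give 1/2 each.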

The main problem of path-dependent identity is that we assume the existence of a "global hidden variable" for any observer. It is hidden as it can't be measured by an outside viewer and only represents the subjective chances of the observer to be one copy and not another. And it is global as it depends on the observer's path, not their current state. It therefore contradicts the view that mind is equal to a Turing computer (functionalism) and requires the existence of some identity carrier which moves through paths (qualia, quantum continuity, or soul).

Also, path-dependent identity opens the door to back-causation and premonition, because if we normalize outputs of some black box where paths are mixed, similar to the magic machine discussed above, we get a shift in its input probability distribution in the past. This becomes similar to the 'timeline selection principle' (which I discussed in a longer version of this blog post but cut to fit format) in which not observer-moments are selected, but the whole timelines without updating on my position in the timeline. This idea formalizes the future anthropic shadow as I am more likely to be in the timeline that is fattest and longest in the future.

daniel-tan on Daniel Tan's Shortform

Interpretability needs a good proxy metric

I’m concerned that progress in interpretability research is ephemeral, being driven primarily by proxy metrics that may be disconnected from the end goal (understanding by humans). (Example: optimising for the L0 metric in SAE interpretability research may lead us to models that have more split features, even when this is unintuitive by human reckoning.)

It seems important for the field to agree on some common benchmark / proxy metric that is proven to be indicative of downstream human-rated interpretability, but I don’t know of anyone doing this. Similar to the role of BLEU in facilitating progress in NLP, I imagine having a standard metric would enable much more rapid and concrete progress in interpretability.

daniel-murfet on Alexander Gietelink Oldenziel's Shortform

Re: the SLT dogma.

For those interested, a continuous version of the padding argument is used in Theorem 4.1 of Clift-Murfet-Wallbridge to show that the learning coefficient is a lower bound on the Kolmogorov complexity (in a sense) in the setting of noisy Turing machines. Just take the synthesis problem to be given by a TM's input-output map in that theorem. The result is treated in a more detailed way in Waring's thesis (Proposition 4.19). Noisy TMs are of course not neural networks, but they are a place where the link between the learning coefficient in SLT and algorithmic information theory has already been made precise.

For what it's worth, as explained in simple versus short [LW · GW], I don't actually think the local learning coefficient is algorithmic complexity (in the sense of program length) in neural networks, only that it is a lower bound. So I don't really see the LLC as a useful "approximation" of the algorithmic complexity.

For those wanting to read more about the padding argument in the classical setting, Hutter-Catt-Quarel "An Introduction to Universal Artificial Intelligence" has a nice detailed treatment.

abandon on Alexander Gietelink Oldenziel's Shortform

Edit: ChatGPT and Claude are both fine IMO. Claude has a better ear for language, but ChatGPT's memory is very useful for letting you save info about your preferences, so I'd say they come out about even.
For ChatGPT in particular, you'll want to put whatever prompt you ultimately come up with into your custom instructions or its memory; that way all new conversations will start off pre-prompted.

In addition to borrowing others' prompts as Nathan suggested, try being more specific about what you want (e.g., 'be concise, speak casually and use lowercase, be sarcastic if i ask for something you can't help with'), and (depending on the style) providing examples (ETA: e.g., for poetry I'll often provide whichever llm with a dozen of my own poems in order to get something like my style back out). (Also, for style prompting, IME 'write in a pastiche of [author]' seems more powerful than just 'write like [author]', though YMMV).

notfnofn on notfnofn's Shortform

source seems genuine: https://old.reddit.com/r/artificial/comments/1gq4acr/gemini_told_my_brother_to_die_threatening/lwv84fr/?context=3 but I'm less sure now

lc on Shortform

The greatest strategy for organizing vast conspiracies is usually failing to realize that what you're doing is illegal.

vladimir_nesov on O O's Shortform

for anything related to human judgement, in theory this isn’t why it’s not doing well

The facts are in there, but not in the form of a sufficiently good reward model that can tell as well as human experts which answer is better or whether a step of an argument is valid. In the same way, RLHF still works better with humans on some queries; it hasn't been fully automated to superior results by replacing humans with models in all cases.

alexander-gietelink-oldenziel on Alexander Gietelink Oldenziel's Shortform

The Padding Argument or Simplicity = Degeneracy

[I learned this argument from Lucius Bushnaq and Matthias Dellago. It is also latent already in Solomonoff's original work]

Consider binary strings of a fixed length $L$.

Imagine feeding these strings into some Turing machine; we think of strings as codes for a function. Suppose we have a function that can be coded by a short compressed string $s$ of length $k \ll L$. That is, the function is computable by a small program.

Imagine uniformly sampling a random code from $\{0, 1\}^{L}$. What fraction of the codes implement the same function as the string $s$? It's close to $2^{-k}$, i.e. about $2^{L - k}$ of the $2^{L}$ codes. Indeed, given the string $s$ of length $k$ we can 'pad' it to a string of length $L$ by writing the code

"run $s$ skip $t$ "

where $t$ is an arbitrary string of length $L - k - c$, where $c$ is a small constant accounting for the overhead. There are approximately $2^{L - k}$ such binary strings. If our programming language has a simple skip / commenting-out functionality, then we expect approximately $2^{L - k}$ codes encoding the same function as $s$.

I find this truly remarkable: the degeneracy or multiplicity is inversely exponentially proportional to the minimum description length of the function!

Just by sampling codes uniformly at random we get the Simplicity prior!!
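A toy check of the counting, as a minimal sketch rather than anything from the original argument: the `PROGRAMS` table below is a made-up prefix-free "language" in which the interpreter runs the program at the start of the code and skips the rest (so the overhead constant $c$ is zero). Enumerating all codes of length $L$ shows the fraction implementing each function is $2^{-k}$ for a program of length $k$.

```python
from collections import Counter
from itertools import product

# Hypothetical prefix-free programs: bit-string -> name of the function it computes.
PROGRAMS = {
    "0": "constant_zero",  # k = 1
    "10": "identity",      # k = 2
    "110": "negate",       # k = 3
    "1110": "parity",      # k = 4
}

def run(code: str) -> str:
    """Execute the unique program that is a prefix of `code`; the padding is skipped."""
    for prog, fn in PROGRAMS.items():
        if code.startswith(prog):
            return fn
    return "invalid"  # codes starting with 1111 encode nothing in this toy language

def function_fractions(L: int) -> dict:
    """Fraction of all length-L codes that implement each function."""
    counts = Counter(run("".join(bits)) for bits in product("01", repeat=L))
    total = 2 ** L
    return {fn: n / total for fn, n in counts.items()}

fractions = function_fractions(10)
for prog, fn in PROGRAMS.items():
    # Observed fraction next to the predicted 2^{-k}.
    print(fn, fractions[fn], 2 ** -len(prog))
```

In this toy language the match is exact because the padding costs nothing; with a real skip instruction the count drops by the constant factor $2^{-c}$.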

Why do Neural Networks work? Why do polynomials not work?

It is sometimes claimed that neural networks work well because they are 'Universal Approximators'. There are multiple problems with this explanation, see e.g. here [LW · GW], but a very basic problem is that being a universal approximator is very common. Polynomials are universal approximators!

Many different neural network architectures work. In the limit of large data and compute, the differences between architectures start to vanish and very general scaling laws dominate. This is not the case for polynomials.

Degeneracy=Simplicity explains why: polynomials are uniquely tied down by their coefficients, so a learning machine that tries to fit polynomials does not have a 'good' simplicity bias that approximates the Solomonoff prior.

The lack of degeneracy applies to any set of functions that form an orthogonal basis. This is because the decomposition is unique. So there is no multiplicity and no implicit regularization/ simplicity bias.
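To make the contrast concrete, here is a small sketch under assumptions that are not in the original: parameters live on a hypothetical integer grid, the "unique" model is $f(x) = c x$, and the "redundant" model is $f(x) = a b x$, where only the product $ab$ matters. It says nothing about real networks; it only shows what multiplicity means.

```python
from collections import Counter
from itertools import product

GRID = range(-2, 3)  # hypothetical parameter grid {-2, ..., 2}

# Unique parametrization: f(x) = c * x. Each coefficient is its own function.
linear = Counter(c for c in GRID)

# Redundant parametrization: f(x) = a * b * x. The function is fixed by the product a * b.
factored = Counter(a * b for a, b in product(GRID, repeat=2))

print(dict(linear))    # every function has multiplicity 1
print(dict(factored))  # multiplicities differ from function to function
```

In the unique parametrization every function is hit by exactly one parameter setting, so uniform sampling of parameters favours nothing. In the factored one the zero function is hit by 9 of the 25 settings, more than any other function.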

[I learned this elegant argument from Lucius Bushnaq.]

The Singular Learning Theory and Algorithmic Information Theory crossover

I described the padding argument as an argument not a proof. That's because technically it only gives a lower bound on the number of codes equivalent to the minimal description code. The problem is there are pathological examples where the programming language (e.g. the UTM) hardcodes that all small codes $s$ encode a single function $f$ .

When we take this problem into account the Padding Argument is already in Solomonoff's original work. There is a theorem that states that the Solomonoff prior is equivalent to taking a suitable Universal Turing Machine and feeding in a sequence of (uniformly) random bits and taking the resulting distribution. To account for the pathological examples above everything is asymptotic and up to some constant like all results in algorithmic information theory. This means that like all other results in algorithmic information theory it's unclear whether it is at all relevant in practice.
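For reference, the standard statement in the discrete, prefix-machine setting can be written as follows. The notation is the usual textbook one and not taken from this post: $U$ is a universal prefix machine, $|p|$ is the length of program $p$, and $x$ is whatever object the code describes.

$$
m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-|p|}, \qquad 2^{-K(x)} \;\le\; m(x) \;\le\; c_U \, 2^{-K(x)} .
$$

The left inequality holds because the shortest program alone contributes $2^{-K(x)}$, which is the lower bound the padding argument gives. The right inequality is the coding theorem, with a constant $c_U$ that depends on the machine but not on $x$; this is the "up to some constant" caveat.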

However, while this gives a correct proof I think this understates the importance of the Padding argument to me. That's because I think in practice we shouldn't expect the UTM to be pathological in this way. In other words, we should heuristically expect the simplicity $K (f)$ to be basically proportional to the fraction of codes yielding $f$ for a large enough (overparameterized) architecture.

The bull case for SLT is now: there is a direct equality between algorithmic complexity and the degeneracy. This has always been SLT dogma of course but until I learned about this argument it wasn't so clear to me how direct this connection was. The algorithmic complexity can be usefully approximated by the (local) learning coefficient $λ$ !

EDIT: see Clift-Murfet-Wallbridge and Tom Waring's thesis for more. See below, thanks Dan.

The bull case for algorithmic information: the theory of algorithmic information, Solomonoff induction, AIXI etc. is very elegant and in some sense gives answers to fundamental questions we would like to answer. The major problem was that it is both uncomputable and seemingly intractable. Uncomputability is perhaps not such a problem - uncomputability often arises from measure zero highly adversarial examples. But tractability is very problematic. We don't know how tractable compression is, but it's likely intractable. However, the Padding argument suggests that we should heuristically expect the simplicity $K (f)$ to be basically proportional to the fraction of codes yielding $f$ for a large enough (overparameterized) architecture - in other words it can be measured by the

Do Neural Networks actually satisfy the Padding argument?

Short answer: No.

Long answer: Unclear. maybe... sort of... and the difference might itself be very interesting...!

Stay tuned.

themanxloiner on Scattered thoughts on what it means for an LLM to believe

But in this Eiffel Tower example, I’m not sure what is correlating with what

The physical object Eiffel Tower is correlated with itself.

However, I think the basic ability of an LLM to correctly complete the sentence “the Eiffel Tower is in the city of…” is not very strong evidence of having the relevant kinds of dispositions.

It is highly predictive of the ability of the LLM to book flights to Paris, when I create an LLM-agent out of it and ask it to book a trip to see the Eiffel Tower.

I think the question about whether current AI systems have real goals and beliefs does indeed matter

I don't think we disagree here. To clarify, my belief is that there are threat models / solutions that are not affected by whether the AI has 'real' beliefs, and there are other threats / solutions where it does matter.

I think CGP Grey perspective puts more weight on Definition 3.

I actually do not understand the distinction between Definition 2 and Definition 3. We don't need to resolve it here. I've edited the post to include my uncertainty on this.

algon on Announcing turntrout.com, my new digital home

It's a beautiful website. I'm sad to see you go. I'm excited to see you write more.