LessWrong 2.0 Reader

Gradient Descent on the Human Brain
Jozdien · 2024-04-01T22:39:24.862Z · comments (5)

[link] The Mysterious Trump Buyers on Polymarket
Annapurna (jorge-velez) · 2024-10-18T13:26:25.565Z · comments (6)

Can we build a better Public Doublecrux?
Raemon · 2024-05-11T19:21:53.326Z · comments (6)

BatchTopK: A Simple Improvement for TopK-SAEs
Bart Bussmann (Stuckwork) · 2024-07-20T02:20:51.848Z · comments (0)

Pseudonymity and Accusations
jefftk (jkaufman) · 2023-12-21T19:20:19.944Z · comments (20)

AI #43: Functional Discoveries
Zvi · 2023-12-21T15:50:04.442Z · comments (26)

The Geometry of Feelings and Nonsense in Large Language Models
7vik (satvik-golechha) · 2024-09-27T17:49:27.420Z · comments (10)

[link] OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns
Seth Herd · 2023-11-20T14:20:33.539Z · comments (28)

A D&D.Sci Dodecalogue
abstractapplic · 2024-04-12T01:10:01.625Z · comments (0)

Parental Writing Selection Bias
jefftk (jkaufman) · 2024-10-13T14:00:03.225Z · comments (3)

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-11-07T16:12:20.031Z · comments (20)

Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat · 2023-12-06T08:16:26.846Z · comments (18)

Schelling points in the AGI policy space
mesaoptimizer · 2024-06-26T13:19:25.186Z · comments (2)

Llama Llama-3-405B?
Zvi · 2024-07-24T19:40:07.565Z · comments (9)

The Assumed Intent Bias
silentbob · 2023-11-05T16:28:03.282Z · comments (13)

Model evals for dangerous capabilities
Zach Stein-Perlman · 2024-09-23T11:00:00.866Z · comments (9)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset
aphyer · 2024-06-17T21:29:08.778Z · comments (11)

[link] Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Gunnar_Zarncke · 2024-05-16T13:09:39.265Z · comments (20)

Will 2024 be very hot? Should we be worried?
A.H. (AlfredHarwood) · 2023-12-29T11:22:50.200Z · comments (12)

Applying refusal-vector ablation to a Llama 3 70B agent
Simon Lermen (dalasnoin) · 2024-05-11T00:08:08.117Z · comments (14)

On OpenAI’s Preparedness Framework
Zvi · 2023-12-21T14:00:05.144Z · comments (4)

[link] The Good Balsamic Vinegar
jenn (pixx) · 2024-01-26T19:30:57.435Z · comments (4)

Provably Safe AI: Worldview and Projects
bgold · 2024-08-09T23:21:02.763Z · comments (43)

Polysemantic Attention Head in a 4-Layer Transformer
Jett Janiak (jett) · 2023-11-09T16:16:35.132Z · comments (0)

On Lex Fridman’s Second Podcast with Altman
Zvi · 2024-03-25T12:20:08.780Z · comments (10)

[link] Anthropic's updated Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-10-15T16:46:48.727Z · comments (3)

[link] how birds sense magnetic fields
bhauth · 2024-06-27T18:59:35.075Z · comments (4)

[link] Bed Time Quests & Dinner Games for 3-5 year olds
Gunnar_Zarncke · 2024-06-22T07:53:38.989Z · comments (0)

Cooperating with aliens and AGIs: An ECL explainer
Chi Nguyen · 2024-02-24T22:58:47.345Z · comments (8)

The Shutdown Problem: Incomplete Preferences as a Solution
EJT (ElliottThornley) · 2024-02-23T16:01:16.378Z · comments (22)

How to Give in to Threats (without incentivizing them)
Mikhail Samin (mikhail-samin) · 2024-09-12T15:55:50.384Z · comments (25)

[link] Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills · 2024-10-21T01:26:02.030Z · comments (4)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.
Andrew_Critch · 2024-09-11T04:41:24.872Z · comments (7)

[link] Prices are Bounties
Maxwell Tabarrok (maxwell-tabarrok) · 2024-10-12T14:51:40.689Z · comments (12)

Two LessWrong speed friending experiments
mikko (morrel) · 2024-06-15T10:52:26.081Z · comments (3)

Rewilding the Gut VS the Autoimmune Epidemic
GGD · 2024-08-16T18:00:46.239Z · comments (0)

Does literacy remove your ability to be a bard as good as Homer?
Adrià Garriga-alonso (rhaps0dy) · 2024-01-18T03:43:14.994Z · comments (19)

On Overhangs and Technological Change
Roko · 2023-11-05T22:58:51.306Z · comments (19)

Unlearning via RMU is mostly shallow
Andy Arditi (andy-arditi) · 2024-07-23T16:07:52.223Z · comments (3)

[link] Can AI Outpredict Humans? Results From Metaculus's Q3 AI Forecasting Benchmark
ChristianWilliams · 2024-10-10T18:58:46.041Z · comments (2)

[link] A starter guide for evals
Marius Hobbhahn (marius-hobbhahn) · 2024-01-08T18:24:23.913Z · comments (2)

Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)

Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)

Applications of Chaos: Saying No (with Hastings Greer)
Elizabeth (pktechgirl) · 2024-09-21T16:30:07.415Z · comments (16)

Apply to the Conceptual Boundaries Workshop for AI Safety
Chipmonk · 2023-11-27T21:04:59.037Z · comments (0)

Paper in Science: Managing extreme AI risks amid rapid progress
JanB (JanBrauner) · 2024-05-23T08:40:40.678Z · comments (2)

AI #52: Oops
Zvi · 2024-02-22T21:50:07.393Z · comments (9)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

[link] on the dollar-yen exchange rate
bhauth · 2024-04-07T04:49:53.920Z · comments (21)

Recent comments

chipmonk on The hostile telepaths problem

I'm very glad you wrote this

notfnofn on What are some good ways to form opinions on controversial subjects in the current and upcoming era?

I actually just meant sowing discord by pushing half the population towards one and the other half towards the other in cases where it doesn't really affect them, but that's a good point. It's important to not be deceived into thinking issues are complicated when they are really not.

linch on davekasten's Shortform

My guess is that we wouldn't actually know with high confidence before (and likely even some time after) things-will-definitely-be-fine.

E.g., 3 months after safe ASI, people might still be publishing their alignment takes.

linch on What are some good ways to form opinions on controversial subjects in the current and upcoming era?

There are also times where "foreign actors" (I assume by that term you mean actors interested in muddying the waters in general, not just literal foreign election interference) know that it's impossible to push a conversation towards their preferred 1)A or 5)B, at least among informed/educated voices, so they try to muddy the waters and push things towards 3). Climate change[1] and covid vaccines are two examples that come to mind.

[1] Though the correct answer for climate change is closer to 2) than 1).

cole-wyeth on New intro textbook on AIXI

Nice things about the universal distribution underlying AIXI include:

- It is one (lower semi-)computable probabilistic model that dominates, in the measure-theoretic sense, all other (lower semi-)computable probabilistic models. Such a model cannot be constructed at most natural computability levels, so it's neat that it works here.
- It unites compression and prediction through the coding theorem, though this is slightly weaker in the sequential case.
- It has two very natural characterizations: either as feeding random bits to a UTM or as an explicit mixture of lower semi-computable environments.
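For readers who want the formulas behind these three points, here is a minimal LaTeX sketch in standard textbook notation. The symbols $M$, $\xi$, $\nu$, $K$, $Km$ and the weights $2^{-K(\nu)}$ are conventional choices, not taken from the comment; the two characterizations are known to agree up to a multiplicative constant.

```latex
% Characterization 1: probability that a monotone UTM U, fed uniformly random
% bits, outputs something beginning with x (p ranges over minimal such programs)
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}

% Characterization 2: explicit mixture over the class \mathcal{M} of
% lower semicomputable semimeasures, weighted by prefix complexity
\xi(x) = \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)} \, \nu(x)

% Dominance: for every \nu \in \mathcal{M} a constant c_\nu > 0, independent of x, satisfies
\xi(x) \ge c_\nu \, \nu(x) \quad \text{for all } x, \qquad c_\nu = 2^{-K(\nu)}

% Coding theorem for the discrete universal semimeasure m; in the sequential
% case the equality fails, though the one-sided bound via monotone complexity holds
-\log_2 m(x) = K(x) + O(1), \qquad -\log_2 M(x) \le Km(x) + O(1)
```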

With the full AIXI model, Professor Hutter was able to formally extend the probabilistic model to interactive environments without damaging the computability level. Conditioning and planning do damage the computability level but this is fairly well understood and not too bad.

chris_leong on avturchin's Shortform

What's ABBYY?

programcrafter on A Semiotic Critique of the Orthogonality Thesis

A goal is, fundamentally, an idea. As the final step in a plan, you can write it out as a symbolic representation of the “world state” you are trying to achieve, although it could represent other things as well. In a planning computer agent, this will probably terminate in a bunch of 1s and 0s stored in its memory.
In order for this symbolic representation to be meaningful, it must be comparable and distinct from other symbolic representations. World state A in the agent's plan could be contrasted from world state B, C and D. This is a very fundamental fact about how information and meaning work, if World State A was indistinguishable from all the others, there would be no reason for the agent to act, because its goal would have been “accomplished”.

This has a logic error. There need not be one best world state, and a world state need not be distinguishable from all others, merely from some of them. (In fact, a utility function yielding a real value compresses the world into a characteristic of the things we care about in exactly this way.)

Also, with unbounded computation, a utility optimizer could find the supremum (best outcome) of any set of world states you provide; without that, it will have less granularity, work on a set of nearby states (for instance, those "easily coming to human mind"), or employ other optimization techniques.

I believe this underlies much of the disagreement, because then more knowledge or more intelligence might change only the relations of the "final goal" sign, not its meaning (re: isomorphism).
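A toy Python sketch of the point about real-valued utilities, under made-up assumptions: the world states, the `utility` function, and the `best_states` / `bounded_best` helpers are all hypothetical. The utility ignores some features, so distinct states tie, and the goal separates a state from some alternatives rather than from all of them; the bounded optimizer only sees a sample of candidates.

```python
import random
from typing import Callable, Dict, List

# Hypothetical toy world states: each state is a dict of named features.
WorldState = Dict[str, int]


def utility(state: WorldState) -> float:
    """Hypothetical utility that only cares about 'paperclips'.

    Distinct states that differ only in ignored features map to the same
    real value, so the goal separates a state from some others, not all.
    """
    return float(state["paperclips"])


def best_states(states: List[WorldState], u: Callable[[WorldState], float]) -> List[WorldState]:
    """Unbounded optimizer: score every candidate and return all maximizers (ties included)."""
    top = max(u(s) for s in states)
    return [s for s in states if u(s) == top]


def bounded_best(states: List[WorldState], u: Callable[[WorldState], float],
                 budget: int, rng: random.Random) -> List[WorldState]:
    """Bounded optimizer: only scores a random subset of at most `budget` candidates."""
    sample = rng.sample(states, min(budget, len(states)))
    return best_states(sample, u)


states = [
    {"paperclips": 3, "weather": 0},
    {"paperclips": 3, "weather": 1},  # differs from the first only in a feature u ignores
    {"paperclips": 1, "weather": 0},
]
print(len(best_states(states, utility)))  # 2: two distinct states tie for best
```

There is no unique "best world state" here, yet the utility is perfectly well defined.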

Your series of posts also assumes that signs have a fixed order. This is false. For instance, different fields of mathematics treat real numbers either as first-order signs (atomic objects) or as higher-order ones, defined as relations on rational numbers.

Or, for an easier example to work on: equality could be a second-order sign "object A is the same as object B", or it may be defined using the third-order expression "for any property P, A and B either both have the property or neither has it". It is no coincidence that those definitions are equivalent; you cannot assume that if something is expressible using higher-order signs, it is not also expressible using lower-order ones.

And this might undermine the rest of the argument.
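The claim that the two definitions of equality coincide can be checked mechanically. A minimal Lean 4 sketch (core Lean only; the theorem name is made up for illustration):

```lean
-- Primitive equality vs. the "same properties" (Leibniz) definition.
theorem eq_iff_leibniz {α : Type} (a b : α) :
    a = b ↔ ∀ P : α → Prop, (P a ↔ P b) := by
  constructor
  · -- If a = b, every property holds of a exactly when it holds of b.
    intro h P
    subst h
    exact Iff.rfl
  · -- Conversely, instantiate P with the property "is equal to a".
    intro h
    have hb : b = a := (h (fun x => x = a)).mp rfl
    exact hb.symm
```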

Engaging with the perspective of the orthogonality thesis itself: rejecting it means that a change in intelligence will lead, in expectation, to a change in final goals. Could you name the expected direction of such a change, like "more intelligent agents will act with less kindness"?

avturchin on avturchin's Shortform

Collapse of mega-project to create AI based on linguistics

ABBYY spent 100 million USD over 30 years to create a model of language using hundreds of linguists. It failed to compete with transformers, and this month the project was closed. More in Russian here: https://sysblok.ru/blog/gorkij-urok-abbyy-kak-lingvisty-proigrali-poslednjuju-bitvu-za-nlp/

gwern on localdeity's Shortform

Our strategy is for variants to preserve well-defined behavior in the application but introduce diversity in the effect of undefined behavior (such as out-of-bounds accesses).

This Galois work is a lot narrower and targeted at low-level details irrelevant to most code, which thankfully is now written in non-C languages, where out-of-bounds accesses don't pwn your machine, undefined behavior does not summon nasal demons, and stuff like ASLR is largely irrelevant.

So AI is wholly necessary for most of the value of such an idea.

And yeah, I think it's a pretty decent idea: with cheap enough LLMs, you can harden applications by sampling possible implementations which pass all unit tests, and whose final combination passes all end-to-end or integration tests. You can already do this a bit to check things with LLMs being so cheap. (Last night, Achmiz asked a Markov chain question and I was too lazy to try to figure it out myself, so I had ChatGPT solve it 3 ways in R: Monte Carlo, solving the matrix, and proving an exact closed-form probability. The answer could be wrong, but that seems unlikely when they all seem to agree. If I wanted to write it up, I'd also have Claude solve it independently in Python so I could cross-check all 6 versions...)
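The exact question from that thread is not given here, so as a hypothetical stand-in, this Python sketch answers a classic Markov chain question (expected fair-coin flips until n consecutive heads) three independent ways, mirroring the cross-checking idea: simulation, solving the chain's linear equations exactly, and a known closed form. The function names are illustrative only.

```python
import random
from fractions import Fraction


def monte_carlo(n_heads: int, trials: int, rng: random.Random) -> float:
    """Estimate expected flips until n_heads consecutive heads by simulation."""
    total = 0
    for _ in range(trials):
        run, flips = 0, 0
        while run < n_heads:
            flips += 1
            run = run + 1 if rng.random() < 0.5 else 0  # heads extends the run, tails resets it
        total += flips
    return total / trials


def solve_linear(n_heads: int) -> Fraction:
    """Exact answer from the chain's linear equations.

    State k = current run of heads: E[k] = 1 + E[k+1]/2 + E[0]/2, with E[n] = 0.
    Writing E[k] = a + b * E[0] and substituting backwards from k = n
    reduces the whole system to one equation in E[0].
    """
    a, b = Fraction(0), Fraction(0)  # E[n] = 0 + 0 * E[0]
    for _ in range(n_heads):
        a, b = 1 + a / 2, b / 2 + Fraction(1, 2)
    return a / (1 - b)  # from E[0] = a + b * E[0]


def closed_form(n_heads: int) -> int:
    """Known closed form for a fair coin: 2^(n+1) - 2."""
    return 2 ** (n_heads + 1) - 2


rng = random.Random(0)
print(solve_linear(2), closed_form(2))  # 6 6
print(monte_carlo(2, 100_000, rng))     # approximately 6
```

The exact methods agree exactly, and the simulation lands within sampling error of them, which is the kind of agreement the comment leans on.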

This would help avoid a decent number of logic bugs and oversights, and it would also have some benefits in terms of software engineering: you are getting a lot of automated 'chaos engineering' and unit-test generation and performance benchmarking for free, by distributing a combinatorial number of implementations. It's almost like a mass fuzzing exercise, where the users provide the fuzz.
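A rough Python sketch of the "users provide the fuzz" idea, shrunk to a local differential test: several implementations of one contract run on the same random inputs, and any disagreement flags a bug in at least one of them. The hand-written variants below are hypothetical stand-ins for LLM-sampled ones, and `lower_bound_buggy` is deliberately wrong.

```python
import bisect
import random
from typing import Callable, List, Optional, Tuple

Impl = Callable[[List[int], int], int]


def lower_bound_bisect(xs: List[int], x: int) -> int:
    """Variant A: index of the first element >= x, via the standard library."""
    return bisect.bisect_left(xs, x)


def lower_bound_linear(xs: List[int], x: int) -> int:
    """Variant B: same contract, independent linear-scan implementation."""
    for i, v in enumerate(xs):
        if v >= x:
            return i
    return len(xs)


def lower_bound_buggy(xs: List[int], x: int) -> int:
    """Variant C: subtly wrong (bisect_right); differs only when x is present."""
    return bisect.bisect_right(xs, x)


def differential_fuzz(impls: List[Impl], trials: int,
                      rng: random.Random) -> Optional[Tuple[List[int], int]]:
    """Run every variant on the same random inputs; return the first input where they disagree."""
    for _ in range(trials):
        xs = sorted(rng.randint(0, 9) for _ in range(rng.randint(0, 8)))
        x = rng.randint(-1, 10)
        if len({impl(xs, x) for impl in impls}) > 1:
            return xs, x
    return None


rng = random.Random(1)
print(differential_fuzz([lower_bound_bisect, lower_bound_linear], 5_000, rng))  # None
print(differential_fuzz([lower_bound_bisect, lower_bound_buggy], 5_000, rng))   # some (xs, x)
```

In the scheme described above, the "random inputs" would instead be real user workloads, with disagreements surfacing as crashes or divergent behavior across users.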

You might think this would run into issues with tracking the combinatorial number of binaries, which could take up petabytes if you are distributing, say, a 1GB package to 1 million users, but this has plenty of possible fixes: if you are using reproducible builds, as you ought to, then you only need to track a list of the variants for each function and store that per user, and then you can rebuild the exact binary for a given user on-demand.* I think a bigger issue is that forcing diversity out of tuned LLMs is quite hard, and so you would run into the systematic error problem at a higher level: all the tuned LLMs, feeding on each others' outputs & mode-collapsed, will turn in code with the same implicit assumptions & algorithms & bugs, which would mostly defeat the point.

* Similarly, the LLMs are, or should be, deterministic and fixable with a seed. So the overhead here might be something like, if you have a codebase with 10,000 functions, each time you push out a release (which might happen daily or weekly), you store the LLM snapshot ID and RNG seed (maybe a kilobyte total), generate 2 versions of each function and randomize per user, and track 10,000 bits or ~1 KB per user, so if you have a million users that's just a gigabyte. Whenever you need to investigate a specific binary because it triggered a crash or something, you just fetch the LLM ID & RNG, decode the specific 10,000 function variants they used, and compile. For anyone with millions of users who is serious about security, a gigabyte of overhead per release is nothing. You already waste that much with random Docker images and crap.
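A minimal Python sketch of the per-user bookkeeping in that footnote, assuming two variants per function and a packed bitmask per user; `assign_variants`, `decode_variants`, and the string seed format are hypothetical. It also makes the arithmetic explicit: 10,000 bits is 1,250 bytes, so a million users comes to about 1.25 GB per release.

```python
import random
from typing import List

N_FUNCTIONS = 10_000  # functions in the codebase, as in the comment
N_USERS = 1_000_000


def assign_variants(release_seed: int, user_id: int, n_functions: int = N_FUNCTIONS) -> bytes:
    """Pick variant 0 or 1 for every function and pack the choices into bits.

    Hypothetical scheme: choices come from an RNG seeded per (release, user)
    and are stored as a little-endian bitmask, one bit per function.
    """
    rng = random.Random(f"{release_seed}:{user_id}")
    bits = rng.getrandbits(n_functions)
    return bits.to_bytes((n_functions + 7) // 8, "little")


def decode_variants(mask: bytes, n_functions: int = N_FUNCTIONS) -> List[int]:
    """Recover the per-function variant list needed to rebuild that user's exact binary."""
    bits = int.from_bytes(mask, "little")
    return [(bits >> i) & 1 for i in range(n_functions)]


bytes_per_user = (N_FUNCTIONS + 7) // 8    # 10,000 bits -> 1,250 bytes
total_gb = bytes_per_user * N_USERS / 1e9  # 1.25 GB per release
```

Because the mask here is itself derived from a seed, one could in principle store only the seed; storing the mask, as the comment suggests, keeps reconstruction independent of the assignment code.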

davekasten on davekasten's Shortform

Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it"?

For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people as possible to that future...