LessWrong 2.0 Reader

[link] Why Yudkowsky is wrong about "covalently bonded equivalents of biology"
titotal (lombertini) · 2023-12-06T14:09:15.402Z · comments (40)

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments
Radford Neal · 2023-12-07T03:33:16.149Z · comments (25)

[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)

Monthly Roundup #12: November 2023
Zvi · 2023-11-14T15:20:06.926Z · comments (5)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

Wireheading and misalignment by composition on NetHack
pierlucadoro · 2023-10-27T17:43:41.727Z · comments (4)

Unpicking Extinction
ukc10014 · 2023-12-09T09:15:41.291Z · comments (10)

CHAI internship applications are open (due Nov 13)
Erik Jenner (ejenner) · 2023-10-26T00:53:49.640Z · comments (0)

AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)

What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)

The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

[link] Fake Deeply
Zack_M_Davis · 2023-10-26T19:55:22.340Z · comments (7)

An illustrative model of backfire risks from pausing AI research
Maxime Riché (maxime-riche) · 2023-11-06T14:30:58.615Z · comments (3)

Boston Solstice 2023 Retrospective
jefftk (jkaufman) · 2024-01-02T03:10:05.694Z · comments (0)

Regrant up to $600,000 to AI safety projects with GiveWiki
Dawn Drescher (Telofy) · 2023-10-28T19:56:06.676Z · comments (1)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor · 2024-04-18T08:39:13.368Z · comments (2)

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)

Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols
Arjun Panickssery (arjun-panickssery) · 2024-01-15T21:21:03.962Z · comments (0)

[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

We have promising alignment plans with low taxes
Seth Herd · 2023-11-10T18:51:38.604Z · comments (9)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

More on the Apple Vision Pro
Zvi · 2024-02-13T17:40:05.388Z · comments (5)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

[link] FTX expects to return all customer money; clawbacks may go away
Mikhail Samin (mikhail-samin) · 2024-02-14T03:43:13.218Z · comments (1)

5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)

Helpful examples to get a sense of modern automated manipulation
trevor (TrevorWiesinger) · 2023-11-12T20:49:57.422Z · comments (3)

Love, Reverence, and Life
Elizabeth (pktechgirl) · 2023-12-12T21:49:04.061Z · comments (7)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)

Disentangling four motivations for acting in accordance with UDT
Julian Stastny · 2023-11-05T21:26:22.514Z · comments (3)

What AI companies should do: Some rough ideas
Zach Stein-Perlman · 2024-10-21T14:00:10.412Z · comments (10)

DIY LessWrong Jewelry
Fluffnutt (Pear) · 2024-08-25T21:33:56.173Z · comments (0)

[link] Information dark matter
Logan Kieller (logan-kieller) · 2024-10-01T15:05:41.159Z · comments (4)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

Proveably Safe Self Driving Cars [Modulo Assumptions]
Davidmanheim · 2024-09-15T13:58:19.472Z · comments (26)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (44)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

AI Safety Strategies Landscape
Charbel-Raphaël (charbel-raphael-segerie) · 2024-05-09T17:33:45.853Z · comments (1)

AI #63: Introducing Alpha Fold 3
Zvi · 2024-05-09T14:20:03.176Z · comments (2)

[link] patent process problems
bhauth · 2024-07-14T21:12:04.953Z · comments (13)

Rational Animations offers animation production and writing services!
Writer · 2024-03-15T17:26:07.976Z · comments (0)

One way violinists fail
Solenoid_Entity · 2024-05-29T04:08:17.675Z · comments (5)

Musings on LLM Scale (Jul 2024)
Vladimir_Nesov · 2024-07-03T18:35:48.373Z · comments (0)

[link] Twitter thread on open-source AI
Richard_Ngo (ricraz) · 2024-07-31T00:26:11.655Z · comments (6)

How good are LLMs at doing ML on an unknown dataset?
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-07-01T09:04:03.687Z · comments (4)

2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

Recent comments

chipmonk on The hostile telepaths problem

I'm very glad you wrote this

notfnofn on What are some good ways to form opinions on controversial subjects in the current and upcoming era?

I actually just meant sowing discord by pushing half the population towards one and the other half towards the other in cases where it doesn't really affect them, but that's a good point. It's important to not be deceived into thinking issues are complicated when they are really not.

linch on davekasten's Shortform

My guess is that we wouldn't actually know with high confidence before (and likely even some time after) things-will-definitely-be-fine.

E.g., 3 months after safe ASI, people might still be publishing their alignment takes.

linch on What are some good ways to form opinions on controversial subjects in the current and upcoming era?

There are also times when "foreign actors" (I assume by that term you mean actors interested in muddying the waters in general, not just literal foreign election interference) know that it's impossible to push a conversation towards their preferred 1) A or 5) B, at least among informed/educated voices, so they try to muddy the waters and push things towards 3). Climate change[1] and COVID vaccines are two examples that come to mind.

[1] Though the correct answer for climate change is closer to 2) than 1).

cole-wyeth on New intro textbook on AIXI

Nice things about the universal distribution underlying AIXI include:

- It is a single (lower semi-)computable probabilistic model that dominates, in the measure-theoretic sense, all other (lower semi-)computable probabilistic models. Such a universal model cannot be constructed for most natural computability levels, so it's neat that it works here.
- It unites compression and prediction through the coding theorem, though this is slightly weaker in the sequential case.
- It has two very natural characterizations: either as feeding random bits to a UTM, or as an explicit mixture of lower semi-computable environments.
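For reference, the standard textbook forms of these three points, in our own notation rather than the comment's: U is a monotone universal Turing machine, K is prefix Kolmogorov complexity, and \mathcal{M} is the class of lower semicomputable semimeasures. This is a sketch of the usual statements, not a derivation.

```latex
% Characterization 1: feed random bits to U; sum over minimal programs p whose output starts with x
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}

% Characterization 2: explicit Bayesian mixture over lower semicomputable environments
\xi(x) = \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x), \qquad w_\nu = 2^{-K(\nu)}
% M and \xi agree up to a multiplicative constant.

% Dominance: every lower semicomputable semimeasure is beaten by at most a constant factor
\forall \nu \in \mathcal{M} \;\; \exists c_\nu > 0 \;\; \forall x : \quad M(x) \ge c_\nu \, \nu(x)

% Coding theorem (discrete universal distribution m): probability tracks compressed length
-\log_2 m(x) = K(x) + O(1)
```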

With the full AIXI model, Professor Hutter was able to formally extend the probabilistic model to interactive environments without damaging the computability level. Conditioning and planning do damage the computability level, but this is fairly well understood and not too bad.

chris_leong on avturchin's Shortform

What's ABBYY?

programcrafter on A Semiotic Critique of the Orthogonality Thesis

A goal is, fundamentally, an idea. As the final step in a plan, you can write it out as a symbolic representation of the “world state” you are trying to achieve, although it could represent other things as well. In a planning computer agent, this will probably terminate in a bunch of 1s and 0s stored in its memory.
In order for this symbolic representation to be meaningful, it must be comparable and distinct from other symbolic representations. World state A in the agent's plan could be contrasted from world state B, C and D. This is a very fundamental fact about how information and meaning work, if World State A was indistinguishable from all the others, there would be no reason for the agent to act, because its goal would have been “accomplished”.

This has a logic error. There need not be one best world state, and a world state need not be distinguishable from all others, merely from some of them. (In fact, a utility function yielding a real value compresses the world into a characteristic of the things we care about in exactly this way.)

Also, with unbounded computation, a utility optimizer could find the supremum (best outcome) of any set of world states you provide; without that, it will have less granularity, work on a set of nearby states (for instance, those "easily coming to a human mind"), or employ other optimization techniques.
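A toy sketch of the two points above, assuming a made-up world with just two features (everything here is hypothetical, not from the post): the utility function maps each state to a real number and ignores one feature, so states differing only in that feature are indistinguishable to the agent, and no single "best state" is required. The unbounded optimizer scans every candidate; the bounded one only looks at a random sample.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldState:
    # Hypothetical toy world: only these two features exist.
    paperclips: int
    weather: str  # ignored by the utility function below


def utility(s: WorldState) -> float:
    # Compresses the whole state into one real number. Weather is ignored,
    # so states that differ only in weather get identical utility.
    return float(s.paperclips)


def best_state(states, u=utility):
    # Unbounded case: evaluate every candidate and return a maximizer
    # (ties go to the first one encountered).
    return max(states, key=u)


def bounded_best_state(states, u=utility, budget=3, seed=0):
    # Bounded case: only a random sample of `budget` candidates is evaluated,
    # so the result may be worse than the true maximum.
    rng = random.Random(seed)
    pool = list(states)
    sample = rng.sample(pool, min(budget, len(pool)))
    return max(sample, key=u)
```

With states for 0 to 4 paperclips under either weather, `best_state` returns a 4-paperclip state, while the bounded version returns whichever of its three sampled states scores highest.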

I believe this underlies much of the disagreement, because then more knowledge or more intelligence might change only the relations of the "final goal" sign, but not its meaning (re: isomorphism).

Your series of posts also assumes that signs have a fixed order. This is false. For instance, different fields of mathematics treat real numbers either as first-order signs (atomic objects) or as higher-order ones, defined as relations on rational numbers.

Or, for an easier example to work with: equality could be a second-order sign, "object A is the same as object B", or it may be defined using a third-order expression, "for any property P, A and B either both have the property or both lack it". It is no coincidence that those definitions are equivalent; you cannot assume that if something is expressible using higher-order signs, it is not also expressible using lower-order ones.
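The equivalence of the two definitions of equality is Leibniz's law, and it can be checked mechanically. A minimal Lean 4 sketch (core Lean only, no Mathlib; the theorem name is ours):

```lean
theorem leibniz_eq {α : Type} (a b : α) :
    a = b ↔ ∀ P : α → Prop, (P a ↔ P b) := by
  constructor
  · -- If a = b, substituting makes both sides the same proposition.
    intro h P
    subst h
    exact Iff.rfl
  · -- Conversely, take P to be "is equal to a": P a holds by rfl, so P b holds too.
    intro h
    exact (h (fun x => a = x)).mp rfl
```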

And this might undermine the rest of the argument.

Engaging with the perspective of the orthogonality thesis itself: rejecting it means that a change in intelligence will lead, in expectation, to a change in final goals. Could you name the expected direction of such a change, like "more intelligent agents will act with less kindness"?

avturchin on avturchin's Shortform

Collapse of a mega-project to create AI based on linguistics

ABBYY spent 100 million USD over 30 years to create a model of language using hundreds of linguists. It failed to compete with transformers. This month the project was closed. More (in Russian) here: https://sysblok.ru/blog/gorkij-urok-abbyy-kak-lingvisty-proigrali-poslednjuju-bitvu-za-nlp/

gwern on localdeity's Shortform

Our strategy is for variants to preserve well-defined behavior in the application but introduce diversity in the effect of undefined behavior (such as out-of-bounds accesses).

This Galois work is a lot narrower and targeted at low-level details irrelevant to most code, which thankfully is now written in non-C languages, where out-of-bounds accesses don't pwn your machine, undefined behavior does not summon nasal demons, and stuff like ASLR is largely irrelevant.

So AI is wholly necessary for most of the value of such an idea.

And yeah, I think it's a pretty decent idea: with cheap enough LLMs, you can harden applications by sampling possible implementations which pass all unit tests, and whose final combination passes all end-to-end or integration tests. You can already do this a bit to check things, with LLMs being so cheap. (Last night, Achmiz asked a Markov chain question and I was too lazy to try to figure it out myself, so I had ChatGPT solve it 3 ways in R: Monte Carlo, solving the matrix, and proving an exact closed-form probability. The answer could be wrong, but that seems unlikely when they all seem to agree. If I wanted to write it up, I'd also have Claude solve it independently in Python so I could cross-check all 6 versions...)
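The comment doesn't give Achmiz's actual question, so here is a hypothetical stand-in showing the same three-way cross-check, in Python rather than R: a fair gambler's-ruin chain on states 0..n, asking for the probability of hitting n before 0 when starting at k. The three answers come from independent methods (simulation, exact linear solve, closed form), so agreement is weak evidence that none of them has a bug.

```python
import random
from fractions import Fraction


def exact_closed_form(k, n):
    # Fair gambler's ruin: P(hit n before 0 | start at k) = k / n
    return Fraction(k, n)


def solve_linear_system(k, n):
    # Absorption probabilities satisfy h_i = (h_{i-1} + h_{i+1}) / 2,
    # with boundary conditions h_0 = 0 and h_n = 1.
    m = n - 1  # unknowns h_1 .. h_{n-1}
    a = [[Fraction(0)] * (m + 1) for _ in range(m)]  # augmented matrix [A | b]
    for i in range(m):
        a[i][i] = Fraction(1)
        if i > 0:
            a[i][i - 1] = Fraction(-1, 2)
        if i < m - 1:
            a[i][i + 1] = Fraction(-1, 2)
        else:
            a[i][m] = Fraction(1, 2)  # boundary h_n = 1 moves to the right-hand side
    # Gauss-Jordan elimination in exact rational arithmetic
    for col in range(m):
        pivot = next(r for r in range(col, m) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return a[k - 1][m] / a[k - 1][k - 1]


def monte_carlo(k, n, trials=20_000, seed=0):
    # Simulate the random walk until it is absorbed at 0 or n
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        x = k
        while 0 < x < n:
            x += 1 if rng.random() < 0.5 else -1
        wins += x == n
    return wins / trials
```

For k = 3, n = 10, the closed form and the linear solve both give exactly 3/10, and the simulation should land within a couple of hundredths of 0.3.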

This would help avoid a decent number of logic bugs and oversights, and it would also have some benefits in terms of software engineering: you are getting a lot of automated 'chaos engineering' and unit-test generation and performance benchmarking for free, by distributing a combinatorial number of implementations. It's almost like a mass fuzzing exercise, where the users provide the fuzz.
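The "users provide the fuzz" idea is essentially differential testing: run independently written variants of a function on identical inputs and flag any input where they disagree. A minimal sketch; the three variants below are hypothetical stand-ins for LLM-generated implementations, one of them with a deliberate bug.

```python
import random


def sum_of_squares_v1(xs):
    # Hypothetical variant A: explicit loop
    total = 0
    for x in xs:
        total += x * x
    return total


def sum_of_squares_v2(xs):
    # Hypothetical variant B: generator expression
    return sum(x * x for x in xs)


def sum_of_squares_buggy(xs):
    # Hypothetical variant C with an off-by-one bug: drops the last element
    return sum(x * x for x in xs[:-1])


def differential_test(variants, trials=1000, seed=0):
    """Feed identical random inputs to every variant; return inputs where outputs differ."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        outputs = {f(xs) for f in variants}
        if len(outputs) > 1:
            disagreements.append(xs)
    return disagreements
```

The two correct variants never disagree, while pairing either of them with the buggy one surfaces counterexamples quickly. As the comment notes, this only helps if the variants are genuinely diverse; variants that share the same mistake will agree with each other.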

You might think this would run into issues with tracking the combinatorial number of binaries, which could take up petabytes if you are distributing, say, a 1GB package to 1 million users, but this has plenty of possible fixes: if you are using reproducible builds, as you ought to, then you only need to track a list of the variants for each function and store that per user, and then you can rebuild the exact binary for a given user on demand.* I think a bigger issue is that forcing diversity out of tuned LLMs is quite hard, and so you would run into the systematic-error problem at a higher level: all the tuned LLMs, feeding on each other's outputs and mode-collapsed, will turn in code with the same implicit assumptions, algorithms, and bugs, which would mostly defeat the point.

* Similarly, the LLMs are, or should be, deterministic and fixable with a seed. So the overhead here might be something like this: if you have a codebase with 10,000 functions, each time you push out a release (which might happen daily or weekly), you store the LLM snapshot ID and RNG seed (maybe a kilobyte total), generate 2 versions of each function and randomize per user, and track 10,000 bits (~1.25 KB) per user, so if you have a million users that's only about 1.25 gigabytes. Whenever you need to investigate a specific binary because it triggered a crash or something, you just fetch the LLM ID and RNG seed, decode the specific 10,000 function variants they used, and compile. For anyone with millions of users who is serious about security, a gigabyte or so of overhead per release is nothing. You already waste that much with random Docker images and crap.
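The per-user bookkeeping above can be sketched as a packed bit vector: one bit per function says which of the two variants that user received. The function names and the seeded random choice are hypothetical, used only to produce sample data; the storage arithmetic follows directly from the numbers in the comment.

```python
import random


def choose_variants(num_functions, user_seed):
    # Hypothetical: pick variant 0 or 1 for each function, reproducibly from a seed
    rng = random.Random(user_seed)
    return [rng.randrange(2) for _ in range(num_functions)]


def pack_bits(choices):
    # One bit per function, 8 choices per byte (little-endian within each byte)
    out = bytearray((len(choices) + 7) // 8)
    for i, bit in enumerate(choices):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(data, num_functions):
    # Recover the exact variant list, e.g. to rebuild a crashing user's binary
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(num_functions)]


NUM_FUNCTIONS = 10_000
NUM_USERS = 1_000_000
bytes_per_user = (NUM_FUNCTIONS + 7) // 8   # 1,250 bytes
total_bytes = bytes_per_user * NUM_USERS    # 1,250,000,000 bytes, about 1.25 GB
```

Round-tripping a user's choices through `pack_bits` and `unpack_bits` returns the original list, and the total confirms the "about a gigabyte per release" estimate for a million users.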

davekasten on davekasten's Shortform

Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it" ?

For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people as possible to that future...