LessWrong 2.0 Reader

Altman firing retaliation incoming?
trevor (TrevorWiesinger) · 2023-11-19T00:10:15.645Z · comments (23)

Consent across power differentials
Ramana Kumar (ramana-kumar) · 2024-07-09T11:42:03.177Z · comments (12)

They are made of repeating patterns
quetzal_rainbow · 2023-11-13T18:17:43.189Z · comments (4)

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation
Benjamin Sturgeon (benjamin-sturgeon) · 2024-03-21T12:32:22.475Z · comments (8)

Goal-Completeness is like Turing-Completeness for AGI
Liron · 2023-12-19T18:12:29.947Z · comments (26)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)
RP (Complex Bubble Tea) · 2024-02-09T07:00:45.825Z · comments (6)

Scenario Forecasting Workshop: Materials and Learnings
elifland · 2024-03-08T02:30:46.517Z · comments (3)

AI #82: The Governor Ponders
Zvi · 2024-09-19T13:30:04.863Z · comments (8)

[Intuitive self-models] 8. Rooting Out Free Will Intuitions
Steven Byrnes (steve2152) · 2024-11-04T18:16:26.736Z · comments (15)

Why you should learn a musical instrument
cata · 2024-05-15T20:36:16.034Z · comments (23)

The Shortest Path Between Scylla and Charybdis
Thane Ruthenis · 2023-12-18T20:08:34.995Z · comments (8)

Observations on Teaching for Four Weeks
ClareChiaraVincent · 2024-05-06T16:55:59.315Z · comments (14)

[link] Finding Backward Chaining Circuits in Transformers Trained on Tree Search
abhayesian · 2024-05-28T05:29:46.777Z · comments (1)

[link] on the dollar-yen exchange rate
bhauth · 2024-04-07T04:49:53.920Z · comments (21)

Job listing: Communications Generalist / Project Manager
Gretta Duleba (gretta-duleba) · 2023-11-06T20:21:03.721Z · comments (7)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter · 2023-11-08T11:37:43.997Z · comments (0)

The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn · 2024-09-04T21:04:49.878Z · comments (6)

[link] in defense of Linus Pauling
bhauth · 2024-06-03T21:27:43.962Z · comments (8)

Public Weights?
jefftk (jkaufman) · 2023-11-02T02:50:18.095Z · comments (19)

[link] DM Parenting
Shoshannah Tekofsky (DarkSym) · 2024-07-16T08:50:08.144Z · comments (4)

[link] Chapter 1 of How to Win Friends and Influence People
gull · 2024-01-28T00:32:52.865Z · comments (5)

[question] why did OpenAI employees sign
bhauth · 2023-11-27T05:21:28.612Z · answers+comments (23)

The Broken Screwdriver and other parables
bhauth · 2024-03-04T03:34:38.807Z · comments (1)

The Dunning-Kruger of disproving Dunning-Kruger
kromem · 2024-05-16T10:11:33.108Z · comments (0)

Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth · 2024-08-22T21:12:38.223Z · comments (1)

[link] Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 2024-05-21T22:30:52.638Z · comments (4)

[LDSL#0] Some epistemological conundrums
tailcalled · 2024-08-07T19:52:55.688Z · comments (10)

Should rationalists be spiritual / Spirituality as overcoming delusion
Kaj_Sotala · 2024-03-25T16:48:08.397Z · comments (57)

Childhood Roundup #3
Zvi · 2023-10-10T14:30:04.287Z · comments (3)

[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (10)

[link] On scalable oversight with weak LLMs judging strong LLMs
zac_kenton (zkenton) · 2024-07-08T08:59:58.523Z · comments (18)

Wrong answer bias
lemonhope (lcmgcd) · 2024-02-01T20:05:38.573Z · comments (24)

AI #58: Stargate AGI
Zvi · 2024-04-04T13:10:06.342Z · comments (9)

Low Probability Estimation in Language Models
Gabriel Wu (gabriel-wu) · 2024-10-18T15:50:05.947Z · comments (0)

An issue with training schemers with supervised fine-tuning
Fabien Roger (Fabien) · 2024-06-27T15:37:56.020Z · comments (12)

Notes on control evaluations for safety cases
ryan_greenblatt · 2024-02-28T16:15:17.799Z · comments (0)

Bounty: Diverse hard tasks for LLM agents
Beth Barnes (beth-barnes) · 2023-12-17T01:04:05.460Z · comments (31)

Basic Mathematics of Predictive Coding
Adam Shai (adam-shai) · 2023-09-29T14:38:28.517Z · comments (6)

AI #67: Brief Strange Trip
Zvi · 2024-06-06T18:50:03.514Z · comments (6)

SRE's review of Democracy
Martin Sustrik (sustrik) · 2024-08-03T07:20:01.483Z · comments (2)

[link] Book review: Xenosystems
jessicata (jessica.liu.taylor) · 2024-09-16T20:17:56.670Z · comments (18)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

Interested in Cognitive Bootcamp?
Raemon · 2024-09-19T22:12:13.348Z · comments (0)

Evaluating the truth of statements in a world of ambiguous language.
Hastings (hastings-greer) · 2024-10-07T18:08:09.920Z · comments (19)

[link] Every Mention of EA in "Going Infinite"
KirstenH · 2023-10-07T14:42:32.217Z · comments (0)

Translations Should Invert
abramdemski · 2023-10-05T17:44:23.262Z · comments (19)

Mechanistic Interpretability Workshop Happening at ICML 2024!
Neel Nanda (neel-nanda-1) · 2024-05-03T01:18:26.936Z · comments (6)

[link] Designing for a single purpose
Itay Dreyfus (itay-dreyfus) · 2024-05-07T14:11:22.242Z · comments (12)

Experiments as a Third Alternative
Adam Zerner (adamzerner) · 2023-10-29T00:39:31.399Z · comments (21)

Why the Best Writers Endure Isolation
Declan Molony (declan-molony) · 2024-07-16T05:58:25.032Z · comments (6)

Recent comments

danielfilan on Seven lessons I didn't learn from election day

Yeah, but a bunch of people might actually answer how their neighbours will vote, given that that's what the pollster asked, and if the question is phrased as the post assumes, that's going to be a massive issue.

dakara on If we solve alignment, do we die anyway?

James, thank you for a well-written comment. It was a pleasure to read. Looking forward to Seth's response. Genuinely interested in hearing his thoughts.

avturchin on Anthropically Blind: the anthropic shadow is reflectively inconsistent

One problem here is that quantum immortality and angel immortality eventually merge: for example, if we survive 10 LHC failures because of QI, we most likely survive only on those timelines where some alien stops the LHC. So both QI and angel immortality can be true and support one another, and there is no contradiction.

avturchin on Quantum Immortality: A Perspective if AI Doomers are Probably Right

I know this post and have two problems with it. What they call "anthropic shadow" is not the proper term, as Bostrom defined anthropic shadow as the underestimation of past risks based on the fact of survival, in his article with the same name. But it's ok.

The more serious problem is that quantum immortality and angel immortality eventually merge: for example, if we survive 10 LHC failures because of QI, we most likely survive only on those timelines where some alien stops the LHC. So both QI and angel immortality can be true and support one another, and there is no contradiction.

donald-hobson on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

We can discuss anything that exists, that might exist, that did exist, that could exist, and that could not exist. So no matter what form your predict-the-next-token language model takes, if it is trained over the entire corpus of the written word, the representations it forms will be pretty hard to understand, because the representations encode an entire understanding of the entire world.

Perhaps.

Imagine a huge number of very skilled programmers tried to manually hard-code a ChatGPT in Python.

Ask this pyGPT to play chess, and it will play chess. Look under the hood, and you see a chess engine programmed in. Ask it to solve algebra problems, and a symbolic algebra package is in there. All in the neatest, best-commented code.

Ask it to compose poetry, and you have some algorithm that checks if 2 words rhyme. Some syllable counter. Etc.

Rot13 is done with a hardcoded rot13 algorithm.

Somewhere in the algorithm is a giant list of facts, containing "Penguins Live In Antarctica". And if you change this fact to say "Penguins Live in Canada", then the AI will believe this. (Or spot its inconsistency with other facts?)

And with one simple change, the AI believes this consistently. Penguins appear when this AI is asked for poems about Canada, and don't appear in poems about Antarctica.

When asked about the native Canadian diet, it will speculate that this likely included penguin, but say that it doesn't know of any documented examples of this.

Can you build something with ChatGPT level performance entirely out of human comprehensible programmatic parts?

Obviously having humans program these parts directly would be slow. (We are still talking about a lot of code.) But if some algorithm could generate that code?

avturchin on Anthropically Blind: the anthropic shadow is reflectively inconsistent

Check my new post which favors the longest and thickest timelines https://www.lesswrong.com/posts/hB2CTaxqJAeh5jdfF/quantum-immortality-a-perspective-if-ai-doomers-are-probably?commentId=aAzrogWBqtFDqMMpp

dennis-zoeller on Anvil Problems

That’s a fantastic memory aid for this concept, much appreciated! Crafting games in general give ample examples to internalize this kind of bootstrap mentality, and also for quickly scaling to the next anvil-equivalent. As you touched upon, real life has a deep crafting tree, with anvil problems upon anvil problems. Something that took me far too long to learn: if you've got your anvil but still don't find yourself where you want to be, it pays to find the next anvil problem quickly. If you still have a lot of distance to cover, don't get bogged down by things that won't get you the next anvil-equivalent.

In a certain way, relationships have their own anvils. There are thresholds of trust and communication modes that take investment. However, they also unlock completely new options, particularly when addressing challenges or navigating high-stress situations. I sometimes notice, in myself and others, a neglect to do serious work on relationships during good times, and then a lack of the tools to handle difficulties when they arise.

dakara on AI Control: Improving Safety Despite Intentional Subversion

I do believe that Shubhorup's second edit provides another case for concern. If you combine his proposal with steganography, the collusion team has another way of successfully mitigating Ryan's safety strategy.

This collusion team's strategy should also ideally be addressed.

donald-hobson on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

But if the universal failure of nature and man to find non-connectionist forms of general intelligence does not move you

Firstly, AIXI exists, and we agree that it would be very smart if we had the compute to run it.

Secondly, I think there is some sort of sleight of hand here.

ChatGPT isn't yet fully general. Neither is a 3-sat solver. 3-sat looks somewhat like what you might expect a non-connectionist approach to intelligence to look like. There is a huge range of maths problems that are all theoretically equivalent to 3-sat.

In the infinite limit, both types of intelligence can simulate the other at huge overhead. In practice, they can't.

Also, non-connectionist forms of intelligence are hard to evolve, because evolution works in small changes.

dakara on AI Control: Improving Safety Despite Intentional Subversion

This paper creates a new 1-bit strategy for circumventing Ryan and Buck's safety strategy. The collusion team achieves a 90%+ success rate.

I would really appreciate any input, especially from Ryan or his co-authors. This seems like a very important issue to address.