LessWrong 2.0 Reader

Augmenting Statistical Models with Natural Language Parameters
jsteinhardt · 2024-09-20T18:30:10.816Z · comments (0)

Difficulty classes for alignment properties
Jozdien · 2024-02-20T09:08:24.783Z · comments (5)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (36)

The Schumer Report on AI (RTFB)
Zvi · 2024-05-24T15:10:03.122Z · comments (3)

AI Impacts Survey: December 2023 Edition
Zvi · 2024-01-05T14:40:06.156Z · comments (6)

[link] My Apartment Art Commission Process
jenn (pixx) · 2024-08-26T18:36:44.363Z · comments (4)

ARENA4.0 Capstone: Hyperparameter tuning for MELBO + replication on Llama-3.2-1b-Instruct
25Hour (aaron-kaufman) · 2024-10-05T11:30:11.953Z · comments (2)

The Cognitive Bootcamp Agreement
Raemon · 2024-10-16T23:24:05.509Z · comments (0)

Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy
Joe Rogero · 2024-11-12T23:55:46.770Z · comments (17)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data
Sohaib Imran (sohaib-imran) · 2024-11-16T23:22:21.857Z · comments (5)

My disagreements with "AGI ruin: A List of Lethalities"
Noosphere89 (sharmake-farah) · 2024-09-15T17:22:18.367Z · comments (46)

Copyright Confrontation #1
Zvi · 2024-01-03T15:50:04.850Z · comments (7)

Intransitive Trust
Screwtape · 2024-05-27T16:55:29.294Z · comments (15)

[link] Robin Hanson & Liron Shapira Debate AI X-Risk
Liron · 2024-07-08T21:45:40.609Z · comments (4)

AI #56: Blackwell That Ends Well
Zvi · 2024-03-21T12:10:05.412Z · comments (16)

[link] Suffering Is Not Pain
jbkjr · 2024-06-18T18:04:43.407Z · comments (45)

AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan · 2024-06-12T03:30:05.747Z · comments (0)

[link] Why Yudkowsky is wrong about "covalently bonded equivalents of biology"
titotal (lombertini) · 2023-12-06T14:09:15.402Z · comments (40)

[link] The last era of human mistakes
owencb · 2024-07-24T09:58:42.116Z · comments (2)

[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)

[link] hydrogen tube transport
bhauth · 2024-04-18T22:47:08.790Z · comments (12)

Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments
Radford Neal · 2023-12-07T03:33:16.149Z · comments (25)

[link] GPT2, Five Years On
Joel Burget (joel-burget) · 2024-06-05T17:44:17.552Z · comments (0)

How to develop a photographic memory 1/3
PhilosophicalSoul (LiamLaw) · 2023-12-28T13:26:36.669Z · comments (6)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic · 2024-05-17T00:25:42.950Z · comments (12)

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley (roger-d-1) · 2024-01-11T12:56:29.672Z · comments (4)

What I Learned (Conclusion To "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-20T21:24:37.464Z · comments (0)

How good are LLMs at doing ML on an unknown dataset?
Håvard Tveit Ihle (havard-tveit-ihle) · 2024-07-01T09:04:03.687Z · comments (4)

Takeaways from a Mechanistic Interpretability project on “Forbidden Facts”
Tony Wang (tw) · 2023-12-15T11:05:23.256Z · comments (8)

Monthly Roundup #20: July 2024
Zvi · 2024-07-23T12:50:07.991Z · comments (9)

Confusing the metric for the meaning: Perhaps correlated attributes are "natural"
NickyP (Nicky) · 2024-07-23T12:43:18.681Z · comments (3)

Update #2 to "Dominant Assurance Contract Platform": EnsureDone
moyamo · 2023-11-28T18:02:50.367Z · comments (2)

[link] On Lies and Liars
Gabriel Alfour (gabriel-alfour-1) · 2023-11-17T17:13:03.726Z · comments (4)

More on the Apple Vision Pro
Zvi · 2024-02-13T17:40:05.388Z · comments (5)

Love, Reverence, and Life
Elizabeth (pktechgirl) · 2023-12-12T21:49:04.061Z · comments (7)

AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them
Roman Leventov · 2023-12-27T14:51:37.713Z · comments (9)

2024 ACX Predictions: Blind/Buy/Sell/Hold
Zvi · 2024-01-09T19:30:06.388Z · comments (2)

5. Moral Value for Sentient Animals? Alas, Not Yet
RogerDearnaley (roger-d-1) · 2023-12-27T06:42:09.130Z · comments (41)

Sparse autoencoders find composed features in small toy models
Evan Anders (evan-anders) · 2024-03-14T18:00:43.339Z · comments (12)

LLMs can strategically deceive while doing gain-of-function research
Igor Ivanov (igor-ivanov) · 2024-01-24T15:45:08.795Z · comments (4)

[question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
lillybaeum · 2023-12-10T17:26:34.206Z · answers+comments (34)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · 2023-11-29T18:11:53.252Z · comments (16)

One way violinists fail
Solenoid_Entity · 2024-05-29T04:08:17.675Z · comments (5)

[link] AI Safety Memes Wiki
plex (ete) · 2024-07-24T18:53:04.977Z · comments (1)

Mech Interp Lacks Good Paradigms
Daniel Tan (dtch1997) · 2024-07-16T15:47:32.171Z · comments (0)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (24)

The Consciousness Box
GradualImprovement · 2023-12-11T16:45:08.172Z · comments (22)

Templates I made to run feedback rounds for Ethan Perez’s research fellows.
Henry Sleight (ResentHighly) · 2024-03-28T19:41:15.506Z · comments (0)

[question] Is AlphaGo actually a consequentialist utility maximizer?
faul_sname · 2023-12-07T12:41:05.132Z · answers+comments (8)

"Which chains-of-thought was that faster than?"
Emrik (Emrik North) · 2024-05-22T08:21:00.269Z · comments (4)

Recent comments

donald-hobson on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

We can discuss anything that exists, that might exist, that did exist, that could exist, and that could not exist. So no matter what form your predict-the-next-token language model takes, if it is trained over the entire corpus of the written word, the representations it forms will be pretty hard to understand, because the representations encode an entire understanding of the entire world.

Perhaps.

Imagine a huge number of very skilled programmers tried to manually hard-code a ChatGPT in Python.

Ask this pyGPT to play chess, and it will play chess. Look under the hood, and you see a chess engine programmed in. Ask it to solve algebra problems, and a symbolic algebra package is in there. All in neat, well-commented code.

Ask it to compose poetry, and you find some algorithm that checks whether two words rhyme, a syllable counter, etc.

Rot13 is done with a hardcoded rot13 algorithm.
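As an illustration of what one of these hand-written parts might look like, here is a minimal Python sketch of a ROT13 component. The function name `rot13` and the lookup-table approach are my own assumptions for illustration; the point is only that every line can be read and checked by a human:

```python
import codecs
import string

# Hypothetical pyGPT component: a hand-written ROT13 routine.
_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase

# Translation table mapping each ASCII letter to the letter 13 places later,
# wrapping around the end of the alphabet; case is preserved.
_ROT13_TABLE = str.maketrans(
    _LOWER + _UPPER,
    _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13],
)


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; leave every other character unchanged."""
    return text.translate(_ROT13_TABLE)


if __name__ == "__main__":
    print(rot13("Penguins Live In Antarctica"))
    # Cross-check the hand-written version against the standard library's rot_13 codec.
    assert rot13("Hello, World!") == codecs.encode("Hello, World!", "rot_13")
```

Since ROT13 is its own inverse, applying `rot13` twice returns the original text, which is an easy property to verify by inspection.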

Somewhere in the algorithm is a giant list of facts, containing "Penguins Live In Antarctica". And if you change this fact to say "Penguins Live in Canada", then the AI will believe this. (Or spot its inconsistency with other facts?)

And with one simple change, the AI believes this consistently. Penguins appear when this AI is asked for poems about Canada, and don't appear in poems about Antarctica.

When asked about the native Canadian diet, it will speculate that this likely included penguin, but say that it doesn't know of any documented examples of this.
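A minimal sketch of the "one editable fact" idea, assuming a hypothetical `FactBase` table that every downstream routine reads from, plus a hypothetical toy `poem_about` routine standing in for the poetry module. Because the fact lives in exactly one place, a single edit changes all the behaviour that depends on it:

```python
from dataclasses import dataclass, field


# Hypothetical fact table: the single place where habitat facts are stored.
@dataclass
class FactBase:
    # Maps each animal to the one region it lives in.
    habitat: dict = field(default_factory=lambda: {
        "penguin": "Antarctica",
        "polar bear": "the Arctic",
    })

    def animals_in(self, region: str) -> list:
        """Return, sorted, every animal the table places in `region`."""
        return sorted(a for a, r in self.habitat.items() if r == region)


def poem_about(facts: FactBase, region: str) -> str:
    """Hypothetical toy 'poetry' routine: mentions whichever animals the table puts in the region."""
    animals = facts.animals_in(region)
    if not animals:
        return f"O {region}, so quiet and so wide."
    return f"O {region}, home of the {' and the '.join(animals)}."


facts = FactBase()
print(poem_about(facts, "Canada"))      # no penguins mentioned yet
facts.habitat["penguin"] = "Canada"     # the single edit
print(poem_about(facts, "Canada"))      # penguins now appear here...
print(poem_about(facts, "Antarctica"))  # ...and no longer appear here
```

A consistency check (the "or spot its inconsistency" case) would be another explicit routine that compares the table against other stored facts, and it would be just as inspectable.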

Can you build something with ChatGPT level performance entirely out of human comprehensible programmatic parts?

Obviously having humans program these parts directly would be slow. (We are still talking about a lot of code.) But if some algorithm could generate that code?

avturchin on Anthropically Blind: the anthropic shadow is reflectively inconsistent

Check my new post which favors the longest and thickest timelines https://www.lesswrong.com/posts/hB2CTaxqJAeh5jdfF/quantum-immortality-a-perspective-if-ai-doomers-are-probably?commentId=aAzrogWBqtFDqMMpp

dennis-zoeller on Anvil Problems

That’s a fantastic memory aid for this concept, much appreciated! Crafting games in general give ample examples for internalizing this kind of bootstrap mentality, and for quickly scaling to the next anvil-equivalent. As you touched upon, real life has a deep crafting tree, with anvil problems upon anvil problems. Something that took me far too long to learn: if you've got your anvil but still don't find yourself where you want to be, it pays to find the next anvil problem quickly. If you still have a lot of distance to cover, don't get bogged down by things that won't get you the next anvil-equivalent.

In a certain way, relationships have their own anvils. There are thresholds of trust and modes of communication that take investment to reach. However, they also unlock completely new options, particularly when addressing challenges or navigating high-stress situations. I sometimes notice, in myself and others, a tendency to neglect serious work on relationships during good times, and then to lack the tools to handle difficulties when they arise.

dakara on AI Control: Improving Safety Despite Intentional Subversion

I do believe that Shubhorup's second edit provides another case for concern. If you combine his proposal with steganography, the collusion team has another way of successfully circumventing Ryan's safety strategy.

This collusion team's strategy should also ideally be addressed.

donald-hobson on Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds

But if the universal failure of nature and man to find non-connectionist forms of general intelligence does not move you

Firstly, AIXI exists, and we agree that it would be very smart if we had the compute to run it.

Secondly, I think there is some sort of sleight of hand here.

ChatGPT isn't yet fully general. Neither is a 3-SAT solver. 3-SAT looks somewhat like what you might expect a non-connectionist approach to intelligence to look like. There is a huge range of maths problems that are all theoretically equivalent to 3-SAT.

In the infinite limit, each type of intelligence can simulate the other at huge overhead. In practice, they can't.
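For concreteness, here is what a 3-SAT solver looks like in its crudest form: explicit, inspectable search. This is a minimal brute-force sketch over all 2^n assignments, assuming a DIMACS-style literal encoding (`k` means variable k is true, `-k` means it is false); the names `satisfies` and `solve_3sat` are illustrative. Practical solvers use far cleverer DPLL/CDCL search, but every step is still a human-readable rule rather than a learned weight:

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# A clause is three nonzero ints; k means "variable k is true", -k means "variable k is false".
Clause = Tuple[int, int, int]


def satisfies(assignment: Dict[int, bool], clause: Clause) -> bool:
    """A clause holds if at least one of its literals is true under the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def solve_3sat(clauses: List[Clause]) -> Optional[Dict[int, bool]]:
    """Try every assignment in turn; return the first satisfying one, or None if unsatisfiable."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(satisfies(assignment, c) for c in clauses):
            return assignment
    return None


print(solve_3sat([(1, 2, -3), (-1, -2, 3), (1, -2, 3)]))
```

The exponential loop is exactly the kind of cost that makes "simulate the other approach" feasible only in the infinite limit.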

Also, non-connectionist forms of intelligence are hard to evolve, because evolution works in small changes.

dakara on AI Control: Improving Safety Despite Intentional Subversion

This paper creates a new 1-bit strategy for circumventing Ryan and Buck's safety strategy. The collusion team achieves a 90%+ success rate.

I would really appreciate any input, especially from Ryan or his co-authors. This seems like a very important issue to address.

benito on Sabotage Evaluations for Frontier Models

I have found it fruitful to argue this case back and forth with you, thank you for defending and explaining your perspective.

I will restate my overall position and invite you to do the same; then I propose that we consider this 40+ comment thread concluded for now.

———

The comment of yours that (to me) started this thread was the following.

If the default path is AI's taking over control from humans, then what is the current plan in leading AI labs? Surely all the work they put in AI safety is done to prevent exactly such scenarios. I would find it quite hard to believe that a large group of people would vigorously do something if they believed that their efforts will go to vain.

I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so. One should not assume that the incentives are aligned – anyone who is risking omnicide-level outcomes via investing in novel tech development currently faces no criminal penalties, fines, or international sanctions.

Given the current intellectual elite scene where a substantial number of prestigious people care about extinction level outcomes, it is also not surprising that power-seeking companies have large departments focused on 'ethics' and 'safety' in order to look respectable to such people. Separately from any intrinsic interest, it has been a useful political chip for enticing a great deal of talent from scientific communities and communities interested in ethics to work for them (not dissimilar to how Sam Bankman-Fried managed to cause a lot of card-carrying members of the Effective Altruist philosophy and scene to work very hard to help build his criminal empire by talking a good game about utilitarianism, veganism, and the rest).

Looking at a given company's plan for preventing doom and noticing that it does not check out should not be followed by an assumption of adequacy and good incentives ("surely this company would not exist, nor do work on AI safety, if it did not have a strong plan; I must be mistaken"). I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not. Given the lack of accountability, and my belief that alignment is clearly unsolved and we fundamentally do not know what we're doing, I believe the people involved are getting rich risking all of our lives and there is (currently) no justice here.

We have agreed on many points, and from the outset I believe you felt my position had some truth to it ("I do get that point that you are making, but I think this is a little bit unfair to these organizations."). I will leave you to outline whichever overall thoughts and remaining points of agreement or disagreement that you wish.

turntrout on Announcing turntrout.com, my new digital home

Another bit I forgot to highlight in the original post: the fonts available on my site.

saidachmiz on Announcing turntrout.com, my new digital home

Not bad at all! Needs some work on the details and some bug fixes, but—really not bad! The dropcaps, in particular, are well done; and the overall theme is elegant.

raemon on Neutrality

Curated. This was one of the more inspiring things I read this year (in a year that had a moderate number of inspiring things!)

I really like how Sarah lays out the problem and desiderata for neutrality in our public/civic institutional spaces.

LessWrong's strength is being a fairly opinionated university about how to do epistemics, which the rest of the world isn't necessarily bought into. Trying to make LW a civic institution would fail. But this post has me more excited to revisit "what would be necessary to build good civic infrastructure" (where "good" requires both "be 'good' in some kind of deep sense" and "be memetically fit enough to compete with Twitter et al." One solution might be convincing Musk of specific policies rather than building a competitor.)