LessWrong 2.0 Reader

Without Fundamental Advances, Rebellion and Coup d'État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko · 2024-01-31T10:14:02.042Z · comments (34)

[link] Goodhart's Law Example: Training Verifiers to Solve Math Word Problems
Chris_Leong · 2023-11-25T00:53:26.841Z · comments (2)

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

[question] What Software Should Exist?
Tomás B. (Bjartur Tómas) · 2024-01-19T21:43:50.112Z · answers+comments (27)

Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)

Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)

[link] Found Paper: "FDT in an evolutionary environment"
the gears to ascension (lahwran) · 2023-11-27T05:27:50.709Z · comments (47)

EA Infrastructure Fund's Plan to Focus on Principles-First EA
Linch · 2023-12-06T03:24:55.844Z · comments (0)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

Weak vs Quantitative Extinction-level Goodhart's Law
VojtaKovarik · 2024-02-21T17:38:15.375Z · comments (1)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis
aphyer · 2024-01-13T20:16:39.480Z · comments (1)

[link] Solving alignment isn't enough for a flourishing future
mic (michael-chen) · 2024-02-02T18:23:00.643Z · comments (0)

My Dating Heuristic
Declan Molony (declan-molony) · 2024-05-21T05:28:40.197Z · comments (4)

D&D.Sci Hypersphere Analysis Part 3: Beat it with Linear Algebra
aphyer · 2024-01-16T22:44:52.424Z · comments (1)

Why I think it's net harmful to do technical safety research at AGI labs
Remmelt (remmelt-ellen) · 2024-02-07T04:17:15.246Z · comments (24)

Vote in the LessWrong review! (LW 2022 Review voting phase)
habryka (habryka4) · 2024-01-17T07:22:17.921Z · comments (9)

Smartphone Etiquette: Suggestions for Social Interactions
Declan Molony (declan-molony) · 2024-06-04T06:01:03.336Z · comments (4)

Geometric Utilitarianism (And Why It Matters)
StrivingForLegibility · 2024-05-12T03:41:21.342Z · comments (2)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

The Limitations of GPT-4
p.b. · 2023-11-24T15:30:30.933Z · comments (12)

The Sequences on YouTube
Neil (neil-warren) · 2024-01-07T01:44:39.663Z · comments (9)

[link] Manifold Markets
PeterMcCluskey · 2024-02-02T17:48:36.630Z · comments (9)

Ideas for Next-Generation Writing Platforms, using LLMs
ozziegooen · 2024-06-04T18:40:24.636Z · comments (4)

Causality is Everywhere
silentbob · 2024-02-13T13:44:49.952Z · comments (12)

Meetup In a Box: Year In Review
Czynski (JacobKopczynski) · 2024-02-14T01:18:28.259Z · comments (0)

Agent membranes/boundaries and formalizing “safety”
Chipmonk · 2024-01-03T17:55:21.018Z · comments (46)

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret (Adrià R. Moret) · 2023-12-02T14:07:29.992Z · comments (31)

Exploring OpenAI's Latent Directions: Tests, Observations, and Poking Around
Johnny Lin (hijohnnylin) · 2024-01-31T06:01:27.969Z · comments (4)

Evaluating Solar
jefftk (jkaufman) · 2024-02-17T21:50:04.783Z · comments (5)

[link] Let's Design A School, Part 2.1 School as Education - Structure
Sable · 2024-05-02T22:04:30.435Z · comments (2)

Quick takes on "AI is easy to control"
So8res · 2023-12-02T22:31:45.683Z · comments (49)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

Bayesian inference without priors
DanielFilan · 2024-04-24T23:50:08.312Z · comments (8)

Losing Metaphors: Zip and Paste
jefftk (jkaufman) · 2023-11-29T20:31:07.464Z · comments (6)

Consequentialism is a compass, not a judge
Neil (neil-warren) · 2024-04-13T10:47:44.980Z · comments (6)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

Three Types of Constraints in the Space of Agents
Nora_Ammann · 2024-01-15T17:27:27.560Z · comments (3)

SAE features for refusal and sycophancy steering vectors
neverix · 2024-10-12T14:54:48.022Z · comments (4)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

[question] Seeking AI Alignment Tutor/Advisor: $100–150/hr
MrThink (ViktorThink) · 2024-10-05T21:28:16.491Z · answers+comments (3)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (69)

Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?
Taras Kutsyk · 2024-09-29T19:37:30.465Z · comments (7)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

Sleeping on Stage
jefftk (jkaufman) · 2024-10-22T00:50:07.994Z · comments (3)

Essaying Other Plans
Screwtape · 2024-03-06T22:59:06.240Z · comments (4)

A list of all the deadlines in Biden's Executive Order on AI
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-11-01T17:14:31.074Z · comments (2)

[link] Emotional issues often have an immediate payoff
Chipmonk · 2024-06-10T23:39:40.697Z · comments (2)

Recent comments

bhauth on Could randomly choosing people to serve as representatives lead to better government?

see also: These Are Your Doges, If It Please You

abe-frei-pearson on BIG-Bench Canary Contamination in GPT-4

It should be pointed out that the original paper/press release describing GPT-4 explicitly says that they found that BIG-bench had contaminated their training data, and therefore excluded it as an evaluation. As far as I know there was no similar disclosure for Claude or other models. See footnote 5 here: https://arxiv.org/abs/2303.08774v1

chris_leong on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

I thought that this post on strategy and this talk were well done. Obviously, I'll have to see how this translates into practice.

nathan-helm-burger on Claude Sonnet 3.5.1 and Haiku 3.5

Having an expensive 3.5 Opus would be cool, but it's not my top wish. I'd prefer to have a variety of "flavors" of Sonnet. Different specializations for different use cases.

For example:

Science fiction writer / General Creative writer

Poet

Actor

Philosopher / Humanities professor

Chem/Bio professor

Math/Physics professor

Literary Editor

Coder

Lawyer/Political Science professor

Clerical worker for mundane repetitive tasks (probably should be a Haiku, actually)

The main things missing from Sonnet 3.5 that Opus 3 has are creativity, open-mindedness, the ability to analyze multi-sided complex philosophical questions better, and the ability to roleplay convincingly.

Why try to cram all abilities into one single model? Distilling down to smaller models seems like a perfect place to allow for specialization.

aphyer on A metaphor: what "green lights" for AGI would look like

How many of those green lights could the Wright Brothers have shown you?

anthonyc on Big tech transitions are slow (with implications for AI)

I'm going to ignore all "AI is different" arguments for the sake of this comment, even though I agree with some of them. Let's assume I grant all your points. The agricultural revolution took a couple of millennia. The industrial revolution took a couple of centuries. And now, the AI revolution will take decades.

This means I can equivalently restate your conclusion as, "Human activity will lose almost all economic value by the time my newborn niece would have finished grad school." This is certainly slower than many timeline predictions today, but it's hardly "slow" by most standards, and is in fact still faster than the median timelines of most experts as of 5 years ago.

Of course, one of the important facts about these past transitions is that each petered out after bootstrapping civilization far enough to start the next one that's 10x faster. So, if the world in 2047 is 1000x richer and moving at AGI speeds compared to today, then the next 1000x change should take a few years, and the next one after that a few months. This still implies "singularity by 2050." We'd probably have about an extra decade to ensure our survival, though, which I would agree is great.
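A rough sketch of the arithmetic behind "singularity by 2050", for readers who want to check it. The start year (2024), the 23-year length of the first transition, and the function names are assumptions chosen to match the 2047 figure above; the 10x speedup per transition is taken from the comment. Under those assumptions the transition lengths form a geometric series, so the end dates converge to a finite year.

```python
def transition_end_years(start_year, first_duration, speedup, n):
    """Return the end year of each of n successive transitions.

    Each transition is assumed to take 1/speedup as long as the one before.
    """
    ends = []
    year = start_year
    duration = first_duration
    for _ in range(n):
        year += duration
        ends.append(year)
        duration /= speedup
    return ends


def limit_year(start_year, first_duration, speedup):
    """Year the end dates converge to (sum of the geometric series)."""
    return start_year + first_duration / (1 - 1 / speedup)


# Assumed inputs: start in 2024, first transition lasts 23 years, 10x speedup.
ends = transition_end_years(2024, 23, 10, 3)
limit = limit_year(2024, 23, 10)
print([round(y, 2) for y in ends])
print(round(limit, 2))
```

With these inputs the second transition lasts 2.3 years and the third about three months, which is where "a few years" and "a few months" come from. The limit is sensitive to the assumed start year and first duration, so it is an illustration rather than a forecast.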

saidachmiz on A metaphor: what "green lights" for AGI would look like

What is a “deontic mesh”? I am not familiar with this term; do you have a link that explains it?

alice-wanderland on What if AGI was already accidentally created in 2019? [Fictional story]

Fair! Though when the alternative is my own fiction writing skills... let's just say I appreciated Claude's version the most amongst the set of possible options available ^^;

simon on D&D Sci Coliseum: Arena of Data

You may well be right, I'll look into my hyperparameters. I looked at the code Claude had generated with my interference and that greatly lowered my confidence in them, lol (see edit to this comment).

steve2152 on [Intuitive self-models] 3. The Homunculus

Hmm. Maybe here’s an analogy. Suppose somebody said:

There’s a certain kind of interoceptive sensory input, consisting of such-and-such signal coming from blah type of thermoreceptor in the peripheral nervous system. Your brain does its usual thing of transforming that sensation into its own “color” of “metaphysical paint” (as in §3.3.2) that forms a concept / property in your conscious awareness and world-model, and you know it by the everyday term “cold”.

On the one hand, I would defend this passage as basically true. On the other hand, there are clearly a lot of connotations and associations of the word “cold” that go way beyond the natural generalization of things that trigger this thermoreceptor. “Concepts are clusters in thingspace”, as the saying goes, and thus things that go along with coldness often enough kinda get roped in as a connotation or aspect of the coldness concept itself. And then all those aspects of coldness can in turn get analogized into other domains, and now here we are talking about cold personalities and cold starts and cold cases and cold symptoms and the Cold War and on and on.

By the same token, I’m happy to defend a claim along the lines of “intrinsic unpredictability is the seed / core at the center of concepts like animation, vitality, agency, etc.”, but I acknowledge that intrinsic unpredictability in and of itself is not the entirety of those terms and their various connotations and associations.

(This is a helpful discussion for me, thanks.)