LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

[Interim research report] Activation plateaus & sensitive directions in GPT2
StefanHex (Stefan42) · 2024-07-05T17:05:25.631Z · comments (2)

How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley (roger-d-1) · 2023-11-28T19:56:49.679Z · comments (30)

Book Review: On the Edge: The Fundamentals
Zvi · 2024-09-23T13:40:11.058Z · comments (3)

A gentle introduction to mechanistic anomaly detection
Erik Jenner (ejenner) · 2024-04-03T23:06:16.778Z · comments (2)

SAEs (usually) Transfer Between Base and Chat Models
Connor Kissane (ckkissane) · 2024-07-18T10:29:46.138Z · comments (0)

On the Debate Between Jezos and Leahy
Zvi · 2024-02-06T14:40:05.487Z · comments (6)

Announcing New Beginner-friendly Book on AI Safety and Risk
Darren McKee · 2023-11-25T15:57:08.078Z · comments (2)

Superposition is not "just" neuron polysemanticity
LawrenceC (LawChan) · 2024-04-26T23:22:06.066Z · comments (4)

[link] On Shifgrethor
JustisMills · 2024-10-27T15:30:13.688Z · comments (13)

On the Gladstone Report
Zvi · 2024-03-20T19:50:05.186Z · comments (11)

[link] DeepMind: Frontier Safety Framework
Zach Stein-Perlman · 2024-05-17T17:30:02.504Z · comments (0)

[link] A free to enter, 240 character, open-source iterated prisoner's dilemma tournament
Isaac King (KingSupernova) · 2023-11-09T08:24:43.277Z · comments (19)

[link] Improving Dictionary Learning with Gated Sparse Autoencoders
Senthooran Rajamanoharan (SenR) · 2024-04-25T18:43:47.003Z · comments (38)

[link] AI, centralization, and the One Ring
owencb · 2024-09-13T14:00:16.126Z · comments (11)

Against most, but not all, AI risk analogies
Matthew Barnett (matthew-barnett) · 2024-01-14T03:36:16.267Z · comments (41)

All About Concave and Convex Agents
mako yass (MakoYass) · 2024-03-24T21:37:17.922Z · comments (23)

What mistakes has the AI safety movement made?
EuanMcLean (euanmclean) · 2024-05-23T11:19:02.717Z · comments (29)

Self-Awareness: Taxonomy and eval suite proposal
Daniel Kokotajlo (daniel-kokotajlo) · 2024-02-17T01:47:01.802Z · comments (2)

AiPhone
Zvi · 2024-06-12T22:20:02.141Z · comments (4)

Generalization, from thermodynamics to statistical physics
Jesse Hoogland (jhoogland) · 2023-11-30T21:28:50.089Z · comments (9)

On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg
Zvi · 2024-04-22T13:10:02.645Z · comments (4)

[link] Moving on from community living
Vika · 2024-04-17T17:02:11.357Z · comments (7)

[link] Investigating an insurance-for-AI startup
L Rudolf L (LRudL) · 2024-09-21T15:29:10.083Z · comments (0)

[link] A primer on why computational predictive toxicology is hard
Abhishaike Mahajan (abhishaike-mahajan) · 2024-08-19T17:16:37.735Z · comments (2)

AI research assistants competition 2024Q3: Tie between Elicit and You.com
Elizabeth (pktechgirl) · 2024-10-12T15:10:05.417Z · comments (2)

[question] Is cybercrime really costing trillions per year?
Fabien Roger (Fabien) · 2024-09-27T08:44:07.621Z · answers+comments (28)

Bayesian updating in real life is mostly about understanding your hypotheses
Max H (Maxc) · 2024-01-01T00:10:30.978Z · comments (4)

Another argument against utility-centric alignment paradigms
Fiora from Rosebloom · 2024-09-22T07:28:27.856Z · comments (39)

Do not delete your misaligned AGI.
mako yass (MakoYass) · 2024-03-24T21:37:07.724Z · comments (13)

[link] Outrage Bonding
Jonathan Moregård (JonathanMoregard) · 2024-08-09T13:46:59.818Z · comments (12)

[link] Pay-on-results personal growth: first success
Chipmonk · 2024-09-14T03:39:12.975Z · comments (5)

Black Box Biology
GeneSmith · 2023-11-29T02:27:29.794Z · comments (30)

What is a Tool?
johnswentworth · 2024-06-25T23:40:07.483Z · comments (4)

[link] Ice: The Penultimate Frontier
Roko · 2024-07-13T23:44:56.827Z · comments (56)

Automation collapse
Geoffrey Irving · 2024-10-21T14:50:54.500Z · comments (6)

A framework for thinking about AI power-seeking
Joe Carlsmith (joekc) · 2024-07-24T22:41:01.685Z · comments (15)

Catastrophic Goodhart in RL with KL penalty
Thomas Kwa (thomas-kwa) · 2024-05-15T00:58:20.763Z · comments (10)

AI #55: Keep Clauding Along
Zvi · 2024-03-14T15:40:09.335Z · comments (16)

Never Drop A Ball
Screwtape · 2023-11-23T04:15:35.834Z · comments (1)

Don't sleep on Coordination Takeoffs
trevor (TrevorWiesinger) · 2024-01-27T19:55:26.831Z · comments (24)

[link] Superforecasting the Origins of the Covid-19 Pandemic
DanielFilan · 2024-03-12T19:01:15.914Z · comments (0)

Thoughts on open source AI
Sam Marks (samuel-marks) · 2023-11-03T15:35:42.067Z · comments (17)

[link] Twitter thread on AI safety evals
Richard_Ngo (ricraz) · 2024-07-31T00:18:14.076Z · comments (3)

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19
viking_math · 2024-02-19T01:14:06.772Z · comments (28)

Book Review: On the Edge: The Future
Zvi · 2024-09-27T14:00:05.279Z · comments (1)

RTFB: California’s AB 3211
Zvi · 2024-07-30T13:10:03.853Z · comments (2)

[link] Research Report: Sparse Autoencoders find only 9/180 board state features in OthelloGPT
Robert_AIZI · 2024-03-05T13:55:33.483Z · comments (24)

[link] on bacteria, on teeth
bhauth · 2024-09-30T15:56:56.830Z · comments (9)

Balancing Games
jefftk (jkaufman) · 2024-02-24T14:40:04.237Z · comments (18)

Natural Latents Are Not Robust To Tiny Mixtures
johnswentworth · 2024-06-07T18:53:36.643Z · comments (8)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

tailcalled on Three Notions of "Power"

Can you explain what this coordination would look like?

khafra on Three Notions of "Power"

Your definition seems like it fits the Emperor of China example--by reputation, they had few competitors for being the most willing and able to pessimize another agent's utility function; e.g. 9 Familial Exterminations.
And that seems to be a key to understanding this type of power, because if they were able to pessimize all other agents' utility functions, that would just be an evil mirror of bargaining power. Being able to choose a sharply limited number of unfortunate agents, and punish them severely pour encourager les autres, seems like it might just stop working when the average agent is smart enough to implicitly coordinate around a shared understanding of payoff matrices.
So I think I might have arrived back to the "all dominance hierarchies will be populated solely by scheming viziers" conclusion.

fread2281 on Alexander Gietelink Oldenziel's Shortform

I guess this is sorta about your 3, which I disbelieve (though algorithms for tasks other than learning are also important). Currently, Bayesian inference vs SGD is a question of how much data you have (where SGD wins except for very little data). For small to medium amounts of data, even without AGI, I expect SGD to lose eventually due to better inference algorithms. For many problems I have the intuition that it's ~always possible to improve performance with more complicated algorithms (eg sat solvers). All that together makes me expect there to be inference algorithms that scale to very large amounts of data (that aren't going to be doing full Bayesian inference but rather some complicated approximation).

bolverk on I got dysentery so you don’t have to

Sequence 1 length:3 

Sequence 2 length:6 

Alignment length: 6 

Identity: 3/6 (50.00%) 

Similarity: 3/6 (50.00%) 

Gaps: 3/6 (50.00%)

---AGC
   |||
AGCAGC

Like this. Difference between lengths is considered non-matching.

https://en.vectorbuilder.com/tool/sequence-alignment.html

inquilinekea on What TMS is like

https://pmc.ncbi.nlm.nih.gov/articles/PMC8122027/

raemon on JargonBot Beta Test

I've reverted the part that automatically generates jargon for drafts until we've figured out a better overall solution.

meme-marine on The Compendium, A full argument about extinction risk from AGI

I am unsurprised but disappointed to read the same Catastrophe arguments rehashed here, based on an outdated Bostromian paradigm of AGI. This is the main section I disagree with.

The underlying principle beneath these hypothetical scenarios is grounded in what we can observe around us: powerful entities control weaker ones, and weaker ones can fight back only to the degree that the more powerful entity isn’t all that powerful after all.

I do not think this is obvious or true at all. Nation-States are often controlled by a small group of people or even a single person, no different physiologically to any other human being. If it really wanted to, there would be nothing at all stopping the US military from launching a coup on its civilian government; in fact, military coups are a commonplace global event. Yet, generally, most countries do not suffer constant coup attempts. We hold far fewer tools to "align" military leaders than we do AI models - we cannot control how generals were raised as children, cannot read their minds, cannot edit their minds.

I think you could also make a similar argument that big things control little things - with much more momentum and potential energy, we observe that large objects are dominant over small objects. Small objects can only push large objects to the extent that the large object is made of a material that is not very dense. Surely, then building vehicles substantially larger than people would result in uncontrollable runaways that would threaten human life and property! But in reality, runaway dump truck incidents are fairly uncommon. A tiny man can control a giant machine. Not all men can - only the one in the cockpit.

My point is that it is not at all obvious that a powerful AI would lack such a cockpit. If its goals are oriented around protecting or giving control to a set of individuals, I see no reason whatsoever why it would do a 180 and kill its commander, especially since the AI systems that we can build in practice are more than capable of understanding the nuances of their commands.

The odds of an average chess player with an ELO of 1200 against a grandmaster with ELO 2500 are 1 to a million. Against the best chess AI today with an ELO of 3600, the odds are essential 0.

Chess is a system that's perfectly predictable. Reality is a chaotic system. Chaotic systems - like a three-body orbital arrangement - are impossible to perfectly predict in all cases even if they're totally deterministic, because even minute inaccuracies in measurement can completely change the result. One example would be the edges of the Mandelbrot set. It's fractal. Therefore, even an extremely powerful AI would be beholden to certain probabilistic barriers, notwithstanding quantum-random factors.

Many assume that an AI is only dangerous if it has hostile intentions, but the danger of godlike AI is not a matter of its intent, but its power and autonomy. As these systems become increasingly agentic and powerful, they will pursue goals that will diverge from our own.

It would not be incorrect to describe someone who pursued their goals irrespective of its externalities to be malevolent. Bank robbers don't want to hurt people, they want money. Yet I don't think anyone would suggest that the North Hollywood shooters were "non-hostile but misaligned". I do not like this common snippet of rhetoric and I think it is dishonest. It attempts to distance these fears of misaligned AI from movie characters such as Skynet, but ultimately, this is the picture that is painted.

Goal divergence is a hallmark of the Bostromian paradigm - the idea that a misspecified utility function, optimized hypercompetently, would lead to disaster. Modern AI systems do not behave like this. They behave in a much more humanlike way. They do not have objective functions that they pursue doggedly. The Orthogonality Thesis states that intelligence is uncorrelated with objectives. The unstated connection here, I think, is that their initial goals must have been misaligned in the first place, but stated like this, it sounds a little like you expect a superintelligent AI to suddenly diverge from its instructions for no reason at all.

Overall, this is a very vague section. I think you would benefit from explaining some of the assumptions being made here.

I'm not going to go into detail on the Alignment section, but I think that many of its issues are similar to the ones listed above. I think that the arguments are not compelling enough for lay people, mostly because I don't think they're correct. I think that the definition of Alignment you have given - "the ability to “steer AI systems toward a person's or group's intended goals, preferences, and ethical principles.”" - does not match the treatment it is given. I think that it is obvious that the scope of Alignment is too vague, broad, and unverifiable for it to be a useful concept. I think that Richard Ngo's post:

https://www.lesswrong.com/posts/67fNBeHrjdrZZNDDK/defining-alignment-research [LW · GW]

is a good summary of the issues I see with the current idea of Alignment as it is often used in Rationalist circles and how it could be adapted to suit the world in which we find ourselves.

Finally, I think that the Governance section could very well be read uncharitably as a manifesto for world domination. Less than a dozen people attend PauseAI protests; you do not have the political ability to make this happen. The ideas contained in this document, which resemble many other documents, such as a similar one created by the PauseAI group, are not compelling enough to sway people who are not already believers in its ideas, and the Rationalist language used in them is anathemic to the largest ideological groups that would otherwise support your cause.

You may receive praise from Rationalist circles, but I do not think you will reach a large audience with this type of work. Leopold Aschenbrenner's essay managed to reach a fairly substantial audience, and it has similar themes to your document, so in principle, people are willing to read this sort of writing. The main flaw is that it doesn't add anything to the conversation, and because of that, it won't change anyone's minds. The reason that the public discourse doesn't involve Alignment talk isn't due to lack of awareness, it's because it isn't at all compelling to most people. Writing it better, with a nicer format, will not change this.

yair-halberstadt on Trading Candy

Counterpoint: when I was about 12, I was too old to collect candy at my Synagogue on Simchat Torah, so I would beg a single candy from someone, then trade it up (Dutch book style) with naive younger kids until I had a decent stash. I was particularly pleased whenever my traded up stash included the original sweet.

denkenberger on Is the Power Grid Sustainable?

It would be helpful to see a calculation with your rates, the installed cost of batteries, cost of the space taken up, losses in the batteries and convertor, any cost of maintenance, lifetime of batteries, and cost (or benefit) of disposal.

nathan-helm-burger on johnswentworth's Shortform

I like it. I do worry that it, and The Narrow Path, are both missing how hard it will be to govern and restrict AI.