LessWrong 2.0 Reader



Mysteries of mode collapse
janus · 2022-11-08T10:37:57.760Z · comments (56)
I Converted Book I of The Sequences Into A Zoomer-Readable Format
dkirmani · 2022-11-10T02:59:04.236Z · comments (31)
What it's like to dissect a cadaver
Alok Singh (OldManNick) · 2022-11-10T06:40:05.776Z · comments (23)
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
beren · 2022-11-28T12:54:52.399Z · comments (33)
Tyranny of the Epistemic Majority
Scott Garrabrant · 2022-11-22T17:19:34.144Z · comments (13)
Conjecture: a retrospective after 8 months of work
Connor Leahy (NPCollapse) · 2022-11-23T17:10:23.510Z · comments (9)
Planes are still decades away from displacing most bird jobs
guzey · 2022-11-25T16:49:32.344Z · comments (13)
Geometric Rationality is Not VNM Rational
Scott Garrabrant · 2022-11-27T19:36:00.939Z · comments (26)
The Geometric Expectation
Scott Garrabrant · 2022-11-23T18:05:12.206Z · comments (19)
The Alignment Community Is Culturally Broken
sudo · 2022-11-13T18:53:55.054Z · comments (68)
Sadly, FTX
Zvi · 2022-11-17T14:30:03.068Z · comments (18)
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak · 2022-11-22T18:57:29.604Z · comments (98)
Mechanistic anomaly detection and ELK
paulfchristiano · 2022-11-25T18:50:04.447Z · comments (21)
On the Diplomacy AI
Zvi · 2022-11-28T13:20:00.884Z · comments (29)
Clarifying AI X-risk
zac_kenton (zkenton) · 2022-11-01T11:03:01.144Z · comments (24)
Geometric Exploration, Arithmetic Exploitation
Scott Garrabrant · 2022-11-24T15:36:30.334Z · comments (4)
Utilitarianism Meets Egalitarianism
Scott Garrabrant · 2022-11-21T19:00:12.168Z · comments (16)
Speculation on Current Opportunities for Unusually High Impact in Global Health
johnswentworth · 2022-11-11T20:47:03.367Z · comments (31)
What I Learned Running Refine
adamShimi · 2022-11-24T14:49:59.366Z · comments (5)
Applying superintelligence without collusion
Eric Drexler · 2022-11-08T18:08:31.733Z · comments (63)
How could we know that an AGI system will have good consequences?
So8res · 2022-11-07T22:42:27.395Z · comments (25)
Caution when interpreting Deepmind's In-context RL paper
Sam Marks (samuel-marks) · 2022-11-01T02:42:06.766Z · comments (6)
LW Beta Feature: Side-Comments
jimrandomh · 2022-11-24T01:55:31.578Z · comments (47)
LessWrong readers are invited to apply to the Lurkshop
Jonas V (Jonas Vollmer) · 2022-11-22T09:19:05.412Z · comments (41)
Instead of technical research, more people should focus on buying time
Akash (akash-wasil) · 2022-11-05T20:43:45.215Z · comments (45)
[link] ARC paper: Formalizing the presumption of independence
Erik Jenner (ejenner) · 2022-11-20T01:22:55.110Z · comments (2)
Instrumental convergence is what makes general intelligence possible
tailcalled · 2022-11-11T16:38:14.390Z · comments (11)
[link] Trying to Make a Treacherous Mesa-Optimizer
MadHatter · 2022-11-09T18:07:03.157Z · comments (14)
[link] Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)
Jacy Reese Anthis (Jacy Reese) · 2022-11-22T16:50:20.054Z · comments (64)
Conjecture Second Hiring Round
Connor Leahy (NPCollapse) · 2022-11-23T17:11:42.524Z · comments (0)
Searching for Search
NicholasKees (nick_kees) · 2022-11-28T15:31:49.974Z · comments (8)
Current themes in mechanistic interpretability research
Lee Sharkey (Lee_Sharkey) · 2022-11-16T14:14:02.030Z · comments (2)
By Default, GPTs Think In Plain Sight
Fabien Roger (Fabien) · 2022-11-19T19:15:29.591Z · comments (33)
Announcing the Progress Forum
jasoncrawford · 2022-11-17T19:26:29.584Z · comments (9)
When AI solves a game, focus on the game's mechanics, not its theme.
Cleo Nardo (strawberry calm) · 2022-11-23T19:16:07.333Z · comments (7)
[link] Results from the interpretability hackathon
Esben Kran (esben-kran) · 2022-11-17T14:51:44.568Z · comments (0)
Exams-Only Universities
Mati_Roy (MathieuRoy) · 2022-11-06T22:05:39.373Z · comments (40)
Always know where your abstractions break
lsusr · 2022-11-27T06:32:09.643Z · comments (6)
Disagreement with bio anchors that lead to shorter timelines
Marius Hobbhahn (marius-hobbhahn) · 2022-11-16T14:40:16.734Z · comments (17)
[link] Engineering Monosemanticity in Toy Models
Adam Jermyn (adam-jermyn) · 2022-11-18T01:43:38.623Z · comments (7)
Follow up to medical miracle
Elizabeth (pktechgirl) · 2022-11-04T18:00:01.858Z · comments (5)
Threat Model Literature Review
zac_kenton (zkenton) · 2022-11-01T11:03:22.610Z · comments (4)
[link] Will we run out of ML data? Evidence from projecting dataset size trends
Pablo Villalobos (pvs) · 2022-11-14T16:42:27.135Z · comments (12)
[link] Elastic Productivity Tools
Simon Berens (sberens) · 2022-11-19T21:59:39.913Z · comments (8)
[link] What is epigenetics?
Metacelsus · 2022-11-06T01:24:05.350Z · comments (4)
Respecting your Local Preferences
Scott Garrabrant · 2022-11-26T19:04:14.252Z · comments (1)
Takeaways from a survey on AI alignment resources
DanielFilan · 2022-11-05T23:40:01.917Z · comments (10)
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility
Akash (akash-wasil) · 2022-11-22T22:19:09.419Z · comments (20)
Update to Mysteries of mode collapse: text-davinci-002 not RLHF
janus · 2022-11-19T23:51:27.510Z · comments (8)
K-types vs T-types — what priors do you have?
Cleo Nardo (strawberry calm) · 2022-11-03T11:29:00.809Z · comments (25)