LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Actually, Power Plants May Be an AI Training Bottleneck.
Lao Mein (derpherpize) · 2024-06-20T04:41:33.567Z · comments (13)

OpenAI o1, Llama 4, and AlphaZero of LLMs
Vladimir_Nesov · 2024-09-14T21:27:41.241Z · comments (24)

Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane (ckkissane) · 2024-01-16T00:26:14.767Z · comments (9)

Update on the UK AI Taskforce & upcoming AI Safety Summit
Elliot Mckernon (elliot) · 2023-10-11T11:37:42.436Z · comments (2)

Muddling Along Is More Likely Than Dystopia
Jeffrey Heninger (jeffrey-heninger) · 2023-10-20T21:25:15.459Z · comments (10)

Why you should be using a retinoid
GeneSmith · 2024-08-19T03:07:41.722Z · comments (57)

New roles on my team: come build Open Phil's technical AI safety program with me!
Ajeya Cotra (ajeya-cotra) · 2023-10-19T16:47:59.701Z · comments (6)

[link] What Depression Is Like
Sable · 2024-08-27T17:43:22.549Z · comments (23)

[link] Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb · 2024-04-16T10:10:13.338Z · comments (12)

An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers
Yitz (yitz) · 2024-01-17T09:48:07.930Z · comments (11)

The Good Life in the face of the apocalypse
Elizabeth (pktechgirl) · 2023-10-16T22:40:15.200Z · comments (8)

Agent Boundaries Aren't Markov Blankets. [Unless they're non-causal; see comments.]
abramdemski · 2023-11-20T18:23:40.443Z · comments (11)

[Paper] All's Fair In Love And Love: Copy Suppression in GPT-2 Small
CallumMcDougall (TheMcDouglas) · 2023-10-13T18:32:02.376Z · comments (4)

Catastrophic sabotage as a major threat model for human-level AI systems
evhub · 2024-10-22T20:57:11.395Z · comments (6)

Secular interpretations of core perennialist claims
zhukeepa · 2024-08-25T23:41:02.683Z · comments (32)

AI #83: The Mask Comes Off
Zvi · 2024-09-26T12:00:08.689Z · comments (19)

My Criticism of Singular Learning Theory
Joar Skalse (Logical_Lunatic) · 2023-11-19T15:19:16.874Z · comments (56)

Coup probes: Catching catastrophes with probes trained off-policy
Fabien Roger (Fabien) · 2023-11-17T17:58:28.687Z · comments (7)

Release: Optimal Weave (P1): A Prototype Cohabitive Game
mako yass (MakoYass) · 2024-08-17T14:08:18.947Z · comments (21)

Some Vacation Photos
johnswentworth · 2024-01-04T17:15:01.187Z · comments (0)

Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe · 2024-07-02T13:17:16.352Z · comments (7)

[link] "The Heart of Gaming is the Power Fantasy", and Cohabitive Games
Raemon · 2023-10-08T21:02:33.526Z · comments (49)

[link] Palworld development blog post
bhauth · 2024-01-28T05:56:19.984Z · comments (12)

AISafety.com – Resources for AI Safety
Søren Elverlin (soren-elverlin-1) · 2024-05-17T15:57:11.712Z · comments (3)

Values Are Real Like Harry Potter
johnswentworth · 2024-10-09T23:42:24.724Z · comments (17)

Bostrom Goes Unheard
Zvi · 2023-11-13T14:11:07.586Z · comments (9)

Refusal mechanisms: initial experiments with Llama-2-7b-chat
Andy Arditi (andy-arditi) · 2023-12-08T17:08:01.250Z · comments (7)

[link] New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 2024-05-21T11:00:41.794Z · comments (17)

Constructability: Plainly-coded AGIs may be feasible in the near future
Épiphanie Gédéon (joy_void_joy) · 2024-04-27T16:04:45.894Z · comments (13)

3C's: A Recipe For Mathing Concepts
johnswentworth · 2024-07-03T01:06:11.944Z · comments (5)

Self-Referential Probabilistic Logic Admits the Payor's Lemma
Yudhister Kumar (randomwalks) · 2023-11-28T10:27:29.029Z · comments (14)

[link] Not every accommodation is a Curb Cut Effect: The Handicapped Parking Effect, the Clapper Effect, and more
Michael Cohn (michael-cohn) · 2024-09-15T05:27:36.691Z · comments (39)

Studying The Alien Mind
Quentin FEUILLADE--MONTIXI (quentin-feuillade-montixi) · 2023-12-05T17:27:28.049Z · comments (10)

The Gemini Incident
Zvi · 2024-02-22T21:00:04.594Z · comments (19)

Survey of 2,778 AI authors: six parts in pictures
KatjaGrace · 2024-01-06T04:43:34.590Z · comments (1)

Announcing Athena - Women in AI Alignment Research
Claire Short (claire-short) · 2023-11-07T21:46:41.741Z · comments (2)

New report: "Scheming AIs: Will AIs fake alignment during training in order to get power?"
Joe Carlsmith (joekc) · 2023-11-15T17:16:42.088Z · comments (26)

Thomas Kwa's research journal
Thomas Kwa (thomas-kwa) · 2023-11-23T05:11:08.907Z · comments (1)

The case for a negative alignment tax
Cameron Berg (cameron-berg) · 2024-09-18T18:33:18.491Z · comments (20)

[Intuitive self-models] 2. Conscious Awareness
Steven Byrnes (steve2152) · 2024-09-25T13:29:02.820Z · comments (48)

[link] My thesis (Algorithmic Bayesian Epistemology) explained in more depth
Eric Neyman (UnexpectedValues) · 2024-05-09T19:43:16.543Z · comments (4)

LessWrong Community Weekend 2024, open for applications
UnplannedCauliflower · 2024-05-01T10:18:21.992Z · comments (2)

[link] MIRI's May 2024 Newsletter
Harlan · 2024-05-15T00:13:30.153Z · comments (1)

[link] The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
EJT (ElliottThornley) · 2023-10-23T21:00:48.398Z · comments (22)

How to prevent collusion when using untrusted models to monitor each other
Buck · 2024-09-25T18:58:20.693Z · comments (5)

Quick look: applications of chaos theory
Elizabeth (pktechgirl) · 2024-08-18T15:00:07.853Z · comments (51)

[link] Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
titotal (lombertini) · 2024-09-18T13:07:40.754Z · comments (3)

Scaffolding for "Noticing Metacognition"
Raemon · 2024-10-09T17:54:13.657Z · comments (4)

EU policymakers reach an agreement on the AI Act
tlevin (trevor) · 2023-12-15T06:02:44.668Z · comments (7)

A couple productivity tips for overthinkers
Steven Byrnes (steve2152) · 2024-04-20T16:05:50.332Z · comments (13)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

nate-showell on Heresies in the Shadow of the Sequences

Some other examples:

Agency and embeddedness are fundamentally at odds with each other. Decision theory and physics are incompatible approaches to world-modeling, with each making assumptions that are inconsistent with the other. Attempting to build mathematical models of embedding agency will fail as an attempt to understand advanced AI behavior.
Reductionism is false. If modeling a large-scale system in terms of the exact behavior of its small-scale components would take longer than the age of the universe, or would require a universe-sized computer, the large-scale system isn't explicable in terms of small-scale interactions even in principle. The Sequences are incorrect to describe non-reductionism as ontological realism about large-scale entities [LW · GW] -- the former doesn't inherently imply the latter.
Relatedly, nothing is ontologically primitive. Not even elementary particles: if, for example, you took away the mass of an electron, it would cease to be an electron and become something else. The properties of those particles, as well, depend on having fields to interact with. And if a field couldn't interact with anything, could it still be said to exist?
Ontology creates axiology and axiology creates ontology. We aren't born with fully formed utility functions in our heads telling us what we do and don't value. Instead, we have to explore and model the world over time, forming opinions along the way about what things and properties we prefer. And in turn, our preferences guide our exploration of the world and the models we form of what we experience. Classical game theory, with its predefined sets of choices and payoffs, only has narrow applicability, since such contrived setups are only rarely close approximations to the scenarios we find ourselves in.

thomas-kehrenberg on OpenAI Email Archives (from Musk v. Altman)

I wonder if it would be a good idea to put editor's notes after likely typos, like:

but we want [editor's note: Elon likely meant “won’t” here] do any contract or agree to “evangelize”.

thomas-kehrenberg on OpenAI Email Archives (from Musk v. Altman)

Yes, it sounds that he put too much stock into Andrej's paper-counting argument, and then even left the board because he didn't want to be associated with a failing company?

archimedes on The Third Fundamental Question

This sounds like metacognitive concepts and models. Like past, present, future, you can roughly align them with three types of metacognitive awareness: declarative knowledge, procedural knowledge, and conditional knowledge.

#1 - What do you think you know, and how do you think you know it?

Content knowledge (declarative knowledge) which is understanding one's own capabilities, such as a student evaluating their own knowledge of a subject in a class. It is notable that not all metacognition is accurate.

#2 - Do you know what you are doing, and why you are doing it?

Task knowledge (procedural knowledge) refers to knowledge about doing things. This type of knowledge is displayed as heuristics and strategies. A high degree of procedural knowledge can allow individuals to perform tasks more automatically.

#3 - What are you about to do, and what do you think will happen next?

Strategic knowledge (conditional knowledge) refers to knowing when and why to use declarative and procedural knowledge. It is one's own capability for using strategies to learn information.

Another somewhat tenuous alignment is with metacognitive skills: evaluating, monitoring, and planning.

#1 - What do you think you know, and how do you think you know it?

Evaluating: refers to appraising the final product of a task and the efficiency at which the task was performed. This can include re-evaluating strategies that were used.

#2 - Do you know what you are doing, and why you are doing it?

Monitoring: refers to one's awareness of comprehension and task performance

#3 - What are you about to do, and what do you think will happen next?

Planning: refers to the appropriate selection of strategies and the correct allocation of resources that affect task performance.

Quotes are adapted from https://en.wikipedia.org/wiki/Metacognition

seth-herd on OpenAI Email Archives (from Musk v. Altman)

I sometimes feel we spend too much time on philosophy and communication in the x-risk community. But thinking through the OpenAI drama suggests that it's crucial.

Now the world is in more and more immediate danger because a couple of smart guys couldn't get their philosophy or their communication right-enough, and didn't spend the time necessary to clarify. Instead Musk followed his combative and entrepreneurial instincts. The result was dramatically heating up the race for AGI, which previously had no real competition to DeepMind.

OpenAI wouldn't have launched without Musk's support, and he gave it because he was afraid of Larry Page being in charge of a successful Google AGI effort.

From Musk's interview with Tucker Carlson (automated transcript, sorry!):

I mean the the reason open AI exists at all is that um Larry Paige and I used to be close friends and I would stay at his house in pal Alto and I would talk to him late into the night about uh AI safety and at least my (01:12) perception was that Larry was not taking uh AI safety seriously enough um and um what did he say about it he really seemed to be um one want want sort of digital super intelligence basically digital God if you will and at one point uh I said well what about you know we're going to make sure humanity is okay here um and and and um uh and then he called me a speciest

Musk was afraid of what Page would do with AGI because Page called Musk a speciesist (specist?) when they were talking about AGI safety. What did Page mean by this? He probably hadn't worked it all the way through.

These guys stopped being friends, and Musk put a bunch of money and effort into developing an org that could rival DeepMind's progress toward AGI.

That org was captured by Altman. But it was always based on a stupid idea: make AGI open source. That's the dumbest thing you could do with something really dangerous - unless you believed that it would otherwise wind up in hands that just don't care about humanity.

That's probably not what Page meant. On consideration, he would probably have clarified that AI that includes what we value about humanity would be a worthy successor. He probably wasn't even clear on his own philosophy at the time.

A little more careful conversation would've prevented this whole thing, and we'd be in a much better strategic position.

In my mind this also shows how immensely intelligent people can also do really dumb things outside of their area of intellectual expertise.

cata on OpenAI Email Archives (from Musk v. Altman)

It seems like Musk in 2018 dramatically underestimated the ability of OpenAI to compete with Google in the medium term.

cata on OpenAI Email Archives (from Musk v. Altman)

Thanks for not only doing this but noting the accuracy of the unchecked transcript, it's always hard work to build a mental model of how good LLM tools are at what stuff.

sam-marks on Lao Mein's Shortform

I'm quite happy for laws to be passed and enforced via the normal mechanisms. But I think it's bad for policy and enforcement to be determined by Elon Musk's personal vendettas. If Elon tried to defund the AI safety institute because of a personal vendetta against AI safety researchers, I would have some process concerns, and so I also have process concerns when these vendettas are directed against OAI.

seth-herd on OpenAI Email Archives (from Musk v. Altman)

Rings true. I'm not sure it pushes me much on the ethics of OpenAI; somebody else had a good idea for a philosophy and a name to push for AI in a certain (maybe dumb) direction; they recognized it as a good idea and appropriated it for their own similar project. Should they have used a more different name? Probably. Should they have used a more different philosophical argument? No. Should they have brought Guy Ravine on board? Probably not; his vision for how the thing would actually go was very different from theirs, and none of his skills were really that relevant. He'd have been in arguments with them from the start.

Is this the right way for industry to work? Nope. But nobody knows how to properly give credit for good but broad ideas.

None of this is to endorse anything or anyone related to OpenAI, just to say it's pretty standard practice.

annasalamon on Ayn Rand’s model of “living money”; and an upside of burnout

Yes, this is a good point, relates to why I claimed at top that this is an oversimplified model. I appreciate you using logic from my stated premises; helps things be falsifiable.

It seems to me:

Somehow people who are in good physical health wake up each day with a certain amount of restored willpower. (This is inconsistent with the toy model in the OP, but is still my real / more-complicated model.)
Noticing spontaneously-interesting things can be done without willpower; but carefully noticing superficially-boring details and taking notes in hopes of later payoff indeed requires willpower, on my model. (Though, for me, less than e.g. going jogging requires.)
If you’ve just been defeated by a force you weren’t tracking, that force often becomes spontaneously-interesting. Thus people who are burnt out can sometimes take a spontaneous interest in how willpower/burnout/visceral motivation works, and can enjoy “learning humbly” from these things.
There’s a way burnout can help cut through ~dumb/dissociated/overconfident ideological frameworks (e.g. “only AI risk is interesting/relevant to anything”), and make space for other information to have attention again, and make it possible to learn things not in one's model. Sort of like removing a monopoly business from a given sector, so that other thingies have a shot again.

I wish the above was more coherent/model-y.