LessWrong 2.0 Reader

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth · 2024-12-26T18:48:07.614Z · comments (43)

A Three-Layer Model of LLM Psychology
Jan_Kulveit · 2024-12-26T16:49:41.738Z · comments (3)

AI #96: o3 But Not Yet For Thee
Zvi · 2024-12-26T20:30:06.722Z · comments (5)

ReSolsticed vol I: "We're Not Going Quietly"
Raemon · 2024-12-26T17:52:33.727Z · comments (3)

[question] What are the most interesting / challenging evals (for humans) available?
Raemon · 2024-12-27T03:05:26.831Z · answers+comments (4)

Corrigibility's Desirability is Timing-Sensitive
RobertM (T3t) · 2024-12-26T22:24:17.435Z · comments (4)

[link] PCR retrospective
bhauth · 2024-12-26T21:20:56.484Z · comments (0)

Whistleblowing Twitter Bot
Mckiev · 2024-12-26T04:09:45.493Z · comments (3)

[link] Review: Planecrash
L Rudolf L (LRudL) · 2024-12-27T14:18:33.611Z · comments (0)

[question] What's the best metric for measuring quality of life?
ChristianKl · 2024-12-27T14:29:30.813Z · answers+comments (3)

[question] What would be the IQ and other benchmarks of o3 that uses $1 million worth of compute resources to answer one question?
avturchin · 2024-12-26T11:08:23.545Z · answers+comments (2)

If all trade is voluntary, then what is "exploitation?"
Darmani · 2024-12-27T11:21:30.036Z · comments (7)

Coin Flip
XelaP (scroogemcduck1) · 2024-12-27T11:53:01.781Z · comments (0)

[link] Letter from an Alien Mind
Shoshannah Tekofsky (DarkSym) · 2024-12-27T13:20:49.277Z · comments (0)

[question] Why don't we currently have AI agents?
ChristianKl · 2024-12-26T15:26:35.682Z · answers+comments (7)

[link] Streamlining my voice note process
Vlad Sitalo (harcisis) · 2024-12-26T06:04:01.990Z · comments (1)

Super human AI is a very low hanging fruit!
Hzn · 2024-12-26T19:00:22.822Z · comments (0)

[question] Are Sparse Autoencoders a good idea for AI control?
Gerard Boxo (gerard-boxo) · 2024-12-26T17:34:55.617Z · answers+comments (2)

Algorithmic Asubjective Anthropics, Cartesian Subjective Anthropics
Lorec · 2024-12-27T01:58:39.880Z · comments (0)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

[link] The Economics & Practicality of Starting Mars Colonization
Zero Contradictions · 2024-12-26T10:56:26.019Z · comments (1)

Duplicate token neurons in the first layer of gpt2-small
Alex Gibson · 2024-12-27T04:21:55.896Z · comments (0)

[link] Human, All Too Human - Superintelligence requires learning things we can’t teach
Ben Turtel (ben-turtel) · 2024-12-26T16:26:27.328Z · comments (4)

Terminal goal vs Intelligence
Donatas Lučiūnas (donatas-luciunas) · 2024-12-26T08:10:42.144Z · comments (15)

Recent comments

viliam on leogao's Shortform

Generally, it is about heuristics we can use to find quality in the oceans of crap. If we assume that people are sane to some degree, status is an imperfect proxy for quality. If we assume that people don't use AIs to polish their writing styles, the writing style is an imperfect proxy for quality.

I have no experience reading research. I suspect that there are also crackpots who can write using the right kind of style. For example, they may be experts at their own line of research, and also speak overconfidently about different things they do not understand.

So if you want to be taken seriously, you probably need to know what kind of crackpot you remind others of, and then find a way to distinguish yourself from this kind of crackpot specifically.

At some point it would probably be easier to simply do your homework, once, and then have something you can point at. For example, you don't need to publish everything in the established journals, but it would probably help to publish there once -- just to show that if you want, you can; that this is about your priorities, not about lack of quality.

There are probably other ways. For example, if you don't want to get involved too much with the system, find someone who already is, and maybe offer them co-authorship in return for jumping through all the hoops.

I guess my model is that the costs of complying with the standard system are high but constant. So the more time you spend complaining about the system not taking you seriously, the greater the chance that complying with the system would have actually been cheaper than the accumulating opportunity costs.

dagon on Terminal goal vs Intelligence

if terminal goal changes, agent is not rational. Agent has no control over its terminal goal, or you don't agree?

Why is it relevant that the agent can or cannot change or influence its goals? Time-inconsistent terminal goals (utility function) are irrational. Time-inconsistent instrumental goals can be rational, if circumstances or beliefs change (in rational ways).

I don't think I'm supporting the orthogonality thesis with this (though I do currently believe the weak form of it - there is a very wide range of goals that is compatible with intelligence, not necessarily all points in goalspace). I'm just saying that goals which are arbitrarily mutable are incompatible with rationality in the Von Neumann-Morgenstern sense.

tsvibt on The Field of AI Alignment: A Postmortem, and What To Do About It

Remember that the top-level commenter here is currently a physicist, so it's not like the usefulness of their work would be going down by doing a useless MATS project :P

Yes it would! It would eat up motivation and energy and hope that they could have put towards actual research. And it would put them in a social context where they are pressured to orient themselves toward streetlighty research--not just during the program, but also afterward. Unless they have some special ability to have it not do that.

Without MATS: not currently doing anything directly useful (though maybe indirectly useful, e.g. gaining problem-solving skill). Could, if given $30k/year, start doing real AGI alignment thinking from scratch not from scratch, thereby scratching their "will you think in a way that unlocks understanding of strong minds" lottery ticket that each person gets.

With MATS: gotta apply to extension, write my LTFF grant. Which org should I apply to? Should I do linear probes software engineering? Or evals? Red teaming? CoT? Constitution? Hyperparameter gippity? Honeypot? Scaling supervision? Superalign, better than regular align? Detecting deception?

joel-burget on The Field of AI Alignment: A Postmortem, and What To Do About It

A different way to think about types of work is within current ML paradigms vs outside of them. If you believe that timelines are short (e.g. 5 years or less), it makes much more sense to work within current paradigms; otherwise there's very little chance your work will be adopted in time to matter. Mainstream AI, with all of its momentum, is not going to adopt a new paradigm overnight.

If I understand you correctly, there's a close (but not exact) correspondence between work I'd label in-paradigm and work you'd label as "streetlighting". On my model the reason to work in-paradigm is that that's where your work has a realistic chance to make a difference in this world.

lunatic_at_large on Whistleblowing Twitter Bot

I agree, though I think it would be a very ridiculous own-goal if e.g. GPT-4o decided to block a whistleblowing report about OpenAI because it was trained to serve OpenAI's interests. I think any model used by this kind of whistleblowing tool should be open-source (nothing fancy / more dangerous than what's already out there), run locally by the operators of the tool, and tested to make sure it doesn't block legitimate posts.

lunatic_at_large on Whistleblowing Twitter Bot

My gut instinct is that this would have been a fantastic thing to create 2-4 years ago. My biggest hesitation is that the probability a tool like this decreases existential risk is proportional to the fraction of lab researchers who know about it, and adoption can be a slow / hard thing to make happen. I still think that this kind of program could be incredibly valuable under the right circumstances, so someone should probably be working on this.

Also, I have a very amateurish security question: if someone provides their work email to verify their authenticity with this tool, can their employer find out? For example, I wouldn't put it past OpenAI to check if an employee's email account got pinged by this tool and then to pressure / fire that employee.

tailcalled on What's the best metric for measuring quality of life?

Since it's the FDA that's doing the regulating, they could pick the investigator. Completely ungameable.

viliam on ReSolsticed vol I: "We're Not Going Quietly"

Fantastic! Are lyrics available somewhere?

christiankl on What's the best metric for measuring quality of life?

That sounds like it's relatively easy to game by the company who chooses the investigators.

alexey on The Online Sports Gambling Experiment Has Failed

I mostly agree, but it's a double-digit percent increase in bankruptcies which ends up being (from the post)

about 4bps (0.04%)/year of additional bankruptcies