LessWrong 2.0 Reader



$20K In Bounties for AI Safety Public Materials
Dan H (dan-hendrycks) · 2022-08-05T02:52:47.729Z · comments (9)
[link] The longest training run
Jsevillamol · 2022-08-17T17:18:40.387Z · comments (12)
Paper reading as a Cargo Cult
jem-mosig · 2022-08-07T07:50:33.296Z · comments (10)
[link] Jack Clark on the realities of AI policy
Kaj_Sotala · 2022-08-07T08:44:33.547Z · comments (3)
[link] In Defense Of Making Money
George3d6 · 2022-08-18T14:10:23.950Z · comments (13)
AI art isn't "about to shake things up". It's already here.
Davis_Kingsley · 2022-08-22T11:17:55.415Z · comments (19)
The Expanding Moral Cinematic Universe
Raemon · 2022-08-28T18:42:19.134Z · comments (9)
ACX Meetups Everywhere List
Scott Alexander (Yvain) · 2022-08-26T18:12:04.083Z · comments (1)
Encultured AI Pre-planning, Part 1: Enabling New Benchmarks
Andrew_Critch · 2022-08-08T22:44:09.365Z · comments (2)
Building a Bugs List prompts
CFAR!Duncan (CFAR 2017) · 2022-08-13T08:00:39.838Z · comments (9)
Vingean Agency
abramdemski · 2022-08-24T20:08:53.237Z · comments (14)
Steganography in Chain of Thought Reasoning
A Ray (alex-ray) · 2022-08-08T03:47:00.610Z · comments (13)
Seeking Interns/RAs for Mechanistic Interpretability Projects
Neel Nanda (neel-nanda-1) · 2022-08-15T07:11:25.107Z · comments (0)
Oops It's Time To Overthrow the Organizer Day!
Screwtape · 2022-08-18T16:40:53.864Z · comments (5)
[link] OpenAI's Alignment Plans
dkirmani · 2022-08-24T19:39:04.620Z · comments (17)
An Introduction to Current Theories of Consciousness
hohenheim · 2022-08-28T17:55:38.151Z · comments (44)
Finding Goals in the World Model
Jeremy Gillen (jeremy-gillen) · 2022-08-22T18:06:48.213Z · comments (8)
The Pragmascope Idea
johnswentworth · 2022-08-04T21:52:15.206Z · comments (19)
EA & LW Forums Weekly Summary (21 Aug - 27 Aug 22')
Zoe Williams (GreyArea) · 2022-08-30T01:42:39.309Z · comments (4)
How to plan for a radically uncertain future?
Kerry · 2022-08-30T02:14:47.244Z · comments (35)
My thoughts on direct work (and joining LessWrong)
RobertM (T3t) · 2022-08-16T18:53:20.359Z · comments (4)
Autonomy as taking responsibility for reference maintenance
Ramana Kumar (ramana-kumar) · 2022-08-17T12:50:30.218Z · comments (3)
Anti-squatted AI x-risk domains index
plex (ete) · 2022-08-12T12:01:24.927Z · comments (6)
Refine's First Blog Post Day
adamShimi · 2022-08-13T10:23:10.332Z · comments (3)
Brain-like AGI project "aintelope"
Gunnar_Zarncke · 2022-08-14T16:33:39.571Z · comments (2)
How and why to turn everything into audio
KatWoods (ea247) · 2022-08-11T08:55:55.871Z · comments (20)
[question] How to bet against civilizational adequacy?
Wei Dai (Wei_Dai) · 2022-08-12T23:33:56.173Z · answers+comments (17)
All the posts I will never write
Alexander Gietelink Oldenziel (alexander-gietelink-oldenziel) · 2022-08-14T18:29:06.800Z · comments (8)
I missed the crux of the alignment problem the whole time
zeshen · 2022-08-13T10:11:24.826Z · comments (7)
Transformer language models are doing something more general
Numendil · 2022-08-03T21:13:42.472Z · comments (6)
Seeking PCK (Pedagogical Content Knowledge)
CFAR!Duncan (CFAR 2017) · 2022-08-12T04:15:28.181Z · comments (11)
[link] Using GPT-3 to augment human intelligence
Henrik Karlsson (henrik-karlsson) · 2022-08-10T15:54:29.253Z · comments (8)
A Data limited future
Donald Hobson (donald-hobson) · 2022-08-06T14:56:35.916Z · comments (25)
[link] Announcing Squiggle: Early Access
ozziegooen · 2022-08-03T19:48:16.727Z · comments (7)
Variational Bayesian methods
Ege Erdil (ege-erdil) · 2022-08-25T20:49:55.415Z · comments (1)
General alignment properties
TurnTrout · 2022-08-08T23:40:47.176Z · comments (2)
Turbocharging
CFAR!Duncan (CFAR 2017) · 2022-08-02T00:01:23.148Z · comments (3)
[link] PreDCA: vanessa kosoy's alignment protocol
Tamsin Leake (carado-1) · 2022-08-20T10:03:10.701Z · comments (8)
On Car Seats as Contraception
Zvi · 2022-08-22T14:10:01.299Z · comments (15)
AGI Timelines Are Mostly Not Strategically Relevant To Alignment
johnswentworth · 2022-08-23T20:15:22.193Z · comments (34)
Gradient descent doesn't select for inner search
Ivan Vendrov (ivan-vendrov) · 2022-08-13T04:15:19.373Z · comments (23)
The Shard Theory Alignment Scheme
David Udell · 2022-08-25T04:52:50.206Z · comments (32)
Six weeks doesn’t make a habit
lynettebye · 2022-08-06T08:54:26.584Z · comments (1)
Againstness
CFAR!Duncan (CFAR 2017) · 2022-08-02T19:29:14.221Z · comments (7)
The Falling Drill
Screwtape · 2022-08-05T00:08:26.069Z · comments (3)
Covid 8/18/22: CDC Admits Mistakes
Zvi · 2022-08-18T14:30:01.286Z · comments (9)
Polaris, Five-Second Versions, and Thought Lengths
CFAR!Duncan (CFAR 2017) · 2022-08-01T07:14:16.429Z · comments (12)
Proposal: Consider not using distance-direction-dimension words in abstract discussions
moridinamael · 2022-08-09T20:44:13.256Z · comments (18)
The Dumbest Possible Gets There First
Artaxerxes · 2022-08-13T10:20:26.564Z · comments (7)
[link] Review: Amusing Ourselves to Death
L Rudolf L (LRudL) · 2022-08-20T21:13:47.023Z · comments (7)