LessWrong 2.0 Reader


[question] What projects and efforts are there to promote AI safety research?
Christopher King (christopher-king) · 2023-05-24T00:33:47.554Z · answers+comments (0)

My May 2023 priorities for AI x-safety: more empathy, more unification of concerns, and less vilification of OpenAI
Andrew_Critch · 2023-05-24T00:02:08.836Z · comments (39)

[link] AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI
Dan H (dan-hendrycks) · 2023-05-23T21:47:34.755Z · comments (0)

The Polarity Problem [Draft]
Dan H (dan-hendrycks) · 2023-05-23T21:05:34.567Z · comments (3)

[link] Progress links and tweets, 2023-05-23
jasoncrawford · 2023-05-23T20:15:49.360Z · comments (0)

[link] Yoshua Bengio: How Rogue AIs may Arise
harfe · 2023-05-23T18:28:27.489Z · comments (12)

'Fundamental' vs 'applied' mechanistic interpretability research
Lee Sharkey (Lee_Sharkey) · 2023-05-23T18:26:18.174Z · comments (6)

Coercion is an adaptation to scarcity; trust is an adaptation to abundance
Richard_Ngo (ricraz) · 2023-05-23T18:14:19.117Z · comments (11)

[question] Is "brittle alignment" good enough?
the8thbit (the8thbit-1) · 2023-05-23T17:35:43.750Z · answers+comments (5)

Will Artificial Superintelligence Kill Us?
James_Miller · 2023-05-23T16:27:52.419Z · comments (2)

Phone Number Jingle
jefftk (jkaufman) · 2023-05-23T15:20:01.874Z · comments (12)

GPT4 is capable of writing decent long-form science fiction (with the right prompts)
RomanS · 2023-05-23T13:41:10.105Z · comments (28)

[question] Do humans still provide value in correspondence chess?
Jonathan Paulson (jpaulson) · 2023-05-23T12:15:13.462Z · answers+comments (16)

[Linkpost] The AGI Show podcast
Soroush Pour (soroush-pour) · 2023-05-23T09:52:29.685Z · comments (0)

Data and "tokens" a 30 year old human "trains" on
Jose Miguel Cruz y Celis (jose-miguel-cruz-y-celis) · 2023-05-23T05:34:30.731Z · comments (15)

How I learned to stop worrying and love skill trees
junk heap homotopy (zrkrlc) · 2023-05-23T04:08:42.022Z · comments (2)

T-Shirt Size Distribution
jefftk (jkaufman) · 2023-05-23T02:40:01.529Z · comments (0)

AI self-improvement is possible
bhauth · 2023-05-23T02:32:08.472Z · comments (3)

Worrying less about acausal extortion
Raemon · 2023-05-23T02:08:18.900Z · comments (11)

Self-leadership and self-love dissolve anger and trauma
Richard_Ngo (ricraz) · 2023-05-22T22:30:06.650Z · comments (7)

A Manifold market notice: Binance
Scrooge Mcduck · 2023-05-22T22:24:07.076Z · comments (13)

[link] I don't want to talk about AI
KirstenH · 2023-05-22T21:23:28.841Z · comments (11)

Activation additions in a small residual network
Garrett Baker (D0TheMath) · 2023-05-22T20:28:41.264Z · comments (4)

[Linkpost] "Governance of superintelligence" by OpenAI
Daniel_Eth · 2023-05-22T20:15:25.327Z · comments (20)

Two Pieces of Advice About How to Remember Things
omnizoid · 2023-05-22T18:10:45.362Z · comments (3)

Why I Believe LLMs Do Not Have Human-like Emotions
OneManyNone (OMN) · 2023-05-22T15:46:13.328Z · comments (6)

AI Safety in China: Part 2
Lao Mein (derpherpize) · 2023-05-22T14:50:54.482Z · comments (28)

[link] Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI
Maris Sala (maris-sala) · 2023-05-22T14:31:59.139Z · comments (5)

Papers, Please #1: Various Papers on Employment, Wages and Productivity
Zvi · 2023-05-22T12:00:10.934Z · comments (2)

In Defense of «The Army of Jakoths»
MikkW (mikkel-wilson) · 2023-05-22T11:59:47.105Z · comments (10)

Speed of information input is a bottleneck for rationality
MikkW (mikkel-wilson) · 2023-05-22T10:24:37.673Z · comments (0)

Distillation of Neurotech and Alignment Workshop January 2023
lisathiergart · 2023-05-22T07:17:23.676Z · comments (9)

[link] The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)
Daniel Kokotajlo (daniel-kokotajlo) · 2023-05-22T05:49:28.145Z · comments (5)

The Stanley Parable: Making philosophy fun
Nathan1123 · 2023-05-22T02:15:53.569Z · comments (3)

Sea Monsters
Adam Zerner (adamzerner) · 2023-05-22T00:58:52.353Z · comments (11)

The Army of Jakoths (a parable)
MikkW (mikkel-wilson) · 2023-05-21T22:48:37.287Z · comments (0)

A&I (Rihanna 'S&M' parody lyrics)
nahoj · 2023-05-21T22:34:29.930Z · comments (0)

[link] Four Battlegrounds: Power in the Age of Artificial Intelligence (Book review)
PeterMcCluskey · 2023-05-21T21:19:17.228Z · comments (0)

Gender Vectors in ROME’s Latent Space
Xodarap · 2023-05-21T18:46:54.161Z · comments (2)

Weight by Impact
Vaniver · 2023-05-21T14:37:58.187Z · comments (1)

[link] [outdated] My current theory of change to mitigate existential risk by misaligned ASI
mesaoptimizer · 2023-05-21T13:46:06.570Z · comments (8)

Babble on growing trust
qbolec · 2023-05-21T13:19:50.605Z · comments (1)

Elevator Positioning
jefftk (jkaufman) · 2023-05-21T11:30:02.056Z · comments (1)

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks
RogerDearnaley (roger-d-1) · 2023-05-21T08:29:09.896Z · comments (1)

Jeff Clune advertising a postdoc on twitter...and asking where he should target his posts
Joyee Chen (joyee-chen-1) · 2023-05-21T01:02:16.779Z · comments (0)

Running Sound for Yourself
jefftk (jkaufman) · 2023-05-20T22:10:00.902Z · comments (0)

Job Opening: SWE to help build signature vetting system for AI-related petitions
Ethan Ashkie (ethan-ashkie-1) · 2023-05-20T19:02:00.875Z · comments (0)

My Kind of Pragmatism
Nora Belrose (nora-belrose) · 2023-05-20T18:58:48.574Z · comments (11)

[link] Colors Appear To Have Almost-Universal Symbolic Associations
Thoth Hermes (thoth-hermes) · 2023-05-20T18:40:25.989Z · comments (4)

Twiblings, four-parent babies and other reproductive technology
GeneSmith · 2023-05-20T17:11:23.726Z · comments (32)


Recent comments

deluks917 on Stephen Fowler's Shortform

A serious effective altruism movement would clean house. Everyone who pushed the 'work with AI capabilities company' line should retire or be forced to retire. There is no need to blame anyone for mistakes; the decision makers had reasons. But they chose wrong and should not continue to be leaders.

weightt-an on LLMs could be as conscious as human emulations, potentially

Each of the transformation steps described in the post reduces my expectation that the result would be conscious somewhat.

Well, it's like saying the {human in a car as a single system} is or is not conscious. Firstly, it's a weird question, because of course it is. And that holds even if you chain the human to the wheel in such a way that they will never disjoin from the car.

What I did was constrain the possible actions of the human emulation. Not severely: the human can still say whatever, just with a constant compute budget, time, or number of iterative computation steps. Kind of like you can constrain the actions of a meaty human by putting them in a jail or something. (... or in a time loop / repeated complete memory wipes)

No, I don't think it would be "what the fuck" surprising if an emulation of a human brain was not conscious.

How would you expect this to possibly cash out? Suppose there are human emulations running around doing all things exactly like meaty humans. How exactly do you expect the announcement of a high scientific council to go: "We discovered that EMs are not conscious* because .... and that's important because of ..."? Is that completely out of model for you? Or, like, can you give me an (even goofy) scenario out of that possibility?

Or do you think high-resolution simulations will fail to replicate the capabilities of humans, or their outlook? I.e., special sauce/quantum fuckery/literal magic?

eggsyntax on Language Models Model Us

Absolutely! @Jozdien recounting those anecdotes was one of the sparks for this research, as was janus showing in the comments that the base model could confidently identify gwern. (I see I've inexplicably failed to thank Arun at the end of my post, need to fix that).

Interestingly, I was able to easily reproduce the gwern identification using the public model, so it seems clear that these capabilities are not entirely RLHFed away, although they may be somewhat impaired.

eggsyntax on Language Models Model Us

Oh thanks, I'd missed that somehow & thought that only the temp mattered for that.

eggsyntax on Language Models Model Us

Thanks!

eggsyntax on Language Models Model Us

That used to work, but as of March you can only get the pre-logit_bias logprobs back. They didn't announce the change, but it's discussed in the OpenAI forums, e.g. here. I noticed the change when all my code suddenly broke; you can still see remnants of that approach in the code.

akash-wasil on DeepMind's "Frontier Safety Framework" is weak and unambitious

@Zach Stein-Perlman I appreciate your recent willingness to evaluate and criticize safety plans from labs. I think this is likely a public good that is underprovided, given the strong incentives that many people have to maintain a good standing with labs (not to mention more explicit forms of pressure applied by OpenAI and presumably other labs).

One thought: I feel like the difference between how you described the Anthropic RSP and how you described the OpenAI PF feels stronger than the actual quality differences between the documents. I agree with you that the thresholds in the OpenAI PF are too high, but I think the PF should get "points" for spelling out risks that go beyond ASL-3/misuse.

OpenAI has commitments that are insufficiently cautious for ASL-4+ (or what they would call high/critical on model autonomy), but Anthropic circumvents this problem by simply refusing to make any commitments around ASL-4 (for now).

You note this limitation when describing Anthropic's RSP, but you describe it as "promising" while describing the PF as "unpromising." In my view, this might be unfairly rewarding Anthropic for just not engaging with the hardest parts of the problem (or unfairly penalizing OpenAI for giving their best guess answers RE how to deal with the hardest parts of the problem).

We might also just disagree on how firm or useful the commitments in each document are: I walked away with a much better understanding of how OpenAI plans to evaluate & handle risks than how Anthropic plans to handle & evaluate risks. I do think OpenAI's thresholds are too high, but it's likely that I'll feel the same way about Anthropic's thresholds. In particular, I don't expect either (any?) lab to be able to resist the temptation to internally deploy models with autonomous persuasion capabilities or autonomous AI R&D capabilities (partially because the competitive pressures and race dynamics pushing them to do so will be intense). I don't see evidence that either lab is taking these kinds of risks particularly seriously, has ideas about what safeguards would be considered sufficient, or is seriously entertaining the idea that we might need to do a lot (>1 year) of dedicated safety work (that potentially involves coming up with major theoretical insights, as opposed to a "we will just solve it with empiricism" perspective) before we are confident that we can control such systems.

TLDR: I would remove the word "promising" and maybe characterize the RSP more like "an initial RSP that mostly spells out high-level reasoning, makes few hard commitments, and focuses on misuse while missing the all-important evals and safety practices for ASL-4."

keltan on Why you should learn a musical instrument

I haven’t, but I’ll take a look. I appreciate the recommendation!

8e9 on AISafety.com – Resources for AI Safety

Some thoughts about the intro video: it's excellent but ends a little abruptly; it would be good to explain how AISafety.com plans to help. Also, I quite dislike flashing images (around the 45-second mark).

algon on Why you should learn a musical instrument

Since you seem interested in nootropics, I wonder if you've read Gwern's list of nootropic self-experiments? He covers a lot of supplements, some of which are pretty obscure AFAICT.

EDIT: https://gwern.net/nootropic/nootropics