LessWrong 2.0 Reader

[link] Against Diversification
Jack Malde (jackmalde) · 2022-12-22T13:29:38.765Z · comments (0)

[link] Notes on Meta's Diplomacy-Playing AI
Erich_Grunewald · 2022-12-22T11:34:27.384Z · comments (2)

Take 13: RLHF bad, conditioning good.
Charlie Steiner · 2022-12-22T10:44:06.359Z · comments (4)

Applied Linear Algebra Lecture Series
johnswentworth · 2022-12-22T06:57:26.643Z · comments (8)

Naive Set Theory, Halmos
David Udell · 2022-12-22T02:34:38.509Z · comments (1)

Not Getting Hacked
jefftk (jkaufman) · 2022-12-21T21:40:05.254Z · comments (14)

[link] Metaphor.systems
the gears to ascension (lahwran) · 2022-12-21T21:31:17.373Z · comments (9)

[question] How much is DQC (Dynamic Quantum Clustering) currently looked into in AI Capabilities Research?
macmillan · 2022-12-21T20:46:55.448Z · answers+comments (0)

[link] Think wider about the root causes of progress
jasoncrawford · 2022-12-21T20:05:46.986Z · comments (11)

[question] What readings did you consider best for the happy parts of the secular solstice?
ChristianKl · 2022-12-21T15:45:44.583Z · answers+comments (0)

Recreating logic in type theory
Thomas Kehrenberg (thomas-kehrenberg) · 2022-12-21T15:19:18.275Z · comments (0)

You become the UI you use
Viliam · 2022-12-21T15:04:17.072Z · comments (7)

Price's equation for neural networks
tailcalled · 2022-12-21T13:09:16.527Z · comments (4)

Decisions: Ontologically Shifting to Determinism
Chris_Leong · 2022-12-21T12:41:30.884Z · comments (11)

[link] A Comprehensive Mechanistic Interpretability Explainer & Glossary
Neel Nanda (neel-nanda-1) · 2022-12-21T12:35:08.589Z · comments (6)

[link] Google Search loses to ChatGPT fair and square
shminux · 2022-12-21T08:11:43.287Z · comments (17)

Sazen
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2022-12-21T07:54:51.415Z · comments (83)

[link] Podcast: What's Wrong With LessWrong
Alfred · 2022-12-21T07:06:08.728Z · comments (11)

[link] New AI risk intro from Vox [link post]
JakubK (jskatt) · 2022-12-21T06:00:06.031Z · comments (1)

Local Memes Against Geometric Rationality
Scott Garrabrant · 2022-12-21T03:53:28.196Z · comments (3)

Logging Shell History in Zsh
jefftk (jkaufman) · 2022-12-21T03:30:03.180Z · comments (2)

CIRL Corrigibility is Fragile
Rachel Freedman (rachelAF) · 2022-12-21T01:40:50.232Z · comments (9)

[question] [DISC] Are Values Robust?
DragonGod · 2022-12-21T01:00:29.939Z · answers+comments (9)

Performing an SVD on a time-series matrix of gradient updates on an MNIST network produces 92.5 singular values
Garrett Baker (D0TheMath) · 2022-12-21T00:44:55.373Z · comments (10)

[link] Progress links and tweets, 2022-12-20
jasoncrawford · 2022-12-21T00:35:59.686Z · comments (0)

K-complexity is silly; use cross-entropy instead
So8res · 2022-12-20T23:06:27.131Z · comments (53)

Podcast: Tamera Lanham on AI risk, threat models, alignment proposals, externalized reasoning oversight, and working at Anthropic
Akash (akash-wasil) · 2022-12-20T21:39:41.866Z · comments (2)

[link] Discovering Language Model Behaviors with Model-Written Evaluations
evhub · 2022-12-20T20:08:12.063Z · comments (34)

[link] Reflections: Bureaucratic Hell
Haris Rashid (haris-rashid) · 2022-12-20T19:22:13.606Z · comments (1)

[link] Proliferating Education
Haris Rashid (haris-rashid) · 2022-12-20T19:22:13.492Z · comments (2)

AGI is here, but nobody wants it. Why should we even care?
MGow · 2022-12-20T19:14:25.696Z · comments (0)

Properties of current AIs and some predictions of the evolution of AI from the perspective of scale-free theories of agency and regulative development
Roman Leventov · 2022-12-20T17:13:00.669Z · comments (3)

I believe some AI doomers are overconfident
[deleted] · 2022-12-20T17:09:23.325Z · comments (15)

Note on algorithms with multiple trained components
Steven Byrnes (steve2152) · 2022-12-20T17:08:24.057Z · comments (4)

Marvel Snap: Phase 2
Zvi · 2022-12-20T14:50:00.460Z · comments (1)

(Extremely) Naive Gradient Hacking Doesn't Work
ojorgensen · 2022-12-20T14:35:33.591Z · comments (0)

An Open Agency Architecture for Safe Transformative AI
davidad · 2022-12-20T13:04:06.409Z · comments (22)

[link] Under-Appreciated Ways to Use Flashcards - Part I
Florence Hinder (florence-hinder) · 2022-12-20T12:43:31.387Z · comments (5)

EA & LW Forums Weekly Summary (12th Dec - 18th Dec 22')
Zoe Williams (GreyArea) · 2022-12-20T09:49:51.463Z · comments (0)

[link] [link, 2019] AI paradigm: interactive learning from unlabeled instructions
the gears to ascension (lahwran) · 2022-12-20T06:45:30.035Z · comments (0)

[Fiction] Unspoken Stone
Gordon Seidoh Worley (gworley) · 2022-12-20T05:11:23.231Z · comments (0)

Notice when you stop reading right before you understand
just_browsing · 2022-12-20T05:09:43.224Z · comments (6)

Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Charlie Steiner · 2022-12-20T05:01:50.659Z · comments (1)

More notes from raising a late-talking kid
Steven Byrnes (steve2152) · 2022-12-20T02:13:01.018Z · comments (2)

The "Minimal Latents" Approach to Natural Abstractions
johnswentworth · 2022-12-20T01:22:25.101Z · comments (24)

[link] our deepest wishes
Tamsin Leake (carado-1) · 2022-12-20T00:23:32.892Z · comments (0)

Shard Theory in Nine Theses: a Distillation and Critical Appraisal
LawrenceC (LawChan) · 2022-12-19T22:52:20.031Z · comments (30)

[question] Will research in AI risk jinx it? Consequences of training AI on AI risk arguments
Yann Dubois (yann-dubois) · 2022-12-19T22:42:30.959Z · answers+comments (6)

AGI Timelines in Governance: Different Strategies for Different Timeframes
simeon_c (WayZ) · 2022-12-19T21:31:25.746Z · comments (28)

Towards Hodge-podge Alignment
Cleo Nardo (strawberry calm) · 2022-12-19T20:12:14.540Z · comments (30)

Recent comments

zach-stein-perlman on Akash's Shortform

Sorry for brevity, I'm busy right now.

Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as "evaluation" than "criticism."
It's often important for evaluation to be quite truth-tracking. Criticism isn't obviously good by default.

akash-wasil on Akash's Shortform

My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.

Some quick thoughts:

Soft power– I think people underestimate how strong the "soft power" of labs is, particularly in the Bay Area.
Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
- People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
- So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil.
Status games/reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
Subjectivity of "good judgment"– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?), but in practice the standard is often so subjective that people become quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”.

With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs).

Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman, and of course Jan Leike and Daniel K.

akash-wasil on DeepMind's "Frontier Safety Framework" is weak and unambitious

I personally have a large amount of uncertainty around how useful prosaic techniques & control techniques will be. Here are a few statements I'm more confident in:

Ideally, AGI development would have much more oversight than we see in the status quo. Whether or not development or deployment activities keep national security risks below acceptable levels should be a question that governments are involved in answering. A sensible oversight regime would require evidence of positive safety or "affirmative safety".
My biggest concern with the prosaic/control metastrategy is that I think race dynamics substantially decrease its usefulness. Even if ASL-4 systems are deployed internally in a safe way, we're still not out of the acute risk period. And even if the leading lab (Lab A) is trustworthy/cautious, it will be worried that incautious Lab B is about to get to ASL-4 in 1-3 months. This will cause the leading lab to underinvest into control, feel like it doesn't have much time to figure out how to use its ASL-4 system (assuming it can be controlled), and feel like it needs to get to ASL-5+ rather quickly.

It's still plausible to me that perhaps this period of a few months is enough to pull off actions that get us out of the acute risk period (e.g., use the ASL-4 system to generate evidence that controlling more powerful systems would require years of dedicated effort and have Lab A devote all of their energy toward getting governments to intervene).

Given my understanding of the current leading labs, it's more likely to me that they'll underestimate the difficulties of bootstrapped alignment and assume that things are OK as long as empirical tests don't show imminent evidence of danger. I don't think this prior is reasonable in the context of developing existentially dangerous technologies, particularly technologies that are intended to be smarter than you. I think sensible risk management in such contexts should require a stronger theoretical/conceptual understanding of the systems one is designing.

(My guess is that you agree with some of these points and I agree with some points along the lines of "maybe prosaic/control techniques will just work, we aren't 100% sure they're not going to work", but we're mostly operating in different frames.)

(I also do like/respect a lot of the work you and Buck have done on control. I'm a bit worried that the control meme is overhyped, partially because it fits into the current interests of labs. Like, control seems like a great idea and a useful conceptual frame, but I haven't yet seen a solid case for why we should expect specific control techniques to work once we get to ASL-4 or ASL-4.5 systems, as well as what we plan to do with those systems to get us out of the acute risk period. Like, the early work on using GPT-3 to evaluate GPT-4 was interesting, but it feels like the assumption about the human red-teamers being better at attacking than GPT-4 will go away– or at least be much less robust– once we get to ASL-4. But I'm also sympathetic to the idea that we're at the early stages of control work, and I am genuinely interested in seeing what you, Buck, and others come up with as the control agenda progresses.)

mesaoptimizer on Tamsin Leake's Shortform

I still parse that move as devastating the commons in order to make a quick buck.

I believe that ChatGPT was not released with the expectation that it would become as popular as it did. OpenAI pivoted hard when it saw the results.

Also, I think you are misinterpreting the sort of 'updates' people are making here.

nathan-helm-burger on Scientific Notation Options

Yeah, agreed. Also, using just an e makes it much easier to type on a phone keyboard.

There are also other variants, like ee and EE. And also sometimes you see a variant which uses only multiples of three as the exponent. I think it's called engineering notation instead of scientific notation? So like 1e3, 50e3, 700e6, 2e9. I also like this version less.
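To make the "multiples of three" variant concrete, here is a minimal Python sketch. The function name `to_engineering` is made up for illustration and is not from any library. It assumes the input is a finite float and prints the mantissa with at most six significant digits.

```python
def to_engineering(x: float) -> str:
    """Format x with a decimal exponent that is a multiple of three (e.g. 50e3).

    Hypothetical helper for illustration; assumes x is a finite number.
    """
    if x == 0:
        return "0e0"
    # Ordinary scientific form looks like "5.000000e+04"; read the exponent off it.
    exp = int(f"{x:e}".split("e")[1])
    # Round the exponent down to the nearest multiple of three.
    eng_exp = 3 * (exp // 3)
    # Rescale the mantissa so the value is unchanged.
    mantissa = x / 10.0 ** eng_exp
    # :g trims trailing zeros and keeps at most six significant digits.
    return f"{mantissa:g}e{eng_exp}"


for value in (1000, 50000, 700e6, 2e9):
    print(to_engineering(value))
```

The mantissa then always lands between 1 and 999, which is what makes the exponents line up with the thousand, million, billion prefixes.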

lesswronguser123 on Advice for Activists from the History of Environmentalism

instead semi-sensible policies would get considered somewhere in the bureaucracy of the states?

Whilst having radical groups is normally useful for shifting the Overton window or abusing anchoring effects, in this case study of environmentalism I think it backfired, from what I can understand, given the polling data showing that the public in the sample country already cared about the environment.

ryan_greenblatt on Stephen Fowler's Shortform

I don't see how this is relevant to my comment.

By "positive EV bets" I meant positive EV with respect to shared values, not with respect to personal gain.

ETA: Maybe your view is that leaders should take these bets anyway even though they know they are likely to result in a forced retirement. (E.g. ignoring the disincentive.) I was actually thinking of the disincentive effect as: you are actually a good leader, so you remaining in power would be good, therefore you should avoid actions that result in you losing power for unjustified reasons. Therefore you should avoid making positive EV bets (as making these bets is now overall negative EV as it will result in a forced leadership transition which is bad). More minimally, you strongly select for leaders who don't make such bets.

nnotm on "If we go extinct due to misaligned AI, at least nature will continue, right? ... right?"

Bugs could potentially result in a new sentient species many millions of years down the line. With super-AI that happens to be non-sentient, there is no such hope.

ryan_greenblatt on Stephen Fowler's Shortform

Do you think that whenever anyone makes a decision that ends up being bad ex-post they should be forced to retire?

Doesn't this strongly disincentivize making positive EV bets which are likely to fail?

review-bot on DeepMind's "Frontier Safety Framework" is weak and unambitious

The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2025. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?