LessWrong 2.0 Reader

← previous page (newer posts) · next page (older posts) →

Cyborg Periods: There will be multiple AI transitions
Jan_Kulveit · 2023-02-22T16:09:04.858Z · comments (9)
The Open Agency Model
Eric Drexler · 2023-02-22T10:35:12.316Z · comments (18)
Intervening in the Residual Stream
MadHatter · 2023-02-22T06:29:37.973Z · comments (1)
What do language models know about fictional characters?
skybrian · 2023-02-22T05:58:43.130Z · comments (0)
Power-Seeking = Minimising free energy
Jonas Hallgren · 2023-02-22T04:28:44.075Z · comments (10)
[link] The shallow reality of 'deep learning theory'
Jesse Hoogland (jhoogland) · 2023-02-22T04:16:11.216Z · comments (11)
Candyland is Terrible
jefftk (jkaufman) · 2023-02-22T01:50:03.375Z · comments (2)
A proof of inner Löb's theorem
James Payor (JamesPayor) · 2023-02-21T21:11:41.183Z · comments (0)
Fighting For Our Lives - What Ordinary People Can Do
TinkerBird · 2023-02-21T20:36:32.579Z · comments (18)
The Emotional Type of a Decision
moridinamael · 2023-02-21T20:35:17.276Z · comments (0)
What is it like doing AI safety work?
KatWoods (ea247) · 2023-02-21T20:12:01.977Z · comments (2)
Pretraining Language Models with Human Preferences
Tomek Korbak (tomek-korbak) · 2023-02-21T17:57:09.774Z · comments (18)
A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)
Joe Carlsmith (joekc) · 2023-02-21T17:26:12.981Z · comments (15)
EIS X: Continual Learning, Modularity, Compression, and Biological Brains
scasper · 2023-02-21T16:59:42.438Z · comments (4)
No Room for Political Philosophy
Arturo Macias (arturo-macias) · 2023-02-21T16:11:38.010Z · comments (7)
Deceptive Alignment is <1% Likely by Default
DavidW (david-wheaton) · 2023-02-21T15:09:27.920Z · comments (26)
AI #1: Sydney and Bing
Zvi · 2023-02-21T14:00:00.480Z · comments (44)
You're not a simulation, 'cause you're hallucinating
Stuart_Armstrong · 2023-02-21T12:12:21.889Z · comments (6)
Basic facts about language models during training
beren · 2023-02-21T11:46:12.256Z · comments (14)
[link] [Preprint] Pretraining Language Models with Human Preferences
Giulio (thesofakillers) · 2023-02-21T11:44:27.423Z · comments (0)
Breaking the Optimizer’s Curse, and Consequences for Existential Risks and Value Learning
Roger Dearnaley · 2023-02-21T09:05:43.010Z · comments (1)
[link] Medlife Crisis: "Why Do People Keep Falling For Things That Don't Work?"
RomanHauksson (r) · 2023-02-21T06:22:23.608Z · comments (5)
A foundation model approach to value inference
sen · 2023-02-21T05:09:29.658Z · comments (0)
Instrumentality makes agents agenty
porby · 2023-02-21T04:28:57.190Z · comments (4)
Gamified narrow reverse imitation learning
TekhneMakre · 2023-02-21T04:26:45.792Z · comments (0)
Feelings are Good, Actually
Gordon Seidoh Worley (gworley) · 2023-02-21T02:38:11.793Z · comments (1)
AI alignment researchers don't (seem to) stack
So8res · 2023-02-21T00:48:25.186Z · comments (40)
EA & LW Forum Weekly Summary (6th - 19th Feb 2023)
Zoe Williams (GreyArea) · 2023-02-21T00:26:33.146Z · comments (0)
What to think when a language model tells you it's sentient
Robbo · 2023-02-21T00:01:54.585Z · comments (6)
On second thought, prompt injections are probably examples of misalignment
lc · 2023-02-20T23:56:33.571Z · comments (5)
Nothing Is Ever Taught Correctly
LVSN · 2023-02-20T22:31:50.917Z · comments (3)
Behavioral and mechanistic definitions (often confuse AI alignment discussions)
LawrenceC (LawChan) · 2023-02-20T21:33:01.499Z · comments (5)
Validator models: A simple approach to detecting goodharting
beren · 2023-02-20T21:32:25.957Z · comments (1)
There are no coherence theorems
Dan H (dan-hendrycks) · 2023-02-20T21:25:48.478Z · comments (115)
[question] Are there any AI safety relevant fully remote roles suitable for someone with 2-3 years of machine learning engineering industry experience?
Malleable_shape · 2023-02-20T19:57:12.955Z · answers+comments (2)
A circuit for Python docstrings in a 4-layer attention-only transformer
StefanHex (Stefan42) · 2023-02-20T19:35:14.027Z · comments (8)
Sydney the Bingenator Can't Think, But It Still Threatens People
Valentin Baltadzhiev (valentin-baltadzhiev) · 2023-02-20T18:37:44.500Z · comments (2)
EIS IX: Interpretability and Adversaries
scasper · 2023-02-20T18:25:43.641Z · comments (7)
What AI companies can do today to help with the most important century
HoldenKarnofsky · 2023-02-20T17:00:10.531Z · comments (3)
[link] Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky
bayesed · 2023-02-20T16:42:07.413Z · comments (54)
[link] Speculative Technologies launch and Ben Reinhardt AMA
jasoncrawford · 2023-02-20T16:33:56.964Z · comments (0)
[link] [MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming
Dan H (dan-hendrycks) · 2023-02-20T15:54:13.791Z · comments (0)
Bing finding ways to bypass Microsoft's filters without being asked. Is it reproducible?
Christopher King (christopher-king) · 2023-02-20T15:11:28.538Z · comments (15)
Metaculus Introduces New 'Conditional Pair' Forecast Questions for Making Conditional Predictions
ChristianWilliams · 2023-02-20T13:36:19.649Z · comments (0)
On Investigating Conspiracy Theories
Zvi · 2023-02-20T12:50:00.891Z · comments (38)
The Estimation Game: a monthly Fermi estimation web app
Sage Future (aaron-ho-1) · 2023-02-20T11:33:04.736Z · comments (2)
The idea that ChatGPT is simply “predicting” the next word is, at best, misleading
Bill Benzon (bill-benzon) · 2023-02-20T11:32:06.635Z · comments (87)
Russell Conjugations list & voting thread
Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-20T06:39:44.021Z · comments (62)
Emergent Deception and Emergent Optimization
jsteinhardt · 2023-02-20T02:40:09.912Z · comments (0)
AGI doesn't need understanding, intention, or consciousness in order to kill us, only intelligence
James Blaha (james-blaha) · 2023-02-20T00:55:34.329Z · comments (2)