LessWrong 2.0 Reader


← previous page (newer posts) · next page (older posts) →

[question] Are limited-horizon agents a good heuristic for the off-switch problem?
[deleted] · 2021-12-05T19:27:59.894Z · answers+comments (19)
ML Alignment Theory Program under Evan Hubinger
ozhang (oliver-zhang) · 2021-12-06T00:03:15.443Z · comments (3)
Anti-correlated causation
DirectedEvolution (AllAmericanBreakfast) · 2021-12-06T04:36:17.439Z · comments (2)
A Framework to Explain Bayesian Models
Jsevillamol · 2021-12-06T10:38:25.815Z · comments (1)
Modeling Failure Modes of High-Level Machine Intelligence
Ben Cottier (ben-cottier) · 2021-12-06T13:54:38.147Z · comments (1)
Life, struggle, and the psychological fallout from COVID
Alex Flint (alexflint) · 2021-12-06T16:59:39.611Z · comments (1)
Omicron Post #4
Zvi · 2021-12-06T17:00:01.470Z · comments (66)
Information bottleneck for counterfactual corrigibility
tailcalled · 2021-12-06T17:11:12.984Z · comments (1)
A Possible Resolution To Spurious Counterfactuals
JoshuaOSHickman · 2021-12-06T18:26:41.409Z · comments (5)
[link] Implications of the Grabby Aliens Model
harsimony · 2021-12-06T18:34:44.985Z · comments (3)
Are there alternatives to solving value transfer and extrapolation?
Stuart_Armstrong · 2021-12-06T18:53:52.659Z · comments (8)
More Christiano, Cotra, and Yudkowsky on AI progress
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-12-06T20:33:12.164Z · comments (28)
Declustering, reclustering, and filling in thingspace
Stuart_Armstrong · 2021-12-06T20:53:14.559Z · comments (6)
Leaving Orbit
Rob Bensinger (RobbBB) · 2021-12-06T21:48:41.371Z · comments (17)
Dear Self; We Need To Talk About Social Media
Elizabeth (pktechgirl) · 2021-12-07T00:40:01.949Z · comments (19)
Ordering yourself around with an app
bfinn · 2021-12-07T00:49:11.546Z · comments (2)
Retail Investor Advantages
leogao · 2021-12-07T02:08:20.694Z · comments (13)
Considerations on interaction between AI and expected value of the future
Beth Barnes (beth-barnes) · 2021-12-07T02:46:19.215Z · comments (28)
Interviews on Improving the AI Safety Pipeline
Chris_Leong · 2021-12-07T12:03:04.420Z · comments (15)
Exterminating humans might be on the to-do list of a Friendly AI
RomanS · 2021-12-07T14:15:07.206Z · comments (8)
Counting Lightning
Jsevillamol · 2021-12-07T14:50:55.680Z · comments (8)
Randomness in Science
rogersbacon · 2021-12-07T18:17:51.232Z · comments (15)
HIRING: Inform and shape a new project on AI safety at Partnership on AI
madhu_lika · 2021-12-07T19:37:31.220Z · comments (0)
Let's buy out Cyc, for use in AGI interpretability systems?
Steven Byrnes (steve2152) · 2021-12-07T20:46:10.303Z · comments (10)
Theoretical Neuroscience For Alignment Theory
Cameron Berg (cameron-berg) · 2021-12-07T21:50:10.142Z · comments (18)
Some thoughts on why adversarial training might be useful
Beth Barnes (beth-barnes) · 2021-12-08T01:28:22.974Z · comments (6)
What makes for a good "argument"? (Request for thoughts and comments)
Simon DeDeo (simon-dedeo) · 2021-12-08T02:16:10.805Z · comments (3)
Interpreting the Biobot Spike
jefftk (jkaufman) · 2021-12-08T16:30:07.924Z · comments (1)
[link] Deepmind's Gopher--more powerful than GPT-3
hath · 2021-12-08T17:06:32.650Z · comments (26)
The Last Questions (part 1)
rogersbacon · 2021-12-08T18:09:53.760Z · comments (0)
[AN #170]: Analyzing the argument for risk from power-seeking AI
Rohin Shah (rohinmshah) · 2021-12-08T18:10:04.022Z · comments (1)
Finding the multiple ground truths of CoinRun and image classification
Stuart_Armstrong · 2021-12-08T18:13:01.576Z · comments (4)
Seeing the Invisible (And How to Think About Machine Learning)
Filip Dousek (fidnie) · 2021-12-08T21:04:49.828Z · comments (0)
COVID and the holidays
Connor_Flexman · 2021-12-08T23:13:56.097Z · comments (31)
Introduction to inaccessible information
Ryan Kidd (ryankidd44) · 2021-12-09T01:28:48.154Z · comments (6)
[link] Cat Couplings
mike_hawke · 2021-12-09T01:41:11.646Z · comments (1)
Stop arbitrarily limiting yourself
unoptimal · 2021-12-09T02:42:34.466Z · comments (7)
Austin Winter Solstice
SilasBarta · 2021-12-09T05:01:17.511Z · comments (1)
Supervised learning and self-modeling: What's "superhuman?"
Charlie Steiner · 2021-12-09T12:44:14.004Z · comments (1)
[MLSN #2]: Adversarial Training
Dan H (dan-hendrycks) · 2021-12-09T17:16:49.684Z · comments (0)
[link] The end of Victorian culture, part I: structural forces
David Hugh-Jones (david-hugh-jones) · 2021-12-09T19:25:23.222Z · comments (0)
[question] What alignment-related concepts should be better known in the broader ML community?
Lauro Langosco · 2021-12-09T20:44:09.228Z · answers+comments (4)
LessWrong discussed in New Ideas in Psychology article
rogersbacon · 2021-12-09T21:01:17.920Z · comments (11)
Omicron Post #5
Zvi · 2021-12-09T21:10:00.469Z · comments (18)
Conversation on technology forecasting and gradualism
Richard_Ngo (ricraz) · 2021-12-09T21:23:21.187Z · comments (30)
Covid 12/9: Counting Down the Days
Zvi · 2021-12-09T21:40:01.105Z · comments (12)
Combining Forecasts
jsteinhardt · 2021-12-10T02:10:14.402Z · comments (1)
Are big brains for processing sensory input?
lsusr · 2021-12-10T07:08:31.495Z · comments (20)
The Promise and Peril of Finite Sets
davidad · 2021-12-10T12:29:56.535Z · comments (4)
There is essentially one best-validated theory of cognition.
abramdemski · 2021-12-10T15:51:06.423Z · comments (33)