LessWrong 2.0 Reader

Path dependence in ML inductive biases
Vivek Hebbar (Vivek) · 2022-09-10T01:38:22.885Z · comments (13)
Quintin's alignment papers roundup - week 2
Quintin Pope (quintin-pope) · 2022-09-19T13:41:27.104Z · comments (2)
Where I currently disagree with Ryan Greenblatt’s version of the ELK approach
So8res · 2022-09-29T21:18:44.402Z · comments (7)
Book review: “The Heart of the Brain: The Hypothalamus and Its Hormones”
Steven Byrnes (steve2152) · 2022-09-27T13:20:51.434Z · comments (3)
A game of mattering
KatjaGrace · 2022-09-23T02:30:15.714Z · comments (7)
LOVE in a simbox is all you need
jacob_cannell · 2022-09-28T18:25:31.283Z · comments (72)
[Closed] Prize and fast track to alignment research at ALTER
Vanessa Kosoy (vanessa-kosoy) · 2022-09-17T16:58:24.839Z · comments (6)
[link] Self-Control Secrets of the Puritan Masters
David Hugh-Jones (david-hugh-jones) · 2022-09-26T09:04:56.895Z · comments (3)
Private alignment research sharing and coordination
porby · 2022-09-04T00:01:22.337Z · comments (13)
Gradient Hacker Design Principles From Biology
johnswentworth · 2022-09-01T19:03:16.836Z · comments (13)
[link] Argument against 20% GDP growth from AI within 10 years [Linkpost]
aogara (Aidan O'Gara) · 2022-09-12T04:08:03.901Z · comments (21)
Clarifying the Agent-Like Structure Problem
johnswentworth · 2022-09-29T21:28:08.813Z · comments (15)
Fake qualities of mind
Kaj_Sotala · 2022-09-22T16:40:05.085Z · comments (2)
[link] Review of Examine.com’s vitamin write-ups
Elizabeth (pktechgirl) · 2022-09-26T23:40:06.344Z · comments (1)
QAPR 3: interpretability-guided training of neural nets
Quintin Pope (quintin-pope) · 2022-09-28T16:02:10.732Z · comments (2)
Replacement for PONR concept
Daniel Kokotajlo (daniel-kokotajlo) · 2022-09-02T00:09:45.698Z · comments (6)
Two reasons we might be closer to solving alignment than it seems
KatWoods (ea247) · 2022-09-24T20:00:08.442Z · comments (9)
Levelling Up in AI Safety Research Engineering
Gabe M (gabe-mukobi) · 2022-09-02T04:59:42.699Z · comments (9)
Why deceptive alignment matters for AGI safety
Marius Hobbhahn (marius-hobbhahn) · 2022-09-15T13:38:53.219Z · comments (13)
Infra-Exercises, Part 1
Diffractor · 2022-09-01T05:06:59.373Z · comments (10)
Deep Q-Networks Explained
Jay Bailey · 2022-09-13T12:01:08.033Z · comments (6)
[link] Why was progress so slow in the past?
jasoncrawford · 2022-09-01T20:26:06.163Z · comments (31)
Methodological Therapy: An Agenda For Tackling Research Bottlenecks
adamShimi · 2022-09-22T18:41:03.346Z · comments (6)
We may be able to see sharp left turns coming
Ethan Perez (ethan-perez) · 2022-09-03T02:55:45.168Z · comments (29)
When would AGIs engage in conflict?
JesseClifton · 2022-09-14T19:38:22.478Z · comments (5)
Triangle Opportunity
Alex Beyman (alexbeyman) · 2022-09-26T20:42:30.393Z · comments (10)
[link] First we shape our social graph; then it shapes us
Henrik Karlsson (henrik-karlsson) · 2022-09-07T15:50:08.281Z · comments (6)
[link] ACT-1: Transformer for Actions
Daniel Kokotajlo (daniel-kokotajlo) · 2022-09-14T19:09:39.725Z · comments (4)
When does technical work to reduce AGI conflict make a difference?: Introduction
JesseClifton · 2022-09-14T19:38:00.760Z · comments (3)
Many therapy schools work with inner multiplicity (not just IFS)
David Althaus (wallowinmaya) · 2022-09-17T10:27:41.350Z · comments (15)
EA & LW Forums Weekly Summary (28 Aug - 3 Sep 22’)
Zoe Williams (GreyArea) · 2022-09-06T11:06:25.230Z · comments (2)
Some notes on solving hard problems
Joe Rocca (joseph-rocca) · 2022-09-19T12:58:45.306Z · comments (8)
My Thoughts on the ML Safety Course
zeshen · 2022-09-27T13:15:03.000Z · comments (3)
Coordinate-Free Interpretability Theory
johnswentworth · 2022-09-14T23:33:49.910Z · comments (16)
[link] Dan Luu on Futurist Predictions
RobertM (T3t) · 2022-09-14T03:01:27.275Z · comments (9)
Soft skills for meetups
mingyuan · 2022-09-27T17:26:12.406Z · comments (3)
A Library and Tutorial for Factored Cognition with Language Models
stuhlmueller · 2022-09-28T18:15:10.800Z · comments (0)
Prize idea: Transmit MIRI and Eliezer's worldviews
elifland · 2022-09-19T21:21:13.156Z · comments (18)
[link] ethics and anthropics of homomorphically encrypted computations
Tamsin Leake (carado-1) · 2022-09-09T10:49:08.316Z · comments (49)
Covid 9/29/22: The Jones Act Waver
Zvi · 2022-09-29T18:20:02.103Z · comments (10)
[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]
David Scott Krueger (formerly: capybaralet) (capybaralet) · 2022-09-08T22:28:54.534Z · comments (1)
[link] Scraping training data for your mind
Henrik Karlsson (henrik-karlsson) · 2022-09-21T16:27:48.499Z · comments (4)
Brief Notes on Transformers
Adam Jermyn (adam-jermyn) · 2022-09-26T14:46:23.637Z · comments (3)
Pretending not to Notice
jefftk (jkaufman) · 2022-09-19T02:30:05.079Z · comments (12)
[link] Estimating the Current and Future Number of AI Safety Researchers
Stephen McAleese (stephen-mcaleese) · 2022-09-28T21:11:33.703Z · comments (14)
AI Risk Intro 1: Advanced AI Might Be Very Bad
CallumMcDougall (TheMcDouglas) · 2022-09-11T10:57:12.093Z · comments (13)
AI Safety field-building projects I'd like to see
Akash (akash-wasil) · 2022-09-11T23:43:32.031Z · comments (7)
Samotsvety's AI risk forecasts
elifland · 2022-09-09T04:01:18.958Z · comments (0)
Searching for Modularity in Large Language Models
NickyP (Nicky) · 2022-09-08T02:25:31.711Z · comments (3)
[link] Summaries: Alignment Fundamentals Curriculum
Leon Lang (leon-lang) · 2022-09-18T13:08:05.335Z · comments (3)