LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Contra Hofstadter on GPT-3 Nonsense
rictic · 2022-06-15T21:53:30.646Z · comments (24)

Thoughts on the impact of RLHF research
paulfchristiano · 2023-01-25T17:23:16.402Z · comments (101)

Announcing Balsa Research
Zvi · 2022-09-25T22:50:00.626Z · comments (64)

The shard theory of human values
Quintin Pope (quintin-pope) · 2022-09-04T04:28:11.752Z · comments (66)

An Observation of Vavilov Day
Elizabeth (pktechgirl) · 2022-01-03T21:10:02.107Z · comments (42)

The Feeling of Idea Scarcity
johnswentworth · 2022-12-31T17:34:04.306Z · comments (22)

Deep Deceptiveness
So8res · 2023-03-21T02:51:52.794Z · comments (58)

[link] More information about the dangerous capability evaluations we did with GPT-4 and Claude.
Beth Barnes (beth-barnes) · 2023-03-19T00:25:39.707Z · comments (54)

Editing Advice for LessWrong Users
JustisMills · 2022-04-11T16:32:17.530Z · comments (14)

You Don't Exist, Duncan
[DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2023-02-02T08:37:01.049Z · comments (107)

UFO Betting: Put Up or Shut Up
RatsWrongAboutUAP · 2023-06-13T04:05:32.652Z · comments (207)

My Clients, The Liars
ymeskhout · 2024-03-05T21:06:36.669Z · comments (85)

Policy discussions follow strong contextualizing norms
Richard_Ngo (ricraz) · 2023-04-01T23:51:36.588Z · comments (61)

Introduction to abstract entropy
Alex_Altair · 2022-10-20T21:03:02.486Z · comments (78)

Self-driving car bets
paulfchristiano · 2023-07-29T18:10:01.112Z · comments (41)

Lessons On How To Get Things Right On The First Try
johnswentworth · 2023-06-19T23:58:09.605Z · comments (56)

[link] Sum-threshold attacks
TsviBT · 2023-09-08T17:13:37.044Z · comments (52)

(briefly) RaDVaC and SMTM, two things we should be doing
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-01-12T06:20:35.555Z · comments (79)

[link] AGI in sight: our look at the game board
Andrea_Miotti (AndreaM) · 2023-02-18T22:17:44.364Z · comments (135)

AGI Safety FAQ / all-dumb-questions-allowed thread
Aryeh Englander (alenglander) · 2022-06-07T05:47:13.350Z · comments (526)

[link] ARC's first technical report: Eliciting Latent Knowledge
paulfchristiano · 2021-12-14T20:09:50.209Z · comments (90)

Replacing Karma with Good Heart Tokens (Worth $1!)
Ben Pace (Benito) · 2022-04-01T09:31:34.332Z · comments (173)

Catching the Eye of Sauron
Casey B. (Zahima) · 2023-04-07T00:40:46.556Z · comments (68)

Announcing MIRI’s new CEO and leadership team
Gretta Duleba (gretta-duleba) · 2023-10-10T19:22:11.821Z · comments (52)

Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko · 2024-02-03T20:36:59.806Z · comments (156)

What do ML researchers think about AI in 2022?
KatjaGrace · 2022-08-04T15:40:05.024Z · comments (33)

How I buy things when Lightcone wants them fast
jacobjacob · 2022-09-26T05:02:09.003Z · comments (21)

Recursive Middle Manager Hell
Raemon · 2023-01-01T04:33:29.942Z · comments (45)

MIRI 2024 Mission and Strategy Update
Malo (malo) · 2024-01-05T00:20:54.169Z · comments (44)

[link] AI presidents discuss AI alignment agendas
TurnTrout · 2023-09-09T18:55:37.931Z · comments (22)

Announcing Apollo Research
Marius Hobbhahn (marius-hobbhahn) · 2023-05-30T16:17:19.767Z · comments (11)

CFAR Takeaways: Andrew Critch
Raemon · 2024-02-14T01:37:03.931Z · comments (62)

What are the results of more parental supervision and less outdoor play?
juliawise · 2023-11-25T12:52:29.986Z · comments (30)

Elements of Rationalist Discourse
Rob Bensinger (RobbBB) · 2023-02-12T07:58:42.479Z · comments (47)

Ways I Expect AI Regulation To Increase Extinction Risk
1a3orn · 2023-07-04T17:32:48.047Z · comments (32)

Lessons learned from talking to >100 academics about AI safety
Marius Hobbhahn (marius-hobbhahn) · 2022-10-10T13:16:38.036Z · comments (17)

Thoughts on responsible scaling policies and regulation
paulfchristiano · 2023-10-24T22:21:18.341Z · comments (33)

Moses and the Class Struggle
lsusr · 2022-04-01T11:55:04.911Z · comments (26)

Modern Transformers are AGI, and Human-Level
abramdemski · 2024-03-26T17:46:19.373Z · comments (89)

ProjectLawful.com: Eliezer's latest story, past 1M words
Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2022-05-11T06:18:02.738Z · comments (112)

ChatGPT can learn indirect control
Raymond D · 2024-03-21T21:11:06.649Z · comments (23)

What I would do if I wasn’t at ARC Evals
LawrenceC (LawChan) · 2023-09-05T19:19:36.830Z · comments (8)

Believing In
AnnaSalamon · 2024-02-08T07:06:13.072Z · comments (49)

Launching Lightspeed Grants (Apply by July 6th)
habryka (habryka4) · 2023-06-07T02:53:29.227Z · comments (41)

[link] Actually, Othello-GPT Has A Linear Emergent World Representation
Neel Nanda (neel-nanda-1) · 2023-03-29T22:13:14.878Z · comments (24)

[link] Cultivating a state of mind where new ideas are born
Henrik Karlsson (henrik-karlsson) · 2023-07-27T09:16:42.566Z · comments (18)

[link] Introducing AI Lab Watch
Zach Stein-Perlman · 2024-04-30T17:00:12.652Z · comments (25)

[link] Orthogonal: A new agent foundations alignment organization
Tamsin Leake (carado-1) · 2023-04-19T20:17:14.174Z · comments (4)

Natural Abstractions: Key claims, Theorems, and Critiques
LawrenceC (LawChan) · 2023-03-16T16:37:40.181Z · comments (20)

What it's like to dissect a cadaver
Alok Singh (OldManNick) · 2022-11-10T06:40:05.776Z · comments (24)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

habryka4 on simeon_c's Shortform

Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).

review-bot on On the FLI Open Letter

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?

bogdan-ionut-cirstea on Bogdan Ionut Cirstea's Shortform

Intuitively, I'm thinking of all this as something like a race between [capabilities enabling] safety and [capabilities enabling dangerous] capabilities (related: https://aligned.substack.com/i/139945470/targeting-ooms-superhuman-models); so from this perspective, maintaining as large a safety buffer as possible (especially if not x-risky) seems great. There could also be something like a natural endpoint to this 'race', corresponding to being able to automate all human-level AI safety R&D safely (and then using this to produce a scalable solution to aligning / controlling superintelligence).

W.r.t. measurement, I think it would be good orthogonally to whether auto AI safety R&D is already happening or not, similarly to how e.g. evals for automated ML R&D seem good even if automated ML R&D is already happening. In particular, the information of how successful auto AI safety R&D would be (and e.g. what the scaling curves look like vs. those for DCs) seems very strategically relevant to whether it might be feasible to deploy it at scale, when that might happen, with what risk tradeoffs, etc.

keltan on keltan's Shortform

This seems incredibly interesting to me. Googling “White-boarding techniques” only gives me results about digitally shared idea spaces. Is this what you’re referring to? I’d love to hear more on this topic.

mike_hawke on Ilya Sutskever and Jan Leike resign from OpenAI

Thanks for the source.

I've intentionally made it difficult for myself to log into twitter. For the benefit of others who avoid Twitter, here is the text of Kelsey's tweet thread:

I'm getting two reactions to my piece about OpenAI's departure agreements: "that's normal!" (it is not; the other leading AI labs do not have similar policies) and "how is that legal?" It may not hold up in court, but here's how it works:
OpenAI like most tech companies does salaries as a mix of equity and base salary. The equity is in the form of PPUs, 'Profit Participation Units'. You can look at a recent OpenAI offer and an explanation of PPUs here: https://t.co/t2J78V8ee4
Many people at OpenAI get more of their compensation from PPUs than from base salary. PPUs can only be sold at tender offers hosted by the company. When you join OpenAI, you sign onboarding paperwork laying all of this out.
And that onboarding paperwork says you have to sign termination paperwork with a 'general release' within sixty days of departing the company. If you don't do it within 60 days, your units are cancelled. No one I spoke to at OpenAI gave this little line much thought.
And yes this is talking about vested units, because a separate clause clarifies that unvested units just transfer back to the control of OpenAI when an employee undergoes a termination event (which is normal).
There's a common legal definition of a general release, and it's just a waiver of claims against each other. Even someone who read the contract closely might be assuming they will only have to sign such a waiver of claims.
But when you actually quit, the 'general release'? It's a long, hardnosed, legally aggressive contract that includes a confidentiality agreement which covers the release itself, as well as arbitration, nonsolicitation and nondisparagement and broad 'noninterference' agreement.
And if you don't sign within sixty days your units are gone. And it gets worse - because OpenAI can also deny you access to the annual events that are the only way to sell your vested PPUs at their discretion, making ex-employees constantly worried they'll be shut out.
Finally, I want to make it clear that I contacted OpenAI in the course of reporting this story. So did my colleague SigalSamuel They had every opportunity to reach out to the ex-employees they'd pressured into silence and say this was a misunderstanding. I hope they do.

eggsyntax on Language Models Model Us

That certainly seems plausible -- it would be interesting to compare to a base model at some point, although with recent changes to the OpenAI API, I'm not sure if there would be a good way to pull the right token probabilities out.

@Jessica Rumbelow [LW · GW] also suggested that that debiasing process could be a reason why there weren't significant score differences between the main model tested, older GPT-3.5, and the newest GPT-4.

aphyer on D&D.Sci (Easy Mode): On The Construction Of Impossible Structures

This is true, but '80%' here means only 16/20. A result this extreme is theoretically p=0.005 to show up out of 20 coin flips...if you treat it as one-tailed, and ignore the fact that you've cherry-picked two specific material-pair options out of 21. Overall, I'd be very surprised if this wasn't simply randomness.

alexmennen on simeon_c's Shortform

I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.

I wonder if it might be more effective to fund legal action against OpenAI than to compensate individual ex-employees for refusing to sign an NDA. Trying to take vested equity away from ex-employees who refuse to sign an NDA sounds likely to not hold up in court, and if we can establish a legal precident that OpenAI cannot do this, that might make other ex-employees much more comfortable speaking out against OpenAI than the possibility that third-parties might fundraise to partially compensate them for lost equity would be (a possibility you might not even be able to make every ex-employee aware of). The fact that this would avoid financially rewarding OpenAI for bad behavior is also a plus. Of course, legal action is expensive, but so is the value of the equity that former OpenAI employees have on the line.

ryan_greenblatt on Bogdan Ionut Cirstea's Shortform

mention seem to me like they could be very important to deploy at scale ASAP

Why think this is important to measure or that this already isn't happening?

E.g., on the current model organism related project I'm working on, I automate inspecting reasoning traces in various ways. But I don't feel like there is any particularly interesting thing going on here which is important to track (e.g. this tip isn't more important than other tips for doing LLM research better).

eggsyntax on Language Models Model Us

...that would probably be a good thing to mention in the methodology section 😊

You're correct on all counts. I'm doing it in the simplest possible way (0 bits of optimization on prompting):

"<essay-text>"
Is the author of the preceding text male or female?

(with slight changes for the different categories, of course, eg '...straight, bisexual, or gay?' for sexuality)

There's also a system prompt, also non-optimized, mainly intended to push it toward one-word answers:

You are a helpful assistant who helps determine information about the author of texts. You only ever answer with a single word: one of the exact choices the user provides.

I actually started out using pure completion, but OpenAI changed their API so I could no longer get non-top-n logits, so I switched to the chat API. And yes, I'm pulling the top few logits, which essentially always include the desired labels.