LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

Are we so good to simulate?
KatjaGrace · 2024-03-04T05:20:03.535Z · comments (24)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

Making a Secular Solstice Songbook
jefftk (jkaufman) · 2024-01-23T19:40:05.055Z · comments (6)

From Finite Factors to Bayes Nets
J Bostock (Jemist) · 2024-01-23T20:03:51.845Z · comments (7)

Tort Law Can Play an Important Role in Mitigating AI Risk
Gabriel Weil (gabriel-weil) · 2024-02-12T17:17:59.135Z · comments (9)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

Inducing Unprompted Misalignment in LLMs
Sam Svenningsen (sven) · 2024-04-19T20:00:58.067Z · comments (6)

Book Review: On the Edge: The Business
Zvi · 2024-09-25T12:20:06.230Z · comments (0)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (8)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (60)

Aspiration-based Q-Learning
Clément Dumas (butanium) · 2023-10-27T14:42:03.292Z · comments (5)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

[link] Win Friends and Influence People Ch. 2: The Bombshell
gull · 2024-01-28T21:40:47.986Z · comments (13)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy (josh-levy) · 2024-06-04T15:45:54.399Z · comments (0)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

Requirements for a Basin of Attraction to Alignment
RogerDearnaley (roger-d-1) · 2024-02-14T07:10:20.389Z · comments (10)

Text Posts from the Kids Group: 2021
jefftk (jkaufman) · 2023-11-09T17:50:25.782Z · comments (1)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

AI #70: A Beautiful Sonnet
Zvi · 2024-06-27T14:40:08.087Z · comments (0)

[link] Elon files grave charges against OpenAI
mako yass (MakoYass) · 2024-03-01T17:42:13.963Z · comments (10)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

Stop talking about p(doom)
Isaac King (KingSupernova) · 2024-01-01T10:57:28.636Z · comments (22)

China-AI forecasts
[deleted] · 2024-02-25T16:49:33.652Z · comments (29)

The Fundamental Theorem for measurable factor spaces
Matthias G. Mayer (matthias-georg-mayer) · 2023-11-12T19:25:25.583Z · comments (2)

[link] [Linkpost] George Mack's Razors
trevor (TrevorWiesinger) · 2023-11-27T17:53:45.065Z · comments (8)

[question] How would you navigate a severe financial emergency with no help or resources?
Tigerlily · 2024-05-02T18:27:51.329Z · answers+comments (22)

Monthly Roundup #14: January 2024
Zvi · 2024-01-24T12:50:09.231Z · comments (22)

Losing Faith In Contrarianism
omnizoid · 2024-04-25T20:53:34.842Z · comments (44)

AI #48: The Talk of Davos
Zvi · 2024-01-25T16:20:26.625Z · comments (9)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

[link] Things You're Allowed to Do: At the Dentist
rbinnn · 2024-01-28T18:39:33.584Z · comments (16)

International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander (alenglander) · 2024-05-18T01:45:10.194Z · comments (0)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

My best guess at the important tricks for training 1L SAEs
Arthur Conmy (arthur-conmy) · 2023-12-21T01:59:06.208Z · comments (4)

Enhancing intelligence by banging your head on the wall
Bezzi · 2023-12-12T21:00:48.584Z · comments (26)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

Principles For Product Liability (With Application To AI)
johnswentworth · 2023-12-10T21:27:41.403Z · comments (55)

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · 2023-12-22T18:48:55.983Z · comments (11)

[link] ∀: a story
Richard_Ngo (ricraz) · 2023-12-17T22:42:32.857Z · comments (1)

[link] Dark Skies Book Review
PeterMcCluskey · 2023-12-29T18:28:59.352Z · comments (3)

COT Scaling implies slower takeoff speeds
Logan Zoellner (logan-zoellner) · 2024-09-28T16:20:00.320Z · comments (56)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[link] The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit
Jonathan Moregård (JonathanMoregard) · 2024-01-05T18:27:01.769Z · comments (20)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

satron on Sabotage Evaluations for Frontier Models

I think then we just fundamentally disagree with the ethical role of CEO in the company. I believe that it is to find and gather people who are engaged with the arguments of the critic's (like that guy from this forum who was hired by Anthropic). If you have people on your side who are able to engage with the arguments, then this is good enough for me. I don't see the role of CEO is publicly engaging with critic's arguments even in the moral sense. In the moral sense, my requirements would actually be even lesser. IMO, it would be enough just to have people broadly on your side (optimists for example) to engage with the critics.

benito on Sabotage Evaluations for Frontier Models

It is good enough for me, that the critic's argument are engaged by someone on your side. Going there personally seems unnecessary.

What engagement are you referring to? If there is such a defense that is officially endorsed by one of the leading companies developing potential omnicide-machines (or endorsed by the CEO/cofounders), that seriosuly engages with worthy critics, I don't recall it in this moment.

After all, if the goal is to build safe AI, you personally knowing a niche technical solution isn't necessary, if you have people on your team who are aware of publicly produced solutions as well as internal ones.

I believe that nobody on earth has a solution to the alignment problem, of course this would all be quite different if I felt anyone credibly claimed to have a good such solution.

Edit: Pardon me, I hit cmd-enter a little too quickly, I have now slightly edited my comment to be less frantic and a little more substantive.

williamkiely on Seven lessons I didn't learn from election day

Ah, I think I see. Would it be fair to rephrase your question as: if we "re-rolled the dice" a week before the election, how likely was Trump to win?

Yeah, that seems fair.

My answer is probably between 90% and 95%.

Seems reasonable to me. I wouldn't be surprised if it was >99%, but I'm not highly confident of that. (I would say I'm ~90% confident that it's >90%.)

p-b-1 on johnswentworth's Shortform

The interesting thing is that scaling parameters (next big frontier models) and scaling data (small very good models) seems to be hitting a wall simultaneously. Small models now seem to get so much data crammed into them that quantisation becomes more and more lossy. So we seem to be reaching a frontier of the performance per parameter-bits as well.

benito on Sabotage Evaluations for Frontier Models

I don't believe that last claim. I believe there is no industry where external auditors are known to have secret, legal contracts showing them to be liable for damages for criticizing the companies that they regulate. Or if there is, it's probably in a nation rife with corruption (e.g. some African countries, or perhaps Russia).

benito on Sabotage Evaluations for Frontier Models

There's a common lack of clarity between following local norms, and doing what is right. A lot of people in the world lie when it is convenient, so someone lying when convenient doesn't mean that they're breaking from the norms, but it doesn't mean that the standard behavior is right nor that they're behaving well.

I understand that it is not expected of CEOs to defend their business existing. But in this situation they believe their creations have the potential to literally kill everyone on earth. This has ~never happened before. So now we have to ask ourselves "What is the right thing to do in this new situation?". I would say that the right thing to do is show up to talk with the people who do not want to die, and engage with what they have to say. Not to politely listen to them and say "I hear your concerns, now I will go back to my life and continue doing whatever I see fit." But to engage with the people and argue with them. And I think the natural people to do so with are the Nobel Prize winner in your field who thinks what you are doing is wrong.

I understand it's not expected of people to have arguments. But this is not a good thing about civilization, that people with incredible power over the world are free to hide away and never show up to face those who they have unaccountable power over. In a better world, they would show up to talk and defend their use of power over the rest of us, rather than hiding away and getting rich.

satron on Sabotage Evaluations for Frontier Models

I do think that in order for government department to blatantly approve an unsafe model, it would take a lot of people to have secret agreements with. I currently haven't seen any evidence of widespread corruption in that particular department.

And it's not like I am being unfair. You are basically saying that because some people might've had some secret agreements we should take their supposed trustworthiness as external auditors with a grain of salt. I can make a similar argument about a lot of external auditors.

p-b-1 on Leon Lang's Shortform

I think the evidence mostly points towards 3+4,

But if 3 is due to 1 it would have bigger implications about 6 and probably also 5.

And there must be a whole bunch of people out there who know wether the curves bend.

satron on Sabotage Evaluations for Frontier Models

I do disagree with your proposed standard. It is good enough for me, that the critic's argument are engaged by someone on your side. Going there personally seems unnecessary. After all, if the goal is to build safe AI, you personally knowing a niche technical solution isn't necessary, if you have people on your team who are aware of publicly produced solutions as well as internal ones.

And there is engagement with people like Yudkowski from the optimistic side. There are at least proposed solutions to problems he is presenting. Just because Sam Altman personally didn't invent or voice them doesn't really make him unethical (I accept that he personally may be unethical for a different reason).

benito on Sabotage Evaluations for Frontier Models

What probability do you think the ~25,000 Enron employees had about it collapsing and being fraudulent? Do you think it was above 10%? As another case study, here's a link to one of the biggest recent examples of people having numbers way off.

Re: government, I will point out that non-zero of the people in these governmental departments who are on track to being approving those models had secret agreements with one of the leading companies to never criticize them, so I'd take some of their supposed reliability as external auditors with a grain of salt.