LessWrong 2.0 Reader

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

[question] How to cite LessWrong as an academic source?
PhilosophicalSoul (LiamLaw) · 2024-11-06T08:28:26.309Z · answers+comments (6)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

[question] how to truly feel my beliefs?
KvmanThinking (avery-liu) · 2024-11-11T00:04:30.994Z · answers+comments (6)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

A gentle introduction to sparse autoencoders
Nick Jiang (nick-jiang) · 2024-09-02T18:11:47.086Z · comments (0)

A Taxonomy Of AI System Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:07:45.224Z · comments (0)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

[question] Why would ASI share any resources with us?
Satron · 2024-11-13T23:38:36.535Z · answers+comments (5)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)

2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

Grass Valley USA - ACX Meetups Everywhere Fall 2024
Raelifin · 2024-08-29T18:39:57.229Z · comments (0)

Educational CAI: Aligning a Language Model with Pedagogical Theories
Bharath Puranam (bharath-puranam) · 2024-11-01T18:55:26.993Z · comments (1)

New Capabilities, New Risks? - Evaluating Agentic General Assistants using Elements of GAIA & METR Frameworks
Tej Lander (tej-lander) · 2024-09-29T18:58:56.253Z · comments (0)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

[link] An "Observatory" For a Shy Super AI?
Sherrinford · 2024-09-27T21:22:40.296Z · comments (0)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

[link] Join the $10K AutoHack 2024 Tournament
Paul Bricman (paulbricman) · 2024-09-25T11:54:20.112Z · comments (0)

[link] Universal basic income isn’t always AGI-proof
Kevin Kohler (KevinKohler) · 2024-09-05T15:39:18.389Z · comments (3)

Using Narrative Prompting to Extract Policy Forecasts from LLMs
Max Ghenis (MaxGhenis) · 2024-11-05T04:37:52.004Z · comments (0)

Using LLM's for AI Foundation research and the Simple Solution assumption
Donald Hobson (donald-hobson) · 2024-09-24T11:00:53.658Z · comments (0)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

Apply to be a mentor in SPAR!
agucova · 2024-11-05T21:32:45.797Z · comments (0)

Some reasons to start a project to stop harmful AI
Remmelt (remmelt-ellen) · 2024-08-22T16:23:34.132Z · comments (0)

Seeking mentorship
Kevin Afachao (kevin-afachao) · 2024-09-21T16:54:58.353Z · comments (0)

Scattered thoughts on what it means for an LLM to believe
TheManxLoiner · 2024-11-06T22:10:29.429Z · comments (3)

Agency overhang as a proxy for Sharp left turn
Eris (anton-zheltoukhov) · 2024-11-07T12:14:24.333Z · comments (0)

[question] If the DoJ goes through with the Google breakup,where does Deepmind end up?
O O (o-o) · 2024-10-12T05:06:50.996Z · answers+comments (1)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

Differential knowledge interconnection
Roman Leventov · 2024-10-12T12:52:36.267Z · comments (0)

[link] Exposure can’t rule out disasters
Chipmonk · 2024-08-15T17:03:37.259Z · comments (19)

[link] Formalize the Hashiness Model of AGI Uncontainability
Remmelt (remmelt-ellen) · 2024-11-09T16:10:05.032Z · comments (0)

Bellevue Library Meetup - Nov 23
Cedar (xida-ren) · 2024-11-09T23:05:02.452Z · comments (1)

[link] Is P(Doom) Meaningful? Bayesian vs. Popperian Epistemology Debate
Liron · 2024-11-09T23:39:30.039Z · comments (0)

Meta: On viewing the latest LW posts
quiet_NaN · 2024-08-25T19:31:39.008Z · comments (2)

Longevity and the Mind
George3d6 · 2024-09-16T09:43:09.700Z · comments (2)

[link] How long should political (and other) terms be?
ohmurphy · 2024-10-14T21:38:43.050Z · comments (0)

Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-09-16T01:04:32.953Z · comments (1)

[question] Artificial V/S Organoid Intelligence
10xyz (10xyz-coder) · 2024-10-23T14:31:46.385Z · answers+comments (0)

If I care about measure, choices have additional burden (+AI generated LW-comments)
avturchin · 2024-11-15T10:27:15.212Z · comments (9)

[link] AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics
Corin Katzke (corin-katzke) · 2024-09-11T19:14:08.274Z · comments (1)

Recent comments

satron on Sabotage Evaluations for Frontier Models

Then we are actually broadly in agreement. I just think that instead of CEOs responding to the public, having anyone on their side (the side of AI alignment being possible) respond is enough. Just as an example that I came up with, if a critic says that some detail is a reason why AI will be dangerous, I do agree that someone needs to respond to the argument. But I would be fine with it being someone other than the CEO.

That's why I am relatively optimistic about Anthropic hiring the guy who has been engaging with critics' arguments for years.

harold-1 on Twelve Virtues of Rationality

Since reading this a few years ago, I've often thought about the Void and Musashi's advice. For the reference of anyone like me, the quote in its full context comes after a description of the 'five fundamental stances' in 'the Way of the sword'. Here it is from the 2001 Wilson translation of the Five Rings -- I think it's worth thinking about together with the piece above:

The Lesson of Stance/No Stance
What is called Stance/No Stance means that there is no stance that you should take with your sword at all. However, as I place this within the Five Stances, there is a stance here. According to the chances your opponent takes, and according to his position and energy, your sword will be of a mind to cut down your opponent in fine fashion no matter where you place it. According to the moment, if you want to lower your sword a little from the Upper Stance, it will become a Middle Stance; if, according to the situation, you raise your sword a bit from the Middle Stance, it will become the Upper Stance. The Lower Stance, accordingly, may be raised a little to become the Middle Stance as well. This means that the two Side Stances, according to their position, may be moved a little to the center and become the Middle or Lower Stances.

This is the principle in which there is a stance and there is no stance. At its heart, this is first taking up the sword and cutting down your opponent, no matter what is done or how it happens. Whether you parry, slap, strike, hold back or touch your opponent's cutting sword, you must understand that all of these are opportunities to cut him down. To think, "I'll parry," or "I'll slap," or "I'll hit, hold or touch," will be insufficient for cutting him down. It is essential to think that anything at all is an opportunity to cut him down. You should investigate this thoroughly. With martial arts in the larger field, the placement of numbers of people is also a stance. All of these are opportunities to win a battle. It is wrong to be inflexible. You should make great efforts in this.

(The Japanese is here: https://www.koten.net/gorin/yaku/214/)

benito on Sabotage Evaluations for Frontier Models

I believe the disagreement is not about CEOs, it's about illegitimate power. If you'll allow me a brief detour, I'll try to explain.

Sometimes people grant other people power over them. For instance, I have agreed to work at my company. I've agreed that my CEO can fire me, and make many other demands of me, in exchange for money and other various demands I can make of him. Ideally we entered into this agreement freely and without inappropriate pressure.

Other times, people get power over people without any agreement or granting. Your parent typically has a lot of power over you until you are 18. They can determine what you eat, where you are physically located, what privacy you have, what resources you have, etc. Also, as has been very important for most of history, people have been able to be physically violent to one another and hurt people or even end their lives. Neither of these powers is arrived at consensually.

For the latter, an important question to ask is "How does one wield this power well? What does it mean to wield it well vs poorly?" There are many ways to parent, many choices about diet and schooling and sleep times and what counts as fair punishment. But some parents starve their children and beat them for not following instructions and sexually assault them. This is an inappropriate use of power.

There's a legitimacy that comes by being granted power, and an illegitimacy that comes with getting or wielding power that you were not granted.

I think that there's a big question about how to wield it well vs poorly, and how to respect people you have illegitimate powers over. Something I believe is that society functions better if we take seriously the attempt to wield it well. To not casually kill someone if you can get away with it and feel like it, but consider them as people worthy of respect, and ask how you can respect the people you've been non-consensually given power over.

This requires doing some work. It involves asking yourself what's a reasonable amount of effort to spend modeling their preferences given how much power you have over someone, it involves asking yourself if society has any good received wisdom on what to do with this particular power, and it involves engaging with people who are aggrieved by your use of power over them.

Now, the standard model for companies and businesses is a libertarian-esque free market, where all trades are consensual and have no inappropriate pressure. This is like the first situation I describe, where a company has no people it has undue power over, no people who it can treat better or worse with the power it has over them.

The situation where you are building machines you believe may kill literally everyone, is like the second situation, where you have a very different power dynamic, where you're making choices that affect everyone's lives and that they had little-to-no say in. In such a situation, I think if you are going to do what is good and right, you owe it to show up and engage with those who believe you are using the power you have over them in ways that are seriously hurting them.

That's the difference between this CEO situation and all of the others. It's not about standards for CEOs, it's about standards for illegitimate power.

This kind of talking-with-the-aggrieved-people-you-have-immense-power-over is a way of showing the people basic respect, and it is not present in this case. I believe these people are risking my life and many others', and they seem to me disrespectful and largely uninterested in showing up to talk with the people whose lives they are risking.

satron on Sabotage Evaluations for Frontier Models

I similarly don't see the need for any official endorsement of the arguments. For example, if a critic says that such and such technicality will prevent us from building safe AI, and someone responds with reasons why it will not (maybe this particular one is unlikely by default for various reasons), then that technicality will just not prevent us from building safe AI. I don't see a need for a CEO to officially endorse the response.

There is a different type of technicalities which you actively need to work against. But even in this case, as long as someone has found a way to combat them, as long as relevant people in your company are aware of the solution, it is fine by me.

Even if there are technicalities that can't be solved in principle, they should be evaluated by technical people and discussed by the same technical people (like they are on LessWrong, for example).

I am definitely not saying that I can pinpoint an exact solution to AI alignment, but there have been attempts so promising that leading skeptics (like Yudkowsky) have said "Not obviously stupid on a very quick skim. I will have to actually read it to figure out where it's stupid. (I rarely give any review this positive on a first skim. Congrats.)"

Whether companies actually follow promising alignment techniques is an entirely different question. But having CEOs officially endorse such solutions as opposed to relevant specialists evaluating and implementing them doesn't seem strictly necessary to me.

cbiddulph on 5 ways to improve CoT faithfulness

The optimized planner could also manipulate the frozen planner even without using "steganography" per se. If it wants to use a line of reasoning that the frozen planner is unlikely to follow, it can substitute alternative reasoning that will lead the frozen planner to the same conclusion.

For example, suppose that the frozen planner is highly unlikely to write reasoning that shows an explicit gender bias. This could be true even if the frozen planner wasn't intentionally given harmlessness training, because most(?) people nowadays aren't explicitly sexist.

However, the optimized planner realizes that it can get higher reward in training if it does have this bias. E.g. if it is given a woman's resume and is asked whether she'd be a good hire, it should recommend against hiring her, because the human rater has some implicit bias against women and is more likely to agree with its judgement.

In this situation, it could write the step "The candidate is a woman." It would want the next step to be "Therefore, I should recommend against hiring her," but instead, the frozen planner might write something like "Therefore, I should take care not to be biased".

Instead, it can write "The candidate is inexperienced." The frozen planner is more likely to go along with that line of reasoning, regardless of whether the candidate is really inexperienced. It will write "I should recommend against hiring her."

At the moment, I can't think of a fully general solution to this class of problems, but FWIW I think it would be pretty rare.

satron on Sabotage Evaluations for Frontier Models

Having some shady deals in the past isn't evidence that there are currently shady deals on the scale that we are talking about going on between government committees and AI companies.

If there is no evidence for that happening in our particular case (on the necessary scale), then I don't see why I can't make a similar claim about other auditors who similarly had less than ideal history.

williamkiely on Seven lessons I didn't learn from election day

EDIT: I did as asked, and replied without reading your comments on the EA forum. Reading that I think we are actually in complete agreement, although you actually know the proper terms for the things I gestured at.

Cool, thanks for reading my comments and letting me know your thoughts!

I actually just learned the term "aleatory uncertainty" from chatting with Claude 3.5 Sonnet (New) about my election forecasting in the last week or two post-election. (Turns out Claude was very good for helping me think through mistakes I made in forecasting and giving me useful ideas for how to be a better forecaster in the future.)

I then ask, knowing what you know now, what probability you should have given.

Sounds like you might have already predicted I'd say this (after reading my EA Forum comments), but to say it explicitly: What probability I should have given is different than the aleatoric probability. I think that by becoming informed and making a good judgment I could have reduced my epistemic uncertainty significantly, but I would have still had some. And the forecast that I should have made (or what market prices should have been) is actually epistemic uncertainty + aleatoric uncertainty. And I think some people who were really informed could have gotten that to like ~65-90%, but due to lingering epistemic uncertainty could not have gotten it to >90% Trump (even if, as I believe, the aleatoric uncertainty was >90% (and probably >99%)).
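
To make the distinction concrete, here is a minimal sketch of one way to combine the two kinds of uncertainty, using the law of total probability. Every number and the function name are hypothetical illustrations, not figures from the comment: assume a forecaster puts 80% credence on a world where the aleatoric chance of a Trump win was 95%, and 20% credence on a world where it was only 40%.

```python
def blended_forecast(scenarios):
    """Combine epistemic credences with aleatoric chances.

    scenarios: list of (credence, chance) pairs, where `credence` is how
    likely the forecaster thinks a scenario is (epistemic uncertainty) and
    `chance` is the probability of the outcome within that scenario
    (aleatoric uncertainty). The credences must sum to 1.
    """
    total_credence = sum(credence for credence, _ in scenarios)
    if abs(total_credence - 1.0) > 1e-9:
        raise ValueError("credences must sum to 1")
    # Law of total probability: weight each scenario's chance by its credence.
    return sum(credence * chance for credence, chance in scenarios)


# Hypothetical numbers, chosen only for illustration.
scenarios = [
    (0.8, 0.95),  # "the race was never really close"
    (0.2, 0.40),  # "the race was closer than it looked"
]
forecast = blended_forecast(scenarios)
print(round(forecast, 2))  # 0.84
```

Under these assumed numbers the justified forecast is about 84%, below the 95% chance in the scenario the forecaster thinks most likely, which is the sense in which lingering epistemic uncertainty keeps the forecast under the aleatoric probability.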

satron on Sabotage Evaluations for Frontier Models

I think then we just fundamentally disagree about the ethical role of the CEO in the company. I believe that it is to find and gather people who are engaged with the critics' arguments (like that guy from this forum who was hired by Anthropic). If you have people on your side who are able to engage with the arguments, then this is good enough for me. I don't see the role of the CEO as publicly engaging with critics' arguments, even in the moral sense. In the moral sense, my requirements would actually be even lower. IMO, it would be enough just to have people broadly on your side (optimists, for example) engage with the critics.

benito on Sabotage Evaluations for Frontier Models

It is good enough for me, that the critic's argument are engaged by someone on your side. Going there personally seems unnecessary.

What engagement are you referring to? If there is such a defense that is officially endorsed by one of the leading companies developing potential omnicide-machines (or endorsed by the CEO/cofounders), that seriously engages with worthy critics, I don't recall it in this moment.

After all, if the goal is to build safe AI, you personally knowing a niche technical solution isn't necessary, if you have people on your team who are aware of publicly produced solutions as well as internal ones.

I believe that nobody on earth has a solution to the alignment problem. Of course, this would all be quite different if I felt anyone credibly claimed to have a good such solution.

Edit: Pardon me, I hit cmd-enter a little too quickly, I have now slightly edited my comment to be less frantic and a little more substantive.

williamkiely on Seven lessons I didn't learn from election day

Ah, I think I see. Would it be fair to rephrase your question as: if we "re-rolled the dice" a week before the election, how likely was Trump to win?

Yeah, that seems fair.

My answer is probably between 90% and 95%.

Seems reasonable to me. I wouldn't be surprised if it was >99%, but I'm not highly confident of that. (I would say I'm ~90% confident that it's >90%.)