LessWrong 2.0 Reader

[question] What are some good ways to form opinions on controversial subjects in the current and upcoming era?
notfnofn · 2024-10-27T14:33:53.960Z · answers+comments (21)

[question] somebody explain the word "epistemic" to me
KvmanThinking (avery-liu) · 2024-10-28T16:40:24.275Z · answers+comments (8)

Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen (bayesshammai) · 2024-10-28T18:39:58.480Z · comments (0)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen · 2024-10-28T21:44:42.352Z · comments (0)

[link] October 2024 Progress in Guaranteed Safe AI
Quinn (quinn-dougherty) · 2024-10-28T23:34:51.689Z · comments (0)

Join my new subscriber chat
sarahconstantin · 2024-11-06T02:30:11.059Z · comments (0)

[link] Spherical cow
dkl9 · 2024-11-11T03:10:27.788Z · comments (0)

Not all biases are equal - a study of sycophancy and bias in fine-tuned LLMs
jakub_krys (kryjak) · 2024-11-11T23:11:15.233Z · comments (0)

[question] why won't this alignment plan work?
KvmanThinking (avery-liu) · 2024-10-10T15:44:59.450Z · answers+comments (7)

[question] Is School of Thought related to the Rationality Community?
Shoshannah Tekofsky (DarkSym) · 2024-10-15T12:41:33.224Z · answers+comments (6)

[question] how to truly feel my beliefs?
KvmanThinking (avery-liu) · 2024-11-11T00:04:30.994Z · answers+comments (6)

Exploring Shard-like Behavior: Empirical Insights into Contextual Decision-Making in RL Agents
Alejandro Aristizabal (alejandro-aristizabal) · 2024-09-29T00:32:42.161Z · comments (0)

A gentle introduction to sparse autoencoders
Nick Jiang (nick-jiang) · 2024-09-02T18:11:47.086Z · comments (0)

Thoughts on Evo-Bio Math and Mesa-Optimization: Maybe We Need To Think Harder About "Relative" Fitness?
Lorec · 2024-09-28T14:07:42.412Z · comments (6)

[link] Metaculus's 'Minitaculus' Experiments — Collaborate With Us
ChristianWilliams · 2024-08-26T20:44:32.125Z · comments (0)

'Chat with impactful research & evaluations' (Unjournal NotebookLMs)
david reinstein (david-reinstein) · 2024-09-28T00:32:16.845Z · comments (0)

The Existential Dread of Being a Powerful AI System
testingthewaters · 2024-09-26T10:56:32.904Z · comments (1)

Introducing Kairos: a new AI safety fieldbuilding organization (the new home for SPAR and FSP)
agucova · 2024-10-25T21:59:08.782Z · comments (0)

[question] What are some positive developments in AI safety in 2024?
Satron · 2024-11-15T10:32:39.541Z · answers+comments (0)

Avoiding jailbreaks by discouraging their representation in activation space
Guido Bergman · 2024-09-27T17:49:20.785Z · comments (2)

Increasing the Span of the Set of Ideas
Jeffrey Heninger (jeffrey-heninger) · 2024-09-13T15:52:39.132Z · comments (1)

Forever Leaders
Justice Howard (justice-howard) · 2024-09-14T20:55:39.095Z · comments (9)

Retrieval Augmented Genesis
João Ribeiro Medeiros (joao-ribeiro-medeiros) · 2024-10-01T20:18:01.836Z · comments (0)

[link] SCP Foundation - Anti memetic Division Hub
landscape_kiwi · 2024-09-15T13:40:52.691Z · comments (1)

Against Job Boards: Human Capital and the Legibility Trap
vaishnav92 · 2024-10-24T20:50:50.266Z · comments (1)

Thirty random thoughts about AI alignment
Lysandre Terrisse · 2024-09-15T16:24:10.572Z · comments (1)

[link] Redundant Attention Heads in Large Language Models For In Context Learning
skunnavakkam · 2024-09-01T20:08:48.963Z · comments (0)

Does “Ultimate Neartermism” via Eternal Inflation dominate Longtermism in expectation?
Jordan Arel · 2024-08-17T22:28:21.849Z · comments (1)

[link] [Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos (fernando-avalos) · 2024-09-09T03:33:53.548Z · comments (1)

[question] Can subjunctive dependence emerge from a simplicity prior?
Daniel C (harper-owen) · 2024-09-16T12:39:35.543Z · answers+comments (0)

Budapest Hungary - ACX Meetups Everywhere Fall 2024
Timothy Underwood (timothy-underwood-1) · 2024-08-29T18:37:41.313Z · comments (0)

A Taxonomy Of AI System Evaluations
Maxime Riché (maxime-riche) · 2024-08-19T09:07:45.224Z · comments (0)

[question] Why would ASI share any resources with us?
Satron · 2024-11-13T23:38:36.535Z · answers+comments (5)

Another UFO Bet
codyz · 2024-11-01T01:55:27.301Z · comments (9)

[link] AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke (corin-katzke) · 2024-10-28T16:03:39.258Z · comments (0)

Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails
Louka Ewington-Pitsos (louka-ewington-pitsos) · 2024-09-17T03:52:43.269Z · comments (2)

Understanding Hidden Computations in Chain-of-Thought Reasoning
rokosbasilisk · 2024-08-24T16:35:03.907Z · comments (1)

Inquisitive vs. adversarial rationality
gb (ghb) · 2024-09-18T13:50:09.198Z · comments (9)

[link] Optimising under arbitrarily many constraint equations
dkl9 · 2024-09-12T14:59:28.475Z · comments (0)

[question] How to cite LessWrong as an academic source?
PhilosophicalSoul (LiamLaw) · 2024-11-06T08:28:26.309Z · answers+comments (6)

[link] Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman (josh-hickman) · 2024-09-08T16:13:33.187Z · comments (1)

2025 Q1 Pivotal Research Fellowship (Technical & Policy)
Tobias H (clearthis) · 2024-11-12T10:56:24.858Z · comments (0)

GPT4o is still sensitive to user-induced bias when writing code
Reed (ThomasReed) · 2024-09-22T21:04:54.717Z · comments (0)

Halifax Canada - ACX Meetups Everywhere Fall 2024
interstice · 2024-08-29T18:39:12.490Z · comments (0)

[link] Linkpost: Hypocrisy standoff
Chris_Leong · 2024-09-29T14:27:19.175Z · comments (1)

Democracy beyond majoritarianism
Arturo Macias (arturo-macias) · 2024-09-03T15:10:56.284Z · comments (2)

[question] Artificial V/S Organoid Intelligence
10xyz (10xyz-coder) · 2024-10-23T14:31:46.385Z · answers+comments (0)

[question] AMA: International School Student in China
Novice · 2024-10-01T06:00:16.282Z · answers+comments (0)

[link] Internal music player: phenomenology of earworms
dkl9 · 2024-11-14T23:29:48.383Z · comments (1)

[question] How do we know dreams aren't real?
Logan Zoellner (logan-zoellner) · 2024-08-22T12:41:57.380Z · answers+comments (31)

Recent comments

chaosmage on What Ketamine Therapy Is Like

First I heard of it was from an anesthesiologist who was very happy with how it is the only way to get to full anesthesia without depressing the patient's heart rate, so for senior patients it was really the only option. In retrospect, his enthusiasm about it does seem suspicious, but we were surrounded by professors and I don't think he was lying.

sam-marks on OpenAI Email Archives (from Musk v. Altman)

FYI it seems like this (important-seeming) email is missing, though the surrounding emails in the exchange seem to be present. (So maybe some other ones are missing too.)

benito on Sabotage Evaluations for Frontier Models

I'm having a hard time following this argument. To be clear, I'm saying that while certain people were in regulatory bodies in the US & UK govts, they actively had secret legal contracts to not criticize the leading industry player, else (presumably) they could be sued for damages. This is not about past shady deals; it is about current people having been corrupted during their current tenure.

quetzal_rainbow on D0TheMath's Shortform

The completeness theorem states that every consistent countable FO theory has a model. The compactness theorem states that an FO theory has a model iff every finite subset of that theory has a model. Both theorems are provable in ZFC.

Therefore:

Consistent(ZFC) <-> all finite subsets of ZFC have a model ->

not Consistent(ZFC) <-> some finite subsets of ZFC don't have a model ->

some finite subsets of ZFC + not Consistent(ZFC) don't have a model <->

not Consistent(ZFC + not Consistent(ZFC)),

proven in ZFC + not Consistent(ZFC)

ape-in-the-coat on Quantum Immortality: A Perspective if AI Doomers are Probably Right

If my π-guess is wrong, my only chance to survive is getting all-heads.

Your other chance for survival is that whatever means are used to kill you somehow do not succeed due to quantum effects. And this is what the QI + path-based identity approach actually predicts. The universe isn't going to retroactively change the digit of pi, but neither is it going to influence the probability of the coin tosses due to the fact that someone may die. QI influence will trigger only at the moment of your death, turning it into a near-death. And then for the next attempt. And for the next one. Potentially locking you in a state of eternal torture.

However, abandoning SSSA also has a serious theoretical cost:
If observed probabilities have a hidden subjective dimension (because of path-dependency), all hell breaks loose. If we agree that probabilities of being a copy are distributed not in a state-dependent way, but in a path-dependent way, we agree that there is a 'hidden variable' in self-locating probabilities. This hidden variable does not play a role in our π experiment but appears in other thought experiments where the order of making copies is defined.

I fail to see this cost. Yes, we agree that there is an additional variable. Namely, my causal history. It's not necessarily hidden, though it can be. So what? What is so hellbreaking about it? This is exactly how probability theory works in every other case. Why should it have a special case for conscious experience?

If there are two bags, one with 1 red ball and another with 1000 blue balls, and a coin is tossed and, based on the outcome, I get a ball from either the first or the second bag, I expect to receive the red ball with 50% chance. I'm not supposed to assume out of nowhere that every ball has to have an equal probability of being given, and therefore postulate a ball-shadow that modifies the fairness of the coin.
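The two-bag argument above can be checked with a quick simulation. This is only a minimal sketch under the stated setup (a fair coin, 1 red ball in the first bag, 1000 blue balls in the second); the function names `draw_ball` and `estimate_red_frequency`, the seed, and the trial count are illustrative assumptions, not anything from the original post.

```python
import random


def draw_ball(rng):
    """Toss a fair coin to pick a bag, then draw one ball from that bag."""
    if rng.random() < 0.5:
        bag = ["red"]            # first bag: 1 red ball
    else:
        bag = ["blue"] * 1000    # second bag: 1000 blue balls
    return rng.choice(bag)


def estimate_red_frequency(trials=100_000, seed=0):
    """Estimate how often the coin-then-bag procedure hands out the red ball."""
    rng = random.Random(seed)    # hypothetical fixed seed, for repeatability
    reds = sum(draw_ball(rng) == "red" for _ in range(trials))
    return reds / trials


if __name__ == "__main__":
    # Path-dependent view: the coin decides first, so red should be near 1/2.
    print(f"coin-then-bag frequency of red: {estimate_red_frequency():.3f}")
    # Uniform-over-balls view: every one of the 1001 balls equally likely.
    print(f"uniform-over-balls probability: {1 / 1001:.5f}")
```

The simulated frequency should land near one half, while treating all 1001 balls as equally likely would give roughly one in a thousand. The gap between the two numbers is the point of the comment: the sampling procedure (the causal history), not the head count of balls, fixes the probability.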

benito on Sabotage Evaluations for Frontier Models

I'm sorry, I'm confused about something here, I'll back up briefly and then respond to your point.

My model is:

The vast majority of people who've seriously thought about it believe we don't know how to solve the alignment problem.
More fundamentally, there's a sense in which we "basically don't know what we're doing" with regards to AI. People talk about "agents" and "goals" and "intentions" but we're kind of like at the phlogiston theory of heat or vitalism theory of life. We don't get it. We have no equations, we have no theory, we're just like "man these systems can really write and make pretty pictures" like we used to say "I don't get it but some things are hot and some things are cold". Science was tried, found hard, engineering was tried, found easy, and now we're only doing that.
Many/most folks who've given it serious thought are pretty confident that the default outcome is doom (omnicide or permanent disempowerment), though it may be way kinda worse (e.g. eternal torture) or slightly better (e.g. we get to keep earth), due to intuitive arguments about instrumental goals and selecting on minds in the way machine learning works. (This framing is a bit local, in that not every scientist in the world would quite know what I'm referring to here.)
People are working hard and fast to build these AIs anyway because it's a profitable industry.

This literally spells the end of humanity (barring the eternal torture option or the grounded on earth option).

Back to your comment: some people are building AGI and knowingly threatening all of our lives. I propose they should show up and explain themselves.

A natural question is "Why should they talk with you Ben? You're just one of the 8 billion people whose lives they're threatening."

That is why I am further suggesting they talk with many of the great and worthy thinkers who are of the position this is clearly bad, like Hinton, Bengio, Russell, Yudkowsky, Bostrom, etc.

I am reading you say something like "But as long as someone is defending their behavior, they don't need to show up to defend it themselves."

This lands with me like we are two lowly peasants, who are talking about how the King has mistreated us due to how the royal guards often beat us up and rape the women. I'm saying "I would like the King to answer for himself" and I'm hearing you say "But I know a guy in the next pub who thinks the King is making good choices with his powers. If you can argue with him, I don't see why the King needs to come down himself." I would like to have the people who are wielding the power defend themselves.

Again, this is not me proposing business norms, it's me saying "the people who are taking the action that looks like it kills us, I want those people in particular to show up and explain themselves".

remmelt-ellen on AI Safety Camp 10

Fair question. You can assume it is AoE.

Research leads are not going to be too picky in terms of what hour you send the application in.

There is no need to worry about the exact deadline. Even if you send in your application on the next day, that probably won't significantly impact your chances of getting picked up by your desired project(s).

Sooner is better, since many research leads will begin composing their teams after the 17th, but there is no hard cut-off point.

satron on Sabotage Evaluations for Frontier Models

Then we are actually broadly in agreement. I just think that instead of CEOs responding to the public, having anyone on their side (the side of AI alignment being possible) respond is enough. Just as an example that I came up with, if a critic says that some detail is a reason why AI will be dangerous, I do agree that someone needs to respond to the argument. But I would be fine with it being someone other than the CEO.

That's why I am relatively optimistic about Anthropic hiring the guy who has been engaging with critics' arguments for years.

harold-1 on Twelve Virtues of Rationality

Since reading this a few years ago, I've often thought about the Void and Musashi's advice. For the reference of anyone like me, the quote in its full context comes after a description of the 'five fundamental stances' in 'the Way of the sword'. Here it is from the 2001 Wilson translation of the Five Rings -- I think it's worth thinking about together with the piece above:

The Lesson of Stance/No Stance
What is called Stance/No Stance means that there is no stance that you should take with your sword at all. However, as I place this within the Five Stances, there is a stance here. According to the chances your opponent takes, and according to his position and energy, your sword will be of a mind to cut down your opponent in fine fashion no matter where you place it. According to the moment, if you want to lower your sword a little from the Upper Stance, it will become a Middle Stance; if, according to the situation, you raise your sword a bit from the Middle Stance, it will become the Upper Stance. The Lower Stance, accordingly, may be raised a little to become the Middle Stance as well. This means that the two Side Stances, according to their position, may be moved a little to the center and become the Middle or Lower Stances.

This is the principle in which there is a stance and there is no stance. At its heart, this is first taking up the sword and cutting down your opponent, no matter what is done or how it happens. Whether you parry, slap, strike, hold back or touch your opponent's cutting sword, you must understand that all of these are opportunities to cut him down. To think, "I'll parry," or "I'll slap," or "I'll hit, hold or touch," will be insufficient for cutting him down. It is essential to think that anything at all is an opportunity to cut him down. You should investigate this thoroughly. With martial arts in the larger field, the placement of numbers of people is also a stance. All of these are opportunities to win a battle. It is wrong to be inflexible. You should make great efforts in this.

(The Japanese is here: https://www.koten.net/gorin/yaku/214/)

benito on Sabotage Evaluations for Frontier Models

I believe the disagreement is not about CEOs, it's about illegitimate power. If you'll allow me a brief detour, I'll try to explain.

Sometimes people grant other people power over them. For instance, I have agreed to work at my company. I've agreed that my CEO can fire me, and make many other demands of me, in exchange for money and other various demands I can make of him. Ideally we entered into this agreement freely and without inappropriate pressure.

Other times, people get power over people without any agreement or granting. Your parent typically has a lot of power over you until you are 18. They can determine what you eat, where you are physically located, what privacy you have, what resources you have, etc. Also, as has been very important for most of history, people have been able to be physically violent to one another and hurt people or even end their lives. Neither of these powers is arrived at consensually.

For the latter, an important question to ask is "How does one wield this power well? What does it mean to wield it well vs poorly?" There are many ways to parent, many choices about diet and schooling and sleep times and what counts as fair punishment. But some parents starve their children and beat them for not following instructions and sexually assault them. This is an inappropriate use of power.

There's a legitimacy that comes by being granted power, and an illegitimacy that comes with getting or wielding power that you were not granted.

I think that there's a big question about how to wield it well vs poorly, and how to respect people you have illegitimate powers over. Something I believe is that society functions better if we take seriously the attempt to wield it well. To not casually kill someone if you can get away with it and feel like it, but consider them as people worthy of respect, and ask how you can respect the people you've been non-consensually given power over.

This requires doing some work. It involves asking yourself what's a reasonable amount of effort to spend modeling their preferences given how much power you have over someone, it involves asking yourself if society has any good received wisdom on what to do with this particular power, and it involves engaging with people who are aggrieved by your use of power over them.

Now, the standard model for companies and businesses is a libertarian-esque free market, where all trades are consensual and have no inappropriate pressure. This is like the first situation I describe, where a company has no people it has undue power over, no people who it can treat better or worse with the power it has over them.

The situation where you are building machines you believe may kill literally everyone is like the second situation, where you have a very different power dynamic, where you're making choices that affect everyone's lives and that they had little-to-no say in. In such a situation, I think if you are going to do what is good and right, you owe it to them to show up and engage with those who believe you are using the power you have over them in ways that are seriously hurting them.

That's the difference between this CEO situation and all of the others. It's not about standards for CEOs, it's about standards for illegitimate power.

This kind of talking-with-the-aggrieved-people-you-have-immense-power-over is a way of showing the people basic respect, and it is not present in this case. I believe these people are risking my life and many others', and they seem to me disrespectful and largely uninterested in showing up to talk with the people whose lives they are risking.