LessWrong 2.0 Reader

Offering Completion
jefftk (jkaufman) · 2024-06-07T01:40:02.137Z · comments (6)

Experience Report - ML4Good AI Safety Bootcamp
Kieron Kretschmar · 2024-04-11T18:03:41.040Z · comments (0)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

[question] How does it feel to switch from earn-to-give?
Neil (neil-warren) · 2024-03-31T16:27:22.860Z · answers+comments (4)

Please Understand
samhealy · 2024-04-01T12:33:20.459Z · comments (11)

Scorable Functions: A Format for Algorithmic Forecasting
ozziegooen · 2024-05-21T04:14:11.749Z · comments (0)

D&D.Sci Hypersphere Analysis Part 1: Datafields & Preliminary Analysis
aphyer · 2024-01-13T20:16:39.480Z · comments (1)

[link] GDP per capita in 2050
Hauke Hillebrandt (hauke-hillebrandt) · 2024-05-06T15:14:30.934Z · comments (8)

End-to-end hacking with language models
tchauvin (timot.cool) · 2024-04-05T15:06:53.689Z · comments (0)

[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans
Thane Ruthenis · 2023-12-17T20:28:57.854Z · comments (7)

[question] Weighing reputational and moral consequences of leaving Russia or staying
spza · 2024-02-18T19:36:40.676Z · answers+comments (24)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

Quick Thoughts on Our First Sampling Run
jefftk (jkaufman) · 2024-05-23T00:20:02.050Z · comments (3)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau (jacob-pfau) · 2024-02-20T00:02:09.575Z · comments (6)

Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)


Recent comments

donatas-luciunas on Alignment is not intelligent

I am sure you can't prove your position. And I am sure I can prove my position.

Your reasoning is based on the assumption that all value is known. If the utility function assigns value to something, it is valuable. If the utility function does not assign value, it is not valuable. But the truth is that something might be valuable even though your utility function does not know it yet. It would be more intelligent to use three categories: valuable, not valuable, and unknown.

Let's say you are booking a flight and you can get checked baggage for free. To the best of your current knowledge, it is not relevant to you at all. But you understand that your knowledge might change, and it costs nothing to keep more options open, so you take the checked baggage.

Let's say you are a traveler, a wanderer. You have limited space in your backpack. Sometimes you find items and you need to choose whether to put them in the backpack or not. You definitely keep items that are useful. You leave behind items that are not useful. What do you do if you find an item whose usefulness is unknown? Some mysterious item. Take it if it is small, leave it if it is big? According to you, it is obvious to leave it. That does not sound intelligent to me.

We can draw a little decision matrix:

Leave item
- no burden 👍
- no opportunity to use it
Take item
- a burden 👎
- may be useful, may be harmful, may have no effect
- knowledge about usefulness of an item 👍

Don't you think that "knowledge about usefulness of an item" can sometimes be worth a burden?
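The matrix above can be sketched as a toy expected-value comparison. This is only an illustration: the function name, the probabilities, the payoffs, and the `info_value` term are all hypothetical numbers chosen for the example, not anything stated in the comment.

```python
def expected_value_of_taking(p_useful, gain, p_harmful, loss, burden, info_value=0.0):
    """Toy expected value of taking an item of unknown usefulness.

    All parameters are hypothetical: the chance and size of a benefit,
    the chance and size of a harm, the cost of carrying the item, and
    an optional value placed on learning what the item actually does.
    """
    return p_useful * gain - p_harmful * loss - burden + info_value


# Leaving the item has no burden and no opportunity: value 0 by convention.
EV_LEAVE = 0.0

# Made-up numbers for a "mysterious item".
without_info = expected_value_of_taking(0.2, 10, 0.1, 5, burden=2)
with_info = expected_value_of_taking(0.2, 10, 0.1, 5, burden=2, info_value=1.0)

print("take" if without_info > EV_LEAVE else "leave")
print("take" if with_info > EV_LEAVE else "leave")
```

With these particular numbers, the burden outweighs the direct payoff, and it is only the value assigned to the knowledge itself that tips the choice toward taking the item.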

We are deep in a rabbit hole, but I hope you understand the importance. If intelligence and goals are coupled (and I believe they are), all current alignment research is dangerously misleading.

davidmanheim on (Salt) Water Gargling as an Antiviral

Do I understand correctly that the blue-green graph has a y-axis that goes above 100% median reduction, with error bars in that range? (This would happen if they estimated a proportion as a standard variable - not great practice, but I want to check that it is what happened.)
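For illustration, here is a minimal sketch of how that can happen, assuming a made-up sample proportion of 0.95 from 20 observations (these numbers are not from the post's data). The normal-approximation (Wald) interval treats the proportion as an unbounded variable and can extend past 100%, while the Wilson score interval stays within [0, 1].

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation interval; not constrained to [0, 1]."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval; always stays within [0, 1]."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical example values, chosen only to show the effect.
p_hat, n = 0.95, 20
print("Wald:  ", wald_interval(p_hat, n))
print("Wilson:", wilson_interval(p_hat, n))
```

Whether the graph in question was produced this way is exactly what the question above asks; the sketch only shows that the symptom is consistent with that explanation.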

q-home on Making a conservative case for alignment

Napoleon is merely an argument for "just because you strongly believe it, even if it is a statement about you, does not necessarily make it true".

When people make arguments, they often don't list all of the premises. That's not unique to trans discourse. Informal reasoning is hard to make fully explicit. "Your argument doesn't explicitly exclude every counterexample" is a pretty cheap counter-argument. What people experience is important evidence and an important factor; it's rational to bring it up instead of stopping yourself with "wait, I'm not allowed to bring that up unless I make an analytically bulletproof argument". For example, if you trust someone when they say they feel strongly about being a woman, there's no reason to suspect them of being a cosplayer who chases Twitter popularity.

I expect that you will disagree with a lot of this, and that's okay; I am not trying to convince you, just explaining my position.

I think I still don't understand the main conflict which bothers you. I thought it was "I'm not sure if trans people are deluded in some way (like Napoleons, but milder) or not". But now it seems like "I think some people really suffer and others just cosplay, the cosplayers take something away from true sufferers". What is taken away?

jblack on What epsilon do you subtract from "certainty" in your own probability estimates?

For all practical purposes, such credences don't matter. Such scenarios certainly can and do happen, but in almost all cases there's nothing you can do about them without exceeding your own bounded rationality and agency.

If the stakes are very high then it may make sense to consider the probability of some sort of trick, and attempt to get further evidence of the physical existence of the coin and that its current state matches what you are seeing.

There is essentially no point in assigning probabilities to hypotheses of failures of your mind itself. You can't reason your way out of serious mind malfunction using arithmetic. At best you could hope to recognize that it is malfunctioning, and try not to do anything that will make things worse. In the case of mental impairment severe enough to have false memories or sensations this blatant, a rational person should expect that a person so affected wouldn't be capable of correctly carrying out quantified Bayesian reasoning.

My own background credences are generally not insignificant for something like this or even stranger, but they play essentially zero role in my life and definitely not in any probability calculations. Such hypotheses are essentially untestable and unactionable.

arturo-macias on Arthropod (non) sentience

We are surprisingly high in forebrain neuron count:

https://en.m.wikipedia.org/wiki/List_of_animals_by_number_of_neurons

peterbarnett on Daniel Kokotajlo's Shortform

I've been playing around with Suno, inspired by this Rob Long Tweet: https://x.com/rgblong/status/1857233734640222364
I've been pretty shocked at how easily it makes music that I want to listen to (mainly slightly cringe midwest emo): https://suno.com/song/1a5a1edf-9711-4ca4-a2f7-ef814ca298b4

zero-contradictions on Eugenics Performed By A Blind, Idiot God

I don't believe that gene-editing is a viable solution to preventing dysgenics for the entire population.

Unregulated reproduction has the potential to harm others, so it's reasonable to regulate it.

seth-herd on Should you increase AI alignment funding, or increase AI regulation?

The reason this is a difficult question is that we don't know how hard alignment will be. Opinions from different people with best-in-class expertise and time-on-task disagree wildly.

Therefore I'd argue that we should throw effort and funding into resolving that question by putting the reasoning processes of the relevant experts to wider scrutiny, and do a more systematic job of evaluating them.

Funding comes from a different resource pool than regulation, so you might mean which one should get your advocacy efforts. The same arguments apply to both of them, and to the meta-alignment question.

seth-herd on How can we prevent AGI value drift?

I wish the odds for getting AGI into trustworthy hands were better. The source of my optimism is the hope that those hands just need to be decent - to have what I've conceptualized as a positive empathy-sadism balance. That's anyone who's not a total sociopath (lacking empathy and tending toward vengeance and competition) and/or sadist. I hope that about 90-99% of humanity would eventually make the world vastly better with their AGI, just because it's trivially easy for them to do, so it only requires the smallest bit of goodwill.

I wish I were more certain of that. I've tried to look a little at some historical examples of rulers born into power and with little risk of losing it. A disturbing number of them were quite callous rulers. They were usually surrounded by a group of advisors that got them to ignore the plight of the masses and focus on the concerns of an elite few. But this situation isn't analogous - once your AGI hits superintelligence, it would be trivially easy to both help the masses in profound ways, and pursue whatever crazy schemes you and your friends have come up with. Thus my limited optimism.

WRT the distributed power structure of Western governments: I think AGI would be placed under executive authority, like the armed forces, and the US president and those with similar roles in other countries would hold near-total power, should they choose to use it. They could transform democracies into dictatorships with ease. And we very much do continue to elect selfish and power-hungry individuals, some of whom probably actually have a negative empathy-sadism balance.

Looking back, I note that you said I argued for "good odds" while I said "decent odds". We may be in agreement on the odds.

But there's more to consider here. Thanks again for engaging; I'd like to get more discussion of this topic going. I doubt you or I are seeing all of the factors that will be obvious in retrospect yet.

leogao on leogao's Shortform

the most valuable part of a social event is often not the part that is ostensibly the most important, but rather the gaps between the main parts.

- at ML conferences, the headline keynotes and orals are usually the least useful part to go to; the random spontaneous hallway chats and dinners and afterparties are extremely valuable
- when doing an activity with friends, the activity itself is often of secondary importance. talking on the way to the activity, or in the gaps between doing the activity, carries a lot of the value
- at work, a lot of the best conversations happen not in scheduled 1:1s and group meetings, but in spontaneous hallway or dinner groups