LessWrong 2.0 Reader

[link] My MATS Summer 2023 experience
James Chua (james-chua) · 2024-03-20T11:26:14.944Z · comments (0)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

Glomarization FAQ
Zane · 2023-11-15T20:20:49.488Z · comments (5)

End-to-end hacking with language models
tchauvin (timot.cool) · 2024-04-05T15:06:53.689Z · comments (0)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

[link] Debate helps supervise human experts [Paper]
habryka (habryka4) · 2023-11-17T05:25:17.030Z · comments (6)

Scorable Functions: A Format for Algorithmic Forecasting
ozziegooen · 2024-05-21T04:14:11.749Z · comments (0)

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans
Thane Ruthenis · 2023-12-17T20:28:57.854Z · comments (7)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

Throughput vs. Latency
alkjash · 2024-01-12T21:37:07.632Z · comments (2)

Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

Non-myopia stories
lberglund (brglnd) · 2023-11-13T17:52:31.933Z · comments (10)

[link] The Poker Theory of Poker Night
omark · 2024-04-07T09:47:01.658Z · comments (13)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 2024-05-22T11:09:56.469Z · comments (6)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)

[link] AI Impacts 2023 Expert Survey on Progress in AI
habryka (habryka4) · 2024-01-05T19:42:17.226Z · comments (1)

Can quantised autoencoders find and interpret circuits in language models?
charlieoneill (kingchucky211) · 2024-03-24T20:05:50.125Z · comments (4)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (70)

[link] Cellular reprogramming, pneumatic launch systems, and terraforming Mars: Some things I learned about at Foresight Vision Weekend
jasoncrawford · 2024-01-04T19:33:57.887Z · comments (0)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

3. Premise three & Conclusion: AI systems can affect value change trajectories & the Value Change Problem
Nora_Ammann · 2023-10-26T14:38:14.916Z · comments (4)

Ackshually, many worlds is wrong
tailcalled · 2024-04-11T20:23:59.416Z · comments (42)

[link] Memo on some neglected topics
Lukas Finnveden (Lanrian) · 2023-11-11T02:01:55.834Z · comments (2)

Heuristics for preventing major life mistakes
SK2 (lunchbox) · 2023-12-20T08:01:09.340Z · comments (2)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

Deconfusing “ontology” in AI alignment
Dylan Bowman (dylan-bowman) · 2023-11-08T20:03:43.205Z · comments (3)

AI #65: I Spy With My AI
Zvi · 2024-05-23T12:40:02.793Z · comments (7)

[link] Quick Thoughts on Scaling Monosemanticity
Joel Burget (joel-burget) · 2024-05-23T16:22:48.035Z · comments (1)

Escaping Skeuomorphism
Stuart Johnson (stuart-johnson) · 2023-12-20T03:51:00.489Z · comments (0)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau (jacob-pfau) · 2024-02-20T00:02:09.575Z · comments (6)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

Recent comments

christiankl on What are some good ways to form opinions on controversial subjects in the current and upcoming era?

The idea that either A, B or something in between has to be right is wrong for many political issues. It's possible that both A and B are wrong. I don't see why one would start with a different assumption.

For most issues, you are not required to have an opinion, and it's often better to focus your energies on issues where you have unique insight or power to affect the issue than on national-level political issues where you have neither unique insight nor power to influence them in a meaningful way.

Foreign actors will attempt to push people on twitter/reddit/etc. towards either (1) or (5), even if the answer is really (3) for them. Everyone I interact with is either partially influenced by these actors or discusses their opinions with people who are influenced by these actors.

Why do you consider it better to be manipulated by domestic actors than foreign actors? Why does it matter whether the actors are foreign?

ronny-fernandez on Lighthaven Sequences Reading Group #7 (Tuesday 10/22)

There is! It is now posted! Sorry about the delay.

zy on davekasten's Shortform

Does "highest status" here mean highest expertise in a domain as generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or being from more economically powerful countries, etc.? It is also good to note that sometimes the "status" is dynamic, and may or may not imply anything causal about their decision making or choice of priorities.

One scenario is that "higher status" might correlate with better resources to achieve those statuses, and thus possibly that they haven't experienced, or are not subject to, many near-term harms. In other words, it is not really about the difference in intelligence between "average" and "high status" people, but more about what kind of world they are exposed to.

I do think it is good to hear all different perspectives to stay curious/open-minded.

edit: I just saw Dragon nicely listed two potential reasons, with scenario 2 mentioning something similar to my comment here. But one thing slightly more specific in my thinking is that these choices made by "average" and "high status" people may or may not be conscious, and may instead come from their life experience and the world they are exposed to.

raemon on johnswentworth's Shortform

I have similar tastes, but, some additional gears:

I think all day, these days. Even if I'm trying to have interesting, purposeful conversations with people who also want that, it is useful to have sorts of things to talk about that let some parts of my brain relax (while using other parts of my brain I don't use as much).
On the margin, you can do an intense intellectual conversation, but still make it funnier, or with more opportunity for people to contribute.

martinkunev on Generalizing Foundations of Decision Theory

up to a linear transformation

Shouldn't it be a positive linear transformation?

momom2 on Why is there Nothing rather than Something?

I enjoy reading any kind of cogent fiction on LW, but this one is a bit too undeveloped for my tastes. Perhaps be more explicit about what Myrkina sees in the discussion which relates to our world?
You don't have to always spell earth-shattering revelations out loud (in fact it's best to let the readers reach the correct conclusion by themselves imo), but there needs to be enough narrative tension to make the conclusion inevitable; as it stands, it feels like I can just meh my way out of thinking more than 30s on what the revelation might be, the same way Tralith does.

anders-lindstroem on Anders Lindström's Shortform

I am so thrilled! Daylight saving time got me to experience (kind of) the sleeping beauty problem first hand.

Last night we in Sweden changed our clocks back one hour, from 03.00 to 02.00, and went from “summertime” to the dreaded “wintertime”. It’s dreaded because we know what follows with it: ice storms and polar bears in the streets...

Anyways, I woke up in the middle of the night and I reached for my phone to check what time it was. It was 02.50. Then it struck me. Am I experiencing the first 02.50 or the second 02.50 this night? That is, have I first slept to 03, then the clock has changed back to 02 (which it automatically does on the phone), and then slept until 02.50 the new time, or am I on the first 02.50, and in 10 minutes at 03 the clock will switch back to 02?

It was a very dizzying thought. I could not for the life of me say which. There was nothing in the dark that could give me any indication whether I was experiencing the first or the second 02.50. Then, with my thoughts spinning, I slowly waited for the clock on my phone to turn 03. When it did, it did not go back to 02; I had experienced the second 02.50 that night.

ninety-three on The Summoned Heroine's Prediction Markets Keep Providing Financial Services To The Demon King!

They made it so the sociopath at the top of the pyramid was the kind that’s clever and myopic and numerate and invested in the status quo

The word "myopic" seems out of place in this list of positive descriptors, especially contrasted with crazed gloryhounds. Was this supposed to be "farsighted"?

viliam on cryonics is a pascal's mugging?

Well, there are different opinions on the possibility of reconstructing a person. Some people here would agree with you. I am afraid that there will not be enough evidence left to reconstruct the person, even if we had all their writings, and we usually don't have even that.

christiankl on Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

I did talk with Geoff Anders about this. He told me that there's no legal agreement between CEA and Leverage. However, there are Leverage employees who are ex-CEA and thus bound by legal agreement. Geoff himself said that he would consider it positive for the information to be public, but he would not want to pick another fight with CEA by publicly talking about what happened.