LessWrong 2.0 Reader

[question] Potential alignment targets for a sovereign superintelligent AI
Paul Colognese (paul-colognese) · 2023-10-03T15:09:59.529Z · answers+comments (4)

Non-myopia stories
[deleted] · 2023-11-13T17:52:31.933Z · comments (10)

[link] Debate helps supervise human experts [Paper]
habryka (habryka4) · 2023-11-17T05:25:17.030Z · comments (6)

Deception Chess: Game #2
Zane · 2023-11-29T02:43:22.375Z · comments (17)

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)

Results from the Turing Seminar hackathon
Charbel-Raphaël (charbel-raphael-segerie) · 2023-12-07T14:50:38.377Z · comments (1)

A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans
Thane Ruthenis · 2023-12-17T20:28:57.854Z · comments (7)

Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)

Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell · 2024-04-15T18:21:48.771Z · comments (7)

Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)

AI #61: Meta Trouble
Zvi · 2024-05-02T18:40:03.242Z · comments (0)

Weekly newsletter for AI safety events and training programs
Bryce Robertson (bryceerobertson) · 2024-05-03T00:33:29.418Z · comments (0)

Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)

Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)

Offering Completion
jefftk (jkaufman) · 2024-06-07T01:40:02.137Z · comments (6)

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

A Qualitative Case for LTFF: Filling Critical Ecosystem Gaps
Linch · 2024-11-18T00:44:57.133Z · comments (2)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

Winning isn't enough
JesseClifton · 2024-11-05T11:37:39.486Z · comments (14)

Trading Candy
jefftk (jkaufman) · 2024-11-01T01:10:08.024Z · comments (4)

Escaping Skeuomorphism
Stuart Johnson (stuart-johnson) · 2023-12-20T03:51:00.489Z · comments (0)

Evaporation of improvements
Viliam · 2024-06-20T18:34:40.969Z · comments (27)

[question] How did you integrate voice-to-text AI into your workflow?
ChristianKl · 2023-11-20T12:01:37.696Z · answers+comments (12)

Solstice 2023 Roundup
dspeyer · 2023-10-11T23:09:08.252Z · comments (6)

Monthly Roundup #19: June 2024
Zvi · 2024-06-25T12:00:03.333Z · comments (9)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

Childhood and Education Roundup #6: College Edition
Zvi · 2024-06-26T11:40:03.990Z · comments (8)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

Heuristics for preventing major life mistakes
SK2 (lunchbox) · 2023-12-20T08:01:09.340Z · comments (2)

Aggregative principles approximate utilitarian principles
Cleo Nardo (strawberry calm) · 2024-06-12T16:27:22.179Z · comments (3)

Deconfusing “ontology” in AI alignment
Dylan Bowman (dylan-bowman) · 2023-11-08T20:03:43.205Z · comments (3)

Updates to Open Phil’s career development and transition funding program
abergal · 2023-12-04T18:10:29.394Z · comments (0)

{Book Summary} The Art of Gathering
Tristan Williams (tristan-williams) · 2024-04-16T10:48:41.528Z · comments (0)

An explanation of evil in an organized world
KatjaGrace · 2024-05-02T05:20:06.240Z · comments (9)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

Employee Incentives Make AGI Lab Pauses More Costly
nikola (nikolaisalreadytaken) · 2023-12-22T05:04:15.598Z · comments (12)

Cryonics p(success) estimates are only weakly associated with interest in pursuing cryonics in the LW 2023 Survey
Andy_McKenzie · 2024-02-29T14:47:28.613Z · comments (6)

[link] Memo on some neglected topics
Lukas Finnveden (Lanrian) · 2023-11-11T02:01:55.834Z · comments (2)

[link] Conversation Visualizer
ethanmorse · 2023-12-31T01:18:01.424Z · comments (4)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

Can quantised autoencoders find and interpret circuits in language models?
charlieoneill (kingchucky211) · 2024-03-24T20:05:50.125Z · comments (4)

Tackling Moloch: How YouCongress Offers a Novel Coordination Mechanism
Hector Perez Arenas (hector-perez-arenas) · 2024-05-15T23:13:48.501Z · comments (9)

Collection (Part 6 of "The Sense Of Physical Necessity")
LoganStrohl (BrienneYudkowsky) · 2024-03-14T21:37:00.160Z · comments (0)

Online Dialogues Party — Sunday 5th November
Ben Pace (Benito) · 2023-10-27T02:41:00.506Z · comments (1)

AI #64: Feel the Mundane Utility
Zvi · 2024-05-16T15:20:02.956Z · comments (11)

3. Premise three & Conclusion: AI systems can affect value change trajectories & the Value Change Problem
Nora_Ammann · 2023-10-26T14:38:14.916Z · comments (4)

Auditing LMs with counterfactual search: a tool for control and ELK
Jacob Pfau (jacob-pfau) · 2024-02-20T00:02:09.575Z · comments (6)

Recent comments

lblack on Alexander Gietelink Oldenziel's Shortform

for a large enough (overparameterized) architecture - in other words it can be measured by the

The sentence seems cut off.

gunnar_zarncke on OpenAI Email Archives (from Musk v. Altman)

A much smaller subset was also published here, but does include documents:

https://www.techemails.com/p/elon-musk-and-openai?r=1jki4r

rotatingpaguro on AI #90: The Wall

I agree with what you say about how to maximize what you get out of an interview. I also agree about that discussion vs. debate distinction you make, and I wasn't specifically trying to go there when I used the word "debate", I was just sloppy with words.

I guess you agree that it is friction to create a social norm that you should read up on the other person's material before engaging in public. I expect fewer discussions would happen. There is not a clear threshold for how much you should be prepared.

I guess we disagree about how much value we lose due to eliminating discussions that could have happened, vs. how much value we gain by eliminating some lower quality discussions.

Another angle I have in mind that sidesteps this direct compromise, is that maybe what we value out of such discussions is not just doing an optimal play in terms of information transmitted between the parties. A public discussion has many different viewers. In the case at hand, I expect many people get more out of the discussion if they can see Wolfram think through the thing for the first time in real time, rather than having two informed people start discussing finer points in medias res.

viliam on Neutrality

Library in the sense of "we collect texts written by other people" is: The Best Textbooks on Every Subject [LW · GW]

I would like to see this one improved; specifically to have a dedicated UI where people can add books, vote on books, and review them. Maybe something like "people who liked X also liked Y".

Also, not just textbooks, but also good popular science books, etc.

gerardus-mercator on Claude seems to be smarter than LessWrong community

I see those assertions, but I don't see why an intelligent agent would be persuaded by them. Why would it think that the hypothetical objective goal is better than its utility function? Caring about objective facts and investigating them is also an instrumental goal compared to the terminal goal of optimizing its utility function. The agent's only frame of reference for 'better' and 'worse' is relative to its utility function; it would presumably understand that there are other frames of reference, but I don't think it would apply them, because that would lead to a worse outcome according to its current frame of reference.

dakara on Simple probes can catch sleeper agents

I am also interested in knowing whether the probing method is a solution to the undetectable backdoor problem.

dakara on Simple probes can catch sleeper agents

This paper argues that unintended deceptive behavior is not susceptible to detection by the probing method. The authors of that paper argue that the probing method fares no better than random guessing for detecting unintended deceptive behavior.

I would really appreciate any input, especially from Monte or his co-authors. This seems like a very important issue to address.

dr_s on Neutrality

Agree 100% with all of this.

There is one thing that comes to mind IMO and that people who argue that "everything is political" and that neutrality is an evil ploy to actually sneak in your evil ideas really underestimate: the point of impartiality as you describe it is to keep things simpler. Maybe a God with an infinite mind could keep in it all the issues, all the complexities, all the nuances simultaneously, and continuously figure out the optimal path. But we can't. We come up with simple rules like "if you're a doctor, you have a duty to cure anyone, not pick and choose" because they make things more straightforward and decouple domains. Doctors cure people. If you do crimes, there's a system dedicated to punishing you. But a doctor's job is different, and the knowledge they need to do it has nothing to do with your rap sheet.

The frenzy to couple everything into a single tangle of complexity is driven by the misunderstanding that complacency is the only reason why your ideology is not the winning one, and that if only everyone was forced to think about it all of the time, they'd end up agreeing with it. But in reality, decoupling is necessary mostly because it allows the world to be cognitively accessible rather than driving us into either perpetual decision paralysis or perpetual paranoia (or worse, both). Destroying that doesn't give anyone victory, we just end up all worse off.

satron on Sabotage Evaluations for Frontier Models

Sure, it sounds like a good idea! Below I will write my thoughts on your overall summarized position.

———

"I primarily wish to argue that, given the general lack of accountability for developing machine learning systems in worlds where indeed the default outcome is doom, it should not be surprising to find out that there is a large corporation (or multiple) doing so."

I do think that I could maybe agree with this if it was 1 small corporation. In your previous comment you suggested that you are describing not the intentional contribution to the omnicide, but the bit of rationalization. I don't think I would agree that that many people working on AI are successfully engaged in that bit of rationalization or that it would be enough to keep them doing it. The big factor is that in case of their failure, they personally (and all of their loved ones) will suffer the consequences.

"It is also not surprising that glory-seeking companies have large departments focused on 'ethics' and 'safety' in order to look respectable to such people."

I don't disagree with this, because it seems plausible that one of the reasons for creating safety departments is ulterior. However, I believe that this reason is probably not the main one and that AI safety labs are making genuinely good research papers. To take Anthropic as an example, I've seen safety papers that got the LessWrong community excited (at least judging by upvotes). Like this [LW · GW] one.

"I believe that there is no good plan and that these companies would exist regardless of whether a good plan existed or not... I believe that the people involved are getting rich risking all of our lives and there is (currently) no justice here"

For the reasons that I mentioned in my first paragraph, I would probably disagree with this. Relatedly, while I do think wealth in general can be somewhat motivating, I also think that AI developers are aware that all their wealth would mean nothing if AI kills everyone.

———

Overall, I am really happy with this discussion. Our disagreements came down to a few points and we agree on quite a few issues. I am similarly happy to conclude this big comment thread.

anders-lindstroem on The Online Sports Gambling Experiment Has Failed

Good write up! People Cannot Handle "fill in the blank" on smartphones. Sex, food, drugs, social status, betting, binge watching, shopping etc. in abundance and a click away is something we cannot handle. If some of the biggest corporations in the world spend billions upon billions each year to grab our attention, they will win and "you" will on average lose, unless you pull the cord (or turn off the wifi...) or have extreme willpower.

I am definitely not the one to throw the first rock, but is it not pretty embarrassing that most of us who thought we were so smart and independent are mere serfs, both intellectually and physically, to a little piece of electronics that has completely and utterly hijacked our brains and bodies?