LessWrong 2.0 Reader

View: New · Old · Top

Restrict date range: Today · This week · This month · Last three months · This year · All time

← previous page (newer posts) · next page (older posts) →

AI doing philosophy = AI generating hands?
Wei Dai (Wei_Dai) · 2024-01-15T09:04:39.659Z · comments (22)

1. The CAST Strategy
Max Harms (max-harms) · 2024-06-07T22:29:13.005Z · comments (19)

[link] The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots"
titotal (lombertini) · 2024-01-14T15:03:21.087Z · comments (6)

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu (wearsshoes) · 2024-06-25T01:35:54.064Z · comments (9)

The predictive power of dissipative adaptation
dr_s · 2023-12-17T14:01:31.568Z · comments (14)

AI #75: Math is Easier
Zvi · 2024-08-01T13:40:05.539Z · comments (25)

[link] Contra Scott on Abolishing the FDA
Maxwell Tabarrok (maxwell-tabarrok) · 2023-12-15T14:00:17.247Z · comments (3)

D&D.Sci(-fi): Colonizing the SuperHyperSphere
abstractapplic · 2024-01-12T23:36:54.248Z · comments (23)

[link] Bayesians Commit the Gambler's Fallacy
Kevin Dorst · 2024-01-07T12:54:59.939Z · comments (28)

Exercise: Planmaking, Surprise Anticipation, and "Baba is You"
Raemon · 2024-02-24T20:33:49.574Z · comments (19)

Decision Theory in Space
lsusr · 2024-08-18T07:02:11.847Z · comments (18)

[link] Metascience of the Vesuvius Challenge
Maxwell Tabarrok (maxwell-tabarrok) · 2024-03-30T12:02:38.978Z · comments (2)

Untrustworthy models: a frame for scheming evaluations
Olli Järviniemi (jarviniemi) · 2024-08-19T16:27:11.088Z · comments (3)

How to hire somebody better than yourself
lukehmiles (lcmgcd) · 2024-08-28T08:12:53.450Z · comments (5)

Some costs of superposition
Linda Linsefors · 2024-03-03T16:08:20.674Z · comments (11)

AI Safety 101 : Capabilities - Human Level AI, What? How? and When?
markov (markovial) · 2024-03-07T17:29:53.260Z · comments (8)

[link] If Clarity Seems Like Death to Them
Zack_M_Davis · 2023-12-30T17:40:42.622Z · comments (191)

All The Latest Human tFUS Studies
sarahconstantin · 2024-08-09T22:20:04.561Z · comments (2)

In Defense of Parselmouths
Screwtape · 2023-11-15T23:02:19.344Z · comments (10)

[link] Will releasing the weights of large language models grant widespread access to pandemic agents?
jefftk (jkaufman) · 2023-10-30T18:22:59.677Z · comments (25)

Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence
Towards_Keeperhood (Simon Skade) · 2024-05-06T17:09:10.729Z · comments (16)

Enriched tab is now the default LW Frontpage experience for logged-in users
Ruby · 2024-06-21T00:09:30.441Z · comments (27)

On OpenAI’s Model Spec
Zvi · 2024-06-21T13:00:03.014Z · comments (3)

AI #68: Remarkably Reasonable Reactions
Zvi · 2024-06-13T16:30:02.969Z · comments (11)

[link] For Civilization and Against Niceness
Gabriel Alfour (gabriel-alfour-1) · 2023-11-20T10:56:20.352Z · comments (14)

Saving the world sucks
Defective Altruism (Elijah Bodden) · 2024-01-10T05:55:46.504Z · comments (29)

AI #41: Bring in the Other Gemini
Zvi · 2023-12-07T15:10:05.552Z · comments (16)

Vaniver's thoughts on Anthropic's RSP
Vaniver · 2023-10-28T21:06:07.323Z · comments (4)

On the Proposed California SB 1047
Zvi · 2024-02-12T16:40:04.854Z · comments (18)

So You Created a Sociopath - New Book Announcement!
Garrett Baker (D0TheMath) · 2024-04-01T18:02:18.010Z · comments (3)

D&D.Sci: The Mad Tyrant's Pet Turtles [Evaluation and Ruleset]
abstractapplic · 2024-04-09T14:01:34.426Z · comments (6)

[Valence series] 4. Valence & Liking / Admiring
Steven Byrnes (steve2152) · 2024-06-10T14:19:51.194Z · comments (12)

Thoughts on "The Offense-Defense Balance Rarely Changes"
Cullen (Cullen_OKeefe) · 2024-02-12T03:26:50.662Z · comments (4)

Conflating value alignment and intent alignment is causing confusion
Seth Herd · 2024-09-05T16:39:51.967Z · comments (18)

[link] Michael Dickens' Caffeine Tolerance Research
niplav · 2024-09-04T15:41:53.343Z · comments (3)

Bounty for Evidence on Some of Palisade Research's Beliefs
benwr · 2024-09-23T20:01:20.917Z · comments (4)

[link] MIRI's September 2024 newsletter
Harlan · 2024-09-16T18:15:40.785Z · comments (0)

I finally got ChatGPT to sound like me
lsusr · 2024-09-17T09:39:59.415Z · comments (18)

Forecasting One-Shot Games
Raemon · 2024-08-31T23:10:05.475Z · comments (0)

We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap
johnswentworth · 2024-09-19T22:22:05.307Z · comments (47)

Work with me on agent foundations: independent fellowship
Alex_Altair · 2024-09-21T13:59:16.706Z · comments (5)

AI #80: Never Have I Ever
Zvi · 2024-09-10T17:50:08.074Z · comments (20)

[link] AlphaGeometry: An Olympiad-level AI system for geometry
alyssavance · 2024-01-17T17:17:30.913Z · comments (9)

[link] Loneliness and suicide mitigation for students using GPT3-enabled chatbots (survey of Replika users in Nature)
Kaj_Sotala · 2024-01-23T14:05:40.986Z · comments (2)

[link] Rational Animations' intro to mechanistic interpretability
Writer · 2024-06-14T16:10:57.015Z · comments (1)

A starting point for making sense of task structure (in machine learning)
Kaarel (kh) · 2024-02-24T01:51:49.227Z · comments (2)

On Tapping Out
Screwtape · 2023-11-17T03:23:55.880Z · comments (13)

[link] Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez (ethan-perez) · 2023-11-16T20:18:51.730Z · comments (3)

New intro textbook on AIXI
Alex_Altair · 2024-05-11T18:18:50.945Z · comments (5)

[link] How people stopped dying from diarrhea so much (& other life-saving decisions)
Writer · 2024-03-16T16:00:47.830Z · comments (0)

← previous page (newer posts) · next page (older posts) →

Archive

Recent comments

interstice on Shortform

Good post, it's underappreciated that a society of ideally rational people wouldn't have unsubsidized, real-money prediction markets.

unless you've actually got other people being wrong even in light of the new actors' information

Of course in real prediction markets this is exactly what we see. Maybe you could think of PMs as they exist not as something that would exist in an equilibrium of ideally rational agents, but as a method of moving our society closer to such an equilibrium, subsidized by the bets of systematically irrational people. It's not a perfect such method, but does have the advantage of simplicity. How many of these issues could be solved by subsidizing markets?

Discord Message

What discord is this, sounds cool.

yonge on D&D Sci Coliseum: Arena of Data

Looking at how various combinataions of race and class do against one another when their levels are the same there are clearly some combinations that do a lot better than others. Increasing the level helps to an extent, but the race/class combination looks like it is easily the most important factor. Special items do help, but less than the level. In most cases boots seem a bit more useful than gauntlets.

Manually scanning through the data suggests that the following combinations hopefully won't be too bad:

DWARF NINJA v HUMAN WARRIOR + 1 boots of speed + 2 gauntlets
ELF KNIGHT v HUMAN KNIGHT + 3 boots of speed +1 gauntlets
DWARF WARRIOR v ELF NINJA + 3 gauntlets + 2 boots of speed
HUMAN MONK v DWARF MONK + 4 boots of speed

As for the bonus obective. It looks like someone is threatening us if we do well in one or more of the fights, however I can't establish the details with any reliability. And as we have no idea who sent, and what their real intentions are, we would probably be wise to ignore it ad do what we would have done anyway, at least for now.

akash-wasil on What AI companies should do: Some rough ideas

TY. Some quick reactions below:

Agree that the public stuff has immediate effects that could be costly. (Hiding stuff from the public, refraining from discussing important concerns publicly, or developing a reputation for being kinda secretive/sus can also be costly; seems like an overall complex thing to model IMO.)
Sharing info with government could increase the chance of a leak, especially if security isn't great. I expect the most relevant info is info that wouldn't be all-that-costly if leaked (e.g., the government doesn't need OpenAI to share its secret sauce/algorithmic secrets. Dangerous capability eval results leaking or capability forecasts leaking seem less costly, except from a "maybe people will respond by demanding more govt oversight" POV.
I think all-in-all I still see the main cost as making safety regulation more likely, but I'm more uncertain now, and this doesn't seem like a particularly important/decision-relevant point. Will edit the OG comment to language that I endorse with more confidence.

zach-stein-perlman on IAPS: Mapping Technical Safety Research at AI Companies

@Zoe Williams [LW · GW]

arthur-conmy on IAPS: Mapping Technical Safety Research at AI Companies

I assume all the data is fairly noisy, since scanning for the domain I know in https://raw.githubusercontent.com/Oscar-Delaney/safe_AI_papers/refs/heads/main/Automated%20categorization/final_output.csv, it misses ~half of the GDM Mech Interp output from the specified window and also mislabels https://arxiv.org/abs/2208.08345 and https://arxiv.org/abs/2407.13692 as Mech Interp (though two labels are applied to these papers and I didn't dig to see which was used)

joao-ribeiro-medeiros on A Logical Proof for the Emergence and Substrate Independence of Sentience

I think this a very simple and likewise powerful articulation of anti-anthropocentrism, which I fully support.

I am particularly on board with "Sentience doesn’t require [...] any particular physical substrate beyond something capable of modeling these patterns."

To correctly characterize the proof as 100% logical I think there is still some room for explicitly stating rigorous definitions of the concepts involved such as sentience.

Thank you for sharing!

matthew-barnett on Alexander Gietelink Oldenziel's Shortform

Does trade here just means humans consuming, I.e. trading money for AI goods and services? That doesn't sound like trading in the usual sense where it is a reciprocal exchange of goods and services.

Trade can involve anything that someone "owns", which includes both their labor and their property, and government welfare. Retired people are generally characterized by trading their property and government welfare for goods and services, rather than primarily trading their labor. This is the basic picture I was trying to present.

How many 'different' AI individuals do you expect there to be ?

I think the answer to this question depends on how we individuate AIs. I don't think most AIs will be as cleanly separable from each other as humans are, as most (non-robotic) AIs will lack bodies, and will be able to share information with each other more easily than humans can. It's a bit like asking how many "ant units" there are. There are many individual ants per colony, but each colony can be treated as a unit by itself. I suppose the real answer is that it depends on context and what you're trying to figure out by asking the question.

ryan_greenblatt on What AI companies should do: Some rough ideas

It depends on the type of transparency.

Requirements which just involve informing one part of the government (say US AISI) in ways which don't cost much personnel time mostly have the effect of potentially making the government much more aware at some point. I think this is probably good, but presumably labs would prefer to retain flexibility and making the government more aware can go wrong (from the lab's perspective) in ways other than safety focused regulation. (E.g., causing labs to be merged as part of a national program to advance capabilities more quickly.)

Whistleblower requirements with teeth can have information leakage concerns. (Internal only whistleblowering policies like Anthropic's don't have this issue, but also have basically no teeth for the company overall.)

Any sort of public discussion (about e.g. threat models, government involvement, risks) can have various PR and reputational cost. (Idk if you were counting this under transparency.)

To the extent you expect to disagree with third party inspectors about safety (and they aren't totally beholden to you), this might end up causing issues for you later.

I'm not claiming that "reasonable" labs shouldn't do various types of transparency unilaterially, but I don't think the main cost is in making safety focused regulation more likely.

akash-wasil on What AI companies should do: Some rough ideas

@Zach Stein-Perlman [LW · GW] @ryan_greenblatt [LW · GW] feel free to ignore but I'd be curious for one of you to explain your disagree react. Feel free to articulate some of the ways in which you think I might be underestimating the costliness of transparency requirements.

(My current estimate is that whistleblower mechanisms seem very easy to maintain, reporting requirements for natsec capabilities seem relatively easy insofar as most of the information is stuff you already planned to collect, and even many of the more involved transparency ideas (e.g., interview programs) seem like they could be implemented with pretty minimal time-cost.)

akash-wasil on Lab governance reading list

Yeah, I think there's a useful distinction between two different kinds of "critiques:"

Critique #1: I have reviewed the preparedness framework and I think the threshold for "high-risk" in the model autonomy category is too high. Here's an alternative threshold.
Critique #2: The entire RSP/PF effort is not going to work because [they're too vague//labs don't want to make them more specific//they're being used for safety-washing//labs will break or weaken the RSPs//race dynamics will force labs to break RSPs//labs cannot be trusted to make or follow RSPs that are sufficiently strong/specific/verifiable].

I feel like critique #1 falls more neatly into "this counts as lab governance" whereas IMO critique #2 falls more into "this is a critique of lab governance." In practice the lines blur. For example, I think last year there was a lot more "critique #1" style stuff, and then over time as the list of specific object-level critiques grew, we started to see more support for things in the "critique #2" bucket.