LessWrong 2.0 Reader

AI #59: Model Updates
Zvi · 2024-04-11T14:20:06.339Z · comments (2)
AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)
Sparse MLP Distillation
slavachalnev · 2024-01-15T19:39:02.926Z · comments (3)
Interpreting the Learning of Deceit
RogerDearnaley (roger-d-1) · 2023-12-18T08:12:39.682Z · comments (14)
[link] There is no IQ for AI
Gabriel Alfour (gabriel-alfour-1) · 2023-11-27T18:21:26.196Z · comments (10)
RA Bounty: Looking for feedback on screenplay about AI Risk
Writer · 2023-10-26T13:23:02.806Z · comments (6)
Announcing SPAR Summer 2024!
laurenmarie12 · 2024-04-16T08:30:31.339Z · comments (2)
Information-Theoretic Boxing of Superintelligences
JustinShovelain · 2023-11-30T14:31:11.798Z · comments (0)
[link] 2024 State of the AI Regulatory Landscape
Deric Cheng (deric-cheng) · 2024-05-28T11:59:06.582Z · comments (0)
Some additional SAE thoughts
Hoagy · 2024-01-13T19:31:40.089Z · comments (4)
[link] Evaluating Stability of Unreflective Alignment
james.lucassen · 2024-02-01T22:15:40.902Z · comments (3)
Understanding Subjective Probabilities
Isaac King (KingSupernova) · 2023-12-10T06:03:27.958Z · comments (16)
[link] Managing AI Risks in an Era of Rapid Progress
Algon · 2023-10-28T15:48:25.029Z · comments (3)
"Full Automation" is a Slippery Metric
ozziegooen · 2024-06-11T19:56:49.855Z · comments (1)
Against "argument from overhang risk"
RobertM (T3t) · 2024-05-16T04:44:00.318Z · comments (11)
[link] AISN #28: Center for AI Safety 2023 Year in Review
aogara (Aidan O'Gara) · 2023-12-23T21:31:40.767Z · comments (1)
Running the Numbers on a Heat Pump
jefftk (jkaufman) · 2024-02-09T03:00:04.920Z · comments (12)
Verifiable private execution of machine learning models with Risc0?
mako yass (MakoYass) · 2023-10-25T00:44:48.643Z · comments (2)
The Intentional Stance, LLMs Edition
Eleni Angelou (ea-1) · 2024-04-30T17:12:29.005Z · comments (3)
Interpreting Quantum Mechanics in Infra-Bayesian Physicalism
Yegreg · 2024-02-12T18:56:03.967Z · comments (6)
[link] The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · 2023-11-29T18:30:36.315Z · comments (1)
Adversarial Robustness Could Help Prevent Catastrophic Misuse
aogara (Aidan O'Gara) · 2023-12-11T19:12:26.956Z · comments (18)
[link] Big tech transitions are slow (with implications for AI)
jasoncrawford · 2024-10-24T14:25:06.873Z · comments (14)
[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)
[link] [Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang (leon-lang) · 2024-10-22T13:57:41.125Z · comments (0)
[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)
AI #85: AI Wins the Nobel Prize
Zvi · 2024-10-10T13:40:07.286Z · comments (6)
Fun With CellxGene
sarahconstantin · 2024-09-06T22:00:03.461Z · comments (2)
[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (1)
Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)
[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)
[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)
[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)
[link] Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-05-20T04:14:44.435Z · comments (21)
I played the AI box game as the Gatekeeper — and lost
datawitch · 2024-02-12T18:39:35.777Z · comments (52)
Aggregative Principles of Social Justice
Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · comments (10)
Big-endian is better than little-endian
Menotim · 2024-04-29T02:30:48.053Z · comments (17)
DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled · 2024-06-10T21:20:11.938Z · comments (13)
[link] Abs-E (or, speak only in the positive)
dkl9 · 2024-02-19T21:14:32.095Z · comments (24)
Investigating Bias Representations in LLMs via Activation Steering
DawnLu · 2024-01-15T19:39:14.077Z · comments (4)
Glomarization FAQ
Zane · 2023-11-15T20:20:49.488Z · comments (5)
[link] Debate helps supervise human experts [Paper]
habryka (habryka4) · 2023-11-17T05:25:17.030Z · comments (6)
A Common-Sense Case For Mutually-Misaligned AGIs Allying Against Humans
Thane Ruthenis · 2023-12-17T20:28:57.854Z · comments (7)
Wholesome Culture
owencb · 2024-03-01T12:08:17.877Z · comments (3)
Results from the Turing Seminar hackathon
Charbel-Raphaël (charbel-raphael-segerie) · 2023-12-07T14:50:38.377Z · comments (1)
Deception Chess: Game #2
Zane · 2023-11-29T02:43:22.375Z · comments (17)
Reviewing the Structure of Current AI Regulations
Deric Cheng (deric-cheng) · 2024-05-07T12:34:17.820Z · comments (0)
“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith (joekc) · 2023-11-29T16:32:30.068Z · comments (1)
Throughput vs. Latency
alkjash · 2024-01-12T21:37:07.632Z · comments (2)
Non-myopia stories
lberglund (brglnd) · 2023-11-13T17:52:31.933Z · comments (10)