LessWrong 2.0 Reader

Empathy/Systemizing Quotient is a poor/biased model for the autism/sex link
tailcalled · 2024-11-04T21:11:57.788Z · comments (0)

Some comments on intelligence
Viliam · 2024-08-01T15:17:07.215Z · comments (5)

Housing Roundup #10
Zvi · 2024-10-29T13:50:09.416Z · comments (2)

SAE Probing: What is it good for? Absolutely something!
Subhash Kantamneni (subhashk) · 2024-11-01T19:23:55.418Z · comments (0)

AI #74: GPT-4o Mini Me and Llama 3
Zvi · 2024-07-25T13:50:06.528Z · comments (6)

AI Constitutions are a tool to reduce societal scale risk
Sammy Martin (SDM) · 2024-07-25T11:18:17.826Z · comments (2)

Winning isn't enough
Anthony DiGiovanni (antimonyanthony) · 2024-11-05T11:37:39.486Z · comments (2)

A path to human autonomy
Nathan Helm-Burger (nathan-helm-burger) · 2024-10-29T03:02:42.475Z · comments (11)

[link] Safety tax functions
owencb · 2024-10-20T14:08:38.099Z · comments (0)

[LDSL#4] Root cause analysis versus effect size estimation
tailcalled · 2024-08-11T16:12:14.604Z · comments (0)

[question] Where to find reliable reviews of AI products?
Elizabeth (pktechgirl) · 2024-09-17T23:48:25.899Z · answers+comments (6)

[question] What Other Lines of Work are Safe from AI Automation?
RogerDearnaley (roger-d-1) · 2024-07-11T10:01:12.616Z · answers+comments (35)

Examples of How I Use LLMs
jefftk (jkaufman) · 2024-10-14T17:10:04.597Z · comments (2)

AI labs can boost external safety research
Zach Stein-Perlman · 2024-07-31T19:30:16.207Z · comments (1)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger (jeffrey-heninger) · 2024-07-15T21:30:04.043Z · comments (1)

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean (euanmclean) · 2024-10-29T12:16:18.448Z · comments (7)

[link] Liquid vs Illiquid Careers
vaishnav92 · 2024-10-20T23:03:49.725Z · comments (6)

[link] AI forecasting bots incoming
Dan H (dan-hendrycks) · 2024-09-09T19:14:31.050Z · comments (44)

[link] [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs
Yohan Mathew (ymath) · 2024-09-25T14:52:48.263Z · comments (2)

The Shallow Bench
Karl Faulks (karl-faulks) · 2024-11-05T05:07:27.357Z · comments (2)

[link] My Methodological Turn
adamShimi · 2024-09-29T15:01:45.986Z · comments (0)

Bay Winter Solstice 2024: Speech Auditions
ozymandias · 2024-11-04T22:31:38.680Z · comments (0)

[link] A new process for mapping discussions
Nathan Young · 2024-09-30T08:57:20.029Z · comments (7)

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (8)

[link] Arithmetic Models: Better Than You Think
kqr · 2024-10-26T09:42:07.185Z · comments (4)

DIY RLHF: A simple implementation for hands on experience
Mike Vaiana (mike-vaiana) · 2024-07-10T12:07:03.047Z · comments (0)

Towards Quantitative AI Risk Management
Henry Papadatos (henry) · 2024-10-16T19:26:48.817Z · comments (1)

Open Thread Fall 2024
habryka (habryka4) · 2024-10-05T22:28:50.398Z · comments (95)

Reading More Each Day: A Simple $35 Tool
aysajan · 2024-07-24T13:54:04.290Z · comments (2)

[link] Our Digital and Biological Children
Eneasz · 2024-10-24T18:36:38.719Z · comments (0)

[link] New blog: Expedition to the Far Lands
Connor Leahy (NPCollapse) · 2024-08-17T11:07:48.537Z · comments (3)

Context-dependent consequentialism
Jeremy Gillen (jeremy-gillen) · 2024-11-04T09:29:24.310Z · comments (1)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (22)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

Recent comments

jbash on Update on the Mysterious Trump Buyers on Polymarket

Can't this only be judged in retrospect, and over a decent sample size?

The model that makes you hope for accuracy from the market is that it aggregates the information, including non-public information, available to a large number of people who are doing their best to maximize profits in a reasonable VNM-ish rational way.

In this case, everybody seems pretty sure that the price is where it is because of the actions of a single person who's dumped in a very large amount of money relative to the float. It seems likely that that person has done this despite having no access to any important non-public information about the actual election. For one thing, they've said that they're dumping all of their liquidity into bets on Trump. Not just all the money they already have allocated to semi-recreational betting, or even all the money they have allocated to speculative long shots in general, but their entire personal liquidity. That suggests a degree of certainty that almost no plausible non-public information could actually justify.

Not only that, but apparently they've done it in a way calculated to maximally move the price, which is the opposite of what you'd expect a profit maximizer to want to do given their ongoing buying and their (I think) stated and (definitely at this point) evidenced intention to hold until the market resolves.
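As a rough illustration of why a single large buyer can move a thin market so far, here is a minimal sketch using Hanson's logarithmic market scoring rule (LMSR) as a stand-in market maker. This is an assumption for illustration only: it does not model Polymarket's actual trading mechanism, and the liquidity parameter `b` and the 200-share purchase are hypothetical numbers.

```python
import math


def lmsr_cost(q_yes: float, q_no: float, b: float) -> float:
    """LMSR cost function C(q) = b * ln(exp(q_yes/b) + exp(q_no/b))."""
    m = max(q_yes, q_no)  # factor out the max so exp() cannot overflow
    return m + b * math.log(math.exp((q_yes - m) / b) + math.exp((q_no - m) / b))


def lmsr_yes_price(q_yes: float, q_no: float, b: float) -> float:
    """Instantaneous YES price: the softmax weight of the YES share count."""
    m = max(q_yes, q_no)
    e_yes = math.exp((q_yes - m) / b)
    e_no = math.exp((q_no - m) / b)
    return e_yes / (e_yes + e_no)


def buy_yes(q_yes: float, q_no: float, shares: float, b: float):
    """Buy `shares` YES shares; return (amount paid, new YES price)."""
    paid = lmsr_cost(q_yes + shares, q_no, b) - lmsr_cost(q_yes, q_no, b)
    return paid, lmsr_yes_price(q_yes + shares, q_no, b)


# The same hypothetical 200-share purchase in a thin market (b=100) and a deep
# one (b=10_000), both starting from an even 50/50 market.
thin_paid, thin_price = buy_yes(0.0, 0.0, 200.0, b=100.0)
deep_paid, deep_price = buy_yes(0.0, 0.0, 200.0, b=10_000.0)
print(f"thin: paid {thin_paid:.1f}, YES price now {thin_price:.3f}")
print(f"deep: paid {deep_paid:.1f}, YES price now {deep_price:.3f}")
```

In the thin market the purchase drags the quoted probability far from 50%, while the same order barely registers in the deep one; the price then reflects the buyer's budget relative to liquidity rather than aggregated information.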

If the model that makes you expect accuracy to begin with is known to be violated, it seems reasonable to assume that the market is out of whack.

Sure, it's possible that the market just happens to be giving an accurate probability for some reason unrelated to how it's "supposed" to work, but that sort of speculation would take a lot of evidence to establish confidently.

I'm assuming that by "every other prediction source" you mean everything other than prediction/betting markets

Well, yes. I would expect that if you successfully mess up Polymarket, you have actually messed up "The Betting Market" as a whole. If there's a large spread between any two specific operators, that really is free money for somebody, especially if that person is already set up to deal on both.
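The "free money" point can be made concrete with a small sketch. It assumes, hypothetically, that each venue sells NO at exactly 1 minus its YES price, with no fees, spreads, or slippage, and that both markets resolve on identical terms. Because one YES share plus one NO share on the same event always pays out exactly 1, buying the cheap side on each venue locks in the price gap.

```python
def cross_venue_arbitrage(yes_a: float, yes_b: float):
    """Return (plan, locked-in profit per YES+NO pair) for two venues A and B.

    Hypothetical assumptions: NO costs exactly 1 - YES on each venue, there are
    no fees or slippage, and both markets resolve identically. A YES share plus
    a NO share on the same event always pays out exactly 1.
    """
    cost_yes_a_no_b = yes_a + (1.0 - yes_b)
    cost_yes_b_no_a = yes_b + (1.0 - yes_a)
    if cost_yes_a_no_b <= cost_yes_b_no_a:
        plan, cost = "buy YES on A, NO on B", cost_yes_a_no_b
    else:
        plan, cost = "buy YES on B, NO on A", cost_yes_b_no_a
    return plan, 1.0 - cost


# Hypothetical prices: venue A quotes YES at 0.60, venue B at 0.50.
plan, profit = cross_venue_arbitrage(yes_a=0.60, yes_b=0.50)
print(plan, round(profit, 4))
```

Under these assumptions the guaranteed profit per pair equals the gap between the two YES prices; when the prices match, the profit is zero and there is nothing to arbitrage.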

tapatakt on What's a good book for a technically-minded 11-year old?

Only one mention of Jules Verne in the answers seems weird to me.

First and foremost, "The Mysterious Island". (But maybe it has already been read at nine?)

steve2152 on Complete Feedback

How about “purely epistemic” means “updated by self-supervised learning”, i.e. the updates (gradients, trader bankrolls, whatever) are derived from “things being true vs false” as opposed to “things being good vs bad”. Right?

[I learned the term teleosemantics from you! :) ]

The original LI paper was in that category, IIUC. The updates (to which traders had more vs less money) are derived from mathematical propositions being true vs false.

LI defines a notion of logically uncertain variable, which can be used to represent desires

I would say that they don’t really represent desires. They represent expectations about what’s going to happen, possibly including expectations about an AI’s own actions.

And then you can put the LI into a larger system that follows the rule: whatever the expectations are about the AI’s own actions, make that actually happen.

The important thing that changes in this situation is that the convergence of the algorithm is underdetermined—you can have multiple fixed points. I can expect to stand up, and then I stand up, and my expectation was validated. No update. I can expect to stay seated, and then I stay seated, and my expectation was validated. No update.

(I don’t think I’m saying anything you don’t already know well.)
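A toy sketch of the stand/sit example may help show the underdetermination. This is not logical induction itself: the single expectation `p`, the "stand if p > 0.5" action rule, and the learning rate are all hypothetical simplifications. The predictor only ever makes a purely epistemic update toward what actually happened, yet both "I'll stand" and "I'll stay seated" end up self-confirming.

```python
def settle(p0: float, steps: int = 50, lr: float = 0.2) -> float:
    """Toy self-fulfilling expectation (hypothetical; not LI itself).

    p is the predictor's expectation that "I stand up". The action rule reads p
    and acts on it (stand if p > 0.5); the predictor then updates p toward the
    observed outcome.
    """
    p = p0
    for _ in range(steps):
        stood_up = p > 0.5                 # the action follows the expectation
        outcome = 1.0 if stood_up else 0.0
        p += lr * (outcome - p)            # purely epistemic update toward what happened
    return p


for start in (0.7, 0.3, 1.0, 0.0):
    print(start, "->", round(settle(start), 4))
```

Starting above 0.5 converges toward "stand", starting below converges toward "sit", and the endpoints 1.0 and 0.0 are exact fixed points: once there, the expectation is validated and nothing updates. Something outside the epistemic update has to choose between them.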

Anyway, if you do that, then I guess you could say that the LI’s expectations “can be used” to represent desires … but I maintain that that’s a somewhat confused and unproductive way to think about what’s going on. If I intervene to change the LI variable, it would be analogous to changing habits (what do I expect myself to do ≈ which action plans seem most salient and natural), not analogous to changing desires.

(I think the human brain has a system vaguely like LI, and that it resolves the underdetermination by a separate valence system, which evaluates expectations as being good vs bad, and applies reinforcement learning to systematically seek out the good ones.)

beliefs can have impacts on the world if the world looks at them

…Indeed, what I said above is just a special case. Here’s something more general and elegant. You have the core LI system, and then some watcher system W, which reads off some vector of internal variables V of the core LI system, and then W takes actions according to some function A(V).

After a while, the LI system will automatically catch onto what W is doing, and “learn” to interpret V as an expectation that A(V) is going to happen.

I think the central case is that W is part of the larger AI system, as above, leading to normal agent-like behavior (assuming some sensible system for resolving the underdetermination). But in theory W could also be humans peeking into the LI system and taking actions based on what they see. Fundamentally, these aren’t that different.

So whatever solution we come up with to resolve the underdetermination, whether human-brain-like “valence” or something else, that solution ought to work for the humans-peeking-into-the-LI situation just as it works for the normal W-is-part-of-the-larger-AI situation.

(But maybe weird things would happen before convergence. And also, if you don’t have any system at all to resolve the underdetermination, then probably the results would be weird and hard to reason about.)

Also, it is easy for end users to build agentlike things out of belieflike things by making queries about how to accomplish things. Thus, we need to train epistemic systems to be responsible about how such queries are answered (as is already apparent in existing chatbots).

I’m not sure that this is coming from a coherent threat model (or else I don’t follow).

If Dr. Evil trains his own AGI, then this whole thing is moot, because he wants the AGI to have accurate beliefs about bioweapons.
If Benevolent Bob trains the AGI and gives API access to Dr. Evil, then Bob can design the AGI to (1) have accurate beliefs about bioweapons, and (2) not answer Dr. Evil’s questions about bioweapons. That might ideally look like what we’re used to in the human world: the AGI says things because it wants to say those things, all things considered, and it doesn’t want Dr. Evil to build bioweapons, either directly or because it’s guessing what Bob would want.

james-stephen-brown on If we solve alignment, do we die anyway?

Hi Seth,

I share your concern that AGI comes with the potential for a unilateral first-strike capability that, at present, no nuclear power has (a fact that is vital to the maintenance of MAD), though I think, in game-theoretic terms, this becomes more difficult the more self-interested (in survival) players there are. As in open-source software, there is a level of protection against malicious code because bad players are outnumbered: even if they try to hide their code, there are many others who can find it. But I appreciate that hundreds of coders finding malicious code within a single repository is much easier than finding something hidden in the real world, and I have to admit I'm not even sure how robust the open-source model is (I only know how it works in theory). I'm pointing more to the principle, not as an excuse for complacency but as a safety model on which to capitalise.

My point about the UN's law against aggression wasn't that in and of itself it is a deterrent, only that it gives a permission structure for any party to legitimately retaliate.

I also agree that RSI-capable AGI introduces a level of independence that we haven't seen before in a threat. And I do understand that inter-dependence is a key driver of cooperation. Another driver is confidence, and my hope is that the more intelligent a system gets, the more confident it is, and the better it is able to balance the autonomy of others with its goals, meaning it is able to "confide" in others, in the same way as the strongest kid in class was very rarely the bully, because they had nothing to prove. Collateral damage is still damage, after all; a truly confident power doesn't need these sorts of inefficiencies. I stress this is a hope, and not a cause for complacency. I recognise that, by analogy, the strongest kid, the true class alpha, gets whatever they want with the willing complicity of the classroom. RSI-capable AGI might get what it wants coercively in a way that makes us happy with our own subjugation, which is still a species of dystopia.

But if you've got a super-intelligent inventor on your side and a few resources, you can be pretty sure you and some immediate loved ones can survive and live in material comfort, while rebuilding a new society according to your preferences.

This illustrates the contradiction here: if you're pretty intelligent (as in, you're designing a super-intelligent AGI), you're probably smart enough to know that the scenario outlined here has a near 100% chance of failure for you and your family. You've created something more intelligent than you that is willing to hide its intentions and destroy billions of people; it doesn't take much to realise that that intelligence isn't going to think twice about also destroying you.

Now, I realise this sounds a lot like the situation humanity is in as a whole... so I agree with you that...

multipolar human-controlled AGI scenario will necessitate ubiquitous surveillance.

I'm just suggesting that the other AGI teams do (or can, leveraging the right incentives) provide a significant contribution to this surveillance.

habryka4 on Bogdan Ionut Cirstea's Shortform

(Most people in AI Alignment work at scaling labs and are therefore almost exclusively working on LLM alignment. That said, I don't actually know what it means to work on LLM alignment over aligning other systems, it's not like we have a ton of traction on LLM alignment, and most techniques and insights seem general enough to not be conditional specifically on LLMs)

dana on Matt Goldenberg's Short Form Feed

A few glaring issues here:
1) Does the question imply causation or not? It shouldn't.
2) Are these stats intended to be realistic such that I need to consider potential flaws and take a holistic view or just a toy scenario to test my numerical skills? If I believe it's the former and I'm confident X and Y are positively correlated, a 2x2 grid showing X and Y negatively correlated should of course make me question the quality of your data proportionally.
3) Is this an adversarial question such that my response may be taken out of context or otherwise misused?

The sample interviews from Veritasium did not seem to address any of these issues:
(1) They seemed to cut out the gun question, but the skin cream question implied causation, "Did the skin cream make the rash better or worse?"
(2) One person mentioned "I wouldn't have expected that...", which implies he thought it was real data,
(3) the last person clearly interpreted it adversarially.

In the original study, the question was stated as "cities that enacted a ban on carrying concealed handguns were more likely to have a decrease in crime." This framing is not as bad, but still too close to implying causation in my opinion.
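For readers unfamiliar with the task, here is a minimal sketch of the comparison a 2x2 question like the skin cream one is testing. The counts below are hypothetical, chosen for illustration only, and are not the study's data. The trap is that the "improved" count is larger in the cream group, while the improvement rate is lower.

```python
def improvement_rates(table):
    """table maps group -> (improved, worsened); return each group's improvement rate."""
    return {group: improved / (improved + worsened)
            for group, (improved, worsened) in table.items()}


# Hypothetical counts for illustration only (not the study's data).
table = {
    "used cream": (200, 75),
    "no cream": (100, 20),
}
rates = improvement_rates(table)
for group, rate in rates.items():
    print(f"{group}: {rate:.1%} improved")
```

Even the correct rate comparison only shows an association in observational counts, which is point (1) above: it does not establish that the cream made the rash better or worse.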

ann-brown on Survival without dignity

Too much runs into the very real issue that truth is stranger. 😉

ann-brown on Survival without dignity

It's nice to read some realistic science fiction.

l-rudolf-l on Survival without dignity

Also this very recent one: https://www.lesswrong.com/posts/6h9p6NZ5RRFvAqWq5/the-summoned-heroine-s-prediction-markets-keep-providing

l-rudolf-l on Survival without dignity

Do the stories get old? If it's trying to be about near-future AI, maybe the state-of-the-art will just obsolete it. But that won't make it bad necessarily, and there are many other settings than 2026. If it's about radical futures with Dyson spheres or whatever, that seems like at least a 2030s thing, and you can easily write a novel before then.

Also, I think it is actually possible to write pretty fast. 2k/day is doable, which gets you a good-length novel in 50 days; even x3 for ideation beforehand and revising after the first draft only gets you to 150 days. You'd have to be good at fiction beforehand, though, and have existing concepts to draw on in your head.
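The writing-speed estimate above is simple arithmetic; spelled out, with the 3x overhead multiplier taken directly from the comment:

```python
words_per_day = 2_000
draft_days = 50
overhead_multiplier = 3  # ideation beforehand plus revision after the first draft

draft_words = words_per_day * draft_days   # size of the first draft
total_days = draft_days * overhead_multiplier
print(draft_words, total_days)
```

So the "good-length novel" in the estimate is a first draft of about 100,000 words, and the whole project fits in roughly five months.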