LessWrong 2.0 Reader

[link] AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
aogara (Aidan O'Gara) · 2024-01-24T19:38:33.461Z · comments (1)

[question] Supposing the 1bit LLM paper pans out
O O (o-o) · 2024-02-29T05:31:24.158Z · answers+comments (11)

Uncertainty in all its flavours
Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z · comments (6)

Fifteen Lawsuits against OpenAI
Remmelt (remmelt-ellen) · 2024-03-09T12:22:09.715Z · comments (4)

An Affordable CO2 Monitor
Pretentious Penguin (dylan-mahoney) · 2024-03-21T03:06:53.255Z · comments (1)

Reprograming the Mind: Meditation as a Tool for Cognitive Optimization
Jonas Hallgren · 2024-01-11T12:03:41.763Z · comments (3)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

[question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?
Dalcy (Darcy) · 2024-08-03T12:39:44.085Z · answers+comments (1)

[link] ML Safety Research Advice - GabeM
Gabe M (gabe-mukobi) · 2024-07-23T01:45:42.288Z · comments (2)

On the 2nd CWT with Jonathan Haidt
Zvi · 2024-04-05T17:30:05.223Z · comments (3)

[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (22)

[question] What Software Should Exist?
Tomás B. (Bjartur Tómas) · 2024-01-19T21:43:50.112Z · answers+comments (27)

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

[link] Video Intro to Guaranteed Safe AI
Mike Vaiana (mike-vaiana) · 2024-07-11T17:53:47.630Z · comments (0)

Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes (steve2152) · 2024-07-08T15:27:07.402Z · comments (7)

[link] Solving alignment isn't enough for a flourishing future
mic (michael-chen) · 2024-02-02T18:23:00.643Z · comments (0)

Scientific Notation Options
jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · comments (13)

The economy is mostly newbs (strat predictions)
lukehmiles (lcmgcd) · 2024-02-01T19:15:49.420Z · comments (6)

[link] David Burns Thinks Psychotherapy Is a Learnable Skill. Git Gud.
Morpheus · 2024-01-27T13:21:05.068Z · comments (20)

[link] Analyzing how SAE features evolve across a forward pass
bensenberner · 2024-11-07T22:07:02.827Z · comments (0)

An AI crash is our best bet for restricting AI
Remmelt (remmelt-ellen) · 2024-10-11T02:12:03.491Z · comments (1)

Distinguishing ways AI can be "concentrated"
Matthew Barnett (matthew-barnett) · 2024-10-21T22:21:13.666Z · comments (2)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

[link] Evaluating Synthetic Activations composed of SAE Latents in GPT-2
Giorgi Giglemiani (Rakh) · 2024-09-25T20:37:48.227Z · comments (0)

There aren't enough smart people in biology doing something boring
Abhishaike Mahajan (abhishaike-mahajan) · 2024-10-21T15:52:04.482Z · comments (13)

[question] What prevents SB-1047 from triggering on deep fake porn/voice cloning fraud?
ChristianKl · 2024-09-26T09:17:39.088Z · answers+comments (21)

Domain-specific SAEs
jacob_drori (jacobcd52) · 2024-10-07T20:15:38.584Z · comments (0)

[link] Predicting Influenza Abundance in Wastewater Metagenomic Sequencing Data
jefftk (jkaufman) · 2024-09-23T17:25:58.380Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

[question] Any real toeholds for making practical decisions regarding AI safety?
lukehmiles (lcmgcd) · 2024-09-29T12:03:08.084Z · answers+comments (6)

Interpretability of SAE Features Representing Check in ChessGPT
Jonathan Kutasov (jonathan-kutasov) · 2024-10-05T20:43:36.679Z · comments (2)

European Progress Conference
Martin Sustrik (sustrik) · 2024-10-06T11:10:03.819Z · comments (11)

[link] Generic advice caveats
Saul Munn (saul-munn) · 2024-10-30T21:03:07.185Z · comments (1)

Superintelligence Can't Solve the Problem of Deciding What You'll Do
Vladimir_Nesov · 2024-09-15T21:03:28.077Z · comments (11)

[link] Why Recursion Pharmaceuticals abandoned cell painting for brightfield imaging
Abhishaike Mahajan (abhishaike-mahajan) · 2024-11-05T14:51:41.310Z · comments (1)

EA Infrastructure Fund's Plan to Focus on Principles-First EA
Linch · 2023-12-06T03:24:55.844Z · comments (0)

[link] Link Collection: Impact Markets
Saul Munn (saul-munn) · 2023-12-26T09:01:48.815Z · comments (0)

Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes (steve2152) · 2024-06-25T17:49:01.488Z · comments (1)

My Dating Heuristic
Declan Molony (declan-molony) · 2024-05-21T05:28:40.197Z · comments (4)

Without Fundamental Advances, Rebellion and Coup d'État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko · 2024-01-31T10:14:02.042Z · comments (34)

A short dialogue on comparability of values
cousin_it · 2023-12-20T14:08:29.650Z · comments (7)

Weak vs Quantitative Extinction-level Goodhart's Law
VojtaKovarik · 2024-02-21T17:38:15.375Z · comments (1)

Appraising aggregativism and utilitarianism
Cleo Nardo (strawberry calm) · 2024-06-21T23:10:37.014Z · comments (10)

When and why should you use the Kelly criterion?
Garrett Baker (D0TheMath) · 2023-11-05T23:26:38.952Z · comments (25)

How to develop a photographic memory 2/3
PhilosophicalSoul (LiamLaw) · 2023-12-30T20:18:14.255Z · comments (7)

flowing like water; hard like stone
lsusr · 2024-02-20T03:20:46.531Z · comments (4)

[link] Found Paper: "FDT in an evolutionary environment"
the gears to ascension (lahwran) · 2023-11-27T05:27:50.709Z · comments (47)

NYU Code Debates Update/Postmortem
David Rein (david-rein) · 2024-05-24T16:08:06.151Z · comments (4)

A Strange ACH Corner Case
jefftk (jkaufman) · 2024-02-10T03:00:05.930Z · comments (2)

Survey on the acceleration risks of our new RFPs to study LLM capabilities
Ajeya Cotra (ajeya-cotra) · 2023-11-10T23:59:52.515Z · comments (1)

Recent comments

mr-hire on Current safety training techniques do not fully transfer to the agent setting

However, this would not address the underlying pattern of alignment failing to generalize.

Is there proof that this is an overall pattern? It would make sense that models are willing to do things they're not willing to talk about, but that doesn't mean there's a general pattern that e.g. they wouldn't be willing to talk about things, and wouldn't be willing to do them, but WOULD be willing to do some secret third option.

vlad-mikulik on Anthropic: Three Sketches of ASL-4 Safety Case Components

The reasoning for the indistinguishability criterion is as a precaution against a model sandbagging when used as the red team attack model. If the model can tell it’s in a control eval it could pretend to be worse at attacking, making us think the control measures are better than they actually are. If the red team elicitation is robust to this, I agree this isn’t necessary. (See the note below [P2.4]).

viliam on Viliam's Shortform

Fuck Google, seriously. About once a week it asks me whether I want to "backup my photos in the cloud", and I keep clicking no, because fuck you why would I want to upload my private photos on your company servers.

But apparently I accidentally once clicked yes (maybe), because suddenly Google sends me a notification about how it created a beautiful animation of my recent photos in the cloud, offering me the option to download them. I don't want to download my private photos from the fucking Google cloud, I never wanted them to be there in the first place! I want to click the delete button, but it's not there: it's either download the animation from the cloud, or close the dialog.

Of course, turning off the functionality is at least 10x more difficult than turning it on, so I get ready to spend this evening finding the advice online and configuring my phone to stop uploading my private photos to Google servers, and preferably to delete all the photos that are already there despite my wishes. Does the "delete" option even exist anymore, or is there just "move to recycle bin (where it stays for as long as we want it to stay there)"? Today I will find out.

Again, fuck Google. I hope the company burns down. I wonder what other things I have already accidentally "consented" to. Google's idea of consent is totally rapist. And I only found this out by accident. In future, I expect to accidentally find this or some other "optional" feature turned on again.

shankar-sivarajan on AI #89: Trump Card

J. D. Vance's (may he live forever) tweets about AI safety and open source (from March 3, 2024), replying to Vinod Khosla's advocacy for more centralized control:

There are undoubtedly risks related to AI. One of the biggest:
A partisan group of crazy people use AI to infect every part of the information economy with left wing bias. Gemini can’t produce accurate history. ChatGPT promotes genocidal concepts.
The solution is open source

and link

If Vinod really believes AI is as dangerous as a nuclear weapon, why does ChatGPT have such an insane political bias? If you wanted to promote bipartisan efforts to regulate for safety, it's entirely counterproductive.
Any moderate or conservative who goes along with this obvious effort to entrench insane left-wing businesses is a useful idiot.
I'm not handing out favors to industrial-scale DEI bullshit because tech people are complaining about safety.

ricraz on Anthropic: Three Sketches of ASL-4 Safety Case Components

1. Yepp, seems reasonable. Though FYI I think of this less as some special meta argument, and more as the common-sense correction that almost everyone implicitly does when giving credences, and rationalists do less than most. (It's a step towards applying outside view, though not fully "outside view".)

2. Yepp, agreed, though I think the common-sense connotations of "if this became" or "this would have a big effect" are causal, especially in the context where we're talking to the actors who are involved in making that change. (E.g. the non-causal interpretation of your claim feels somewhat analogous to if I said to you "I'll be more optimistic about your health if you take these pills", and so you take the pills, and then I say "well the pills do nothing but now I'm more optimistic, because you're the sort of person who's willing to listen to recommendations". True, but it also undermines people's willingness/incentive to listen to my claims about what would make the world better.)

3. Here are ten that affect AI risk as much one way or the other:

The US government "waking up" a couple of years earlier or later (one operationalization: AISIs existing or not right now).
The literal biggest names in the field of AI becoming focused on AI risk.
The fact that Anthropic managed to become a leading lab (and, relatedly, the fact that Meta and other highly safety-skeptical players are still behind).
Trump winning the election.
Elon doing all his Elon stuff (like founding x.AI, getting involved with Trump, etc).
The importance of transparency about frontier capabilities (I think of this one as more of a logical update that I know you've made).
o1-style reasoning as the next big breakthrough.
Takeoff speeds (whatever updates you've made in the last three years).
China's trajectory of AI capabilities (whatever updates you've made about that in the last 3 years).
China's probability of invading Taiwan (whatever updates you've made about that in the last 3 years).

And then I think in 3 years we'll be able to publish a similar list of stuff that mostly we just hadn't predicted or thought about before now.

I expect you'll dispute a few of these; happy to concede the ones that are specifically about your updates if you disagree (unless you agree that you will probably update a bunch on them in the next 3 years).

But IMO the easiest way for safety cases to become the industry-standard thing is for AISI (or internal safety factions) to specifically demand it, and then the labs produce it, but kinda begrudgingly, and they don't really take them seriously internally (or are literally not the sort of organizations that are capable of taking them seriously internally—e.g. due to too much bureaucracy). And that seems very much like the sort of change that's comparable to or smaller than the things above.

I think I would be more sympathetic to your view if the claim were "if AI labs really reoriented themselves to take these AI safety cases as seriously as they take, say, being in the lead or making profit". That would probably halve my P(doom), it's just a very very strong criterion.

sarahconstantin on sarahconstantin's Shortform

links 11/08/2024: https://roamresearch.com/#/app/srcpublic/page/11-08-2024

https://agingbiotech.info/about/ a database of aging biotech companies compiled by Karl Pfleger
https://longevitylist.com/longevity-industry-database/ a database of aging biotech companies compiled by Nathan Cheng, includes somewhat different picks
GLP-1 receptor agonist drugs reduce all-cause mortality -- so what diseases or causes of death do they prevent?
- https://www.nature.com/articles/s41467-024-50199-y kidney disease (in type-2 diabetes patients with kidney disease)
- https://www.ajmc.com/view/glp-1s-reduce-cardiovascular-risk-equally-in-patients-with-overweight-obesity-regardless-of-diabetes cardiovascular disease (in overweight or obese patients)
  - https://journals.sagepub.com/doi/pdf/10.1177/17562864241281903
- https://www.science.org/doi/10.1126/science.adn4128 (sadly I couldn't find the full article)
- https://www.ingentaconnect.com/content/ben/cdr/2018/00000014/00000003/art00008 cardiovascular disease (in diabetics)
https://wonder.cdc.gov/controller/datarequest/D176;jsessionid=C53D7110417D14C262ECD70F0091 what are the leading causes of death in 2023?
- heart disease, cancer, accidents, stroke, COPD, Alzheimer's, diabetes, kidney disease, liver disease, COVID-19, suicide, influenza & pneumonia, hypertension, septicemia, Parkinson's
- surprised suicide was so high and that COVID-19 was still so deadly (I assume mostly in the elderly)
https://www.fiercebiotech.com/biotech/bioage-brings-almost-200m-ipo-obesity-biotech-joins-nasdaq BioAge IPO
I forgot that Sam Altman invested in Retro Bio
- https://www.technologyreview.com/2023/03/08/1069523/sam-altman-investment-180-million-retro-biosciences-longevity-death/
- the man has good taste. like, it's not blindingly original to appreciate Retro, but it is eminently reasonable.
there's a lot of moderate-Democrat post-election resignation to the effect of "this is what the country wanted; the median voter is in fact pretty OK with Trump" and "the progressive apparatus was more interested in staying in its comfort zone than winning elections"
- https://substack.com/home/post/p-151278372 Jesse Singal
  - he was saying similar things all along: https://jessesingal.substack.com/p/democrats-should-acknowledge-reality
I'm also seeing a fair number of women going "ok, sure, there are things to criticize about feminist dogma, but actually I have experienced traditionalist religious mores and they were Not Good", which I think is a needed corrective these days
- https://substack.com/home/post/p-141175575 here's Audrey Horne
https://backofmind.substack.com/p/incompetence-is-a-form-of-bias Dan Davies says incompetence is a form of bias -- the people who have the social skills and clout to get their problems fixed, will.
Dan Davies on politics and populism...i'm not sure where he's going here but this is intriguing.
- https://substack.com/home/post/p-151264334
https://esmeralda.org/ Esmeralda, Devon Zuegel's Chautauqua-inspired village in California

viliam on A brief history of the automated corporation

Why do most humans in 2041 still need to work 40 hours a week? The answer is complicated, but to keep this comment simple, let's focus on a few factors that even a hypothetical reader from 2024 would understand.

In most countries, government regulation requires humans in the loop. These might seem like bullshit jobs, but that doesn't make the competition for them any less fierce. An average person cannot get a good job without good credentials (required for regulatory reasons), and good credentials are expensive; it often takes a lifetime to pay back the school debt. It doesn't matter whether the things taught at school are useful in any practical sense (the few remaining human teachers mostly agree that they are not), but they are required by law. The official reasoning is that general education keeps us human (note: this is simplified to the level of strawman, but I am trying to keep it simple for the hypothetical 2024 reader unfamiliar with the culture wars of 2041).

With the exception of a few things such as rent, most things today are significantly cheaper than they used to be in 2024. On the other hand, there are new expenses, many of them related to AI. Some aspects of life got complicated, for example contracts of all kinds. To put it bluntly, you need the latest AI to safely navigate the legal minefield created by the latest AI. Trying to save money by using a cheaper version of AI that is several weeks obsolete is generally considered a very bad idea, and will probably cost you more in the long run, because you have no idea what you sign (and you should generally assume that the form was optimized to extract as much value from you as legally possible, otherwise the company would be leaving money on the table). You either spend a large part of your income on AI services... or you risk joining the underclass at the first accident; there is not much of a middle way. If you can't afford the "business version" of the latest AI, you can get one that is supported by advertising -- the less you pay for it, the more you should expect the AI agent to optimize for the goals of the advertisers rather than your personal goals. (Oh, "advertisement" today no longer means trying to influence the humans. Humans are mostly irrelevant. It means influencing the AI agents that make most of the everyday decisions. As a simple example, you can pay the AI agents to buy your products rather than your competitor's products, even if they are somewhat more expensive or worse, and to defend this choice to human users using individually optimized arguments.)

There is increasingly addictive... well, basically everything. I am afraid that a far-mode description will fail to convey how strong the effect is when experienced in near mode, but basically: The salesmen of old have used only a few dozen simple techniques (such as smiling at you, looking in your eyes, repeating your name, trying to anchor you to a higher price and then giving you a discount, creating a false sense of urgency, etc.) which were only statistically effective and often failed or backfired for you; the modern ones come to you with a full AI-powered analysis of your personality (yes, there are regulations against this, but they are trivially circumvented), and they have probably already spent a few previous months trying to influence you in all known ways (bots pretending to be humans contacting you on social networks and nudging you in the desired direction, advertising in your AI agent if you use the cheaper version, subliminal advertising on the streets flashing when the screen detects you looking at it, etc.), which makes it almost impossible to resist; in many cases the humans believe that the interaction was actually their own idea, and quite often they fall in love with the salesperson.

Some people suggest that this is a problem humanity should focus on solving, but the respected economists (and more importantly, their AI advisors) mostly shrug and say: "revealed preferences".

cubefox on The Case Against Moral Realism

Yudkowsky has written about it:

(...) In standard metaethical terms, we have managed to rescue 'moral cognitivism' (statements about rightness have truth-values) and 'moral realism' (there is a fact of the matter out there about how right something is). We have not however managed to rescue the pretheoretic intuition underlying 'moral internalism' (...)

mondsemmel on Lao Mein's Shortform

You can't trust exit polls on demographics crosstabs. From Matt Yglesias on Slow Boring:

Over and above the challenge inherent in any statistical sampling exercise, the basic problem exit pollsters have is that they have no way of knowing what the electorate they are trying to sample actually looks like, but they do know who won the election. They end up weighting their sample to match the election results, which is good because otherwise you’d have polling error about the topline outcome, which would look absurd. But this weighting process can introduce major errors in the crosstabs.
For example, the 2020 exit poll sample seems to have included too many college-educated white people. That was a Biden-leaning demographic group, so in a conventional poll, it would have simply exaggerated Biden’s share of the total vote. But the exit poll knows the “right answer” for Biden’s aggregate vote share, so to compensate for overcounting white college graduates in the electorate, it has to understate Biden’s level of support within this group. That is then further offset by overstating Biden’s level of support within all other groups. So we got a lot of hot takes in the immediate aftermath of the election about Biden’s underperformance with white college graduates, which was fake, while people missed real trends, like Trump doing better with non-white voters.
To get the kind of data that people want exit polls to deliver, you actually need to wait quite a bit for more information to become available from the Census and the voter files about who actually voted. Eventually, Catalist produced its “What Happened in 2020” document, and Pew published its “Behind Biden’s 2020 Victory” report. But those take months to assemble, and unfortunately, conventional wisdom can congeal in the interim.
So just say no to exit poll demographic analysis!
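
To make the arithmetic in the quoted passage concrete, here is a rough toy model. Everything in it is an assumption for illustration: the group names, the shares and support levels, and especially the weighting rule (scaling respondents by vote choice until the weighted topline equals the known result). Real exit polls weight on several margins at once, and this sketch does not try to reproduce the offsetting overstatement in other groups that the quote describes. It only shows that forcing an unrepresentative sample to hit the right topline distorts the within-group numbers, even if every respondent answers truthfully.

```python
# Toy model of exit-poll reweighting. All numbers are made up for illustration.
TRUE_SHARE = {"college_white": 0.30, "other": 0.70}    # actual electorate mix
SAMPLE_SHARE = {"college_white": 0.40, "other": 0.60}  # poll over-samples one group
TRUE_SUPPORT = {"college_white": 0.55, "other": 0.50}  # true candidate support per group


def topline(shares, support):
    """Overall vote share implied by group shares and per-group support."""
    return sum(shares[g] * support[g] for g in shares)


def reweighted_support(sample_share, support, actual_topline):
    """Scale supporters and non-supporters so the weighted sample matches
    the known result, then return the per-group support the poll would report.
    (Hypothetical weighting rule, chosen for simplicity.)"""
    raw = topline(sample_share, support)
    w_for = actual_topline / raw                  # weight on candidate's voters
    w_against = (1 - actual_topline) / (1 - raw)  # weight on everyone else
    reported = {}
    for g in support:
        votes_for = support[g] * w_for
        votes_against = (1 - support[g]) * w_against
        reported[g] = votes_for / (votes_for + votes_against)
    return reported


if __name__ == "__main__":
    actual = topline(TRUE_SHARE, TRUE_SUPPORT)
    reported = reweighted_support(SAMPLE_SHARE, TRUE_SUPPORT, actual)
    for g in TRUE_SUPPORT:
        print(f"{g}: true {TRUE_SUPPORT[g]:.3f}, reported {reported[g]:.3f}")
```

In this setup the over-sampled group's reported support comes out below its true value, purely as a side effect of matching the topline.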

niplav on AI #89: Trump Card

Finally, note to self, probably still don’t use SQLite if you have a good alternative? Twice is suspicious, although they did fix the bug same day and it wasn’t ever released.

SQLite is well-known for its incredibly thorough test suite and relatively few CVEs, and with ~156kloc (excluding tests) it's not a very large project, so I think this would be an over-reaction. I'd guess that other databases have more and worse security vulnerabilities due to their attack surface; see MySQL with its ~4.4mloc (including tests). Big Sleep was probably used on SQLite because it's a fairly small project, large parts of which can fit into an LLM's context window.

Maybe someone will try to translate the SQLite code to Rust or Zig using LLMs—until then we're stuck.