LessWrong 2.0 Reader

[link] Manifund: 2023 in Review
Austin Chen (austin-chen) · 2024-01-18T23:50:13.557Z · comments (0)

[link] Romae Industriae
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-19T13:03:31.536Z · comments (2)

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs
Jan Wehner · 2024-07-14T10:37:21.544Z · comments (4)

[link] Inferring the model dimension of API-protected LLMs
Ege Erdil (ege-erdil) · 2024-03-18T06:19:25.974Z · comments (3)

Learning Math in Time for Alignment
Nicholas / Heather Kross (NicholasKross) · 2024-01-09T01:02:37.446Z · comments (3)

In Defense of Lawyers Playing Their Part
Isaac King (KingSupernova) · 2024-07-01T01:32:58.695Z · comments (9)

How I build and run behavioral interviews
benkuhn · 2024-02-26T05:50:05.328Z · comments (6)

[link] How "Pause AI" advocacy could be net harmful
Tamsin Leake (carado-1) · 2023-12-26T16:19:20.724Z · comments (8)

[link] Talking With People Who Speak to Congressional Staffers about AI risk
Eneasz · 2023-12-14T17:55:50.606Z · comments (0)

[link] End Single Family Zoning by Overturning Euclid V Ambler
Maxwell Tabarrok (maxwell-tabarrok) · 2024-07-26T14:08:45.046Z · comments (1)

A more systematic case for inner misalignment
Richard_Ngo (ricraz) · 2024-07-20T05:03:03.500Z · comments (4)

Falling fertility explanations and Israel
Yair Halberstadt (yair-halberstadt) · 2024-04-03T03:27:38.564Z · comments (4)

On Not Requiring Vaccination
jefftk (jkaufman) · 2024-02-01T19:20:12.657Z · comments (21)

Quick evidence review of bulking & cutting
jp · 2024-04-04T21:43:48.534Z · comments (5)

[link] New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking
Harlan · 2024-04-04T23:41:26.439Z · comments (5)

Why wasn't preservation with the goal of potential future revival started earlier in history?
Andy_McKenzie · 2024-01-16T16:15:08.550Z · comments (1)

[link] A Narrative History of Environmentalism's Partisanship
Jeffrey Heninger (jeffrey-heninger) · 2024-05-14T16:51:01.029Z · comments (3)

On "Geeks, MOPs, and Sociopaths"
alkjash · 2024-01-19T21:04:48.525Z · comments (35)

Games for AI Control
charlie_griffin (cjgriffin) · 2024-07-11T18:40:50.607Z · comments (0)

Mentorship in AGI Safety (MAGIS) call for mentors
Valentin2026 (Just Learning) · 2024-05-23T18:28:03.173Z · comments (3)

Good Bings copy, great Bings steal
dr_s · 2024-04-21T09:52:46.658Z · comments (6)

How Would an Utopia-Maximizer Look Like?
Thane Ruthenis · 2023-12-20T20:01:18.079Z · comments (23)

Retrospective: PIBBSS Fellowship 2023
DusanDNesic · 2024-02-16T17:48:32.151Z · comments (1)

UDT1.01: Plannable and Unplanned Observations (3/10)
Diffractor · 2024-04-12T05:24:34.435Z · comments (0)

Mapping the semantic void II: Above, below and between token embeddings
mwatkins · 2024-02-15T23:00:09.010Z · comments (4)

I was raised by devout Mormons, AMA [&|] Soliciting Advice
ErioirE (erioire) · 2024-03-13T16:52:19.130Z · comments (41)

Comparing Quantized Performance in Llama Models
NickyP (Nicky) · 2024-07-15T16:01:24.960Z · comments (2)

[link] [Linkpost] Statement from Scarlett Johansson on OpenAI's use of the "Sky" voice, that was shockingly similar to her own voice.
Linch · 2024-05-20T23:50:28.138Z · comments (8)

[link] The Cancer Resolution?
PeterMcCluskey · 2024-07-24T00:25:17.322Z · comments (24)

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (10)

Extracting SAE task features for in-context learning
Dmitrii Kharlapenko (dmitrii-kharlapenko) · 2024-08-12T20:34:13.747Z · comments (1)

Music in the AI World
Martin Sustrik (sustrik) · 2024-08-16T04:20:01.706Z · comments (8)

Apply to MATS 7.0!
Ryan Kidd (ryankidd44) · 2024-09-21T00:23:49.778Z · comments (0)

Book Review: What Even Is Gender?
Joey Marcellino · 2024-09-01T16:09:27.773Z · comments (14)

Attention Output SAEs Improve Circuit Analysis
Connor Kissane (ckkissane) · 2024-06-21T12:56:07.969Z · comments (0)

[LDSL#1] Performance optimization as a metaphor for life
tailcalled · 2024-08-08T16:16:27.349Z · comments (4)

[LDSL#6] When is quantification needed, and when is it hard?
tailcalled · 2024-08-13T20:39:45.481Z · comments (0)

[question] When did Eliezer Yudkowsky change his mind about neural networks?
[deactivated] (Yarrow Bouchard) · 2023-11-14T21:24:00.000Z · answers+comments (15)

Different views of alignment have different consequences for imperfect methods
Stuart_Armstrong · 2023-09-28T16:31:20.239Z · comments (0)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic · 2024-05-20T09:38:55.228Z · comments (2)

Features and Adversaries in MemoryDT
Joseph Bloom (Jbloom) · 2023-10-20T07:32:21.091Z · comments (6)

Late-talking kid part 3: gestalt language learning
Steven Byrnes (steve2152) · 2023-10-17T02:00:05.182Z · comments (5)

AI's impact on biology research: Part I, today
octopocta · 2023-12-23T16:29:18.056Z · comments (6)

[link] Aaron Silverbook on anti-cavity bacteria
DanielFilan · 2023-11-20T03:06:19.524Z · comments (3)

[link] introduction to thermal conductivity and noise management
bhauth · 2024-03-06T23:14:02.288Z · comments (1)

[link] self-fulfilling prophecies when applying for funding
Chipmonk · 2024-03-01T19:01:40.991Z · comments (0)

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”
Miles Turpin (miles) · 2023-10-03T02:22:00.199Z · comments (0)

[link] Self-Resolving Prediction Markets
PeterMcCluskey · 2024-03-03T02:39:42.212Z · comments (0)

Game Theory without Argmax [Part 2]
Cleo Nardo (strawberry calm) · 2023-11-11T16:02:41.836Z · comments (14)

[link] Anthropic, Google, Microsoft & OpenAI announce Executive Director of the Frontier Model Forum & over $10 million for a new AI Safety Fund
Zach Stein-Perlman · 2023-10-25T15:20:52.765Z · comments (8)

Recent comments

adastra22 on Are the majority of your ancestors farmers or non-farmers?

The model you propose isn't related to the underlying reality, which involves tracing back your personal history. You've asserted that you can group epochs of history into buckets and assign percentage numbers for how many humans you'd likely be descended from, and that this is a reasonable method for approximating an answer to this question. That is not in fact obvious. There are a lot of underlying assumptions, some of which you call out on that guesstimate page, and some I've called out here.

So it's reasonable to spot check the claim via a separate method, as I did above and got an answer many sigmas removed from yours. That's an indication that this isn't a very good method for estimating the answer, and I'm not inclined to tweak numbers to get it to fit my preconceived notion of the right answer.

leogao on leogao's Shortform

in some way, bureaucracy design is the exact opposite of machine learning. while the goal of machine learning is to make clusters of computers that can think like humans, the goal of bureaucracy design is to make clusters of humans that can think like a computer

adastra22 on What are the best arguments for/against AIs being "slightly 'nice'"?

I'm sorry, I literally don't understand what you're saying here. What does "care about long-run power for similar values" mean? Do you care about maximizing your own power?

adastra22 on What are the best arguments for/against AIs being "slightly 'nice'"?

If we're talking about Darwinian contexts, systems which optimize for long-run power are in fact often selected out. Long-run benefit is of no utility unless short-term survival is taken care of, and long-run and short-term needs are often at odds.

So from a behavioral perspective I expect that you get systems which are optimizing for short-term survival. Indeed, I think this point is trivial and one which you probably agree with. What I'm saying is that short-term survival and long-run power are not necessarily correlated, and I think that is the crux.

Let's take an example that is not rigorously worked out and is probably wrong in some details, but can serve to illustrate. Long-run power in humans is derivative of social structure: the leaders of the tribe control the tribe's collective resources. If you want power in human society, you need to rise to the top of our social structures, and the optimal ways of doing that are generally not nice.

But why do we have social structures at all? Why are we organized as tribes? Because we are social animals who prefer the company of others. Being with others beats striking out on your own because, generally speaking, other people in the tribe are nice. Niceness creates an environment in which sycophantic power seeking pays off, but only because there is severe evolutionary pressure towards niceness in the first place. [In the environment which gave rise to humans, not as a general statement.]

ryan_greenblatt on Is cybercrime really costing trillions per year?

I'd also add that a high fraction of these costs won't be increased if you improve cyber crime productivity (by e.g. 10%). As in, maybe a high fraction of the costs are due to the possibility of very low effort cyber crime (analogous to the cashier case).

And Fabien's original motivation was more closely related to this.

nathan-helm-burger on Model evals for dangerous capabilities

Bonus: post-train on similar tasks

I don't think that 'post-train on similar tasks' should be considered just a bonus. I think that that's a key part of adequate safety testing. Fine-tuning on similar tasks has a substantial history in ML literature when it comes to evaluating the max capability of a general model on a specific task. It is pretty standard to report variations of: zero-examples (aka zero-shot), n-examples (aka n-shot), 1-attempt, n-attempts (with a resolution scheme such as majority solution gets submitted), fine-tuning on a similar task (or a subset of the examples for this task).
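The "n-attempts with a resolution scheme" variant mentioned above can be sketched roughly as follows. This is only an illustration: the function names `majority_answer` and `score_with_majority_vote` are hypothetical, answers are assumed to be directly comparable strings, and ties are assumed to go to whichever answer was sampled first.

```python
from collections import Counter


def majority_answer(attempts):
    """Return the most frequent answer among the attempts.

    Assumption: ties are broken in favor of the answer seen first,
    which is how Counter.most_common orders equal counts.
    """
    if not attempts:
        raise ValueError("need at least one attempt")
    return Counter(attempts).most_common(1)[0][0]


def score_with_majority_vote(samples, references):
    """Fraction of tasks where the majority answer matches the reference.

    samples: one list of attempted answers per task (hypothetical format).
    references: the expected answer for each task.
    """
    if len(samples) != len(references) or not references:
        raise ValueError("samples and references must be non-empty and aligned")
    correct = sum(
        majority_answer(attempts) == ref
        for attempts, ref in zip(samples, references)
    )
    return correct / len(references)
```

With three attempts per task, `score_with_majority_vote([["4", "5", "4"], ["a", "b", "b"]], ["4", "a"])` submits "4" and "b", so only the first task counts as solved.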

This isn't some weird above-and-beyond demand, it's a standard technique used for assessing capabilities. I would go so far as to say that I would suspect that someone who didn't try this didn't actually want to elicit the full capabilities of the model.

The justification for fine-tuning not being a part of the reported assessment of general purpose models is that you want to measure what users will be expected to experience as they interact with the model. But even closed-weight API-only models often offer a fine-tuning API. And definitely if you are trying to assess the risk of the weights being stolen, you need to consider fine-tuning.

dagon on Is cybercrime really costing trillions per year?

There are (at least) two different meanings of "costing" in large-scale economic impact thinking. The narrow meaning is "actual amount spent on this topic". The more common (because it's a bigger number) meaning is "how much bigger would the economy be in the counterfactual world that doesn't have this feature".

The article linked from Wikipedia says

The damage cost estimation is based on historical cybercrime figures including recent year-over-year growth, a dramatic increase in hostile nation-state sponsored and organized crime gang hacking activities, and a cyberattack surface which will be an order of magnitude greater in 2025 than it is today.
Cybercrime costs include damage and destruction of data, stolen money, lost productivity, theft of intellectual property, theft of personal and financial data, embezzlement, fraud, post-attack disruption to the normal course of business, forensic investigation, restoration and deletion of hacked data and systems, reputational harm, legal costs, and potentially, regulatory fines.

Which puts it in the second category - most of these costs are NOT direct expenses, but indirect and foregone value. That doesn't make it wrong, exactly, just not comparable to "real" measures (which GDP and GPP aren't either, but they're more defensible).

It's extremely unclear whether LLM adoption and increasing capabilities will shift the equilibrium between attack and defense on these fronts. Actually, it's almost certain that it will shift it, but it's uncertain how much and in what direction, on what timeframes.

It's further unclear whether legislation can slow the attacks more than it hinders defense.

Mostly, it's not a useful estimate or model for reasoning about decisions.

nathan-helm-burger on The Offense-Defense Balance of Gene Drives

Important bit of history: Kevin Esvelt (head of SecureBio, who I have been working for) invented the Gene Drive back in 2013. He says he was at first scared that the idea could be misused, and thought carefully about the offense-defense balance before concluding that it was safely defense-dominant. He says that he shared his idea only after concluding it was safe.

The reason SecureBio is concerned about other biological threats, and the role AI might play in them, is that there is a wide range of things it is possible to do with genetic engineering. Some of these, unlike Gene Drives, are highly offense-dominant and would be really bad for humanity if deployed.

So, in conclusion, I myself have been excited for Gene Drives to get deployed for years now. I think that anti-malaria-hosting-mosquito Gene Drives are very valuable and very safe. I am in fact pretty mad that there has been so much delay, and so many lives negatively impacted by society's excess caution on this front.

Why are we so overly cautious about things which have lots of evidence to support them being safe, and at the same time so reckless about other things which are genuinely dangerous? I feel frustrated and upset about humanity's poor collective risk assessment and decision making.

measure on Sherrinford's Shortform

There should be a dropdown menu at the left side in the input box (opposite the "submit" button).

jenniferrm on Is cybercrime really costing trillions per year?

I don't know the answer to how much cybercrime is really costing, but I think your economic analysis is not accurately tracking "what GDP means".

Arm's-length financial transactions of "money points for services or goods" operate on the basis of scarcity, monopoly pricing power, and other power concerns that are locally legible inside of bilateral exchanges between reasonable agents.

GDP does not track the "reserve price" of consumers of computational services, where conditional on a computing service hypothetically being monopolistically priced, the person would hypothetically pay a LOT for that service.

Various surveys and a bit of logic suggest that people would hypothetically pay thousands or in many cases even tens of thousands of dollars for access to the internet even though the real cost is much much less.

By contrast, GDP just measures the "true scarcity... and lawful evil induced scarcity" part of the economy (mushed together and swirled around, so the DMCA makes hacking printer ink cartridges full of producer-added malware illegal, rather than subsidizing such heroic hacking work, as would occur under benevolent governance, and so on).

Linus Torvalds is probably owed a "debt of gratitude", by Earth, on the order of many billions, and possibly trillions, but he gave away Linux and has never been paid anything like that amount, and so the value he created and gave away does not show up in GDP. (Not just him, there's a whole constellation of rarely sung heroes and moderately happy dudes who were part of a hobbyist ecosystem that created the modern digital world between 1970 and 2010 and gave it away for free).

On a deeper level, the inability to measure or encourage the "post-scarcity" or "public goods" part of the human "economy" (if you can even call it an "economy" when it doesn't run on bilateral arm's-length self-interested deals) is part of why such goods are underproduced by default, in general, and have been underproduced for all of human history.

Within this frame, it seems very plausible that the computational consumer surplus that cybercriminals attack is worth huge amounts of money to protect, even though it was acquired very cheaply from people like Linus.
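The gap this comment describes, between what someone would hypothetically pay and what they actually pay, can be written as simple arithmetic. The sketch below is only illustrative: the function name `consumer_surplus` and the dollar figures are hypothetical placeholders, not numbers from any survey.

```python
def consumer_surplus(willingness_to_pay, price_paid):
    """Value the buyer keeps beyond what they paid (never negative here).

    Assumption: a buyer who values the service below its price simply
    does not buy, so the surplus is floored at zero.
    """
    return max(willingness_to_pay - price_paid, 0.0)


# Hypothetical figures, chosen only to illustrate the shape of the argument.
hypothetical_willingness = 10_000.0  # what one person might pay per year for internet access
hypothetical_price = 600.0           # what that person actually pays per year

# Only the price paid appears in a market transaction, and so in GDP.
counted_in_gdp = hypothetical_price
not_counted = consumer_surplus(hypothetical_willingness, hypothetical_price)
```

Under these placeholder numbers, 600 shows up in measured transactions while 9,400 of value does not, which is the kind of surplus the comment argues attackers can erode.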

Presumably humans are not yet in "private scarcity-based equilibrium" with the economics of computation processes?

In the long run it might be reasonable to expect the "a la carte computer security situation" (where every technical system becomes a game of whack-a-mole fighting many very specific ways to ruin everything in the computational commons) to devolve until most uses of most computer processes have almost no consumer surplus, because the costs of paying for a la carte help with computer security almost perfectly balance against the consumer surplus from using "essentially free compute".

This would not happen if good computer security practices arise that can somehow preserve the existing (and probably massive) consumer surplus around computers such that "using the internet and computers in general in a safe way is very cheap because computer security itself is easy to get right and spread around as a public good with nearly no marginal cost".

Like... hypothetically the government could make baseline "secure and super valuable" computing systems.

But it doesn't.

A private ad-based surveillance and propaganda corporation "solved search and created lots of billionaires", NOT the Library of Congress.

The NSA tries to make sure that most consumer hardware and software is insecure so that the <0.5% of consumer buyers that happen to be mobsters or terrorists can be spied on, rather than putting out open source defensive software for everyone.

People like Aaron Swartz and Moxie did, mostly for free, the things that a benevolent government would do if a benevolent government existed.

But no actively benevolent governments exist.

In Anathem, Neal Stephenson (who is very smart, in a very fun way) posits a giant science inquisition that prevents technological advancement (leading to AGI or nukes or bioweapons or what have you) and lets humanity "experience the current tech scale" for thousands of years with instabilities factored out and only locally stable cultural loops retained...

...in that world it is just taken for granted that 99.999% of the internet is full of auto-generated lies called "bogons" that are put out by computer security companies so as to force consumers to pay monthly subscriptions for expensive bogon filtering software that makes their handheld jeejahs only really good for talking with close personal friends or business associates. It is just normal to them, for the internet to exist and be worthless, like it is normal to us for lies in ads and on the news to be the default.

Anathem's future contains no Wikipedia, because Wikipedia is like Linux: insanely valuable, yet not scarce, with very few dollars directed to it in ways that ensure (1) it isn't hacked from the outside and (2) the leadership doesn't ruin it for personal or ideological profit from the inside.

Anathem offers us a bleak "impossible possible future" but not the bleakest.

Things probably won't happen that way because that exact way of stabilizing human civilization is unlikely, but Anathem honestly grapples with the broader issue where information services are (1) insanely valuable and (2) also nearly impossible for the market to properly price.