LessWrong 2.0 Reader


[question] Me & My Clone
SimonBaars (simonbaars) · 2024-07-18T16:25:40.770Z · answers+comments (22)

[link] AI Safety at the Frontier: Paper Highlights, August '24
gasteigerjo · 2024-09-03T19:17:24.850Z · comments (0)

Appraising aggregativism and utilitarianism
Cleo Nardo (strawberry calm) · 2024-06-21T23:10:37.014Z · comments (10)

Cheap Whiteboards!
Johannes C. Mayer (johannes-c-mayer) · 2024-08-08T13:52:59.627Z · comments (2)

Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen (dalasnoin) · 2024-07-15T17:07:33.283Z · comments (0)

[link] If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]
habryka (habryka4) · 2024-09-13T19:38:53.194Z · comments (0)

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee (daniel-lee) · 2024-09-06T02:28:41.954Z · comments (0)

Just because an LLM said it doesn't mean it's true: an illustrative example
dirk (abandon) · 2024-08-21T21:05:59.691Z · comments (12)

The causal backbone conjecture
tailcalled · 2024-08-17T18:50:14.577Z · comments (0)

[link] Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)
mattmacdermott · 2024-09-01T07:46:26.647Z · comments (0)

[link] Positive visions for AI
L Rudolf L (LRudL) · 2024-07-23T20:15:26.064Z · comments (4)

RLHF is the worst possible thing done when facing the alignment problem
tailcalled · 2024-09-19T18:56:27.676Z · comments (6)

Links and brief musings for June
Kaj_Sotala · 2024-07-06T10:10:03.344Z · comments (0)

LessWrong email subscriptions?
Raemon · 2024-08-27T21:59:56.855Z · comments (6)

Talk: AI safety fieldbuilding at MATS
Ryan Kidd (ryankidd44) · 2024-06-23T23:06:37.623Z · comments (2)

Optimizing Repeated Correlations
SatvikBeri · 2024-08-01T17:33:23.823Z · comments (1)

[question] How are you preparing for the possibility of an AI bust?
Nate Showell · 2024-06-23T19:13:45.247Z · answers+comments (16)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

Proving the Geometric Utilitarian Theorem
StrivingForLegibility · 2024-08-07T01:39:10.920Z · comments (0)

[link] A primer on the next generation of antibodies
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-01T22:37:59.207Z · comments (0)

A Visual Task that's Hard for GPT-4o, but Doable for Primary Schoolers
Lennart Finke (l-f) · 2024-07-26T17:51:28.202Z · comments (4)

The Wisdom of Living for 200 Years
Martin Sustrik (sustrik) · 2024-06-28T04:44:10.609Z · comments (3)

An experiment on hidden cognition
Olli Järviniemi (jarviniemi) · 2024-07-22T03:26:05.564Z · comments (2)

Using an LLM perplexity filter to detect weight exfiltration
Adam Karvonen (karvonenadam) · 2024-07-21T18:18:05.612Z · comments (11)

[link] Beware the science fiction bias in predictions of the future
Nikita Sokolsky (nikita-sokolsky) · 2024-08-19T05:32:47.372Z · comments (20)

Housing Roundup #9: Restricting Supply
Zvi · 2024-07-17T12:50:05.321Z · comments (8)

[link] Announcing Open Philanthropy's AI governance and policy RFP
Julian Hazell (julian-hazell) · 2024-07-17T02:02:39.933Z · comments (0)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[link] MIRI's July 2024 newsletter
Harlan · 2024-07-15T21:28:17.343Z · comments (2)

[link] Fictional parasites very different from our own
Abhishaike Mahajan (abhishaike-mahajan) · 2024-09-08T14:59:39.080Z · comments (0)

[question] What's the Deal with Logical Uncertainty?
Ape in the coat · 2024-09-16T08:11:43.588Z · answers+comments (19)

[link] An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen (karvonenadam) · 2024-06-25T15:57:16.872Z · comments (0)

Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?
scasper · 2024-07-30T14:57:06.807Z · comments (0)

I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!
mako yass (MakoYass) · 2024-08-02T22:35:21.136Z · comments (2)

Population ethics and the value of variety
cousin_it · 2024-06-23T10:42:21.402Z · comments (11)

A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson (Rubi) · 2024-06-24T20:26:09.744Z · comments (3)

[link] Altruism and Vitalism Aren't Fellow Travelers
Arjun Panickssery (arjun-panickssery) · 2024-08-09T02:01:11.361Z · comments (2)

Trying to be rational for the wrong reasons
Viliam · 2024-08-20T16:18:06.385Z · comments (8)

Distillation of 'Do language models plan for future tokens'
TheManxLoiner · 2024-06-27T20:57:34.351Z · comments (2)

[link] Truth is Universal: Robust Detection of Lies in LLMs
Lennart Buerger · 2024-07-19T14:07:25.162Z · comments (3)

[link] The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit · 2024-06-24T10:05:55.101Z · comments (0)

Best-of-n with misaligned reward models for Math reasoning
Fabien Roger (Fabien) · 2024-06-21T22:53:21.243Z · comments (0)

[question] What percent of the sun would a Dyson Sphere cover?
Raemon · 2024-07-03T17:27:50.826Z · answers+comments (26)

[link] Robert Caro And Mechanistic Models In Biography
adamShimi · 2024-07-14T10:56:42.763Z · comments (5)

How Congressional Offices Process Constituent Communication
Tristan Williams (tristan-williams) · 2024-07-02T12:38:41.472Z · comments (0)

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities
c.trout (ctrout) · 2024-09-11T15:09:48.019Z · comments (2)

[link] Managing Emotional Potential Energy
adamShimi · 2024-07-10T18:20:45.640Z · comments (4)

Text Posts from the Kids Group: 2019
jefftk (jkaufman) · 2024-06-23T13:20:01.495Z · comments (0)

Would you benefit from, or object to, a page with LW users' reacts?
Raemon · 2024-08-20T16:35:47.568Z · comments (6)

[question] Money Pump Arguments assume Memoryless Agents. Isn't this Unrealistic?
Dalcy (Darcy) · 2024-08-16T04:16:23.159Z · answers+comments (6)


Recent comments

sdm on AI #82: The Governor Ponders

For months, those who want no regulations of any kind placed upon themselves have hallucinated and fabricated information about the bill’s contents and intentionally created an internet echo chamber, in a deliberate campaign to create the impression of widespread opposition to SB 1047, and that SB 1047 would harm California’s AI industry.

There is another significant angle to add here. Namely: Many of the people in this internet echo chamber or behind this campaign are part of the network of neoreactionaries, MAGA supporters, and tech elites who want to be unaccountable that you've positioned yourself as a substantial counterpoint to.

Obviously it's invoking a culture war fight, which has its downsides, but it's not just rhetoric: the charge that many bill opponents are basing their decisions on an ideology that Newsom opposes and sees as dangerous for the country is true.

A16z and many of the other most dishonest opponents of the bill are part of a Trump-supporting network with lots of close ties to neoreactionary thought, which opposes SB 1047 for precisely the same reason that they want Trump and Republicans to win: to remove restraints on their own power in the short-to-medium term, and more broadly because they see it as a step towards making our society into one where wealthy oligarchs are given favorable treatment and can get away with anything.

It also serves as a counterpoint against the defense and competition angle, at least if it's presented by a16z (this argument doesn't work for e.g. OpenAI, but there are many other good counterarguments). The claims they make about the bill harming competitiveness, e.g. for defense and security against China and other adversaries, ring hollow when most of them are anti-Ukraine support or anti-NATO, making it clear they don't generally care about the US maintaining its global leadership.

I think this would maybe compel Newsom, who's positioned himself as an anti-MAGA figure.

eggsyntax on eggsyntax's Shortform

"simulations or training situations" doesn't necessarily sound like fun.

Seems like some would be and some wouldn't. Although those are the 'medium significance' ones; the largest category is the 188 that used 'low significance' tasks. Still doesn't map exactly to 'fun', but I expect those ones are at least very low stress.

Generally, comparing kids vs adults could be interesting, although it is difficult to say what would be an equivalent mental effort. Specifically I am curious about the impact of school. Oh, we should also compare homeschooled kids vs kids in school, to separate the effects of school and age.

That would definitely be interesting; it wouldn't surprise me if at least a couple of the studies in the meta-analysis did that.

tailcalled on tailcalled's Shortform

Thesis: in addition to probabilities, forecasts should include entropies (how many different conditions are included in the forecast) and temperatures (how intense is the outcome addressed by the marginal constraint in this forecast, i.e. the big-if-true factor).

I say "in addition to" rather than "instead of" because you can't compute probabilities just from these two numbers. If we assume a Gibbs distribution, there's the free parameter of energy: ln(P) = S - U/T. But I'm not sure whether this energy parameter has any sensible meaning with more general events that aren't some thermal chemical equilibrium type thing.

Follow-up thesis: a major problem with rationalist forecasting wisdom is that it focuses on gaining accuracy by increasing S (e.g. addressing conjunction fallacy/base-rates/antipredictions). Meanwhile, the signed interestingness of a forecast is something like P ln(T/T_baseline) or P U. I guess implicitly the assumption is the event is already preselected for high temperature, but then surprising predictions get selected for high entropy, and this leads to resolution difficulty as to what "counts".
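
As a rough illustration of the relation ln(P) = S - U/T quoted above, the sketch below shows why entropy and temperature alone do not pin down a probability: holding S and T fixed, different values of the energy U give different P. The function name and the numeric values are hypothetical and chosen only for illustration; they are not from the comment.

```python
import math


def log_probability(entropy: float, energy: float, temperature: float) -> float:
    """Return ln(P) = S - U/T, the Gibbs-style relation from the comment.

    The argument names are illustrative; S, U and T are the comment's symbols.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return entropy - energy / temperature


# Same entropy and temperature, two assumed energies (made-up numbers).
S, T = 1.0, 2.0
ln_p_low_energy = log_probability(S, 4.0, T)   # 1.0 - 4.0 / 2.0
ln_p_high_energy = log_probability(S, 6.0, T)  # 1.0 - 6.0 / 2.0

# The two probabilities differ even though S and T are identical.
p_low_energy = math.exp(ln_p_low_energy)
p_high_energy = math.exp(ln_p_high_energy)
```

Whether U has a sensible interpretation for general forecast events is left open here, just as it is in the comment.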

gb on What you know when you know nothing

We can only make that inference about conjunctions if we know that the statements are independent. Since (by assumption) we don’t know anything about said world, we don’t know that either, so the conclusion does not follow.
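
A small sketch of the point about independence: without knowing how two statements depend on each other, the probability of their conjunction is only constrained to a range (the standard Fréchet bounds), and the product rule is just one value inside that range. The function names and the 0.5 inputs are hypothetical, used only for illustration.

```python
def conjunction_bounds(p_a: float, p_b: float) -> tuple[float, float]:
    """Bounds on P(A and B) when the dependence between A and B is unknown."""
    lower = max(0.0, p_a + p_b - 1.0)  # A and B overlap as little as possible
    upper = min(p_a, p_b)              # one statement implies the other
    return lower, upper


def independent_conjunction(p_a: float, p_b: float) -> float:
    """P(A and B) under the extra assumption that A and B are independent."""
    return p_a * p_b


low, high = conjunction_bounds(0.5, 0.5)
product = independent_conjunction(0.5, 0.5)
```

With both marginals at 0.5, the conjunction can be anywhere from 0 to 0.5; the value 0.25 follows only once independence is assumed.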

johannes-c-mayer on We Don't Know Our Own Values, but Reward Bridges The Is-Ought Gap

reward is the evidence from which we learn about our values

A sadist might feel good each time they hurt somebody. I am pretty sure it is possible for a sadist to exist who does not endorse hurting people, meaning they feel good if they hurt people, but they avoid it nonetheless.

So to what extent is hurting people a value? It's like the sadist's brain tries to tell them that they ought to want to hurt people, but they don't want to. Intuitively the "they don't want to" seems to be the value.

mateusz-baginski on Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

It's better but still not quite. When you play on two levels, sometimes the best strategy involves a pair of (level 1 and 2) substrategies that are seemingly opposites of each other. I don't think there's anything hypocritical about that.

Similarly, hedging is not hypocrisy.

cousin_it on Slave Morality: A place for every man and every man in his place

I think Nietzsche would agree that "slave morality" originated with Jesus. The main new idea that Jesus brought as a moral philosopher was compassion, feeling for the other person. It's pretty hard to find in earlier sources; for example, the heroes of the Iliad hurt weaker people without a second thought.

To me it feels obvious that the idea of compassion needs to exist, and needs to have force. Because otherwise we'd have a human society operating by the laws of the natural world, and if you look at what animals do to each other, there's no limit to how bad things can get.

Can compassion also become a tool of power and abuse? Sure. But let's not go back to a world without compassion, please.

xpym on A Nonconstructive Existence Proof of Aligned Superintelligence

This isn’t really a problem with alignment

I'd rather put it that resolving that problem is a prerequisite for the notion of "alignment problem" to be meaningful in the first place. It's not technically a contradiction to have an "aligned" superintelligence that does nothing, but clearly nobody would in practice be satisfied with that.

quetzal_rainbow on What's the Deal with Logical Uncertainty?

The reason logical uncertainty was brought up in the first place is decision theory: to give a crisp formal expression to the intuitive "I cooperate with you conditional on you cooperating with me", where "you cooperating with me" is the result of analyzing a probability distribution over the possible algorithms that control your opponent's actions. You can't actually run these algorithms due to computational constraints, and you want to do all this reasoning in non-arbitrary ways.

devrandom on Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety

There seem to be substantial problems with low probability events, coherent predictions over time, short term events, probabilities adding up to more than 100%, etc

A probabilistic oracle being inconsistent is completely beside the point. If I have a probabilistic oracle that has high accuracy but is sometimes inconsistent, I can just post-process the predictions to force them into a consistent format. For example, I can normalize the probabilities to 100%.
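
A minimal sketch of the post-processing step described above, assuming the outcomes are mutually exclusive and exhaustive so that their probabilities should sum to 1. The function name and the example forecast are hypothetical; nothing here reflects how the "539" bot actually works.

```python
def normalize(probs: dict[str, float]) -> dict[str, float]:
    """Rescale raw probabilities so they sum to 1, preserving their ratios."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return {outcome: p / total for outcome, p in probs.items()}


# Made-up raw forecast whose probabilities add up to 120%.
raw = {"A": 0.5, "B": 0.4, "C": 0.3}
fixed = normalize(raw)
```

Rescaling keeps the ranking of outcomes and their relative odds; it does not address other inconsistencies, such as predictions that drift incoherently over time.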

The economic value is in the overall accuracy. Being consistent is a cosmetic consideration.