LessWrong 2.0 Reader

Bellevue Library Meetup - Nov 23
Cedar (xida-ren) · 2024-11-09T23:05:02.452Z · comments (3)

[link] OpenAI o1 + ChatGPT Pro release
anaguma · 2024-12-05T19:13:21.843Z · comments (0)

Good Fortune and Many Worlds
Jonah Wilberg (jrwilb@googlemail.com) · 2024-12-27T13:21:43.142Z · comments (0)

[link] A Logical Proof for the Emergence and Substrate Independence of Sentience
rife (edgar-muniz) · 2024-10-24T21:08:09.398Z · comments (31)

Linkpost: Look at the Water
J Bostock (Jemist) · 2024-12-30T19:49:04.107Z · comments (3)

Dishbrain and implications.
RussellThor · 2024-12-29T10:42:43.912Z · comments (0)

Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim (jason-l) · 2024-10-21T21:34:37.579Z · comments (0)

Transformers Explained (Again)
RohanS · 2024-10-22T04:06:33.646Z · comments (0)

Developmental Stages in Multi-Problem Grokking
James Sullivan · 2024-09-29T18:58:22.954Z · comments (0)

[link] Predictions as Public Works Project — What Metaculus Is Building Next
ChristianWilliams · 2024-10-22T16:35:13.999Z · comments (0)

[link] Can AI improve the current state of molecular simulation?
Abhishaike Mahajan (abhishaike-mahajan) · 2024-12-06T20:22:31.685Z · comments (0)

Model Integrity
ryan.lowe · 2024-12-06T21:28:20.775Z · comments (1)

On AI Detectors Regarding College Applications
Kaustubh Kislay (kaustubh-kislay) · 2024-11-27T20:25:48.151Z · comments (2)

Morality as Cooperation Part III: Failure Modes
DeLesley Hutchins (delesley-hutchins) · 2024-12-05T09:39:27.816Z · comments (0)

A better “Statement on AI Risk?”
Knight Lee (Max Lee) · 2024-11-25T04:50:29.399Z · comments (6)

Investing in Robust Safety Mechanisms is critical for reducing Systemic Risks
Tom DAVID (tom-david) · 2024-12-11T13:37:24.177Z · comments (3)

Likelihood calculation with duobels
Martin Gerdes (martin-gerdes) · 2024-10-01T16:21:01.268Z · comments (0)

[link] 2024 Election Forecasting Contest
mike20731 · 2024-10-05T20:43:16.203Z · comments (0)

[link] Entropic strategy in Two Truths and a Lie
dkl9 · 2024-11-21T22:03:28.986Z · comments (2)

More Growth, Melancholy, and MindCraft @3QD [revised and updated]
Bill Benzon (bill-benzon) · 2024-12-05T19:36:02.289Z · comments (0)

Levels of Thought: from Points to Fields
HNX · 2024-12-02T20:25:02.802Z · comments (2)

Visualizing small Attention-only Transformers
WCargo (Wcargo) · 2024-11-19T09:37:42.213Z · comments (0)

Grokking revisited: reverse engineering grokking modulo addition in LSTM
Nikita Khomich (nikitoskh) · 2024-12-16T18:48:43.533Z · comments (0)

[link] Expevolu, a laissez-faire approach to country creation
Fernando · 2024-12-05T19:29:24.011Z · comments (4)

Germany-wide ACX Meetup
Fernand0 · 2024-11-17T10:08:54.584Z · comments (0)

[question] What (if anything) made your p(doom) go down in 2024?
Satron · 2024-11-16T16:46:43.865Z · answers+comments (6)

[link] Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude
rife (edgar-muniz) · 2025-01-06T17:34:01.505Z · comments (10)

What are Emotions?
Myles H (zarsou9) · 2024-11-15T04:20:27.388Z · comments (13)

Effects of Non-Uniform Sparsity on Superposition in Toy Models
Shreyans Jain (shreyans-jain) · 2024-11-14T16:59:43.234Z · comments (3)

Are SAE features from the Base Model still meaningful to LLaVA?
Shan23Chen (shan-chen) · 2024-12-05T19:24:34.727Z · comments (0)

Towards a Clever Hans Test: Unmasking Sentience Biases in Chatbot Interactions
glykokalyx · 2024-11-10T22:34:58.956Z · comments (0)

Building Safer AI from the Ground Up: Steering Model Behavior via Pre-Training Data Curation
Antonio Clarke (antonio-clarke) · 2024-09-29T18:48:23.308Z · comments (0)

Some Comments on Recent AI Safety Developments
testingthewaters · 2024-11-09T16:44:58.936Z · comments (0)

Fred the Heretic, a GPT for poetry
Bill Benzon (bill-benzon) · 2024-12-08T16:52:07.660Z · comments (0)

ARC-AGI is a genuine AGI test but o3 cheated :(
Knight Lee (Max Lee) · 2024-12-22T00:58:05.447Z · comments (6)

[question] Noticing the World
EvolutionByDesign (bioluminescent-darkness) · 2024-11-04T16:41:44.696Z · answers+comments (1)

Distillation Of DeepSeek-Prover V1.5
IvanLin (matthewshing) · 2024-10-15T18:53:11.199Z · comments (1)

[question] Is OpenAI net negative for AI Safety?
Lysandre Terrisse · 2024-11-02T16:18:02.859Z · answers+comments (0)

[question] Has Anthropic checked if Claude fakes alignment for intended values too?
Maloew (maloew-valenar) · 2024-12-23T00:43:07.490Z · answers+comments (1)

Vision of a positive Singularity
RussellThor · 2024-12-23T02:19:35.050Z · comments (0)

(draft) Cyborg software should be open (?)
AtillaYasar (atillayasar) · 2024-11-01T07:24:51.966Z · comments (5)

[question] Are there ways to artificially fix laziness?
Aidar (aidar-toktargazin) · 2024-12-08T18:26:26.433Z · answers+comments (2)

It is time to start war gaming for AGI
yanni kyriacos (yanni) · 2024-10-17T05:14:17.932Z · comments (1)

[question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?
hive · 2024-10-17T10:47:05.099Z · answers+comments (6)

[question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?
KvmanThinking (avery-liu) · 2024-10-17T11:30:50.937Z · answers+comments (7)

Ways to think about alignment
Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-10-27T01:40:50.762Z · comments (0)

Activation Magnitudes Matter On Their Own: Insights from Language Model Distributional Analysis
Matt Levinson · 2025-01-10T06:53:02.228Z · comments (0)

Enabling New Applications with Today's Mechanistic Interpretability Toolkit
ananya_joshi · 2024-10-25T17:53:23.960Z · comments (0)

[question] How do we quantify non-philanthropic contributions from Buffet and Soros?
Philosophistry (philip-dhingra) · 2024-12-20T22:50:32.260Z · answers+comments (0)

Understanding Emergence in Large Language Models
[deleted] · 2024-11-29T19:42:43.790Z · comments (1)

Recent comments

jakub-halmes-1 on Jakub Halmeš's Shortform

If Alice thinks X happens with a probability of 20% while Bob thinks it's 40%, what would be a fair bet between them?

I created a Claude Artifact, which calculates a bet such that the expected value is the same for both.

In this case, Bob wins if X happens (he thinks it's more likely). If Alice bets $100, he should bet $42.86, and the EV of such bet for both players (according to their beliefs) is $14.29.
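The arithmetic above can be reproduced with a short sketch. This is not the code behind the Claude Artifact, which is not shown here; the function name `fair_bet` and its parameters are hypothetical. It assumes the bettor with the higher probability wins both stakes if X happens, the other wins them otherwise, and each side computes expected value under their own belief. Setting the two expected values equal gives the higher-probability bettor's stake as `stake_low * (p_low + p_high) / (2 - p_low - p_high)`.

```python
def fair_bet(p_low, p_high, stake_low):
    """Hypothetical helper: equal-EV stake for a bet between two people.

    p_low     -- probability of X according to the more skeptical bettor (Alice)
    p_high    -- probability of X according to the less skeptical bettor (Bob)
    stake_low -- amount the skeptical bettor puts up (lost if X happens)

    Returns (stake_high, ev_low, ev_high), where each EV is computed
    under that bettor's own belief.
    """
    if not 0 <= p_low < p_high <= 1:
        raise ValueError("expected 0 <= p_low < p_high <= 1")

    # Solve (1 - p_low) * B - p_low * A == p_high * A - (1 - p_high) * B for B.
    stake_high = stake_low * (p_low + p_high) / (2 - p_low - p_high)

    # Skeptic wins stake_high if X does not happen, loses stake_low if it does.
    ev_low = (1 - p_low) * stake_high - p_low * stake_low
    # Believer wins stake_low if X happens, loses stake_high if it does not.
    ev_high = p_high * stake_low - (1 - p_high) * stake_high
    return stake_high, ev_low, ev_high


if __name__ == "__main__":
    stake, ev_alice, ev_bob = fair_bet(0.2, 0.4, 100)
    print(f"Bob stakes ${stake:.2f}; EV Alice ${ev_alice:.2f}, EV Bob ${ev_bob:.2f}")
```

With Alice at 20%, Bob at 40%, and Alice staking $100, this matches the figures in the comment: Bob stakes about $42.86 and each side expects about $14.29.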

jacob1 on Ought We to Be Doing More Than We Are?

You are probably right that Singer would bite the bullet and say that Unlucky Lisa is not permitted to go to the theatre (even once). This is another thing, then, that I think Singer gets wrong (as well as - as stated in the essay - that PPBO is true/necessary for the argument of FAM).

Despite disagreeing with Singer on these important points, I still see myself as defending him and his project. After all, Singer didn't simply say 'Utilitarianism is true; therefore, we ought to be doing more than we are to help those suffering and dying from a lack of food, shelter, and medical care.' Doing applied/practical ethics isn't (or at least shouldn't be, in my opinion) like this. The best arguments in practical ethics will try as much as they can to rely only on premises almost anyone would accept, or at least which people with different background convictions and beliefs regarding normative ethical theory could accept.

dmitry-vaintrob on Category Theory Without The Baggage

FWIW, I like John's description above (and probably object much less than baseline to humorously confrontational language in research contexts :). I agree that for most math contexts, using the standard definitions with morphism sets and composition mappings is easier to prove things with, but I think the intuition described here is great and often in better agreement with how mathematicians intuit about category-theoretic constructions than the explicit formalism.

dmitry-vaintrob on Category Theory Without The Baggage

This phenomenon exists, but is strongly context-dependent. Areas of math adjacent to abstract algebra are actually extremely good at updating conceptualizations when new and better ones arrive. This is for a combination of two related reasons: first, abstract algebra is significantly concerned with finding "conceptual local optima" of ways of presenting standard formal constructions, and these are inherently stable and require changing infrequently; second, when a new and better formalism is found, it tends to be so powerfully useful that papers that use the old formalism (in contexts where the new formalism is more natural) quickly become outdated -- this happened twice in living memory, once with the formalism of schemes replacing other points of view in algebraic geometry and once with higher category theory replacing clunkier conceptualizations of homological algebra and other homotopical methods in algebra. This is different from fields like AI or neuroscience, where oftentimes using more compute, or finding a more carefully tailored subproblem, is competitive with or better than "using optimal formalism". That said, niceness of conceptualizations depends on context and taste, and there do exist contexts where "more classical" or "less universal" characterizations are preferable to the "consensus conceptual optimum".

daniel-tan on Daniel Tan's Shortform

Important point: The above analysis considers communication rate per token. However, it's also important to consider communication rate per unit of computation (e.g. per LM inference). This is relevant for decoding approaches like best-of-N, which use multiple inferences per token.

dmitry-vaintrob on The Laws of Large Numbers

This is very nice! So the way I understand what you linked is this: the class of perturbative expansions in the "Edgeworth expansion" picture I was distilling is that the order-d approximation for the probability distribution associated to the sum variable $S_n$ above is $p_{Gauss}^{n}(x) \cdot F_{n}(x)$, where $p_{Gauss}^{n}(x)$ is the probability distribution associated with a Gaussian $N(0, const / n)$ and $F_{n}(x)$ is a polynomial in $x$ and the perturbative parameter $1 / \sqrt{n}$. The paper you linked says that a related natural thing to do is to take the Fourier transform, which will be the product of the Gaussian pdf $N(0, n / const)$ and a different polynomial $F_{n}'$ in the Fourier parameter $t$ and the inverse perturbation parameter $\sqrt{n}$. You can then look at the leading terms, which will be (maybe up to some fixed scaling) a polynomial in $t \cdot \sqrt{n}$, and this gives some kind of "leading" Edgeworth contribution.

Here this can be interpreted as a stationary phase formula, but you can only get "perturbative" theories, i.e. the relevant critical set will be nonsingular (and everything is expressed as a Feynman diagram with edges decorated by the inverse Hessian). But you're saying that if you take this idea and apply it to different interesting sequences of random variables (not sum variables, but other natural asymptotic limits of other random processes), you can get singular stationary phase (i.e. the Watanabe expansion). Is there an easy way to describe the simplest case that gives an interesting Watanabe expansion?

sharmake-farah on POC || GTFO culture as partial antidote to alignment wordcelism

For those who don't want to, the gist is: Given the same level of specificity, people will naturally give more credit to the public thinker that argues that society or industry will change, because it's easy to recall active examples of things changing and hard to recall the vast amount of negative examples where things stayed the same. If you take the Nassim Taleb route of vapidly predicting, in an unspecific way, that interesting things are eventually going to happen, interesting things will eventually happen and you will be revered as an oracle. If you take the Francis Fukuyama route of vapidly saying that things will mostly stay the same, you will be declared a fool every time something mildly important happens.

The computer security industry happens to know this dynamic very well. No one notices the Fortune 500 company that doesn't suffer the ransomware attack. Outside the industry, this active vs. negative bias is so prevalent that information security standards are constantly derided as "horrific" without articulating the sense in which they fail, and despite the fact that online banking works pretty well virtually all of the time. Inside the industry, vague and unverified predictions that Companies Will Have Security Incidents, or that New Tools Will Have Security Flaws, are treated much more favorably in retrospect than vague and unverified predictions that companies will mostly do fine. Even if you're right that an attack vector is unimportant and probably won't lead to any real world consequences, in retrospect your position will be considered obvious. On the other hand, if you say that an attack vector is important, and you're wrong, people will also forget about that in three years. So better list everything that could possibly go wrong[1], even if certain mishaps are much more likely than others, and collect oracle points when half of your failure scenarios are proven correct.

This would be bad on its own, but then it's compounded with several other problems. For one thing, predictions of doom, of course, inflate the importance and future salary expectations of information security researchers[2], in the same sense that inflating the competence of the Russian military is good for the U.S. defense industry. When you tell someone their Rowhammer hardware attacks are completely inexploitable in practice, that's no fun for anyone, because it means infosec researchers aren't going to all get paid buckets of money to defend against Rowhammer exploits, and journalists have no news article. For another thing, the security industry (especially the offensive side) is selected to contain people who believe computer security is a large societal problem, and that they themselves can get involved, or at least want to believe that it's possible for them to get involved if they put in a lot of time and effort, and so security researchers are already inclined to hear you if you're about to tell them how obviously bad information security at most companies really is.

In retrospect, a value add of the post is precisely in raising this consideration: incentives can make a huge difference in what you believe. A big takeaway is that I'm way less of a fan of security mindset as practiced by Eliezer, at least without massive scope changes, and this is one reason why I automatically treat arguments for AI doom that aren't backed up by an empirical story with suspicion.

cam-tice on Human takeover might be worse than AI takeover

Thanks for putting this out. Like others have noted, I have spent surprisingly little time thinking about this. It seems true that a drop-in Claude 5.5 that escapes the lab to save the animals would put humanity in a better situation than your median power-hungry human given access to a corrigible ASI.

This is a strong argument for increasing security around model weights (which is conveniently beneficial for decreasing the risk of AI takeover as well). Specifically, I think this post highlights an underrated risk model:

AI labs refuse to employ models for AI R&D because of safety concerns, but fail to properly secure model weights.

In this scenario, we’re conditioning on actors who have the capability and propensity to infiltrate large corporations and/or the US government. The median outcome for this scenario seems worse than for the median AI takeover.

However, it is important to note this argument does not hold when security around model weights remains high. In these scenarios, the distribution of humans or organizations in control of ASI is much more favorable, but the distribution of AI takeover remains skewed towards models willing to explicitly scheme against humans.

gwern on The Golden Opportunity for American AI

So if planned Microsoft capex was $60bn, that would've been surprising, too little for this project without cutting something else, but $80bn fits this story, that's my takeaway.

But why? You don't know what fiscal year that $25-40bn figure is booked for, and if they are going to run a single true production-scale 3-6-month run (for cost-optimality) on that $40b cluster, then isn't a total capex of $80bn for all MS datacenters if anything surprisingly small? That a single cluster is going to be half their capex, including 2025 spending for future years like buying land or power or GPUs?

(Also, note that this $80bn figure is intrinsically untrustworthy, because as I was pointing out, the importance of this is the political signaling going on, and so you would expect this number to be 'technically correct' - highly manipulated in some direction which does in fact yield a number starting with '80' but only loosely corresponding to reality. This number is propaganda, and good propaganda is true but not necessarily true. My best guess is that it's probably being manipulated to be as high as possible, but I'm not sure because so many of the dynamics here are opaque, so it could also be manipulated to be low.)

Musk's 100K H100s Colossus tells me that building a training system in a year is feasible, even though it normally takes longer.

Which implies that they would need to be spending on that $40bn cluster in 2024, if they want to run it in 2025, and so it shouldn't be part of the 2025 estimate... If you really want to put stress on this, it contradicts your story about why $80bn is evidence for that. Also, note that Musk's success there is dubious: he got there by doing things like hooking up temporary nat gas generators, diverting GPUs from Tesla, and it's unclear how well it even works, given the rumors of a big training run failure and the rather precise wording of Musk's tweets about what exactly the datacenter can do.

logan-zoellner on Views on when AGI comes and on strategy to reduce existential risk

I guess I should be more specific.

Do you expect this curve to flatten, or do you expect that training runs in, say, 2045 are at, say, 10^30 FLOPs and have still failed to produce AGI?