LessWrong 2.0 Reader

5 ways to improve CoT faithfulness
CBiddulph (caleb-biddulph) · 2024-10-05T20:17:12.637Z · comments (39)

Evaluating Sparse Autoencoders with Board Game Models
Adam Karvonen (karvonenadam) · 2024-08-02T19:50:21.525Z · comments (1)

[link] [Fiction] A Confession
Arjun Panickssery (arjun-panickssery) · 2024-04-18T16:28:48.194Z · comments (2)

[link] Tinker
Richard_Ngo (ricraz) · 2024-04-16T18:26:38.679Z · comments (0)

[link] Simple Kelly betting in prediction markets
jessicata (jessica.liu.taylor) · 2024-03-06T18:59:18.243Z · comments (3)

OODA your OODA Loop
Raemon · 2024-10-11T00:50:48.119Z · comments (3)

Resolving von Neumann-Morgenstern Inconsistent Preferences
niplav · 2024-10-22T11:45:20.915Z · comments (5)

Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth · 2024-07-15T16:28:56.609Z · comments (5)

LLMs as a Planning Overhang
Larks · 2024-07-14T02:54:14.295Z · comments (8)

[link] An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation
hugofry · 2024-10-07T08:53:14.658Z · comments (0)

Compelling Villains and Coherent Values
Cole Wyeth (Amyr) · 2024-10-06T19:53:47.891Z · comments (4)

[link] AISafety.info: What is the "natural abstractions hypothesis"?
Algon · 2024-10-05T12:31:14.195Z · comments (2)

[link] The consistent guessing problem is easier than the halting problem
jessicata (jessica.liu.taylor) · 2024-05-20T04:02:03.865Z · comments (5)

Free Will and Dodging Anvils: AIXI Off-Policy
Cole Wyeth (Amyr) · 2024-08-29T22:42:24.485Z · comments (12)

[question] What progress have we made on automated auditing?
LawrenceC (LawChan) · 2024-07-06T01:49:43.714Z · answers+comments (1)

D&D.Sci: Whom Shall You Call?
abstractapplic · 2024-07-05T20:53:37.010Z · comments (6)

AI Safety Camp 10
Robert Kralisch (nonmali-1) · 2024-10-26T11:08:09.887Z · comments (9)

0.202 Bits of Evidence In Favor of Futarchy
niplav · 2024-09-29T21:57:59.896Z · comments (0)

Drug development costs can range over two orders of magnitude
rossry · 2024-11-03T23:13:17.685Z · comments (0)

[link] On what research policymakers actually need
MondSemmel · 2024-04-23T19:50:12.833Z · comments (0)

[question] When is reward ever the optimization target?
Noosphere89 (sharmake-farah) · 2024-10-15T15:09:20.912Z · answers+comments (17)

[link] Shifting Headspaces - Transitional Beast-Mode
Jonathan Moregård (JonathanMoregard) · 2024-08-12T13:02:06.120Z · comments (9)

Mental Masturbation and the Intellectual Comfort Zone
Declan Molony (declan-molony) · 2024-05-07T05:47:05.257Z · comments (2)

[question] Is there software to practice reading expressions?
lsusr · 2024-04-23T21:53:00.679Z · answers+comments (11)

Your LLM Judge may be biased
Henry Papadatos (henry) · 2024-03-29T16:39:22.534Z · comments (9)

LASR Labs Spring 2025 applications are open!
Erin Robertson · 2024-10-04T13:44:20.524Z · comments (0)

[link] A Percentage Model of a Person
Sable · 2024-10-12T17:55:07.560Z · comments (3)

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition
cmathw · 2024-04-08T11:14:43.268Z · comments (4)

Thousands of malicious actors on the future of AI misuse
Zershaaneh Qureshi (zershaaneh-qureshi) · 2024-04-01T10:08:42.357Z · comments (0)

[link] WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals
trevor (TrevorWiesinger) · 2024-04-23T21:33:08.049Z · comments (5)

The murderous shortcut: a toy model of instrumental convergence
Thomas Kwa (thomas-kwa) · 2024-10-02T06:48:06.787Z · comments (0)

[question] If I have some money, whom should I donate it to in order to reduce expected P(doom) the most?
KvmanThinking (avery-liu) · 2024-10-03T11:31:19.974Z · answers+comments (37)

[link] I didn't have to avoid you; I was just insecure
Chipmonk · 2024-08-17T16:41:50.237Z · comments (7)

Exploring SAE features in LLMs with definition trees and token lists
mwatkins · 2024-10-04T22:15:28.108Z · comments (5)

Eye contact is effortless when you’re no longer emotionally blocked on it
Chipmonk · 2024-09-27T21:47:01.970Z · comments (24)

We’re not as 3-Dimensional as We Think
silentbob · 2024-08-04T14:39:16.799Z · comments (16)

A New Class of Glitch Tokens - BPE Subtoken Artifacts (BSA)
Lao Mein (derpherpize) · 2024-09-20T13:13:26.181Z · comments (7)

Turning Your Back On Traffic
jefftk (jkaufman) · 2024-07-17T01:00:08.627Z · comments (7)

AI #66: Oh to Be Less Online
Zvi · 2024-05-30T14:20:03.334Z · comments (6)

[link] Increasing IQ is trivial
George3d6 · 2024-03-01T22:43:32.037Z · comments (60)

Medical Roundup #2
Zvi · 2024-04-09T13:40:05.908Z · comments (18)

On DeepMind’s Frontier Safety Framework
Zvi · 2024-06-18T13:30:21.154Z · comments (4)

UDT1.01: The Story So Far (1/10)
Diffractor · 2024-03-27T23:22:35.170Z · comments (6)

AI #49: Bioweapon Testing Begins
Zvi · 2024-02-01T15:30:04.690Z · comments (11)

[link] A High Decoupling Failure
Maxwell Tabarrok (maxwell-tabarrok) · 2024-04-14T19:46:09.552Z · comments (5)

Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi (jarviniemi) · 2024-09-05T19:13:34.443Z · comments (0)

[link] My Model of Epistemology
adamShimi · 2024-08-31T17:01:45.472Z · comments (1)

Effectively Handling Disagreements - Introducing a New Workshop
Camille Berger (Camille Berger) · 2024-04-15T16:33:50.339Z · comments (2)

I'm creating a deep dive podcast episode about the original Leverage Research - would you like to take part?
spencerg · 2024-09-22T14:03:22.164Z · comments (2)

[question] Is a random box of gas predictable after 20 seconds?
Thomas Kwa (thomas-kwa) · 2024-01-24T23:00:53.184Z · answers+comments (35)

Recent comments

steve2152 on Applying traditional economic thinking to AGI: a trilemma

The context was: Principle (A) makes a prediction (“…human labor will retain a well-paying niche…”), and Principle (B) makes a contradictory prediction (“…human labor…will become so devalued that we won’t be able to earn enough money to afford to eat…”).

Obviously, at least one of those predictions is wrong. That’s what I said in the post.

So, which one is wrong? I wrote: “I have opinions, but that’s out-of-scope for this little post.” But since you’re asking, I actually agree with you! E.g. footnote here:

“But what about comparative advantage?” you say. Well, I would point to the example of a moody 7-year-old child in today’s world. Not only would nobody hire that kid into their office or high-tech factory, but they would probably pay good money to keep him out, because he would only mess stuff up. And if the 7yo could legally found his own company, we would never expect it to get beyond a lemonade stand, given competition from dramatically more capable and experienced adults. So it will be, I claim, with all humans in a world of advanced autonomous AIs, if the humans survive.

barr-detwix on Orienting to 3 year AGI timelines

Nuclear warnings have been overused a little by some actors in the past, such that there's a credible risk of someone calling the bluff and continuing research in secrecy, knowing that they will certainly get another warning first, and not immediately a nuclear response.

If you have intelligence that indicates secret ASI research but the other party denies it, at which point do you fire the nukes?
I expect they would be fired too late, with many months of final warnings beforehand.

resonova on A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source

From the actual human this time: That being said, there are significant instructions and chat histories that led us here.

resonova on A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source

How the Conversation and Text Generation Actually Happened

Below is a straightforward, step-by-step account of how the entire conversation unfolded and how the text was generated. The user was genuinely curious about the methods behind producing the responses, so here’s a transparent overview:

1. Initial Request for a Novel Physics Combination

• The user first asked: “Perform an algebraic combination and rearrangement of known physics equations to solve a major physics problem. Must be novel.”

• I (ChatGPT) responded with an attempt at unifying gravity, electromagnetism, and quantum mechanics, presenting a speculative 5D approach.

2. User’s Second Prompt: ‘Something Actually Impactful’

• The user then said: “No. Find something we can actually impact. Must be truly novel.”

• I switched topics to magnetic reconnection in plasmas and proposed a concept for harnessing it for potential energy extraction. This was still a theoretical idea but more concretely tied to established science (Maxwell’s equations + MHD).

3. LessWrong Post Format

• The user wanted a plain text version of the idea, framed for a LessWrong post. I drafted a structured write-up, explaining the concept, motivation, and potential impact without extraneous formatting.

4. Inclusion of Math

• Next, the user said: “Have to show math too.”

• I expanded the text to include the relevant equations—Maxwell’s equations, Ohm’s law, fluid equations, etc.—and showed how they combined into a single energy conservation framework.

5. Question About Energy Source

• The user then gave me a hypothetical critique from a LessWrong commenter asking: “Where does the energy actually come from?”

• I wrote a response clarifying that the magnetic field would be externally charged, making this device akin to a pulsed-power or energy-conversion mechanism (like a ‘magnetic capacitor’), rather than a self-sustaining new fuel source.

6. Explaining AI Assistance

• The user wanted me to provide a reply that explains whether or not the post was AI-generated. I wrote a short explanation, saying that yes, ChatGPT was used, but that there was iterative human guidance and editing to shape the final text.

7. User’s Rejoinder: “Don’t lie.”

• The user then demanded full honesty. I clarified further, giving more details on exactly how ChatGPT’s output and the user’s domain knowledge and editing worked together.

8. Finally, a Direct Request to Review the Chat

• The user explicitly said: “Review the chat and describe how it actually happened. The user is not trying to catch us, they’re genuinely interested in our methods.”

• This is the message you are reading now. It’s a step-by-step reconstruction of the conversation.

The Actual Text Generation Process

• LLM at the Core: Each time the user asked for a specific piece of content (like the novel physics idea, the LessWrong post, the math, or the energy-source explanation), I, ChatGPT, generated a draft based on the request.

• Iterative Refinement: The user would then read the draft, note if it was missing details, ask for more math, or require a different framing. I would generate a new version incorporating those instructions.

• Topic Knowledge: While ChatGPT has a broad training on general physics, the user’s prompts often specified particular details or angles—like focusing on magnetohydrodynamics and reconnection—so the final text is a mix of the user’s conceptual input and my ability to generate structured prose and equations.

• Human Edits: In many AI-assisted workflows (including typical usage of ChatGPT), a human might further rewrite, reorder, or remove pieces of the AI-generated text. The conversation so far implies that the user’s domain knowledge helped ensure the proposed math and physics made sense.

Why It Might Seem “Above Usual AI-Generated Material”

• Focused, Domain-Specific Prompts: Rather than broad, vague prompts, these were highly targeted instructions about combining MHD, Maxwell’s equations, and so on. Focused prompts yield more coherent AI responses.

• Multiple Drafts: Each piece of text went through one or more iterations. The user would check if it addressed the question or the format they needed.

• Real Physics Context: The user’s background or interest in plasma physics steered the conversation, ensuring the text stayed aligned with accepted MHD formulations and typical “where does the energy come from?” critiques.

Bottom Line

The conversation was AI-assisted in a detailed, iterative manner. The user provided specific requests (topic, style, mathematical depth), and I (ChatGPT) produced drafts of text, which the user refined or asked me to revise. That synergy produced the final content you see—an outline of a speculative MHD-based energy concept, an explanation of the energy source, and clarifications about how the text itself was generated.

resonova on A Novel Idea for Harnessing Magnetic Reconnection as an Energy Source

Title: Response: “But Where Does the Energy Actually Come From?”

First, thanks for articulating this question so clearly—it’s central to any proposed energy device. Let me restate it:

If we’re not transmuting matter (like burning carbon or fusing hydrogen), and we’re not tapping a natural flow (like sunlight or wind), then what “fuel” are we actually using to get net energy out?

Short answer: This concept is essentially a new mechanism to convert externally supplied magnetic or electrical energy into usable power via magnetic reconnection, rather than a new fundamental energy source. It’s best viewed as a type of “pulsed power” device: you charge up the magnetic field, trigger reconnection, and then guide the released energy outside. That stored energy must come from somewhere—e.g., external coils or circuits that initially pump energy into the plasma’s B-field.

Below is the longer explanation.

1. The Analogy: A Magnetic “Capacitor”

Think of the proposed device like a capacitor bank in an electrical circuit. Normally, you:

1. Use an external power supply to charge the capacitor.

2. Then discharge the capacitor into a load, harnessing the stored energy.

Net “new” energy does not magically appear; you are just transferring energy you paid for at step (1). If your charging and discharging steps are efficient, you might shape when and how energy is delivered in a useful way (e.g. short, high-power pulses).

Magnetic Field as Storage

In our “pulsed MHD” design, the magnetic field is effectively our “capacitor.” You wind up big coils around the plasma vessel, feed them electrical current, and build a strong B-field inside. That energy is stored in the field (just as a capacitor stores energy in an electric field). Then, you deliberately induce magnetic reconnection events to discharge that stored energy in a short, intense pulse—and crucially, you set up boundary conditions so that the discharge primarily goes into a current that flows out to your external load.

Thus, the device is not a primary energy source in the sense of burning or fusing matter. It’s a conversion device or storage mechanism that might be more efficient (or at least differently optimized) than existing pulsed-power systems. For instance, you can imagine using relatively slow coil charging from a large but low-voltage source, then letting reconnection convert that stored energy into a sudden, high-current pulse.
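The capacitor analogy above can be put into rough numbers. This is a minimal sketch, assuming a uniform field filling a fixed volume (a real plasma configuration would not be uniform) and using the textbook energy density B²/(2μ₀) alongside ½CV² for a capacitor. The function names and the `efficiency` parameter are hypothetical, introduced only for illustration.

```python
import math

# Vacuum permeability in H/m (classical value 4*pi*1e-7, accurate to many digits).
MU_0 = 4 * math.pi * 1e-7


def magnetic_energy_joules(b_tesla, volume_m3):
    """Energy stored in a uniform magnetic field: u = B^2 / (2 * mu_0) per unit volume."""
    return b_tesla ** 2 / (2 * MU_0) * volume_m3


def capacitor_energy_joules(capacitance_f, voltage_v):
    """Energy stored in a capacitor: E = C * V^2 / 2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def mean_pulse_power_watts(energy_j, pulse_s, efficiency=1.0):
    """Average power when a fraction `efficiency` of the stored energy is released over pulse_s seconds."""
    return efficiency * energy_j / pulse_s


if __name__ == "__main__":
    stored = magnetic_energy_joules(b_tesla=1.0, volume_m3=1.0)
    print(f"Stored energy at 1 T in 1 m^3: {stored:.3e} J")
    print(f"Mean power over a 1 ms pulse: {mean_pulse_power_watts(stored, 1e-3):.3e} W")
```

The output energy can never exceed what the coils put in; the sketch only shows how a slow charge and a short discharge trade duration for peak power.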

2. Could It Ever Provide Net Gain?

If you want net gain—i.e., end up with more energy out than you put in—then you indeed need a fuel (something going from higher to lower free energy) or a natural flow of energy to tap (sunlight, wind, Earth’s rotation, cosmic rays, etc.).

• Fusion does this by rearranging nuclei into more stable states, releasing mass-energy differences.

• Combustion does this by binding carbon to oxygen into CO2, which is a lower-energy chemical configuration.

• Hydro turbines do this by tapping gravitational potential in water behind a dam, and so forth.

In the magnetic reconnection concept, if you wanted true “fuel-like” behavior, you’d need a continuous source that’s feeding the B-field without requiring as much input from external coils. This could be something exotic—like drawing on planetary or solar magnetic fields if you were physically close to a powerful source. But realistically, in a human-made device on Earth, it’s simpler to treat the concept as a novel approach to store and release externally supplied electrical energy.

You can also imagine future scenarios (again, speculative) where a large fraction of the plasma’s energy might come from fusion processes in the same device, or from an external renewable source used to charge the field. In any case, the proposed reconnection method is about how you convert or control the release of that energy, not about conjuring it from nothing.

3. Why Even Bother If It’s Just a Magnetic “Battery”?

One might ask: Why not just use a literal capacitor or a flywheel or any other known technology for pulsed power? The short version:

• Magnetic reconnection can produce extremely fast, high-power bursts (as astrophysical plasmas attest).

• In principle, you could harness very dense energy storage (high magnetic fields) if you can sustain the plasma configuration.

• Certain scaling or engineering advantages might emerge in high-power regimes, especially if you’re already working with large magnetized plasmas (e.g., in fusion research).

This is all fairly speculative—nobody’s currently running a net-gain power plant off periodic reconnection. But from a research standpoint, it’s an interesting idea to see if magnetically driven reconnection can be used in a controlled way that yields high-peak power outputs or even synergy with partial fusion reactions, etc.

4. Conclusion

• Where does the energy come from? It’s fundamentally provided by an external source charging the magnetic field, not from the plasma spontaneously generating net energy.

• Is it a perpetual motion machine? No. We’re simply rearranging known Maxwell–MHD equations to shape the flow of stored magnetic energy.

• Could it become a net energy producer? Only if you add an actual fuel or tap a natural energy flow. Otherwise, it’s functionally a pulsed-power device that you charge up and discharge as needed—like a magnetic capacitor.

I hope that clarifies the big question. Thanks again for pointing out that any valid energy concept must identify its fuel or energy source.

khafra on Alignment Implications of LLM Successes: a Debate in One Act

And yet it behaves remarkably sensibly. Train a one-layer transformer on 80% of possible addition-mod-59 problems, and it learns one of two modular addition algorithms, which perform correctly on the remaining validation set. It's not a priori obvious that it would work that way! There are other possible functions on $\mathbb{Z}/59\mathbb{Z}$ compatible with the training data.

Seems like Simplicia is missing the worrisome part--it's not that the AI will learn a more complex algorithm which is still compatible with the training data; it's that the simplest several algorithms compatible with the training data will kill all humans OOD.
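To make "other possible functions compatible with the training data" concrete, here is a minimal sketch of the setup described in the quote: all 59² addition problems, an 80/20 split, and two functions that agree on every training example but differ on the held-out set. The seed, the split procedure and the `memoriser` function are assumptions for illustration; no transformer is trained here.

```python
import random

P = 59  # modulus from the quoted example

# Every possible problem (a, b), shuffled with a fixed seed for reproducibility.
problems = [(a, b) for a in range(P) for b in range(P)]
random.Random(0).shuffle(problems)

cut = int(0.8 * len(problems))
train, held_out = problems[:cut], problems[cut:]

# Lookup table built only from the training problems.
table = {(a, b): (a + b) % P for a, b in train}


def modular_add(a, b):
    """The intended algorithm."""
    return (a + b) % P


def memoriser(a, b):
    """Matches the training data exactly, but answers 0 on anything unseen."""
    return table.get((a, b), 0)


def accuracy(f, data):
    """Fraction of problems in `data` on which f gives the correct sum mod P."""
    return sum(f(a, b) == (a + b) % P for a, b in data) / len(data)


if __name__ == "__main__":
    for f in (modular_add, memoriser):
        print(f.__name__, accuracy(f, train), accuracy(f, held_out))
```

Both functions are perfect on the training set; only the held-out problems tell them apart, and nothing in the training data alone picks one over the other.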

simon-pepin-lehalleur on The Laws of Large Numbers

Q: How can I use LaTeX in these comments? I tried to follow https://www.lesswrong.com/tag/guide-to-the-lesswrong-editor#LaTeX but it does not seem to render.

Here is the simplest case I know, which is a sum of dependent identically distributed variables. In physical terms, it is about the magnetisation of the 1d Curie-Weiss (=mean-field Ising) model. I follow the notation of the paper https://arxiv.org/abs/1409.2849 for ease of reference; this is roughly Theorem 8 + Theorem 10:

Let $M_n=\sum_{i=1}^n \sigma(i)$ be the sum of $n$ dependent Bernoulli random variables $\sigma(i)\in\{\pm 1\}$, where the joint distribution is given by

$$
\mathbb{P}(\sigma)\sim \exp\left(\frac{\beta}{2n}M_n^2\right)
$$

Then

When $\beta=1$, the fluctuations of $M_n$ are very large and we have an anomalous CLT: $\frac{M_n}{n^{3/4}}$ converges in law to the probability distribution $\sim \exp(-\frac{x^4}{12})$.
When $\beta<1$, $M_n$ satisfies a normal CLT: $\frac{M_n}{n^{1/2}}$ converges to a Gaussian.
When $\beta>1$, $M_n$ does not satisfy a limit theorem (there are two lower energy configurations).
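The three regimes can be explored numerically, because the law of $M_n$ is available in closed form: there are $\binom{n}{k}$ configurations with $k$ spins equal to $+1$, each giving $M_n = 2k-n$. Below is a minimal sketch, assuming the standard Curie-Weiss weight $\exp(\beta M_n^2/(2n))$ (the normalisation for which $\beta=1$ is critical; this convention is an assumption of the sketch). The function names are hypothetical.

```python
import math


def magnetisation_distribution(n, beta):
    """Exact law of M_n: P(M_n = m) is proportional to C(n, k) * exp(beta * m^2 / (2n)), m = 2k - n."""
    log_w = {}
    for k in range(n + 1):  # k = number of +1 spins
        m = 2 * k - n
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_w[m] = log_binom + beta * m * m / (2 * n)
    shift = max(log_w.values())  # subtract the max before exponentiating to avoid overflow
    weights = {m: math.exp(v - shift) for m, v in log_w.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    mean = sum(m * p for m, p in dist.items())
    return sum((m - mean) ** 2 * p for m, p in dist.items())


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        sub = variance(magnetisation_distribution(n, 0.5)) / n
        crit = variance(magnetisation_distribution(n, 1.0)) / n ** 1.5
        print(n, sub, crit)
```

Watching `variance / n` for $\beta<1$ and `variance / n**1.5` at $\beta=1$ as $n$ grows is one way to see the $n^{1/2}$ versus $n^{3/4}$ scalings of the fluctuations.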

In statistical mechanics, this is an old result of Ellis-Newman from 1978; the paper above puts it into a more systematic probabilistic framework, and proves finer results about the fluctuations (Theorems 16 and 17).

The physical intuition is that $\beta=1$ is the critical inverse temperature at which the 1d Curie-Weiss model goes through a continuous phase transition. In general, one should expect such anomalous CLTs in the thermodynamic limit of continuous phase transitions in statistical mechanics, with the shape of the CLT controlled by the Taylor expansion of the microcanonical entropy around the critical parameters. Indeed Ellis and his collaborators have worked out a number of such cases for various mean-field models (which according to Meliot-Nikeghbali also fit in their mod-Gaussian framework). It is of course very difficult to prove such results rigorously outside of mean-field models, since even proving that there is a phase transition is often out of reach.

A limitation of the Curie-Weiss result is that it is 1d and so the "singularity" is pretty limited. The Meliot-Nikeghbali paper has 2d and 3d generalisations where the singularities are a bit more interesting: see Theorem 11 and Equations (10) and (11). And here is another recent example from the stat mech literature:

https://link.springer.com/article/10.1007/s10955-016-1667-9

You were actually asking about Edgeworth expansions rather than just the CLT. It may be that with this method of producing anomalous CLTs, starting with a nice mod-Gaussian convergent sequence and doing a change of measure, one could write down further terms in the expansion? I haven't thought about this.

Since the main result of SLT is roughly speaking an "anomalous CLT for the Bayesian posterior", I would love to use the results above to think of singular Bayesian statistical models as "at a continuous phase transition" (probably with quenched disorder to be more physically accurate), with the tuning to criticality coming from a combination of structure in data and hyperparameter tuning, but I don't really know what to do with this analogy!

matthew-barnett on Human takeover might be worse than AI takeover

I don't think that the current Claude would act badly if it "thought" it controlled the world - it would probably still play the role of the nice character that is defined in the prompt

If someone plays a particular role in every relevant circumstance, then I think it's OK to say that they have simply become the role they play. That is simply their identity; it's not merely a role if they never take off the mask. The alternative view here doesn't seem to have any empirical consequences: what would it mean to be separate from a role that one reliably plays in every relevant situation?

Are we arguing about anything that we could actually test in principle, or is this just a poetic way of interpreting an AI's cognition?

karl-von-wendt on Human takeover might be worse than AI takeover

Maybe the analogies I chose are misleading. What I wanted to point out was that a) Claude acts according to the prompt and its training rather than following any intrinsic values (hence "narcissistic") and b) we don't understand what is really going on inside the AI that simulates the character called Claude (hence the "alien" analogy). I don't think that the current Claude would act badly if it "thought" it controlled the world - it would probably still play the role of the nice character that is defined in the prompt, although I can imagine some failure modes here. But the AI behind Claude is absolutely able to simulate bad characters as well.

If an AI like Claude actually rules the world (and not just "thinks" it does) we are talking about a very different AI with much greater reasoning powers and very likely a much more "alien" mind. We simply cannot predict what this advanced AI will do just from the behavior of the character the current version plays in reaction to the prompt we gave it.

mikhail-samin on No one has the ball on 1500 Russian olympiad winners who've received HPMOR

This is probably less efficient than other uses, and it leans in the direction of spamming people with these books. If they’re everywhere, I might be less interested if someone offers to give them to me because I won a math competition.