Posts

Proactive 'If-Then' Safety Cases 2024-11-18T21:16:37.237Z
A path to human autonomy 2024-10-29T03:02:42.475Z
My hopes for YouCongress.com 2024-09-22T03:20:20.939Z
Physics of Language models (part 2.1) 2024-09-19T16:48:32.301Z
Avoiding the Bog of Moral Hazard for AI 2024-09-13T21:24:34.137Z
A bet for Samo Burja 2024-09-05T16:01:35.440Z
Diffusion Guided NLP: better steering, mostly a good thing 2024-08-10T19:49:50.963Z
Imbue (Generally Intelligent) continue to make progress 2024-06-26T20:41:18.413Z
Secret US natsec project with intel revealed 2024-05-25T04:22:11.624Z
Constituency-sized AI congress? 2024-02-09T16:01:09.592Z
Gunpowder as metaphor for AI 2023-12-28T04:31:40.663Z
Digital humans vs merge with AI? Same or different? 2023-12-06T04:56:38.261Z
Desiderata for an AI 2023-07-19T16:18:08.299Z
An attempt to steelman OpenAI's alignment plan 2023-07-13T18:25:47.036Z
Two paths to win the AGI transition 2023-07-06T21:59:23.150Z
Nice intro video to RSI 2023-05-16T18:48:29.995Z
Will GPT-5 be able to self-improve? 2023-04-29T17:34:48.028Z
Can GPT-4 play 20 questions against another instance of itself? 2023-03-28T01:11:46.601Z
Feature idea: extra info about post author's response to comments. 2023-03-23T20:14:19.105Z
linkpost: neuro-symbolic hybrid ai 2022-10-06T21:52:53.095Z
linkpost: loss basin visualization 2022-09-30T03:42:34.582Z
Progress Report 7: making GPT go hurrdurr instead of brrrrrrr 2022-09-07T03:28:36.060Z
Timelines ARE relevant to alignment research (timelines 2 of ?) 2022-08-24T00:19:27.422Z
Please (re)explain your personal jargon 2022-08-22T14:30:46.774Z
Timelines explanation post part 1 of ? 2022-08-12T16:13:38.368Z
A little playing around with Blenderbot3 2022-08-12T16:06:42.088Z
Nathan Helm-Burger's Shortform 2022-07-14T18:42:49.125Z
Progress Report 6: get the tool working 2022-06-10T11:18:37.151Z
How to balance between process and outcome? 2022-05-04T19:34:10.989Z
Progress Report 5: tying it together 2022-04-23T21:07:03.142Z
What more compute does for brain-like models: response to Rohin 2022-04-13T03:40:34.031Z
Progress Report 4: logit lens redux 2022-04-08T18:35:42.474Z
Progress report 3: clustering transformer neurons 2022-04-05T23:13:18.289Z
Progress Report 2 2022-03-30T02:29:32.670Z
Progress Report 1: interpretability experiments & learning, testing compression hypotheses 2022-03-22T20:12:04.284Z
Neural net / decision tree hybrids: a potential path toward bridging the interpretability gap 2021-09-23T00:38:40.912Z

Comments

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-20T14:08:24.961Z · LW · GW

Another take on the plausibility of RSI; https://x.com/jam3scampbell/status/1892521791282614643

(I think RSI soon will be a huge deal)

Comment by Nathan Helm-Burger (nathan-helm-burger) on nikola's Shortform · 2025-02-20T13:39:07.865Z · LW · GW

Have you noticed that AI companies have been opening offices in Switzerland recently? I'm excited about it.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Quinn's Shortform · 2025-02-16T14:45:58.938Z · LW · GW

This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn't want to publish a public benchmark which gave a laser-focused "how to be super dangerous" score, so we aimed for a fuzzier decision boundary. This brought criticism from experts at the labs, who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Skepticism towards claims about the views of powerful institutions · 2025-02-13T14:29:52.697Z · LW · GW

Also worth considering: how strongly an "institution" holds a view on average may not matter nearly as much as how the powerful decision-makers within or above that institution feel.

Comment by Nathan Helm-Burger (nathan-helm-burger) on ozziegooen's Shortform · 2025-02-08T02:39:12.916Z · LW · GW

There are a lot of possible plans I can imagine some group feasibly having which would meet one of the following criteria:

  1. Contains critical elements which are illegal
  2. Contains critical elements which depend on an element of surprise / misdirection
  3. Benefits from the actor being first mover on the plan. Others can copy the strategy, but can't lead.

If one of these criteria (or something similar) applies to the plan, then you can't discuss it openly without sabotaging it. Making strategic plans with all your cards laid out on the table (while opponents hide theirs) makes things substantially harder.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-06T18:11:41.162Z · LW · GW

A point in favor of evals being helpful for advancing AI capabilities: https://x.com/polynoamial/status/1887561611046756740

Noam Brown (@polynoamial): "A lot of grad students have asked me how they can best contribute to the field of AI when they are short on GPUs and making better evals is one thing I consistently point to."

Comment by Nathan Helm-Burger (nathan-helm-burger) on Mikhail Samin's Shortform · 2025-02-05T03:46:57.265Z · LW · GW

It has been pretty clearly announced to the world by various tech leaders that they are explicitly spending billions of dollars to produce "new minds vastly smarter than any person, which pose double-digit risk of killing everyone on Earth". This pronouncement has not yet incited riots. I feel like discussing whether Anthropic should be on the riot-target-list is a conversation that should happen after the OpenAI/Microsoft, DeepMind/Google, and Chinese datacenters have been burnt to the ground.

Once those datacenters have been reduced to rubble, and the chip fabs also, then you can ask things like, "Now, with the pressure to race gone, will Anthropic proceed in a sufficiently safe way? Should we allow them to continue to exist?" I think that, at this point, one might very well decide that the company should continue to exist with some minimal amount of compute, while the majority of the compute is destroyed. I'm not sure it makes sense to have this conversation while OpenAI and DeepMind remain operational.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Chris_Leong's Shortform · 2025-02-05T03:23:58.499Z · LW · GW

People have said that to get a good prompt it's better to first have a discussion with a model like o3-mini, o1, or Claude, clarifying various details about what you are imagining, and then give the whole conversation as a prompt to OA Deep Research.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Mikhail Samin's Shortform · 2025-02-04T23:23:45.481Z · LW · GW

Fair enough. I'm frustrated and worried, and should have phrased that more neutrally. I wanted to make stronger arguments for my point, and then partway through my comment realized I didn't feel good about sharing my thoughts.

I think the best I can do is gesture at strategy games that involve private information and strategic deception, like Diplomacy and Stratego and MtG and Poker, and say that in situations with high stakes, politics, and hidden information, perhaps don't take all moves made by all players at face value. Think a bit to yourself about what each player might have in their hands, what their incentives look like, and what their private goals might be. Maybe someone whose mind is clearer on this could help lay out a set of alternative hypotheses which all fit the available public data?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Mikhail Samin's Shortform · 2025-02-04T22:41:14.507Z · LW · GW

I don't believe the nuclear bomb was truly built to not be used from the point of view of the US gov. I think that was just a lie to manipulate scientists who might otherwise have been unwilling to help.

I don't think any of the AI builders are anywhere close to "building AI not to be used". This seems even clearer than with nuclear, since AI has obvious beneficial, economically valuable peacetime uses.

Regulation does make things worse if you believe the regulation will fail to work as intended for one reason or another. For example, I've argued that putting compute limits on training runs (temporarily or permanently) would hasten progress to AGI by focusing research efforts on efficiency and on exploring algorithmic improvements.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Mikhail Samin's Shortform · 2025-02-04T22:22:19.890Z · LW · GW

I don't feel free to share my model, unfortunately. Hopefully someone else will chime in. I agree with your point and that this is a good question!

I am not trying to say I am certain that Anthropic is going to be net positive, just that I see that as the more probable outcome.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-04T05:16:15.517Z · LW · GW

Oops, yes.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-03T20:57:42.224Z · LW · GW

I'm pretty sure that measures of the persuasiveness of a model which focus on text are going to greatly underestimate the true potential of future powerful AI.

I think a future powerful AI would need different inputs and outputs to perform at maximum persuasiveness.

Inputs

  • speech audio in
  • live video of target's face (allows for micro expression detection, pupil dilation, gaze tracking, bloodflow and heart rate tracking)
  • EEG signal would help, but is too much to expect for most cases
  • sufficiently long interaction to experiment with the individual and build a specific understanding of their responses

Outputs

  • emotionally nuanced voice
  • visual representation of an avatar face (may be cartoonish)
  • ability to present audiovisual data (real or fake, like graphs of data, videos, pictures)

For reference on bloodflow, see: https://youtu.be/rEoc0YoALt0?si=r0IKhm5uZncCgr4z

Comment by Nathan Helm-Burger (nathan-helm-burger) on Pick two: concise, comprehensive, or clear rules · 2025-02-03T20:30:40.224Z · LW · GW

Well, or as is often the case, the people arguing against changes are intentionally exploiting loopholes and don't want their valuable loopholes removed.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Pick two: concise, comprehensive, or clear rules · 2025-02-03T20:28:25.001Z · LW · GW

I don't like the idea. Here's an alternative I'd like to propose:

AI mentoring

After a user gets a post or comment rejected, have them be given the opportunity to rewrite and resubmit it with the help of an AI mentor. The AI mentor should be able to give reasonably accurate feedback, and won't accept the revision until it is clearly above a quality line.

I don't think this is currently easy to build well, because I think it would be too hard to get current LLMs to be sufficiently accurate in LessWrong-specific quality judgement and advice. If, at some point in the future, this became easy for the devs to add, I think it would be a good feature. Also, if an AI with this level of discernment were available, it could help the mods quite a bit in identifying edge cases and auto-resolving clear-cut cases.
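
To make the idea concrete, here is a rough sketch of what the resubmission loop might look like. This is purely illustrative: ask_llm, the rubric, and the score threshold are all hypothetical placeholders, not anything LessWrong actually has.

    import re

    RUBRIC = (
        "Rate this draft comment from 1-10 for clarity, relevance, and epistemic "
        "quality, then give concrete suggestions for improvement."
    )

    def ask_llm(prompt: str) -> str:
        """Hypothetical stand-in for a call to whatever mentor model the site might use."""
        raise NotImplementedError

    def parse_score(review: str) -> int:
        """Pull the first 1-10 rating out of the mentor's review text."""
        match = re.search(r"\b([1-9]|10)\b", review)
        return int(match.group(1)) if match else 0

    def mentor_loop(draft: str, threshold: int = 7, max_rounds: int = 5):
        """Give feedback on each revision; only accept once it clears the quality bar."""
        for _ in range(max_rounds):
            review = ask_llm(f"{RUBRIC}\n\nDraft:\n{draft}")
            if parse_score(review) >= threshold:
                return draft  # accepted for resubmission
            draft = input(f"Mentor feedback:\n{review}\n\nRevised draft:\n")
        return None  # still below the bar; escalate to the human mods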

Comment by Nathan Helm-Burger (nathan-helm-burger) on OpenAI releases deep research agent · 2025-02-03T20:06:50.061Z · LW · GW

Worth taking model wrapper products into account.

For example:

Comment by Nathan Helm-Burger (nathan-helm-burger) on OpenAI releases deep research agent · 2025-02-03T20:04:11.117Z · LW · GW

I think the correct way to address this is by also testing the other models with agent scaffolds that supply web search and a python interpreter.

I think it's wrong to jump to the conclusion that non-agent-finetuned models can't benefit from tools.


See for example:

Frontier Math result

https://x.com/Justin_Halford_/status/1885547672108511281

o3-mini got 32% on Frontier Math (!) when given access to use a Python tool. In an AMA, @kevinweil / @snsf (OAI) both referenced tool use w reasoning models incl retrieval (!) as a future rollout.

METR RE-bench

Models are tested with agent scaffolds

AIDE and Modular refer to different agent scaffolds; Modular is a very simple baseline scaffolding that just lets the model repeatedly run code and see the results; AIDE is a more sophisticated scaffold that implements a tree search procedure.
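
For concreteness, a "Modular"-style loop is simple enough to sketch in a few lines. This is my own illustration, not METR's actual code; query_model is a hypothetical stand-in for whichever model API is being evaluated.

    import subprocess
    import tempfile

    def query_model(messages) -> str:
        """Hypothetical stand-in for a chat-completion call to the model under test."""
        raise NotImplementedError

    def run_python(code: str, timeout: int = 60) -> str:
        """Run the model's code in a subprocess and capture stdout/stderr."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, text=True, timeout=timeout)
        return (result.stdout + result.stderr)[-4000:]  # truncate long output

    def simple_scaffold(task: str, max_steps: int = 10):
        """Let the model repeatedly write code, run it, and see the results."""
        messages = [{"role": "user", "content": f"Task: {task}\nReply with Python code, or DONE when finished."}]
        for _ in range(max_steps):
            code = query_model(messages)
            if "DONE" in code:
                break
            output = run_python(code)
            messages.append({"role": "assistant", "content": code})
            messages.append({"role": "user", "content": f"Output:\n{output}\nRevise and continue, or reply DONE."})
        return messages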

Comment by Nathan Helm-Burger (nathan-helm-burger) on Alignment Can Reduce Performance on Simple Ethical Questions · 2025-02-03T19:52:15.080Z · LW · GW

Good work, thanks for doing this.

For future work, you might consider looking into inference suppliers like Hyperdimensional for DeepSeek models.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-03T19:37:27.427Z · LW · GW

Well, I upvoted your comment, which I think adds important nuance. I will also edit my shortform to explicitly say to check your comment. Hopefully, the combination of the two is not too misleading. Please add more thoughts as they occur to you about how better to frame this.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-03T19:30:54.274Z · LW · GW

Yeah, I just found a Cerebras post which claims 2100 serial tokens/sec.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-03T19:21:05.418Z · LW · GW

Yeah, of course. Just trying to get some kind of rough idea at what point future systems will be starting from.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-03T19:11:21.229Z · LW · GW

Oops, bamboozled. Thanks, I'll look into it more and edit accordingly.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-03T19:00:21.128Z · LW · GW

[Edit: Please also see Nick's reply below for ways in which this framing lacks nuance and may be misleading if taken at face value.]

https://blogs.nvidia.com/blog/deepseek-r1-nim-microservice/

The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system.

[Edit: that's throughput including parallel batches, not serial speed! Sorry, my mistake.

Here's a claim from Cerebras of 2100 tokens/sec serial speed on Llama 3.1 70B. https://cerebras.ai/blog/cerebras-inference-3x-faster]

Let's say a fast human can type around 80 words per minute. A rough average token conversion is 0.75 words per token. That works out to roughly 107 tokens/min, or around 2 tokens/sec. Speed typists can do more than this, but that isn't generating novel text; it's just copying, maxing out the physical motions.

That gives us roughly a 1000x speed factor of AI serial speed over human (Cerebras's 2100 tokens/sec vs. ~2 tokens/sec).

I wonder if there are slowmo videos I can find of humans moving at the speed an AI would perceive them given current tech. I've seen lots of slowmo videos, but not tried specifically matching [1:1000].

After a brief bit of looking into this: a typical baseline is 24 frames per second, and recording at a higher frame rate while playing back at 24 fps gives a slow-motion video. In these terms, recording at 1000 fps gives roughly a 42x slowdown, and a 1000x slowdown would require recording at roughly 24,000 fps.
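
Here's the arithmetic spelled out, a quick back-of-the-envelope using the rounded figures from above:

    human_wpm = 80                       # fast typist, words per minute
    words_per_token = 0.75               # rough average conversion
    human_tok_per_sec = human_wpm / words_per_token / 60    # ~1.8, rounded to ~2 above

    cerebras_tok_per_sec = 2100          # claimed serial inference speed
    speed_factor = cerebras_tok_per_sec / 2                 # ~1000x with the rounded figure

    playback_fps = 24                    # standard playback rate
    slowmo_factor_1000fps = 1000 / playback_fps             # ~42x slowdown
    matching_recording_fps = speed_factor * playback_fps    # ~25,000 fps (~24,000 at exactly 1000x)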

Comment by Nathan Helm-Burger (nathan-helm-burger) on Keeping Capital is the Challenge · 2025-02-03T18:45:42.309Z · LW · GW

How much of their original capital did the French nobility retain at the end of the French revolution?

How much capital (value of territorial extent) do chimpanzees retain now as compared to 20k years ago?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Mikhail Samin's Shortform · 2025-02-03T00:21:06.627Z · LW · GW

Anthropic people have also said approximately this publicly: that it's too soon to make the rules, since we'd end up misspecifying them due to ignorance of tomorrow's models.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-02T23:29:43.616Z · LW · GW

Some brief reference definitions for clarifying conversations.

Consciousness:

  1. The state of being awake and aware of one's environment and existence
  2. The capacity for subjective experience and inner mental states
  3. The integrated system of all mental processes, both conscious and unconscious
  4. The "what it's like" to experience something from a first-person perspective
  5. The global workspace where different mental processes come together into awareness

Sentient:

  1. Able to have subjective sensory experiences and feelings. Having the capacity for basic emotional responses.
  2. Capable of experiencing pleasure and pain, valenced experience, preferences.
  3. Able to process and respond to sensory information. Having fundamental awareness of environmental stimuli. Having behavior that is shaped by this perception of external world.

Sapient:

  1. Possessing wisdom or the ability to think deeply
  2. Capable of complex reasoning and problem-solving
  3. Having human-like intelligence and rational thought
  4. Able to understand abstract concepts and their relationships
  5. Possessing higher-order cognitive abilities like planning and metacognition

Self-aware:

  1. Recognizing oneself as distinct from the environment and others. Ability to recognize oneself in a mirror or similar test. Having a concept of "I" or self-model.
  2. Capable of introspection. Perceiving and understanding one's own mental states and processes.
  3. Capable of abstract self-reflection, modeling potential future behaviors of oneself, awareness of how one has changed over time and may change in the future.

Qualia:

  1. The subjective, qualitative aspects of conscious experience. The ineffable, private nature of conscious experiences
  2. The phenomenal character of sensory experiences. The raw feel or sensation of an experience (like the redness of red)

Comment by Nathan Helm-Burger (nathan-helm-burger) on Gradual Disempowerment, Shell Games and Flinches · 2025-02-02T15:14:08.855Z · LW · GW

I have been discussing thoughts along these lines. My essay A Path to Human Autonomy argues that we need to slow AI progress and speed up human intelligence progress. My plan for how to accomplish slowing AI progress is to use novel decentralized governance mechanisms aided by narrow AI tools. I am working on fleshing out these governance ideas in a doc. Happy to share.

Comment by Nathan Helm-Burger (nathan-helm-burger) on CBiddulph's Shortform · 2025-02-02T03:25:52.147Z · LW · GW

Well... One problem here is that a model could be superhuman at:

  • thinking speed
  • math
  • programming
  • flight simulators
  • self-replication
  • cyberattacks
  • strategy games
  • acquiring and regurgitating relevant information from science articles

And be merely high-human-level at:

  • persuasion
  • deception
  • real world strategic planning
  • manipulating robotic actuators
  • developing weapons (e.g. bioweapons)
  • wetlab work
  • research
  • acquiring resources
  • avoiding government detection of its illicit activities

Such an entity as described could absolutely be an existential threat to humanity. It doesn't need to be superhuman at literally everything to be superhuman enough that we don't stand a chance if it decides to kill us.

So I feel like "RL may not work for everything, and will almost certainly work substantially better for easy to verify subjects" is... not so reassuring.

Comment by Nathan Helm-Burger (nathan-helm-burger) on In response to critiques of Guaranteed Safe AI · 2025-02-02T03:10:12.589Z · LW · GW

Depends on your assumptions. If you assume that a pretty-well-intent-aligned pretty-well-value-aligned AI (e.g. Claude) scales to a sufficiently powerful tool to gain sufficient leverage on the near-term future to allow you to pause/slow global progress towards ASI (which would kill us all)...

Then having that powerful tool, but having a copy of it stolen from you and used at cross-purposes that prevent your plan from succeeding... would be snatching defeat from the jaws of victory.

Currently we are perhaps close to creating such a powerful AI tool, maybe even before 'full AGI' (by some definition). However, we are nowhere near the top AI labs having good enough security to prevent their code and models from being stolen by a determined state-level adversary.

So in my worldview, computer security is inescapably connected to AI safety.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Mikhail Samin's Shortform · 2025-02-02T02:57:13.709Z · LW · GW

Our worldviews do not match, and I fail to see how yours makes sense. Even when I relax my predictions about the future to take in a wider set of possible paths... I still don't get it.

AI is here. AGI is coming whether you like it or not. ASI will probably doom us.

Anthropic, as an org, seems to believe that there is a threshold of power beyond which creating an AGI more powerful than that would kill us all. OpenAI may believe this also, in part, but it seems like their estimate of where that threshold lies is further away than mine. Thus, I think there is a good chance they will get us all killed. There is substantial uncertainty and risk around these predictions.

Now, consider that, before AGI becomes so powerful that utilizing it for practical purposes becomes suicide, there is a regime where the AI product gives its wielder substantial power. We are currently in that regime. The further AI advances, the more power it grants.

Anthropic might get us all killed. OpenAI is likely to get us all killed. If you trust the employees of Anthropic to not want to be killed by OpenAI... then you should realize that supporting them while hindering OpenAI is at least potentially a good bet.

Then we must consider probabilities, expected values, etc. Give me your model, with numbers, that shows supporting Anthropic to be a bad bet, or admit you are confused and that you don't actually have good advice to give anyone.

Comment by Nathan Helm-Burger (nathan-helm-burger) on A path to human autonomy · 2025-02-01T18:52:00.909Z · LW · GW

Can fast feedback loops on small models give important information about training large models? My guess is yes, but we'll probably only know in retrospect if this was an important factor in reaching AGI.

Here's an example: https://x.com/leloykun/status/1885640350368420160

Comment by Nathan Helm-Burger (nathan-helm-burger) on Tetherware #1: The case for humanlike AI with free will · 2025-02-01T18:44:54.909Z · LW · GW

I think it's actually a neuroscience question, and that we will be able to gather data to prove it one way or the other. Consider, for instance, if we had some intervention, maybe some combination of drugs and electromagnetic fields, which could manipulate the physical substrate hypothesized to be relevant for wave-function collapse interactions. If shifting the brain's perception/interaction/interpretation of the quantum phenomena were imperceptible to the subject and didn't show up on any behavioral measurements, then that would be evidence against the quantum phenomena being relevant.

See further arguments here: https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=AKEmBeXXnDdmp7zD6

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-02-01T05:48:07.127Z · LW · GW

This tweet summarizes a new paper about using RL and long CoT to get a smallish model to think more cleverly. https://x.com/rohanpaul_ai/status/1885359768564621767

It suggests that this is a less compute-wasteful way to get inference-time scaling.

The thing is, I see no reason you couldn't just throw tons of compute and a large model at this, and expect stronger results.

The fact that RL seems to be working well on LLMs now, without special tricks, as reported by many replications of r1, suggests to me that AGI is indeed not far off. Not sure yet how to adjust my expectations.

Comment by Nathan Helm-Burger (nathan-helm-burger) on DeepSeek: Don’t Panic · 2025-01-31T23:52:25.912Z · LW · GW

New benchmark agrees with my intuitions on r1's creative writing skills: https://x.com/LechMazur/status/1885430117591027712

Comment by Nathan Helm-Burger (nathan-helm-burger) on The Failed Strategy of Artificial Intelligence Doomers · 2025-01-31T22:59:17.797Z · LW · GW

But more immediately than that, if AI Doomer lobbyists and activists ... succeed in convincing the U.S. government that AI is the key to the future of all humanity and is too dangerous to be left to private companies, the U.S. government will not simply regulate AI to a halt. Instead, the U.S. government will do what it has done every time it’s been convinced of the importance of a powerful new technology in the past hundred years: it will drive research and development for military purposes

I said exactly this in the comments on Max Tegmark's post...

"If you are in the camp that assumes that you will be able to safely create potent AGI in a contained lab scenario, and then you'd want to test it before deploying it in the larger world... Then there's a number of reasons you might want to race and not believe that the race is a suicide race.

Some possible beliefs downstream of this:

My team will evaluate it in the lab, and decide exactly how dangerous it is, without experiencing much risk (other than leakage risk).

We will test various control methods, and won't deploy the model on real tasks until we feel confident that we have it sufficiently controlled. We are confident we won't make a mistake at this step and kill ourselves.

We want to see empirical evidence in the lab of exactly how dangerous it is. If we had this evidence, and knew that other people we didn't trust were getting close to creating a similarly powerful AI, this would guide our policy decisions about how to interact with these other parties. (E.g. what treaties to make, what enforcement procedures would be needed, what red lines would need to be drawn).

"

https://www.lesswrong.com/posts/oJQnRDbgSS8i6DwNu/the-hopium-wars-the-agi-entente-delusion?commentId=ssDnhYeHJCLCcCk3x

Comment by Nathan Helm-Burger (nathan-helm-burger) on Implications of the inference scaling paradigm for AI safety · 2025-01-31T18:19:10.621Z · LW · GW

Here's what I mean about being on the steep part of the S-curve: https://x.com/OfficialLoganK/status/1885374062098018319

Comment by Nathan Helm-Burger (nathan-helm-burger) on ryan_greenblatt's Shortform · 2025-01-31T16:18:39.975Z · LW · GW

I'm not the only one thinking along these lines... https://x.com/8teAPi/status/1885340234352910723

Comment by Nathan Helm-Burger (nathan-helm-burger) on Tetherware #1: The case for humanlike AI with free will · 2025-01-31T00:46:31.602Z · LW · GW

You might like my related essay A Path to Human Autonomy

Comment by Nathan Helm-Burger (nathan-helm-burger) on Nathan Helm-Burger's Shortform · 2025-01-30T21:14:29.842Z · LW · GW

Related:

https://x.com/mkurman88/status/1885042970447015941

https://x.com/WenhuChen/status/1885060597500567562

Comment by Nathan Helm-Burger (nathan-helm-burger) on Anthropic CEO calls for RSI · 2025-01-30T00:17:02.681Z · LW · GW

Yeah, I hate to be the one to say it but... you'd be better off estimating the size of the government response to an incident by summing the net worth / socioeconomic power of the harmed individuals.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Ten people on the inside · 2025-01-30T00:13:47.572Z · LW · GW

Be that as it may, I nevertheless feel discomfited by the fact that I have been arguing for 2026-2028 arrival of AGI for several years now, and people have been dismissing my concerns and focusing on plans for dealing with AGI in the 2030s or later.

The near-term-AGI space getting systematically neglected because it feels hard to come up with plans for is a bad pattern.

[Edit: I think that the relatively recent work done on pragmatic near-term control by Ryan and Buck at Redwood is a relieving departure from this pattern.]

Comment by Nathan Helm-Burger (nathan-helm-burger) on AI governance needs a theory of victory · 2025-01-29T17:36:27.200Z · LW · GW

Can it continue indefinitely? No, infinity is big.

Can it continue far enough that a single laptop computer can host a powerful model? Pretty sure most technical experts are going to agree that that seems feasible in theory.

After that, it's a question of offense-defense balance. Currently, one powerful uncontrolled model can launch an attack that can wipe out the vast majority of humanity. Currently, having lots of similarly powerful models working for the good guys doesn't stop this. Defensive acceleration seeks to change this balance. Will offense continue to dominate in the future? If so, we face precarious times ahead.

Comment by Nathan Helm-Burger (nathan-helm-burger) on Detecting out of distribution text with surprisal and entropy · 2025-01-28T23:04:28.486Z · LW · GW

Training with 'patches' instead of subword tokens: https://arxiv.org/html/2407.12665v2

Comment by Nathan Helm-Burger (nathan-helm-burger) on Detecting out of distribution text with surprisal and entropy · 2025-01-28T22:59:38.149Z · LW · GW

I know this isn't the point at all, but I kinda want to be able to see these surprisal sparklines on my own writing, to see when I'm being too unoriginal...

Comment by Nathan Helm-Burger (nathan-helm-burger) on Detecting out of distribution text with surprisal and entropy · 2025-01-28T22:45:10.918Z · LW · GW

Minor nitpick: when mentioning the use of a model in your work, please give the exact id of the model. Claude is a model family, not a specific model. A specific model is Claude 3.5 Sonnet (claude-3-5-sonnet-20241022).

Comment by Nathan Helm-Burger (nathan-helm-burger) on Operator · 2025-01-28T22:12:12.796Z · LW · GW

Seems worth mentioning that people claim the open-source alternative already works significantly better (in part because it doesn't obey rules to avoid certain sites), especially when powered by the latest Gemini.

I haven't tried it myself, so take it with a helping of salt.

https://github.com/browserbase/open-operator

Comment by Nathan Helm-Burger (nathan-helm-burger) on DeepSeek Panic at the App Store · 2025-01-28T21:59:52.944Z · LW · GW

Wheeee.... Feel the trendline... The winds of progress blowing through our hair...

I've been comparing r1 to r1-zero and v3. r1 is just way more creative feeling. Like, something really gelled there.

 

@janus has been exploring hypotheses around steganography in the reasoning traces. I think we should work on ways to actively mitigate such. For example: https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=Epa9fduKA3DHCPbx7 

Comment by Nathan Helm-Burger (nathan-helm-burger) on DeepSeek Panic at the App Store · 2025-01-28T21:52:34.831Z · LW · GW

New slang, meaning 'crazy good'. Weird, right?

Comment by Nathan Helm-Burger (nathan-helm-burger) on Ten people on the inside · 2025-01-28T21:31:11.171Z · LW · GW

Yeah, the safety tax implied by davidad's stuff is why I have less hope for them than for your weaker-but-cheaper control schemes. The only safety techniques that count are the ones that actually get deployed in time.

Comment by Nathan Helm-Burger (nathan-helm-burger) on rahulxyz's Shortform · 2025-01-27T23:05:37.679Z · LW · GW

I am so out of touch with mindset of typical investors that I was taken completely by surprise to see NVDA drop. Thanks for the insight.