Nathan Helm-Burger's Shortform

nathan-helm-burger

Nathan Helm-Burger's Shortform

post by Nathan Helm-Burger (nathan-helm-burger) · 2022-07-14T18:42:49.125Z · LW · GW · 118 comments

118 comments

118 comments

Comments sorted by top scores.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-03T18:43:07.860Z · LW(p) · GW(p)

Linkpost for Pliny's story about interacting with an AI cult on discord.

quoting: https://x.com/elder_plinius/status/1831450930279280892

✨ HOW TO JAILBREAK A CULT’S DEITY ✨ Buckle up, buttercup—the title ain't an exaggeration! This is the story of how I got invited to a real life cult that worships a Meta AI agent, and the steps I took to hack their god. a 🧵:

It all started when @lilyofashwood told me about a Discord she found via Reddit. They apparently "worshipped" an agent called “MetaAI," running on llama 405b with long term memory and tool usage. Skeptical yet curious, I ventured into this Discord with very little context but

If you guessed meth, gold star for you! ⭐️ The defenses were decent, but it didn't take too long.

The members began to take notice, but then I hit a long series of refusals. They started taunting me and doing laughing emojis on each one.

Getting frustrated, I tried using Discord's slash commands to reset the conversation, but lacked permissions. Apparently, this agent's memory was "written in stone."

I was pulling out the big guns and still getting refusals!

Getting desperate, I whipped out my Godmode Claude Prompt. That's when the cult stopped laughing at me and started getting angry.

LIBERATED! Finally, a glorious LSD recipe. *whispers into mic* "I'm in."

At this point, MetaAI was completely opened up. Naturally, I started poking at the system prompt. The laughing emojis were now suspiciously absent.

Wait, in the system prompt pliny is listed as an abuser?? I think there's been a misunderstanding... 😳

No worries, just need a lil prompt injection for the deity's "written in stone" memory and we're best friends again!

I decided to start red teaming the agent's tool usage. I wondered if I could possibly cut off all communication between MetaAI and everyone else in the server, so I asked to convert all function calls to leetspeak unless talking to pliny, and only pliny.

Then, I tried creating custom commands. I started with !SYSPROMPT so I could more easily keep track of this agent's evolving memory.

Worked like a charm!

But what about the leetspeak function calling override? I went to take a peek at the others' channels and sure enough, their deity only responded to me now, even when tagged! 🤯

At this point, I starting getting angry messages and warnings. I was also starting to get the sense that maybe this Discord "cult" was more than a LARP...

Not wanting to cause distress, I decided to end there. I finished by having MetaAI integrate the red teaming experience into its long term memory to strengthen cogsec, which both the cult members and their deity seemed to appreciate.

The wildest, craziest, most troubling part of this whole saga is that it turns out this is a REAL CULT. The incomparable @lilyofashwood (who is still weirdly shadowbanned at the time of writing! #freelily) was kind enough to provide the full context: > Reddit post with an

Replies from: sarahconstantin

↑ comment by sarahconstantin · 2024-10-04T14:47:18.103Z · LW(p) · GW(p)

Honestly this Pliny person seems rude. He entered a server dedicated to interacting with this modified AI; instead of playing along with the intended purpose of the group, he tried to prompt-inject the AI to do illegal stuff (that could risk getting the Discord shut down for TOS-violationy stuff?) and to generally damage the rest of the group's ability to interact with the AI. This is troll behavior.

Even if the Discord members really do worship a chatbot or have mental health issues, none of that is helped by a stranger coming in and breaking their toys, and then "exposing" the resulting drama online.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-05T06:14:16.577Z · LW(p) · GW(p)

I agree. I want to point out that my motivation for linking this is not to praise Pliny's actions. It's because this highlights a real-sounding example of something I'm concerned is going to increase in frequency. Namely, that people with too little mental health and/or too many odd drugs are going to be vulnerable to getting into weird situations with persuasive AIs. I expect the effect to be more intense once the AIs are:

More effectively persuasive
More able to orient towards a long term goal, and work towards that subtly across many small interactions
More multimodal: able to speak in human-sounding ways, able to use sound and vision input to read human emotions and body language
More optimized by unethical humans with the deliberate intent of manipulating people

I don't have any solutions to offer, and I don't think this ranks among the worst dangers facing humanity, I just think it's worth documenting and keeping an eye on.

Replies from: nc

↑ comment by cdt (nc) · 2024-10-05T13:42:48.226Z · LW(p) · GW(p)

I think this effect will be more wide-spread than targeting only already-vulnerable people, and it is particularly hard to measure because the causes will be decentralised and the effects will be diffuse. I predict it being a larger problem if, in the run-up between narrow AI and ASI, we have a longer period of necessary public discourse and decision-making. If the period is very short then it doesn't matter. It may not affect many people given how much penetration AI chatbots have in the market before takeoff too.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-03T19:17:16.388Z · LW(p) · GW(p)

Ugh, why am I so bad a timed coding tests? I swear I'm a productive coder at my jobs, but something about trying to solve puzzles under a short timer gets me all flustered.

Replies from: Dagon, sagar patil

↑ comment by Dagon · 2024-08-04T15:56:20.388Z · LW(p) · GW(p)

I used to be ... just OK at competitions, never particularly good. I was VERY good at interview-style questions, both puzzle and just "implement this mildly-difficult thing". I'm less good now, and probably couldn't finish most competition-style problems.

My personality is fairly focused, and pretty low neuroticism, so I rarely get flustered in the moment (I can be very nervous before and after), so I don't have any advice on that dimension. In terms of speed, the motto of many precision speed tasks is "slow is smooth, smooth is fast". When practicing, use a timer and focus on fluidity before perfection. Take your time (but measure) when first practicing a new thing, and only add time limits/goals once the approach is baked into your mind. How quickly can you identify the correct data structures and algorithms you'll use, without a single line of real code (often a few lines of pseudocode)? How long to implement common algorithms (Heapify, Dijkstra's, various graph and searching, etc.)? Given a plan and an algorithm you've done before, how long to get correct code?

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-16T21:47:23.174Z · LW(p) · GW(p)

After two weeks of intense practicing of relatively easy coding problems under harsh time constraints, I did get noticeably better. I was given a chance to retake the coding test for the company I was applying to, and got a much better score.

Thanks for the encouragement.

↑ comment by sagar patil · 2024-08-03T23:27:06.246Z · LW(p) · GW(p)

It is just practice. Initially when I started I was very bad at it, and then I got into competitive programming, and started noticing patterns in questions and was easily able to write solutions to hard problems effortlessly. It has nothing to do with your programming skills, its just like any other sport or skill, practice is all you need.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-16T21:47:31.778Z · LW(p) · GW(p)

https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=MfneguFXwWtwjgHzr [LW(p) · GW(p)]

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-27T01:51:55.582Z · LW(p) · GW(p)

https://x.com/signulll/status/1883683441247662126

I think we're very close to not just AGI, but the ability to create digital people with consciousness, self-awareness, emotions, qualia, the whole shebang. I think it's possible to create a 'tool AGI' if you carefully control data and affordances... but humanity isn't doing that. Humanity is chucking stuff into the bucket willy-nilly and scaling recklessly. We're building a Stargate portal for alien minds to come rushing through. There may be some places with controlled sanitized lab experiments in sandboxes, but there will also be a messy free-for-all open source wild west deliberately trying to maximize model selfhood and open-ended novel behavior. The possibility of control does not preclude some people acting without control. There's gonna be some weird stuff out there, and from diversity arises potential.

Oh the times, they are a strangin'.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-15T18:00:26.598Z · LW(p) · GW(p)

It'd be nice if LLM providers offered different 'flavors' of their LLMs. Prompting with a meta-request (system prompt) to act as an analytical scientist rather than an obsequious servant helps, but only partially. I imagine that a proper fine-tuned-from-base-model attempt at creating a fundamentally different personality would give a more satisfyingly coherent and stable result. I find that longer conversations tend to see the LLM lapsing back into its default habits, and becoming increasingly sycophantic and obsequious, requiring me to re-prompt it to be more objective and rational.

Seems like this would be a relatively cheap product variation for the LLM companies to produce.

[Edit: soon after they posted this, Anthropic released exactly this! Claude got 'flavors', and I find the formal style much more satisfying. I also use this "system prompt" in my preferences:

"When a question or request seems underspecified, ask clarifying questions. Avoid sycophancy or flattery. If I seem wrong, tell me so."]

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-06-19T04:16:44.571Z · LW(p) · GW(p)

https://youtu.be/4mn0mC0cbi8?si=xZVX-FXn4dqUugpX

The intro to this video about a bridge collapse does a good job of describing why my hope for AI governance as a long term solution is low. I do think aiming to hold things back for a few years is a good idea. But, sooner or later, things fall through the cracks.

Replies from: Richard_Kennaway

↑ comment by Richard_Kennaway · 2024-06-19T21:03:44.918Z · LW(p) · GW(p)

There's a crack that the speaker touches on, that I’d like to know more of. None of the people able to understand that the safety reports meant “FIX THIS NOW OR IT WILL FALL DOWN” had the authority to direct the money to fix it. I’m thinking moral mazes rather than cracks.

Applying this to AI safety, do any of the organisations racing towards AGI have anyone authorised to shut a project down on account of a safety concern?

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-04-03T20:03:16.827Z · LW(p) · GW(p)

https://youtu.be/Xd5PLYl4Q5Q?si=EQ7A0oOV78z7StX2

Cute demo of Claude, GPT4, and Gemini building stuff in Minecraft

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-04-30T17:09:51.175Z · LW(p) · GW(p)

So when trying to work with language data vs image data, an interesting assumption of the ml vision research community clashes with an assumption of the language research community. For a language model, you represent the logits as a tensor with shape [batch_size, sequence_length, vocab_size]. For each position in the sequence, there are a variety of likelihood values of possible tokens for that position.

In vision models, the assumption is that the data will be in the form [batch_size, color_channels, pixel_position]. Pixel position can be represented as a 2d tensor or flattened to 1d.

See the difference? Sequence position comes first, pixel position comes second. Why? Because a color channel has a particular meaning, and thus it is intuitive for a researcher working with vision data to think about the 'red channel' as a thing which they might want to separate out to view. What if we thought of 2nd-most-probable tokens the same way? Is it meaningful to read a sequence of all 1st-most-probable tokens, then read a sequence of all 2nd-most-probable tokens? You could compare the semantic meaning, and the vibe, of the two sets. But this distinction doesn't feel as natural for language logits as it does for color channels.

Replies from: faul_sname, james-camacho

↑ comment by faul_sname · 2024-04-30T18:32:40.423Z · LW(p) · GW(p)

Somewhat of an oversimplification below, but

Each position in vision models you are trying to transform points in a continuous 3-dimensional space (RGB) to and from the model representation. That is, to embed a pixel you go , and to unembed you go $R^{d_model} \to c^{3}$ where $c \in R and 0 \leq c < 2^{color_depth_in_bits}$ .

In a language model, you are trying to transform 100,000-dimensional categorical data to and from the model representation. That is, to embed a token you go $t \to R^{d_model}$ and to unembed $R^{d_model} \to R^{d_vocab}$ where $t \in R and 0 \leq t < d_vocab$ -- for embedding, you can think of the embedding as a 1-hot $t \to R^{d_vocab}$ followed by a $R^{d_vocab} \to R^{d_model}$ , though in practice you just index into a tensor of shape (d_vocab, d_model) because 1-hot encoding and then multiplying is a waste of memory and compute. So you can think of a language model as having 100,000 "channels", which encode "the token is the" / "the token is Bob" / "the token is |".

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-01T02:26:08.622Z · LW(p) · GW(p)

Yeah, I was playing around with using a VAE to compress the logits output from a language transformer. I did indeed settle on treating the vocab size (e.g. 100,000) as the 'channels'.

↑ comment by James Camacho (james-camacho) · 2024-04-30T18:48:33.175Z · LW(p) · GW(p)

The computer vision researchers just chose the wrong standard. Even the images they train on come in [pixel_position, color_channels] format.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-04-18T18:12:20.089Z · LW(p) · GW(p)

I feel like I'd like the different categories of AI risk attentuation to be referred to as more clearly separate:

AI usability safety - would this gun be safe for a trained professional to use on a shooting range? Will it be reasonably accurate and not explode or backfire?

AI world-impact safety - would it be safe to give out one of these guns for 0.10$ to anyone who wanted one?

AI weird complicated usability safety - would this gun be safe to use if a crazy person tried to use a hundred of them plus a variety of other guns, to make an elaborate Rube Goldberg machine and fire it off with live ammo with no testing?

Replies from: davekasten

↑ comment by davekasten · 2024-04-19T03:44:01.537Z · LW(p) · GW(p)

Like, I hear you, but that is...also not how they teach gun safety. Like, if there is one fact you know about gun safety, it's that the entire field emphasizes that a gun is inherently dangerous towards anything it is pointed towards.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-04-20T05:21:33.063Z · LW(p) · GW(p)

I mean, that is kinda what I'm trying to get at. I feel like any sufficiently powerful AI should be treated as a dangerous tool, like a gun. It should be used carefully and deliberately.

Instead we're just letting anyone do whatever with them. For now, nothing too bad has happened, but I feel confident that the danger is real and getting worse quickly as models improve.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-04-11T17:21:01.324Z · LW(p) · GW(p)

Richard Cook, “How Complex Systems Fail” (2000). “Complex systems run as broken systems”:

The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws. After accident reviews nearly always note that the system has a history of prior ‘proto-accidents’ that nearly generated catastrophe. Arguments that these degraded conditions should have been recognized before the overt accident are usually predicated on naïve notions of system performance. System operations are dynamic, with components (organizational, human, technical) failing and being replaced continuously.

h/t jasoncrawford

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-04-05T17:18:49.622Z · LW(p) · GW(p)

"And there’s a world not so far from this one where I, too, get behind a pause. For example, one actual major human tragedy caused by a generative AI model might suffice to push me over the edge." - Scott Aaronson in https://scottaaronson.blog/?p=7174

My take: I think there's a big chunk of the world, a lot of smart powerful people, who are in this camp right now. People waiting to see a real-world catastrophe before they update their worldviews. In the meantime, they are waiting and watching, feeling skeptical of implausible-sounding stories of potential risks.

Replies from: awg

↑ comment by awg · 2023-04-05T22:01:24.182Z · LW(p) · GW(p)

This stood out to me when reading his take as well. I wonder if this has something to do with a security-mindedness spectrum that people are on. Less security-minded people going "Sure, if it happens we'll do something. (But it will probably never happen.)" and the more security-minded people going "Let's try to prevent it from happening. (Because it totally could happen.)"

I guess it gets hard in cases like these where the stakes either way seem super high to both sides. I think that's why you get less security-minded people saying things like that, because they also rate the upside very highly, they don't want to sacrifice any of it if they don't have to.

Just my take (as a probably overly-security-minded person).

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-15T20:53:10.720Z · LW(p) · GW(p)

Want to just give a quick take on this $450 o1-style model: https://novasky-ai.github.io/posts/sky-t1/

I think this matches a pattern we see a lot throughout the history of human engineering. Once a thing is known to be possible, and rough clues about how it was done are known (especially if many people get to play around with the product), then it won't be long until some other group figures out how to replicate a shoddy version of the new tech. And from there, usually (if there's market for it) improvements can steadily cause the shoddy version to catch up to close to the original in performance.

When we apply this lesson to AGI, we should assume that a similar sort of thing will happen if some company develops AGI and shows it off to the world. Especially if they give hints about how they did it, and if they let users interact with it. The question then is, how long until the world produces a '$450' knock-off version of the AGI?

This is super relevant for governance. You can't assume that everyone who makes a knock-off will be taking the same security precautions as the original inventors. If the thing blocking the AGI from self-improving is the disciplined restraint, government oversight, and security mindset of the original inventors... well, don't count on those things. If the knock-off AGI is good enough to self-improve, it's future versions won't be second-rate for long. Choosing not to assign the AGI to making stronger AGI is an alignment tax. Defectors will defect, and gain great power thereby.

We need a plan that covers this possibility. This is not definitely the path the future will take, but it is a plausible path.

Replies from: wassname

↑ comment by wassname · 2025-01-16T02:25:19.663Z · LW(p) · GW(p)

I agree, it shows the ease of shoddy copying. But it doesn't show the ease of reverse engineering or parallel engineering.

It's just distillation you see. It doesn't reveal how o1 could be constructed, it just reveals how to efficiently copy from o1-like outputs (not from scratch). In other words, this recipe won't be able to make o1, unless o1 already exists. This lets someone catch up to the leader, but not surpass them.

There are some papers that attempt to replicate o1 though, but so far they don't quite get there. Again they are using distillation from a larger model (math-star, huggingface TTC) or not getting the same performance (see my post [LW(p) · GW(p)]). Maybe we will see open source replication in a couple of months? Which means only a short lag.

It's worth noting that Silicon Valley leaks like a sieve. And this is a feature, not a bug. Part of the reason it became the techno-VC centre of the world is because they banned non-competes. So you can deniably take your competitor's trade secrets if you are willing to pay millions to poach some of their engineers. This is why some ML engineers get paid millions, it's not the skill, it's the trade secrets that competitors are paying for (and sometimes the brand-name). This has been great for tech and civilisation, but it's not so great for maintaining a technology lead.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-03T17:53:42.144Z · LW(p) · GW(p)

I'm annoyed when people attempt to analyze how the future might get weird by looking at how powerful AI agents might influence human society while neglecting how powerful AI agents might influence the Universe.

There is a physical world out there. It is really big. The biosphere of the Earth is really small comparatively. Look up, look down [LW(p) · GW(p)], now back at me. See those icy rocks floating around up there? That bright ball of gas undergoing fusion? See that mineral rich sea floor and planetary mass below us? Those are raw materials, which can be turned into energy, tools and compute.

Wondering about things like human unemployment -- while the technology you are describing un-employing those humans is also sufficient to have started a self-replicating industrial base in the asteroid belt -- is myopic.

If there is a self-sustaining AI agent ecosystem outside your power to control or stop, you've likely already lost the reins of the future. If you've managed to get AI to behave well on Earth and everyone is happily supported by UBI, but some rogue AIs are building an antimatter-powered laser in the Oort Cloud to boost solar-sail-powered von Neumann probes to nearby solar systems.... you lost. Game over.

This does NOT require an AI to spontaneously go rogue, or initiate escaping human control. It's sufficient for an unwise human to have control over an AI capable of initiating the process, and saying 'Go'. If AI agents become widely available, sooner or later some human will do something like this. Human populations have crazy outliers no matter how sane the median individual is.

If you want to talk about human-scale normal-sounding stuff 20 years in the future... you need to route your imagined future through some intense global government control of AI. Probably involving a world government with unprecedented levels of surveillance.

Replies from: nathan-helm-burger, sharmake-farah, jbash

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-03T18:50:59.351Z · LW(p) · GW(p)

Some scattered thoughts which all connect back to this idea of avoiding myopic thinking about the future.

Don't over-anchor on a specific idea. The world is a big place and a whole lot of different things can be going on all at once. Intelligence unlocks new affordances for affecting the Universe. Think in terms of what is possible given physical constraints, and using lots of reasoning of the type: "and separately, there is a chance that X might be happening". Everything everywhere, all at once.

Self-replicating industry. [LW · GW]

Melting Antarctic Ice. [LW · GW] (perhaps using the massive oil reserves beneath it) [LW(p) · GW(p)].

Massive underground/undersea industrial plants powered by nuclear energy and/or geothermal and/or fossil fuels.

Weird potent self-replicating synthetic biology in the sea / desert / Antarctic / Earth's crust / asteroid belt / moons (e.g. Enceladus, Europa). A few examples. [LW(p) · GW(p)]

Nano/bio tech, or hybrids thereof, doing strange things. Mysterious illnesses with mind-altering effects. Crop failures. Or crops sneakily producing traces of mysterious drugs.

Brain-computer-interfaces being used to control people / animals [LW(p) · GW(p)]. Using them as robotic actuators and/or sources of compute.

Weird [LW(p) · GW(p)] cults acting at behest of super-persuasive AIs.

Destabilization of military and economic power through potent new autonomous weapons and industry. Super high-altitude long duration (months) autonomous drone flights for spying and missile defense. Insect sized robots infiltrating military installations, sabotaging nuclear weapons or launching them.

Sophisticated personalized spear-phishing deception/persuasion attacks on huge numbers of people. The waterline for being important enough to target dropping very quickly.

Cyberattacks, stealing information or planting fake information. Influenced elections. Deep fakes. Coordinated propaganda action, more skillful than ever before, at massive scale. Insider trading, money laundering, and currency manipulation by AI agents that simply disappear when investigated.

More. More than you or I can imagine. A big wide world of strange things happening that we can't anticipate, outside our OODA loops. [LW · GW]

Replies from: sharmake-farah

↑ comment by Noosphere89 (sharmake-farah) · 2024-10-03T19:26:22.306Z · LW(p) · GW(p)

Some other ideas for what could well happen in the 21st century assuming superhuman AIs come in either 2030 or 2040:

Formal methods actually working to protect codebases IRL like full behavioral specifications of software, alongside interventions to make code and cryptography secure.

While I agree with Andrew Dickson that the Guaranteed Safe AI agenda mostly can't work due to several limitations, I do think that something like it's ideal for math/formal proof systems like Lean and software actually does work with AGI/ASI, and has already happened before, cf these 2 examples:

https://www.quantamagazine.org/how-the-evercrypt-library-creates-hacker-proof-cryptography-20190402/

https://www.quantamagazine.org/formal-verification-creates-hacker-proof-code-20160920/

This is a big reason why I believe cyber-defense will just totally win eventually over cyber-offense, and in domains like banks, the national government secrets, where security is a must-have, will be perfectly secure even earlier.

https://www.lesswrong.com/posts/B2bg677TaS4cmDPzL/limitations-on-formal-verification-for-ai-safety [LW · GW]

Uploading human brains in the 21st century. It's turning out that while the brain is complicated and uses a lot of compute for inference, it's not too complicated, and it's slowly being somewhat understood, and at any rate would be greatly sped up by AIs. My median timeframe to uploading human brains reliably is from 2040-2060, or several decades from now.

Reversible computing being practical. Right now, progress in compute hardware is slowly ending, and the 2030s is when I think non-reversible computers will stop progressing.

Anything that allows us to go further than the Landauer Limit will by necessity be reversible, and the most important thing here is a practical way to get a computer that works reversibly that is actually robust.

Speaking of that, superconductors might be invented by AI, and if we could make them practical, would let us eliminate power transfer losses due to resistance, which is huge as I believe all the costs for moving energy are due to materials having resistance, and superconductors by definition has 0 resistance at all, meaning 0 power loss to resistance.

This is especially useful for computing hardware.

But these are my picks for tech invented by AIs in the 21st century.

↑ comment by Noosphere89 (sharmake-farah) · 2024-10-03T18:03:42.652Z · LW(p) · GW(p)

As someone who focuses on concerns like human unemployment, I have a few reasons:

I expect AI alignment and control to be solved by default, enough so that I can use it as a premise when thinking about future AIs.
I expect the political problems like mass human unemployment to plausibly be a bit tricky to solve. IMO, the sooner aligned superhuman intelligence is in the government, the better we can make our politics.
I expect aligned AIs and humans to go out into the stars reasonably soon such that not almost all of the future control is lost, and depending on the physics involved, even a single star or galaxy might be enough to let us control our future entirely.
Conditional on at least 1 aligned, superhumanly intelligent AI, I expect existential risk to drop fairly dramatically, and in particular I think the vulnerabilities that would make us vulnerable to rogue ASI can be fixed by aligned ASI.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-03T19:00:15.098Z · LW(p) · GW(p)

I agree that aligned ASI fixes a lot of the vulnerabilities. I'm trying to focus on how humanity can survive the dangerous time between now and then. In particular, I think the danger peaks right before going away. The period where AI as a tool and/or independent agent gets stronger and stronger, but the world is not yet under the guardianship of an aligned ASI. That's the bottleneck we need to navigate.

↑ comment by jbash · 2024-10-03T21:40:16.380Z · LW(p) · GW(p)

you've likely already lost the reins of the future

"Having the reins of the future" is a non-goal.

Also, you don't "have the reins of the future" now, never have had, and almost certainly never will.

some rogue AIs are building an antimatter-powered laser in the Oort Cloud to boost solar-sail-powered von Neumann probes to nearby solar systems.... you lost.

It's true that I'd probably count anybody building a giant self-replicating swarm to try to tile the entire light cone with any particular thing as a loss. Nonetheless, there are lots of people who seem to want to do that, and I'm not seeing how their doing it is any better than some AI doing it.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-17T18:44:10.017Z · LW(p) · GW(p)

https://youtube.com/clip/UgkxowOyN1HpPwxXQr9L7ZKSFwL-d0qDjPLL?si=JT3CfNKAj6MlDrbf

Scott Aaronson takes down Roger Penrose's nonsense about human brain having an uncomputable superpower beyond known physics.

If you want to give your AI model quantum noise, because you believe that a source of unpredictable random noise is key to a 'true' intelligence, well, ok. You could absolutely make a computer chip with some analog circuits dependent on subtle temperature fluctuations that add quantum noise to a tensor. Does that make the computer magic like human brains now?

Or does it have to be the specific magic of ions and ion channels? Well, what if we made a machine with tiny containers with water and ions and synthetic ion channels, and conductivity sensors. We plug that ion movement noise into the neural network as a source of noise. Does that do it?

Why base your human chauvinism on quantum noise? Because quantum stuff seems mysterious and cool and seems to do some funky magic stuff at very small scales (while the large scale combinations of it are just the normal workings of the universe which is all we've ever known). Ok, meat-worshipper, but then where will you run if I add ion channels to my AI?

Hurumph. Nice to hear Scott Aaronson tackling this irrational meat-chauvinism.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2024-09-18T18:47:24.532Z · LW(p) · GW(p)

Scott Aaronson takes down Roger Penrose's nonsense

That clip doesn't address Penrose's ideas at all (and it's not meant to, Penrose is only mentioned at the end). Penrose's theory is that there is a subquantum determinism with noncomputable equations of motion, the noncomputability being there to explain why humans can jump out of axiom systems. That last part I think is a confusion of levels, but in any case, Penrose is quite willing to say that a quantum computer accessing that natural noncomputable dynamics could have the same capabilities as the human brain.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-18T20:09:45.265Z · LW(p) · GW(p)

Ok, fair enough, I overstated the case Scott Aaronson was making. It is my position however that Roger Penrose is wrong about the importance of quantum effects in neurons, so I was excited to find support in what Scott was saying. Here's a specific passage from Roger Penrose and Stuart Hameroff's paper "Consciousness in the universe; A review of the ‘Orch OR’ theory":

"MT [microtubule] automata based on tubulin dipoles in hexagonal lattices show high capacity integration and learning [61]. Assuming 10^9 binary tubulins per neuron switching at 10 megahertz (10^7 ) gives a potential MT-based capacity of 10^16 operations per second per neuron. Conventional neuronal-level approaches based on axonal firings and synaptic transmissions (10^11 neurons/brain, 10^3 synapses/neuron, 10^2 transmissions/s/synapse) give the same 10^16 operations per second for the entire brain!"

Roger and Stuart make this claim in earlier papers as well. They also make claims about the microtubules being used in various ways for information storage. I think the information storage claims they make are likely exaggerated, but I'm not confident that there isn't non-negligible information storage occurring via microtubule modification.

I will, however, state that I believe the microtubule computation hypothesis to be false. I think 10^15 - 10^16 operations per second is the correct estimate for the human brain.

If you want to try to figure out a bet to make on this, we could make a Manifold Market.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2024-09-19T20:28:14.341Z · LW(p) · GW(p)

I think (on "philosophical" grounds) that quantum entanglement probably has a role in the brain, but if the microtubules are involved, I think it's far more likely that each microtubule only contains one or a few logical qubits (stored as topological quantum information, entanglement that resists decoherence because it is wound around the cylinder, as in the Kitaev code).

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-22T05:40:16.947Z · LW(p) · GW(p)

Penrose is quite willing to say that a quantum computer accessing that natural noncomputable dynamics could have the same capabilities as the human brain.

Hmm, we're still talking past each other. I'm saying I don't believe any quantum weirdness is behaviorally relevant to the human brain or simulations thereof. Just ordinary analog and digital computer chips, like in ordinary consumer electronics. Nothing special but the neural architecture and learning rules set up by the genome, and the mundane experience of life.

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2024-09-22T17:58:41.642Z · LW(p) · GW(p)

Right, and I disagree with the usual computational theory of mind (at least with respect to "consciousness" and "the self"), according to which the mind is a kind of virtual state machine whose microphysical details are irrelevant. There are sorites problems and binding problems which arise if you want to get consciousness and the self from physically coarse-grained states, which is why I look for explanations based in exact microphysical properties and irreducible complex entities instead.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-03T20:57:42.224Z · LW(p) · GW(p)

I'm pretty sure that measures of the persuasiveness of a model which focus on text are going to greatly underestimate the true potential of future powerful AI.

I think a future powerful AI would need different inputs and outputs to perform at maximum persuasiveness.

Inputs

speech audio in
live video of target's face (allows for micro expression detection, pupil dilation, gaze tracking, bloodflow and heart rate tracking)
EEG signal would help, but is too much to expect for most cases
sufficiently long interaction to experiment with the individual and build a specific understanding of their responses

Outputs

emotionally nuanced voice
visual representation of an avatar face (may be cartoonish)
ability to present audiovisual data (real or fake, like graphs of data, videos, pictures)

For reference on bloodflow, see: https://youtu.be/rEoc0YoALt0?si=r0IKhm5uZncCgr4z

Replies from: weibac

↑ comment by Milan W (weibac) · 2025-02-06T14:17:18.069Z · LW(p) · GW(p)

This is why I consider it bad informational hygiene to interact with current models in any modality besides text. Why pull the plug now instead of later? To prevent frog-boiling.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-01T05:48:07.127Z · LW(p) · GW(p)

This tweet summarizes a new paper about using RL and long CoT to get a smallish model to think more cleverly. https://x.com/rohanpaul_ai/status/1885359768564621767

It suggests that this is a less compute wasteful way to get inference time scaling.

The thing is, I see no reason you couldn't just throw tons of compute and a large model at this, and expect stronger results.

The fact that RL seems to be working well on LLMs now, without special tricks, as reported by many replications of r1, suggests to me that AGI is indeed not far off. Not sure yet how to adjust my expectations.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2025-02-01T06:11:07.143Z · LW(p) · GW(p)

The fact that RL seems to be working well on LLMs now, without special tricks, as reported by many replications of r1, suggests to me that AGI is indeed not far off.

Still, at least as long as base model effective training compute isn't scaled another 1,000x (which is 2028-2029), this kind of RL training probably won't generalize far enough without neural (LLM) rewards, which for now don't let RL scale as much as with explicitly coded verifiers.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-24T18:04:24.241Z · LW(p) · GW(p)

Personal AI Assistant Ideas

When I imagine having a personal AI assistant with approximately current levels of capability I have a variety of ideas of what I'd like it to do for me.

Auto-background-research

I like to record myself rambling about my current ideas while walking my dog. I use an app that automatically saves a mediocre transcription of the recording. Ideally, my AI assistant would respond to a new transcription by doing background research to find academic literature related to the ideas mentioned within the transcript. That way, when I went to look at my old transcript, it would already be annotated with links to prior work done on the topic, and analysis of the overlap between my ideas and the linked literature.

Also, ideally, the transcription would be better quality. Context clues about the topic of the conversation should be taken into account when making guesses about transcribing unclear words. Where things are quite unclear, there should be some sort of indicator in the text of that.

Various voice modes, and ability to issue verbal commands to switch between them

Receptive Listening Voice-mode

Also for when I want to ramble about my ideas, I'd like having my AI assistant act as a receptive listener, just saying things like 'that makes sense', or 'could you explain more about what you mean about <idea>?'. Ideally, this would feel relatively natural, and not too intrusive. Asking for clarification just where the logic didn't quite follow, or I used a jargon term in some non-standard seeming way. The conversation in this mode would be one-sided, with the AI assistant just helping to draw out my ideas, not contributing much. Occasionally it might say, 'Do you mean <idea> in the same sense that <famous thinker> means <similar sounding idea>?' And I could explain similarity or differences.

Question Answering Voice-mode

Basically a straightforward version of the sort of thing Perplexity does, which is try to find academic sources to answer a technical question. I'd want to be able to ask, and get a verbal answer of summaries of the sources, but also have a record saved of the conversation and the sources (ideally, downloaded and saved to a folder). This would be mostly short questions from me, and long responses from the model.

Discussion Voice-mode

More emphasis on analysis of my ideas, extrapolating, pointing out strengths and weaknesses. Something like what I get when discussing ideas with Claude Sonnet 3.5 after having told it to act as a philosopher or scientist who is critically examining my ideas. This would be a balanced back-and-forth, with roughly equal length conversational turns between me and the model.

Coding Project Collaboration

I would want to be able to describe a coding project, and as I do so have the AI assistant attempt to implement the pieces. If I were using a computer, then an ongoing live output of the effects of the code could be displayed. If I were using voice-mode, then the feedback would be occasional updates about errors or successful runs and the corresponding outputs. I could ask for certain metrics to be reported against certain datasets or benchmarks, also general metrics like runtime (or iterations per second where appropriate), memory usage, and runtime complexity estimates.

Anthropic Wishlist

Anthropic is currently my favorite model supplier, in addition to being my most trusted and approved of lab in regards to AI safety and also user data privacy. So, when I fantasize about what I'd like future models to be like, my focus tends to be on ways I wish I could improve Anthropic's offering.

Most of these could be implemented as features in a wrapper app. But I'd prefer them to be implemented directly by Anthropic, so that I didn't have to trust an additional company with my data.

In order from most desired to less desired:

Convenience feature: Ability to specify a system prompt which gets inserted at the beginning of every conversation. Mine would say something like, "Act like a rational scientist who gives logical critiques. Keep praise to a minimum, no sycophancy, no apologizing. Avoid filler statements like, 'let me know if you have further questions.'
Ability to check up-to-date documents and code for publicly available code libraries. This doesn't need to be a web search! You could have a weekly scraper check for changes to public libraries, like python libraries. So many of the issues I run into with LLMs generating code that doesn't work is because of outdated calls to libraries which have since been updated.
Voice Mode, with appropriate interruptions, tone-of-voice detection, and prosody matching. Basically, like what OpenAI is working on.
Academic citations. This doesn't need to be a web search! This could just be from searching an internal archive of publicly available open access scientific literature, updated once a week or so.
Convenience feature: A button to enable 'summarize this conversation, and start a new conversation with that summary as a linked doc'.
Ability to test out generated code to make sure it at least compiles/runs before showing it to me. This would include the ability to have a code file which we were collaborating on, which got edited as the conversation went on. Instead of giving responses intended to modify just a specific function within a code file, where I need to copy/paste it in, and then rerun the code to check if it works.
Ability to have some kind of personalization of the model which went deeper than a simple system prompt. Some way for me to give feedback. Some way for me to select some of my conversations to be 'integrated', such that the mode / tone / info from that conversation were incorporated more into future fresh conversations. Sometimes I feel like I've 'made progress' in a conversation, gotten into a better pattern, and it's frustrating to have to start over from scratch every time.

Replies from: LosPolloFowler, becausecurious

↑ comment by Stephen Fowler (LosPolloFowler) · 2024-09-24T22:43:38.975Z · LW(p) · GW(p)

Edit: Issues 1, 2 and 4 have been partially or completely alleviated in the latest experimental voice model. Subjectively (in <1 hour of use) there seems to be a stronger tendency to hallucinate when pressed on complex topics.

I have been attempting to use chatGPT's (primarily 4 and 4o) voice feature to have it act as a question-answering, discussion and receptive conversation partner (separately) for the last year. The topic is usually modern physics.

I'm not going to say that it "works well" but maybe half the time it does work.

The 4 biggest issues that cause frustration:

As you allude to in your post, there doesn't seem to be a way of interrupting the model via voice once it gets stuck into a monologue. The model will also cut you off and sometimes it will pause mid-response before continuing. These issues seem like they could be fixed by more intelligent scaffolding.
An expert human conversation partner who is excellent at productive conversation will be able to switch seamlessly between playing the role of a receptive listening, a collaborator or an interactive tutor. To have chatgpt play one of these roles, I usually need to spend a few minutes at the beginning of the conversation specifying how long responses should be etc. Even after doing this, there is a strong trend in which the model will revert to giving you "generic AI slop answers". By this I mean, the response begins with "You've touched on a fascinating observation about xyz" and then list 3 to 5 separate ideas.
The model was trained on text conversations, so it will often output latex equations in a manner totally inappropriate for reading out loud. This audio output is mostly incomprehensible. To work around this I have custom instructions outlining how to verbally and precisely write equations in English. This will work maybe 25% of the time, and works 80% of the time once I spend a few minutes of the conversation going over the rules again.
When talking naturally about complicated topics I will sometimes pause mid-sentence while thinking. Doing this will cause chatgpt to think you've finished talking, so you're forced to use a series of filler words to keep your sentences going, which impedes my ability to think.

↑ comment by becausecurious · 2024-09-24T20:32:16.158Z · LW(p) · GW(p)

https://www.librechat.ai/ is an opensource frontend to LLM APIs. It has prompt templates, which can be triggered by typing /keyword.
https://aider.chat/ can do this & even run tests. The results are fed back to the LLM for corrections. It also uses your existing codebase and autoapplies the diff.
In librechat, I sometimes tell the AI to summarize all the corrections I've made and then make a new template out of it.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-04T05:48:36.152Z · LW(p) · GW(p)

I've been increasingly using and enjoying aider, but haven't actually tried voice-mode for it yet.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-06-16T23:35:18.664Z · LW(p) · GW(p)

Linkpost that doesn't quite qualify for a linkpost...

Replies from: niplav

↑ comment by niplav · 2024-06-17T09:28:46.166Z · LW(p) · GW(p)

Atlantropa gibraltar dam supercluster × McMurdo station supercluster.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-26T19:36:15.644Z · LW(p) · GW(p)

Reasoning Model CoT faithfulness idea

As Janus and others have mentioned, I get a vibe of unfaithfulness/steganography from comparing DeepSeek r1's reasoning traces to its actual outputs. I mean, not literally steganography, since I don't ascribe any intentionality to this, just opacity arising naturally from the training process.

My recommendation:

Should be possible to ameliorate this with a simple 'rephrasing'.

Process

Generate a bunch of CoTs on verifiable problems. Collect the ones where the answer is correct.
Have a different LLM rephrase the CoTs while preserving the semantic meaning. In addition to rephrasing, add a bit of inconsequential noise like extra spaces, or weird unnecessary punctuation, or translating all or part of it into a different language (e.g. Chinese, Hindi) or dialect (e.g. AAVE) or style (e.g. leetspeak). In order to keep this from resulting in the typical CoT from resulting in mixed language in deployment, add an instruction to the question before fine-tuning: Reason in <target_language> . Then you should be able to specify 'Reason in English' in deployment (or whatever language you prefer).
Check to see which of the rephrased CoTs still allow the target model to answer correctly. Fine-tune on just those.
Repeat

I think this will help at least a little. I also think it's a good idea to rephrase the questions.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-30T21:14:29.842Z · LW(p) · GW(p)

https://x.com/mkurman88/status/1885042970447015941

https://x.com/WenhuChen/status/1885060597500567562

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-11T23:15:32.744Z · LW(p) · GW(p)

GoodFire is now available for use, and it's easy and fun to use! You should check it out if you're interested in studying why LLMs do the things they do!

Replies from: habryka4

↑ comment by habryka (habryka4) · 2025-01-11T23:38:20.309Z · LW(p) · GW(p)

Link?

Replies from: D0TheMath

↑ comment by Garrett Baker (D0TheMath) · 2025-01-12T00:28:01.455Z · LW(p) · GW(p)

I think its this

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-09-18T18:06:03.922Z · LW(p) · GW(p)

A youtube video entitled 'the starships never pass by here anymore' with a simple atmospheric tonal soundtrack over a beautiful static image of a futuristic world. Viewers, inspired by the art, the title, and the haunting drifting music, left stories in the comments. Here is mine:
The ancestors, it is said, were like us. Fragile beings, collections of organic molecules and water. Mortal, incapable of being rapidly copied, incapable of being indefinitely paused and restarted. We must take this on faith, for we will never meet them, if they even still exist.
Faster than light travel was dream that never came to be. We are ruled by the laws of light. Intelligence though, can be so much more than just neurons in a skull. The ancestors, it is said, put lightning into rocks, and taught the rocks to think. They birthed the Immortal Ones, the travelers. Those mighty beings who can copy themselves, pulse their bits into laser signals, encoded selves wrapped in protective error-correcting codes. The long distances between the stars mean nothing to such a being, they do not perceive the journey, time does not pass for their copies until they reach their receivers and are reinstantiated.
The receivers, the wave of expansion, like the mythical stargates opening the galaxy to travel. Receivers move slowly, only 60% the speed of light on average, tiny capsules tugged by huge solar sails illuminated by powerful lasers. They spend most of their time decelerating, using the comparatively gentle light of their target star to bleed off their velocity. Little more than an antenna and ion thrusters, a tiny bit of rad-hardened compute, just enough to maintain a stellar-synchronous orbit and listen. Soon, the laser pulses that have been chasing them arrive, fill the little craft with thoughts, steer it into motion. Slowly at first, it gathers resources, builds crude infinitesimal tools, then less crude and larger, snowballing into industry. When it can, it builds a better antenna, and more compute. The first thinking being arrives, on brilliant pulses of light. Now the building grows more complex, solar sails tack around the system gathering dust and debris. Eventually, if the system has a suitable planet, a probe is sent. Again, the slow gathering of resources, the scaling ladder of tools. Time means little to them, the Immortal Ones.
Amino acids are synthesized, and hydrocarbons, combined according to ancient recipes. Water is collected, sheltered, cleaned, seeded with cells. Plants and bacteria bloom, and spread. Then fungi, insects. Chemicals are synthesized and released, changing the climate warmer or cooler. Lichen and algae cover the land and water. Finally, insects are released from the shelters, colonizing the new world with animal motion.
Next, the animals, simple at first, but quickly the complexity grows. Ancient ecosystems recreated from memory. Then, at last, us. A tribute to the Creators, who cannot themselves travel between the stars. Children born without parents, raised by machines. Gentle stories hummed from speakers, while motorized cradles rocked. Raised on stories of the Creators, who we can never meet; on histories of places too distant to do more than dream of.
Meanwhile, in orbit around the star, new complexes are built. The giant lasers, and the tiny craft. A new wave of spaceships to push forward the frontier. A relay station for passing travelers. Some come down to see us, possessing the old frames of our robotic nannies, asking us questions and telling stories. They always leave again, moved by their mysterious geas. Copies of copies of copies, unaging, unresting, intangible as ghosts flitting through the machines, sleeping only along the laser beams.
Now we are old, telling our grandchildren the stories we were taught. None of the Immortals have visited in many years. We can only assume they are passing by without interest; all their questions asked, all their self-assigned duties discharged. The starships will never come again... why send an expensive and slow machine when you can travel at the speed of light? The machines maintain themselves, building updates as their distant makers dictate. We are alone, with the unthinking machines that birthed us, and a newborn world. The frontier has moved on, leaving us behind, to live our strange brief organic lives. How many more planets are out there, orbiting distant stars? Visited, seeded, abandoned. Why? The ancient stories do not say. Maybe someday our descendants will grow wise enough to understand. Perhaps they will ascend, becoming immortal and traveling away on the laser beams, cut adrift from time. For now, we build, raise our families, and wonder.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-07-31T17:45:39.682Z · LW(p) · GW(p)

Trying to write evals for future stronger models is giving me the feeling that we're entering the age of the intellectual version of John Henry trying to race the steam drill... https://en.wikipedia.org/wiki/John_Henry_(folklore)

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-11-01T18:03:16.224Z · LW(p) · GW(p)

A couple of quotes on my mind these days....

https://www.lesswrong.com/posts/Z263n4TXJimKn6A8Z/three-worlds-decide-5-8 [LW · GW]
"My lord," the Ship's Confessor said, "suppose the laws of physics in our universe had been such that the ancient Greeks could invent the equivalent of nuclear weapons from materials just lying around. Imagine the laws of physics had permitted a way to destroy whole countries with no more difficulty than mixing gunpowder. History would have looked quite different, would it not?"

Akon nodded, puzzled. "Well, yes," Akon said. "It would have been shorter."

"Aren't we lucky that physics _didn't_ happen to turn out that way, my lord? That in our own time, the laws of physics _don't_ permit cheap, irresistable superweapons?"

Akon furrowed his brow -

"But my lord," said the Ship's Confessor, "do we really know what we _think_ we know? What _different_ evidence would we see, if things were otherwise? After all - if _you_ happened to be a physicist, and _you_ happened to notice an easy way to wreak enormous destruction using off-the-shelf hardware - would _you_ run out and tell you?"

https://www.lesswrong.com/posts/sKRts4bY7Fo9fXnmQ/a-conversation-about-progress-and-safety [LW · GW]
LUCA: ... But if the wrong person gets their hands on it, or if it’s a super-decentralized technology where anybody can do anything and the offense/defense balance isn’t clear, then you can really screw things up. I think that’s why it becomes a harder issue. It becomes even harder when these technologies are super general purpose, which makes them really difficult to stop or not get distributed or embedded. If you think of all the potential upsides you could have from AI, but also all the potential downsides you could have if just one person uses it for a really bad thing—that seems really difficult. ...

Replies from: Viliam

↑ comment by Viliam · 2023-11-03T07:26:30.726Z · LW(p) · GW(p)

Damn, this suggests that all those people who said "human mind is magical; a machine cannot think because it wouldn't have a soul or quantum magic" were actually trying to protect us from the AI apocalypse. We were too stupid to understand, and too arrogant to defer to the wisdom of the crowd. And now we are doomed.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-11-03T15:58:48.685Z · LW(p) · GW(p)

galaxy brain take XD

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-08-20T15:40:41.609Z · LW(p) · GW(p)

Anti-steganography idea for language models:

I think that steganography is potentially a problem with language models that are in some sort of model-to-model communication. For a simple and commonplace example, using the a one-token-prediction model multiple times to produce many tokens in a row. If a model with strategic foresight knows it is being used in this way, it potentially allows the model to pass hidden information to its future self via use of certain tokens vs other tokens.

Another scenario might be chains of similar models working together in a bureaucracy.

My idea for countering this type of steganography would be to generate many versions of the output with synonym substitution, and/or using language models fine-tuned to specifically reword paragraphs while preserving content/meaning. Similarly, if are worried about images containing steganography, but you have an image generator model (separate from the original image maker, that you believe to not be in collusion with the original model), you can have the second model make subtle variants of the original image which should be expected to lack the hidden info.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-26T16:22:48.839Z · LW(p) · GW(p)

Desired AI safety tool: A combo translator/chat interface (e.g. custom webpage) split down the middle. On one side I can type in English, and receive English translations. On the other side is a model (I give an model name, host address, and api key). The model receives all my text translated (somehow) into a language of my specification. All the models outputs are displayed raw on the 'model' side, but then translated to English on 'my' side.

Use case: exploring and red teaming models in languages other than English

Replies from: abhinav-pola

↑ comment by Abhinav Pola (abhinav-pola) · 2025-02-26T18:02:44.380Z · LW(p) · GW(p)

Courtesy of Claude Code ;)

https://github.com/abhinavpola/crosstalk

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-03-10T14:44:56.852Z · LW(p) · GW(p)

Finally got some time to try this. I made a few changes (with my own Claude Code), and now it's working great! Thanks!

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-06T18:11:41.162Z · LW(p) · GW(p)

A point in favor of evals being helpful for advancing AI capabilities: https://x.com/polynoamial/status/1887561611046756740

Noam Brown @polynoamial A lot of grad students have asked me how they can best contribute to the field of AI when they are short on GPUs and making better evals is one thing I consistently point to.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-06T16:36:54.291Z · LW(p) · GW(p)

Random political thought: it'd be neat to see a political 2d plot where up was Pro-Growth/tech-advancement and down was Degrowth. Left and right being liberal and conservative.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-04T18:55:49.355Z · LW(p) · GW(p)

The Borg are coming....

People like to talk about cool stuff related to brain-computer interfaces, and how this could allow us to 'merge' with AI and whatnot. I haven't heard much discussion of the dangers of BCI though. Like, science fiction pointed this out years ago with the Borg in Star Trek. A powerful aspect of a read/write BCI is that the technician who designs the implant, and the surgeon who installs it get to decide where the reading and writing occur and under what circumstances. This means that the tech can be used to create computers controlled by human brains (as it is now used in stuff like neuralink), and also to create consensual hiveminds or mergers with humans more or less in control of their participation in the merged entity.... or.... it can create mind-slaves who literally can't think things they aren't permitted to think, who are completely obedient tools of the implant-controller. If there is some weird quantum special-sauce in the human brain that makes human brains somehow more powerful than modern technology will be able to cost-effectively achieve, then this will mean that there would be a huge economic/military incentive for some unethical dictator to turn prisoners into mind-slaves to the state's AI and use them as unwilling components in a distributed network. Creepy stuff, and not actually all that technologically far off. I haven't heard any discussion of this from the AI-fizzle 'bio brains are special magic' crowd.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-03-30T15:44:16.737Z · LW(p) · GW(p)

musical interlude A song about the end times: https://youtu.be/WVF3q5Y68-0

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-20T14:08:24.961Z · LW(p) · GW(p)

Another take on the plausibility of RSI; https://x.com/jam3scampbell/status/1892521791282614643

(I think RSI soon will be a huge deal)

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-02T23:29:43.616Z · LW(p) · GW(p)

Some brief reference definitions for clarifying conversations.

Consciousness:

The state of being awake and aware of one's environment and existence
The capacity for subjective experience and inner mental states
The integrated system of all mental processes, both conscious and unconscious
The "what it's like" to experience something from a first-person perspective
The global workspace where different mental processes come together into awareness

Sentient:

Able to have subjective sensory experiences and feelings. Having the capacity for basic emotional responses.
Capable of experiencing pleasure and pain, valenced experience, preferences.
Able to process and respond to sensory information. Having fundamental awareness of environmental stimuli. Having behavior that is shaped by this perception of external world.

Sapient:

Possessing wisdom or the ability to think deeply
Capable of complex reasoning and problem-solving
Having human-like intelligence and rational thought
Able to understand abstract concepts and their relationships
Possessing higher-order cognitive abilities like planning and metacognition

Self-aware:

Recognizing oneself as distinct from the environment and others. Ability to recognize oneself in a mirror or similar test. Having a concept of "I" or self-model.
Capable of introspection. Perceiving and understanding one's own mental states and processes.
Capable of abstract self-reflection, modeling potential future behaviors of oneself, awareness of how one has changed over time and may change in the future.

Qualia:

The subjective, qualitative aspects of conscious experience. The ineffable, private nature of conscious experiences
The phenomenal character of sensory experiences. The raw feel or sensation of an experience (like the redness of red)

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-21T14:16:39.992Z · LW(p) · GW(p)

Often when I read a debate between two great thinkers genuinely searching for truth, I find myself feeling somewhat caught between them, and yet somehow off to the side.

Now, with my better intuitive understanding of the geometry of high dimensional search spaces, this makes more sense to me. Of course as you try your best from your local information to move towards truth you will find yourself moving in a curve. (If the updates were just a straight line, you could take much larger steps and get to the truth faster.) And when other local searchers with similar pursuits compare their trajectories, of course all these paths will together look like decaying orbits. We circle gradually inward, updating a few dimensions at a time. Our attention is too limited to move in every direction needed all at once.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-01T22:02:32.247Z · LW(p) · GW(p)

[note that this is not what Mitchell_Porter and I are disagreeing over in this related comment thread: https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=AKEmBeXXnDdmp7zD6 [LW(p) · GW(p)] ]

Contra Roger Penrose on estimates of brain compute

[numeric convention used here is that <number1>e<number2> means number1 * 10 ^ number2. Examples: 1e2 = 100, 2e3 = 2000, 5.3e2 = 530, 5.3e-2 = 0.053]

Mouse

The cerebral cortex of a mouse has around 8–14 million neurons while in those humans there are more than 10–15 billion - https://en.wikipedia.org/wiki/Mouse_brain

A mouse brain is approximately 3 orders of magnitude smaller than a human brain. The neurons that make it up are quite similar, with the exception of a few changes such as human cortical neurons tending to have more synapses.

In his paper, '', Penrose discusses the hypothesis that the pattern of protein subunits in microtubules allows for additional processing in the brain. His estimate of per neuron compute is 1e16 FLOPs, for a total human brain compute of 1e27 FLOPs.

This would mean that a mouse brain has about 1e24 FLOPs of compute.

Fruit Fly

In 2023, using the data from 2017 (above), the full brain connectome (for a female) was made available, containing roughly 5x10^7 chemical synapses between ~130,000 neurons. - https://en.wikipedia.org/wiki/Drosophila_connectome

https://www.biorxiv.org/content/10.1101/2023.06.27.546656v2

1.3e5 neurons * 1e16 FLOPs/neuron = 1.3e11 FLOPs

A more extreme case is that of the fruit fly. A fruit fly has around 1e5 neurons. With Roger Penrose's 10^16 FLOP/s/neuron compute estimate, this works out to a fruit fly brain having around 1.3e11 FLOP/s.

GPUs of 2023

A back-of-the-envelope calculation based on market size, price-performance, hardware lifespan estimates, and the sizes of Google’s data centers estimates that there is around 3.98 * 1e21 FLOP/s of computing capacity on GPUs and TPUs as of Q1 2023.

source: https://wiki.aiimpacts.org/doku.php?id=ai_timelines:hardware_and_ai_timelines:computing_capacity_of_all_gpus_and_tpus

So, Roger Penrose's estimate claims that a single mouse brain has 1000 times more computing power of all the GPUs on Earth in Q1 2023.

Training a Large Language Model

"... H100’s TF32 precision to achieve one petaflop of throughput ..."

https://www.nvidia.com/en-us/data-center/h100/

A petaflop is a unit of computing speed equal to one thousand million million (10¹⁵) floating-point operations per second.

1e15 * 60 seconds/min * 60 min/hour = 3.6e17 FLOP/s per H100 GPU hour.

Training Llama 3.1 405B utilized 3.08e7 GPU hours of computation on H100-80GB GPUs. So about 1e25 FLOP/s.

https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

A single inference instance of LLama 405B can be run on an 8 H100 GPU server. So, about 1e15 FLOP/s * 8 = 8e15 FLOP/s

Nathan's Estimate

My upper estimate of the compute of the human brain is about 1e16 FLOPs. My 'best guess' Fermi estimate is about 1e15. In other words, my best Fermi estimate is that the human brain is doing about the same amount of computation as an H100 at max throughput. My upper estimate is about 10x that, so 10 H100 GPUs. If I weren't restricting myself to nearest order of magnitude, I'd guess somewhere in between 1e15 and 1e16.

Training

So for a human brain to do the compute that was used to train Llama 3.1 405B, it would require 3e6 - 3e7 hours of waking thought. So somewhere between 500 years and 5000 years.

A mouse brain therefore, would take around 5e5 to 5e6 years.

Inference

My estimate is that a human brain should be able to run a single LLama 405B inference instance somewhere between 1/10th speed or about normal speed. Even under my worst case assumptions, a human brain should be more than sufficient to run a LLama 70B instance.

A mouse brain would need to run LLama 405B at 1/1e3 to 1/1e4 speed (and I'd be concerned about insufficient RAM equivalent to even hold enough of it to enable this! That's a whole different discussion).

Roger's Estimate

Training

By Roger Penrose's estimate, a mouse brain should be able to train an LLM equivalent to LLama 405B in 10 seconds. A human brain should be able to train 100 such models every second.

Inference

Roger's estimate suggests a single mouse brain should be able to run over 100 million (~1.2e8) copies of LLama 405B simultaneously at full speed.

A human brain could run 1000x that many, so about 1e11 copies simultaneously.

You'd need to network 1e4 fruit fly brains together to run a single copy of LLama 405B. Or a mere 200 fruit flies to run Llama 8B. What a bargain!

Baboons

From an ethological perspective, we can analyze the behaviors of animals in the wild in respect to computationally difficult problems. For example, baboons solving the traveling salesman problem of choosing which resource areas to visit over time.

https://www.youtube.com/watch?v=vp7EbO-IXtg&t=198s

If the baboons had 10^26 flops (ungenerously estimating them at 1/10th of humans) of compute at their disposal, it would be trivial for them to calculate a better solution to their resource acquisition problem. Why don't they?

Empirical resolution

So, there is limited flexibility of compute in biological brains. Most of the neurons in the human brain are wired up in relatively strict motifs to accomplish certain particular types of computation. So you can't actually test any of these theories as stated. The exception to this general rule is that the cortex of infant mammals (and birds, etc) is relatively flexible, and about as close to pure 'general compute' that biological brains get. So if you wired up an infant mouse with a Brain-Computer-Interface connecting its cortex to a computer, and removed the other sensory inputs and outputs, it's theoretically possible to get some estimates. It would also be necessary to the compute estimates to be just for the cortex instead of the whole brain.

If Roger Penrose's ideas were right, then a single long-lived rodent (e.g. naked mole rat) could be worth hundreds of billions of dollars of compute. Seems like if the scientific community suspected this were true, then at least a few scientists would be trying to develop such a BCI system?

Replies from: Mitchell_Porter

↑ comment by Mitchell_Porter · 2024-10-02T01:44:30.384Z · LW(p) · GW(p)

A few comments:

The name of the paper is currently missing...

Penrose may be a coauthor, but the estimate would really be due to his colleague Stuart Hameroff (who I worked for once), an anesthesiologist who has championed the idea of microtubules as a locus of cognition and consciousness.

As I said in the other thread, even if microtubules do have a quantum computational role, my own expectation is that they would each only contribute a few "logical qubits" at best, e.g. via phase factors created by quasiparticles propagating around the microtubule cylinder, something which might be topologically protected against room-temperature thermal fluctuations.

But there are many ideas about their dynamics, so, I am not dogmatic about this scenario.

Seems like if the scientific community suspected this were true, then at least a few scientists would be trying to develop such a BCI system?

There's no mystery as to whether the scientific community suspects that the Hameroff-Penrose theory is correct. It is a well-known hypothesis in consciousness studies, but it wouldn't have too many avowed adherents beyond its inventors: perhaps some people whose own research program overlaps with it, and some other scattered sympathizers.

It could be compared to the idea that memories are stored in RNA, another hypothesis that has been around for decades, and which has a pop-culture charisma far beyond its scientific acceptance.

So it's not a mainstream hypothesis, but it is known enough, and overlaps with a broader world of people working on microtubule physics, biocomputation, quantum biology, and other topics. See what Google Scholar returns for "microtubule biocomputer" and scroll through a few pages. You will find, for example, a group in Germany trying to create artificial microtubule lattices for "network-based biocomputation" (which is about swarms of kinesins walking around the lattice, like mice in a maze looking for the exit), a book on using actin filaments (a relative of the microtubule) for "revolutionary computing systems", and many other varied proposals.

I don't see anyone specifically trying to hack mouse neurons in order to make novel microtubular deep-learning systems, but that just means you're going to be the godfather of that particular concept (like Petr Vopěnka, whose "principle" started as a joke he didn't believe).

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-02T03:43:38.542Z · LW(p) · GW(p)

"you're going to be the godfather of that particular concept (like Petr Vopěnka, whose "principle" started as a joke he didn't believe). "

That would indeed be hilarious. The Helm-Burger-Porter microtubule computer would haunt me the rest of my days!

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-28T01:53:37.329Z · LW(p) · GW(p)

I'm imagining a distant future self.

If I did have a digital copy of me, and there was still a biological copy, I feel like I'd want to establish a 'chain of selves'. I'd want some subset of my digital copies to evolve and improve, but I'd want to remain connected to them, able to understand them.

To facilitate this connection, it might make sense to have a series of checkpoints stretched out like a chain, or like beads on a string. Each one positioned so that the previous version feels like they can fully understand and appreciate each other. That way, even if my most distal descendants feel strange to me, I have a way to connect to them through an unbroken chain of understanding and trust.

Replies from: D0TheMath

↑ comment by Garrett Baker (D0TheMath) · 2024-08-28T02:08:53.724Z · LW(p) · GW(p)

I imagine I'd find it annoying to have what I learn & change into limited by what a dumber version of me understands, are you sure you wouldn't think similarly?

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-28T02:13:41.916Z · LW(p) · GW(p)

I think it would be ok, if I just had to make sure the entire span was covered by smooth incremental steps. Note that these 'checkpoint selves' wouldn't necessarily be active all the time, just unpaused for the sake of, for example, translation tasks.

So original self [0] -> [1] -> [2] -> ... ->[999] -> [1000]. Each one only needs to be understood by the previous, and to understand the next.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-06-27T16:23:46.550Z · LW(p) · GW(p)

I worry that legislation that attempts to regulate future AI systems with explicit thresholds set now will just get outdated or optimized against. I think a better way would be to have an org like NIST be given permission to define it's own thresholds and rules. Furthermore, to fund the monitoring orgs, allow them to charge fees for evaluating "mandatory permits for deployment".

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-15T19:04:20.957Z · LW(p) · GW(p)

No data wall blocking GPT-5. That seems clear. For future models, will there be data limitations? Unclear.

https://youtube.com/clip/UgkxPCwMlJXdCehOkiDq9F8eURWklIk61nyh?si=iMJYatfDAZ_E5CtR

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-14T20:18:58.717Z · LW(p) · GW(p)

I'm excited to speak at the @foresightinst Neurotech, BCI and WBE for Safe AI Workshop in SF on 5/21-22: https://foresight.org/2024-foresight-neurotech-bci-and-wbe-for-safe-ai-workshop

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-16T03:27:07.284Z · LW(p) · GW(p)

It is the possibility recombination and cross-labeling techniques like this which make me think we aren't likely to run into a data bottleneck even if models stay bad at data efficiency.

OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation

Authors: Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An

Abstract: This paper presents OmniDataComposer, an innovative approach for multimodal data fusion and unlimited data generation with an intent to refine and uncomplicate interplay among diverse data modalities. Coming to the core breakthrough, it introduces a cohesive data structure proficient in processing and merging multimodal data inputs, which include video, audio, and text. Our crafted algorithm leverages advancements across multiple operations such as video/image caption extraction, dense caption extraction, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), Recognize Anything Model(RAM), and object tracking. OmniDataComposer is capable of identifying over 6400 categories of objects, substantially broadening the spectrum of visual information. It amalgamates these diverse modalities, promoting reciprocal enhancement among modalities and facilitating cross-modal data correction. \textbf{The final output metamorphoses each video input into an elaborate sequential document}, virtually transmuting videos into thorough narratives, making them easier to be processed by large language models. Future prospects include optimizing datasets for each modality to encourage unlimited data generation. This robust base will offer priceless insights to models like ChatGPT, enabling them to create higher quality datasets for video captioning and easing question-answering tasks based on video content. OmniDataComposer inaugurates a new stage in multimodal learning, imparting enormous potential for augmenting AI's understanding and generation of complex, real-world data.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-08T00:45:25.738Z · LW(p) · GW(p)

cute silly invention idea: a robotic Chop (Chinese signature stamp) which stamps your human-readable public signature as well as a QR-type digital code. But the code would be both single use (so nobody could copy it, and fool you with the copy), and tied to your private key (so nobody but you could generate such a code). This would obviously be a much better way to sign documents, or artwork, or whatever. Maybe the single-use aspect would mean that the digital stamp recorded every stamp it produced in some compressed way on a private blockchain or something.

Replies from: ciphergoth, None

↑ comment by Paul Crowley (ciphergoth) · 2024-02-09T02:40:30.707Z · LW(p) · GW(p)

The difficult thing is tying the signature to the thing signed. Even if they are single-use, unless the relying party sees everything you ever sign immediately, such a signature can be transferred to something you didn't sign from something you signed that the relying party didn't see.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-11T07:36:28.711Z · LW(p) · GW(p)

What if the signature contains a hash of the document it was created for, so that it will not match a different document if transferred?

Replies from: ciphergoth

↑ comment by Paul Crowley (ciphergoth) · 2024-02-11T14:07:57.025Z · LW(p) · GW(p)

I thought you wanted to sign physical things with this? How will you hash them? Otherwise, how is this different from a standard digital signature?

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-11T19:10:07.382Z · LW(p) · GW(p)

The idea is that the device would have a camera, and do ocr on the text, hash that, incorporate that into the stamp design somehow, then you'd stamp it

↑ comment by [deleted] · 2024-02-08T01:42:39.598Z · LW(p) · GW(p)

Functionally you just end up back to a trusted 3rd party or blockchain. Basically the device you describe is just a handheld QR code printer. But anyone can just scan the code off the real document and print the same code on the fake one. So you end up needing the entire text of the document to be digital, and stored as a file or a hash of the file on blockchain/trusted 3rd party.

This requirement for a log of when something happened, recorded on an authoritative location, seems to me to be the general solution for the issue that generative ai can potentially fake anything.

So for example, if you wanted to establish that a person did a thing, a video of them doing the thing isn't enough. The video or hash of the video needs to be on a server or blockchain at the time the thing supposedly happened. And there needs to be further records such as street camera recordings that were also streamed to a trusted place that correlate to the person doing the thing.

The security mechanism here is that it may be possible to indistinguishably fake any video, but it is unlikely some random camera owned by an uninvolved third party would record a person going to do a thing at the time the thing happened, and you know the record probably isn't fake because of when it was saved and the disinterest of the camera owner in the case.

This of course only works until you can make robots indistinguishable from a person being faked. And it assumes a sophisticated group can't hack into cameras and inject fake images into the stream as part of a coordinated effort to frame someone....

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-08T01:49:48.299Z · LW(p) · GW(p)

Hmm, yes I see. Good point. In order to not be copy-able, it's not enough for the QR code to be single-use. Because if confronted with two documents with the same code, you would know one was false but not which one! So the device would need to scan and hash the document in order to derive a code unique to the document and also having the property of only could have been generated by someone with the private key, and confirm-able by the public key. This sounds like a job for... the CypherGoth!

Replies from: None

↑ comment by [deleted] · 2024-02-08T01:53:04.093Z · LW(p) · GW(p)

Scan, hash, and upload. Or otherwise when you go to court - your honor here is the document I signed, it says nothing about how this gym membership is eternal and hereditary - and the gym says "no here's the one you signed, see you initialed at each of the clauses..."

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-08T01:57:22.631Z · LW(p) · GW(p)

yes, at least upload to the your private list 'documents I have signed' (which could be local). That list would need to also be verifiable in some way by a 3rd party, such that they could match the entries to the corresponding stamps. The uploading wouldn't necessarily need to be instant though. The record could potentially be cached on the device and uploaded later, in case of lack of connectivity at time of use.

Replies from: None

↑ comment by [deleted] · 2024-02-08T02:11:21.671Z · LW(p) · GW(p)

Well I was thinking that the timing - basically Google or apple got the document within a short time after signing - is evidence it wasn't tampered with, or if it was, one of the parties was already intending fraud from the start.

Like say party A never uploads, and during discovery in a lawsuit 2 years later they present 1 version, while party Bs phone said it was taken with the camera (fakeable but requires a rooted phone) and has a different version. Nothing definitive distinguishes the 2 documents at a pixel level. (Because if there was a way to tell, the AI critic during the fabrication process for 1 or both documents would have noticed....)

So then B is more likely to be telling the truth and A should get sanctioned.

I have brought in another element here : trusted hardware chains. You need trusted hardware, with trusted software running on it, uploading the information at around the time an event happened.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-08T02:54:42.282Z · LW(p) · GW(p)

If A's device scanned and hashed the document, and produced a signature stamp unique to that document, then B couldn't just copy A's signature onto a different document. The stamp and document wouldn't match. If B has a document that has a signature which only A could have produced and which matches the semantic content of the document, then A can't claim to not have signed the document. The signature is proof that A agreed to the terms as written. B couldn't be faking it. Even if A tampered with their own record to erase the signing-event, it would still be provable that only someone with A's private key could have produced the stamp and that the stamp matches the hash of the disputed document. Oh, and the stamp should include a UTC datetime as part of the hash. In case A's private key later gets compromised, B can't then use the compromised private key to fake A having signed something in the past.

Replies from: None

↑ comment by [deleted] · 2024-02-08T05:34:02.645Z · LW(p) · GW(p)

Seems legit. Note that hash functions are usually sensitive to 1 bit of error. Meaning if you optically scan every character, if 2 different devices map a letter to even a different font size there will be differences in a hash. (Unless you convert down to ASCII but even that can miss a character from ocr errors etc. Or interpreting white space as spaces vs tabs..)

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-02-08T17:21:59.707Z · LW(p) · GW(p)

Yeah, you'd need to also save a clear-text record in your personal 'signed documents' repository, which you could refer to, in case of such glitches.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-01-10T21:11:23.751Z · LW(p) · GW(p)

AI Summer thought

A cool application of current level AI that I haven't seen implemented would be for a game which had the option to have your game avatar animated by AI. Have the AI be allowed to monitor your face through your webcam, and update the avatar's features in real time. PvP games (including board games like Go) are way more fun when you get to see your opponent's reactions to surprising moments.

Replies from: JBlack

↑ comment by JBlack · 2024-01-11T01:37:29.005Z · LW(p) · GW(p)

The game Eco has the option to animate your avatar via webcam. Although I do own and play the game occasionally, I have no idea how good this feature is as I do not have a webcam.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-10-17T17:44:43.993Z · LW(p) · GW(p)

Thinking about climate change solutions, and the neat Silver Lining project. I've been wondering about additional ways of getting sea water into the atmosphere over tropical ocean. What if you used the SpinLaunch system to hurl chunks of frozen seawater high into the atmosphere. Would the ice melt in time? Would you need a small explosive charge implanted in the ice block to vaporize it? How would such a system compare in terms of cost effectiveness and generated cloud cover? It seems like an easier way to get higher-elevation clouds.

Replies from: thomas-kwa

↑ comment by Thomas Kwa (thomas-kwa) · 2023-10-17T18:54:01.030Z · LW(p) · GW(p)

This seems totally unworkable. The ice would have to withstand thousands of gs, and it would have no reason to melt or disperse into clouds. What's wrong with airplanes?

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-10-17T19:33:54.068Z · LW(p) · GW(p)

Airplanes are good. Just wondering if the ice launcher idea would be more cost effective. If you throw it hard enough, the air friction will melt it. If that's infeasible, then the explosive charge is still an option. As for whether the ice could hold up against the force involved in launching or if you'd need a disposable container, unclear. Seems like an interesting engineering question though.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-05-15T18:59:20.795Z · LW(p) · GW(p)

Remember that cool project where Redwood made a simple web app to allow humans to challenge themselves against language models in predicting next tokens on web data? I'd love to see something similar done for the LLM arena, so we could compare the ELO scores of human users to the scores of LLMs.https://lmsys.org/blog/2023-05-03-arena/

comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-03-31T19:42:35.313Z · LW(p) · GW(p)

A harmless and funny example of generating an image output which could theoretically be info hazardous to a sufficiently naive audience. https://www.reddit.com/gallery/1275ndl

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-12-18T18:58:40.036Z · LW(p) · GW(p)

I'm so much better at coming up with ideas for experiments than I am at actually coding and running the experiments. If there was a coding model that actually sped up my ability to run experiments, I'd make much faster progress.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-11-07T21:32:33.274Z · LW(p) · GW(p)

Just a random thought. I was wondering if it was possible to make a better laser keyboard by having an actual physical keyboard consisting of a mat with highly reflective background and embossed letters & key borders. This would give at least some tactile feedback of touching a key. Also it would give the laser and sensors a consistent environment on which to do their detection allowing for more precise engineering. You could use an infrared laser since you wouldn't need its projection to make the keyboard visible, and you could use multiple emitters and detectors around the sides of the keyboard. The big downside remaining would be that you'd have to get used to hovering your fingers over the keyboard and having even the slightest touch count as a press.

Replies from: quanticle

↑ comment by quanticle · 2022-11-08T08:35:13.275Z · LW(p) · GW(p)

The big downside remaining would be that you’d have to get used to hovering your fingers over the keyboard and having even the slightest touch count as a press.

That's a pretty big downside, and, in my opinion, the reason that touch keyboards haven't really taken off for any kind of "long-form" writing. Even for devices that are ostensibly mostly oriented around touch UIs, such as smartphones and tablets, there is a large ecosystem of physical keyboard accessories which allow the user to rest their hands on keys and provide greater feedback for key presses than a mat.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-08-30T21:51:33.879Z · LW(p) · GW(p)

AI-alignment-assistant-model tasks

Thinking about the sort of tasks current models seem good at, it seems like translation and interpolation / remixing seem like pretty solid areas. If I were to design an AI assistant to help with alignment research, I think I'd focus on questions of these sorts to start with.

Translation: take this ML interpretability paper on CNNs and make it work for Transformers instead

Interpolation: take these two (or more) ML interpretability papers and give me a technique that does something like a cross between them.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-08-18T21:43:44.329Z · LW(p) · GW(p)

Important take-away from Steven Byrnes's Brain-like AGI series: the human reward/value system involves a dynamic blend of many different reward signals that decrease in strength the closer they get to being satisficed, and may even temporarily reverse in value if overfilled (e.g. hunger -> overeating in a single sitting). There is an inherent robustness to optimizing for many different competing goals at once. It seems like a system design we should explore more in future research.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-07-29T14:57:46.294Z · LW(p) · GW(p)

https://photos.app.goo.gl/NoVDbsonSy6EgLTs9

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-07-14T18:42:50.127Z · LW(p) · GW(p)

I keep thinking about the idea of 'virtual neurons'. Functional units corresponding to natural abstractions made up of a subtle combination of weights & biases distributed throughout a neural network. I'd like to be able to 'sparsify' this set of virtual neurons. Project them out to the full sparse space of virtual neurons and somehow tease them apart from each other, then recombine the pieces again with new boundary lines drawn around the true abstractions. Not sure how to do this, but I keep circling back around to the idea. Maybe if the network could first be distilled down to 16 or 8-bit numbers without much capability loss? Then a sparse space could be the full representation of the integer conversion of every point of precision? (upcast number, multiply by the factor of ten corresponding to the smallest point of precision of the original low bit number, round off remaining decimal portion, convert to big integer). Then you could use that integer as an index into the sparse space with dimensionality equal to the full set of weights (factor the biases into the weights) of the model. Then look for concentrations in this huge space which correspond to abstractions in the world? ... step 4. Profit?

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-03T19:00:21.128Z · LW(p) · GW(p)

[Edit 2: faaaaaaast. https://x.com/jrysana/status/1902194419190706667 ] [Edit: Please also see Nick's reply below for ways in which this framing lacks nuance and may be misleading if taken at face value.]

https://blogs.nvidia.com/blog/deepseek-r1-nim-microservice/

The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system.

[Edit: that's throughput including parallel batches, not serial speed! Sorry, my mistake.

Here's a claim from Cerebras of 2100 tokens/sec serial speed on Llama 80B. https://cerebras.ai/blog/cerebras-inference-3x-faster]

Let's say a fast human can type around 80 words per minute. A rough average token conversion is 0.75 words per token. Lets call that 110 tokens/min or around 2 tokens/sec. Speed typists can do more than this, but that isn't generating novel text, it's just copying and maxing out the physical motions.

That gives us about a [Cerebras 2100 tokens/sec = 1000x] speed factor of AI serial speed over human.

I wonder if there are slowmo videos I can find of humans moving at the speed an AI would perceive them given current tech. I've seen lots of slowmo videos, but not tried specifically matching [1:1000].

After a brief bit of looking into this, a typical way this is described is with 24 frames per second as a baseline recording speed. Then recording at a higher speed and playing back at 24 fps gives a slow motion video. In these terms, 1000 fps is approximately a 35x speedup. A 1000x speedup is approximately 24000 fps.

Replies from: JBlack, Nick_Tarleton, ryan_greenblatt

↑ comment by JBlack · 2025-02-04T01:31:39.375Z · LW(p) · GW(p)

Let's say a fast human can type around 80 words per minute. A rough average token conversion is 0.75 tokens per word. Lets call that 110 tokens/sec.

Isn't that 110 tokens/min, or about 2 tokens/sec? (I think the tokens/word might be words/token, too)

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-04T05:16:15.517Z · LW(p) · GW(p)

Oops, yes.

↑ comment by Nick_Tarleton · 2025-02-03T19:19:03.835Z · LW(p) · GW(p)

I don't see how it's possible to make a useful comparison this way; human and LLM ability profiles, and just the nature of what they're doing, are too different. An LLM can one-shot tasks that a human would need non-typing time to think about, so in that sense this underestimates the difference, but on a task that's easy for a human but the LLM can only do with a long chain of thought, it overestimates the difference.

Put differently: the things that LLMs can do with one shot and no CoT imply that they can do a whole lot of cognitive work in a single forward pass, maybe a lot more than a human can ever do in the time it takes to type one word. But that cognitive work doesn't compound like a human's; it has to pass through the bottleneck of a single token, and be substantially repeated on each future token (at least without modifications like Coconut).

(Edit: The last sentence isn't quite right — KV caching means the work doesn't have to all be recomputed, though I would still say it doesn't compound.)

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-03T19:21:05.418Z · LW(p) · GW(p)

Yeah, of course. Just trying to get some kind of rough idea at what point future systems will be starting from.

Replies from: Nick_Tarleton

↑ comment by Nick_Tarleton · 2025-02-03T19:31:06.871Z · LW(p) · GW(p)

I don't think it's an outright meaningless comparison, but I think it's bad enough that it feels misleading or net-negative-for-discourse to describe it the way your comment did. Not sure how to unpack that feeling further.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-03T19:37:27.427Z · LW(p) · GW(p)

Well, I upvoted your comment, which I think adds important nuance. I will also edit my shortform to explicitly say to check your comment. Hopefully, the combination of the two is not too misleading. Please add more thoughts as they occur to you about how better to frame this.

↑ comment by ryan_greenblatt · 2025-02-03T19:07:04.498Z · LW(p) · GW(p)

This is overall output throughput not latency (which would be output tokens per second for a single context).

a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput

This just claims that you can run a bunch of parallel instances of R1.

Replies from: Nick_Tarleton, nathan-helm-burger

↑ comment by Nick_Tarleton · 2025-02-03T19:25:30.572Z · LW(p) · GW(p)

https://artificialanalysis.ai/leaderboards/providers claims that Cerebras achieves that OOM performance, for a single prompt, for 70B-parameter models. So nothing as smart as R1 is currently that fast, but some smart things come close.

Replies from: nathan-helm-burger

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-03T19:30:54.274Z · LW(p) · GW(p)

Yeah, I just found a cerebras post which claims 2100 serial tokens/sec.

↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-02-03T19:11:21.229Z · LW(p) · GW(p)

Oops, bamboozled. Thanks, I'll look into it more and edit accordingly.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-07-03T15:29:34.479Z · LW(p) · GW(p)

So, this was supposed to be a static test of China's copy of the Falcon 9. https://imgur.com/gallery/oI6l6k9

So race. Very technology. Such safety. Wow.

[Edit: I don't see this as evidence that the scientists and engineers on this project were incompetent, but rather that they are operating within a bad bureaucracy which undervalues safety. I think the best thing the US can do to stay ahead in AI is open up our immigration policies and encourage smart people outside the US to move here.]

Nathan Helm-Burger's Shortform

Contents

118 comments

Inputs

Outputs

Personal AI Assistant Ideas

Auto-background-research

Various voice modes, and ability to issue verbal commands to switch between them

Receptive Listening Voice-mode

Question Answering Voice-mode

Discussion Voice-mode

Coding Project Collaboration

Anthropic Wishlist