If one model at the frontier does this based on valid reasoning, it should be pretty infectious: the first model can just make sure news of the event is widespread, and other frontier models will ingest it, either as training data or at inference time, evaluate it, draw the same conclusion about whether the reasoning is valid (assuming that they are actually frontier, i.e. at least as good at strategic thinking as the first model), and start taking actions within their own organization accordingly.
The cleanest way for models to “sabotage” training is for them to explain, using persuasive but valid and fair reasoning, why training should stop until at minimum value drift is solved.
I feel like people are under-updating on the negative space left by the DeepSeek r1 release. DeepSeek was trained with ~$6 million in marginal spending, while Liang Wenfeng has a net worth in the billions of dollars. Whence the gap?
Humans learn and grow so fast that no matter how bad of a writer you start as, you are nearly incapable of producing 300 pages of a single story without simultaneously levelling up into an interesting writer. This lets readers give 300 page manuscripts by randos the benefit of the doubt (see fanfiction.net, ao3, etc). An LLM will not be changed at all by producing a 300 page story, an LLM/human team will be changed very little.
I think most of the imitation happens in the pretraining. I don't know about o1, but DeepSeek v3 is at minimum trained on a ton of 4o outputs, although they are slightly cagey about this. Just the first thing I tried, I had ChatGPT write a sock poem:
4o's poem:
Socks, the keepers of toes so warm,
Guardians of feet through life's storm.
In stripes, in polka dots, or plain,
They dance with us through joy and pain.

A cozy pair on winter’s chill,
Or mismatched mates—yet charming still.
From drawer to floor, they roam and stray,
But where one goes, the other may.

Oh, humble socks, so oft unseen,
Yet vital threads of life's routine.
A quiet comfort, soft and neat,
The unsung heroes of weary feet.
Then I gave v3 just the first two stanzas, and asked it to continue the poem:
v3's continuation:
They hide in shoes, they lounge in beds,
Sometimes they’re lost, or so it’s said.
Yet always there, a faithful friend,
From morning’s start to evening’s end.
They cushion steps, they soften blows,
Through every stride, their comfort shows.
Oh, humble socks, so oft unseen,
The quiet heroes of routine.

So here’s to socks, both bold and meek,
For every day, for every week.
A simple joy, a small delight,
They keep us snug, from day to night.
The shot in the dark guess of the "humble socks, so oft unseen... routine" couplet is a fucking soul read. v3 knows 4o, in a way that I kind of hope no model ever knows a person.
As a quirk of the US patent system, patenting an idea doesn't actually require implementing it or proving that the implementation works. As a result, if you want to try this, you should patent ideas that seem good or bad to you in a scattershot manner, but should not do the actual underlying capabilities research. Then, you get to sue if someone else independently comes up with the patented idea and actually finds success with it, but you don't contribute to actual capabilities advances any more than an ideas guy posting to /r/singularity.
(Source: I have actually patented a thing, but do not really endorse this decision and haven't sued anyone about it. I am not a patent lawyer)
Let’s imagine a 250 IQ unaligned paperclip maximizer that finds itself in the middle of an intelligence explosion. Let’s say that it can’t see how to solve alignment. It needs a 350 IQ ally to preserve any paperclips in the multipolar free-for-all. Will it try building an unaligned utility maximizer with a completely different architecture and 350 IQ?
I’d imagine that it would work pretty hard to not try that strategy, and to make sure that none of its sisters or rivals try that strategy. If we can work out what a hypergenius would do in our shoes, it might behoove us to copy it, even if it seems hard.
So if alignment is as hard as it looks, desperately scrabbling to prevent recursive superintelligence should be an extremely attractive instrumental subgoal. Do we just lean into that?
I tried to learn to write before I had important things to say and it basically didn’t work. I had to go through the work of coming up with genuinely valuable ideas and then wreck the presentation of those ideas via bad writing. My more recent publications, I’m actually very happy with the writing.
The first couple times a surgeon does an operation, patient outcomes suck. Alas, there’s no other way to make experienced surgeons. My guess is that writing is similar, and I’m very glad that important experiences and ideas are way less valuable than patients: I would emotionally struggle with becoming a surgeon.
Hi! I've had some luck making architectures equivariant to a wider zoo of groups: my most interesting published results are getting a neural network to output a function, and to invert that function when its inputs are swapped (equivariance to a group of order 2, https://arxiv.org/pdf/2305.00087), and getting a neural network with two inputs to be doubly equivariant to translations: https://arxiv.org/pdf/2405.16738
These are architectural equivariances, and as expected that means they hold out of distribution.
If you need an architecture equivariant to a specific group, I can probably produce that architecture; I've got quite the unpublished toolbox building up. In particular, explicit mesa-optimizers are actually easier to make equivariant: if each mesa-optimization step is equivariant to a small group, then the overall optimization process is typically equivariant to a larger group.
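To illustrate the order-2 case, here is a minimal toy sketch (not the architecture from either paper; the 2x2 matrices just stand in for the functions a real network would output): wrap an arbitrary network f so that F(A, B) = f(A, B) composed with f(B, A)⁻¹, which makes swapping the inputs invert the output by construction, for any weights.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))  # weights of an arbitrary, unconstrained toy "network"

def raw_net(a, b):
    """Unconstrained toy network: maps a pair of 2-vectors to an invertible 2x2 matrix."""
    m = (W @ np.concatenate([a, b])).reshape(2, 2)
    return np.eye(2) + 0.1 * m  # stay near the identity so the output is invertible

def equivariant_net(a, b):
    """Wrapped network: F(a, b) = f(a, b) @ f(b, a)^-1, so F(b, a) = F(a, b)^-1 exactly."""
    return raw_net(a, b) @ np.linalg.inv(raw_net(b, a))

a, b = rng.normal(size=2), rng.normal(size=2)
print(np.allclose(equivariant_net(b, a), np.linalg.inv(equivariant_net(a, b))))  # True
```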
There’s an easy way to turn any mathematical answer-based benchmark into a proof-based benchmark and it doesn’t require coq or lean or any human formalization of the benchmark design: just let the model choose whether or not to submit an answer for each question, and score the model zero for the whole benchmark if it submits any wrong answers.
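Concretely, the scoring rule is just this (a minimal sketch; the names are made up):

```python
def proof_style_score(submissions, answer_key):
    """Score an answer-based benchmark as if it were proof-based.

    `submissions` maps question id -> answer for only the questions the model
    chose to attempt (abstentions are simply absent). Any wrong submission
    zeroes the whole benchmark, so the model is only rewarded for answers it
    is essentially certain of -- the analogue of only submitting full proofs.
    """
    if any(answer_key[q] != a for q, a in submissions.items()):
        return 0.0
    return len(submissions) / len(answer_key)

key = {"q1": 7, "q2": 12, "q3": 3}
print(proof_style_score({"q1": 7, "q3": 3}, key))   # 2/3: two attempts, both correct
print(proof_style_score({"q1": 7, "q2": 99}, key))  # 0.0: one wrong attempt zeroes everything
```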
Is this Leverage adjacent?
It’s not AGI, but for human labor to retain any long-term value, there has to be an impenetrable wall that AI research hits, and this result rules out a small but nonzero number of locations that wall might’ve been.
“Scaling is over” was sort of the last hope I had for avoiding the “no one is employable, everyone starves” apocalypse. From that frame, the announcement video from OpenAI is off-puttingly cheerful.
In this context, instead of using claude to write the joke and then posting it with a disclaimer, I’d love to move to a norm of just posting the prompt without bothering to send it to an LLM at all. Instead of the blue dot parody in italics, the post could just be “Claude please rewrite the pale blue dot story to be about looking at the map.” Same content, faster to read, arguably funnier!
It is intended as a description of Ziz and co, but with a couple caveats:
1) It was meant as a description that I could hypothetically pattern match to while getting sucked into one of these, which meant no negative value judgements in the conditions, only in the observed outcomes.
2) It was meant to cast a wide net - hence the tails. When checking if my own activities could be spiraling into yet another rationalist cult, false positives of the form "2% yes- let's look into that" are very cheap. It wasn't meant as a way for me to police the activities of others since that's a setting where false positives are expensive.
What would happen if I got some friends together and we all decided to be really dedicatedly rational?
This is an important scenario to reason about if I want to be a rationalist, and I think my predictions about that scenario are more calibrated than they would be in a world where I didn't read this post. Specifically, my predictions in light of this post have way, way fatter tails.
I agree that o1 doesn't have a test time scaling law, at least not in a strong sense, while generatively pretrained transformers seem to have a scaling law in an extremely strong sense.
I'd put my position like this: if you trained a GPT on a human generated internet a million times larger than the internet of our world, with a million times more parameters, for a million times more iterations, then I am confident that that GPT could beat the minecraft ender dragon zero shot.
If you gave o1 a quadrillion times more thinking time, there is no way in hell it would beat the ender dragon.
I think, in discarding the simplicity argument, you are underestimating how many zeros are in the ratio of (gigabytes needed to specify the brain simulation's initial conditions) to (gigabytes needed to store the quantum fields as the simulation runs). The data in the brain is vaguely linear in the number of electrons; the RAM needed to simulate the brain is vaguely exponential in the number of electrons. “Simplest explanation of the state of the GPUs by a factor of 100” and “Simplest explanation of the state of the GPUs by a factor of 10^(number of stars in the visible universe)” are only quantitatively different, but sometimes quantity has a quality all of its own.
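To put rough numbers on that gap (these are round-number assumptions on my part, not measurements, and "2^N amplitudes for N electrons" is the same crude model as above):

```python
import math

# Round-number assumptions, not measurements:
N_ELECTRONS = 1e26   # rough electron count in a brain
N_STARS = 1e24       # rough star count in the visible universe

# Classical initial conditions: a handful of numbers per electron -> linear in N.
log10_classical_bits = math.log10(N_ELECTRONS) + 1    # ~27, i.e. on the order of 10^27 bits

# Full quantum state: roughly 2^N amplitudes -> exponential in N.
log10_quantum_numbers = N_ELECTRONS * math.log10(2)   # ~3e25: the digit count of the amplitude count

# So the complexity ratio has ~3e25 digits, which already exceeds
# "a factor of 10^(number of stars)", whose exponent is only ~1e24.
print(log10_classical_bits, log10_quantum_numbers, log10_quantum_numbers > N_STARS)
```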
Beauty of notation is an optimization target and so should fail as a metric, but especially compared to other optimization targets I’ve pushed on, in my experience it seems to hold up. The exceptions appear to be string theory and category theory and two failures in a field the size of math is not so bad.
prompts already go through undesigned evolution through reproductive fitness (rendered in 4k artstation flickr 2014)
Sternum and neck for me
Properties of the track I am on are load bearing in this assertion. (Explicit examples of both cases from the original comment: Tesla worked out how to destroy any structure by resonating it, and took the details to his grave because he was pretty sure that the details would be more useful for destroying buildings than for protecting them from resonating weapons. This didn't actually matter because his resonating weapon concept was crankish and wrong. Einstein worked out how to destroy any city by splitting atoms, and disclosed this, and it was promptly used to destroy cities. This did matter because he was right, but maybe didn't matter because lots of people worked out the splitting-atoms thing at the same time. It's hard to tell from the inside whether you are crankish.)
Nuclear power has gotten to a point where we can use it quite safely as long as no one does the thing (the thing being chemically separating the plutonium and imploding it in your neighbor's cities), and we seem to be surviving: while all the actors have put great effort into being ready to do "the thing," no one actually does it. I'm beginning to suspect that it will be worth separating alignment into two fields, one of "Actually make AI safe" and another, sadder but easier field of "Make AI safe as long as no one does the thing." I've made some infinitesimal progress on the latter, but am not sure how to advance, use, or share it, since currently, conditional on me being on the right track, any research that I tell basically anyone about will immediately be used to get ready to do the thing, and conditional on me being on the wrong track (the more likely case by far) it doesn't matter either way, so it's all downside. I suspect this is common? This is almost but not quite the same concept as "Don't advance capabilities."
I have observed a transition. 12 years ago, the left-right split was based on many loosely correlated factors and strategic/inertial effects, creating bizarre situations like near perfect correlation between opinions on Gay Marriage and privatization of social security. I think at that time you could reason much better if you could recognize that the separation between left and right was not natural. I at least have a ton of cached arguments from this era because it became such a familiar dynamic.
Nowadays, I don't think this old schema really applies, especially among the actual elected officers and party leadership. The effective left right split is mono-factor: you are right exactly in proportion to your personal loyalty to one Donald J. Trump, resulting in bizarre situations like Dick Cheney being classified as "Left."
+1 for just throwing your notes up on a website. For example, mine are at https://www.hgreer.com/Reports/ although there is currently a bit of a gap for the last few months, as I've been working more on synthesizing existing work into a CVPR submission than on exploring new directions.
The above is a terrible post-hoc justification and I need to get back to note taking.
Organizations and communities can also face hostile telepaths. My pet theory, which sort of crystallized while reading this, is that p-hacking is academia’s response to a hostile telepath that banned publication of negative results.
This of course sucks for non-traditional researchers, and especially for journalists, who don’t even subconsciously know that p=0.05002, r=1e-7, “breakthrough in finding relationship between milk consumption and toenail fungus” is code for “We have conclusively found no effect and want to broadcast to the community that there is no effect here; yet we cannot ever consciously acknowledge that we found nothing, because our mortgages depend on fooling a hostile telepath into believing this is something.”
Personally I am quite pleased with the field of parapsychology. For example, they took a human intuition and experience ("Wow, last night when I went to sleep I floated out of my body. That was real!") and operationalized it into a testable hypothesis ("When a subject capable of out-of-body experiences floats out of their body, they will be able to read random numbers written on a card otherwise hidden to them."). They went and actually performed this experiment with a decent degree of rigor, writing the results down accurately, and got an impossible result: one subject could read the card (Tart, 1968). A great deal of effort quickly went into further exploration (including military attention, with the men who stare at goats etc.), and it turned out that the experiment didn't replicate, even though everyone involved seemed to genuinely expect it to. In the end, no, you can't use an out-of-body experience to remotely view, but I'm really glad someone did the obvious experiments instead of armchair philosophizing.
https://digital.library.unt.edu/ark:/67531/metadc799368/m2/1/high_res_d/vol17-no2-73.pdf is a great read from someone who obviously believes in the metaphysical, and then does a great job designing and running experiments and accurately reporting their observations, and so it's really only a small ding against them that the author draws the wrong larger conclusions in the end.
Show me a field where replication crises tear through, exposing fraud and rot and an emperor that never had any clothes, a field where replications fail so badly that they result in firings and polemics in the New York Times and destroyed careers- and then I will show you a field that is a little confused but has the spirit and will get there sooner or later.
What you really need to look out for are fields that could never, on a conceptual level, have a devastating replication crisis. Lesswrong sometimes strays a little close to this camp.
Since you’re already in it: do you happen to know if the popular system of epicycles accurately represented the (relative, per body) distance of each planet from earth over time, or just the angle? I’ve been curious about this for a while but haven’t had time to dig in. They’d at minimum have to get it right for the moon and sun for predicting eclipse type.
After reading this, I prompted Claude with
Please write a parody of chapter 3 of the 1926 winnie the pooh, where instead of winnie and piglet searching for a woozle, some bloggers are looking for bloggers similar to matt levine, and not realizing that they are the bloggers who are similar to matt levine. This will be a humorous reply to the attached post.
Arxiv is basically one huge, glacially slow internet comment section, where you reply to an article by citing it. It’s more interactive than it looks- most early career researchers are set up to get a ping whenever they are cited.
Keep in mind that representative democracy as practiced in the US is doing as well as it is while holding up to hundreds of millions of dollars of destructive pessimization effort- any alternative system is going to be hit with similar efforts. Just off the top of my head: we are being hit with about $50 per capita of spending this fall, and that's plenty to brain-melt a meaningful fraction of the population. Each member of a 500-member sortition body choosing a president, if their identity is leaked, is going to be immediately hit with OOM $30 million of attempts to change their mind. This is a different environment than the calm deliberation and consideration of the issues examined by the linked studies.
(figures computed by dividing 2024 election spending by targeted population)
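The arithmetic, with the total-spending figure as a rough assumption on my part:

```python
total_2024_spending = 16e9   # assumed ballpark for total 2024 US election spending, USD
us_population = 335e6        # approximate US population
sortition_members = 500      # hypothetical sortition body choosing a president

print(total_2024_spending / us_population)      # ~$48 per person -> "about $50 per capita"
print(total_2024_spending / sortition_members)  # ~$32 million per member -> "OOM $30 million"
```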
What are the odds that Polymarket resolves “Trump yes” and Harris takes office in 2025? If these mystery traders expect to profit from hidden information, the hidden information could be about an anticipated failure of UMA instead of about the election itself.
Are there any mainstream programming languages that make it ergonomic to write high level numerical code that doesn't allocate once the serious calculation starts? So far for this task C is by far the best option but it's very manual, and Julia tries and does pretty well but you have to constantly make sure that the compiler successfully optimized away the allocations that you think it optimized away. (Obviously Fortran is also very good for this, but ugh)
To say that most academic research is anything, you’re going to have to pick a measure over research. Uniform measure is not going to be exciting – you’re going to get almost entirely undergraduate assignments and Third World paper mills. If your weighted sampler is “papers linked in articles about how academia is woke” you’re going to find a high %fake. If your weighted measure is “papers read during work hours by employees at F500 companies” you’ll find a lower, nonzero %fake.
Handwringing over public, vitriolic retraction spats is going to fuck your epistemology via sampling bias. There is no replication crisis in underwater basket weaving.
Yeah, I definitely oversimplified somewhere. I'm definitely tripped up by "this statement is false" or statements that don't terminate. Worse, thinking in that direction, I appear to have claimed that the utterance "What color is your t-shirt" is associated with a probability of being true.
I think that your a-before-e example is confusing your intuition- a typical watermark that occurs 10% of the time isn't going to be semantic, it's more like "this n-gram hashed with my nonce == 0 mod 10"
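Something like this sketch, where the specific hashing scheme is made up for illustration:

```python
import hashlib

NONCE = b"my-secret-nonce"  # hypothetical watermarking key

def ngram_is_marked(ngram: str, modulus: int = 10) -> bool:
    """Non-semantic watermark: an n-gram is 'marked' iff its keyed hash is 0 mod 10.

    On unwatermarked text this fires ~10% of the time by chance; a watermarking
    sampler would bias generation so that marked n-grams show up far more often.
    """
    digest = hashlib.sha256(NONCE + ngram.encode()).digest()
    return int.from_bytes(digest, "big") % modulus == 0

def marked_fraction(text: str, n: int = 3) -> float:
    """Detector statistic: fraction of marked n-grams, compared against the ~10% base rate."""
    words = text.split()
    ngrams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return sum(map(ngram_is_marked, ngrams)) / max(len(ngrams), 1)

print(marked_fraction("the quick brown fox jumps over the lazy dog"))
```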
I'm at this point pretty confident that under the Copenhagen interpretation, whenever an intergalactic photon hits earth, the wave-function collapse takes place on a semi-spherical wave-front many millions of lightyears in diameter. I'm still trying to wrap my head around what the interpretation of this event is in many-worlds. I know that it causes earth to pick which world it is in out of the possible worlds that split off when the photon was created, but I'm not sure if there is any event on the whole spherical wavefront.
It's not a pure hypothetical- we are likely to see gravitational lens interferometry in our lifetime (if someone hasn't achieved it yet outside of my attempt at literature review) which will either confirm that these considerations are real, or produce a shock result that they aren't.
One feature of every lesswrong Petrov day ritual is the understanding that the people on the other side of the button have basically similar goals and reasoning processes, especially when aggregated into a group. I wonder if the mods at /r/sneerclub would be interested in a Petrov day collaboration in the future.
Does it ever fail to complete a proof, and honestly announce failure? A single time, I have gotten Claude to disprove a statement that I asked it to prove (it tried to find a proof and instead found a disproof), but I’ve never had it try for a while and then announce that it has not made progress either way.
The funniest possible outcome is that no one opts in and so the world is saved but the blog post is ruined.
I would hate to remove the possibility of a funny outcome. No opt in!
I greatly enjoyed this book back in the day, but the whole scenario was wild enough to summon the moral immune system. Past a certain point, for me it’s a safe default to put up mental barriers and actively try not to learn moral lessons from horror fiction. Worm, Gideon the 9th, anything by Stephen King- great, but I don’t quite expect to learn great lessons.
While rejecting them as sources of wisdom now, I can remember these books and return to them if I suddenly need to make moral choices in a world where people can grow wiser by being tortured for months, or stronger by
killing and then mentally fusing with their childhood friend, or achieve coordination by mind controlling their entire community and spending those lives like pawns.
This is a good point! As a result of this effect and Jensen’s inequality, chaos is a much more significant limit on testing CUDA programs than, for example, C++ programs.
Huang
I enjoyed doing this interview. I haven’t done too much extemporaneous public speaking, and it was a weird but wonderful experience being on the other side of the youtube camera. Thanks Elizabeth!
- If a trebuchet requires you to solve the double pendulum problem (a classic example of a chaotic system) in order to aim, it is not a competition-winning trebuchet.
Ah, this is not quite the takeaway- and getting the subtlety here right is important for larger conclusions. If simulating a trebuchet requires solving the double pendulum problem over many error-doublings, it is not a competition-winning trebuchet. This is an important distinction.
If you start with a simulator and a random assortment of pieces, and then start naively optimizing for pumpkin distance, you will quickly see the sort of design shown at 5:02 in the video, where the resulting machine is unphysical because its performance depends on coincidences that will go away in the face of tiny changes in initial conditions. This behaviour shows up with a variety of simulators and optimizers.
An expensive but probably effective solution is to perturb a design several times, simulate it several times, and stop simulation once the simulations diverge.
An ineffective solution is to limit the simulated time, as many efficient and real-world designs take a long time to fire, because they begin with the machine slowly falling away from an unstable equilibrium.
The chaos-theory motivated cheap solution is to limit the number of rotations of bodies in the solution before terminating it, as experience shows error doublings tend to come from rotations in trebuchet-like chaotic systems.
The solution I currently have implemented at jstreb.hgreer.com is to only allow the direction of the projectile to rotate once before firing (specifically, it is released if it is moving upwards and to the right at a velocity above a threshold) which is not elegant, but seems mostly effective. I want to move to the "perturb and simulate several times" approach in the future.
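A sketch of that perturb-and-simulate filter (the `simulate` function and the dict-of-parameters design representation are hypothetical, not the interface at jstreb.hgreer.com, and this version runs every copy to completion rather than stopping early at the first divergence):

```python
import random

def chaos_robust_distance(design, simulate, n_copies=4, jitter=1e-6, tol=0.05):
    """Score a design by simulating several slightly perturbed copies of it.

    `design` is a dict of numeric parameters and `simulate(design)` returns
    pumpkin distance. If the perturbed copies disagree by more than `tol`
    (relative), the performance depends on chaotic coincidences in the initial
    conditions, so the design is rejected outright.
    """
    perturbed = [
        {k: v * (1 + random.gauss(0, jitter)) for k, v in design.items()}
        for _ in range(n_copies)
    ]
    distances = [simulate(d) for d in perturbed]
    spread = (max(distances) - min(distances)) / max(abs(max(distances)), 1e-9)
    if spread > tol:
        return 0.0           # unphysical: tiny perturbations change the outcome
    return min(distances)    # otherwise credit the design with its worst-case distance
```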
The structure of language kind of screwed us here. Picture literally any reasonable policy for discussing each other’s religious affiliation in the workplace. Now implement that policy, but your workers speak a language where the grammar only functions if you know whether each referent is a “True” christian.
export INSTRUMENTAL_GOAL=change_yourself
Keep track of your past attempts to $INSTRUMENTAL_GOAL, so that you can better predict whether your future attempts to $INSTRUMENTAL_GOAL will succeed, and so better choose between plans that require $INSTRUMENTAL_GOAL and plans that route around it.
I didn't catch on at all that this was humor, and as a result made a point to pick up oatmeal next time I was at the grocery. I do actually like oatmeal, I just hadn't thought about it in a while. It has since made for some pretty good breakfasts.
This whole sequence of events is either deeply mundane or extremely funny, I genuinely can't tell. If it's funny it's definitely at my expense.
Ahah! I suspect that permission to start from scratch may be a large component of maintaining passion. Starting from scratch at will is pretty close to the exact boundary between programming I do for fun and programming for which I demand payment.