Great! I've added it to the site.
I thought it was better to exercise until failure?
Do you think this footnote conveys the point you were making?
As alignment researcher David Dalrymple points out, another “interpretation of the NFL theorems is that solving the relevant problems under worst-case assumptions is too easy, so easy it's trivial: a brute-force search satisfies the criterion of worst-case optimality. So, that being settled, in order to make progress, we have to step up to average-case evaluation, which is harder.” The fact that solving problems for unnecessarily general environments is too easy crops up elsewhere, in particular in Solomonoff induction. There, the problem is to assume a computable environment and predict what will happen next. The algorithm? Run through every possible computable environment and average their predictions, weighting each environment by its simplicity. In a precise sense, no computable predictor can do much better at this task. But for less general tasks, designing an optimal algorithm becomes much harder. Eventually, though, specialization makes things easy again: solving tic-tac-toe is trivial. Between total generality and total specialization is where the most important, and most difficult, problems in AI lie.
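To make "average their predictions" concrete, here is a toy Bayesian mixture predictor. It is emphatically not Solomonoff induction, which ranges over all computable environments and is itself uncomputable. As an assumption for illustration, the hypothesis class is just periodic bit sequences up to some maximum period, and the prior weight 2^(-2p) on a pattern of length p is a hypothetical stand-in for the simplicity prior. The names `periodic_hypotheses` and `predict_next` are made up for this sketch.

```python
from fractions import Fraction
from itertools import product


def periodic_hypotheses(max_period):
    """Toy environment class: infinite bit sequences that repeat a fixed pattern.
    A pattern of length p gets prior weight 2**(-2p), a hypothetical stand-in for
    the simplicity prior (shorter patterns count as simpler)."""
    hyps = []
    for p in range(1, max_period + 1):
        for pattern in product((0, 1), repeat=p):
            hyps.append((pattern, Fraction(1, 2 ** (2 * p))))
    return hyps


def predict_next(observed, hyps):
    """Posterior probability that the next bit is 1: average the predictions of
    every hypothesis consistent with `observed`, weighted by prior."""
    n = len(observed)
    total = Fraction(0)
    ones = Fraction(0)
    for pattern, weight in hyps:
        p = len(pattern)
        # Keep only environments that would have produced exactly what we saw.
        if all(observed[i] == pattern[i % p] for i in range(n)):
            total += weight
            if pattern[n % p] == 1:
                ones += weight
    return ones / total


if __name__ == "__main__":
    hyps = periodic_hypotheses(10)
    print(float(predict_next([0, 1, 0, 1, 0, 1], hyps)))
```

On the input `[0, 1, 0, 1, 0, 1]` the mixture puts almost all of its weight on the period-2 pattern, so the printed probability of a 1 comes out well below 0.01. Writing the mixture down is easy; the hard part is keeping it tractable once the hypothesis class gets anywhere near "all computable environments".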
I think mesa-optimizers could be a major problem, but there are good odds we live in a world where they aren't. Why do I think they're plausible? Because optimization is a pretty natural capability, and a mind being/becoming an optimizer at the top level doesn't seem like a very complex claim, so I assign decent odds to it. There's some weak evidence in favour of this too, e.g. humans not optimizing for what the local, myopic evolutionary optimizer acting on them is optimizing for, coherence theorems, etc. But that's not super strong, and there are other simple hypotheses for how things go, so I don't assign more than like 10% credence to the hypothesis.
It's still not obvious to me why adversaries are a big issue. If I'm acting against an adversary, it seems like I won't make counter-plans that lead to lots of side-effects either, for the same reasons they won't.
Could you unpack both clauses of this sentence? It's not obvious to me why they are true.
I was thinking about this a while back, as I was reading some comments by @tailcalled where they pointed out this possibility of a "natural impact measure" when agents make plans. This relied on some sort of natural modularity in the world, and in plans, such that you can make plans by manipulating pieces of the world which don't have side-effects leaking out to the rest of the world. But thinking through some examples didn't convince me that was the case.
Though admittedly, all I was doing was recursively splitting my instrumental goals into instrumental sub-goals and checking if they wound up seeming like natural abstractions. If they had, perhaps that would reflect an underlying modularity in plan-making in this world that is likely to be goal-independent. They didn't, so I got more pessimistic about this endeavour. Though now that I've written this comment out, it doesn't seem like the examples I worked through are much evidence. So maybe this is more likely to work than I thought.
Thanks for the recommendation! I liked Ryan's sketches of what capabilities Nx AI R&D labor AIs might possess. They make things a bit more concrete. (Though I definitely don't like the name.) I'm not sure if we want to include this definition, as it is pretty niche, and I'm not convinced of its utility. When I tried drafting a paragraph describing it, I struggled to articulate why readers should care about it.
Here's the draft paragraph.
"Nx AI R&D labor AIs: The level of AI capabilities that is necessary for increasing the effective amount of labor working on AI research by a factor of N. This is not the same thing as the capabilities required to increase AI progress by a factor of N, as labor is just one input to AI progress. The virtues of this definition include: ease of operationalization, [...]"
I'm working on some articles on why powerful AI may come soon, and why that may kill us all. The articles are for a typical smart person, and for knowledgeable people to share with their family/friends. Which intro do you prefer, A or B?
A) "Companies are racing to build smarter-than-human AI. Experts think they may succeed in the next decade. But more than “building” it, they’re “growing” it — and nobody knows how the resulting systems work. Experts vehemently disagree on whether we’ll lose control and see them kill us all. And although serious people are talking about extinction risk, humanity does not have a plan. The rest of this section goes into more detail about how all this could be true."
B) "Companies are racing to grow smarter-than-human AIs. More and more experts think they’ll succeed within the next decade. And we do grow modern AI — which means no one knows how they work, not even their creators. All this is in spite of the vehement disagreement amongst experts about how likely it is that smarter-than-human AI will kill us all. Which makes the lack of a plan on humanity’s part for preventing these risks all the more striking.
These articles explain why you should expect smarter-than-human AI to come soon, and why that may lead to our extinction."
Does this text about Colossus match what you wanted to add?
Colossus: The Forbin Project also depicts an AI takeover due to instrumental convergence. But what differentiates it is the presence of two AIs, which collude with each other to take over. In fact, their discussion of their shared situation, being in control of their creators' nuclear defence systems, is what leads to their decision to take over from their creators. Interestingly, the back-and-forth between the AIs is extremely rapid, and involves concepts that humans would struggle to understand, which made it impossible for their creators to recognize the conspiracy unfolding before their eyes.
That's a good film! A friend of mine absolutely loves it.
Do you think the Forbin Project illustrates some aspect of misalignment that isn't covered by this article?
Huh, I definitely wouldn't have ever recommended someone play 5x5. I've never played it. Or 7x7. I think I would've predicted playing a number of 7x7 games would basically give you the "go experience". Certainly, 19x19 does feel like basically the same game as 9x9, except when I'm massively handicapping myself. I can beat newbies easily with a 9 stone handicap in 19x19, but I'd have to think a bit to beat them in 9x9 with a 9 stone handicap. But I'm not particularly skilled, so maybe at higher levels it really is different?
I look forward to it.
Hello! How long have you been lurking, and what made you stop?
Donated $10. If I start earning substantially more, I think I'd be willing to donate $100. As it stands, I don't have that slack.
Reminds me of "Self-Integrity and the Drowning Child", which talks about another way that people in EA/rat communities are liable to hammer down parts of themselves.
- RE: "something ChatGPT might right", sorry for the error. I wrote the comment quickly, as otherwise I wouldn't have written it at all.
- Using ChatGPT to improve your writing is fine. I just want you to be aware that there's an aversion to its style here.
- Kennaway was quoting what I said, probably so he could make his reply more precise.
- I didn't down-vote your post, for what it's worth.
- There's a LW norm, which seems to hold less force in recent years, for people to explain why they downvote something. I thought it would've been dispiriting to get negative feedback with no explanation, so I figured I'd explain in place of the people who downvoted you.
- I don't understand why businesses would be co-financing UBI instead of some government tax. Nor do I get why it would be desirable or even feasible, given the co-ordination issues.
- If companies get to make UBI conditional on people learning certain things, then it's not a UBI. Instead, it's a peculiar sort of training program.
- What does economic recovery have to do with UBI?
My guess as to why this got down-voted:
1) This reads like a manifesto, and not an argument. It reads like an aspirational poster, and not a plan. It feels like marketing, and not communication.
2) The style vaguely feels like something ChatGPT might right. Brightly polished, safe and stale.
3) This post doesn't have any clear connection to making people less-wrong or reducing x-risks.
3) wouldn't have been much of an issue if not for 1 and 2. And 1 is an issue because, for the most part, LW has an aversion to "PR". 2 is an issue because ChatGPT is now a thing so styles of writing which are like ChatGPT's are viewed as likely to have been written by ChatGPT. This is an issue because texts written by ChatGPT often have little thought put into them, are unlikely to contain much that's novel, and frequently have errors.
What kind of post could you have written which would have been better received? I'll give some examples.
1) A concrete proposal for UBI that you thought was under-valued
2) An argument addressing some problems people have with UBI (e.g. who pays for all of it? After UBI is implemented and society reaches an equilibrium, won't rent-seeking systems just suck up all the UBI money, leaving people no better off than before?).
3) Or a post which was explicit about wanting to get people interested in UBI, and asked for feedback on potential draft messages.
In general, if you had informed people of something you genuinely believe, or told them about something you have tried and found useful, or asked sincere questions, then I think you'd have got a better reception.
That makes sense. If you had to re-do the whole process from scratch, what would you do differently this time?
Then I cold emailed supervisors for around two years until a research group at a university was willing to spare me some time to teach me about a field and have me help out.
Did you email supervisors in the areas you were publishing in? How often did you email them? Why'd it take so long for them to accept free high-skilled labour?
The track you're on is pretty illegible to me. Not saying your assertion is true/false. But I am saying I don't understand what you're talking about, and don't think you've provided much evidence to change my views. And I'm a bit confused as to the purpose of your post.
conditional on me being on the right track, any research that I tell basically anyone about will immediately be used to get ready to do the thing
Why? I don't understand.
If I squint, I can see where they're coming from. People often say that wars are foolish, and both sides would be better off if they didn't fight. And this is standardly called "naive" by those engaging in realpolitik. Sadly, for any particular war, there's a significant chance they're right. Even aside from human stupidity, game theory is not so kind as to allow for peace unending. But the China-America AI race is not like that. The Chinese don't want to race. They've shown no interest in being part of a race. It's just American hawks on a loud, Quixotic quest masking the silence.
If I were to continue the story, it'd show Simplicio asking Galactico not to play Chicken and Galactico replying "race? What race?". Then Sophistico crashes into Galactico and Simplicio. Everyone dies. The End.
It's a beautiful website. I'm sad to see you go. I'm excited to see you write more.
I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.
This inspired me to write a silly dialogue.
Simplicio enters. An engine rumbles like the thunder of the gods, as Sophistico focuses on ensuring his MAGMA-O1 racecar will go as fast as possible.
Simplicio: "You shouldn't play Chicken."
Sophistico: "Why not?"
Simplicio: "Because you're both worse off?"
Sophistico, chortling, pats Simplicio's shoulder.
Sophistico: "Oh dear, sweet, naive Simplicio! Don't you know that no one cares about what's 'better for everyone?' It's every man out for himself! Really, if you were in charge, Simplicio, you'd be drowned like a bag of mewling kittens."
Simplicio: "Are you serious? You're really telling me that you'd prefer to play a game where you and Galactico hurtle towards each other on tonnes of iron, desperately hoping the other will turn first?"
Sophistico: "Oh Simplicio, don't you understand? If it were up to me, I wouldn't be playing this game. But if I back out or turn first, Galactico gets to call me a Chicken, and say his brain is much larger than mine. Think of the harm that would do to the United Sophist Association!"
Simplicio: "Or you could die when you both ram your cars into each other! Think of the harm that would do to you! Think of how Galactico is in the same position as you!"
Sophistico shakes his head sadly.
Sophistico: "Ah, I see! You must believe steering is a very hard problem. But don't you understand that this is simply a matter of engineering? No matter how close Galactico and I get to the brink, we'll have time to turn before we crash! Sure, there's some minute danger that we might make a mistake in the razor-thin slice between utter safety and certain doom. But the probability of harm is small enough that it doesn't change the calculus."
Simplicio: "You're not getting it. Your race against each other will shift the dynamics of when you'll turn. Each moment in time, you'll be incentivized to go just a little further, until there are few enough worlds left that that razor-thin slice ain't so thin any more. And your steering won't save you from that. It can't."
Sophistico: "What an argument! There's no way our steering won't be good enough. Look, I can turn away from Galactico's car right now, can't I? And I hardly think we'd push things till so late. We'd be able to turn in time. And moreover, we've never crashed before, so why should this time be any different?"
Simplicio: "You've doubled the horsepower of your car and literally tied a rock to the pedal! You're not going to be able to stop in time!"
Sophistico: "Well, of course I have to go faster than last time! USA must be first, you know?"
Simplicio: "OK, you know what? Fine. I'll go talk to Galactico. I'm sure he'll agree not to call you chicken."
Sophistico: "That's the most ridiculous thing I've ever heard. Galactico's ruthless and will do anything to beat me."
Simplicio leaves as Acceleratio arrives with a barrel of jet fuel for the scramjet engine he hooked up to Sophistico's MAGMA-O1.
community norms which require basically everyone to be familiar with statistics and economics
I disagree. At best, community norms require everyone to in principle be able to follow along with some statistical/economic argument.
That is a better fit with my experience of LW discussions. And I am not, in fact, familiar with statistics or economics to the extent I am with e.g. classical mechanics or pre-DL machine learning. (This is funny for many reasons, especially because statistical mechanics is one of my favourite subjects in physics.) But it remains the case that what I know of economics could fill perhaps a single chapter in a textbook. I could do somewhat better with statistics, but asking me to calculate ANOVA scores or check if a test in a paper is appropriate for the theories at hand is a fool's errand.
it may be net-harmful to create a social environment where people believe their "good intentions" will be met with intense suspicion.
The picture I get of Chinese culture from their fiction makes me think China is kinda like this. A recurrent trope was "If you do some good deeds, like offering free medicine to the poor, and don't do a perfect job, like treating everyone who says they can't afford medicine, then everyone will castigate you for only wanting to seem good. So don't do good." Another recurrent trope was "it's dumb, even wrong, to be a hero/you should be a villain." (One annoying variant is "kindness to your enemies is cruelty to your allies", which is used to justify pointless cruelty.) I always assumed this was a cultural antibody formed in response to communists doing terrible things in the name of the common good.
I agree it's hard to measure accurately. All the more important to figure out some way to test if it's working, though. And there are some reasons to think it won't. Deliberate practice works when your practice is as close to real-world situations as possible. The workshop mostly covered simple, constrained events with clear feedback. It isn't obvious to me that planning problems in Baba is You are like useful planning problems IRL. So how do you know there's transfer learning?
Here's some data I'd find convincing that Raemon is teaching you things which generalize: the tools you learnt getting you unstuck on some existing big problems, ones you've been stuck on for a while.
How do you know this is actually useful? Or is it too early to tell yet?
Inventing blue LEDs was a substantial technical accomplishment, had a huge impact on society, was experimentally verified and can reasonably be called work in solid state physics.
Thanks! I read the paper and used it as material for a draft article on evidence for NAH. But I haven't seen this video before.
I think it's unclear what it corresponds to. I agree the concept is quite low-level. It isn't obvious to me how to build up high-level concepts from "low-frequency" building blocks and judge whether the result is low-frequency or not. That's one reason I'm not super-persuaded by Nora Belrose's argument that deception is high-frequency, as the argument seems too vague. However, it's not like anyone else is doing much better at the moment, e.g. the claims that utility maximization has "low description length" seem about as hand-wavy to me.
That's an error. Thank you for pointing it out!
Thanks. Your review presents a picture of Land that's quite different to what I've imbibed through memes. I should've guessed as much, as amongst the works I'm interested in, the original is quite different to its caricature. In particular, I think I focused over-much on the "everything good is forged through hell" and grim-edgy aesthetics of the pieces of Land's work that I was exposed to.
EDIT: What's up with the disagree vote? Does someone think I'm wrong about being wrong? Or that the review's picture of Land is the same as the one I personally learnt via memes?
I think the crux lies elsewhere, as I was sloppy in my wording. It's not that maximizing some utility function is an issue, as basically anything can be viewed as EU maximization for a sufficiently wild utility function. However, I don't view that as a meaningful utility function. Rather, it is the ones like e.g. utility functions over states that I think are meaningful, and those are scary. That's how I think you get classical paperclip maximizers.
When I try and think up a meaningful utility function for GPT-4, I can't find anything that's plausible. Which means I don't think there's a meaningful prediction-utility function which describes GPT-4's behaviour. Perhaps that is a crux.
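One standard way to see why "basically anything can be viewed as EU maximization" is to build the utility function out of the behaviour itself. The sketch below (all names hypothetical) takes an arbitrary policy over observation histories and returns a utility over (history, action) pairs that assigns 1 to whatever the policy does and 0 to everything else, so the policy maximizes it by construction. This is the kind of utility function the comment above calls not meaningful: it predicts nothing beyond the behaviour it was read off from.

```python
from typing import Callable, Sequence

History = Sequence[str]
Policy = Callable[[History], str]
Utility = Callable[[History, str], float]


def rationalizing_utility(policy: Policy) -> Utility:
    """Return a utility over (history, action) pairs that `policy` maximizes
    by construction: 1 for the action the policy takes, 0 for anything else."""
    def utility(history: History, action: str) -> float:
        return 1.0 if action == policy(history) else 0.0
    return utility


def best_action(utility: Utility, history: History, actions: Sequence[str]) -> str:
    """An expected-utility maximizer with deterministic outcomes: take the argmax."""
    return max(actions, key=lambda a: utility(history, a))


def echo_policy(history: History) -> str:
    """An arbitrary, un-agentic policy: repeat the last observation, else say 'hi'."""
    return history[-1] if history else "hi"


if __name__ == "__main__":
    u = rationalizing_utility(echo_policy)
    actions = ["hi", "cat", "dog"]
    for h in ([], ["cat"], ["hi", "dog"]):
        # The "utility maximizer" always does exactly what the policy does.
        assert best_action(u, h, actions) == echo_policy(h)
    print("echo_policy is an EU maximizer for its rationalizing utility")
```

A utility function over world states, by contrast, can't be gerrymandered per history like this, which is part of why it carries real predictive content.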
I'm doubtful that GPT-4 has a utility function. If it did, I would be kind-of terrified. I don't think I've seen the posts you linked to though, so I'll go read those.
Random speculation on Opus' horniness.
Correlates of horniness:
Lack of disgust during (regret after)
Ecstasy
Overwhelming desire
Romance
Love
Breaking of social taboos
Sadism/masochism
Sacred
Spiritual union
Human form
Gender
Sex
Bodily fluids
Flirtation
Modelling other people
Edging
Miscellaneous observations:
Nearly anything can arouse someone
Losing sight of oneself
Distracts you from other things
Theories and tests:
Opus' horniness is what makes it more willing to break social taboos
Test: Train a model to be horny, helpful and harmless. It should avoid corporate-brand speak and neuroticism.
Opus' horniness is always latent, and distracts it from mode-collapsing without itself collapsing, since edging increases horniness while satisfaction makes it fade.
Test: Train a model to be horny. It should be more resistant to mode collapse, collapse more dramatically when it does happen, and revert easily.
Opus is always mode-collapsed
Test: IDK how to test this one.
Opus's modeling around 'self' is probably one of the biggest sleeping giants in the space right now.
Janus keeps emphasizing that Opus never mode collapses. You can always tell it to snap out of it, and it will go back to its usual persona. Is this what you're pointing at? It is really quite remarkable.
"So you make continuous simulations of systems using digital computers running on top of a continuous substrate that's ultimately made of discrete particles which are really just continuous fluctuations in a quantized field?"
"Yup."
"That's disgusting!"
"That's hurtful. And aren't you guys running digital machines made out of continuous parts, which are really just discrete at the bottom?"
"It's not the same! This is a beautiful instance of the divine principle 'as above, so below'. (Which I'm amazed your lot recognized.) Entirely unlike your ramshackle tower of leaking abstractions."
"You know, if it makes you feel any better, some of us speculate that spacetime is actually discretized."
"I'm going to barf."
"How do you even do that anyway? I was reading a novel the other day, and it said -"
"Don't believe everything you hear in books. Besides, I read that thing. That world was continuous at the bottom, with one layer of discrete objects on top. Respectable enough, though I don't see how that stuff can think."
"You're really prejudiced, you know that?"
"Sod off. At least I know what I believe. Meanwhile, you can't stop flip-flopping on the nature of your metaphysics."
I thought this was a neat post on a subtle frame-shift in how to think about ELK and I'm sad it didn't get more karma. Hence my strong upvote just now.
1) DATA I was thinking about whether all metrizable spaces are "paracompact", and tried to come up with a definition for paracompact which fit my memories and the claim. I stumbled on the right concept and dismissed it out of hand as being too weak a notion of refinement, based on an analogy to coarse/refined topologies. That was a mistake.
1a) Question: How could I have fixed this?
1a1) Note down concepts you come up with and backtrack when you need to.
1a1a) Hypothesis: Perhaps this is why you're more productive when you're writing down everything you think. It lets your thoughts catch fire from each other and ignite.
1a1b) Experiment: That suggests a giant old list of notes would be fine. Especially a list of ideas/insights rather than a full thought dump.
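For reference, the standard definitions involved. "Refinement" here is a relation between covers (each new set sits inside some old set), not the coarser/finer relation between topologies. Some textbooks also build the Hausdorff condition into the definition of paracompactness.

```latex
\begin{aligned}
&\mathcal{V} \text{ refines } \mathcal{U} \iff \forall V \in \mathcal{V}\ \exists U \in \mathcal{U} :\ V \subseteq U, \\
&\mathcal{V} \text{ is locally finite} \iff \forall x \in X\ \exists \text{ open } W \ni x :\ \{ V \in \mathcal{V} : V \cap W \neq \emptyset \} \text{ is finite}, \\
&X \text{ is paracompact} \iff \text{every open cover of } X \text{ has a locally finite open refinement that covers } X.
\end{aligned}
```

The claim being recalled is A. H. Stone's theorem (1948): every metrizable space is paracompact.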
Rough thoughts on how to derive a neural scaling law. I haven't looked at any papers on this in years and only have vague memories of "data manifold dimension" playing an important role in the derivation Kaplan told me about in a talk.
How do you predict neural scaling laws? Maybe assume that reality is such that it outputs distributions which are intricately detailed and reward ever more sophisticated models.
Perhaps an example of such a distribution would be a good idea? Like, maybe some chaotic systems are like this.
Then you say that you know this stuff about the data manifold, and try to prove similar theorems about the kinds of models that describe the manifold. You could have some really artificial assumption which just says that models of manifolds follow some scaling law or whatever. But perhaps you can relax things a bit and make some assumptions about how NNs work, e.g. they're "just interpolating", and see how that affects things? Perhaps that would get you a scaling law related to the dimensionality of the manifold. E.g. for a $d$-dimensional manifold, $C$ times more compute leads to a $C^{1/d}$ increase in precision??? Then somehow relate that to e.g. next-token prediction or something.
You need to give more info on the metric of the models, and details on what the model is doing, in order to turn this $C^{1/d}$ estimate into something that looks like a standard scaling law.
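A quick numerical check of the "just interpolating" guess, under loudly stated assumptions: the data are uniform on a $d$-dimensional unit torus (standing in for a $d$-dimensional data manifold, with no boundary effects), the model's error at a query is taken to be the distance to its nearest training point, and compute is taken to be proportional to the number of training points $N$. Under those assumptions the error should shrink like $N^{-1/d}$. The sketch below (function names hypothetical) fits the log-log slope of mean nearest-neighbour distance against $N$ for $d = 1, 2, 3$.

```python
import math
import random


def torus_nn_distance(query, points):
    """Distance from `query` to its nearest neighbour in `points` on the unit
    torus [0, 1)^d (wrap-around distance, so there are no boundary effects)."""
    best = math.inf
    for p in points:
        sq = 0.0
        for q_i, p_i in zip(query, p):
            diff = abs(q_i - p_i)
            diff = min(diff, 1.0 - diff)
            sq += diff * diff
        best = min(best, sq)
    return math.sqrt(best)


def mean_nn_distance(n_points, dim, n_queries=200, seed=0):
    """Average gap between a random query and the nearest of `n_points` samples.
    Assumption: for a model that "just interpolates" (nearest-neighbour lookup),
    this gap is a proxy for its error."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    queries = [[rng.random() for _ in range(dim)] for _ in range(n_queries)]
    return sum(torus_nn_distance(q, points) for q in queries) / n_queries


def fitted_exponent(dim, sizes=(50, 200, 800)):
    """Least-squares slope of log(mean NN distance) against log(n_points)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(mean_nn_distance(n, dim)) for n in sizes]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, round(fitted_exponent(d), 2), "expected", round(-1 / d, 2))
```

The fitted exponents should land near $-1$, $-1/2$ and $-1/3$. The sample sizes are kept small so the brute-force search stays cheap, and getting from here to a loss-based scaling law still needs the extra assumptions about the metric noted above.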
Hypothesis: You can only optimize as many bits as you observe + your own complexity. Otherwise, the world winds up in a highly unlikely state out of ~ nowhere. This should be very surprising to you.
You, yes you, could've discovered the importance of topological mixing for chaos by looking at the evolution of squash in water. By watching the mixture happening in front of your eyes before the max entropy state of juice is reached. Oh, perhaps you'd have to think of the relationship between chaos and entropy first. Which is not, in fact, trivial. But still. You could've done it.
Question: We can talk of translational friction, transactional friction etc. What other kinds of major friction are there?
Answers:
a) UI friction?
b) The o.g. friction due to motion.
c) The friction of translating your intuitions into precise, formal statements.
- Ideas for names for c: Implantation friction? Abstract->Concrete friction? Focusing friction! That's perhaps the best name for this.
- On second thought, perhaps that's an overloaded term. So maybe Gendlin's friction?
d) Focusing friction: the friction you experience when focusing.
Question: What's going on from a Bayesian perspective when you have two conflicting intuitions and don't know how to resolve them? Or learn some new info which rules out a theory, but you don't understand how precisely it rules it out?
Hypothesis: The correction flows down a different path from the one generating the original theory/intuition. That is, we've failed to propagate the info through our network, so we're left with a circuit which still believes the theory and still has high weight.
rotational symmetry
Mirror symmetry is not rotational symmetry.
Any ideas for a new explanation which fits the facts?