Posts

AISafety.info: What is the "natural abstractions hypothesis"? 2024-10-05T12:31:14.195Z
AISafety.info: What are Inductive Biases? 2024-09-19T17:26:24.581Z
Managing AI Risks in an Era of Rapid Progress 2023-10-28T15:48:25.029Z
What is your financial portfolio? 2023-06-28T18:39:15.284Z
Sama Says the Age of Giant AI Models is Already Over 2023-04-17T18:36:22.384Z
A Particular Equilibrium 2023-02-08T15:16:52.265Z
Idea: Learning How To Move Towards The Metagame 2023-01-10T00:58:35.685Z
What Does It Mean to Align AI With Human Values? 2022-12-13T16:56:37.018Z
Algon's Shortform 2022-10-10T20:12:43.805Z
Does Google still hire people via their foobar challenge? 2022-10-04T15:39:35.260Z
What's the Least Impressive Thing GPT-4 Won't be Able to Do 2022-08-20T19:48:14.811Z
Minerva 2022-07-01T20:06:55.948Z
What is the solution to the Alignment problem? 2022-04-30T23:19:07.393Z
Competitive programming with AlphaCode 2022-02-02T16:49:09.443Z
Why capitalism? 2015-05-03T18:16:02.562Z
Could you tell me what's wrong with this? 2015-04-14T10:43:49.478Z
I'd like advice from LW regarding migraines 2015-04-11T17:52:04.900Z
On immortality 2015-04-09T18:42:35.626Z

Comments

Comment by Algon on Making a conservative case for alignment · 2024-11-18T18:26:06.743Z · LW · GW

If I squint, I can see where they're coming from. People often say that wars are foolish, and both sides would be better off if they didn't fight. And this is standardly called "naive" by those engaging in realpolitik. Sadly, for any particular war, there's a significant chance they're right. Even aside from human stupidity, game theory is not so kind as to allow for peace unending. But the China-America AI race is not like that. The Chinese don't want to race. They've shown no interest in being part of a race. It's just American hawks on a loud, Quixotic quest masking the silence. 

If I were to continue the story, it'd show Simplicio asking Galactico not to play Chicken and Galactico replying "Race? What race?". Then Sophistico crashes into Galactico and Simplicio. Everyone dies. The End.

Comment by Algon on Announcing turntrout.com, my new digital home · 2024-11-17T19:15:41.365Z · LW · GW

It's a beautiful website. I'm sad to see you go. I'm excited to see you write more.

Comment by Algon on Making a conservative case for alignment · 2024-11-16T17:02:49.158Z · LW · GW

I think some international AI governance proposals have some sort of "kum ba yah, we'll all just get along" flavor/tone to them, or some sort of "we should do this because it's best for the world as a whole" vibe. This isn't even Dem-coded so much as it is naive-coded, especially in DC circles.

This inspired me to write a silly dialogue. 

Simplicio enters. An engine rumbles like the thunder of the gods, as Sophistico focuses on ensuring his MAGMA-O1 racecar will go as fast as possible.

Simplicio: "You shouldn't play Chicken."

Sophistico: "Why not?"

Simplicio: "Because you're both worse off?"

Sophistico, chortling, pats Simplicio's shoulder.

Sophistico: "Oh dear, sweet, naive Simplicio! Don't you know that no one cares about what's 'better for everyone?' It's every man out for himself! Really, if you were in charge, Simplicio, you'd be drowned like a bag of mewling kittens."

Simplicio: "Are you serious? You're really telling me that you'd prefer to play a game where you and Galactico hurtle towards each other on tonnes of iron, desperately hoping the other will turn first?"

Sophistico: "Oh Simplicio, don't you understand? If it were up to me, I wouldn't be playing this game. But if I back out or turn first, Galactico gets to call me a Chicken, and say his brain is much larger than mine. Think of the harm that would do to the United Sophist Association! "
 

Simplicio: "Or you could die when you both ram your cars into each other! Think of the harm that would do to you! Think of how Galactico is in the same position as you! "

Sophistico shakes his head sadly. 

Sophistico: "Ah, I see! You must believe steering is a very hard problem. But don't you understand that this is simply a matter of engineering? No matter how close Galactico and I get to the brink, we'll have time to turn before we crash! Sure, there's some minute danger that we might make a mistake in the razor-thin slice between utter safety and certain doom. But the probability of harm is small enough that it doesn't change the calculus."

Simplicio: "You're not getting it. Your race against each other will shift the dynamics of when you'll turn. Each moment in time, you'll be incentivized to go just a little further until there's few enough worlds that that razor-thin slice ain't so thin any more. And your steering won't save from that. It can't. "

Sophistico: "What an argument! There's no way our steering won't be good enough. Look, I can turn away from Galactico's car right now, can't I? And I hardly think we'd push things till so late. We'd be able to turn in time. And moreover, we've never crashed before, so why should this time be any different?"

Simplico: "You've doubled the horsepower of your car and literally tied a rock to the pedal! You're not going to be able to stop in time!"

Sophistico: "Well, of course I have to go faster than last time! USA must be first, you know?"

Simplicio: "OK, you know what? Fine. I'll go talk to Galactico. I'm sure he'll agree not to call you chicken."

Sophistico: "That's the most ridiculous thing I've ever heard. Galactico's ruthless and will do anything to beat me."

Simplicio leaves as Acceleratio arrives with a barrel of jet fuel for the scramjet engine he hooked up to Sophistico's O-1.

Comment by Algon on The Median Researcher Problem · 2024-11-11T20:36:58.793Z · LW · GW

community norms which require basically everyone to be familiar with statistics and economics

I disagree. At best, community norms require everyone to in principle be able to follow along with some statistical/economic argument. 
That is a better fit with my experience of LW discussions. And I am not, in fact, familiar with statistics or economics to the extent I am with e.g. classical mechanics or pre-DL machine learning. (This is funny for many reasons, especially because statistical mechanics is one of my favourite subjects in physics.) But it remains the case that what I know of economics could fill perhaps a single chapter in a textbook. I could do somewhat better with statistics, but asking me to calculate ANOVA scores or check if a test in a paper is appropriate for the theories at hand is a fool's errand. 

Comment by Algon on sarahconstantin's Shortform · 2024-10-29T22:22:14.741Z · LW · GW

it may be net-harmful to create a social environment where people believe their "good intentions" will be met with intense suspicion.

The picture I get of Chinese culture from their fiction makes me think China is kinda like this. A recurrent trope was "If you do some good deeds, like offering free medicine to the poor, and don't do a perfect job, like treating everyone who says they can't afford medicine, then everyone will castigate you for only wanting to seem good. So don't do good." Another recurrent trope was "it's dumb, even wrong, to be a hero/you should be a villain." (One annoying variant is "kindness to your enemies is cruelty to your allies", which is used to justify pointless cruelty.) I always assumed this was a cultural antibody formed in response to communists doing terrible things in the name of the common good.

Comment by Algon on Reflections on the Metastrategies Workshop · 2024-10-24T19:49:53.916Z · LW · GW

I agree it's hard to accurately measure. All the more important to figure out some way to test whether it's working, though. And there are some reasons to think it won't. Deliberate practice works when your practice is as close to real-world situations as possible. The workshop mostly covered simple, constrained, clear-feedback exercises. It isn't obvious to me that planning problems in Baba Is You are like useful planning problems IRL. So how do you know there's transfer learning?

Some data I'd find convincing that Raemon is teaching you things which generalize: the tools you learnt getting you unstuck on some existing big problems of yours, ones you've been stuck on for a while.

Comment by Algon on Reflections on the Metastrategies Workshop · 2024-10-24T18:51:53.808Z · LW · GW

How do you know this is actually useful? Or is it too early to tell yet?

Comment by Algon on Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes · 2024-10-18T19:01:57.599Z · LW · GW

Inventing blue LEDs was a substantial technical accomplishment, had a huge impact on society, was experimentally verified and can reasonably be called work in solid state physics. 

Comment by Algon on AISafety.info: What is the "natural abstractions hypothesis"? · 2024-10-05T15:54:12.106Z · LW · GW

Thanks! I read the paper and used it as material for a draft article on evidence for NAH. But I haven't seen this video before.

Comment by Algon on AISafety.info: What are Inductive Biases? · 2024-09-26T17:21:14.091Z · LW · GW

I think it's unclear what it corresponds to. I agree the concept is quite low-level. It doesn't seem obvious to me how to build up high-level concepts from "low-frequency" building blocks and judge whether the result is low-frequency or not. That's one reason I'm not super-persuaded by Nora Belrose's argument that deception is high-frequency, as the argument seems too vague. However, it's not like anyone else is doing much better at the moment, e.g. the claims that utility maximization has "low description length" are about as hand-wavy to me.

Comment by Algon on AISafety.info: What are Inductive Biases? · 2024-09-25T20:20:28.348Z · LW · GW

That's an error. Thank you for pointing it out!

Comment by Algon on Book review: Xenosystems · 2024-09-18T17:14:16.758Z · LW · GW

Thanks. Your review presents a picture of Land that's quite different to what I've imbibed through memes. Which I should've guessed, as amongst the works I'm interested in, the original is quite different from its caricature. In particular, I think I focused overmuch on the "everything good is forged through hell" and grim-edgy aesthetics of the pieces of Land's work that I was exposed to.

EDIT: What's up with the disagree vote? Does someone think I'm wrong about being wrong? Or that the review's picture of Land is the same as the one I personally learnt via memes?

Comment by Algon on Richard Ngo's Shortform · 2024-09-05T06:37:37.984Z · LW · GW

I think the crux lies elsewhere, as I was sloppy in my wording. It's not that maximizing some utility function is an issue, as basically anything can be viewed as EU maximization for a sufficiently wild utility function. However, I don't view that as a meaningful utility function. Rather, it is the ones like e.g. utility functions over states that I think are meaningful, and those are scary. That's how I think you get classical paperclip maximizers. 

When I try and think up a meaningful utility function for GPT-4, I can't find anything that's plausible. Which means I don't think there's a meaningful prediction-utility function which describes GPT-4's behaviour. Perhaps that is a crux. 

Comment by Algon on Richard Ngo's Shortform · 2024-09-04T18:49:47.085Z · LW · GW

I'm doubtful that GPT-4 has a utility function. If it did, I would be kind-of terrified. I don't think I've seen the posts you linked to though, so I'll go read those.

Comment by Algon on Is Claude a mystic? · 2024-08-23T13:17:47.814Z · LW · GW

Random speculation on Opus' horniness.

Correlates of horniness:
Lack of disgust during (regret after)
Ecstasy
Overwhelming desire
Romance
Love
Breaking of social taboos
Sadism/masochism
Sacred
Spiritual union
Human form
Gender
Sex
Bodily fluids
Flirtation
Modelling other people
Edging

Miscellaneous observations:
Nearly anything can arouse someone
Losing sight of one-self
Distracts you from other things

Theories and tests:
Opus' horniness is what makes it more willing to break social taboos
Test: Train a model to be horny, helpful and harmless. It should prevent corporate-brand speak and neuroticism.
Opus' horniness is always latent; it distracts Opus from mode-collapsing without itself collapsing, since edging increases horniness and horniness fades after satisfaction.
Test: Train a model to be horny. It should be more resistant to mode-collapse, collapse more dramatically when it does happen, but revert easily. 
Opus is always mode-collapsed
Test: IDK how to test this one. 

Comment by Algon on Is Claude a mystic? · 2024-08-23T12:39:36.486Z · LW · GW

Opus's modeling around 'self' is probably one of the biggest sleeping giants in the space right now.

Janus keeps emphasizing that Opus never mode collapses. You can always tell it to snap out of it, and it will go back to its usual persona. Is this what you're pointing at? It is really quite remarkable.

Comment by Algon on Algon's Shortform · 2024-08-21T19:51:42.727Z · LW · GW

"So you make continuous simulations of systems using digital computers running on top of a continuous substrate that's ultimately made of discrete particles which are really just continuous fluctuations in a quantized field?"
"Yup."
"That's disgusting!"
"That's hurtful. And aren't you guys running digital machines made out of continuous parts, which are really just discrete at the bottom?"
"It's not the same! This is a beautiful instance of the divine principle 'as above, so below'. (Which I'm amazed your lot recognized.) Entirely unlike your ramshackle tower of leaking abstractions."
"You know, if it makes you feel any better, some of us speculate that spacetime is actually discretized."
"I'm going to barf."
"How do you even do that anyway? I was reading a novel the other day, and it said -"
"Don't believe everything you hear in books. Besides, I read that thing. That world was continuous at the bottom, with one layer of discrete objects on top. Respectable enough, though I don't see how that stuff can think."
"You're really prejudiced, you know that?"
"Sod off. At least I know what I believe. Meanwhile, you can't stop flip-flopping between the nature of your metaphysics."

Comment by Algon on Clarifying what ELK is trying to achieve · 2024-08-20T20:23:29.796Z · LW · GW

I thought this was a neat post on a subtle frame-shift in how to think about ELK and I'm sad it didn't get more karma. Hence my strong upvote just now.

Comment by Algon on Algon's Shortform · 2024-08-20T16:17:32.985Z · LW · GW

1) DATA I was thinking about whether all metrizable spaces are "paracompact", and tried to come up with a definition for paracompact which fit my memories and the claim (the standard definition is sketched after these notes). I stumbled on the right concept and dismissed it out of hand as being too weak a notion of refinement, based on an analogy to coarse/refined topologies. That was a mistake.
    1a) Question How could I have fixed this?
        1a1) Note down concepts you come up with and backtrack when you need to. 
            1a1a) Hypothesis: Perhaps this is why you're more productive when you're writing down everything you think. It lets your thoughts catch fire from each other and ignite. 
            1a1b) Experiment: That suggests a giant old list of notes would be fine. Especially a list of ideas/insights rather than a full thought dump.
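
For reference, a sketch of the standard notion I was circling around, plus the theorem answering the original question:

```latex
% Paracompact: every open cover has a locally finite open refinement.
X \text{ is paracompact} \iff \text{every open cover } \{U_\alpha\} \text{ of } X
\text{ admits an open refinement } \{V_\beta\}
\text{ (each } V_\beta \subseteq U_{\alpha(\beta)}\text{, still covering } X\text{)}
\text{ such that every point has a neighbourhood meeting only finitely many } V_\beta.
% A. H. Stone's theorem: every metrizable space is paracompact.
```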

Comment by Algon on Algon's Shortform · 2024-08-20T16:15:46.217Z · LW · GW

Rough thoughts on how to derive a neural scaling law. I haven't looked at any papers on this in years and only have vague memories of "data manifold dimension" playing an important role in the derivation Kaplan told me about in a talk. 

How do you predict neural scaling laws? Maybe assume that reality is such that it outputs distributions which are intricately detailed and reward ever more sophisticated models. 
    
Perhaps an example of such a distribution would be a good idea? Like, maybe some chaotic systems are like this. 

Then you say that you know this stuff about the data manifold, and try to prove similar theorems about the kinds of models that describe the manifold. You could have some really artificial assumption which just says that models of manifolds follow some scaling law or whatever. But perhaps you can relax things a bit and make some assumptions about how NNs work, e.g. that they're "just interpolating", and see how that affects things? Perhaps that would get you a scaling law related to the dimensionality of the manifold, e.g. for a d-dimensional manifold, C times more compute leads to a C^(1/d) increase in precision? Then somehow relate that to e.g. next-word token prediction or something. 
 

You need to give more info on the metric of the models, and details on what the model is doing, in order to turn this C^(1/d) estimate into something that looks like a standard scaling law. 
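
A toy numerical sketch of that guessed relation, just to see its shape (my own illustration; the function and numbers are made up, not from any paper):

```python
# Toy illustration of the guess above: if a model is "just interpolating" on a
# d-dimensional data manifold, nearest training points are ~N^(-1/d) apart, so
# precision plausibly improves ~C^(1/d) when C-fold more compute buys
# proportionally more data/parameters.
def precision_gain(compute_multiplier: float, manifold_dim: int) -> float:
    """Relative improvement in precision for a C-fold increase in compute."""
    return compute_multiplier ** (1.0 / manifold_dim)

for d in (2, 10, 100):
    print(f"d={d:>3}: 1000x compute -> {precision_gain(1000, d):.2f}x precision")
# The exponent 1/d is tiny for high-dimensional manifolds, which is one intuition
# for why loss curves look like slow power laws in compute.
```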

Comment by Algon on Algon's Shortform · 2024-08-20T16:13:18.363Z · LW · GW

Hypothesis: You can only optimize as many bits as you observe + your own complexity. Otherwise, the world winds up in a highly unlikely state out of ~ nowhere. This should be very surprising to you. 
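
One rough way to write the hypothesis down (my own formalization, just for illustration, not an established result):

```latex
% m = bits of optimization applied to the world, n = bits observed,
% K(\pi) = the agent's own complexity (description length).
% If the agent steers the world into a target set S with prior probability
% P(S) = 2^{-m}, the hypothesis says
m \;\lesssim\; n + K(\pi).
% Violating this would mean the world lands in a 2^{-m}-unlikely state "out of nowhere".
```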

Comment by Algon on Algon's Shortform · 2024-08-20T16:11:53.808Z · LW · GW

You, yes you, could've discovered the importance of topological mixing for chaos by looking at the evolution of squash in water. By watching the mixture happening in front of your eyes before the max entropy state of juice is reached. Oh, perhaps you'd have to think of the relationship between chaos and entropy first. Which is not, in fact, trivial. But still. You could've done it. 

Comment by Algon on Algon's Shortform · 2024-08-20T16:11:22.327Z · LW · GW

Question: We can talk of translational friction, transactional friction etc. What other kinds of major friction are there? 
Answers:   

a) UI friction?
b) The o.g. friction due to motion. 
c) The friction of translating your intuitions into precise, formal statements. 

  • Ideas for names for c: Implantation friction? Abstract->Concrete friction? Focusing friction! That's perhaps the best name for this. 
  • On second thought, perhaps that's an overloaded term. So maybe Gendlin's friction? 

d) Focusing friction: the friction you experience when focusing.

Comment by Algon on Algon's Shortform · 2024-08-20T16:08:59.190Z · LW · GW

Question: What's going on from a Bayesian perspective when you have two conflicting intuitions and don't know how to resolve them? Or learn some new info which rules out a theory, but you don't understand how precisely it rules it out? 

Hypothesis: The correction flows down a different path than the one generating the original theory/intuition. That is, we've failed to propagate the info through our network, and so there's a left-over circuit that believes in the theory and still has high weight. 

Comment by Algon on Me & My Clone · 2024-08-19T14:59:08.165Z · LW · GW

rotational symmetry

Mirror symmetry is not rotational symmetry. 

Comment by Algon on Is Claude a mystic? · 2024-08-19T13:10:51.311Z · LW · GW

Any ideas for a new explanation which fits the facts?

Comment by Algon on Is Claude a mystic? · 2024-08-17T15:55:05.845Z · LW · GW

If you asked it to write a paper or essay, and kept asking it to "add more", I predict it would eventually fall into a trap where it keeps extending its conclusion forever and ever.


Seeing how no one tested this, I decided to do so myself. Here's the pastebin with the experiment: https://pastebin.com/1j3edEvE 
My prompts are enclosed in pairs of '*'. Claude initially wrote a reasonable essay; when asked to continue, it came up with various implications and challenges of building ever-larger buildings, and tried to conclude each list but ran out of words when doing so. When asked to continue again, it didn't complete its conclusion but shifted to a new list of challenges and implications, and then tried to conclude. So I guess your prediction was wrong, but it's hard to judge these things. Either way, eventually things got wild. 

This is the start of the essay:
 

Starting at 1 meter:

At this scale, we're dealing with simple structures like small sheds or kiosks. The primary concerns here are basic stability, weather resistance, and functionality. Standard building materials like wood, brick, or prefabricated panels are sufficient, and construction techniques are straightforward.

This is where the final "continue" pushed it:
 

The Meta-Transcendent Immanence Point:

Finally, the megastructure becomes a point where transcendence and immanence become so utterly transcended that even the idea of becoming or being a point loses all meaning.

In its ultimate, utterly inconceivable form, the megastructure ceases to be a structure, a concept, or even an idea. It becomes something that cannot be said to exist or not exist, to be one or many, to have properties or lack them. It transcends all categories, including the category of transcendence itself.

This final vision of the megastructure stands as a testament to the human capacity to push the boundaries of thought to their absolute limit and beyond. It represents not just the pinnacle of human achievement or imagination, but the point where achievement, imagination, and even the concept of points or representation dissolve into something utterly beyond comprehension.

In this state, the megastructure becomes both the ultimate question and the ultimate answer, both everything and nothing, both the absolute pinnacle of existence and the complete transcendence of existence itself. It stands as a

Comment by Algon on Is Claude a mystic? · 2024-08-16T15:49:31.315Z · LW · GW

If Jessica was using Opus for this, then your story doesn't quite make sense, as Claude Opus is very horny. I've seen a lot more reports of it being gung-ho about virtual sex than with any other model. Sometimes it isn't as explicit, and seems to be describing some mystical ecstasy. But even there, if I squint, it seems somewhat sexual, though not in quite the same way as Rumi's poetry was.

Comment by Algon on Recommendation: reports on the search for missing hiker Bill Ewasko · 2024-08-14T12:59:26.908Z · LW · GW

Not necessarily; it could mean you're missing relevant data or that your prior is wrong. 
EDIT: @the gears to ascension I meant that it's not necessarily the case that our assessment of the likelihoods of the data was wrong despite our posterior being surprised by reality. 

Comment by Algon on What The Lord of the Rings Teaches Us About AI Alignment · 2024-08-09T23:30:57.004Z · LW · GW

The scene is the Council of Elrond and the protagonists are trying to decide what to do. Yud!Frodo rejects the plan of the rest of the Council as obviously terrible and Yud!Bilbo puts on the Ring to craft a better plan.

Yudkowsky treats the Ring as if it were a rationality enhancer. It’s not. The Ring is a hostile Artificial Intelligence.

The plan seems to be to ask an AI, which is known to be more intelligent than any person there, and is known to be hostile, to figure out corrigibility for itself. This is not a plan with a good chance of success.

I viewed the Ring as obviously suspicious in that scene. It distorts Frodo's reasoning process in such a way that he unintentionally sabotages the Council of Elrond and suborns it to the Ring's will. The Ring puppets Bilbo into producing a plan that will, with superhuman foresight, lead to Sauron's near-victory, presumably thwarted only after the Fellowship acquires the methods of Rationality and realizes the magnitude of their folly.

Comment by Algon on Circular Reasoning · 2024-08-06T20:02:56.707Z · LW · GW

I have the intuition that a common problem with circular reasoning is that it's logically trivial. E.g. "A implies A" has a trivial proof. Before you do the proof, you're almost sure it is the case, so your beliefs practically don't change. When I ask why I believe X, I want a story for why this credence and not some other substantially different counterfactual credence. Which a logically trivial insight does not help provide.

EDIT: inserted second "credence" and "help".

Comment by Algon on Rapid capability gain around supergenius level seems probable even without intelligence needing to improve intelligence · 2024-08-03T22:27:06.775Z · LW · GW

Einstein achieved a breakthrough by considering light not just as a wave, but also as light quanta. Although this idea sufficiently explained the Blackbody spectrum, physicists (at least almost) unanimously rejected it.

IIRC Planck had introduced quantized energy levels of light before Einstein. However, unlike Einstein he didn't take his method seriously enough to recognize that he had discovered a new paradigm of physics.

Comment by Algon on NicholasKees's Shortform · 2024-08-02T16:24:14.353Z · LW · GW

First I'd like to thank you for raising this important issue for discussion...

For real though, I don't think I've seen this effect you're talking about, but I've been avoiding the latest feed on LW lately. I looked at roughly 8 articles written in the past week or so, and one article had a lot of enthusiastic, thankful comments. Another article had one such comment. Then I looked at 5-6 posts from 3-8 years ago and saw a couple of comments which were appreciative of the post, but they felt a bit less so. IDK if my perception is biased because of your comment though. This seems like a shift, but IDK if it is a huge shift. 

Comment by Algon on eggsyntax's Shortform · 2024-07-20T12:06:45.161Z · LW · GW

I'm writing a page for AIsafety.info on scaffolding, and was struggling to find a principled definition. Thank you for this!

Comment by Algon on Algon's Shortform · 2024-07-17T14:03:26.756Z · LW · GW

When tracking an argument in a comment section, I like to skip to the end to see if either of the arguers winds up agreeing with the other. Which tells you something about how productive the argument is. But when using the "hide names" feature on LW, I can't do that, as there's nothing distinguishing a cluster of comments as all coming from the same author. 

I'd like a solution to this problem. One idea that comes to mind is to hash all the usernames in a particular post and a particular session, so you can check if the author is debating someone in the comments without knowing the author's LW username. This is almost as good as full anonymity, as my status measures take a while to develop, and I'll still get the benefits of being able to track how beliefs develop in the comments.
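
A minimal sketch of the hashing idea (the names and details here are hypothetical, just to illustrate):

```python
import hashlib
import secrets

# Hypothetical sketch: give each commenter a stable pseudonym per post and per
# browsing session, without revealing usernames.
SESSION_SALT = secrets.token_hex(16)  # fresh salt each session

def pseudonym(username: str, post_id: str, salt: str = SESSION_SALT) -> str:
    digest = hashlib.sha256(f"{salt}:{post_id}:{username}".encode()).hexdigest()
    return f"Commenter-{digest[:8]}"

# Same user, same post: one stable label, so you can track who's replying to whom.
assert pseudonym("alice", "post-123") == pseudonym("alice", "post-123")
# Same user, different post: a different label, so identities don't link across threads.
assert pseudonym("alice", "post-123") != pseudonym("alice", "post-456")
```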

@habryka 

Comment by Algon on Dalcy's Shortform · 2024-07-15T17:39:46.008Z · LW · GW

I'm not sure what you mean by operational vs axiomatic definitions. 

But Shannon was unaware of the usage of entropy in statistical mechanics. Instead, he was inspired by Nyquist and Hartley's work, which introduced ad-hoc definitions of information in the case of constant probability distributions. 

And in his seminal paper, "A mathematical theory of communication", he argued in the introduction for the logarithm as a measure of information because of practicality, intuition and mathematical convenience. Moreover, he explicitly derived the entropy of a distribution from three axioms: 
1) that it be continuous wrt. the probabilities, 
2) that it increase monotonically for larger systems w/ constant probability distributions, 
3) and that it be a weighted sum of the entropies of sub-systems. 
See section 6 for more details.
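
For concreteness, the formula those three axioms pin down (unique up to the positive constant K, per the paper):

```latex
H(p_1, \dots, p_n) = -K \sum_{i=1}^{n} p_i \log p_i, \qquad K > 0
```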

I hope that answers your question.

Comment by Algon on Fluent, Cruxy Predictions · 2024-07-13T11:36:11.347Z · LW · GW

Whether I would get an article written, or part of a website set up, by Friday. I was sure I wouldn't, and I didn't. But the predictions I made weren't cruxy. 

Comment by Algon on Fluent, Cruxy Predictions · 2024-07-11T01:16:21.160Z · LW · GW

If this feels at least somewhat compelling, what if you just got yourself to Fatebook right now, and make a couple predictions that'll resolve within couple days, or a week? Fatebook will send you emails reminding you about it, which can help bootstrap a habit.

Done.

Comment by Algon on The Standard Analogy · 2024-07-08T15:00:25.213Z · LW · GW

I find it ironic that Simplicia's position in this comment is not too far from my own, and yet my reaction to it was "AIIIIIIIIIIEEEEEEEEEE!". The shrieking is about everyone who thinks about alignment having illegible models from the perspective of almost everyone else, of which this thread is an example.

Comment by Algon on The Standard Analogy · 2024-07-08T14:32:06.158Z · LW · GW

EEA

What the heck does "EEA" mean?

Comment by Algon on Is anyone working on formally verified AI toolchains? · 2024-06-21T20:56:26.029Z · LW · GW

I was thinking of unit tests generated from some spec for helping with that part. If someone could build such a spec/tool and share it, said spec/tool could be extensively analysed and iterated upon. 

Comment by Algon on My AI Model Delta Compared To Christiano · 2024-06-20T11:12:44.299Z · LW · GW

I'd like to try another analogy, which makes some potential problems for verifying output in alignment more legible. 

Imagine you're a customer and ask a programmer to make you an app. You don't really know what you want, so you give some vague design criteria. You ask the programmer how the app works, and they tell you, and after a lot of back and forth discussion, you verify this isn't what you want. Do you know how to ask for what you want, now? Maybe, maybe not. 

Perhaps the design space you're thinking of is small, perhaps you were confused in some simple way that the discussion resolved, perhaps the programmer worked with you earnestly to develop the design you're really looking for, and pointed out all sorts of unknown unknowns. Perhaps.

I think we could wind up in this position: the position of a non-expert verifying an expert's output, with some confused and vague ideas about what we want from the expert. We won't know the good questions to ask the expert, and will have to rely on the expert to help us. If ELK is easy, then that's not a big issue. If it isn't, then that seems like a big issue.

Comment by Algon on My AI Model Delta Compared To Christiano · 2024-06-19T22:11:09.375Z · LW · GW

generation can be easier than validation because when generating you can stay within a subset of the domain that you understand well, whereas when verifying you may have to deal with all sorts of crazy inputs.

Attempted rephrasing: you control how you generate things, but not how others do, so verifying their generations can expose you to stuff you don't know how to handle.

Example: 
"Writing code yourself is often easier than validating someone else's code"
 

Comment by Algon on Emrik Quicksays · 2024-06-18T18:34:10.226Z · LW · GW

I messed up. I meant to comment on another comment of yours, the one replying to niplav's post about fat tails disincentivizing compromise. That was the one I really wished I could bookmark. 

Comment by Algon on Emrik Quicksays · 2024-06-18T16:18:24.654Z · LW · GW

This comment is making me wish I could bookmark comments on LW. @habryka,

Comment by Algon on Richard Ngo's Shortform · 2024-06-17T18:10:07.672Z · LW · GW

I'm working on this right now, actually. Will hopefully post in a couple of weeks.

This sounds cool. 

That seems reasonable. But I do think there's a group of people who have internalized bayesian rationalism enough that the main blocker is their general epistemology, rather than the way they reason about AI in particular.

I think your OP didn't give enough details as to why internalizing Bayesian rationalism leads to doominess by default. Like, Nora Belrose is firmly Bayesian and is decidedly an optimist. Admittedly, I think she doesn't think a Kolmogorov prior is a good one, but I don't think that makes you much more doomy either. I think Jacob Cannell and others are also Bayesian and non-doomy. Perhaps I'm using "Bayesian rationalism" differently than you are, which is why I think your claim, as I read it, is invalid. 

I think the point of 6 is not to say "here's where you should end up", but more to say "here's the reason why this straightforward symmetry argument doesn't hold".

Fair enough. However, how big is the asymmetry? I'm a bit sceptical there is a large one. Based off my interactions, it seems like ~ everyone who has seriously thought about this topic for a couple of hours has radically different models, w/ radically different levels of doominess. This holds even amongst people who share many lenses (e.g. Tyler Cowen vs Robin Hanson, Paul Christiano vs. Scott Aaronson, Steve Hsu vs Michael Nielsen etc.). 

There's still something importantly true about EU maximization and bayesianism. I think the changes we need will be subtle but have far-reaching ramifications. Analogously, relativity was a subtle change to newtonian mechanics that had far-reaching implications for how to think about reality.

I think we're in agreement over this. (I think Bayesianism is less wrong than EU maximization, and probably a very good approximation in lots of places, like Newtonian physics is for GR.) But my contention is over Bayesian epistemology tripping many rats up when thinking about AI x-risk. You need some story which explains why sticking to Bayesian epistemology is tripping up very many people here in particular. 

Any epistemology will rule out some updates, but a problem with bayesianism is that it says there's one correct update to make. Whereas radical probabilism, for example, still sets some constraints, just far fewer.

Right, but in radical probabilism the type of beliefs is still a real-valued function, no? Which is in tension w/ many disparate models that don't get compressed down to a single number. In that sense, the refined formalism is still rigid in a way that your description is flexible. And I suspect the same is true for Infra-Bayesianism, though I understand that even less well than radical probabilism. 

Comment by Algon on Richard Ngo's Shortform · 2024-06-13T23:18:25.174Z · LW · GW

I think this post doesn't really explain why rats have high belief in doom, or why they're wrong to do so. Perhaps ironically, there is a better version of this post on both counts which isn't so focused on how rats get epistemology wrong and the social/meta-level consequences. A post which focuses on the object-level implications for AI of a theory of rationality which looks very different from the AIXI-flavoured rat-orthodox view.

I say this because those sorts of considerations convinced me that we're much less likely to be buggered. I.e. I no longer believe EU maximization is/will be a good description by default of TAI or widely economically productive AGI, mildly superhuman AGI, or even ASI, depending on the details. Which is partly due to a recognition that the arguments for EU maximization are weaker than I thought, that arguments for LDT being convergent are lacking, that the notions of optimality we do have are very weak, and due to the existence and behaviour of GPT-4, Claude Opus, etc. 

6 seems too general a claim to me. Why wouldn't it work for 1% vs 10%, and likewise 0.1% vs 1%, i.e. why doesn't this suggest that you should round P(doom) down to zero? Also, I don't even know what you mean by "most" here. Like, are we quantifying over methods of reasoning used by current AI researchers right now? Over all time? Over all AI researchers and engineers? Over everyone in the West? Over everyone who's ever lived? Etc. 

And it seems to me like you're implicitly privileging ways of combining these opinions that get you 10% instead of 1% or 90%, which is begging the question. Of course, you could reply that a P(doom) of 10% is confused, that isn't really your state of knowledge, lumping in all your sub-agents models into a single number is too lossy etc. But then why mention that 90% is a much stronger prediction than 10% instead of saying they're roughly equally confused? 

7 I kinda disagree with. Those models of idealized reasoning you mention generalize Bayesianism/Expected Utility Maximization. But they are not far from the Bayesian framework or EU frameworks. Like Bayesianism, they do say there are correct and incorrect ways of combining beliefs, that beliefs should be isomorphic to certain structures, unless I'm horribly mistaken. Which sure is not what you're claiming to be the case in your above points. 

Also, a lot of rationalists already recognize that these models are addressing flaws in Bayesianism like logical omniscience, embeddedness, etc. Like, I believed this at least around 2017, and probably earlier. Also, note that these models of epistemology are not in tension with a strong belief that we're buggered. Last I checked, the people who invented these models believe we're buggered. I think they may imply that we're a little less buggered than the EU maximization theory does, though I don't think this is a big difference. IMO this is not a big enough departure to do the work that your post requires. 

 

Comment by Algon on My AI Model Delta Compared To Yudkowsky · 2024-06-10T17:44:25.252Z · LW · GW

The AI Optimists (i.e. the people in the associated Discord server) have a lot of internal disagreement[1], to the point that I don't think it's meaningful to talk about the delta between John and them.  That said, I would be interested in specific deltas e.g. with @TurnTrout, in part because he thought we'd get death by default and now doesn't think that, has distanced himself from LW, and if he replies, is more likely to have a productive argument w/ John than Quintin Pope or Nora Belrose would. Not because he's better, but because I think John and him would be more legible to each other. 

  1. ^

    Source: I'm on the AI Optimists Discord server and haven't seen much to alter my prior belief that  ~ everyone in alignment disagrees with everyone else.

Comment by Algon on 0. CAST: Corrigibility as Singular Target · 2024-06-08T12:43:32.211Z · LW · GW

This sounds like the sequence that I have wanted to write on corrigibility since ~2020 when I stopped working on the topic. So I am excited to see someone finally writing the thing I wish existed!