Posts

At 87, Pearl is still able to change his mind 2023-10-18T04:46:29.339Z
Contra LeCun on "Autoregressive LLMs are doomed" 2023-04-10T04:05:10.267Z
Bayesian optimization to find molecules that bind to proteins 2023-03-13T18:17:44.812Z

Comments

Comment by rotatingpaguro on Changes in College Admissions · 2024-04-24T23:39:56.175Z · LW · GW

After the events of April 2024, I cannot say that for Columbia or Yale. No just no.

What are these events?

Comment by rotatingpaguro on Anthropic AI made the right call · 2024-04-15T01:52:48.775Z · LW · GW

Your argument would imply that competition begets worse products?

Comment by rotatingpaguro on Metascience of the Vesuvius Challenge · 2024-03-31T06:08:06.983Z · LW · GW

One big prize, or many small prizes like here?

Comment by rotatingpaguro on What is the best argument that LLMs are shoggoths? · 2024-03-18T01:00:15.778Z · LW · GW

First thoughts:

  • Context length is insanely long
  • Very good at predicting the next token
  • Knows many more abstract facts

These three things are all instances of being OOM better at something specific. If you consider the LLM somewhat human-level at the thing it does, this suggests that it's doing it in a way which is very different from what a human does.

That said, I'm not confident about this; I can sense there could be an argument that this counts as human but ramped up on some stats, and not an alien shoggoth.

Comment by rotatingpaguro on More people getting into AI safety should do a PhD · 2024-03-15T01:56:18.798Z · LW · GW

If I had to give only one line of advice to a randomly sampled prospective grad student: you don't actually have to do what the professor says.

Comment by rotatingpaguro on Richard Ngo's Shortform · 2024-03-11T18:52:01.497Z · LW · GW

Ok. Then I'll say that randomly assigned utilities over full trajectories are beyond wild!

The basin of attraction just needs to be large enough. AIs will intentionally be created with more structure than that.

Comment by rotatingpaguro on Richard Ngo's Shortform · 2024-03-11T01:28:52.940Z · LW · GW

I read the section you linked, but I can't follow it. Anyway, here is its concluding paragraph:

Conclusion: Optimal policies for u-AOH will tend to look like random twitching. For example, if you generate a u-AOH by uniformly randomly assigning each AOH utility from the unit interval [0, 1], there's no predictable regularity to the optimal actions for this utility function. In this setting and under our assumptions, there is no instrumental convergence without further structural assumptions.

From this alone, I get the impression that he hasn't proved that "there isn't instrumental convergence", but that "there isn't a totally general instrumental convergence that applies even to very wild utility functions".

Comment by rotatingpaguro on Shortform · 2024-03-11T01:12:37.675Z · LW · GW

It's AI-based, so my guess is that it uses a lot of somewhat superficial correlates that could be gamed. I expect that if it went mainstream it would be Goodharted.

I expect Goodhart would hit particularly bad if you were doing the kind of usage I guess you are implying, which is searching for a few very well selected people. A selective search is a strong optimization, and so Goodharts more.

A more concrete example I have in mind, which maybe applies to the technology right now: there are people who are good at lying to themselves.

Comment by rotatingpaguro on Why correlation, though? · 2024-03-06T21:40:49.077Z · LW · GW

Yes, in general the state of the art is more advanced than looking at correlations.

You just need to learn when using correlations makes sense. Don't assume that everyone is using correlations blindly; Statistics PhDs most likely decide whether to use them or not based on context and know the limited ways in which what they say applies.

Correlations make total sense when the distribution of the variables is close to multivariate Normal. The covariance matrix, which can be written as a combination of variances + correlation matrix, completely determines the shape of a multivariate Normal.

If the variables are not Normal, you can try to transform them to make them more Normal, using both univariate and multivariate transformations. This is a very common Statistics tool. Basic example: Quantile normalization.
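A minimal sketch of what I mean, assuming numpy/scipy (the lognormal toy data is my own illustration): Pearson correlation on raw heavy-tailed data understates the dependence, while the same correlation after a univariate quantile normalization of each margin behaves as expected.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = np.exp(x + 0.5 * rng.normal(size=n))  # monotone in x, but heavily non-Normal

def to_normal_scores(v):
    """Univariate quantile normalization: map empirical ranks to standard Normal quantiles."""
    ranks = stats.rankdata(v)
    return stats.norm.ppf(ranks / (len(v) + 1))

# The heavy tail of y distorts the raw Pearson correlation...
print("correlation on raw data:     ", round(np.corrcoef(x, y)[0, 1], 3))
# ...but after transforming both margins toward Normality it is meaningful again.
print("correlation on Normal scores:", round(np.corrcoef(to_normal_scores(x), to_normal_scores(y))[0, 1], 3))
```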

Comment by rotatingpaguro on Some costs of superposition · 2024-03-03T17:12:20.518Z · LW · GW

As we get closer to maxing out 

This is , right? (Feel free to delete this comment.)

Comment by rotatingpaguro on Counting arguments provide no evidence for AI doom · 2024-02-28T05:10:43.886Z · LW · GW

There is also a hazy counting argument for overfitting:

  1. It seems like there are “lots of ways” that a model could end up massively overfitting and still get high training performance.
  2. So absent some additional story about why training won’t select an overfitter, it feels like the possibility should be getting substantive weight.

While many machine learning researchers have felt the intuitive pull of this hazy overfitting argument over the years, we now have a mountain of empirical evidence that its conclusion is false. Deep learning is strongly biased toward networks that generalize the way humans want— otherwise, it wouldn’t be economically useful.

I don't know NN history well, but I have the impression good NN training is not trivial. I expect that the first attempts at NN training went bad in some way, including overfitting. So, without already knowing how to train an NN without overfitting, you'd get some overfitting in your experiments. The fact that now, after someone has already poured their brain juice over finding techniques that avoid the problem, you don't get overfitting, is not evidence that you shouldn't have expected overfitting before.

The analogy with AI scheming is: you don't already know the techniques to avoid scheming. You can't use as counterargument a case in which a problem has already deliberately been solved. If you take that same case, and put yourself in the shoes of someone who doesn't already have the solution, you see you'll get the problem in your face a few times before solving it.

Then, it is a matter of whether it works like Yudkowsky says, that you may only get one chance to solve it.


The title says "no evidence for AI doom in counting arguments", but the article mostly talks about neural networks (not AI in general), and the conclusion is

In this essay, we surveyed the main arguments that have been put forward for thinking that future AIs will scheme against humans by default. We find all of them seriously lacking. We therefore conclude that we should assign very low credence to the spontaneous emergence of scheming in future AI systems— perhaps 0.1% or less.

"main arguments": I don't think counting arguments completely fill up this category. Example: the concept of scheming originates from observing it in humans.

Overall, I have the impression of some overstatement. It can also be that I'm missing some previous discussion context/assumptions, so other background theory from you may say "humans don't matter as examples", and also "AI will be NNs and not other things".

Comment by rotatingpaguro on China-AI forecasts · 2024-02-26T19:51:20.517Z · LW · GW

It’s quite successfully managed to urbanize it’s population and now seems to have reached the Lewis turning point where young people who try to leave their villages to find work cities often don’t find it and have to stay in their villages, in the much lower productivity jobs.

I can't follow this. Wikipedia says that

The Lewis turning point is a situation in economic development where surplus rural labor is fully absorbed into the manufacturing sector. This typically causes agricultural and unskilled industrial real wages to rise.

So it looks like at the Lewis point there's over-demand for workers, so they can find jobs. Instead you describe it as if there's over-supply: the manufacturing sector does not need any more workers, so they can't find jobs.

Comment by rotatingpaguro on Can we get an AI to do our alignment homework for us? · 2024-02-26T18:58:40.748Z · LW · GW

There's only one way to know!

</joking> <=========

Comment by rotatingpaguro on Jailbreaking GPT-4 with the tool API · 2024-02-21T21:26:23.521Z · LW · GW

My guess is that things which are forbidden but somewhat complex (like murder instructions) have not really been hammered out from the base model as much as more easily identifiable things like racial slurs.

It should be easier to train the model to just never say the Latin word for black than to recognize instances of sequences of actions that lead to a bad outcome.

The latter requires more contextual evaluation, so maybe that's why the safety training has not generalized well to the tool usage behaviors; is "I'm using a tool" a different enough context that "murder instructions" + "tool mode" should count as a case different from "murder instructions" alone?

Comment by rotatingpaguro on Ten Modes of Culture War Discourse · 2024-02-10T18:05:04.281Z · LW · GW

IF/IE (Yandere/Tsundere): Alice (the Yandere) pretends to like Bob but in fact is trying to manipulate him into doing what she wants, while Bob (the Tsundere) pretends to hate Alice but in fact is totally on-board with her agenda.

  • This description is a bit of a joke - I can't even imagine what this mode would look like, let alone think of any real-world examples.

Maybe love things? Or female things?

Comment by rotatingpaguro on AI #49: Bioweapon Testing Begins · 2024-02-05T06:37:06.287Z · LW · GW

I want to give that conclusion a Bad Use of Statistical Significance Testing. Looking at the experts, we see a quite obviously significant difference. There is improvement here across the board, this is quite obviously not a coincidence. Also, ‘my sample size was not big enough’ does not get you out of the fact that the improvement is there – if your study lacked sufficient power, and you get a result that is in the range of ‘this would matter if we had a higher power study’ then the play is to redo the study with increased power, I would think?

My immediate take on seeing the thing as you report it:

  • Please say whether the bars are 68%, 90%, or 95% intervals.
  • Totally agree on sidestepping significance and instead asking "does the posterior distribution over possible effect sizes include an effect size I consider relevant?" (a toy sketch of that question follows this list).
  • "There is improvement here across the board, this is quite obviously not a coincidence.": I would need more details to feel that confident. The first thing I'd look at is the correlation structure of those scores; could it be that they are just repeating mostly the same information over and over?

Paper argues that transformers are a good fit for language but terrible for time series forecasting, as the attention mechanisms inevitably discard such information. If true, then there would be major gains to a hybrid system, I would think, rather than this being a reason to think we will soon hit limits. It does raise the question of how much understanding a system can have if it cannot preserve a time series.

That paper got a reply one year later: "Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)" (haven't read either one).

Comment by rotatingpaguro on "Genlangs" and Zipf's Law: Do languages generated by ChatGPT statistically look human? · 2024-01-31T21:42:33.622Z · LW · GW

I think that instead of considering random words as a baseline reference (Fig. 2), you should take the alphabet plus the space symbol, generate a random i.i.d. sequence of them, and then index words in that text. This won't give a uniform distribution over words. It is total gibberish, but I expect it would follow Zipf's law all the same, based on these references I found on Wikipedia:

Wentian Li (1992), "Random Texts Exhibit Zipfs-Law-Like Word Frequency Distribution"

V. Belevitch (1959), "On the statistical laws of linguistic distributions"
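A minimal sketch of the baseline I have in mind, assuming numpy (the text length and alphabet are arbitrary choices of mine):

```python
# i.i.d. uniform letters plus space, segmented into "words"; check the
# rank-frequency relationship in log-log space (Zipf-like means roughly a straight line).
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
alphabet = list("abcdefghijklmnopqrstuvwxyz ")  # 26 letters + space
text = "".join(rng.choice(alphabet, size=2_000_000))
words = text.split()

counts = np.array(sorted(Counter(words).values(), reverse=True))
ranks = np.arange(1, len(counts) + 1)

# Crude power-law fit; Li (1992) predicts a Zipf-like slope for exactly this setup.
slope, intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
print(f"{len(words):,} words, {len(counts):,} distinct types, log-log slope ≈ {slope:.2f}")
```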

I'd also show an example of the "ChatGPT gibberish" produced.

Comment by rotatingpaguro on Making every researcher seek grants is a broken model · 2024-01-26T21:49:16.070Z · LW · GW

Do you think CERN is an example of what you want?

Comment by rotatingpaguro on Monthly Roundup #14: January 2024 · 2024-01-24T23:20:28.400Z · LW · GW

And Italy also definitely uses the green socket, and a larger version of the three-dots-in-line socket. Many sockets on the market just accommodate everything at once.

Comment by rotatingpaguro on Optimisation Measures: Desiderata, Impossibility, Proposals · 2024-01-22T02:54:21.732Z · LW · GW

I remembered this when I read the following excerpt in Meaning and Agency:

In Belief in Intelligence, Eliezer sketches the peculiar mental state which regards something else as intelligent:

Imagine that I'm visiting a distant city, and a local friend volunteers to drive me to the airport.  I don't know the neighborhood. Each time my friend approaches a street intersection, I don't know whether my friend will turn left, turn right, or continue straight ahead.  I can't predict my friend's move even as we approach each individual intersection - let alone, predict the whole sequence of moves in advance.

Yet I can predict the result of my friend's unpredictable actions: we will arrive at the airport. 
[...]
I can predict the outcome of a process, without being able to predict any of the intermediate steps of the process.

In Measuring Optimization Power, he formalizes this idea by taking a preference ordering and a baseline probability distribution over the possible outcomes. In the airport example, the preference ordering might be how fast they arrive at the airport. The baseline probability distribution might be Eliezer's probability distribution over which turns to take -- so we imagine the friend turning randomly at each intersection. The optimization power of the friend is measured by how well they do relative to this baseline. 

I think this can be a useful notion of agency, but constructing this baseline model does strike me as rather artificial. We're not just sampling from Eliezer's world-model. If we sampled from Eliezer's world-model, the friend would turn randomly at each intersection, but they'd also arrive at the airport in a timely manner no matter which route they took -- because Eliezer's actual world-model believes the friend is capably pursuing that goal.

So to construct the baseline model, it is necessary to forget the existence of the agency we're trying to measure while holding other aspects of our world-model steady. While it may be clear how to do this in many cases, it isn't clear in general. I suspect if we tried to write down the algorithm for doing it, it would involve an "agency detector" at some point; you have to be able to draw a circle around the agent in order to selectively forget it. So this is more of an after-the-fact sanity check for locating agents, rather than a method of locating agents in the first place.
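For concreteness, a hedged sketch of the "Measuring Optimization Power" recipe as I recall it: score the achieved outcome, take the baseline probability mass of outcomes at least that good, and take the negative log base 2. The driving numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Baseline model: the friend turns randomly at each intersection, so travel time
# is drawn from some "random twitching" distribution (a toy geometric model here).
baseline_minutes = 3 * rng.geometric(p=0.05, size=100_000)

achieved_minutes = 25  # what actually happens: we reach the airport quickly

# Preference ordering: shorter is better, so "at least as good" means <= achieved.
mass_at_least_as_good = np.mean(baseline_minutes <= achieved_minutes)
bits = -np.log2(mass_at_least_as_good)
print(f"≈ {bits:.1f} bits of optimization relative to the random-turning baseline")
```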

Comment by rotatingpaguro on AI #48: Exponentials in Geometry · 2024-01-18T17:03:50.206Z · LW · GW

I guess the statement on the front page "Announcement: Hi, I think my best content is on Twitter/X right now" means that they do not keep the front page very up to date.

https://nitter.ktachibana.party/base_rate_times

Comment by rotatingpaguro on AI #48: Exponentials in Geometry · 2024-01-18T17:02:02.727Z · LW · GW

I also want to note that, while I consider such behavior to indeed be technically criminal under existing law, I do not actually want anyone to be thrown in jail for it, on reflection I remember that does not actually lead good places, and regret saying that. I stand by the rest of my statement.

Indeed I was surprised you wanted people jailed. It even came to my mind while I was waking up today.

Comment by rotatingpaguro on Even if we lose, we win · 2024-01-16T02:45:24.730Z · LW · GW

Therefore, it might be a good idea for all (or at least a large group of) alignment researchers to coordinate around pursuing the same specific alignment plan based on the result of a quantum RNG, or something like that.

From this I infer that you think the set of alignment strategies we would pick between by quantum dice covers much more of the space than the single strategy that seems best by general consensus.

My intuition tells me that if Clippy thought we had a  chance, this trick does not really move the needle.


Epistemic status: Updating on this comment and taking into account uncertainty about my own values, my credence in this post is around 50%.

Is this conditional on the "Assumptions" section, or marginal?

Comment by rotatingpaguro on Quick takes on "AI is easy to control" · 2024-01-14T22:37:01.484Z · LW · GW

I think the term is clear because it references the name and rule of the world-famous board game, where you can't use words from a list during your turn.

Comment by rotatingpaguro on Eliminating Cookie Banners is Hard · 2024-01-14T21:44:49.390Z · LW · GW

I think this is unlikely, because it is not in a website's interest to annoy its users, and they are not otherwise getting anything from bigger banners.

Comment by rotatingpaguro on AI Alignment Metastrategy · 2024-01-13T05:14:50.455Z · LW · GW

All of those fields use math but don't heavily rely on rigorously provable formulations of their problems.

Chicken and egg: is this evidence they are not mature enough to make friendly AI, or evidence that friendly AI can be made with that current level of rigor?

Comment by rotatingpaguro on AI Alignment Metastrategy · 2024-01-13T05:09:05.803Z · LW · GW

I heard that Ashkenazi Jews are 1 SD up on IQ, which is about the kind of improvement we are talking about with embryo selection. I do not have the impression they are bad towards other humans. Do you think otherwise?

To be clear, I am not trying to gotcha you with antisemitism, and I totally understand if you want to avoid discussion because this is a politically charged topic.

Comment by rotatingpaguro on [Request]: Use "Epilogenics" instead of "Eugenics" in most circumstances · 2024-01-13T05:03:01.284Z · LW · GW

I do think though, that we can agree that the amount of people who abort when they are warned of it is much higher than the percentage of people who are unhappy they got their surprise down syndrome kid?

Is this because people predict accurately whether they'd like a Down kid, or because everyone thinks it's bad but it's actually pleasant?

Or some middle ground. Taking all percentages at face value, my first guess is that there is some social pressure to avoid birthing Downs, so people over-abort, and those who do not are defiant because they are quite sure, and rightly so, that they are totally fine with a Down kid, so they end up with a high rate of happiness. This is compatible with aborting being the right choice for most people.

Rephrase:

  • Most people are better off aborting the Down.
  • A few people are not.
  • The majority sets the social pressure.
  • People in the grey area default to aborting due to such social pressure.
  • The non-aborters are thus only those who are sure enough of what they want to not follow social pressure.
  • Thus the non-aborters are a very well selected group of people actually better off with their Down kid.

Comment by rotatingpaguro on Eliminating Cookie Banners is Hard · 2024-01-13T04:21:37.835Z · LW · GW

I think this is a good demonstration of why companies generally do choose to stick with cookie banners even though they're annoying.

Why are they annoying?

Some websites—rare, delicate lotus flowers—bother me with a small, horizontal banner on the bottom of the page. When I click "accept", it actually goes away forever on that browser for that website.

Many others, instead, slap a bulky framed message in the middle of the page, possibly 2-4 seconds after most of the loading has finished, just to interrupt my initial interactions with the page in the most annoying way possible.

Is there a reason for that? Is it out-of-control, overconservative legal worry?

Comment by rotatingpaguro on Bayesians Commit the Gambler's Fallacy · 2024-01-10T22:15:20.135Z · LW · GW

But the point of the post is to use that as a simplified model of a more general phenomenon, one that should hook into your notions connected to "gambler's fallacy".

A title like yours is more technically defensible and closer to the math, but it gives up an important part. The bolder claim is actually there and intentional.

It reminds me of a lot of academic papers where it's very difficult to see what all that math is there for.

To be clear, I second making the title less confident. I think your suggestion goes too far in the other direction: it omits content.

Comment by rotatingpaguro on Reflective consistency, randomized decisions, and the dangers of unrealistic thought experiments · 2024-01-10T21:49:45.773Z · LW · GW

I don't see any inconsistency in beliefs. Initially, everyone thinks that the probability that the urn with 18 green balls is chosen is 1/2. After someone picks a green ball, they revise this probability to 9/10, which is not an inconsistency, since they have new evidence, so of course they may change their belief. This revision of belief should be totally uncontroversial. If you think a person who picks a green ball shouldn't revise their probability in this way then you are abandoning the whole apparatus of probability theory developed over the last 250 years. The correct probability is 9/10. Really. It is.

I don't like this way of arguing by authority and sheer repetition.

That said, I feel totally confused about the matter so I can't say whether I agree or not.

Comment by rotatingpaguro on A Land Tax For Britain · 2024-01-10T21:31:17.656Z · LW · GW

Ok, I agree that you have to normalize the number of vacant homes. The total number of homes is the largest denominator that makes sense. My doubt was if the denominator could be something smaller than the number of total homes.

In different words, my knowledge of the housing market is not sufficient to say if 2.7% counts as small or large. Why does it "seem really low"?

Analogous example that comes to my mind: if I am a male searching for a female mate, I prefer cities with higher female/male ratio. Say town A has 49-51, and town B has 51-49. Is a 2% difference large or small? I argue it could be large, for what matters for finding a mate: if most couples in a town are already locked, i.e., people in long term relationships, then the "free market" of dating is much more gender-skewed than a 2% difference.

To be concrete: say 90 people are paired, and 10 are single. Then removing the paired 45:45, the gender ratio within singles remains 4:6 in town A, and 6:4 in town B, i.e., in town A there are 2 single females for each 3 single males.

Thus the set that makes the ratio more intuitively "large" or "small" is the set of singles rather than the set of all people.
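The same arithmetic as a throwaway check (numbers as above; the code is my own):

```python
# 90 of 100 people paired into 45 mixed couples; the rest are single.
def singles(females, males, paired_people):
    couples = paired_people // 2
    return females - couples, males - couples

print("town A singles (F, M):", singles(49, 51, 90))  # (4, 6)
print("town B singles (F, M):", singles(51, 49, 90))  # (6, 4)
```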

Getting back to housing: maybe there is a smaller set containing all vacant homes, or even a more restrictive set to consider that contains only some vacant homes, that is more appropriate. I don't know though.

Comment by rotatingpaguro on Bayesians Commit the Gambler's Fallacy · 2024-01-07T17:40:38.032Z · LW · GW

I have the impression the OP is using "gambler's fallacy" as a conventional term for a strategy, while you are taking "fallacy" to mean "something's wrong". The OP does write about this contrast, e.g., in the conclusion:

Maybe the gambler’s fallacy doesn’t reveal statistical incompetence at all. After all, it’s exactly what we’d expect from a rational sensitivity to both causal uncertainty and subtle statistical cues.

So I think the adversative backbone of your comment is misdirected.

Comment by rotatingpaguro on A Land Tax For Britain · 2024-01-07T10:48:37.447Z · LW · GW

As I try to look into this more, I'm also finding that the vacancy rate seems really low in England. 676,304 vacant homes / 24.9 million total homes gives a vacancy rate of 2.7%

Why is the ratio (vacant homes/total homes) the right thing to look at, if a single metric is to be considered for argument?

Comment by rotatingpaguro on Game Theory without Argmax [Part 1] · 2023-12-28T19:45:41.924Z · LW · GW

I'd like to read your solution to exercise 6, could you add math formatting? I have a hard time reading latex code directly.

You can do that with the visual editor mode by selecting the math and using the contextual menu that appears automatically, or with $ in the Markdown editor.

There are $ in your comment, so I guess you inadvertently typed in visual mode using the Markdown syntax.

Comment by rotatingpaguro on Game Theory without Argmax [Part 1] · 2023-12-28T19:39:53.456Z · LW · GW

Ok, take 2.

If I understand correctly, what you want must be more like "restrict the domain of the task before plugging it into the optimiser," and less like "restrict the output of the optimiser."

I don't know how to do that agnostically, however, because optimisers in general have the domain of the task baked in. Indeed the expression for a class of optimisers is , with  in it.

Considering better-than-average optimisers from your example, they are a class with a natural notion of "domain of the task" to tweak, so I can naturally map any initial optimiser to a new one with a restricted task domain: , by taking the mean over .

But given an otherwise unspecified , I don't see a natural way to define a .

Assuming there's no more elegant answer than filtering for that (), then the question must be: is there another minimally restrictive class of optimisers with such a natural notion, which is not the one with the "detested element"  already proposed by the OP?

Try 1: consequentialist optimisers, plus the assumption , i.e., the legal moves do not restrict the possible payoffs. Then, since the optimiser picks actions only through , for each r I can delete illegal actions from the preimage, without creating new broken outputs. However, this turns out to be just filtering, so it's not an interesting case.

Try 2: the minimal distill of try 1 is that the output either is empty or contains legal moves already, and then I filter, so yeah not an interesting idea.

Try 3: invariance under permutation of something? A task invariant under permutation of  is just a constant task. An optimiser "invariant under permutation of " does not even mean anything.

Try 4: consider a generic map . This does not say anything, it's the baseline.

Try 5: analyse the structure of a specific example. The better-than-average class of optimisers is . It is consequentialist and context-independent. I can't see how to generalize something mesospecific here.

Time out.

Comment by rotatingpaguro on Why does expected utility matter? · 2023-12-27T22:52:52.093Z · LW · GW

if the language is not saying anything that is meaningful to my intuition.

When you learn a new language, you eventually form new intuitions. If you stick to existing intuitions, you do not grow. Current intuition does not generalize to the utmost of your potential ability.

When I was a toddler, I never proceeded to grow new concepts by rigorous construction; yet I ended up mostly knowing what was around me. Then, to go further, I employed abstract thought, and had to mold and hew my past intuitions. Some things I intuitively perceived turned out likely false; hallucinations.

Later, when I was learning Serious Math, I forgot that learning does not work by a straight stream of logic and proofs, and instead demanded that what I was reading both match my intuitions, and be properly formal and justified. Quite the ask!

The problem with this view of utility "just as a language"

My opinion is that if you think the problem lies in seeing it as a language, a new lens on the world, specifically because the new language does not match your present intuition, then you are pointing at the wrong problem.

If instead you meant to prosaically plead for object-level explanations that would clarify, oh uhm sorry I don't actually know, I'm an improvised teacher, I actually have no clue, byeeeeee

Comment by rotatingpaguro on Why does expected utility matter? · 2023-12-27T18:56:35.030Z · LW · GW

This is not a complete answer, it's just a way of thinking about the matter that was helpful to me in the past, and so might be to you too:

Saying that you ought to maximise the expected value of a real valued function of everything still leaves a huge amount of freedom; you can encode what you want by picking the right function over the right things.

So you can think of it as a language: a conventional way of expressing decision strategies. If you can write a decision strategy as $a^* = \operatorname{argmax}_a \mathbb{E}[U(\text{outcome}) \mid a]$ for some real-valued $U$, then you have written the problem in the language of utility.

Like any generic language, this won't stop you from expressing anything in general, but it will make certain things easier to express than others. If you know at least two languages, you'll have sometimes encountered short words that can't be efficaciously translated to a single word in the other language.
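To make the "expressing anything" point concrete, here is one standard construction (my own example; $h$ is a full history and $\pi_0$ any fixed policy): an indicator utility over histories recovers $\pi_0$ as an expected-utility maximiser.

```latex
\[
  U(h) =
  \begin{cases}
    1 & \text{if every action in } h \text{ agrees with } \pi_0,\\
    0 & \text{otherwise,}
  \end{cases}
  \qquad\Longrightarrow\qquad
  \pi_0 \in \operatorname*{arg\,max}_{\pi} \, \mathbb{E}_{\pi}\!\left[U(h)\right],
\]
since $\pi_0$ attains $\mathbb{E}_{\pi_0}[U(h)] = 1$ and no policy can do better.
```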

Similarly, thinking that you ought to maximise expected utility, and then asking "what is my utility then?", naturally suggests to your mind certain kinds of strategies rather than others.

Some decisions may need many epicycles to be cast as utility maximisation. That this indicates a problem with utility maximisation, with the specific decision, or with the utility function, is left to your judgement.

There is currently not a theory of decision that just works for everything, so there is not a totally definitive argument for maximum expected utility. You'll have to learn when and how you can not apply it with further experience.

Comment by rotatingpaguro on Generalizing Koopman-Pitman-Darmois · 2023-12-24T22:05:23.015Z · LW · GW

I already understood that because you explain it in the text; the further doubt I have concerns only the "only if" part: given that a -dimensional sufficient statistic exists by assumption, is  also a -dimensional sufficient statistic or not?

I think not, because it should not be able to capture what goes on with the  variables, that's hidden in the completely arbitrary  term.

This annoys me because I can't see the form of the sufficient statistic like in the i.i.d. case.

Comment by rotatingpaguro on Generalizing Koopman-Pitman-Darmois · 2023-12-24T20:29:53.374Z · LW · GW

In the "Independent But Non-Identical KPD" version, the term  is not a sufficient statistic in general, right?

(I could probably figure it out getting into the weeds of the proof, but I did not get it by reading once.)

Comment by rotatingpaguro on OpenAI: Preparedness framework · 2023-12-18T22:54:42.260Z · LW · GW

OpenAI's basic framework:

  1. Do dangerous capability evals at least every 2x increase in effective training compute. This involves fine-tuning for dangerous capabilities, then doing evals on pre-mitigation and post-mitigation versions of the fine-tuned model. Score the models as Low, Medium, High, or Critical in each of several categories.
    1. Initial categories: cybersecurity, CBRN (chemical, biological, radiological, nuclear threats), persuasion, and model autonomy.
  2. If the post-mitigation model scores High in any category, don't deploy it until implementing mitigations such that it drops to Medium.
  3. If the post-mitigation model scores Critical in any category, stop developing it until implementing mitigations such that it drops to High.
  4. If the pre-mitigation model scores High in any category, harden security to prevent exfiltration of model weights. (Details basically unspecified for now.)
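To check my reading, here is the gating logic of that summary as I understand it (the category names come from the framework; the data structures and function are my own illustration, not OpenAI's):

```python
LEVELS = {"Low": 0, "Medium": 1, "High": 2, "Critical": 3}

def preparedness_actions(pre_mitigation: dict, post_mitigation: dict) -> list:
    """Scores are per-category risk levels, e.g. {"cybersecurity": "High", ...}."""
    worst_pre = max(LEVELS[v] for v in pre_mitigation.values())
    worst_post = max(LEVELS[v] for v in post_mitigation.values())
    actions = []
    if worst_post >= LEVELS["Critical"]:
        actions.append("pause further development until mitigations bring it down to High")
    if worst_post >= LEVELS["High"]:
        actions.append("do not deploy until mitigations bring it down to Medium")
    if worst_pre >= LEVELS["High"]:
        actions.append("harden security against exfiltration of model weights")
    return actions or ["proceed; re-run evals at the next 2x increase in effective training compute"]

print(preparedness_actions(
    pre_mitigation={"cybersecurity": "High", "CBRN": "Medium", "persuasion": "Low", "model autonomy": "Low"},
    post_mitigation={"cybersecurity": "Medium", "CBRN": "Medium", "persuasion": "Low", "model autonomy": "Low"},
))
```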

This outlines a very iterative procedure. If models started hitting the thresholds, and this logic was applied repeatedly, the effect could be pushing the problems under the rug. At levels of ability sufficient to trigger those thresholds, I would be worried about staying on the bleeding edge of danger, tweaking the model until problems don't show up in evals.

I guess this strategy is intended to be complemented with the superalignment effort, and not to be pushed on indefinitely as the main alignment strategy.

Comment by rotatingpaguro on Some Rules for an Algebra of Bayes Nets · 2023-12-14T13:14:31.263Z · LW · GW

Aaaand about this?

I have never encountered this topic, and I couldn't find it skimming the Handbook of Graphical Models (2019), did you invent it? If not, could you list references?

Comment by rotatingpaguro on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-14T12:59:09.628Z · LW · GW

LessWrong is mostly ok. Specific problems/new things I'd like:

NEW REACTION EMOJIS

  • A reaction emoji to say "be quantitative here". Example: someone says "too much", I can't infer from context how much is too much, I believe it's the case I need to know that to carry their reasoning through, and I want them to stick their neck out and say a number. Possible icons: stock chart, magi-stream of numbers, ruler,  atom, a number with digits and a question mark (like "1.23?"), dial.
  • A reaction emoji to say "give a calibration/reference level for this". Example: interlocutor says "A is intelligent", I can't infer from context who A is intelligent compared to, and I want them to say "compared to B", "relative to average in group ABCD", or similar. Possible icons: double-plate scale with an object on one side and a question mark on the other (too complex maybe?), ruler, caliper, °C/°F, bell curve, gauge stick in water or snow.

TECHNICAL PROBLEMS

  • In the last month loading any LessWrong page hangs indefinitely for me. I have to hit reload about 5 times in close sequence to unjam it.

"HIDE USER NAMES" PROBLEMS

  • The "hide user names" option is not honored everywhere. The names unconditionally appear in the comments thread structure on the left, and in dialogue-related features (can't remember the exact places).
  • The "hide user names" feature would be more usable if names were consistently replaced with auto-generated human-friendly nicknames (e.g., aba-cpu-veo, DarkMacacha, whatev simple scheme of random words/syllables), re-generated and assigned on page load. With the current "(hidden)" placeholder it's quite difficult to follow discussions. After this modification, the anonymized user names should have different formatting or decoration to avoid being confused with true usernames.
  • When the actual user name appears by hovering over the fake one, it annoyingly flickers between the two states if the actual name is shorter than the placeholder. I guess the simplest solution is bounding the width to remain at least that of the placeholder.
  • I often involuntarily reveal a name by running across it with the pointer, or can't resist the temptation. It should be somewhat expensive to uncover a single name. Maybe a timer like the one for strong votes.
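A toy sketch of the kind of nickname scheme I mean (purely illustrative; nothing here reflects the actual LessWrong codebase):

```python
import random

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"

def _syllable(rng: random.Random) -> str:
    return rng.choice(CONSONANTS) + rng.choice(VOWELS)

def nickname(rng: random.Random) -> str:
    # Three pronounceable chunks, e.g. "kavo-mise-tapu".
    return "-".join(_syllable(rng) + _syllable(rng) for _ in range(3))

# Re-generated on every page load, but stable within the page:
page_rng = random.Random()
aliases: dict[str, str] = {}

def alias_for(user_id: str) -> str:
    return aliases.setdefault(user_id, nickname(page_rng))

print(alias_for("user_123"), alias_for("user_456"), alias_for("user_123"))
```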

Comment by rotatingpaguro on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2023-12-14T00:19:51.857Z · LW · GW

Couldn't there be genetic effects on things that can improve the brain even once its NN structure is mostly fixed? Maybe it's possible to have neurons work faster, or for the brain to wear less with abstract thinking, or to need less sleep.

This kind of thing is not a full intelligence improvement because it does not allow you to notice more patterns or to think with new schemes.

So maybe yes, it won't make a difference for AI timelines, though it would still be a very big deal.

Comment by rotatingpaguro on Are There Examples of Overhang for Other Technologies? · 2023-12-13T23:49:48.520Z · LW · GW

TL;DR: No.

I'm curious why the summary is "No" when afterwards you provide the example of land speed records. Are you not counting it because of your argument that jet engines were pushed on by a larger independent market, while large-scale AI hardware would not be if large-scale AI were banned?

I would have preferred a less concise summary such as "we looked at the most-like-this N things, 0 matched, 1 barely".

Disclaimer: did not read the thing in full, skimmed many parts. May have missed an explanation.

Comment by rotatingpaguro on AI #41: Bring in the Other Gemini · 2023-12-13T07:18:31.679Z · LW · GW

Yudkowsky AFAICT has at most engaged via a couple tweets (again which don’t seem to engage with the points).

If you mean literally two, it's more, although I won't take the time to dig up the tweets. I remember seeing them discuss at non-trivial length at least once on twitter. (If "a couple" encompassed that... Well once someone asked me "a couple of spaghetti" and when I gave him 2 spaghetti he got quite upset. Uhm. Don't get upset at me, please?)

I've thought a bit about this because I too on first sight perceived a lack of serious engagement. I've not yet come to a confident conclusion; on reflection I'm not so sure anymore there was an unfair lack of engagement.

First I tried to understand Pope & co's arguments at the object level. Within the allotted time, I failed. I expected to fare better, so I think there's some mixture of (Pope's framework being less simplifiable) & (Pope's current communication being worse), where the comparisons refer to the state of Yudkowsky & co's framework when I first encountered it.

So I turned to proxies; in cases where I thought I understood the exchange, what could I say about it? Did it seem fair?

From this I got the impression that sometimes Pope makes blunders at understanding simple things Yudkowsky means (not cruxes or anything really important, just trivial misunderstandings), which throw a shadow over his reading comprehension, so that one is then less inclined to spend the time to take him seriously when he makes complicated arguments that are not clear at once.

On the other hand, Yudkowsky seems to not take the time to understand when Pope's prose is a bit approximative or not totally rigorous, which is difficult to avoid when compressing technical arguments.

So my current take is: a mixture of (Pope is not good at communicating) & (does not invest in communication). This does not bear significantly on whether he's right, but it's a major time investment to understand him, so inevitably someone with many options on who to talk to is gonna deprioritize him.

To look at a more specific point, Vaniver replied at length to Quintin's post on Eliezer's podcast, and Eliezer said those answers were already "pretty decent", so although he did not take the time to answer personally, he bothered to check that someone was replying more thoroughly.

P.S. to try to be actionable: I think Pope's viewpoint would greatly benefit from having someone who understands it well but is good at, and dedicated to, communication. Although they are faring quite well on fame, so maybe they don't actually need anything more?

P.P.S. they now have a website, optimists.ai, so indeed they do think they should ramp up communication efforts, instead of resting on their current level of fame.

Comment by rotatingpaguro on Why No Automated Plagerism Detection For Past Papers? · 2023-12-12T17:57:38.005Z · LW · GW

Is plagiarism considered bad everywhere in the world, or is it an American foible? I vaguely recall reading years ago that in China it was not considered bad per se, and this occasionally gave Chinese people some problems with American academic institutions. However I did not check the sources at the time nor quantify the effect; I was a naive newspaper-reader.

Comment by rotatingpaguro on What do you do to remember and reference the LessWrong posts that were most personally significant to you, in terms of intellectual development or general usefulness? · 2023-12-10T21:07:25.236Z · LW · GW

I copy them as PDFs using my browser's "Export as PDF..." menu item, and add the date in the file name. I keep them all in a directory. PDFs have the advantage that I can mark notes on them, search by file content from the file browser, or input them into Claude.

I've started doing this recently, after I hit the same problem as you; I'm liking it so far.

When I really care about remembering, I make Anki flashcards.

Comment by rotatingpaguro on Some Rules for an Algebra of Bayes Nets · 2023-12-10T21:02:00.065Z · LW · GW
  • Re-Rooting Rule for Markov Chains
  • Joint Independence Rule
  • Frankenstein Rule
  • Factorization Transfer Rule
  • Stitching Rule for A Shared Markov Blanket
  • Swap Rule  <=====
  • Bookkeeping Rules

Can't find the "Swap rule" in the post, what was that?


The more complex approximation rule requires some additional machinery. For any diagram, we can decompose its  into one term for each variable via the chain rule:

  <=====

If we know how our upper bound  on the diagram’s  decomposes across variables, then we can use that for more fine-grained bounds. Suppose that, for each diagram  and variable , we have an upper bound

  <=====

I guess here there are missing expectations (indicated in red), otherwise  would remain a free RV. The second expectation is not necessary, although omitting it makes the bound unnecessarily strict.


I have never encountered this topic, and I couldn't find it skimming the Handbook of Graphical Models (2019), did you invent it? If not, could you list references?


Do you perchance have something for conditioning or marginalization? I know that graphical marginalization requires using ADMGs (graphs with double-headed arrows) instead of DAGs, I don't know about conditioning.


Exercise

Extend the above proofs to re-rooting of arbitrary trees (i.e. the diagram is a tree). We recommend thinking about your notation first; better choices for notation make working with trees much easier.

In a tree each node has a single parent. Re-rooting means flipping the arrow from the root to one of its children.

In the factorization I do P(old_root)P(new_root|old_root) = P(old_root|new_root)P(new_root), so the proof is the same.
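Spelled out, with my own notation ($R$ is the old root, $C$ the child that becomes the new root; every other factor is untouched):

```latex
\[
  P(X) \;=\; P(R)\,P(C \mid R) \prod_{i \notin \{R,C\}} P(X_i \mid \mathrm{pa}(X_i))
        \;=\; P(C)\,P(R \mid C) \prod_{i \notin \{R,C\}} P(X_i \mid \mathrm{pa}(X_i)),
\]
using Bayes' rule on the single flipped edge; every other node keeps the same parents.
```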

The fact that notation is not involved makes me suspect I may be missing something. Maybe they mean it's needed to write out the factorization expression in full? I guess the nice notation is using ch(X_i) instead of pa(X_i).


Joint Independence Rule: Exercise

The Joint Independence Rule can be proven using the Frankenstein Rule. This is left as an exercise. (And we mean that unironically, it is actually a good simple exercise which will highlight one or two subtle points, not a long slog of tedium as the phrase “left as an exercise” often indicates.)

Bonus exercise: also prove the conditional version of the Joint Independence Rule using the Frankenstein Rule.

Joint independence starts from the graphs

  X_i        {X_not i}

for all i. The Frankenstein rule can not be applied right away because the graphs are not on the same variables, since all but one appear grouped as a single tuple node.

To change that, I replace the tuple nodes with fully connected subgraphs. This is valid because cliques do not constrain the distribution, as any distribution can be factorized in any order if all variables are kept in the conditionals.

I choose the connection order of the cliques such that they respect a global ordering.

Then I apply Frankenstein, picking, for each node, the graph where that node is isolated.

Is there a way to obtain a narrower bound? Yes: I can pick only  from the i-th graph.

Conditional version: basically the same proof, with the common parent as first node in the global order. Needs more care in justifying replacing a tuple node with a clique: Conditional on the parent, the argument above still goes. The only additional independence property of the graph with the tuple is that the parent d-separates the tuple from the singlet, and that is preserved as well.


Frankenstein Rule

We’ll prove the approximate version, then the exact version follows trivially.

Without loss of generality, assume the order of variables which satisfies all original diagrams is . Let  be the factorization expressed by diagram , and let  be the diagram from which the parents of  are taken to form the Frankenstein diagram. (The factorization expressed by the Frankenstein diagram is then .)

The proof starts by applying the chain rule to the  of the Frankenstein diagram:

   <=====

Then, we add a few more expected KL-divergences (i.e. add some non-negative numbers) to get:

Thus, we have 

   <=====

I guess you meant the green thing (first margin arrow) to appear in place of the red thing (second margin arrow)? The math is right, but I hypothesize the final line was intended to express at once the narrow and the loose bounds.

Comment by rotatingpaguro on Game Theory without Argmax [Part 1] · 2023-11-29T21:13:04.229Z · LW · GW

Ah guess it's a typo then, and your use is a nonstandard one.