Posts

acronyms ftw 2022-10-21T13:36:39.378Z
The "you-can-just" alarm 2022-10-08T10:43:23.977Z
What's the actual evidence that AI marketing tools are changing preferences in a way that makes them easier to predict? 2022-10-01T15:21:13.883Z
EAGT Coffee Talks: Two toy models for theoretically optimal pre-paradigmatic research 2022-08-24T11:49:46.618Z
Emrik's Shortform 2022-08-21T10:43:47.675Z
Are you allocated optimally in your own estimation? 2022-08-20T19:46:54.589Z
Two Prosocial Rejection Norms 2022-04-28T20:53:15.850Z
The underappreciated value of original thinking below the frontier 2021-10-02T16:03:14.300Z
The Paradox of Expert Opinion 2021-09-26T21:39:45.752Z

Comments

Comment by Emrik (Emrik North) on Three Fables of Magical Girls and Longtermism · 2022-12-02T22:44:06.857Z · LW · GW

Still the only anime with what at least half-passes for a good ending. Food for thought, thanks! 👍

Comment by Emrik (Emrik North) on Thomas Larsen's Shortform · 2022-11-09T20:16:24.088Z · LW · GW

I've been exploring evolutionary metaphors to ML, so here's a toy metaphor for RLHF: recessive persistence. (Still just trying to learn both fields, however.)

"Since loss-of-function mutations tend to be recessive (given that dominant mutations of this type generally prevent the organism from reproducing and thereby passing the gene on to the next generation), the result of any cross between the two populations will be fitter than the parent." (k)

Related: 

Recessive alleles persist due to overdominance letting detrimental alleles hitchhike on a fitness-enhancing dominant counterpart. The detrimental effects on fitness only show up when two recessive alleles inhabit the same locus, which can be rare enough that the dominant allele still causes the pair to be selected for in a stable equilibrium.
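
As a sanity check on that mechanism, here's a minimal Python sketch (the fitness values are made up, purely illustrative) of heterozygote advantage holding a detrimental recessive allele at a stable nonzero frequency:

```python
# Toy simulation of overdominance (heterozygote advantage) maintaining a
# deleterious recessive allele at a stable equilibrium frequency.
# Fitness values below are illustrative assumptions, not from the comment.

w_AA = 0.9   # homozygous dominant: slightly less fit than the heterozygote
w_Aa = 1.0   # heterozygote: fittest (the recessive allele "hitchhikes" here)
w_aa = 0.4   # homozygous recessive: the detrimental effect only shows up here

q = 0.01     # starting frequency of the recessive allele a
for generation in range(200):
    p = 1 - q
    # Mean fitness under random mating (Hardy-Weinberg genotype frequencies)
    w_bar = p*p*w_AA + 2*p*q*w_Aa + q*q*w_aa
    # Standard single-locus selection update for the recessive allele
    q = (p*q*w_Aa + q*q*w_aa) / w_bar

# Analytic equilibrium: q* = s1 / (s1 + s2), with s1 = 1 - w_AA, s2 = 1 - w_aa
s1, s2 = 1 - w_AA, 1 - w_aa
print(f"simulated q after 200 generations: {q:.3f}")
print(f"analytic equilibrium q*:           {s1 / (s1 + s2):.3f}")
```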

The metaphor with deception breaks down due to unit of selection. Parts of DNA are stuck much closer together than neurons in the brain or parameters in a neural network. They're passed down or reinforced in bulk. This is what makes hitchhiking so common in genetic evolution.

(I imagine you can have chunks that are updated together for a while in ML as well, but I expect that to be transient and uncommon. Idk.)


Bonus point: recessive phase shift.

"Allele-frequency change under directional selection favoring (black) a dominant advantageous allele and (red) a recessive advantageous allele." (source)

In ML:

  1. Generalisable non-memorising patterns start out small/sparse/simple.
  2. Which means that input patterns rarely activate it, because it's a small target to hit.
  3. But on most of the occasions when it is activated, it gets reinforced (at least more reliably than memorised patterns).
  4. So it gradually causes upstream neurons to point to it with greater weight, taking up more of the input range over time. Kinda like a distributed bottleneck.
  5. Some magic exponential thing, and then phase shift!

One way the metaphor partially breaks down: DNA doesn't have weight decay at all, so it allows recessive beneficial mutations to very slowly approach fixation.
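
To make the numbered hand-wave above concrete, here's a deliberately crude toy (my own made-up dynamics, not anything from the comment or the ML literature): a sparse "general" circuit that is reinforced on most inputs it fires on, versus a "memorising" circuit reinforced only on its few memorised inputs, both under the same weight decay.

```python
# Crude illustrative toy: reliable reinforcement compounds, so the general
# circuit grows roughly exponentially until it dominates (the "phase shift"),
# whereas the memorising circuit's growth is cancelled by weight decay.
# Without weight decay (as in DNA), even a rarely-reinforced variant would
# still creep toward fixation.

general, memorised = 0.01, 1.0
gain_general, gain_memo, decay = 0.04, 0.01, 0.01

for step in range(1, 401):
    general   *= 1 + gain_general - decay   # net exponential growth
    memorised *= 1 + gain_memo - decay      # growth cancelled by decay
    if step % 100 == 0:
        print(f"step {step:3d}   general={general:9.3f}   memorised={memorised:.3f}")
```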

Comment by Emrik (Emrik North) on AllAmericanBreakfast's Shortform · 2022-11-09T19:30:11.677Z · LW · GW

Eigen's paradox is one of the most intractable puzzles in the study of the origins of life. It is thought that the error threshold concept described above limits the size of self replicating molecules to perhaps a few hundred digits, yet almost all life on earth requires much longer molecules to encode their genetic information. This problem is handled in living cells by enzymes that repair mutations, allowing the encoding molecules to reach sizes on the order of millions of base pairs. These large molecules must, of course, encode the very enzymes that repair them, and herein lies Eigen's paradox...


(I'm not making any point, just wanted to point to interesting related thing.)

Comment by Emrik (Emrik North) on Emrik's Shortform · 2022-11-08T20:32:04.425Z · LW · GW

Seems like Andy Matuschak feels the same way about spaced repetition being a great tool for innovation.

Comment by Emrik (Emrik North) on Moneypumping Bryan Caplan's Belief in Free Will · 2022-11-07T13:16:34.540Z · LW · GW

I like the framing. Seems generally usefwl somehow. If you see someone believing something you think is inconsistent, think about how to money-pump them. If you can't, then are you sure they're being inconsistent? Of course, there are lots of inconsistent beliefs that you can't money-pump, but seems usefwl to have a habit of checking. Thanks!

Comment by Emrik (Emrik North) on Instead of technical research, more people should focus on buying time · 2022-11-06T09:53:45.221Z · LW · GW

How do you account for the fact that the impact of a particular contribution to object-level alignment research can compound over time?

  1. Let's say I have a technical alignment idea now that is both hard to learn and very usefwl, such that every recipient of it does alignment research a little more efficiently. But it takes time before that idea disseminates across the community.
    1. At first, only a few people bother to learn it sufficiently to understand that it's valuable. But every person that does so adds to the total strength of the signal that tells the rest of the community that they should prioritise learning this.
    2. Not sure if this is the right framework, but let's say that researchers will only bother learning it if the strength of the signal hits their person-specific threshold for prioritising it.
    3. The number of researchers is normally distributed (or something) over threshold height, and the strength of the signal starts out below the peak of the distribution.
    4. Then (under some assumptions about the strength of individual signals and the distribution of threshold height), every learner that adds to the signal will, at first, attract more than one learner that adds to the signal, until the signal passes the peak of the distribution and the idea reaches satiation/fixation in the community (see the toy cascade sketched after this list).
  2. If something like the above model is correct, then the impact of alignment research plausibly goes down over time.
    1. But the same is true of a lot of time-buying work (like outreach). I don't know how to balance this, but I am now a little more skeptical of the relative value of buying time.
  3. Importantly, this is not the same as "outreach". Strong technical alignment ideas are most likely inaccessible to almost everyone outside the community, so the idea doesn't increase the number of people working on alignment.
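
Here's a toy sketch of the adoption-cascade model in point 1 (the parameter choices are mine and purely illustrative; it's a Granovetter-style threshold cascade, with the signal proxied by the number of researchers who have already learned the idea):

```python
import random

# Each researcher has a personal threshold; they adopt the idea once the
# signal (number of existing adopters) exceeds it.  Parameters are assumptions.
random.seed(0)
N = 1000
thresholds = [random.gauss(50, 20) for _ in range(N)]  # "normally distributed over threshold height"

adopters = 20  # a few early researchers bother to learn it anyway
while True:
    new_total = max(adopters, sum(1 for t in thresholds if t <= adopters))
    print(f"{new_total} adopters")
    if new_total == adopters:   # signal stopped growing: satiation/fixation
        break
    adopters = new_total
```
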
Comment by Emrik (Emrik North) on Instead of technical research, more people should focus on buying time · 2022-11-06T02:46:40.873Z · LW · GW

That's fair, but sorry[1] I misstated my intended question. I meant that I was under the impression that you didn't understand the argument, not that you didn't understand the action they advocated for.

I understand that your post and this post argue for actions that are similar in effect. And your post is definitely relevant to the question I asked in my first comment, so I appreciate you linking it.

  1. ^

    Actually sorry. Asking someone a question that you don't expect yourself or the person to benefit from is not nice, even if it was just due to careless phrasing. I just wasted your time.

Comment by Emrik (Emrik North) on Instead of technical research, more people should focus on buying time · 2022-11-06T01:37:40.258Z · LW · GW

No, this isn't the same. If you wish, you could try to restate what I think the main point of this post is, and I could say if I think that's accurate. At the moment, it seems to me like you're misunderstanding what this post is saying.

Comment by Emrik (Emrik North) on Instead of technical research, more people should focus on buying time · 2022-11-06T01:21:51.754Z · LW · GW

I would not have made this update by reading your post, and I think you are saying very different things. The thing I updated on from this post wasn't "let's try to persuade AI people to do safety instead," it was the following:

If I am capable of doing an average amount of alignment work w per unit time, and I have T units of time available before the development of transformative AI, I will have contributed wT work. But if I expect to delay transformative AI by d units of time if I focus on it, everyone will have that additional time to do alignment work, which means my impact is dNw, where N is the number of people doing work. Naively then, if dN > T, I should be focusing on buying time.[1]

  1. ^

    This assumes time-buying and direct alignment-work are independent, whereas I expect doing either will help with the other to some extent.
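
To get a sense of the magnitudes, here is the same comparison with made-up numbers (mine, not the post's or the comment's):

```latex
% Illustrative numbers only: suppose T = 15 years remain and N = 300 people
% are doing alignment work at roughly my rate w.
\[
\underbrace{wT}_{\text{direct work}} = 15\,w
\qquad \text{vs.} \qquad
\underbrace{dNw}_{\text{buying time}} = 1 \cdot 300 \cdot w = 300\,w
\]
% So under the independence assumption in the footnote, buying one extra year
% is naively worth about 20 times my remaining direct work.
```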

Comment by Emrik (Emrik North) on Instead of technical research, more people should focus on buying time · 2022-11-06T00:45:35.769Z · LW · GW

A concrete suggestion for a buying-time intervention is to develop plans and coordination mechanisms (e.g. assurance contracts) for major AI actors/labs to agree to pay a fixed percentage alignment tax (in terms of compute) conditional on other actors also paying that percentage. I think it's highly unlikely that this is new to you, but didn't want to bystander just in case.

A second point is that only a limited number of supercomputers come anywhere close to the capacity of the top ones. The #10 most powerfwl is 0.005% as powerfwl as the #1. So it could be worth looking into facilitating coordination between them.

Perhaps one major advantage of focusing on supercomputer coordination is that the people who can make the relevant decisions[1] may not actually have any financial incentives to participate in the race for new AI systems. They have financial incentives to let companies use their hardware to train AIs, naturally, but they could be financially indifferent to how those AIs are trained.

In fact, if they can manage to coordinate it via something like assurance contract, they may have a collective incentive to demand that AIs are trained in safer alignment-tax-paying ways, because then companies have to buy more computing time for the same level of AI performance. That's too much to hope for. The main point is just that their incentives may not have a race dynamic.

Who knows.

  1. ^

    Maybe the relevant chain of command goes up to high government in some cases, or maybe there are key individuals or small groups who have relevant power to decide.

Comment by Emrik (Emrik North) on Instead of technical research, more people should focus on buying time · 2022-11-06T00:03:12.556Z · LW · GW

(Update: I'm less optimistic about this than I was when I wrote this comment, but I still think it seems promising.)

Multiplier effects: Delaying timelines by 1 year gives the entire alignment community an extra year to solve the problem. 

This is the most and fastest I've updated on a single sentence as far back as I can remember. I am deeply gratefwl for learning this, and it's definitely worth Taking Seriously. Hoping to look into it in January unless stuff gets in the way.

Have other people written about this anywhere?

I have one objection to claim 3a, however: Buying-time interventions are plausibly more heavy-tailed than alignment research in some cases because 1) the bottleneck for buying time is social influence and 2) social influence follows a power law due to preferential attachment. Luckily, the traits that make for top alignment researchers have limited (but not insignificant) overlap with the traits that make for top social influencers. So I think top alignment researchers should still not switch in most cases on the margin.

Comment by Emrik (Emrik North) on publishing alignment research and exfohazards · 2022-11-01T15:08:44.426Z · LW · GW

When walls don't work, can use ofbucsation? I have no clue about this, but wouldn't it be much easier to use pbqrjbeqf for central wurds necessary for sensicle discussion so that it wouldn't be sreachalbe, and then have your talkings with people on fb or something?

Would be easily found if written on same devices or accounts used for LW, but that sounds easier to work around than literally only using paper?

Comment by Emrik (Emrik North) on publishing alignment research and exfohazards · 2022-10-31T18:42:06.961Z · LW · GW

Yes! The way I'd like it is if LW had a "research group" feature that anyone could start, and you could post privately to your research group.

Comment by Emrik (Emrik North) on So8res's Shortform · 2022-10-27T18:46:56.132Z · LW · GW

Same! LW is an outstanding counterexample to my belief that resurrections are impossible. But I haven't incorporated it into my gears-level model yet, and I'm unsure how to. What did LW do differently, or which gear in my head caused me to fail to predict this?

Comment by Emrik (Emrik North) on Emrik's Shortform · 2022-10-27T17:07:29.459Z · LW · GW

Here's my definitely-wrong-and-overly-precise model of productivity. I'd be happy if someone pointed out where it's wrong.

It has three central premises: a) I have proximal (basal; hardcoded) and distal (PFC; flexible) rewards. b) Additionally, or perhaps for the same reasons, my brain uses temporal-difference learning, but I'm unclear on the details. c) Hebbian learning: neurons that fire together, wire together.

If I eat blueberry muffins, I feel good. That's a proximal reward. So every time my brain produces a motivation to eat blueberry muffins, and I take steps that make me *predict* that I am closer to eating blueberry muffins, the synapses that produced *that particular motivation* get reinforced and are more likely to fire again next time.

The brain gets trained to produce the motivations that more reliably produce actions that lead to rewards.

If I get out of bed quickly after the alarm sounds, there are no hardcoded rewards for that. But after I get out of bed, I predict that I am better able to achieve my goals, and that prediction itself is the reward that reinforces the behaviour. It's a distal reward. Every time the brain produces motivations that in fact get me to take actions that I in fact predict will make me more likely to achieve my goals, those motivations get reinforced.

But I have some marginal control over *which motivations I choose to turn into action*, and some marginal control over *which predictions I make* about whether those actions take me closer to my goals. Those are the two levers with which I am able to gradually take control over which motivations my brain produces, as long as I'm strategic about it. I'm a fledgling mesa-optimiser inside my own brain, and I start out with the odds against me.

I can also set myself up for failure. If I commit to, say, study math for 12 hours a day, then... I'm able to at first feel like I've committed to that as long as I naively expect, right then and there, that the commitment takes me closer to my goals. But come the next day when I actually try to achieve this, I run out of steam, and it becomes harder and harder to resist the motivations to quit. And when I quit, *the motivations that led me to quit get reinforced because I feel relieved* (proximal reward). Trying-and-failing can build up quitting-muscles.

If you're a sufficiently clever mesa-optimiser, you *can* make yourself study math for 12 hours a day or whatever, but you have to gradually build up to it. Never make a large ask of yourself before you've sufficiently starved the quitting-pathways to extinction. Seek to build up simple well-defined trigger-action rules that you know you can keep to every single time they're triggered. If more and more of input-space gets gradually siphoned into those rules, you starve alternative pathways out of existence.
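
Here's a crude toy of the quitting-muscles dynamic (my own construction, not the author's model; all numbers are arbitrary): each day you face an ask of some size, competing motivations win in proportion to their reinforcement weights, persisting on a keepable ask earns the distal reward, and quitting an oversized ask earns the proximal relief reward.

```python
import random

random.seed(1)

def train(ask_size, days=60, capacity=6.0):
    weights = {"persist": 1.0, "quit": 1.0}
    for _ in range(days):
        p_persist = weights["persist"] / (weights["persist"] + weights["quit"])
        persisted = random.random() < p_persist and ask_size <= capacity
        if persisted:
            weights["persist"] += 0.2    # distal reward reinforces persisting
        elif ask_size > capacity:
            weights["quit"] += 0.2       # relief reinforces quitting a big ask
        # quitting a tiny ask brings little relief, so nothing much changes
    return {k: round(v, 1) for k, v in weights.items()}

print("12-hour asks:", train(ask_size=12))   # trains the quitting pathway
print("small asks:  ", train(ask_size=3))    # starves it, builds persistence
```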

Thus, we have one aspect of the maxim: "You never make decisions, you only ever decide between strategies."

Comment by Emrik (Emrik North) on acronyms ftw · 2022-10-25T09:25:29.000Z · LW · GW

Thank you for this discussion btw, this is helpfwl. I suspect it's hitting diminishing returns unless we hone in on practical specifics.

I think our levels of faith in the rationality community are a crux. Here's what I think, though I would stress again that while I tentatively believe what I say, I am not trying to be safe to defer to. Thus, I omit disclaimers and caveats and instead try to provide perspectives for evaluation. I think this warning is especially prudent here.

We have a really strong jargon hit-rate

The "natural incentives around jargon creation" in most communities favour usefwlness much less compared to this community. I can think of some examples of historically bad jargon:

  • "Politics is the mind killer" (not irredeemably bad, but net bad nonetheless imo)
  • "Bayesian"
    • Not confident here, but I think the term expanded too far from its roots and got overemphasised. This could be prevented either by an increased willingness to use new terms for neighbouring semantic space, or an increased unwillingness to expand the use of shibboleths to cover new things.
  • "NPC" (non-player character)
    • Not irredeemable, but questionable net value.
  • Probably more here but I can't recall.

I think our hit-rate so far on jargon has been remarkably good. Even under the assumption that increased coinage reduces accuracy (which I weakly disagree with), it seems plausible that more coinage on the margin will take us closer to the Pareto frontier.

I am less worried about becoming marginally more insular

Our collective project is exceedingly dangerous. We're deactivating our memetic immune system and fumbling towards deliberate epistemic practices that we hope can make up for it. I think rationality education must consist of lowering intuitive defenses in tandem with growing epistemological awareness. And in cases where this education is out of sync, it produces victims.

But I'd be wary of updating too much on Leverage as an indictment of rationality culture in general. That kind of defensiveness is the same mechanism by which hospitals get bureaucrified--they're minimising false-positives at the cost of everything else.

I suspect that our community's cultural inclination against these failure modes makes it more likely that our epistemic norms will weaken through widespread social integration with other cultures.

I also think, more generally, that norms/advice that were necessary early on could nowadays actively be hampering our progress. "Be less sure of yourself, seek wisdom from outside sources, etc." is necessary advice for someone just starting out on the path, but at some point your wisdom so far exceeds outside sources that the advice hits diminishing returns--tune yourself to wherever you sniff out the most value of information, whether that be insular or not.

Comment by Emrik (Emrik North) on acronyms ftw · 2022-10-24T15:35:29.689Z · LW · GW

Epistemic status: Do not defer to me. I'm here to provide interesting arguments and patterns that may help enlighten our understanding of things. I'm not optimising my conclusions for being safe to defer to. (It's the difference between minimising false-positives vs minimising false-negatives.)


  1. A high bar for adopting new jargon does not conflict with a low bar for suggesting jargon. In fact, if more jargon is suggested, I expect a lower proportion of it to be adopted, and also that the average quality of the winning jargon goes up.
  2. There's a speed/scope tradeoff here similar to the Zollman effect. If what you care about is that your subculture advances in idea space asap, then adopting words faster could be good. If instead it matters that the culture be accessible to a greater number of people, then a strong prior against jargon seems better. I care more about the progress of the subculture than I do about its breadth, at least on the current margin as I see it.
  3. Point 2 above assumes for the sake of argument that you are correct about accessibility dropping the more jargon there is. But I don't think the case for that is very strong. I think well-placed jargon makes ideas and whole paradigms a lot easier to learn, and therefore more accessible. Furthermore, I think it's worth making a distinction between community-accessibility and idea-accessibility. A lot of jargon makes it harder to participate in the community without a greater understanding of the ideas, but it also makes it easier to understand those ideas in the first place. The net effect is probably that it creates a clearer separation between people inside and outside the community, with fewer gradients in between.
  4. Re your concern about epistemic closure (thanks for the jargon btw!)[1]: I think if the community had more widespread, healthier epistemic norms, we would be more effective at adopting new ideas and reasoning rationally about outside perspectives, and more willing to change our ways en masse if an outside perspective is actually better. The rationality community is qualitatively different from other communities because prestige is to a large extent determined by one's ability to avoid epistemic failure modes, such that the usual "cult warning signs" apply to a lesser degree.
  5. Saying heretical stuff here, I know, but I did disclaim deferral status in the first sentence, so should be safe : )
  6. Final point: bureaucratic acronyms are just way terribler than Internet-slang acronyms, tho! :p
  1. ^

    This is not meant to be a critique, I just found the irony a little funny. I appreciate your comment, and learning about epistemic closure from the link.

Comment by Emrik (Emrik North) on The "you-can-just" alarm · 2022-10-24T15:07:06.421Z · LW · GW

Yeah, all these "alarms" are supposed to warn you that the word (or something) might be misleading, and you should pay extra attention (unless it's already obvious) to avoid being misled. Or, pay extra attention because there is something you can do in response which is profitable.

Comment by Emrik (Emrik North) on aisafety.community - A living document of AI safety communities · 2022-10-21T13:39:15.125Z · LW · GW

nou. im busy rn, maybe later.

Comment by Emrik (Emrik North) on aisafety.community - A living document of AI safety communities · 2022-10-21T12:43:42.974Z · LW · GW

O.O!

They should bring it back.

Comment by Emrik (Emrik North) on aisafety.community - A living document of AI safety communities · 2022-10-21T12:09:07.910Z · LW · GW

Alas, those are just for hosting events. Jic, I tried to check if they have more functionality since I'm an organiser of the EA Gather Town group, but it doesn't do any of that.

Comment by Emrik (Emrik North) on aisafety.community - A living document of AI safety communities · 2022-10-21T07:24:52.911Z · LW · GW

Would be cool if LessWrong hosted subforums/bubbles/research-groups for anyone who wanted to start one and invite their friends. You would have the ability to write a post only to your bubble (visible on your bubble's frontpage or a private filter to the main frontpage) or choose to crosspost it to main as well. Having the bubbles be on LW provides them a little prestige boost and could stimulate some folk to initiate new research covens for alignment or whatever (or *cough* social epistemology research bubble maybe).

You could also have the option to filter karma so you only see the karma assigned by people in your bubble. Or, just like you can subscribe to get notified when people post, you could "subscribe" to prioritise their karma too. You could make a custom karma-filter individual to you by subscribing to people or groups whose opinions you trust. And the individual-filtered karma could be transitive as well, according to some parameters you set yourself--similar to plex&co's EigenTrust project except it'd be EigenKarma. There's more cool stuff here, but I'm probably never going to actually finish a post about it, so better suggest it briefly to someone than not suggest it at all.
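
Here's a minimal sketch of what transitive, subscription-weighted karma could look like (my own toy construction, loosely in the spirit of EigenTrust; the users, trust weights, and votes are all invented for illustration):

```python
import numpy as np

# trust[i][j] = how much user i trusts user j's votes (0 if not subscribed).
users = ["you", "alice", "bob", "carol"]
trust = np.array([
    [0.0, 0.7, 0.3, 0.0],   # you subscribe to alice and bob
    [0.0, 0.0, 0.5, 0.5],   # alice subscribes to bob and carol
    [0.0, 1.0, 0.0, 0.0],   # bob subscribes to alice
    [0.0, 0.0, 1.0, 0.0],   # carol subscribes to bob
])

def personal_trust(trust, me, damping=0.85, iters=50):
    """Power-iterate trust outward from `me`, PageRank-style."""
    n = len(trust)
    # Row-normalise so each user's outgoing trust sums to 1 (if any)
    row_sums = trust.sum(axis=1, keepdims=True)
    T = np.divide(trust, row_sums, out=np.zeros_like(trust), where=row_sums > 0)
    seed = np.zeros(n); seed[me] = 1.0
    v = seed.copy()
    for _ in range(iters):
        v = (1 - damping) * seed + damping * (v @ T)
    return v

weights = personal_trust(trust, users.index("you"))

# A post's personalised karma = sum of votes weighted by how much you
# (transitively) trust each voter.
votes = {"alice": +1, "bob": +1, "carol": -1}
karma = sum(weights[users.index(u)] * v for u, v in votes.items())
print(dict(zip(users, weights.round(3))), round(karma, 3))
```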

OK, done daydreaming. Back to work.

Comment by Emrik North on [deleted post] 2022-10-21T06:50:20.433Z

Epistemic status A: I have tried to optimise this text to be safe to defer to, and to minimise false-positive statements. If I say something wrong, I wish to be called out on it so that people don't end up deferring to something false on my account.

https://hollyelmore.substack.com/p/standard-disclaimer

Comment by Emrik (Emrik North) on The Balto/Togo theory of scientific development · 2022-10-15T08:56:36.212Z · LW · GW

This is a great parable. I'm often mildly reluctant to talk about some of my pre-formal ideas in case they get finished up proper by others and I counterfactually lose social credit. I usually do it anyway, especially for stuff I don't plan on "finishing up". But I can see how this reluctance is like heavy molasses poured all over a research community, and it makes us much less effective.

In my experience, the "finishing stage" of making an idea precise enough to be presented is not where the germs of generality are--the parts of ideas that can be used to build other ideas with in a compounding fashion.[1] If I'm just researching or working on something in order to build up a repertoire of tools in order to personally use them for other problems, then I don't need to go through the expensive "finishing" stage of making the infrastructure for all the middle steps legible to others.

There's an essay by Fields medalist William Thurston[2] that makes several related points; it's worth reading in its entirety.

“First I will discuss briefly the theory of foliations, which was my first subject, starting when I was a graduate student. (It doesn’t matter here whether you know what foliations are.)

At that time, foliations had become a big center of attention among geometric topologists, dynamical systems people, and differential geometers. I fairly rapidly proved some dramatic theorems. I proved a classification theorem for foliations, giving a necessary and sufficient condition for a manifold to admit a foliation. I proved a number of other significant theorems. I wrote respectable papers and published at least the most important theorems. It was hard to find the time to write to keep up with what I could prove, and I built up a backlog.

An interesting phenomenon occurred. Within a couple of years, a dramatic evacuation of the field started to take place. I heard from a number of mathematicians that they were giving or receiving advice not to go into foliations—they were saying that Thurston was cleaning it out. People told me (not as a complaint, but as a compliment) that I was killing the field. Graduate students stopped studying foliations, and fairly soon, I turned to other interests as well.

... When I started working on foliations, I had the conception that what people wanted was to know the answers. I thought that what they sought was a collection of powerful proven theorems that might be applied to answer further mathematical questions. But that’s only one part of the story. More than the knowledge, people want personal understanding. And in our credit-driven system, they also want and need theorem-credits.

... I’ll skip ahead a few years, to the subject that Jaffe and Quinn alluded to, when I began studying 3-dimensional manifolds and their relationship to hyperbolic geometry.

... In reaction to my experience with foliations and in response to social pressures, I concentrated most of my attention on developing and presenting the infrastructure in what I wrote and in what I talked to people about

... There has been and there continues to be a great deal of thriving mathematical activity. By concentrating on building the infrastructure and explaining and publishing definitions and ways of thinking but being slow in stating or in publishing proofs of all the “theorems” I knew how to prove, I left room for many other people to pick up credit. There has been room for people to discover and publish other proofs of the geometrization theorem.

In this episode (which still continues) I think I have managed to avoid the two worst possible outcomes: either for me not to let on that I discovered what I discovered and proved what I proved, keeping it to myself (perhaps with the hope of proving the Poincare conjecture), or for me to present an unassailable and hard-to-learn theory with no practitioners to keep it alive and to make it grow.

(...) I think that what I have done has not maximized my “credits”. I have been in a position not to feel a strong need to compete for more credits. Indeed, I began to feel strong challenges from other things besides proving new theorems. I do think that my actions have done well in stimulating mathematics.”

Thurston was a Togo.

  1. ^

    “The art of doing mathematics consists in finding that special case which contains all the germs of generality.”

  2. ^

    And in the spirit of this post, I should HT Chris Olah for linking to this essay. It's important to maintain a culture for remembering what hat-tips are due.

Comment by Emrik (Emrik North) on A common failure for foxes · 2022-10-14T23:56:24.253Z · LW · GW

Good points, but I feel like you're a bit biased against foxes. First of all, they're cute (see diagram). You didn't even mention that they're cute, yet you claim to present a fair and balanced case? Hedgehog hogwash, I say.

Anyway, I think the skills required for forecasting vs model-building are quite different. I'm not a forecaster, but if I were, I would try to read much more and more widely so I'm not blindsided by stuff I didn't even know that I didn't know. Forecasting is caring more about the numbers; model-building is caring more about how the vertices link up, whatever their weights. Model-building is for generating new hypotheses that didn't exist before; forecasting is discriminating between what already exists.

I try to build conceptual models, and afaict I get much more than 80% of the benefit from 20% of the content that's already in my brain. There are some very general patterns I've thought so deeply on that they provide usefwl perspectives on new stuff I learn weekly. I'd rather learn 5 things deeply, and remember sub-patterns so well that they fire whenever I see something slightly similar, compared to 50 things so shallowly that the only time I think about them is when I see the flashcards. Knowledge not pondered upon in the shower is no knowledge at all.

Comment by Emrik (Emrik North) on Loss of Alignment is not the High-Order Bit for AI Risk · 2022-10-14T22:34:05.386Z · LW · GW

I strongly disagree with nearly everything and think the reasoning as written is flawed, but I still strong-upvoted because I seem to have significantly updated on your fourth paragraph.  I hadn't let the question sink in before now, so reading it was helpfwl. Thanks!

Comment by Emrik (Emrik North) on Working Mantras · 2022-10-11T22:19:01.506Z · LW · GW

I needed 9. There's not enough time to get utility U out of the project, so the right thing to do is to get U-100 out of it, and there's time for that. Owning up to it doesn't make it worse.

Comment by Emrik (Emrik North) on The Teacup Test · 2022-10-09T18:40:49.175Z · LW · GW

Aye, I didn't jump to the conclusion that you were aggressive. I wanted to make my comment communicate that message anyway, and that your comment could be interpreted like that gave me an excuse.

Comment by Emrik (Emrik North) on So, geez there's a lot of AI content these days · 2022-10-09T18:39:11.849Z · LW · GW

Randomise karma-boosts for each topic every day. Or let there be an "AI day", "Rationality day", "Practical day", etc. where the topic gets relatively promoted to the frontpage, but have it be luck of the draw rather than specific days. Just so writers have less of an incentive to withhold posting something to wait for the perfect day.

If readers visit the forum and see 90% AI every day, they'll probably have more of an impression that this is an AI forum, compared to if they see the same proportion of AI posts over a week, but not every day is an AI day.

Comment by Emrik (Emrik North) on The Teacup Test · 2022-10-08T20:02:43.011Z · LW · GW

All the correct ways and none of the incorrect ways, of course! I see the ambivalence and range of plausible interpretations. Can't I just appreciate a good post for the value I found in it without being fished out for suspected misunderstandings? :p

I especially liked how this is the cutest version of Socrates I've encountered in any literature.

Comment by Emrik (Emrik North) on The Teacup Test · 2022-10-08T08:36:58.292Z · LW · GW

This is glorious in so many ways. Thank you.

Comment by Emrik (Emrik North) on Emrik's Shortform · 2022-10-08T06:44:26.033Z · LW · GW

I struggle with prioritising what to read. Additionally, but less of a problem, I struggle to motivate myself to read things. Some introspection:

The problem is that my mind desires to "have read" something more than desiring the state of "reading" it. Either because I imagine the prestige or self-satisfaction that comes with thinking "hehe, I read the thing," or because I actually desire the knowledge for its own sake, but I don't desire the attaining of it, I desire the having of it.[1]

Could I goodhart-hack this by rewarding myself for reading and feeling ashamed of myself for actually finishing a whole post? Probably not. I think perhaps my problem is that I'm always trying to cut the enemy, so I can't take my eyes off it for long enough to innocently experience the inherent joy of seeing interesting patterns. When I do feel the most joy, I'm usually descending unnecessarily deep into a rabbit hole.

"What are all the cell adhesion molecules, how are they synthesised, and is the synthesis bottlenecked by a particular nutrient I can supplement?!"

Nay, I think my larger problem is always having a million things that I really want to read, and I feel a desperate urge to go through all of them--yesterday at the latest! So when I do feel joy at the nice patterns I learn, I feel a quiet unease at the back of my mind calling me to finish this as soon as possible so we can start on the next urgent thing to read.

(The more I think about it, the more I realise just how annoying that constant impatient nagging is when I'm trying to read something. It's not intense, but it really diminishes the joy. While I do endorse impatience and always trying to cut the enemy, I'm very likely too impatient for my own good. On the margin, I'd make speedier progress with more slack.)

If this is correct, then maybe what I need to do is to--well, close all my tabs for a start--separate out the process of collecting from the process of reading. I'll make a rule: If I see a whole new thing that I want to read, I'm strictly forbidden to actually read it until at least a day has passed. If I'm already engaged in a particular question/topic, then I can seek out and read information about it, but I can only start on new topics if it's in my collection from at least a day ago.

I'm probably intuitively overestimating a new thing's value relative to the things in my collection anyway, just because it feels more novel. If instead I only read things from my collection, I'll gradually build up an enthusiasm for it that can compete with my old enthusiasm for aimless novelty--especially as I experience my new process outperforming my old.

My enthusiasm for "read all the newly-discovered things!" is not necessarily the optimal way to experience the most enthusiasm for reading; it's just stuck in a myopic equilibrium I can beat with a little activation energy.

  1. ^

    What this ends up looking like is frantically skimming through the paper until I find the patterns I'm looking for, and being so frustrated at not immediately finding them that the experience ends up being unpleasant.

Comment by Emrik (Emrik North) on Looping · 2022-10-06T00:15:59.419Z · LW · GW

Strong subjective disagreement here, but I guess that's why I feel so strongly about it--I'm an outlier. :p

Comment by Emrik (Emrik North) on Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA · 2022-10-04T23:28:17.298Z · LW · GW

Excellent comment. Independently same main takeaway here. Thanks for the pictures!

Agree with nitpick, although I get why they restrict the term "grok" to mean "test loss minimum lagging far behind training loss minimum". That's the mystery and distinctive pattern from the original paper, and that's what they're aiming to explain.

Comment by Emrik (Emrik North) on What's the actual evidence that AI marketing tools are changing preferences in a way that makes them easier to predict? · 2022-10-03T16:56:30.461Z · LW · GW

Flattered you ask, but I estimate that I'll be either very busy with my own projects or on mental-health vacation until the end of the year. But unless you're completely saturated with connections, I'd be happy to have a 1:1 conversation sometime after October 25th? Just for exploration purposes, not for working on a particular project.

Comment by Emrik (Emrik North) on What's the actual evidence that AI marketing tools are changing preferences in a way that makes them easier to predict? · 2022-10-02T05:07:04.243Z · LW · GW

Thanks! This is what I'm looking for. Seems like I should have googled "recommender systems" and "preference shifts".

Edit: The openreview paper is so good. Do you know who the authors are?

Comment by Emrik (Emrik North) on You Are Not Measuring What You Think You Are Measuring · 2022-09-22T14:31:51.956Z · LW · GW

Basically, we have an almost religious reverence for high-powered, decent-effect-size, low-p-value statistical evidence, and we fail to notice when these experiments acquire their bayes factors by measuring something incredibly narrow that is therefore unlikely to generalise to whatever gears-level models you're entertaining.

It has the same problem as deferring to experts. The communication suffers a bandwidth problem[1], and we frequently delude ourselves into believing we have adopted their models as long as we copy their probabilities on a very slim sample of queries they've answered.

  1. ^

    I know of no good write-up of the bandwidth problem in social epistemology, but Owen Cotton-Barratt talks about it here (my comment) and Dennett refers to it as the "Daddy Is a Doctor" phenomenon.

Comment by Emrik (Emrik North) on Levels of goals and alignment · 2022-09-17T07:30:47.669Z · LW · GW

This is excellent and personally helpfwl. "Alignment" deconfusion/taxonomy can be extremely helpfwl because it lets us try to problem-solve individual parts more directly (without compromise). It's also a form of problem-factoring that lets us coordinate our 'parallel tree search threads' (i.e. different researchers) more effectively.

Comment by Emrik (Emrik North) on alexrjl's Shortform · 2022-09-08T18:09:34.773Z · LW · GW

Mh,  I thought perhaps you were going in this direction. A world where there's a many-to-one mapping between prompts and output is plausibly a world where the visible mappings are just the tip of the iceberg. And if you then have the opportunity to iteratively filter out seemingly dangerous behaviour, that's likely to just push the entire behaviour under the surface--still present, just not legible.

Comment by Emrik (Emrik North) on alexrjl's Shortform · 2022-09-08T00:43:41.116Z · LW · GW

*helplessly confused*

I get that LLMs have steganographically hidden information. At least I would expect this to be the case, even if I can't confirm it to the extent I predict it. E.g. you can trigger the concept "apple" via more ways than just prompting "apple", and you can trigger the entire semantics of "princess Leia eating an apple" via prompts that look entirely different. That is, there could be a many-to-one prompt-to-meaning relationship.

But what's selecting for more important/dangerous information being hidden? Next-token prediction? Why would it differentially select for important/dangerous?

Comment by Emrik (Emrik North) on Research productivity tip: "Solve The Whole Problem Day" · 2022-09-01T18:31:53.085Z · LW · GW

My mileage varies. I have a bias for the 5th level, and if I'm currently deeply immersed in a rabbit hole that I reflectively think is usefwl, then going up the ladder again risks reminding me how distal the rabbit hole objectives are. I remind myself just how much I care about saving the world, but the caring mostly leaks out when trying to reach more distal instrumental objectives.

Comment by Emrik (Emrik North) on In Search of Strategic Clarity · 2022-09-01T14:43:03.682Z · LW · GW

This is excellent and I'm dismayed that it only has two votes. It clarified something important for me.

Comment by Emrik (Emrik North) on Don't Over-Optimize Things · 2022-08-31T19:43:26.417Z · LW · GW

Modest epistemology and hubris are bistable as well. You need hubris in order to have the self-confidence required to produce anything worthwhile. Grr, need a better word for hubris.

Comment by Emrik (Emrik North) on Don't Over-Optimize Things · 2022-08-31T19:30:18.999Z · LW · GW

Productivity and akrasia are neighbouring valleys in a bistable system. If you're productive, you can keep up the behaviour that lets you continue being productive (e.g. get your tasks done, sleep well, exercise). If you seem to be behind on your tasks one day, it stresses you out a little, so you put some extra effort in to return to equilibrium. But if you're too far behind one day, your stress level shoots through the roof, so you put in a lot of extra effort, so you sleep less, so you have less effort to put in, so your stress level increases--and either you persevere gloriously because you tried really hard, or you fall apart. Make an ill-advised bet and you end up in the akratic equilibrium, and climbing back up will be rough.

But putting in extra effort is not the only response you have in order to decrease stress (sometimes). You can also give up on some of your plans and prioritise within what you can manage. Throwing your plans overboard gives you no chance of success, but it could make your productivity loop more robust. This has to be managed against the risk of degrading the strength of your habits, however. You're a finely-tuned multidimensional control system, and there are pitfalls in every direction.

  • The Pygmalion effect is a psychological phenomenon in which high expectations lead to improved performance in a given area.
  • It always takes longer than you expect, even when you take into account Hofstadter's Law.
  • Work expands so as to fill the time available for its completion.
  • The demand upon a resource tends to expand to match the supply of the resource.

When do you throw out luggage? When do you let out steam? If propositional attitudes are part of your control loop, how do you consciously manage it so conscious management doesn't interfere with the loop? Without resorting to model-dissolving outside-view perspectives, I mean.

Comment by Emrik (Emrik North) on A Sketch of Good Communication · 2022-08-31T01:33:37.542Z · LW · GW

This is one of the most important reasons why hubris is so undervalued. People mistakenly think the goal is to generate precise probability estimates for frequently-discussed hypotheses (a goal in which deference can make sense). In a common-payoff-game research community, what matters is making new leaps in model space, not converging on probabilities. We (the research community) are bottlenecked by insight-production, not marginally better forecasts or decisions. Feign hubris if you need to, but strive to install it as a defense against model-dissolving deference.

Comment by Emrik (Emrik North) on Meditations on Momentum · 2022-08-29T10:51:53.011Z · LW · GW

Basically, health is a bistable system, so there's a threshold you can cross to enter the sphere of influence of a new attractor state. In the presence of threshold effects (e.g. escape velocity, critical mass, romance), we should be looking for threshold strategies. Don't just gradient-ascent yourself up the slope. Save all your resources for a massive push all at once, because isolated efforts don't matter unless they take you above the threshold. As long as you're at rock bottom and you can't descend any further, you want to increase your variance. Use the Oberth manoeuvre![1]

  1. ^

    And besides-- it works in Kerbal Space Program.

Comment by Emrik (Emrik North) on Principles for Alignment/Agency Projects · 2022-08-28T12:27:50.131Z · LW · GW

Coming back to this a few showers later.

  • A "cheat" is a solution to a problem that is invariant to a wide range of specifics about how the sub-problems (e.g. "hard parts") could be solved individually. Compared to an "honest solution", a cheat can solve a problem with less information about the problem itself.
     
  • A b-cheat (blind) is a solution that can't react to its environment and thus doesn't change or adapt throughout solving each of the individual sub-problems (e.g. plot armour). An a-cheat (adaptive/perceptive) can react to information it perceives about each sub-problem, and respond accordingly.
    • ML is an a-cheat because even if we don't understand the particulars of the information-processing task, we can just bonk it with an ML algorithm and it spits out a solution for us.
       
  • In order to have a hope of finding an adequate cheat code, you need to have a good grasp of at least where the hard parts are even if you're unsure of how they can be tackled individually. And constraining your expectation over what the possible sub-problems or sub-solutions should look like will expand the range of cheats you can apply, because now they need to be invariant to a smaller space of possible scenarios.
    • If effort spent on constraining expectation expands the search space, then it makes sense to at least confirm that there are no fully invariant solutions at the shallow layer before you iteratively deepen and search a larger range.
      • This relates to Wason's 2-4-6 problem, where if the true rule is very simple like "increasing numbers", subjects continuously try to test for models that are much more complex before they think to check the simplest models.
        • This is of course because they have the reasonable expectation that the human is more likely to make up such rules, but that's kinda the point: we're biased to think of solutions in the human range.
           
  • Limiting case analysis is when you set one or more variables of the object you're analysing to their extreme values. This may give rise to limiting cases that are easier to analyse and could give you greater insights about the more general thing. It assumes away an entire dimension of variability, and may therefore be easier to reason about. For example, thinking about low-bandwidth oracles (e.g. ZFP oracle) with cleverly restrained outputs may lead to general insights that could help in a wider range of cases. They're like toy problems.

    "The art of doing mathematics consists in finding that special case which contains all the germs of generality." David Hilbert
     
  • Multiplex case analysis is sorta the opposite, and it's when you make as few assumptions as possible about one or more variables/dimensions of the problem while reasoning about it. While it leaves open more possibilities, it could also make the object itself more featureless, fewer patterns, easier to play with in your working memory.

    One thing to realise is that it constrains the search space for cheats, because your cheat now has to be invariant to a greater space of scenarios. This might make the search easier (smaller search space), but it also requires a more powerfwl or a more perceptive/adaptive cheat. It may make it easier to explore nodes at the base of the search tree, where discoveries or eliminations could be of higher value.

    This can be very usefwl for extricating yourself from a stuck perspective. When you have a specific problem, a problem with a given level of entropy, your brain tends to get stuck searching for solutions in a domain that matches the entropy of the problem. (speculative claim)
    • It relates to one of Tversky's experiments (I have not vetted this), where subjects were told to iteratively bet on a binary outcome (A or B), where P(A)=0.7. They got 2 money for a correct bet and 0 for an incorrect one. Subjects tended to bet on A with a frequency that matched the frequency of the outcome, whereas the highest-EV strategy is to always bet on A (see the worked numbers after this list).
    • This also relates to the Inventor's Paradox.

      "The more ambitious plan may have more chances of success […] provided it is not based on a mere pretension but on some vision of the things beyond those immediately present." ‒ Pólya

      Consider the problem of adding up all the numbers from 1 to 99. You could attack this by going through 99 steps of addition like so:

      1 + 2 + 3 + ... + 97 + 98 + 99

      Or you could take a step back and find a more general problem-solving technique (an a-cheat). Ask yourself, how do you solve all 1-iterative addition problems? You could rearrange it as:

      (1 + 99) + (2 + 98) + ... + (49 + 51) + 50 = 49·100 + 50 = 4950

      To land on this, you likely went through the realisation that you could solve any such series with (first + last)·⌊n/2⌋ and add the middle term if n is odd.

      The point being that sometimes it's easier to solve "harder" problems. This could be seen as, among other things, an argument for worst-case alignment.
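
Worked numbers for the betting example above, assuming the payoffs as described (2 for a correct bet, 0 for an incorrect one) and P(A) = 0.7:

```latex
\[
\text{probability matching: } \; \mathbb{E}[\text{payoff}] = 2\,(0.7 \cdot 0.7 + 0.3 \cdot 0.3) = 1.16
\]
\[
\text{always bet on A: } \; \mathbb{E}[\text{payoff}] = 2 \cdot 0.7 = 1.40
\]
```
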
Comment by Emrik (Emrik North) on Vingean Agency · 2022-08-25T07:50:10.079Z · LW · GW

I'm confused. (As in, actually confused. The following should hopefwly point at what pieces I'm missing in order to understand what you mean by a "problem" for the notion.)

Vingean agency "disappears when we look at it too closely"

I don't really get why this would be a problem. I mean, "agency" is an abstraction, and every abstraction becomes predictably useless once you can compute the lower layer perfectly, at least if you assume compute is cheap. Balloons!

Imagine you've never seen a helium balloon before, and you see it slowly soaring to the sky. You could have predicted this by using a few abstractions like density of gases and Archimedes' principle. Alternatively, if you had the resources, you could make the identical prediction (with inconsequentially higher precision) by extrapolating from the velocities and weights of all the individual molecules, and computing that the sum of forces acting on the bottom of the balloon exceeds the sum acting on the top. I don't see how the latter being theoretically possible implies a "problem" for abstractions like "density" and "Archimedes' principle".

Comment by Emrik (Emrik North) on Are you allocated optimally in your own estimation? · 2022-08-22T21:43:20.548Z · LW · GW

Oh, I think I may have miscommunicated. But I emphatically get what you're saying.

the fact that most people spent most of their lives doing jobs that they would rather not do, seemed a calamity comparable to the fact that everyone dies, and something that also needed to be resisted and changed.

This is exactly what I'm pointing at. I think we can have a lot more impact if we aren't paid to do specific things, but we still need money in order to buy food. I advocate "EA tenure" for promising altruists, so that they don't have to waste their time trying to impress their funders and can just get on doing what they think is best.

Comment by Emrik (Emrik North) on Emrik's Shortform · 2022-08-21T12:49:51.103Z · LW · GW

owo thanks