Risk Map of AI Systems 2020-12-15T09:16:46.852Z
Epistea Workshop Series: Epistemics Workshop, May 2020, UK 2020-02-28T10:37:34.229Z
Epistea Summer Experiment (ESE) 2020-01-24T10:49:35.228Z
Epistea Summer Experiment 2019-05-13T21:29:43.681Z
Isaac Asimov's predictions for 2019 from 1984 2018-12-28T09:51:09.951Z
Multi-agent predictive minds and AI alignment 2018-12-12T23:48:03.155Z
CFAR reunion Europe 2018-11-27T12:02:36.359Z
Why it took so long to do the Fermi calculation right? 2018-07-02T20:29:59.338Z
Dissolving the Fermi Paradox, and what reflection it provides 2018-06-30T16:35:35.171Z
Effective Thesis meetup 2018-05-31T19:49:56.285Z
Far future, existential risk, and AI alignment 2018-05-10T09:51:43.278Z
Review of CZEA "Intense EA Weekend" retreat 2018-04-05T23:04:09.398Z
Brno: Far future, existential risk and AI safety 2018-04-02T19:11:06.375Z
Life hacks 2018-04-01T10:29:20.023Z
Welcome to LessWrong Prague [Edit With Your Details] 2018-04-01T10:23:36.557Z
Welcome to Czech Association for Effective Altruism [Edit With Your Details] 2018-04-01T10:12:16.508Z
Reward hacking and Goodhart’s law by evolutionary algorithms 2018-03-30T07:57:05.238Z
Optimal level of hierarchy for effective altruism? 2018-03-27T22:38:27.967Z
GoodAI announced "AI Race Avoidance" challenge with $15k in prize money 2018-01-18T18:05:09.811Z
Nonlinear perception of happiness 2018-01-08T09:04:15.314Z


Comment by Jan_Kulveit on The Coordination Frontier: Sequence Intro · 2021-09-07T09:05:34.768Z · LW · GW

Cf Epistea Summer Experiment (ESE)

"The central problem [of coordination between rationalists] is that people use beliefs for many purposes - including tracking what is true. But another, practically important purpose is coordination. We think it’s likely that if an aspiring rationalist decides to “stop bullshitting”, they lose some of the social technology often used for successfully coordinating with other people. How exactly does this dynamic affect coordination? Can we do anything about it?"

Also: based on the ESE experience, I have some "rich data but small sample size" research on rationalists failing at coordination in experimental settings. Based on this, I don't think rationalists would benefit most from, e.g., more advanced and complex S2-level coordination schemes, but rather from something like improving the "S1/S2" interface / getting better at coordination between their S1 and S2 coordination models. (In a somewhat similar way, I believe most people's epistemic rationality benefits more from learning things like "noticing confusion" than from, e.g., "learning more from the heuristics and biases literature".)

(Also, as a sidenote ... we developed a few group rationality techniques/exercises for ESE; I'm unlikely to write them up for LW myself, but if someone were interested in something like "write things up in a legible way based on conversations", I would be happy to spend time on that (it could also likely be paid work).)

Comment by Jan_Kulveit on Transitive Tolerance Means Intolerance · 2021-08-15T10:16:42.855Z · LW · GW

I often think about such changes as phase transitions on a network.

If we assume that these processes (nucleation of a clique of the new phase, changes in energy on edge boundaries, ...) are independent of the content of the moral change, we can expect the emergence of "fluctuations" of new moral phases. Then the question is which of these fluctuations grow to eventually take over the whole network; from an optimistic perspective, this is where relatively small differences between moral phases, caused by some phases being "actually better", break the symmetry and lead to gradual moral progress.

Stated in other words: if you look at the micro-dynamics, at the individual edges and nodes, you see that the main terms are social pressure, coercion, etc., but the third-order terms representing something like "goodness of the moral system in the abstract" act as a symmetry-breaking term and have large macroscopic consequences.

Turning to longtermism: network-wise, it seems advantageous for the initial bubble of the new phase to spread to central nodes in the network - which seems broadly in line with what EA is doing. Plausibly, in this phase, reasoning plays a larger role and coercion a smaller one - which is what you see. On the other hand, if longtermism becomes sufficiently large / dominant, I would expect it to become more coercive.
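
A minimal toy sketch of this picture (my own illustration with made-up parameters, not a model from any particular paper): nodes mostly copy their neighbours (social pressure), plus a tiny bias term standing in for one phase being "actually better". The bias is rare at the level of individual updates, yet it reliably decides which phase ends up taking over.

```python
import random

# Toy network "moral phase transition": copying neighbours dominates each update,
# while a weak symmetry-breaking bias (eps) decides the macroscopic outcome.
def run(n=500, k=10, eps=0.01, steps=200_000, seed=None):
    rng = random.Random(seed)
    # ring lattice: each node is linked to its k nearest neighbours
    neigh = [[(i + d) % n for d in range(-k // 2, k // 2 + 1) if d != 0] for i in range(n)]
    state = [0] * n                                    # phase 0 = old moral phase
    for i in rng.sample(range(n), n // 50):            # small initial "bubble" of phase 1
        state[i] = 1
    for _ in range(steps):
        i = rng.randrange(n)
        if rng.random() < eps:                         # rare "actually better" term
            state[i] = 1
        else:                                          # dominant term: social pressure
            state[i] = state[rng.choice(neigh[i])]
    return sum(state) / n                              # fraction of nodes in the new phase

if __name__ == "__main__":
    # typically close to 1.0, even though only ~1% of updates use the bias term
    print([round(run(seed=s), 2) for s in range(5)])
```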

Comment by Jan_Kulveit on Jimrandomh's Shortform · 2021-07-21T08:13:46.750Z · LW · GW

cf Non-linear perception of happiness

Comment by Jan_Kulveit on Jimrandomh's Shortform · 2021-07-19T20:47:27.948Z · LW · GW

I highly recommend reading something about mainstream research on this topic:

Comment by Jan_Kulveit on The Mountaineer's Fallacy · 2021-07-19T09:45:02.665Z · LW · GW

The idea that climbing Everest has something to do with a trip to the moon is in many ways the antithesis of the concept of mountaineering, so I don't think this name choice is a fortunate one.

Comment by Jan_Kulveit on We Still Don't Know If Masks Work · 2021-07-09T10:05:09.909Z · LW · GW

Why we don't have more studies on Taffix, or why increasing humidity in schools does not get much attention, seems like an Inadequate Equilibria type of problem, which is quite distinct from evaluating the existing evidence on a topic.

Failure to see an effect doesn't mean that effects are disproven but it does mean that we don't know whether the effect exists. 

Sorry for too much brevity before. 

No. Per Bayes' theorem, failure to see an effect in an analysis/experiment where you would expect to see no effect whether or not the effect exists should make you stay with the prior.

In the specific case of this topic and post: someone looking at masks will likely have a prior of "they work, but they aren't a miracle cure". More precisely, this could be expressed roughly as an expected effect distribution in R-reduction space with almost all of the mass somewhere between 5% and 50%. Different reasonable observers looking at different data will likely arrive at somewhat different maximum-likelihood estimates and shapes of the distribution, but they will have very little probability mass on no effect or harm, and very little on a large effect.

Should someone with a prior from this class update the prior, based on the evidence consisting of the analysis by Mike Harris?

Not at all! The posterior should stay the same.

Should someone with this prior update the prior, based on reading the referenced paper?

In my view, yes: I think Bayesians should update away even more from very low effects (like 5%) or very high effects (like 50%).
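
A numerical sketch of the update logic above (the prior shape and both likelihoods are illustrative stand-ins I made up, not numbers from the post or the paper):

```python
import numpy as np

# Prior over the fractional reduction of R from masks, concentrated between ~5% and ~50%.
effect = np.linspace(0.0, 0.8, 801)
prior = np.exp(-0.5 * ((effect - 0.25) / 0.10) ** 2)
prior /= prior.sum()

# 1) An analysis that looks the same whether or not the effect exists:
#    the likelihood is flat in the effect, so the posterior equals the prior.
post_flat = prior * np.ones_like(effect)
post_flat /= post_flat.sum()
assert np.allclose(post_flat, prior)

# 2) A study that does discriminate, e.g. a likelihood peaked around a ~15% reduction:
#    mass moves away from both very small (~5%) and very large (~50%) effects.
study_likelihood = np.exp(-0.5 * ((effect - 0.15) / 0.07) ** 2)
post_study = prior * study_likelihood
post_study /= post_study.sum()
print("P(effect < 0.05):", post_flat[effect < 0.05].sum(), "->", post_study[effect < 0.05].sum())
print("P(effect > 0.50):", post_flat[effect > 0.50].sum(), "->", post_study[effect > 0.50].sum())
```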

Comment by Jan_Kulveit on We Still Don't Know If Masks Work · 2021-07-07T13:03:14.427Z · LW · GW

In contrast to the title of the post, looking at all available evidence, we actually do know that masks work. 

Also I'm highly sceptical of claims of the form ... "the failure of large absolute differences in variable X across regions to meaningfully impact the observed growth rate ... should make us skeptical of large claimed effects". 

A more realistic picture is something like:
- there are just a few factors with such a large impact on R that it is possible to "clearly see them" - the main ones are vaccination and more transmissible variants increasing R by factors like 1.6
- there are a few things which have an impact on R of around 40% (the strongest NPIs like banning all gatherings, the seasonal amplitude, or large-scale testing)
- there are many moderately strong factors which have an impact of around 20% (masks, weaker NPIs, increased humidity, ...)

Uncovering even the moderate-sized effects requires modelling, and failure to see them clearly, e.g. in regional comparisons, should not make anyone update at all, apart from rejecting obviously implausibly large effect sizes (e.g. masks reducing transmission by 60%).
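
A toy illustration of this point (the factor sizes and noise level are illustrative, loosely matching the list above, and not fitted to any real data): in a naive cross-regional comparison, a true ~20% effect tends to sit only a couple of standard errors from zero and can easily be masked, while an implausible 60% effect stands out clearly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions = 20
# Unobserved factors behind each region's growth rate: one strong (~40%) on/off factor,
# five moderate (~20%) on/off factors, plus reporting / stochastic noise.
log_R = (np.log(1.4) * rng.integers(0, 2, n_regions)
         + np.log(1.2) * rng.integers(0, 2, (n_regions, 5)).sum(axis=1)
         + rng.normal(0, 0.15, n_regions))
masks = rng.integers(0, 2, n_regions)                  # the one factor we try to "see"

for true_reduction in (0.2, 0.6):
    obs = log_R + np.log(1 - true_reduction) * masks
    diff = obs[masks == 1].mean() - obs[masks == 0].mean()
    se = np.sqrt(obs[masks == 1].var(ddof=1) / (masks == 1).sum()
                 + obs[masks == 0].var(ddof=1) / (masks == 0).sum())
    print(f"true reduction {true_reduction:.0%}: naive regional estimate "
          f"{1 - np.exp(diff):.0%}, about {abs(diff) / se:.1f} standard errors from zero")
```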

Comment by Jan_Kulveit on Postmortem to Petrov Day, 2020 · 2020-10-03T22:44:05.572Z · LW · GW

Below Chris's account, I listed multiple meta-games stacked on top of the Petrov button, namely:

  1. press the button or not
  2. take it as a game | as a serious ritual | as a serious experiment
  3. cooperate or defect on the implicit rule allowing play behaviour ~ "you are allowed to play and experiment in games and this is safe; it is understood that actions you take within the game will not be used as evidence of intent outside of the game" (imagine I play a game of chess with someone and interpret my opponent taking my pieces as them literally trying to harm me)
  4. the meta-game of making the game interesting; cf munchkin
  5. the meta-game of making the experiment valuable for learning
  6. coordination about which of these games we are playing

Overall, one take-away for me is that the LW community should get better at game #6.

While "use of LW for 24h" is at stake at game #1, I would argue there are actually higher stakes at some of the other games. 

For example, if most people take it as a serious ritual in game #2, the warning text attached to the button should maybe also state "apart from blowing up the site, you will also blow up some part of your social credit, opportunities and trust". Coordination failure at game #2 can also lead to a situation where someone understands the situation as a game, decides for some reason it is better to press the button, and faces social repercussions from people who chose "serious ritual" or "serious experiment" in #2. I can imagine this has a somewhat large tail-risk, including, for example, someone leaving the community entirely, or causing more drama and psychological pain than the payoff in game #1.

For some people, failures at game #6 can touch things like "thou shalt not make people participate in serious and potentially harmful psychological experiments without clear consent".

Ultimately, game #5 is maybe the most important one, where what could be at stake is this community learning wrong S1-level intuitions about x-risk.

Comment by Jan_Kulveit on On Destroying the World · 2020-09-29T22:25:03.765Z · LW · GW

There seem to be multiple meta-games:

  1. press the button or not
  2. take it as a game | as a serious ritual | as a serious experiment
  3. cooperate or defect on the implicit rule allowing play behaviour ~ "you are allowed to play and experiment in games and this is safe; it is understood that actions you take within the game will not be used as evidence of intent outside of the game" (imagine I play a game of chess with someone and interpret my opponent taking my pieces as them literally trying to harm me)
  4. the meta-game of making the game interesting; cf munchkin
  5. the meta-game of making the experiment valuable for learning
  6. coordination about which of these games we are playing

To me, Chris's story reads like [ press | game | cooperate | ? | ? | ]

Overall while I think there is a lot of value in having a community of people who do not press big red buttons, I also see a lot of value in noticing these other games and "cooperating" in them. 

Comment by Jan_Kulveit on The rationalist community's location problem · 2020-09-26T10:50:51.200Z · LW · GW

Overall I think the rationalist community is concentrated too much in one hub and the secondary and tertiary hubs are weaker than they should be.

The main negatives are:
- this creates a bit of a single-point-of-failure dynamic; imagine the single hub getting infected by some particularly dangerous meme or bad community norms
- the single hub is still embedded in the wider society of the place where it is located, introducing some systematic bias (the epistemic climate of the contemporary US seems increasingly scary; Bay rationalists sometimes seem to be overcompensating for the insanities of the broader society)
- the single hub would be vulnerable to a coordinated attack originating from the environment

There are also advantages of a single hub:
- in theory, in a single hub it is easy to visit people and form connections; in practice it seems this is true in Berkeley, less true across the whole Bay, where travel times are comparable to flight times between European cities

And there is the huge advantage of the Bay:
- being close to the nexus of power and the most future-shaping place is extremely important (as explained by Scott and others)

The advantages of more hubs are:
- in my view, they could support more strains of thought / more experiments with community / more opportunities for people to lead things
- less fragility
- more of the total available talent used; some people will just not move to the Bay (will not get visas / cannot bear the culture / ...)

Instead of thinking "should we find the location X and move The Hub there", I would suggest thinking about the optimal allocation of people in a structure of networked places:

- which secondary hubs should grow / grow faster / be founded
- how to create links; people should consider moving temporarily between the hubs (e.g. for half a year or a year), even in the direction "Bay -> elsewhere" - this is often the best way to form links

What should be avoided
- some "holier-than-thou" dynamic where people who made the sacrifice of moving to the Bay and living there even if they think it terrible place with low quality of life assume that people who did not made the sacrifice are not sufficiently dedicated to the mission or similar; hence the rest of the world can be ignored


Comment by Jan_Kulveit on The case for C19 being widespread · 2020-04-02T15:40:56.637Z · LW · GW

Of the many problems of this theory...

Many places need ~10 PCR tests to find one infection, while the group tested is often highly pre-selected, such as "symptomatic people with known contacts" - a group you should have a much higher prior of being infected. Some of the numbers proposed in the "tip of the iceberg" framework would actually mean the prior probability of being infected in the "tested group" is lower than in the general population.
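
A back-of-the-envelope version of this argument (all numbers made up for illustration, not taken from any dataset):

```python
# ~10 PCR tests per confirmed infection means roughly 10% of the pre-selected,
# high-risk tested group turns out to be infected.
tests_per_positive = 10
p_infected_given_tested = 1 / tests_per_positive            # ~10%

# A hypothetical "tip of the iceberg"-style claim: ~30% of the whole population infected.
population = 10_000_000
claimed_infected = 3_000_000
p_infected_general = claimed_infected / population           # ~30%

# A symptomatic contact of a known case should be *more* likely infected than a random
# person; under the widespread-infection claim the tested group comes out *less* likely.
print(p_infected_given_tested, p_infected_general)
print("claim implies tested group is less infected than average:",
      p_infected_given_tested < p_infected_general)
```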

With this hypothesis, it's very hard to make sense of China. Outside of Hubei, China managed to contain the outbreak in large part by contact tracing & testing. However, if you assume there is some very high number of cases you don't know about, it is difficult to explain why contact tracing could influence anything.

Comment by Jan_Kulveit on March Coronavirus Open Thread · 2020-03-09T23:56:33.360Z · LW · GW

We are looking for forecasters/"estimators" to help with estimating various COVID-19 parameters, such as the number of infected cases, which will go into epidemic modelling, augmenting unreliable reported data. Ideally, the end product will be the results of the modelling presented in a good web UI. If you would be interested in helping, reply privately.

Q&A: How does it compare to Metaculus? It differs in a few important ways:

1. the estimates are not the end product, but an input to epidemic modelling software

2. in our UX, we want to clearly communicate that the outcomes of the epidemic are not pre-determined, but depend on the actions humanity will take

3. we want to expose more of the uncertainties and the underlying dynamics, as opposed to static forecasts

Comment by Jan_Kulveit on Becoming Unusually Truth-Oriented · 2020-02-18T11:32:22.068Z · LW · GW

My best-guess gears-level model of what's going on here:

  • the "predictive processing engine" has quite rich model of the world / people / histories / ... whatever
  • somewhat special domain into which it is "predicting" are thoughts / concepts / language / "the voice in your head" (somewhat overlapping with "S2")
  • with "words on top of your tongue", the PP system is trying to find a structure in the "thinking/verbal" domain which would be fitting the "PP" structure (many people have pretty specific sense of prediction error if they are missing the right word, which drops when they find it / the word "fits")
  • generally directing attention toward such interface can greatly increase it's throughput/precision

And so here are some caveats

  • This isn't as directly grounded in reality as it may seem
  • The nature of PP is such that model adjustment will be going on on both sides (e.g. if I'm looking at cloud shapes in the sky and some cloud starts resonating with the concept/word Stegosaurus, my perception will change all the way down toward noticing plate-resembling parts of the cloud, etc.)
  • In particular, when probing in more detail, the PP machinery will generally be able to generate more details; in the case of memories, as you note, the problem is that they are mostly the output of the generative world model inside your head, not of the external world; if your generative world model is precise enough and your attention was focused on something while experiencing it, the recall could be quite reliable
  • The relation of the language/concept space to reality is somewhat complicated... notice that in the above example with clouds, the concept of Stegosaurus is the result of a pretty impressive and big cultural computation which happened almost entirely outside of your head

So... while I generally like most of the specific advice, I don't think truth-oriented thinking is a good label. In my view, the necessary ingredient for truth orientation that is missing here is strong links between anything happening inside the brain and "the rest of reality".

Comment by Jan_Kulveit on Disincentives for participating on LW/AF · 2019-05-16T22:12:58.123Z · LW · GW


1) From the LW user perspective, AF is integrated in a way which signals there are two classes of users, where the AF members are something like "the officially approved experts" (specialists, etc.), together with omega badges, special karma, an application process, etc. In such a setup, it is hard for the status-tracking subsystem which humans generally have not to care about what is "high status". At the same time: I went through the list of AF users, and it seems a much better representation of something which Rohin called "viewpoint X" than of the field of AI alignment in general. I would expect some subtle distortion as a result.

2) The LW team seems quite keen on e.g. karma, cash prizes on questions, omegas, daily karma updates, and similar technical measures which, on S2-centric views, bring clear benefits (sorting of comments, credible signalling of interest in questions, creating a high-context environment for experts, ...). These likely often have some important effects on S1 motivations / social interactions / etc. I've discussed karma and omegas before; creating an environment driven by prizes risks eroding the spirit of cooperativeness and sharing of ideas which is one of the virtues of the AI safety community; and so on. "Herding elephants with small electric jolts" is a poetic description of the effect people's S1 gets from downvotes and strong downvotes.

Comment by Jan_Kulveit on Disincentives for participating on LW/AF · 2019-05-15T22:07:37.706Z · LW · GW

As a datapoint - my reasons for mostly not participating in discussion here:

  • The karma system messes with my S1 motivations and research taste; I do not want to update toward "LW average taste" - I don't think LW average taste is that great. Also, IMO, on the margin it is better for the field to add people who are trying to orient themselves in AI alignment independently, in contrast to people guided by "what's popular on LW"
  • Commenting seems costly; it feels like comments are expected to be written very clearly and be reader-friendly, which is time-costly
  • Posting seems super-costly; my impression is many readers are calibrated on the writing quality of Eliezer, Scott & the like, not on informal research conversation
  • The quality of debate on topics I find interesting is much worse than in person
  • Not the top reason, but still... the system of AF members vs. hoi polloi, omegas, etc. creates a subtle corruption/distortion field. My overall vague impression is that the LW team generally tends to like solutions which look theoretically nice, and tends not to see the subtler impacts on the elephants. Where my approach would be to try to move much of the elephants-playing-status-games out of the way, what's attempted here sometimes feels a bit like herding elephants with small electric jolts.
Comment by Jan_Kulveit on Epistea Summer Experiment · 2019-05-15T19:31:15.351Z · LW · GW

No. It's planned so you can attend both events.

Comment by Jan_Kulveit on Habryka's Shortform Feed · 2019-04-30T02:37:18.189Z · LW · GW

FWIW, I also think it's quite possible the current equilibrium is decent (which is part of the reason why I did not post something like "How I turned karma off", with simple instructions on how to do it on the forum, which I did consider). On the other hand, I'd be curious about more people trying it and reporting their experiences.

I suspect many people kind of don't have this action in the space of things they usually consider - I'd expect what most people would do is 1) just stop posting, 2) write about their negative experience, or 3) complain privately.

Comment by Jan_Kulveit on Habryka's Shortform Feed · 2019-04-30T02:29:12.995Z · LW · GW

Actually, I turned off karma for all comments, not just mine. The bold claim is that my individual taste in what's good on the EA Forum is in important ways better than the karma system, and that the karma signal is similar to the sounds made by a noisy mob. If I want, I can actually predict reasonably well what average sounds the crowd will make, so it is not any new source of information. But it still messes with your S1 processing and motivations.

Continuing with the party metaphor: I think it is generally not that difficult to understand what sort of behaviour will make you popular at a party, and what sort of behaviours, even when they are quite good in the broader scheme of things, will make you unpopular at parties. Also, personally, I often feel something like "I actually want to have good conversations about juicy topics in a quiet place; unfortunately, you are all congregating in this super loud space, with all these status games, social signals, and ethically problematic norms about how to treat other people" toward most parties.

Overall, I posted this here because it seemed like an interesting datapoint. Generally, I think it would be great if people moved toward writing information-rich feedback instead of voting, so such a shift seems good. From what I've seen on the EA Forum, it's quite rarely "many people" doing anything. More often it is something like: 6 users upvote a comment, 1 user strongly downvotes it, and karma of about 2 is the result. I would guess there may be a larger risk of the distorted perception that this represents some meaningful opinion of the community. (Also, I see some important practical cases where people are misled by the "noises of the crowd" and it influences them in a harmful way.)

Comment by Jan_Kulveit on Habryka's Shortform Feed · 2019-04-29T19:47:53.575Z · LW · GW

What I noticed on the EA Forum is that the whole karma thing messes with my S1 processes and makes me unhappy on average. I've not only turned off the notifications but also hidden all karma displays in comments via CSS, and the experience is much better.

Comment by Jan_Kulveit on What failure looks like · 2019-03-19T02:04:44.976Z · LW · GW

Reasons for some careful optimism

In Part I, it can be the case that human values are actually a complex combination of easy-to-measure goals + complex world models, so the structure of the proxies will be able to represent what we really care about. (I don't know. Also, the result can still stop representing our values with further scaling and evolution.)

In Part II, it can be the case that influence-seeking patterns are more computationally costly than straightforward patterns, and that they can be partly suppressed by optimising for processing costs, bounded-rationality style. To some extent, influence-seeking patterns attempting to grow and control the whole system seem to me to be something that also happens within our own minds. I would guess some combination of immune system + metacognition + bounded rationality + stabilisation by complexity is stabilising many human minds. (I don't know whether any of that can scale arbitrarily.)

Comment by Jan_Kulveit on Understanding information cascades · 2019-03-18T10:19:11.340Z · LW · GW

Short summary of why the linked paper is important: you can think about bias as some sort of perturbation. You are then interested in the "spreading cascade" of the perturbation, and especially in factors like the distribution of cascade sizes. The universality classes tell you this can be predicted by just a few parameters (Table 1 in the linked paper), depending mainly on the local dynamics (forecaster-forecaster interactions). Now, if you have a good model of the local dynamics, you can determine the parameters and determine which universality class the problem belongs to. You can also try to infer the dynamics if you have good data on your interactions.

I'm afraid I don't know enough about how "forecasting communities" work to be able to give you good guesses about what the points of leverage may be. One quick idea, if you have everybody on the same platform, may be to do some sort of A/B experiment - manipulate the data so that some forecasters see the predictions of others with an artificially introduced perturbation, and see how their output differs from the control group. If you have data on "individual dynamics" like that, and some knowledge of the network structure, the theory can help you predict the cascade size distribution.
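
To make the cascade-size idea concrete, here is a minimal toy simulation of my own (not the linked paper's model): a perturbation spreads from one node along each edge with probability p, and the character of the cascade-size distribution changes as p crosses the critical point around 1/(mean degree) - the kind of qualitative change that universality classes classify.

```python
import random

def cascade_sizes(n=1000, mean_degree=4.0, p=0.2, runs=500, seed=0):
    rng = random.Random(seed)
    # Erdős–Rényi-style random graph as adjacency lists
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < mean_degree / n:
                adj[i].append(j)
                adj[j].append(i)
    sizes = []
    for _ in range(runs):
        start = rng.randrange(n)
        hit, frontier = {start}, [start]
        while frontier:
            node = frontier.pop()
            for nb in adj[node]:
                if nb not in hit and rng.random() < p:   # perturbation passes this edge
                    hit.add(nb)
                    frontier.append(nb)
        sizes.append(len(hit))
    return sorted(sizes)

if __name__ == "__main__":
    for p in (0.15, 0.25, 0.35):       # below, near, and above the critical point 1/4
        s = cascade_sizes(p=p)
        print(f"p={p}: median cascade size {s[len(s) // 2]}, largest {s[-1]}")
```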

(I also apologize for not being more helpful, but I really don't have time to work on this for you.)

Comment by Jan_Kulveit on Understanding information cascades · 2019-03-14T17:09:21.601Z · LW · GW

I was a bit confused by "we ... aren't sure how to reason quantitatively about the impacts, and how much the LW community could together build on top of our preliminary search", which seemed to nudge toward original research. Outsourcing literature reviews, distillation, or extrapolation seems great.

Comment by Jan_Kulveit on Understanding information cascades · 2019-03-14T12:58:43.494Z · LW · GW

Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google Scholar are something like "spreading dynamics in complex networks". "Information cascades" does not seem to be the best choice of keywords.

There are many options for how you can model the state of a node (discrete states, oscillators, continuous variables, vectors of any of the above, ...), multiple options for how you may represent the dynamics (something like an Ising model / softmax, versions of the voter model, oscillator coupling, ...), and multiple options for how you model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on SBMs, or scale-free networks, or Erdős–Rényi, or Watts-Strogatz, or real-world network data, ...). This creates a somewhat large space of options, which have usually already been explored somewhere in the literature.

Possibly the single most important thing to know about this: there are universality classes of systems which exhibit similar behaviour, so you can often ignore the details of the dynamics/topology/state representation.

Overall, I would suggest approaching this with some intellectual humility and studying the existing research more, rather than trying to reinvent a large part of network science on LessWrong. (My guess is something like >2000 research years have been spent on the topic, often by quite good people.)

Comment by Jan_Kulveit on 'This Waifu Does Not Exist': 100,000 StyleGAN & GPT-2 samples · 2019-03-01T15:23:25.521Z · LW · GW

It would be cool to try some style-matching between the text and images. Ultimately, you'd have some "personality vector" which would be used in both image and text generation. (A very crude version could be to create an NN translator from the style space to word2vec space and include the resulting words in the GPT prompts.)

Comment by Jan_Kulveit on Thoughts on Human Models · 2019-02-26T02:12:17.949Z · LW · GW

As I see it, a big part of the problem is that there is an inherent tension between "concrete outcomes avoiding general concerns with human models" and "how systems interacting with humans must work". I would expect that the more you want to avoid general concerns with human models, the more "impractical" suggestions you get - or, in other words, that the tension between the "Problems with h.m." and the "Difficulties without h.m." is a tradeoff you cannot avoid by conceptualisation.

I would suggest using grounding in QFT not as an example of an obviously wrong conceptualisation, but as a useful benchmark of "actually human-model-free". Comparison to the benchmark may then serve as a heuristic pointing to where (at least implicit) human modelling creeps in. In the above-mentioned example of avoiding side-effects, the way the "coarse-graining" of the state space is done is actually a point where Goodharting may happen, and thinking in that direction can maybe even lead to some intuitions about how much information about humans got in.

One possible counterargument to the conclusion of the OP is that the main "tuneable" parameters we are dealing with are I. "modelling humans explicitly vs. modelling humans implicitly" and II. "total amount of human modelling". Then it is possible that competitive systems exist only in some part of this space, and that by pushing hard on the "total amount of human modelling" parameter we get systems which do less human modelling, but when they do it, it happens mostly in implicit, hard-to-understand ways.

Comment by Jan_Kulveit on Thoughts on Human Models · 2019-02-25T18:21:11.858Z · LW · GW

I'm afraid it is generally infeasible to avoid modelling humans at least implicitly. One reason for that is that basically any practical ontology we use is implicitly human. In a sense the only implicitly non-human knowledge is quantum field theory (and even that is not clear).

For example: while human-independent methods to measure negative side effects seem human-independent, it seems to me that a lot of ideas about humans creep into the details. The proposals I've seen generally depend on some coarse-graining of states - you at least want to somehow remove time from the state, but generally you do the coarse-graining based on... actually, what humans value. (If this research agenda were really trying to avoid implicit human models, I would expect people to spend a lot of effort on measures of quantum entanglement, decoherence, and similar topics.)

Comment by Jan_Kulveit on Conclusion to the sequence on value learning · 2019-02-04T15:20:50.707Z · LW · GW

Just a few comments

  • In the abstract, one open problem with "non-goal-directed agents" is "when do they turn into goal-directed ones?"; this seems similar to the problem of inner optimizers, at least in the sense that solutions which would prevent the emergence of inner optimizers could likely also work for non-goal-directed things
  • Of the "alternative solutions", in my view, what is under-investigated are attempts to limit capabilities - to make "bounded agents". One intuition behind this is that humans are functional precisely because goals and utilities are "broken" in a way compatible with our planning and computational bounds. I'm worried that efforts in this direction got bucketed with "boxing", and boxing got a vibe of being uncool. (By making something bounded I mean, for example, making bit-flips costly in a way which is tied to physics, not naive solutions like "just don't connect it to the internet".)
  • I'm particularly happy about your points on the standard claims about expected utility maximization. My vague impression is that too many people on LW kind of read the standard texts, take note that there is a persuasive text from Eliezer on a topic, and take the matter as settled.
Comment by Jan_Kulveit on How much can value learning be disentangled? · 2019-01-31T11:42:44.401Z · LW · GW

Not only is it hard to disentangle manipulation and explanation; it is actually difficult to disentangle even manipulation and simply asking the human about preferences (as here).

Manipulation via incorrect "understanding" is IMO the somewhat easier problem (understanding can possibly be tested by something like simulating the human's capacity to predict). Manipulation via messing with our internal multi-agent system of values seems subtler and harder. (You can imagine an AI roughly in the shape of Robin Hanson, explaining to one part of the mind how some of the other parts work. Or just drawing the attention of consciousness to some sub-agents and not others.)

My impression is that in full generality it is unsolvable, but something like starting with an imprecise model of approval / a utility function learned via ambitious value learning, and restricting explanations/questions/manipulation by that, may work.

Comment by Jan_Kulveit on Future directions for ambitious value learning · 2019-01-30T09:49:08.025Z · LW · GW

One hypothesis for why we do so well: we "simulate" other people on very similar hardware and a relatively similar mind (when compared to the abstract set of all planners), which is a sort of strong implicit prior. (Some evidence for this is that we have much more trouble inferring the goals of other people whose brains function far from what's usual on some dimension.)

Comment by Jan_Kulveit on Announcement: AI alignment prize round 4 winners · 2019-01-22T15:38:49.445Z · LW · GW

As Raemon noted, the mentorship bottleneck is actually a bottleneck. Senior researchers who could mentor are the most bottlenecked resource in the field, and the problem is unlikely to be solved by financial or similar incentives. Incentivising mentoring too strongly is probably wrong, because mentoring competes with time to do research, evaluate grants, etc. What can be done is:

  • improve the utilization of the mentors' time (e.g. mentoring teams of people instead of individuals)
  • do what can be done on a peer-to-peer basis
  • use mentors from other fields to teach people generic skills, e.g. how to do research
  • prepare better materials for onboarding
Comment by Jan_Kulveit on Announcement: AI alignment prize round 4 winners · 2019-01-22T15:03:43.162Z · LW · GW

Is there another way to spend money that seems clearly more cost-effective at this point, and if so what? In my opinion, for example, the AI safety camps were significantly more effective. I have maybe 2-3 ideas which would likely be more effective (sorry, but they're shareable only in private).

Comment by Jan_Kulveit on The Very Repugnant Conclusion · 2019-01-18T18:03:52.799Z · LW · GW

Btw, when it comes to any practical implications, both of these repugnant conclusions depend on a likely incorrect aggregation of utilities. If we aggregate utilities with logarithms/exponentiation in the right places, and assume resources are limited, the answer to the question "what is the best population given the limited resources?" is not repugnant.
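
As a toy worked example of what such an aggregation can look like (the specific choice u(r) = log r and equal splitting are my illustrative assumptions, not necessarily the exact scheme the comment has in mind): with fixed total resources R split equally among n people,

```latex
W(n) \;=\; n\,u\!\left(\tfrac{R}{n}\right) \;=\; n \log\!\left(\tfrac{R}{n}\right),
\qquad
\frac{dW}{dn} \;=\; \log\!\left(\tfrac{R}{n}\right) - 1 \;=\; 0
\;\;\Longrightarrow\;\;
n^{*} = \frac{R}{e}, \quad \frac{R}{n^{*}} = e .
```

The optimum is a finite population with per-capita resources e (and per-capita utility 1), rather than an ever larger population with lives barely worth living, so the repugnant conclusion does not go through under this kind of aggregation.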

Comment by Jan_Kulveit on Hierarchical system preferences and subagent preferences · 2019-01-17T16:46:53.544Z · LW · GW

This is part of the problem I was trying to describe in multi-agent minds, part "what are we aligning the AI with".

I agree the goal is under-specified. With regard to meta-preferences, with some simplification it seems we have several basic possibilities

1. Align with the result of the internal aggregation (e.g. observe what does the corporation do)

2. Align with the result of the internal aggregation, by asking (e.g. ask the corporation via some official channel, let the sub-agents sort it out inside)

3. Learn about the sub-agents and try to incorporate their values (e.g. learn about the humans in the corporation)

4. Add layers of indirection, e.g. asking about meta-preferences

Unfortunately, I can imagine that in the case of humans, 4. can lead to various stable reflective equilibria of preferences and meta-preferences - for example, I can imagine that, by suitable queries, you can get a human to want:

  • to be aligned with explicit reasoning, putting most value on some conscious, model-based part of the mind; with meta-reasoning about VNM axioms, etc.
  • to be aligned with some heart&soul, putting value on universal love, transcendent joy, and the many parts of the human mind which are not explicit, etc.

where both of these options would be self-consistently aligned with the meta-preferences the human will be expressing about how the sub-agent alignment should be done.

So even with meta-preferences, there are likely multiple ways.

Comment by Jan_Kulveit on Book Summary: Consciousness and the Brain · 2019-01-17T13:39:40.218Z · LW · GW

There is a fascinating, not-yet-really-explored territory between GWT and predictive processing.

For example, here is how it may look: there is a 2018 paper, "Dynamic interactions between top-down expectations and conscious awareness", where they do attentional-blink-style experiments combined with predictions, and discover, for example:

The first question that we addressed was how prior information about the identity of an upcoming stimulus influences the likelihood of that stimulus entering conscious awareness. Using a novel attentional blink paradigm in which the identity of T1 cued the likelihood of the identity of T2, we showed that stimuli that confirm our expectation have a higher likelihood of gaining access to conscious awareness


Second, nonconscious violations of conscious expectations are registered in the human brain. Third, however, expectations need to be implemented consciously to subsequently modulate conscious access. These results suggest a differential role of conscious awareness in the hierarchy of predictive processing, in which the active implementation of top-down expectations requires conscious awareness, whereas a conscious expectation and a nonconscious stimulus can interact to generate prediction errors. How these nonconscious prediction errors are used for updating future behavior and shaping trial-by-trial learning is a matter for future experimentation.

My rough takeaway is this: while on the surface it may seem that the effect of unconscious processing on decision-making is relatively weak, unconscious processing is responsible for what even gets conscious awareness. In the FBI metaphor, there is a lot of power in the FBI's ability to shape what even gets on the agenda.

Comment by Jan_Kulveit on Meditations on Momentum · 2019-01-16T14:41:35.132Z · LW · GW

The second thing first: "...but before they were physics terms they were concepts for intuitive things" is actually not true in this case: momentum did not mean anything before being coined in physics. Then it became used in a metaphorical way, but mostly congruently with the original physics concept, as something like "mass" x "velocity". It seems to me easy to imagine vivid pictures based on this metaphor, like an advancing army conquering mile after mile of enemy territory having momentum, or a scholar going through page after page of a difficult text. However, this concept is not tied to the term (which is one of my cruxes).

To me, the original metaphorical meaning of momentum makes a lot of sense: you have a lot of systems where you have something like mass (closely connected to inertia: you need great force to get something massive to move) and something like velocity - the direction and speed in which the system is heading. I would expect most people to have this on some level.

Now, the first thing second: I agree that it may be useful to notice all the systems in which the Taylor series for f has b>0, ESPECIALLY when it's comparably easy to control f via the b·x term rather than just a. However, some of the examples in the original post do not match this pattern: some could just be systems where, for example, you insert a heavy-tailed distribution on the input and you get a heavy-tailed distribution on the output, or systems where another term is what you should control, or systems where you should actually understand more about f than the fact that it has a positive first derivative at some point.

What a good name for this should be, I don't know; some random prosaic ideas are snowballing, compounding, faenus (from the Latin for interest on money, gains, profit, advantage), or compound interest. But likely there is some more poetic name, similar to Moloch.

Comment by Jan_Kulveit on Meditations on Momentum · 2019-01-01T15:41:28.956Z · LW · GW

1. Going through two of the adjacent links in the same paragraph:

With the trees, I only skimmed it, but if I get it correctly, the linked article proposes this new hypothesis: "Together these pieces of evidence point to a new hypothesis: Small-scale, gap-generating disturbances maintain power-function size structure whereas later-successional forest patches are responsible for deviations in the high tail."

and, also from the paper

Current theories explaining the consistency of tropical forest size structure are controversial. Explanations based on scaling up individual metabolic rates are criticized for ignoring the importance of asymmetric competition for light in causing variation in dynamic rates. Other theories, which embrace competition and scale individual tree vital rates through an assumption of demographic equilibrium, are criticized for lacking parsimony, because predictions rely on site-level, size-specific parameterization

(I also recommend looking at the plots with the "power law", which are of the usual type: approximating something more complex with a straight line over some interval.)

So what we actually have here: apparently different researchers proposing different hypotheses to explain the observed power-law-like data. It is far from conclusive what the actual reason is. As something like positive feedback loops is an obvious part of the hypothesis space whenever you see power-law-like data, you are almost guaranteed to find a paper which proposes something in that direction. However, note that the article actually criticizes previous explanations based more on the "Matthew effect", and proposes disturbances as a critical part of the explanation.

(Btw, I do not claim any dishonesty from the author or anything like that.)

Something similar can be said about the Cambrian explosion, which is the next link.

Halo and horn effects are likely evolutionarily adaptive effects, tracking something real (traits like "having an ugly face" and "having a higher probability of ending up in trouble" are likely correlated - the common cause can be mutation load / parasite load; you have things like the positive manifold).

And so on.

Sorry, but I will not dissect every paragraph of the article in this way. (It also seems a bit futile: if I dig into specific examples, it will be interpreted as nit-picking.)

2. A last attempt to gesture toward what's wrong with the whole thing. The best approximation of the cluster of phenomena the article is pointing toward is not "preferential attachment" (as you propose), but something broader - "systems with feedback loops which can, in some approximation, be described by the differential equation dx/dt = b·x".

You can start to see systems like that everywhere, and get a sense of something deep, explaining life, the universe, and everything.

One problem with this: if you have a system described by a differential equation of the form dx/dt = f(x, ...), and the function f is reasonable, you can approximate it by its Taylor series f(x) = a + b·x + c·x² + ... Obviously, the first-order term is b·x. Unfortunately (?), you can say this even before looking at the system.

So, vaguely speaking, when you start thinking in this way, my intuition is that it puts you in great danger of conflating something about how you do approximations with causal explanations. (I guess it may be a good deal for many people who don't have S1 intuitions for Taylor series or even the log() function.)

Comment by Jan_Kulveit on Meditations on Momentum · 2018-12-31T02:10:37.111Z · LW · GW

I'm still confused about what you mean by momentum-like effects. Momentum is a very beautiful and crisp concept - the dual (canonical conjugate) of position, with all kinds of deep connections to everything. You can view the whole universe in the dual momentum space.

If the intention is to have a concept roughly in the shape of "all kinds of dynamics which can be rounded to dx/dt = a·x", I agree it may be valuable to have a word for that, but why overload momentum?

You asked for an example of where it conflates that causal mechanism with something else. I picked one example from this paragraph

There’s also the height of trees, the colour, brightness, and lifetime of stars, the proliferation of  species, the halo and horns effect, affective death spirals, and the existence of life itself.

So, as I understand it, I gave you an example (the distribution of star masses) which quite likely does not have any useful connection to preferential attachment or exponential growth. After your last reply, I'm really confused about what the state of our disagreement on this is.

I'm actually scared to change the topic of the discussion to what simplicity means, but the argument is roughly this: if you have an arbitrary well-behaved function, in the linear picture you can approximate it locally by a straight line (the first terms of the Taylor series, etc.). And yes, you get a better approximation by including more terms of the Taylor series expansion, or by non-linear regression, etc. Now, if you translate this to the log-log picture, you will find that a power law is in some sense the simplest local approximation of anything. This is also the reason why people often mistakenly use power laws instead of lognormal and other distributions - if you truncate the lognormal and look just at part of the tail, you can fit it with a power law. Btw, you nicely demonstrate this effect yourself - preferential attachment often actually leads to a Yule-Simon distribution, not a power law... but, as usual, you can approximate it.
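
A quick numerical check of this claim (a sketch with arbitrary parameters, looking only at the slice of a simulated lognormal between its 95th and 99.9th percentile):

```python
import numpy as np

# Take only part of the tail of a lognormal and fit a straight line on log-log axes.
rng = np.random.default_rng(0)
x = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000))
lo, hi = int(0.95 * x.size), int(0.999 * x.size)       # 95th to 99.9th percentile
tail = x[lo:hi]
ccdf = np.arange(x.size - lo, x.size - hi, -1) / x.size  # empirical P(X >= value)
slope, intercept = np.polyfit(np.log(tail), np.log(ccdf), 1)
r = np.corrcoef(np.log(tail), np.log(ccdf))[0, 1]
print(f"straight-line fit in log-log: 'power law' exponent ~ {slope:.2f}, correlation {r:.4f}")
```

The straight-line fit looks very good even though the data are lognormal; only the systematic curvature left in the residuals gives it away, which is exactly how truncated lognormals end up mislabelled as power laws.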

Comment by Jan_Kulveit on Meditations on Momentum · 2018-12-30T19:14:53.689Z · LW · GW

I don't know what you mean by attachment style, but some examples of the conflation...

Momentum is this: even if JK Rowling's next book is total crap, it will still sell a lot of copies. Because people have beliefs, and because they enjoyed her previous books, they have a prior that they will also enjoy the next one. It would take them several crap books to update.

Power laws are ubiquitous. This should be unsurprising - power laws are the simplest functional form in the logarithmic picture. If we use some sort of simplicity prior, we are guaranteed to find them. If we use the first terms of a Taylor expansion, we will find them. The log picture is as natural as the linear one. Someone should write a Meditation on Benford's law - you get an asymptotically straight line in the log-log picture of the probability that a number starts with some given digits (in almost any real-life set of numerical values measured in units; you can see this must be the case because of invariance to unit scaling).
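
A compact version of the scale-invariance argument (standard textbook reasoning, written out here for convenience):

```latex
X \to cX
\;\Rightarrow\;
y := \{\log_{10} X\} \text{ is shift-invariant, hence uniform on } [0,1),
\qquad
P(\text{leading digit} = d)
\;=\; P\!\big(y \in [\log_{10} d,\, \log_{10}(d+1))\big)
\;=\; \log_{10}\!\left(1 + \tfrac{1}{d}\right).
```

For a longer leading digit string D the same formula gives log10(1 + 1/D) ≈ 1/(D ln 10), i.e. the asymptotically straight line in the log-log picture mentioned above.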

This is maybe worth emphasizing: nobody should be surprised to find power laws. Nobody should propose a universal causal mechanism for power laws; it is as stupid as proposing one causal mechanism for straight lines in the linear picture.

They are often the result of other power-law-distributed quantities. To take one example from the OP... the initial distribution of masses for a population of new stars is a truncated power law. I don't know why, but a proposed mechanism for this is, for example, turbulent fragmentation of the initial cloud, where the power law can come from the power spectrum of supersonic turbulence.

Comment by Jan_Kulveit on Meditations on Momentum · 2018-12-30T14:41:43.608Z · LW · GW

The post creates unnecessary confusion by lumping together "momentum", "exponential growth", "compound interest", and "heavy-tailed distributions". Conflating these concepts on the System 1 level into some vague, undifferentiated positive mess is likely harmful to anyone aspiring to think about systems clearly.

Comment by Jan_Kulveit on Best arguments against worrying about AI risk? · 2018-12-24T12:39:33.947Z · LW · GW

Some of what seem to me to be good arguments against entering the field, depending on what you include as the field:

  • We may live in a world where AI safety is either easy, or almost impossible to solve. In such worlds it may be better to work e.g. on global coordination or the rationality of leaders
  • It may be the case that "near-term" issues with AI will transform the world in a profound way / are big enough to pose catastrophic risks, and given the shorter timelines and better tractability, they are a higher priority. (For example, you can imagine technological unemployment + addictive narrow-AI-aided VR environments + decay of shared epistemology leading to the unraveling of society. Or narrow AI accelerating biorisk.)
  • It may be the case that useful work on the reduction of AI risk requires very special talent / judgment calibrated in special ways / etc., and that the many people who want to enter the field will mostly harm it, because the people who should start working on it will be drowned out by the noise created by the large mass.

(Note: I do not endorse the arguments. Also they are not answering the part about worrying.)

Comment by Jan_Kulveit on Player vs. Character: A Two-Level Model of Ethics · 2018-12-20T02:04:31.670Z · LW · GW

I like your point about where most of the computation/lovecraftian monsters are located.

I'll think about it more, but if I try to paraphrase it in my picture by a metaphor... we can imagine an organization with a workplace safety department. The safety regulations it is implementing are the result of some large external computation. Even the existence of the workplace safety department is in some sense a result of the external system. But drawing boundaries is tricky.

I'm curious about what the communication channel between evolution and the brain looks like "on the link level". It seems reasonably easy to select for e.g. personality traits, some "hyperparameters" of the cognitive architecture, and similar. It is unclear to me whether this can be enough to "select from complex strategies", or whether it is necessary to transmit strategies in some more explicit form.

Comment by Jan_Kulveit on Two Neglected Problems in Human-AI Safety · 2018-12-17T22:30:48.122Z · LW · GW

Some instantiations of the first problem (how to prevent "aligned" AIs from unintentionally corrupting human values?) seem to me to be among the easily imaginable routes to existential risk - e.g. almost all people spending their lives in addictive VR. I'm not sure it is really neglected?

Comment by Jan_Kulveit on Multi-agent predictive minds and AI alignment · 2018-12-17T22:03:10.155Z · LW · GW

The thing I'm trying to argue is complex and yes, it is something in the middle between the two options.

1. Predictive processing (in the "perception" direction) makes some bold predictions, which can be tested and which match data/experience. My credence in predictive processing in a narrow sense: 0.95

2. Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle. Vague introspective evidence for active inference comes from the ability to do inner simulations. Possibly the boldest claim I can make from the principle alone is that people will have a bias toward taking actions which "prove their models right", even at the cost of the actions being actually harmful to them in some important sense. How it may match everyday experience: for example, here. My credence in active inference as a basic design mechanism: 0.6

3. So far, the description was broadly Bayesian/optimal/"unbounded". An unbounded predictive processing / active inference agent is a fearsome monster in a similar way to a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are a consequence of computational/signal-processing boundedness, both in PP/AI models and in non-PP/AI models. My credence in boundedness being a key ingredient: 0.99

4. What is missing from the picture so far is some sort of "goals" or "motivation" (or, in another view, a way for evolution to insert some signal into the brain). How Karl Friston deals with this, e.g.:

We start with the premise that adaptive agents or pheno-types must occupy a limited repertoire of physical states. For a phenotype to exist, it must possess defining characteristics or traits; both in terms of its morphology and exchange with the environment. These traits essentially limit the agent to a bounded region in the space of all states it could be in. Once outside these bounds, it ceases to possess that trait (cf., a fish out of water).

is something which I find unsatisfactory. My credence in this being the complete explanation: 0.1

5. My hypothesis is roughly this:

  • evolution inserts some "goal-directed" sub-parts into the PP/AI machinery
  • these sub-parts do not somehow "directly interface the world", but are "buried" within the hierarchy of the generative layers; so they do not care about people or objects or whatever, but about some abstract variables
  • they are quite "agenty", optimizing some utility function
  • from the point of view of such a sub-agent, other sub-agents inside the same mind are possibly competitors; at least some sub-agents likely have access to enough computing power not only to "care about what they are intended to care about", but to do basic modelling of other sub-agents; an internal game-theoretical mess ensues

6. This hypothesis bridges the framework of PP/AI and the world of theories viewing the mind as a multi-agent system. Multi-agent theories of mind have some introspective support in various styles of psychotherapy, IFS, meditative experience, and some rationality techniques. They also seem to explain behavior where humans seem to "defect against themselves". Credence: 0.8

(I guess a predictive processing purist would probably describe 5. & 6. as just a case of competing predictive models, not adding anything conceptually new.)

Now I would actually want to draw a graph of how strongly 1-6 motivate different possible problems with alignment, and how these problems motivate various research questions. For example, the question about understanding hierarchical modelling is interesting even if there is no multi-agency, the scaling of sub-agents can be motivated even without active inference, etc.

Comment by Jan_Kulveit on Multi-agent predictive minds and AI alignment · 2018-12-16T19:51:10.910Z · LW · GW

I read the book the SSC article is reviewing (plus a bunch of articles on predictive-mind, some papers from Google Scholar, and I've seen several talks). Linking the SSC review seemed more useful than linking Amazon.

I don't think I'm the right person for writing an introduction to predictive processing for the LW community.

Maybe I actually should have included a warning that the whole model I'm trying to describe has nontrivial inferential distance.

Comment by Jan_Kulveit on Multi-agent predictive minds and AI alignment · 2018-12-16T18:32:21.943Z · LW · GW

Thanks for the feedback! Sorry, I'm really bad at describing models in text - if it seems self-contradictory or confused, it's probably either me being bad at explanations or inferential distance (you probably need to understand predictive processing better than what you get from reading the SSC article).

Another try... start by imagining the hierarchical generative layers (as in PP). They just model the world. Then add active inference. Then add the special sort of "priors" like "not being hungry" or "seek reproduction". (You need to have those in active inference for the whole thing to describe humans, IMO.) Then imagine that these "special priors" start to interact with each other... leading to a game-theoretic-style mess. Now you have the sub-agents. Then imagine some layers further up in the hierarchy doing stuff like "personality/narrative generation".

Unless you have this picture right, the rest does not make sense. From your comments I don't think you have the picture right. I'll try to reply ... but I'm worried it may add to confusion.

To some extent, PP struggles to describe motivations. Predictive processing in a narrow sense is about perception and is not agenty at all - it just optimizes a set of hierarchical models to minimize error. If you add active inference, the system becomes agenty, but you actually do have a problem with motivations. From some popular accounts or from some remarks by Friston it may seem otherwise, but "it depends on the details of the notion of free energy" is, in my interpretation, a statement roughly similar to the claim that physics can be stated in terms of variational principles, and the rest "depends on the notion of action".

Jeffrey-Bolker rotation is something different, leading to a somewhat similar problem (J-B rotation is much more limited in what can be transformed into what, and it preserves the decision structure).

My feeling is you don't understand Friston; also I don't want to defend pieces of Friston as I'm not sure I understand Friston.

The options given in "what are we aligning with" are AFAIK not something which has been described in this way before, so an attempt to map them directly onto the "familiar litany of options" is likely not the way to understand them. Overall, my feeling is that here you don't have the proposed model right, and the result is mostly confusion.

Comment by Jan_Kulveit on Player vs. Character: A Two-Level Model of Ethics · 2018-12-14T23:24:14.447Z · LW · GW

It's nicely written, but the image of the Player as a hyperintelligent Lovecraftian creature seems not quite right to me. In my picture, where you have this powerful agent entity, I see a mess of sub-agents, interacting in a game-theoretical way primarily among themselves.* How "smart" the results of the interactions are is quite high-variance. Obviously the system has a lot of computing power, but that is not really the same as being intelligent or agent-like.

What I really like is the description of how the results of these interactions are processed via some "personality-generating" layers, and how the result looks "from within".

(* One reason why this should be the case: there is not enough bandwidth between the DNA and the neural network; evolution can input some sort of signal like "there should be a subsystem tracking social status, and that variable should be maximized" or tune some parameters, but it likely does not have enough bandwidth to transfer some complex representation of real evolutionary fitness. Hence what gets created are sub-agenty parts which do not have direct access to reality and which often, instead of playing some masterful strategy in unison, are bargaining or even defecting internally.)

Comment by Jan_Kulveit on Figuring out what Alice wants: non-human Alice · 2018-12-13T19:53:11.951Z · LW · GW

Human brains likely model other humans by simulating them. The simple normative assumption used is something like "humans are humans", which will not really help you in the way you want, but it leads to this interesting problem:

Learning from multiple agents.
Imagine a group of five closely interacting humans. Learning values just from person A may run into the problem that a big part of A's motivation is based on A simulating B, C, D, E (on the same "human" hardware, just incorporating individual differences). In that case, learning the "values" just from A's actions could in principle be more difficult than observing the whole group and trying to learn some "human universals" and some "human specifics". A different way of thinking about this could be by making a parallel with meta-learning algorithms (e.g. Reptile), but in an IRL frame.
Comment by Jan_Kulveit on Multi-agent predictive minds and AI alignment · 2018-12-13T19:36:28.520Z · LW · GW

I'm really delighted to hear that this seems like a very well developed model :) Actually, I'm not aware of any published attempt to unite sub-agents with the predictive processing framework in this way, even on the qualitative level, and it is possible this union is original (I did not find anything attempting to do this on Google Scholar or in the first few pages of Google search results).

Making it quantitative and end-to-end trainable on humans does not seem feasible right now, in my opinion.

With the individual components

  • predictive processing is supported by a growing pile of experimental data
  • active inference is a theoretically very elegant extension of predictive processing
  • sub-personalities are something which seems to work in psychotherapy, and agrees with some of my meditative experience
  • sub-agenty parts interacting in some game-theory-resembling way feel like something which can naturally develop within a sufficiently complex predictive processing / active inference system
Comment by Jan_Kulveit on Multi-agent predictive minds and AI alignment · 2018-12-13T08:23:45.196Z · LW · GW

The first was scanned from paper (I like to draw), the second was edited in GIMP (I don't like to draw the exact same thing repeatedly). I don't know if it's the same with other images you see on LW. Instead of scanning, you can also draw using a tablet.

Comment by Jan_Kulveit on Multi-agent predictive minds and AI alignment · 2018-12-13T08:03:43.930Z · LW · GW
  1. Nice! We should chat about that.

  2. The technical research direction specifications can in all cases be "expanded" from the "seed ideas" described here. (We are already working on some of those.) I'm not sure publishing them now is the best thing - to me, it seems better to do some iterations of "specify - try to work on it" first, before publishing the expansions.