Comment by jan_kulveit on More realistic tales of doom · 2019-03-19T02:04:44.976Z · score: 7 (4 votes) · LW · GW

Reasons for some cautious optimism

In Part I, it can be the case that human values are actually a complex combination of easy-to-measure goals + complex world models, so the structure of the proxies will be able to represent what we really care about. (I don't know. Also, the result can still stop representing our values with further scaling and evolution.)

In Part II, it can be the case that influence-seeking patterns are more computationally costly than straightforward patterns, and they can be partly suppressed by optimising for processing costs, bounded-rationality style. To some extent, influence-seeking patterns attempting to grow and control the whole system seem to me to be something happening also within our own minds. I would guess some combination of immune system + metacognition + bounded rationality + stabilisation by complexity is stabilising many human minds. (I don't know if any of that can scale arbitrarily.)

Comment by jan_kulveit on Understanding information cascades · 2019-03-18T10:19:11.340Z · score: 15 (5 votes) · LW · GW

Short summary of why the linked paper is important: you can think about bias as some sort of perturbation. You are then interested in the "cascade of spreading" of the perturbation, and especially in factors like the distribution of cascade sizes. The universality classes tell you this can be predicted by just a few parameters (Table 1 in the linked paper), depending mainly on the local dynamics (forecaster-forecaster interactions). Now, if you have a good model of the local dynamics, you can determine the parameters and determine which universality class the problem belongs to. You can also try to infer the dynamics if you have good data on the interactions.

I'm afraid I don't know enough about how "forecasting communities" work to give you good guesses about the points of leverage. One quick idea, if you have everybody on the same platform, may be to run some sort of A/B experiment - manipulate the data so that some forecasters see the predictions of others with an artificially introduced perturbation, and see how their output differs from the control group. If you have data on "individual dynamics" like that, and some knowledge of the network structure, the theory can help you predict the cascade size distribution.

(I also apologize for not being more helpful, but I really don't have time to work on this for you.)

Comment by jan_kulveit on Understanding information cascades · 2019-03-14T17:09:21.601Z · score: 10 (4 votes) · LW · GW

I was a bit confused by "we ... but aren't sure how to reason quantitatively about the impacts, and how much the LW community could together build on top of our preliminary search", which seemed to nudge toward original research. Outsourcing literature reviews, distillation or extrapolation seems great.

Comment by jan_kulveit on Understanding information cascades · 2019-03-14T12:58:43.494Z · score: 20 (10 votes) · LW · GW

Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google Scholar are something like "spreading dynamics in complex networks". "Information cascades" does not seem to be the best choice of keywords.

There are many options for how you can model the state of a node (discrete states, oscillators, continuous variables, vectors of any of the above, ...), multiple options for how you represent the dynamics (something like an Ising model / softmax, versions of the voter model, oscillator coupling, ...), and multiple options for how you model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on SBMs, scale-free networks, Erdős–Rényi, Watts–Strogatz, or real-world network data, ...). This creates a rather large space of options, most of which have already been explored somewhere in the literature.
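
To make one point in this option space concrete, here is a minimal pure-Python sketch (my own illustration, not taken from any particular paper): discrete node states, independent-cascade-style dynamics with activation probability q, and an Erdős–Rényi topology; all parameter values are arbitrary.

```python
import random
from collections import Counter

def er_graph(n, p, rng):
    """Adjacency lists of an Erdős–Rényi G(n, p) random graph."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def cascade_size(adj, seed, q, rng):
    """Independent-cascade spread: each newly activated node activates
    each inactive neighbour with probability q."""
    active, frontier = {seed}, [seed]
    while frontier:
        nxt = []
        for node in frontier:
            for nb in adj[node]:
                if nb not in active and rng.random() < q:
                    active.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return len(active)

rng = random.Random(0)
adj = er_graph(n=2000, p=0.002, rng=rng)                    # mean degree ~ 4
sizes = [cascade_size(adj, rng.randrange(2000), q=0.25, rng=rng)
         for _ in range(500)]
print(sorted(Counter(s // 100 for s in sizes).items()))     # crude cascade-size histogram, bins of 100
```

Swapping the dynamics (voter model, Ising-like flips), the topology (scale-free, real data), or the state representation changes the details, which is exactly where the universality-class results become useful.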

Possibly the single most important thing to know about this is that there are universality classes of systems which exhibit similar behaviour, so you can often ignore the details of the dynamics/topology/state representation.

Overall I would suggest approaching this with some intellectual humility and studying the existing research more, rather than trying to reinvent a large part of network science on LessWrong. (My guess is something like >2000 research years were spent on the topic, often by quite good people.)

Comment by jan_kulveit on 'This Waifu Does Not Exist': 100,000 StyleGAN & GPT-2 samples · 2019-03-01T15:23:25.521Z · score: 5 (3 votes) · LW · GW

It would be cool to try some style-matching between the text and images - ultimately, having some "personality vector" used in both image and text generation. (A very crude version could be to create an NN translator from the style space to word2vec space and include the resulting words in the GPT prompts.)
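
A minimal sketch of that crude version, on random placeholder arrays (real use would plug in StyleGAN style vectors and pretrained word2vec embeddings; the array shapes and the ridge regression used in place of an NN translator are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 1000 "style vectors" (dim 512) paired with
# word2vec vectors (dim 300) of words describing each image.
styles = rng.normal(size=(1000, 512))
word_vecs = rng.normal(size=(1000, 300))
vocab_vecs = rng.normal(size=(5000, 300))          # stand-in word2vec vocabulary
vocab = [f"word{i}" for i in range(5000)]

# Linear "translator" from style space to word2vec space (ridge regression).
lam = 1.0
W = np.linalg.solve(styles.T @ styles + lam * np.eye(512), styles.T @ word_vecs)

def style_to_prompt_words(style_vec, k=5):
    """Map a style vector into word space and return the k nearest vocabulary
    words (by cosine similarity) to prepend to a GPT prompt."""
    v = style_vec @ W
    sims = (vocab_vecs @ v) / (np.linalg.norm(vocab_vecs, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[:k]]

print(style_to_prompt_words(styles[0]))
```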

Comment by jan_kulveit on Thoughts on Human Models · 2019-02-26T02:12:17.949Z · score: 2 (1 votes) · LW · GW

As I see it, a big part of the problem is that there is an inherent tension between "concrete outcomes avoiding general concerns with human models" and "how systems interacting with humans must work". I would expect that the more you want to avoid general concerns with human models, the more "impractical" suggestions you get - or, in other words, the tension between the "Problems with human models" and the "Difficulties without human models" is a tradeoff you cannot avoid by conceptualisations.

I would suggest using grounding in QFT not as an example of an obviously wrong conceptualisation, but as a useful benchmark of "actually human-model-free". Comparison to the benchmark may then serve as a heuristic pointing to where (at least implicit) human modelling creeps in. In the above-mentioned example of avoiding side effects, the way the "coarse-graining" of the state space is done is actually a point where Goodharting may happen, and thinking in that direction can maybe even lead to some intuitions about how much information about humans got in.

One possible counterargument to the conclusion of the OP is that the main "tuneable" parameters we are dealing with are I. "modelling humans explicitly vs modelling humans implicitly", and II. "total amount of human modelling". Then it is possible that competitive systems exist only in some part of this space, and by pushing hard on the "total amount of human modelling" parameter we can get systems which do less human modelling, but when they do it, it happens mostly in implicit, hard-to-understand ways.

Comment by jan_kulveit on Thoughts on Human Models · 2019-02-25T18:21:11.858Z · score: 2 (1 votes) · LW · GW

I'm afraid it is generally infeasible to avoid modelling humans at least implicitly. One reason is that basically any practical ontology we use is implicitly human. In a sense, the only implicitly non-human knowledge is quantum field theory (and even that is not clear).

For example: while human-independent methods to measure negative side effects may seem human-independent, it seems to me that a lot of ideas about humans creep into the details. The proposals I've seen generally depend on some coarse-graining of states - you at least want to somehow remove time from the state, but generally you do the coarse-graining based on... actually, what humans value. (If this research agenda were really trying to avoid implicit human models, I would expect people to spend a lot of effort on measures of quantum entanglement, decoherence, and similar topics.)

Comment by jan_kulveit on Conclusion to the sequence on value learning · 2019-02-04T15:20:50.707Z · score: 7 (4 votes) · LW · GW

Just a few comments:

  • In the abstract, one open problem about "non-goal-directed agents" is "when do they turn into goal-directed ones?"; this seems similar to the problem of inner optimizers, at least in the sense that solutions which would prevent the emergence of inner optimizers could likely work for non-goal-directed things as well
  • From the "alternative solutions", in my view, what is under-investigated are attempts to limit capabilities - make "bounded agents". One intuition behind it is that humans are functional just because goals and utilities are "broken" in a way compatible with our planning and computational bounds. I'm worried that efforts in this direction got bucketed with "boxing", and boxing got some vibe as being uncool. (By making something bounded I mean for example making bit-flips costly in a way which is tied to physics, not naive solutions like "just don't connect it to the internet")
  • I'm particularly happy about your points on the standard claims about expected utility maximization. My vague impression is that too many people on LW just read the standard texts, take note that there is a persuasive text from Eliezer on the topic, and take the matter as settled.
Comment by jan_kulveit on How much can value learning be disentangled? · 2019-01-31T11:42:44.401Z · score: 7 (4 votes) · LW · GW

Not only is it hard to disentangle manipulation and explanation; it is actually difficult to disentangle even manipulation and just asking the human about preferences (like here).

Manipulation via incorrect "understanding" is IMO the somewhat easier problem (understanding can possibly be tested by something like simulating the human's capacity to predict). Manipulation via messing with our internal multi-agent system of values seems subtler and harder. (You can imagine an AI roughly in the shape of Robin Hanson, explaining to one part of the mind how some of the other parts work. Or just drawing the attention of consciousness to some sub-agents and not others.)

My impression is that in full generality it is unsolvable, but something like starting with an imprecise model of approval / a utility function learned via ambitious value learning, and restricting explanations/questions/manipulation by that, may work.

Comment by jan_kulveit on Future directions for ambitious value learning · 2019-01-30T09:49:08.025Z · score: 7 (3 votes) · LW · GW

One hypothesis for why we do so well: we "simulate" other people on very similar hardware, and with relatively similar minds (when compared to the abstract set of planners), which is a sort of strong implicit prior. (Some evidence for this is that we have much more trouble inferring the goals of other people whose brains function far from what's usual along some dimension.)

Comment by jan_kulveit on Announcement: AI alignment prize round 4 winners · 2019-01-22T15:38:49.445Z · score: 12 (5 votes) · LW · GW

As Raemon noted, the mentorship bottleneck is actually a bottleneck. Senior researchers who could mentor are the most bottlenecked resource in the field, and the problem is unlikely to be solved by financial or similar incentives. Incentivising mentoring too strongly is probably wrong, because mentoring competes with time to do research, evaluate grants, etc. What can be done is

  • improve the utilization of the mentors' time (e.g. mentoring teams of people instead of individuals)
  • do what can be done on a peer-to-peer basis
  • use mentors from other fields to teach people generic skills, e.g. how to do research
  • prepare better materials for onboarding
Comment by jan_kulveit on Announcement: AI alignment prize round 4 winners · 2019-01-22T15:03:43.162Z · score: 9 (3 votes) · LW · GW

Is there another way to spend money that seems clearly more cost-effective at this point, and if so what? In my opinion, for example, the AI safety camps were significantly more effective. I have maybe 2-3 ideas which would likely be more effective (sorry, but shareable only in private).

Comment by jan_kulveit on The Very Repugnant Conclusion · 2019-01-18T18:03:52.799Z · score: 8 (5 votes) · LW · GW

Btw, when it comes to any practical implications, both of these repugnant conclusions depend on a likely incorrect aggregation of utilities. If we aggregate utilities with logarithms/exponentiation in the right places, and assume resources are limited, the answer to the question "what is the best population given the limited resources" is not repugnant.
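
A minimal numeric sketch of one such aggregation (the specific choice - per-person utility logarithmic in the per-person share of a fixed resource pool - is just an illustrative assumption, not the only place the logarithm can go):

```python
import math

R = 1000.0                                  # total resources, arbitrary units

def total_utility(n):                       # n people, each receiving R / n
    return n * math.log(R / n)

best_n = max(range(1, 5000), key=total_utility)
print(best_n, round(R / math.e))            # the optimum sits near n = R / e
print(round(total_utility(best_n)), round(total_utility(4999)))
```

The optimum is a finite population with per-person utility well above zero; piling on ever more people with ever smaller shares makes the aggregate worse, not better.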

Comment by jan_kulveit on Hierarchical system preferences and subagent preferences · 2019-01-17T16:46:53.544Z · score: 2 (1 votes) · LW · GW

This is part of the problem I was trying to describe in multi-agent minds, in the part "what are we aligning the AI with".

I agree the goal is under-specified. With regard to meta-preferences, with some simplification, it seems we have several basic possibilities:

1. Align with the result of the internal aggregation (e.g. observe what the corporation does)

2. Align with the result of the internal aggregation by asking (e.g. ask the corporation via some official channel, and let the sub-agents sort it out inside)

3. Learn about the sub-agents and try to incorporate their values (e.g. learn about the humans in the corporation)

4. Add layers of indirection, e.g. asking about meta-preferences

Unfortunately, I can imagine that in the case of humans, 4. can lead to various stable reflective equilibria of preferences and meta-preferences - for example, I can imagine that by suitable queries you can get a human to want

  • to be aligned with explicit reasoning, putting most value on some conscious, model-based part of the mind; with meta-reasoning about VNM axioms, etc.
  • to be aligned with some heart&soul, putting value on universal love, transcendent joy, and the many parts of the human mind which are not explicit, etc.

where both of these options would be self-consistently aligned with the meta-preferences the human would express about how the sub-agent alignment should be done.

So even with meta-preferences, there are likely multiple ways.

Comment by jan_kulveit on Book Summary: Consciousness and the Brain · 2019-01-17T13:39:40.218Z · score: 5 (2 votes) · LW · GW

There is a fascinating, not yet really explored territory between GWT (global workspace theory) and predictive processing.

For example, here is how it may look: there is a 2018 paper on dynamic interactions between top-down expectations and conscious awareness, where they do attentional-blink-style experiments with predictions, and discover, for example:

The first question that we addressed was how prior information about the identity of an upcoming stimulus influences the likelihood of that stimulus entering conscious awareness. Using a novel attentional blink paradigm in which the identity of T1 cued the likelihood of the identity of T2, we showed that stimuli that confirm our expectation have a higher likelihood of gaining access to conscious awareness

or

Second, nonconscious violations of conscious expectations are registered in the human brain. Third, however, expectations need to be implemented consciously to subsequently modulate conscious access. These results suggest a differential role of conscious awareness in the hierarchy of predictive processing, in which the active implementation of top-down expectations requires conscious awareness, whereas a conscious expectation and a nonconscious stimulus can interact to generate prediction errors. How these nonconscious prediction errors are used for updating future behavior and shaping trial-by-trial learning is a matter for future experimentation.

My rough takeaway is this: while on the surface it may seem that the effect of unconscious processing and decision-making is relatively weak, unconscious processing is responsible for what even gets conscious awareness. In the FBI metaphor, there is a lot of power in the FBI's ability to shape what even gets on the agenda.

Comment by jan_kulveit on Meditations on Momentum · 2019-01-16T14:41:35.132Z · score: 4 (2 votes) · LW · GW

The second thing first: "...but before they were physics terms they were concepts for intuitive things" is actually not true in this case: momentum did not mean anything before being coined in physics. Then it became used in a metaphorical way, but mostly congruently with the original physics concept, as something like "mass" x "velocity". It seems to me easy to imagine vivid pictures based on this metaphor, like an advancing army conquering mile after mile of enemy territory having momentum, or a scholar going through page after page of a difficult text. However, this concept is not tied to the term (which is one of my cruxes).

To me, the original metaphorical meaning of momentum makes a lot of sense: you have a lot of systems where you have something like mass (closely connected to inertia: you need a great force to get something massive to move) and something like velocity - the direction and speed in which the system is heading. I would expect most people to have this on some level.

Now, to the first thing second: I agree that it may be useful to notice all the systems in which the Taylor series for f has b > 0, ESPECIALLY when it's comparably easy to control f via the b·x term rather than just a. However, some of the examples in the original post do not match this pattern: some could be just systems where, for example, you insert a heavy-tailed distribution on the input and you get a heavy-tailed distribution on the output, or systems where some other term is what you should control, or systems where you should actually understand more about f than the fact that it has a positive first derivative at some point.

What would be a good name for it, I don't know. Some random prosaic ideas: snowballing, compounding, faenus (from the Latin for interest on money, gains, profit, advantage), compound interest. But likely there is some more poetic name, similar to Moloch.

Comment by jan_kulveit on Meditations on Momentum · 2019-01-01T15:41:28.956Z · score: 6 (3 votes) · LW · GW

1. Going through two of the adjacent links in the same paragraph:

With the trees, I only skimmed it, but if I understand it correctly, the linked article proposes this new hypothesis: "Together these pieces of evidence point to a new hypothesis: Small-scale, gap-generating disturbances maintain power-function size structure whereas later-successional forest patches are responsible for deviations in the high tail."

and, also from the paper

Current theories explaining the consistency of tropical forest size structure are controversial. Explanations based on scaling up individual metabolic rates are criticized for ignoring the importance of asymmetric competition for light in causing variation in dynamic rates. Other theories, which embrace competition and scale individual tree vital rates through an assumption of demographic equilibrium, are criticized for lacking parsimony, because predictions rely on site-level, size-specific parameterization

(I also recommend looking at the plots with the "power law", which are of the usual type: approximating something more complex with a straight line in some interval.)

So, what we actually have here: apparently different researchers proposing different hypotheses to explain the observed power-law-like data. It is far from conclusive what the actual reason is. As something like positive feedback loops is quite an obvious part of the hypothesis space when you see power-law-like data, you are almost guaranteed to find a paper which proposes something in that direction. However, note the article actually criticizes previous explanations based more on the "Matthew effect", and proposes disturbances as a critical part of the explanation.

(Btw, I do not claim any dishonesty from the author or anything like that.)

Something similar can be said about the Cambrian explosion which is the next link.

The halo and horn effects are likely evolutionarily adaptive effects, tracking something real (traits like "having an ugly face" and "having a higher probability of ending up in trouble" are likely correlated - the common cause can be mutation load / parasite load; you have things like the positive manifold).

And so on.

Sorry, but I will not dissect every paragraph of the article in this way. (It also seems a bit futile: if I dig into specific examples, it will be interpreted as nit-picking.)

2. A last attempt to gesture toward what's wrong with this as a whole. The best approximation of the cluster of phenomena the article is pointing toward is not "preferential attachment" (as you propose), but something broader - "systems with feedback loops which can in some approximation be described by the differential equation dx/dt = b·x".

You can start to see systems like that everywhere, and get a sense of something deep, explaining life, the universe and everything.

One problem with this: if you have a system described by a differential equation of the form dx/dt = f(x, ...), and the function f is reasonable, you can approximate it by its Taylor series f(x) = a + b·x + c·x² + .... Obviously, the first-order term is b·x. Unfortunately (?), you can say this even before looking at the system.

So, vaguely speaking, when you start thinking in this way, my intuition is it puts you in great danger of conflating something about how you do approximations with causal explanations. (I guess it may be a good deal for many people who don't have System 1 intuitions for Taylor series or even the log() function.)
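
A minimal sketch of that danger (the curves and the cutoff are arbitrary choices of mine): early data from saturating logistic growth is fit almost perfectly by the "compounding" model dx/dt = b·x, and the fit then badly overshoots later.

```python
import numpy as np

t = np.linspace(0, 10, 200)
K, r = 1000.0, 1.0
x = K / (1 + (K - 1) * np.exp(-r * t))        # logistic growth with x(0) = 1

early = t < 4                                 # fit only the early phase
b, log_a = np.polyfit(t[early], np.log(x[early]), 1)
pred = np.exp(log_a + b * t)                  # best-fit pure exponential

print(f"fitted b = {b:.3f} (true r = {r})")
print(f"max relative error for t < 4: {np.max(np.abs(pred[early] - x[early]) / x[early]):.1%}")
print(f"relative error at t = 10:     {abs(pred[-1] - x[-1]) / x[-1]:.1%}")
```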

Comment by jan_kulveit on Meditations on Momentum · 2018-12-31T02:10:37.111Z · score: 3 (2 votes) · LW · GW

I'm still confused about what you mean by momentum-like effects. Momentum is a very beautiful and crisp concept - the dual (canonical conjugate) of position, with all kinds of deep connections to everything. You can view the whole universe in the dual momentum space.

If the intention is to have a concept roughly in the shape of "all kinds of dynamics which can be rounded to dx/dt = a·x", I agree it may be valuable to have a word for that, but why overload momentum?

You asked for an example of where it conflates that causal mechanism with something else. I picked one example from this paragraph

There’s also the height of trees, the colour, brightness, and lifetime of stars, the proliferation of  species, the halo and horns effect, affective death spirals, and the existence of life itself.

So, as I understand it, I gave you an example (the distribution of star masses) which quite likely does not have any useful connection to preferential attachment or exponential growth. I'm really confused, after your last reply, about what the state of our disagreement on this is.

I'm actually scared to change the topic of the discussion to what simplicity means, but the argument is roughly this: if you have an arbitrary well-behaved function, in the linear picture you can approximate it locally by a straight line (the first terms of the Taylor series, etc.). And yes, you get a better approximation by including more terms from the Taylor series expansion, or by non-linear regression, etc. Now, if you translate this to the log-log picture, you will find that a power law is in some sense the simplest local approximation of anything. This is also the reason why people often mistakenly use power laws instead of lognormal and other distributions - if you truncate the lognormal and look at just part of the tail, you can fit it with a power law. Btw, you nicely demonstrate this effect yourself - preferential attachment often actually leads to the Yule-Simon distribution, and not a power law... but as usual you can approximate it.
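
A quick numeric illustration of that last point (the distribution parameters and the truncation threshold are arbitrary): the upper tail of a lognormal, viewed in log-log coordinates, is fit quite well by a straight line, i.e. looks like a power law.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=0.0, sigma=2.0, size=1_000_000)

tail = np.sort(samples[samples > 100.0])            # keep only the upper tail
ccdf = 1.0 - np.arange(len(tail)) / len(tail)       # empirical P(X > x) within the tail

slope, intercept = np.polyfit(np.log(tail), np.log(ccdf), 1)
fit = intercept + slope * np.log(tail)
resid = np.log(ccdf) - fit
r2 = 1 - np.sum(resid ** 2) / np.sum((np.log(ccdf) - np.log(ccdf).mean()) ** 2)
print(f"apparent power-law exponent: {slope:.2f}, R^2 of the log-log line: {r2:.3f}")
```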

Comment by jan_kulveit on Meditations on Momentum · 2018-12-30T19:14:53.689Z · score: 7 (6 votes) · LW · GW

I don't know what you mean by attachment style, but here are some examples of the conflation...

Momentum is this: even if JK Rowling's next book is total crap, it will still sell a lot of copies. Because people have beliefs, and because they enjoyed her previous books, they have a prior that they will also enjoy the next one. It would take them several crap books to update.

Power laws are ubiquitous. This should be unsurprising - power laws are the simplest functional form in the logarithmic picture. If we use some sort of simplicity prior, we are guaranteed to find them. If we use the first terms of a Taylor expansion, we will find them. The log picture is as natural as the linear one. Someone should write a Meditation on Benford's law - you have an asymptotically straight line in the log-log picture of the probability that a number starts with some digits (in almost any real-life set of numerical values measured in units; you can see this must be the case because of invariance to unit scaling).
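
A minimal sketch of the Benford point (the wide lognormal as a stand-in for "real-life numbers measured in units" is my arbitrary choice): leading-digit frequencies land close to log10(1 + 1/d), and they barely move when you rescale the units.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=4.0, size=200_000)       # spans many orders of magnitude

def leading_digit_freqs(x):
    digits = (x / 10 ** np.floor(np.log10(x))).astype(int)    # first digit, 1..9
    return np.bincount(digits, minlength=10)[1:10] / len(x)

benford = np.log10(1 + 1 / np.arange(1, 10))
print("data      ", np.round(leading_digit_freqs(data), 3))
print("data * 3.7", np.round(leading_digit_freqs(data * 3.7), 3))   # a change of units
print("benford   ", np.round(benford, 3))
```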

This is maybe worth emphasizing: nobody should be surprised to find power laws. Nobody should propose a universal causal mechanism for power laws; it is as stupid as proposing one causal mechanism for straight lines in the linear picture.

They are often the result of other power-law distributed quantities. To take one example from the OP... the initial distribution of masses for a population of new stars is a truncated power law. I don't know why, but one proposed mechanism for this is, for example, turbulent fragmentation of the initial cloud, where the power law can come from the power spectrum of supersonic turbulence.

Comment by jan_kulveit on Meditations on Momentum · 2018-12-30T14:41:43.608Z · score: 3 (7 votes) · LW · GW

The post creates unnecessary confusion by lumping together "momentum", "exponential growth", "compound interest", and "heavy-tailed distributions". Conflating these concepts on a system-1 level into some vague undifferentiated positive mess is likely harmful to anyone aspiring to think about systems clearly.

Isaac Asimov's predictions for 2019 from 1984

2018-12-28T09:51:09.951Z · score: 37 (15 votes)
Comment by jan_kulveit on Best arguments against worrying about AI risk? · 2018-12-24T12:39:33.947Z · score: 15 (6 votes) · LW · GW

Some of what seem to me to be good arguments against entering the field, depending on what you include as the field:

  • We may live in a world where AI safety is either easy, or almost impossible, to solve. In such cases it may be better to work e.g. on global coordination or the rationality of leaders
  • It may be the case that the "near-term" issues with AI will transform the world in a profound way / are big enough to pose catastrophic risks, and given the shorter timelines and better tractability, they are a higher priority. (For example, you can imagine technological unemployment + addictive narrow-AI-aided VR environments + decay of shared epistemology leading to the unraveling of society. Or narrow AI accelerating biorisk.)
  • It may be the case that useful work on the reduction of AI risk requires very special talent / judgment calibrated in special ways / etc., and the many people who want to enter the field will mostly harm it, because the people who should start working on it will be drowned out by the noise created by the large mass.

(Note: I do not endorse these arguments. Also, they do not answer the part about worrying.)

Comment by jan_kulveit on Player vs. Character: A Two-Level Model of Ethics · 2018-12-20T02:04:31.670Z · score: 2 (2 votes) · LW · GW

I like your point about where most of the computation/lovecraftian monsters are located.

I'll think about it more, but if I try to paraphrase it in my picture with a metaphor... we can imagine an organization with a workplace safety department. The safety regulations it is implementing are the result of some large external computation. Also, even the existence of the workplace safety department is in some sense a result of the external system. But drawing boundaries is tricky.

I'm curious about what the communication channel between evolution and the brain looks like "on the link level". It seems it is reasonably easy to select e.g. personality traits, some "hyperparameters" of the cognitive architecture, and similar. It is unclear to me if this is enough to "select from complex strategies" or if it is necessary to transmit strategies in some more explicit form.

Comment by jan_kulveit on Two Neglected Problems in Human-AI Safety · 2018-12-17T22:30:48.122Z · score: 1 (1 votes) · LW · GW

Some instantiations of the first problem (how to prevent "aligned" AIs from unintentionally corrupting human values?) seem to me to be among the most easily imaginable paths to existential risk - e.g. almost all people spending their lives in addictive VR. I'm not sure it is really neglected?

Comment by jan_kulveit on Multi-agent predictive minds and AI alignment · 2018-12-17T22:03:10.155Z · score: 11 (3 votes) · LW · GW

The thing I'm trying to argue is complex, and yes, it is something in the middle between the two options.

1. Predictive processing (in the "perception" direction) makes some brave predictions, which can be tested and which match data/experience. My credence in predictive processing in the narrow sense: 0.95

2. Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle. Vague introspective evidence for active inference comes from the ability to do inner simulations. Possibly the boldest claim I can make from the principle alone is that people will have a bias toward taking actions which "prove their models right", even at the cost of the actions being actually harmful to them in some important sense. How it may match everyday experience: for example, here. My credence in active inference as a basic design mechanism: 0.6

3. So far, the description was broadly Bayesian/optimal/"unbounded". An unbounded predictive processing / active inference agent is a fearsome monster, in a similar way as a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are consequences of computational/signal-processing boundedness, both in PP/AI models and in non-PP/AI models. My credence in boundedness being a key ingredient: 0.99

4. What is missing from the picture so far is some sort of "goals" or "motivation" (or, in another view, a way for evolution to insert some signal into the brain). How Karl Friston deals with this, e.g.

We start with the premise that adaptive agents or pheno-types must occupy a limited repertoire of physical states. For a phenotype to exist, it must possess defining characteristics or traits; both in terms of its morphology and exchange with the environment. These traits essentially limit the agent to a bounded region in the space of all states it could be in. Once outside these bounds, it ceases to possess that trait (cf., a fish out of water).

is something which I find unsatisfactory. My credence in this being the complete explanation: 0.1

5. My hypothesis is roughly this:

  • evolution inserts some "goal-directed" sub-parts into the PP/AI machinery
  • these sub-parts do not somehow "directly interface with the world", but are "buried" within the hierarchy of the generative layers; so they do not care about people or objects or whatever, but about some abstract variables
  • they are quite "agenty", optimizing some utility function
  • from the point of view of such a sub-agent, other sub-agents inside the same mind are possibly competitors; at least some sub-agents likely have access to enough computing power not only to "care about what they are intended to care about", but to do basic modelling of other sub-agents; an internal game-theoretical mess ensues

6. This hypothesis bridges the framework of PP/AI and the world of theories viewing the mind as a multi-agent system. Multi-agent theories of mind have some introspective support in various styles of psychotherapy, IFS, meditative experience, and some rationality techniques. They also seem to explain behavior where humans seem to "defect against themselves". Credence: 0.8

(I guess a predictive processing purist would probably describe 5. & 6. as just a case of competing predictive models, not adding anything conceptually new.)

Now I would actually want to draw a graph of how strongly 1...6 motivate different possible problems with alignment, and how these problems motivate various research questions. For example, the question about understanding hierarchical modelling is interesting even if there is no multi-agency, scaling of sub-agents can be motivated even without active inference, etc.

Comment by jan_kulveit on Multi-agent predictive minds and AI alignment · 2018-12-16T19:51:10.910Z · score: 5 (3 votes) · LW · GW

I read the book the SSC article is reviewing (plus a bunch of articles on predictive-mind, some papers from Google Scholar; I have also seen several talks). Linking the SSC review seemed more useful than linking Amazon.

I don't think I'm the right person for writing an introduction to predictive processing for the LW community.

Maybe I actually should have included a warning that the whole model I'm trying to describe has nontrivial inferential distance.

Comment by jan_kulveit on Multi-agent predictive minds and AI alignment · 2018-12-16T18:32:21.943Z · score: 5 (3 votes) · LW · GW

Thanks for the feedback! Sorry, I'm really bad at describing models in text - if it seems self-contradictory or confused, it's probably either me being bad at explanations or inferential distance (you probably need to understand predictive processing better than what you get from reading the SSC article).

Another try... start by imagining the hierarchical generative layers (as in PP). They just model the world. Then, add active inference. Then, add the special sort of "priors" like "not being hungry" or "seek reproduction". (You need to have those in active inference for the whole thing to describe humans, IMO.) Then, imagine that these "special priors" start to interact with each other... leading to a game-theoretic-style mess. Now you have the sub-agents. Then, imagine some layers up in the hierarchy doing stuff like "personality/narrative generation".

Unless you have this picture right, the rest does not make sense. From your comments I don't think you have the picture right. I'll try to reply... but I'm worried it may add to the confusion.

To some extent, PP struggles to describe motivations. Predictive processing in the narrow sense is about perception and is not agenty at all - it just optimizes a set of hierarchical models to minimize error. If you add active inference, the system becomes agenty, but you actually do have a problem with motivations. From some popular accounts, or from some remarks by Friston, it may seem otherwise, but "depends on details of the notion of free energy" is in my interpretation a statement roughly similar to the claim that physics can be stated in terms of variational principles, and the rest "depends on the notion of action".

Jeffrey-Bolker rotation is something different, leading to a somewhat similar problem (J-B rotation is much more limited in what can be transformed into what, and preserves the decision structure).

My feeling is you don't understand Friston; also I don't want to defend pieces of Friston as I'm not sure I understand Friston.

The options given in "what are we aligning with" are AFAIK not something which has been described in this way before, so an attempt to map them directly to the "familiar litany of options" is likely not the way to understand them. Overall, my feeling is that here you don't have the proposed model right, and the result is mostly confusion.

Comment by jan_kulveit on Player vs. Character: A Two-Level Model of Ethics · 2018-12-14T23:24:14.447Z · score: 7 (4 votes) · LW · GW

It's nicely written, but the image of the Player, a hyperintelligent Lovecraftian creature, does not seem quite right to me. In my picture, where you have this powerful agent entity, I see a mess of sub-agents, interacting in a game-theoretical way primarily among themselves.* How "smart" the results of the interactions are is quite high-variance. Obviously the system has a lot of computing power, but that is not really the same as being intelligent or agent-like.

What I really like is the description of how the results of these interactions are processed by some "personality generating" layers, and how the result looks "from within".

(* One reason why this should be the case: there is not enough bandwidth between the DNA and the neural network; evolution can input some sort of signal like "there should be a subsystem tracking social status, and that variable should be maximized" or tune some parameters, but it likely does not have enough bandwidth to transfer some complex representation of the real evolutionary fitness. Hence what gets created are sub-agenty parts, which do not have direct access to reality, and which often, instead of playing some masterful strategy in unison, are bargaining or even defecting internally.)

Comment by jan_kulveit on Figuring out what Alice wants: non-human Alice · 2018-12-13T19:53:11.951Z · score: 3 (2 votes) · LW · GW

Human brains likely model other humans by simulating them. The simple normative assumption used is something like "humans are humans", which will not really help you in the way you want, but it leads to this interesting problem:

Learning from multiple agents.
Imagine a group of five closely interacting humans. Learning values just from person A may run into the problem that a big part of A's motivation is based on A simulating B, C, D, E (on the same "human" hardware, just incorporating individual differences). In that case, learning the "values" just from A's actions could in principle be more difficult than observing the whole group, trying to learn some "human universals" and some "human specifics". A different way of thinking about this could be by making a parallel with meta-learning algorithms (e.g. REPTILE), but in an IRL frame.
Comment by jan_kulveit on Multi-agent predictive minds and AI alignment · 2018-12-13T19:36:28.520Z · score: 3 (3 votes) · LW · GW

I'm really delighted to hear that this seems like a very well developed model :) Actually, I'm not aware of any published attempt to unite sub-agents with the predictive processing framework in this way, even on the qualitative level, and it is possible this union is original (I did not find anything attempting to do this on Google Scholar or in the first few pages of Google search results).

Making it quantitative, end-to-end trainable on humans, does not seem to be feasible right now, in my opinion.

With the individual components:

  • predictive processing is supported by a growing pile of experimental data
  • active inference is a theoretically very elegant extension of predictive processing
  • sub-personalities are something which seems to work in psychotherapy, and which agrees with some of my meditative experience
  • sub-agenty parts interacting in some game-theory-resembling way feel like something which could naturally develop within a sufficiently complex predictive processing/active inference system
Comment by jan_kulveit on Multi-agent predictive minds and AI alignment · 2018-12-13T08:23:45.196Z · score: 3 (3 votes) · LW · GW

First it was scanned from paper (I like to draw), then edited in GIMP (I don't like to draw the exact same thing repeatedly). I don't know if it's the same with other images you see on LW. Instead of scanning, you can also draw using a tablet.

Comment by jan_kulveit on Multi-agent predictive minds and AI alignment · 2018-12-13T08:03:43.930Z · score: 4 (4 votes) · LW · GW
  1. Nice! We should chat about that.

  2. The technical research direction specification can in all cases be "expanded" from the "seed idea" described here. (We are already working on some of those.) I'm not sure if it's the best thing to publish now - to me, it seems better to do some iterations of "specify - try to work on it" first, before publishing the expansions.

Comment by jan_kulveit on Bounded rationality abounds in models, not explicitly defined · 2018-12-13T00:06:30.041Z · score: 1 (1 votes) · LW · GW

A good way, I would almost say the right way, to do bounded rationality is information-theoretic bounded rationality. There is a post about it in the works...

Multi-agent predictive minds and AI alignment

2018-12-12T23:48:03.155Z · score: 38 (12 votes)
Comment by jan_kulveit on Why should EA care about rationality (and vice-versa)? · 2018-12-10T21:11:29.317Z · score: 7 (5 votes) · LW · GW

The simple answer is this:

  • Rationality asks the question "How do we think clearly?" For many people who start to think more clearly, this leads to an update of their goals toward the question "How can we do as much good as possible (thinking rationally)?", and to acting on the answer, which is effective altruism.
  • Effective altruism asks the question "How can we do as much good as possible, thinking rationally and based on data?" For many people who actually start thinking about the question, this leads to the update "the ability to think clearly is critical when trying to answer this question". Which is rationality.

Obviously, this is an idealization. In the real world, many people enter the EA movement with a lot of weight on the "altruism" and less on the "effective", and do not fully update toward rationality. On the other hand it seems some people enter the rationality community, get mostly aligned with EA goals in the very abstract, but do not fully update toward actually acting.

Comment by jan_kulveit on Why should EA care about rationality (and vice-versa)? · 2018-12-10T10:17:22.188Z · score: 8 (3 votes) · LW · GW

The simple answer is this:

  • Rationality asks the question "How do we think clearly?" For many people who start to think more clearly, this leads to an update of their goals toward the question "How can we do as much good as possible (thinking rationally)?", and to acting on that, which is effective altruism.
  • Effective altruism asks the question "How can we do as much good as possible, thinking rationally and based on data?" For many people who actually start thinking about the question, this leads to the update "the ability to think clearly is critical when trying to answer this question". Which is rationality.

Obviously, this is an idealization. In the real world, many people enter the EA movement with a lot of weight on the "altruism" and less on the "effective", and do not fully update toward rationality. On the other hand it seems some people enter the rationality community, get mostly aligned with EA goals in the very abstract, but do not fully update toward actually acting.

Comment by jan_kulveit on Is Science Slowing Down? · 2018-11-29T20:22:14.602Z · score: 3 (3 votes) · LW · GW

A big part of this follows from the

Law of logarithmic returns:
In areas of endeavour with many disparate problems, the returns in that area will tend to vary logarithmically with the resources invested (over a reasonable range).

which itself can be derived from a very natural prior on the distribution of problem difficulties; so yes, it should be the null hypothesis.
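
One minimal sketch of that derivation (the "natural prior" here is the assumption that problem difficulties - the resources needed to crack each problem - are spread log-uniformly over many orders of magnitude; the numbers are arbitrary): solving problems cheapest-first, each tenfold increase in budget buys a roughly constant number of additional solved problems once past the cheapest ones, i.e. returns go as the logarithm of resources.

```python
import numpy as np

rng = np.random.default_rng(0)
# Difficulty = resources needed to solve a problem, log-uniform over 8 orders of magnitude.
difficulties = 10 ** rng.uniform(0, 8, size=100_000)

costs = np.sort(difficulties)        # attack the cheapest problems first
cumulative = np.cumsum(costs)        # total spend after solving the k cheapest problems
for budget in [1e3, 1e4, 1e5, 1e6, 1e7, 1e8]:
    solved = int(np.searchsorted(cumulative, budget))
    print(f"budget {budget:9.0e}: problems solved {solved:6d}")
```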

Comment by jan_kulveit on Winter Solstice 2018 Roundup · 2018-11-29T01:00:14.204Z · score: 16 (7 votes) · LW · GW

I'm not running it but there is a Prague celebration from 21.12. 16:02 to 22.12. 07:59

CFAR reunion Europe

2018-11-27T12:02:36.359Z · score: 20 (7 votes)
Comment by jan_kulveit on Last Chance to Fund the Berkeley REACH · 2018-08-24T21:43:50.725Z · score: 9 (5 votes) · LW · GW

It seems to be moving in a good direction! Things I noticed and like include:

  • A larger group of people seems to be involved in the management of the place
  • It has a website separate from http://www.bayrationality.com/
  • The back rooms now look more like a coworking space, no longer like a thrift store
  • Various "suggested donation" things now really look like "suggested donation", less like "if you don't pay this price you should be ashamed"
  • You seem less stressed
  • It seems REAC will turn into REACH, & similar

The impression I had before was more nuanced than how you possibly interpreted it at that time. I'm definitely pro "people need spaces"; I also believe that how spaces feel has an important and underappreciated influence on what people do in them. To sum it up: I like how things have changed.

Comment by jan_kulveit on Logarithms and Total Utilitarianism · 2018-08-13T08:13:59.462Z · score: 4 (3 votes) · LW · GW

Well, I posted the same argument in January. Unfortunately (?) with a bunch of other more novel ideas, and without plots and (trivial) bits of calculus. Unfortunately (?) I did not make the bold claim that the paradox is resolved or dissolved, but just the claim that in the real world we are always resource constrained and the question must be "what is the best population given the limited resources", and therefore the paradox is resolved for most practical purposes.

Comment by jan_kulveit on Human-Aligned AI Summer School: A Summary · 2018-08-09T20:27:24.155Z · score: 9 (4 votes) · LW · GW

Thanks for summary of some of the talks!

Just to avoid some unnecessary confusion, I'd like to point out the name of the event was Human-aligned AI Summer School.

A different event, AI Safety Camp, is also happening in Prague, in October.

While there is a substantial overlap between both organizers and participants, the events have somewhat different goals and are geared toward slightly different target audiences. The summer school is pretty much in the format of an "academic summer school", where you have talks, coffee breaks, social events, and similar structured programming, but usually not something like a substantial amount of time to do your own independent research. The camp is the complement - a lot of time to do independent research, not much structured program, no talks by senior researchers, no coffee breaks, and also no university backing.

Maybe at some point we may try some mixture, but for now there are large differences. It is important to understand them and to have different expectations for each event.

Comment by jan_kulveit on Logarithms and Total Utilitarianism · 2018-08-09T17:30:49.369Z · score: 5 (3 votes) · LW · GW

Check my post on Nonlinear perception of happiness - the logarithm is assumed to be in a different place, but the part about implications for ethics contains a version of this argument.

Comment by jan_kulveit on Intertheoretic utility comparison · 2018-07-03T17:14:23.566Z · score: 6 (3 votes) · LW · GW

It seems worth mentioning that anything which involves enumerating over the space of possible actions or policies is often not tractable in practice (or will be exploitable by adversarial enumeration).

So another desideratum may be "it's easy to implement using sampling". On this count, normalizing by some sort of variance is probably best.
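
A minimal sketch of the sampling version (the two "utility functions", the policy sampler, and the equal credences are arbitrary placeholders of mine): estimate each theory's scale from sampled policies and normalise by the standard deviation before mixing.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_policy():
    return rng.normal(size=5)              # placeholder policy parameters

def utility_a(p):                          # two "theories" on wildly different scales
    return 1000.0 * p[0] - p[1]

def utility_b(p):
    return 0.01 * float(np.sum(p ** 2))

policies = [sample_policy() for _ in range(10_000)]
ua = np.array([utility_a(p) for p in policies])
ub = np.array([utility_b(p) for p in policies])

# Normalise each theory by its sample standard deviation, then mix with credences 0.5 / 0.5.
mix = 0.5 * (ua - ua.mean()) / ua.std() + 0.5 * (ub - ub.mean()) / ub.std()
print(policies[int(np.argmax(mix))])
```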

Comment by jan_kulveit on Dissolving the Fermi Paradox, and what reflection it provides · 2018-07-02T20:34:32.333Z · score: 5 (1 votes) · LW · GW

I've re-posted the question about why this inadequacy under Meta

Why it took so long to do the Fermi calculation right?

2018-07-02T20:29:59.338Z · score: 67 (20 votes)
Comment by jan_kulveit on Are ethical asymmetries from property rights? · 2018-07-02T16:32:44.780Z · score: 3 (2 votes) · LW · GW

An interesting observation. A somewhat weird source of information on this may come from societies with slavery, where people's lives could be the property of someone else (that is, most human societies).

Comment by jan_kulveit on Are ethical asymmetries from property rights? · 2018-07-02T16:26:18.117Z · score: 3 (2 votes) · LW · GW

I would guess this is something someone has already explored somewhere, but isn't the act-omission distinction a natural consequence of the intractability of "actions not taken"?

The model is this: the moral agent takes a sample from some intractably huge action space, evaluates each sampled action by some moral function M (for example by rejection sampling based on utility), and does something.

From an external perspective, morality is likely about the moral function M (and about evaluating agents based on it), in contrast to evaluating them based on the sampling procedure.
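
A toy sketch of that model (the action space, the utility, and the moral function M are all invented for illustration): the agent only ever runs M on the handful of actions it happened to sample, so almost the entire action space - including everything it "omitted" - never gets morally evaluated at all.

```python
import random

random.seed(0)
ACTION_SPACE = 10 ** 9                 # intractably many possible actions, indexed by id

def utility(a):                        # placeholder for what the agent wants
    return (a * 2654435761) % 1000

def M(a):                              # placeholder moral function: reject "forbidden" actions
    return a % 7 != 0

sampled = [random.randrange(ACTION_SPACE) for _ in range(10)]   # a tiny sample
permitted = [a for a in sampled if M(a)]                        # rejection by M
chosen = max(permitted, key=utility) if permitted else None

print(f"actions evaluated by M: {len(sampled)} out of {ACTION_SPACE}")
print(f"chosen action: {chosen}")
```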

Comment by jan_kulveit on Dissolving the Fermi Paradox, and what reflection it provides · 2018-07-02T10:55:28.802Z · score: 4 (3 votes) · LW · GW

My intuition is that people should actually bet on current anthropic reasoning less than they do. The reason is that it is dangerously simple to construct simple examples with some small integer number of universes. I believe there is a significant chance these actually do not generalize to the real system in some non-obvious way.

One of the more specific reasons why I have this intuition: it is actually quite hard to do any sort of "counting" of observers even in the very non-speculative world of quantum mechanics. When you go further in the direction of Tegmark's mathematical universe, I would expect the problem to get harder.

Comment by jan_kulveit on Paul's research agenda FAQ · 2018-07-02T10:15:40.797Z · score: 21 (8 votes) · LW · GW

On the other hand, you should consider the advantages of having this discussion in public. I find it quite valuable to see it, as the debate sheds more light on some of both Paul's and Eliezer's models. If they just sat down for a weekend, talked, and updated, it might be more efficient, but it would be a black box.

My intuition is that from a more strategic perspective, the resource we actually need the most is "more Pauls and Eliezers", and this may actually help.

Comment by jan_kulveit on Dissolving the Fermi Paradox, and what reflection it provides · 2018-07-01T19:26:22.015Z · score: 12 (5 votes) · LW · GW

Done. Note it was not about the rationality community, but about the broader set of people thinking about this problem.

For reference

What else to notice?

On the meta level, it seems to me seriously important to notice that it took so long until some researchers noticed and did the statistics right. Meanwhile, lots of highly speculative mechanisms resolving the largely non-existent paradox were proposed. This may indicate something important about the community. As an example: may there be a strong bias toward searching for grand, intellectually intriguing solutions?

Comment by jan_kulveit on Last Chance to Fund the Berkeley REACH · 2018-07-01T09:36:33.882Z · score: 8 (7 votes) · LW · GW

Unfortunately this

I actually had more of a response to your email in the original draft, and was advised to cut it for the final version by multiple people as being too personal/specific for a public post.

is not a positive sign.

The core worries of my email were impersonal and quite generic.

So the core of my email was this concern:

These should be fairly uncontroversial facts:
1) REACH put itself into the position of being one of the most public-facing things in the Berkeley community. If non-Berkeley people come to the Bay and look for events and places, they'll very likely end up at REACH's page, or at the physical space.
-- you have obvious incentives for it to be that way
2) REACH got some support from SSC, so many people noticed, and some of them are donating
3) This is actually happening: during my stay, a few times some random person asked at the door, saying they had heard about REACH and wanted to see it.
4) As Berkeley is one of the main hubs, people from other places will come to Berkeley and take inspiration

the result is REACH is one of the "store-fronts" of the community

These are my vague impressions
5) At the same time, it seems REACH is actually supported only by some part of the community, or the support is more like "donating stuff"
6) The ambition is more focused on the local community.

On the object level
7) It is very expensive (...)
8) It is obviously cash-strapped
9) The lack of money is signaled in various ways

(which together make it a somewhat problematic store-front)

So in short, my worry was that having REACH as one of the most public-facing things to the external world could be harmful, mainly from a signalling/PR perspective.

As it may be unclear what I meant by 8) and 9): we have talked previously in person about, for example, the "donated clothes exchange". While a "donated clothes exchange" may be a typical activity of a "stock community center", it is surprising in a rationalist centre. From a signalling perspective, to an external visitor, it shows

  • an implicitly very low valuation of your time
  • an implicitly somewhat low valuation of the cost of your space

The rest of the email was mainly constructive suggestions on how to get aligned / potentially get funding from CEA to give the place a more professional look, and that really was more personal.

The more meta point is: what I care about in this is EA (and rationality, x-risk, etc.), and I raised a concern about possible harm to these from a "brand" perspective. The harm caused by such a problem would be mainly in lost opportunities (the specific way of causing impact being some of the random SSC readers visiting, looking at this, and turning away with the first impression "ok, these people talk a lot about changing the world online, but this is not a serious effort").

If it was impossible to distill some sort of impersonal, generic concern from the previously quoted text, and if multiple people advised you that addressing concerns like this is too personal/specific for a public post, then, well, that's point 5).

Comment by jan_kulveit on Policy Approval · 2018-07-01T01:25:23.127Z · score: 9 (3 votes) · LW · GW

I was really surprised that the "background problem" is almost the same problem as in value learning in some formulations of bounded rationality. In the information-theoretic bounded rationality formalism, the bounded agent acts based on a combination of a prior (representing previous knowledge) and utilities (what the agent wants). (It seems that in some cases of updating humans, it is possible to disentangle the two.)

While the "counterexamples" to "optimizing human utility according to AI belief" show how this fails in somewhat tricky cases, it seems to me it will be easy to find "counterexamples" where "policy-approval agent" would fail (as compared to what is intuitively good)

From an "engineering perspective", if I was forced to choose something right now, it would be an AI "optimizing human utility according to AI beliefs" but asking for clarification when such choice diverges too much from the "policy-approval".

Comment by jan_kulveit on Dissolving the Fermi Paradox, and what reflection it provides · 2018-06-30T23:18:49.967Z · score: 15 (6 votes) · LW · GW

Because people draw incorrect conclusions from the point estimates. You can have a high expected value of the distribution (e.g. "millions of civilizations") while at the same time having a big part of the probability mass on outcomes with just one civilization, or a few civilizations far away.
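
A toy Monte Carlo illustration of this point (the factor ranges are arbitrary stand-ins, not the literature-based distributions the paper uses): multiplying a handful of log-uniform factors gives a large mean while leaving a substantial share of the probability mass on N < 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# A Drake-like product of seven factors, each log-uniform over an arbitrary range.
ranges = [(0, 2), (-1, 1), (-1, 1), (-3, 0), (-3, 0), (-3, 0), (0, 6)]
N = np.prod([10 ** rng.uniform(lo, hi, n) for lo, hi in ranges], axis=0)

print(f"mean of N:   {N.mean():,.0f}")       # the 'point estimate' intuition
print(f"median of N: {np.median(N):.3f}")
print(f"P(N < 1):    {(N < 1).mean():.2f}")  # ...despite a large chance of near-emptiness
```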

Dissolving the Fermi Paradox, and what reflection it provides

2018-06-30T16:35:35.171Z · score: 30 (12 votes)

Effective Thesis meetup

2018-05-31T19:49:56.285Z · score: 15 (3 votes)

Far future, existential risk, and AI alignment

2018-05-10T09:51:43.278Z · score: 4 (1 votes)

Review of CZEA "Intense EA Weekend" retreat

2018-04-05T23:04:09.398Z · score: 62 (17 votes)

Brno: Far future, existential risk and AI safety

2018-04-02T19:11:06.375Z · score: 10 (2 votes)

Life hacks

2018-04-01T10:29:20.023Z · score: 16 (4 votes)

Welcome to LessWrong Prague [Edit With Your Details]

2018-04-01T10:23:36.557Z · score: 4 (1 votes)

Welcome to Czech Association for Effective Altruism [Edit With Your Details]

2018-04-01T10:12:16.508Z · score: 10 (2 votes)

Reward hacking and Goodhart’s law by evolutionary algorithms

2018-03-30T07:57:05.238Z · score: 47 (11 votes)

Optimal level of hierarchy for effective altruism?

2018-03-27T22:38:27.967Z · score: 8 (2 votes)

GoodAI announced "AI Race Avoidance" challenge with $15k in prize money

2018-01-18T18:05:09.811Z · score: 27 (12 votes)

Nonlinear perception of happiness

2018-01-08T09:04:15.314Z · score: 15 (10 votes)