Posts

Update: Predicted AI alignment event/meeting calendar 2019-09-13T09:05:28.741Z · score: 3 (1 votes)
Predicted AI alignment event/meeting calendar 2019-08-14T07:14:57.233Z · score: 22 (8 votes)
Which of these five AI alignment research projects ideas are no good? 2019-08-08T07:17:28.959Z · score: 25 (9 votes)
Job description for an independent AI alignment researcher 2019-07-13T09:47:54.502Z · score: 8 (6 votes)
Please give your links speaking names! 2019-07-11T07:47:07.981Z · score: 43 (19 votes)
How to deal with a misleading conference talk about AI risk? 2019-06-27T21:04:32.828Z · score: 21 (9 votes)
Agents dissolved in coffee 2019-06-04T08:22:04.665Z · score: 4 (2 votes)
The Stack Overflow of Factored Cognition 2019-04-21T12:19:39.262Z · score: 4 (2 votes)
Factored Cognition with Reflection 2019-04-06T10:00:50.497Z · score: 15 (6 votes)
A cognitive intervention for wrist pain 2019-03-17T05:26:58.910Z · score: 24 (13 votes)
(Non-)Interruptibility of Sarsa(λ) and Q-Learning 2016-11-16T04:22:06.000Z · score: 2 (2 votes)
Earning money with/for work in AI safety 2016-07-18T05:37:55.551Z · score: 7 (8 votes)

Comments

Comment by rmoehn on A cognitive intervention for wrist pain · 2019-09-10T12:03:46.958Z · score: 1 (1 votes) · LW · GW

Thanks for writing this! It's good to see a more sceptical approach. Do you have any more recommendations for reading on the subject?

Not really, sorry. Wacky old Sarno did the job for me, so I didn't look further. Then I took what rational argument I could find and put it in the above article. However, for the people who think that the human body is easily broken, I'll repeat one recommendation from above: Through the Valley by Col. William Reeder.

EDIT: Another recommendation: When I have sports-related issues, I treat them with recommendations from Becoming a Supple Leopard by Kelly Starrett. And when this doesn't fix it, I call one of the PTs at what used to be MobilityWOD. Apparently they've changed their branding to ‘The Ready State’.

I've had RSI for five years now. I read Sarno, tried the Curable app, and tried on the hypothesis that my pain was psychosomatic. For my case, the benefit I've got from a more psychosomatic approach is to try to form fewer negative associations with the pain. I used to view the pain as an indication that my body was broken and that I was ruined. Now I still have the pain, but I have much less of that secondary psychological reaction to the pain, and that's greatly improved my life.

I've heard a similar story from a friend with chronic fatigue. Good for you!

[…] In my estimation, this article and other arguments that RSI is psychosomatic move too quickly from (true) evidence that chronic pain is weird and mysterious to the claim that it must be psychosomatic.

I'm not saying that all RSI is psychosomatic. Sorry for not being clear. I just know that my case was psychosomatic, so I assume that it's psychosomatic for a certain unknown percentage of wrist pain sufferers.

My reasoning is this: I had severe wrist pain. And the physical remedies I tried didn't work. I read a book that gave me a few ideas and "thought remedies" and the pain went away. And it's been staying away for years, no matter how much I type. (As I mentioned in the article, I get occasional slight pains, which I attribute to stress and which go away quickly.) As the psychological change led to a physical change, I conclude that I've had psychosomatic pain. And since it's unlikely that only I had it this way, I conclude that there must be other sufferers of psychosomatic wrist pain.

The pain being ‘weird’ is not required for my argument. There is one paragraph mentioning ‘strange’ pain, but that's just one of my handwavy diagnostic criteria, not an antecedent.

I worry that saying that RSI is psychosomatic feels like it explains the condition, but really doesn't explain it very well. I like that in your post you make some predictions based on your hypothesis.

I'm not aware of any satisfying explanation. I just know that changing my mind somehow cured my pain, so I call it ‘psychosomatic’.

Actually I make another prediction in the comments: ‘If stress causes wrist pain, and people stress out because they think that typing is bad for them, then wrist pain should be "contagious". Take an office full of workers who are doing fine. Then one starts having wrist pain for whatever reason, finds online warnings about RSI, tells their colleagues, they get worried about their work being harmful for them, and some of them also start having wrist pain.’

It would be informative to make a study of the EA and rationality communities and see if we find a contagion pattern. I thought about doing it myself, but my intuitions are stuck in the ‘RSI is psychosomatic’ camp. So I would just be seeing evidence the way I want to see it.

My impression is that the hypothesis of myofascial trigger points has better evidence and does a better job of explaining cases of RSI, and many people who argue that RSI is largely psychosomatic are not aware of the theory of myofascial trigger points.

This might be the explanation for the physical cases. I have skimmed the page on trigger points. And what you write about the foam roller and the lacrosse ball sounds like what I'm doing when I've messed up some body part with poor weightlifting form. This works well, even though I wouldn't explain the weightlifting pains with trigger points.

Another thing I'd like to warn against is trusting the ‘pleasurable feeling of pain’. Doing this, I once (or twice?) seriously messed up the muscles under my shoulder blade and later aggravated my elbow joint. All healed, though.

I should note that I'm probably biased against the hypothesis that RSI is largely psychosomatic. This is because it feels like the hypothesis trivializes my condition. Of course, I think this bias is silly, but I think I do still have it.

Trivializes it how? I wouldn't consider psychosomatic issues trivial at all. In fact, it's terrible that a mostly reasonable person like me can be kicked into a vicious circle of stress and pain by well-reputed and well-intended information from family, friends or the web. This is why I want to change the communication around wrist pain: help the people with purely physical pain, but don't make it worse for those with a penchant for psychosomatic issues.

I also wonder if this bias could explain why I haven't got benefit from a psychosomatic approach to my RSI. I do certainly seem to meet the psychological profile of people who are susceptible to psychosomatic pain I've heard described in books and media such as Sarno's.

I guess I'm the opposite. Sarno's arguments somehow made it into my brain to a degree sufficient to get rid of the pain. And being ‘magically’ cured this way has reinforced the psychosomatic hypothesis to a point where a "PSYCHOSOMATIC" neon sign pops up in my head whenever I hear about a mysterious, unexplained condition. Which in turn shuts down the mechanisms that originally caused the pain.

Of course, when I write about this, I try to shield my eyes from the neon sign and concentrate on established facts.

Just for fun (and not for argument, please), here's a little rave from the troupe that's providing the electricity for the neon sign:

Overuse injury from typing on a keyboard? Maybe. It is an awkward movement. But it appears absurd to me that using a mouse would lead to overuse injuries. Come on! We open and close our hands all day long, and flex our fingers with much greater force than is required for a click. And think of all the repetitive (handcraft) activities that people had to do in the past! Typing on mechanical typewriters, playing musical instruments, copying books by hand, grinding grain with a stone, knitting, sewing, spinning, weaving baskets and cloth, weeding, picking berries, making arrows, ropes and fishing nets, planing, sawing, cutting, hacking, thatching, carving, filing…

Heck, if typing was so bad for you, shouldn't half the secretaries of the mechanical typewriter era have fallen out of the workforce within three years?

I've written this post with my recommendations for treatment here. I have a section on the psychological component to RSI, but I don't discuss the hypothesis that RSI could be more or less entirely psychosomatic.

I saw that section and I'm happy that it's there. Thank you!

Comment by rmoehn on Literature Review: Distributed Teams · 2019-09-07T03:17:46.032Z · score: 3 (2 votes) · LW · GW

If you need more input, I recommend:

They're podcasts, not literature. But you can download all the shownotes, which read like a whitepaper, if you buy a one-month licence for $20.

Comment by rmoehn on Does anyone know of a good overview of what humans know about depression? · 2019-08-31T03:17:16.786Z · score: 5 (3 votes) · LW · GW

Johann Hari: Lost Connections – I listened to the Making Sense Podcast episode where Sam Harris talked with the author. It made me want to read the book, but I haven't gotten around to it yet.

Also, the Library of Congress looks like a good starting point for finding books on the subject: SUBJECTS beginning with: Depression, Mental.

Comment by rmoehn on Hammertime Day 1: Bug Hunt · 2019-08-17T12:13:17.467Z · score: 3 (2 votes) · LW · GW

I think the whole sequence will be very useful for me. And I recommend mentioning in section 1 that we will enter the bugs in a spreadsheet later. If I had known that from section 1, I would have typed the bug list into the computer right away, rather than hand-writing it first and then typing it up. Not a good use of my time…

Comment by rmoehn on Predicted AI alignment event/meeting calendar · 2019-08-15T23:03:38.318Z · score: 3 (2 votes) · LW · GW

Just ‘meeting’ sounds too unimportant. But I've added it to the title, which removes the ambiguity.

Comment by rmoehn on Predicted AI alignment event/meeting calendar · 2019-08-15T01:45:35.587Z · score: 1 (1 votes) · LW · GW

Thanks for pointing that out. Do you have a suggestion for a less misleading title?

Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-11T05:10:12.160Z · score: 1 (1 votes) · LW · GW

I don't think that would work in this case. I derived the project idea from Thoughts on reward engineering, section 2. There the overseer generates rewards based on its preferences and provides these rewards to RL agents.

Suppose the training starts with the overseer generating rewards from its preferences and the agents updating their value functions accordingly. After a while the agents propose something new and the overseer generates a reward that is inconsistent with those it has generated before. But it happens that this one is the true preference and the proper fix would be to revise the earlier rewards. However, rewarded is rewarded – I guess it would be hard to reverse the corresponding changes in the value functions.

Of course one could record all actions and rewards and snapshots of the value functions, then rewind and reapply with revised rewards. But given today's model sizes and training volumes, it's not that straightforward.
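
For concreteness, here is a minimal sketch of the ‘record, rewind, reapply’ idea with a toy tabular Q-learner. The environment, state names and actions are made up for illustration; none of this is from the linked post.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
ACTIONS = ("stay", "move")

def replay(log, revised_rewards=None):
    """Rebuild a Q-table from a transition log, optionally with revised rewards."""
    q = defaultdict(float)
    for i, (state, action, reward, next_state) in enumerate(log):
        if revised_rewards and i in revised_rewards:
            reward = revised_rewards[i]  # overseer's corrected reward for step i
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
    return q

# Example: the overseer later decides the reward for the first transition was wrong.
log = [("s0", "move", 1.0, "s1"), ("s1", "stay", 0.0, "s1")]
q_original = replay(log)
q_revised = replay(log, revised_rewards={0: -1.0})

With a tabular Q-table the replay is trivial; the caveat above is that for large models and training volumes, storing every transition and retraining from a revised log is far less practical.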

Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-11T04:45:38.931Z · score: 1 (1 votes) · LW · GW

Good idea, thank you!

Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-10T21:58:18.300Z · score: 1 (1 votes) · LW · GW

No, I wasn't aware of that. Then I guess I have to come up with a different mechanism for my next poll.

Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-10T00:28:19.919Z · score: 1 (1 votes) · LW · GW

Thanks for the votes so far! The poll is still open.

By the way, I'd prefer if you only give upvotes. That's how approval voting works. If you're concerned that it would skew my total karma, feel free to balance your upvotes by voting down this comment.

Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-08T07:12:57.266Z · score: -1 (3 votes) · LW · GW
I'm studying the effects of an inconsistent comparison function on optimizing
with comparisons,
    because I want to know whether it prevents the two agents from converging on
    a desirable equilibrium quickly enough
        in order to help my reader understand whether optimizing with
        comparisons can solve the problem of inconsistency and unreliability in
        reward engineering.
Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-08T07:12:39.691Z · score: 0 (4 votes) · LW · GW
I'm studying the effects of importance sampling on the behaviour that an RL
agent learns,
    because I want to find out whether it can lead to undesirable outcomes
        in order to help my reader understand whether importance sampling can
        solve the problem of widely varying rewards in reward engineering.
Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-08T07:12:14.646Z · score: 1 (6 votes) · LW · GW
I'm studying the use of a discriminator in imitation learning,
    because I want to find out how to help humans produce demonstrations that
    the agent can imitate,
        in order to help my reader understand how we might use imitation
        learning to solve the reward engineering problem.
Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-08T07:11:38.685Z · score: 5 (7 votes) · LW · GW
I'm studying ways to improve the sample efficiency of a supervised learner,
    because I want to know how to reduce the number of calls to H in
    ‘Supervising strong learners by amplifying weak experts’
    (https://www.lesswrong.com/s/EmDuGeRw749sD3GKd/p/xKvzpodBGcPMq7TqE),
        in order to help my reader understand how we can adapt that
        proof-of-concept for solving real world tasks that require even more
        training data.
- This doesn't just mean achieving more with the samples we have. It can mean
  finding new kinds of samples that convey more information, and finding new
  ways of extracting them from the human and conveying them to the learner.
Comment by rmoehn on Which of these five AI alignment research projects ideas are no good? · 2019-08-08T07:10:59.539Z · score: 7 (8 votes) · LW · GW
I'm studying Bayesian machine learning,
    because I want to understand how to make ML systems that notice when they
    are confused
        in order to help my reader understand how to make ML systems that will
        ask the overseer for input when doing otherwise would lead to failure.
- More a study project than a research project.
Comment by rmoehn on Preface to the sequence on iterated amplification · 2019-08-06T02:02:52.879Z · score: 1 (1 votes) · LW · GW

It looks like the posts Security amplification and Meta-execution belong to the sequence, but don't show up in the sequence overview.

Comment by rmoehn on Techniques for optimizing worst-case performance · 2019-08-01T08:05:42.785Z · score: 1 (1 votes) · LW · GW

Does ‘trusted’ mean ‘certified that the system won't behave badly on any input’?

Comment by rmoehn on Techniques for optimizing worst-case performance · 2019-08-01T07:13:33.618Z · score: 1 (1 votes) · LW · GW

I also had trouble understanding that sub-clause. Maybe we read it in our head with the wrong emphasis:

its behavior can only be intelligent when it is exercised on the training distribution

Meaning: The agent gets inputs that are within the training distribution. ↔ The agent behaves intelligently.

But I guess it's supposed to be:

its behavior can only be intelligent when it is exercised on the training distribution

Meaning: A behaviour is intelligent. ↔ The behaviour was exercised during training on the training distribution.

Comment by rmoehn on Learning with catastrophes · 2019-07-31T22:10:39.272Z · score: 1 (1 votes) · LW · GW

I agree.

Comment by rmoehn on Learning with catastrophes · 2019-07-31T00:17:02.215Z · score: 1 (1 votes) · LW · GW

"never letting a catastrophe happen" would incentivize the agent to spend a lot of resources on foreseeing catastrophes and building capacity to ward them off. This would distract from the agent's main task. So we have to give the agent some slack. Is this what you're getting at? The oracle needs to decide whether or not the agent can be held accountable for a catastrophe, but the article doesn't say anything how it would do this?

Comment by rmoehn on Learning with catastrophes · 2019-07-30T02:24:11.124Z · score: 1 (1 votes) · LW · GW

In particular, it precludes the following scenario: the environment can do anything computable, and the oracle evaluates behavior only based on outcomes (observations).

Paul explicitly writes that the oracle sees both observations and actions: ‘This oracle can be applied to arbitrary sequences of observations and actions […].’

or an oracle that judges "catastrophe" based on the agent's action in addition to outcomes (which I suspect will cache out to "are the actions in this transcript knowably going to cause something bad to happen")

This is also covered:

Intuitively, a transcript should only be marked catastrophic if it satisfies two conditions:

  1. The agent made a catastrophically bad decision.
  2. The agent’s observations are plausible: we have a right to expect the agent to be able to handle those observations.
Comment by rmoehn on Capability amplification · 2019-07-29T08:22:25.669Z · score: 1 (1 votes) · LW · GW

(Deleted.)

Comment by rmoehn on [deleted post] 2019-07-29T08:19:34.604Z

blaum

Comment by rmoehn on Capability amplification · 2019-07-29T08:03:24.585Z · score: 1 (1 votes) · LW · GW

An obstruction to capability amplification is a partition of the policy class 𝓐 into two parts 𝓛 and 𝓗, such that we cannot amplify any policy in 𝓛 to be at least as good as any policy in 𝓗.

[…] can we sensibly define “good behavior” for policies in the inaccessible part 𝓗?

This seems to be circular, since determining 𝓛 and 𝓗 depends on good behaviour being defined. I guess what is meant is that we amplify policies in 𝓛 until we hit a ceiling (a fixed point?). Then 𝓗 = {A ∈ 𝓐 | ¬ A reachable from 𝓛}. We suspect that 𝓗 is non-empty, but we don't know what the policies in there look like.

Comment by rmoehn on Factored Cognition · 2019-07-28T07:43:36.319Z · score: 1 (1 votes) · LW · GW

The first section of Ought: why it matters and ways to help answers the question. It's also a good update on this post in general.

Comment by rmoehn on Factored Cognition · 2019-07-18T02:43:15.057Z · score: 1 (1 votes) · LW · GW

Distillation: We train an ML agent to implement a function from questions to answers based on demonstrations (or incentives) provided by a large tree of experts […]. The trained agent […] only replicates the tree's input-output behavior, not individual reasoning steps.

Why do we decompose in the first place? If the training data for the next agent consists only of root questions and root answers, it doesn't matter whether they represent the tree's input-output behaviour or the input-output behaviour of a small group of experts who reason in the normal human high-context, high-bandwidth way. The latter is certainly more efficient.

There seems to be a circular problem and I don't understand how it is not circular or where my understanding goes astray: We want to teach an ML agent aligned reasoning. This is difficult if the training data consists of high-level questions and answers. So instead we write down how we reason explicitly in small steps.

Some tasks are hard to write down in small steps. In these cases we write down a naive decomposition that takes exponential time. A real-world agent can't use this to reason, because it would be too slow. To work around this we train a higher-level agent on just the input-output behaviour of the slower agent. Now the training data consists of high-level questions and answers. But this is what we wanted to avoid, and therefore started writing down small steps.

Decomposition makes sense to me in the high-bandwidth setting where the task is too difficult for a human, so the human only divides it and combines the sub-results. I don't see the point of decomposing a human-answerable question into even smaller low-bandwidth subquestions if we then throw away the tree and train an agent on the top-level question and answer.

Comment by rmoehn on Please give your links speaking names! · 2019-07-14T01:02:10.522Z · score: 1 (1 votes) · LW · GW

Thanks!

Comment by rmoehn on Please give your links speaking names! · 2019-07-13T09:41:03.128Z · score: 1 (1 votes) · LW · GW

Thanks for the info! By the way, Markdown included the period after ‘638’ in the href attribute. Also a bug?

Comment by rmoehn on Please give your links speaking names! · 2019-07-13T09:36:44.713Z · score: 1 (1 votes) · LW · GW

Great suggestion! According to a StackOverflow answer, this CSS will do the trick:

/* Append the link's title attribute, in parentheses, after each external link. */
a[href^="http://"]:after {
    content: " (" attr(title) ") ";
}
Comment by rmoehn on Please give your links speaking names! · 2019-07-12T21:57:56.133Z · score: 1 (1 votes) · LW · GW

Good idea! I will try that.

Comment by rmoehn on Please give your links speaking names! · 2019-07-12T09:11:10.523Z · score: 1 (1 votes) · LW · GW

So you're reading in the browser. My main point is about people reading on paper and articles that are easier to read on paper.

Comment by rmoehn on Please give your links speaking names! · 2019-07-12T09:09:48.432Z · score: 1 (1 votes) · LW · GW

Do you read in the browser or on paper?

Comment by rmoehn on Please give your links speaking names! · 2019-07-12T09:08:42.256Z · score: 1 (1 votes) · LW · GW

Thanks! I'm guilty myself and I will certainly do better.

Comment by rmoehn on Please give your links speaking names! · 2019-07-12T09:08:21.387Z · score: 1 (1 votes) · LW · GW

Good point. I experimented for ten minutes with saving the HTML, changing it and loading it again in the browser. But it doesn't work for LessWrong. The article appears briefly and then it switches to: ‘Sorry, we couldn't find what you were looking for.’ I didn't feel like figuring this out.

Comment by rmoehn on Please give your links speaking names! · 2019-07-11T22:34:09.943Z · score: 1 (1 votes) · LW · GW

without having to hover over the link to view its URL

Indeed, that's something I do all the time.

costs, such as breaking up the flow of the text

On the other hand it breaks the flow of the reading (on paper) if I have to open the article on my computer and find the link to hover over it.

the effort of typing or copy/pasting the article title (especially on mobile)

How much effort is this compared to the effort of all the readers who have to look up what is behind a link?

Comment by rmoehn on Please give your links speaking names! · 2019-07-11T22:27:25.376Z · score: 1 (1 votes) · LW · GW

where links are treated like footnotes or words or phrases

Unfortunately they are not rendered as footnotes when printed.

There is also a curse of knowledge issue. The author knows what is behind their link, how important it is, whether it is a reference or a definition or a "further reading". The reader has no idea. So the least I'm likely to do for any non-speaking link is hover over it to see what URL it points to. This wouldn't be necessary if the link were named with something close to the title of its target.

it's often a style choice

And it's best to choose a style that supports the function, right? I don't mind "punctuation style" in most ordinary blog posts. But it doesn't work for (semi-)scientific material that is likely to be printed. Especially by beginners like me. Maybe more advanced people can just tear through an article on, say, Benign model-free RL, but I need the aid of pages spread on my desk.

Comment by rmoehn on Iterated Distillation and Amplification · 2019-07-11T05:22:43.517Z · score: 1 (1 votes) · LW · GW

In the pseudocode, it would make more sense to initialize A <- Distill(H), wouldn't it? Otherwise, running Amplify with the randomly initialized A in the next step wouldn't be helpful.
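
To illustrate the suggestion, here is a hypothetical paraphrase of the IDA loop in Python (the function names are mine, not the post's), with the proposed initialization:

def iterated_distillation_and_amplification(H, distill, amplify, n_rounds):
    # Proposed change: start from a distillation of the human H
    # rather than from a randomly initialized agent A.
    A = distill(H)
    for _ in range(n_rounds):
        B = amplify(H, A)  # H answers questions with the help of many copies of A
        A = distill(B)     # train a fast agent that imitates the amplified system
    return A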

Comment by rmoehn on How to deal with a misleading conference talk about AI risk? · 2019-06-28T21:49:50.198Z · score: 21 (5 votes) · LW · GW

I've added specifics. I hope this improves things. If not, feel free to edit it out.

Thanks for pointing out the problems with my question. I see now that I was wrong to combine strong language with no specifics and a concrete target. I would amend it, but then the context for the discussion would be gone.

Comment by rmoehn on Decision Theory · 2019-06-14T03:45:31.948Z · score: 1 (1 votes) · LW · GW

I guess then it would have to prove that it will find a proof with x > 0 within t. This is difficult.

Comment by rmoehn on Decision Theory · 2019-06-14T03:31:33.846Z · score: 1 (1 votes) · LW · GW

In the alternative algorithm for the five-and-ten problem, why should we use the first proof that we find? How about this algorithm:

A2 :=
  a* := 5, x* := -1  (defaults in case no proof is found)
  Spend some time t searching for proofs of sentences of the form
  "A2() = a → U() = x"
  for a ∈ {5, 10}, x ∈ {0, 5, 10}.
  For each found proof and corresponding pair (a, x):
    if x > x*:
      a* := a
      x* := x
  Return a*

If this one searches long enough (depending on how complicated U is), it will return 10, even if the non-spurious proofs are longer than the spurious ones.

Comment by rmoehn on Model Mis-specification and Inverse Reinforcement Learning · 2019-06-11T07:04:06.097Z · score: 3 (2 votes) · LW · GW

Humans can be assigned any values whatsoever… is a great basis for understanding the last section of this article.

Comment by rmoehn on Humans can be assigned any values whatsoever… · 2019-06-10T00:58:26.525Z · score: 7 (4 votes) · LW · GW

How I understand the main point:

The goal is to get superhuman performance aligned with the human values R. How might we achieve this? By learning the human values. Then we can use a perfect planner to find the best actions to align the world with the human values. This will have superhuman performance, because humans' planning algorithms are not perfect. They don't always find the best actions to align the world with their values.

How do we learn the human values? By observing human behaviour, i.e. their actions in each circumstance. This is modelled as the human policy π.

Behaviour is the known outside view of a human, and values+planner is the unknown inside view. We need to learn both the values R and the planner p such that p(R) = π.

Unfortunately, this equation is underdetermined. We only know π. R and p can vary independently.

Are there differences among the candidate pairs (p, R)? One thing we could look at is their Kolmogorov complexity. Maybe the true candidate has the lowest complexity. But this is not the case, according to the article.
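
To make the underdetermination concrete, here is a toy example of my own (not from the article): a ‘rational’ planner paired with the values R produces exactly the same policy as an ‘anti-rational’ planner paired with −R.

actions = ("left", "right")
values = {"left": 0.0, "right": 1.0}

def rational_planner(r):
    """Pick the action the values rate highest."""
    return max(actions, key=lambda a: r[a])

def anti_rational_planner(r):
    """Pick the action the values rate lowest."""
    return min(actions, key=lambda a: r[a])

negated_values = {a: -x for a, x in values.items()}

# Both (planner, values) decompositions explain the same observed behaviour.
assert rational_planner(values) == anti_rational_planner(negated_values) == "right"

As I understand the article, such alternative pairs can be about as simple, in the Kolmogorov sense, as the intended one, which is why minimal complexity doesn't single out the true decomposition.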

Comment by rmoehn on Humans can be assigned any values whatsoever... · 2019-06-10T00:49:21.957Z · score: 1 (1 votes) · LW · GW

Comment deleted, because I had accidentally posted it on this version of the article. See instead here.

Comment by rmoehn on Prosaic AI alignment · 2019-06-05T00:46:39.842Z · score: 3 (2 votes) · LW · GW

Thanks for the clarification! Now I understand where the value comes from.

However, in such a situation, would we only work on the most likely possibility? I agree that a single person does better by concentrating. But a group of researchers would work on each of the more likely approaches in order to mitigate risk.

Comment by rmoehn on Embedded Agents · 2019-06-04T21:36:27.653Z · score: 1 (1 votes) · LW · GW

I had a thought about this that was too long for a comment, so I've posted it separately. Bottom line:

When thinking about embedded agency it might be helpful to drop the notion of ‘agency’ and ‘agents’ sometimes, because it might be confusing or underdefined. Instead one could think of [sub-]processes running according to the laws of physics. Or of algorithms running on a stack of interpreters running on the hardware of the universe.

Comment by rmoehn on Agents dissolved in coffee · 2019-06-04T21:21:38.740Z · score: 2 (2 votes) · LW · GW

Thanks! Fixed.

Comment by rmoehn on Agents dissolved in coffee · 2019-06-04T09:11:14.753Z · score: 1 (1 votes) · LW · GW

Thanks for the feedback! I will remove most of the hedging and self-deprecation. Mainly because of your point about making people more likely to attack my reasoning.

By the way, in my case the hedging and self-deprecation doesn't come from a lack of self-confidence.

Comment by rmoehn on Prosaic AI alignment · 2019-06-03T08:32:13.289Z · score: 3 (2 votes) · LW · GW

Note that even in the extreme case where our approach to AI alignment would be completely different for different values of some unknown details, the speedup from knowing them in advance is at most 1/(probability of most likely possibility).

Shouldn't this be ‘probability of least likely possibility’?

Suppose we have an unknown detail with n possible values v_1, …, v_n, where value v_i has probability p_i. What fraction of the total effort should we spend on the approach for v_i? I guess p_i, if we work on all approaches in parallel. For example, if value v_1 has probability p_1, we should spend a fraction p_1 of the total effort on the approach for value v_1.

What happens when we find out the true value? Let the true value be v_k. Then we can concentrate all the effort on the approach for v_k. Whereas previously the fraction of the effort for value v_k was p_k, it's now 1. Therefore the speedup is 1/p_k. When is this greatest? When p_k is least.
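
A quick numerical check, with made-up numbers: suppose there are two possible values with probabilities 0.9 and 0.1, and effort is allocated proportionally as above. If the true value turns out to be the likely one, the speedup from knowing it in advance is 1/0.9 ≈ 1.1; if it is the unlikely one, the speedup is 1/0.1 = 10. So the bound on the speedup is 1/(probability of the least likely possibility).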

Comment by rmoehn on The Stack Overflow of Factored Cognition · 2019-04-22T16:35:47.176Z · score: 1 (1 votes) · LW · GW

My immediate reaction is: why do you think the real and not the toy problems you are trying to solve are factorizable?

My immediate reaction is: why do you ask this question here? Wouldn't it be better placed under an authoritative article like this rather than my clumsy little exploration?

why do you think that the task of partitioning the question is any easier than actually solving the question? Currently the approach in academia is hiring a small number of relatively well supervised graduate students, maybe an occasional upper undergrad, to assist in solving a subproblem.

To me this looks like you're answering your own question. What am I not understanding? If I saw the above Physics questions and knew something about the topic, I would probably come up with a list of questions or approaches. Someone else could then work on each of those. The biggest issue that I see is that so much information is lost and friction introduced when unraveling a big question into short sub-questions. It might not be possible to recover from that.

I do not know how much research has been done on factorizability

This is part of what Ought is doing, as far as I understand. From the Progress Update Winter 2018: ‘Feasibility of factored cognition: I'm hesitant to draw object-level conclusions from the experiments so far, but if I had to say something, I'd say that factored cognition seems neither surprisingly easy nor surprisingly hard. I feel confident that our participants could learn to reliably solve the SAT reading comprehension questions with a bit more iteration and more total time per question, but it has taken iteration on this specific problem to get there, and it's likely that these experiments haven't gotten at the hard core of factored cognition yet.’

Comment by rmoehn on The Stack Overflow of Factored Cognition · 2019-04-22T16:02:05.704Z · score: 1 (1 votes) · LW · GW

Thanks for the explanations and for pointing me back to dialog markets!