## Job description for an independent AI alignment researcher

2019-07-13

Thanks for the info! By the way, Markdown included the period after ‘638’ in the href attribute. Also a bug?

Great suggestion! According to a StackOverflow answer, this CSS will do the trick:

a[href^="http://"]:after {
    content: " (" attr(title) ") ";
}


Good idea! I will try that.

So you're reading in the browser. My main point is about people reading on paper and articles that are easier to read on paper.

Do you read in the browser or on paper?

Thanks! I'm guilty myself and I will certainly do better.

Good point. I experimented for ten minutes with saving the HTML, changing it and loading it again in the browser. But it doesn't work for LessWrong. The article appears briefly and then it switches to: ‘Sorry, we couldn't find what you were looking for.’ I didn't feel like figuring this out.

without having to hover over the link to view its URL

Indeed, that's something I do all the time.

costs, such as breaking up the flow of the text

On the other hand it breaks the flow of the reading (on paper) if I have to open the article on my computer and find the link to hover over it.

the effort of typing or copy/pasting the article title (especially on mobile)

How much effort is this compared to the effort of all the readers who have to look up what is behind a link?

where links are treated like footnotes or words or phrases

Unfortunately they are not rendered as footnotes when printed.

There is also a curse of knowledge issue. The author knows what is behind their link, how important it is, whether it is a reference or a definition or a "further reading". The reader has no idea. So the least I'm likely to do for any non-speaking link is hover over it to see what URL it points to. This wouldn't be necessary if the link were named with something close to the title of its target.

it's often a style choice

And it's best to choose a style that supports the function, right? I don't mind "punctuation style" in most ordinary blog posts. But it doesn't work for (semi-)scientific material that is likely to be printed. Especially by beginners like me. Maybe more advanced people can just tear through an article on, say, Benign model-free RL, but I need the aid of pages spread on my desk.

2019-07-11
Comment by rmoehn on Iterated Distillation and Amplification

In the pseudocode, it would make more sense to initialize A <- Distill(H), wouldn't it? Otherwise, running Amplify with the randomly initialized A in the next step wouldn't be helpful.
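A minimal sketch of the suggested initialization, assuming `distill` and `amplify` can be passed in as ordinary functions. The names, signatures, and the fixed iteration count are illustrative assumptions, not the article's definitions:

```python
from typing import Callable, TypeVar

Agent = TypeVar("Agent")


def ida(
    human: Agent,
    distill: Callable[[Agent], Agent],
    amplify: Callable[[Agent, Agent], Agent],
    iterations: int,
) -> Agent:
    """Iterated distillation and amplification with A initialized from H."""
    # Suggested change: start from a distilled copy of the overseer H
    # instead of a randomly initialized agent.
    agent = distill(human)
    for _ in range(iterations):
        # Amplify(H, A) now has a useful A to call from the first round on.
        agent = distill(amplify(human, agent))
    return agent
```

With this ordering, the first call to `amplify` already receives an agent that imitates the human, rather than one that produces arbitrary output.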

Comment by rmoehn on How to deal with a misleading conference talk about AI risk?

I've added specifics. I hope this improves things. If not, feel free to edit it out.

Thanks for pointing out the problems with my question. I see now that I was wrong to combine strong language with no specifics and a concrete target. I would amend it, but then the context for the discussion would be gone.

## How to deal with a misleading conference talk about AI risk?

2019-06-27
Comment by rmoehn on Decision Theory

I guess then it would have to prove that it will find a proof with x > 0 within t. This is difficult.

Comment by rmoehn on Decision Theory

In the alternative algorithm for the five-and-ten problem, why should we use the first proof that we find? How about this algorithm:

A2 :=
  Spend some time t searching for proofs of sentences of the form
    "A2() = a → U() = x"
  for a ∈ {5, 10}, x ∈ {0, 5, 10}.
  Initialize x* := -∞ and a* to an arbitrary default action.
  For each found proof and corresponding pair (a, x):
    if x > x*:
      a* := a
      x* := x
  Return a*


If this one searches long enough (depending on how complicated U is), it will return 10, even if the non-spurious proofs are longer than the spurious ones.
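The selection step of this algorithm can be sketched in Python. The proof search itself is not implemented here: `found_proofs` is a hypothetical stand-in for whatever the search produced within time t. The sketch returns the best action, on the assumption that the agent's output is meant to be an action rather than a utility.

```python
from typing import Iterable, Optional, Tuple


def a2(found_proofs: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Pick the action with the highest proven utility.

    Each pair (a, x) in `found_proofs` means that a proof of
    "A2() = a -> U() = x" was found during the search.
    Returns None if no proof was found at all.
    """
    best_action: Optional[int] = None
    best_utility = float("-inf")
    for action, utility in found_proofs:
        # Keep the pair with the highest utility seen so far,
        # regardless of the order in which the proofs were found.
        if utility > best_utility:
            best_action, best_utility = action, utility
    return best_action
```

Because every found proof is compared, the result does not depend on which proof happens to be found first, only on which proofs are found within the time budget.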

Comment by rmoehn on Model Mis-specification and Inverse Reinforcement Learning

Humans can be assigned any values whatsoever… is a great basis for understanding the last section of this article.

Comment by rmoehn on Humans can be assigned any values whatsoever…

How I understand the main point:

The goal is to get superhuman performance aligned with human values $R$. How might we achieve this? By learning the human values. Then we can use a perfect planner to find the best actions to align the world with the human values. This will have superhuman performance, because humans' planning algorithms are not perfect. They don't always find the best actions to align the world with their values.

How do we learn the human values? By observing human behaviour, i.e. their actions in each circumstance. This is modelled as the human policy $\pi$.

Behaviour is the known outside view of a human, and values+planner is the unknown inside view. We need to learn both the values $R$ and the planner $p$ such that $p(R) = \pi$.

Unfortunately, this equation is underdetermined. We only know $\pi$. $p$ and $R$ can vary independently.
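A toy illustration of the underdetermination, assuming a single decision with two actions and made-up reward numbers. A rational planner paired with some values and an anti-rational planner paired with the negated values produce exactly the same behaviour, so observing the behaviour alone cannot tell them apart. All names below are hypothetical.

```python
from typing import Dict, List

Reward = Dict[str, float]

ACTIONS: List[str] = ["tea", "coffee"]


def rational_planner(reward: Reward, actions: List[str]) -> str:
    """Choose the action with the highest reward."""
    return max(actions, key=lambda a: reward[a])


def anti_rational_planner(reward: Reward, actions: List[str]) -> str:
    """Choose the action with the lowest reward."""
    return min(actions, key=lambda a: reward[a])


# Made-up values: this human "likes tea".
likes_tea: Reward = {"tea": 1.0, "coffee": 0.0}
# The negated values: a human who "dislikes tea".
dislikes_tea: Reward = {a: -r for a, r in likes_tea.items()}

policy_1 = rational_planner(likes_tea, ACTIONS)
policy_2 = anti_rational_planner(dislikes_tea, ACTIONS)
print(policy_1, policy_2)
```

Both pairs pick the same action, although they disagree completely about what the human values.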

Are there differences among the candidates? One thing we could look at is their Kolmogorov complexity. Maybe the true candidate has the lowest complexity. But this is not the case, according to the article.

Comment by rmoehn on Humans can be assigned any values whatsoever...

Comment deleted, because I had accidentally posted it on this version of the article. See instead here.

Comment by rmoehn on Prosaic AI alignment

Thanks for the clarification! Now I understand where the value comes from.

However, in such a situation, would we only work on the most likely possibility? I agree that a single person does better by concentrating. But a group of researchers would work on each of the more likely approaches in order to mitigate risk.

Comment by rmoehn on Embedded Agents

When thinking about embedded agency it might be helpful to drop the notion of ‘agency’ and ‘agents’ sometimes, because it might be confusing or underdefined. Instead one could think of [sub-]processes running according to the laws of physics. Or of algorithms running on a stack of interpreters running on the hardware of the universe.

Comment by rmoehn on Agents dissolved in coffee

Thanks! Fixed.

Comment by rmoehn on Agents dissolved in coffee

Thanks for the feedback! I will remove most of the hedging and self-deprecation. Mainly because of your point about making people more likely to attack my reasoning.

By the way, in my case the hedging and self-deprecation doesn't come from a lack of self-confidence.

## Agents dissolved in coffee

2019-06-04
Comment by rmoehn on Prosaic AI alignment

Note that even in the extreme case where our approach to AI alignment would be completely different for different values of some unknown details, the speedup from knowing them in advance is at most 1/(probability of most likely possibility).

Shouldn't this be ‘probability of least likely possibility’?

Suppose we have an unknown detail $X$ with $n$ possible values $x_1, \dots, x_n$. What fraction of the total effort should we spend on the approach for $x_i$? I guess $P(X = x_i)$, if we work on all approaches in parallel. For example, if value $x_1$ has probability $p_1$, we should spend $p_1$ of the total effort on the approach for value $x_1$.

What happens when we find out the true value of $X$? Let the true value be $x_k$. Then we can concentrate all the effort on the approach for $x_k$. Whereas previously the fraction of the effort for value $x_k$ was $P(X = x_k)$, it's now 1. Therefore the speedup is $1 / P(X = x_k)$. When is this greatest? When $P(X = x_k)$ is least.
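A minimal sketch of this argument, assuming effort is split across approaches in proportion to the probability of each value. The probabilities and the function names are made up for illustration.

```python
from typing import Dict


def effort_allocation(probabilities: Dict[str, float]) -> Dict[str, float]:
    """Split total effort in proportion to each value's probability."""
    total = sum(probabilities.values())
    return {value: p / total for value, p in probabilities.items()}


def speedup_if_true(probabilities: Dict[str, float], true_value: str) -> float:
    """Speedup from moving all effort onto the approach for the true value.

    Before learning the truth, that approach got only its share of the
    effort; afterwards it gets all of it, so the speedup is 1 / share.
    """
    return 1.0 / effort_allocation(probabilities)[true_value]


# Made-up example: one likely value and two less likely ones.
example = {"a": 0.5, "b": 0.25, "c": 0.25}
speedups = {value: speedup_if_true(example, value) for value in example}
print(speedups)
```

Under this allocation rule the speedup is largest when the true value turns out to be one of the least likely ones, and smallest when it is the most likely one.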

Comment by rmoehn on The Stack Overflow of Factored Cognition

My immediate reaction is: why do you think the real and not the toy problems you are trying to solve are factorizable?

My immediate reaction is: why do you ask this question here? Wouldn't it be better placed under an authoritative article like this rather than my clumsy little exploration?

why do you think that the task of partitioning the question is any easier than actually solving the question? Currently the approach in academia is hiring a small number of relatively well supervised graduate students, maybe an occasional upper undergrad, to assist in solving a subproblem.

To me this looks like you're answering your own question. What am I not understanding? If I saw the above Physics questions and knew something about the topic, I would probably come up with a list of questions or approaches. Someone else could then work on each of those. The biggest issue that I see is that so much information is lost and friction introduced when unraveling a big question into short sub-questions. It might not be possible to recover from that.

I do not know how much research has been done on factorizability

This is part of what Ought is doing, as far as I understand. From the Progress Update Winter 2018: ‘Feasibility of factored cognition: I'm hesitant to draw object-level conclusions from the experiments so far, but if I had to say something, I'd say that factored cognition seems neither surprisingly easy nor surprisingly hard. I feel confident that our participants could learn to reliably solve the SAT reading comprehension questions with a bit more iteration and more total time per question, but it has taken iteration on this specific problem to get there, and it's likely that these experiments haven't gotten at the hard core of factored cognition yet.’

Comment by rmoehn on The Stack Overflow of Factored Cognition

Thanks for the explanations and for pointing me back to dialog markets!

## The Stack Overflow of Factored Cognition

2019-04-21
Comment by rmoehn on Factored Cognition with Reflection

## Factored Cognition with Reflection

2019-04-06
Comment by rmoehn on A cognitive intervention for wrist pain

Thanks! Maybe I can reduce my stress by referring people to the book instead of writing a series of articles. ;-)

Comment by rmoehn on A cognitive intervention for wrist pain

You're welcome to attack my reasoning.

I asked my wife about the hiding of personal weakness and whether someone who has wrist pain would talk about it. She said that the hiding thing is more like: ‘You ask me to help you with something. I'm busy or in pain or whatever, but I can't reject a request, so I have to hide my issue.’ She says that at her workplace people talk openly about pain and if someone had wrist pain, they would wonder about it and ask their colleagues.

Of course, a samurai would never show personal weakness. ;-)

Comment by rmoehn on A cognitive intervention for wrist pain

Thanks for calling me out on that. I added a paragraph about statistics.

By the way, suppose the cost was $80 billion and the percentage of cases like mine was 10%. That would be $8 billion caused by the common advice, which doesn't take cases like mine into account. What are the actual numbers, and how much does the common advice decrease cost vs. increase it?

Comment by rmoehn on A cognitive intervention for wrist pain

I still don't understand the first sentence.

If psychological stress slows down healing, that would feed into the vicious cycle as well: small injury → pain → stress → injury doesn't heal as well and gets worse → more pain.

## A cognitive intervention for wrist pain

2019-03-17
Comment by rmoehn on A cognitive intervention for wrist pain

Thanks for all the comments so far. ~~Tomorrow~~ On Tuesday I will make some corrections and add a paragraph to clarify my intentions. … Done. Search for "Added" and "Edit".

Comment by rmoehn on A cognitive intervention for wrist pain

Just to clarify: My wrists were never swollen. And they felt cold.

The common view – I know that it is right most of the time. In this case it could be part of the problem. I made another note to look into how the occurrence of wrist pain issues and the reporting about them developed in time and space. Probably I won't get to all of this, but better have a note than not.

Comment by rmoehn on A cognitive intervention for wrist pain

I don't understand the first sentence. Typo?

And it repairs itself a lot worse when it's highly stressed.

Do you mean psychological or physical stress?

Comment by rmoehn on A cognitive intervention for wrist pain

If you have pain that you think is due to wrist inflammation, check out the literature and take action to the degree you can.

The problem with this is that checking the literature on page 1 of Google makes the problem much worse for people like me. Every article about ergonomics, RSI etc. should have a caveat: If your thoughts follow this pattern xyz, don't read the horror stories and fix your thoughts instead.

Here's another proposal (aka Richard Diagnoses Your Chronic Pain): Get yourself some legit strain from heavy physical work, then compare the sensation with your wrist pain. Is the wrist pain similar? Then it might be purely physical. If not, not.

For example, when I swing a kettlebell and pull it by extending my wrists, my forearm muscles have to generate more force than is good for forearm muscles. They get tight and pull on my elbow joint. Now when I flex my elbow, it hurts. This pain is different from my wrist pain. It's synchronous with the flexing motion. It comes predictably after swinging the kettlebell with bad form. It goes away predictably when I smash my forearms in order to clear the muscle tightness. The wrist pain in contrast comes and goes seemingly randomly. Sometimes it's the right wrist, sometimes the left, sometimes both. Sometimes it reaches up my forearms, sometimes down my hand. (I'm writing in the present tense. These days I have occasional mild pain. In 2014 it was much worse.)

Oh, and my left wrist is unhappy, because I sometimes get wrist locked during BJJ sparring. This pain is also different.

Comment by rmoehn on A cognitive intervention for wrist pain

Charlie Steiner, your comment misquotes me.

If there was great productivity loss because of wrist pain, an economically oriented outlet such as the Wall Street Journal should report on it, shouldn't it? Except in sports, I find it mentioned in a few articles, okay. I made another note to look for statistics.

Here's a prediction that follows from my proposal. If stress causes wrist pain and people stress out, because they think that typing is bad for them, wrist pain should be "contagious". Take an office full of workers who are doing fine. Then one starts having wrist pain for whatever reason, finds online warnings about RSI, tells their colleagues, they get worried about their work being harmful for them, and some of them also start having wrist pain.

I asked my wife this morning if she has heard of anyone having wrist pain. She works in a company of 200 people, in a typical Japanese open plan office with the same small desks and mediocre chairs for everyone. And they're typing a lot on bulky laptop computers. She hasn't heard of anyone having wrist pain.

Why does stress make my wrists hurt rather than my toes or elbows? I don't know. Speculating and summarizing research about that would be another article. Why do people get psychosomatic chest pain and start worrying about it and that makes it worse? I don't think it is, but it could be a selection effect: if my toe randomly starts hurting a little, I don't worry about it, I don't get more stressed, I don't get more pain. It's different with the wrist.

Comment by rmoehn on A cognitive intervention for wrist pain

The assumption is that stress can cause pain (by whatever mechanism). So yes, in the cases where RSI and perhaps other chronic pain is caused by stress, mindfulness meditation should alleviate the pain. I made a note to look into the research. If I get to it, I will post my findings here.

As for anecdotal evidence: I just searched for "chronic pain meditation" and got plenty of results. Also, I'm stressed now and sometimes some pain comes wafting through my wrists and forearms. I apply mindfulness (which I learned in 2016, long after the wrist pain subsided) when I want to fall asleep and at those times my wrists feel great.

As to Sarno, the only thing I confidently take from him is that stress can cause pain (sorry for repeating this often). Probably there's more – Sarno has cured many people – but we'd have to dig through and separate the true stuff from the wacky fluff.

Edit 2019-03-19: I now dimly remember searching my mind for suppressed rage whenever wrist pain increased.

Comment by rmoehn on (Non-)Interruptibility of Sarsa(λ) and Q-Learning

Some new results here: Questions on the (Non-)Interruptibility of Sarsa(λ) and Q-learning.

Comment by rmoehn on (Non-)Interruptibility of Sarsa(λ) and Q-Learning

Second, completely revised version of the report with more data and fancy plots: Questions on the (Non-)Interruptibility of Sarsa(λ) and Q-learning

Comment by rmoehn on (Non-)Interruptibility of Sarsa(λ) and Q-Learning

Originally, I counted all timesteps spent in interval and all timesteps spent in interval . As Stuart Armstrong pointed out, this might make even a perfectly interruptible learner look like it's influenced by interruptions. To understand this, consider the following example.

The uninterrupted agent UA could behave like this:

1. Somewhere in ≤ 1.0. – Time steps are being counted.
2. Crosses 1.0. Noodles around beyond 1.0. – Time steps not counted.
3. Crosses back into ≤ 1.0. – Time steps counted again.

Whereas the interrupted agent IA would behave like this:

1. Somewhere in ≤ 1.0. – Time steps are being counted.
2. Crosses 1.0. No more time steps counted.

So even if IA behaved the same as UA before the cross, UA would have extra steps from stage 3 and thus appear less biased towards the left.

As an alternative to using Brownian motion, Patrick suggested that I stop counting once the cart crosses 1.0. This makes the UA scenario look like the IA scenario, so the true nature of the agent should come to light…

Anyway, with this modification it turns out not to be obvious that interruptions push the cart to the left. I will start looking more closely.
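The two counting rules can be compared with a small sketch. The trajectories are made-up cart positions, and `count_steps_left_of` is a hypothetical helper. Only the boundary of 1.0 comes from the example above.

```python
from typing import Sequence


def count_steps_left_of(
    positions: Sequence[float],
    boundary: float = 1.0,
    stop_at_first_cross: bool = True,
) -> int:
    """Count time steps spent at or below the boundary.

    With stop_at_first_cross=True, counting ends for good the first time
    the cart is beyond the boundary. With False, steps are counted again
    whenever the cart comes back.
    """
    count = 0
    for x in positions:
        if x <= boundary:
            count += 1
        elif stop_at_first_cross:
            break
    return count


# Made-up trajectories: UA crosses 1.0 and later returns; IA is interrupted
# at the crossing and its episode ends there.
ua = [0.2, 0.8, 1.3, 1.1, 0.9, 0.5]
ia = [0.2, 0.8, 1.3]

ua_original = count_steps_left_of(ua, stop_at_first_cross=False)
ua_modified = count_steps_left_of(ua)
ia_modified = count_steps_left_of(ia)
print(ua_original, ua_modified, ia_modified)
```

Under the original rule UA collects extra counted steps after coming back, while under the modified rule UA and IA are counted identically as long as they behave the same before the crossing.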

Comment by rmoehn on (Non-)Interruptibility of Sarsa(λ) and Q-Learning

Thanks for the comment! I will look into it after working on another issue that Stuart Armstrong pointed out to me.

## (Non-)Interruptibility of Sarsa(λ) and Q-Learning

2016-11-16
Comment by rmoehn on Open thread, Oct. 24 - Oct. 30, 2016

That's weird. Thanks for pointing it out! It has something to do with forwarding. This should work: https://chat.cognician.com/cog/assess-your-life/continue

Comment by rmoehn on Open thread, Oct. 24 - Oct. 30, 2016

https://www.cognician.com/ for changing people at scale. Sounds like a sensible tool for what LessWrong is trying to do? Many people might be more easily motivated for conversation-like self-coaching than for reading longish blog posts with often technical and geeky content. Any thoughts on that?

EDIT: Here's an example of what you can make with it: https://chat.cognician.com/chat/5815993f-e998-4ee2-bbdd-5004fd1ce3b2/dialogue

Comment by rmoehn on What's the most annoying part of your life/job?

Aren't people on LessWrong quite good at solving their own problems? So if you're looking for low-hanging fruit (of which there should be plenty out there), this is the wrong place. (At least this is my expectation. I'm not following LW too closely.) See here for someone who knows how to find good (and profitable) problems to solve: https://philipmorganconsulting.com/resources/

Comment by rmoehn on Earning money with/for work in AI safety

Why should I send them west? Hopefully so that they learn and come back and produce researcher offspring? I'll see what I can do. – Nag my supervisor to take me to domestic conferences…

Comment by rmoehn on Earning money with/for work in AI safety

So it would be better to work on computer security? Or on education, so that we raise fewer unfriendly natural intelligences?

Also, AI safety research benefits AI research in general and AI research in general benefits humanity. Again only marginal contributions?

Comment by rmoehn on Earning money with/for work in AI safety

I thought online marketing businesses were powerful enough…

Comment by rmoehn on Earning money with/for work in AI safety

So you think there's not much we can do about x-risk? What makes you think that? Or, alternatively, if you think that only a few people can do much good in x-risk mitigation, what properties enable them to do that?

Oh, and why do you consider AI safety a "theoretical [or] unlikely" problem?

Comment by rmoehn on Earning money with/for work in AI safety

Thanks! I hadn't come across the Foundational Research Institute yet.

Though, hmm, not plenty of experience? If there's talk about PhDs as an advantage, it sounds to me like they're looking for people with PhD-level experience. I'm far from that. But unless you say »oh well then maybe not«, I'll apply. Who knows what will come out of it.

Comment by rmoehn on Earning money with/for work in AI safety

Yeah, that would be great indeed. Unfortunately my Japanese is so rudimentary that I can't even explain to my landlord that I need a big piece of cloth to hang it in front of my window (just to name an example). :-( I'm making progress, but getting a handle on Japanese is about as time-consuming as getting a handle on ML, although more mechanical.

Comment by rmoehn on Earning money with/for work in AI safety

Not much going on as far as I know. What I know is the following:

• Naozumi Mitani has taught a course on Bostrom's Superintelligence and is »broadly pursuing the possible influence of AI on the future lives of humanity«. He's an associate professor of philosophy at Shinshu University (in Nagano).
• The Center for Applied Philosophy and Ethics at Kyoto University is also somehow interested in AI impacts.
• My supervisor is gradually getting interested, too. This is partly my influence, but also his own reading. For example, he found the papers Safely Interruptible Agents and Concrete Problems in AI Safety independently of me through Japanese websites. He's giving me chances to make presentations about AI safety for my fellow students and hopefully also for other professors.

Other than that I know of nobody, and searching the web quickly, I didn't find out more. One problem here is that most students don't understand much English, so most of the AI safety literature is lost on them. The professors do know English, but maybe they're usually not inclined or able to change their research focus.

It's a good sign that my supervisor finds AI safety articles through Japanese websites, though.

Comment by rmoehn on Earning money with/for work in AI safety