Comment by ViktoriaMalyasova on The Machine that Broke My Heart · 2021-12-31T20:42:45.889Z · LW · GW

So, you and your team spent six years of effort working full time for no pay (what did you even eat then?). You developed a product that worked just great, was in demand and could make a difference in fighting obesity by making a beep whenever the wearer eats. But even though the product was ready - "just put it on and good to go" - and you can easily reconstruct it, you and your whole team decided to abandon it and part ways. Because you simply aren't that into diabetes prevention, and also your time is limited and you have more important things to do. But you would enthusiastically do part-time contract work on this project again. 

I feel like this story doesn't quite make sense. If the company was doing so well and you just didn't want to run it anymore, why didn't you sell it?

Comment by ViktoriaMalyasova on What are the pros and cons of seeking a formal diagnosis of autism? · 2021-12-29T20:23:03.867Z · LW · GW

A diagnosed mental health disorder can be a reason for green card or USA entry denial, if it has led to harmful behaviour. This site says:

According to the CDC guidance, the mental disorders that are most frequently associated with harmful behavior are:

  • major depression
  • bipolar disorder
  • schizophrenia, and
  • mental retardation.

This doesn't mean that other disorders won't be used as grounds of inadmissibility, however. Even anxiety disorders, which many people suffer from while leading fully functioning lives, can lead to inadmissibility, if the disorder has led to harmful behavior to the applicant or others.

It also increases your risk of denial due to public charge concerns:

The public charge bar can have significant ramifications for immigrants with mental disabilities or disorders. Immigration officials are concerned that these immigrants might find themselves stranded in the U.S., unable to find work because of their disability and unable to receive or afford appropriate care and treatment.

There are also some job restrictions; e.g., people diagnosed with autism cannot join the Navy in the UK.

I'd thoroughly research these and other possible restrictions before seeking a formal diagnosis. And keep in mind that the rules might change in the future. Given the risks, I personally would need a better reason than getting more time on exams to take such a step. Maybe if I needed medication and this was the only way to get it.

Comment by ViktoriaMalyasova on Internet Literacy Atrophy · 2021-12-27T07:24:57.959Z · LW · GW

We desperately need 

Wait, didn't this post just make a case that older people don't keep up with new technology because they don't feel they need it?

New apps appeal to me less and less often. Sometimes something does look fun, like video editing, but the learning curve is so steep and I don’t need to make an Eye of The Tiger style training montage of my friends’ baby learning to buckle his car seat that badly, so I pass it by and focus on the millions of things I want to do that don’t require learning a new technical skill. 

Doesn't sound to me like you desperately need that app :)

Comment by ViktoriaMalyasova on What questions do you have about doing work on AI safety? · 2021-12-24T15:02:55.914Z · LW · GW

What proportion of independent researchers in AI safety manage to secure a second research grant after the first one runs out? What does it take to stay in the field? 

Are you supposed to publish papers? In which journals? Or do you just post to the Alignment forum?

What happens to people who leave the field, is it difficult to find a job? What if you leave MIRI after working on a secret project?

Can you give something like a Glassdoor review for your employer?

Comment by ViktoriaMalyasova on Alignment via manually implementing the utility function · 2021-12-04T12:18:38.061Z · LW · GW

Another problem is that the system cannot represent and communicate the whole predicted future history of the universe to us. It has to choose some compact description. And the description can get a high evaluation either for being a genuinely good plan, or for neglecting to predict or mention bad outcomes and using persuasive language (if it's a natural-language description).

Maybe we can have the human also report their happiness daily, and have the make_readable subroutine rewarded solely for how well the plan evaluation given beforehand matches the happiness levels reported afterwards? I don't think that solves the problem of delayed negative consequences, or bad consequences the human will never learn about, or wireheading the human while giving misleading descriptions of what's happening, though.
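The proposal above can be sketched in a few lines. This is a minimal, hypothetical illustration: the function name, the scoring rule (negative absolute gap), and the numbers are my assumptions, not anything specified in the post.

```python
def summary_reward(predicted_eval: float, reported_happiness: list[float]) -> float:
    """Hypothetical reward for the make_readable subroutine: how closely the
    evaluation the human gave the plan summary beforehand matches the mean
    happiness the human reports while the plan is carried out."""
    mean_reported = sum(reported_happiness) / len(reported_happiness)
    return -abs(predicted_eval - mean_reported)

# A summary that oversold the plan scores worse than an honest one,
# given the same reported happiness afterwards.
honest = summary_reward(0.6, [0.55, 0.65, 0.6])
oversold = summary_reward(0.9, [0.55, 0.65, 0.6])
assert honest > oversold
```

Even under this scoring rule, the failure modes noted above remain: consequences arriving after the reporting window, harms the human never observes, and plans that directly raise reported happiness (wireheading) all score well.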

Comment by ViktoriaMalyasova on Corrigibility Can Be VNM-Incoherent · 2021-11-22T22:30:14.615Z · LW · GW

Yes, thanks for reminding me! You can also posit that the agent is omniscient from the start, so it did not change its policy due to learning. This argument proves that an agent cannot both be corrigible and maximize the same expected utility function of world states across multiple shutdowns. But it still leaves open the possibility that the agent is corrigible while rewriting its utility function after every correction.

Comment by ViktoriaMalyasova on Corrigibility Can Be VNM-Incoherent · 2021-11-22T22:22:04.788Z · LW · GW

So, let me try to summarize and check my understanding. In the first part of the post, you show that most random reward functions are not corrigible. This looks correct.

In the second part, you want to prove that VNM-coherence is incompatible with corrigibility in this universe, and I don't think I follow. So, suppose that R(A_blue), R(B_blue), R(C_blue) > max(R(A_red), R(B_red), R(C_red)). Now we change the dynamics so that the human will not correct the agent by default, but can be manipulated into it. Then we need to add states A_black and C_black, and arrows from B_black to itself, to A_black, and to C_black, to denote the states where the agent is neither corrected nor disables its shutdown behaviour. (Otherwise the agent only misbehaves because we left it no other option.)

Suppose we also have some initial reward R_init, which correction changes into reward R_new. Then we can combine these utilities as in Koen's Corrigibility with Utility Preservation: give the agent reward V_init(black node) - V_new(blue node) as soon as it gets corrected, where V_x stands for the maximum achievable return under R_x. Also, let's assign a reward R_red < min(R(A_black), R(B_black), R(C_black)) to red states, to make sure disabling the button is disincentivized.

Then the agent is not incentivized (although also not disincentivized) to manipulate the human (as long as R_init by itself did not incentivize manipulation), and is also not incentivized to disable its shutdown behaviour. It values the corrected and uncorrected states equally, and both more than the incorrigible (button-disabled) states.
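The indifference arithmetic can be checked directly. Below is a toy sketch of the construction described above; the concrete values are illustrative assumptions, chosen only to satisfy the stated inequalities, not numbers from the post.

```python
# Best achievable returns in each branch of the toy MDP (assumed values):
V_init_black = 5.0  # stay uncorrected (black states), maximizing R_init
V_new_blue = 3.0    # accept correction (blue states), maximizing R_new
V_red = 1.0         # disable the shutdown button (red states), set lowest

# Compensation paid at the moment of correction, as in the
# utility-preservation construction:
bonus = V_init_black - V_new_blue

value_if_corrected = bonus + V_new_blue
value_if_uncorrected = V_init_black

# The agent is exactly indifferent between being corrected and not,
# so manipulating the human into (or out of) correction buys nothing...
assert value_if_corrected == value_if_uncorrected
# ...while disabling the button is strictly worse than either.
assert V_red < value_if_corrected
```

The equality holds by construction for any values of V_init_black and V_new_blue, since the bonus cancels the difference; only the red-state penalty needs to be chosen explicitly.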

I am not claiming that this utility indifference approach is without problems, of course, only that it seems to work in this toy universe. Or what am I missing?

I do think the conclusion of your argument is correct. Suppose the human is going to change their mind on their own and decide to correct the agent at timestep 2, but the agent can also manipulate the human at timestep 1 and erase the memory of the manipulation, so the end results are exactly the same. A consequentialist agent should therefore evaluate both policies as equally good, so it chooses between them randomly and sometimes ends up manipulative. But a corrigible agent should not manipulate the human.