Posts

Analyzing the Problem GPT-3 is Trying to Solve 2020-08-06T21:58:56.163Z · score: 9 (5 votes)
What are the most important papers/post/resources to read to understand more of GPT-3? 2020-08-02T20:53:30.913Z · score: 25 (12 votes)
What are you looking for in a Less Wrong post? 2020-08-01T18:00:04.738Z · score: 27 (13 votes)
Dealing with Curiosity-Stoppers 2020-07-30T22:05:02.668Z · score: 46 (15 votes)
adamShimi's Shortform 2020-07-22T19:19:27.622Z · score: 4 (1 votes)
The 8 Techniques to Tolerify the Dark World 2020-07-20T00:58:04.621Z · score: 1 (11 votes)
Locality of goals 2020-06-22T21:56:01.428Z · score: 15 (6 votes)
Goal-directedness is behavioral, not structural 2020-06-08T23:05:30.422Z · score: 7 (4 votes)
Focus: you are allowed to be bad at accomplishing your goals 2020-06-03T21:04:29.151Z · score: 20 (10 votes)
Lessons from Isaac: Pitfalls of Reason 2020-05-08T20:44:35.902Z · score: 10 (4 votes)
My Functor is Rich! 2020-03-18T18:58:39.002Z · score: 10 (5 votes)
Welcome to the Haskell Jungle 2020-03-18T18:58:18.083Z · score: 14 (8 votes)
Lessons from Isaac: Poor Little Robbie 2020-03-14T17:14:56.438Z · score: 1 (6 votes)
Where's the Turing Machine? A step towards Ontology Identification 2020-02-26T17:10:53.054Z · score: 18 (5 votes)
Goal-directed = Model-based RL? 2020-02-20T19:13:51.342Z · score: 21 (8 votes)

Comments

Comment by adamshimi on Open & Welcome Thread - August 2020 · 2020-08-06T20:55:05.722Z · score: 1 (1 votes) · LW · GW

I didn't know that you were working on a new editor! In that case, it makes sense to wait.

Comment by adamshimi on Open & Welcome Thread - August 2020 · 2020-08-06T19:33:56.522Z · score: 1 (1 votes) · LW · GW

Would it be possible to have a page with all editor shortcuts and commands (maybe a cheatsheet) easily accessible? It's a bit annoying to have to look up either this post or the right part of the FAQ to find out how to do something in the editor.

Comment by adamshimi on The 8 Techniques to Tolerify the Dark World · 2020-08-03T18:40:18.882Z · score: 1 (1 votes) · LW · GW

Yes, I was more thinking about doing them by default than continuously thinking about them. If you actually do the latter, they might indeed stop working.

Comment by adamshimi on What are the most important papers/post/resources to read to understand more of GPT-3? · 2020-08-03T18:39:01.546Z · score: 3 (2 votes) · LW · GW

Thanks for the answer! I knew about the "transformer explained" post, but I was not aware of its author's position on GPT-3.

Comment by adamshimi on Tagging Open Call / Discussion Thread · 2020-08-03T18:37:15.685Z · score: 8 (2 votes) · LW · GW

Karma is nice. Maybe simply an appreciation post at some point, even one that doesn't name people, just to let them know that they are appreciated.

I don't know if that's possible, but another option might be some sort of "rank" or "badge" for top taggers. That being said, one might ask why have ranks only for this specific case, and not in general.

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-03T18:34:55.899Z · score: 1 (1 votes) · LW · GW

Thanks for the answer! I didn't think of it that way, but I actually agree that I prefer when the post crystallizes both sides of the disagreement, for example in a double crux.

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-03T18:32:39.220Z · score: 1 (1 votes) · LW · GW

I didn't know that one. It's great!

Comment by adamshimi on What Failure Looks Like: Distilling the Discussion · 2020-08-02T20:38:07.428Z · score: 1 (1 votes) · LW · GW

Thanks for this post! I think the initiative is great, and I'm glad to be able to read a summary of the discussion.

Two points, the first more serious and important than the second:

  • Concerning your summary, I think that having only the Eleven Paragraph Summary (and maybe the one-paragraph one, if you really want a short version) is good enough. Notably, I feel like you end up throwing away too many important details in the three-paragraph summary. And nine paragraphs is short enough that anyone can read it.
  • Imagine that I want to respond to a specific point that was discussed. Should I do that here or in the comments of the original post? The first option might make my comment easier to see, but it will split the discussion. (Also, it might cause an infinite recursion of distilling the discussions of the distillation of the discussion of the distillation of ... the discussion.)
Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-02T19:46:46.039Z · score: 2 (2 votes) · LW · GW

Thanks for the answer!

I see we basically vote in the same way, and it's nice to know that I'm not the only one who is sometimes lost.

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-02T19:45:44.897Z · score: 1 (1 votes) · LW · GW

Thanks for the answer!

It makes me think of this great essay by Paul Graham.

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-02T19:44:45.988Z · score: 1 (1 votes) · LW · GW

Thanks for the answer!

That's pretty interesting: do you really not care at all about the ideas themselves (except for the two topics mentioned)? A related question might be: how do you decide from the title alone whether to read a post, if you only use the meta?

Comment by adamshimi on What are you looking for in a Less Wrong post? · 2020-08-02T19:43:14.814Z · score: 1 (1 votes) · LW · GW

Thanks for the answer! I like your normal upvote policy, especially the part about helping new or inexperienced posters.

Comment by adamshimi on Dealing with Curiosity-Stoppers · 2020-08-02T18:36:09.250Z · score: 3 (2 votes) · LW · GW

Thanks! I'm glad it helped.

What pissed me off most about this "I don't want to study" feeling is that I usually choose my topics because I find them exciting. So why wouldn't I want to study them? This frustration then led me to think about curiosity-stoppers.

Comment by adamshimi on Tagging Open Call / Discussion Thread · 2020-08-02T18:33:05.315Z · score: 6 (5 votes) · LW · GW

Just commenting to say it's pretty cool to see the bar filling up and the number of tagged posts growing. Thanks to all the taggers!

Comment by adamshimi on Tagging Open Call / Discussion Thread · 2020-08-01T12:07:58.572Z · score: 12 (3 votes) · LW · GW

I wrote a first version of the new tag description, I might rewrite some parts of it later. ;)

Comment by adamshimi on Tagging Open Call / Discussion Thread · 2020-07-31T14:55:34.541Z · score: 4 (2 votes) · LW · GW

Hey! I think it's cool that you created new tags. That being said, I do think that your description of category theory is not only a stub, but completely uninformative and dismissive of a mathematical field with almost 70 years of work behind it. I do think that explaining the controversy over the applicability of category theory is valuable, as we should question whether to use it for rationality and AI. But that should be a note at the end of the tag description, not the entire content of it.

(Note that I didn't change the tag description, because I don't want to force a change if I'm the only one thinking that. Maybe the point is only to describe how the word in the tag is used in LW, in which case the current tag description might work.)

Comment by adamshimi on The 8 Techniques to Tolerify the Dark World · 2020-07-24T00:13:29.704Z · score: 1 (1 votes) · LW · GW

On the contrary, that comment is great! Nothing is more frustrating, when posting here, than seeing the karma go down without anyone pointing out what they think is wrong.

I wrote this as an outlet for my frustration with my own coping mechanisms, and it was thus pretty tongue-in-cheek. But I see how a reader could feel attacked or mocked. Maybe this shows that the value of this piece was in writing it, not in anyone else actually reading it.

Comment by adamshimi on adamShimi's Shortform · 2020-07-22T19:19:28.140Z · score: 1 (1 votes) · LW · GW

A month after writing my post on Focus as a crucial component of goal-directedness, I think I see its real point more clearly. You can decompose the proposal in my post into two main ideas:

  • How much a system S is trying to accomplish a goal G can be captured by the distance of S to the set of policies maximally goal-directed towards G.
  • The set of policies maximally directed towards G is the set of policies trained by RL (for every amount of resource above a threshold) on the reward corresponding to G.

The first idea is what focus is really about. The second doesn't work as well, and multiple people pointed out issues with it. But I still find it powerful that focus on a goal measures similarity with a set of policies that only try to accomplish this goal.

Now the big question left is: can we define the set of policies maximally goal-directed towards G in a clean way that captures our intuitions?
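To make the first idea a bit more concrete, here is a minimal sketch of focus as distance to a set of policies. Everything in it is an illustrative assumption on my part: the particular behavioral distance, the toy policies, and the way the maximally goal-directed set is given are placeholders, not a settled definition.

```python
import numpy as np

def behavioral_distance(pi1, pi2, states):
    """Average total-variation distance between two policies' action
    distributions over a fixed set of states (one possible choice of metric)."""
    tv = [0.5 * np.abs(pi1[s] - pi2[s]).sum() for s in states]
    return float(np.mean(tv))

def focus(policy, goal_directed_set, states):
    """Focus towards a goal G: one minus the distance to the closest policy
    in the (hypothetical) set of policies maximally goal-directed towards G."""
    d = min(behavioral_distance(policy, pi, states) for pi in goal_directed_set)
    return 1.0 - d

# Toy example: 3 states, 2 actions; a policy maps each state to an action distribution.
states = [0, 1, 2]
pi_G_1 = {s: np.array([1.0, 0.0]) for s in states}    # one way of pursuing G: always action 0
pi_G_2 = {s: np.array([0.0, 1.0]) for s in states}    # another way of pursuing G: always action 1
candidate = {s: np.array([0.9, 0.1]) for s in states}  # mostly takes action 0

print(focus(candidate, [pi_G_1, pi_G_2], states))  # 0.9: high focus on G
```

The open question above is then exactly how to construct the goal-directed set in a principled way, rather than handing it to the function as I do here.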

Comment by adamshimi on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T19:06:29.446Z · score: 1 (1 votes) · LW · GW
I think it's important to take a step back and notice how AI risk-related arguments are shifting.
In the sequences, a key argument (probably the key argument) for AI risk was the complexity of human value, and how it would be highly anthropomorphic for us to believe that our evolved morality was embedded in the fabric of the universe in a way that any intelligent system would naturally discover.  An intelligent system could just as easily maximize paperclips, the argument went.
No one seems to have noticed that GPT actually does a lot to invalidate the original complexity-of-value-means-FAI-is-super-difficult argument.

As far as I can see, GPT-3 did absolutely nothing to invalidate the argument about complexity of value. GPT-3 is able to correctly predict the kind of things we want it to predict within a context window of at most 1000 words, on a very small time scale. So it can predict what we want it to do in basically one or a couple of abstract steps. That seems to guarantee nothing whatsoever about the ability of GPT-3 to infer our exact values at the time scales and levels of complexity relevant to human-level AI, let alone AGI.

But I'm very interested in any experiment that seems to invalidate this point.

Comment by adamshimi on Alignment proposals and complexity classes · 2020-07-17T21:04:23.335Z · score: 3 (2 votes) · LW · GW

You're right, after rereading the proofs and talking with Evan, the only way for H to be polynomial time is to get oracle access to M. Which is slightly unintuitive, but makes sense because the part of the computation that depends on H and not on M is indeed polynomial time.

Comment by adamshimi on Alignment proposals and complexity classes · 2020-07-16T23:17:20.012Z · score: 3 (2 votes) · LW · GW

Yes, this says that humans can solve any problem in P. It says nothing about using M as an oracle.

Comment by adamshimi on Alignment proposals and complexity classes · 2020-07-16T23:06:33.830Z · score: 3 (2 votes) · LW · GW
The post assumes that humans can solve all problems in P (in fact, all problems solvable in polynomial time given access to an oracle for M), then proves that various protocols can solve tricky problems by getting the human player to solve problems in P that in reality aren't typical human problems to solve.

I don't see where the post says that humans can solve all problems solvable in polynomial time given access to an oracle. What Evan does is simply replace humans (a rather fuzzy category) with polynomial-time algorithms (the consensus notion of tractability in complexity theory).

Premise 1: We can't build physical machines that solve problems outside of P.

On another note, you're writing your comment on a physical device that can solve any computable problem, which obviously includes problems outside of P (if only by the time hierarchy theorem). The value of P is that it captures the problems one can solve on physical machines in a way that scales reasonably, so that you don't need the lifespan of the universe to compute the answer for an instance of size 1000. But we clearly have algorithms for problems outside of P. We just believe/know they will take forever and/or too much space and/or too much energy.
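To make that last point concrete, this is just the standard textbook separation (nothing specific to the post): by the time hierarchy theorem,

\[
\mathsf{P} \;=\; \bigcup_{k \ge 1} \mathsf{DTIME}\!\left(n^{k}\right) \;\subsetneq\; \mathsf{EXPTIME} \;=\; \bigcup_{k \ge 1} \mathsf{DTIME}\!\left(2^{n^{k}}\right),
\]

so any EXPTIME-complete problem is computable on an ordinary physical machine, just not in time that scales reasonably with the instance size.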

Comment by adamshimi on Alignment proposals and complexity classes · 2020-07-16T14:30:16.678Z · score: 6 (4 votes) · LW · GW

Nice, an excuse to apply my love for Computational Complexity to useful questions of AI Safety!

That being said, I'm a little confused with what you show.

Second, we'll say that a proposal to train a model M using a loss function L accesses a complexity class C iff, for any decision problem D in C, there exists some strategy available to H such that, for any M which is optimal under L given H's strategy, M always solves D.

I follow up to there. What you're showing is basically that you can train M to solve any problem in C using a specific alignment approach, without limiting in any way the computational power of M. So it might take an intractable amount of resources, like exponential time, for this model to solve a problem in PSPACE, but what matters here is just that it does. The point is to show that alignment approaches using a polynomial-time human can solve these problems, not how many resources they will use to do so.

And then you write about the intuition:

Thus, conceptually, a proposal accesses C if there is a (polynomial-time) strategy you can implement such that—conditional on you knowing that the model is optimal—you would trust its output for any problem in C.

Maybe it's just the grammar, but I read this sentence as saying that I trust the output of the polynomial-time strategy, and thus that you can solve PSPACE, NEXP, EXP and R problems in polynomial time. So I'm assuming that you mean trusting the model, which once again has no limit on the resources it uses.

Irving et al. actually note that, if you don't imagine optimal play and simply restrict to the set of problems that a polynomial-time human can actually judge, debate only reaches NP rather than PSPACE.

I looked for that statement in the paper, failed to find it, then realized you probably meant that raw polynomial-time verification gives you NP (the certificate characterization of NP, basically). Riffing on the importance of optimal play: Irving et al. show that the debate game is a game in the complexity-theoretic sense, and thus that it is equivalent to TQBF, a PSPACE-complete problem. But when a closed formula is seen as a game, deciding whether it is in TQBF amounts to deciding whether the first player has a winning strategy. Debate solves this by assuming optimal play, and thus that the winning debater will have, find, and apply a winning strategy for the debate.
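To illustrate the "closed formula as a game" reading, here is a toy sketch of my own (not the construction from the paper): a naive recursive evaluator for a closed QBF decides whether the first, existential player has a winning strategy, using space linear in the number of variables, which is the standard intuition for why optimal play lands in PSPACE.

```python
def eval_qbf(quantifiers, matrix, assignment=()):
    """Evaluate a closed QBF. `quantifiers` lists 'E' or 'A' per variable,
    outermost first; `matrix` maps a full assignment (tuple of bools) to a bool.
    Recursion depth, and hence space, is linear in the number of variables."""
    i = len(assignment)
    if i == len(quantifiers):
        return matrix(assignment)
    branches = (eval_qbf(quantifiers, matrix, assignment + (b,)) for b in (False, True))
    # 'E' is a move by the player claiming truth: one winning move suffices.
    # 'A' is a move by the opponent: every reply must still be winning.
    return any(branches) if quantifiers[i] == 'E' else all(branches)

# Exists x. Forall y. (x or y) and (x or not y)  --  true, by playing x = True.
print(eval_qbf(['E', 'A'], lambda a: (a[0] or a[1]) and (a[0] or not a[1])))  # True
```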

I find it fun to compare this to IP, the class of problems with interactive proofs, which is also equal to PSPACE. An interactive proof lets a computationally powerful prover convince a polynomial-time verifier of the existence of a winning strategy for the problem (TQBF, for example). As Scott Aaronson writes in "Quantum Computing Since Democritus":

I won’t go through Shamir’s result here, but this means that, if a super-intelligent alien came to Earth, it could prove to us whether white or black has the winning strategy in chess, or if chess is a draw.

This seems more powerful than debate, like a meta level above. Yet it isn't, and that makes sense: showing you that there is a winning strategy is powerful, but being guaranteed to always win whenever a winning strategy exists is almost as good.

With all that, I've not gotten into your proofs at all. I'm reading them in detail, but I'll have to take a bit more time before having constructive comments on them. ^^

A notation question I can already ask though: what is ?

Comment by adamshimi on Classification of AI alignment research: deconfusion, "good enough" non-superintelligent AI alignment, superintelligent AI alignment · 2020-07-15T09:45:30.580Z · score: 15 (7 votes) · LW · GW

Good idea to write down what you think! As someone who is moving toward AI Safety, and who has spent all this year reading, studying, working and talking with people, I disagree with some of what you write.

First, I feel that your classification comes from your decision that deconfusion is most important. I mean, your second category literally only contains things you describe as "hacky" and "not theory-based" without much differentiation, and it's clear that you believe theory and elegance (for lack of a better word) to be important. I also don't think that the third cluster makes much sense, as you point out that most of it lies in the second one too. Even deconfusion aims at solving alignment, just in a different way.

A dimension I find more useful is the need for understanding what's happening inside. This scale goes from MIRI's embedded agency approach (on the "I need to understand everything about how the system works, at a mathematical level, to make it aligned" end) to prosaic AGI on the other side (on the "I can consider the system as a black box that behaves according to some incentives and build an architecture using it that ensures alignment" end). I like this dimension because I feel a lot of my gut-feeling about AI Safety research comes from my own perspectives on the value of "understanding what happens inside", and how mathematical this understanding must be.

Here's an interesting discussion of this distinction.

How do I classify the rest? Something like that:

  • On the embedded agency end of the spectrum, things like Vanessa's research agenda and Stuart Armstrong's research agenda. Probably anything that fits in agent foundations too.
  • In the middle, I think of DeepMind's research about incentives, Evan Hubinger's research about inner alignment and myopia, and probably all the cool things about interpretability, like the Clarity team's work at OpenAI (see this post for an AI Safety perspective).
  • On the prosaic AGI end of the spectrum, I would put IDA and AI Safety via debate, Drexler's CAIS, and probably most of CHAI's published research (although I am less sure on that, and would be happy to be corrected)

Now, I want to say explicitly that this dimension is not a value scale. I'm not saying that either end is more valuable in and of itself. I'm simply pointing at what I think is an underlying parameter in why people work on what they work on. Personally, I'm more excited about the middle and the embedded agency end, but I still see value in and am curious about the other end of the spectrum.

It's easier to learn all the prerequisites for type-2 research and to actually do it.

I wholeheartedly disagree. I think you point out a very important aspect of AI Safety: the prerequisites are all over the place, sometimes completely different for different approaches. That being said, which prerequisites are easier to learn is more about personal fit and background than intrinsic difficulty. Is it inherently harder to learn model theory than statistical learning theory? Decision theory than neural networks? What about psychology or philosophy? What you write feels like a judgement stemming from "math is more demanding". But try to understand all the interpretability work, and you'll realize that even if it lacks deep mathematical theorems, it still requires a tremendous amount of work to grok.

(Also, as another argument against your groupings, even the mathematical snobs would not put Vanessa's work in the "fewer prerequisites" section. I mean, she uses measure theory, Bayesian statistics, online learning and category theory, among others!)

So my position is that thinking in terms of your beliefs about "how much one needs to understand the insides to make something work" will help you choose an approach to try your hand at. It's also pretty cheap to just talk to people and try a bunch of different approaches to see which ones feel right.

A word on choosing the best approach: I feel like AI Safety as it is doesn't help at all with that. Because of the different prerequisites, understanding any approach deeply requires a time investment that triggers the sunk-cost fallacy when evaluating the value of that approach. Also, I think it's very common to judge an approach from the perspective of one's current approach, which might color the judgement in incorrect ways. The best strategy I know of, which I try to apply, is to try something and update regularly on its value compared to the rest.

Comment by adamshimi on Kelly Bet on Everything · 2020-07-11T20:58:12.477Z · score: 1 (1 votes) · LW · GW

Nice post. I might try to use this idea to force myself to make more bets socially. I'm risk-taking in terms of ideas and creation and jobs, but not enough in terms of talking to new people and flirting. Forcing myself to start a conversation with a stranger every day is one way I'm trying to solve that; thinking about the rationality of the bet might become another.

Comment by adamshimi on Tradeoff between desirable properties for baseline choices in impact measures · 2020-07-08T17:22:01.947Z · score: 1 (1 votes) · LW · GW

When you say "shutdown avoidance incentives", do you mean that the agent/system will actively try to avoid its own shutdown? I'm not sure why comparing with the current state would cause such a problem: the state with the least impact seems like the one where the agent lets itself be shut down, since otherwise it would go against the will of another agent. That's how I understand it, but I'm very interested in knowing where I'm going wrong.

Comment by adamshimi on Tradeoff between desirable properties for baseline choices in impact measures · 2020-07-08T17:17:31.552Z · score: 3 (2 votes) · LW · GW

I understood that the baseline that you presented was a description of what happens by default, but I wondered if there was a way to differentiate between different judgements on what happens by default. Intuitively, killing someone by not doing something feels different from not killing someone by not doing something.

So my question was a check to see if impact measures considered such judgements (which apparently they don't) and if they didn't, what was the problem.

Comment by adamshimi on DARPA Digital Tutor: Four Months to Total Technical Expertise? · 2020-07-07T21:50:59.277Z · score: 3 (3 votes) · LW · GW

I'm pretty sure expert systems are considered AI, if only because they were created by AI researchers. They're not using ML though, and it's not considered likely today that they will scale to human level AI or AGI.

Comment by adamshimi on Locality of goals · 2020-07-07T16:49:55.893Z · score: 3 (2 votes) · LW · GW

The more I think about it, the more I come to believe that locality is very related to abstraction. Not the distance part necessarily, but the underlying intuition. If my goal is not "about the world", then I can throw away almost all information about the world except a few details and still be able to check my goal. The "world" of the thermostat is in that sense a very abstracted map of the world, where anything except the number on its sensor is thrown away.

Comment by adamshimi on Focus: you are allowed to be bad at accomplishing your goals · 2020-07-07T16:44:36.571Z · score: 2 (2 votes) · LW · GW

Sorry for the delay in answering.

Your paper looks great! It seems to tackle in a clean and formal way what I was vaguely pointing at. We're currently reading a lot of papers and blog posts to prepare for an in-depth literature review about goal-directedness, and I added your paper to the list. I'll try to come back here and comment after I read it.

Comment by adamshimi on Focus: you are allowed to be bad at accomplishing your goals · 2020-07-07T16:39:29.939Z · score: 1 (1 votes) · LW · GW

Sorry for the delay in answering.

In this post, I assume that a policy is a description of its behavior (like a function from states to actions, or to distributions over actions), and thus the distances mentioned indeed capture behavioral similarity. That being said, you're right that a similar concept of distance between the internal structures of policies would prove difficult, eventually butting up against uncomputability.

Comment by adamshimi on Tradeoff between desirable properties for baseline choices in impact measures · 2020-07-07T00:16:51.169Z · score: 3 (2 votes) · LW · GW

In the specific example of the car, can't you compare the impact of the two next states (the baseline and the result of braking) with the current state? Killing someone should probably be considered a bigger impact than braking (and I think it is for attainable utility).

But I guess the answer is less clear-cut for cases like the door.

Comment by adamshimi on Tradeoff between desirable properties for baseline choices in impact measures · 2020-07-06T22:22:51.149Z · score: 3 (2 votes) · LW · GW

Thanks for the post!

One thing I wonder: shouldn't an impact measure give a value to the baseline? What I mean is that in the most extreme examples, the tradeoffs you show arise because sometimes the baseline is "what should happen" and other times the baseline is "what should not happen" (like killing a pedestrian). In cases where the baseline sucks, one should act differently; and in cases where the baseline is great, changing it should come with a penalty.

I assume that there's an issue with this picture. Do you know what it is?

Comment by adamshimi on Goals and short descriptions · 2020-07-03T18:32:08.119Z · score: 1 (1 votes) · LW · GW

Maybe the criterion that removes this specific policy is locality? What I mean is that this policy has a goal only on its output (which action it chooses), and thus a very local goal. Since the intuition of goals as short descriptions assumes that goals are "part of the world", maybe this only applies to non-local goals.

Comment by adamshimi on Open & Welcome Thread - June 2020 · 2020-07-02T22:24:49.311Z · score: 1 (1 votes) · LW · GW

Am I the only one for whom all comments in the Alignment Forum have 0 votes?

Comment by adamshimi on Locality of goals · 2020-07-02T22:14:34.095Z · score: 1 (1 votes) · LW · GW

No worries, that's a good answer. I was just curious, not expecting a full-fledged system. ;)

Comment by adamshimi on Locality of goals · 2020-07-02T21:09:56.000Z · score: 1 (1 votes) · LW · GW

Thanks for the summary! It's representative of the idea.

Just out of curiosity, how do you decide which posts/papers you want to write an opinion on?

Comment by adamshimi on Thoughts as open tabs · 2020-06-30T16:18:21.348Z · score: 3 (3 votes) · LW · GW

I think the collection part of GTD addresses exactly this problem. There are two parts:

  • You want to free your brain by writing what you want to do
  • You want to stop feeling like you forgot to write something down

The way proposed by GTD is to collect EVERYTHING. The goal is really to not have any commitment or desire stored internally, but collect everything outside of your brain. This solves the first problem if you give enough details, and the second problem when your brain learns that it can always find what it needs from your notes.

Anecdotally, it works for me.

Comment by adamshimi on Thoughts as open tabs · 2020-06-29T23:03:16.131Z · score: 2 (2 votes) · LW · GW

I've been using GTD for some time now, and the injunction to put every thought about something to be done into a collection device (I have a page on Roam and a file in the note-taking app on my phone) is really powerful. I never noticed how much more focus and clarity were possible when everything I want to do is written somewhere, so that I don't need to keep mental tabs on it.

To go back to the metaphor of this post, forcing myself to close every tab that is not directly in use is one of the best productivity hacks I've learned.

Comment by adamshimi on Locality of goals · 2020-06-23T21:00:10.209Z · score: 3 (2 votes) · LW · GW

Thanks! Glad that I managed to write something that was not causally or rhetorically all wrong. ^^

One related thing I was thinking about last week: part of the idea of abstraction is that we can pick a Markov blanket around some variable X, and anything outside that Markov blanket can only "see" abstract summary information f(X). So, if we have a goal which only cares about things outside that Markov blanket, then that goal will only care about f(X) rather than all of X

That makes even more sense to me than you might think. My intuitions about locality come from its uses in distributed computing, where it measures both how many rounds of communication are needed to solve a problem and how far in the communication graph one needs to look to compute one's own output. This looks like my use of locality here.

On the other hand, recent work on distributed complexity also studied the volume complexity of a problem: the size of the subgraph one needs to look at, which might be very different from a ball. The only real constraint is connectedness. Modulo the usual "exactness issue", which we can deal with by replacing "the node is not used" by "only f(X) is used", this looks a lot like your idea.
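For readers who don't know the distributed-computing notion, here is a toy sketch of my own (an illustration of the LOCAL model, not anything from the post or from a specific paper): after r synchronous rounds of communication, a node's output can only depend on its radius-r ball in the communication graph, which is exactly the "how far one needs to look" reading of locality.

```python
def radius_ball(graph, node, r):
    """Nodes within distance r of `node`, with `graph` an adjacency dict."""
    ball, frontier = {node}, {node}
    for _ in range(r):
        frontier = {v for u in frontier for v in graph[u]} - ball
        ball |= frontier
    return ball

def local_algorithm(graph, r, rule):
    """Toy LOCAL-model algorithm: every node applies `rule` to the subgraph
    induced by its radius-r ball, i.e. to everything it can see after r rounds."""
    outputs = {}
    for node in graph:
        ball = radius_ball(graph, node, r)
        visible = {u: set(graph[u]) & ball for u in ball}
        outputs[node] = rule(node, visible)
    return outputs

# Example rule on a path graph: label each node by the parity of the edges it sees.
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
parity = lambda node, visible: sum(len(vs) for vs in visible.values()) // 2 % 2
print(local_algorithm(graph, r=1, rule=parity))  # {0: 1, 1: 0, 2: 0, 3: 1}
```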

Comment by adamshimi on When is it Wrong to Click on a Cow? · 2020-06-21T00:42:37.011Z · score: 9 (6 votes) · LW · GW

One intuition I have for the difference is the consuming/producing axis: the stimmer is purely consuming, the video game player is half consuming, half producing a narrative/experience, and the player is mostly producing something.

I'm not as sure why this should be morally relevant, but I do feel that producing is more "moral" than consuming, in general.

Comment by adamshimi on Specification gaming: the flip side of AI ingenuity · 2020-06-19T18:23:43.729Z · score: 3 (2 votes) · LW · GW

Ok, that makes much more sense. I was indeed assuming a proportional reward.

Comment by adamshimi on What should I teach to my future daughter? · 2020-06-19T16:52:49.619Z · score: 3 (2 votes) · LW · GW

I don't have kids, but if I had some, I would want to implement the "Hard Thing Rule" given by Angela Duckworth in her book "Grit".

It boils down to 3 rules:

  • Everyone in the family must do one hard thing, one thing that requires deliberate practice every day. This can be yoga, programming, ballet, a lot of things really.
  • You can quit your hard thing, but only at "natural" stopping points (the end of the season, the end of the school year,...)
  • You choose your hard thing.

I really like this idea, because it teaches work ethic, grit and consistency in a way that respects individual differences.

Comment by adamshimi on Public Static: What is Abstraction? · 2020-06-13T20:16:11.184Z · score: 3 (2 votes) · LW · GW

Thanks a lot for this compressed summary! As someone who tries to understand your work, but is sometimes lost within your sequence, this helps a lot.

I cannot comment on the maths in any interesting way, but I feel that your restricted notion of abstraction -- where the high-level summary captures what's relevant to "far away" variables -- works very well with my intuition. I like that it works with "far away" in time too, for example in abstracting current events as memories for future use.

Comment by adamshimi on Focus: you are allowed to be bad at accomplishing your goals · 2020-06-13T13:27:04.140Z · score: 2 (2 votes) · LW · GW

I think one very important thing you are pointing out is that I did not mention the impact of the environment. Because to train using RL, there must be some underlying environment, even just as a sample model. This opens up a lot of questions:

  • What happens if the actual environment is known by the RL process and the system whose focus we are computing?
  • What happens when there is uncertainty over the environment?
  • Given an environment, which goals have entangled focus (basically your example: high focus on one implies high focus on the other)?

As for your specific example, I assume that the distance converges to 0 because intuitively the only difference lies in the action at state s_k (go back to 0 for the first reward and increment for the second), and this state is visited in smaller and smaller proportion as N goes to infinity.

This seems like a perfect example of two distinct goals with almost maximal focus, and similar triviality. As mentioned in the post, I don't have a clear cut intuition on what to do here. I would say that we cannot distinguish between the two goals in terms of behavior, maybe.

Comment by adamshimi on Goal-directedness is behavioral, not structural · 2020-06-12T22:23:55.756Z · score: 3 (2 votes) · LW · GW

You make a good point. Actually, I think I answered a bit too fast, maybe because I was on the defensive (given the content of your comment). We probably are actually trying to capture the intuitive notion of goal-directedness, in the sense that many of our examples, use-cases, intuitions and counter-examples draw on humans.

What I reacted against is a focus solely on humans. I do think that goal-directedness should capture/explain humans, but I also believe that studying simpler settings/systems will provide many insights that would be lost in the complexity of humans. It's in that sense that I think the bulk of the formalization/abstraction work should focus less on humans than you implied.

There is also the fact that we want to answer some of the questions raised by goal-directedness for AI safety. And thus, even if the complete picture is lacking, having a theory capturing this aspect would already be big progress.

Comment by adamshimi on Goal-directedness is behavioral, not structural · 2020-06-12T20:29:22.253Z · score: 3 (2 votes) · LW · GW

Thanks for the comment, and sorry for taking that long to answer, I had my hands full with the application for the LTFF.

Except for your first one (I go into that below), I agree with all your criticisms of my argument. I also realized that the position opposite mine is not that we care about something other than the behavior, but that specifying what matters in the behavior might require thinking about the insides. I still disagree, but I don't think I have conclusive arguments for that debate. The best I can do is try to do it and see if I fail.

About your first point:

First, the two questions considered are both questions about goal-directed AI. As I see it, the most important reason to think about goal-directedness is not that AI might be goal directed, but that humans might be goal directed. The whole point of alignment is to build AI which does what humans want; the entire concept of "what humans want" has goal directedness built into it. We need a model in which it makes sense for humans to want things, in order to even formulate the question "will this AI do what humans want?". That's why goal directedness matters.

Well, the questions I care about (and the ones Rohin asked) are actually about goal-directed AI. It's about whether it must be goal-directed, and whether making it not/less goal-directed improves its safety. So I'm clearly not considering "what humans want" first, even if it would be a nice consequence.

Comment by adamshimi on Blog Post Day III · 2020-06-12T16:44:28.427Z · score: 3 (2 votes) · LW · GW

This time I'll be there!

Comment by adamshimi on Turns Out Interruptions Are Bad, Who Knew? · 2020-06-12T03:24:03.779Z · score: 3 (2 votes) · LW · GW

I agree that he is very suspicious of the value of social media. But much of Digital Minimalism is written acknowledging that people do extract value from digital interactions (among them social media). It's simply an approach to extract as much of the benefits as possible while minimizing the costs, which seems like what you want.

So I would still recommend that you check it out.

Comment by adamshimi on Goal-directedness is behavioral, not structural · 2020-06-11T23:24:47.563Z · score: 3 (2 votes) · LW · GW

After talking with Evan, I think I understand your point better. What I didn't understand was that you seemed to argue that something other than the behavior mattered for goal-directedness. But as I understand it now, what you're saying is that, yes, the behavior is what matters, but extracting the relevant information from the behavior is really hard. And thus you believe that computing goal-directedness in any meaningful way will require normative assumptions about the cognition of the system, at an abstract level.

If that's right, then I would still disagree with you, but I think the case for my position is far less settled than I assumed. I believe there are lots of interesting parts of goal-directedness that can be extracted from the behavior only, while acknowledging that historically, it has been harder to compute most complex properties of a system from behavior alone.

If that's not right, then I propose that we schedule a call sometime, to clarify the disagreement with more bandwidth. Actually, even if it's right, I can call to update you on the research.