Reinforcement and Short-Term Rewards as Anti-Akratic

post by Intrism · 2013-04-13T20:47:29.405Z · LW · GW · Legacy · 27 comments

Related: Time and Effort Discounting, Akrasia, Hyperbolic Discounting, and Picoeconomics, The Power of Reinforcement, Basics of Animal Reinforcement, Basics of Human Reinforcement

I built a robot that feeds me candy when I get work done, to try to solve my akrasia problem. And, so far, it seems like it might actually work.

Naturally, the story starts with procrastination. I finish things the night before they're due. Or, sometimes, I don't. I'd like to fix that. One theory explains procrastination as a result of discounting, the idea that human brains discount long-term rewards in favor of short-term ones. For instance, my brain prefers watching Neon Genesis Evangelion now, even at the cost of nearly missing my project deadline in a few days. The same principle applies to consequences, and there are already tools like Beeminder that are built to combat it. Its tagline, "bring long-term consequences near," is a very concise description of a clever way to short-circuit discounting. It's very interesting, but I'm not really comfortable with paying money as a consequence. Instead, I'm going to try a similar technique: bringing long-term rewards near.

There are already a lot of techniques for bringing long-term rewards near. Generally, they're called reinforcement learning. The classic reward in reinforcement is candy, which seems like a good idea: I like it, and I'm more than willing to abuse my youthful metabolism for productivity. And, in fact, there are a wide variety of folk solutions of that sort - advice to reward yourself with some candy once your work is done. I've tried those already, but they never seem to work out for me - I always seem to wind up cheating. I need to do something trickier.

CFAR describes reinforcement in a very striking way in some of their course materials: they call it "training your inner pigeon." Not only is that a nice, snappy turn of phrase, it illustrates the problem with attempting to self-administer rewards very nicely. Did Skinner's pigeons self-administer their rewards? No, of course they didn't. I shouldn't expect my inner pigeon to, either. So, my next step is to build a robot that gives me candy when I get stuff done.

Why do I think I can keep from cheating on the machine, when I couldn't restrain myself from cheating on regular old bags of candy? Well, I'm far from certain; it's my biggest worry with the project, in fact. But I am reasonably confident, because the machine will give me an easy way to establish a Schelling fence. Where taking a handful of candy out of the bag is sometimes right and sometimes wrong, taking a handful of candy out of the hopper is always wrong, since the machine will dispense the candy when I deserve it. Precommitting to never take candy out of the machine seems like it'll be a lot easier than precommitting to only sometimes take candy out of the bag.

Now, the description "robot" for my machine is a bit fanciful. It's actually an automatic dog feeder, modified and connected to the Internet. It has a small screen mounted on the front, which tells me how many rewards I've earned. If I've got any, I can press a button on the screen to dispense them. Not counting parts I already owned, the device cost me around $50 to build. To provide the data, I linked the device to an earlier productivity hack that I already had around, a custom webapp integrating a task list with a Pomodoro timer.

Rewards are given based on a few simple rules. When I finish a task early, it gives me the number of days early in rewards; if I finish tasks out of order, it gives me the nearer task's number of rewards, so I've got an incentive to finish tasks in order. I also get an extra reward for my first Pomodoro in a week for each of my projects, so that I have an incentive not to forget old projects. The system can also take away rewards. If I get distracted during a Pomodoro, I lose a reward. I'm blocked from redeeming rewards if I have a task within a day of its deadline. If I finish a task more than a day late, I lose any rewards in the system.
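
As a rough illustration of how these rules fit together, here is a minimal Python sketch. It is not the actual webapp code: the function names are hypothetical, and it assumes that "the nearer task's number of rewards" means the days-early count of the open task with the earliest deadline. Flooring the balance at zero is also an assumption, and the weekly first-Pomodoro bonus is left out for brevity.

```python
from datetime import date
from typing import Iterable


def completion_reward(done: date, due: date, nearest_due: date) -> int:
    """Rewards for finishing a task on `done`.

    `nearest_due` is the earliest deadline among open tasks (including this
    one). Finishing out of order is capped at what the nearest task would
    have earned. (Assumed interpretation of the rule.)
    """
    days_early = (due - done).days
    nearest_days_early = (nearest_due - done).days
    return max(0, min(days_early, nearest_days_early))


def apply_completion(balance: int, done: date, due: date, nearest_due: date) -> int:
    """New balance after finishing a task."""
    if (done - due).days > 1:
        return 0  # more than a day late: lose every stored reward
    return balance + completion_reward(done, due, nearest_due)


def distraction_penalty(balance: int) -> int:
    """Getting distracted during a Pomodoro costs one reward (floored at zero)."""
    return max(0, balance - 1)


def can_redeem(balance: int, today: date, open_deadlines: Iterable[date]) -> bool:
    """Redemption is blocked while any open task is within a day of its deadline."""
    blocked = any((d - today).days <= 1 for d in open_deadlines)
    return balance > 0 and not blocked
```

Under these assumptions, finishing a task three days early with nothing due sooner earns three rewards, while finishing a task six days early when another task is due in two days earns only two.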

Results have been mixed so far. My greatest concern seems to have been unjustified: I haven't cheated on the machine once. However, it seems like the rules need some more work. The system has definitely helped some, but there are a lot of problems that need fixing.

The system doesn't account for the difficulty of tasks, meaning that I get more reward for less effort if I do easier work. As a result, I've done all of the reading up to next Tuesday for my literature class, but my Computer Science assignment due on Friday is unfinished, and my "research" for an exceptionally abhorrent humanities course is languishing on the vine.

The point of the system was to bring long-term rewards near, but there are a lot of circumstances in which it doesn't seem to bring them quite near enough. For deadlined tasks, I get no rewards until I've actually completed the task; if I think a task will take me more than a day to finish, that's more than a day of work which earns me no short-term rewards. This gets even worse if I happen to have a long task (or many short tasks) that has reached the day before its deadline. Then, I don't get any rewards until I finish all of those tasks. While this is quite motivating, it's still a long-term motivation, which is to say it doesn't work very well.

I deliberately built the system to encourage doing tasks in order, but this seems to have backfired a little bit. Since I would be giving up rewards, I don't want to work on a task that's due later if there's another that's due sooner. However, if I really don't want to do the nearer task, I'll end up wasting time, since I get no rewards for that either way. Nyan_sandwich describes a similar failure mode in his Akrasia Case Study: if I know I have something more urgent to do, but I don't want to do it, I wind up procrastinating instead of doing less urgent things.

I get sick of candy more quickly than I expected. The portion my machine emits (about a small handful) tends to stop motivating me after about 4 in a day. Additionally, I seem to be entirely incapable of pacing myself; if the reward is in the system, I tend not to wait very long before using it. This has crippled all of the rules that involve taking away rewards - unless the rewards are blocked, they don't stick around in the system long enough to be taken away.

Not all of the things I want to change are a result of problems, though. There are a wide variety of interesting improvements I could make. Many of these are expansions: aside from my task list, what else can I connect to? Can I track note-taking in class? Can I set it up to reward continuing effort towards a task, like writing a few hundred words a day? Can I use it to create new, more rational habits? There are all kinds of possibilities to consider. If you've got anything you'd like to suggest, let me know - I'm open to anything interesting.

There are also a lot of techniques to research; I'm sure the program isn't nearly as effective as it could be. Operant conditioning techniques like variable-ratio schedules might help improve performance per candy. Or, I could look into gamification, basically a form of applied human operant conditioning; it's not a standard tool on the site, but if you've ever watched an experience bar rise, you know what I'm talking about. Again, if you happen to have some relevant ideas, let me know.

Obviously, I'm going to be making some rule changes in the near future. Expect another post in a few weeks about what's changed and how the changes have worked out for me.

Also, does anyone want to help me think of a good name for the system? Right now it's called the "extrinsic motivator." While descriptive, this name isn't snappy at all.

27 comments

Comments sorted by top scores.

comment by D_Malik · 2013-04-14T02:53:15.415Z · LW(p) · GW(p)

This is awesome! I'm really excited because I've been playing around with related things for a while, and by sharing techniques we can all become stronger!

So: A couple months ago I had a massive problem with my Anki reviews. I use Anki for a lot of things that aren't standard question-answer training, and this other work tends to be really mentally effortful. So, naturally, the reviews piled up until I had a massive backlog, around 10,000 reviews.

I tried a bunch of tricks; from this post ("Applying Behavioral Psychology on Myself") and elsewhere I got ideas for two Anki plugins intended to reinforce reviewing, by manipulating music volume or by showing reinforcing pictures. This was somewhat effective, but not very. Then I moved on to using candy, which worked better but still wasn't very effective.

But the latest thing I've been trying works a lot better. Look at my Anki review stats:

The technique that did that was this:

  • Reinforce reviews by giving a 20% chance of reward upon the completion of each.
  • A reward is a small piece of food or a sip of a drink.
  • Never eat or drink anything except when you've earned a reinforcer.

As you procrastinate, you become hungrier and hungrier until your desire for rewards exceeds your desire for non-work. By keeping rewards small, you remain perpetually hungry and work remains reinforcing. The brain was built to extrapolate from "I'm less hungry" to "I should do whatever I just did more often".

This is especially good for something that needs to be done regularly, like Anki reviews. If rewards and reward-probabilities are small enough, it also functions as caloric restriction. This system is also good for very granular tasks, like question-answer Anki reviews.

My non-question-answer reviews have irregular lengths and so aren't as granular, so for them I use another reinforcement system in addition to the old one: every 10 seconds, with probability ~20%, show a message in the background. When that appears, I examine my thoughts just prior to it appearing and reward if they were about work or other productive things. This also seems to work well, and works better for tasks where you don't want to incentivize rushing through things. (For question-answer pairs, success is measurable in correct responses, so generally you should rush through them, as long as you still get the correct responses.)

I also have a thing that periodically asks whether I'm in a correct posture, and standing instead of sitting, and not procrastinating sleep. If I'm in the right state, those give additional medium-sized rewards. I implemented this only 2 days ago, so I don't know if it works yet.

I strongly encourage you to try variable reinforcement, because my impression from reading things is that it's a lot better; I haven't tried non-variable reinforcement.

A similar thing I've been experimenting with is punishing unwanted behaviors, mostly by using the rubber band technique; mixed results so far.

I'm very interested in your automatic dog-feeder setup. I've been thinking about further automating my reinforcement things, and about automating punishment by buying an electroshock collar (normally used to train dogs).

How to implement some of the stuff above:

See the two Anki plugins I posted. I just put up a more basic version that shows popups instead of pictures.

On Ubuntu Linux, to show a background popup every 10 seconds (sorta), with some probability, do crontab -e and add this:

* * * * * bash -c "export DISPLAY=:0 && for i in 1 2 3 4 5; do if [ \$(( \$RANDOM / 327 )) -lt 20 ] ; then notify-send 'REINFORCE?'; (sleep 3; killall notify-osd ) & fi; sleep 10; done"

That's what I currently use; I used to use this, which shows foreground popups and should thus probably be kept commented-out most of the time:

* * * * * bash -c "export DISPLAY=:0 && for i in 1 2 3 4 5; do if [ \$(( \$RANDOM / 327 )) -lt 10 ] ; then zenity --info --text='\n\nREINFORCE?' --timeout=5 --width=1000 --height=800 & fi; sleep 10; done; if [ \$(( \$RANDOM / 327 )) -lt 10 ] ; then zenity --info --text='\n\nREINFORCE?' --timeout=5 --width=1000 --height=800 & fi"
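
The arithmetic behind the first crontab line above (the `-lt 20` variant) can be checked with a short Python sketch. It assumes bash's `$RANDOM` is a uniform integer in 0..32767, and the names below are purely illustrative.

```python
# Assumption: bash's $RANDOM is uniform on 0..32767, so $RANDOM / 327
# (integer division) lands in 0..100.
RANDOM_VALUES = range(32768)
DRAWS_PER_MINUTE = 5  # "for i in 1 2 3 4 5", one draw per 10-second sleep


def prompt_probability(threshold: int) -> float:
    """Chance that a single draw satisfies `$RANDOM / 327 -lt threshold`."""
    hits = sum(1 for r in RANDOM_VALUES if r // 327 < threshold)
    return hits / len(RANDOM_VALUES)


p = prompt_probability(20)
prompts_per_minute = DRAWS_PER_MINUTE * p
mean_draws_between_prompts = 1 / p

print(f"per-draw probability: {p:.4f}")
print(f"expected prompts per minute: {prompts_per_minute:.3f}")
print(f"mean draws between prompts: {mean_draws_between_prompts:.2f}")
```

The per-draw chance comes out just under 20% (6540 of the 32768 possible values), so a prompt shows up roughly once every five draws, or about once a minute given five draws per cron run. Calling `prompt_probability(10)` gives the corresponding figure for the `-lt 10` variant.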

If anyone is planning to use any of this, please tell me. Also please share any ideas you have, even if they don't seem useful.

Replies from: Richard_Kennaway, David_Gerard, Intrism
comment by Richard_Kennaway · 2013-04-15T13:23:39.668Z · LW(p) · GW(p)

I've been thinking about further automating my reinforcement things, and about automating punishment by buying an electroshock collar (normally used to train dogs).

I don't know if you're being serious here, but I am.

Beware of shock collars. Dogs have much thicker skin than humans, it's more loosely attached to the underlying tissue, and it's covered with fur. A mild zap for a dog might be too dangerous to apply to a human neck. I suggest contacting your local BDSM community for advice on where and how to safely give yourself electric shocks. They may also be able to advise on ways of making it impossible to take off until the time is up. Although, as I've said in a top-level comment to this post, whatever the setup, the conditionality between behaviour and reward is an exercise in role-playing. In reality you can eat and drink whatever you want whenever you want, and you are choosing to imagine a connection with doing your Anki work.

Replies from: gwern
comment by gwern · 2013-04-15T17:28:56.799Z · LW(p) · GW(p)

More importantly, with conditioning, there's always the question of what exactly are you doing? (Particularly acute in cases of positive punishment like electroshock.) As far as the suggestion goes, well... I cannot seem to find it again, but within the last 2 or 3 years I ran into a blog where the author set up an electric shock apparatus for himself hooked up to some software. His final post said (or he said when I asked, I forget) that he stopped because he wound up training himself to not use it and it was too aversive to put on.

comment by David_Gerard · 2013-04-15T11:24:11.405Z · LW(p) · GW(p)

I'm very interested in your automatic dog-feeder setup. I've been thinking about further automating my reinforcement things, and about automating punishment by buying an electroshock collar (normally used to train dogs).

At this point I thought "ok, that was an extended Modest Proposal".

comment by Intrism · 2013-04-15T14:14:31.804Z · LW(p) · GW(p)

every 10 seconds, with probability ~20%, show a message in the background. When that appears, I examine my thoughts just prior to it appearing and reward if they were about work or other productive things.

Have you had any problems with the context switching? It seems like being interrupted every ~50 seconds would make me less productive.

comment by gwern · 2013-04-13T21:17:49.276Z · LW(p) · GW(p)

Also, does anyone want to help me think of a good name for the system? Right now it's called the "extrinsic motivator." While descriptive, this name isn't snappy at all.

How about "Sir Rob the Rewarding"?

comment by Viliam_Bur · 2013-04-14T11:36:01.289Z · LW(p) · GW(p)

+1 Awesome!

I get sick of candy more quickly than I expected. The portion my machine emits (about a small handful) tends to stop motivating me after about 4 in a day.

Could you try using smaller candy?

if the reward is in the system, I tend not to wait very long before using it.

This seems OK to me. The reward should be instant, to create a better connection with the behavior you want to reward.

aside from my task list, what else can I connect to? Can I track note-taking in class? Can I set it up to reward continuing effort towards a task, like writing a few hundred words a day? Can I use it to create new, more rational habits?

I use my reward system for: exercising, getting to work early, avoiding sugar, avoiding web procrastination, meeting with people, taking an afternoon nap, writing blog articles, learning foreign languages, etc. And there is also an umbrella category for "other important stuff", such as making an appointment with a dentist or fixing something at my home.

To prevent myself from replacing a difficult task with many easier tasks, some of these rewards have a limit of 1 per day. For example, I can only take 1 afternoon nap each day. Or I get 1 point for not eating sugar all day (rewards from the system are not included in this rule). For writing N blog articles I get N points, because honestly "writing too many blog articles on the same day" never happened to me. Somewhere in between is the exercise: I can get only 1 point per day for doing a "small exercise", and then another 1 point in the same day for doing a "big exercise" too (5 times more difficult than the small one).

Note: My system does not include priorities and deadlines, so I don't know how easy it would be to include this stuff in your system. And with more rewards you really need to use smaller candies. Or perhaps giving 1 candy per N points? Or giving a candy with probability 1/N? -- Instead of truly random numbers I would recommend an algorithm that would increase the probability; for the first point, you would get the candy with a probability 1/N, but if you don't get it, the next point has a probability 1/(N-1), etc.; and if you get it, the probability is reset to 1/N again. (Truly random numbers don't feel random to humans; we emotionally expect that after a few failures the probability of a success should increase.)
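
The escalating-probability scheme described above could be sketched in Python roughly as follows. The class and method names are hypothetical, and the random source is injectable only so the behaviour can be checked reproducibly.

```python
import random
from typing import Optional


class EscalatingDispenser:
    """Chance of candy starts at 1/n and rises after every miss."""

    def __init__(self, n: int, rng: Optional[random.Random] = None) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.remaining = n  # current denominator: chance is 1/remaining
        self.rng = rng or random.Random()

    def award_point(self) -> bool:
        """Register one earned point; return True if a candy should drop."""
        # random() is in [0, 1), so a chance of 1/1 always succeeds.
        hit = self.rng.random() < 1 / self.remaining
        # Reset to 1/n after a candy, otherwise move to 1/(remaining - 1).
        self.remaining = self.n if hit else self.remaining - 1
        return hit


dispenser = EscalatingDispenser(n=4)
results = [dispenser.award_point() for _ in range(12)]
```

Two properties follow from the arithmetic: a candy is guaranteed within N points, and the winning point is equally likely to be any of the N positions, so the average is (N+1)/2 points per candy rather than the N points of a flat 1/N chance.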

Meta: It is great that you describe what works and what doesn't work. With enough articles like this, someone could make a review of motivational systems, describe the good and bad parts confirmed by many different systems, and make recommendations for people who want to create their own system.

Replies from: Intrism, mare-of-night
comment by Intrism · 2013-04-15T00:00:30.688Z · LW(p) · GW(p)

Could you try using smaller candy?

The way the feeder is built, that wouldn't really help. It dispenses a constant volume, not a set number of candies. I could try to reduce the dispensed volume further, but I think other techniques would be best to try first.

if the reward is in the system, I tend not to wait very long before using it.

This seems OK to me.

It's not a problem except insofar as it interferes with some of the rules.

Or perhaps giving 1 candy per N points? Or giving a candy with probability 1/N?

These are the two big options I'm considering for next time. I'm leaning towards the "1 candy per N points" model, because that allows me to "gamify" the system with a big XP bar.

Replies from: whateverfor
comment by whateverfor · 2013-04-15T06:03:12.733Z · LW(p) · GW(p)

You could try "adulterating" the candy with something non-edible, like colored beads. It would fix the volume concerns, be easily adjustable, and possibly add a bit of variable reinforcement.

comment by mare-of-night · 2013-04-15T20:57:58.044Z · LW(p) · GW(p)

This seems OK to me. The reward should be instant, to create a better connection with the behavior you want to reward.

That's just what I was thinking - the association would be stronger if the reward were instant.

comment by FiftyTwo · 2013-04-13T22:39:13.068Z · LW(p) · GW(p)

To provide the data, I linked the device to an earlier productivity hack that I already had around, a custom webapp integrating a task list with a Pomodoro timer.

I've been looking for something like that for a while, can you share it?

Replies from: Intrism
comment by Intrism · 2013-04-13T22:47:44.558Z · LW(p) · GW(p)

Maybe later, but as it is the application's a bit hacked together; I'd be a bit embarrassed to show it around, honestly. I'm going to clean it up soon, so I might open-source it then.

Replies from: palladias
comment by palladias · 2013-04-14T02:59:05.100Z · LW(p) · GW(p)

Want to set a deadline as motivation? I think people will be excited about this.

Replies from: Intrism
comment by Intrism · 2013-04-14T23:38:54.151Z · LW(p) · GW(p)

I can do that. I can't promise soon, because I've got quite a bit of classwork to do, but I've set a deadline for two weeks from now. Expect to see something before then.

comment by Richard_Kennaway · 2013-04-15T13:24:56.680Z · LW(p) · GW(p)

Given that in fact, you can eat or drink whatever you want whenever you want, how do you enforce the conditionality on yourself? Isn't this an exercise in role-playing, and if so, how do you sustain the pretence?

Replies from: Intrism
comment by Intrism · 2013-04-15T15:09:30.548Z · LW(p) · GW(p)

you can eat or drink whatever you want whenever you want

Not quite. I don't have any candy easily available to me (I suppose I could buy more, but that would be a pain), aside from what's in the machine. Theoretically, I could eat that whenever I want, but I have some pretty strong incentives not to do so. I've precommitted to not taking any candy out, and I don't want to break the precommitment (plus doing so would probably ruin the system forever). And, of course, there's a very well-placed Schelling fence helping me stay honest, so it's not even that hard.

comment by pjeby · 2013-04-15T04:19:58.989Z · LW(p) · GW(p)

I get sick of candy more quickly than I expected. The portion my machine emits (about a small handful) tends to stop motivating me after about 4 in a day. Additionally, I seem to be entirely incapable of pacing myself; if the reward is in the system, I tend not to wait very long before using it. This has crippled all of the rules that involve taking away rewards - unless the rewards are blocked, they don't stick around in the system long enough to be taken away.

All of these problems could probably be fixed by denominating rewards in Quirrell points instead of house points. ;-) (HPMOR reference)

That is, have the device display (you said it had a display) a count of points, which has to reach a certain level before the reward is dispensed. Ideally, make it flash and ring a bell or make a sound for each point as well.

Since you will be using smaller point denominations, this means you'll also want to break some of your larger tasks into smaller pieces, so you can then fix this problem:

if I think a task will take me more than a day to finish, that's more than a day of work which earns me no short-term rewards

For example, you could break that task down into pomodoro-estimable units, then give yourself a point per pomodoro or something. Ideally, calibrated so that it takes you the whole day to get all four dispensings.

If you work a 16-pomodoro day, for example, then four pomodoros could be worth one dispensing. This would certainly increase motivation to not interrupt those later pomodoros in each 2-hour work/rest/eat cycle. (The first one in each cycle you'd not have much to lose, but that should also be when you're generally freshest due to having rested or just starting for the day.)
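
A minimal Python sketch of this point-threshold idea, with hypothetical names throughout. The 2-hour cycle works out if each pomodoro slot is 30 minutes (the usual 25 minutes of work plus a 5-minute break, which is an assumption here). Losing a point on interruption is likewise an assumed rule, added to show what there would be to lose late in a cycle.

```python
class PointCounter:
    """Counts points and signals a dispensing every `points_per_dispense` points."""

    def __init__(self, points_per_dispense: int = 4) -> None:
        if points_per_dispense < 1:
            raise ValueError("points_per_dispense must be at least 1")
        self.points_per_dispense = points_per_dispense
        self.points = 0  # progress toward the next dispensing

    def add_point(self) -> bool:
        """Record one finished pomodoro; return True when candy is due."""
        self.points += 1
        if self.points >= self.points_per_dispense:
            self.points -= self.points_per_dispense
            return True
        return False

    def lose_point(self) -> None:
        """Assumed penalty for an interrupted pomodoro (floored at zero)."""
        self.points = max(0, self.points - 1)


counter = PointCounter(points_per_dispense=4)
day = [counter.add_point() for _ in range(16)]  # a 16-pomodoro day
dispensings = day.count(True)  # four dispensings, one per 4-pomodoro cycle
```

The running `points` value is what the display would show, and each call to `add_point` is a natural place to flash the screen or ring the bell.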

comment by atorm · 2013-04-14T03:24:18.867Z · LW(p) · GW(p)

I like the name "Extrinsic motivator".

Replies from: ShardPhoenix
comment by ShardPhoenix · 2013-04-14T03:32:45.458Z · LW(p) · GW(p)

Me too, but it's better with the "The" in front.

comment by John_Maxwell (John_Maxwell_IV) · 2013-04-15T22:18:41.799Z · LW(p) · GW(p)

I'd be worried about eating candy constantly for the sake of your teeth. You might want to try Trident gum, which is also tasty and should actually be slightly good for your teeth. Gum chewing is supposed to improve alertness and focus, although I find that if I spend too much of the day chewing it, I get this weird sense of burnout.

Replies from: Intrism
comment by Intrism · 2013-04-17T14:59:31.983Z · LW(p) · GW(p)

I'd be worried about eating candy constantly for the sake of your teeth.

It's not that constant. I'm fairly sure I've been eating less candy under the system than I would if I had a no-strings-attached bag of candy corn, anyhow...

comment by mare-of-night · 2013-04-15T21:33:23.909Z · LW(p) · GW(p)

I wonder if it's possible to hack an electronic air freshener in this way. Make it make the room smell nice when you're working especially quickly. Sound might also work, or maybe both together. I suspect it would be a weaker reinforcer than candy, but it could be immediate without distracting you, and you might tire of it less quickly than candy. (I should try this myself at some point, but I think I've already been classically conditioned to associate typical perfume smells with my eyes burning (classmates at a school I went to liked to spray one I was allergic to all over the locker room), so I'd have to find one that has an unusual scent, or one that can spray things like spirit of peppermint instead.)

I find that just marking when I finish things is a bit reinforcing for me, especially if there's a chart, and especially if it shows completion time relative to the due date. (Right now, I'm using Tom's Planner, a Gantt chart app, for one of my class projects. It's working really well.)

I also tried adding "meetings" to Google Calendar for everything I did for about a week, and I think it reduced the time I spent not working on things, but my "work" time might have been a little less intensely productive than normal. I stopped because having to record everything I did got really annoying after a while. I suspect part of the problem was that the user interface wasn't optimized for using it that way, so I'm planning to try building something that makes it easier to record, while still showing a chart of how the time was spent.

I wonder if involving another person in this system would make it more effective, since social approval tends to be really motivating for humans. Though, that would require there to be another person who is usually around when you're working and wants to participate.

aside from my task list, what else can I connect to?

  • RescueTime (records what websites and applications you use and for how long)
  • word count of a text editor or word processor
  • a keylogger

If your typing speed has a strong correlation to how quickly you're getting your work done, I wonder if you could combine a keylogger with RescueTime to automatically count only typing in certain applications. Though you'd also have to have some way to avoid accidentally rewarding non-work use of word processors and such.

Replies from: Intrism
comment by Intrism · 2013-04-17T14:58:10.024Z · LW(p) · GW(p)

I wonder if it's possible to hack an electronic air freshener in this way.

I've never known an electronic air freshener which wouldn't be more useful as a punishment than a reward.

Right now, I'm using Tom's Planner, a Gantt chart app, for one of my class projects. It's working really well.

I've considered that sort of thing, but I'm not really very good at estimating how long things take. I had considered building a time-estimation game into my scheduler, but so far that hasn't worked out.

If your typing speed has a strong correlation to how quickly you're getting your work done

Unfortunately, work speed only loosely correlates with utility. Bad code written quickly can waste more time than it saves.

comment by therufs · 2013-04-14T19:09:11.847Z · LW(p) · GW(p)

Precommitting to never take candy out of the machine seems like it'll be a lot easier than precommitting to only sometimes take candy out of the bag.

I've been having some issues being clear about when I'm actually rewarding myself; this sheds some light on the subject. I need to find a machine to put my rewards in. Thanks.

comment by amitpamin · 2013-04-20T10:22:49.526Z · LW(p) · GW(p)

Be aware that there's significant research suggesting that extrinsic motivators crowd out intrinsic ones. Essentially, they increase total motivation at the cost of sometimes reducing intrinsic motivation, which in turn creates a reliance on the extrinsic motivators. A good book on the subject is Punished by Rewards: The Trouble With Gold Stars, but if you do decide to read the book, be aware that the author is biased towards self-determination theory.

comment by Sarokrae · 2013-04-16T17:38:15.337Z · LW(p) · GW(p)

Some of my recent forays into reinforcement learning have been very helpful. I should point out that my life is made a whole lot easier by having a very co-operative OH who is willing to reward me or withhold reward as appropriate, so I've not needed to resort to building a robot!

Things that have been successful:

  • Every time I think about {thing I enjoy obsessing about}, I go and do the washing up. I used to have a massive ugh field around washing up, but this has quickly diminished (within days!) via association with the nice thoughts. We're thinking of applying this method to other things I have ugh fields around, since it was so quick and effective.
  • I've been doing a similar thing to D_Malik with regards to Anki cards. However, it was impractical for me to withhold a reward I would be having on a daily basis, so my OH is implementing "withhold {nice thing} unless I have reviewed my Anki cards for the previous 5 days". It's not as immediate as not eating, but seems to be sufficiently encouraging thus far.

But yeah, having a person help me do it means I avoid any sort of precommitment failure, and generally makes things much easier!

(Side note: Curly brackets clearly denote euphemisms, but I didn't want to be too crude.)

comment by MugaSofer · 2013-05-20T10:08:09.092Z · LW(p) · GW(p)

CFAR describes reinforcement in a very striking way in some of their course materials: they call it "training your inner pigeon." Not only is that a nice, snappy turn of phrase, it illustrates the problem with attempting to self-administer rewards very nicely. Did Skinner's pigeons self-administer their rewards? No, of course they didn't. I shouldn't expect my inner pigeon to, either. So, my next step is to build a robot that gives me candy when I get stuff done.

The idea is that your conscious mind is to a degree separate from your lizard brain, so "you" train your "inner pigeon". You aren't your inner pigeon, you're Skinner.

Of course, the reinforced parts of the brain overriding the smart bits is what akrasia is all about, so I'm not sure how well this neat demarcation works in practice.