A Very Concrete Model of Learning From Regrets

squirrelinhell

post by SquirrelInHell · 2016-07-09T11:30:57.848Z · LW · GW · Legacy · 3 comments

Warning 1: This post is written in the form of Java-like pseudocode.

If you have no knowledge of programming, you might have trouble understanding it.

(If you do, it still does not guarantee you will understand, but your chances are better.)

Warning 2: I have more than moderate, but less than high, confidence that this model is approximately correct.

It doesn't mean that my or anyone's brain works exactly in the way shown in the code, but rather that the flow of data in the brain is approximately as if it were using such an algorithm.

The word "approximately" includes stuff I don't (yet) know about, but also stuff I didn't include below to keep it simple.

I wrote this specifically for regrets, but processing of positive memories seems to have similar mechanics (with different constants).

Warning 3: There is little chance of finding any existing studies/data etc. that could directly validate or invalidate this model. (However, if you know of any, I'm all ears.)

There might be some stuff that is correlated, so if you know of something, mention it too.

class Brain
{
    ...
    
    // This represents a memory about a single event
    
    class Memory
    {
        ...
        
        float associatedEmotions; // positive or negative
    }
    
    // Your brain keeps track of this
    
    private Map<Memory, Float> memoriesRequireProcessing = new Map<>();
    
    // Add new stuff to the queue
    
    private void somethingHappened(Memory newMemory)
    {
        float affect = getAffectOfSituation(newMemory);
        
        newMemory.associatedEmotions = affect * 0.5;
    
        if (Math.abs(affect) > 0.1)
            memoriesRequireProcessing.add(newMemory, Math.abs(affect));
    }
    
    // You have no control over how this works,
    // but you can influence the confidence parameter
    // (mostly indirectly, a little bit directly)
    
    protected void learnedMyLesson(Memory m, float confidence)
    {
        float previousValue =
            memoriesRequireProcessing.get(m);
        
        float nextValue = previousValue * (1.0 - confidence);
        
        if (nextValue > 0.1)
            memoriesRequireProcessing.set(m, nextValue);
        else
            memoriesRequireProcessing.remove(m);
    }
    
    // You can consciously override this and do something else
    //
    // @return: judgement of success or failure
    
    protected float ruminateOnMemory(Memory m)
    {
        // Depends on the situation, but the default is
        // relatively low confidence
        
        learnedMyLesson(m, 0.1);
        
        // Substitute affect for judgement of success
        
        return getAffectOfSituation(m);
    }
    
    // This prompts some thoughts about a memory
    
    private void rememberAbout(Memory m)
    {
        feelEmotion(m.associatedEmotions);
    
        float judgement = ruminateOnMemory(m);
        
        m.associatedEmotions =
            0.9 * m.associatedEmotions
            + 0.2 * judgement;
    }
    
    // Your brain does this all the time
    
    private void onIdle()
    {
        while (memoriesRequireProcessing.thereIsALotOfShit())
        {
            // Choose some memory paired with a high value
            
            Memory next = memoriesRequireProcessing.choose();
            
            rememberAbout(next);
        }
        
        ...
    }
    
    ...
}
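To get a feel for the numbers, here is a minimal runnable sketch that pushes one memory through the model above until it leaves the queue. The constants (0.5, 0.1, 0.9, 0.2) are copied from the pseudocode. Everything else is an assumption made for illustration: the names RegretSketch and ruminateUntilDone, the affect field standing in for getAffectOfSituation(), and the "override" setting of confidence 0.5 with a neutral judgement of 0.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical, constants come from the post.
class RegretSketch {
    static class Memory {
        final float affect;        // assumed stand-in for getAffectOfSituation(m)
        float associatedEmotions;  // positive or negative
        Memory(float affect) { this.affect = affect; }
    }

    final Map<Memory, Float> memoriesRequireProcessing = new HashMap<>();

    void somethingHappened(Memory m) {
        m.associatedEmotions = m.affect * 0.5f;
        if (Math.abs(m.affect) > 0.1f)
            memoriesRequireProcessing.put(m, Math.abs(m.affect));
    }

    void learnedMyLesson(Memory m, float confidence) {
        float next = memoriesRequireProcessing.get(m) * (1.0f - confidence);
        if (next > 0.1f)
            memoriesRequireProcessing.put(m, next);
        else
            memoriesRequireProcessing.remove(m);
    }

    // One pass of rememberAbout(), with confidence and judgement passed in
    // explicitly so that both the default and an override can be tried.
    void rememberAbout(Memory m, float confidence, float judgement) {
        learnedMyLesson(m, confidence);
        m.associatedEmotions = 0.9f * m.associatedEmotions + 0.2f * judgement;
    }

    // Ruminates on m until it is no longer queued; returns the number of passes.
    static int ruminateUntilDone(Memory m, float confidence, float judgement) {
        RegretSketch brain = new RegretSketch();
        brain.somethingHappened(m);
        int passes = 0;
        while (brain.memoriesRequireProcessing.containsKey(m)) {
            brain.rememberAbout(m, confidence, judgement);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        // Default rumination: confidence 0.1, judgement = affect of the situation.
        Memory a = new Memory(-1.0f);
        int defaultPasses = ruminateUntilDone(a, 0.1f, a.affect);
        System.out.printf("default:  %d passes, emotion %.2f%n",
                defaultPasses, a.associatedEmotions);
        // prints: default:  22 passes, emotion -1.85

        // Assumed conscious override: confidence 0.5, neutral judgement.
        Memory b = new Memory(-1.0f);
        int overridePasses = ruminateUntilDone(b, 0.5f, 0.0f);
        System.out.printf("override: %d passes, emotion %.2f%n",
                overridePasses, b.associatedEmotions);
        // prints: override: 4 passes, emotion -0.33
    }
}
```

With the default settings, the update e = 0.9 * e + 0.2 * affect has its fixed point at e = 2 * affect, so a regret that starts at -0.5 drifts toward -2.0 for as long as it stays in the queue. In this sketch, raising the confidence shortens the queue time, and replacing the judgement with something other than the raw affect changes where the emotion ends up.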

3 comments

comment by Gunnar_Zarncke · 2016-07-09T19:19:15.349Z · LW(p) · GW(p)

Maybe you want to look into cognitive architectures e.g. LIDA.

↑ comment by SquirrelInHell · 2016-07-10T03:54:52.152Z · LW(p) · GW(p)

Thanks, this is interesting.

comment by gwern · 2016-07-09T15:19:27.655Z · LW(p) · GW(p)

Sounds like actor-critic with experience replay RL.