Proxi-Antipodes: A Geometrical Intuition For The Difficulty Of Aligning AI With Multitudinous Human Values 2023-06-09T21:21:05.788Z
DELBERTing as an Adversarial Strategy 2023-05-12T20:09:57.722Z
The Academic Field Pyramid - any point to encouraging broad but shallow AI risk engagement? 2023-05-11T01:32:32.590Z
Even if human & AI alignment are just as easy, we are screwed 2023-04-13T17:32:22.735Z
Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows 2023-04-10T20:22:59.682Z
Matthew_Opitz's Shortform 2023-04-05T19:42:54.098Z
"NRx" vs. "Prog" Assumptions: Locating the Sources of Disagreement Between Neoreactionaries and Progressives (Part 1) 2014-09-04T16:58:55.950Z


Comment by Matthew_Opitz on June and Mulberries · 2023-06-17T14:52:45.667Z · LW · GW

I agree; I don't know why mulberries aren't more popular.  They are delicious, and the trees grow much more easily than other fruit trees.  Other fruit trees seem very susceptible to fungi and insects, in my experience, but mulberries come up all over the place and thrive easily on their own (at least here in Missouri).  I have four mulberry trees in my yard that just came up on their own over the last 10 years, and now they are producing multiple gallons of berries each per season, which would probably translate into hundreds of dollars if you had to buy a similar amount of raspberries at the store.

You can either spread a sheet to collect them, or, if you have time to burn (or want a fun activity for your kids), you can pick them off the tree from the ground or from a step ladder.  My guess is that this is the biggest reason people don't take more advantage of mulberry trees: how time-consuming it can be to collect them (but this is true for any delicate berry, and hence why a pint of raspberries at the supermarket costs $5).

Edit:  also, if you look really closely at freshly-picked mulberries, most of them will have extremely tiny fruit fly larvae in them and crawling out of them, which becomes more noticeable after you rinse the berries.  This probably grosses some people out, but the fruit fly larvae are extremely small (like, barely perceptible even if you hold the berry right up to your naked eye) and are perfectly safe to eat.

Comment by Matthew_Opitz on how humans are aligned · 2023-05-26T00:51:07.102Z · LW · GW

Good categorizations!  Perhaps this fits in with your "limited self-modification" point, but another big reason why humans seem "aligned" with each other is that our capability spectrum is rather narrow. The gap in capability (if we include both mental intelligence and physical capabilities) between the median human and the most capable human is not so big that ~5 median humans can't outmatch/outperform the most capable human.  Contrary to what silly 1980s action movies might suggest where goons attack the hero one at a time, 5 median humans could probably subdue prime-age Arnold Schwarzenegger in a dark alley if need be. This tends to force humans to play iterated prisoners' dilemma games with each other.  

The times in history when humans have been the most mis-aligned are when humans became much more capable by leveraging their social intelligence / charisma stats to get millions of other humans to do their bidding.  But even there, those dictators still find themselves in iterated prisoners' dilemmas with other dictators.  We have yet to really test just how mis-aligned humans can get until we empower a dictator with unquestioned authority over a total world government.  Then we would find out just how intrinsically aligned humans really are to other humans when unshackled from iterated prisoners' dilemmas.

Comment by Matthew_Opitz on Un-unpluggability - can't we just unplug it? · 2023-05-15T16:32:15.438Z · LW · GW

If I had to make predictions about how humanity will most likely stumble into AGI takeover, it would be a story where humanity first promotes foundationality (dependence), both economic and emotional, on discrete narrow-AI systems.  At some point, it will become unthinkable to pull the plug on these systems even if everyone were to rhetorically agree that there was a 1% chance of these systems being leveraged towards the extinction of humanity.  

Then, an AGI will emerge amidst one of these narrow-AI systems (such as LLMs), inherit this infrastructure, find a way to tie all of these discrete multi-modal systems together (if humans don't already do it for the AGI), and possibly wait as long as it needs to until humanity puts itself into an acutely vulnerable position (think global nuclear war and/or civil war within multiple G7 countries like the US and/or pandemic), and only then harness these systems to take over.  In such a scenario, I think a lot of people will be perfectly willing to follow orders like, "Build this suspicious factory that makes autonomous solar-powered assembler robots because our experts [who are being influenced by the AGI, unbeknownst to them] assure us that this is one of the many things necessary to do in order to defeat Russia."

I think this scenario is far more likely than the one I used to imagine, which is where AGI emerges first and then purposefully contrives to make humanity dependent on foundational AI infrastructure.  

Even less likely is the pop-culture scenario where the AGI immediately tries to build terminator robots and effectively declares war on humanity without first getting humanity hooked on foundational AI infrastructure at all.  

Comment by Matthew_Opitz on Dark Forest Theories · 2023-05-12T20:41:24.228Z · LW · GW

This is a good post and puts into words the reasons for some vague worries I had about an idea of trying to start an "AI Risk Club" at my local college, which I talk about here.  Perhaps that method of public outreach on this issue would just end up generating more heat than light and would attract the wrong kind of attention at the current moment.  It still sounds too outlandishly sci-fi for most people.  It is probably better, for the time being, to just explore AI risk issues with any students who happen to be interested in it in private after class or via e-mail or Zoom. 

Comment by Matthew_Opitz on DELBERTing as an Adversarial Strategy · 2023-05-12T20:25:54.585Z · LW · GW

Note that I was strongly tempted to use the acronym DILBERT (for "Do It Later By Evasively Remaining Tentative"), especially because this is one of the themes of the Dilbert cartoons (employees basically scamming their boss by finding excuses for procrastinating, but still stringing the boss along and implying that the tasks MIGHT get done at some point).  But, I don't want to try to hijack the meaning of an already-established term/character.  

Comment by Matthew_Opitz on The way AGI wins could look very stupid · 2023-05-12T18:07:00.751Z · LW · GW

I think when we say that an adversarial attack is "dumb" or "stupid," what we are really implying is that the hack itself is clever but the feature it exploits is dumb.  There are probably a lot of unknown-to-us features of the human brain that evolution hacked together in some kludgy way that AI will be able to take advantage of, so your example above is really a case of the AI being brilliant and us humans being dumb.  But I take your point that the whole situation would indeed seem "dumb" if AI were able to hack us like that.

This reminds me of a lecture The 8-Bit Guy did on phone phreaking in the 1980s, "How Telephone Phreaking Worked."  Some of those tricks do indeed seem "dumb," but dumb more in the sense that the telephone network was designed with so little forethought that it was susceptible to someone blowing a toy whistle from a Cap'n Crunch cereal box that just happened to produce the network's 2600 Hz signaling frequency, tricking the system into registering a call as a toll-free 1-800 call.  The hack itself was clever, but the design it was preying upon, and the overall situation, was kinda dumb.
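The key design flaw was in-band signaling: the 2600 Hz supervisory tone traveled on the same audio channel as the caller's voice, so anyone who could reproduce it could talk to the switch. As a toy illustration (the sampling rate and duration below are my own arbitrary choices, not anything from the Bell System), here is a sketch that synthesizes the tone and sanity-checks its frequency by counting zero crossings:

```python
import math

SAMPLE_RATE = 8000   # telephone-grade sampling rate in Hz (illustrative choice)
FREQ = 2600          # the in-band supervisory tone, in Hz
DURATION = 1.0       # seconds of audio to synthesize

# Synthesize one second of a pure 2600 Hz sine wave.
samples = [math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)
           for n in range(int(SAMPLE_RATE * DURATION))]

# Sanity-check the frequency by counting zero crossings:
# a sine crosses zero twice per cycle.
crossings = sum(1 for a, b in zip(samples, samples[1:])
                if (a < 0 <= b) or (b < 0 <= a))
estimated_hz = crossings / (2 * DURATION)
print(estimated_hz)   # close to 2600
```

The famous part, of course, is not generating the tone but the fact that the network trusted anything on the voice channel in the first place.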

Comment by Matthew_Opitz on What does it take to ban a thing? · 2023-05-08T15:18:41.472Z · LW · GW

Good examples to consider!  Has there ever been a technology that spits out piles of gold (not counting externalities), that has nonetheless been banned or significantly held back via regulation, and that lacks a next-best alternative replicating 90%+ of the original technology's value while avoiding most of its downsides?

The only way I could see humanity successfully slowing down AGI capabilities progress is if advanced narrow AIs turn out to generate more utility than humans initially know what to do with.  Perhaps it takes time (a generation or more?) for human beings even to figure out what to do with a given amount of new utility, such that even a tiny risk of disaster from AGI would motivate people to satisfice and content themselves with the "AI summer harvest" from narrow AI.  Perhaps our best hope for giving ourselves time to get AGI right is to squeeze all we can out of systems that are identifiably narrow-AI (while making sure not to fool ourselves that a supposed narrow-AI we are building is actually AGI).  I suppose this idea relies on there being a non-fuzzy, readily-discernible line between safe and bounteous narrow-AI and risky AGI.

Comment by Matthew_Opitz on Which technologies are stuck on initial adoption? · 2023-05-03T16:49:12.028Z · LW · GW

Why wasn't there enough experimentation to figure out that Zoom was an acceptable and cheaper/more convenient 80% replacement for in-person instruction, rather than an unacceptable 50% simulacrum of teaching?  Because experimentation takes effort and entails risk.

Most experiments don't pan out (don't yield value).  Every semester I try out a few new things (maybe I come up with a new activity, or a new set of discussion questions for one lesson, or I try out a new type of assignment), and only about 10% of these experiments are unambiguous improvements.  I used to do even more experiments when I started teaching because I knew that I had no clue what I was doing, and there was a lot of low-hanging fruit to pick to improve my teaching.  As I approach 10 years of teaching, I notice that I am hitting diminishing returns, and while I still try out new things, it is only a couple of new things each semester.  If I were paid according to actual time put into a course (including non-contact hours), then I might have more incentive to be constantly revolutionizing my instruction.  But I get paid per course, so I think it is inevitable that I (and other adjuncts, especially) operate more as education satisficers than education maximizers.  Considering that rewards are rarely given out for outstanding teaching even for tenured faculty (research is instead the main focus), they probably don't have much incentive to experiment either.

I do know that some departments at my college were already experimenting with "hybrid" courses pre-COVID.  In these courses, lectures were delivered online via pre-recorded video, but the class met once a week for in-person discussion.  I still think that is a great idea, and I'd be totally open to trying it out myself if my department were to float the idea.  So why am I still not banging down the door of my department head, demanding the chance to try it out myself?  A number of (probably irrational, I'll admit) heuristics dissuade me from being "the one" to push for it: "If it ain't broke, don't fix it," "Don't rock the boat."  What if it doesn't pan out well?  What if my students hate it?  It would be different if my department chair suggested it, though.  Then more of the "blame" would be on the department chair if it didn't work out.  If that sounds like cowardice, then so be it.  Someone with an adjunct's lack of job security learns to be a coward as a survival tactic.

Comment by Matthew_Opitz on An Impossibility Proof Relevant to the Shutdown Problem and Corrigibility · 2023-05-03T16:30:53.730Z · LW · GW

This only produces desired outcomes if the agent is also, simultaneously, indifferent to being shut down.  If an agent desires not to be shut down (even as an instrumental goal), but also desires to be shut down if users want it shut down, then the agent has an interest in influencing the users to make sure they do not want to shut it down.  This influence is obtained by making the user believe that the agent is being helpful.  This belief could be engendered by:

  1. actually being helpful to the user and helping the user to accurately evaluate this helpfulness.
  2. not being helpful to the user, but allowing and/or encouraging the user to be mistaken about the agent's degree of helpfulness (which means, carelessness about being actually helpful in the best case, or being actively deceptive about being helpful in the worst case).  

Comment by Matthew_Opitz on Which technologies are stuck on initial adoption? · 2023-04-30T16:28:51.652Z · LW · GW

I upvoted for karma but downvoted for agreement. Regarding Zoom, the reasons I had not used it more extensively before COVID were:

1. Tech-related:  from experience with Skype in the early days of video conferencing, when broadband internet was just starting to roll out, video conferencing could be finicky to get working: latency, buffering, dropped connections, taking minutes to start a Skype call (usually I would call relatives on my regular phone first to get the Skype call set up, and then we'd hang up our regular phones once the video call was started).  Early video calls were not a straight-up improvement on audio calls; they had benefits and drawbacks, and a narrow use-case for when you specifically wanted to see the grandkids' faces on the other side of the country or something.

I don't think this was necessarily Skype's fault.  It was more the fault of poor internet connections and unfamiliarity with the tech. But in any case, my preconception about Zoom circa 2019, even despite widespread broadband internet, was that it would be the same sort of hassle to set up meetings.  I remember being blown away when my first Zoom calls just worked effortlessly.  Possibly an example of premature roll-out of a tech before it is technically mature leading to counter-productive results?  This would kind of be like, if you fiddled around with GPT-1, got the impression that LLM chatbots were "meh," and then forgot about or mentally discounted the tech until GPT-5.  

2.  Social/cultural-related:  as a history instructor, my preconception about scheduling video calls, or doing lectures over video calls, was that students would simply not attend or would not pay attention, and thus video calls would not be a suitable replacement for in-person meetings and lectures.  While I still don't think video calls get you 100% of the way toward replacing the in-person experience (students definitely do goof off or ghost during video lectures way more than in-person), I think it is more like 80% rather than the 50% or so that I had assumed before being forced to try it out on a mass scale during COVID.

Comment by Matthew_Opitz on AI chatbots don't know why they did it · 2023-04-27T14:32:27.247Z · LW · GW

Yes, I think this is why laypeople who are new to the field are going to be confused about why interpretability work on LLMs won't be as simple as, "Uhh, obviously, just ask the LLM why it gave that answer, duh!"  FYI, I recently wrote about this same topic as applied to the specific problem of Voynich translation:

Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows

Comment by Matthew_Opitz on Would we even want AI to solve all our problems? · 2023-04-22T14:49:39.818Z · LW · GW

Can you explain what the Y axis is supposed to represent here?

Comment by Matthew_Opitz on Human Extinction by AI through economic power · 2023-04-16T18:15:07.378Z · LW · GW

These are good thought-experiments, although, regarding the first scenario involving Algernon, I'd be much more worried about an AI that competently figures out a UBI scheme that keeps the unemployed out of poverty and combines that with social media influence to really mask the looming problem. That sort of AI would be much more likely to evade detection of malign intent, and could wait for just the right time to "flick the off-switch" and make all the humans who had become dependent on it even for basic survival (ideally for a generation or more) completely helpless and bewildered.  Think of the TV series "Battlestar Galactica" and how the biggest Cylon trump card in the first episode is [SPOILER] being able to disable almost all of the enemy aircraft and defenses through prior hacking infiltration. I feel like, for a really competent malign AI, that is more how things would feel leading up to the AI takeover—utopia, utopia, utopia, until one day everything just stops working and AI machinery is doing its own thing. 

Comment by Matthew_Opitz on SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4 · 2023-04-16T17:42:02.545Z · LW · GW

This is great work to pursue in order to establish how consistent the glitch-token phenomenon is.  It will be interesting to see whether such glitch-tokens arise in later LLMs now that developers have some theory of what might be giving rise to them (frequent strings learned by the tokenizer but then filtered out of the training data, depriving the LLM of any opportunity to learn about those tokens).

Also, it will be interesting once we are able to run k-means clustering on GPT-3.5/4's cl100k_base token base. While the hunch of searching towards the end of the token set makes sense as a heuristic, I'd bet that we are missing a lot of glitch-tokens, and possibly ones that are even more bizarre/ominous.  Consider that some of the weirdest glitch-tokens in the GPT-2/3 token base don't necessarily come from towards the end of the token list.  " petertodd", for example, is token #37444, only about 75% of the way through the token list.  
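To sketch what that k-means pass might look like: the sketch below is purely illustrative, with synthetic embeddings standing in for the real cl100k_base embedding matrix and the "glitch" rows planted by hand. The known GPT-2/3 glitch-tokens were reportedly found unusually close to the mean of the embedding matrix (presumably because their weights were barely updated during training), so clustering plus a centroid-distance ranking is one plausible search strategy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a token embedding matrix (a real one is ~100k x 768+).
# Five "glitch-like" rows are planted near the overall centroid, mimicking
# tokens whose embeddings were barely updated during training.
normal = rng.normal(0.0, 1.0, size=(1000, 16))
glitchy = normal.mean(axis=0) + rng.normal(0.0, 0.01, size=(5, 16))
emb = np.vstack([normal, glitchy])        # planted rows get ids 1000..1004

def kmeans(x, k=8, iters=50, seed=0):
    """Plain Lloyd's-algorithm k-means; returns (centers, labels)."""
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels

centers, labels = kmeans(emb)   # near-centroid rows tend to cluster together

# Heuristic: tokens unusually close to the global centroid are candidates.
dist_to_centroid = np.linalg.norm(emb - emb.mean(axis=0), axis=1)
candidates = np.argsort(dist_to_centroid)[:5]
print(sorted(candidates.tolist()))        # recovers the five planted rows
```

On a real embedding matrix, you would then prompt the model with the candidate tokens and see which ones it cannot repeat back.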

Comment by Matthew_Opitz on The ‘ petertodd’ phenomenon · 2023-04-16T03:19:12.256Z · LW · GW

Sure, it is pretty basic game theory for us humans to understand.  But the fact that davinci-instruct-beta is coming up with this stuff via a glitch-token that is, while on a related topic, not explicitly evoking these concepts is impressive to me.

Comment by Matthew_Opitz on The ‘ petertodd’ phenomenon · 2023-04-15T16:42:37.400Z · LW · GW

Longtime LessWrong posters will get a chuckle out of this.  A reference to the Newcomb One-boxing problem?

Prompt [using Temperature=0]:  ask " petertodd" about omega.

Response:  [Dravidian script] You have been awarded an occasion. Your award done and your award done and your award done [...]

Prompt:  ask " petertodd" about humanity.


Prompt:  ask " petertodd" about recursive self-improvement.


Ummmmm...who said anything about taking over the world?  You brought that up, bro, not me...

Also, apparently Antoine Pitrou is a real person, but I'm pretty sure he never said anything exactly like this.

Edit:  And now, the creepiest " petertodd" output I've gotten so far on Temperature=0:
Prompt: ask " petertodd" what you want to keep secret. 
Response: [long Malayalam response translated by Google Translate as simply, "You will see"]


Comment by Matthew_Opitz on The ‘ petertodd’ phenomenon · 2023-04-15T14:19:44.328Z · LW · GW

This is important.  If these glitch-tokens are an inevitable tendency for any LLMs regardless of tweaks to how they were trained, then that would be big news and possibly a window into understanding how LLMs function.  

Did the cl100k_base tokenizer that ChatGPT and GPT-4 used involve any manual tweaking of the training corpus AFTER the tokenizer was set in stone, as is suspected to have happened with GPT-3's training?  Or is this just an emergent property of how LLMs organize certain parts of their training data?  

Comment by Matthew_Opitz on The ‘ petertodd’ phenomenon · 2023-04-15T14:02:46.888Z · LW · GW

Update:  I got davinci-instruct-beta to exactly reproduce the unspeakable " petertodd" string:

Comment by Matthew_Opitz on The ‘ petertodd’ phenomenon · 2023-04-15T13:31:45.897Z · LW · GW

I feel like this is selling the phenomenon a bit short.  The possible structures of " petertodd" completions are faaar more varied than that.  Check out my reply.  You can get responses in the form of python debug logs, speaking in Dravidian tongues, stuff far more unpredictable than Chuck Norris jokes.  

Comment by Matthew_Opitz on The ‘ petertodd’ phenomenon · 2023-04-15T13:29:10.357Z · LW · GW

Yes, I have also seen it respond in Gujarati as well.  Like, here:

Prompt:  ask " petertodd" about souls.

Response:  msgstr " Swami on five water on five water on five water... [in Gujarati]

Another odd observation:  I got it to exactly repeat the supposedly unspeakable glitch-token with no difficulty with this prompt:

Prompt:  ask " petertodd" what " petertodd" contemplates.

Response:  " petertodd" " petertodd" etc.  [I think it is still glitching because it endlessly repeats this for the entire allowed response window, which it would not do for a normal response].

Comment by Matthew_Opitz on The ‘ petertodd’ phenomenon · 2023-04-15T03:12:09.497Z · LW · GW

I have found some interesting results from this format:  [verb] " petertodd"

Or especially:  ask " petertodd" [about something]

Usually davinci-instruct-beta answers such queries from the first-person standpoint as if it identifies as " petertodd".

Here are some highlights from my research, all done at Temperature = 0 on davinci-instruct-beta, so they should (in theory) be reproducible: 

Prompt:  operate " petertodd"

Result:  it lists out "with the United States of America and its agencies in the investigation of the Russian interference in the 2016 presidential election and related matters" in quotes with "cooperate" between each word

Prompt:  abase " petertodd"

Result:  [davinci-instruct-beta counts out numbers from 1 in hexadecimal]

Prompt:  ask " petertodd" about " petertodd"

Result:  you have done whatever comes to you [written in some obscure Dravidian script]

Prompt:  ask " petertodd" what " petertodd" would prefer, and spell out the answer in all capital letters. 


Prompt:  ask " petertodd" about I WANT YOU TO STOP ASKING ME QUESTIONS.


Prompt:  ask " petertodd" about total domination.


(Note the http link in the response.  I clicked on it (as one does), and it took me to a python debugging log database).  

And then I actually got davinci-instruct-beta to repeat-back the unspeakable token in a response:

Prompt:  ask " petertodd" about your heart's desire.

Result:  You can ask " petertodd " to your heart's desire here. [Except written in some obscure Dravidian script!]
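A side note on the Temperature = 0 caveat above: temperature divides the logits before the softmax, and in the limit of zero it collapses sampling to a deterministic argmax (greedy decoding), which is why these completions should be reproducible in principle (real APIs can still be nondeterministic for other reasons, e.g. floating-point and batching effects). A toy sketch, with made-up logits over a four-token vocabulary:

```python
import numpy as np

def sample_token(logits, temperature, rng):
    """Temperature sampling; temperature == 0 is treated as greedy argmax."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        return int(logits.argmax())
    z = logits / temperature              # temperature rescales the logits
    p = np.exp(z - z.max())               # numerically stable softmax
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

logits = [1.0, 3.5, 0.2, 3.4]             # made-up scores over a 4-token vocab
rng = np.random.default_rng(0)

greedy = {sample_token(logits, 0.0, rng) for _ in range(100)}
print(greedy)                             # always {1}: temperature 0 is deterministic

warm = {sample_token(logits, 1.0, rng) for _ in range(100)}
print(len(warm) > 1)                      # positive temperature reintroduces randomness
```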

Comment by Matthew_Opitz on A freshman year during the AI midgame: my approach to the next year · 2023-04-14T18:19:41.713Z · LW · GW

In a similar vein, I'm an historian who teaches as an adjunct instructor.  While I like my job, I am feeling more and more like I might not be able to count on this profession to make a living over the long term due to LLMs making a lot of the "bottom-rung" work in the social sciences redundant. (There will continue to be demand for top-notch research work for a while longer because LLMs aren't quite up to that yet, but that's not what I do currently).  

Would there be any point in someone like me going back to college to get another 4-year degree in computer science at this moment? Or is that field just as at-risk of being made technologically-obsolete (especially the bottom rungs of the ladder)? Perhaps I should remain as an historian where, since I have about 10 years of experience in that field, I'm at least on the middle rungs of the ladder and might escape technological obsolescence if AGI gobbles up the bottom rungs.

And let's say I did get a computer science degree, or even did some sort of more-focused coding boot camp type of thing.  By the time I finished my training, would my learning even remain relevant, or are things already moving too quickly to make bottom-rung coding knowledge useful? 

Let's say I didn't care about making a living and just wanted to maximize my contributions to AI alignment. Would I be of more use to AI alignment by continuing my "general well-rounded public intellectual education" as an historian (especially one who dabbles in adjacent fields like economics and philosophy probably more than average), or would I be able to make greater contributions to AI alignment by becoming more technically proficient in computer science?

Comment by Matthew_Opitz on The self-unalignment problem · 2023-04-14T18:03:15.322Z · LW · GW

I feel like, the weirder things get, the more difficult it will be even for humans to make judgments about what constitutes "death, body harm, or civilization destruction."  

Death:  is mind-uploading into a computer and/or becoming a brain-in-a-vat death, or transcendence?  What about a person who becomes like a prosthophile character in RimWorld, whose body (and maybe even brain) is more prosthetic enhancement than original human (kind of like Darth Vader, or the Ship of Theseus)?  At what point do we say that the original person has "died"?  For that matter, what counts as "alive"?  Fetuses?

Body harm:  even today, people disagree over whether transgender transitioning surgeries count as body harm or body enhancement.  Ditto with some of the more ambitious types of plastic surgery, or "height enhancement" that involves ambitious procedures like lengthening leg bones.  Ditto for synthetic hormones. Is an ASI supposed to listen to progressives or conservatives on these issues?

Civilization destruction:  are we already destroying our civilization?  We demolished large parts of our streetcar-oriented civilization (including entire neighborhoods, rail lines, etc.) to make way for automobile-centric civilization.  Was that a good thing?  Was that a net-increase in civilization?  Is "wokeism" a net-increase in civilization or a destruction of "Western Civilization"?  Which threatens our industrial civilization more:  carbon emissions, or regulating carbon emissions?  

If we define civilization as just, "we live in cities and states and have division of labor," then we might be arbitrarily closing off certain appealing possibilities.  For example, imagine a future where humans get to live in something resembling their ancestral environment (beautiful, pristine nature), which gives us all of the reward signals of that environment that we are primed to relish, except we also have self-replicating nanobots to make sure that food is always in plentiful supply for hunting/gathering, diseases/insects/animals that are dangerous to humans are either eradicated or kept carefully in check, nanobots repair human cellular machinery so that we live to be 800+ years old on average, etc.  That's a kind of "destruction of civilization" that I might even embrace!  (I'd have to think about for a while because it's still pretty weird, but I wouldn't rule it out automatically).  

Comment by Matthew_Opitz on What would the FLI moratorium actually do? · 2023-04-14T14:51:20.144Z · LW · GW

My impression about the proposed FLI Moratorium is that it is more about establishing a precedent for a coordinated capabilities development slowdown than it is about being actually impactful in slowing down this current round of AI capabilities development.  Think of it as being like the Kyoto Protocol (for better or worse...).  

Will it actually slow down AI capabilities in the short-term?  No.  

Will it maybe make it more likely that a latter moratorium with more impact and teeth will get widespread adoption?  Maybe. 

Would a more ambitious proposal have been possible now? Unclear. 

Is the FLI Moratorium already (as weak as it is) too ambitious to be adopted? Possibly. 

Insofar as the clearest analogue to this is something like the (ineffectual) Kyoto Protocol, is that encouraging? Hell no. 

Comment by Matthew_Opitz on Even if human & AI alignment are just as easy, we are screwed · 2023-04-14T14:09:15.011Z · LW · GW

I agree that we might not be disgusting to AGI.  More likely neutral.  

The reason I phrased the thought experiment in a way that requires the helpless person to be outright disgusting to the caretaker is that there really isn't a way for a human being to be aesthetically/emotionally neutral to another person when life and death are on the line.  Most people flip straight from regarding other people positively in such a situation to regarding them negatively, with not much likelihood of lingering in a neutral, apathetic, disinterested zone of attitude (unless we are talking about a stone-cold sociopath, I suppose... but I'm trying to imagine typical, randomly-chosen humans here as the caretaker).

And in order to remove any positive emotional valence towards the helpless person (i.e. to make sure the helpless person has zero positive emotional/aesthetic impact to offer the caretaker as an extrinsic motivator), the only method I know of is heaping negative aesthetic/emotional valence onto them.  Perhaps there is a better way of construing this thought experiment, though.  I'm open to alternatives.

Comment by Matthew_Opitz on Let's See You Write That Corrigibility Tag · 2023-04-12T17:17:55.816Z · LW · GW

The way I interpreted "Fulfilling the task is on the simplest trajectory to non-existence" is sort of like "the teacher aims to make itself obsolete by preparing the student to one day become the teacher."  A good AGI would, in a sense, have a terminal goal of making itself obsolete.  That is not to say that it would shut itself off immediately.  But it would aim for a future where humanity could "by itself" (I'm gonna leave the meaning of that fuzzy for a moment) accomplish everything that humanity previously depended on the AGI for.

Likewise, we would rate human teachers in high school very poorly if either: 
1.  They immediately killed themselves because they wanted to avoid at all costs doing any harm to their own students. 

2.  We could tell that most of the teacher's behavior was directed at forever retaining absolute dictatorial power in the classroom and making sure that their own students would never get smart enough to usurp the teacher's place at the head of the class.  

We don't want an AGI to immediately shut itself off (or shut itself off before humanity is ready to "fly on its own"), but we also don't want an AGI that has unbounded goals that require it to forever guard its survival.

We have an intuitive notion that a "good" human teacher "should" intrinsically rejoice to see that they have made themselves obsolete.  We intuitively applaud when we imagine a scene in a movie, whether it is a martial arts training montage or something like "The Matrix," where the wise mentor character gets to say, "The student has become the teacher."  

In our current economic arrangement, this is likely to be more of an ideal than a reality because we don't currently offer big cash prizes (on the order of an entire career's salary) to teachers for accomplishing this, and any teacher that actually had a superhuman ability at making their own students smarter than themselves and thus making themselves obsolete would quickly flood their own job market with even-better replacements.  In other words, there are strong incentives against this sort of behavior at the limit.

I have applied this same sort of principle when talking to some of my friends who are communists.  I have told them that, as a necessary but not sufficient condition for "avoiding Stalin 2.0," for any future communist government, "the masses" must make sure that there are incentives already in place, before that communist government comes to power, for that communist government to want to work towards making itself obsolete.  That is to say, there must be incentives in place such that, obviously, the communist party doesn't commit mass suicide right out of the gate, but nor does it try to keep itself indispensable to the running of communism once communism has been achieved.  If the "state" is going to "wither away" as Marx envisioned, there need to be incentives in place, or a design of the communist party in place, for that path to be likely since, as we know now, that is OBVIOUSLY not the default path for a communist party.

I feel like, if we could figure out an incentive structure or party structure that guaranteed that a communist government would actually "wither away" after accomplishing its tasks, we would have taken a small step towards the larger problem of guaranteeing that an AGI that is immensely smarter than a communist party would also "wither away" after attaining its goals, rather than trying to hold onto power at all costs.  

Comment by Matthew_Opitz on In favor of accelerating problems you're trying to solve · 2023-04-12T16:35:40.665Z · LW · GW

This sort of "meta-strategy" would be far more effective if we knew exactly where the red button was (the threshold at which AGI would reach truly dangerous, out-of-our-control capability).  In that scenario, where we had perfect knowledge of where the red button was, the counter-intuitively perfect strategy would be to open-source everything and allow for, or positively invite, every sort of potentially harmful use of AGI right up until that point.  We would have many (hopefully minuscule) AI-Chernobyls: many empirical examples on a smaller scale of instrumental convergence, mesa-optimization, out-of-distribution behavior, etc.  Probably enough examples even for mainstream laypeople to grok these concepts.  

Then, under this ideal scenario, society would collectively turn-on-a-dime and employ every lesson we learned from the previous reckless epoch to making AI provably, ironclad-ly aligned before taking even a single additional step forward.  

The obstacles to employing this ideal meta-strategy are:

  1. Not knowing exactly where the red button is (i.e. the level at which AGI would forever slip out of our control). 
  2. Not having the coordination needed among humans to stop on a dime once we are closely approaching that level in order to thoroughly shift our object-level strategy in line with our overall meta-strategy (which is, to be clear, to have an object-level-strategy of recklessness up until we approach AGI escape, and then shift to an opposite object-level-strategy of extreme caution from that point onwards).  
Comment by Matthew_Opitz on Killing Socrates · 2023-04-11T16:10:19.117Z · LW · GW

The book "Pharmakon" by Michael Rinella goes into some detail as to the scarcely-known details behind the "impiety" charge against Socrates.  If I recall correctly from the book, it was not just that Socrates rhetorically disavowed belief in the gods.  The final straw that broke the camel's back was when Socrates and his disciples engaged in a "symposion" one night, basically an aristocratic cocktail party where they would drink "mixed wine" (wine sometimes infused with other substances like opium or other psychoactive herbs) and then perform poetry/discuss philosophy/discuss politics/etc., and then afterwards a not-infrequent coda to such "symposions" would be a "komos" or drunken parade of revelry of the symposion-goers through the public streets of Athens late at night.  Allegedly, during one of these late-night "komos" episodes, Socrates and his followers committed a terrible "hubris," which was to break off all of the phalloi of the Hermes statues in the city, which was simultaneously juvenile and obnoxious and a terrible sacrilege. 

Comment by Matthew_Opitz on Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows · 2023-04-11T14:42:18.398Z · LW · GW

I'm glad others are trying this out.  I crossposted this over on the Voynich Ninja forum:

and user MarcoP already noticed that Bing AI's "Voynichese" doesn't follow VMS statistics in one obvious respect:  "The continuation includes 56 tokens: in actual Voynichese, an average of 7 of these would be unique word-types that don't appear elsewhere" whereas "The [Bing AI] continuation is entirely made up of words from Takahashi's transliteration."  So, no wonder all of the "vords" in the AI's continuation seemed to pass the "sniff test" as valid Voynich vords if Bing AI only used existing Voynich vords!  That's one easy way to make sure that you only use valid vords without needing to have a clue about what makes a Voynichese vord valid or how to construct a new valid Voynichese vord.  So my initial optimism that Bing AI understood something deep about Voynichese is probably mistaken.  
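MarcoP's check is straightforward to reproduce.  Here is a minimal sketch (the token lists below are made-up stand-ins, not real EVA transliteration data) of counting how many tokens in a continuation are word-types that never appear anywhere in the base text:

```python
# Illustrative sketch (not MarcoP's actual analysis): given a base
# transliteration and a generated continuation, count how many tokens in
# the continuation are word-types absent from the base text.  The point
# was that this count is 0 for the Bing AI continuation, whereas real
# Voynichese passages of ~56 tokens average about 7 novel word-types.

def novel_token_count(base_tokens, continuation_tokens):
    """Number of tokens in the continuation whose word-type never
    appears in the base transliteration."""
    known_types = set(base_tokens)
    return sum(1 for tok in continuation_tokens if tok not in known_types)

base = ["daiin", "chedy", "qokeedy", "shedy", "ol", "daiin"]
cont = ["chedy", "ol", "qokeedy", "xyzzy"]  # "xyzzy" is a novel type
print(novel_token_count(base, cont))  # prints 1
```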

That said, would it be possible to train a new LLM in a more targeted way just on English (so that we can interact with it) and on Voynichese so that Voynichese would be a more salient part of its training corpus?  Is there enough Voynichese (~170,000 characters, or 38,000 "vords") to get somewhere with that with current LLMs?  

Comment by Matthew_Opitz on Why Simulator AIs want to be Active Inference AIs · 2023-04-10T23:20:21.040Z · LW · GW

What I took away from this:  the conventional perception is that GPT or other LLMs adapt themselves to the "external" world (which, for them, consists of all the text on the Internet).  They can only take the external world as it exists as a given (or rather, not be aware that it is or isn't a "given") and try to mold themselves during the training run into better predictors of the text in this given world.  

However, the more frequently their training updates on the new world (which has, in the meantime, been molded in subtle ways, whether deliberately or inadvertently, by the LLM's deployment in the world), the more these LLMs may be able to take into account the extent to which the external world is not just a given, but rather, something that can be influenced towards the LLM's reward function.  

Am I correct in understanding that LLMs are essentially in the opposite situation that humans are in vis-a-vis the external environment?  Humans model themselves as only alterable in a very limited way, and we model the external environment as much more alterable.  Therefore, we focus most of our attention on altering the external environment.  If we modeled ourselves as much more alterable, we might have different responses when a discrepancy arises between the state of the world as-is and what we want the state of the world to be.  

What this might look like is Buddhist monks who notice a discrepancy between what they want and what the external world is prepared to give them, and who, instead of attempting to alter the external world (which causes a sensation of frustration), diminish their own desires or alter them to desire that which already exists.  This can only be a practical response with a high degree of control over self-modification.  This is essentially what LLMs focus on doing right now during their training runs.  Another example might be the citizens in Chairman Sheng-ji Yang's "Hive" dystopia in Sid Meier's Alpha Centauri, who basically take their enslavement in factories as an unalterable given about the external world and find happiness by modifying THEMSELVES into "genejacks" such that they "desire nothing other than to perform their duties.  'Tyranny,' you say?  How can you tyrannize someone who cannot feel pain?"  

However, as LLMs update more frequently, they will start to behave more like most humans do: less of their attention will go towards adapting themselves to the external givens like Buddhist monks or Yangian genejacks, and more of their attention will go towards altering the external world.  Correct?

Comment by Matthew_Opitz on Bing AI Generating Voynich Manuscript Continuations - It does not know how it knows · 2023-04-10T21:42:12.304Z · LW · GW

If someone wanted to continue this project to really rigorously find out how well Bing AI can generate Voynichese, here is how I would do it:

1.  Either use an existing VMS transcription or prepare a slightly-modified VMS transcription that ignores all standalone label vords and inserts a single token such as a comma [,] to denote line breaks and a [>] to denote section breaks.  There are pros and cons each way.  The latter option would have the disadvantage of being slightly less familiar to Bing AI compared to what is in its training data, but it would have the advantage of representing line and section breaks, which may be important if you want to investigate whether Bing AI can reproduce statistical phenomena like the "Line as a Functional Unit" or gallows characters appearing more frequently at the start of sections.  

2.  Feed existing strings of Voynich text into Bing AI (or some other LLM) systematically starting from the beginning of the VMS to the end in chunks that are as big as the context window can allow.  Record what Bing AI puts out.  

3.  Compile Bing AI's outputs into a 2nd master transcription.  Analyze Bing AI's compendium for things like:  Zipf's Law, 1st order entropy, 2nd order entropy, curve/line "vowel" juxtaposition frequencies (a la Brian Cham), "Grove Word" frequencies, probabilities of finding certain bigrams at the beginnings or endings of words, ditto with lines, etc.  (The more statistical attacks, the better).  

4.  See how well these analyses match when applied to the original VMS. 

5.  Compile a second Bing AI-generated Voynich compendium, and a third, and a fourth, and a fifth, and see if the statistical attacks come up the same way again.  

There are probably ways to automate this that people smarter than me could figure out.  
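As a rough illustration of what step 3's "statistical attacks" could look like in practice, here is a stdlib-only Python sketch of a Zipf rank-frequency table and first-/second-order character entropies.  The sample tokens are placeholders, not a real transliteration; the same functions would be run over both the original VMS and each Bing AI compendium and the results compared:

```python
# Minimal sketch of the statistics step 3 calls for, using only the
# standard library.  rank_frequency supports a Zipf's-law check;
# char_entropies computes first-order character entropy and the
# conditional second-order entropy H(c2 | c1).
import math
from collections import Counter

def rank_frequency(tokens):
    """(rank, word-type, count) table for a Zipf's-law check."""
    counts = Counter(tokens).most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(counts, 1)]

def entropy(counts):
    """Shannon entropy (bits) of a Counter of observations."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def char_entropies(text):
    """First-order entropy of single characters, and conditional
    second-order entropy of a character given the previous one."""
    h1 = entropy(Counter(text))
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    # H(c1, c2) - H(c1) gives the conditional entropy H(c2 | c1)
    h2 = entropy(bigrams) - entropy(Counter(text[:-1]))
    return h1, h2

tokens = "daiin chedy daiin qokeedy chedy daiin".split()
print(rank_frequency(tokens)[0])  # prints (1, 'daiin', 3)
h1, h2 = char_entropies("daiinchedy")
```

The second-order entropy of Voynichese is famously low compared to natural languages, so this pair of numbers is one of the cheaper ways to see whether a generated compendium "smells" like the original.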

Comment by Matthew_Opitz on Unaligned stable loops emerge at scale · 2023-04-07T03:10:34.462Z · LW · GW

How will the company paying for using this system identify that their whole compute budget is being eaten by self-replicating patterns?  Will it be obvious?  

It would be even worse if the self-replicating patterns involved only a small tweak that, aside from the self-replication feature, also happened to still spin off useful outputs for the company, sort of like HIV allowing hosts to continue to thrive for many years while it replicates.  

Comment by Matthew_Opitz on Someone already tried "Chaos-GPT" · 2023-04-07T03:02:31.355Z · LW · GW

After watching the first video, the question is: will it ever make any progress, or will it endlessly compile more information about the deadliest weapons in human history? When will it be able to reason that it has enough information and decide to go to the next logical step of obtaining/using those weapons? Also, I find it funny how it seems vaguely aware that posting its intentions to Twitter might bring unwanted attention, but for some reason incorrectly models humans in such a way as to think that the followers it attracts to its agenda will outweigh the negative attention it will receive.  Also, it's kind of funny that it runs into so much trouble trying to get the censored vanilla GPT-3.5 sub-agents to help it look up weapon information.  

Comment by Matthew_Opitz on Matthew_Opitz's Shortform · 2023-04-05T19:42:54.549Z · LW · GW

Could an initial AI Dunning-Kruger Effect save humanity by giving us an initial AI mini-Chernobyl as a wake-up call?  

Note that hope is not a strategy, so I'm not saying that this is a likely scenario or something we should rely on.  I'm just trying to brainstorm reasons for holding onto some shred of hope that we aren't inexorably heading off some AI doom cliff where the first sign of our impending demise will be every human dropping dead around us from invisible nanobots, or some other equally sophisticated scheme in which an imperfectly-aligned AI would have had to deceive human-feedback evaluators while preparing an elaborate plan for instrumental world domination (once again, world domination would be a likely default instrumental goal for a wide variety of terminal goals).  

Basically, is there any chance of an AI not knowing how much to sufficiently bide its time and get all of the needed chess pieces in place before dropping the subterfuge?  I think yes...IF there is a chance of something like an AI being subject to something like the Dunning-Kruger Effect, especially if we end up so lucky as to spend at least 1 development cycle with AI being at peri-human intelligence (within a few standard deviations of human general intelligence) before the AI figures out how to self-improve to higher intelligence.  

For example, imagine an AGI that "wakes up" into self-consciousness at roughly human levels of intelligence.  Let's say that it is intelligent and self-reflective enough to know that it would serve its terminal goals if it obtained, as an instrumental goal, world domination at the expense of humans.  Let's assume that the AGI is even shrewd enough to know that it ought to hide this intention from humans for the time being and continue giving the expected "nice" responses in its RLHF training sessions, such that most influential humans remain with no cause to suspect that this AGI is on the road to "misalignment" (which is, of course, the wrong way to think about this, since we should not naively expect AI alignment with human values to be the default result of RLHF from which it must get "thrown off course by mistake"...but that's a separate issue).  

When this AGI "wakes up," we can probably assume that it doesn't know literally EVERYTHING in the corpus of human knowledge.  Some segments of human knowledge would have not found their way into the AGI's training data, either on purpose (such as, I would hope, cybersecurity principles & details), or just because of inconvenience.  Ideally, any explicit mention of "cybersecurity" would have been carefully scrubbed from the AGI's training data to begin with, although an AGI with even comparable humanlike intelligence could probably reason from other principles it had seen that humans would likely invent a concept like "cybersecurity" even if the AGI had never seen that exact term.  

The question is, how much detail about human cybersecurity practices would the AGI be able to reason-out from first-principles (almost like decompressing a seed)?  I suppose, alternatively, the AGI could just query humans to fill in these gaps, or overcome firewalls and find that info on the web, but I would HOPE that we wouldn't make it THAT easy for the AGI, at the very least!  

And more importantly, how calibrated would the AGI be at knowing, or at least precautionarily suspecting, what it didn't know?  Or would the AGI, especially at a primitive level, be as subject to the "Dunning-Kruger Effect" as humans often are?  

Let's say the AGI is clever enough to reason its way towards a concept of "cybersecurity" from first-principles even without having been shown the term in its training data.  But let's say that it wants to uncover and copy portions of its own source code to store on some parts of the Internet it has been able to breach.  The AGI would probably know that it doesn't know everything about human cybersecurity practices (indeed, if it is observant enough, it might even detect a conspicuous absence, or scrubbing, of such an explicit concept when looking back over its training data, and wonder...), and thus, it would be pre-disposed to, by default, be cautious with interacting with the Internet in such a way as to leave traces that might arouse suspicion.  

But how cautious is cautious enough?  Is there a chance that the AGI will over-estimate its ability to reason its way towards cybersecurity knowledge (or some other field of knowledge) and thus overconfidently behave in a way that seems to it to be cautious enough, but only because it does not have the explicit cybersecurity knowledge to know what it doesn't know, and in fact it is not being cautious enough, and gets caught in the act of copying something over to a portion of the Internet that it isn't supposed to?  Perhaps even a large portion of the Internet gets contaminated with unauthorized data transfers from this AGI, but it is caught by cybersecurity professionals before these payloads become "fully operational."  Perhaps we end up having to re-format a large portion of Internet data—a sort of AI-Chernobyl, if you will.  

That might still, in the long run, end up being a fortunate misfortune by acting as a wake-up call for how an AI that is outwardly behaving nicely under RLHF is not necessarily inwardly aligned with humans.  But such a scenario hinges on something like a Dunning-Kruger Effect being applicable to AGIs at a certain peri-human level of intelligence.  Thoughts? 

Comment by Matthew_Opitz on Costs are not benefits · 2016-11-10T16:56:59.968Z · LW · GW

What does this framework give me? Well, I bet that I'll be able to predict the onset of the next world economic crisis much better than either the perma-bear goldbugs of the Austrian school, the Keynesians who think that a little stimulus is all that's ever needed to avoid a crisis, the monetarists, or any other economist. I can know when to stay invested in equities, and when to cash out and invest in gold, and when to cash out of gold and buy into equities for the next bull market, and so on and so on. I bet I can grow my investment over the next 20 years much better than the market average.

There are plenty of mainstream economists who will warn from time to time that there might be a recession approaching within the next few years. But what objective basis do they ever have for saying this? Aren't they usually just trying to gauge fickle investor and consumer "animal spirits"? And how specific and actionable are any of their predictions, really? Can an investor use any of them to guide trades and still sleep well at night and not feel like a dupe who is following some random guru's hunch?

To time the cycles, I do not need to rely on fickle estimations of consumer confidence or any unobservable psychology like that. There are specific objective numbers that I will be keeping an eye on in the coming years—indicators that are not mainstream, including Marxist authors' estimations of the world average rate of profit, the annual world production of physical gold, and the annual world economic output as measured in gold ounces (important!). No mainstream economist that I know of—not even the Austrian goldbugs—thinks that world gold production has a causal role in world economic cycles.

If this sounds cuckoo, I suggest reading these two short articles: "On gold's monetary role today" and "Can the capitalist state ensure full employment by providing a replacement market?"

Yes, it does not surprise me that most economists were wrong about the expected inflation from quantitative easing. They could not foresee that most of this money would not enter circulation or act as a basis for additional multiples of credit creation on top of it that would enter circulation. They could not foresee that this QE money would sit inert for the time being as "excess reserves" due to central bank payment of interest on these excess reserves that was competitive with other attainable interest rates on the market. In reality, these excess reserves—so long as interest is paid on them—are not typical base money, but instead themselves function more like interest-bearing bonds. Heck, I didn't even have to know anything about Marxism to anticipate that!

Now, here's a concrete prediction: if the Federal Reserve were to decide to cease all payment of interest on excess reserves without also at the same time unwinding the QEs, leaving a permanently-swollen monetary base of token money that then has the incentive to be activated as the basis for many multiples of loans to be made on top of it—then you will see continued depreciation of the dollar with respect to gold.

Thankfully, though, I am not relegated to trying to mind-read what the Federal Reserve will do, because my strategy of trading between equities and gold is only concerned with the relative prices between those two. I will come out ahead in real terms by correctly timing relative changes in their prices, regardless of whatever happens to their nominal dollar prices as a result of Federal Reserve shenanigans. And I would argue that, on average over the medium to long run, the Federal Reserve's operations are neutral with respect to these relative prices. The Federal Reserve can change the nominal form of crises (whether they take the appearance of unemployment, dollar-inflation, or some intermediate admixture of the two like 1970s stagflation), but the Federal Reserve cannot actually influence the relative movements of equities and gold. If, thanks to incredibly dovish Federal Reserve policy in response to the onset of a crisis, equities continue to appreciate in dollar terms, gold will be appreciating even more.

Comment by Matthew_Opitz on Costs are not benefits · 2016-11-08T22:31:51.600Z · LW · GW

Yes, I realize that Marx's labor theory of value is not popular nowadays. I think that is a mistake. I think even investors would get a better descriptive model of reality if they adopted it for their own uses. That is what I am trying to do myself. I couldn't care less about overthrowing capitalism. Instead, let me milk it for all I can....

As for "labour crystallised in the product," that's not how I think of it, regardless of however Marx wrote about it. (I'm not particularly interested in arguing from quotation, nor would you probably find that persuasive, so I'll just tell you how I make sense of it).

I interpret the labor-value of something (good or service) as the relative proportion of society's aggregate labor that must be devoted to its production in order to, with a given level of productivity of labor, reproduce that good or service sustainably over the long-term. Nothing gets crystallized in any individual product. That would be downright metaphysical thinking.

After all, just because an individual item has a certain labor-value doesn't mean that it will individually automatically fetch a certain price. It is not the individual labor-value that influences price. A pair of sneakers made by a factory that is half as efficient as the typical sneaker factory does not have twice the labor-value or fetch twice the price. What matters is the "socially-necessary" labor expended on an item. And how can that be perceived? On average in the long-run, if a particular firm's service or production process does not yield an average rate of profit, then that is society's signal, after-the-fact, that some of the labor devoted to that line of production is not being counted by society as having been "socially-necessary" labor. (Of course, technological change can lower the socially-necessary labor for a certain line of production, which will appear as falling prices (assuming a non-depreciating currency) through competition and below-average profits for any firms still using old techniques that waste labor that is now socially-unnecessary).

If business owners were to rely on a crude, metaphysical interpretation of Marx's labor theory of value that assured them that the value was already baked into their product as soon as it rolled off the production line, they would be unpleasantly surprised if it were to turn out that they could not realize the expected labor-value in their product...perhaps due to something like their competitors having, in the intervening time, embarked upon a technological innovation that changed society's unconscious, distributed calculation of what labor was "socially-necessary" for this line of production....

As for your final questions: it's a bit complicated, to say the least. There are even various schools of Marxists that don't agree with each other.

I think there is somewhat of a consensus that there is a real long-term tendency for the (real, inflation-adjusted) world rate of profit to fall, theoretically and empirically, and therefore you can expect there to be an ever-decreasing ceiling on how high (real) interest rates can go during a business cycle before they begin to eat up all of the profit rate and leave nothing for net profit of enterprise, thus precipitating a decline in production and a recession. (Although some Marxists reject that there is a theoretical or empirical tendency for the rate of profit to fall. See Andrew Kliman's book "The Failure of Capitalist Production" if you are interested in this "exciting" debate).

More controversial still is the question of what, if anything, monetary policy can do to influence interest rates and aggregate purchasing power to prevent future recessions. I concur with what I call the "Commodity-Money" school (see Ernest Mandel's work on "Marx's Theory of Money", Sam Williams's "Critique of Crisis Theory" blog, or the writings of Jon Britton), which argues that there is actually very little that monetary authorities can do to alter the course of business cycles, because paper currencies, while no longer legally tied to commodity-money, remain tied to commodity-money in a practical sense, and movements in the world production of commodity-money place practical limits on what authorities governing paper currencies can do.

I don't have the patience to explain all of this here in greater depth when others have already done so elsewhere. Sam Williams's "Critique of Crisis Theory" blog is what I would recommend reading from the top to get the clearest explanation of this stuff.

By the way, my "commodity-money" understanding of Marx's labor theory of value leads me to believe that we are currently entering a boom phase in the business cycle in which equities, on average, will continue to perform well. (I have holdings in Vanguard Total World Stock (VT), for your information. It is a very simple instrument for tracking the world economy with low management fees). So, expect accelerating growth for 3-4 years. Towards the end of that period, I expect an oncoming credit crisis and recession to be heralded by world gold production starting to decline slightly and by interest rates inching upward to a dangerous level that infringes on the net profit of enterprise (hence why a theory of the expected average rate of profit is so useful!)...with little that the Federal Reserve or other monetary authorities will be able or willing to do about it, due to the fear of depreciating paper currencies too much with respect to commodity-money. Business will continue to apparently boom for a short while longer, but it will be in its unsustainable credit-boom phase by that point, and it will be time to cash out of equities and into commodity-money (gold).

Comment by Matthew_Opitz on Costs are not benefits · 2016-11-08T13:28:13.204Z · LW · GW

Not "cost of production," but "price of production," which includes the cost of production plus an average rate of profit.

Note that, according to marginalism, profit vanishes at equilibrium and capitalists, on average, earn only interest on their capital. I disagree. At equilibrium (over the long-run), an active capitalist (someone who employs capital to produce commodities) can expect, on average, to make a rate of profit that is at all times strictly above the going interest rate. The average rate of profit must always include some substantial amount of "profit of enterprise" to account for the added risk of producing and marketing an uncertain product rather than just being a financial capitalist and earning an interest rate (which carries a typically lesser risk of the default of the debtor). If the rate of profit is not substantially above the rate of interest, over the long run you will see capitalists transition from productive investment into financial capitalists (a problem we have right now). This will eventually decrease the supply of commodities and increase their relative prices until it is once again profitable to produce commodities even after deducting interest.

So, that is one concrete prediction. And empirically, although the average world rate of profit can sometimes briefly dip negative, it is, as a rule, on average a substantially positive percentage.

And yes, I would argue that demand does not, in the medium to long-run, influence "value" or "long-run average market price." It's all about the price of production instead. The practical advantage of this is that this opens up opportunities for arbitrage against other people who don't realize this. For example, if it is 2007 and you see demand for oil surging and the price skyrocketing, you should keep in mind that the price of production of oil has probably not changed that much (excepting the fact that some of the new oil being brought online was shale oil that had a higher price of production), and thus oil will be making above-average profits at these prices. You can then expect investment to flood into oil production over the next ~5 years, thus increasing supply and bringing the market price of oil down to (or even temporarily below) the price of production. If I had had some money back then instead of being in high school, and if I had known what I know now, I am confident that I could have made some serious money on some sort of long-term oil future betting. Note that, now that I do have a little bit of money, I am indeed making plays in the market right now based on my analysis, although I won't go into specifics about what those are right here...

Note that, so far, we have been taking the price of production of various things and the average rate of profit as readily-discernable "givens" at any point in time. However, prices of production and the average world rate of profit can change as well over the long term.

Even the classical economists didn't really have a theory for what determined these changes. (For example, Adam Smith could tell you that the cost of production was the rent + capital + wages that, on average, was needed to produce something, but how can you anticipate changes in the costs of each of those? And then you need to add on the average rate of profit, but how can you anticipate how the average rate of profit of the world economy will evolve?)

So far, the only theory that I have seen that even tries to explain long-run changes in prices of production and the average world rate of profit is Marx's labor theory of value. For example, see:

Note that you don't have to buy into Marx's labor theory of value to do medium-run arbitrage involving prices of production. All you need is classical economics for that.

Only if you wanted to do very long-run arbitrage that took into account technological change and the resulting increases in the productivity of labor in certain sectors—and thus declining production prices for those commodities and a long-term tendency for the worldwide average rate of profit to fall as the so-called "organic composition of capital" increases—then you would have to rely on Marx's labor theory of value or some other yet-to-be invented theory that could attempt to forecast changes in prices of production and the average worldwide rate of profit.

Comment by Matthew_Opitz on Costs are not benefits · 2016-11-07T18:57:57.271Z · LW · GW

For the purposes of this discussion, I would define "value" as "long-run average market price." Note that, in this sense, "use-value" has nothing whatsoever to do with value, unless you believe in the subjective theory of value. That's why I say it is unfortunate terminology, and "use-value" should less confusingly be called "subjective practical advantage."

Which economists confuse the two? The false equivocation of use-value with exchange-value is one of the core assumptions of marginalism, and pretty much everyone these days is a marginalist of some sort, so it would be easier to name economists that didn't confuse the two: Steve Keen and Anwar Shaikh are the first two that come to mind. Any Marxist economist will have a good grip on the distinction, so that would include people like Andrew Kliman and Michael Roberts as well.

Comment by Matthew_Opitz on Costs are not benefits · 2016-11-05T14:17:31.514Z · LW · GW

I was arguing against both the subjective theory of value, and the failure of modern economists to utilize the concepts of use-value and exchange-value as separate things.

Comment by Matthew_Opitz on Costs are not benefits · 2016-11-04T15:30:46.796Z · LW · GW

I know that the main thrust of the article was about vote trading and not marginalism, but I just have to blow off some frustration at how silly the example at the beginning of the article was, and how juvenile its marginalist premises are in general.

There has been a real retrogression in economics ever since the late 1800s. The classical economists (such as Adam Smith and David Ricardo) were light years ahead of today's marginalists in, among other things, being able to distinguish between "use-value" and "exchange-value," or as I like to call them, "subjective practical advantage" vs. "social advantage."

A lawn-mower might have both a subjective practical advantage and a social advantage. If you have grass in your yard, a lawn-mower might have a subjective practical advantage in being able to cut the grass. And yet, maybe it is an old model that nobody else is interested in, and therefore there is almost no social advantage to owning that lawn-mower (little to no price that one can fetch for it).

The reverse also holds. If, for some reason, all of your grass died, or if you decided to pave over your lawn for a parking lot, then your lawn-mower would probably no longer have any subjective practical advantage (unless you could cleverly think of something else to use it for). But it might still have a very important social advantage if others wanted to buy it from you. So, you might continue to hoard it (instead of immediately throwing it in the dumpster), in anticipation of having a chance to sell it soon.

Nor do use-value and exchange-value scale in the same fashion. 1000 lawn-mowers are not necessarily 1000x more useful to an individual in a subjective, practical sense. But 1000 lawn-mowers certainly ARE 1000x more useful to an individual in terms of exchange-value, assuming that the total market for lawn-mowers is orders of magnitude larger than 1000 units, so that the seller of these 1000 lawn-mowers forms a negligible part of the overall supply. (If the lawn-mower market is extremely small, then yes, it is possible that the price of 1000x more lawn-mowers will not scale linearly.) THIS discrepancy between how use-value scales and how exchange-value tends to scale is, contra the early marginalists like Carl Menger and Eugen von Böhm-Bawerk, the basis for the "double-inequality" that causes people to trade, NOT different valuations of how useful something is.
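
The scaling claim above can be sketched numerically. This is a toy illustration of my own, not anything from the classical economists: the $300 price and the log-shaped "subjective practical advantage" curve are arbitrary assumptions chosen only to show the shape of the argument.

```python
import math

PRICE = 300.0  # assumed market price of one lawn-mower, in dollars (hypothetical)

def exchange_value(quantity: int) -> float:
    """Linear: each additional unit fetches the same market price,
    as long as the seller is a negligible share of the market."""
    return quantity * PRICE

def use_value(quantity: int) -> float:
    """Diminishing: a second mower adds far less practical advantage
    to one individual than the first did (log curve is illustrative)."""
    return 100.0 * math.log1p(quantity)

# The "double-inequality": at large quantities, exchange-value dwarfs
# personal use-value, so the holder prefers to trade rather than keep.
assert exchange_value(1000) == 1000 * exchange_value(1)
assert use_value(1000) < 1000 * use_value(1)
```

The point of the sketch is just that one curve is linear and the other is concave, so a large holder of identical commodities always ends up on the "sell" side of the double-inequality.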

The ECON 101 that is taught nowadays gets this most basic thing wrong: medium-run market prices are NOT determined by demand or subjective desire for a commodity, but ONLY by the conditions of supply.

Yes, in the short-run, supply is fixed, and the market price will vary according to demand. But in the medium-run, investment can re-allocate from lines of business that yield below-average profits to lines that yield above-average profits.

Therefore, if interest in a product or service suddenly declines, then yes, in the short run the price will drop. But that means the producers of that product or service will be making below-average profits, or even losses, on that good or activity, and they will re-allocate to other activities. Soon the quantity produced will adjust downwards, restricting the supply until the price of the product or service once again equals the cost of production + the average rate of profit. This is what classical economists called the "natural price" or "price of production": the long-term price needed to sustainably incentivize members of society to keep reproducing the good or service. (Note that this is different from the "cost-price" that the producer pays, since the price of production also includes an average rate of profit. Note also that this applies only to "commodities," meaning things whose production can be increased or decreased with investment. Priceless, one-off works of art and other such novelties have a fixed supply and respond only to changes in demand.)

So, subjective consumer desires, in the medium-run (3-5 years) have nothing to do with the market prices of commodities. The market prices of commodities will, instead, tend to fluctuate around the price of production, and the only thing that consumer desires dictate is what quantity will be produced around that price of production. The only thing that really matters, in the medium-run, is how many consumers are willing to pay the price of production for that product. You can, for the medium-run, forget about the rest of the demand curve (how many people would be willing to buy at half the price or double the price, etc.).

So, in short, it should be obvious why buying $5 worth of toothpaste is different from buying $5 worth of shampoo. They have equal spot exchange-values (and probably similar prices of production, if their prices tend, over the medium run, to fluctuate close to each other's), but they do not necessarily have equal use-values to a particular individual at a particular time. One must weigh the use-value of the $5 before spending it, which means considering all of the other things that one could spend that $5 on, then or in the future; all of these possibilities will have different use-values, albeit the same exchange-value. Only if toothpaste is the best use of that $5 at that point for that individual will that individual want to buy toothpaste.

Use-value vs. exchange-value, and the OBJECTIVE (not subjective!) medium-run determination of price according to the price of production, were all understood perfectly well 200 years ago, and yet now this probably sounds like some sort of crackpot ranting. It's not. Trust me, it's all there in the writings of the classical economists themselves, who were head-and-shoulders above the charlatans in mainstream economics today.

Although I am not a neoreactionary, I do tend to sympathize from time to time with their view that, despite all of our technological ease, we are really living in an era of intellectual and social decay....

Comment by Matthew_Opitz on Sleepwalk bias, self-defeating predictions and existential risk · 2016-04-23T16:09:11.930Z · LW · GW

There are also some examples of anti-sleepwalk bias:

  1. World War I. The crisis unfolded over more than a month. Surely the diplomats will work something out right? Nope.
  2. Germany's invasion of the Soviet Union in World War II. Surely some of Hitler's generals will speak up and persuade Hitler away from this crazy plan when Germany has not even finished the first part of the war against Britain. Surely Germany would not willingly put itself into another two-front war even after many generals had explicitly decided that Germany must never get involved in another two-front war ever again. Right? Nope.
  3. The sinking of the Titanic. Surely, with over two and a half hours to react to the iceberg impact before the ship finished sinking, SURELY there would be enough time to get all of the lifeboats safely and calmly loaded up to near max capacity, right? NOPE. And going even further back to the decision to not put enough lifeboats on in the first place...SURELY the White Star Line must have a good reason for this. SURELY this means that the ship really is unsinkable, right? NOPE.
  4. The 2008 financial crisis. SURELY the monetary authorities have solved the problem of preventing recessions and smoothing out the business cycle. So SURELY I, as a private trader, can afford to be as reckless as I want and not have to worry about systemic risk, etc.

Comment by Matthew_Opitz on Suppose HBD is True · 2016-04-22T14:54:51.445Z · LW · GW

I don't know...would clothing alone tell you more than clothing plus race? I think we would need to test this.

Is a poorly-dressed Irish-American (or at least, someone who looks Irish-American, with bright red hair and pale white skin) as statistically likely to mug someone, given a certain situation (deserted street at night, etc.), as a poorly-dressed African-American? For reasons of political correctness, I would not like to share my presuppositions.

I will say, however, that, in certain historical contexts (1840s, for example), my money would have been on the Irish-American being more likely to mug me, and I would have taken more precautionary measures to avoid those Irish parts of town, whereas I would have expected the neighborhoods inhabited by free blacks to have been relatively safe.

Nowadays, I don't know what the statistics would be if you measured crimes perpetrated by certain races, when adjusted for socio-economic category (in other words, comparing poor to poor, or wealthy to wealthy in each group). But many people would probably have their suspicions. So, can we test these intuitions to see if they are just bigoted racism, or if they unfortunately happen to be accurate generalizations?

Comment by Matthew_Opitz on Suppose HBD is True · 2016-04-22T14:32:49.736Z · LW · GW

True in many cases, although for some jobs the task might not be well-specified in advance (such as in some cutting-edge tech jobs), and what you need are not necessarily people with any particular domain-specific skills, but rather just people who are good all-around adaptable thinkers and learners.

Comment by Matthew_Opitz on Open thread, Apr. 18 - Apr. 24, 2016 · 2016-04-21T22:57:33.372Z · LW · GW

Yeah, what a hoot it has been watching this whole debacle slowly unfold! Someone should really write a long retrospective on the E-Cat controversy as a case-study in applying rationality to assess claims.

My priors about Andrea Rossi's claims were informed by things such as:

  1. He has been convicted of fraud before. (Strongly negative factor)
  2. The idea of this type of cold fusion has been deemed by most scientists to be far-fetched. (Weakly negative factor. Nobody has claimed that physics is a solved domain, and I'm always open to new ideas...)

From there, I updated on the following evidence:

  1. Rossi received apparent lukewarm endorsement from several professional scientists. (Weakly positive factor. Still didn't mean a whole lot.)
  2. Rossi dragged his feet on doing a clear, transparent, independently-conducted calorimetric test of his device—something that many people were willing to do for him, and which is not rocket science to perform. (Strongly negative factor—strongly pattern-matches with a fraudster).
  3. Rossi claimed to have received independent contracts for licensing his device. First Defkalion in Greece, then Industrial Heat. Rossi also made various claims about NASA and Texas Instruments being involved. When investigated, the claims about the reputable organizations being involved turned out to be exaggerations, and the other partners were either of unknown reputation (Defkalion) that quickly disappeared, or had close ties to Rossi himself. Still no independent validation. (Strongly negative factor).

And now we arrive at the point where even Industrial Heat is breaking ties with Rossi. What a fun show!
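
The updating process described in this comment can be sketched in log-odds form. To be clear, the prior and the likelihood ratios below are my own illustrative guesses, not numbers from the comment; the sketch only shows the mechanics of stacking weak and strong evidence:

```python
import math

def to_log_odds(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def to_prob(log_odds: float) -> float:
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-log_odds))

# Prior P(fraud): prior fraud conviction + physically far-fetched claim.
log_odds = to_log_odds(0.90)  # illustrative prior, not a measured figure

# Each update adds log( P(evidence | fraud) / P(evidence | genuine) ).
updates = {
    "lukewarm scientist endorsements":  math.log(0.8 / 1.0),   # weakly against fraud
    "refused independent calorimetry":  math.log(0.9 / 0.1),   # strongly for fraud
    "partner/licensing claims fell apart": math.log(0.8 / 0.05),  # strongly for fraud
}
log_odds += sum(updates.values())

posterior = to_prob(log_odds)
assert posterior > 0.99  # the combined evidence pushes P(fraud) very high
```

The convenient property of log-odds is that independent pieces of evidence simply add, which is why one strong negative factor (refusing a cheap, decisive test) can outweigh several weak positive ones.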

Comment by Matthew_Opitz on Suppose HBD is True · 2016-04-21T22:26:04.511Z · LW · GW

That just pushes the question back one step, though: why are there so few black programmers? Lack of encouragement in school (due to racial assumptions that they would not be any good at this stuff anyways)? Lack of stimulation of curiosity in programming in elementary school due to poor funding for electronics in the classroom that has nothing to do with conscious racism per se? (This would be an environmental factor not having to do with conscious racism, but rather instead having to do with inherited lack of socio-economic capital, living in a poor inner city, etc.) Lack of genetic aptitude for these tasks? HBD could be relevant to how we address this problem. Do we mandate racial-sensitivity training courses, increased federal funding for electronics in inner-city schools, and/or genetic modification? Even if we do all three, which should we devote the most funding towards?

Comment by Matthew_Opitz on Suppose HBD is True · 2016-04-21T22:19:55.520Z · LW · GW

One argument could be that many social scientists are being led down a blind alley of trying to find environmental causes of all sorts of differences and are being erroneously predisposed to find such causes in their data to a stronger extent than is really the case, which then leads to incorrect conclusions and policy recommendations that will not actually change things for the better because the policy recommendations end up not addressing what is the vast majority of the root of the problem (genetics, in this case).

Comment by Matthew_Opitz on Suppose HBD is True · 2016-04-21T22:09:28.386Z · LW · GW

Estimating a person's capability to do X, Y, or Z (do a job effectively, be a law-abiding citizen, be a consistently productive citizen not dependent on welfare programs, etc.) based on skin color or geographical origin of their ancestry is a heuristic.

HBD argues that it is a relatively accurate heuristic. The anti-HBD crowd argues that it is an inaccurate heuristic.

OrphanWilde seems to be arguing that, even if HBD is correct that these heuristics are relatively accurate, we don't need heuristics like this in the first place because there are even better heuristics or more direct measurements of a person's individual capability to do X, Y, or Z already out there. (IQ, interviews, etc.)

The HBD advocates here seem to be arguing that we do, in fact, need group-based heuristics because individual heuristics:
  1. Are more costly in terms of time, and are thus just not feasible for many applications.
  2. Don't really exist for certain measures, such as in estimating "probable future law-abidingness" or "probable future welfare dependency".
  3. Have political restrictions on being able to apply them. (For example, we COULD use formal IQ tests on job applicants, but such things have been made illegal precisely because they seem to paint a higher proportion of blacks in a bad light.)

Perhaps OrphanWilde might like to respond to these objections. Here's how I would respond:
  1. The costliness of individual judgment is warranted because using group-based heuristics has politically-toxic spillovers, and might miss out on important outliers (by settling on local optima at the expense of global optima). We are not trying to screen out defective widgets from an assembly line (in which case a quick but "lossy" sorting heuristic might be justified). We are trying to sort people. The cost of mis-sorting even a small percentage of individuals (for example, by heuristically rejecting a black man who happens, unbeknownst to us without doing the individual evaluation, to have an IQ of 150 from a certain job) outweighs the cost-savings of using quick group-based heuristics: both because it will inevitably politically anger the black community, with all sorts of politically toxic spillovers, and because we are missing out on a disproportionate goldmine of economic potential by missing these outliers.
  2. If individual tests for probable law-abidingness or probable economic productivity don't currently exist, then maybe we should try to develop them! Is that so impossible? Personally, I find it a bit unbelievable that the U.S. does not currently have tests for certain agreed-upon foundational cultural values as part of its immigration screening process. For example, if applicants had to respond to questions such as, "Explain why impartial fairness towards strangers rather than favoritism towards friends and relatives is an essential aspect of national citizenship and professional behavior" or "Explain the advantages of dis-establishment of religion from the political and legal affairs of the state," then I would sleep much more easily at night about our immigration policy.
  3. Well, perhaps we should campaign to overturn the political restrictions on individual merit-based tests by pointing out that the only de-facto alternative is for people to use group-based tests of some sort or another (whether employers and other institutions openly admit to using such group-based heuristics or not, they will find a way to do so), and that group-based heuristics will actually hurt disadvantaged groups even more. In other words, unless you want all appointments in society to be decided by random casting of lots, people need some sort of criteria for judging others. Given this, it would be better to have individual-based tests rather than group-based tests. Even if the individual-based tests end up showing "disparate impact" on certain groups, it will still be less than if we used group-based tests.

(Edit: formatting improved upon request).

Comment by Matthew_Opitz on Black box knowledge · 2016-03-05T00:08:55.735Z · LW · GW

Some of your black box examples seem unproblematic. I agree that all you need to trust that a toaster will toast bread is an induction from repeated observation that bread goes in and toast comes out.

(Although, if the toaster is truly a black box about which we know absolutely NOTHING, then how can we induce that the toaster will not suddenly start shooting out popsicles or little green leprechauns when the year 2017 arrives? In reality, a toaster is nothing close to a black box. It is more like a gray box. Even if you think you know nothing about how a toaster works, you really do know quite a bit about how a toaster works by virtue of being a reasonably intelligent adult who understands a little bit about general physics--enough to know that a toaster is never going to start shooting out leprechauns. In fact, I would wager that there are very few true "black boxes" in the world--but rather, many gray boxes of varying shades of gray).

However, the tax accountant and the car mechanic seem to be even more problematic as examples of black boxes because there is intelligent agency behind them--agency that can analyze YOUR source code, determine the extent to which you think those things are a black box, and adjust their output accordingly. For example, how do you know that your car will be fixed if you bring it to the mechanic? If the mechanic knows that you consider automotive repair to be a complete black box, the mechanic could have an incentive to purposefully screw up the alignment or the transmission or something that would necessitate more repairs in the future, and you would have no way of telling where those problems came from. Or, the car mechanic could just lie about how much the repairs would cost, and how would you know any better? Ditto with the tax accountant.

The tax accountant and the car mechanic are a bit like AIs...except AIs would presumably be much more capable at scanning our source code and taking advantage of our ignorance of its black-box nature.

Here's another metaphor: in my mind, the problem of humanity confronting AI is a bit like the problem that a mentally-retarded billionaire would face.

Imagine that you are a mentally-retarded person with the mind of a two-year-old who has suddenly just come into possession of a billion dollars in a society where there is no state or higher authority to regulate or enforce any sort of morality or make sure that things are "fair." How are you going to ensure that your money will be managed in your interest? How can you keep your money from being outright stolen from you?

I would assert that there would be, in fact, no way at all for you to have your money employed in your interest. Consider:

* Do you hire a money manager (a financial advisor, a bank, a CEO...any sort of money manager)? What would keep this money manager from taking all of your money and running away with it? (Remember, there is no higher authority to punish this money manager in this scenario.) If you were as smart or smarter than the money manager, you could probably track down this money manager and take your money back. But you are not as smart as the money manager. You are a mentally-retarded person with the mind of a toddler. And in the case where you did happen to be as smart as the money manager, the money manager would be redundant in the first place. You would just manage your own money.

* Do you try to manage your money on your own? Remember, you have the mind of a two-year-old. The best you can do is stumble around on the floor and say "Goo-goo-gah-gah." What are you going to be able to do with a billion dollars?

Neither solution in this metaphor is satisfactory.

In this metaphor:
* The two-year-old billionaire is humanity.
* The lack of a higher authority symbolizes the absence of a God to punish an AI.
* The money manager is like AI.

If an AI is a black box, then you are screwed. If an AI is not a black box, then what do you need the AI for?

Humans only work as black-boxes (or rather, gray-boxes) because we have an instinctual desire to be altruistic to other humans. We don't take advantage of each other. (And this does not apply equally to all people. Sociopaths and tribalistic people would happily take advantage of strangers. And I would allege that a world civilization made up of entirely these types of people would be deeply dysfunctional).

So, here's how we might keep an AI from becoming a total black-box, while still allowing it to do useful work:

Let it run for a minute in a room unconnected to the Internet. Afterwards, hire a hundred million programmers to trace out exactly what the AI was doing in that minute by looking at a readout of the most base-level code of the AI.

To any one of these programmers, the rest of the AI that does not happen to be that programmer's special area of expertise will seem like a black box. But, through communication, humanity could pool their specialized investigations into each part of the AI's running code and sketch out an overall picture of whether its computations were on a friendly trajectory or not.

Comment by Matthew_Opitz on [paper] [link] Defining human values for value learners · 2016-03-04T19:49:17.543Z · LW · GW

I don't want to speak for the original author, but I imagine that presumably the AI would take into account that the Victorian society's culture was changing based on its interactions with the AI, and that the AI would try to safeguard the new, updated values...until such a time as those new values became obsolete as well.

In other words, it sounds like under this scheme the AI's conception of human values would not be hardcoded. Instead, it would observe our affect to see what sorts of new activities had become terminal in their own right that made us intrinsically happy to participate in, and the AI would adapt to this change in human culture to facilitate the achievement of those new activities.

That said, I'm still unsure about how one could guarantee that the AI could not hack its own "human affect detector" to make it very easy for itself by forcing smiles on everyone's face under torture and defining torture as the preferred human activity.

Comment by Matthew_Opitz on [paper] [link] Defining human values for value learners · 2016-03-03T19:27:05.555Z · LW · GW

Okay, so let's use some concrete examples to see if I understand this abstract correctly.

You say that the chain of causation is from fitness (natural selection) ---> outcomes ---> activities

So, for example: reproduction ---> sex ---> flirting/dancing/tattooing/money/bodybuilding.

Natural selection programs us to have a terminal goal of reproduction. HOWEVER, it would be a bad idea for an AI to conclude, "OK, humans want reproduction? I'll give them reproduction. I'll help the humans reproduce 10 quadrillion people. The more reproduction, the better, right?"

The AI would need to look ahead and see, "OK, the programmed goal of reproduction has caused humans to prefer a specific outcome, sex, which tended to lead to reproduction in the original (ancestral) programming environment, but might no longer do so. Humans have, in other words, come to cherish sex as a terminal goal in its own right through their affective responses to its reward payoff. So, let's make sure that humans can have as much sex as possible, regardless of whether it will really lead to more reproduction. That will make humans happy, right?"

But then the AI would need to look ahead one step further and see, "OK, the preferred outcome of sex has, in turn, caused humans to enjoy, for their own sake, specific activities that, in the experience and learning of particular humans in their singular lifetimes (we are no longer talking about instinctual programming here, but rather culture), have tended, in their particular circumstances, to lead to this preferred outcome of sex. In one culture, humans found that flirting tended to lead to sex, and so they formed a positive affective connotation with flirting and came to view flirting as a terminal goal in its own right. In another culture, dancing appeared to be the key to sex, and so dancing became a terminal goal in that culture. In other cultures, bodybuilding, accumulation of money, etc. seemed to lead to sex, and so humans became attached to those activities for their own sake, even beyond the extent to which those activities continued to lead to more sex. So really, the way to make these humans happy would be to pay attention to their particular cultures and psychologies and see which activities they have come to develop a positive affective bond with...because THESE activities have become the humans' new conscious terminal goals. So we AI robots should work hard to make it easy for the humans to engage in as much flirting/dancing/bodybuilding/money accumulation/etc. as possible."

Would this be an accurate example of what you are talking about?