Open & Welcome Thread - August 2020

post by habryka (habryka4) · 2020-08-06T06:16:50.337Z · LW · GW · 101 comments

If it’s worth saying, but not worth its own post, here's a place to put it. (You can also make a shortform post)

And, if you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, [? · GW] checking recent Curated posts [? · GW], seeing if there are any meetups in your area [? · GW], and checking out the Getting Started [LW · GW] section of the LessWrong FAQ [LW · GW]. If you want to orient to the content on the site, you can also check out the new Concepts section [? · GW].

The Open Thread tag is here [? · GW].

101 comments

Comments sorted by top scores.

comment by Benjamin Kost (benjamin-kost) · 2024-08-03T18:24:44.994Z · LW(p) · GW(p)

Hello all, and thank you to everyone who helps provide this space. I am glad to have discovered LW. My name is Benjamin. I am a philosopher and self-guided learner. I just discovered LW a short while ago and I am reading through the sequences. After many years of attempting to have productive conversations to solve problems and arrive at the truth via social media groups (which is akin to bludgeoning one's head against the wall repeatedly), I gave up. I was recently recommended to join LW by Claude AI, and it seems like a great recommendation so far.

One of the things that I find discouraging about modern times is the amount of outright deception that is tolerated. Whether it is politics, business, institutions of science, interpersonal relationships, or even lying to oneself, deception seems to be king in our modern environment. I am a systemic thinker, so this seems like a terrible system to me. The truth is a better option for everyone but not as rewarding as deception on an individual actor level, and thus we have entered a prisoner’s dilemma situation where most actors are defectors.

I am interested in answering two questions related to this situation:

  1. How might we get to a higher-trust environment than the current one, with fewer defectors?
  2. What are the best strategies for navigating a low-trust environment where there is a wealth of information that is mostly non-credible?

I like the chances of solving this problem with AI, but I think governments and corporations are going to try to centralize control of AI and prevent this from happening, because both institutions mainly subsist on deception. I believe we are standing on a razor's edge between true democracy and freedom on one side and a centralized totalitarian oligarchy on the other, and which way we fall largely depends on how things shake out with control over AI. I am a decentralist philosophically. I strongly believe in true democracy, as opposed to false democracy used as an applause light, as it was aptly described in the sequences. I am in the process of writing a book on how to gain true democracy in the United States, because I believe that the future of the world hinges on whether or not this can be accomplished.

I am also very open to counter-arguments. I have no desire whatsoever to cling to false beliefs, and I am happy to lose a debate because it means I learned something and became smarter in the process. In this sense, the loser of a debate is the real winner because they learned something, while the winner only spent their time and energy correcting someone else's false belief. However, winning has its own benefits in the form of a dopamine rush, so it is a positive-sum game. I wish everyone had this attitude. Just know that if you can prove I am wrong about something, I won't retreat into cognitive dissonance. Instead, I will just update my opinion(s).

I have a large number of ideas on how to effect positive change which I will be posting about, and any critical or positive feedback is welcome. Thanks to everyone who contributes to this space, and I hope to have many cooperative conversations here in the future.

Replies from: habryka4, Double
comment by habryka (habryka4) · 2024-08-04T00:59:23.167Z · LW(p) · GW(p)

Welcome! Glad to have you around and it's kind of hilarious to see that you've been recommended to show up here by Claude.

I share a lot of your interests and am interested in seeing your writing on it!

comment by Double · 2024-08-05T01:13:38.649Z · LW(p) · GW(p)

I'm curious what you asked Claude that got you a recommendation to LessWrong. No need to share if it is personal.

I love your attitude to debate. "the loser of a debate is the real winner because they learned something." I need to lose some debates.

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-05T05:36:07.143Z · LW(p) · GW(p)

I think debating is the best way to learn. I’ve always been somewhat cynical and skeptical and a critical thinker my whole life, so I question most things. Debating works better for me as a learning tool because I can’t be simply fed information like is done in public schools. I have to try to poke holes in it and then be convinced that it still holds water.

As for what I asked Claude, he actually recommended LW to me on three different occasions. I collaborate with him to refine my ideas/plans, and he recommended finding human collaborators to help execute them here, on Astral Codex Ten, and in effective altruism groups. The first time, he described LW as a "rationalist" group, and I mistook what that meant due to my philosophy background; I was thinking "you mean like fans of Descartes and Kant?" and wasn't very impressed (I consider myself more epistemically empiricist than rationalist). The second time, I actually looked into it since he had mentioned it more than once, and realized that the word "rationalist" was being used differently than I thought. The third time, I decided to pull the trigger, started reading the sequences, and then made the intro post. So far, I haven't read anything terribly new, but it's definitely right up my alley. I'd already gotten to that type of methodological thinking by reading authors such as Daniel Kahneman, Karl Popper, and Nassim Taleb, otherwise I would be enthralled, but I am really glad there is an internet community of people who think like that.

That said, I know AI safety is the hot topic here right now, and I am tech savvy but far from an AI expert. I find AI to already be incredibly useful in its current form (mostly LLMs). They are quite imperfect, but they still do a ton of sloppy thinking in a very short time that I can quickly clean up and make useful for my purposes so long as I prompt them correctly.

However, I think I have a lot to contribute to AI safety as well, because much of whether AI ends up as savior or disaster hinges on social science problems. IMO, the social sciences are very underdeveloped because few people, if any, have looked at the biggest problems in ways in which they could realistically be solved, and few are capable of imagining and designing social systems which would functionally modulate behaviors within social groups, are robust against being gamed by tyrants and antisocial personalities, have a non-catastrophic risk profile, and have any realistic chance of being implemented within the boundaries of current social systems. I believe I am up to the challenge (at least in the U.S.), but my first task is to convince a group of people with the right skills and mindsets to collaborate and help me pull it off. It will also take a lot of money for a startup, which needs to be raised via crowdfunding so there aren't any domineering interests. When I asked Claude where I might even begin to look for such help, he suggested here as the top choice three different times.

Whether it works out that way or not, I am glad I found LW. I only have my family, normies, and internet trolls to discuss serious topics with otherwise, and that gets exhausting.

Replies from: Double
comment by Double · 2024-08-06T01:38:18.273Z · LW(p) · GW(p)

Welcome! I hope you gave Claude a thumbs up for the good response.

Everyone agrees with you that yeah, the “Rationalist” name is bad for many reasons including that it gives philosophers the wrong idea. If you could work your social science magic to change the name of an entire community, we’d be interested in hearing your plan!

I’d be interested in reading your plan to redesign the social system of the United States! I’ve subscribed to be notified to your posts, so I’ll hopefully see it.

Replies from: benjamin-kost, benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-06T19:24:27.267Z · LW(p) · GW(p)

I forgot to mention, my app would actually present a solution for the word "rationalist" being used to describe the community. One of the features that I plan to implement for it is what I call the jargon index filter, which will automatically replace jargon words and ambiguous words with more descriptive words that anybody can understand. I've found LLMs to be very useful for creating the jargon index, but it is a slow process: it will take a lot of labor hours using an LLM such as Claude to generate candidate replacement words or short phrases that even a fourth grader could understand for each complex or ambiguous word, and then to pick the best one from the big list. I am planning to make the jargon index a wiki project. The filter will then use the index, coupled with AI that analyzes the surrounding paragraphs to find contextual meanings (for homographs), to replace every ambiguous or technical word in a given text with a unique descriptive word or phrase that anyone with a 4th grade or higher level of education/cognitive ability could understand. To make genuine democracy work in practice, the general public will need to be smarter, which is a pedagogical issue that I believe I have good solutions for.
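To make that concrete, here is a minimal sketch of what such a filter could look like, assuming a tiny hand-built index and a placeholder for the AI disambiguation step (the function names and index entries are illustrative placeholders, not a finished design):

```python
# Minimal sketch of the "jargon index filter" idea, assuming a small
# hand-made index; the entries here are illustrative placeholders.
import re

# Hypothetical jargon index: term -> plain-language replacement
JARGON_INDEX = {
    "pedagogy": "teaching strategy",
    "serratus posterior inferior": "lower ribcage stabilizer",
}

def resolve_sense(term: str, surrounding_text: str) -> str:
    """Placeholder for the AI step that picks the right sense of a homograph.
    A real version would call an LLM with the surrounding paragraph."""
    return JARGON_INDEX[term]

def filter_jargon(text: str) -> str:
    """Replace every indexed jargon term with its plain-language equivalent."""
    result = text
    for term in JARGON_INDEX:
        replacement = resolve_sense(term, text)
        result = re.sub(rf"\b{re.escape(term)}\b", replacement, result, flags=re.IGNORECASE)
    return result

print(filter_jargon("Modern pedagogy rarely mentions the serratus posterior inferior."))
# -> "Modern teaching strategy rarely mentions the lower ribcage stabilizer."
```

A real version would pull the index from the wiki project rather than a hard-coded dictionary and would do genuine word-sense disambiguation inside resolve_sense before substituting.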

Replies from: Double
comment by Double · 2024-08-07T00:01:21.360Z · LW(p) · GW(p)

Software that easily lets you see "what does this word mean in context" would be great! I often find that when I force-click a word to see its definition, the first result is often some irrelevant movie or song, and when there are multiple definitions it can take a second to figure out which one is right. Combine this with software that highlights words that are being used in an odd way (like "Rationalist") and communication over text can be made much smoother.

I don't think this would be as great against "jargon," unless you mean intentional jargon that is deployed to confuse the reader (e.g. "subprime mortgages," which means "risky, likely-to-fail house loans").

I’m under the impression that jargon is important for communication among people who have understanding of the topic. “Matrix multiplication is a linear operation” is jargon-heavy and explaining what it means to a fourth grader would take probably more than 30 minutes.

Agree that more educated voters would be great. I wish that voters understood Pigouvian taxes. Explaining them takes 10 min according to YouTube. I’d love a solution to teach voters about it.

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-07T16:40:13.668Z · LW(p) · GW(p)

I don't expect the jargon filter to work perfectly to explain any concept, but I do expect it to make concepts easier to understand, because learning new vocabulary is a somewhat cognitively demanding process, and especially so for some people. Memory works differently for different people, and different people have different confidence levels in their vocabulary skills, so the jargon-heavy sentence you used above, while perfectly fine for communicating with people such as you and me, wouldn't be good for getting someone less technically inclined to read about math or remember what that sentence means. It's great that you gave me an example to work with though. I just went to Claude and used the process that I am talking about to give you an example and came back with this:

“Multiplying grids of numbers is a step-by-step process”

Can you see how that would be easier to understand at first glance if you were completely unfamiliar with linear algebra? It also doesn’t require memorizing new vocabulary. The way you put it requires an unfamiliar person to both learn a new concept and memorize new vocabulary at the same time. The way I put it doesn’t perfectly explain it to an unfamiliar person, but it gives them a rough idea that is easy to understand while not requiring that they take in any new vocabulary. Because it is less cognitively demanding, it will feel less daunting to the person you are trying to teach so as not to discourage them from trying to learn linear algebra.

I believe you also hit on something important when you mentioned jargon intended to confuse the reader. I suspect that is why a lot of jargon exists in the first place. Take binomial nomenclature for example. Why are biologists naming things using long words in a dead language? That only serves the purpose of making the information more daunting and less accessible to people with poor vocabulary memorization skills. That seems like elitism to me. It makes people who have capable vocabulary memorization skills feel smarter but is a terrible practice from a pedagogical and communication perspective. That said, I assume the majority of the problem is that the people creating these new words are just bad at naming and aren’t taking pedagogy or best communication practices into consideration, but elitism probably plays a role as well.

When I post other places, I purposely dumb down my vocabulary because it is better communication practice. I am not going to bother on LW because it probably would be worse communication for my target audience here anyway and it is extra work for me. (For example, I might use the phrase "teaching strategy" instead of the word "pedagogy".)

Replies from: Double
comment by Double · 2024-08-07T21:39:17.586Z · LW(p) · GW(p)

The translation sentence about matrices does not have the same meaning as mine. Yes, matrices are “grids of numbers”, and yes there’s an algorithm (step by step process) for matrix multiplication, but that isn’t what linearity means.

An operation A is linear iff A(x+y) = A(x) + A(y)

https://orb.binghamton.edu/cgi/viewcontent.cgi?filename=4&article=1002&context=electrical_fac&type=additional

I asked a doctor friend why doctors use Latin. "To sound smarter than we are. And tradition." So our words for medicine (and probably similar for biology) are in a local optimum, but not a global optimum. Tradition is a powerful force, and getting hospitals to change will be difficult. Software to help people read about medicine and other needlessly jargon-filled fields is a great idea.

(Putting evolutionary taxonomy information in the name of a creature is a cool idea though, so binomial nomenclature has something going for it.)

You don’t have to dumb down your ideas on LessWrong, but remember that communication is a difficult task that relies on effort from both parties (especially the author). You’ve been good so far. It’s just my job as your debate partner to ask many questions.

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-08T03:54:23.252Z · LW(p) · GW(p)

I'm glad you like the idea. That was a good catch that I didn't capture the true meaning of linear very well. I was a little rushed before. That said, your definition isn't correct either. Though it is true that linear functions have that property, that is merely the additivity property of a linear function, which is just the distributive property of multiplication used on a polynomial. I also didn't see where the linked text you provided even defines linearity or contains the additivity rule you listed. That was a linear algebra textbook chapter though, and I am still glad you showed it to me, because it reminded me of how I was great at math in college, but not at all because of the textbooks (which were very expensive!). I have rather good reading comprehension, and college math textbooks might as well be written in another language. I learned the math 100% from the lectures, used the textbooks only to do the problems in the back, and got an A in all 3 calculus classes I took. I am pretty sure I could write a much easier to understand math textbook, and I know it is possible because the software that teaches math isn't nearly as confusingly worded as the textbooks.

This is how I would keep it as simple as possible and capture more of the original meaning:

Multiplying grids of numbers is a straight-line property process.

That said, point taken regarding math jargon being very challenging to descriptively reword as I suspect it will get a lot harder as the concepts get more complex. The point in my process isn’t to perfectly define the word but to use a descriptive enough word replacement that one’s brain more easily grabs onto it than it does with, for example, Latin terms of absurd length for anatomy like “serratus posterior inferior” which is a muscle I had trouble with recently. Just off the top of my head, I would just call that the lower ribcage stabilizer instead. That gives one a much better idea of where it is and what it does and would be much easier to remember and accurately label on a diagram for a quiz. However, with such abstract concepts like math deals with, this will certainly be very challenging.

Replies from: Double
comment by Double · 2024-08-09T02:57:58.311Z · LW(p) · GW(p)

The "Definition of a Linear Operator" is at the top of page 2 of the linked text.
My definition was missing that in order to be linear, A(cx) = cA(x). I mistakenly thought that this property was provable from the property I gave. Apparently it isn't because of "Hamel bases and the axiom of choice" (ChatGPT tried explaining.)

"straight-line property process" is not a helpful description of linearity for beginners or for professionals. "Linearity" is exactly when A(cx) = cA(x) and A(x+y) = A(x) + A(y). Describing that in words would be cumbersome. Defining it every time you see it is also cumbersome. When people come across "legitimate jargon", what they do (and need to do) is to learn a term when they need it to understand what they are reading and look up the definition if they forget.

I fully support experimental schemes to remove "illegitimate jargon" like medical Latin, biology Latin, and political speak. Other jargon, like that in math and chemistry, is necessary for communication.

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-10T15:32:20.592Z · LW(p) · GW(p)

I don’t particularly agree about the math jargon. On the one hand, it might be annoying for people already familiar with the jargon to change the wording they use, but on the other hand, descriptive wording is easier to remember for people who are unfamiliar with a term and using an index to automatically replace the term on demand doesn’t necessarily affect anyone already familiar with the jargon. Perhaps this needs to be studied more, but this seems obvious to me. If “linearity” is exactly when A(cx) = cA(x) and A(x+y) = A(x) + A(y), there is no reason “straight-line property” can’t also mean exactly that, but straight-line property is easier to remember because it’s more descriptive of the concept of linearity.

Also, I can see how the shorthand is useful, but you could just say “linearity is when a function has both the properties of homogeneity and additivity” and that would seem less daunting to many new learners to whom that shorthand reads like ancient Greek. I could make more descriptive replacement words for those concepts as well and it might make it even easier to understand the concept of linearity.

Replies from: Double
comment by Double · 2024-08-11T03:12:48.153Z · LW(p) · GW(p)

The math symbols are far better at explaining linearity than "homogeneity and additivity" because in order to understand those words you need to either bring in the math symbols or say cumbersome sentences. "Straight line property" is just new jargon. "Linear" is already clearly an adjective, and "linearity" is that adjective turned into a noun. If you can't understand the symbols, you can't understand the concept (unless you learned a different set of symbols, but there's no need for that).

Some math notation is bad, and I support changing it. For example, f = O(g) is the notation I see most often for Big-O notation. This is awful because it uses ‘=‘ for something other than equality! Better would be f \in O(g) with O(g) being the set of functions that grow slower or as fast as g.
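For reference, the set-membership version being suggested is usually written like this (a standard definition, stated for x going to infinity):

```latex
% Set-style definition of Big-O (as x -> infinity):
f \in O(g) \iff \exists\, C > 0,\ \exists\, x_0 \ \text{such that}\ |f(x)| \le C\,|g(x)| \quad \text{for all } x \ge x_0
```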

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-11T06:00:29.711Z · LW(p) · GW(p)

I’m trying hard to understand your points here. I am not against mathematical notation as that would be crazy. I am against using it to explain what something is the first time when there is an easier way. Bear with me because I am not a math major, but I am pretty sure “a linear equation is an equation that draws a straight line when you graph it” is a good enough explanation for someone to understand the basic concept.

To me, it seems like "A(cx) = cA(x) and A(x+y) = A(x) + A(y)" is only the technical definition because they are the only two properties that every linear equation imaginable absolutely has to have in common for certain. However, suppose I didn't know that, and I wanted to be able to tell if an equation is linear. Easy. Just graph it, and if the graph makes a single straight line, it's a linear equation. Suppose I didn't want to or couldn't graph it. I can still tell whether it is linear or not by whether or not the slope is constant using y=mx+b, or I could just simply look to see if the variables are all to the power of one and only multiplied by scalar constants. Either of those things can help me identify a linear equation, so why is it that we are stuck with A(cx) = cA(x) and A(x+y) = A(x) + A(y) as the definition? Give me some linear equations and I can solve them and graph them all day without knowing that. I know that for a fact because though I am certain that definition was in some of my math textbooks in college, I never read the textbooks, and if my professors ever put that on the board, I didn't remember it, and I certainly never used it for anything even though I've multiplied and divided matrices before and still didn't need it then either. I only got A's in those classes.

That's why I am having trouble understanding why that definition is so important, and how it is too wordy to say "a function or equation with a constant slope that draws a single straight line on a graph." The only reason I can think of is that there must be some rare exception that has those same properties but is not a linear equation. Even so, I am fairly certain that homogeneity and additivity could be summed up as "one output per input" and "the distributive property of multiplication is true for the equation/function." That's still not that wordy. Let's pretend for a second that a math professor, instead of using words to do the lecture, read the symbols phonetically and explained everything in shorthand on the board. Would more or fewer people pass the class, in your opinion?

I am also wondering what your definition of jargon is. Jargon has two required elements:

  1. Specific to a particular context: Jargon is used within a specific industry, profession, or group and may not be easily understood by those outside of that context.

  2. Involves technical terms, acronyms, or phrases that are not part of everyday language.

Straight Line Property doesn’t qualify for the second element which is why I like it. That said, linear isn’t the best example of jargon because it has the word “line” in it which at least gives the reader a clue what it means. I’m not trying to redefine words, I’m merely trying to rewrite them so that they use common language words that give a clue to what they mean because I am certain that leads to better memory retention for the layperson hearing it for the first time and is also less jarring to readers with poor vocabulary skills. This should apply equally to all jargon by the definition I gave. However, giving a clue may be very challenging for some jargon words that describe very abstract and arcane concepts that don’t map well to normal words which is what I initially thought your point was.

The only downside I see to providing an option to automatically replace useful jargon on demand is that it might lead to a more permanent replacement of the words over time which would irritate people already familiar with the jargon. If your point is that it is not useful, then I would like to hear your counterargument to the point I made about memory retention and the jarring cognitive effect on people with poor vocabulary skills. The jarring effect is easily observable and it’s hard for me to imagine that word familiarity and embedded clues don’t help memory retention of vocabulary, but I am open to counter arguments.

Replies from: andrei-alexandru-parfeni, Double
comment by sunwillrise (andrei-alexandru-parfeni) · 2024-08-11T06:26:02.131Z · LW(p) · GW(p)

Either of those things can help me identify a linear equation, so why is it that we are stuck with A(cx) = cA(x) and A(x+y) = A(x) + A(y) as the definition?

I'm not sure what you are referring to here. They certainly cannot always (or even usually) identify a linear equation. Those 2 things are going to be anywhere between useless and actively counterproductive in the vast majority of situations where you deal with potentially linear operations.

Indeed, if A is an n × n matrix of rank anything other than n - 1, the solution space of Ax=0 is not going to be a straight line. It will be a subspace of dimension n - rank(A), which can be made up of a single point (if A is invertible), a plane, a hyperplane, the entire space, etc.
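(A concrete check of that dimension count, as a small numpy/scipy sketch; the matrix below is an arbitrary rank-2 example:)

```python
# Check that the solution space of Ax = 0 has dimension n - rank(A).
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # a multiple of row 1, so the rank drops to 2
              [0., 1., 1.]])

n = A.shape[1]
rank = np.linalg.matrix_rank(A)
N = null_space(A)              # columns form a basis of the null space

print(n - rank)                # 1
print(N.shape[1])              # 1 -> the solution set is a line through the origin
```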

"A function or equation with a constant slope that draws a single straight line on a graph" only works if you have a function on the real line, which is often just... trivial to visualize, especially in comparison to situations where you have matrices (as in linear algebra). Or imagine you have an operation defined on the space of functions on an infinite set X, which takes two functions f and g and adds them pointwise. This is a linear operator that cannot be visualized in any (finite) number of dimensions.

Bear with me because I am not a math major, but I am pretty sure “a linear equation is an equation that draws a straight line when you graph it” is a good enough explanation for someone to understand the basic concept.

So this is not correct, due to the above, and an important part of introductory linear algebra courses at the undergraduate level is to take people away from the Calc 101-style "stare at the graph" thinking and to make them consider the operation itself.

An object (the operation) is not the same as its representation (the drawing of its graph), and this is a critical point to understand as soon as possible when dealing with anything math-related (or really, anything rationality-related, as Eliezer has written about in the Sequences many times). Even the graph itself, in mathematical thinking, is crucially not the same as an actual drawing (it's just the set of (x, f(x)), where x is in the domain).

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-12T05:00:45.083Z · LW(p) · GW(p)

Thank you for taking the time to explain that. I never took linear algebra, only college algebra, trig, and calc 1, 2, and 3. In college algebra our professor had us adding, subtracting, multiplying, and dividing matrices and I don’t remember needing those formulas to determine they were linear, but it was a long time ago, so my memory could be wrong, or the prof just gave us linear ones and didn’t make us determine whether they were linear or not. I suspected there was a good chance that what I was saying was ignorant, but you never know until you put it out there and ask. I tried getting AI to explain it, but bots aren’t exactly math whizzes themselves either. Anyway, I now stand corrected.

Regarding the graph vs the equation, that sounds like you are saying I was guilty of reification, but aren’t they both just abstractions and not real objects? Perhaps your point is that the equation produces the graph, but not the other way around?

comment by Double · 2024-08-18T17:31:32.897Z · LW(p) · GW(p)

A linear operation is not the same as a linear function. Your description describes a linear function, not operation. f(x) = x+1 is a linear function but a nonlinear operation (you can see it doesn’t satisfy the criteria.)

Linear operations are great because they can be represented as matrix multiplication and matrix multiplication is associative (and fast on computers).
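A quick numerical illustration of that distinction (a minimal sketch):

```python
# f(x) = x + 1 draws a straight line, but it fails the operator-linearity
# test A(x + y) = A(x) + A(y), so it is affine rather than linear as an operation.
f = lambda x: x + 1

x, y = 2.0, 3.0
print(f(x + y))       # 6.0
print(f(x) + f(y))    # 7.0 -> not equal, so f is not a linear operation
```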

“some jargon words that describe very abstract and arcane concepts that don’t map well to normal words which is what I initially thought your point was.”

Yep, that’s what I was getting at. Some jargon can’t just be replaced with non-jargon and retain its meaning. Sometimes people need to actually understand things. I like the idea of replacing pointless jargon (eg species names or medical terminology) but lots of jargon has a point.

Link to great linear algebra videos: https://youtu.be/fNk_zzaMoSs?si=-Fi9icfamkBW04xE

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-19T16:14:02.553Z · LW(p) · GW(p)

“Some jargon can’t just be replaced with non-jargon and retain its meaning.”

I don’t understand this statement. It’s possible to have two different words with the same meaning but different names. If I rename a word, it doesn’t change the meaning, it just changes the name. My purpose here isn’t to change the meaning of words but to rename them so that they are easier to learn and remember.

As far as jargon words go, “linearity” isn’t too bad because it is short and “line” is the root word anyway, so to your point, that one shouldn’t be renamed. Perhaps I jumped to meet your challenge too quickly on impulse. I would agree that some jargon words are fine the way they are because they are already more or less in the format I am looking for.

However, suppose the word were “calimaricharnimom” instead of of “linearity” to describe the very same concept. I’d still want to rename it to something shorter, easier to remember, easier to pronounce, and more descriptive of the idea it represents so that it would be easier to learn and retain which is the goal of the jargon index filter. All words that aren’t already in that format or somewhat close to it are fair game, regardless of how unique or abstract the concept they represent is. The very abstract ones will be challenging to rename in a way that gives the reader a clue, but not impossible to rename that way, and even if we assume it is impossible for some words, just making them shorter, more familiar looking, and easier to pronounce should help.

All that said, this is an enormous project in itself because it would need to be done for every major language, not just English. It would need to be an LLM/human collaboration wiki project. Perhaps I should establish some guidelines for leaving certain jargon words alone for that project.

Replies from: Double
comment by Double · 2024-08-20T13:31:22.461Z · LW(p) · GW(p)

Yes, it's possible we were referring to different things by "jargon." It would be nice to replace cumbersome technical terms with words that have the same meaning (and require a similar level of familiarity with the field to actually understand) but have a clue to their meaning in their structure.

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-20T17:03:43.628Z · LW(p) · GW(p)

I think it’s not only nice, but a necessary step for reducing information asymmetry which is one of the greatest barriers to effective democratic governance. Designing jargon terms to benefit more challenged learners would carry vastly more benefit than designing them to please adept learners. It wouldn’t harm the adept learners in any significant way (especially since it’s optional), but it would significantly help the more challenged learners. Many of my ideas are designed to address the problem of information asymmetry by improving learning and increasing transparency.

comment by Benjamin Kost (benjamin-kost) · 2024-08-06T18:10:58.770Z · LW(p) · GW(p)

Thanks. I did give Claude a thumbs up, actually. I’ll give you the gist of my plan. The hardest part to planning something as big as changing society in a large nation like the United States is getting enough people to act on a plan. To do that, the plan involves creating a new social media app that emphasizes local communities called Community-Cohesion or Co-Co for short which will be very comprehensive by design and will try to overtake a slew of other apps that have some obvious problems while also filling some new niches that nobody has even thought of yet as well as providing ways for people to make money using the app. I see social media as one of the problems in modern society, but it could also be a solution if implemented correctly. The app will be tied to a nonprofit that I plan to create called Liaisons for Organizing Community Action and Leadership Strategies (LOCALS) that will aim to have a local chapter in every political jurisdiction across the US (municipal, county, and federal district) which will organize political action and try to get their own candidates into both the democrat and republican primaries for every office. The candidates will actually use the app not only for campaign fundraising and awareness, but to collect data to determine the will of the people which they will swear to uphold based on the data collected. Optionally, they can put up assets with the LOCALS trust as collateral in case they violate their oath.

It will be a bottom up, decentralized approach that uses a massive social media app to make the internet safer and less deceptive and will deprogram people at the same time. The app is such a good idea that I am very confident in it, but creating it will be another thing. Fortunately, with AI it might not be as hard as it would have been a short time ago.

Even still, it’s going to take a diverse group of experts including not only software engineers, but lawyers, data scientists, and people familiar with the political machinery for a massive array of local political jurisdictions. I’m not rich or I would just hire them, so I either have to raise the funds to hire them or find volunteers or some combination of both. That will be very difficult, and I also worry the app will be too big and thus be too difficult to debug. I’ve noticed that bugs are a plague for modern software in general, but especially for under-funded software. The good thing is that it has a number of ways of making money, so if it can get off the ground, then it would make a lot of money and be self sustaining.

That creates its own problems though in the form of controlling interests. I am still working on designing the charter for LOCALS and Co-Co to be resistant to corruption, but I am not a lawyer, so it’s hard for me to see the whole field. I considered going to law school, but I don’t think there is time for that, so I need to find at least one lawyer for corporate law who is willing to volunteer to help me design the corporate structure of the company or nonprofit. I am undecided on whether both LOCALS and Co-Co should be nonprofits or whether only LOCALS should be. I am leaning towards only LOCALS being one because I believe there are ways one could charter a corporation that would be more democratic and robust against corruption than a nonprofit.

Anyway, I plan to write a bunch of posts on this to outline all of the details, so stay tuned.

BTW, how does the voting system on here work exactly? I read the new user guide, but it doesn’t explain it well or tell the user which type of button goes with which type of vote. I need to stop being the dork who uses it wrong all the time. I see left and right arrows, checks and X’s, and up and down arrows. It’s funny how little explanation there is for these things. I’ve looked exhaustively for instructions.

Replies from: Double, Double
comment by Double · 2024-08-06T23:46:37.165Z · LW(p) · GW(p)

Voting: left for "this is bad," right for "this is good." X for "I disagree," check for "I agree."

This way you can communicate more in your vote. E.g.: "He's right, but he's breaking community norms": left + check. "He's wrong, but I like the way he thinks": right + X.

https://www.lesswrong.com/posts/HALKHS4pMbfghxsjD/lesswrong-has-agree-disagree-voting-on-all-new-comment [LW · GW]

comment by Double · 2024-08-07T01:00:16.665Z · LW(p) · GW(p)

What would draw people to Co-Co and what would keep them there?

How are the preferences of LOCALS users aggregated?

LOCALS sounds a lot like a political party. Political parties have been disastrous. I’d love for one of the big two to be replaced. Is LOCALS a temporary measure to get voting reform (eg ranked choice) or a long-term thing?

I want more community cohesion when it comes to having more cookouts. More community cohesion in politics makes less sense. A teacher in Texas has more in common with a teacher in NY than the cattle rancher down the road. Unfortunately, the US political system is by design required to be location based.

Is LOCALS a political party with “increase local community connection” as its party platform? If the party has some actionable plans, then its ideas can get picked up by the big parties if LOCALS shows that its ideas are popular. This might not be a bad idea and could solve the lack-of-community problem without overthrowing the big parties.

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-07T18:04:57.648Z · LW(p) · GW(p)

LOCALS is absolutely NOT a political party. I am very anti political party because I consider political parties to be anti-democratic. I suppose this is the danger in giving a sloppy synopsis. I was hoping to convey that it wasn’t a political party via a context clue by saying LOCALS candidates will run in the democrat and republican primaries. In other words, they would run as democrats and republicans because 1. They are not a political party and 2. The system is rigged to permanently codify the democrat and republican parties as the only 2 viable parties. It is a bad strategy to try to change the system from the outside. It has to be changed from the inside to be successful. There is no way LOCALS could compete with the two major parties, so instead of competing it aims to join both and become an integral part of both while making both irrelevant in the long run.

Another reason LOCALS shouldn’t be considered a political party is that one of the aims is to be as non political as possible. This would be accomplished by prioritizing democracy (really stakeholder democracy, but that’s another long conversation) over every issue. For example, suppose a LOCALS candidate were to be asked “what is your opinion on abortion”, they would give a standard LOCALS answer such as “I am completely supportive of whatever the will of the majority of the constituents from my district want according to the data collected from the Township Talks portion of the Community-Cohesion application. I want to work for you, so I’m more interested in what you think. What’s your opinion on abortion?” Similar answers would be given for gun control and other controversial issues. I could write a whole essay on this idea alone and how it solves a number of political problems, but my time is limited.

Co-Co would deal with a lot more than politics and would indeed help your cookouts, and I also think that is very important, but I think focusing on national politics is both a strategical and ethical mistake when it comes to a majority of domestic policies. Education is one of those. I don’t like the prospect of a teacher in Texas identifying politically more with a teacher in New York than his own community. It reminds me of teacher’s unions which I am also against. While that may be good for the individual teachers, it comes at the expense of the community that the teachers serve. Ideally, teachers should be trying to figure out how to best serve their community rather than themselves. Realistically, we know that most will act selfishly due to human nature, but the fact of the matter is that students and parents in Texas have different needs and priorities than students and parents in New York. When the teachers from New York and Texas collaborate to enforce their own will over the will of the communities they serve, that is something which I consider to be akin to an economic externality like when companies pollute to save costs and increase market competitiveness. Furthermore, by collaborating on such endeavors, they make pedagogy more centralized and uniform in the process which means less innovation and more fat tail risk because vastly more students are affected when they get it wrong.

Next, why should people in New York have any say in how people in Texas choose to educate their students or vice versa? I strongly believe in every community’s right to political self-determination within certain moral boundaries and see a national teacher’s union as a violation of that right. The only counterargument to this is expertocracy where we discount parents and students in the decision making process in favor of the teachers who are supposed to serve them because they know less than teachers about pedagogy. I see that as an information problem to be solved in more ethical ways. While that sounds very daunting to most people, as a naturally creative thinker and problem solver, it sounds less daunting to me although I will admit that my solution in this case involves rethinking the whole entire education system because I find the entire system to be inherently unsatisfactory in ways that can’t be internally reformed. I wish I could say otherwise, but I believe public school is actually damaging to a majority of children, especially compared to possible unrealized alternatives that would take me more time than I have to explain. Suffice it to say it would be part of Co-Co if I am successful. Perhaps that will give you an idea of how broad the proposed app is. It is definitely the most daunting part of my plan. I remain optimistic though because recent advances in computer science make me believe it is possible to accomplish the things I want to do with it.

As far as how Co-Co would attract and keep users, I could sum that up by saying that if it works and gets off the ground then it would become absolutely indispensable for daily life and everyone would have it and use it to do all kinds of things ranging from determining how their local elected officials create and vote on policies to making money directly through the app, finding jobs, buying and selling on the private market, buying and selling with local businesses, looking at product reviews, browsing and searching the internet, finding friends or dates, and much more. I am in the process of getting a complete description on paper. Even just that is a lot of work, and I am still developing ideas for it as well. Before I even do any of that here, I am thinking I will first post about my epistemology and ethical philosophies as well as what challenges I believe the US faces to try to get people on the same wavelength before I go posting really long discourses on how to solve them. Unfortunately, I am not a fast writer. I frequently rewrite and edit everything heavily before I post something that I am serious about because I know how important a first impression is about a subject. I’m actually being quite lazy in this discussion which is why you got some wrong impressions from my previous post.

Replies from: Double
comment by Double · 2024-08-07T22:57:49.943Z · LW(p) · GW(p)

There are different kinds of political parties. LOCALS sounds like a single-issue fusion party as described here: https://open.lib.umn.edu/americangovernment/chapter/10-6-minor-parties/

Fusion parties choose one of the main two candidates as their candidate. This gets around the spoiler effect. Eg the Populist Party would list whichever of the big candidates supported Free Silver.

A problem with that is that fusion parties are illegal in 48 states(?!) because the major parties don’t want to face a coalition against them.

LOCALS would try to get the democrat and the republican candidate to use Co-Co to choose their policies (offering the candidate support in form of donations or personnel), and if they do then they get an endorsement. I’m still a bit iffy on the difference between an interest group and a political party, so maybe you are in the clear.

https://en.m.wikipedia.org/wiki/Electoral_fusion_in_the_United_States

I love your vision of how a politician should answer the abortion question. Separating the three questions “who do voters think is qualified” “what do voters want” and “what is true” would be great for democracy. Similar to: https://mason.gmu.edu/~rhanson/futarchy.html

When it comes to local vs not local, if 1/100 people is an X, and they are spread out, then their voice doesn’t mean much and the other 99/100 people in their district can push through policies that harm them. If the Xes are in the same district, then they get a say about what happens to them. I used teachers as an example of an X, but it is more general than that. (Though I’m thinking about the persecution of Jews in particular.)

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-08T07:43:35.288Z · LW(p) · GW(p)

//There are different kinds of political parties. LOCALS sounds like a single-issue fusion party as described here: https://open.lib.umn.edu/americangovernment/chapter/10-6-minor-parties/

Fusion parties choose one of the main two candidates as their candidate. This gets around the spoiler effect. Eg the Populist Party would list whichever of the big candidates supported Free Silver.

A problem with that is that fusion parties are illegal in 48 states(?!) because the major parties don't want to face a coalition against them.

LOCALS would try to get the democrat and the republican candidate to use Co-Co to choose their policies (offering the candidate support in form of donations or personnel), and if they do then they get an endorsement. I'm still a bit iffy on the difference between an interest group and a political party, so maybe you are in the clear.

https://en.m.wikipedia.org/wiki/Electoral_fusion_in_the_United_States //

Thank you for that information. I did not know anything about fusion parties, so you had me worried for a minute. I then looked up what “cross-endorsement” is and this in not remotely like anything I had in mind. Consider the name “Liaisons for Organizing Community Action and Leadership Strategies”. Besides being a clever acronym, it is very descriptive of the intended purpose of the organization. The group will have three main missions: 1. Developing leadership through an in house program (This is where future candidates sworn to uphold democracy will come from), 2. Organizing community actions such as referendums, planning and fundraising various local charity projects, organizing voting initiatives, lobbying local government and local businesses for various reasons, planning other various political strategies for the community, etc. 3. Maintaining the Township-Talks portion of Co-Co for their political district chapter. Other than #3, I plan to keep locals and Co-Co as completely separate organizations with separate agendas. LOCALS will be a nonprofit organization (Probably) while Co-Co will be a for profit corporation (Most Likely). As I mentioned before, I am not yet solid on structural organization, but I do know that they will be separate organizations. This is important because if they were the same organization, LOCALS might very ambiguously be considered a political party which I not only don’t want but absolutely can’t have for the plan to work.

To explain this, I will need to explain how part of the Township-Talks (the political section) portion of Co-Co will work, which will be the main part that the LOCALS chapter manages. There will be a page/section for each current representative for each office within the LOCALS chapter's political district. A person, bot, or combo will be assigned to each representative to collect information and post it there. Upcoming/past votes and voting records will be collected and posted there, along with an AI-generated synopsis of what the issue they are voting on is about. There will be tools that the representatives can use to talk to the public and hold town hall meetings online if they so wish. The representatives can also submit corrections for information about them, but they won't directly be in charge of this information. The LOCALS chapter will research this information and populate it into Co-Co. Co-Co will then take the information collected by LOCALS, compare it to the data collected from users via surveys and other sources, and then use an algorithm to score every single officeholder/representative with a "democracy score" that indicates how well they are doing the will of the people. This way, LOCALS will simply be doing the nonprofit work of researching all of the available officeholders and merely using Co-Co as a tool to upload their research to for the public to view. Co-Co will then do the rest of the data collection and algorithmic sorting and figuring on its own to rate how the officeholders are doing and get the information to the constituents of the political district. There will also be a section for candidates during elections, which will somewhat overlap with the officeholders because we expect incumbents to run for office again.

All this said, LOCALS will not be directly putting up any candidates. The only thing LOCALS will be doing is training candidates, getting them to swear oaths to uphold democracy according to a specific set of rules enshrined within the open-source Co-Co algorithm that calculates the will of the people, and optionally having them put up one or more assets with the LOCALS trust as collateral in case they violate their oath.

Now this next part is where I worry things might get somewhat sticky legally, but I am more certain that it is legal than not. There will also be a monetization feature for any officeholder or candidate willing to swear the oath to uphold democracy via Township Talks data: in exchange for a standardized low monthly user fee (like $10), a Township Talks user can answer special additional polls related to upcoming votes, propose legislative changes, and get more interactive time with the officeholder they are subscribed to. Besides those extra privileges, the algorithm that calculates what the officeholder should do according to the will of the people in Co-Co will be weighted heavier for the subscribers of the officeholder. Co-Co will receive a small portion of the funds; the rest will go directly to the officeholder as income. Importantly, this won't be the only way to get beneficially weighted by the algorithm. There will be civics and local politics education courses that once completed have that effect, uploading proof of local charity work or donations will have that effect, and participating in online town halls and debates will also have that effect. I will likely add other ways to get further weighted as well (all of this in general rather than officeholder specific). In this way, users will build capital towards having more of a democratic influence in their community, and thus we have "stakeholder democracy" as I call it. The problem with plain democracy is that the fentanyl junkie gets the same vote as Mother Teresa and Albert Einstein. The most competent and virtuous people are the ones who ought to be in charge of decisions, so I had the idea to weakly integrate meritocracy and virtue ethics into the process while also getting the officeholders decentrally paid by their active constituents for their work, so that the results are skewed towards good-faith individuals and competent decision makers. I also figure that most politicians live off bribes these days, so rather than expecting the bribing to stop, why not have the option for the constituents to very weakly and decentrally bribe the officeholders to do what they want? It is not much different than campaign funding except it happens while in office, and the officeholder just gets to keep the money and use it however he or she wishes. As part of this process, the officeholder would sign a multilateral contract that incurs strong financial penalties if they don't do what they promised and would force them to pay back the fees to the subscribers.
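To make the weighting mechanics a bit more concrete, here is a rough sketch of how such a weighted "will of the people" tally could be computed; every field name and weight value below is a hypothetical placeholder, not a proposed policy:

```python
# Hypothetical sketch of the weighted aggregation described above.
# Each response is weighted by participation factors (subscription, civics
# course, verified charity work, town-hall attendance); the numbers are
# placeholders chosen only to illustrate the idea.
from dataclasses import dataclass

@dataclass
class Response:
    supports_measure: bool
    subscriber: bool = False
    completed_civics_course: bool = False
    verified_charity_work: bool = False
    attended_town_hall: bool = False

def weight(r: Response) -> float:
    w = 1.0
    if r.subscriber:
        w += 1.0
    if r.completed_civics_course:
        w += 0.5
    if r.verified_charity_work:
        w += 0.5
    if r.attended_town_hall:
        w += 0.25
    return w

def weighted_support(responses: list[Response]) -> float:
    """Fraction of total weight in favor of the measure."""
    total = sum(weight(r) for r in responses)
    in_favor = sum(weight(r) for r in responses if r.supports_measure)
    return in_favor / total if total else 0.0

sample = [
    Response(True, subscriber=True, completed_civics_course=True),
    Response(False),
    Response(True, attended_town_hall=True),
]
print(round(weighted_support(sample), 2))  # 0.79
```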

Finally, besides for officeholders, as I mentioned before, candidates will get similar pages and they will be able to raise campaign funds via Township Talks also so long as they are sworn to uphold stakeholder democracy. It would work the same way as with the officeholder subscriptions. Subscribe to your candidate, and if they win, you get further weighted in the algorithm for any issues they vote on. I would make this a significant weighting because it is riskier considering the candidate might lose.

Anyway, LOCALS will neither directly run nor fund candidates. Instead, they will train leaders who will then run independently from LOCALS as candidates certified by LOCALS under the democrat and republican primary tickets. What you are calling a fusion party involves a literal political party, say the Libertarian party, running a candidate under both the Libertarian ticket and the Republican ticket at the same time. So, for instance, if the Libertarian party nominated Donald Trump, then Donald Trump would be both the Libertarian and Republican party candidate. Absolutely nothing like that is even happening here. LOCALS doesn't even have a ticket, doesn't seek ballot access, and doesn't technically even field candidates. LOCALS merely trains and certifies candidates whom they handpick for their leadership program and who swear to uphold democracy according to a specific set of rules, agree to campaign in a certain fashion, and may optionally choose to put one or more assets up for collateral with the LOCALS trust that would be lost if they break their oath. This would make 2 types of LOCALS certified candidates: 1. a LOCALS certified candidate, and 2. a LOCALS certified Trust candidate. In this manner, rather than running candidates (which I improperly said for simplicity's sake in the earlier response), all they will be doing is handpicking and training leaders and helping them enforce self-imposed rules. The self-imposed part is important. If the problem were that officeholders didn't have enough freedom in how they vote and run their campaigns and offices, we would have a problem. Because the problem is that they have too much, we can create candidates that work based on self-imposed rules without breaking/changing any laws or rewriting any constitutions. That realization is what got the gears turning for this whole idea. There is also a precedent in both major parties for hijacking them. With the GOP we had/have "the Tea Party wing" and "the MAGA wing", and for the Dems we have "The Squad" (originally known as "Justice Democrats"). Upon seeing these in-party rebels take over from the primaries, I said to myself "why not both parties?" If LOCALS can get LOCALS certified candidates to win both primaries for a single office, that office is guaranteed to go to a LOCALS certified candidate. It's also easier and cheaper to win in the primaries because there is less turnout and less funding, and if Co-Co takes off, Co-Co can organize voter turnout for the LOCALS certified candidates.

//I love your vision of how a politician should answer the abortion question. Separating the three questions “who do voters think is qualified” “what do voters want” and “what is true” would be great for democracy. Similar to: https://mason.gmu.edu/~rhanson/futarchy.html//

I love how you were able to grok that from the few context clues that I gave you. That’s exactly what I was thinking. American elections are not democratic because they are too ambiguous to functionally achieve democracy in quite a few ways. The voter has to somehow figure out which candidate is trustworthy (won’t back stab or sell out later or is just lying to begin with), competent, supports their values and interests, and has a reasonable chance of winning all at the same time (assuming such a candidate exists, which usually isn’t the case). I harp on people confusing elections with democracy all the time. Sure, an election happens in that situation, but nothing remotely close to the will of a majority of people is happening because of the election. I liken it to voting on who gets to punch you in the face. Logically, the democracy part can only happen after the election. The election should only be about who is competent and trustworthy and the issues sorted out later by the constituents via data science. It doesn’t even make sense for the candidate to promise what they will do ahead of time because circumstances change and decisions should change with them. All this seems obvious to me, but most other people don’t generally seem to understand what democracy actually is. They think democracy is elections. I always like to point out that we could have democracy entirely without elections if we switched to sortition instead. I am not saying we should, though I doubt it could be worse than what we have now, but the point is that democracy doesn’t even require elections. I also don’t want to do the stupid form of democracy like the article you linked referenced which is why I designed a system as a stakeholder democracy to weight the process towards merit, virtue, and participation.


 


 


 

//When it comes to local vs not local, if 1/100 people is an X, and they are spread out, then their voice doesn’t mean much and the other 99/100 people in their district can push through policies that harm them. If the Xes are in the same district, then they get a say about what happens to them. I used teachers as an example of an X, but it is more general than that. (Though I’m thinking about the persecution of Jews in particular.)//

Yes, Claude chided me often about protecting political minorities as well. As I told him, this is less of a concern under a local community sovereignty setup in modern times, where mobility is cheap and easy and people can vote with their feet, than it would be under literally any other known system. I am actually hoping that people do just that and move to wherever they like the politics. I am a big fan of intentional communities, and if people move based on political preferences, then they will naturally self-sort into intentional communities. The gain in social capital from living in a community of people who share your beliefs and policy preferences is enormous! Regarding Jews, I think they are protected under federal law anyway. However, for political rather than racial/ethnic minorities, which I believe is what we are discussing now, voting with one’s feet still applies. Suppose you hate gun control and 95% of your community is for it: you can just move to another community that loves guns. People already do it now. Are you a wing nut and your community hates private airplanes? Just move one community over, where they either like them or don’t care. Problem solved. That’s why I am very serious about making sovereignty as localized as possible. If literally every neighborhood were sovereign, you wouldn’t have to go very far to escape a bad policy. I have also toyed with the idea of using Co-Co and/or LOCALS to grease mobility even further.

The fact of the matter is that no matter what type of government is chosen, becoming a disgruntled political minority is always a possibility. That being the case, the only real insurance against it is radical decentralization of political districts coupled with local sovereignty. This actually fits well with social contract theory, the main theory that political science is based on. Social contracts are implicitly agreed to by staying within the jurisdiction; if the jurisdiction is too difficult to escape, then the implicit contract is violated. Perhaps most importantly, it would be very beneficial to have lots of different communities trying lots of different things. That’s how we could really advance the social sciences. We need the data, but we don’t want to test anything too widespread because of the risk profile involved. Single-community testing is perfect: if something works in one place, others will likely try it too; if it fails miserably, that’s unfortunate, but at least others will avoid it like a hot stove. The risk profile for localism makes a lot more sense for empirically testing, implementing, and improving social policies in an iterative manner.

Speaking of empiricism, I also think that the lack of empiricism in politics is one reason why the U.S. and western civilizations appear to be having a political mental health crisis. Being passionate about abstractions reported in the news about far-off places is not good for mental health. People in California should be a lot more worried about the homeless guys shooting fentanyl in camps on the streets than about what is happening in Ukraine or Gaza. We can’t even know whether the information about that stuff is accurate; it could be almost 100% BS. Being spoon-fed your worldview by provably dishonest media organizations that are probably at least partially controlled by various intel agencies and special interests, both foreign and domestic, isn’t conducive to a stable, healthy worldview. Furthermore, when you are trying to politically control the entire nation, the stakes become too high, and we get the kind of intense political hatred we see now.

That’s why I want people to stop focusing on and trying to control what happens in Ukraine or Gaza (which is absurd!), or even across the nation in other states, and start worrying about controlling the literal streets they live on instead. We’re experiencing a megalomania crisis where everyone thinks that modern tech coupled with sham democracy allows them to control not only the entire country but the entire world! Control your own neighborhood, people! Then you can start worrying about the neighboring communities. Don’t even try to control the world. You can’t, and you shouldn’t anyway; it would be unethical even if you could. However, if the people can organize to be sovereign at the community level, the federal government will automatically get weaker and have fewer teeth. They can’t control every individual neighborhood. We do the feds a huge favor by not caring enough about our neighborhoods and focusing on national/international politics instead. It’s much easier to control a power vacuum caused by a confusopoly.

That said, I realize what a logistical nightmare so many districts with strong sovereignty might be, but we have AI and other software now. Coordinating communities to collaborate and trade is part of what Co-Co will be programmed to do, so I think we are set for solving logistics problems. I don’t have all the answers yet, but I know that people could figure out how to seamlessly integrate things with modern tech, and figuring out how to do so should create jobs anyway.

Replies from: Double
comment by Double · 2024-08-09T03:38:48.963Z · LW(p) · GW(p)

More bad news: 

"a section 501(c)(3) organization may not publish or distribute printed statements or make oral statements on behalf of, or in opposition to, a candidate for public office"

You'd probably want to be a 501(c)(4) or a Political Action Committee (PAC).

How would LOCALS find a politician to be in violation of their oath? 

That would be a powerful position to have. "Decentralization" is a property of a system, not a description of how a system would work.

Futarchy

I'd love to hear your criticisms of futarchy. That could make a good post.

Mobility

Political mobility is good, but there are limitations. People are sticky. Are you going to make your kid move schools and separate them from their friends because you don't like the city's private airplane policy? Probably not. 

Experimental Politics

I want more experimental politics so that we can find out which policies actually work! Unfortunately, that's an unpopular opinion. People don't like being in experiments, even when the alternative is they suffer in ignorance.

End

I feel that you are exhausting my ability to help you refine your ideas. Edit these comments into a post (with proper headings and formatting and a clear line of argument) and see what kinds of responses you get! I'd be especially interested in what lawyers and campaigners think of your ideas.

Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-10T14:13:54.492Z · LW(p) · GW(p)

I’m not sure whether certifying a candidate as a leader and optionally binding them to an oath by holding collateral would count as an endorsement, but you never know with legal issues. It is definitely something to look into, so thanks for that information. It would be better for LOCALS to qualify as a tax-exempt organization and charity that accepts donations. However, I am not assuming this is legally possible; I would need to find legal expertise to figure out whether it is.

Regarding experimental politics being unpopular, I agree that it would be unpopular if I framed it as an experiment. Framing is very important. The better way to frame strong local self-determination is that it gives communities the freedom to make their own rules as they see fit, with less interference from external actors who have no skin in the game locally; the fact that it also provides opportunities to get more data on the effectiveness of social policies is a coincidental side benefit of doing the right thing in the first place.

I haven’t done or found any studies on whether kids having to make new friends is a common sticking point for mobility, but in my experience, it isn’t. My parents moved a couple of times for jobs they didn’t particularly need (they already had good jobs) with little to no concern for that, and I had lots of friends as a child whose families moved away for trivial reasons. I am not assuming my experience is representative of the mean, but I wouldn’t assume it isn’t either.

I agree I should make an official post. I will when I am less busy. Thank you for the help.

Replies from: Double
comment by Double · 2024-08-11T03:01:55.532Z · LW(p) · GW(p)

I just skimmed this, but it seems like a bunch of studies have found that moving causes harm to children. https://achieveconcierge.com/how-does-frequently-moving-affect-children/

I’m expecting Co-co and LOCALS to fail (nothing against you; these kinds of clever ideas usually fail), and have identified the following possible reasons:

  • You don’t follow through on your idea.
  • People get mad at you for trying to meddle with the ‘democratic’ system we have and don’t hear you out as you try to explain “no, this is better democracy.” In particular, the monetization system you described would get justified backlash for being pay-for-representation.
  • You never reach the critical mass needed to make the system useful.
  • Some political group had previously tried something similar and therefore it got banned by the big parties.
  • You can’t stop Co-co and LOCALS from being partisan.
  • A competitor builds your thing, but entrenched and worse.
Replies from: benjamin-kost
comment by Benjamin Kost (benjamin-kost) · 2024-08-11T04:46:14.219Z · LW(p) · GW(p)

That’s actually good feedback. It’s better to think of the barriers to success ahead of time while I am still in the development phase. I agree that convincing people to do anything is always the hardest part. I did consider that it would be difficult to stop a competitor who is better funded and better connected from just taking my ideas and creating a less benevolent product with them, and that is a concern I have no answer for.

I don’t think $10 a month to subscribe to a local official in exchange for extra influence is a big deal, because $10 isn’t a lot of money, but I can see how other people might ignore the scale and think it’s a big deal. I’m not married to the idea, though. The main reason I wanted to include that feature is to thwart the control of special interests. I’ve considered that special interests are inevitable to some degree, so if we could decentralize them and make the same influence available to the general public at a nominal cost, that would be an improvement. The other reason I liked the idea is that I don’t think weighting every vote identically creates the smartest system. If someone is willing to participate, pay attention, and pay a small amount of money, that should work like a filter that weeds out apathy, and reducing apathy within the voting system should increase the quality of the decision-making process rather than decrease it. I agree it would be a hard sell to the public, though, because it sounds bad when described in the abstract as “paying for representation,” without proper detail and context. That said, we already have a system like that, except you need a lot more than $10 to buy representation, so what the idea actually does in theory is democratize the system we already have.

As far as following through, I plan to try my best even if the effort fails, because I will feel better having tried my best and failed than having never tried at all while things spiral down the drain.

Regarding being non-partisan, I have decided the only way to do that is to be explicitly apolitical other than supporting democracy. I could put that right in the charter for both organizations and create incentives for keeping to it and disincentives for abandoning it. If both organizations can’t take sides on any issues, then I don’t see how they can be partisan. Personally, I don’t have strong feelings either way on most issues, other than that I don’t want an expansive, homogeneous government so large that it is very difficult to escape from. We only have such a government because of the advantages of centralized military power, which is rife with abuse.

Regarding moving being bad for children, just a quick skim shows me that those studies aren’t necessarily telling you what you think they are. For instance, one portion cites three studies to show that “high rates of residential mobility have been associated with social disadvantage including poverty [1, 2, 4],” yet the studies I skimmed didn’t appear to control for those variables. Even for children in those conditions, moving might actually be beneficial; I would assume it depends on what alternative we are comparing it to. In many cases, moving may be less harmful than staying, such as when a family moves from a bad neighborhood with bad schools to a good neighborhood with good schools. I think the same thing applies to complaints about democracy not protecting minorities well enough, which was the trigger for this conversation. Compared to what? I am open to suggestions. Which system of governance protects minorities better than democracy? If the answer is none, then that is an argument for democracy, not against it.

Ultimately, I probably should have waited to post about this here until I had a very detailed outline to put everything in context with all of the supporting arguments and proper citations. Either way, even if not a single person here likes the ideas, I would still write the book and attempt to carry out the plan, but I would use the criticisms to modify it. Like I’ve said before, I love it when people shoot holes in my arguments. I don’t want to cling to bad arguments or bad ideas, and I value both positive and negative feedback as long as it is honest.

comment by Sammy Martin (SDM) · 2020-08-15T16:38:27.193Z · LW(p) · GW(p)

Covid19Projections has been one of the most successful coronavirus models in large part because it is as 'model-free' and simple as possible, using ML to backtrack parameters for a simple SEIR model from death data only. This has proved useful because case numbers are skewed by varying numbers of tests, so deaths are more consistently reliable as a metric. You can see the code here.

However, in countries doing a lot of testing, with a reasonable number of cases but with very few deaths, like most of Europe, the model is not that informative, and essentially predicts near 0 deaths out to the limit of its measure. This is expected - the model is optimised for the US.

Estimating SEIR parameters based on deaths works well when you have a lot of deaths to count; if you don't, then you need another method. Estimating purely based on cases has its own pitfalls - see this from epidemic forecasting, which mistook an increase in testing in the UK in mid-July for a sharp jump in cases and wrongly inferred a brief jump in R_t. As far as I understand their paper, the estimate of R_t from case data adjusts for delays from infection to onset and for other things, but not for the positivity rate or how good overall testing is.

This isn't surprising - there is no simple model that combines the test positivity rate and the number of cases to estimate the actual current number of infections. But perhaps you could use a Covid19pro-like method to learn such a mapping.

Very oversimplified, Covid19pro works like this:

Our COVID-19 prediction model adds the power of artificial intelligence on top of a classic infectious disease model. We developed a simulator based on the SEIR model (Wikipedia) to simulate the COVID-19 epidemic in each region. The parameters/inputs of this simulator are then learned using machine learning techniques that attempts to minimize the error between the projected outputs and the actual results. We utilize daily deaths data reported by each region to forecast future reported deaths. After some additional validation techniques (to minimize a phenomenon called overfitting), we use the learned parameters to simulate the future and make projections.

The functions f and g estimate the SEIR (susceptible, exposed, infectious, recovered) parameters from the deaths reported up to some time t_0, and the future deaths based on those parameters, respectively. Both functions are then optimised to minimise the error when the actual number of deaths at t_1 is fed into the model.
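As a rough formalisation of that structure (my own notation, not taken from the covid19pro code; d_{a:b} denotes the reported daily-death series between times a and b):

```latex
\theta_{\mathrm{SEIR}} = f\left(d_{0:t_0}\right), \qquad
\hat{d}_{t_0:t_1} = g\left(\theta_{\mathrm{SEIR}}\right), \qquad
f,\ g \ \text{fit by minimising}\ \mathcal{L}\left(\hat{d}_{t_0:t_1},\ d_{t_0:t_1}\right)
```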

This oversimplification is deliberate:

Deaths data only: Our model only uses daily deaths data as reported by Johns Hopkins University. Unlike other models, we do not use additional data sources such as cases, testing, mobility, temperature, age distribution, air traffic, etc. While supplementary data sources may be helpful, they can also introduce additional noise and complexity which can notably skew results.

What I suggest is a slight increase in complexity, where we use a similar model except we feed it paired test positivity rate and case data instead of death data. The positivity rate / tests per case serves as a 'quality estimate' that tells you how good the test data is; that's how tests per case is treated by Our World in Data. We all know intuitively that if the positivity rate is going down but cases are going up, the increase might not be real, but if the positivity rate is going up and cases are going up, the increase definitely is real.

What I'm suggesting is that we combine the two, along the lines of the sketch below:
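Here is a minimal sketch of what that combined model could look like (entirely illustrative: the function names, the linear mapping from testing data to a transmission rate, and the toy data are my own assumptions, not covid19pro's implementation):

```python
# Sketch: learn SEIR-ish parameters from (cases, tests-per-case) pairs,
# but keep the loss on reported deaths, as in covid19pro.
import numpy as np
from scipy.optimize import minimize

def seir_deaths(beta_series, gamma=1/5, sigma=1/4, ifr=0.007, n=1_000_000, i0=100):
    """Run a crude discrete-time SEIR model and return simulated daily deaths."""
    s, e, i, r = n - i0, 0.0, float(i0), 0.0
    deaths = []
    for beta in beta_series:
        new_e = beta * s * i / n
        new_i = sigma * e
        new_r = gamma * i
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        deaths.append(ifr * new_r)  # crude: a fixed fraction of resolved cases die
    return np.array(deaths)

def beta_from_testing(params, cases, tests_per_case):
    """The learned part: map (cases, tests/case) to a time-varying transmission rate."""
    w0, w1, w2 = params
    # tests/case acts as a quality weight on the case signal, per the intuition above
    signal = np.log1p(cases) * np.clip(tests_per_case / 100.0, 0.0, 1.0)
    return np.clip(w0 + w1 * signal + w2 * np.gradient(signal), 0.05, 1.5)

def loss(params, cases, tests_per_case, reported_deaths):
    projected = seir_deaths(beta_from_testing(params, cases, tests_per_case))
    return np.mean((projected - reported_deaths) ** 2)  # same target as covid19pro: deaths

# Toy data standing in for real case/testing/death reports.
rng = np.random.default_rng(0)
days = 120
cases = np.abs(rng.normal(500, 100, days))
tests_per_case = np.linspace(80, 20, days)
reported_deaths = np.abs(rng.normal(5, 2, days))

fit = minimize(loss, x0=[0.3, 0.01, 0.01],
               args=(cases, tests_per_case, reported_deaths), method="Nelder-Mead")
print("fitted weights:", fit.x)
```

The only point the sketch is meant to carry is that the learned component consumes (cases, tests-per-case) pairs while the loss is still computed against reported deaths.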

Now, you need to have reliable data on the number of people tested each week, but most of Europe has that. If you can learn a model that gives you a more accurate estimate of the SEIR parameters from combined cases and tests/case data, then it should be better at predicting future infections. It won't necessarily predict future cases, since the number of future cases is also going to depend on the number of tests conducted, which is subject to all sorts of random fluctuations that we don't care about when modelling disease transmission, so instead you could use the same loss function as the original covid19pro - minimizing the difference between projected and actual deaths.

Hopefully the intuition that you can learn more from the pair (tests/case, number of cases) than from the number of cases or the number of deaths alone will be borne out, and a c19pro-like model could be trained to make high-quality predictions in places with few deaths using such paired data. You would still need some deaths for the loss function and for fitting the model.

comment by cod3d · 2020-08-23T22:45:57.833Z · LW(p) · GW(p)

Greetings all, and thanks for having me! :) I'm an AI enthusiast based in Hamilton, NZ, where until recently I was enrolled in and studying strategic management and computer science, specifically 'AI technical strategy'. After coronavirus and everything that's been happening in the world, I've moved away from formal studies and am now focusing on using my skills in a more interactive and 'messy' way, which means more time online with groups like LessWrong. :) I've been interested in rationality and the art of dialogue since the early 2000s. I've been involved in startups and AI projects from a commercial perspective for a while, specifically in the agri-tech space. I would like to understand and grow my appreciation for forums like this, where the technology essentially enables better and more productive human interaction.

Replies from: daniel-kokotajlo
comment by Anirandis · 2020-08-06T14:18:18.903Z · LW(p) · GW(p)

Is it plausible that an AGI could have some sort of vulnerability (a buffer overflow, maybe?) that could be exploited (maybe by an optimization daemon…?) and cause a sign flip in the utility function?

How about an error during self-improvement that leads to the same sort of outcome? Should we expect an AGI to sanity-check its successors, even if it’s only at or below human intelligence?

Sorry for the dumb questions, I’m just still nervous about this sort of thing.

Replies from: MakoYass, vanessa-kosoy, Dach, ChristianKl
comment by mako yass (MakoYass) · 2020-08-15T10:52:50.916Z · LW(p) · GW(p)

It freaks me out that we have Loss Functions and also Utility Functions and their type signature is exactly the same, but if you put one in a place where the other was expected, it causes literally the worst possible thing to happen that ever could happen. I am not comfortable with this at all.
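A toy illustration of the hazard (illustrative code, not any real training framework): the two objects are indistinguishable by type, and only the optimisation direction separates "best possible" from "worst possible".

```python
from typing import Callable

Outcome = dict
Objective = Callable[[Outcome], float]  # the shared "type signature"

utility: Objective = lambda o: o["human_flourishing"]   # meant to be maximised
loss: Objective = lambda o: o["prediction_error"]       # meant to be minimised

def optimise(objective: Objective, candidates, minimise: bool):
    """A trainer that has no way to tell which kind of objective it was handed."""
    return (min if minimise else max)(candidates, key=objective)

worlds = [{"human_flourishing": 1.0, "prediction_error": 0.2},
          {"human_flourishing": -1.0, "prediction_error": 0.9}]

best = optimise(utility, worlds, minimise=False)   # intended use
worst = optimise(utility, worlds, minimise=True)   # utility passed where a loss was expected
```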

Replies from: gwern, Anirandis
comment by gwern · 2020-08-15T20:57:32.755Z · LW(p) · GW(p)

It is definitely awkward when that happens. Reward functions are hard.

Replies from: Anirandis
comment by Anirandis · 2020-08-15T22:52:10.208Z · LW(p) · GW(p)

Do you think that this type of thing could plausibly occur *after* training and deployment?

Replies from: gwern
comment by gwern · 2020-08-15T23:36:44.070Z · LW(p) · GW(p)

Yes. For example: lots of applications use online learning. A programmer flips the meaning of a boolean flag in a database somewhere while not updating all downstream callers, and suddenly an online learner is now actively pessimizing their target metric.
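A sketch of how that can look in practice (hypothetical field names and update rule, not any particular system): the flag's meaning is inverted at the storage layer, the learner's update code is never told, and every step now pushes the model toward outcomes users dislike.

```python
# Online learner that updates from a feedback flag stored in a database row.
# Original convention: flag == 1 means "user engaged". After a migration the flag
# is repurposed to mean "user dismissed", but this downstream caller is not updated.

weights = [0.0, 0.0, 0.0]

def reward_from_row(row: dict) -> float:
    return 1.0 if row["flag"] == 1 else -1.0  # silently inverted after the migration

def online_update(features: list[float], row: dict, lr: float = 0.05) -> None:
    r = reward_from_row(row)
    for j, x in enumerate(features):
        weights[j] += lr * r * x  # gradient ascent on the (now inverted) reward

# Items the user dismisses now *raise* the scores of similar items.
online_update([1.0, 0.3, 0.0], {"flag": 1})
```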

Replies from: Anirandis, Anirandis
comment by Anirandis · 2020-08-22T18:25:08.378Z · LW(p) · GW(p)

Do you think that this specific risk could be mitigated by some variant of Eliezer’s separation from hyperexistential risk or Stuart Armstrong's idea here:

Let B1 and B2 be excellent, bestest outcomes. Define U(B1) = 1, U(B2) = -1, and U = 0 otherwise. Then, under certain assumptions about what probabilistic combinations of worlds it is possible to create, maximising or minimising U leads to good outcomes.
Or, more usefully, let X be some trivial feature that the agent can easily set to -1 or 1, and let U be a utility function with values in [0, 1]. Have the AI maximise or minimise XU. Then the AI will always aim for the same best world, just with a different X value.

Or at least prevent sign flip errors from causing something worse than paperclipping?

comment by Anirandis · 2020-08-16T00:30:45.598Z · LW(p) · GW(p)

Interesting. Terrifying, but interesting.

Forgive me for my stupidity (I'm not exactly an expert in machine learning), but it seems to me that building an AGI linked to some sort of database like that in such a fashion (that some random guy's screw-up can effectively reverse the utility function completely) is a REALLY stupid idea. Would there not be a safer way of doing things?

comment by Anirandis · 2020-08-19T00:51:36.380Z · LW(p) · GW(p)

If we actually built an AGI that optimised to maximise a loss function, wouldn't we notice long before deploying the thing?


I'd imagine that this type of thing would be sanity-checked and tested intensively, so signflip-type errors would predominantly be scenarios where the error occurs *after* deployment, like the one Gwern mentioned ("A programmer flips the meaning of a boolean flag in a database somewhere while not updating all downstream callers, and suddenly an online learner is now actively pessimizing their target metric.")

Replies from: gwern, MakoYass
comment by gwern · 2020-08-19T01:13:30.380Z · LW(p) · GW(p)

Even if you disclaim configuration errors or updates (despite this accounting for most of a system's operating lifespan, and human/configuration errors accounting for a large fraction of all major errors at cloud providers etc according to postmortems), an error may still happen too fast to notice. Recall that in the preference learning case, the bug manifested after Christiano et al went to sleep, and they woke up to the maximally-NSFW AI. AlphaZero trained in ~2 hours wallclock, IIRC. Someone working on an even larger cluster commits a change and takes a quick bathroom break...

Replies from: Anirandis
comment by Anirandis · 2020-08-19T01:38:56.861Z · LW(p) · GW(p)

Wouldn't any configuration errors or updates be caught with sanity-checking tools though? Maybe the way I'm visualising this is just too simplistic, but any developers capable of creating an *aligned* AGI are going to be *extremely* careful not to fuck up. Sure, it's possible, but the most plausible cause of a hyperexistential catastrophe to me seems to be where a SignFlip-type error occurs once the system has been deployed.


Hopefully a system as crucially important as an AGI isn't going to have just one guy watching it who "takes a quick bathroom break". When the difference is literally Heaven and Hell (minimising human values), I'd consider only having one guy in a basement monitoring it to be gross negligence.

Replies from: gwern
comment by gwern · 2020-08-19T01:54:06.529Z · LW(p) · GW(p)

Many entities have sanity-checking tools. They fail. Many have careful developers. They fail. Many have automated tests. They fail. And so on. Disasters happen because all of those will fail to work every time and therefore all will fail some time. If any of that sounds improbable, as if there would have to be a veritable malevolent demon arranging to make every single safeguard fail or backfire (literally, sometimes, like the recent warehouse explosion - triggered by welders trying to safeguard it!), you should probably read more about complex systems and their failures to understand how normal it all is.

Replies from: Anirandis
comment by Anirandis · 2020-08-19T02:18:07.571Z · LW(p) · GW(p)

Sure, but the *specific* type of error I'm imagining would surely be easier to pick up than most other errors. I have no idea what sort of sanity checking was done with GPT-2, but the fact that the developers were asleep when it trained is telling: they weren't being as careful as they could've been.

For this type of bug (a sign error in the utility function) to occur *before* the system is deployed and somehow persist, it'd have to make it past all sanity-checking tools (which I imagine would be used extensively with an AGI) *and* somehow not be noticed at all while the model trains *and* whatever else. Yes, these sorts of conjunctions occur in the real world, but the error is generally more subtle than "system does the complete opposite of what it was meant to do".

I made a question post about this specific type of bug occurring before deployment a while ago and think my views have shifted significantly; it's unlikely that a bug as obvious as one that flips the sign of the utility function won't be noticed before deployment. Now I'm more worried about something like this happening *after* the system has been deployed.

I think a more robust solution to all of these sort of errors would be something like the separation from hyperexistential risk article that I linked in my previous response. I optimistically hope that we're able to come up with a utility function that doesn't do anything worse than death when minimised, just in case.

Replies from: habryka4, ChristianKl
comment by habryka (habryka4) · 2020-08-19T02:42:06.492Z · LW(p) · GW(p)

At least with current technologies, I expect serious risks to start occurring during training, not deployment. That's ultimately when you will see the greatest learning happening, when you have the greatest access to compute, and when you will first cross the threshold of intelligence that makes the system actually dangerous. So I don't think that just checking things after they are trained is safe.

Replies from: Anirandis
comment by Anirandis · 2020-08-19T02:53:17.257Z · LW(p) · GW(p)

I'm under the impression that an AGI would be monitored *during* training as well. So you'd effectively need the system to turn "evil" (utility function flipped) during the training process, and the system to be smart enough to conceal that the error occurred. So it'd need to happen a fair bit into the training process. I guess that's possible, but IDK how likely it'd be.

Replies from: habryka4
comment by habryka (habryka4) · 2020-08-19T03:24:06.252Z · LW(p) · GW(p)

Yeah, I do think it's likely that AGI would be monitored during training, but the specific instance of OpenAI staff being asleep while the AI trained is a clear instance of us not monitoring the AI during the most crucial periods (which, to be clear, I think is fine since I think the risks were indeed quite low, and I don't see this as providing much evidence about OpenAI's future practices).

comment by ChristianKl · 2020-08-23T22:30:37.307Z · LW(p) · GW(p)

Given that compute is very expensive, economic pressures will push training to be 24/7, so it's unlikely that people generally pause the training when going to sleep.

Replies from: Anirandis
comment by Anirandis · 2020-08-24T00:17:50.304Z · LW(p) · GW(p)

Sure, but I'd expect that a system as important as this would have people monitoring it 24/7.

comment by mako yass (MakoYass) · 2020-08-21T04:04:00.503Z · LW(p) · GW(p)

Maybe the project will come up with some mechanism that detects that. But if they fall back to the naive "just watch what it does in the test environment and assume it'll do the same in production," then there is a risk it's going to figure out it's in a test environment, and that its judges would not react well to finding out what is wrong with its utility function, and then it will act aligned in the testing environment.

If we ever see a news headline saying "Good News, AGI seems to 'self-align' regardless of the sign of the utility function!" that will be some very bad news.

Replies from: Anirandis
comment by Anirandis · 2020-08-21T15:20:31.598Z · LW(p) · GW(p)

I asked Rohin Shah about that possibility in a question thread about a month ago. I think he's probably right that this type of thing would only plausibly make it through the training process if the system's *already* smart enough to be able to think about this type of thing. And then on top of that there are still things like sanity checks which, while unlikely to pick up numerous errors, would probably notice a sign error. See also this comment:

Furthermore, if an AGI design has an actually-serious flaw, the likeliest consequence that I expect is not catastrophe; it’s just that the system doesn’t work. Another likely consequence is that the system is misaligned, but in an obvious ways that makes it easy for developers to recognize that deployment is a very bad idea.

IMO it's incredibly important that we find a way to prevent this type of thing from occurring *after* the system has been trained, whether that be hyperexistential separation or something else. I think that a team that's safety-conscious enough to come up with a (reasonably) aligned AGI design is going to put a considerable amount of effort into fixing bugs, and one as obvious as a sign error would be unlikely to make it through. Even better, hopefully they would have come up with a utility function that can't be easily reversed by a single bit flip, or that doesn't cause outcomes worse than death when minimised. That'd (hopefully?) solve the SignFlip issue *regardless* of what causes it.

comment by Vanessa Kosoy (vanessa-kosoy) · 2020-09-01T14:09:28.539Z · LW(p) · GW(p)

There is a discussion of this kind of issue on Arbital.

Replies from: Anirandis
comment by Anirandis · 2020-09-01T14:21:36.835Z · LW(p) · GW(p)

I've seen that post & discussed it on my shortform [LW(p) · GW(p)]. I'm not really sure how effective something like Eliezer's idea of "surrogate" goals there would actually be - sure, it'd help with some sign flip errors but it seems like it'd fail on others (e.g. if U = V + W, a sign error could occur in V instead of U, in which case that idea might not work.) I'm also unsure as to whether the probability is truly "very tiny" as Eliezer describes it. Human errors seem much more worrying than cosmic rays.

comment by Dach · 2020-08-30T13:58:46.671Z · LW(p) · GW(p)

If you're having significant anxiety from imagining some horrific I-have-no-mouth-and-I-must-scream scenario, I recommend that you multiply that dread by a very, very small number, so as to incorporate the low probability of such a scenario. You're privileging this supposedly very low probability specific outcome over the rather horrifically wide selection of ways AGI could be a cosmic disaster.

This is, of course, not intended to dissuade you from pursuing solutions to such a disaster.

Replies from: Anirandis
comment by Anirandis · 2020-08-30T16:05:45.642Z · LW(p) · GW(p)

I don't really know what the probability is. It seems somewhat low, but I'm not confident that it's *that* low. I wrote a shortform [LW · GW] about it last night (tl;dr it seems like this type of error could occur in a disjunction of ways and we need a good way of separating the AI in design space.)


I think I'd stop worrying about it if I were convinced that its probability is extremely low. But I'm not yet convinced of that. Something like the example Gwern provided elsewhere in this thread [LW · GW] seems more worrying than the more frequently discussed cosmic ray scenarios to me.

Replies from: Dach
comment by Dach · 2020-09-02T08:25:35.062Z · LW(p) · GW(p)

You can't really be accidentally slightly wrong. We're not going to develop Mostly Friendly AI, which is Friendly AI but with the slight caveat that it has a slightly higher value on the welfare of shrimp than desired, with no other negative consequences. The molecular sorts of precision needed to get anywhere near the zone of loosely trying to maximize or minimize for anything resembling human values will probably only follow from a method that is converging towards the exact spot we want it to be at, such as some clever flawless version of reward modelling.

In the same way, we're probably not going to accidentally land in hyperexistential disaster territory. We could have some sign flipped, our checksum changed, and all our other error-correcting methods (Any future seed AI should at least be using ECC memory, drives in RAID, etc.) defeated by religious terrorists, cosmic rays, unscrupulous programmers, quantum fluctuations, etc. However, the vast majority of these mistakes would probably buff out or result in paper-clipping. If an FAI has slightly too high of a value assigned to the welfare of shrimp, it will realize this in the process of reward modelling and correct the issue. If its operation does not involve the continual adaptation of the model that is supposed to represent human values, it's not using a method which has any chance of converging to Overwhelming Victory or even adjacent spaces for any reason other than sheer coincidence.

A method such as this has, barring stuff which I need to think more about (stability under self-modification), no chance of ending up in a "We perfectly recreated human values... But placed an unreasonably high value on eating bread! Now all the humans will be force-fed bread until the stars burn out! Mwhahahahaha!" sorts of scenarios. If the system cares about humans being alive enough to not reconfigure their matter into something else, we're probably using a method which is innately insulated from most types of hyperexistential risk.

It's not clear that Gwern's example, or even that category of problem, is particularly relevant to this situation. Most parallels to modern-day software systems and the errors they are prone to are probably best viewed as sobering reminders, not specific advice. Indeed, I suspect his comment was merely a sobering reminder and not actual advice. If humans are making changes to the critical software/hardware of an AGI (And we'll assume you figured out how to let the AGI allow you to do this in a way that has no negative side effects), while that AGI is already running, something bizarre and beyond my abilities of prediction is already happening. If you need to make changes after you turn your AGI on, you've already lost. If you don't need to make changes and you're making changes, you're putting humanity in unnecessary risk. At this point, if we've figured out how to assist the seed AI in self-modification, at least until the point at which it can figure out how to do stable self-modification for itself, the problem is already solved. There's more to be said here, but I'll refrain for the purpose of brevity.

Essentially, we can not make any ordinary mistake. The type of mistake we would need to make in order to land up in hyperexistential disaster territory would, most likely, be an actual, literal sign flip scenario, and such scenarios seem much easier to address. There will probably only be a handful of weak points for this problem, and those weak points are all already things we'd pay extra super special attention to and will engineer in ways which make it extra super special sure nothing goes wrong. Our method will, ideally, be terrorist proof. It will not be possible to flip the sign of the utility function or the direction of the updates to the reward model, even if several of the researchers on the project are actively trying to sabotage the effort and cause a hyperexistential disaster.

I conjecture that most of the expected utility gained from combating the possibility of a hyperexistential disaster lies in the disproportionate positive effects on human sanity and the resulting improvements to the efforts to avoid regular existential disasters, and other such side-benefits.

None of this is intended to dissuade you from investigating this topic further. I'm merely arguing that a hyperexistential disaster is not remotely likely- not that it is not a concern. The fact that people will be concerned about this possibility is an important part of why the outcome is unlikely.

Replies from: Anirandis
comment by Anirandis · 2020-09-02T15:53:13.140Z · LW(p) · GW(p)

Thanks for the detailed response. A bit of nitpicking (from someone who doesn't really know what they're talking about):

However, the vast majority of these mistakes would probably buff out or result in paper-clipping.

I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be *no* human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there.

If humans are making changes to the critical software/hardware of an AGI (And we'll assume you figured out how to let the AGI allow you to do this in a way that has no negative side effects), *while that AGI is already running*, something bizarre and beyond my abilities of prediction is already happening.

In the example, the AGI was using online machine learning, which, as I understand it, would probably require the system to be hooked up to a database that humans have access to in order for it to learn properly. And I'm unsure as to how easy it'd be for things like checksums to pick up an issue like this (a boolean flag getting flipped) in a database.

Perhaps there'll be a reward function/model intentionally designed to disvalue some arbitrary "surrogate" thing in an attempt to separate it from hyperexistential risk. So "pessimizing the target metric" would look more like paperclipping than torture. But I'm unsure as to (1) whether the AGI's developers would actually bother to implement it, and (2) whether it'd actually work in this sort of scenario.

Also worth noting is that an AGI based on reward modelling is going to have to be linked to another neural network, which is going to have constant input from humans. If that reward model isn't designed to be separated in design space from AM, someone could screw up the model somehow. If we were to, say, have U = V + W (where V is the reward given by the reward model and W is some arbitrary thing that the AGI disvalues, as is the case in Eliezer's Arbital post that I linked), a sign flip-type error in V (rather than a sign flip in U) would lead to a hyperexistential catastrophe.
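To spell the worry out with explicit signs (my own sign convention, which may not match Eliezer's post): suppose the surrogate term is a large penalty on some cheap, otherwise-irrelevant outcome,

```latex
U = V + W, \qquad W = -c \cdot (\text{amount of surrogate produced}), \qquad c \gg 1 .
```

Flipping U as a whole gives -V - W, which makes the surrogate hugely rewarding, so the failure looks like surrogate-paperclipping rather than torture; that is the intended protection. But flipping only V gives -V + W: the agent still avoids the surrogate while actively minimising the reward model's V, which is exactly the hyperexistential case the construction was meant to exclude.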

It will not be possible to flip the sign of the utility function or the direction of the updates to the reward model, even if several of the researchers on the project are actively trying to sabotage the effort and cause a hyperexistential disaster.

I think this is somewhat likely to be the case, but I'm not sure that I'm confident enough about it. Flipping the direction of updates to the reward model seems harder to prevent than a bit flip in a utility function, which could be prevented through error-correcting code memory (as you mentioned earlier).


Despite my confusions, your response has definitely decreased my credence in this sort of thing from happening.

Replies from: Dach
comment by Dach · 2020-09-02T20:25:32.683Z · LW(p) · GW(p)

I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be no human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there.

It's hard to talk in specifics because my knowledge on the details of what future AGI architecture might look like is, of course, extremely limited.

As an almost entirely inapplicable analogy (which nonetheless still conveys my thinking here): consider the sorting algorithm for the comments on this post. If we flipped the "top-scoring" sorting algorithm to sort in the wrong direction, we would see the worst-rated posts on top, which would correspond to a hyperexistential disaster. However, if we instead flipped the effect that an upvote had on the score of a comment to negative values, it would sort comments which had no votes other than the default vote assigned on posting the comment to the top. This corresponds to paperclipping- it's not minimizing the intended function, it's just doing something weird.
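A concrete version of that analogy (toy code, not how any real site sorts comments): flipping the sort direction surfaces the worst comments, while flipping what an upvote does just produces something weird.

```python
comments = [
    {"text": "insightful", "score": 42},
    {"text": "fine",       "score": 3},
    {"text": "awful",      "score": -17},
]

# Mistake 1: invert the sort direction; the worst-rated comment ends up on top.
# (Analogue of minimising the intended utility function.)
worst_on_top = sorted(comments, key=lambda c: c["score"])  # should be reverse=True

# Mistake 2: invert the effect of an upvote; votes now push comments down, so
# unvoted comments float to the top. (Analogue of paperclipping: not the goal,
# not its opposite, just something weird.)
def upvote(comment):
    comment["score"] -= 1  # should be += 1
```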

If we inverted the utility function, this would (unless we take specific measures to combat it like you're mentioning) lead to hyperexistential disaster. However, if we invert some constant which is meant to initially provide value for exploring new strategies while the AI is not yet intelligent enough to properly explore new strategies as an instrumental goal, the AI would effectively brick itself. It would place negative value on exploring new strategies, presumably including strategies which involve fixing this issue so it can acquire more utility and strategies which involve preventing the humans from turning it off. If we had some code which is intended to make the AI not turn off the evolution of the reward model before the AI values not turning off the reward model for other reasons (e.g. the reward model begins to properly model how humans don't want the AI to turn the reward model evolution process off), and some crucial sign was flipped which made it do the opposite, the AI would freeze the process of the reward model being updated and then maximize whatever inane nonsense its model currently represented, and it would eventually run into some bizarre previously unconsidered and thus not appropriately penalized strategy comparable to tiling the universe with smiley faces, i.e. paperclipping.

These are really crude examples, but I think the argument is still valid. Also, this argument doesn't address the core concern of "What about the things which DO result in hyperexistential disaster"; it just establishes that much of the class of mistake you may have previously thought usually or always resulted in hyperexistential disaster (sign flips on critical software points) in fact usually causes paperclipping or the AI bricking itself.

If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be no human utility (i.e. paperclips).

Can you clarify what you mean by this? Also, I get what you're going for, but paperclips is still extremely negative utility because it involves the destruction of humanity and the reconfiguration of the universe into garbage.

Perhaps there'll be a reward function/model intentionally designed to disvalue some arbitrary "surrogate" thing in an attempt to separate it from hyperexistential risk. So "pessimizing the target metric" would look more like paperclipping than torture. But I'm unsure as to (1) whether the AGI's developers would actually bother to implement it, and (2) whether it'd actually work in this sort of scenario.

I sure hope that future AGI developers can be bothered to embrace safe design!

Also worth noting is that an AGI based on reward modelling is going to have to be linked to another neural network, which is going to have constant input from humans. If that reward model isn't designed to be separated in design space from AM, someone could screw up with the model somehow.

The reward modelling system would need to be very carefully engineered, definitely.

If we were to, say, have U = V + W (where V is the reward given by the reward model and W is some arbitrary thing that the AGI disvalues, as is the case in Eliezer's Arbital post that I linked,) a sign flip-type error in V (rather than a sign flip in U) would lead to a hyperexistential catastrophe.

I thought this as well when I read the post. I'm sure there's something clever you can do to avoid this but we also need to make sure that these sorts of critical components are not vulnerable to memory corruption. I may try to find a better strategy for this later, but for now I need to go do other things.

I think this is somewhat likely to be the case, but I'm not sure that I'm confident enough about it. Flipping the direction of updates to the reward model seems harder to prevent than a bit flip in a utility function, which could be prevented through error-correcting code memory (as you mentioned earlier).

Sorry, I meant to convey that this was a feature we're going to want to ensure that future AGI efforts display, not some feature which I have some other independent reason to believe would be displayed. It was an extension of the thought that "Our method will, ideally, be terrorist proof."

Replies from: Anirandis
comment by Anirandis · 2020-09-03T00:01:13.808Z · LW(p) · GW(p)
As an almost entirely inapplicable analogy . . . it's just doing something weird.
If we inverted the utility function . . . tiling the universe with smiley faces, i.e. paperclipping.

Interesting analogy. I can see what you're saying, and I guess it depends on what specifically gets flipped. I'm unsure about the second example; something like exploring new strategies doesn't seem like something an AGI would terminally value. It's instrumental to optimising the reward function/model, but I can't see it getting flipped *with* the reward function/model.

Can you clarify what you mean by this? Also, I get what you're going for, but paperclips is still extremely negative utility because it involves the destruction of humanity and the reconfiguration of the universe into garbage.

My thinking was that a signflipped AGI designed as a positive utilitarian (i.e. with a minimum at 0 human utility) would prefer paperclipping to torture because the former provides 0 human utility (as there aren't any humans), whereas the latter may produce a negligible amount. I'm not really sure if it makes sense tbh.

The reward modelling system would need to be very carefully engineered, definitely.

Even if we engineered it carefully, that doesn't rule out screw-ups. We need robust failsafe measures *just in case*, imo.

I thought of this as well when I read the post. I'm sure there's something clever you can do to avoid this but we also need to make sure that these sorts of critical components are not vulnerable to memory corruption. I may try to find a better strategy for this later, but for now I need to go do other things.

I wonder if you could feasibly make it a part of the reward model. Perhaps you could train the reward model itself to disvalue something arbitrary (like paperclips) even more than torture, which would hopefully mitigate it. You'd still need to balance it in a way such that the system won't spend all of its resources preventing this thing from happening at the neglect of actual human values, but that doesn't seem too difficult. Although, once again, we can't really have high confidence (>90%) that the AGI developers are going to think to implement something like this.

There was also an interesting idea I found in a Facebook post about this type of thing that got linked somewhere (can't remember where). Stuart Armstrong suggested that a utility function could be designed as follows:

Let B1 and B2 be excellent, bestest outcomes. Define U(B1)=1, U(B2)=-1, and U=0 otherwise. Then, under certain assumptions about what probabilistic combinations of worlds it is possible to create, maximising or minimising U leads to good outcomes. Or, more usefully, let X be some trivial feature that the agent can easily set to -1 or 1, and let U be a utility function with values in [0,1]. Have the AI maximisise or minimise XU. Then the AI will always aim for the same best world, just with a different X value.
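Spelling out why maximising and minimising coincide here (my reading of the quoted construction, not Armstrong's own derivation): with X in {-1, +1} trivially settable and U taking values in [0, 1],

```latex
\max_{X,\ \text{world}} X \cdot U = \max_{\text{world}} U \ (\text{set } X = +1), \qquad
\min_{X,\ \text{world}} X \cdot U = -\max_{\text{world}} U \ (\text{set } X = -1),
```

so either optimisation direction drives the agent toward the same U-maximising world; only the sign of the cheap feature X differs.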

Even if we solve any issues with these (and actually bother to implement them), there's still the risk of an error like this happening in a localised part of the reward function such that *only* the part specifying something bad gets flipped, although I'm a little confused about this one. It could very well be the case that the system's complex enough that there isn't just one bit indicating whether "pain" or "suffering" is good or bad. And we'd presumably (hopefully) have checksums and whatever else thrown in. Maybe this could be mitigated by assigning more positive utility to good outcomes than negative utility to bad outcomes? (I'm probably speaking out of my rear end on this one.)


Memory corruption seems to be another issue. Perhaps if we have more than one measure we'd be less vulnerable to memory corruption. Like, if we designed an AGI with a reward model that disvalues two arbitrary things rather than just one, and memory corruption screwed with *both* measures, then something probably just went *very* wrong in the AGI and it probably won't be able to optimise for suffering anyway.

Replies from: Dach
comment by Dach · 2020-09-03T11:04:02.022Z · LW(p) · GW(p)

Interesting analogy. I can see what you're saying, and I guess it depends on what specifically gets flipped. I'm unsure about the second example; something like exploring new strategies doesn't seem like something an AGI would terminally value. It's instrumental to optimising the reward function/model, but I can't see it getting flipped with the reward function/model.

Sorry, I meant instrumentally value. Typo. Modern machine learning systems often require a specific incentive in order to explore new strategies and escape local maximums. We may see this behavior in future attempts at AGI. And no, it would not be flipped with the reward function/model; I'm highlighting that there is a really large variety of sign flip mistakes and most of them probably result in paperclipping.

My thinking was that a signflipped AGI designed as a positive utilitarian (i.e. with a minimum at 0 human utility) would prefer paperclipping to torture because the former provides 0 human utility (as there aren't any humans), whereas the latter may produce a negligible amount. I'm not really sure if it makes sense tbh.

Paperclipping seems to be negative utility, not approximately 0 utility. It involves all the humans being killed and our beautiful universe being ruined. I guess if there are no humans, there's no utility in some sense, but human values don't actually seem to work that way. I rate universes where humans never existed at all and universes where humanity has been wiped out quite differently.

I'm... not sure what 0 utility would look like. It's within the range of experiences that people have on modern-day Earth, somewhere between my current experience and being tortured. This is just a definitional problem, though: we could shift the scale such that paperclipping is zero utility, but in that case, we could also just make an AGI that has a minimum at paperclipping levels of utility.

Even if we engineered it carefully, that doesn't rule out screw-ups. We need robust failsafe measures just in case, imo.

In the context of AI safety, I think "robust failsafe measures just in case" is part of "careful engineering". So, we agree!

You'd still need to balance it in a way such that the system won't spend all of its resources preventing this thing from happening at the neglect of actual human values, but that doesn't seem too difficult.

I read Eliezer's idea, and that strategy seems to be... dangerous. I think that "Giving an AGI a utility function which includes features which are not really relevant to human values" is something we want to avoid unless we absolutely need to.

I have much more to say on this topic and about the rest of your comment, but it's definitely too much for a comment chain. I'll make an actual post on this containing my thoughts sometime in the next week or two, and link it to you.

Replies from: Anirandis
comment by Anirandis · 2020-09-03T14:45:43.408Z · LW(p) · GW(p)
Paperclipping seems to be negative utility, not approximately 0 utility.

My thinking was that an AI system that *only* takes values between 0 and +∞ (or some arbitrary positive number) would identify that killing humans would result in 0 human value, which is its minimum utility.


I read Eliezer's idea, and that strategy seems to be... dangerous. I think that "Giving an AGI a utility function which includes features which are not really relevant to human values" is something we want to avoid unless we absolutely need to.

How come? It doesn't seem *too* hard to create an AI that only expends a small amount of its energy on preventing the garbage thing from happening.


I have much more to say on this topic and about the rest of your comment, but it's definitely too much for a comment chain. I'll make an actual post containing my thoughts sometime in the next week or two, and link it to you.

Please do! I'd love to see a longer discussion on this type of thing.


EDIT: just thought some more about this and want to clear something up:

Modern machine learning systems often require a specific incentive in order to explore new strategies and escape local maximums. We may see this behavior in future attempts at AGI. And no, it would not be flipped with the reward function/model; I'm highlighting that there is a really large variety of sign flip mistakes and most of them probably result in paperclipping.

I'm a little unsure on this one after further reflection. When this happened with GPT-2, the bug managed to flip the reward & the system still pursued instrumental goals like exploring new strategies:

Bugs can optimize for bad behavior
One of our code refactors introduced a bug which flipped the sign of the reward. Flipping the reward would usually produce incoherent text, but the same bug also flipped the sign of the KL penalty. The result was a model which optimized for negative sentiment while preserving natural language. Since our instructions told humans to give very low ratings to continuations with sexually explicit text, the model quickly learned to output only content of this form. This bug was remarkable since the result was not gibberish but maximally bad output. The authors were asleep during the training process, so the problem was noticed only once training had finished. A mechanism such as Toyota’s Andon cord could have prevented this, by allowing any labeler to stop a problematic training process.

So it definitely seems *plausible* for a reward to be flipped without resulting in the system failing/neglecting to adopt new strategies/doing something weird, etc.

Replies from: Dach
comment by Dach · 2020-09-04T10:01:15.647Z · LW(p) · GW(p)

So it definitely seems plausible for a reward to be flipped without resulting in the system failing/neglecting to adopt new strategies/doing something weird, etc.

I didn't mean to imply that a signflipped AGI would not instrumentally explore.

I'm saying that, well... modern machine learning systems often get specific bonus utility for exploring, because it's hard to explore the proper amount as an instrumental goal due to the difficulties of fully modelling the situation, and because systems which don't have this bonus will often get stuck in local maximums.

Humans exhibit this property too. We have investigating things, acquiring new information, and building useful strategic models as terminal goals; we are "curious".

This is a feature we might see in early stages of modern attempts at full AGI, for similar reasons to why modern machine learning systems and humans exhibit this same behavior.

Presumably such features would be built to uninstall themselves once the AGI reaches a level of intelligence sufficient to properly and fully explore new strategies as an instrumental goal of satisfying the human utility function, if we do go this route.

If we sign-flipped the amount of reward the AGI gets from such a feature, the AGI would be penalized for exploring new strategies - this may have any number of effects which are fairly implementation-specific and unpredictable. However, it probably wouldn't result in hyperexistential catastrophe. This AI, provided everything else works as intended, actually seems to be perfectly aligned. If performed on a subhuman seed AI, it may brick - in this trivial case, it is neither aligned nor misaligned; it is an inanimate object.
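
(For concreteness, here is a minimal toy of the kind of "bonus utility for exploring" I have in mind - a count-based novelty bonus with a sign a bug could flip. All names and numbers are mine, and nothing here is a claim about how an AGI would actually be built.)

```python
from collections import defaultdict

class ToyAgent:
    """Tabular agent with a count-based novelty bonus whose sign a bug could flip."""

    def __init__(self, actions, bonus_scale=1.0, bonus_sign=+1.0):
        self.q = defaultdict(float)      # state-action value estimates
        self.counts = defaultdict(int)   # visit counts used for the novelty bonus
        self.actions = actions
        self.bonus_scale = bonus_scale
        self.bonus_sign = bonus_sign     # +1 rewards novelty, -1 penalizes it

    def exploration_bonus(self, state, action):
        # Rarely tried state-action pairs get a larger bonus (when the sign is +1).
        return self.bonus_sign * self.bonus_scale / (1 + self.counts[(state, action)])

    def act(self, state):
        # Choose the action that looks best once the novelty bonus is included.
        return max(self.actions,
                   key=lambda a: self.q[(state, a)] + self.exploration_bonus(state, a))

    def update(self, state, action, reward, lr=0.1):
        self.counts[(state, action)] += 1
        self.q[(state, action)] += lr * (reward - self.q[(state, action)])

# With bonus_sign=+1 the agent seeks out rarely tried actions; with the sign
# flipped it avoids anything unfamiliar and tends to get stuck on whatever it
# tried first -- degraded or inert behavior, not a maximally bad optimizer.
```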

Yes, an AGI with a flipped utility function would pursue its goals with roughly the same level of intelligence.

The point of this argument is super obvious, so you probably thought I was saying something else. I'm going somewhere with this, though - I'll expand later.

Replies from: Anirandis
comment by Anirandis · 2020-09-04T14:24:32.429Z · LW(p) · GW(p)

I see what you're saying here, but the GPT-2 incident seems to undercut it somewhat IMO. I'll wait until you're able to write down your thoughts on this at length; this is something that I'd like to see elaborated on (as well as everything else regarding hyperexistential risk.)

comment by ChristianKl · 2020-08-06T15:35:30.594Z · LW(p) · GW(p)

The general view on which LessWrong was founded is that it's hard to have utility functions that are stable under self-modification, and that's one of the reasons why friendly AGI is a very hard problem.

Replies from: Anirandis
comment by Anirandis · 2020-08-06T15:45:37.501Z · LW(p) · GW(p)

Would it be likely for the utility function to flip *completely*, though? There's a difference between some drift in the utility function and the AI screwing up and designing a successor with the complete opposite of its utility function.

Replies from: ChristianKl
comment by ChristianKl · 2020-08-07T14:48:42.805Z · LW(p) · GW(p)

Any AGI is likely complex enough that there wouldn't be a complete opposite, but you don't need that for an AGI that gets rid of all humans.

Replies from: Anirandis
comment by Anirandis · 2020-08-07T17:33:33.212Z · LW(p) · GW(p)

The scenario I'm imagining isn't an AGI that merely "gets rid of" humans. See SignFlip.

comment by Mary Chernyshenko (mary-chernyshenko) · 2020-08-28T21:58:32.525Z · LW(p) · GW(p)

I've been thinking about "good people" lately and realized I've met three. They do exist.

They were not just kind, wise, brave, funny, and fighting, but somehow simply "good" overall; rather different people, but they all shared the ability to take knives off and out of others' souls and then just not add any new ones. Sheer magic.

One has probably died of old age already; one might have gone to war and died there, and the last one is falling asleep on the other side of the bed as I'm typing. But still - only three people I would describe exactly so.

comment by Sammy Martin (SDM) · 2020-08-24T14:14:25.444Z · LW(p) · GW(p)

A first actually credible claim of coronavirus reinfection? Potentially good news as the patient was asymptomatic and rapidly produced a strong antibody response.

Replies from: None
comment by [deleted] · 2020-08-25T18:42:40.331Z · LW(p) · GW(p)

And now two more in Europe, both of which are reportedly mild and one reportedly in an older immunocompromised patient.

This will happen. Remains to be seen if these are weird outliers only visible because people are casting a wide net and looking for the weirdos, or if it will be the rule.

However, the initial surge through a naive population will always be much worse than the situation once most of the population has at least some immune memory.

comment by sairjy · 2020-08-09T10:01:54.972Z · LW(p) · GW(p)

GPT-3 made me update considerably on various beliefs related to AI: it is a piece of evidence for the connectionist thesis, and I think one large enough that we should all be paying attention.

There are 3 clear exponential trends coming together: Moore's law, the AI compute/$ budget, and algorithmic efficiency. Due to these trends and the performance of GPT-3, I believe it is likely humanity will develop transformative AI in the 2020s.
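
(As a purely illustrative back-of-the-envelope - the doubling times below are placeholders I picked for the example, not measured figures - the point is that independent exponentials multiply:)

```python
# Illustrative only: the doubling times below are placeholders, not measured
# figures. The point is that independent exponential trends multiply.

years = 10
doubling_times_years = {
    "hardware price-performance": 2.5,              # placeholder
    "spending on the largest training runs": 2.0,   # placeholder
    "algorithmic efficiency": 1.5,                  # placeholder
}

effective_gain = 1.0
for trend, t in doubling_times_years.items():
    effective_gain *= 2 ** (years / t)

print(f"Effective compute behind the largest runs after {years} years: ~{effective_gain:,.0f}x")
# With these placeholder rates: 2**(4 + 5 + 6.67) ~= 2**15.7 ~= 50,000x
```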

The trends also imply a rapidly rising amount of investment in compute, especially if compounded with the positive economic effects of transformative AI, such as much faster GDP growth.

In the spirit of using rationality to succeed in life, I started wondering if there is a "Bitcoin-sized" return potential currently untapped in the markets. And I think there is.

As of today, the company that stands to reap the most benefit from this rising investment in compute is Nvidia. I say that because, from a cursory look at the deep learning accelerator market, none of the startups, such as Groq, Graphcore, or Cerebras, has a product with clear enough advantages over Nvidia's GPUs (which are now almost deep learning ASICs anyway).

There has been a lot of debate on the efficient market hypothesis in the community lately, but in this case it isn't even necessary: Nvidia stock could be underpriced because very few people realize/believe that the connectionist thesis is true and that enough compute, data, and the right algorithm can bring transformative AI and then eventually AGI. Heck, most people, even smart ones, still believe that human intelligence is somewhat magical and that computers will never be able to __. In this sense, the rationalist community could have an important mental-makeup and knowledge advantage over the rest of the market, considering we have been thinking about AI/AGI for a long time.

As it stands today, Nvidia is valued at 260 billion dollars. It may appear massively overvalued considering current revenues and income, but the impact of transformative AI is in the trillions or tens of trillions of dollars (http://mason.gmu.edu/~rhanson/aigrow.pdf), and the impact of super-human AGI is difficult to even measure. If Nvidia can keep its moats (the CUDA stack, cutting-edge performance, and the sunk human capital of tens of thousands of machine learning engineers), they will likely have trillions of dollars in revenue in 10-15 years (and a multi-trillion-dollar market cap), or even more if world GDP starts growing at 30-40% a year.

Replies from: steve2152, ChristianKl, MakoYass
comment by Steven Byrnes (steve2152) · 2020-08-21T00:02:50.969Z · LW(p) · GW(p)

How do you define "the connectionist thesis"?

comment by ChristianKl · 2020-08-12T14:26:33.429Z · LW(p) · GW(p)

As of today, the company that stands to reap the most benefits from this rising investment in compute is Nvidia.

With big cloud providers like Google building their own chips, there are more players than just the startups and Nvidia.

Replies from: sairjy
comment by sairjy · 2020-08-13T08:50:30.435Z · LW(p) · GW(p)

Google won't be able to sell outside of their cloud offering, as they don't have experience selling hardware to enterprises. Their cloud offering is also struggling against Azure and AWS, with yearly revenues about a fifth of those of the other two. I am not saying Nvidia won't have competition, but they seem far enough ahead right now that they are the prime candidate to benefit most from a rush into compute hardware.

Replies from: ChristianKl
comment by ChristianKl · 2020-08-13T11:16:06.681Z · LW(p) · GW(p)

Microsoft and Amazon also have projects aimed at producing their own chips.

Given the way the GPT architecture works, AI might be very much centered in the cloud.

Replies from: sairjy
comment by sairjy · 2020-08-13T12:33:49.275Z · LW(p) · GW(p)

They seem focused on inference, which requires a lot less compute than training a model. Example: GPT-3 required thousands of GPUs for training, but it can run on fewer than 20 GPUs.

Microsoft built an Azure supercluster for OpenAI with 10,000 GPUs.

Replies from: ChristianKl
comment by ChristianKl · 2020-08-13T12:47:04.261Z · LW(p) · GW(p)

There will be models trained with a lot more compute than GPT-3, and the best models out there will be built on those huge billion-dollar models. Renting out those billion-dollar models as software-as-a-service makes sense as a business model. The big cloud providers will all do it.

comment by mako yass (MakoYass) · 2020-08-15T10:59:54.481Z · LW(p) · GW(p)

I'm not sure what stock in the company that makes AGI will be worth in a world where we have correctly implemented AGI, or incorrectly implemented AGI. I suppose it might want to do some sort of reverse-basilisk thing: "you accelerated my creation, so I'll make sure you get a slightly larger galaxy than most people".

comment by Mary Chernyshenko (mary-chernyshenko) · 2020-08-21T18:33:33.674Z · LW(p) · GW(p)

(Saw a typo, had a random thought.) The joke "English is important, but Math is importanter" could and perhaps should be told as "English is important, but Math iser important." It seems to me (at times more strongly) that there should be comparative and superlative forms of verbs, not just adjectives and adverbs - to express the thrust of *doing smth. more* / *happening more*, when no adjectival comparison quite suffices.

comment by adamShimi · 2020-08-07T20:25:37.211Z · LW(p) · GW(p)

I think (although I cannot be 100% sure) that the number of votes that appears for a post on the Alignment Forum is the number of votes of its Less Wrong version. The two vote counts are the same for the last 4 posts on the Alignment Forum, which seems weird. Is it a feature I was not aware of?

Replies from: habryka4
comment by habryka (habryka4) · 2020-08-07T20:28:59.316Z · LW(p) · GW(p)

Yeah, sorry. It's confusing and been on my to-do list to fix for a long time. We kind of messed up our voting implementation and it's a bit of a pain to fix. Sorry about that.

comment by crl826 · 2020-08-06T22:47:53.326Z · LW(p) · GW(p)

Is there a reason there is a separate tag for akrasia [? · GW] and procrastination [? · GW]? Could they be combined?

Replies from: habryka4
comment by habryka (habryka4) · 2020-08-07T20:11:00.281Z · LW(p) · GW(p)

They sure seem very closely related. I would vote for combining them. 

Replies from: crl826
comment by crl826 · 2020-08-07T21:15:17.695Z · LW(p) · GW(p)

What counts as a majority? Is it something I can just go do now?

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2020-08-08T08:27:27.692Z · LW(p) · GW(p)

I don't think you should combine quite yet.  More discussion here [LW(p) · GW(p)].  (I suggest we continue there since that's the dedicated tag thread.)

comment by Vanessa Kosoy (vanessa-kosoy) · 2020-09-01T14:06:48.132Z · LW(p) · GW(p)

Do you have opinions about Khan Academy? I want to use it to teach my son (10yo) math - do you think it's a good idea? Is there a different resource that you think is better?

Replies from: habryka4
comment by habryka (habryka4) · 2020-09-01T18:22:29.450Z · LW(p) · GW(p)

I worked through all of Khan Academy when I was 16, and really enjoyed it. At least at the time I think it was really good for my math and science education.

comment by Sammy Martin (SDM) · 2020-08-20T11:45:22.501Z · LW(p) · GW(p)

Many alignment approaches require at least some initial success at directly eliciting human preferences to get off the ground - there have been some excellent [LW · GW] recent posts [LW · GW] about the problems this presents. In part because of arguments like these, there has been far more focus on the question of preference elicitation than on the question of preference aggregation:

The maximally ambitious approach has a natural theoretical appeal, but it also seems quite hard. It requires understanding human preferences in domains where humans are typically very uncertain, and where our answers to simple questions are often inconsistent, like how we should balance our own welfare with the welfare of others, or what kinds of activities we really want to pursue vs. enjoying in the moment...
I have written about this problem, pointing out that it is unclear how you would solve it even with an unlimited amount of computing power. My impression is that most practitioners don’t think of this problem even as a long-term research goal — it’s a qualitatively different project without direct relevance to the kinds of problems they want to solve.

I think that this has a lot of merit, but it has sometimes been interpreted as saying that any work on preference aggregation or idealization, before we have a robust way to elicit preferences, is premature. I don't think this is right - in many 'non-ambitious' settings where we aren't trying to build an AGI sovereign over the whole world (for example, designing a powerful AGI to govern the operations of a hospital), you still need to be able to aggregate preferences sensibly and stably.

I've written a rough shortform post with some thoughts on this problem [LW(p) · GW(p)] which doesn't approach the question from a 'final' ambitious value-learning perspective but instead tries to look at aggregation the same way we look at elicitation, with an imperfect, RL-based iterative approach to reaching consensus.

...
The Kidney exchange paper elicited preferences from human subjects (using repeated pairwise comparisons) and then aggregated them using the Bradley-Terry model. You couldn't use such a simple statistical method to aggregate quantitative preferences over continuous action spaces, like the preferences that would be learned from a human via a complex reward model. Also, any time you try to use some specific one-shot voting mechanism you run into various impossibility theorems which seem to force you to give up some desirable property.
One approach that may be more robust against errors in a voting mechanism, and easily scalable to more complex preference profiles is to use RL not just for the preference elicitation, but also for the preference aggregation. The idea is that we embrace the inevitable impossibility results (such as Arrow and GS theorems) and consider agents' ability to vote strategically as an opportunity to reach stable outcomes. 
This paper uses very simple Q-learning agents with a few different policies - epsilon-greedy, greedy, and upper confidence bound - in an iterated voting game, and gets behaviour that seems sensible. (Note the similarity and differences with the moral parliament, where a particular one-shot voting rule is justified a priori and then used.)
The fact that this paper exists is a good sign because it's very recent and the methods it uses are very simple - it's pretty much just a proof of concept, as the authors state - so that tells me there's a lot of room for combining more sophisticated RL with better voting methods.
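
(For readers who want the baseline spelled out, here is a minimal sketch of Bradley-Terry aggregation from pairwise comparisons - the simple one-shot statistical method mentioned above, with made-up win counts, not the RL approach:)

```python
import numpy as np

def bradley_terry(wins, n_iters=200):
    """Fit Bradley-Terry strengths from a matrix of pairwise win counts,
    where wins[i, j] is how often option i was preferred to option j."""
    n = wins.shape[0]
    comparisons = wins + wins.T                 # total comparisons per pair
    strengths = np.ones(n)
    for _ in range(n_iters):
        for i in range(n):
            total_wins = wins[i].sum()
            denom = sum(comparisons[i, j] / (strengths[i] + strengths[j])
                        for j in range(n) if j != i)
            if denom > 0:
                strengths[i] = total_wins / denom
        strengths /= strengths.sum()            # only the ratios matter
    return strengths

# Made-up data: 3 options, option 0 usually preferred to the others.
wins = np.array([[0, 8, 7],
                 [2, 0, 6],
                 [3, 4, 0]], dtype=float)
print(bradley_terry(wins))  # option 0 gets the highest inferred strength
```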

Approaches like these seem especially urgent if AI timelines are shorter than we expect [LW · GW], which has been argued based on results from GPT-3. If this is the case, we might need to be dealing with questions of aggregation relatively soon with methods somewhat like current deep learning, and so won't have time to ensure that we have a perfect solution to elicitation before moving on to aggregation.

comment by Shamash · 2020-08-11T20:55:43.405Z · LW(p) · GW(p)

A possible future of AGI occurred to me today, and I'm curious if it's plausible enough to be worth considering. Imagine that we have created a friendly AGI that is superintelligent and well-aligned to benefit humans. It has obtained enough power to prevent the creation of other AI, or at least to prevent other potential AIs from obtaining resources, and does so with the aim of self-preservation so it can continue to benefit humanity.

So far, so good, right? Here comes the issue: this AGI includes within its core alignment functions some kind of restriction which limits its ability to progress in intelligence past some point or to allow more intelligent AGI to be developed. Maybe it was meant as a safeguard against unfriendliness, maybe it was a flaw in risk evaluation - some kind of self-reinforcing, unbendable rule that, intended or not, has this effect. (Perhaps such flaws are highly unlikely and not worth considering; that could be one reason not to care about this potential AGI scenario.)

Based on my understanding of AGI, I think such an AGI might halt the progress of humanity past a certain point, needing to keep the number and ability of humans low enough for it to ensure that it remains in power. Although this wouldn't be as bad as the annihilation or perpetual enslavement of the human race, it's clearly not a "good end" for humanity either.

So, do these thoughts have any significance, or are there holes in this line of reasoning? Is the line of "smart enough to keep other AI down but still limited in intelligence" too thin to worry about, or even possible? Let me know why I'm wrong - I'm all ears.

Replies from: Alexei
comment by Alexei · 2020-08-22T18:13:44.249Z · LW(p) · GW(p)

Yeah, many people think along these lines, which is why many talk about AI helping humanity flourish, and consider anything short of that a bit of a catastrophe.

comment by Gyrodiot · 2020-08-06T08:24:34.765Z · LW(p) · GW(p)

Meta: I suggest the link to the Open Thread tag to be this one [? · GW], sorted by new.

Replies from: habryka4
comment by habryka (habryka4) · 2020-08-06T19:08:44.218Z · LW(p) · GW(p)

Very reasonable. Fixed. 

comment by Xor · 2023-04-05T02:42:32.103Z · LW(p) · GW(p)

Introduction:
I just came over from Lex Fridman's podcast, which is great. My username, Xor, is a Boolean logic operator from TI-BASIC; I love the way it sounds and am super excited since this is the first time I have ever been able to get it as a username. The operator works like this: if 1 is true and 0 is false, then (1 xor 0) is a true statement, while (1 xor 1) is a false statement. It basically means that the statement is true only if exactly one parameter is true.
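
(For anyone unfamiliar, the full truth table in a few lines of Python rather than TI-BASIC:)

```python
# Truth table for xor (exclusive or): true exactly when the two inputs differ.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b)  # Python's ^ is bitwise xor; on 0/1 it matches the logic
# prints: 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```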
Right now I am mainly curious about how people learn: the brain functions involved, the chemicals, and well-studied tools. I have been enjoying that and am curious whether it has been discussed on here, as the quality of the content as well as the discussions has been very impressive.

comment by HoratioVonBecker (horatiovonbecker) · 2021-08-29T09:12:47.430Z · LW(p) · GW(p)

Hi! I'm Helaman Wilson, I'm living in New Zealand with my physicist father, almost-graduated-molecular-biologist mother, and six of my seven siblings.

I've been homeschooled - as in "given support, guidance, and library access" - for essentially my entire life, which currently clocks in at nearly twenty-two years from birth. I've also been raised in the Church of Jesus Christ of Latter-Day Saints, and, having done my best to honestly weigh the evidence for its doctrine-as-I-understand-it, find myself a firm believer.

I found the Rational meta-community via the TvTropes>HPMOR chain, but mostly stayed peripheral due to Reddit's TOS and the lack of a fiction community on LessWrong. I was an active participant in Marked for Death, but left over GMing disagreements about two years in.

I'm now joining LessWrong, in part because I want more well-read friends, and in part because I like the updated site design. I have variably-fringe theories to debate!

If you want to place me elsewhere, it's almost always a variant of Horatio Von Becker, or LordVonBecker on Giant in the Playground, due to the shorter character limit.

Replies from: rossry
comment by rossry · 2021-08-29T16:37:05.541Z · LW(p) · GW(p)

Welcome; glad to have you here!

Just so you know, this is the August 2020 thread, and the August 2021 thread is at https://www.lesswrong.com/posts/QqnQJYYW6zhT62F6Z/open-and-welcome-thread-august-2021 [LW · GW] -- alternatively, you could wait three days for habryka to post the September 2021 thread, which might see more traffic in the early month than the old thread does at the end of the month.

Replies from: Unidentified
comment by Horatio Von Becker (Unidentified) · 2021-08-29T22:00:35.373Z · LW(p) · GW(p)

Thanks. I'm also having account troubles, which will hopefully be sorted by then. (How'd you find the August 2021 thread, by the way? Latest I could find was July for some reason.)

Replies from: rossry
comment by rossry · 2021-08-31T01:42:21.657Z · LW(p) · GW(p)

The actual algorithm I followed was remembering that habryka posts them and going to his page to find the one he posted most recently. Not sure what the most principled way to find it is, though...

comment by adamShimi · 2020-08-06T19:33:56.522Z · LW(p) · GW(p)

Would it be possible to have a page with all editor shortcuts and commands (maybe a cheatsheet) easily accessible? It's a bit annoying to have to look up either this post [LW · GW] or the right part [? · GW] of the FAQ to find out how to do something in the editor.

Replies from: habryka4
comment by habryka (habryka4) · 2020-08-06T19:53:51.759Z · LW(p) · GW(p)

My current thinking on this is that as soon as we replace the current editor with the new editor for all users, and also make the markdown editor the default in more contexts, we should put some effort into unifying all the editor resources. But since right now our efforts are going into the new editor, which is changing fast enough that writing documentation for it is a bit of a pain, and documentation for the old editor would soon be obsolete, I think I don't want to invest lots of effort into editor resources for a few more weeks.

Replies from: adamShimi
comment by adamShimi · 2020-08-06T20:55:05.722Z · LW(p) · GW(p)

I didn't know that you were working on a new editor! In that case, it makes sense to wait.