It sounds like the core idea is a variant of the Intelligence Manhattan Project idea, but with a focus on long-term international stability & a ban on competitors.
Perhaps the industry would be more likely to adopt this plan if GUARD could seek revenue the way corporations currently do: by selling stock & API subscriptions. This would also increase productivity for GUARD & shorten the dangerous arms race interval.
I think this sounds fun! The versions of this I'd be most likely to use would be:
- Puzzling over scenarios of satisfying complexity. There could be numerical details, selection bias, unreliable narrator obstacles, cases where users with different values might disagree, etc. Even if the scenario-poster is arguably wrong about the right answer, that could still be interesting.
- Scenarios that you puzzle over & then read a comment section about. Scenarios that you remember & talk about with friends later.
- User-submitted anecdotes from their real lives. This is oddly similar to Reddit's 'Am I the Asshole' threads, but with a focus on becoming more clearheaded & unbiased. Users could sometimes ask for testable predictions about what will happen next, then report back later. So if the pictured scenario came from real life, Maria might ask users how many times Jake will be late in the next 6 months.
- Philosophy-esque thought experiments.
- Scenarios that do indeed benefit my thinking or expand my perspective. Perhaps by improving my mental statistics skills, or exposing me to perspectives of people with very different lives, or demonstrating little-known math subtleties like Simpson's paradox (see the small numerical illustration after this list). One failure mode for this would be scenarios like the more boring HR-training courses, where the story doesn't teach you anything you don't already know.
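To illustrate the Simpson's paradox point, here is a minimal sketch (the subgroup counts are the classic kidney-stone-style figures, used purely for illustration): treatment A wins within every subgroup, yet B looks better once the subgroups are pooled.

```python
# Subgroup data as (successes, total). A beats B within each subgroup...
groups = {
    "small": {"A": (81, 87),  "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

for name, g in groups.items():
    print(name, {t: round(rate(*g[t]), 2) for t in ("A", "B")})

# ...but pooling the subgroups reverses the comparison: B beats A overall.
pooled = {t: tuple(sum(g[t][i] for g in groups.values()) for i in (0, 1)) for t in ("A", "B")}
print("pooled", {t: round(rate(*pooled[t]), 2) for t in ("A", "B")})
```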
Imperial Radch series by Ann Leckie
Very well-crafted world. Some might dislike the robotic narrator, some might enjoy it as a fun layer in a complex plot puzzle. High scifiosity.
Southern Reach trilogy by Jeff VanderMeer
Surreal & unusual novels. Good tone & imagery. Unlike Radch, I think this is more about style & perspective than a style layer over an intricate, hidden plot layer.
Too Like the Lightning by Ada Palmer
I read a lot of scifi, but I haven't gotten this obsessed with a book since Green Mars! Like Radch, an unreliable narrator presents an intricate world. Set on Earth four centuries in the future, it follows the political, technological, & dialectic trajectories of a culture that has mutated in strange & fascinating ways from today. Try it for the economics of future aircraft & the vivid soliloquies. Avoid it if you dislike books that frontload worldbuilding & characters, where the plot is confusing until the end. I love it & I have another post about it here.
This is How You Lose the Time War by Amal El-Mohtar & Max Gladstone
I found this short book very fun & cool. About spies in an extraordinarily spectacular time-travel war. Does feature some very confusing plot points that I still don't understand.
I quite like the Arguman format of flowcharts to depict topics. In a live performance, participants might sometimes add nodes to the flowchart, or ask for a revision of another participant's existing node, for example a rewording for clarity.
Perhaps the better term would be tree, not flowchart. Each node is a response to its parent. This could be implemented with nested bulleted lists in a Google Doc.
It's nice for the event to output a useful document.
I call all those examples opinions.
Sure, opinions come to people from a few different sources. I speculate that interpersonal transmission is the most common, but they can also originate in someone's head, either via careful thought or via a brief whim.
People don't have opinions - opinions have people.
Often, one hears someone express a strange, wrong-seeming opinion. The bad habit is to view this as that person's intentional bad action. The good habit is to remember that the person heard this opinion, accepted it as reasonable, & might have put no further thought into the matter.
Opinions are self-replicating & rarely fact-checked. People often subscribe to 2 contradictory opinions.
Epistemic status: I'm trying this opinion on. It's appealing so far.
I like it! In addition, I suppose you could use a topic-wide prior for those groups that you don't have much data on yet.
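A minimal sketch of what I mean, assuming the per-group data are (successes, trials) counts; the group names, counts, and prior strength below are all made up for illustration:

```python
# Shrink each group's observed rate toward a topic-wide prior, so groups
# with little data fall back on the topic-level average.
topic_successes, topic_trials = 60, 100      # pooled data across the whole topic
prior_rate = topic_successes / topic_trials
prior_strength = 10                          # pseudo-observations; a tuning knob

groups = {"group_a": (9, 10), "group_b": (1, 2), "group_c": (0, 0)}  # (successes, trials)

for name, (s, n) in groups.items():
    estimate = (s + prior_strength * prior_rate) / (n + prior_strength)
    print(name, round(estimate, 2))  # sparse groups land near the 0.6 topic-wide rate
```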
This is totally delightful!
Personally I'd rather have the public be fascinated with how chatbots think than ignorant of the topic. Sure, non-experts won't have a great understanding, but this sounds better than likely alternatives. And I'm sure people will spend a lot of time on either future chatbots, or future video games, or future television, or future Twitter, but I'm not convinced that's a bad thing.
The regulation you mention sounds very drastic & clumsy to my ears. I'd suggest starting by proposing something more widely acceptable, such as regulating highly effective self-modifying software that lacks security safeguards.
Basing ethical worth on qualia is very close to dualism, to my ears. I think instead the question must rest on a detailed understanding of the components of the program in question, & the degree of similarity to the computational components of our brains.
Excellent point. We essentially have 4 quadrants of computational systems:
- Looks nonhuman, internally nonhuman - All traditional software is in this category
- Looks nonhuman, internally humanoid - Future minds that are at risk for abuse (IMO)
- Looks humanoid, internally nonhuman - Not an ethical concern, but people are likely to make wrong judgments about such programs.
- Looks humanoid, internally humanoid - Humans. The blogger claims LaMDA also falls into this category.
Good point. In my understanding it could go either way, but I'm open to the idea that the worst disasters are less than 50% likely, given a nuclear war.
Good point. Unless of course one is more likely to be born into universes with high human populations than universes with low human populations, because there are more 'brains available to be born into'. Hard to say.
In general, whenever Reason makes you feel paralyzed, remember that Reason has many things to say. Thousands of people in history have been convinced by trains of thought of the form 'X is unavoidable, everything is about X, you are screwed'. Many pairs of those trains of thought contradict each other. This pattern is all over the history of philosophy, religion, & politics.
Future hazards deserve more research funding, yes, but remember that the future is not certain.
What's the status of this meetup, CitizenTen? Did you hear back?
I have similar needs. I use a spreadsheet, populated by a Google Form that I open from a shortcut on my phone's main menu. I find it rewarding to make the spreadsheet display secondary metrics & graphs too.
Other popular alternatives include Habitica & habitdaily.app (iPhone only). I'm still looking for a perfect solution, but my current tools are pretty good for my needs.
I'm not sure either. Might only be needed for the operating fees.
Agreed. We might refer to them as 'leaderless orgs' or 'staffless networks'.
Does this reduction come from seniority? Is the idea that older organizations are generally more reliable?
Are you saying there would be a causal link from the poor person's vaccine:other ratio to the rich person's purchasing decision? How does that work?
Thanks! Useful info.
Can you clarify why the volcano-triggering scheme in 3 would not be effective? It's not obvious. The scheme sounds rather lethal.
Welcome! Discovering the rationalsphere is very exciting, isn't it? I admire your passion for self-improvement.
I don't know if I have advice that isn't obvious. Read whoever has unfamiliar ideas. I learned a lot from reading Robin Hanson and Paul Christiano.
As needed, journal or otherwise speak to yourself.
Be wary of the false impression that your efforts have been ruined. Sometimes I encounter a disrespectful person or a shocking philosophical argument that makes me feel like giving up on a wide swathe of my life. I doubt giving up is appropriate in these disheartening circumstances.
Seek to develop friendships with people you can have great conversations with.
Speak to rationalists like you would speak to yourself, and speak tactfully to everyone else.
That's the advice I would give to a version of myself in your situation. Have fun!
Okay, deciding randomly to exploit one possible simulator makes sense.
As for choosing exactly what to set the output cells of the simulation to... I'm still wrapping my head around it. Is recursive simulation the only way to exploit these simulations from within?
Great post. I encountered many new ideas here.
One point confuses me. Maybe I'm missing something. Once the consequentialists in a simulation are contemplating the possibility of simulation, how would they arrive at any useful strategy? They can manipulate the locations that are likely to be the output/measurement of the simulation, but manipulate them to what values? They know basically nothing about how the input will be interpreted, what question the simulator is asking, or what universe is doing the simulation. Since their universe is very simple, presumably many simulators are running identical copies of them, with different manipulation strategies being appropriate for each. To my understanding, this sounds less malign and more blindly mischievous.
TLDR: How do the consequentialists guess which direction to bias the output towards?
I indeed upvoted it for the update / generally valuable contribution to the discussion.
a) Agreed, although I don't find this inappropriate in context.
b) I do agree that the fact that many successful past civilizations are now in ruins with their books lost is an important sign of danger. But surely there is some onus of proof in the opposite direction from the near-monotonic increase in population over the last few millennia?
c) These are certainly extremely important problems going forwards. I would particularly emphasize the nukes.
d) Agreed. But on the centuries scale, there is extreme potential in orbital solar power and fusion.
e) Agreed. But I think it's easy to underestimate the problems our ancestors faced. In my opinion, some huge ones of past centuries include: ice ages, supervolcanic eruptions, the difficulty of maintaining stable monarchies, the bubonic plague, Columbian smallpox, the ubiquitous oppression of women, harmful theocracies, majority illiteracy, the Malthusian dilemma, and the prevalence of total war as a dominant paradigm. Is there evidence that past problems were easier than 2019 ones?
It sounds like your perspective is that, before 2100, wars and upcoming increases in resource scarcity will cause an inescapable global economic decline that will bring most of the planet to an 1800s-esque standard of living, followed by a return to slow growth (standard of living, infrastructure, food, energy, productivity) for the next couple centuries. Do I correctly understand your perspective?
Epistemics: Yes, it is sound. Not because of its claims (they seem more like opinions to me), but because it is appropriately charitable to those that disagree with Paul, and tries hard to open up avenues of mutual understanding.
Valuable: Yes. It provides new third paradigms that bring clarity to people with different views. Very creative, good suggestions.
Should it be in the Best list?: No. It is from the middle of a conversation, and would be difficult to understand if you haven't read a lot about the 'Foom debate'.
Improved: The same concepts rewritten for a less-familiar audience would be valuable. Or at least with links to some of the background (definitions of AGI, detailed examples of what fast takeoff might look like and arguments for its plausibility).
Followup: More posts thoughtfully describing positions for and against, etc. Presumably these exist, but I personally have not read much of this discussion in the 2018-2019 era.
This is a little nitpicky, but I feel compelled to point out that the brain in the 'human safety' example doesn't have to run for a billion years consecutively. If the goal is to provide consistent moral guidance, the brain can set things up so that it stores a canonical copy of itself in long-term storage, runs for 30 days, then hands off control to another version of itself, loaded from the canonical copy. Every 30 days control is handed to an instance of the canonical version of this person. The same scheme is possible for a group of people.
But this is a nitpick, because I agree that there are probably weird situations in the universe where even the wisest human groups would choose bad outcomes given absolute power for a short time.
I appreciate this disentangling of perspectives. I had been conflating them before, but I like this paradigm.
I found this uncomfortable and unpleasant to read, but I'm nevertheless glad I read it. Thanks for posting.
I think the abridgement sounds nice but don't anticipate it affecting me much either way.
I think the ability to turn this on/off in user preferences is a particularly good idea (as mentioned in Raemon's comment).
I can follow most of this, but I'm confused about one part of the premise.
What if the agent created a low-resolution simulation of its behavior, called it Approximate Self, and used that in its predictions? Is the idea that this is doable, but represents an unacceptably large loss of accuracy? Are we in a 'no approximation' context where any loss of accuracy is to be avoided?
My perspective: It seems to me that humans also suffer from the problem of embedded self-reference. I suspect that humans deal with this by thinking about a highly approximate representation of their own behavior. For example, when I try to predict how a future conversation will go, I imagine myself saying things that a 'reasonable person' might say. Could a machine use an analogous form of non-self-referential approximation?
Great piece, thanks for posting.
It's relevant to some forms of utilitarian ethics.
I think this is a clever new way of phrasing the problem.
When you said 'friend that is more powerful than you', that also made me think of a parenting relationship. We can look at whether this well-intentioned personification of AGI would be a good parent to a human child. They might be able to give the child a lot of attention, an expensive education, and a lot of material resources, but they might take unorthodox actions in the course of pursuing human goals.
(I'm not zhukeepa; I'm just bringing up my own thoughts.)
This isn't quite the same as an improvement, but one thing that is more appealing about normal-world metaphilosophical progress than empowered-person metaphilosophical progress is that the former has a track record of working*, while the latter is untried and might not work.
*Slowly and not without reversals.
It implies that the Occamian prior should work well in any universe where the laws of probability hold. Is that really true?
Just to clarify, are you referring to the differences between classical probability and quantum amplitudes? Or do you mean something else?
Why do you think so? It's a thought experiment about punitive acausal trade from before people realized that benevolent acausal trade was equally possible. I don't think it's the most interesting idea to come out of the Less Wrong community anymore.
Noted!
Sorry, I couldn't find the previous link here when I searched for it.
Just to be clear, I'm imagining counterfactual cooperation to mean the FAI building vaults full of paperclips in every region where there is a surplus of aluminium (or a similar metal). In the other possibility branch, the paperclip maximizer (which thinks identically) reciprocates by preserving semi-autonomous cities of humans among the mountains of paperclips.
If my understanding above is correct, then yes, I think these two would cooperate IF this type of software agent shares my perspective on acausal game theory and branching timelines.
In the last 48 hours I've felt the need for more than one of the abilities above. These would be very useful conversational tools.
I think some of these would be harder than others. This one sounds hard: 'Letting them know that what they said set off alarm bells somewhere in your head, but you aren’t sure why.' Maybe we could look for both scripts that work between two people who already trust each other, and scripts that work with semi-strangers. Or scripts that do and don't require both participants to have already read a specific blog post, etc.
Something like a death risk calibration agency? Could be very interesting. Do any orgs like this exist? I guess the CDC (in the US govt) probably quantitatively compares risks within the context of disease.
One quote in your post seems more ambitious than the rest: 'helping retrain people if a thing that society was worried about seems to not be such a problem'. I think that tons of people evaluate risks based on how scary they seem, not based on numerical research.
Note on 3D printing: Yeah, that one might take a while. It's actually been around for decades, but still hasn't become cheap enough to make a big impact. I think it'll be one of those techs that takes 50+ years to go big.
Source: I used to work in the 3D printer industry.
I first see the stems, then I see the leaves.
I think humans spend a lot of time looking at our models of the world (maps) and not that much time looking at our actual sensory input.
A similar algorithm appears in Age of Em by Robin Hanson ('spur safes' in Chapter 14). Basically, a trusted third party allows copies of A and B to analyze each other's source code in a sealed environment, then deletes almost everything that is learned.
A and B both copy their source code into a trusted computing environment ('safe'), such as an isolated server or some variety of encrypted VM. The trusted environment instantiates a copy of A (A_fork) and gives it B_source to inspect. Similarly, B_fork is instantiated and allowed to examine A_source. There can be other inputs, such as some contextual information and a contract to discuss. They examine the code for several hours or so, but this is not risky to A or B because all information inside the trusted environment will mandatorily be deleted afterwards. The only outputs from the trusted environment are a secure channel from A_fork to A and one from B_fork to B. These may only ever output an extremely low-resolution one-time report. This can be one of the following 3 values: 'Enter into the contract with the other', 'Do not enter into the contract with the other', or 'Maybe enter the contract'.
This does require a trusted execution environment, of course.
I don't know if this idea is original to Hanson.
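A toy sketch of that flow, just to make its shape concrete. Everything here is hypothetical: the lambda 'inspectors' stand in for whatever analysis A_fork and B_fork actually perform, and a real version needs an actual trusted execution environment with enforced deletion, not ordinary Python.

```python
from enum import Enum

class Verdict(Enum):
    ENTER = "Enter into the contract with the other"
    DECLINE = "Do not enter into the contract with the other"
    MAYBE = "Maybe enter the contract"

def run_safe(a_source, b_source, a_inspect, b_inspect, contract):
    """Toy stand-in for the sealed environment: a fork of each agent inspects
    the other's source plus the contract, and only coarse verdicts leave."""
    a_verdict = a_inspect(b_source, contract)   # A_fork examines B_source
    b_verdict = b_inspect(a_source, contract)   # B_fork examines A_source
    # Everything else computed inside the safe is discarded; only the two
    # one-time, low-resolution reports are returned (one to A, one to B).
    return a_verdict, b_verdict

# Hypothetical usage with deliberately trivial inspection policies:
a_report, b_report = run_safe(
    a_source="def act(): ...",
    b_source="def act(): ...",
    a_inspect=lambda src, contract: Verdict.ENTER if "def act" in src else Verdict.DECLINE,
    b_inspect=lambda src, contract: Verdict.MAYBE,
    contract="cooperate on task X",
)
print(a_report.value, "|", b_report.value)
```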
Favorite highlight:
'Likewise, great literature is typically an integrated, multi-dimensional depiction. While there is a great deal of compression, the author is still trying to report how things might really have happened, to satisfy their own sense of artistic taste for plausibility or verisimilitude. Thus, we should expect that great literature is often an honest, highly informative account of everything except what the author meant to put into it.'
The techniques you outline for incorporating narrow agents into more general systems have already been demoed, I'm pretty sure. A coordinator can apply multiple narrow algorithms to a task and select the most effective one, a la IBM Watson. And I've seen at least one paper that uses an RNN to cultivate a custom RNN with the appropriate parameters for a new situation.
I'm updating because I think you outline a very useful concept here. Narrow algorithms can be made much more general given a good 'algorithm switcher'. A canny switcher/coordinator program can be given a task and decide which of several narrow programs to apply to it. This is analogous to the IBM Watson system that competed in Jeopardy and to the human you describe using a PC to switch between applications. I often forget about this technique during discussions about narrow machine learning software.
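A bare-bones sketch of that switcher idea (the solvers and the scoring rule are invented purely for illustration; a real coordinator like Watson is vastly more sophisticated):

```python
# Minimal 'algorithm switcher': run each narrow solver on the task and keep
# whichever answer scores best under an evaluation function.
def arithmetic_solver(task):
    try:
        return str(eval(task, {"__builtins__": {}}))  # handles simple arithmetic strings
    except Exception:
        return None  # signal that this solver doesn't apply

def echo_solver(task):
    return task  # trivial fallback: return the task unchanged

def coordinator(task, solvers, score):
    candidates = [(s.__name__, s(task)) for s in solvers]
    candidates = [(name, out) for name, out in candidates if out is not None]
    return max(candidates, key=lambda pair: score(task, pair[1]))

# Invented scoring rule: prefer short answers that actually transformed the input.
score = lambda task, answer: (answer != task) - len(answer) / 100
print(coordinator("2 + 3 * 4", [arithmetic_solver, echo_solver], score))  # ('arithmetic_solver', '14')
```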