I use daily checklists, in spreadsheet form, for this.
Was this possibly a language thing? Are there Chinese or Indian machine learning researchers who would use a different term than AGI in their native language?
If your takeaway is only that you should have fatter tails on the outcomes of an aspiring rationality community, then I don't object.
If "I got some friends together and we all decided to be really dedicatedly rational" is intended as a description of Ziz and co, I think it is a at least missing many crucial elements, and generally not a very good characterization.
I think this post cleanly and accurately elucidates a dynamic in conversations about consciousness. I hadn't put my finger on this before reading this post, and I now think about it every time I hear or participate in a discussion about consciousness.
Short, as near as I can tell, true, and important. This expresses much of my feeling about the world.
Perhaps one of the more moving posts I've read recently, of direct relevance to many of us.
I appreciate the simplicity and brevity in expressing a regret that resonates strongly with me.
The general exercise of reviewing prior debate, now that (some of) the evidence has come in, seems very valuable, especially if one side of the debate is making high-level claims that their view has been vindicated.
That said, there were several points in this post where I thought the author's read of the current evidence was off or mistaken. I think this overall doesn't detract too much from the value of the post, especially because it prompted discussion in the comments.
I don't remember the context in detail, so I might be mistaken about Scott's specific claims. But I currently think this is a misleading characterization.
It's conflating two distinct phenomena, namely non-mystical cult-leader-like charisma / reality distortion fields, on the one hand, and metaphysical psychic powers, on the other, under the label "spooky mind powers", to imply someone is reasoning in bad faith or at least inconsistently.
It's totally consistent to claim that the first thing is happening, while also criticizing someone for believing that the second thing is happening. Indeed, this seems like a correct read of the situation to me, and therefore a natural way to interpret Scott's claims.
I think about this post several times a year when evaluating plans.
(Or actually, I think about a nearby concept that Nate voiced in person to me, about doing things that you actually believe in, in your heart. But this is the public handle for that.)
I don't understand how the second sentence follows from the first?
Disagreed, insofar as by "automatically converted" you mean "the shortform author has no recourse against this".
No. That's why I said the feature should be optional. You can make a general default setting for your shortform, plus there should be a toggle (hidden in the three dots menu?) to turn this on and off on a post-by-post basis.
I agree. I'm reminded of Scott's old post The Cowpox of Doubt, about how a skeptics movement focused on the most obvious pseudoscience is actually harmful to people's rationality because it reassures them that rationality failures are mostly obvious mistakes that dumb people make instead of hard to notice mistakes that I make.
And then we get people believing all sorts of shoddy research – because after all, the world is divided between things like homeopathy that Have Never Been Supported By Any Evidence Ever, and things like conventional medicine that Have Studies In Real Journals And Are Pushed By Real Scientists.
Calling groups cults feels similar, in that it allows one to write them off as "obviously bad" without need for further analysis, and reassures one that their own groups (which aren't cults, of course) are obviously unobjectionable.
Read ~all the sequences. Read all of SSC (don't keep up with ACX).
Pessimistic about survival, but attempting to be aggressively open-minded about what will happen instead of confirmation-biasing my views from 2015.
your close circle is not more conscious or more sentient than people far away, but you care about your close circle more anyways
Or, more specifically, this is a non sequitur to my deontology, which holds regardless of whether I personally like or privately wish for the wellbeing of any particular entity.
Well presumably because they're not equating "moral patienthood" with "object of my personal caring".
Something can be a moral patient, who you care about to the extent you're compelled by moral claims, or whose rights you are deontologically prohibited from trampling on, without your caring about that being in particular.
You might make the claim that calling something a moral patient is the same as saying that you care (at least a little bit) about its wellbeing, but not everyone buys that claim.
An optional feature that I think LessWrong should have: shortform posts that get more than some amount of karma get automatically converted into personal blog posts, including all the comments.
It should have a note at the top "originally published in shortform", with a link to the shortform comment. (All the copied comments should have a similar note).
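For concreteness, here's a minimal sketch of the conversion logic I'm imagining. All the names, fields, and the karma threshold below are hypothetical illustrations, not LessWrong's actual data model or API:

```python
from dataclasses import dataclass, field
from typing import List, Optional

KARMA_THRESHOLD = 50  # hypothetical cutoff; presumably a site-wide setting in practice

@dataclass
class Comment:
    author: str
    body: str

@dataclass
class Shortform:
    author: str
    body: str
    karma: int
    url: str
    comments: List[Comment]
    allow_promotion: bool = True  # the per-shortform toggle discussed in the thread above

@dataclass
class Post:
    author: str
    title: str
    body: str
    comments: List[Comment] = field(default_factory=list)

def promote_shortform(sf: Shortform) -> Optional[Post]:
    """Convert a high-karma shortform into a personal blog post, if the author allows it."""
    if not sf.allow_promotion or sf.karma < KARMA_THRESHOLD:
        return None
    note = f"*Originally published in shortform: {sf.url}*\n\n"
    title = sf.body.splitlines()[0][:80]  # crude title: first line of the shortform, truncated
    post = Post(author=sf.author, title=title, body=note + sf.body)
    # Copy the comments over, each with a note pointing back at the shortform thread.
    for c in sf.comments:
        post.comments.append(Comment(c.author, f"*[Copied from shortform]* {c.body}"))
    return post
```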
What would be the advantage of that?
There's some recent evidence that non-neural cells have memory-like functions. This doesn't, on its own, entail that non-neural cells are maintaining personality-relevant or self-relevant information.
I got it eventually!
Shouldn't we expect that ultimately the only thing selected for is mostly caring about long run power?
I was attempting to address that in my first footnote, though maybe it's too important a consideration to be relegated to a footnote.
To say it differently, I think we'll see selection on evolutionary fitness, which can take two forms:
- Selection on AIs' values, for values that are more fit, given the environment.
- Selection on AIs' rationality and time preference, for long-term strategic VNM rationality.
These are "substitutes" for each other. An agent can either have adaptive values, adaptive strategic orientation, or some combination of both. But agents that fall below the Pareto frontier described by those two axes[1], will be outcompeted.
Early in the singularity, I expect to see more selection on values, and later in the singularity (and beyond), I expect to see more selection on strategic rationality, because I (non-confidently) expect the earliest systems to be myopic and incoherent in roughly similar ways to humans (though probably the distribution of AIs will vary more on those traits than humans).
The fewer generations there are before strong VNM agents with patient values / long time preferences, the less I expect small amounts of caring for humans in AI systems will be eroded.
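To make the "substitutes" picture concrete, here is a toy simulation of my own (purely illustrative; the additive fitness rule, mutation model, and numbers are all made-up assumptions, not a claim about real training or deployment dynamics): each agent has a values-fitness score and a strategic-rationality score, overall fitness combines the two, and repeated culling removes whoever falls below the frontier.

```python
import random

def overall_fitness(values_fitness: float, rationality: float) -> float:
    """Toy combination rule: the two axes substitute for each other, so an agent
    weak on one axis can survive by being strong on the other."""
    return values_fitness + rationality  # additive substitution; the real tradeoff curve is unknown

def simulate(generations: int = 50, population: int = 200, cull_fraction: float = 0.2):
    # Each agent is a (values_fitness, strategic_rationality) pair, both in [0, 1].
    agents = [(random.random(), random.random()) for _ in range(population)]
    for _ in range(generations):
        agents.sort(key=lambda a: overall_fitness(*a))
        survivors = agents[int(len(agents) * cull_fraction):]  # agents below the frontier are outcompeted
        # Survivors "reproduce" with small mutations along both axes.
        children = [
            (min(1.0, max(0.0, v + random.gauss(0, 0.02))),
             min(1.0, max(0.0, r + random.gauss(0, 0.02))))
            for v, r in random.choices(survivors, k=population - len(survivors))
        ]
        agents = survivors + children
    return agents

if __name__ == "__main__":
    final = simulate()
    print("mean values-fitness:", round(sum(v for v, _ in final) / len(final), 2))
    print("mean strategic rationality:", round(sum(r for _, r in final) / len(final), 2))
```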
- ^
Actually, "axes" are a bit misleading since the space of possible values is vast and high dimensional. But we can project it onto the scalar of "how fit are these values (given some other assumptions)?"
[I can imagine this section being mildly psychologically info-hazardous to some people. I believe that for most people reading this is fine. I don't notice myself psychologically affected by these ideas, and I know a number of other people who believe roughly the same things, and also seem psychologically totally healthy. But if you are the kind of person who gets existential anxiety from thought experiments, like from thinking about being a Boltzmann-brain, then you should consider skipping this section, I will phrase the later sections in a way that they don't depend on this part.]
Thank you for the warning!
I wasn't expecting to read an argument that the very fact that I'm reading this post is reason to think that I (for some notion of "I") will die, within minutes!
That seems like a reasonable thing to have a content warning on.
To whom are you talking?
I suspect it would still involve billions of $ of funding, partnerships like the one with Microsoft, and other for-profit pressures to be the sort of player it is today. So I don't know that Musk's plan was viable at all.
Note that all of this happened before the scaling hypothesis was really formulated, much less made obvious.
We now know, with the benefit of hindsight, that developing AI and its precursors is extremely compute-intensive, which means capital-intensive. There was some reason to guess this might be true at the time, but it wasn't a foregone conclusion—it was still an open question whether the key to AGI would be mostly some technical innovation that hadn't been developed yet.
Those people don't get substantial equity in most businesses in the world. They generally get paid a salary and benefits in exchange for their work, and that's about it.
I don't think that's a valid inference.
Ok. So I haven't thought through these proposals in much detail, and I don't claim any confident take, but my first response is "holy fuck, that's a lot of complexity. It really seems like there will be some flaw in our control scheme that we don't notice, if we're stacking a bunch of clever ideas like this one on top of each other."
This is not at all to be taken as a disparagement of the authors. I salute them for their contribution. We should definitely explore ideas like these, and test them, and use the best ideas we have at AGI time.
But my intuitive first order response is "fuck."
But he helped found OpenAI, and recently founded another AI company.
I think Elon's strategy of "telling the world not to build AGI, and then going to start another AGI company himself" is much less dumb / ethically fraught than people often credit.
Thinking about this post shifted my view of Elon Musk a bit. He gets flak for calling for an AI pause, and then going and starting an AGI lab, and I now think that's unfair.
I think his overall strategic takes are harmful, but I do credit him with being basically the only would-be AGI-builder who seems to me to be engaged in a reformative hypocrisy strategy. For one thing, it sounds like he went out of his way to try to get AI regulated (talking to congress, talking to the governors), and supported SB-1047.
I think it's actually not that unreasonable to shout "Yo! This is dangerous! This should be regulated, and controlled democratically!", see that that's not happening, and then go and try to do it in a way that you think is better.
That seems like possibly an example of "follower-conditional leadership." Taking real action to shift to the better equilibrium, failing, and then going back to the dominant strategy given the inadequate equilibrium that exists.
Obviously he has different beliefs than I do, and than my culture does, about what is required for a good outcome. I think he's still causing vast harms, but I think he doesn't deserve the eye-roll for founding another AGI lab after calling for everyone to stop.
You may be right. Maybe the top talent wouldn't have gotten on board with that mission, and so it wouldn't have gotten top talent.
I bet Ilya would have been in for that mission, and I think a surprisingly large number of other top researchers might have been in for it as well. Obviously we'll never know.
And I think if the founders are committed to a mission, and they reaffirm their commitment in every meeting, they can go surprisingly far in making it the culture of an org.
Also, Sam Altman is a pretty impressive guy. I wonder what would have happened if he had decided to try to stop humanity from building AGI, instead of trying to be the one to build it rather than Google.
Absolutely true.
But also Altman's actions since are very clearly counter to the spirit of that email. I could imagine a version of this plan, executed with earnestness and attempted cooperativeness, that wasn't nearly as harmful (though still pretty bad, probably).
Part of the problem is that "we should build it first, before the less trustworthy" is a meme that universalizes terribly.
Part of the problem is that Sam Altman was not actually sincere in the execution of that sentiment, regardless of how sincere his original intentions were.
I predict this won't work as well as you hope because you'll be fighting the circadian effect that partially influences your cognitive performance.
Also, some ways to maximize your sleep quality are to exercise very intensely and/or to sauna, the day before.
It's possible no one tried literally "recreate OkC", but I think dating startups are very oversubscribed by founders, relative to interest from VCs.
If this is true, it's somewhat cruxy for me.
I'm still disappointed that no one cared enough to solve this problem without VC funding.
I only skimmed the retrospective now, but it seems mostly to be detailing problems that stymied their ability to find traction.
Right. But they were not relentlessly focused on solving this problem.
I straight up don't believe that the problems outlined can't be surmounted, especially if you're going for a cashflow business instead of an exit.
But I think you're trying to draw an update that's something like "tech startups should be doing an unbiased search through viable valuable business, but they're clearly not", or maybe, "tech startups are supposed to be able to solve a large fraction of our problems, but if they can't solve this, then that's not true", and I don't think either of these conclusions seem that licensed from the dating data point.
Neither of those, exactly.
I'm claiming that the narrative around the startup scene is that they are virtuous engines of [humane] value creation (often as a counter to a reactionary narrative that "big tech" is largely about exploitation and extraction). It's about "changing the world" (for the better).
This opportunity seems like a place where one could have traded meaningfully large personal financial EV for enormous amounts of humane value. Apparently no founder wanted to take that trade. Because I would expect there to be variation in how much founders are motivated by money vs. making a mark on the world vs. creating value vs. other stuff, the fact that (to my knowledge) no founder went for it is evidence about the motivations of the whole founder class. The number of founders who are more interested in creating something that helps a lot of people than they are in making a lot of money (even if they're interested in both) is apparently very small.
Now, maybe startups actually do create lots of humane value, even if they're created by founders and VCs motivated by profit. The motivations of the founders are only indirect evidence about the effects of startups.
But the tech scene is not motivated to optimize for this at all?? That sure does update me about how much the narrative is true vs. propaganda.
Now if I'm wrong and old OkCupid was only drastically better for me and my unusually high verbal intelligence friends, and it's not actually better than the existing offerings for the vast majority of people, that's a crux for me.
You mention manifold.love, but also mention it's in maintenance mode – I think because the type of business you want people to build does not in fact work.
From their retrospective:
Manifold.Love is going into maintenance mode while we focus on our core product. We hope to return with improvements once we have more bandwidth; we’re still stoked on the idea of a prediction market-based dating app!
It sounds less like they found it didn't work, and more like they have other priorities and aren't (currently) relentlessly pursuing this one.
I didn't say Silicon Valley is bad. I said that the narrative about Silicon Valley is largely propaganda, which can be true independently of how good or bad it is, in absolute terms, or relative to the rest of the world.
Yep. I'm aware, and strongly in support.
But it took this long (and even now, isn't being done by a traditional tech founder). This project doesn't feel like it ameliorates my point.
The fact that there's a sex recession is pretty suggestive that Tinder and the endless stream of Tinder clones don't serve people very well.
Even if you don't assess potential romantic partners by reading their essays, like I do, OkC's match percentage meant that you could easily filter out 95% of the pool to people who are more likely to be compatible with you, along whatever metrics of compatibility you care about.
That no one rebuilt old OkCupid updates me a lot about how much the startup world actually makes the world better
The prevailing ideology of San Francisco, Silicon Valley, and the broader tech world, is that startups are an engine (maybe even the engine) that drives progress towards a future that's better than the past, by creating new products that add value to people's lives.
I now think this is true in a limited way. Software is eating the world, and lots of bureaucracy is being replaced by automation which is generally cheaper, faster, and a better UX. But I now think that this narrative is largely propaganda.
That it's been 8 years since Match bought and ruined OkCupid, and no one in the whole tech ecosystem has stepped up to make a dating app even as good as old OkC, is a huge black mark against the whole SV ideology of technology changing the world for the better.
Finding a partner is such a huge, real pain point for millions of people. The existing solutions are so bad and extractive. A good solution has already been demonstrated. And yet not a single competent founder wanted to solve that problem for planet earth, instead of doing something else that (arguably) would have been more profitable. At minimum, someone could have forgone venture funding and built this as a cashflow business.
It's true that this is a market that depends on economies of scale, because the quality of your product is proportional to the size of your matching pool. But I don't buy that this is insurmountable. Just like with any startup, you start by serving a niche market really well, and then expand outward from there. (The first niche I would try for is by building an amazing match-making experience for female grad students at a particular top university. If you create a great experience for the women, the men will come, and I'd rather build an initial product for relatively smart customers. But there are dozens of niches one could try for.)
But it seems like no one tried to recreate OkC, much less create something better, until the manifold team built manifold.love (currently in maintenance mode)? Not that no one succeeded. To my knowledge, no one else even tried. Possibly Luna counts, but I've heard through the grapevine that they spent substantial effort running giant parties, relative to actually developing and launching their product—from which I infer that they were not very serious. I've been looking for good dating apps. I think if a serious founder were trying seriously, I would have heard about it.
Thousands of founders a year, and no one?!
That's such a massive failure, for almost a decade, that it suggests to me that the SV ideology of building things that make people's lives better is broadly propaganda. The best founders might be relentlessly resourceful, but a tiny fraction of them seem to be motivated by creating value for the world, or this low hanging fruit wouldn't have been left hanging for so long.
This is of course in addition to the long list of big tech companies who exploit their network-effect monopoly power to extract value from their users (often creating negative societal externalities in the process), more than creating value for them. But it's a weaker update that there are some tech companies that do ethically dubious stuff, compared to the stronger update that there was no startup that took on this obvious, underserved, human problem.
My guess is that the tech world is a silo of competence (because competence is financially rewarded), but operates from an ideology with major distortions / blindspots that are disconnected from commonsense reasoning about what's Good, e.g. following profit incentives, and excitement about doing big things (independent of whether those big things have humane or inhumane impacts), off a cliff.
my current guess is that the superior access to large datasets of big institutions gives them too much of an advantage for me to compete with, and I'm not comfortable with joining one.
Very much a side note, but the way you phrased this suggests that you might have ethical concerns? Is that right? If so, what are they?
Advice that looked good: buy semis (TSMC, NVDA, ASML, TSMC vol in particular)
Advice that looked okay: buy bigtech
Advice that looked less good: short long bonds
None of the above was advice, remember! It was...entertainment or something?
Thank you for writing this! I would love to see more of this kind of analysis on LessWrong.
You have been attacked by a pack of stray dogs twice?!?!
During the period when I was writing down dreams, I was also not using an alarm. I did train myself to wake up instantly, at the time that I wanted, to the minute. I agree that the half-awake state is anti-helpful for remembering dreams.
The alarm is helpful for starting out though.
We currently have too many people taking it all the way into AI town.
I reject the implication that AI town is the last stop on the crazy train.
I don't have much information about your case, but I'd make a 1-to-1 bet that if you got up and wrote down your dreams first thing in the morning every morning, especially if you're woken up by an alarm for the first 3 times, you'd start remembering your dreams. Just jot down whatever you remember, however vague or indistinct, up to and including "literally nothing. The last thing I remember is going to bed last night."
I rarely remember my own dreams, but in periods of my life when I've kept a dream journal, I easily remembered them.
I think that, in almost full generality, we should taboo the term "values". It's usually ambiguous between a bunch of distinct meanings.
- The ideals that, when someone contemplates, invoke strong feelings (of awe, motivation, excitement, exultation, joy, etc.)
- The incentives of an agent in a formalized game with quantified payoffs.
- A utility function - one's hypothetical ordering over worlds, world-trajectories, etc., that results from comparing each pair and evaluating which one is better.
- A person's revealed preferences.
- The experiences and activities that a person likes for their own sake.
- A person's vision of an ideal world. (Which, I claim, often reduces to "an imagined world that's aesthetically appealing.")
- The goals that are at the root of a chain or tree of instrumental goals.
- [This often comes with an implicit or explicit implication that most of human behavior has that chain/tree structure, as opposed to being, for instance, mostly hardcoded adaptations, or a chain/tree of goals that grounds out in a mess of hardcoded adaptations instead of anything goal-like.]
- The goals/narratives that give meaning to someone's life.
- [It can be the case that almost all of one's meaning comes through a particular meaning-making schema, but from a broader perspective, a person could have been ~indifferent between multiple schemas.
For instance, for some but not most EAs, EA is very central to their personal meaning-making, but they could easily have ended up as a social justice warrior, or a professional Libertarian, instead. And in those counterfactual worlds, the other ideology is similarly central to their happiness and meaning-making. I think in such cases, it's at least somewhat confused to look at the EA and declare that "maximizing [aggregate/average] utility" is their "terminal value". That's papering over the psychological process that adopts one ideology or another, which is necessarily more fundamental than the specific chosen ideology/"terminal value".
It's kind of like being in love with someone. You might love your wife more than anything, she might be the most important person in your life. But if you admit that it's possible that if you had been in different communities in your 20s you might have married someone else, then there's some other goal/process that picks who to marry. So too with ideologies.]
- Behaviors and attitudes that signal well regarded qualities.
- Core States.
- The goals that are sacred to a person, for many possible meanings of sacred.
- What a person "really wants" underneath their trauma responses. What they would want, if their trauma was fully healed.
- The actions that make someone feel most alive and authentically themselves.
- The equilibrium of moral philosophy, under arbitrary reflection.
Most of the time when I see the word "values" used on LessWrong, it's ambiguous between these (and other) meanings.
A particular ambiguity: sometimes "values" seem to be referring to the first-person experiences that a person likes for their own sake ("spending time near beautiful women is a terminal value for me"), and other times it seems to be referring to a world that a person thinks is awesome, when viewing that world from a god's eye view. Those are not the same thing, and they do not have remotely the same psychological functions! Among other differences, one is a near-mode evaluation, and the other is a far-mode evaluation.
Worse than that, I think there's often a conflation of these meanings.
For instance, I often detect a hidden assumption that the root of someone's tree of instrumental goals is the same thing as their ranking over possible worlds. I think that conflation is very rarely, if ever, correct: the deep motivations of a person's actions are not the same thing as the hypothetical world that is evaluated as best in thought experiments, even if the latter thing is properly the person's "utility function". At least in the vast majority of cases, one's hypothetical ideal world has almost no motivational power (as a matter of descriptive psychology, not of normative philosophy).
Also (though this is the weakest reason to change our terminology, I think), there's additional ambiguity to people who are not already involved in the memeplex.
To the broader world, "values" usually connotes something high-minded or noble: if you do a corporate-training-style exercise to "reflect on your values", you get things like "integrity" and "compassion", not things like "sex" or "spite". In contrast, LessWrongers would usually count sex and spite, not to mention boredom and pain, as part of "human values", and many would also own them as part of their personal values.
Are you saying this because you worship the sun?
Yes! That analogy is helpful for communicating what you mean!
I still have issues with your thesis though.
I agree that this "explaining away" thing could be a reasonable way to think about, e.g., the situation where I get sick, and while I'm sick, some activity that I usually love (let's say singing songs) feels meaningless. I probably shouldn't conclude that "my values" changed, just that the machinery that implements my reward circuitry is being thrown off by my being sick.
On the other hand, I think I could just as well describe this situation as extending the domain over which I'm computing my values. E.g. "I love and value singing songs, when I'm healthy, but when I'm sick in a particular way, I don't love it. Singing-while-healthy is meaningful; not singing per se."
In the same way, I could choose to call the blue screen phenomenon an error in the TV, or I could include that dynamic as part of the "predict what will happen with the screen" game. Since there's no real apple that I'm trying to model, only an ephemeral image of the apple, there's not a principled place to stand on whether to view the blue-screen as an error, or just part of the image generating process.
For any given fuckery with my reward signals, I could call them errors misrepresenting my "true values", or I could embrace them as expressing a part of my "true values." And if two people disagree about which conceptualization to go with, I don't know how they could possibly resolve it. They're both valid frames, fully consistent with the data. And they can't get distinguishing evidence, even in principle.
(I think this is not an academic point. I think people disagree about values in this way reasonably often.
Is enjoying masturbating to porn an example of your reward system getting hacked by external super-stimuli, or is that just part of the expression of your true values? Both of these are valid ways to extrapolate from the reward data time series. Which things count as your reward system getting hacked, and which count as representing your values? It seems like a judgement call!
The classic and most fraught example is that some people find it drop dead obvious that they care about the external world, and not just their sense-impressions about the external world. They're horrified by the thought of being put in an experience machine, even if their subjective experience would be way better.
Other people just don't get this. "But your experience would be exactly the same as if the world was awesome. You wouldn't be able to tell the difference", they say. It's obvious to them that they would prefer the experience machine, as long as their memory was wiped so they didn't know they were in one.[1])
Talking about an epistemic process attempting to update your model of an underlying not-really-real-but-sorta structure seems to miss the degrees of freedom in the game. Since there's no real apple, no one has any principled place to stand in claiming that "the apple really went half blue right there" vs. "no the TV signal was just interrupted." Any question about what the apple is "really doing" is a dangling node. [2]
As a separate point, while I agree the "explaining away disruptions" phenomenon is ever a thing that happens, I don't think that's usually what's happening when a person reflects on their values. Rather I guess that it's one of the three options that I suggested above.
- ^
Tangentially, this is why I expect that the CEV of humans diverges. I think some humans, on maximal reflection, wirehead, and others don't.
- ^
Admittedly, I think the question of which extrapolation schema to use is itself decided by "your values", which ultimately grounds out in the reward data. Some people have perhaps a stronger feeling of indignation about others hiding information from them, or perhaps a stronger sense of curiosity, or whatever, that crystallizes into a general desire to know what's true. Other people have less of that. And so they have different responses to the experience-machine hypothetical.
Because which extrapolation procedure any given person decides to use is itself a function of "their values" it all grounds out in the reward data eventually. Which perhaps defeats my point here.