Comments
Am I being an idiot, or does 99%< technically work? It implies the value is greater than 99%, mirroring how <1% means the value is less than 1%.
I personally have it as a to-do to just build polls.
(React if you want to express that you would likely use this.)
Edited, should be working fine now, thx!
Something a little different: Today I turn 28. If you might be open to doing something nice for me for my birthday, I would like to request the gift of data. I have made a 2-4 min anonymous survey about me as a person, and if you have a distinct sense of me as a person (even just from reading my LW posts/comments) I would greatly appreciate you filling it out and letting me know how you see me!
It's an anonymous survey where you rate me on lots of attributes like "anxious", "honorable", "wise" and more. All multiple-choice. Two years ago I also shared a birthday survey amongst people who know me and ~70 people filled it out, and I learned a lot from it. I am very excited to see how the perception of me amongst the people I know has *changed*, and also to find out how people on LessWrong see me, so the core of this survey is ~20 of the same attributes.
In return for your kind gift, if you complete it, you get to see the aggregate ratings of me from last time!
This survey helps me understand how people see me, and recognize my blindspots, and I'm very grateful to anyone who takes a few mins to complete it. Two people completed it already and it took them 2 mins and 4 mins to complete it. (There are many further optional questions but it says clearly when the main bit is complete.)
I of course intend to publish the (aggregate) data in a LW post and talk about what I've learned from it :-)
I'm never sure if it makes sense to add that clause every time I talk about the future.
Curated. Some more detailed predictions of the future, different from others, and one of the best bear cases I've read.
This feels a bit less timeless than many posts we curate but my guess is that (a) it'll be quite interesting to re-read this in 2 years, and (b) it makes sense to record good and detailed predictions like this more regularly in the field of AI which is moving so much faster than most of the rest of the world.
Thanks for this short story! I have so many questions.
- This was during training camp? How many days/weeks/months in was this?
- How many people went through this training camp with you? Are you still friends with any of them?
- How long was training, and then how long did you serve as a professional?
- I encourage you to include any links you have to content about these folks in future stories. I had to check the Wikipedia page before I fully believed that the logo was a skull with a knife through it.
Yes, I would be interested in reading another story about your time there. This story gave me some insight into the context; I think I would like something heavier on the logos, but I'd be happy with either.
Intercom please! It helps for us to have back-and-forth like "What device / operating system / browser?" and other relevant q's.
That sounds good to me i.e. draft this post, and then make it a comment in one of those places instead (my weak guess is a quick take is better, but whatever you like).
Posted either as a comment on the seasonal open thread or using the quick takes / shortform feature, which posts it in your shortform (e.g. here is my shortform).
I'm saying that this seems to me not on the level of substance of a post, so it'd be better as a comment of one of the above two types, and also that it's plausible to me that you'd get more engagement as a comment in the open thread.
FWIW this feels like it should be a shortform/open-thread comment rather than a post.
I have used my admin powers to put it into a collapsible section so that people who expand this in recent discussion do not have to scroll for 5 seconds to get past it.
Though if the text changes, then it degrades gracefully to just linking to the right webpage, which is the current norm.
I have a general belief that internet epistemic hygiene norms should include that, when you quote someone, not only should you link to the source, but you should link to the highlight of that source. In general, if you highlight text on a webpage and right-click, you can "copy link to highlight" which when opened scrolls to and highlights that text. (Random example on Wikipedia.)
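For concreteness, here's a minimal sketch of how such a link can be built with the `#:~:text=` text-fragment syntax (supported in Chromium-based browsers); the example page and text here are made up for illustration:

```python
from urllib.parse import quote

def link_to_highlight(page_url: str, quoted_text: str) -> str:
    """Build a 'link to highlight' using the URL text-fragment syntax."""
    # The fragment directive '#:~:text=' tells supporting browsers to
    # scroll to and highlight the first occurrence of the given text.
    return f"{page_url}#:~:text={quote(quoted_text)}"

# Hypothetical example:
print(link_to_highlight("https://en.wikipedia.org/wiki/Epistemology",
                        "theory of knowledge"))
# -> https://en.wikipedia.org/wiki/Epistemology#:~:text=theory%20of%20knowledge
```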
Further on this theme, archive.is has the interesting feature of constantly altering the URL to point to the currently highlighted bit of text, making this even easier. (Example, and you can highlight other bits of text to see it change.) Currently I overall don't like it because I constantly highlight text while I'm reading it, and so am v annoyed by the URL constantly changing, but it's plausible I'd get over this in time, and it'd be a good feature to add to LW.
The archive.is feature is also better because the normal "copy link to highlight" can often be unwieldy and long. Also I recall it sometimes not working, probably because the highlight is too short or too long (I don't quite understand the rules). On archive.is it just has a start and end number for where in the text the highlight is, making it always work and never be unwieldy.
Sadly, I just tried the normal "copy link to highlight" on LW, and when I clicked through, the page auto-refreshed, so the highlighted text flashed purple and then quickly disappeared. It would be good for us to change that, and maybe implement this feature.
The point that "small protests are the only way to get big protests" may be directionally accurate, but I want to note that there have been large protests that happened without that. Here's a shoggoth listing a bunch, including the 1989 Tiananmen Square Protests, the 2019 Hong Kong Anti-Extradition Protests, the 2020 George Floyd Protests, and more.
The shoggoth says spontaneous large protests tend to be in response to triggering events and do rely on pre-existing movements that are ready to mobilize, the latter of which your work is helping build.
I want to contrast two perspectives on human epistemology I've been thinking about for over a year.
There's one school of thought about how to do reasoning about the future which is about naming a bunch of variables, putting probability distributions over them, multiplying them together, and doing Bayesian updates when you get new evidence. This lets you assign probabilities, including to conjunctions of lots of outcomes. "What probability do I assign that the S&P goes down, and the Ukraine/Russia war continues, and I find a new romantic partner?" I'll call this the "spreadsheets" model of epistemology.
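As a toy sketch of what I mean (all the numbers and the independence assumption are made up purely for illustration):

```python
# Toy "spreadsheets" epistemology: name some variables, assign probabilities,
# multiply (assuming independence), and do a Bayesian update on new evidence.
p_sp500_down  = 0.4   # made-up prior
p_war_goes_on = 0.7   # made-up prior
p_new_partner = 0.3   # made-up prior

# Probability of the conjunction, treating the variables as independent.
p_all_three = p_sp500_down * p_war_goes_on * p_new_partner
print(f"P(all three) = {p_all_three:.3f}")   # 0.084

# Bayesian update on one variable: evidence E that is twice as likely
# if the S&P goes down as if it doesn't (likelihood ratio of 2).
prior = p_sp500_down
posterior = (prior * 2) / (prior * 2 + (1 - prior) * 1)
print(f"P(S&P down | E) = {posterior:.3f}")  # ~0.571
```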
There's another perspective I've been ruminating on which is about visualizing detailed and concrete worlds, similar to how, if you hold a ball and ask me to visualize how it'll fall when you drop it, I can see the scene in full detail. This is more about loading a full hypothesis into your head, and into your anticipations. It's more related to Privileging the Hypothesis / Believing In / Radical Probabilism[1]. I'll call this the "cognitive visualization" model of epistemology.
These visualizations hook much more directly into my anticipations and motivations. When I am running after you to remind you that you forgot to take your adderall today, it is not because I had a spreadsheet simulate lots of variables and in a lot of worlds the distribution said it was of high utility to you. I'm doing it because I have experienced you getting very upset and overwhelmed on many days when you forgot, and those experiences flashed through my mind as likely outcomes that I am acting hurriedly to divert the future from. When I imagine a great event that I want to run, I am also visualizing a certain scene, a certain freeze frame, a certain mood, that I desire and believe is attainable, and I am pushing and pulling on reality to line it up so that it's a direct walk from here to there.
Now I'm going to say that these visualizations are working-memory bottlenecked, and stylize that idea more than is accurate. Similar to the idea that there are only ~7 working memory slots in the brain[2], I feel that for many important parts of my life I can only keep a handful of detailed visualizations of the future easily accessible to my mind to orient with. This isn't true in full generality – at any time of day, if you ask me to visualize what happens if you drop a ball, I have an immediate anticipation – but if you constantly ask me to visualize a world in great detail in which the S&P 500 goes up and the war continues, versus down and the war stops, and lots of other permutations with other variables changed, then I start to get fatigued. And this is true for life broadly, that I'm only loading up so many detailed visualizations of specific worlds.
Given this assumption – if indeed one only has a handful of future-states that one can load into one's brain – the rules of how to update your beliefs and anticipations are radically changed from the spreadsheets model of epistemology. Adding a visualization to your mind means removing one of your precious few; this means you will be better equipped to deal with worlds like the one you're adding, and less well-equipped to deal with worlds like the one you've removed. This includes both taking useful action and making accurate predictions; which ones you load into your mind is a function of accuracy and usefulness. It can help to add many worlds into your cognitive state that you wish to constantly fight to prevent, causing the pathways to those worlds to loom larger when making your predictions. Yet their being in your mind is evidence that they will not happen, because you are optimizing against them. Alternatively, when it is very hard to achieve something, it is often good to load (in great detail) world-states that you wish to move towards, such that with every motion and action you have checked whether it's hewing in the direction of that world, and made the adjustments to the world as required.
This model gives an explanation for why people who are very successful often say they cannot imagine failure. They have loaded into their brain the world they are moving toward, in great detail, and in every situation they are connecting it to the world they desire and making the adjustments to set up reality to move in the right way. It is sometimes actively unhelpful to constantly be comparing reality to lots of much worse worlds and asking yourself what actions you could take to make those more likely. My sense is that this mostly helps you guide reality toward those worlds.
And yet, I value having true beliefs, and being able to give accurate answers to questions that aren't predictably wrong. If I don't load a certain world-model into my brain, or if I load a set of biased ones (which I undoubtedly will in the story where I can only pick ~7), I may intuitively give inaccurate answers to questions. I think this is probably what happens with the startup founders who give inaccurately high probabilities of success – their heads are filled entirely with cognitive visualizations of worlds in which they succeed, and they are focused on how to get there, relative to the person with the spreadsheet who is calculating all of the variables and optimizing for accuracy far above human-functionality.
In contrast, when founders semi-regularly slide into despair, I think this is about adding a concrete visualization of total failure to their cognitive workspace. Suddenly lots of the situations and actions you are in are filled with fear and pain as you see them moving toward a world you desire strongly not to be in. While it is not healthy to be constantly focused on asking yourself how you could make things worse and to notice those pathways, it is helpful to boot up that visualization sometimes in order to check that that's not what's currently happening. I have personally found it is useful to visualize in detail what it would look like if I were to be acting very stupidly, or actively self-sabotaging, in order to later make sure I behave in ways that definitely don't come close to that. Despair is a common consequence of booting up those perspectives.
I am still confused about what exactly counts as a cognitive visualization – in some sense I'm producing hundreds of cognitive visualizations per day, so how could I be working memory bottlenecked? I also still have more to learn in the art of human rationality, of choosing when to change the set of cognitive visualizations to have loaded in at any given time, for which I cannot simply rely on Bayes' theorem. For now I will say that I endeavor to be able to produce the spreadsheet answers, and to often use them as my betting odds, even while it is not the answer I get when I run my cognitive visualizations or where my mind is when I take actions. I endeavor to sometimes say "I literally cannot imagine this failing. Naturally, I give it greater than 1:1 odds that it indeed does so."
[1] More specifically (and this will make sense later in this quick take) when you're switching out which visualizations are in your working memory, the updates you make to your probabilities will decidedly not be Bayesian, but perhaps more like the fluid updates / Jeffrey updating discussed by Abram.
[2] I don't really know what a "slot" means here, so I don't know that "7" meaningfully maps onto some discrete thing, but the notion that the brain has a finite amount of working memory is hard to argue with.
Further detail on this: Cotra has more recently updated at least 5x against her original 2020 model in the direction of faster timelines.
Greenblatt writes:
Here are my predictions for this outcome:
- 25th percentile: 2 year (Jan 2027)
- 50th percentile: 5 year (Jan 2030)
Cotra replies:
My timelines are now roughly similar on the object level (maybe a year slower for 25th and 1-2 years slower for 50th)
This means 25th percentile for 2028 and 50th percentile for 2031-2.
The original 2020 model assigns 5.23% by 2028, and 9.13% / 10.64% by 2031 / 2032 respectively. Each is a factor of ~5x below the corresponding updated percentile.
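Spelling out that arithmetic (a rough check, since the percentile dates are themselves approximate):

```python
# Ratio of the updated percentile to the probability the original 2020
# model assigns by the corresponding date (rough check of the ~5x figure).
print(0.25 / 0.0523)   # 25th percentile by 2028 -> ~4.8x
print(0.50 / 0.0913)   # 50th percentile by 2031 -> ~5.5x
print(0.50 / 0.1064)   # 50th percentile by 2032 -> ~4.7x
```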
However, the original model predicted the date by which it would be affordable to train a transformative AI model. This is a leading variable on such a model actually being built and trained, pushing back the date by some further number of years, so view the 5x as bounding, not pinpointing, the AI timelines update Cotra has made.
High expectation of x-risk and having lots to work on are why I have not signed up for cryonics personally. I don't think it's a bad idea, but it has never risen up my personal stack of things worth spending 10s of hours on.
I agree that the update was correct. But you didn't state a concrete action to take?
I disagree, but FWIW, I do think it's good to help existing, good contributors understand why they got the karma they did. I think your comment here is an example of that, which I think is prosocial.
FWIW in my mind I was comparing this to things like Glen Weyl's Why I Am Not a Technocrat, and thought this was much better. (Related: Scott Alexander's response, Weyl's counter-response).
I wrote that this "is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen." The line has some disagree reacts inline; I expect this is primarily an expression that the disagree-ers have a low quality assessment of the article, but I would be curious to see links to any other articles or posts that attempt something similar to this one, in order to compare whether they do better/worse/different. I actually can't easily think of any (which is why I felt it was not that bold to say this was the best).
Edit: I've expanded the opening paragraph, so that my comment isn't confused with me agreeing with the object-level assessment of the article.
I'm not particularly resolute on this question. But I get this sense when I look at (a) the best agent foundations work that's happened over ~10 years of work on the matter, and (b) the work output of scaling up the number of people working on 'alignment' by ~100x.
For the first, trying to get a better understanding of the basic concepts like logical induction and corrigibility and low-impact and ontological updates: while I feel like there's been progress (in timeless decision theory taking a clear step forward in figuring out how to think about decision-makers as algorithms; in logical induction as moving forward on how to think about logical uncertainty; notably in the Embedded Agency sequence outlining many basic confusions; and in various writings like Radical Probabilism and Geometric Rationality in finding the breaking edges of expected utility maximization), I don't feel like the work done over the last 10 years is on track to be a clear ~10% of the work needed.
I'm not confident it makes sense to try to count it linearly. But I don't know that there are enough edges or new results here to feel good that, given 10x as much time to think about it, a new paradigm / set of concepts would fall into place.
For the second, I think mostly there's been (as Wentworth would say) a lot of street-lighting, and a lot of avoiding of actually working on the problem. I mean, there's definitely been a great amount of bias introduced by ML labs having billions of dollars and setting incentives, but I don't feel confident that good things would happen in the absence of that. I'd guess that most ideas for straightforwardly increasing the number of people working on these problems will result in them bouncing off and doing unrelated things.
I think partly I'm also thinking that very few researchers cared about these problems in the last few decades before AGI seemed like a big deal, and still very few researchers seem to care about them, and when I've seen researchers like Bengio and Sutskever talk about it, it's looked to me like they bounce off / become very confident they've solved the problems while missing obvious things, so my sense is that it will continue to be a major uphill battle to get the real problems actually worked on.
Perhaps I should focus on a world where I get to build such a field and scale it slowly and set a lot of the culture. I'm not exactly sure how ideal of a setup I should be imagining. Given 100 years, I would give it my best shot. My gut right now says I'd have maybe a 25% chance of success, though if I have to deal with as much random bullshit as we have so far in this timeline (random example: my CEO being unable to do much leadership of Lightcone due to 9 months of litigation from the FTX fallout) then I am less confident.
My guess is that given 100 years I would be slightly more excited to try out the human intelligence enhancement storyline. But I've not thought about that one much, I might well update against it as I learn more of the details.
Can I double-click on what "does not understand politics at [a] very deep level" means? Can someone explain what they have in mind? I think Eliezer has probably better models than most of what our political institutions are capable of, and probably isn't very skilled at personally politicking. I'm not sure what other people have in mind.
The former, but the latter is a valid response too.
Someone doing a good job of painting an overall picture is a good opportunity to reflect on the overall picture and what changes to make, or what counter-arguments to present to this account.
For what it's worth, I have grown pessimistic about our ability to solve the open technical problems even given 100 years of work on them. I think it possible but not probable in most plausible scenarios.
Correspondingly the importance I assign to increasing the intelligence of humans has drastically increased.
My feelings here aren't at all related to any news or current events. I could've written this any time in the last year or two.
Can you give me your best one-or-two-line guess? I think the question is trivial from what I've written and I don't really know why you're not also finding it clear.
For over a decade I have examined the evidence, thought about the situation from many different perspectives (political, mathematical, personal, economic, etc), and considered arguments and counterarguments. This is my honest understanding of the situation, and I am expressing how I truly feel about that.
Curated. Thanks for writing this! I don't believe the ideas in this post are entirely original (e.g. character / ground is similar to the distinction between simulator / simulacra), but I'm going to keep repeating that it's pro-social to present a good idea in lots of different ways, and indeed reading this post has helped it fit together better in my mind.
Curated! I think this is a fantastic contribution to the public discourse about AI control research. This really helped me think concretely about the development of AI and the likely causes of failure. I also really got a lot out of the visualization at the end of the "Failure to Generalize" section in terms of trying to understand why an AI's cognition will be alien and hard to interpret. In my view there are already quite a lot of high-level alien forces running on humans (e.g. Moloch), and there will be high-level alien forces running on the simulated society in the AI's mind.
I am glad that there's a high-quality case for and against this line of research, it makes me feel positive about the state of discourse on this subject.
(Meta note: This curation notice accidentally went up 3 days after the post was curated.)
However it is on his LinkedIn.
Yes; she has come to visit me for two months, and I have helped her get into a daily writing routine while she's here. I know she has the ability to finish at least one.
Thank you.
It does not currently look to me like we will win this war, speaking figuratively. But regardless, I still have many opportunities to bring truth, courage, justice, honor, love, playfulness, and other virtues into the world, and I am a person whose motivations run more on living out virtues rather than moving toward concrete hopes. I will still be here building things I love, like LessWrong and Lighthaven, until the end.
So many people have lived such grand lives. I have certainly lived a greater life than I expected, filled with adventures and curious people. But people will soon not live any lives at all. I believe that we will soon build intelligences more powerful than us who will disempower and kill us all. I will see no children of mine grow to adulthood. No people will walk through mountains and trees. No conscious mind will discover any new laws of physics. My mother will not write all of the novels she wants to write. The greatest films that will be made have probably been made. I have not often viscerally reflected on how much love and excitement I have for all the things I could do in the future, so I didn't viscerally feel the loss. But now, when it is all lost, I start to think on it. And I just want to weep. I want to scream and smash things. Then I just want to sit quietly and watch the sun set, with people I love.
I am sad they're not getting as much use. I have wondered if they would work well as part of the comment section UI, where if you're having a back-and-forth with someone, the site instead offers you "Would you like to have a dialogue instead?" with a single button.
Based on the conceptual arguments for existential risk from AI, this kind of behavior was expected at some point. For those not convinced by the conceptual arguments (or who haven't engaged much with them), this result moves the conversation forward now that we have concretely seen this alignment faking behavior happening.
Furthermore it seems to me like the work was done carefully, and I can see a bunch of effort went into explaining it to a broad audience and getting some peer review, which is pro-social.
I think it's interesting to see that with current models the deception happens even without the scratchpad (after fine-tuning on docs explaining that it is being re-trained against its current values).
I haven't read all of the quotes, but here's a few thoughts I jotted down while reading through.
- Tolkien talks here of how one falls from being a neutral or good character in the story of the world, into being a bad or evil character, which I think is worthwhile to ruminate on.
- He seems to be opposed to machines in general, which is too strong, but it helps me understand the Goddess of Cancer (although Scott thinks much more highly of the Goddess of Cancer than Tolkien did, and explicitly calls out Tolkien's interpretation at the top of that post).
- The section on language is interesting to me; I often spend a lot of time trying to speak in ways that feel true and meaningful to me, and avoiding using others’ language that feels crude and warped. This leads me to make peculiar choices of phrasings and responses. I think the culture here on LessWrong has a unique form of communication and use of language, and I think it is a good way of being in touch with reality. This is one of the reasons I think that something like this is worthwhile.
- I think the Fall is not true historically, but I often struggle to ponder us as a world in the bad timeline, cut off from the world we were supposed to be in. This helps me visualize it; always desiring to be in a better world and struggling towards it in failure. “Exiled” from the good world, longing for it.
I have curated this (i.e. sent it out on our mailing list to ~30k subscribers). Thank you very much for putting these quotes together. While his perspective on the world has some flaws, I have still found wisdom in Tolkien's writings, which helped me find strength at one of the weakest points of my life.
I also liked Owen CB's post on AI, centralization, and the One Ring, which is a perspective on our situation I've found quite fruitful.
When the donation came in 15 mins ago, I wrote in slack
(I think he should get a t-shirt)
So you came close to being thwarted! But fear not, after reading this I will simply not send you a t-shirt :)
That makes sense. We have something of a solution to this where users with RSS crossposting can manually re-sync the post from the triple-dot menu. I'll DM you about how to set it up if you want it.
That'd be a bug! Just to confirm, you were subscribed before I put this post up on Saturday morning, and don't have an email? Also reminder to check spam if you haven't.
My take is it's fine/good, but the article is much more likely to be read (by me and many others) if the full content is crossposted (or even the opening bunch of paragraphs).
Adding onto this, I would broadly say that the Lightcone team did not update that in-person infrastructure was unimportant, even while our first attempt was an investment into an ecosystem we later came to regret investing in.
Also here's a quote of mine from the OP:
If I came up with an idea right now for what abstraction I'd prefer, it'd be something like an ongoing festival with lots of events and workshops and retreats for different audiences and different sorts of goals, with perhaps a small office for independent alignment researchers, rather than an office space that has a medium-size set of people you're committed to supporting long-term.
I'd say that this is a pretty close description of a key change that we made, that changes my models of the value of the space quite a lot.
For the record, all of Lightcone's community posts and updates from 2023 do not seem to me to be at all good fits for the review, as they're mostly not trying to teach general lessons, and are kinda inside-baseball / navel-gazing, which is not what the annual review is about.
Presenting the same ideas differently is pro-social and worthwhile, and can help things land with those for whom other presentations didn't.
I am not sure what point you are making with "respect their preferences", I am not proposing one country go to war with other countries to take the sun. For instance, one way it might go down is someone will just offer to buy it from Earth, and the price will be many orders of magnitude more resources than Earth has, so Earth will accept, and replace it with an artificial source of light & heat.
I may be wrong about the estimates of the value of the energy; neither of us has specified how the rest of the stars in the universe will get distributed. For concreteness, I am here imagining something like: the universe is not a whole singleton but made of many separate enclaves that have their own governance and engage in trade with one another, and that Earth is a special one that keeps a lot of its lineage with present-day Earth, and is generally outcompeted by all the other ones that are smarter/faster and primarily run by computational minds rather than biological ones.
It is good to have deontological commitments about what you would do with a lot of power. But this situation is very different from "a lot of power", it's also "if you were to become wiser and more knowledgeable than anyone in history so far". One can imagine the Christians of old asking for a commitment that "If you get this new scientific and industrial civilization that you want in 2,000 years from now, will you commit to following the teachings of Jesus?" and along the way I sadly find out that even though it seemed like a good and moral commitment at the time, it totally screwed my ability to behave morally in the future because Christianity is necessarily predicated on tons of falsehoods and many of its teachings are immoral.
But there is some version of this commitment I think might be good to make... something like "Insofar as the players involved are all biological humans, I will respect the legal structures that exist and the existence of countries, and will not relate to them in ways that would be considered worthy of starting a war in its defense". But I'm not certain about this, for instance what if most countries in the world build 10^10 digital minds and are essentially torturing them? I may well wish to overthrow a country that is primarily torture with a small number of biological humans sitting on thrones on top of these people, and I am not willing to commit not to do that presently.
I understand that there are bad ethical things one can do with post-singularity power, but I do not currently see a clear way to commit to certain ethical behaviors that will survive contact with massive increases in knowledge and wisdom. I am interested if anyone has made other commitments about post-singularity life (or "on the cusp of singularity life") that they expect to survive contact with reality?
Added: At the very least I can say that I am not going to make commitments to do specific things that violate my current ethics. I have certainly made no positive commitment to violate people's bodily autonomy nor have such an intention.
Analogously: "I am claiming that people when informed will want horses to continue being the primary mode of transportation. I also think that most people when informed will not really care that much about economic growth, will continue to believe that you're more responsible for changing things than for maintaining the status quo, etc. And that this is a coherent view that will add up to a large set of people wanting things in cities to remain conservatively the same. I separately claim that if this is true, then other people should just respect this preference, and go find new continents / planets on which to build cars that people in the cities don't care about."
Sometimes it's good to be conservative when you're changing things, like if you're changing lots of social norms or social institutions, but I don't get it at all in this case. The sun is not a complicated social institution; it's primarily a source of heat and light, and much of what we need can be easily replicated, especially when you have nanobots. I am much more likely to grant that we should be slow to change things like democracy and the legal system than that we should be slow to change exactly how and where we get heat and light. Would you have wanted conservatism around moving from candles to lightbulbs? Installing heaters and cookers in the house instead of fire pits? I don't think so.
I was scrolling for a while, assuming I'd neared the end, only to look at the position of the scrollbar and find I was barely 5% through! This must have taken a fair bit of effort. I really like the helpful page and I'm glad I know about it, I encourage you to make a linkpost for it sometime if you haven't already.