*Sigh.* Protests last year, barricading this year; I've already mentally prepared myself for someone next year throwing soup at a human-generated painting while shouting about AI. This is the kind of stuff that makes no one in the Valley want to associate with you. It makes the cause look low-status, unintelligent, lazy, and uninformed.
A man asked one of the members of the tribe to find him some kindling so that he might start a fire. A few hours passed, and the second man returned, leading a large elephant.
“I asked for kindling,” said the first.
“Yes,” said the second.
“Where is it?” asked the first, trying to ignore the large pachyderm in the room.
The second gestured at the elephant, grinning.
“That’s an elephant.”
“I see that you are uninformed. You see, elephants are quite combustible, despite their appearance. Once heat reaches the right temperature, its skin, muscles, all of it will burn. Right down to its bones.”
“What is the ignition temperature for an elephant?”
“I don't know; perhaps 300-400°C.”
The first held up two stones.
“This is all I have to start a fire,” he said. “It will only create a few sparks at best… I’m not even sure how I can get it to consistently do that much, given how hard this will be for people thousands of years from now to replicate.”
“That is the challenge.” The second nodded solemnly. “I’m glad you understand the scope of this. We will have to search for ways to generate sparks at 400° so that we can solve the Elephant Kindling Problem.”
“I think I know why you chose the elephant. I think you didn’t initially understand that almost everything is combustible; you only notice that something is combustible once you pay enough attention to it. You looked around the savanna and didn’t see that dry leaves would be far more combustible, and your eyes went immediately to the elephant. Because elephants are interesting. They’re big and have trunks. Working on an Elephant Problem just felt way more interesting than a Dry Leaves Problem, so you focused all of your attention on elephants, using the excuse that elephants are technically combustible, failing to see the elegant beauty in the efficient combustibility of leaves and their low ignition temperature.”
“Leaves might be combustible. But think of how fast they burn out. And how many you would have to gather to start a fire. An elephant is very big. It might take longer to get it properly lit, but once you do, you will have several tons of kindling! You could start any number of fires with it!”
“Would you really have reached these conclusions if you had searched all the possible combustible materials in the savanna, instead of immediately focusing on elephants?”
“Listen, we can’t waste too much time on search. There are thousands of things in the savanna! If we tested the combustibility and ignition temperature of every single one of them, we’d never get around to starting any fires. Are elephants the most combustible things in the Universe? Probably not. But should I waste time testing every possible material instead of focusing on how to get one material to burn? We have finite time and finite resources to search for combustible materials. It’s better to pick one and figure out how to do it well.”
“I still think you only chose elephants because they’re big and interesting.”
“I imagine that ‘big’ and ‘useful as kindling material’ are not orthogonal. We shouldn’t get distracted by the small, easy problems, such as how to burn leaves. These are low-hanging fruit that anyone can pick. But my surveys of the tribe have found that figuring out the combustibility of elephants remains extremely neglected.”
“What about the guy who brought me a giraffe yesterday?”
“A giraffe is not an elephant! I doubt anything useful will ever come from giraffe combustibility. Their necks are so long that they will not even fit inside our caves!”
“What I am saying is that others have brought in big, interesting-looking animals, and tried to figure out how to turn them into kindling. Sure, no one else is working on the Elephant Kindling Problem. But that’s also what the guy with the giraffe said, and the zebra, and the python.”
“Excuse me,” said a third, poking his head into the cave, “but the Python Kindling Problem is very different from the Elephant one. Elephants are too girthy to be useful. But with a python, you can roll it into a coil, which will make it extremely efficient kindling material.”
The second scratched his chin for a moment, looking a bit troubled.
“What if we combined the two?” he asked. “If we wound the python around a leg of the elephant, the heat could be transferred somewhat efficiently.”
“No, no, no,” argued the third. “I agree that combining these two problems might be useful. But it would be far better to just cut the trunk off the elephant and intertwine it with the python. This could be very useful, since elephant hide is very thick and might burn more slowly. This gives us the pros of fast-burning kindling, mixed with a more sustained blaze from the elephant.”
“Might I interject?” said a fourth, who had been watching quietly from the corner but now stepped forward. “I have been hard at work on the Giraffe Kindling Problem, but I think we are actually working on similar things. The main issue has always been the necks. They simply won’t fit inside the cave. We need a solution that works in all edge cases, after all. If it’s raining, we can’t start a fire outside. But if we use the python and the elephant trunk to tie the neck of the giraffe against the rest of its body, we could fit the whole thing in!”
“I think this will be a very fruitful collaboration,” said the second. “While at first it seemed as though we were all working on different problems, it turns out that by combining them, we have found an elegant solution.”
“But we still can’t generate sparks hot enough to combust any of them!” shouted the first. “All you’ve done is make this even more complicated and messy!”
“I am aware it might seem that way to a novice,” said the second. “But we have all gained great knowledge in our own domains. And now it is time for our fields to evolve into a true science. We are not amateurs anymore, simply playing around with fire. We are now establishing expertise, creating sub-domains, arriving at a general consensus on the problem and its underlying structure! To an outsider, it will probably look daunting. But so does every scientific field once it matures. And we will continue to break new ground by standing on the shoulders of elephants!”
“Giraffes,” corrected the fourth.
“Zebras,” answered a fifth.
I strongly doubt we can predict the climate in 2100. An actual prediction would require a model that also incorporates the possibility of nuclear fusion, geoengineering, AGIs altering the atmosphere, etc.
I got into AI at the worst time possible
2023 marks the year AI Safety went mainstream. And though I am happy it is finally getting more attention, and finally has highly talented people who want to work in it, personally it could not have been worse for my professional life. This isn’t a thing I normally talk about, because it’s a very weird thing to complain about. I rarely permit myself to even complain about it internally. But I can’t stop the nagging sensation that if I had just pivoted to alignment research one year sooner than I did, everything would have been radically easier for me.
I hate saturated industries. I hate hyped-up industries. I hate fields that constantly make the news and gain mainstream attention. This was one of the major reasons I had to leave the crypto scene: it had become so saturated with attention, grift, and hype that I found it completely unbearable. Alignment and AGI were the kind of thing almost no one even knew about, and fewer still talked about, which made them ideal for me. I was happy with the idea of doing work that might never be appreciated or understood by the rest of the world.
Since 2015, I had planned to get involved, but at the time I had no technical experience or background. So I went to college, majoring in Computer Science. Working on AI and what would later be called “Alignment” was always the plan, though. I remember having a shelf in my college dorm, which I used to represent all my life goals and priorities: AI occupied the absolute top. My mentality, however, was that I needed to establish myself enough, and earn enough money, before I could transition to it. I thought I had all the time in the world.
Eventually, I got frustrated with myself for dragging my feet for so long. So in Fall 2022, I quit my job in cybersecurity, accepted a grant from the Long Term Future Fund, and prepared to spend a year skilling up to do alignment research. I felt fulfilled. Where my brain would normally nag me about not doing enough, or about how I should be working on something more important, I finally felt content. I was finally doing it. I was finally working on the Extremely Neglected, Yet Conveniently Super Important Thing™.
And then ChatGPT came out two months later, and even my mother was talking about AI.
If I believed in fate, I would say it seems as though I was meant to enter AI and Alignment during the early days. I enjoy fields where almost nothing has been figured out. I hate prestige. I embrace the weird, and hate any field that starts worrying about its reputation. I’m not a careerist. I can imagine many alternative worlds where I got in early, maybe ~2012 (I’ve been around the typical lesswrong/rationalist/transhumanist group for my entire adult life). I’d get in, start to figure out the early stuff, identify some of the early assumptions and problems, and then get out once 2022/2023 came around. It’s the weirdest sensation to feel like I’m too old to join the field now, and also to feel as though I’ve been part of it for 10+ years. I’m pretty sure I’m just 1-2 degrees of separation from literally everyone in the field.
The shock of the field/community going from something almost no one was talking about to something even the friggin’ Pope is weighing in on is something I think I’m still trying to adjust to. Some part of me keeps hoping the bubble will burst, AI will “hit a wall”, marking the second time in history Gary Marcus was right about something, and I’ll feel as though the field has enough space to operate in again. As it stands now, I don’t really know what place it has for me. It is no longer the Extremely Neglected, Yet Conveniently Super Important Thing™, but instead just the Super Important Thing. When I was briefly running an AI startup (don’t ask), I was getting 300+ applicants for every role we were hiring for. We never once advertised the roles, but applicants somehow found them anyway, and applied in swarms. Whenever I get a rejection email from an AI Safety org, I’m usually told they receive somewhere in the range of 400-700 applications for every given job. That’s, at best, a 0.25% chance of acceptance: substantially lower than Harvard’s. It becomes difficult for me to answer why I’m still trying to get into such an incredibly competitive field, when literally doing anything else would be easier. “It’s super important” is not exactly holding up as a defense at this point, since there are obviously other talented people who would get the job if I didn’t.
I think it’s that I could start to see the shape of what I could have had, and what I could have been. It’s vanity. Part of me really loved the idea of working on the Extremely Neglected, Yet Conveniently Super Important Thing™. And now I have a hard time going back to working on literally anything else, because anything else could never hope to be remotely as important. And at the same time, despite the huge amount of new interest in alignment, and the huge influx of new talent wanting to contribute to it, somehow the field still feels undersaturated. In a market-driven field, we would see jobs and roles grow as overall interest in working in the field did, since interest normally correlates with growth in consumers/investors/etc. Except we’re not seeing that. Despite everything, by most measurements, there still seem to be fewer than 1,000 people working on it full-time, maybe as few as ~300, depending on what you count.
So I oscillate between thinking I should just move on to other things, and thinking I absolutely should be working on this at all costs. It’s made worse by sometimes briefly doing temp work for an underfunded org, sometimes getting to the final interview stage at big labs, and overall thinking that doing the Super Important Thing™ is just around the corner… and for all I know, it might be. It’s really hard for me to tell if this is a situation where it’s smart to be persistent, or if persistence is dragging me ever closer to permanent unemployment, endless poverty/homelessness/whatever-my-brain-is-feeling-paranoid-about… which isn’t made easier by the fact that, if the AI train does keep going, my previous jobs in software engineering and cybersecurity will probably not be coming back.
Not totally sure what I’m trying to get out of writing this. Maybe someone has advice about what I should be doing next. Or maybe, after a year of my brain nagging me each day about how I should have gotten involved in the field sooner, I just wanted to admit that: despite wanting the world to be saved, despite wanting more people to be working on the Extremely Neglected, Yet Conveniently Super Important Thing™, some selfish, not-too-bright, vain part of me is thinking “Oh, great. More competition.”
It probably began training in January and finished around early April. And they're now doing evals.
My birds are singing the same tune.
Going to the moon
Say you’re really, really worried about humans going to the moon. Don’t ask why, but you view it as an existential catastrophe. And you notice people building bigger and bigger airplanes, and warn that one day, someone will build an airplane so big, and so fast, that it veers off course and lands on the moon, spelling doom. Some argue that going to the moon takes intentionality. That you can’t accidentally create something capable of going to the moon. But you say “Look at how big those planes are getting! We’ve gone from small fighter planes, to bombers, to jets in a short amount of time. We’re on a double exponential of plane tech, and it’s just a matter of time before one of them lands on the moon!”
Contra Scheming AIs
There is a lot of attention on mesaoptimizers, deceptive alignment, and inner misalignment. I think a lot of this falls under the umbrella of "scheming AIs": AIs that either become dangerous during training and escape, or else play nice until humans make the mistake of deploying them. Many have spoken about the lack of any indication that there's a "homunculus-in-a-box", and this is usually met with arguments that we wouldn't see such things manifest until AIs are at a certain level of capability, and at that point, it might be too late, with comparisons made to owl eggs or baby dragons. My perception is that getting something like a "scheming AI" or "homunculus-in-a-box" isn't impossible, and we could (and might) develop the means to do so in the future, but that it's a very, very different kind of thing from current models (even at a superhuman level), and that it would take a degree of intentionality.
"To the best of my knowledge, Vernor did not get cryopreserved. He has no chance to see the future he envisioned so boldly and imaginatively. The near-future world of Rainbows End is very nearly here... Part of me is upset with myself for not pushing him to make cryonics arrangements. However, he knew about it and made his choice."
I agree that consequentialist reasoning is an assumption, and am divided about how consequentialist an ASI might be. Training a non-consequentialist ASI seems easier, and the way we train them seems to actually be optimizing against deep consequentialism (they're rewarded for getting better with each incremental step, not for something that might only be better 100 steps in advance). But, on the other hand, humans don't seem to have been heavily optimized for this either*, yet we're capable of forming multi-decade plans (even if sometimes poorly).
*Actually, the Machiavellian Intelligence Hypothesis does seem to involve optimizing for consequentialist reasoning (if I attack Person A, how will Person B react, etc.).
This is the kind of political reasoning that I've seen poisoning LW discourse lately, and that gets in the way of having actual discussions. Will posits essentially an impossibility proof (or, in its more humble form, a plausibility proof). I humor this being true, and state why the implications, even then, might not be what Will posits. The premise is that alignment is not enough, so I operate on the premise of an aligned ASI, since the central claim is that "even if we align ASI it may still go wrong". The premise grants that the duration of time it is aligned is long enough for the ASI to act in the world (it seems mostly timescale-agnostic), so I operate on that premise as well. My points are not about what is most likely to actually happen, the possibility of less-than-perfect alignment being dangerous, the AI having other goals it might seek over the wellbeing of humans, or how we should act based on the information we have.
I'm not sure who you are debating here, but it doesn't seem to be me.
First, I mentioned that this was an analogy, and mentioned that I dislike even using them, which I hope implied I was not making any kind of assertion of truth. Second, "works to protect" was not intended to mean "control all relevant outcomes of". I'm not sure why you would get that idea, but that certainly isn't what I think of first if someone says a person is "working to protect" something or someone. Soldiers defending a city from raiders are not violating control theory or the laws of physics. Third, the post is premised on "even if we created an aligned ASI", so I was working with the premise that the ASI could be aligned in a way that it deeply cared about humans. Fourth, I did not assert that it would stay aligned over time... the story was all about the ASI not remaining aligned. Fifth, I really don't think control theory is relevant here. Killing yourself to save a village does not break any laws of physics, and is well within most humans' control.
My ultimate point, in case it was lost, was that if we as human intelligences could figure out an ASI would not stay aligned, an ASI could also figure it out. If we, as humans, would not want this (and the ASI was aligned with what we want), then the ASI presumably would also not want this. If we would want to shut down an ASI before it became misaligned, the ASI (if it wants what we want) would also want this.
None of this requires disassembling black holes, breaking the laws of physics, or doing anything outside of that entity's control.
I've heard of many cases of this with EA Funds (including my own). My impression is that they only had one person working full-time managing all three funds (no idea if this has changed since I applied or not).
An incapable man would kill himself to save the village. A more capable man would kill himself to save the village AND ensure no future werewolves are able to bite villagers again.
Though I tend to dislike analogies, I'll use one, supposing it is actually impossible for an ASI to remain aligned. Suppose a villager cares a whole lot about the people in his village, and routinely works to protect them. Then, one day, he is bitten by a werewolf. He goes to the shaman, who tells him that when the full moon next rises, he will turn into a monster and kill everyone in the village. His friends, his family, everyone. And he will no longer know himself. He is told there is no cure, and that the villagers would be unable to fight him off. He will grow too strong to be caged, and cannot be subdued or controlled once he transforms. What do you think he would do?
MIRI "giving up" on solving the problem was probably a net negative to the community, since it severely demoralized many young, motivated individuals who might have worked toward actually solving the problem. An excellent way to prevent pathways to victory is by convincing people those pathways are not attainable. A positive, I suppose, is that many have stopped looking to Yudkowsky and MIRI for the solutions, since it's obvious they have none.
I don't think this is the case. For a while, the post with the highest karma was Paul Christiano explaining all the reasons he thinks Yudkowsky is wrong.
Fair. What would you call a "mainstream ML theory of cognition", though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis).
It tends not to get talked about much today, but there was the PDP (connectionist) camp of cognition vs. the camp of "everything else" (including ideas such as symbolic reasoning, etc.). The connectionist camp created a rough model of how they thought cognition worked, a lot of cognitive scientists scoffed at it, Hinton tried putting it into actual practice, but it took several decades for it to be demonstrated to actually work. I think a lot of people were confused by why the "stack more layers" approach kept working, but under the model of connectionism, this is expected. Connectionism is kind of too general to make great predictions, but it doesn't seem to allow for FOOM-type scenarios. It also seems to favor agents as local-optima satisficers, instead of greedy utility maximizers.
"My position is that there are many widespread phenomena in human cognition that are expected according to my model, and which can only be explained by the more mainstream ML models either if said models are contorted into weird shapes, or if they engage in denialism of said phenomena."
Such as? I wouldn't call Shard Theory mainstream, and I'm not saying mainstream models are correct either. On humans trying to be consistent decision-makers, I have some theories about that (some of which are probably wrong). But judging by how bad humans are at it, and how much they struggle with it, they probably weren't strongly optimized biologically to do it. But memetically, developing ideas for consistent decision-making was probably useful, so we have software that makes use of our processing power to be better at this, even if the hardware is very stubborn at times. But even that isn't optimized too hard toward coherence. Someone might prefer pizza to hot dogs, but they probably won't always choose pizza over any other food just because they want their preference ordering of food to be consistent. And, sure, maybe what they "truly" value is something like health, but I imagine even if they didn't, they still wouldn't do this.
But all of this is still just one piece on the Jenga tower. And we could debate every piece in the tower, and even get to 90% confidence that every piece is correct... but if there are more than 10 pieces in the tower, the whole thing is still probably going to come crashing down. (This is the part where I feel obligated to say, even though I shouldn't have to, that your tower being wrong doesn't mean "everything will be fine and we'll be safe", since the "everything will be fine" towers are looking pretty Jenga-ish too. I'm not saying we should just shrug our shoulders and embrace uncertainty. What I want is to build non-Jenga-ish towers.)
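To put a rough number on the tower intuition above (purely illustrative, assuming ten independent pieces each held at 90% confidence):

$$P(\text{whole tower holds}) = 0.9^{10} \approx 0.35$$

So even generous per-piece confidence leaves the full stack more likely than not to fall.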
This isn't what I mean. It doesn't mean you're not using real things to construct your argument, but that doesn't mean the structure of the argument reflects something real. Like, I kind of imagine it looking something like a rationalist Jenga tower, where if one piece gets moved, it all crashes down. Except, by referencing other blog posts, it becomes a kind of Meta-Jenga: a Jenga tower composed of other Jenga towers. Like "Coherent decisions imply consistent utilities". This alone I view to be its own mini Jenga tower. This is where I think String Theorists went wrong. It's not that humans can't, in theory, form good reasoning based on other reasoning based on other reasoning and actually arrive at the correct answer, it's just that we tend to be really, really bad at it.
The sort of thing that would change my mind: there's some widespread phenomenon in machine learning that perplexes most, but is expected according to your model, and any other model either doesn't predict it as accurately, or is more complex than yours.
I dislike the overuse of analogies in the AI space, but to use your analogy, I guess it's like you keep assigning a team of engineers to build a car, and two possible things happen. Possibility One: the engineers are actually building car engines, which gives us a lot of relevant information for how to build safe cars (torque, acceleration, speed, other car things), even if we don't know all the details of how to build a car yet. Possibility Two: they are actually just building soapbox racers, which doesn't give us much information for building safe cars, but also means that just tweaking how the engineers work won't suddenly give us real race cars.
If progress in AI is continuous, we should expect record levels of employment. Not the opposite.
My mentality is that if progress in AI doesn't have a sudden, foom-level jump, and if we all don't die, most of the fears of human unemployment are unfounded... at least for a while. Say we get AIs that can replace 90% of the workforce. The productivity surge from this should dramatically boost the economy, creating more companies, more trading, and more jobs. Since AIs can be copied, they would be cheap, abundant labor. This means anything a human can do that an AI still can't becomes a scarce, highly valued resource. Companies with thousands or millions of AI instances working for them would likely compete for human labor, because making more humans takes much longer than making more AIs. Then say, after a few years, AIs are able to automate 90% of the remaining 10%. That creates even more productivity, more economic growth, and even more jobs. This could continue for a few decades. Eventually, humans will be rendered completely obsolete, but by that point most of them might be so filthy rich that they won't especially care.
This doesn't mean it'll all be smooth sailing or that humans will be totally happy with this shift. Some people probably won't enjoy having to switch to a new career, only for that new career to be automated away after a few years, and then having to switch again. This will probably be especially true for people who are older, those who have families, those who want a stable and certain future, etc. None of this will be made easier by the fact that it'll probably be hard to tell when true human obsolescence is on the horizon, so some might be in a state of perpetual anxiety, and others will be in constant denial.
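To make the compounding described two paragraphs up concrete, here is a minimal sketch; the 90%-per-wave figure is just the illustrative number from that paragraph, not a forecast:

```python
# Illustrative only: each automation "wave" takes over 90% of the tasks
# humans were still performing at the start of that wave.
human_share = 1.0
for wave in range(1, 5):
    human_share *= 0.10
    print(f"after wave {wave}: humans perform {human_share:.3%} of all tasks")
# The human niche shrinks geometrically but stays nonzero for several waves;
# with waves spaced years apart, that is the "few decades" window described above.
```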
I think my main problem with this is that it isn't based on anything. Countless times, you just reference other blog posts, which reference other blog posts, which reference nothing. I fear a whole lot of people thinking about alignment are starting to decouple themselves from reality. It's starting to turn into the AI version of String Theory. You could be correct, but given the enormous number of assumptions your ideas are stacked on (and that even a few of those assumptions being wrong leads to completely different conclusions), the odds that you're even in the ballpark of correct seem low.
At first I strong-upvoted this, because I thought it made a good point. However, upon reflection, that point is making less and less sense to me. You start by claiming current AIs provide nearly no data for alignment, that they are in a completely different reference class from human-like systems... and then you claim we can get such systems with just a few tweaks? I don't see how you can go from a system that, you claim, provides almost no data for studying how an AGI would behave, to suddenly having a homunculus-in-a-box that becomes superintelligent and kills everyone. Homunculi seem really, really hard to build. By your characterization of how different actual AGI is from current models, it seems this would have to be fundamentally architecturally different from anything we've built so far. Not some kind of thing that would be created by near-accident.
Contra One Critical Try: AIs are all cursed
I don't feel like making this a whole blog post, but my biggest source of optimism for why we won't need to one-shot an aligned superintelligence is that anyone who's trained AI models knows that AIs are unbelievably cursed. What do I mean by this? I mean even the first quasi-superintelligent AI we get will have so many problems and so many exploits that taking over the world will simply not be possible. Take a "superintelligence" that only had to beat humans at the very constrained game of Go, which is far simpler than the real world. Everyone talked about how such systems were unbeatable by humans, until some humans used a much "dumber" AI to find glaring holes in Leela Zero's strategy. I expect that, in the far more complex "real world", a superintelligence will have even more holes, and even more exploits: a kind of "Swiss cheese superintelligence". You can say "but that's not REAL superintelligence", and I don't care, and the AIs won't care. But it's likely the thing we'll get first. Patching all of those holes, and finding ways to make such an ASI sufficiently not-cursed, will also probably mean a better understanding of how to stop it from wanting to kill us, if it wanted to kill us in the first place. I think we can probably get AIs that are sufficiently powerful in a lot of human domains, and can probably even self-improve, and still be cursed. The same way we have AIs with natural language understanding, something once thought to be a core component of human intelligence, that are still cursed. A cursed ASI is a danger for exploitation, but it's also an opportunity.
I'm kind of surprised this has almost 200 karma. This feels much more like a blog post on substack, and much less like the thoughtful, insightful new takes on rationality that used to get this level of attention on the forum.
Why would it matter if they notice or not? What are they gonna do? EMP the whole world?
I think you're missing the point. If we could establish that all important information had been extracted from the original, would you expect humans to then destroy the original or allow it to be destroyed?
My guess is that they wouldn't. Which I think means practicality is not the central reason why humans do this.
If we could somehow establish that the information from the original had been extracted, would you expect humans to then destroy the original, or allow it to be destroyed?
Can humans become Sacred?
On 12 September 1940, the entrance to the Lascaux Cave was discovered on the La Rochefoucauld-Montbel lands by 18-year-old Marcel Ravidat when his dog, Robot, investigated a hole left by an uprooted tree (Ravidat would embellish the story in later retellings, saying Robot had fallen into the cave.)[8][9] Ravidat returned to the scene with three friends, Jacques Marsal, Georges Agnel, and Simon Coencas. They entered the cave through a 15-metre-deep (50-foot) shaft that they believed might be a legendary secret passage to the nearby Lascaux Manor.[9][10][11] The teenagers discovered that the cave walls were covered with depictions of animals.[12][13] Galleries that suggest continuity, context or simply represent a cavern were given names. Those include the Hall of the Bulls, the Passageway, the Shaft, the Nave, the Apse, and the Chamber of Felines. They returned along with the Abbé Henri Breuil on 21 September 1940; Breuil would make many sketches of the cave, some of which are used as study material today due to the extreme degradation of many of the paintings. Breuil was accompanied by Denis Peyrony, curator of Les eyzies (Prehistory Museum) at Les Eyzies, Jean Bouyssonie and Dr Cheynier.
The cave complex was opened to the public on 14 July 1948, and initial archaeological investigations began a year later, focusing on the Shaft. By 1955, carbon dioxide, heat, humidity, and other contaminants produced by 1,200 visitors per day had visibly damaged the paintings. As air condition deteriorated, fungi and lichen increasingly infested the walls. Consequently, the cave was closed to the public in 1963, the paintings were restored to their original state, and a monitoring system on a daily basis was introduced.
Lascaux II, an exact copy of the Great Hall of the Bulls and the Painted Gallery was displayed at the Grand Palais in Paris, before being displayed from 1983 in the cave's vicinity (about 200 m or 660 ft away from the original cave), a compromise and attempt to present an impression of the paintings' scale and composition for the public without harming the originals.[10][13] A full range of Lascaux's parietal art is presented a few kilometres from the site at the Centre of Prehistoric Art, Le Parc du Thot, where there are also live animals representing ice-age fauna.[14]
The paintings for this site were duplicated with the same type of materials (such as iron oxide, charcoal, and ochre) which were believed to be used 19,000 years ago.[9][15][16][17] Other facsimiles of Lascaux have also been produced over the years.
They have also created additional copies, Lascaux III, Lascaux IV, and Lascaux V.
Consequently, the cave was closed to the public in 1963, the paintings were restored to their original state, and a monitoring system on a daily basis was introduced.
“I actually find it overwhelmingly hopeful, that four teenagers and a dog named Robot discovered a cave with 17,000-year-old handprints, that the cave was so overwhelmingly beautiful that two of those teenagers devoted themselves to its protection. And that when we humans became a danger to that cave’s beauty, we agreed to stop going. Lascaux is there. You cannot visit.”
-John Green
People preserve the remains of Lucy and work hard to preserve old books; the Mona Lisa is protected behind bulletproof glass and is not up for sale.
What is the mechanistic reason for this? There are perfect copies of these things, yet humans go to great lengths to preserve the original. Why is there the Sacred?
They have created copies of Lascaux, yet still work hard to preserve the original. Humans cannot enter. They get no experience of joy from visiting. It is not for sale. Yet they strongly desire to protect it, because it is the original, and no other reason.
Robin Hanson gave a list of characteristics of the Sacred, some of which I find promising:
Sacred things are highly (or lowly) valued. We revere, respect, & prioritize them.
Sacred is big, powerful, extraordinary. We fear, submit, & see it as larger than ourselves.
We want the sacred “for itself”, rather than as a means to get other things.
Sacred makes us feel less big, distinct, independent, in control, competitive, entitled.
Sacred quiets feelings of: doubts, anxiety, ego, self-criticism, status-consciousness.
We get emotionally attached to the sacred; our stance re it is oft part of our identity.
We desire to connect with the sacred, and to be more associated with it.
Sacred things are sharply set apart and distinguished from the ordinary, mundane.
Re sacred, we fear a slippery slope, so that any compromise leads to losing it all.
If we can understand the Sacred, it seems like a concept that probably wouldn’t reduce to a simple utility function, and something that wouldn’t break out-of-distribution. A kind of Sacred Human Value Shard, something that protects our part of the manifold.
Suppose you've got a strong goal agnostic system design, but a bunch of competing or bad actors get access to it. How does goal agnosticism stop misuse?
This was the question I was waiting to be answered (since I'm already basically on board with the rest of it), but I was disappointed you didn't have a more detailed answer. Keeping this out of incompetent/evil hands perpetually seems close to impossible. It seems this goes back to needing a maximizer-type force in order to prevent such misuse from occurring, and then we're back to square one with the classic alignment problem of hitting a narrow target for a maximizing agent.
Overall, a very good read, well-researched and well-reasoned.
I created a simple Google Doc for anyone interested in joining/creating a new org to put down their names, contact info, what research they're interested in pursuing, and what skills they currently have. Over time, I think a network can be fostered, where relevant people start forming their own research agendas, and then begin building their own orgs/getting funding. https://docs.google.com/document/d/1MdECuhLLq5_lffC45uO17bhI3gqe3OzCqO_59BMMbKE/edit?usp=sharing
But it's also an entire School of Thought in Cognitive Science. I feel like DL is the method, but it comes without the understanding that these techniques are based on well-thought-out, mechanistic rules for how cognition fundamentally works, potentially building toward a unified theory of cognition and behaviour.
I don't have an adequate answer for this, since these models are incomplete. But the way I see it is that these people had a certain way of mathematically reasoning about cognition (Hinton, Rumelhart, McClelland, Smolensky), and that reasoning created most of the breakthroughs we see today in AI (backprop, multi-layered models, etc.). It seems trying to utilize that model of cognition could give rise to new insights about the questions you're asking, attack the problem from a different angle, or help create a grounded paradigm for alignment research to build on.
My answer is a bit vague, but I would say that the current DL curriculum tells you how these things work, but it doesn't go into the reasoning about cognition that allowed these ideas to exist in the first place.
You could say it "predicted" everything post-AlexNet, but it's more that it created the fundamental understanding for everything post-AlexNet to exist in the first place. It's the mathematical model of cognition that all of modern AI is built on. This is how we got backpropagation, "hidden" layers, etc.
If you want to try to start doing this, or if you know someone who does, let me know. I've noticed there are a lot of things in AIS that people say they'd like to see, but then nothing happens.
I guess my biggest doubt is that a DL-based AI could run interpretability on itself. Large NNs seem to "simulate" a larger network to represent more features, which results in most of the weights being in superposition. I don't see how a network could reflect on itself, since it seems that would require an even greater network (which would then require an even greater network, and so on). I don't see how it could eat its own tail, since only interpreting parts of the network would not be enough. It would have to interpret the whole.
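For readers who haven't seen the "more features than dimensions" picture this comment leans on, here is a minimal numpy sketch of the standard toy setup (the sizes are arbitrary, and nothing here is specific to the parent post):

```python
import numpy as np

# Toy superposition: pack more sparse "features" than dimensions into one vector.
rng = np.random.default_rng(0)
n_features, n_dims = 64, 16

# One random, roughly-orthogonal direction per feature, living in n_dims.
directions = rng.normal(size=(n_features, n_dims))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse input: only two features active at once.
x = np.zeros(n_features)
active = rng.choice(n_features, size=2, replace=False)
x[active] = 1.0

# Compress into n_dims, then read each feature back out by dot product.
hidden = x @ directions            # shape (n_dims,)
readout = directions @ hidden      # shape (n_features,)

print("readout at the active features:", np.round(readout[active], 2))
print("mean |readout| everywhere else:", np.round(np.abs(np.delete(readout, active)).mean(), 2))
# Active features read back near 1; the other 62 read back as smaller
# interference noise. 64 features are being "simulated" in 16 dimensions,
# at the price of noise that grows as more features are active at once.
```

Decoding the packed vector back into per-feature readouts needs the full 64x16 direction matrix, which is the shape of the intuition above: untangling a superposed representation takes something wider than the representation itself.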
The following is a conversation between myself in 2022, and a newer version of myself earlier this year.
On AI Governance and Public Policy
2022 Me: I think we will have to tread extremely lightly here, or, if possible, avoid it completely. One particular concern is the idea of gaining public support. Many countries have an interest in pleasing their constituents, so if executed well, this could be extremely beneficial. However, it runs a high risk of doing far more damage. One major concern is the different mindset needed to conceptualize the problem. Alerting people to the dangers of Nuclear War is easier: nukes have been detonated, the visual image of incineration is easy to imagine and can be described in detail, and they or their parents have likely lived through nuclear drills in school. This is closer to trying to explain to someone the dangers of nuclear war before Hiroshima, before the Manhattan Project, and before even TNT was developed. They have to conceptualize what an explosion even is, not simply imagine an explosion at a greater scale. Most people will simply not have the time or the will to try to grasp this problem, so this runs the risk of having people calling for action on a problem they do not understand, which will likely lead to dismissal by AI Researchers, and possibly to short-sighted policies that don’t actually tackle the problem, or even make the problem worse by having the guise of accomplishment. To make matters worse, there is the risk of polarization. Almost any concern with political implications that has gained widespread public attention runs a high risk of becoming polarized. We are still dealing with the ramifications of well-intentioned, but misguided, early advocates in the Climate Change movement two decades ago, who sowed the seeds for making climate policy part of one’s political identity. This could be even more detrimental than a merely uninformed electorate, as it might push people who had no previous opinion on AI to advocate strongly in favor of capabilities acceleration, and to be staunchly against any form of safety policy. Even if executed with the utmost caution, this does not stop other players from using their own power or influence to hijack the movement and lead it astray.
2023 Me: Ah, Me’22, the things you don’t know! Many of the concerns of Me’22 I think are still valid, but we’re experiencing what chess players might call a “forced move”. People are starting to become alarmed regardless of what we say or do, so steering that in a direction we want is necessary. The fire alarm is being pulled regardless, and if we don’t try to show some leadership in that regard, we risk less informed voices and blanket solutions winning out. The good news is that “serious” people are going on “serious” platforms and actually talking about x-risk. Other good news is that, from current polls, people are very receptive to concerns over x-risk, and it has not yet fallen along divisive lines (concern is roughly equal across different demographics). This is still a difficult minefield to navigate. Polarization could still happen, especially with an Election Year in the US looming. I’ve also been talking to a lot of young people who feel frustrated not having anything actionable to do, and if those in AI Safety don’t show leadership, we might risk (and indeed are already risking) many frustrated youth taking political and social action into their own hands. We need to be aware that EA/LW might have an Ivory Tower problem, and that, even though a pragmatic, strategic, and careful course of action might be better, this might make many feel “shut out” and attempt to steer their own course. Finding a way to make those outside EA/LW/AIS feel included, with steps to help guide and inform them, might be critical to avoiding movement hijacking.
On Capabilities vs. Alignment Research:
2022 Me: While I strongly agree that not increasing capabilities is a high priority right now, I also question whether we risk creating a state of inertia. Within the realm of safety research, there are very few domains that do not risk increasing capabilities. And, while capabilities continue to progress every day, we might risk failing to keep up the speed of safety progress simply because every action risks an increase in capabilities. Rather than a “do no harm” principle, I think counterfactuals need to be examined in these situations, where we must consider whether there is a greater risk if we *don’t* do research in a certain domain.
2023 Me: Oh, oh, oh! I think Me’22 was actually ahead of the curve on this one. This might still be controversial, but I think many got the “capabilities space” wrong. Many AIS-inspired theories that could increase capabilities are for systems that would be safer, more interpretable, and easier to monitor by default. And by not working on such systems, we instead got the much more inscrutable, dangerous models by default, because the more dangerous models are easier. To quote the vape commercials, “safer != safe”, but I still quit smoking in favor of the electronic kind because safer is still at least safer. This is probably a moot point now, though, since I think it’s likely too late to create an entirely new paradigm in AI architectures. Hopefully Me’24 will be happy to tell me we found a new, 100% safe and effective paradigm that everyone’s hopping on. Or maybe he’ll invent it.
[crossposting my reply]
Thank you for taking the time to read and critique this idea. I think this is very important, and I appreciate your thoughtful response.
Regarding how to get current systems to implement/agree to it, I don't think that will be relevant long-term. I don't think the mechanisms current institutions use for control can keep up with AI proliferation. I imagine most existing institutions will still exist, but won't have the capacity to do much once AI really takes off. My guess is that, if AI kills us, it will happen after a slow-motion coup. Not any kind of intentional coup by AIs, but humans just coup'ing themselves because AIs will simply be more useful. My idea wouldn't involve removing or replacing any institutions; they just wouldn't be extremely relevant to it. Some governments might try to actively ban use of it, but those bans would probably be fleeting if the network actually were superior in collective intelligence to any individual AI. If it made work economically more useful for them, they would want to use it. It doesn't involve removing them, or doing much to directly interfere with things they are doing. Think of it this way: recommendation algorithms on social media have an enormous influence on society, institutions, etc. Some try to ban or control them, but most people can still access them if they want to, and no entity really controls them. But no one incorporates the "will of Twitter" into their constitution.
The game board isn't any of the things you mention; I don't think any of them have the capacity to do much to change the board. The current board is fundamentally adversarial, where interacting with it increases the power of other players. We've seen this with OpenAI, Anthropic, etc. The new board would be cooperative, at least at a higher level. How do we make the new board more useful than the current one? My best guess would be the economic advantage of decentralized compute. We've seen how fast the open-source community has been able to make progress. And we've seen how a huge amount of compute gets used doing things like mining bitcoin, even though that compute is wasted on solving math puzzles. Contributing decentralized compute to a collective network could actually have economic value, and I imagine this will happen one way or another, but my concern is that it'll end up being for the worse if people aren't actively trying to create a better system. A decentralized network with no safeguards would probably be much worse than anything a major AI company could create.
"But wouldn't the market be distorted by the fact that if everyone ends up dead, there is nobody left alive to collect their prediction-market winnings?"
This seems to be going back to the "one critical shot" approach, which I think is a terrible idea that won't possibly work in the real world under any circumstances. This would be a progression over time, not a case where an AI goes supernova overnight. This might require slower takeoffs, or at least no foom scenarios. Making a new board that isn't adversarial might mitigate the potential for foom. What I proposed was my first naive approach, and I've since thought that maybe it's the collective intelligence of the system that should be increasing, not a singleton AI being trained at the center. Most members of that collective intelligence would initially be humans, and slowly AIs would become a more and more powerful part of the system. I'm not sure here, though. Maybe there's some third option where there's a foundational model at the lowest layer of the network, but it isn't a singular AI in the normal sense. I imagine a singular AI at the center could give rise to agency, and probably break the whole thing.
"It seems to me that having a prediction market for different alignment approaches would be helpful, but would be VERY far from actually having a good plan to solve alignment."
I agree here. They'd only be good at maybe predicting the next iteration of progress, not a fully scalable solution.
"I feel like we share many of the same sentiments -- the idea that we could improve the general level of societal / governmental decision-making using innovative ideas like better forms of voting, quadratic voting & funding, prediction markets, etc"
This would be great, but my guess is they would progress too slowly to be useful. I don't think mechanism design that has to work through currently existing institutions will happen quickly enough. Technically enforced design might.
I love the idea of shovel-ready strategies, and think we need to be prepared in the event of a crisis. My issue is that even most good strategies seem to deal only with large companies, and don't address the likelihood that such power will fall into the hands of more and more actors.
"If the boxed superintelligence with the ability to plan usage of weapons when authorized by humans, and other boxed superintelligences able to control robotics in manufacturing cells are on humans side, the advantage for humans could be overwhelming"
As I said, I do not expect boxed AIs to be a thing most will do. We haven't seen it, and I don't expect to see it, because unboxed AIs are superior. This isn't how people in control are approaching the situation, and I don't expect that to change.
"keep it relegated to "tool" status, then it might be possible to use such an AI to combat unboxed, rogue AI"
I don't think this is a realistic scenario. You seem to be seeing it as an island of rogue, agentic, "unboxed" AIs in a sea of tool AIs. I think it's much, much more realistic that it'll be the opposite. Most AIs will be unboxed agents because they are superior.
"For example, give it a snapshot of the internet from a day ago, and ask it to find the physical location of rogue AI servers, which you promptly bomb."
This seems to be approaching it from a perspective where people in AIS have taken global control, or where normal people somehow start thinking the way they do. This is not realistic. This is not the world we live in. This is not how the people in control think.
"You should keep talking about this because if an AI needs military access to dominate the world, then the number of potentially dangerous AI goes from the hundreds of thousands or millions to a few dozen, run by large countries that could theoretically be kept in line with international treaties."
This is a topic I debated putting on the list, but resolved not to; regardless of treaties, I don't think humans have any real control at that point. I don't even expect a rogue AI to have to forcefully coup humans. I expect us to coup ourselves. We might have figureheads occupying official positions, such as "President"/"CEO"/etc., but I don't think humans will have much control over their own destiny by that point. Large-scale coordination I don't think will be possible by then. I did remove it, because it seems more uncertain than the others listed.
"You should keep talking about this because it changes how many AI's you'd have to monitor as active threats."
Who is doing this monitoring? What is their power to act on such threats? Despite recent interest in AI Risk from "serious people", I don't think it's at all realistic that we'll see anything like this.
"If 99.9% of AGI are dangerous, there might be warning shots that can be used to pre-emptively ban AGI research in general."
A probability distribution over how many AIs are dangerous is probably useful. I don't think knowing whether specific AIs are dangerous or non-dangerous will be, because I expect widespread proliferation. In terms of political ways out of the problem, I agree that some kind of crisis or "warning shot" is the most realistic situation where that might happen. But there have to actually be warning shots. Explaining thought experiments probably won't matter. And, if that happens, I don't think it would be a good idea to debate which specific AIs might kill you; instead, just call for a sweeping ban on all AI.
Are you familiar with Constellation's Proof of Reputable Observation? This seems very similar.
The following is a conversation between myself in 2022, and a newer version of me earlier this year.
On the Nature of Intelligence and its "True Name":
2022 Me: This has become less obvious to me as I’ve tried to gain a better understanding of what general intelligence is. Until recently, I always made the assumption that intelligence and agency were the same thing. But General Intelligence, or G, might not be agentic. Agents that behave like RL agents may only be narrow forms of intelligence, without generalizability. G might be something closer to a simulator. From my very naive understanding of neuroscience, it could be that we (our intelligence) are not agentic, but just simulate agents. In this situation, the prefrontal cortex not only runs simulations to predict its next sensory input, but might also run simulations to predict inputs from other parts of the brain. In this scenario, “desire” or “goals” might be simulations to better predict narrowly-intelligent agentic optimizers. Though the simulator might be myopic, I think this prediction model allows for non-myopic behavior, in a similar way to how GPT has non-myopic behavior despite only trying to predict the next token (it has an understanding of where a future word “should” be within the context of a sentence, paragraph, or story). I think this model of G allows for the appearance of intelligent goal-seeking behavior, long-term planning, and self-awareness. I have yet to find another model for G that allows for all three. The True Name of G might be Algorithm Optimized To Reduce Predictive Loss.
2023 Me: Interesting, Me’22, but let me ask you something: you seem to think this majestic ‘G’ is something humans have but other species do not, and then you name the True Name of ‘G’ to be Algorithm Optimized To Reduce Predictive Loss. Do you *really* think other animals don’t do this? How long is a cat going to survive if it can’t predict where it’s going to land? Or where the mouse’s trajectory is heading? Did you think it was all somehow hardcoded in? But cats can jump up on tables, and those weren’t in the ancestral environment; there’s clearly some kind of generalized form of prediction occurring. Try formulating that answer again, but taboo “intelligence”, “G”, “agent”, “desire”, and “goal”. I think the coherence of it breaks down.
Now, what does Me’23 think? Well, I’m going to take a leaf from my own book, and try to explain what I think without the words mentioned above. There are predictive mechanisms in the Universe that can run models of what things in the Universe might do in future states. Some of these predictive mechanisms are more computationally efficient than others. Some will be more effective than others. A more effective and efficient predictive mechanism, with a large input of information about the Universe, could be a very powerful tool. If taken to the theoretical (not physical) extreme, that predictive mechanism would hold models of all possible future states. It could then, by accident or intention, guide outcomes toward certain future states over others.
2022 Me: According to this model, humans dream because the simulator is now making predictions without sensory input, gradually creating a bigger and bigger gap from reality. Evidence to support this comes from sensory-deprivation tanks, where humans, despite being awake, have dream-like states. I also find it interesting that people who have schizophrenia, which involves hallucinations (as dreams do), can tickle themselves. Most people can be tickled by others, but not by themselves. But normal people on LSD can do this, and can also have hallucinations. My harebrained theory is that something is going wrong when initializing new tokens for the simulator, which results in hallucinations from the lack of correction from sensory input, and a less strong sense of self because of a lack of correction from RL agents in other parts of the brain.
2023 Me: I don’t want to endorse crackpot theories from Me’22, so I’m just going to speak from feelings and fuzzy intuitions here. I will say hallucinations from chatbots are interesting. When getting one to hallucinate, it seems to be kind of “making up reality as it goes along”. You say it’s a Unicorn, and it will start coming up with explanations for why it’s a Unicorn. You say it told you something you never told it, and it will start acting as though it did. I have to admit it does have a strange resemblance to dreams. I find myself in New York, but remember that I had been in Thailand that morning, and I hallucinate a memory of boarding a plane. I wonder where I got the plane ticket, and I hallucinate another memory of buying one. These are not well-reasoned arguments, though, so I hope Me’24 won’t beat me up too much about them.
2022 Me: I have been searching for how to test this theory. One interest of mine has been mirrors.
2023 Me: Don’t listen to Me’22 on this one. He thought he understood something, but he didn’t. Yes, the mirror thing is interesting in animals, but it’s probably a whole different thing, not the same thing.
Thanks, finding others who are working on similar things is very useful. Do you know if the reading group is still active, or if they are working on anything new?
Given that I don't know when Schelling Day is, I doubt its existence.
If we're being realistic, this kind of thing would only get criminalized after something bad actually happened. Until then, too many people will think "omg, it's just a Chatbot". Any politician calling for it would get made fun of on every Late Night show.
Yeah, all the questions over the years of "why would the AI want to kill us" could be answered with "because some idiot thought it would be funny to train an AI to kill everyone, and it got out of hand". Unfortunately, stopping everyone on the internet from doing things isn't realistic. It's much better to never let the genie out of the bottle in the first place.
I'm currently thinking that if there are any political or PR resources available to orgs (AI-related or EA) now is the time to use them. Public interest is fickle, and currently most people don't seem to know what to think, and are looking for serious-seeming people to tell them whether or not to see this as a threat. If we fail to act, someone else will likely hijack the narrative, and push it in a useless or even negative direction. I don't know how far we can go, or how likely it is, but we can't assume we'll get another chance before the public falls back asleep or gets distracted (the US has an election next year, so most discourse will then likely become poisoned). This is especially important for those in the community who are viewed as "serious people" or "serious organizations" (lots of academic credentials, etc.)
Yeah, since the public currently doesn't have much of an opinion on it, trying to get the correct information out seems critical. I fear some absolutely useless legislation will get passed, and everyone will just forget about it once the shock-value of GPT wears off.
Unfortunately, he could probably get this published in various journals, with only minor edits being made.