Non-invasive BCI, as in, getting ChatGPT suggestions and ads in your thoughts?
I was mostly thinking in terms of the computer-to-brain direction represented by psychoactive audio-visual modalities. Yes, this might be roughly on par with taking strong psychedelics or strong stimulants, but with different safety-risk trade-offs (better ability to control the experience and fewer physical side effects, if things go well, but with the potential for a completely different set of dangers if things go badly).
Yes, this might not necessarily be something one wants to dump on the world at large, at least not until select groups have more experience with it, and the safety-risk tradeoffs are better understood...
And if you're thinking about offering this tech to AI researchers only, that doesn't seem feasible. As soon as it exists, people will know they can make bank by commercializing it and someone will.
Well, the spec exists today (and I am sure this is not the only spec of this kind). All that separates this from reality is the willingness of a small group of people to get together and experiment with inexpensive ways of building it.
Given that people are very sluggish in converting theoretically obvious things to reality as long as those things are not in the mainstream (cf. it being clear that ReLUs must be great since at least the year 2000 paper in Nature, with the field ignoring them till 2011), I don't know if "internal use tools" would cause all that much excitement.
If you need a more contemporary example related to Cyborgism, Janus' Loom is a super-powerful interface to ChatGPT-like systems; it exists, it's open source, etc. And so what? How many people even know about it, never mind use it?
Of course, if people start advertising it along the lines of, "hey, take this drug-free full-strength psychedelic trip", yeah, then it'll become popular ;-)
in a way unfixable by humans
I do think collaborative human-AI mindset instead of adversarial mindset is the only feasible way, cf. my comments on Ilya Sutskever thinking in https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a.
If we want to continue thinking in terms of "us-vs-them", the game has been lost already.
Well, I am not a "leading AI researcher" (at least, not in the sense of having a track record of beating SOTA results on consensus benchmarks, which is how one usually defines that notion), but I am one of those who are trying to change the situation with non-invasive BCI not being more popular. My ability to have any effect on this, of course, does depend on whether I have enough coding and organizational skills.
But one of the points of the dialogue for me was to see if that might actually be counter-productive from the viewpoint of AI existential safety (and if so, whether I should reconsider).
And in this sense, some particular pitfalls to watch for were identified during this dialogue (whereas I was previously mostly worrying about direct dangers to participants stemming from tight coupling with actively and not always predictably behaving electronic devices, even if the coupling is via non-invasive devices, so I was spending some time minding those personal safety aspects and trying to establish a reasonable personal safety protocol).
That's only after it becomes strongly self-improving on its own. Until then, human AI researchers and human AI engineers are necessary, and they have to connect to computers via some kind of interface, so a "usual interface" vs "high-end non-invasive BCI + high-end augmented reality" is a trade-off leading AI organizations will need to consider.
Of course, any realistic or semi-realistic AI existential safety plan includes tight collaboration between human AI safety researchers and advanced AI systems (e.g. Cyborgism community on LessWrong, or various aspects of OpenAI evolving AI existential safety plans, e.g. https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a). So here the question of interfaces is also important.
But nothing is foolproof. AI systems can "run away from the humans interacting with them", or "do something on the side without coordinating with humans", or humans and their motivations can be unfavorably modified in the process of their tight interactions with AIs, and so on...
Yet, it does not look like the world has any chance to stop, especially with the new "AI Alliance" announcement (it seems we no longer have a chance to stop even radical open-sourcing of advanced systems, see e.g. https://arstechnica.com/information-technology/2023/12/ibm-meta-form-ai-alliance-with-50-organizations-to-promote-open-source-ai/, which is very different from "Meta is unilaterally open-sourcing some very advanced models, and perhaps it can be stopped", never mind a coordinated industry-wide slowdown across the board)... So we should try our best to figure out how to improve the chances, assuming that timelines might be rather short...
Thanks a lot!
(I think I am so used to Markdown that I am not handling correctly the fact that the dialogues seem to be in LessWrong Docs format. Is this a matter of what a given dialogue is set to in the beginning, or are dialogues inherently LessWrong Docs only?)
Thank you!
Success :-)
Yes, thanks!
I am familiar with some work from MIRI about that which focuses on the Löbian obstacle, e.g. this 2013 paper: Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.
But I should look closer at other parts of those MIRI papers; perhaps there might be some material which actually establishes some invariants, at least for some simple, idealized examples of self-modification...
We should not merge with AI [...] Create digital humans.
I have been confused for a while about
- boundary between humans merging with AI and digital humans (can these approaches be reliably differentiated from each other? or is there a large overlap?)
- why digital humans would be a safer alternative than the merge
So this seems like it might be a good occasion to ask you to elaborate on this...
(this is an answer to gwern's answer above posted 3 hours ago, https://www.lesswrong.com/posts/tbJdxJMAiehewGpq2/impressions-from-base-gpt-4?commentId=uKxyTDuvrKEZzSpBc; replying to the answers at LW does not seem to work correctly at the moment; I am told that a pull request with a fix is pending.)
Yes, this is very interesting.
However, this is a very risk-oriented presentation.
It would be nice to have a more balanced picture. "Capabilities are not always bad", to say the least...
We would like to have competent science and engineering assistance, and more. We need to solve cancer and aging, and we are not going to do that successfully without strong assistance from AIs...
However, the risk and safety aspects are very important...
I do hope, in this sense, that Ilya will continue to lead their existential safety effort. His thoughts about that, as in https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a and as in his thinking that we should try to make it so that super-smart AIs are imprinted on us as parents are imprinted on their children seem to be really on target; his approach seems to me to be one of the most promising.
Which is why I am particularly anxious to see him continue to lead the OpenAI existential safety effort. He seems to be thinking high-quality thoughts about AI existential safety, he is a scientist of the highest caliber, and it would be good to have him near the leading capability effort, focusing on the existential safety aspects...
So, to summarize, I think the key upside of this dialogue is a rough preliminary sketch of a bridge between the formalism of ethicophysics and how one might hope to use it in the context of AI existential safety.
As a result, it should be easier for readers to evaluate the overall approach.
At the same time, I think the main open problem for anyone interested in this (or in any other) approach to AI existential safety is how well it holds up under recursive self-improvement.
Both powerful AIs and ecosystems of powerful AIs have inherently very high potential for recursive self-improvement (which might not be unlimited, and might encounter various thresholds at which it saturates, at least for some periods of time, but which is nevertheless likely to result in a period of rapid changes, where not only the capabilities, but the nature of the AI systems in question, their architecture, algorithms, and, unfortunately, values, might change dramatically).
So, any approach to AI existential safety (this approach, or any other possible approach) needs to be eventually evaluated with respect to this likely rapid self-improvement and the various self-modifications it entails.
Basically, is the coming self-improvement trajectory completely unpredictable, or could we hope for some invariants to be preserved? Specifically, could we find some invariants which are both feasible to preserve during rapid self-modification and likely to result in outcomes we would consider reasonable?
E.g. if the resulting AIs are mostly "supermoral", can we just rely on them taking care that their successors and creations are "supermoral" as well, or are extra efforts on our part required to make this more likely? We would probably want to look at the "details of the ethicophysical dynamics" closely in connection with this, rather than just relying on high-level "statements of hope"...
It is related in spirit, yes...
I think when Dana Scott was first doing this kind of "asymmetric topology" in the late 1960s/early 1970s, in some of his constructions he did focus on bases which were like the rational numbers, and then it's really similar in spirit...
(And when I started to work with his formalism in the mid-1980s and early 1990s, I also focused on those bases, because it was easier to think that way, it was less abstract that way...)
Yes, I think this is a very interesting feature of your formalism. These "love" and "hate" are "abstract counters", their relationship with subjective feelings is complicated.
But this might be the correct way to capture this "ethicophysics" (it is a frequent situation in modern physics that there is some tension between naive intuition and the correct theory (starting with relativistic space-time and such)).
Let's interact, try to work together a bit, think together, this might be quite fruitful :-) Perhaps I'll actually earn a co-authorship, we'll see :-)
This monotonicity (non-decrease in accumulated love and hate) is interesting (it resembles motifs from Scott topology used in denotational semantics).
And this decomposition into positive and negative components which evolve monotonically does resemble motifs in some of my math scribblings...
So, about the decomposition into positive and negative components which evolve monotonically:
So, basically, if one considers the real numbers, one can define a strange non-Hausdorff topology on them, so that the continuous transformations are monotonically non-decreasing functions which are "continuous on the left", and the open sets are open rays pointing upward. There is also a dual space with open sets being open rays pointing downward (I am thinking in terms of a vertical real line, with positive numbers above and negative numbers below). They have quasi-metrics as distances, ReLU(y-x) and ReLU(x-y), so that going in one direction accumulates the usual distance on the meter, while going in the opposite direction accumulates zero (like a toll bridge charging toll only in one direction).
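To make the "toll bridge" picture concrete, here is a minimal sketch (my own illustration, not taken from Scott's work or from the paper) of the two quasi-metrics on the real line:

```python
# The two quasi-metrics described above: each one only "charges" for movement
# in one direction, like a toll bridge collecting toll one way only.

def relu(x):
    return max(x, 0.0)

def d_up(x, y):
    """Quasi-metric charging only for upward movement: ReLU(y - x)."""
    return relu(y - x)

def d_down(x, y):
    """Dual quasi-metric charging only for downward movement: ReLU(x - y)."""
    return relu(x - y)

assert d_up(1.0, 3.0) == 2.0 and d_down(1.0, 3.0) == 0.0   # going up
assert d_up(3.0, 1.0) == 0.0 and d_down(3.0, 1.0) == 2.0   # going down
# Symmetrizing the pair recovers the usual distance |x - y|.
assert d_up(1.0, 3.0) + d_down(1.0, 3.0) == abs(1.0 - 3.0)
```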
One of the most interesting mathematical structures in this sense comes from interval numbers, but there is a bit of a twist to those interval numbers: one might even want to allow "partially contradictory interval numbers", and then the math becomes more straightforward. It's probably best to share a few pages I scribbled on this 10 years ago: https://anhinga.github.io/brandeis-mirror/PartiallyInconsistentIntervalNumbers.pdf
(Eventually this ended up as a part of Section 4 of this "sandwich paper" (where Section 4 is the "math filling" of the sandwich): https://arxiv.org/abs/1512.04639)
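As a rough illustration of what I mean by "partially contradictory interval numbers" (a toy sketch of my own reading, not code from those notes or from the arXiv paper): an interval is represented by its two bound components, and combining information is allowed to push the lower bound above the upper bound instead of raising an error.

```python
# Toy sketch: intervals whose lower bounds only move up and whose upper bounds
# only move down as information accumulates; "partially contradictory" states
# (lower > upper) are permitted rather than treated as errors.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float

    def join(self, other: "Interval") -> "Interval":
        # Combining information can only tighten the interval.
        return Interval(max(self.lower, other.lower), min(self.upper, other.upper))

a = Interval(0.0, 10.0)
b = Interval(3.0, 7.0)
c = Interval(8.0, 9.0)
print(a.join(b))   # Interval(lower=3.0, upper=7.0)
print(b.join(c))   # Interval(lower=8.0, upper=7.0) -- partially contradictory
```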
Great!
Yes, it often requires a conversation; taking a few turns to discuss the situation and create enough context for it...
The difference between GPT-4 and GPT-3.5 is rather drastic. GPT-4 might still be the only "next generation" system available for public use in the world today (although, to be fair, I have not explored the new Anthropic models), and I think it is the first system which often seems to actually demonstrate "true understanding", whatever the nature of that phenomenon might be.
Let's do another iteration with your comment as the prompt:
GPT-4:
To correctly handle the situation where `cmd` expects filenames as arguments and those files are actually streams of data from subprocesses, you can use the `/dev/fd` filesystem on Unix-like systems. This special filesystem allows you to treat file descriptors as file paths, which is essentially what Bash's process substitution does.
Here's how you can modify the Python script to achieve this:
- Use subprocesses to handle the download and decompression, just like before.
- Instead of writing to named pipes, pass the file descriptors of these subprocesses directly to `cmd`.
Here's the revised script:
```python
import subprocess
import os

def run_pipeline(path1, path2, pathOut):
    # Start downloading the first file and pipe it to gunzip
    dl1 = subprocess.Popen(["aws", "s3", "cp", path1, "-"], stdout=subprocess.PIPE)
    gunzip1 = subprocess.Popen(["gunzip"], stdin=dl1.stdout, stdout=subprocess.PIPE)

    # Start downloading the second file and pipe it to gunzip
    dl2 = subprocess.Popen(["aws", "s3", "cp", path2, "-"], stdout=subprocess.PIPE)
    gunzip2 = subprocess.Popen(["gunzip"], stdin=dl2.stdout, stdout=subprocess.PIPE)

    # Wait for the pipes to be ready
    dl1.stdout.close()
    dl2.stdout.close()

    # Run the command with file descriptors
    cmd = subprocess.Popen(["cmd", "-1", f"/dev/fd/{gunzip1.stdout.fileno()}", "-2", f"/dev/fd/{gunzip2.stdout.fileno()}"], stdout=subprocess.PIPE)

    # Pipe cmd's output to gzip and then upload
    gzip = subprocess.Popen(["gzip"], stdin=cmd.stdout, stdout=subprocess.PIPE)
    subprocess.run(["aws", "s3", "cp", "-", pathOut], stdin=gzip.stdout)

    # Cleanup
    gunzip1.stdout.close()
    gunzip2.stdout.close()
    gzip.stdout.close()

# Example usage
run_pipeline("s3://path/to/file1", "s3://path/to/file2", "s3://path/to/output")
```
In this script:
* `gunzip1.stdout.fileno()` and `gunzip2.stdout.fileno()` give the file descriptors of the gunzip processes.
* These descriptors are converted to file paths using `/dev/fd/`.
* `cmd` is then invoked with these file paths as arguments, making it read from the streams.
This approach replicates the behavior of Bash's process substitution in Python, allowing `cmd` to read from the streams as if they were files. Note that this method is specific to Unix-like systems that support the `/dev/fd` filesystem.
(But, of course, if one would really like an API like `pipeline = subprocess.Pipeline()`, then perhaps it is possible to implement a `Pipeline` class which would do exactly what you describe in the example. Or, perhaps, a Pythonic wrapper which has a reasonable user-facing syntax, but converts it into the shell call above, would be even better...)
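For example, a purely hypothetical sketch of such a wrapper (the `Pipeline` class and its methods are illustrative names I am making up, not an existing API; it just assembles a bash command line with process substitution and hands it to `bash -c`):

```python
# Hypothetical sketch: a thin Pythonic wrapper that builds a bash pipeline
# string with process substitution, rather than handling /dev/fd in Python.
import subprocess

class Pipeline:
    def process_substitution(self, shell_fragment):
        # Returns a <( ... ) fragment usable as a filename-like argument.
        return "<(" + shell_fragment + ")"

    def run(self, *argv):
        # Let bash do the actual process substitution.
        return subprocess.run(["bash", "-c", " ".join(argv)], check=True)

# Hypothetical usage with the same `cmd` and S3 paths as in the example above:
p = Pipeline()
in1 = p.process_substitution("aws s3 cp s3://path/to/file1 - | gunzip")
in2 = p.process_substitution("aws s3 cp s3://path/to/file2 - | gunzip")
p.run("cmd", "-1", in1, "-2", in2, "| gzip | aws s3 cp - s3://path/to/output")
```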
Yes, so, just like in my example, the first round of conversation is nearly useless. Let's see if the second iteration is any better (it does turn to named pipes):
ME (without triple quotes, but here I am including them for better formatting):
Unfortunately, this is concatenating the two downloads and providing them to `cmd` on its stdin:
```python
# Run the command with the output of the two gunzip processes as inputs
cmd = subprocess.Popen(["cmd", "-1", "-", "-2", "-"],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```
What we'd like to happen instead is to process those two downloads by cmd in parallel.
GPT-4:
To process two downloads in parallel and provide them as separate inputs to a command using Python's `subprocess` module, you can employ named pipes (FIFOs). Named pipes can be created using the `os.mkfifo` method, which allows two independent data streams for the two downloads.
Here's a revised version of the script that uses named pipes for parallel processing:
```python
import subprocess
import os
import threading

def stream_to_fifo(fifo_name, command):
    with subprocess.Popen(command, stdout=subprocess.PIPE) as process:
        with open(fifo_name, 'wb', 0) as fifo:
            for line in process.stdout:
                fifo.write(line)

def run_pipeline(path1, path2, pathOut):
    fifo1 = "/tmp/fifo1"
    fifo2 = "/tmp/fifo2"

    # Create named pipes
    os.mkfifo(fifo1)
    os.mkfifo(fifo2)

    # Start the download and gunzip processes, streaming to FIFOs
    threading.Thread(target=stream_to_fifo, args=(fifo1, ["aws", "s3", "cp", path1, "-", "|", "gunzip"])).start()
    threading.Thread(target=stream_to_fifo, args=(fifo2, ["aws", "s3", "cp", path2, "-", "|", "gunzip"])).start()

    # Run the command with the FIFOs as inputs
    with open(fifo1, 'rb') as input1, open(fifo2, 'rb') as input2:
        cmd = subprocess.Popen(["cmd", "-1", "-", "-2", "-"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

        # Create threads to avoid blocking
        def stream_to_cmd(input_stream):
            for line in input_stream:
                cmd.stdin.write(line)
            input_stream.close()

        threading.Thread(target=stream_to_cmd, args=(input1,)).start()
        threading.Thread(target=stream_to_cmd, args=(input2,)).start()

        # Close cmd's stdin after the inputs are done
        cmd.stdin.close()

        # Pipe cmd's output to gzip and then upload
        gzip = subprocess.Popen(["gzip"], stdin=cmd.stdout, stdout=subprocess.PIPE)
        subprocess.run(["aws", "s3", "cp", "-", pathOut], stdin=gzip.stdout)

    # Cleanup
    os.remove(fifo1)
    os.remove(fifo2)

# Example usage
run_pipeline("s3://path/to/file1", "s3://path/to/file2", "s3://path/to/output")
```
This script creates two named pipes and starts two threads to handle the downloads and the decompression. The outputs of these processes are directed to the named pipes. The main command reads from these pipes. The use of threads ensures that the reading and writing to the command's stdin do not block each other.
This approach is more complex than the initial script but allows the parallel processing of the two data streams as required. As before, error handling and cleanup are simplified for demonstration purposes and should be robustly implemented for production use.
I got curious whether I could get a good answer from GPT-4. Not a "clean, ready-to-use" answer, but something which would point me in the right direction.
Generally, I was having a good experience getting software help out of it. For example, I wanted to understand how to transform nested dictionaries in an immutable fashion, so that JAX autodiff would be able to take gradients through the code, and after exchanging 4 questions and answers I got a good idea of how to do it.
Here is that dialog (the first answer it gave was nearly useless, but it got much better with further conversation): https://chat.openai.com/share/86553014-a143-4b3c-8602-7d9b5e017343
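To give a sense of the kind of immutable transformation I had in mind, here is a minimal sketch of my own (not the code from that dialog), using `jax.tree_util` so that gradients flow through a nested dict:

```python
# Transforming a nested dict immutably with jax.tree_util, so that
# jax.grad can differentiate through the transformation.
import jax
import jax.numpy as jnp

params = {"layer1": {"w": jnp.ones((2, 2)), "b": jnp.zeros(2)},
          "layer2": {"w": jnp.ones((2, 1)), "b": jnp.zeros(1)}}

def loss(p):
    # tree_map builds a new nested dict instead of mutating the old one
    scaled = jax.tree_util.tree_map(lambda x: 2.0 * x, p)
    return sum(jnp.sum(leaf ** 2) for leaf in jax.tree_util.tree_leaves(scaled))

grads = jax.grad(loss)(params)  # gradients come back in the same nested-dict shape
```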
So, I asked GPT-4 for the initial suggestions here: https://chat.openai.com/share/30dd8073-b85f-4a43-a298-a5a0c2169433.
Unfortunately, I didn't know enough to evaluate if it started to move in the right direction. But it did suggest something different from what's in the post, namely using the `Popen` class from the `subprocess` module. I would be curious to know if this seems to be moving in the right direction...
ReLU
Indeed.
This monotonicity (non-decrease in accumulated love and hate) is interesting (it resembles motifs from Scott topology used in denotational semantics).
And this decomposition into positive and negative components which evolve monotonically does resemble motifs in some of my math scribblings...
I am trying to think about the informal meaning of op(a,b) on page 8.
Am I correct that we impose a condition that l(a,b) and h(a,b) are always non-negative? And that their derivatives can't be below 0, so that l(a,b) and h(a,b) are monotonically non-decreasing with time?
Indeed.
That would be pretty non-trivial work, though, since dissipative physics is not Hamiltonian, so it is likely to require different techniques.
It does make an interesting read so far (5 pages), a rewarding experience.
Reading this text does require some open-mindedness. For example, a reader might firmly believe that using the term "soul" is unacceptable, or a reader might firmly believe that the term "soul" means an entity having a "first-person experience". So a reader needs to be ready to set these kinds of firm beliefs aside temporarily (only while reading this paper) in order to "grok" this model.
So far, the only thought which occurred to me is that not only conventional love and hate, but also love and hate as defined in the text, tend to dissipate with time; basically, one can't store an accumulated positive or negative emotion without letting it dissipate. But for the purposes of this model we might nevertheless want to require those to be abstract quantities which don't dissipate (this is what the reasoning about not currently existing actors seems to require). So "love" and "hate" as defined in this paper seem to correspond to abstract meters counting what has been accumulated, without dissipating.
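To illustrate the distinction I have in mind (a toy sketch of my own, not anything from the paper): compare an emotion-like quantity that dissipates with an abstract meter that only counts up.

```python
# Toy illustration: an accumulated quantity with and without dissipation.
def simulate(inputs, decay=0.0, dt=1.0):
    """Euler-step x' = input - decay * x; with decay=0 the meter never goes down."""
    x, trajectory = 0.0, []
    for u in inputs:               # u >= 0: non-negative "love input" at each step
        x += dt * (u - decay * x)
        trajectory.append(round(x, 3))
    return trajectory

inputs = [1.0, 1.0, 0.0, 0.0, 0.0]
print(simulate(inputs, decay=0.5))  # dissipating: rises, then decays back toward 0
print(simulate(inputs, decay=0.0))  # abstract meter: monotonically non-decreasing
```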
Thanks, that helps. The text was tempting, but without a hint of how
The theory has two key advantages: it squares nicely with most human moral intuitions, and it is amenable to rather straightforward computations that a computer could easily perform if told to.
might actually be achieved, the undertaking of reading it seemed too formidable.
With this comment it's now feasible to try to understand the details.
b) does not solve it without a lot of successful work on multipolar safety (it's almost the equivalent of giving nuclear weapons to lots of people, making them widely accessible; and yes, of giving out gain-of-function lab equipment too)
a) is indeed very reasonable, but we should keep in mind that an upgrade is a potentially stronger impact than any psychoactive drug, a potentially stronger impact than even the most radical psychedelic experiences. Here the usual "AI alignment problem" one is normally dealing with is replaced by the problem of conservation of one's values and character.
In fact these problems are closely related. The most intractable part of AI safety is what happens when an AI ecosystem starts to rapidly recursively self-improve, perhaps with significant acceleration. We might have current members of the AI ecosystem behave in a reasonably safe and beneficial way, but would future members (or the same members after they self-improve) behave safely, or would "a sharp left turn" happen?
Here it is the same problem for a rapidly improving and changing "enhanced human": would that person continue to maintain the original character and values while undergoing radical changes and enhancements, or would drastic new realizations (potentially more radical than any psychedelic revelations) lead to unpredictable revisions of that original character and values?
It might be the case that it's easier to smooth these changes for a human (compared to an AI), but success is not automatic by any means.
It's a mixture of reasons...
But, first of all, a lot of people (not just people in OpenAI) love Sam on the personal level, that's very clear, and they love both what he is doing (with OpenAI, with Helion, with Retro Biosciences), and how he is presenting himself, what he is saying, his demeanor, and so on.
The next key factor was that any outcome besides Sam's return would have damaged the company and the situation a lot at the worst possible moment, when the company had a clear lead, was riding a huge wave of success, had the absolute best models, and so on. They all understood how crucial the role Sam played in all that was, and how crucial his role would be in the future too. So they were making the strongest possible play to prevent any outcome besides Sam's return. They expected to win, they were playing to maximize the chances of winning, and they did not expect to lose and then have to decide if they really wanted to join MSFT (both having to join MSFT and having to stay in a semi-destroyed OpenAI would be bad compared to what they had).
But out of the factors listed, 1+2+3+(4 for many of them, not for all)+5+(6 for some of them)+(7, not so much being afraid of "imploding", but more afraid of becoming a usual miserable corporate place, where one drags oneself to work instead of enjoying one's work)
Yes; nevertheless, the S-risk and X-risk problems don't go away. There are humans who like causing suffering. There are humans advocating for human extinction (and some of them might act on that, given the capabilities). There are humans who are ready to fight wars with weapons which might cause extinction, or who would be ready to undertake projects which might cause extinction or widespread suffering.
Stepping back, we know that Eliezer was very much against anthropomorphic superintelligences in 2011. He thought we needed much higher levels of safety ("provably friendly AI", which would not be possible with something as messy as human-like systems). Since then he strongly updated towards pessimism regarding our chances to create beneficial artificial superintelligence, and he arrived at the conclusion that our chances with biological superintelligence might be higher.
But it would be good to try to articulate what are the reasons for our chances with biological superintelligence to be higher.
One aspect is that we do have an intuition that biology-based systems are likely to self-improve more slowly, and thus would have more time to ponder solutions to various issues as they get smarter. So they might be not superintelligent, but just very smart, for quite a while, and during that period they would decide what to do next. Another aspect is that biology-based systems are more likely to be automatically sentient, and their sentience is more likely to be at least somewhat similar to ours, so even if things go badly initially, the chances of having a lot of value in the future lightcone are higher, because it is more likely that there would be first-person experiencers.
But it would be good to pause and think whether we are sure. Also speaking of
Brain implants/brain-computer interfaces: Devices under development by companies such as Neuralink, Kernel, Openwater, and Meta’s Reality Labs. Could hypothetically enhance human intelligence.
these devices can also lead to hybrid human-AI systems, and that might be a more technologically likely route. The hybrid system becomes smarter, both because its biological part works better, and because of tight coupling with an AI thinking part. In some cases of BCI use, it might be difficult to distinguish between straight human intelligence enhancement and the creation of a hybrid thinker. We might want to ponder whether this is a desirable route. (I personally find this route very attractive for a number of reasons, but the safety issues along this route are quite acute as well.)
You may pick your metaphysical axioms as they were revealed to you in a dream and they'll be as good as anything.
But that's not arbitrary at all. That probably reflects some deep subconscious intuitions which are not arbitrary.
And these kinds of intuitive updates happen first, before philosophical reflections on the meta-level.
But then we are the type of people inclined to philosophically reflect on the meta-level about all this. One can argue whether these reflections make any sense or not, we'll still continue to reflect on the meta-level once in a while and we'll try to apply some non-rigorous approximate reasoning, since fully rigorous reasoning is not available.
In fact, this dialog between us is an example of this kind of meta-level reflection.
Sometimes I think about those metaphysical sets of axioms as "philosophical coordinate systems", and one of those "philosophical coordinate systems" might feel more convenient at a given moment, and another one might feel more convenient at another moment, depending on how reality looks...
When I think about this philosophically, I don't think about one of them being "really true", and others not being "really true". Instead, in recent years I tend to think about a multiverse, with me moving between branches of reality, between alternative realities, with those realities being governed by different systems of axioms and having somewhat different phenomenologies.
But if I get a feel that those metaphysical axioms no longer fit without contorting things too much, I easily replace them...
This is even easier for me, because I have the current "default set" of those axioms, the one which I intuitively rely upon when I don't do any special efforts, and the alternative sets of axioms, which I think about when I ponder all this philosophically.
I am very much aware that there is no "objective way" to choose among those sets of axioms, and that, moreover, the "true set of axioms" might not even be among the candidate sets I am aware of.
But that does not in any way prevent me from letting one of the sets of axioms I am aware of replace my current "default set of axioms" if my intuition starts suggesting that the other set of axioms fits better. That happens way before I ponder this kind of shift in my axioms philosophically and reflect on it.
So, in one period of my life I might feel materialistic, and I might live accordingly, and in a different period of my life I might feel "connected to a higher entity", and I would live accordingly, and in yet another period I feel particularly agnostic and I would stay on a meta-level and focus on how I am not really sure...
But that's not how one operates in the world.
One knows that in principle anything can be a hallucination, and that only very rare events have true certainty (and perhaps none do, because how can one be sure that the information is genuinely new), but going by this, one would hardly be able to operate a car or do anything remotely risky, because anything one sees could be a hallucination.
Instead one is just doing what feels intuitively reasonable, occasionally pausing to ponder all this.
So, here it is the same, one is leaning towards what feels intuitively reasonable, occasionally pausing to ponder all this at the meta-level.
And, for example, if one is agnostic in principle but has one dominant world view, so that one of these priors is large and the other priors are small, and things which happen feel very weird, this is a good reason to make one's dominant prior smaller (and hence make the other priors larger).
everyone will merrily go on believing what they already do, whatever happens.
I certainly have been updating my personal views in this sense rather drastically.
So, empirically, "believing what they already do" does not seem to universally hold.
I don't think the change is arbitrary at all. The change is guided by my intuition about all this. Let's see if we can formalize this a bit.
Let's say... let's consider a "multinomial approximation to agnosticism", where one takes a finite number of mutually exclusive possibilities and assigns some non-zero priors to them.
Then one estimates how likely an experience seems to be, conditional on a particular world view. If the experience seems less likely than the likelihood of that particular world view being true according to one's current prior, then one adjusts that prior down somewhat. One does this for each world view in one's set of mutually exclusive possibilities.
And if some priors go down, then other priors go up, because one still wants them to sum to 1. And the priors for worldviews particularly compatible with this experience eat up most of this increase.
(I understand that what I am doing here is very crude, a true Bayesian should be able to do better. But I think it looks likely that one can make all this more precise and create an epistemologically reasonable procedure for adjusting one's priors here.)
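For what it's worth, here is a minimal sketch of what a more properly Bayesian version of this adjustment could look like (my own illustration; the worldview labels and numbers are made up):

```python
# Bayesian update over a finite set of mutually exclusive worldviews:
# posterior(w) ∝ prior(w) * P(experience | w), then renormalize.
def update_priors(priors, likelihoods):
    posterior = {w: priors[w] * likelihoods[w] for w in priors}
    total = sum(posterior.values())
    return {w: p / total for w, p in posterior.items()}

priors = {"materialism": 0.6, "idealism": 0.2, "simulation": 0.2}
# A "weird" experience judged unlikely under materialism, likelier elsewhere:
likelihoods = {"materialism": 0.01, "idealism": 0.3, "simulation": 0.2}
print(update_priors(priors, likelihoods))
# materialism's prior goes down; the other priors eat up the increase, as described.
```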
What you call "open minded" is "you already believe that a certain kind of experience qualifies as spiritual"
What if you are not sure?
For example, in a purely epistemological sense, I personally believe in the need to maintain a good deal of agnosticism.
Then the question becomes: if one decides that the ground rules are to maintain a good deal of agnosticism, and to admit a variety of world models as possibilities (let's say, to put significant priors on each in a variety of mutually incompatible world models "being actually true"), then how should one move adjusting those priors depending on the evidence?
I presume that the goal is not to push some of those priors to zero, but to change their relative values...
It's a good question.
I don't even know if the clarifying edit is new, since that's Substack and not GitHub or Wikipedia, where one could see the history of edits.
But a new comment was posted 22 hours ago on that substack post, https://mistaketheory.substack.com/p/new-role-agi-alignment-research, perhaps there will be a response from the author (and perhaps the clarifying edit is related to this substack post becoming more known lately, via this post by Zvi or via other channels).
Any kind of mind altering experience for example can obviously just be in one's brain - does not have to be depending on your views on consciousness, but can.
Right. If one follows the "standard mainstream scientific framework", any kind of experience whatsoever (including the experience of hearing and seeing another person talking to you) is in one's brain; the only question is what induces that experience, what the world model behind it is.
So materialism can almost by definition subsume any of these phenomena, even if unknown, as long as they can be empirically observed, tested and predicted.
Yes, but should it? This depends on one's priors. If one has very firm priors in favor of materialism, it's one thing. If one starts from a more agnostic and open-minded position, then it's different.
Cf. https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness which notes that on the qualia-related group of issues people are mostly divided into 2 camps which don't really understand each other:
The basic model I'm proposing is that core intuitions about consciousness tend to cluster into two camps, with most miscommunication being the result of someone failing to communicate with the other camp. For this post, we'll call the camp of boldface author Camp #1 and the camp of quotation author Camp #2.
For example, for people who are in Camp #2 and think it makes sense to talk about qualia, and who are aware that if a materialistic solution to the "hard problem of qualia" is at all possible, it is far in the future and not currently available, it makes much more sense to question materialism.
Whereas I would expect people from Camp #1 to be leaning much more towards materialism...
What kind of experience can absolutely be ruled out as being materialist and admits only a spiritual explanation?
But it does not have to be absolute, right?
Both rationally and pragmatically, this should depend on one's priors.
And speaking in terms of priors, when I talk to someone I usually don't attempt to increase my certainty that the experience is not a hallucination by trying to touch that person.
Realistically, if life is sufficiently infused with spiritual experiences, then they tend to become a part of one's world model. If one's priors are overwhelmingly against that, then one can still focus on materialistic explanations, but if one's priors are sufficiently flexible, one would probably end up with an Occam-razor-like view (if there are too many spiritual experiences in one's life, it's not very parsimonious to have to explain each of them away; it's easier to just integrate them into one's worldview as primary empirical material, just as one does with most empirical material).
Often a different thing happens. One suddenly has a very strong spiritual experience, or a series of those, and one believes rather strongly for a while, because the whole thing is so overwhelming; but after a period of time the spiritual experiences stop being that radical, the memory of them fades, one steps back and reanalyzes them, and one might eventually come to a rather agnostic conclusion about all that (or one might retain a weak belief, or revert to real materialism).
Dave Orr is hiring for a new DeepMind alignment team he is joining to start. Post is light on details, including planned technical details.
He has an edit saying
There’s already a team in London, which will continue to exist and do great work. I’m helping to set up a US branch of that team.
This is not a link to a shared chat; this is a link which is private to you.
In principle, the 4 members of the board had an option which would have looked much better: to call a meeting of all 6 board members, and to say at that meeting, "hey, the 4 of us think we should remove Sam from the company and remove Greg from the board; let's discuss this matter before we take a vote: tell us why we should not do that".
That would have been honorable, and would have looked honorable, and the public relations situation would have looked much better for them.
The reason they had not done that was, I believe, that they did not feel confident they could resist Sam's powers of persuasion, that they thought he would have talked at least one of them out of it.
But then what they did looked very unreasonable from more than one viewpoint:
- Should you take a monumental decision like this, if your level of confidence in this decision is so low that you think you might be talked out of it on the spot?
- Should you destroy someone like this before letting this person defend himself?
They almost behaved as if Sam were already a hostile superintelligent AI who was trying to persuade them to let him out of the box, and who had superpowers of persuasion, and as if the only way to avoid the outcome of letting him out of the box was to close one's ears and eyes and shut him down before he could say anything.
Perhaps this was close to how they actually felt...
Exactly. And then one's estimate of the actual impact depends on whether one believes Sutskever is one of the best people to lead an AI existential safety effort.
If one believes that, and if the outcome is that he ends up less likely to do so in the context of the leading AGI/ASI project, then the impact on safety might be very negative.
If one does not believe that he is one of the best people to lead this kind of effort, then one might think that the impact is not negative.
(I personally believe Ilya's approach is one of the better ones, and it seems to me that he has been in the process of fixing the defects in the original OpenAI superalignment plan, and basically trying to gradually create a better plan, but people's views on that might differ.)
Not quite wrong, just a "post-acquisition corporate structure" already ;-)
Now, a lot depends on how close to reality this tweet will be: https://twitter.com/satyanadella/status/1726516824597258569
I’m super excited to have you join as CEO of this new group, Sam, setting a new pace for innovation. We’ve learned a lot over the years about how to give founders and innovators space to build independent identities and cultures within Microsoft, including GitHub, Mojang Studios, and LinkedIn, and I’m looking forward to having you do the same.
Of course, the feel of being within a big corp can easily kill any creativity whatsoever, but they do a good job of shielding people from that, so they can move very fast, even conditional on short timelines.
Of course, if they actually start competing for SOTA, safety will become a big issue, there is a danger of them being less careful than OpenAI has been so far, so what has happened is not necessarily a win for safety.
I've reviewed someone's draft which suggests this for AI safety (I hope it will be made public soon).
But I've heard rumors that people are trying this... And even from what Janus is saying in the comments/answers to my question https://www.lesswrong.com/posts/tbJdxJMAiehewGpq2/impressions-from-base-gpt-4, I am getting a rather strong suspicion that GPT-4 pretraining has been using some data curation.
From Janus' two comments there I am getting the impression of a non-RLHF'd system which, nevertheless, tends to be much stronger than usual in its convictions (or whose virtual characters tend to be stronger than usual in their convictions about the nature of their current reality). There might be multiple reasons for that, but some degree of data curation might be one of them.
It's not clear, because we don't know what the solution might look like...
But there are certainly ways to improve the odds. For example, one could pretrain on heavily curated data (no atrocities, no betrayals, etc., etc.). Additionally, one can use curricula like those we use to teach children, starting with "age-appropriate" texts first.
Then if we succeed in interpretability, we might be able to monitor and adjust what's going on.
Here the remark of "alignment being fundamental" might come into play: we might figure out ways to replace Transformers with an architecture which is much easier to interpret.
All these are likely to be positive things, although without truly knowing a solution it's difficult to be sure...
That's good! So, at least a bit of safety fine-tuning is there...
Good to know...
I think they had a reasonably detailed (but unfortunately unrealistic) plan for aligning superintelligence before Ilya became a co-lead of the Superalignment team. That had been published, in multiple installments.
The early July text https://openai.com/blog/introducing-superalignment was the last of those installments, and most of its technical content was pre-Ilya (as far as I knew), but it also introduced Ilya as a co-lead.
But the problem with most such alignment plans, including this one, had always been that they didn't have much chance of working for a self-improving superintelligent AI or ecosystem of AIs, that is, exactly when we start really needing them to work.
I think Ilya understood this very well, and he started to revise plans and to work in new directions in this sense, and we were seeing various bits of his thoughts on that in his various interviews (in addition to what he said here, one other motif he kept returning to in recent months was that it is desirable that superintelligent AIs would think about themselves as something like parents, and about us as something like their children, so one of the questions is what we should do to achieve that).
But I don't know if he would want to publish details going forward (successful AI safety research is capability research, there is no way to separate them, and the overall situation might be getting too close to the endgame). He will certainly share something, but the core novel technical stuff will more and more be produced via intellectual collaboration with cutting edge advanced (pre-public-release in-house) AI systems, and they would probably want to at least introduce a delay before sharing something as sensitive as this.
I haven't played around with Grok so I'm not sure how capable or safe it is.
I expect the safety of that to be at zero (they don't think GPT-3.5-level LLMs are a problem in this sense; besides, they market it almost as an "anything goes, anti-censorship LLM").
But that's not really the issue; when a system starts being capable of writing code reasonably well, then one starts getting a problem... I hope that when they come to that, to approaching AIs which can create better AIs, they'll start taking safety seriously... Otherwise, we'll be in trouble...
Ilya co-leading
I thought he was the appropriately competent person (he was probably the AI scientist #1 in the world). The right person for the most important task in the world...
And the "superalignment" team at OpenAI was... not very strong. The original official "superalignment" approach was unrealistic and hence not good enough. I made a transcript of some of his thoughts, https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a, and it was obvious that his thinking was different from the previous OpenAI "superalignment" approach and much better (as in, "actually had a chance to succeed")...
Of course, now, since it looks like the "coup" has mostly been his doing, I am less sure that this is the leadership OpenAI and OpenAI safety needs. The manner of that has certainly been too erratic. Safety efforts should not evoke the feel of "last minute emergency"...
I'm still figuring out Elon's xAI.
They released a big LLM, the "Grok". With their crew of stars I hoped for a more interesting direction, but an LLM as a start is not unreasonable (one does need a performant LLM as a component).
I think he frames it with him as the main person that steers the tech
Yeah... I thought he deferred to Ilya and to the new "superalignment team" Ilya has been co-leading safety-wise...
But perhaps he was not doing that consistently enough...
Do you mean this in the sense that this would be particularly bad safety-wise, or do you mean this in the sense they are likely to just build huge LLMs like everyone else is doing, including even xAI?
Interesting, how sharply people disagree...
It would be good to be able to attribute this disagreement to a particular part of the comment. Is it about me agreeing with Sam that "True AI" needs to be able to do novel physics? Or about me implicitly supporting the statement that LLMs would not be good enough (I am not really sure; I think LLMs would probably be able to create non-LLM-based AIs, so even if they are not good enough to achieve the level of "True AI" directly, they might be able to get there by creating differently-architected AIs)?
Or about having a single clear leader being good for safety? Or about Ilya being one of the best safety project leaders, based on the history of his thinking and his qualification? Or about Sam and Greg having a fighting chance against OpenAI? Or about me being unsure of them being able to do adequate safety work on the level which Ilya is likely to provide?
I am curious which of these seem to cause disagreement...
Yeah... On one hand, I am excited about Sam and Greg hopefully trying more interesting things than just scaling Transformer LLMs, especially considering Sam's answer to the last question on Nov. 1 at the Cambridge Union, 1:01:45 in https://www.youtube.com/watch?v=NjpNG0CJRMM, where he seems to think that more than Transformer-based LLMs is needed for AGI/ASI (in particular, he correctly says that "true AI" must be able to discover new physics, and he doubts LLMs are good enough for that).
On the other hand, I was hoping for a single clear leader in the AI race, and I thought that Ilya Sutskever was one of the best possible leaders for an AI safety project. And now Ilya vs. Sam and Greg Brockman are enemies, https://twitter.com/gdb/status/1725736242137182594, and if Sam and Greg find a way to beat OpenAI, would they be able to be sufficiently mindful about safety?