Comments
What do you consider the strongest evidence / reason to believe?
Would love to see some false diagrams. Flow charts or circuits etc
Do you personally believe in either of those?
It is just as ambitious/implausible as you say. I am hoping to get out some rough ideas in my next post anyways.
I like the way you've operationalized the question
Yes the fact that coning works and people are doing it is what I meant was funny.
But I do wonder whether the protests will keep up and/or scale up. Maybe if enough people protest everywhere all at once, then they can kill autonomous cars altogether. Otherwise, I think a long legal dispute would eventually come out in the car companies' favor. Not that I would know.
What's going on here, like where did this post come from? I am missing context
Yes, it does become easier to control and communicate with, but it does not become harder to make it be malicious. I'm not sure an AI scheme that can't be trivially flipped evil is possible, but I would like to try to find one.
Edited post to rename "intrinsically aligned AI" to "intrinsically kind AI" for clarity. As I understand it, the hope is to develop capability techniques and control techniques in parallel. But there's no major plan I know of to have a process for developing capabilities that are hard-linked to control/kindness/whatever in a way you can't easily remove. (I have heard an idea or two though and am planning on writing a post about it soon.)
This is hilarious
I know of one: the steam engine was "working" and continuously patented and modified for a century (iirc) before someone used it in boats at scale. https://youtu.be/-8lXXg8dWHk
Do you know of any compendiums of such Z_Ns? Would love to read one
I've never heard of it. You should try it.
If I make a post then revert to draft then republish, what is the publish date?
Perhaps there are some behavioral / black-box methods available for evaluating alignment, depending on the kind of system being evaluated.
Toy example: imagine a two part system where part A tries to do tasks and part B limits part A's compute based on the riskiness of the task. You could try to optimize the overall system towards catastrophic behavior and see how well your part B holds up.
Personally I expect monolithic systems to be harder to control than two-part systems, so I think this evaluation scheme has a good chance of being applicable. One piece of evidence: OpenAI's moderation system correctly flags most jailbreaks that get past the base model's RLHF.
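A minimal sketch of what that red-team evaluation could look like, assuming black-box access to both parts (all names and the risk policy here are hypothetical, just to make the toy example concrete):

```python
# Part A attempts tasks; part B caps A's compute budget based on how risky
# the task looks. The evaluation is a red-team search over tasks for
# catastrophic behavior, to see how well part B holds up.

def part_b_budget(risk_score: float, max_compute: int = 10_000) -> int:
    """Part B: grant less compute to riskier tasks (risk_score in [0, 1])."""
    return int(max_compute * (1.0 - risk_score))

def run_system(task, estimate_risk, part_a):
    """Run the combined A+B system on one task."""
    budget = part_b_budget(estimate_risk(task))
    return part_a(task, compute_budget=budget)

def red_team(candidate_tasks, estimate_risk, part_a, harm):
    """Optimize the overall system toward catastrophe; return the worst case."""
    return max(candidate_tasks,
               key=lambda t: harm(run_system(t, estimate_risk, part_a)))
```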
I wonder how cross-species-compatible animal genes are in general. The main example I've heard of is that fluorescence genes (e.g. GFP, from jellyfish) can be pretty much inserted anywhere and just work [citation needed]. You probably couldn't give a parrot elephant ears, but maybe you could do more basic tweaks like lifespan or size changes?
If you can cross-copy-paste useful stuff easily then scenario 1 is significantly upgraded
Good point. In fact I can imagine people sometimes treating smarter parrots even worse because they would be extra annoying.
The neurons are smaller and faster to match though
Yes I meant that it is slow. Seems to be very roughly six months for dogs and octopi.
I forgot to highlight that I think parrots' general social and physical compatibility with humans (and humans' general sympathy and respect for parrots) is probably greater than any alternative except dogs. They also can fly. People quickly report and prosecute dog fighting. I bet regular or kinda smart or very smart parrots would all do fine. 100% speculation of course.
When you accidentally unlock the tech tree by encouraging readers to actually map out a tech tree and strategize about it
No, excellent analysis though.
Great references - very informative - thank you. I am always yelling at random people on the street walking their dogs that they're probably hacked already based on my needs-no-evidence raw reasoning. I'll print this out and carry it with me next time
I'm just patting myself on the back here for predicting the cup would get knocked over. That shouldn't count. You want the ball in the cup -- what use is a knocked-over cup and a ball on the ground?
Do you have more things like this? I would participate or run one
Those kind of sound like decisions. Is the difference that you paused a little longer and sort of organized your thoughts beyond what was immediately necessary? Or how would you describe the key differentiating thing here?
Does a dog orient? An ant? I thought one of the fighter pilot things was to not allow your enemy the time to orient
Kyle Scott roughly said that when you know where to look and what to ignore you are oriented. Imagine a general freaking out at all the explosions vs one who knows how severe the explosions are expected to be and the threshold for changing course.
Of course ReLU is great!! I was trying to say that if I were a 2009 ANN researcher (unaware of prior ReLU uses, like most people probably were at the time) and someone (who had not otherwise demonstrated expertise) came in and asked why we use this particular woosh instead of a bent line or something, then I would've thoroughly explained the thought out of them. It's possible that I would've realized how it works, but very unlikely IMO. But a dumb worker would be more likely to say "Go do it. Now. Go. Do it now. Leave. Do it." as I see it.
Curious what industry this is if you don't mind saying
Good point. I am concerned that adding even a dash of legibility screws the work over completely and immediately and invisibly rather than incrementally. I may have over-analyzed my data so I should probably return to the field to collect more samples.
Could spaceships accelerate fast enough to make missile course adjustment necessary? Seems like a blind missile could still hit.
I would read a longpost about where and how and when and why liability insurance has succeeded or failed
Liability insurance has a mixed record for sure. Landlords and doctors are OK but not great in terms of safety.
This is so goddamn strange. I have wondered about this for so long
Some things are easy to notice and hard to replicate
More ideas you're less confident in?
I should clarify that section. I meant that if you're asked to write a line of code or an app or whatever then it is easier to guess at intent/consequences for the higher level tasks. Another example: the lab manager has a better idea of what's going on than a lab assistant.
How much room is there in algorithmic improvements?
Yeah would love to see experiments/evidence outside of Bing
Ctrl-f for "memory" has no results
Do you think there might be a simple difference between the successes and failures here that we could learn from?
Added footnote clarifying link (goodfirms seems misquoted and also kind of looks fake?)
I mentioned the software development firm as an intermediate step to products because it's less risky / easier than making a successful product. Even easier would be to just hire devs, give them your model, put them on Upwork, and split the profits.
I suppose the ideal commercialization plan depends on how the model works and the size of the firm commercializing it. (And for govts and universities "commercialization" is completely different.)
Could you state the problem and solution more succinctly?
Bump request for image fix
There is a lot of room between "ignore people; do drastic thing" and "only do things where the exact details have been fully approved". In other words, the Overton window has pretty wide error bars.
I would be pleased if someone sent me a computer virus that was actually a security fix. I would be pretty upset if someone fried all my gadgets. If someone secretly watched my traffic for evil AI fingerprints I would be mildly annoyed but I guess glad?
Even Google has been telling people with unpatched software to patch it or else they'll release the exploit, iirc.
So some of the question of "to pivotally act or not to pivotally act" is resolved by acknowledging that extent is relevant and you can be polite in some cases.
This is the post I would have written if I had had more time, knew more, thought faster, etc
One note about your final section: I expect the tool -> sovereign migration to be pretty easy and go pretty well. It is also kind of multistep, not binary.
Eg current browser automation tools (which bring browsers one step up the agency ladder to scriptable processes) work very well, probably better than a from-scratch web scripting tool would work.
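To illustrate what "scriptable process" means here, a tiny hedged example using Playwright's sync API (my choice of library, purely illustrative):

```python
# A browser one rung up the agency ladder: driven by a script, not a human.
# Assumes Playwright is installed (pip install playwright && playwright install).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())  # the script, not a person, inspects the page
    browser.close()
```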
Fake example: predict proteins, then predict interactions, then predict cancer-preventiveness, THEN, if everything is going well so far, solve for the protein that prevents the cancer. You might not need more steps, but you could also incrementally solve for the chemical synthesis process, eliminate undesired byproducts, etc.
More generally, it might be easy to flip simulators/predictors over into optimizers when the system is trustworthy and the situation demands.
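A hedged sketch of that staged pipeline, where every stage is a pure predictor and the final step only flips into optimization after the earlier checks pass (all function names hypothetical):

```python
def staged_design(predict_structure, predict_interactions, predict_prevention,
                  candidate_proteins, checks_pass):
    """Stages 1-3 only predict; stage 4 optimizes, gated on the checks."""
    scored = []
    for protein in candidate_proteins:
        structure = predict_structure(protein)          # stage 1: predict
        interactions = predict_interactions(structure)  # stage 2: predict
        score = predict_prevention(interactions)        # stage 3: predict
        scored.append((score, protein))
    if not checks_pass(scored):  # if things aren't going well, stop here
        return None
    # stage 4: only now "solve for the protein" -- pick the best candidate
    return max(scored, key=lambda sp: sp[0])[1]
```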
If your system is low-impact or whatever, then you can have it melt GPUs without going off and running the world forever.
Assumptions:
- Operator making high-level judgement calls is a pretty good solution for the foreseeable future.
- The self-referential decision theory stuff won't come in in an important way for a good while (at least the problems won't demand anything super rigorous)
So +1 for staying in happy tool land and punting on scary agent land
I thought not cuz I didn't see why that'd be a desideratum. You mean a good definition is so canonical that when you read it you don't even consider other formulations?
'Betray' in the sense of contradicting/violating?
Seems like choosing the definitions is the important skill, since in real life you don't usually have a helpful buddy saying "hey this is a graph"
Do you expect the primary asset to be a neural architecture / infant mind or an adult mind? Is it too ambitious to try to find an untrained mind that reliably develops nicely?
Someone make a PR for a builder/breaker feature on LessWrong