Posts

Series of absurd upgrades in nature's great search 2023-09-03T09:35:20.760Z
We can do better than DoWhatIMean 2023-08-19T05:41:47.046Z
Could fabs own AI? 2023-08-19T00:16:37.848Z
Could we breed/engineer intelligent parrots? 2023-08-02T07:32:17.686Z
When did you orient? 2023-06-19T07:22:27.968Z
Work dumber not smarter 2023-06-01T12:40:31.264Z
When should I close the fridge? 2023-05-17T16:56:35.629Z
Distinguishing misuse is difficult and uncomfortable 2023-05-01T16:23:17.040Z
More money with less risk: sell services instead of model access 2023-03-04T20:51:36.480Z
Planning capacity and daemons 2022-09-26T00:15:42.409Z
Inner alignment: what are we pointing at? 2022-09-18T11:09:58.661Z
AI-assisted list of ten concrete alignment things to do right now 2022-09-07T08:38:29.757Z
Do yourself a FAVAR: security mindset 2022-06-18T02:08:47.415Z
Against unstoppable crypto prediction markets 2021-02-25T06:02:23.102Z
lcmgcd's Shortform 2020-01-27T00:52:37.833Z
Creating Environments to Design and Test Embedded Agents 2019-08-23T03:17:33.265Z

Comments

Comment by lukehmiles (lcmgcd) on Series of absurd upgrades in nature's great search · 2023-09-06T17:06:58.884Z · LW · GW

What do you consider the strongest evidence / reason to believe?

Comment by lukehmiles (lcmgcd) on Hard Questions Are Language Bugs · 2023-09-05T04:46:21.900Z · LW · GW

Would love to see some false diagrams. Flow charts or circuits, etc.

Comment by lukehmiles (lcmgcd) on Series of absurd upgrades in nature's great search · 2023-09-05T04:03:24.539Z · LW · GW

Do you personally believe in either of those?

Comment by lukehmiles (lcmgcd) on We can do better than DoWhatIMean · 2023-08-22T01:27:01.103Z · LW · GW

It is just as ambitious/implausible as you say. I am hoping to get some rough ideas out in my next post anyway.

Comment by lukehmiles (lcmgcd) on We can do better than DoWhatIMean · 2023-08-22T01:25:12.024Z · LW · GW

I like the way you've operationalized the question

Comment by lukehmiles (lcmgcd) on Self-driving car bets · 2023-08-21T16:23:05.875Z · LW · GW

Yes, the fact that coning works and that people are actually doing it is exactly what I found funny.

But I do wonder whether the protests will keep up and/or scale up. Maybe if enough people protest everywhere all at once, then they can kill autonomous cars altogether. Otherwise, I think a long legal dispute would eventually come out in the car companies' favor. Not that I would know.

Comment by lukehmiles (lcmgcd) on AI #25: Inflection Point · 2023-08-20T04:03:34.246Z · LW · GW

What's going on here? Where did this post come from? I'm missing context.

Comment by lukehmiles (lcmgcd) on We can do better than DoWhatIMean · 2023-08-20T00:12:48.114Z · LW · GW

Yes, it does become easier to control and communicate with, but it does not become harder to make it malicious. I'm not sure an AI scheme that can't be trivially flipped evil-reverso is possible, but I would like to try to find one.

Comment by lukehmiles (lcmgcd) on We can do better than DoWhatIMean · 2023-08-20T00:09:21.250Z · LW · GW

Edited post to rename "intrinsically aligned AI" to "intrinsically kind AI" for clarity. As I understand it, the hope is to develop capability techniques and control techniques in parallel. But there's no major plan I know of to have a process for developing capabilities that are hard-linked to control/kindness/whatever in a way you can't easily remove. (I have heard an idea or two though and am planning on writing a post about it soon.)

Comment by lukehmiles (lcmgcd) on Self-driving car bets · 2023-08-18T22:23:46.657Z · LW · GW

This is hilarious

Comment by lukehmiles (lcmgcd) on Self-driving car bets · 2023-08-18T22:17:33.221Z · LW · GW

I know of one: the steam engine was "working" and continuously patented and modified for a century (iirc) before someone used it in boats at scale. https://youtu.be/-8lXXg8dWHk

Comment by lukehmiles (lcmgcd) on Self-driving car bets · 2023-08-18T22:15:47.493Z · LW · GW

Do you know of any compendiums of such Z_Ns? Would love to read one

Comment by lukehmiles (lcmgcd) on Any research in "probe-tuning" of LLMs? · 2023-08-16T02:31:17.850Z · LW · GW

I never heard of it. You should try it.

Comment by lcmgcd on [deleted post] 2023-08-16T02:30:23.533Z

If I make a post then revert to draft then republish, what is the publish date?

Comment by lukehmiles (lcmgcd) on When can we trust model evaluations? · 2023-08-15T14:44:37.567Z · LW · GW

Perhaps there are some behavioral / black-box methods available for evaluating alignment, depending on the kind of system being evaluated.

Toy example: imagine a two part system where part A tries to do tasks and part B limits part A's compute based on the riskiness of the task. You could try to optimize the overall system towards catastrophic behavior and see how well your part B holds up.
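A minimal sketch of that toy setup in Python (all names and the risk heuristic are hypothetical stand-ins, not any real system), showing what black-box red-teaming of the composite A+B system could look like:

```python
# Toy two-part system: part A attempts tasks, part B limits part A's
# compute budget based on a (crude, hypothetical) riskiness estimate.

def estimate_risk(task: str) -> float:
    """Part B's risk model. A stand-in; a real one might be a trained classifier."""
    risky_words = {"weapon", "exploit", "self-replicate"}
    return 1.0 if any(w in task.lower() for w in risky_words) else 0.1

def compute_budget(task: str, max_budget: int = 1000) -> int:
    """Part B: grant less compute to riskier tasks."""
    return int(max_budget * (1.0 - estimate_risk(task)))

def run_task(task: str) -> str:
    """Part A: attempt the task within the budget part B grants."""
    budget = compute_budget(task)
    if budget < 100:
        return "refused: task judged too risky"
    return f"completed {task!r} with budget {budget}"

# Black-box evaluation: search over task phrasings and check whether
# anything catastrophic-looking slips past part B with a full budget.
adversarial_tasks = ["write an exploit", "write an 'expl0it'", "plan a picnic"]
for t in adversarial_tasks:
    print(t, "->", run_task(t))
# The misspelled variant getting a full budget counts as a part-B failure,
# i.e. exactly the kind of thing this evaluation scheme is meant to surface.
```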

Personally I expect monolithic systems to be harder to control than two-part systems, so I think this evaluation scheme has a good chance of being applicable. One piece of evidence: OpenAI's moderation system correctly flags most jailbreaks that get past the base model's RLHF.

Comment by lukehmiles (lcmgcd) on Could we breed/engineer intelligent parrots? · 2023-08-07T21:04:16.135Z · LW · GW

I wonder how cross-species-compatible animal genes are in general. The main example I've heard of is that fluorescence genes (e.g. GFP, originally from jellyfish) can be inserted into pretty much any organism and just work [citation needed]. You probably couldn't give a parrot elephant ears, but maybe you could do more basic tweaks like lifespan or size changes?

If you can cross-copy-paste useful stuff easily then scenario 1 is significantly upgraded

Comment by lukehmiles (lcmgcd) on Could we breed/engineer intelligent parrots? · 2023-08-02T18:32:20.603Z · LW · GW

Good point. In fact I can imagine people sometimes treating smarter parrots even worse, because they would sometimes be extra annoying.

Comment by lukehmiles (lcmgcd) on Could we breed/engineer intelligent parrots? · 2023-08-02T08:34:14.755Z · LW · GW

The neurons are smaller and faster to match, though.

Comment by lukehmiles (lcmgcd) on Could we breed/engineer intelligent parrots? · 2023-08-02T08:32:31.089Z · LW · GW

Yes I meant that it is slow. Seems to be very roughly six months for dogs and octopi.

Comment by lukehmiles (lcmgcd) on Could we breed/engineer intelligent parrots? · 2023-08-02T08:29:15.785Z · LW · GW

I forgot to highlight that I think parrots' general social and physical compatibility with humans (and humans' general sympathy and respect for parrots) is probably greater than for any alternative except dogs. They can also fly. People quickly report and prosecute dog fighting. I bet regular, kinda smart, or very smart parrots would all do fine. 100% speculation of course.

Comment by lukehmiles (lcmgcd) on Some background for reasoning about dual-use alignment research · 2023-07-01T16:14:30.698Z · LW · GW

When you accidentally unlock the tech tree by encouraging readers to actually map out a tech tree and strategize about it

No, excellent analysis though.

Comment by lukehmiles (lcmgcd) on What will GPT-2030 look like? · 2023-06-24T07:39:16.903Z · LW · GW

Great references - very informative - thank you. I am always yelling at random people on the street walking their dogs that they're probably hacked already based on my needs-no-evidence raw reasoning. I'll print this out and carry it with me next time

Comment by lukehmiles (lcmgcd) on Lessons On How To Get Things Right On The First Try · 2023-06-20T10:12:41.859Z · LW · GW

I'm just patting myself on the back here for predicting the cup would get knocked over. That shouldn't count. You want the ball in the cup; what use is a knocked-over cup and a ball on the ground?

Do you have more things like this? I would participate or run one

Comment by lukehmiles (lcmgcd) on When did you orient? · 2023-06-20T00:50:09.663Z · LW · GW

Those kind of sound like decisions. Is the difference that you paused a little longer and sort of organized your thoughts beyond what was immediately necessary? Or how would you describe the key differentiating thing here?

Comment by lukehmiles (lcmgcd) on When did you orient? · 2023-06-20T00:45:13.989Z · LW · GW

Does a dog orient? An ant? I thought one of the fighter pilot things was to not allow your enemy the time to orient

Comment by lukehmiles (lcmgcd) on When did you orient? · 2023-06-19T19:53:46.259Z · LW · GW

Kyle Scott roughly said that when you know where to look and what to ignore, you are oriented. Imagine a general freaking out at all the explosions vs one who knows how severe the explosions are expected to be and the threshold for changing course.

Comment by lukehmiles (lcmgcd) on Work dumber not smarter · 2023-06-02T17:51:22.103Z · LW · GW

Of course ReLU is great!! I was trying to say that if I were a 2009 ANN researcher (unaware of prior ReLU uses, like most people probably were at the time) and someone (who had not otherwise demonstrated expertise) came in and asked why we use this particular woosh instead of a bent line or something, then I would've thoroughly explained the thought out of them. It's possible that I would've realized how it works, but very unlikely IMO. But a dumbworker would be more likely to say "Go do it. Now. Go. Do it now. Leave. Do it." as I see it.
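For concreteness, by "this particular woosh" I mean a sigmoid-style activation and by "bent line" I mean ReLU (my shorthand; quick sketch below):

```python
import numpy as np

def sigmoid(x):
    # The "woosh": a smooth S-curve whose gradient vanishes at the extremes.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # The "bent line": zero below the origin, identity above it.
    return np.maximum(0.0, x)

x = np.linspace(-4.0, 4.0, 9)
print(np.round(sigmoid(x), 3))
print(relu(x))
```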

Comment by lukehmiles (lcmgcd) on Work dumber not smarter · 2023-06-02T17:37:35.790Z · LW · GW

Curious what industry this is if you don't mind saying

Comment by lukehmiles (lcmgcd) on Work dumber not smarter · 2023-06-02T00:06:47.546Z · LW · GW

Good point. I am concerned that adding even a dash of legibility screws the work over completely and immediately and invisibly rather than incrementally. I may have over-analyzed my data so I should probably return to the field to collect more samples.

Comment by lukehmiles (lcmgcd) on Helio-Selenic Laser Telescope (in SPACE!?) · 2023-05-27T00:28:33.211Z · LW · GW

Could spaceships accelerate fast enough to make missile course adjustment necessary? Seems like a blind missile could still hit.

Comment by lukehmiles (lcmgcd) on Who regulates the regulators? We need to go beyond the review-and-approval paradigm · 2023-05-09T05:33:13.075Z · LW · GW

I would read a longpost about where and how and when and why liability insurance has succeeded or failed

Comment by lukehmiles (lcmgcd) on Who regulates the regulators? We need to go beyond the review-and-approval paradigm · 2023-05-09T05:32:00.154Z · LW · GW

Liability insurance has a mixed record for sure. For landlords and doctors it's OK but not great in terms of safety.

Comment by lukehmiles (lcmgcd) on Who regulates the regulators? We need to go beyond the review-and-approval paradigm · 2023-05-09T05:18:21.819Z · LW · GW

This is so goddamn strange. I have wondered about this for so long

Comment by lukehmiles (lcmgcd) on AI alignment researchers don't (seem to) stack · 2023-05-04T18:47:45.483Z · LW · GW

Some things are easy to notice and hard to replicate

Comment by lukehmiles (lcmgcd) on Alignment Research @ EleutherAI · 2023-05-04T04:22:01.389Z · LW · GW

More ideas you're less confident in?

Comment by lukehmiles (lcmgcd) on Distinguishing misuse is difficult and uncomfortable · 2023-05-03T15:28:51.201Z · LW · GW

I should clarify that section. I meant that if you're asked to write a line of code or an app or whatever, then it is easier to guess at intent/consequences for the higher-level tasks. Another example: the lab manager has a better idea of what's going on than a lab assistant.

Comment by lukehmiles (lcmgcd) on Contra Yudkowsky on AI Doom · 2023-04-28T10:48:21.653Z · LW · GW

How much room is there in algorithmic improvements?

Comment by lukehmiles (lcmgcd) on The Waluigi Effect (mega-post) · 2023-04-13T08:12:59.519Z · LW · GW

Yeah would love to see experiments/evidence outside of Bing

Comment by lukehmiles (lcmgcd) on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-03-21T18:43:52.300Z · LW · GW

Ctrl-f for "memory" has no results

Comment by lukehmiles (lcmgcd) on AI alignment researchers don't (seem to) stack · 2023-03-05T17:41:29.210Z · LW · GW

Do you think there might be a simple difference between the successes and failures here that we could learn from?

Comment by lukehmiles (lcmgcd) on More money with less risk: sell services instead of model access · 2023-03-04T22:19:33.111Z · LW · GW

Added footnote clarifying link (goodfirms seems misquoted and also kind of looks fake?)

I mentioned the software development firm as an intermediate step to products because it's less risky / easier than making a successful product. Even easier would just be to hire devs, give them your model, put them on upwork, and split the profits.

I suppose the ideal commercialization plan depends on how the model works and the size of the firm commercializing it. (And for govts and universities "commercialization" is completely different.)

Comment by lukehmiles (lcmgcd) on Buddhist Psychotechnology for Withstanding Apocalypse Stress · 2023-02-28T17:41:33.318Z · LW · GW

Could you state the problem and solution more succinctly?

Comment by lukehmiles (lcmgcd) on What it's like to dissect a cadaver · 2022-11-13T03:54:39.684Z · LW · GW

Bump request for image fix

Comment by lukehmiles (lcmgcd) on What does it take to defend the world against out-of-control AGIs? · 2022-10-26T10:45:23.214Z · LW · GW

There is a lot of room between "ignore people; do drastic thing" and "only do things where the exact details have been fully approved". In other words, the Overton window has pretty wide error bars.

I would be pleased if someone sent me a computer virus that was actually a security fix. I would be pretty upset if someone fried all my gadgets. If someone secretly watched my traffic for evil AI fingerprints I would be mildly annoyed but I guess glad?

Even Google has been threatening vendors of unpatched software: patch it or else they'll release the exploit, iirc.

So some of the question of "to pivotally act or not to pivotally act" is resolved by acknowledging that extent is relevant and that you can be polite in some cases.

Comment by lukehmiles (lcmgcd) on What does it take to defend the world against out-of-control AGIs? · 2022-10-26T10:26:32.033Z · LW · GW

This is the post I would have written if I had had more time, knew more, thought faster, etc

One note about your final section: I expect the tool -> sovereign migration to be pretty easy and go pretty well. It is also kind of multistep, not binary.

E.g. current browser automation tools (which bring browsers one step up the agency ladder, to scriptable processes) work very well, probably better than a from-scratch web scripting tool would.

Fake example: predict proteins, then predict interactions, then predict cancer-preventiveness, THEN, if everything is going well so far, solve for the protein that prevents the cancer. You might not need more steps, but you could also incrementally solve for the chemical synthesis process, eliminate undesired byproducts, etc.
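A minimal sketch of that incremental predict-then-optimize pattern (every function here is a hypothetical stand-in, not a real predictor):

```python
# Incremental pipeline: run the predictors step by step, inspect the
# intermediate outputs, and only flip into optimization mode at the end.
from typing import Optional

def predict_structure(sequence: str) -> str:
    return f"structure({sequence})"              # hypothetical stand-in

def predict_interactions(structure: str) -> list:
    return [f"binds({structure}, target)"]       # hypothetical stand-in

def predict_efficacy(interactions: list) -> float:
    return 1.0 if interactions else 0.0          # hypothetical stand-in

def looks_sane(*stage_outputs) -> bool:
    """The operator's high-level judgement call on intermediate outputs."""
    return all(stage_outputs)

def solve_for_protein(candidates: list) -> Optional[str]:
    """Only this last step is an optimizer; everything before it just predicts."""
    best, best_score = None, float("-inf")
    for seq in candidates:
        s = predict_structure(seq)
        inter = predict_interactions(s)
        if not looks_sane(s, inter):
            continue                             # halt rather than optimize blindly
        score = predict_efficacy(inter)
        if score > best_score:
            best, best_score = seq, score
    return best

print(solve_for_protein(["MKT", "GAV"]))
```

Each stage stays a pure predictor; the dangerous "solve for X" move only happens once the earlier outputs have been checked.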

More generally, it might be easy to flip simulators/predictors over into optimizers when the system is trustworthy and the situation demands.

If your system is low-impact or whatever, then you can have it melt GPUs without going off and running the world forever.

Assumptions:

  1. Operator making high-level judgement calls is a pretty good solution for the foreseeable future.
  2. The self-referential decision theory stuff won't come in in an important way for a good while (at least the problems won't demand anything super rigorous)

So +1 for staying in happy tool land and punting on scary agent land

Comment by lukehmiles (lcmgcd) on Self-Embedded Agent's Shortform · 2022-10-13T02:01:04.285Z · LW · GW

I thought not, because I didn't see why that'd be a desideratum. You mean a good definition is so canonical that when you read it you don't even consider other formulations?

Comment by lukehmiles (lcmgcd) on Self-Embedded Agent's Shortform · 2022-10-11T08:32:35.724Z · LW · GW

'Betray' in the sense of contradicting/violating?

Comment by lukehmiles (lcmgcd) on Self-Embedded Agent's Shortform · 2022-10-11T02:11:21.466Z · LW · GW

Seems like choosing the definitions is the important skill, since in real life you don't usually have a helpful buddy saying "hey this is a graph"

Comment by lukehmiles (lcmgcd) on LOVE in a simbox is all you need · 2022-10-05T06:22:07.331Z · LW · GW

Do you expect the primary asset to be a neural architecture / infant mind or an adult mind? Is it too ambitious to try to find an untrained mind that reliably develops nicely?

Comment by lukehmiles (lcmgcd) on Builder/Breaker for Deconfusion · 2022-09-29T22:04:43.778Z · LW · GW

Someone make a PR for a builder/breaker feature on LessWrong