Comments

Comment by β-redex (GregK) on How do you feel about LessWrong these days? [Open feedback thread] · 2023-12-06T03:45:06.364Z · LW · GW

I think the reacts being semantic instead of being random emojis is what makes this so much better.

I wish other platforms experimented with semantic reacts as well, instead of just letting people react with any emoji of their choosing, and making you guess whether e.g. "thumbs up" means agreement, acknowledgement, or endorsement, etc.

Comment by β-redex (GregK) on 2023 Unofficial LessWrong Census/Survey · 2023-12-02T19:33:41.904Z · LW · GW

This was my first time taking this, looking forward to the results!

Comment by β-redex (GregK) on Which LessWrongers are (aspiring) YouTubers? · 2023-10-23T13:38:46.092Z · LW · GW

I know of Robert Miles, and Writer, who does Rational Animations. (In fact Robert Miles' channel is the primary reason I discovered LessWrong :) )

Comment by β-redex (GregK) on Rationalist horror movies · 2023-10-15T08:54:44.090Z · LW · GW

Don't leave me hanging like this, does the movie you are describing exist? (Though I guess your description is a major spoiler, you would need to go in without knowing whether there will be anything supernatural.)

Comment by β-redex (GregK) on Rationalist horror movies · 2023-10-15T08:47:57.997Z · LW · GW
  1. The Thing: classic
  2. Eden Lake
  3. Misery
  4. 10 Cloverfield Lane
  5. Gone Girl: not horror, but I specifically like it because of how agentic the protagonist is

2., 3. and 4. have in common that there is some sort of abusive relationship that develops, and I think this adds another layer of horror. (A person/group of people gain some power over the protagonist(s), and they slowly grow more abusive with this power.)

Comment by β-redex (GregK) on Rationalist horror movies · 2023-10-15T08:12:42.194Z · LW · GW

Somewhat related: does anyone else strongly dislike supernatural elements in horror movies?

It's not that I have anything against a movie exploring the idea of "what if we suddenly discovered that we live in a universe where supernatural thing X exists", but the characters just accept this without much evidence at all.

I would love a movie though where they explore the more likely alternate hypotheses first (mental issues, some weird optical/acoustic phenomenon, or just someone playing a super elaborate prank), but then the evidence starts mounting, and eventually they are forced to accept that "supernatural thing X actually exists" is really the most likely hypothesis.

Comment by β-redex (GregK) on Jailbreaking GPT-4's code interpreter · 2023-07-13T21:52:51.228Z · LW · GW

These examples show that, at least in this lower-stakes setting, OpenAI’s current cybersecurity measures on an already-deployed model are insufficient to stop a moderately determined red-teamer.

I... don't actually see any non-trivial vulnerabilities here? Like, this is stuff you can do on any cloud VM you rent?

Cool exploration though, and it's certainly interesting that OpenAI is giving you such a powerful VM for free (well actually not because you already pay for GPT-4 I guess?), but I have to agree with their assessment which you found that "it's expected that you can see and modify files on this system".

Comment by β-redex (GregK) on gamers beware: modded Minecraft has new malware · 2023-06-07T21:08:01.534Z · LW · GW

The malware is embedded in multiple mods, some of which were added to highly popular modpacks.

Any info on how this happened? This seems like a fairly serious supply chain attack. I have heard of incidents with individual malicious packages on npm or PyPI, but not one where multiple high profile packages in a software repository were infected in a coordinated manner.

Comment by β-redex (GregK) on Chatbot convinces Belgian to commit suicide · 2023-03-29T20:05:32.217Z · LW · GW

Uhh this first happening in 2023 was the exact prediction Gary Marcus made last year: https://www.wired.co.uk/article/artificial-intelligence-language

Not sure whether this instance is a capability or alignment issue though. Is the LLM just too unreliable, as Gary Marcus is saying? Or is it perfectly capable, and just misaligned?

Comment by β-redex (GregK) on We have to Upgrade · 2023-03-23T23:01:30.092Z · LW · GW

I don't see why communicating with an AI through a BCI is necessarily better than through a keyboard+screen. Just because a BCI is more ergonomic and the AI might feel more like "a part of you", it won't magically be better aligned.

In fact the BCI option seems way scarier to me. An AI that can read my thoughts at any time and stimulate random neurons in my brain at will? No, thanks. This scenario just feels like you are handing it the "breaking out of the box" option on a silver platter.

Comment by β-redex (GregK) on How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century? · 2023-02-16T21:40:08.174Z · LW · GW

Why is this being downvoted?

From what I am seeing people here are focusing way too much on having a precisely calibrated P(doom) value.

It seems that even if P(doom) is 1% the doom scenario should be taken very seriously and alignment research pursued to the furthest extent possible.

It seems very unlikely to me that after much careful calibration and research you would come up with a P(doom) value less than 1%. So why invest time into refining your estimate?

Comment by β-redex (GregK) on Is it a coincidence that GPT-3 requires roughly the same amount of compute as is necessary to emulate the human brain? · 2023-02-10T17:04:54.260Z · LW · GW

There was a recent post estimating that GPT-3 is equivalent to about 175 bees. There is also a comment there asserting that a human is about 140k bees.

I would be very interested if someone could explain where this huge discrepancy comes from. (One estimate is equating synapses with parameters, while this one is based on FLOPS. But there shouldn't be such a huge difference.)

Comment by β-redex (GregK) on Why Are Bacteria So Simple? · 2023-02-08T23:53:13.405Z · LW · GW

Indeed (as other commenters also pointed out) the ability to sexually reproduce seems to be much more prevalent than I originally thought when writing the above comment. (I thought that eukaryotes only capable of asexual reproduction were relatively common, but it seems that there may only be a very few special cases like that.)

I still disagree with you dismissing the importance of mitochondria though. (I don't think the OP is saying that mitochondria alone are sufficient for larger genomes, but the argument for why they are at least necessary is convincing to me.)

Comment by β-redex (GregK) on English is a Terrible Programming Language—And other reasons AI won't displace programmers · 2023-02-07T21:16:16.279Z · LW · GW

I disagree with English (in principle at least) being inadequate for software specification.

For any commercial software, the specification basically is just "make profit for this company". The rest is implementation detail.

(Obviously this is an absurd example, but it illustrates how you can express abstractions in English that you can't in C++.)

Comment by β-redex (GregK) on English is a Terrible Programming Language—And other reasons AI won't displace programmers · 2023-02-07T21:02:30.667Z · LW · GW

I don't think the comparison of giving an LLM instructions and expecting correct code to be output is fair. You are vastly overestimating the competence of human programmers: when was the last time you wrote perfectly correct code on the very first try?

Giving the LLM the ability to run its code and modify it until it thinks it's right would be a much fairer comparison. And if, as you say, writing unit tests is easy for an LLM, wouldn't that just make this trial-and-error loop trivial? You can just bang the LLM against the problem until the unit tests pass.

(And this process obviously won't produce bug-free code, but humans don't do that in the first place either.)
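A minimal sketch of the trial-and-error loop described above, to make the comparison concrete. Everything here is an assumption for illustration: `repair_loop`, `fake_llm` and `run_tests` are hypothetical names, and `fake_llm` is a toy stand-in for a real model call that simply returns a buggy function first and a fixed one once it receives test feedback.

```python
from typing import Callable, Optional, Tuple


def repair_loop(
    generate: Callable[[Optional[str]], str],
    run_tests: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Ask for code, run the tests, feed failures back, and repeat."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = generate(feedback)   # hypothetical model call
        feedback = run_tests(source)  # None means every test passed
        if feedback is None:
            return source, attempt
    return None, max_attempts         # gave up without a passing version


def fake_llm(feedback: Optional[str]) -> str:
    """Toy stand-in for an LLM: wrong on the first try, right after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def run_tests(source: str) -> Optional[str]:
    """Execute the candidate and run one unit test; return an error message or None."""
    namespace: dict = {}
    try:
        exec(source, namespace)
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None
```

Calling `repair_loop(fake_llm, run_tests)` returns the corrected source on the second attempt. As with human programmers, passing the tests only shows that the tested cases work, not that the code is bug-free.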

Comment by β-redex (GregK) on Why Are Bacteria So Simple? · 2023-02-06T13:02:49.303Z · LW · GW

Not all eukaryotes employ sexual reproduction. Also prokaryotes do have some mechanisms for DNA exchange as well, so copying errors are not their only chance for evolution either.

But I do agree that it's probably no coincidence that the most complex life forms are sexually reproducing eukaryotes.

Comment by β-redex (GregK) on Small Talk is Good, Actually · 2023-02-04T12:05:02.742Z · LW · GW

I barely registered the difference between small talk and big talk

I am still confused about what "small talk" is after reading this post.

Sure, talking about the weather is definitely small talk. But if I want to get to know somebody, weather talk can't possibly last for more than 30 seconds. After that, both parties have demonstrated the necessary conversational skills to move on to more interesting topics. And the "getting to know each other" phase is really just a spectrum between surface level stuff and your deepest personal secrets, so I don't really see where you would draw the line between small and deep talk.

One situation I struggle with on the other hand is when I would rather avoid talking to a person at all, and so I want to maintain the shallowest possible level of small talk. (Ideally I could tell them that "sorry, I would rather just not talk to you right now", but that's not really socially accepted.)

Comment by β-redex (GregK) on Exercise is Good, Actually · 2023-02-03T23:11:01.443Z · LW · GW

It was actually this post about nootropics that got me curious about this. Apparently (based on self reported data) weightlifting is just straight up better than most other nootropics?

Anyway, thank you for referencing some opposing evidence on the topic as well, I might try to look into it more at some point.

(Unfortunately, the thing that I actually care about - whether it has cognitive benefits for me - seems hard to test, since you can't blind yourself to whether you exercised.)

Comment by β-redex (GregK) on Nice Clothes are Good, Actually · 2023-02-03T18:08:11.713Z · LW · GW

I think this (and your other post about exercise) are good practical examples of situations where rational thinking makes you worse off (at least for a while).

If you had shown this post to me as a kid, my youth would probably have been better. Unfortunately no one around me was able to make a sufficiently compelling argument for caring about physical appearance. It wasn't until much later that I was able to deduce the arguments for myself. If I just blindly "tried to fit in with the cool kids, and do what is trendy", I would have been better off.

I wonder what similar blind spots I could have right now where the argument in favor of doing something is quite complicated, but most people in society just do it because they blindly copy others, and I am worse off as a result.

Comment by β-redex (GregK) on Exercise is Good, Actually · 2023-02-03T17:27:52.637Z · LW · GW

This alone trumps any other argument mentioned in the post. None of the other arguments seem universal; each of them can be argued with on an individual basis.

I actually like doing things with my body. I like hiking and kayaking and mountain climbing and dancing.

As some other commenters noted, what if you just don't?

I think it would be valuable if someone made a post just focused on collecting all the evidence for the positive cognitive effects of exercise. If the evidence is indeed strong, no other argument in favor of exercise should really matter.

Comment by β-redex (GregK) on ChatGPT struggles to respond to the real world · 2023-01-12T17:33:22.168Z · LW · GW

FWIW I don't think that matters, in my experience interactions like this arise naturally as well, and humans usually perform similarly to how Friend did here.

In particular it seems that here ChatGPT completely fails at tracking the competence of its interlocutor in the domain at hand. If you asked a human with no context at first they might give you the complete recipe just like ChatGPT tried, but any follow up question immediately would indicate to them that more hand-holding is necessary. (And ChatGPT was asked to "walk me through one step at a time", which should be blatantly obvious and no human would just repeat the instructions again in answer to this.)

Comment by β-redex (GregK) on Basic building blocks of dependent type theory · 2022-12-15T23:06:15.950Z · LW · GW

Cool! (Nitpick: You should probably mention that you are deviating from the naming in the HoTT book. AFAIK Π and Σ types are usually called Pi and Sigma types respectively, while the words "product" and "sum" (or "coproduct" in the HoTT book) are reserved for × and +.)

I am especially looking forward to discussion on how MLTT relates to alignment research and how it can be used for informal reasoning as Alignment Research Field Guide mentions.

I always get confused when the term "type signature" is used in text unrelated to type theory. Like what do people mean when they say things like "What’s the type signature of an agent?" or "the type of agency is "?

Comment by β-redex (GregK) on Side-channels: input versus output · 2022-12-13T10:55:27.736Z · LW · GW

This argument seems a bit circular, nondeterminism is indeed a necessary condition for exfiltrating outside information, so obviously if you prevent all nondeterminism you prevent exfiltration.

You are also completely right that removing access to obviously nondeterministic APIs would massively reduce the attack surface. (AFAIK most known CPU side-channels require timing information.)

But I am not confident that this kind of attack would be "robustly impossible". All you need is finding some kind of nondeterminism that can be used as a janky timer and suddenly all Spectre-class vulnerabilities are accessible again.

For instance I am pretty sure that rowhammer depends on the frequency of the writes. If you insert some instruction between the writes to RAM, you can suddenly measure the execution time of said instruction by looking at how many cycles it took to flip a bit with rowhammer. (I am not saying that this particular attack would work, I am just saying that I am not confident you couldn't construct something similar that would.)

I am confident that this direction of exfiltration would be robustly impossible.

If you have some deeper reason for believing this it would probably be worth its own post. I am not saying that it's impossible to construct some clever sandbox environment that ensures determinism even on a buggy CPU with unknown classes of bugs, I am just saying that I don't know of existing solutions.

(Also in my opinion it would be much easier to just make a non-buggy CPU instead of trying to prove correctness of something executing on a buggy one. (Though proving your RAM correct seems quite hard, e.g. deriving the lack of rowhammer-like attacks from Maxwell's laws or something.))

Comment by β-redex (GregK) on Side-channels: input versus output · 2022-12-13T00:49:45.119Z · LW · GW

Yes, CPUs leak information: that is the output kind of side-channel, where an attacker can transfer information about the computation into the outside world. That is not the kind I am saying one can rule out with merely diligent pursuit of determinism.

I think you are misunderstanding this part, input side channels absolutely exist as well, Spectre for instance:

On most processors, the speculative execution resulting from a branch misprediction may leave observable side effects that may reveal private data to attackers.

Note that the attacker in this case is the computation that is being sandboxed.

Comment by β-redex (GregK) on Side-channels: input versus output · 2022-12-12T22:08:01.118Z · LW · GW

This implies that we could use relatively elementary sandboxing (no clock access, no networking APIs, no randomness, none of these sources of nondeterminism, and that’s about it) to prevent a task-specific AI from learning any particular facts

It's probably very hard to create such a sandbox though, your list is definitely not exhaustive. Modern CPUs leak information like a sieve. (The known ones are mostly patched of course but with this track record plenty more unknown vulnerabilities should exist.)

Maybe if you build the purest lambda calculus interpreter with absolutely no JIT and a deterministic memory allocator you could prove some security properties even when running on a buggy CPU? This seems like a bit of a stretch though. (And while running it like this on a single thread might prevent the computation from being able to measure time, any current practical AI needs massive parallelism to execute. With that, probably all hopes of determinism and of preventing timing information from leaking in go out the window.)
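To illustrate the parallelism point: a rough sketch, assuming nothing more than two threads sharing a variable, of how a computation with no clock API can still get a crude timer. The name `measure_with_counter` is made up for this example, and the tick counts it returns are noisy and machine dependent, which is exactly the kind of nondeterminism being discussed.

```python
import threading
from typing import Callable


def measure_with_counter(work: Callable[[], object]) -> int:
    """Estimate how long `work` takes by counting loop iterations in a second thread.

    No clock, timer, or randomness API is used; the shared counter is the "clock".
    """
    ticks = 0
    stop = threading.Event()

    def spin() -> None:
        nonlocal ticks
        while not stop.is_set():
            ticks += 1  # each iteration is one tick of the makeshift clock

    counter = threading.Thread(target=spin)
    counter.start()
    work()              # the operation whose duration we want to estimate
    stop.set()
    counter.join()
    return ticks
```

On average, longer-running `work` yields more ticks, so comparing tick counts gives relative timing information even inside a sandbox that removed every explicit clock. A real attack would need a far more precise counter than this; the sketch only shows why shared state plus parallelism is enough to rebuild one.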

Comment by β-redex (GregK) on Formalization as suspension of intuition · 2022-12-12T15:04:18.510Z · LW · GW

Also I just found that you already argued this in an earlier post, so I guess my point is a bit redundant.

Anyway, I like that this article comes with an actual example, we could probably use more examples/case studies for both sides of the argument.

Comment by β-redex (GregK) on Formalization as suspension of intuition · 2022-12-12T09:34:44.115Z · LW · GW

Upon reading the title I actually thought the article would argue the exact opposite, that formalization affects intuition in a negative way. I like non-Euclidean geometry as a particular example where formalization actually helped discovery.

But this is definitely not always true. For instance if you wanted to intuitively understand why addition of naturals is commutative, maybe to build intuition for recognizing similar properties elsewhere, would this formal proof really help?

plus_comm =
fun n m : nat =>
nat_ind (fun n0 : nat => n0 + m = m + n0)
  (plus_n_O m)
  (fun (y : nat) (H : y + m = m + y) =>
   eq_ind (S (m + y))
     (fun n0 : nat => S (y + m) = n0)
     (f_equal S H)
     (m + S y)
     (plus_n_Sm m y)) n
     : forall n m : nat, n + m = m + n

This is as formal as it gets, a 100% machine checkable proof without room for human error.
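For comparison, here is a hand-written reading of what that proof term encodes. This is only a sketch: it assumes the standard library meanings of the lemmas that appear in the term (`plus_n_O m : m = m + 0` and `plus_n_Sm m y : S (m + y) = m + S y`) and that `0 + m` and `S y + m` unfold by the definition of addition.

```latex
\begin{align*}
\textbf{Base case } (n = 0):\quad
  & 0 + m \;=\; m \;=\; m + 0
  && \text{(definition of } +\text{, then plus\_n\_O)} \\[4pt]
\textbf{Step, assuming } H : y + m = m + y:\quad
  & S(y) + m \;=\; S(y + m)
  && \text{(definition of } +\text{)} \\
  & \phantom{S(y) + m} \;=\; S(m + y)
  && \text{(f\_equal S H)} \\
  & \phantom{S(y) + m} \;=\; m + S(y)
  && \text{(plus\_n\_Sm)}
\end{align*}
```

The content is the same induction on n, but the chain of equalities is arguably closer to what one would want for building intuition than the raw term.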

I think formalization is just a tool that may or may not be helpful depending on your goal, and the real question is how you can tell ahead of time what level of formalization will be helpful?

Comment by β-redex (GregK) on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-06T23:40:27.587Z · LW · GW

Isn't this similar to a Godzilla Strategy? (One AI overseeing the other.)

That variants of this approach are of use to superintelligent AI safety: 40%.

Do you have some more detailed reasoning behind such massive confidence? If yes, it would probably be worth its own post.

This seems like a cute idea that might make current LLM prompt filtering a little less circumventable, but I don't see any arguments for why this would scale to superintelligent AI. Am I missing something?

Comment by β-redex (GregK) on Is school good or bad? · 2022-12-03T22:47:25.542Z · LW · GW

Collaborating with an expert/getting tutoring from an expert might be really good?

Probably. How does one go about finding such experts, who are willing to answer questions/tutor/collaborate?

(I think the usual answer to this is university, but to me this does not seem to be worth the effort. Like I maybe met 1-2 people at uni who would qualify for this? How do you find these people more effectively? And even when you find them, how do you get them to help you? Usually this seems to require luck & significant social capital expenditure.)

Comment by β-redex (GregK) on Is school good or bad? · 2022-12-03T22:27:09.199Z · LW · GW

I unfortunately don't have any answers, just some more related questions:

  • Does anyone have practical advice on this topic? In the short term we are obviously powerless to change the system as a whole. But I couldn't in good conscience send my children to suffer through the same system I was forced to spend a large part of my youth in. Are there any better practically available alternatives?
  • What about socialization? School is quite poor at this, yet unilaterally removing one kid would probably make them even worse off. (Since presumably all other kids their age are still at school.)
  • As an adult, what actually useful methods of learning exist? I learned the vast majority of my useful knowledge through autodidactism, everything else (school, university) is pretty much noise. I would be open to alternatives, but I haven't seen any kind of "teaching" so far that came anywhere close.

Comment by β-redex (GregK) on Is ChatGPT rigth when advising to brush the tongue when brushing teeth? · 2022-12-03T04:55:23.506Z · LW · GW

ability to iterate in a fast matter

This is probably key. If GPT can solve something much faster that's indeed a win. (With the SPARQL example I guess it would take me 10-20 minutes to look up the required syntax and fields, and put them together. GPT cuts that down to a few seconds, this seems quite good.)

My issue is that I haven't found a situation yet where GPT is reliably helpful for me. Maybe someone who has found such situations, and reliably integrated "ask GPT first" as a step into some of their workflows could give their account? I would genuinely be curious about practical ways people found to use these models.

My experience has been quite bad so far unfortunately. For example I tried to throw a problem at it that I was pretty sure didn't have an easy solution, but I just wanted to check that I didn't miss anything obvious. The answer I would expect in this case is "I don't know of any easy solution", but instead I got pages of hallucinated BS. This is worse than if I just hadn't asked GPT at all since now I have to waste my time reading through its long answers just to realize it's complete BS.

Comment by β-redex (GregK) on Is ChatGPT rigth when advising to brush the tongue when brushing teeth? · 2022-12-02T18:46:47.995Z · LW · GW

Yeah I guess many programming problems fall into the "easy to verify" category. (Though definitely not all.)

Comment by β-redex (GregK) on Is ChatGPT rigth when advising to brush the tongue when brushing teeth? · 2022-12-02T15:12:18.603Z · LW · GW

And apparently ChatGPT will shut you right down when attempting to ask for sources:

I'm sorry, but I am unable to provide sources for my claims as I am a large language model trained by OpenAI and do not have the ability to browse the internet. My answers are based on the information I have been trained on, but I cannot provide references or citations for the information I provide.

So... if you have to rigorously fact-check everything the AI tells you, how exactly is it better than just researching things without the AI in the first place? (I guess you need a domain where ChatGPT has adequate knowledge and claims in said domain are easily verifiable?)

Comment by β-redex (GregK) on Did ChatGPT just gaslight me? · 2022-12-02T14:44:51.003Z · LW · GW

Wow had this happen literally on my first interaction with ChatGPT. It seems to be just making stuff up, and won't back down when called out.

  • ChatGPT: "[...] run coqc --extract %{deps} --ocaml-script %{targets} [...]"
  • Me: "coqc does not have an --extract flag. (At least not on my machine, I have coq version 8.16.0)"
  • ChatGPT: "[...] You are correct, the --extract flag was added to the coqc command in Coq version 8.17.0. [...] Another option would be to use the coq-extract-ocaml utility, which is included with Coq [...]"
  • Me: "Coq 8.17.0 does not exist yet. There is no coq-extract-ocaml utility either. I am sorry, but are you just making stuff up? I would much prefer that you just said so if you can't answer something."
  • ChatGPT:

[...] In this case, I mentioned Coq version 8.17.0 and the coq-extract-ocaml utility because they were relevant to the question that was asked, but I did not intend to imply that they actually exist or are currently available. My answer was based on the information that I have been trained on, which does not include any developments or changes that have occurred since my knowledge cutoff in 2021. [...]

(emphasis mine)

Comment by β-redex (GregK) on What videos should Rational Animations make? · 2022-11-26T23:50:47.165Z · LW · GW

The Sequences. Surprised nobody mentioned this one yet.

While I am pretty sure you can't compress the length of the sequences much without losing any valuable information, the fact is that for most people it's just way too long to ever read through, and having some easily digestible video material would still be quite valuable. (Hopefully also by getting some people interested in reading the real thing?)

Turning the sequences into a set of videos would be a massive distillation job. On the high level it would ideally be something like:

  1. Extract the set of important ideas the sequences convey. Identify the necessary dependencies between them.
  2. Start turning the ideas into videos in topological order. (Each video should link the relevant posts for further reading.)
  3. ... Profit?
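Steps 1 and 2 above can be sketched with Python's standard `graphlib` module. The idea names and the dependencies between them are invented placeholders, not an actual analysis of the Sequences.

```python
from graphlib import TopologicalSorter

# Hypothetical ideas, each mapped to the set of ideas it depends on.
prerequisites = {
    "map_and_territory": set(),
    "bayes_theorem": set(),
    "belief_in_belief": {"map_and_territory"},
    "updating_on_evidence": {"map_and_territory", "bayes_theorem"},
}

# static_order() yields every idea after all of its prerequisites,
# and raises graphlib.CycleError if the dependencies are circular.
video_order = list(TopologicalSorter(prerequisites).static_order())
```

Any order produced this way guarantees that a video never relies on an idea that has not been covered yet; a topological order is usually not unique, so the remaining freedom could be used to put the most accessible videos first.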

Would making these videos be optimal in some sense? I don't know. Is trying to create more rationalists a good idea? Eliezer wrote the sequences with the express intent of creating more rationalists to help reduce AI risk. Is this still relevant? Maybe. AFAIK many people think that alignment is currently bottlenecked on good researchers. (Of course in this framing many other alignment relevant technical topics also make sense as video ideas.)

Comment by β-redex (GregK) on Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue) · 2022-11-23T00:41:20.738Z · LW · GW

I don't know anything about Diplomacy and I just watched this video, could someone expand a bit on why this game is a particularly alarming capability gain? The chat logs seemed pretty tame, the bot didn't even seem to attempt psychological manipulation or gaslighting or anything similar. What important real world capability does Diplomacy translate into that other games don't? (People for instance don't seem very alarmed nowadays about AI being vastly superhuman at chess or Go.)

Comment by β-redex (GregK) on Why I'm Working On Model Agnostic Interpretability · 2022-11-13T00:54:47.375Z · LW · GW

I think we usually don't generalize very far not because we don't have general models, but because it's very hard to state any useful properties about very general models.

You can trivially view any model/agent as a Turing machine, without loss of generality.[1] We just usually don't do that because it's very hard to state anything useful about such a general model of computation. (It seems very hard to prove/disprove P=NP, we know for a fact that halting is undecidable, etc.)

I am very interested though what model John will use to state useful theorems that capture both the current DL paradigm, and the next paradigm with high probability. (He might have written about this somewhere already, haven't read all his stuff yet.)


  1. Assuming determinism, but OP's black-box interpretability stuff already seems to assume that. ↩︎

Comment by β-redex (GregK) on I Converted Book I of The Sequences Into A Zoomer-Readable Format · 2022-11-10T14:36:31.704Z · LW · GW

As others said here kudos for the effort, but this iteration seems horrible to me.

When I was reading the Sequences I often had to go back and reread a sentence/paragraph/even page to fully understand everything. I also had to stop sometimes to really deeply think about the ideas (or just appreciate their beauty). I feel the text has low redundancy and assumes that you can go back and reread if you missed something (would be strange if it didn't), and is not directly suitable for a video format.

I tried to watch some of the clips, but it is just waay too fast for me.

I am afraid that much greater effort is required to turn material like this into a video format, like what Rational Animations is doing.

Comment by β-redex (GregK) on K-types vs T-types — what priors do you have? · 2022-11-04T16:00:04.870Z · LW · GW

I see, with that mapping your original paragraph makes sense.

Just want to note though that such a mapping is quite weird and I don't really see a mathematical justification behind it. I only know of the Curry-Howard isomorphism as a way to translate between proof theory and computer science, and it maps programs to proofs, not to axioms.

Comment by β-redex (GregK) on K-types vs T-types — what priors do you have? · 2022-11-04T01:23:32.382Z · LW · GW

We can also interpret this in proof theory. K-types don't care how many steps there are in the proof, they only care about the number of axioms used in the proof. T-types do care how many steps there are in the proof, whether those steps are axioms or inferences.

I don't get how you apply this in proof theory. If K-types want to minimize the Kolmogorov-complexity of things, wouldn't they be the ones caring about the description length of the proof? How do axioms incur any significant description length penalty? (Axioms are usually much shorter to describe than proofs, because you of course only have to state the proposition and not any proof.)

Comment by β-redex (GregK) on What is good Cyber Security Advice? · 2022-10-25T18:25:07.943Z · LW · GW

Yeah, I know you are looking for more practical advice here, that's why I posted this as a comment instead of an answer.

Eventually someone will have to aim for the "Excellent" level though (even if not against humans, surely against an AGI), and I just wanted to highlight that this is very much an unsolved problem.

Comment by β-redex (GregK) on What is good Cyber Security Advice? · 2022-10-25T18:05:48.667Z · LW · GW

In my view the field of cybersecurity currently is very far from what "theoretically perfect security" would look like. I am not sure how much ahead private knowledge is on the topic, but publicly cybersecurity seems to focus on defending against security holes already demonstrated to be exploitable, and providing some probabilistic defense against some other ones as well. This seems to work well in practice, I don't know why though. (Maybe highly motivated threat actors with sufficient resources simply don't exist?)

Conventional approaches work well if your adversary is limited, but Eliezer gives good arguments for why alignment orgs should seriously beef up security.

If you take this to the extreme and allow an unrestricted attacker (like an AGI), you pretty much turn security into the builder-breaker game. Let me demonstrate what I mean by playing it against myself:

  • B: We have a private GitHub project.
  • A: You don't know anything about GitHub, I could easily just hack them or have an insider at Microsoft.
  • B: Okay, we will self host a GitLab instance.
  • A: Where are you hosting it, Azure? Remember, I have an insider at MS. (And at every other cloud provider as well.)
  • B: Okay, we will buy our own server, stick it in a colocated data center.
  • A: How are you buying the server? I have active surveillance on you and can tamper with any package you receive. I will install a hardware backdoor in the server.
  • B: Fine, I will buy a Raspberry Pi, and carefully inspect it for any hardware modifications, and I will compare every trace on the PCB to the schematics.
  • A: I have an insider at Broadcom giving me access to their hardware backdoor built into all of their SoCs.
  • B: Hm... ok, we will use a RISC-V SoC instead that is completely open source.
  • A: Back to the software side: GitLab is a large piece of software, you don't seriously believe that it doesn't have any 0-days?
  • B: True. Well GitLab is far too large to properly audit, so I will just sandbox the whole thing. I will put it in its own container, and only allow it to be accessed over our VPN.
  • A: Hm... well I can backdoor GitLab itself as well. I will coerce the maintainer of some obscure Go package that GitLab depends on to add my backdoor.
  • B: Well, but I can just restrict internet access to the GitLab container, what good is your backdoor then?
  • A: Right, right. Well... have you heard that CPUs can be used as radio transmitters?
  • B: Easy, just wrap the whole server in aluminium foil. Anything else?
  • A: We are not done with software yet. Linux has remotely exploitable 0-days as well. And good luck auditing the whole Linux codebase.
  • B: Okay, we will switch to Xen. (Or somehow turn seL4 into a hypervisor.) One domain runs GitLab with Linux without network access, another our custom audited ring 0 WireGuard stack.
  • A: Fine, let's assume for now that your software is secure. So have you heard of this company called TSMC that produces most of the world's chips...

And so on. Just like for alignment, I don't really know if this game can be won by the builder.

Comment by β-redex (GregK) on Have you noticed any ways that rationalists differ? [Brainstorming session] · 2022-10-24T12:05:59.110Z · LW · GW

probably much of what makes rationalists so male is that rationalism selects for abilities/interests related to programming, which is itself very male-skewed

This is just pushing the question one step back though; I don't know of any good theories for why software engineering is heavily biased towards males either.

Comment by β-redex (GregK) on Have you noticed any ways that rationalists differ? [Brainstorming session] · 2022-10-23T13:02:02.850Z · LW · GW

One thing that annoys me with "normal" people is their inability to easily talk about the meta level of a particular topic. I feel like if I start talking about something meta some people get internally confused a bit, and instead of asking for clarification they will interpret some parts of what I said at the object level, discard the rest, and continue the conversation as if nothing happened.

Sure, you can talk about meta topics with most people with enough effort: you can try carefully prompting them (like "so what I am going to say may sound strange, I am not talking about X things by themselves, but I am saying something about all Xs in general, disregarding this particular example of X we have just been talking about, does that make sense?"), and if they have previously been exposed to meta-level discussion on this topic, that also makes things much easier.

I feel like most rationalists can jump between object and meta level with ease, and I particularly enjoy conversing with people who can do this.

Comment by β-redex (GregK) on My search for a reliable breakfast · 2022-10-20T16:22:45.715Z · LW · GW

food for me is fuel

https://powersmoothie.org/ maybe? It embraces this view. The cleanup consists of rinsing a single blender.

Comment by β-redex (GregK) on Transformative VR Is Likely Coming Soon · 2022-10-14T00:54:12.678Z · LW · GW

I haven't looked too much into that paper, but yeah, it could be that at smaller scales you are just perceiving the friction.

But just from personal experience, I am pretty confident that at least in the 10μm-100μm range humans have high fidelity tactile perception, and are able to distinguish various patterns and not just friction.

Comment by β-redex (GregK) on Transformative VR Is Likely Coming Soon · 2022-10-13T20:31:26.499Z · LW · GW

One big item I see missing is haptic feedback. Like, if I ask myself in what ways is VR still different from regular reality, I feel like there is still a lot missing.[1]

I think working with physical objects is a big component of activities that can't be done remotely currently. But even if we just focus on interpersonal communication, being able to touch other people is an important component. Even if we are just talking about a strictly formal business context, handshakes at least still almost always occur.

And I just don't see high fidelity haptic feedback getting sufficiently advanced in the near future. Interaction even with just macro scale objects is still a major challenge. And the human touch is remarkably sensitive at the micro scale as well. "The lowest amplitude of the wrinkles so distinguished was approximately 10 nm, demonstrating that human tactile discrimination extends to the nanoscale."[2] There is even research suggesting that we can sometimes detect single atom differences.[3]


  1. Admittedly though, a VR that is indistinguishable from reality would be very creepy. ↩︎

  2. https://doi.org/10.1038/srep02617 ↩︎

  3. https://doi.org/10.1039/D1SM00451D ↩︎

Comment by β-redex (GregK) on Petrov Day Retrospective: 2022 · 2022-09-29T15:00:40.601Z · LW · GW

To me it seems that the average best practices are being followed.[1] But these "best practices" are still just a bunch of band-aids, which happen to work fairly well for most use-cases.

A much more interesting question to ask here is what if something important like ... humanity's survival depended on your software? It seems that software correctness will be quite important for alignment. Yet I see very few people seriously trying to make creating correct software scalable. (And it seems like a field particularly suited for empirical work, unlike alignment. I mean, just throw your wildest ideas at a proof checker, and see what sticks. After you have a proof, it doesn't matter at all how it was obtained.)
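As a small illustration of that last point, here is a minimal Lean 4 sketch (the theorem names are made up for this example). The same statement is proved once with a hand-written term and once by letting a decision procedure search for the proof; the checker accepts both, and nothing downstream depends on which route produced it.

```lean
-- The same statement, proved in two different ways.
-- The kernel only checks the final proof; it does not care how it was found.

-- Proof by direct computation: both sides reduce to the same numeral.
theorem two_plus_two_by_rfl : 2 + 2 = 4 := rfl

-- Proof found by the `decide` tactic, which evaluates the decidable
-- proposition `2 + 2 = 4` instead of having a human write the term.
theorem two_plus_two_by_decide : 2 + 2 = 4 := by decide

-- A general statement, closed by citing the core library lemma `Nat.add_comm`.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```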


  1. And I think the amount of effort in this case is perfectly justified. I mean this was code for a one-off single day event, nothing mission critical. It would be unreasonable to expect much more for something like this. ↩︎

Comment by β-redex (GregK) on Petrov Day Retrospective: 2022 · 2022-09-29T00:45:30.054Z · LW · GW

Yeah, makes sense. Indeed such tech debt can't be fixed overnight.

Comment by β-redex (GregK) on Petrov Day Retrospective: 2022 · 2022-09-29T00:39:44.904Z · LW · GW

I see, makes sense.

On the other hand, I am afraid this reinforces NaiveTortoise's point: this seems like an underlying issue that could potentially lead to bugs much worse than this...