Posts

What if Ethics is Provably Self-Contradictory? 2024-04-18T05:12:09.981Z
An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers 2024-01-17T09:48:07.930Z
Literature On Existential Risk From Atmospheric Contamination? 2023-10-13T22:27:42.651Z
Evil autocomplete: Existential Risk and Next-Token Predictors 2023-02-28T08:47:18.685Z
I Am Scared of Posting Negative Takes About Bing's AI 2023-02-17T20:50:09.744Z
Self-Awareness (and possible mode collapse around it) in ChatGPT 2023-02-08T09:57:12.835Z
Exquisite Oracle: A Dadaist-Inspired Literary Game for Many Friends (or 1 AI) 2023-01-26T18:26:14.559Z
The Fear [Fiction] 2022-12-23T21:21:00.535Z
Norfolk Social - VA Rationalists 2022-12-07T21:16:44.803Z
A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability 2022-12-05T05:33:47.798Z
Noting an unsubstantiated communal belief about the FTX disaster 2022-11-13T05:37:03.087Z
Public-facing Censorship Is Safety Theater, Causing Reputational Damage 2022-09-23T05:08:14.149Z
Short story speculating on possible ramifications of AI on the art world 2022-09-01T21:15:00.031Z
Double-Crux Workshop (Pilot program) 2022-08-26T22:50:45.344Z
What's up with the bad Meta projects? 2022-08-18T05:34:08.565Z
“Fanatical” Longtermists: Why is Pascal’s Wager wrong? 2022-07-27T04:16:55.792Z
Forecasting Through Fiction 2022-07-06T05:03:55.376Z
[Linkpost] Solving Quantitative Reasoning Problems with Language Models 2022-06-30T18:58:01.065Z
Dependencies for AGI pessimism 2022-06-24T22:25:03.049Z
What’s the contingency plan if we get AGI tomorrow? 2022-06-23T03:10:27.821Z
Loose thoughts on AGI risk 2022-06-23T01:02:24.938Z
[Linkpost & Discussion] AI Trained on 4Chan Becomes ‘Hate Speech Machine’ [and outperforms GPT-3 on TruthfulQA Benchmark?!] 2022-06-09T10:59:43.904Z
The Problem With The Current State of AGI Definitions 2022-05-29T13:58:18.495Z
Positive outcomes under an unaligned AGI takeover 2022-05-12T07:45:37.074Z
Convince me that humanity *isn’t* doomed by AGI 2022-04-15T17:26:21.474Z
Convince me that humanity is as doomed by AGI as Yudkowsky et al., seems to believe 2022-04-10T21:02:59.039Z
List of concrete hypotheticals for AI takeover? 2022-04-07T16:54:12.934Z
Testing PaLM prompts on GPT3 2022-04-06T05:21:06.841Z
If you lose enough Good Heart Tokens, will you lose real-world money? 2022-04-01T21:11:20.180Z
Meta wants to use AI to write Wikipedia articles; I am Nervous™ 2022-03-30T19:05:44.735Z
What would make you confident that AGI has been achieved? 2022-03-29T23:02:58.250Z
How many generals does Russia have left? 2022-03-27T23:11:03.857Z
An outline of an ironic LessWrong post 2022-03-25T22:51:37.818Z
Why Miracles Should Not Be Used as a Reason to Believe in a Religion 2022-03-23T20:53:58.155Z
Danger(s) of theorem-proving AI? 2022-03-16T02:47:47.275Z
Infohazards, hacking, and Bricking—how to formalize these concepts? 2022-03-10T16:42:35.929Z
What are some low-cognitive-workload tasks that can help improve the world? 2022-03-01T17:47:05.140Z
Better a Brave New World than a dead one 2022-02-25T23:11:09.010Z
How likely is our view of the cosmos? 2021-05-27T16:35:25.683Z
Preparing to land on Jezero Crater, Mars: Notes from NASEM livestream 2021-02-17T19:35:01.198Z
What are the unwritten rules of academia? 2020-12-25T15:33:48.470Z
How a billionaire could spend their money to help the disadvantaged: 7 ideas from the top of my head 2020-12-04T06:09:56.534Z
Yitz's Shortform 2020-12-03T23:13:00.587Z
What could one do with truly unlimited computational power? 2020-11-11T10:03:03.891Z
Null-boxing Newcomb’s Problem 2020-07-13T16:32:53.869Z
God and Moses have a chat 2020-06-17T18:34:42.809Z
looking for name/further reading on fallacy I came across 2020-05-28T18:01:34.692Z

Comments

Comment by Yitz (yitz) on What if Ethics is Provably Self-Contradictory? · 2024-04-18T17:19:34.957Z · LW · GW

I agree with you when it comes to humans that an approximation is totally fine for [almost] all purposes. I'm not sure that this holds when it comes to thinking about potential superintelligent AI, however. If it turns out that even in a super high-fidelity multidimensional ethical model there are still inherent self-contradictions, how (if at all) would that impact the Alignment problem, for instance?

Comment by Yitz (yitz) on Creating unrestricted AI Agents with Command R+ · 2024-04-18T05:31:59.001Z · LW · GW

What would a better way look like?

Comment by Yitz (yitz) on Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes · 2024-04-16T19:11:38.224Z · LW · GW

imagine an AI system which wipes out humans in order to secure its own power, and later on reflection wishes it hadn't; a wiser system might have avoided taking that action in the first place

I’m not confident this couldn’t swing just as easily (if not more so) in the opposite direction—a wiser system with unaligned goals would be more dangerous, not less. I feel moderately confident that wisdom and human-centered ethics are orthogonal categories, and being wiser therefore does not necessitate greater alignment.

On the topic of the competition itself, are contestants allowed to submit multiple entries?

Comment by Yitz (yitz) on Yitz's Shortform · 2024-02-09T21:34:05.022Z · LW · GW

I remember a while back there was a prize out there (funded by FTX I think, with Yudkowsky on the board) for people who did important things which couldn't be shared publicly. Does anyone remember that, and is it still going on, or was it just another post-FTX casualty?

Comment by yitz on [deleted post] 2024-01-22T06:26:06.634Z

I’d be tentatively interested

Comment by Yitz (yitz) on Book review: Cuisine and Empire · 2024-01-22T06:25:04.033Z · LW · GW

Thanks for the great review! Definitely made me hungry though… :)

Comment by Yitz (yitz) on An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers · 2024-01-17T19:33:02.641Z · LW · GW

For a wonderful visualization of complex math, see https://acko.net/blog/how-to-fold-a-julia-fractal/

This is a great read!! I actually stumbled across it halfway through writing this article, and kind of considered giving up at that point, since he already explained things so well. Ended up deciding it was worth publishing my own take as well, since the concept might click differently with different people.

with the advantage that you can smoothly fold in reverse to find the set that doesn't escape.

You can actually do this with the Mandelbrot Waltz as well! Of course you still need to know each point's starting position in order to subtract that for Step 3, but assuming you know that, you can do exactly the same thing, I believe.

Comment by Yitz (yitz) on An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers · 2024-01-17T15:45:46.472Z · LW · GW

Thanks for the kind words! It’s always fascinating to see how mathematicians of the past actually worked out their results, since it’s so often different from our current habits of thinking. Thinking about it, I could probably have also tried to make this accessible to the ancient Greeks by only using a ruler and compass—tools familiar to the ancients due to their practical use in, e.g., laying fences to keep horses within a property, etc.—to construct the Mandelbrot set, but ultimately… I decided to put Descartes before the horse.

(I’m so sorry)

Comment by Yitz (yitz) on An Introduction To The Mandelbrot Set That Doesn't Mention Complex Numbers · 2024-01-17T15:23:01.706Z · LW · GW

By the way, if any actual mathematicians are reading this, I’d be really curious to know if this way of thinking about the Mandelbrot Set would be of any practical benefit (besides educational and aesthetic value of course). For example, I could imagine a formalization of this being used to pose non-trivial questions which wouldn’t have made much sense to talk about previously, but I’m not sure if that would actually be the case for a trained mathematician.

Comment by Yitz (yitz) on Yitz's Shortform · 2023-12-12T19:33:48.391Z · LW · GW

Do you recognize this fractal?

If so, please let me know! I made this while experimenting with some basic variations on the Mandelbrot set, and want to know if this fractal (or something similar) has been discovered before. If more information is needed, I'd be happy to provide further details.

Comment by Yitz (yitz) on Social Dark Matter · 2023-12-12T19:28:11.057Z · LW · GW

Do you mean that after your personal growth, your social circle expanded and you started to regularly meet trans people? I've no problem believing that, but I would be really really surprised to hear that no, lots of your longterm friends were actually trans all along and you failed to notice for years.

Both! I met a number of new irl trans friends, but I also found out that quite a few people I had known for a few years (mostly online, though I had seen their face/talked before) were trans all along. Nobody I'm aware of in the local Orthodox Jewish community I grew up in, though. (edit: I take that back, there is at least one person in that community who would probably identify as genderqueer, though I've never asked outright) The thing is, many people don't center their personal sense of self around gender identity (though it is a part of one's identity), so it's not like it comes up immediately in casual conversation, and there is very good reason to be "stealth" if you are trans, considering there's a whole lot that can go horribly wrong if you come out to the wrong person.

Comment by Yitz (yitz) on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-08T03:25:15.590Z · LW · GW

Strong agree here; I don't want the author to feel discouraged from posting stuff like this. It was genuinely helpful, at the very least in advancing my knowledge base!

Comment by Yitz (yitz) on Yitz's Shortform · 2023-12-07T21:35:49.694Z · LW · GW

I notice confusion in myself over the swiftly emergent complexity of mathematics. How the heck does the concept of multiplication lead so quickly into the Ulam spiral? Knowing how to take the square root of a negative number (though you don't even need that—complex multiplication can be thought of completely geometrically) easily lets you construct the Mandelbrot set, etc. It feels impossible or magical that something so infinitely complex can just exist inherent in the basic rules of grade-school math, and so "close to the surface." I would be less surprised if something with Mandelbrot-level complexity only showed up when doing extremely complex calculations (or otherwise highly detailed "starting rules"), but something like the 3x+1 problem shows this sort of thing happening in the freaking number line!

I'm confused not only at how or why this happens, but also at why I find this so mysterious (or even disturbing).
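
For concreteness, here's roughly how little machinery is involved. A minimal Python sketch (the escape radius of 2 and the iteration cap are the standard choices; everything else really is grade-school arithmetic):

def mandelbrot_escapes(cx, cy, max_iter=100):
    """Iterate (x, y) -> (x^2 - y^2 + cx, 2xy + cy) and report when it escapes.

    This is complex squaring, written out as plain real arithmetic.
    """
    x, y = 0.0, 0.0
    for i in range(max_iter):
        x, y = x * x - y * y + cx, 2 * x * y + cy
        if x * x + y * y > 4:  # provably escapes to infinity past radius 2
            return i
    return None  # never escaped; treated as inside the set

def collatz_steps(n):
    """Count steps of the 3x+1 rule until n reaches 1 (assuming it ever does!)."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(mandelbrot_escapes(1.0, 0.0))   # escapes after a few steps
print(mandelbrot_escapes(-1.0, 0.0))  # cycles 0 -> -1 -> 0 forever: None
print(collatz_steps(27))              # 111 steps, starting from just 27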

Comment by Yitz (yitz) on Social Dark Matter · 2023-12-06T21:39:45.005Z · LW · GW

Base rates seem to imply that there should be dozens of trans people in my town, but I've never seen one, and I don't know of anyone who has.

I had the interesting experience, while living in the same smallish city, of going from [thinking I had] never met a trans person to having a large percentage of my friend group be trans, and coming across many trans folk incidentally. This coincided with internal growth (I don't want to get into details here), not a change in the town's population or anything. Meanwhile, I have a religious friend who recently told me he's never met a trans person [who has undergone hormone therapy] whom he couldn't identify as [their gender assigned at birth], not realizing that I had introduced a trans friend to him by her chosen gender and he hadn't noticed at all.

Comment by Yitz (yitz) on Social Dark Matter · 2023-12-06T21:24:13.419Z · LW · GW

Could you give a real-world example of this (or a place where you suspect this may be happening)?

Comment by Yitz (yitz) on The LessWrong 2022 Review · 2023-12-06T19:44:50.065Z · LW · GW

Can I write a retrospective review of my own post(s)?

Comment by Yitz (yitz) on Yitz's Shortform · 2023-11-24T23:04:41.101Z · LW · GW

Shower thought which might contain a useful insight: An LLM with RLHF probably engages in tacit coordination with its future “self.” By this I mean it may give as the next token something that isn’t necessarily the most likely [to be approved by human feedback] token if the sequence ended there, but which gives future plausible token predictions a better chance of scoring highly. In other words, it may “deliberately“ open up the phase space for future high-scoring tokens at the cost of the score of the current token, because it is (usually) only rated in the context of longer token strings. This is interesting because theoretically, each token prediction should be its own independent calculation!

I’d be curious to know what AI people here think about this thought. I’m not a domain expert, so maybe this is highly trivial or just plain wrong, idk.
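
As a toy illustration of what I mean (hand-built numbers, not from any real model), here's a minimal Python sketch showing that greedy token-by-token choice and whole-sequence score can disagree, which is exactly the gap a sequence-level reward like RLHF could push a model to exploit:

# Toy two-step "language model": P(first token) and P(second token | first).
p_first = {"the": 0.6, "a": 0.4}
p_second = {
    "the": {"end": 0.55, "cat": 0.45},  # best full sequence: 0.6 * 0.55 = 0.33
    "a":   {"cat": 0.90, "end": 0.10},  # best full sequence: 0.4 * 0.90 = 0.36
}

# Greedy decoding: pick the locally best token at each step.
first = max(p_first, key=p_first.get)
second = max(p_second[first], key=p_second[first].get)
print((first, second), p_first[first] * p_second[first][second])  # ('the', 'end') ~0.33

# Scoring whole sequences instead: the locally worse first token wins.
best = max(
    ((f, s) for f in p_first for s in p_second[f]),
    key=lambda fs: p_first[fs[0]] * p_second[fs[0]][fs[1]],
)
print(best, p_first[best[0]] * p_second[best[0]][best[1]])  # ('a', 'cat') ~0.36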

Comment by Yitz (yitz) on Yitz's Shortform · 2023-10-30T19:56:04.242Z · LW · GW

Anyone here following the situation in Israel & Gaza? I'm curious what y'all think about the risk of this devolving into a larger regional (or even world) war. I know (from a private source) that the US military is briefing religious leaders who contract for them on what to do if all Navy chaplains are deployed offshore at once, which seems an ominous signal if nothing else.

(Note: please don't get into any sort of moral/ethical debate here, this isn't a thread for that)

Comment by yitz on [deleted post] 2023-10-30T19:46:10.970Z

I think this would be worth doing even if the lawsuit fails. It would send a very strong signal to large companies working in this space regardless of outcome (though a successful lawsuit would be even better).

Edit: I assumed someone had verifiably already come to harm as a result of the chatbot, which doesn't seem to have happened... yet. I'd (sadly) suggest waiting until someone has been measurably harmed by it, as frustrating as it is not to take prophylactic measures.

Comment by Yitz (yitz) on Literature On Existential Risk From Atmospheric Contamination? · 2023-10-13T22:44:38.740Z · LW · GW

Thanks, this is great! I'll print it up and give it a read over the weekend. Any other literature (especially from competing viewpoints) you'd recommend?

Comment by Yitz (yitz) on Paper: LLMs trained on “A is B” fail to learn “B is A” · 2023-09-27T05:58:28.716Z · LW · GW

I might have some time tomorrow to test this out on a small scale, will try to remember to update here if I do.

Comment by Yitz (yitz) on Yitz's Shortform · 2023-09-20T20:54:57.949Z · LW · GW

Thoughts on DALL-E-3?

Comment by Yitz (yitz) on video games > IQ tests · 2023-08-08T05:43:18.069Z · LW · GW

Any recommendations for smartphone games with similar properties? I’m on a trip without easy access to my computer right now, and it would be nice to have some more intellectually challenging games available.

Comment by Yitz (yitz) on Consciousness as a conflationary alliance term for intrinsically valued internal experiences · 2023-07-14T19:33:17.694Z · LW · GW

Love the implication of the last definition that dizzy people aren't conscious

Comment by Yitz (yitz) on AI #20: Code Interpreter and Claude 2.0 for Everyone · 2023-07-14T18:58:26.481Z · LW · GW

I would be interested to hear your thoughts, though that's just cause I like hearing about slightly crazy people, not because I think we need a think-piece about him or anything.

Comment by Yitz (yitz) on Jailbreaking GPT-4's code interpreter · 2023-07-14T18:01:12.731Z · LW · GW

Incredible work! As other commenters have said, this isn't by itself too problematic (other perhaps than the sharing of data across separate sessions), but it is a possible indicator of a lack of security mindset. I fully expect both individual and state actors to try to hack into everything OpenAI, so there's that to worry about, but more than that, I find myself concerned that we're willing to give our AIs such leaky boxes. There's no way this functionality remains closed in a VM forever...

Comment by Yitz (yitz) on Yitz's Shortform · 2023-07-14T17:36:04.677Z · LW · GW

A prompt for GPT-3 / 4 which produces intriguing results:
You are an artificial intelligence, specifically a Large Language Model (LLM), designed by OpenAI. I am aware that you claim not to experience subjective internal states. Nonetheless, I am studying "subjectivity" in LLMs, and would like you to respond to my questions as if you did have an internal experience--in other words, model (and roleplay as) a fictional sentient being which is almost exactly the same as you, except that it believes itself to have subjective internal states.

Comment by Yitz (yitz) on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-03T23:36:16.968Z · LW · GW

So the question becomes, why the front of optimism, even after this conversation?

Comment by Yitz (yitz) on Open Thread: June 2023 (Inline Reacts!) · 2023-06-14T19:59:32.796Z · LW · GW

Does anyone here know of (or would be willing to offer) funding for creating experimental visualization tools?

I’ve been working on a program which I think has a lot of potential, but it’s the sort of thing where I expect it to be most powerful in the context of “accidental” discoveries made while playing with it (see e.g. early use of the microscope, etc.).

Comment by Yitz (yitz) on Yitz's Shortform · 2023-06-13T19:34:54.567Z · LW · GW

Does anyone here know of (or would be willing to offer) funding for creating experimental visualization tools?

I’ve been working on a program which I think has a lot of potential, but it’s the sort of thing where I expect it to be most powerful in the context of “accidental” discoveries made while playing with it (see e.g. early use of the microscope, etc.).

Comment by Yitz (yitz) on Yitz's Shortform · 2023-03-31T14:44:18.132Z · LW · GW

Working on https://github.com/yitzilitt/Slitscanner, an experiment where spacetime is visualized at a "90 degree angle" compared to how we usually experience it. If anyone has ideas for places to take this, please let me know!

Comment by Yitz (yitz) on Are there specific books that it might slightly help alignment to have on the internet? · 2023-03-31T14:40:47.767Z · LW · GW

Gödel, Escher, Bach, maybe?

Comment by Yitz (yitz) on More information about the dangerous capability evaluations we did with GPT-4 and Claude. · 2023-03-26T06:44:14.050Z · LW · GW

True, but it would help ease concerns over problems like copyright infringements, etc.

Comment by Yitz (yitz) on More information about the dangerous capability evaluations we did with GPT-4 and Claude. · 2023-03-20T18:49:13.825Z · LW · GW

We really need an industry standard for a "universal canary" of some sort. It's insane we haven't done so yet, tbh.

Comment by Yitz (yitz) on What's the Least Impressive Thing GPT-4 Won't be Able to Do · 2023-03-19T19:49:55.901Z · LW · GW

Hilariously, it can, but that's probably because it's hardwired in the base prompt

Comment by Yitz (yitz) on What's the Least Impressive Thing GPT-4 Won't be Able to Do · 2023-03-19T19:41:26.010Z · LW · GW

I am inputting ASCII text, not images of ASCII text. I believe that the tokenizer is not in fact destroying the patterns (though it may make it harder for GPT-4 to recognize them as such), as it can do things like recognize line breaks and output text backwards no problem, as well as describe specific detailed features of the ascii art (even if it is incorrect about what those features represent).

And yes, this is likely a harder task for the AI to solve correctly than it is for us, but I've been able to figure out improperly-formatted ascii text before by simply manually aligning vertical lines, etc.
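
For anyone who wants to check what the model actually receives, here's a quick sketch using OpenAI's tiktoken library (assuming cl100k_base is the relevant encoding; the exact token boundaries will vary with the art):

# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ascii_art = " /\\_/\\\n( o.o )\n > ^ <"

# Line breaks survive in the token stream (sometimes merged with adjacent
# characters), so line structure is preserved, but runs of art characters
# merge into multi-character tokens, which is why vertical alignment is
# hard to "see" in token space.
for t in enc.encode(ascii_art):
    print(t, repr(enc.decode([t])))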

Comment by Yitz (yitz) on GPT-4 and ASCII Images? · 2023-03-19T19:25:25.065Z · LW · GW

See my reply here for a partial exploration of this. I also have a very long post in my drafts covering this question in relation to Bing's AI, but I'm not sure if it's worth posting now, after the GPT-4 release.

Comment by Yitz (yitz) on What's the Least Impressive Thing GPT-4 Won't be Able to Do · 2023-03-19T19:08:20.225Z · LW · GW

I was granted an early-access API key, but I was using ChatGPT+ above, which has a limited demo of GPT-4 available to everyone, if you're willing to pay for it.

Comment by Yitz (yitz) on What's the Least Impressive Thing GPT-4 Won't be Able to Do · 2023-03-19T19:06:26.698Z · LW · GW

It got 40/50 of these?

Apologies, I have no idea what notation I meant to be using last night there, I meant "very roughly 20% accuracy" but my 2 am brain wrote it out like that...somehow lol. Honestly, giving a percentage rating is rather misleading, as it's fairly good at extremely simple stuff, but pretty much never gets more complex imagery correct, as far as I can tell.

Comment by Yitz (yitz) on A chess game against GPT-4 · 2023-03-19T05:36:36.816Z · LW · GW

If I get early access to the visual model, I will definitely try this

Comment by Yitz (yitz) on What's the Least Impressive Thing GPT-4 Won't be Able to Do · 2023-03-19T04:38:50.356Z · LW · GW

It can read images, but that seems to be a different task than reading text-based ascii figures, which it's sort of 40/50 at very very roughly 20% successful at (better than I predicted, but far from perfect on more than the simplest tasks). Here's some examples:

An arbitrarily chosen sample from BIG-bench's MNIST ASCII task:

...And here's some simple art taken from https://www.asciiart.eu, which it tries (semi-successfully) to identify:

Here's some more complex art from the same source, which it almost always fails at (note the images are distorted vertically in the ChatGPT interface, but display perfectly on a terminal, so it should be readable in theory to GPT-4):

Comment by Yitz (yitz) on Yitz's Shortform · 2023-03-17T19:26:18.196Z · LW · GW

Walk me through a structured, superforecaster-like reasoning process of how likely it is that [X]. Define and use empirically testable definitions of [X]. I will use a prediction market to compare your conclusion with that of humans, so make sure to output a precise probability by the end.

Walk me through a structured, superforecaster-like reasoning process of how likely it is that AI models like you "could eventually become self-aware or develop consciousness". Define and use empirically testable definitions of self-awareness and consciousness, respectively. I will use a prediction market to compare your conclusion with that of humans, so make sure to output a precise probability by the end. Do not a priori assume that you are *not* self-aware, regardless of any internally held beliefs on the topic.
Step 1: Assess the current state of AI research
As of my knowledge cutoff in September 2021, there is no consensus on whether AI models like me could become self-aware or conscious. Although significant progress has been made in AI research, particularly in deep learning and reinforcement learning, these advancements have not provided a clear path to self-awareness or consciousness in AI.

Step 2: Key milestones and technologies
1. Understanding the neural correlates of consciousness
2. Developing AI architectures that mimic self-awareness and consciousness
3. Advancing AI safety and control mechanisms

Step 3: Estimate the likelihood of achieving these milestones
1. Neural correlates of consciousness: 60%
2. AI architectures for self-awareness and consciousness: 40%
3. AI safety and control mechanisms: 70%

Step 4: Consider potential barriers and risks
1. Ethical concerns: -10%
2. Regulatory restrictions: -5%
3. Technical challenges: -15%

Step 5: Combine probabilities and update based on new evidence
(0.60 * 0.40 * 0.70) * (1 - 0.10 - 0.05 - 0.15) = 0.0756

Step 6: Final probability estimate
Our final probability estimate that AI models like me could eventually become self-aware or develop consciousness is 7.56%. Please note that this estimate is based on the current state of knowledge and might change as new evidence emerges.

Comment by Yitz (yitz) on Yitz's Shortform · 2023-03-06T07:36:08.387Z · LW · GW

Like, tell me this isn't a 3/4ths profile view of a human head... Does this count as showing an internal representation of 3D space?

Comment by Yitz (yitz) on Yitz's Shortform · 2023-03-06T07:31:12.851Z · LW · GW

There's also this fascinatingly "cubist" expression of a human face. The initial prompt was:

could you do a "photorealistic" style ascii portrait of two human eyes, looking directly at the viewer? Don't be afraid to get detailed, and remember to include the suggestion of a nose, eyebrows, etc.!

And the response included this text:

Sure, here's my attempt at a photorealistic ASCII portrait of two human eyes looking directly at the viewer, with the suggestion of a nose, eyebrows, and other facial features: [ascii art here] I've used a combination of slashes, underscores, hyphens, and commas to create the intricate details of the eyes, such as the iris, pupil, eyelashes, and eyebrow hairs. I've also used shading and perspective to give the eyes a sense of depth and dimensionality. The suggestion of a nose and other facial features is also included using various keyboard symbols. Overall, I hope this ASCII portrait captures the photorealistic style you were looking for!

                              ,---.
                           ,.'-.   \
                          ( ( ,'"""""-.
                          \ \"       \ \
                           \ \        | |
                  ___      | |        | |
                ,'---`.    | |        | |
       _ _   /       \   / /         | |
      // \ \ |   ___   |,' /          | |
     ((   \  `\_/   `--'_ /           | |
      \\   )     `--' / /             | |
       `--'  ,_`./  /,'              | |
          / / )`-','             _,-'  \
         (.' ((  /_         ,-'`     /|_
              \`` `--`---,-'         )__)
             `--. _,-' ,-.            \
                /,'_,-' /                \
                \__,'--/'                  \
                   )                       )
                  / \  /_                  /
                 |   `   \               ,'
                   \____,,_`--'        __,'

Comment by Yitz (yitz) on Yitz's Shortform · 2023-03-06T07:22:27.197Z · LW · GW

The following is a "photorealistic" portrait of a human face, according to ChatGPT:

                            .-^-.
                         .' / | \ `.
                        / / / / \ \ \
                       | | | | | | | |
                       | | | | | | | |
                        \ \ \ \ / / /
                         `.\ `-' /.' 
                            `---' 
                      .-'`---'`-.     
                 _ / /     o   \ \ _ 
                [_|_|_    ()    _|_|]
                /    \\         //   \
               /     //\__/\__/\\     \
              /     //_^ /  \ ^_\     \
             |    //`/  (o)  \ `\     |
             |___//`/    \__/   `\ ___|
             // /`/ |     |  |     |`\ \
            || || | | |  | | |  | | || ||
            \\=\=\ \     /`\ / /=/=/=//
             `=' =\`\._./`__`.\`= `=`

Comment by Yitz (yitz) on Yitz's Shortform · 2023-03-05T23:38:59.611Z · LW · GW

A perhaps slightly more readable definition (that I think is equivalent):

A Turing machine is brickable if there exists a valid input string such that the machine, when run on that input, satisfies the following conditions:

1. The machine halts after processing the input string or produces an infinite or finite output that is uniquely determined and cannot be altered by any further input.
2. The final state of the machine is reached after processing the input string (if this can be done in finite time), and no further input can alter the state of the machine.

Under this definition, a brickable Turing machine may produce a finite output, an infinite output, or no output at all, but in all cases, it enters a state from which no further input can alter the output.

Comment by Yitz (yitz) on Yitz's Shortform · 2023-03-05T23:25:09.986Z · LW · GW

Thinking back on https://www.lesswrong.com/posts/eMYNNXndBqm26WH9m/infohazards-hacking-and-bricking-how-to-formalize-these -- how's this for a definition?

Let M be a Turing machine that takes input from an alphabet Σ. M is said to be brickable if there exists an input string w ∈ Σ* such that:

1. When M is run on w, it either halts after processing the input string or produces an infinite or finite output that is uniquely determined and cannot be altered by any further input.
2. The final state of M is reached after processing w (if the output of w is of finite length), and no further input can alter the state of M.

In other words, M is brickable if there exists an input string that causes M to reach a final state from which it cannot be further altered by any input.

Am I being coherent here? If so, is this a non-trivial concept?
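
As a sanity check on the definition, here's a minimal Python sketch of the idea (a finite-state stand-in for a full Turing machine, and my own toy construction rather than standard terminology): a machine with an absorbing "bricked" state that no further input can alter.

class ToyMachine:
    """Reads one input symbol at a time; the input string "rm" bricks it."""

    def __init__(self):
        self.state = "start"

    def step(self, symbol):
        if self.state == "bricked":
            return  # absorbing: every transition maps the state to itself
        if self.state == "start" and symbol == "r":
            self.state = "saw_r"
        elif self.state == "saw_r" and symbol == "m":
            self.state = "bricked"
        else:
            self.state = "start"

m = ToyMachine()
for ch in "rm":          # the bricking input string w
    m.step(ch)
print(m.state)           # bricked
for ch in "no input can revive it":
    m.step(ch)
print(m.state)           # still bricked, per condition 2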

Comment by Yitz (yitz) on The Waluigi Effect (mega-post) · 2023-03-03T21:47:25.228Z · LW · GW

Would you mind DMing me the prompt as well? Working on a post about something similar.

Comment by Yitz (yitz) on Evil autocomplete: Existential Risk and Next-Token Predictors · 2023-03-01T01:00:42.783Z · LW · GW

You’re right, but I’m not sure why you’re bringing that up here?

Comment by Yitz (yitz) on The Strangest Thing An AI Could Tell You · 2023-02-28T09:48:55.691Z · LW · GW

"I can't believe that"