Posts

"How could I have thought that faster?" 2024-03-11T10:56:17.884Z
Dual Wielding Kindle Scribes 2024-02-21T17:17:58.743Z
[Repost] The Copenhagen Interpretation of Ethics 2024-01-25T15:20:08.162Z
EPUBs of MIRI Blog Archives and selected LW Sequences 2023-10-26T14:17:11.538Z
An EPUB of Arbital's AI Alignment section 2023-10-16T19:36:29.109Z
[outdated] My current theory of change to mitigate existential risk by misaligned ASI 2023-05-21T13:46:06.570Z
mesaoptimizer's Shortform 2023-02-14T11:33:14.128Z

Comments

Comment by mesaoptimizer on Lucie Philippon's Shortform · 2024-04-24T23:25:25.575Z · LW · GW

The main part of the issue was actually that I was not aware I had internal conflicts. I just mysteriously felt less emotions and motivation.

Yes, I believe that one can learn to entirely stop even considering certain potential actions as available to oneself. I don't really have a systematic solution for this right now aside from some form of Noticing practice (I believe a more refined version of this practice is called Naturalism, but I don't have much experience with it).

Comment by mesaoptimizer on Lucie Philippon's Shortform · 2024-04-24T23:21:13.484Z · LW · GW

What do you think antidepressants would be useful for?

In my experience, I've gone through months-long depressive episodes while remaining externally functional and convincing myself (and the people around me) that I wasn't going through a depressive episode. Another thing I've noticed is that with medication (whether anxiolytics, antidepressants, or ADHD medication), I regularly underestimated how 'blocked' I was by some mental issue: after taking the medication, the block would be gone, and I would only realize it had existed because of the (positive) changes in my behavior and cognition.

Essentially, I'm positing that you may be in a similar situation.

Comment by mesaoptimizer on Lucie Philippon's Shortform · 2024-04-23T19:47:30.733Z · LW · GW

Have you considered antidepressants? I recommend trying them out to see if they help. In my experience, antidepressants can have non-trivial positive effects that are hard to put into words, except that you can notice the shift in how you think, behave, and relate to things, and this shift is one that you might find beneficial.

I also think that slowing down and taking care of yourself can be good -- it can help build a generalized skill of noticing the things you didn't notice before that led to the breaking point you describe.

Here's an anecdote that might be interesting to you: there's a core mental shift I made over the past few months that I haven't tried to elicit and describe to others until now, but in essence it involves an understanding that the kind of self-sacrifice usually involved in working as hard as possible leads to globally unwanted outcomes, not just locally unwanted ones. (Of course, we can talk about hypothetical isolated thought experiments and my feelings might change, but I'm talking about a holistic way of relating to the world here.)

Here's one argument for this, although I don't think it captures the entire source of my feelings: when parts of someone are in conflict, and they regularly reject a part of themselves that wants one thing (creature comforts) to privilege the desires of another part that wants something else (to work more), I expect their effectiveness in navigating and affecting reality to be lower than if they took the time to integrate the desires and beliefs of the parts in conflict. In extreme circumstances, it makes sense for someone to 'override' other parts (which is how I model the fight-flight-freeze-fawn response, for example), but this seems unsustainable and potentially detrimental when it comes to navigating a reality where sense-making is extremely important.

Comment by mesaoptimizer on When is a mind me? · 2024-04-18T12:27:15.457Z · LW · GW

This is a very interesting paper, thanks.

Comment by mesaoptimizer on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-17T12:42:50.617Z · LW · GW

What was the requirement? Seems like this was a deliberate effect instead of a side effect.

Comment by mesaoptimizer on Paul Christiano named as US AI Safety Institute Head of AI Safety · 2024-04-17T12:41:19.319Z · LW · GW

which I know you object to

Buck, could you (or habryka) elaborate on this? Does Buck call the set of things that ARC Theory and METR (formerly known as ARC Evals) do "AI control research"?

My understanding is that while Redwood clearly does control research, METR evals seem more like an attempt to demonstrate dangerous capabilities than to help with control. I haven't wrapped my head around ARC's research philosophy and output enough to confidently state anything.

Comment by mesaoptimizer on Richard Ngo's Shortform · 2024-04-05T08:37:26.554Z · LW · GW

If you haven't read CEV, I strongly recommend doing so. It resolved some of my confusions about utopia that were unresolved even after reading the Fun Theory sequence.

Specifically, I had an aversion to the idea of being in a utopia because "what's the point, you'll have everything you want". The concrete pictures that Eliezer gestures at in the CEV document do engage with this confusion, and gesture at the idea that we can have a utopia where the AI does not simply make things easy for us, but perhaps just puts guardrails onto our reality, such that we don't die, for example, but we do have the option to struggle to do things by ourselves.

Yes, the Fun Theory sequence tries to communicate this point, but it didn't make sense to me until I could conceive of an ASI singleton that could actually simply not help us.

Comment by mesaoptimizer on Richard Ngo's Shortform · 2024-04-05T08:32:21.931Z · LW · GW

I dropped the book within the first chapter. For one, I found the way Bostrom opened the chapter very defensive and self-conscious. I imagine that even Yudkowsky wouldn't start a hypothetical 2025 book with fictional characters caricaturing him. Next, I felt like I didn't really know what the book was covering in terms of subject matter, and I didn't feel convinced it was interesting enough to continue down the meandering path Nick Bostrom seemed to have laid out before me.

Eliezer's CEV document and the Fun Theory sequence were significantly more pleasant experiences, based on my memory.

Comment by mesaoptimizer on Why The Insects Scream · 2024-03-23T09:25:50.441Z · LW · GW

we don’t give a shit about morality. Instead, we care about social norms that we can use to shame other people, masquerading under the banner of morality.

I think that's basically all of moral cognition, actually.

Caring about others to me seems to be entirely separate from moral cognition. (Note that this may be a controversial statement and it is on my to-do list to make a detailed argument for this claim.)

Comment by mesaoptimizer on Tuning your Cognitive Strategies · 2024-03-18T11:35:17.569Z · LW · GW

If you are willing to share, can you say more about what got you into this line of investigation, and what you were hoping to get out of it?

Burnt out after almost a year of focusing on alignment research. I wanted to take a break from alignment-ey stuff, and I also wanted to systematically fix the root causes of what I considered burnout.

I don’t feel like I have many issues/baggage/trauma

I felt similar when I began this, and my motivation was not to 'fix issues' in myself but more "hey I have explicitly decided to take a break and have fun and TYCS seems interesting let's experiment with it for a while, I can afford to do so".

Comment by mesaoptimizer on Cultivate an obsession with the object level · 2024-03-11T16:29:51.819Z · LW · GW

the former guides your exploration towards the most important domains, but the latter is necessary for a deep understanding of them.

Perhaps you meant vice versa? "Touching reality" seems more about details, while crucial considerations seem more about systematizing and asking "why?".

Comment by mesaoptimizer on "How could I have thought that faster?" · 2024-03-11T11:22:26.544Z · LW · GW

Bonus conversation from the root of the tree that is this Twitter thread:

Eliezer Yudkowsky: Your annual reminder that you don't need to resolve your issues, you don't need to deal with your emotional baggage, you don't need to process your trauma, you don't need to confront your past, you don't need to figure yourself out, you can just go ahead and do the thing.

Benquo: By revealed preferences almost no one wants to just go ahead and do the thing, even if they expect that things would go better for them if they did. Seems reasonable to try to figure out why that's the case and how to change it, starting with oneself.

Benquo: Most of this trying will be fake or counterproductive, for the same reasons people aren't doing the sensible object-level thing, but we don't get to assume or pretend our way out of a problem, we just get to investigate and think about it and try out various promising solutions.

Given my experiences with both TYCS-like methods and parts-work methods (which is what Benquo is likely proposing one invest in, instead), I'd recommend people invest more in learning and using parts-work techniques first, before they learn and try to use TYCS-like techniques.

Comment by mesaoptimizer on Tuning your Cognitive Strategies · 2024-03-11T11:15:41.869Z · LW · GW

As of writing, I have spent about four months experimenting with the Tune Your Cognitive Strategies (TYCS) method and I haven't gotten any visible direct benefits out of it.

Some of the indirect benefits I've gotten:

  • I discovered introspective ability and used that to get more insight about what is going on in my mind
  • I found out about the cluster of integration / parts-work based therapy techniques (such as Internal Family Systems), and have fixed some issues in the way I do things (eg. procrastinating on cleaning up my desk), and have also unraveled some deep issues I noticed (due to better introspective ability)

The biggest thing I've learned is that better introspective ability and awareness seems to be the most load-bearing skill underlying TYCS. I'm less enthusiastic about the notion that you can 'notice your cognitive deltas' in real-time almost all the time -- this seems quite costly.

Note that Eliezer has also described doing something similar. More interestingly, it seems like Eliezer prefers to invest in what I would call 'incremental optimization of thought' over 'fundamental debugging':

EY: Your annual reminder that you don't need to resolve your issues, you don't need to deal with your emotional baggage, you don't need to process your trauma, you don't need to confront your past, you don't need to figure yourself out, you can just go ahead and do the thing.

On one hand, you could try to use TYCS or Eliezer's method to reduce the cognitive work required to think about something. On the other hand, you could try to use integration-based methods to solve what I would consider 'fundamental issues' or deeper issues. The latter feels like focusing on the cognitive equivalent of crucial considerations, while the former feels like incremental improvements.

And well, Eliezer seems to have been depressed for quite a while now, and Maia Pasek killed herself. Both of these seem like evidence for my hypothesis that, given scarce cognitive resources, investing in incremental optimization of the sort involved in TYCS and Eliezer's method is less valuable than the fundamental debugging involved in integration / parts-work mental techniques.

For the near future, I plan to experiment with and use parts-work mental techniques, and will pause my experimentation and exploration of TYCS and TYCS-like techniques. I expect that there may be a point at which one has a sufficiently integrated mind such that they can switch to mainly investing in TYCS-like techniques, which means I'll resume looking into these techniques in the future.

Comment by mesaoptimizer on Dual Wielding Kindle Scribes · 2024-02-23T10:55:25.037Z · LW · GW

This is very interesting. I used to use org-roam and also experimented with other zettelkasten software over the past few years, but eventually it all grew very overwhelming because of the problem of updating notes. The bigger your note pile, the bigger the blocker (it seems to me at least) of updating your notes as you get a better understanding of reality.

Could you elaborate more on your setup, especially your knowledge base and how you use it?

Comment by mesaoptimizer on Dual Wielding Kindle Scribes · 2024-02-21T22:33:30.313Z · LW · GW

Would a paper notepad have worked for you instead of a second device? What’s better with the device?

Answered here, but the TLDR is: the joy of using the Scribe, an aversion to using notepads, and a worry about losing logs of what I wrote if it's on paper.

Comment by mesaoptimizer on Dual Wielding Kindle Scribes · 2024-02-21T22:29:44.423Z · LW · GW

Based on a Focusing-style attempt at understanding why, it seems like, in my mind, there's a certain sense of pleasure and delight associated with writing (and reading) on the Kindle Scribe, and a sense of inelegance and awkwardness associated with writing on paper. Whatever experiences, beliefs, and sense of aesthetics underlie this are probably the driving factor.

I did have access to notebooks and pens and whiteboards when I bought the second Kindle Scribe, but hesitated to use any of them for writing down my thoughts. One thing that comes to mind when I imagine such alternatives is that I fear losing a log of what I thought and wrote, and I didn't imagine doing so if I wrote it on the Scribe.

Comment by mesaoptimizer on Dual Wielding Kindle Scribes · 2024-02-21T22:20:47.195Z · LW · GW

couldn’t stand the kindle interface for books/notes

This is in comparison to using Emacs. When using Emacs as my interface for reading books and writing notes,

  • I can use a familiar UNIX file system to store my books as PDFs and EPUBs. I can easily back it up and interact with my collection using other tools I have (which is a real benefit of using a general-purpose computing device). With the Kindle, creating and managing collections (an arbitrary category-based way of organizing your documents) is awkward enough (you need to select one book at a time and add it to a collection), at least as of the last Kindle Scribe firmware I used, that I just relied on the search bar to find book titles.
  • When using Emacs, I can do a full text search of my notes file simply by pressing "s" (a keybind) and then typing a string. In contrast, while the notes written on the Scribe can be exported as PDFs, you don't have the ability to search your notes. This wasn't a dealbreaker for me, though, to be clear.

It is a bit hard to point at the things that make me want to use Emacs for it, because a load-bearing element is my desire to do everything in Emacs. Emacs has in-built documentation for its internals and almost every part of it is configurable -- which means you can optimize your setup to be exactly as you like it. It feels like an extension of you, eventually.

This also somewhat drives my desire to use a simple and (eventually, given enough investment) understandable operating system that doesn't shift beneath my feet. And given that both the interface and the operating system of the Kindle Scribe are opaque and (eventually) leaky abstractions, I feel less enthusiastic about investing my efforts into adapting myself to it.

Comment by mesaoptimizer on Dual Wielding Kindle Scribes · 2024-02-21T19:32:38.275Z · LW · GW

The Kindle Scribe, IIRC, has an 18 ms latency for rendering whatever you write on it using the Wacom-like pen. I believe that was the lowest latency you could get at the time (the Apple Pencil on iPads supposedly has a latency of 7-10 ms, but they use some sort of software to predict what you'll do next, so that doesn't count in my opinion).

I found the experience of writing notes on the Kindle Scribe great! It was about as effortless as writing on paper, with the advantage of being able to easily erase what I wrote with a flip of the Premium Pen. There are tail annoyances, but they didn't seem to me worse than the tail annoyances of using physical pen and paper (whether gel ink, fountain pens, or ballpoint pens).

Writing on the Scribe does drain your battery faster. The number that comes to mind is that you can write on it continually for about eight hours before you wholly drain the battery, while if you only read on it, you don't need to charge the Kindle for weeks.

I recommend the My Deep Guide Youtube channel for in-depth information about various e-readers and if you want to get up to speed on the current e-reader zeitgeist.

Comment by mesaoptimizer on Here's the exit. · 2024-02-15T14:40:25.568Z · LW · GW

Notice how that last sentence is in fact caveated, but it’s still confident. I’m quite sure this is my supposition. I’m sure I’m not sure of the implied conclusion. I feel solid in all of this.

Perhaps relevant: Nate Soares does this too, based on one of his old essays. And I think it works very well for him.

Comment by mesaoptimizer on the gears to ascenscion's Shortform · 2024-02-14T11:36:21.606Z · LW · GW

Note: I don't have the energy, nor do I prioritize this enough, to make this message more succinct. But I feel like I have communicated the core things I wanted to.

The gears to ascension is a “blowhard” as you put it, that people have heard of who makes assertions without defending them, and then who gets criticized for having a name that confidently asserts correctness on top of that.

I think it is okay to make assertions without defending them -- there's a cost to defending your assertions and your messages can be written with certain audiences and goals in mind that might make defending your assertions not relevant or not worth the effort.

Are you sure that your username causes people to criticize you for confidently asserting correctness? At least from personal experience, I've noticed that most people who choose their usernames and profile pictures on the internet do so as a way to communicate certain aesthetics -- non-content based information about themselves. It is about identity and fun. I think most people learn to separate the username aesthetics from the epistemic prior of a person. I know I have.

"The gears of ascension" is an interesting name. It is memorable. Paired with a rather abrasive commenting strategy in end of 2022 and the beginning of 2023, your comments annoyed me enough that I put your LW account on ignore (until about March 2023, when I saw your writings / messages on certain Discord servers). This, however, did not involve me ever thinking that your username implied / promised something specific about your content. I like your username, because it communicates something about your desires and how you see yourself and your aesthetics.

Carrying the name “often wrong” feels more in the spirit of this site, anyhow.

When I imagine myself doing this, the use of "often wrong" in one's username feels... defensive. It feels like I'm trying to pre-emptively lower people's epistemic priors for me so that I don't get punished for being wrong. This does make sense in certain zero-sum environments, ones where I don't want to be singled out or noticed for making mistakes, because that leads to being blamed, isolated, and kicked out. It seems counterproductive, however, from the standpoint of a positive-sum epistemic system, one where you want people to engage in accurate credit assignment to other people's claims. If one develops a reputation for 'being wrong', then that is useful for the system's function, since their claims are given less weight. As long as this is paired with, say, a UBI-equivalent quality-of-life okayness for the 'wrong' entity in this system, it doesn't seem bad. After all, the global epistemics of the system sure are better.

Do you think Eliezer would say he's often wrong? Carrying the name "often wrong" is not in the spirit of this site. The belief that one is often wrong is supposed to be individual -- you knowing this and stating it to yourself. It isn't intended to be a blanket statement you tell other people and prefix your claims with.

If I can’t be respected under this name, so be it, and that’s sort of the idea—I don’t want my name to carry respect. I want individual comments evaluated for their validity.

So changing your name, in some ways, is destruction of common knowledge, because people have built up a rich mental model of your beliefs, your epistemics, and the domains where you are mentally robust or mentally fragile.

People with actual impressive education would look down on my name while people without it would look up to it because it sounds all fancy and transhumanist in ways that don’t match my accomplishments.

I'd argue your current username might also cause "people with actual impressive education" (who don't decouple username vibes from the epistemic prior they put on your content) to be less open to reading your comments. There's no point in caring about the opinions of people who are impressed by your username either; I don't think their efforts are relevant to your goals.

My every comment should stand on its own, and the fact that they do not was being ignored too easily because my name was memorable.

No, throwing away information is sub-optimal for group epistemics. Your name gives me context. When you comment on, say, a post by Tsvi, and state that you feel optimistic about his models, it gives me an idea of where your mind is at, what research skills you value and are learning, what your alignment models are (or are shifting towards, given what I know of your alignment model). This helps me figure out how to make good things happen that might involve recommending stuff to you that you might be interested in, for example.

The fact that your name is memorable is useful for this.

I don't think I've very well described my intuitions about accurate credit assignment and reputation and group epistemics, but I'm trying to point in that direction, and I hope I've at least succeeded, even if I haven't given you a clear and coherent model of this.

Comment by mesaoptimizer on Scale Was All We Needed, At First · 2024-02-14T10:43:03.547Z · LW · GW

I have read this before. Is this a repost?

Comment by mesaoptimizer on Upgrading the AI Safety Community · 2024-02-13T13:25:09.171Z · LW · GW

"Why should we have to recruit people? Or train them, for that matter? If they're smart/high-executive-function enough, they'll find their way here".

Note: CFAR had been a MIRI hiring pipeline for years, and it also seemed to function as a way of upskilling people in CFAR-style rationality, which CFAR thought was the load-bearing bit required to turn someone into a world-saver.

Comment by mesaoptimizer on Dreams of AI alignment: The danger of suggestive names · 2024-02-11T17:08:29.547Z · LW · GW

This is a pretty good essay, and I'm glad you wrote it. I've been thinking similar thoughts recently, and have been attempting to put them into words. I have found myself somewhat more optimistic and uncertain about my models of alignment due to these realizations.

Anyway, on to my disagreements.

It’s hard when you’ve[2] read Dreams of AI Design and utterly failed to avoid same mistakes yourself.

I don't think that "Dreams of AI Design" was an adequate essay to get people to understand this. These distinctions are subtle, and as you might tell, not an epistemological skill that comes naturally to us. "Dreams of AI Design" is about confusing the symbol with the substance -- '5 with 5, in Lisp terms, or the variable name five with the value of 5 (in more general programming-language terms). It is about ensuring that all the symbols you use to think with actually map onto some substance. It is not about the more subtle art of noticing that you are incorrectly equivocating between a pre-theoretic concept such as "optimization pressure" and the actual process of gradient updates. I suspect that Eliezer may have made at least one such mistake that made him significantly more pessimistic about our chances of survival. I know I've made this mistake dozens of times. I mean, my username is "mesaoptimizer", and I don't endorse that term or concept anymore as a way of thinking about the relevant parts of the alignment problem.

It’s hard when your friends are using the terms, and you don’t want to be a blowhard about it and derail the conversation by explaining your new term.

I've started to learn to be less neurotic about ensuring that people's vaguely defined terms actually map onto something concrete, mainly because I have started to value the fact that these vaguely defined terms, if not incorrectly equivocated, hold valuable information that we might otherwise lose. Perhaps you might find this helpful.

When I try to point out such (perceived) mistakes, I feel a lot of pushback, and somehow it feels combative.

I empathize with those pushing back, because to a certain extent what you are stating seems obvious to someone who has learned to translate these terms into more concrete, locally relevant formulations ad hoc; given such an assumption, it looks like you are making a fuss about something that doesn't really matter, and even reaching for examples to prove your point. On the other hand, I expect that ad-hoc adjustment of such terms is insufficient for actually doing productive alignment research -- I believe that the epistemological skill you are trying to point at is extremely important for people working in this domain.

I'm uncertain about how confused senior alignment researchers are when it comes to these words and concepts. It is likely that some may have cached some mistaken equivocations and are therefore too pessimistic and fail to see certain alignment approaches panning out, or too optimistic and think that we have a non-trivial probability of getting our hands on a science accelerator. And deference causes a cascade of everyone (by inference or by explicit communication) also adopting these incorrect equivocations.

Comment by mesaoptimizer on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-11T15:37:49.405Z · LW · GW

Be careful though that we’re not just dealing with a group of people here.

Yes, I am proposing a form of systemic analysis such that one is willing to look at multiple levels of the stack of abstractions that make up the world ending machine. This can involve aggressive reductionism, such that you can end up modeling sub-systems and their motivations within individuals (either archetypal ones or specific ones), and can involve game theoretic and co-ordination focused models of teams that make up individual frontier labs -- their incentives, their resource requirements, et cetera.

Most people focus on the latter, far fewer focus on the former, and I don't think anyone is even trying to do a full stack analysis of what is going on.

Comment by mesaoptimizer on Why I think it's net harmful to do technical safety research at AGI labs · 2024-02-08T22:04:08.665Z · LW · GW

While I share your sentiment, I expect that the problem is far more complex than we think. Sure, corporations are made of people, and people believe (explicitly or implicitly) that their actions are not going to lead to the end of humanity. The next question, then, is why they believe this is the case. There are various attempts to answer this question, and different people have different approaches to reducing x-risk given their answer to it -- see how MIRI's and Conjecture's approaches differ, for example.

This is, in my opinion, a viable line of attack, and is far more productive than pure truth-seeking comms (which is what I believe MIRI is trying) or an aggressive narrative shifting and policy influencing strategy (which is what I believe Conjecture is trying).

Comment by mesaoptimizer on My guess at Conjecture's vision: triggering a narrative bifurcation · 2024-02-08T12:33:45.922Z · LW · GW

This is likely the clearest and most coherent public model of Conjecture's philosophy and strategy that I've encountered, and I'm glad it exists.

Comment by mesaoptimizer on Don't sleep on Coordination Takeoffs · 2024-01-30T09:49:31.471Z · LW · GW

Miscellaneous thoughts:

  1. The way you use the word Moloch makes me feel like it is an attempt to invoke a vague miasma of dread. If your intention was to coherently point at a cluster of concepts or behaviors, I'd recommend you use less flavorful terms, such as "inadequate stable equilibria", "zero-sum behavior that spreads like cancer", "parasitism and predation". Of course, these three terms are also vague and I would recommend using examples to communicate exactly what you are pointing at, but they are still less vague than Moloch. At a higher level I recommend looking at some of Duncan Sabien's posts for how to communicate abstract concepts from a sociological perspective.
  2. I've been investigating "Tuning Your Cognitive Strategies" off and on since November 2023, and I agree that it is interesting enough to be worth a greater investment in research effort (including my own), but I believe that there are other skills of rationality that may be significantly more useful for people trying to save the world. Kaj Sotala's Multiagent sequence is probably the one rationality research direction I think has the highest potential impact in enabling people in our community to do the things they want to do.
  3. The "Why our kind cannot cooperate" sequence, as far as I remember, is focused on what seem to be irrationality-based failures of cooperation in our community -- stuff like mistaking contrarianism for being smart and high-status, et cetera. I disagree with your attempt to use it as an argument that the "bad guys" are predisposed to "victory".

If I was focused on furthering co-ordination, I'd take a step back and actually try to further co-ordination and see what issues I face. I'd try to build a small research team focused on a research project and see what irrational behavior and incentives I notice, and try to figure out systemic fixes. I'd try to create simple game theoretic models of interactions between people working towards making something happen and see what issues may arise.

I think CFAR was recently funding projects focused on furthering group rationality. You should contact CFAR, talk to some people thinking about this.

Comment by mesaoptimizer on [Repost] The Copenhagen Interpretation of Ethics · 2024-01-25T15:20:48.158Z · LW · GW

Alternate archive link: https://archive.is/sfy6t

Comment by mesaoptimizer on What Software Should Exist? · 2024-01-23T10:09:05.836Z · LW · GW

  1. A simple, understandable, and fully customizable web browser (like nyxt, but usable as a daily driver).
  2. A program or service that allows multiple disparate communication services (email, Signal, Discord, Whatsapp) to all be interfaced with from one service (imagine a web page as your default client), that exposes API endpoints and allows users to write their own clients (so you can use it in emacs, for example). Bitlbee exists, but I haven't gotten around to figuring out how to set it up yet.
  3. A "personal assistant" style AI system that tracks your tasks, your agenda, your mood and preferences, the events and things you may be interested in, and suggests you things at the right times and in the right contexts.

The above are easy optimizations compared to the stuff below:

  1. Internet access everywhere, no exceptions, via wireless connectivity. Imagine Starlink but internet provided to you everywhere, reliably. Also you don't have to deal with the nonsense of switching SIM cards as you travel across continents.
  2. A fully open source hardware and software stack for devices that are actually usable.
  3. An e-ink display portable device (such as a laptop or a tablet) that uses relatively customizable operating systems like Linux or BSDs.

Comment by mesaoptimizer on Goodhart's Law inside the human mind · 2024-01-21T21:07:02.863Z · LW · GW

Could you elaborate on what you are pointing at here with an example? I'm unable to figure out what "core transformation for cognitive processing" would look like and what "act therapy for behavioral processing" would look like, even though I have some familiarity with all four concept clusters.

Comment by mesaoptimizer on What Software Should Exist? · 2024-01-21T20:12:27.903Z · LW · GW

Have you tried Kagi? It was too expensive for the utility it provided me, but I think it is worth at least trying it out.

Comment by mesaoptimizer on Gender Exploration · 2024-01-16T08:40:58.886Z · LW · GW

I now see that your claims may mainly be exhortations to yourself, or a more direct reflection of how you relate to yourself. I feel like I understand you better now, and I'm glad about it.

Comment by mesaoptimizer on Gender Exploration · 2024-01-15T19:45:28.250Z · LW · GW

And I really cannot think of much less masculine than being afraid of a scar.

Uh, you can be a guy and masculine while also being afraid of scars. I'm a bit amused at this line, because even after transitioning and detransitioning and (as far as I can tell) being intensely part of the queer community for years, you have to drop in a line that gatekeeps masculinity.

Comment by mesaoptimizer on [deleted post] 2024-01-15T11:14:34.572Z

I notice that the most value I got from your essay is a reminder of the core principles of naturalism, and an indicator / reminder that just observing is enough to make a significant number of good things happen.

I did get confused when reading the first half of this essay, because I still don't know what it means to "hug the query". I could try to put it into words ("prefer more direct and strong evidence that reduces inferential distance, which makes your inference more robust to errors") but I don't have a felt-sense of what this would mean and no concrete examples come to mind immediately.

Reading your example, I feel like it didn't match my felt-sense for what "hugging the query" means to me (even as I was writing this line!), and yet after I spent a minute or so verifying this, I felt like I couldn't point out any way in which it didn't make sense as an example of "hugging the query". Hugging the query, to me, feels like burning the hedge down, or trying to walk around the maze instead of solving it, or cutting through walls if I ever hit a dead end. I guess to me the 'anchor' is the endpoint in my head, due to how I envision the maze as a hedge maze. Imagining more restricted examples of mazes feels claustrophobic and makes my mind anchor on potential reasons for why I'm in such a maze, instead of trying to simply solve the maze, which is quite interesting! As far as I can tell, what I seem to be feeling here is another instance of what it feels like to me to apply reduction to problems.

I have not done that work. I do not have PCK on this, and so I cannot tell you a straighter or easier path than the entire self-directed naturalist method itself.

Yes, it seems likely to me that a lot of rationalist skills cannot be easily taught in a standardized format. One-on-one teaching sessions work a lot better, with a teacher who knows how to sort-of debug the student as they apply the skill, fail, notice interesting things, and refine their understanding of the art. The example that comes to mind is learning how to write math proofs.

I'm looking forward to your next essay.

Comment by mesaoptimizer on D0TheMath's Shortform · 2024-01-13T10:23:30.384Z · LW · GW

Here's an example of what I think you mean by "proofs and conclusions constructed in very abstracted, and not experimentally or formally verified math":

Given two lines AB and CD intersecting at point P, the angle measures of the two opposite angles APC and BPD are equal. The proof? Both sides are symmetrical, so it makes sense for them to be equal.

On the other hand, Lean-style proofs (which I understand you to claim to be better) involve multiple steps, each of which is backed by a reasoning step, until one shows that LHS equals RHS, which here would involve showing that angle APC = BPD:

  1. angle APC + angle CPB = 180° (because of some theorem)
  2. angle CPB + angle BPD = 180° (same)
  3. [...]
  4. angle APC = angle BPD (substitution?)
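
For concreteness, here is a minimal sketch (my own illustration, not part of the original comment) of how the elided steps and the final substitution might be written out:

```latex
\begin{align*}
\angle APC + \angle CPB &= 180^\circ && \text{(angles on the straight line } AB\text{)} \\
\angle CPB + \angle BPD &= 180^\circ && \text{(angles on the straight line } CD\text{)} \\
\angle APC + \angle CPB &= \angle CPB + \angle BPD && \text{(both sums equal } 180^\circ\text{)} \\
\angle APC &= \angle BPD && \text{(subtract } \angle CPB \text{ from both sides)}
\end{align*}
```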

There's a sense in which I feel like this is a lot more complicated a topic than what you claim here. Sure, it seems like going Lean (which also means actually using Lean4 and not just doing things on paper) would lead to a lot more reliable proof results, but I feel like the genesis of a proof may be highly creative, and this is likely to involve the first approach to figuring out a proof. And once one has a grasp of the rough direction in which they want to prove some conjecture, they might decide to use intense rigor.

To me this seems to be intensely related to intelligence (as in, the AI alignment meaning-cluster of that word). Trying to force yourself to do things Lean4 style when you can use higher level abstractions and capabilities, feels to me like writing programs in assembly when you can write them in C instead.

On the other hand, it is the case that I would trust Lean4 style proofs more than humanly written elegance-backed proofs. Which is why my compromise here is that perhaps both have their utility.

Comment by mesaoptimizer on Reflections on my first year of AI safety research · 2024-01-08T20:47:21.408Z · LW · GW

I appreciate you writing this! It really helped me get a more concrete sense of what it is like for new alignment researchers (like me) to be aiming to make good things happen.

Owain asked us not to publish this for fear of capabilities improvements

Note that "capabilities improvements" can mean a lot of things here. The first thing that comes to mind is that publicizing this differentially increases the damage API users could do with access to SOTA LLMs, which makes sense to me. It also makes sense to me that Owain would consider publishing this idea not worth the downside, simply because, off the top of my head, there's not much benefit to publicizing it for either alignment researchers or capabilities researchers. OpenAI capabilities people have probably already tried such experiments internally and know of this, and alignment researchers probably wouldn't be able to build on top of this finding (here I mostly have interpretability researchers in mind).

Sometimes I needed more information to work on a task, but I tended to assume this was my fault. If I were smarter, I would be able to do it, so I didn’t want to bother Joseph for more information. I now realise this is silly—whether the fault is mine or not, if I need more context to solve a problem, I need more context, and it helps nobody to delay asking about this too much.

Oh yeah, I have had this issue many (though not all of the) times with mentors in the past. I suggest not simply trying to rationalize that emotion away, though, and perhaps trying to actually debug it. "Whether the fault is mine or not" -- sure, but if my brain tracks whether I am an asset or a liability to the project, then it is giving me important information in the form of my emotions.

Anyway, I'm glad you now have a job in the alignment field!

Comment by mesaoptimizer on Johannes C. Mayer's Shortform · 2024-01-08T18:09:02.393Z · LW · GW

So you seem to be doing top-down reasoning here, going from math to a model of the human brain. I didn't actually have something like that in mind, and instead was doing bottom-up reasoning, where I had a bunch of experiences involving people that gave me a sense for what it means to (1) do vibes-based pattern-matching, and (2) know when you should and should not trust your intuitions. I really don't think it is that hard, actually!

Also your Remnote link is broken, and I think it is pretty cool that you use Remnote.

Comment by mesaoptimizer on We shouldn't fear superintelligence because it already exists · 2024-01-07T23:23:43.650Z · LW · GW

Please read Nick Bostrom's "Superintelligence"; it would really help you understand what everyone here has in mind when they talk about AI takeover.

Comment by mesaoptimizer on Johannes C. Mayer's Shortform · 2024-01-06T10:37:47.165Z · LW · GW

This seems to be a decent litmus test for whether ppl have actual sensors for evidence/gears, or whether they’re just doing (advanced) vibes-based pattern-matching.

If only. Advanced vibes-based pattern-matching is useful when your pattern-matching algorithm is optimized for the distribution you are acting in.

Comment by mesaoptimizer on Johannes C. Mayer's Shortform · 2024-01-06T10:34:26.302Z · LW · GW

Can you explain why you use "hopefwl" instead of "hopeful"? I've seen this multiple times in multiple places by multiple people, but I do not understand the reasoning behind it. This is not a typo; it is a deliberate design decision by some people in the rationality community. Can you please help me understand?

Comment by mesaoptimizer on MonoPoly Restricted Trust · 2024-01-05T10:46:37.464Z · LW · GW

I notice that you go 'principles / ethics first, then emotions' in the way you seem to reason about things in your comment. I find that I endorse the opposite: 'emotions first, then principles / ethics'. That is, I trust that my emotional core informs what I care about, and why and how I care about something, significantly more than whatever I believe or claim my principles are. And then I investigate my emotions, after putting a high importance on them making sense. (You can interpret this extremely uncharitably and claim that I have no principles whatsoever, but this is a low-effort attempt by me to elicit something I notice and am trying to point at, that is deeper than words and involves cognitive algorithms that mostly aren't verbal.) This is kind of why I asked the questions from an emotions-first perspective.

This is hypothetical, but what I would want to do is go through the rationale: exactly why do you have this preference? Ok, you bring up this reason; is that your true objection, or do you still object to situations where that doesn’t apply? There would likely be a lot of iterations of this, as outlined in the GP comment. Possible outcomes: (a) she converts to polyamory, (b) she admits it’s an irrational preference but nevertheless she holds it, (c) she finds the process some combination of insulting, unpleasant, and lowering her trust in me, and it doesn’t lead to a constructive end. I expect the result would be (c) for most people who aren’t, like, >95th percentile devoted to the ideal of “clear rational thought, and getting offended is low-status”

I expect people on the other end of this conversation would feel pressured and uncomfortable and forced to accept some logically reasoned argument for something that they don't feel comfortable about. I wouldn't want to subject people to such conversations, because I don't expect this would actually change their opinion or result in outcomes they would reflectively endorse. I think this is downstream of you believing your way of reasoning about things might help or apply to other people -- because I do the exact same thing when trying to help people or even elicit a more accurate model of their beliefs (see the questions I asked you for example).

At worst it might lead to self-doubt or something; but being angry at either of them seems stupid.

Yeah, I don't think anger is the emotion most often associated with the emotional distress one would experience if they see someone they consider their partner having romantic or sexual interactions with another person. I don't think most people in the rationalist community who seem to be more comfortable in monogamous relationships would agree with that statement, and this IMO is an uncharitable interpretation of what goes on in their heads.

Or, I guess, in context, it could constitute a broken promise [like if she’s not using protection with a new partner] or a lie or something—that would probably be the worst, and being angry at that is reasonable.

It seems like you police your emotions, and dislike feeling emotions that seem 'unreasonable' to you. This is interesting. I think ymeskhout accepts and seems to endorse all the emotions he feels, and I try to do the same. I think that is genuinely a better way of doing things than the opposite.

I don't think I have a better mechanistic understanding of my friends who seem to have similar romantic and sexual orientations due to this conversation, partially because most of them seem to also follow a significant amount of 'emotions first' decision-making, and therefore I think it is unlikely that your mindspace is close enough to theirs that I understand them better. I've tried hard to understand them though, and I'm glad I feel like I understand better where you are coming from.

Comment by mesaoptimizer on MonoPoly Restricted Trust · 2024-01-05T10:07:40.693Z · LW · GW

Based on the conversation I have read here, it seems like ymeskhout is okay with not being 'fully rational' or being Dutch-bookable in certain ways, and I think he's interpreting some of your hypotheticals as qualitatively in the same class as Pascal's Wagers, and is making a sensible decision to go with his gut and to simply ignore them when making decisions about his relationships.

Comment by mesaoptimizer on MonoPoly Restricted Trust · 2024-01-04T17:00:12.019Z · LW · GW

I'm curious, because I feel like I can understand where you are coming from:

  • Does it feel disorienting when dealing with spoken or unspoken rules that go into dealing with a monogamous relationship? Like it is difficult to understand what they want, or that it is irrational and frustrating?
  • Do you have a felt aversion to feeling like your partner (you could use a hypothetical here, or one of your existing or previous partners) is restricting you through expectations that interfere with your freedom of interacting with other people? How intense is this feeling?
  • Do you have a similar felt aversion to feeling like you are restricting your partner in similar ways? How intense is this feeling?
  • What do you think about relationship anarchy?

Oh yeah, also could you mention your gender, sexual orientation and intensity of sexual drive? If you are, say, female asexual romantic, the context of your felt senses for the questions above would probably be quite different compared to that of, say, a male heterosexual heteroromantic.

Comment by mesaoptimizer on A hermeneutic net for agency · 2024-01-03T11:21:35.270Z · LW · GW

This section is probably my favorite thing you (Tsvi) have written, and motivated me to read through all your alignment related posts on your blog.

Before I read that passage, I was confident that deconfusion research was the highest-value thing I could be doing (and getting better at), but I did not have a succinct way of communicating the fact that me seeming confused about a certain concept is not a sign that I have a worse understanding of the problem involved than someone who doesn't seem confused.

There's a misconception where most people pattern-match confidence in describing a concept or domain with a better understanding of that domain, and vagueness in describing a concept with not quite understanding the domain. I notice hints of this even in rationalist friends of mine, the ones who have read The Sequences and have a strong aversion to stuff that, in their heads, pattern-matches to making basic rationality mistakes. Reading this passage helped me get a handle on why I felt that my epistemic state was still better than that of others who seemed more confident in their claims.

Also, I feel like this somewhat relates to Eliezer's aversion to bio-anchors and concrete 'base rates', but I don't yet have a good way of clarifying it in my head.

Comment by mesaoptimizer on Shortform · 2023-12-31T16:34:21.991Z · LW · GW

I disagree. There's a lot of low-hanging fruit in the AI waifu space[1]. Lack of internal emotion or psychological life? Just simulate internal monologue. Lack of long-term memory? Have the AI waifu keep a journal. Lack of visuals? Use a LoRA fine-tuned diffusion model alongside the text chat.
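
As a concrete illustration of the first two fixes, here is a minimal sketch, in Python, of a chat turn that simulates an internal monologue and keeps a journal as long-term memory. The `generate` helper is a hypothetical stand-in for whatever LLM API you'd use; none of this describes an existing product.

```python
# Hypothetical sketch: one chat turn with a simulated internal monologue
# and a journal used as long-term memory. `generate` is a stand-in for a
# real LLM API call -- swap in your provider of choice.

def generate(prompt: str) -> str:
    # Placeholder so the sketch runs end-to-end; replace with an LLM call.
    return f"[model output for: {prompt[:40]}...]"

journal: list[str] = []  # long-term memory: one summary line per exchange

def chat_turn(history: list[str], user_message: str) -> str:
    convo = "\n".join(history)
    memories = "\n".join(journal[-5:])  # recall a few recent journal entries
    # 1. Private internal monologue, never shown to the user.
    monologue = generate(
        "You are the character's inner voice. Reflect privately.\n"
        f"Conversation so far:\n{convo}\n"
        f"Relevant memories:\n{memories}\n"
        f"User just said: {user_message}\nInner thoughts:"
    )
    # 2. Visible reply, conditioned on the monologue.
    reply = generate(
        f"Inner thoughts: {monologue}\nUser: {user_message}\nCharacter replies:"
    )
    # 3. Update the journal so later sessions can recall this exchange.
    journal.append(generate(
        f"Summarize this exchange in one diary line: {user_message} -> {reply}"
    ))
    return reply

# Example usage:
# print(chat_turn(["User: hi", "Character: hello!"], "How was your day?"))
```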

I'd be building my own AI waifu startup if we didn't face x-risks. It seems fun (like building your own video game), and probably a great benefit to its users.

Also, lonely men will not be the only (or even the primary) user demographic. Women seem to read a lot of erotica. I expect that this is an untapped market of users, and pandering to it will not make your startup look low-status either.

[1]: Not using the word "girlfriend" here because I'd like to use a more gender-neutral term, and "waifu" seems pretty gender-neutral to me, and to one target demographic of such services.

Comment by mesaoptimizer on Projects I would like to see (possibly at AI Safety Camp) · 2023-12-30T10:21:38.554Z · LW · GW

I now agree with your sentiment here and don't think my request when I made that comment was very sensible. It does seem like going from an informal / not-fully-specified argument to a fully specified argument is extremely difficult and unlikely to be worth the effort in convincing people who would already be convinced by extremely sensible but not fully formalized arguments.

It does seem to me that a toy model that is not fully specified is still a big deal when it comes to progress in communicating what is going on though.

I might look into the linked post again in more detail and seriously consider it and wrap my head around it. Thanks for this comment.

Comment by mesaoptimizer on EA orgs' legal structure inhibits risk taking and information sharing on the margin · 2023-12-28T20:20:38.609Z · LW · GW

AI Impacts is legally MIRI

As far as I know, AI Impacts does not seem to have had any non-trivial positive impact on the epistemics of the AI x-risk community, nor does it seem to have helped with any governance efforts.

Probably the best output by people seemingly affiliated with AI Impacts, that I have encountered, is Zach's sequence on Slowing AI.

On the other hand, here are two examples (one, two) that immediately came to mind when I thought of AI Impacts. These two essays describe a view of reality that seems utterly pre-Sequences to me. There's this idea that there's something inherently unpredictable about reality caused by chaos dynamics in complex systems that limits the sorts of capabilities that a superintelligence can have, and such an argument seems to imply that one should worry less about the possibility of superintelligent AGI systems ending humanity.

It seems like some of AI Impacts' research output goes against the very fundamental understanding that underpins why the creation of unaligned AGI is an extinction risk.

Is AI Impacts being funded by MIRI? I sure hope not.

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2023-11-30T08:42:39.301Z · LW · GW

If, hypothetically, you discovered some alleged epistemic rationality technique while doing paperwork, I would certainly want you to either explain how you applied this technique originally (with a worked example involving your paperwork), or explain how the reader might (or how you did) apply the technique to some other domain (with a worked example involving something else, not paperwork), or (even better!) both.

This seems sensible, yes.

It would be very silly to just talk about the alleged technique, with no demonstration of its purported utility.

I agree that it seems silly to not demonstrate the utility of a technique when trying to discuss it! I try to give examples to support my reasoning when possible. What I attempted to do with that one passage that you seemed to have taken offense to was show that I could guess at one causal cognitive chain that would have led Valentine to feel the way they did and therefore act and communicate the way they did, not that I endorse the way Kensho was written -- because I did not get anything out of the original post.

There’s a lot of “<whatever> seems like it could be true” in your comment.

Here's a low-investment attempt to point at the cause of what seems to you like a verbal tic:

I can tell you that when I put “it seems to me” at the front of so many of my sentences, it’s not false humility, or insecurity, or a verbal tic. (It’s a deliberate reflection on the distance between what exists in reality, and the constellations I’ve sketched on my map.)

-- Logan Strohl, Naturalism

If you need me to write up a concrete elaboration to help you get a better idea about this, please tell me.

Are you really basing your views on this subject on nothing more than abstract intuition?

My intuitions about my claim related to rationality skill seem to be informed by concrete personal experience, which I haven't yet described at length, mainly because I expected that using a simple, plausible made-up example would serve as well. I apologize for not adding a "(based on experience)" in that original quote, although I guess I assumed that was deducible.

That page seems to be talking about a four-year-old child, who has not yet learned about space, how gravity works, etc. It’s not clear to me that there’s anything to conclude from this about what sorts of epistemic rationality techniques might be useful to adults.

I'm specifically pointing at examples of deconfusion here, which I consider the main (and probably the only?) strand of epistemic rationality techniques. I concede that I haven't provided you useful information about how to do it -- but that isn't something I'd like to get into right now, when I am still wrapping my mind around deconfusion.

More importantly, it’s not clear to me how any of your examples are supposed to be examples of “epistemic confusion [that] can be traced to almost unrelated upstream misconceptions”. Could you perhaps make the connection more explicitly?

For the gravity example, the 'upstream misconception' is that the kid did not realize that 'up and down' are relative to the direction in which Earth's gravity acts on the body, and therefore the kid tries to fit the square peg of "Okay, I see that humans have heads that point up and legs that point down" into the round hole of "Below the equator, humans are pulled upward, and humans' heads are up, so humans' heads point to the ground".

For the AI example, the 'upstream misconception' can be[1] conflating the notion of intelligence with 'human behavior and tendencies that I recognize as intelligence' (and this in turn can be due to other misconceptions, such as not understanding how alien the selection process that underlies evolution is; not understanding that intelligence is not the same as saying impressive things at a social party but rather the ability to squeeze the probability distribution of future outcomes into a smaller space; et cetera), and then making a reasoning error that amounts to anthropomorphizing an AI, and concluding that the more intelligent a system is, the more it would care about the 'right things' that we humans seem to care about.

The second example is a bit expensive to elaborate on, so I will not do so right now. I apologize.

Anyway, I intended to write this stuff up when I felt like I understood deconfusion enough that I could explain it to other people.

Similarly, it seems plausible to me that while attempting to fix one issue (similar to attempting to fix a confusion of the sort just listed), one could find themselves making almost unrelated upstream epistemic discoveries that might just be significantly more valuable).

And… do you have any examples of this?

I find this plausible based on my experience with deconfusion and my current state of understanding of the skill. I do not believe I understand deconfusion well enough to communicate it to people across an inferential distance as huge as the one between you and me, so I do not intend to try.

[1]: There are a myriad of ways you can be confused, and only one way you can be deconfused.

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2023-11-28T23:43:54.205Z · LW · GW

I apologize for not providing a good enough example -- yes, it was made up. Here's a more accurate explanation of what causes me to believe that Valentine's sentiment has merit:

  • It seems to me that a lot of epistemic confusion can be traced to almost unrelated upstream misconceptions. Examples: thinking that people must be suspended upside down below the equator, once someone understands the notion of an approximately spherical Earth; the illusion that mirrors create horizontal asymmetry but retain vertical symmetry; the notion that an AGI will automatically be moral. Similarly, it seems plausible to me that while attempting to fix one issue (similar to attempting to fix a confusion of the sort just listed), one could find themselves making almost unrelated upstream epistemic discoveries that might just be significantly more valuable. I do acknowledge that these epistemic discoveries do also seem object-level and communicable, and I do think that the sentiment that Valentine showed could make sense.
  • It also seems that a lot of rationality skill involves starting out with a bug one notices ("hey, I seem to be really bad at going to the gym"), then making multiple attempts to fix the problem (ideally focusing on making an intervention as close to the 'root' of the issue as possible), and then discovering epistemic rationality techniques that may be applicable in many places. I agree that it seems like a really bad strategy to then not explain why the technique is useful by giving another example where the technique is useful and results in good object-level outcomes, and to instead simply talk about (given my original example) paperwork for a sentence and then spend paragraphs talking about some rationality technique in the abstract.

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2023-11-28T21:43:40.388Z · LW · GW

I notice that I find Valentine's posts somewhat insightful, and believe they point at incredibly neglected research directions, but I notice that a huge distance seems to exist between what Valentine intends to communicate and what most readers seem to get.

Off the top of my head:

  • Here's the Exit is written in a style that betrays an astounding confidence that what Valentine says is applicable to the reader, no matter what. After a few commenters correctly critique the post, Valentine backs down and claims that it was meant to be an "invitation" for people who recognize themselves in the post to explore the thoughts that Valentine espouses, and not a set of claims to evaluate. This feels slightly like a bait-and-switch, and worse, I feel like Valentine was acting in complete good faith while doing so, with a sort of very out-of-distribution model of communication that they endorse for themselves.
  • In We're already in AI takeoff, Valentine seems to claim that humans should not try to intervene at the egregore-level, because we are too weak to do so. When someone points out that this may not necessarily be correct, Valentine clarifies that what they meant was more that humans should not use shoulds to try to get themselves to do something that one could be confident that they physically cannot accomplish, and that solving AI alignment or existential risk can be one example of such a thing for many people. Again, I notice how Valentine makes a ton of sense, and points [badly] at very valuable rationalist insights and concepts when requested to clarify, but the way they pointed at it in their OP was, in my opinion, disastrously bad.
  • The only thing I recall from Kensho is the Said baking metaphor for people not explaining why one should care about the meta-level thing by showing an object-level achievement done using the meta-level thing. And yet, I get the sentiment that Valentine seems to have been trying to communicate -- it sure seems like there are epistemic rationality techniques that seem incredibly valuable and neglected, and one could discover them in the course of doing something about as useless as paperwork, and talking about how you became more efficient at paperwork would seem like a waste of time to everyone involved.

I wrote this down because I think Valentine (and other people reading this) might find it helpful, and it didn't feel like it made sense to post it as a comment on any specific individual post.