"How could I have thought that faster?" 2024-03-11T10:56:17.884Z
Dual Wielding Kindle Scribes 2024-02-21T17:17:58.743Z
[Repost] The Copenhagen Interpretation of Ethics 2024-01-25T15:20:08.162Z
EPUBs of MIRI Blog Archives and selected LW Sequences 2023-10-26T14:17:11.538Z
An EPUB of Arbital's AI Alignment section 2023-10-16T19:36:29.109Z
[outdated] My current theory of change to mitigate existential risk by misaligned ASI 2023-05-21T13:46:06.570Z
mesaoptimizer's Shortform 2023-02-14T11:33:14.128Z


Comment by mesaoptimizer on Two easy things that maybe Just Work to improve AI discourse · 2024-06-09T20:12:41.671Z · LW · GW

I think Twitter systematically underpromotes tweets with links external to the Twitter platform, so reposting isn't a viable strategy.

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2024-06-08T11:49:55.399Z · LW · GW

Thanks for the link. I believe I read it a while ago, but it is useful to reread it from my current perspective.

trying to ensure that AIs will be philosophically competent

I think such scenarios are plausible: I know some people argue that certain decision theory problems cannot be safely delegated to AI systems, but if we as humans can work on these problems safely, I expect that we could probably build systems that are about as safe (by crippling their ability to establish subjunctive dependence) but are also significantly more competent at philosophical progress than we are.

Comment by mesaoptimizer on jeffreycaruso's Shortform · 2024-06-05T15:14:10.113Z · LW · GW

Leopold's interview with Dwarkesh is a very useful source of what's going on in his mind.

What happened to his concerns over safety, I wonder?

He doesn't believe in a 'sharp left turn', which means he doesn't consider general intelligence to be a discontinuous (latent) capability spike such that alignment becomes significantly more difficult after it occurs. To him, alignment is simply a somewhat harder empirical-techniques problem, much like capabilities work. I assume he imagines behavior similar to current RLHF-ed models even as frontier labs double or quadruple the OOMs of optimization power applied to the creation of SOTA models.

He models (incrementalist) alignment research as "dual use", and therefore treats capabilities and alignment as effectively the same kind of work.

He also expects humans to continue to exist once certain communities of humans achieve ASI, and imagines that the future will be 'wild'. This is a very rare and strange model to have.

He is quite hawkish -- he is incredibly focused on China not stealing AGI capabilities, and believes that private labs are going to be too incompetent to defend against Chinese infiltration. He prefers that the USGOV take over AGI development so that it can race effectively against China.

His model for take-off relies quite heavily on "trust the trendline" and estimating linear intelligence increases with more OOMs of optimization power (linear with respect to human intelligence growth from childhood to adulthood). It's not the best way to extrapolate what will happen, but it is a sensible concrete model he can use to talk to normal people and sound confident and not vague -- a key skill if you are an investor, and an especially key skill for someone trying to make it in the SF scene. (Note he clearly states in the interview that he's describing his modal model for how things will go and he does have uncertainty over how things will occur, but desires to be concrete about what is his modal expectation.)

He has claimed that running a VC firm means he can essentially run it as a "think tank" too, focused on better modeling (and perhaps influencing) the AGI ecosystem. Given his desire for a hyper-militarization of AGI research, it makes sense that he'd try to steer things in this direction using the money and influence he will have and build, as a founder of an investment firm.

So in summary, he isn't concerned about safety because he prices it in as something about as difficult as (or slightly more difficult than) capabilities work. This puts him in an ideal epistemic position to run a VC firm for AGI labs: his optimism is what persuades investors to provide him money, since they expect him to return them a profit.

Comment by mesaoptimizer on Akash's Shortform · 2024-06-04T22:44:27.915Z · LW · GW

Oh, by that I meant something like "yeah I really think it is not a good idea to focus on an AI arms race". See also Slack matters more than any outcome.

Comment by mesaoptimizer on Akash's Shortform · 2024-06-04T22:31:07.498Z · LW · GW

If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don't understand why you'd want to play the AI arms race -- you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.

Unsee the frontier lab.

Comment by mesaoptimizer on Prometheus's Shortform · 2024-06-04T22:24:50.754Z · LW · GW

These are pretty sane takes (conditional on my model of Thomas Kwa of course), and I don't understand why people have downvoted this comment. Here's an attempt to unravel my thoughts and potential disagreements with your claims.

AGI that poses serious existential risks seems at least 6 years away, and safety work seems much more valuable at crunch time, such that I think more than half of most peoples’ impact will be more than 5 years away.

I think safety work gets less and less valuable at crunch time actually. I think you have this Paul Christiano-like model of getting a prototypical AGI and dissecting it and figuring out how it works -- I think it is unlikely that any individual frontier lab would perceive itself to have the slack to do so. Any potential "dissection" tools will need to be developed beforehand, such as scalable interpretability tools (SAEs seem like rudimentary examples of this). The problem with "prosaic alignment" IMO is that a lot of this relies on a significant amount of schlep -- a lot of empirical work, a lot of fucking around. That's probably why, according to the MATS team, frontier labs have a high demand for "iterators" -- their strategy involves having a lot of ideas about stuff that might work, and without a theoretical framework underlying their search path, a lot of things they do would look like trying things out.

I expect that once you get AI researcher level systems, the die is cast. Whatever prosaic alignment and control measures you've figured out, you'll now be using that in an attempt to play this breakneck game of getting useful work out of a potentially misaligned AI ecosystem, one that would also be modifying itself to improve its capabilities (because that is the point of AI researchers). (Sure, it's easier to test for capability improvements. That doesn't mean you can't transfer information embedded into these proposals such that modified models will be modified in ways the humans did not anticipate or would not want if they had a full understanding of what is going on.)

Mentorship for safety is still limited. If you can get an industry safety job or get into MATS, this seems better than some random AI job, but most people can’t.

Yeah -- I think most "random AI jobs" are significantly worse for trying to do useful work in comparison to just doing things by yourself or with some other independent ML researchers. If you aren't in a position to do this, however, it does make sense to optimize for a convenient low-cognitive-effort set of tasks that provides you the social, financial and/or structural support that will benefit you, and perhaps look into AI safety stuff as a hobby.

I agree that mentorship is a fundamental bottleneck to building mature alignment researchers. This is unfortunate, but it is the reality we have.

Funding is also limited in the current environment. I think most people cannot get funding to work on alignment if they tried? This is fairly cruxy and I’m not sure of it, so someone should correct me if I’m wrong.

Yeah, post-FTX, I believe that funding is limited enough that you have to be consciously optimizing for getting funding (as an EA-affiliated organization, or as an independent alignment researcher). Particularly for new conceptual alignment researchers, I expect that funding is drastically limited since funding organizations seem to explicitly prioritize funding grantees who will work on OpenPhil-endorsed (or to a certain extent, existing but not necessarily OpenPhil-endorsed) agendas. This includes stuff like evals.

The relative impact of working on capabilities is smaller than working on alignment—there are still 10x as many people doing capabilities as alignment, so unless returns don’t diminish or you are doing something unusually harmful, you can work for 1 year on capabilities and 1 year on alignment and gain 10x.

This is a very Paul Christiano-like argument -- yeah sure the math makes sense, but I feel averse to agreeing with this because it seems like you may be abstracting away significant parts of reality and throwing away valuable information we already have.

Anyway, yeah I agree with your sentiment. It seems fine to work on non-SOTA AI / ML / LLM stuff and I'd want people to do so such that they live a good life. I'd rather they didn't throw themselves into the gauntlet of "AI safety" and get chewed up and spit out by an incompetent ecosystem.

Safety could get even more crowded, which would make upskilling to work on safety net negative. This should be a significant concern, but I think most people can skill up faster than this.

I still don't understand what causal model would produce this prediction. Here's mine: the number of safety researchers the current SOTA lab ecosystem can absorb is bottlenecked by the labs' expectations for how many researchers they want or need. On one hand, more schlep during the pre-AI-researcher era means more hires. On the other hand, more hires require more research managers or managerial experience. Anecdotally, it seems like many AI capabilities and alignment organizations (both in the EA space and in the frontier lab space) have historically been bottlenecked on management capacity. Additionally, hiring has a cost (both the search process and the onboarding), and it is likely that as labs get closer to creating AI researchers, they'd believe that the opportunity cost of hiring continues to increase.

Skills useful in capabilities are useful for alignment, and if you’re careful about what job you take there isn’t much more skill penalty in transferring them than, say, switching from vision model research to language model research.

Nah, I found that very little of my vision model research work (during my undergrad) contributed to my skill and intuition related to language model research work (again during my undergrad, both around 2021-2022). I mean, specific skills of programming and using PyTorch and debugging model issues and data processing and containerization -- sure, but the opportunity cost is ridiculous when you could be actually working with LLMs directly and reading papers relevant to the game you want to play. High quality cognitive work is extremely valuable, and spending it on irrelevant things like the specifics of diffusion models (for example) seems quite wasteful unless you really think this stuff is relevant.

Capabilities often has better feedback loops than alignment because you can see whether the thing works or not. Many prosaic alignment directions also have this property. Interpretability is getting there, but not quite. Other areas, especially in agent foundations, are significantly worse.

Yeah this makes sense for extreme newcomers. If someone can get a capabilities job, however, I think they are doing themselves a disservice by playing the easier game of capabilities work. Yes, you have better feedback loops than alignment research / implementation work. That's like saying "Search for your keys under the streetlight because that's where you can see the ground most clearly." I'd want these people to start building the epistemological skills to thrive even with a lower intensity of feedback loops such that they can do alignment research work effectively.

And the best way to do that is to actually attempt to do alignment research, if you are in a position to do so.

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2024-06-04T10:03:44.817Z · LW · GW

It seems like a significant amount of decision theory progress happened between 2006 and 2010, and since then progress has stalled.

Comment by mesaoptimizer on Alok Singh's Shortform · 2024-06-04T07:29:32.413Z · LW · GW

You are leaving out a ridiculous amount of context, but yes, if you are okay with leather footwear, Meermin provides great footwear at relatively inexpensive prices.

I still recommend thrift shopping instead. I spent 250 EUR on a pair of new boots from Meermin, and 50 EUR on a pair of thrifted boots which seem about 80% as aesthetically pleasing as the first (and just as comfortable, since I tried them on before buying them).

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2024-06-02T20:20:34.760Z · LW · GW

It has been six months since I wrote this, and I want to note an update: I now grok what Valentine is trying to say and what he is pointing at in Here's the Exit and We're already in AI takeoff. That is, I have a detailed enough model of Valentine's model of the things he talks about, such that I understand the things he is saying.

I still don't feel like I understand Kensho. I get the pattern of the epistemic puzzle he is demonstrating, but I don't know if I get the object-level thing he points at. Based on a reread of the comments, maybe what Valentine means by Looking is essentially gnosis, as opposed to doxa: an understanding grounded in your own experience, rather than an ungrounded one you absorbed from someone else's claims. See this comment by someone else (who is not Valentine) on that post:

The fundamental issue is that we are communicating in language, the medium of ideas, so it is easy to get stuck in ideas. The only way to get someone to start looking, insofar as that is possible, is to point at things using words, and to get them to do things. This is why I tell you to do things like wave your arms about or attack someone with your personal bubble or try to initiate the action of touching a hot stove element.

Alternately, Valentine describes the process of Looking as "Direct embodied perception prior to thought.":

Most of that isn’t grounded in reality, but that fact is hard to miss because the thinker isn’t distinguishing between thoughts and reality.

Looking is just the skill of looking at reality prior to thought. It’s really not complicated. It’s just very, very easy to misunderstand if you fixate on mentally understanding it instead of doing it. Which sadly seems to be the default response to the idea of Looking.

I am unsure if this differs from mundane metacognitive skills like "notice the inchoate cognitions that arise in your mind-body, that aren't necessarily verbal". I assume that Valentine is pointing at a certain class of cognition, one that is essentially entirely free of interpretation. Or perhaps before 'value-ness' is attached to an experience -- such as "this experience is good because <elaborate strategic chain>" or "this experience is bad because it hurts!"

I understand how a better metacognitive skillset would lead to the benefits Valentine mentioned, but I don't think it requires you to only stay at the level of "direct embodied perception prior to thought".

As for kensho, it seems to be a term for some skill that leads you to be able to do what romeostevensit calls 'fully generalized un-goodharting':

I may have a better answer for the concrete thing that it allows you to do: it’s fully generalizing the move of un-goodharting. Buddhism seems to be about doing this for happiness/inverse-suffering, though in principle you could pick a different navigational target (maybe).

Concretely, this should show up as being able to decondition induced reward loops and thus not be caught up in any negative compulsive behaviors.

I think that "fully generalized un-goodharting" is a pretty vague phrase and I could probably come up with a better one, but it is an acceptable pointer term for now. So I assume it is something like 'anti-myopia'? Hard to know at this point. I'd need more experience and experimentation and thought to get a better idea of this.

I believe that Here's the Exit, We're already in AI Takeoff, and Slack matters more than any outcome all were pointing at the same cluster of skills and thought -- about realizing the existence of psyops, systematic vulnerabilities, or issues that lead you (whatever 'you' means) to forget the 'bigger picture', and that the resulting myopia causes significantly bad outcomes from the perspective of the 'whole' individual/society/whatever.

In general, Lexicogenesis seems like a really important sub-skill for deconfusion.

Comment by mesaoptimizer on jacquesthibs's Shortform · 2024-05-28T22:16:05.971Z · LW · GW

I've experimented with Claude Opus for simple Ada autoformalization test cases (specifically quicksort), and it seems like the sort of issues that make LLM agents infeasible (hallucination-based drift, subtle drift caused by sticking to certain implicit assumptions you made before) are also the issues that make Opus hard to use for autoformalization attempts.

I haven't experimented with a scaffolded LLM agent for autoformalization, but I expect it won't go very well either, primarily because scaffolding involves attempts to turn human-like implicit high-level cognitive strategies into explicit algorithms or heuristics, such as tree of thought prompting, and I expect that this doesn't scale given the complexity of the domain (sufficiently general autoformalizing AI systems can be modelled as effectively consequentialist, which makes them dangerous). I don't expect a scaffolded (over Opus) LLM agent to succeed at autoformalizing quicksort right now either, mostly because I believe RLHF tuning has systematically optimized Opus to write the bottom line first and then attempt to build or hallucinate a viable answer, and then post-hoc justify the answer. (While steganographic non-visible chain-of-thought may have gone into figuring out the bottom line, it still is worse than first doing visible chain-of-thought such that it has more token-compute-iterations to compute its answer.)

If anyone reading this is able to build a scaffolded agent that autoformalizes (using Lean or Ada) algorithms of complexity equivalent to quicksort reliably (such that more than 5 out of 10 of its attempts succeed) within the next month of me writing this comment, then I'd like to pay you 1000 EUR to see your code and for an hour of your time to talk with you about this. That's a little less than twice my current usual monthly expenses, for context.
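For concreteness, the kind of target artifact I have in mind is on the order of the following Lean 4 sketch (mine, and deliberately minimal: `partial` waives the termination check, whereas a genuine autoformalization would need to discharge termination and ideally prove the output is a sorted permutation of the input):

```lean
-- Quicksort over lists of naturals.
-- `partial` sidesteps Lean's termination checker; a full formalization
-- would instead prove that filtering never lengthens the list, and
-- state and prove a sortedness-plus-permutation specification.
partial def quicksort : List Nat → List Nat
  | [] => []
  | x :: xs =>
      quicksort (xs.filter (· < x)) ++ x :: quicksort (xs.filter (x ≤ ·))
```

The gap between this sketch and a verified artifact (termination proof, correctness spec) is exactly the part I expect current LLM agents to fail at reliably.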

Comment by mesaoptimizer on Notifications Received in 30 Minutes of Class · 2024-05-26T17:36:48.963Z · LW · GW

This is very interesting, thank you for posting this.

Comment by mesaoptimizer on Executive Dysfunction 101 · 2024-05-26T09:28:04.502Z · LW · GW

the therapeutic idea of systematically replacing the concept “should” with less normative framings

Interesting. I independently came up with this concept, downstream of thinking about moral cognition and parts work. Could you point me to any past literature that talks about this coherently enough that you would point people to it to understand this concept?

I know that Nate has written about this:

As far as I recall, reading these posts didn't help me.

Comment by mesaoptimizer on james.lucassen's Shortform · 2024-05-25T20:37:22.630Z · LW · GW

Based on gwern's comment, steganography as a capability can arise (at rather rudimentary levels) via RLHF over multi-step problems (which is effectively most cognitive work, really), and this gets exacerbated with the proliferation of AI generated text that embeds its steganographic capabilities within it.

The following paragraph by gwern (from the same thread linked in the previous paragraph) basically summarizes my current thoughts on the feasibility of prevention of steganography for CoT supervision:

Inner-monologue approaches to safety, in the new skin of ‘process supervision’, are popular now so it might be good for me to pull out one point and expand on it: ‘process supervision’ does not necessarily prevent steganography, nor does it ensure safety, because you are still pitting three goals against each other—achieving superhuman capability on a task, achieving causal fidelity of transcripts, achieving human-readability of the transcripts. Choose two: you can have transcripts which record a capable thought process accurately but which you cannot understand in full detail; which are not capable but their transcripts accurately convey the fallible flawed concepts and reasoning used; or which are capable and you understand, but are not what it actually thought (because they are misleading, wrong, or shallow ‘lies to children’ sorts of explanations).

Comment by mesaoptimizer on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · 2024-05-23T07:49:02.835Z · LW · GW

Well, if you know relevant theoretical CS and useful math, you don’t have to rebuild the mathematical scaffolding all by yourself.

I didn't intend to imply in my message that you have mathematical scaffolding that you are recreating, although I expect it may be likely (Pearlian causality perhaps? I've been looking into it recently and clearly knowing Bayes nets is very helpful). I specifically used "you" to imply that in general this is the case. I haven't looked very deep into the stuff you are doing, unfortunately -- it is on my to-do list.

Comment by mesaoptimizer on Overconfidence · 2024-05-21T14:34:25.407Z · LW · GW

I do think that systematic self-delusion seems useful in multi-agent environments (see the commitment races problem for an abstract argument, and Sarah Constantin's essay "Is Stupidity Strength?" for a more concrete argument).

I'm not certain that this is the optimal strategy we have for dealing with such environments, and note that systematic self-delusion also leaves you (and the other people using a similar strategy to coordinate) vulnerable to risks that do not take into account your self-delusion. This mainly includes existential risks such as misaligned superintelligences, but also extinction-level asteroids.

It's a pretty complicated picture and I don't really have clean models of these things, but I do think that for most contexts I interact in, the long-term upside of having better models of reality is significantly higher than the benefit of systematic self-delusion.

Comment by mesaoptimizer on Overconfidence · 2024-05-21T13:45:56.243Z · LW · GW

According to Eliezer Yudkowsky, your thoughts should reflect reality.

I expect that the more your beliefs track reality, the better you'll get at decision making, yes.

According to Paul Graham, the most successful people are slightly overconfident.

Ah, but VCs benefit from the ergodicity of the startup founders! From the perspective of the founder, it's a non-ergodic situation. It's better to make Kelly bets instead if you prefer to not fall into gambler's ruin, given whatever definition of the real-world situation maps onto the abstract concept of being 'ruined' here.
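To make the Kelly reference concrete: for a simple binary bet, the Kelly-optimal fraction of your bankroll has a closed form. A minimal sketch (function name and framing are mine, not from any particular source):

```python
def kelly_fraction(p: float, b: float) -> float:
    """Kelly-optimal fraction of bankroll to stake on a binary bet.

    p: probability of winning.
    b: net odds -- profit per unit staked if you win.
    Formula: f* = (b*p - (1 - p)) / b, floored at zero, since a
    negative edge means the correct stake is nothing.
    """
    q = 1.0 - p
    return max(0.0, (b * p - q) / b)


# A 60% chance to double your stake supports risking 20% of bankroll;
# repeatedly betting your whole bankroll on those odds guarantees ruin.
print(round(kelly_fraction(0.6, 1.0), 3))  # 0.2
```

The founder/VC asymmetry falls out of this: the VC holds a portfolio of many such bets and can average over them, while the founder holds one bet with their whole "bankroll", so the full-send strategy that maximizes the VC's expected value is the one Kelly tells the individual founder to avoid.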

It usually pays to have a better causal model of reality than relying on what X person says to inform your actions.

Can you think of anyone who has changed history who wasn’t a little overconfident?

Survivorship bias.

It is advantageous to be friends with the kind of people who do things and never give up.

I think I do things and never give up in general, while I can be pessimistic about specific things and tasks I could do. You can be generally extremely confident in yourself and your ability to influence reality, while also being specifically pessimistic about a wide range of existing possible things you could be doing.

Here's a Nate post that provides his perspective on this specific orientation to reality that leads to a sort of generalized confidence that has social benefits.

Comment by mesaoptimizer on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · 2024-05-21T11:39:07.358Z · LW · GW

I wrote a bit about it in this comment.

I think that conceptual alignment research of the sort that Johannes is doing (and that I also am doing, which I call "deconfusion") is just really difficult. It involves skills that are not taught to people, that you seem very unlikely to learn by being mentored in traditional academia (including when doing theoretical CS or non-applied math PhDs), that I only started wrapping my head around after some mentorship from two MIRI researchers (which I believe I was pretty lucky to get), and even then I've spent a ridiculous amount of time by myself trying to tease out patterns to figure out a more systematic process of doing this.

Oh, and the more theoretical CS (and related math such as mathematical logic) you know, the better you probably are at this -- see how Johannes tries to create concrete models of the inchoate concepts in his head? Well, if you know relevant theoretical CS and useful math, you don't have to rebuild the mathematical scaffolding all by yourself.

I don't have a good enough model of John Wentworth's model for alignment research to understand the differences, but I don't think I learned all that much from John's writings and his training sessions that were a part of his MATS 4.0 training regimen, as compared to the stuff I described above.

Comment by mesaoptimizer on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · 2024-05-21T10:35:17.982Z · LW · GW

Note that when I said I disagree with your decisions, I specifically meant the sort of myopia in the glass shard story -- and specifically because I believe that if your research process / cognition algorithm is fragile enough that you'd be willing to take physical damage to hold onto an inchoate thought, maybe consider making your cognition algorithm more robust.

Comment by mesaoptimizer on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · 2024-05-21T10:33:16.790Z · LW · GW

Quoted from the linked comment:

Rather, I’m confident that executing my research process will over time lead to something good.

Yeah, this is a sentiment I agree with and believe. I think that it makes sense to have a cognitive process that self-corrects and systematically moves towards solving whatever problem it is faced with. In terms of computability theory, one could imagine it as an effectively computable function that you expect will return you the answer -- and the only 'obstacle' is time / compute invested.

I think being confident, i.e. not feeling hopeless in doing anything, is important. The important takeaway here is that you don’t need to be confident in any particular idea that you come up with. Instead, you can be confident in the broader picture of what you are doing, i.e. your processes.

I share your sentiment, although the causal model for it is different in my head. A generalized feeling of hopelessness is an indicator of mistaken assumptions and causal models in my head, and I use that as a cue to investigate why I feel that way. This usually results in me having hopelessness about specific paths, and a general purposefulness (for I have an idea of what I want to do next), and this is downstream of updates to my causal model that attempts to track reality as best as possible.

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-20T13:41:59.015Z · LW · GW

I don’t know whether OpenAI uses nondisparagement agreements; I haven’t signed one.

This can also be glomarizing. "I haven't signed one." is a fact, intended for the reader to use it as anecdotal evidence. "I don't know whether OpenAI uses nondisparagement agreements" can mean that he doesn't know for sure, and will not try to find out.

Obviously, the context of the conversation and the events surrounding Holden stating this matters for interpreting this statement, but I'm not interested in looking further into this, so I'm just going to highlight the glomarization possibility.

Comment by mesaoptimizer on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · 2024-05-20T11:26:52.895Z · LW · GW

I think what quila is pointing at is their belief in the supposed fragility of thoughts at the edge of research questions. From that perspective I think their rebuttal is understandable, and your response completely misses the point: you can be someone who spends only four hours a day working and the rest of the time relaxing, but also care a lot about not losing the subtle and supposedly fragile threads of your thought when working.

Note: I have a different model of research thought, one that involves a systematic process towards insight, and because of that I also disagree with Johannes' decisions.

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T22:07:29.157Z · LW · GW

But the discussion of “repercussions” before there’s been an investigation goes into pure-scapegoating territory if you ask me.

Just to be clear, OP themselves seem to think that what they are saying will have little effect on the status quo. They literally called it "Very Spicy Take". Their intention was to allow them to express how they felt about the situation. I'm not sure why you find this threatening, because again, the people they think ideally wouldn't continue to have influence over AI safety related decisions are incredibly influential and will very likely continue to have the influence they currently possess. Almost everyone else in this thread implicitly models this fact as they are discussing things related to the OP comment.

There is not going to be any scapegoating. I imagine that everything I say is something I would say in person to the people involved, or to third parties, and not expect any sort of coordinated action to reduce their influence -- they are that irreplaceable to the community and to the ecosystem.

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T21:56:13.543Z · LW · GW
Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T21:55:20.072Z · LW · GW

“Keep people away” sounds like moral talk to me.

Can you not be close friends with someone while also expecting them to be bad at self-control when it comes to alcohol? Or perhaps they are great at technical stuff like research but pretty bad at negotiation, especially when dealing with experienced adversarial situations such as when talking to VCs?

If you think someone’s decisionmaking is actively bad, i.e. you’d better off reversing any advice from them, then maybe you should keep them around so you can do that!

It is not that people's decision-making is reliably wrong such that you can consistently reverse their opinions to get something that accurately tracks reality. If that were the case, they would already be implicitly tracking reality very well. Reversed stupidity is not intelligence.

But more realistically, someone who’s fucked up in a big way will probably have learned from that, and functional cultures don’t throw away hard-won knowledge.

Again, you seem to not be tracking the context of our discussion here. This advice is usually given about junior people embedded in an institution, because the ability to blame someone and/or hold them responsible is a power that senior/executive people hold. The attitude you describe makes a lot of sense when it comes to people who are learning things, yes. I don't know if you can plainly bring it into this domain, and you even acknowledge this in the next few lines.

Imagine a world where AI is just an inherently treacherous domain, and we throw out the leadership whenever they make a mistake.

I think it is incredibly unlikely that the rationalist community has an ability to 'throw out' the 'leadership' involved here. I find this notion incredibly silly, given the amount of influence OpenPhil has over the alignment community, especially through their funding (including the pipeline, such as MATS).

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T21:06:18.749Z · LW · GW

I downvoted this comment because it felt uncomfortably scapegoat-y to me.

Enforcing social norms to prevent scapegoating also destroys information that is valuable for accurate credit assignment and causally modelling reality.

If you start with the assumption that there was a moral failing on the part of the grantmakers, and you are wrong, there’s a good chance you’ll never learn that.

I think you are misinterpreting the grandparent comment. I do not read any mention of a 'moral failing' in that comment. You seem worried because of the commenter's clear description of what they think would be a sensible step for us to take given what they believe are egregious flaws in the decision-making processes of the people involved. I don't think there's anything wrong with such claims.

Again: You can care about people while also seeing their flaws and noticing how they are hurting you and others you care about. You can be empathetic to people having flawed decision making and care about them, while also wanting to keep them away from certain decision-making positions.

If you think the OpenAI grant was a big mistake, it’s important to have a detailed investigation of what went wrong, and that sort of detailed investigation is most likely to succeed if you have cooperation from people who are involved.

Oh, interesting. Who exactly do you think influential people like Holden Karnofsky and Paul Christiano are accountable to? This "detailed investigation" you speak of, and this notion of a "blameless culture", make a lot of sense when you are the head of an organization conducting an investigation into systematic mistakes made by people who work for you and who you are responsible for. I don't think this situation is similar enough that you can apply these intuitions blindly without thinking through the actual causal factors involved here.

Note that I don't necessarily endorse the grandparent comment's claims. This is a complex situation, and I'd want to spend more time analyzing it and what occurred.

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T19:02:17.349Z · LW · GW

"ETA" commonly is short for "estimated time of arrival". I understand you are using it to mean "edited" but I don't quite know what it is short for, and also it seems like using this is just confusing for people in general.

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T18:54:47.908Z · LW · GW

Wasn't edited, based on my memory.

Comment by mesaoptimizer on Tamsin Leake's Shortform · 2024-05-18T17:59:17.354Z · LW · GW

You continue to model OpenAI as a black-box monolith instead of trying to unravel the dynamics inside it and understand the incentive structures that lead these things to occur. It's a common pattern I notice in the way you interface with certain parts of reality.

I don't consider OpenAI as responsible for this as much as Paul Christiano, Jan Leike, and their teams. Back in 2016 or 2017, when they initiated and led research into RLHF, they focused on LLMs because they expected LLMs to be significantly more amenable to RLHF. Given that focus, it was almost inevitable that they'd try instruction-tuning and incrementally build up models that deliver mundane utility. It was extremely predictable that Sam Altman and OpenAI would leverage this unexpected success to gain more investment and translate that into more researchers and compute. But Sam Altman and Greg Brockman aren't researchers, and they didn't figure out a path that minimized 'capabilities overhang' -- Paul Christiano did. And more importantly, this is not mutually exclusive with OpenAI using the additional resources for both capabilities research and (what they call) alignment research. While you might consider everything they do to be effectively capabilities research, the point I am making is that this is still consistent with the hypothesis that, while they are misguided, they are roughly doing the best they can given their incentives.

What really changed my perspective here was the fact that Sam Altman seems to have been systematically destroying extremely valuable information about how we could evaluate OpenAI. Specifically, the non-disparagement clause that ex-employees cannot even mention without falling afoul of the contract is something I didn't expect (I did expect non-disclosure clauses, but not something this extreme). This meant that my model of OpenAI was systematically too optimistic about how cooperative and trustworthy they are and will be in the future. In addition, if I was systematically deceived about OpenAI due to non-disparagement clauses that cannot even be mentioned, I would expect something similar to be possible when it comes to other frontier labs (especially Anthropic, but also DeepMind). In essence, I no longer believe that Sam Altman (for OpenAI is nothing but his tool now) is doing the best he can to benefit humanity given his incentives and constraints. I expect that Sam Altman is doing whatever he believes will retain and increase his influence and power, and this includes the use of AGI, if and when his teams finally achieve that level of capability.

This is the update I expect people are making. It is about being systematically deceived at multiple levels. It is not about "OpenAI being irresponsible".

Comment by mesaoptimizer on Tamsin Leake's Shortform · 2024-05-18T16:48:14.199Z · LW · GW

I still parse that move as devastating the commons in order to make a quick buck.

I believe that ChatGPT was not released with the expectation that it would become as popular as it did. OpenAI pivoted hard when it saw the results.

Also, I think you are misinterpreting the sort of 'updates' people are making here.

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T11:27:18.242Z · LW · GW

I mean, if Paul doesn't confirm that he is not under any non-disparagement obligations to OpenAI, as Cullen O'Keefe did, we have our answer.

In fact, given this asymmetry of information situation, it makes sense to assume that Paul is under such an obligation until he claims otherwise.

Comment by mesaoptimizer on Stephen Fowler's Shortform · 2024-05-18T10:04:58.335Z · LW · GW

I just realized that Paul Christiano and Dario Amodei both probably have signed non-disclosure + non-disparagement contracts since they both left OpenAI.

That impacts how I'd interpret Paul's (and Dario's) claims and opinions (or the lack thereof) that relate to OpenAI, or to alignment proposals entangled with what OpenAI is doing. If Paul has systematically silenced himself, and a large amount of OpenPhil and SFF money has been misallocated because of systematically skewed beliefs that these organizations have had due to Paul's opinions or lack thereof, well. I don't think this is the case, though -- I expect that Paul, Dario, and Holden have all converged on similar beliefs (whether they track reality or not) and have taken actions consistent with those beliefs.

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2024-05-17T20:06:53.921Z · LW · GW

If your endgame strategy involved relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.

Comment by mesaoptimizer on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T08:51:46.162Z · LW · GW

I love the score of this comment as of writing: -1 karma points, 23 agree points.

Comment by mesaoptimizer on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T08:46:12.258Z · LW · GW

I think it is useful for someone to tap me on the shoulder and say "Hey, this information you are consuming, its from <this source that you don't entirely trust and have a complex causal model of>".

Enforcing social norms to prevent scapegoating also destroys information that is valuable for accurate credit assignment and causally modelling reality. I haven't yet found a third alternative, and until then, I'd recommend people both encourage and help people in their community to not scapegoat or lose their minds in 'tribal instincts' (as you put it), while not throwing away valuable information.

You can care about people while also seeing their flaws and noticing how they are hurting you and others you care about.

Comment by mesaoptimizer on Linch's Shortform · 2024-05-13T09:47:12.543Z · LW · GW

Similarly, governmental institutions have institutional memories with the problems of major historical fuckups, in a way that new startups very much don’t.

On the other hand, institutional scars can cause what effectively looks like institutional traumatic responses, ones that block the ability to explore and experiment and to try to make non-incremental changes or improvements to the status quo, to the system that makes up the institution, or to the system that the institution is embedded in.

There's a real and concrete issue with the number of roadblocks that seem to be in place to prevent people from making gigantic changes to the status quo. Here's a simple example: would it be possible to get a nuclear plant set up in the United States within the next decade, barring financial constraints? Seems pretty unlikely to me. What about the FDA response to the COVID crisis? That sure seemed like a concrete example of how 'institutional memories' serve as gigantic roadblocks to our civilization's ability to orient and act fast enough to deal with the sort of issues we are and will be facing this century.

In the end, capital flows towards AGI companies for the sole reason that it is the least bottlenecked / regulated way to multiply your capital, the one that seems to have the highest upside for investors. If you could modulate this, you wouldn't need to worry as much about the incentives and culture of these startups.

Comment by mesaoptimizer on mesaoptimizer's Shortform · 2024-05-13T09:35:10.402Z · LW · GW

I had the impression that SPAR was focused on UC Berkeley undergrads and had therefore dismissed the idea of being a SPAR mentor or mentee. It was only recently, when someone mentioned wanting to learn from a particular SPAR mentor, that I looked at the website, and SPAR now seems to focus on the same niche as AI Safety Camp.

Did SPAR pivot in the past six months, or did I just misinterpret SPAR when I first encountered it?

Comment by mesaoptimizer on Linear infra-Bayesian Bandits · 2024-05-11T10:40:18.856Z · LW · GW

Sort-of off-topic, so feel free to maybe move this comment elsewhere.

I'm quite surprised to see that you have just shipped an MSc thesis, because I didn't expect you to be doing an MSc (or anything in traditional academia). I didn't think you needed one, since I think you have enough career capital to continue to work indefinitely on the things you want to work on and get paid well for it. I also assumed that you might find academia somewhat a waste of your time in comparison to doing stuff you wanted to do.

Perhaps you could help clarify what I'm missing?

Comment by mesaoptimizer on jacobjacob's Shortform Feed · 2024-05-08T08:19:27.767Z · LW · GW

fiber at Tata Industries in Mumbai

Could you elaborate on how Tata Industries is relevant here? Based on a DDG search, the only news I find involving Tata and AI infrastructure is one where a subsidiary named TCS is supposedly getting into the generative AI gold rush.

Comment by mesaoptimizer on quila's Shortform · 2024-05-08T06:55:54.803Z · LW · GW

My thought is that I don’t see why a pivotal act needs to be that.

Okay. Why do you think Eliezer proposed that, then?

Comment by mesaoptimizer on Please stop publishing ideas/insights/research about AI · 2024-05-03T10:17:20.872Z · LW · GW

Note that I agree with your sentiment here, although my concrete argument is basically what LawrenceC wrote as a reply to this post.

Comment by mesaoptimizer on Please stop publishing ideas/insights/research about AI · 2024-05-03T10:15:51.350Z · LW · GW

Ryan, this is kind of a side-note but I notice that you have a very Paul-like approach to arguments and replies on LW.

Two things that come to notice:

  1. You have a tendency to reply to certain posts or comments with "I don't quite understand what is being said here, and I disagree with it." or, "It doesn't track with my views", or equivalent replies that seem not very useful for understanding your object level arguments. (Although I notice that in the recent comments I see, you usually postfix it with some elaboration on your model.)
  2. In the comment I'm replying to, you use a strategy of black-box-like abstraction modeling of a situation to try to argue for a conclusion, one that usually involves numbers such as multipliers or percentages. (I have the impression that Paul uses this a lot, and one concrete example that comes to mind is the takeoff speeds essay.) I usually consider such arguments invalid when they seem to throw away information we already have, or when they use a set of abstractions that don't feel appropriate to the information I believe we have.

I just found this interesting and plausible enough to highlight to you. It would be a moderate investment of my time to dig up examples from your comment history that illustrate all these instances, but writing this comment still seemed valuable.

Comment by mesaoptimizer on Please stop publishing ideas/insights/research about AI · 2024-05-03T10:06:20.207Z · LW · GW

This is a really well-written response. I'm pretty impressed by it.

Comment by mesaoptimizer on Please stop publishing ideas/insights/research about AI · 2024-05-03T09:52:10.835Z · LW · GW

If your acceptable lower limit for basically anything is zero you wont be allowed to do anything, really anything. You have to name some quantity of capabilities progress that’s okay to do before you’ll be allowed to talk about AI in a group setting.

"The optimal amount of fraud is non-zero."

Comment by mesaoptimizer on Constructability: Plainly-coded AGIs may be feasible in the near future · 2024-04-29T07:52:49.643Z · LW · GW

Okay I just read the entire thing. Have you looked at Eric Drexler's CAIS proposal? It seems to have played some role as the precursor to the davidad / Evan OAA proposal, and has involved the use of composable narrow AI systems.

Comment by mesaoptimizer on Refusal in LLMs is mediated by a single direction · 2024-04-27T21:50:25.521Z · LW · GW

but I’m a bit disappointed that x-risk-motivated researchers seem to be taking the “safety”/”harm” framing of refusals seriously

I'd say a more charitable interpretation is that it is a useful framing: both in terms of a concrete thing one could use as scaffolding for alignment-as-defined-by-Zack research progress, and also a thing that is financially advantageous to focus on since frontier labs are strongly incentivized to care about this.

Comment by mesaoptimizer on Constructability: Plainly-coded AGIs may be feasible in the near future · 2024-04-27T18:26:09.716Z · LW · GW

Haven't read the entire post, but my thoughts on seeing the first image: pretty sure this is priced into the Anthropic / Redwood / OpenAI cluster of strategies where you use an aligned, boxed (or 'mostly aligned') generative LLM-style AGI to help you figure out what to do next.

Comment by mesaoptimizer on Eric Neyman's Shortform · 2024-04-26T09:16:10.071Z · LW · GW

e/acc is not a coherent philosophy and treating it as one means you are fighting shadows.

Landian accelerationism at least is somewhat coherent. "e/acc" is a bundle of memes that support the self-interest of the people supporting and propagating it, both financially (VC money, dreams of making it big) and socially (the non-Beff e/acc vibe is one of optimism and hope and to do things -- to engage with the object level -- instead of just trying to steer social reality). A more charitable interpretation is that the philosophical roots of "e/acc" are founded upon a frustration with how bad things are, and a desire to improve things by yourself. This is a sentiment I share and empathize with.

I find the term "techno-optimism" to be a more accurate description of the latter, and perhaps "Beff Jezos philosophy" a more accurate description of what you have in your mind. And "e/acc" to mainly describe the community and its coordinated movements at steering the world towards outcomes that the people within the community perceive as benefiting them.

Comment by mesaoptimizer on NicholasKees's Shortform · 2024-04-26T09:06:10.289Z · LW · GW

I use GreaterWrong as my front-end to interface with LessWrong, AlignmentForum, and the EA Forum. It is significantly less distracting and also doesn't make my ~decade old laptop scream in agony when multiple LW tabs are open on my browser.

Comment by mesaoptimizer on Lucie Philippon's Shortform · 2024-04-24T23:25:25.575Z · LW · GW

The main part of the issue was actually that I was not aware I had internal conflicts. I just mysteriously felt less emotions and motivation.

Yes, I believe that one can learn to entirely stop even considering certain potential actions as actions available to us. I don't really have a systematic solution for this right now aside from some form of Noticing practice (I believe a more refined version of this practice is called Naturalism but I don't have much experience with this form of practice).

Comment by mesaoptimizer on Lucie Philippon's Shortform · 2024-04-24T23:21:13.484Z · LW · GW

What do you think antidepressants would be useful for?

In my experience, I've gone months through a depressive episode while remaining externally functional and convincing myself (and the people around me) that I wasn't going through one. Another thing I've noticed: with medication (whether anxiolytics, antidepressants, or ADHD medication), I regularly underestimated how much I had been 'blocked' by some mental issue; after taking the medication the block would be gone, and I would only realize it had existed because of the (positive) changes in my behavior and cognition.

Essentially, I'm positing that you may be in a similar situation.