Comments

Comment by roha on EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · 2024-06-09T00:43:33.569Z · LW · GW

Further context about the "recent advancements in the AI sector have resolved this issue" paragraph:

Comment by roha on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-16T00:15:43.078Z · LW · GW

In case anyone got worried, OpenAI's blog post Introducing Superalignment on July 5, 2023 contained two links for recruiting, one still working and the other not. From this we can deduce that superalignment has been reduced to an engineering problem, and therefore scientists like Ilya and Jan were able to move on to new challenges, such as spending the last normal summer in a nice location with close friends and family.

"Please apply for our research engineer and research scientist positions."

Comment by roha on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-16T00:03:58.620Z · LW · GW

I assume they can't make a statement and that their choice of next occupation will be the clearest signal they can and will send out to the public.

Comment by roha on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T23:58:37.151Z · LW · GW

He has a stance towards risk that is a necessary condition for becoming the CEO of a company like OpenAI, but one that doesn't give you a high probability of building a safe ASI:

Comment by roha on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-02T09:13:34.691Z · LW · GW

If everyone has their own asteroid impact, Earth will not be displaced, because the momentum vectors will cancel each other out on average*. This is important because it will preserve Earth's trajectory equilibrium, which we have known for ages from animals jumping up and down all the time around the globe in their games of survival. If only a few central players get asteroid impacts, it's actually less safe! Safety advocates might actually cause the very outcomes that they fear!

*I have a degree in quantum physics and can derive everything from my model of the universe. This includes the moral and political imperatives that physics dictates and that most physicists therefore advocate for.

Comment by roha on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-02T08:55:52.909Z · LW · GW

We are decades if not centuries away from developing true asteroid impacts.

Comment by roha on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-01T18:31:11.031Z · LW · GW

Given all the potential benefits, there is no way we are not going to redirect asteroids to Earth. Everybody will have an abundance of rare elements.

xlr8

Comment by roha on Many arguments for AI x-risk are wrong · 2024-03-05T22:21:20.450Z · LW · GW

Some context from Paul Christiano's work on RLHF and a later reflection on it:

Christiano et al.: Deep Reinforcement Learning from Human Preferences

In traditional reinforcement learning, the environment would also supply a reward [...] and the agent's goal would be to maximize the discounted sum of rewards. Instead of assuming that the environment produces a reward signal, we assume that there is a human overseer who can express preferences between trajectory segments. [...] Informally, the goal of the agent is to produce trajectories which are preferred by the human, while making as few queries as possible to the human. [...] After using r̂ to compute rewards, we are left with a traditional reinforcement learning problem
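
For concreteness, the preference-learning step described in that passage can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes a Bradley–Terry style model in which the probability that the human prefers a segment is a softmax over the summed predicted rewards, and the network architecture, names and tensor shapes are placeholders.

```python
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Per-step reward predictor r_hat(o, a); architecture is illustrative only."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim) -> rewards: (batch, T)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def preference_loss(reward_model: RewardModel,
                    seg_a: tuple[torch.Tensor, torch.Tensor],
                    seg_b: tuple[torch.Tensor, torch.Tensor],
                    human_prefers_a: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss on the preference probability: P[A preferred over B]
    is modeled as a softmax over the summed predicted rewards of the two segments."""
    sum_a = reward_model(*seg_a).sum(dim=-1)   # (batch,)
    sum_b = reward_model(*seg_b).sum(dim=-1)   # (batch,)
    logits = torch.stack([sum_a, sum_b], dim=-1)   # (batch, 2)
    target = (~human_prefers_a).long()             # index 0 if A preferred, else 1
    return nn.functional.cross_entropy(logits, target)
```

Once r̂ is trained this way, its output stands in for the environment reward and any standard RL algorithm can be run on top of it, which is the "traditional reinforcement learning problem" the quoted passage ends on.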

Christiano: Thoughts on the impact of RLHF research

The simplest plausible strategies for alignment involve humans (maybe with the assistance of AI systems) evaluating a model’s actions based on how much we expect to like their consequences, and then training the models to produce highly-evaluated actions. [...] Simple versions of this approach are expected to run into difficulties, and potentially to be totally unworkable, because:

  • Evaluating consequences is hard.
  • A treacherous turn can cause trouble too quickly to detect or correct even if you are able to do so, and it’s challenging to evaluate treacherous turn probability at training time.

[...] I don’t think that improving or studying RLHF is automatically “alignment” or necessarily net positive.

Edit: Another relevant section is in an interview with Paul Christiano by Dwarkesh Patel:

Paul Christiano - Preventing an AI Takeover

Comment by roha on Against most, but not all, AI risk analogies · 2024-01-16T12:47:02.829Z · LW · GW

Replacing must with may is a potential solution to the issues discussed here. I think analogies are misleading when they are used as a means of proof, i.e. to convince yourself or others of the truth of some proposition, but they can be extremely useful when they are used as a means of exploration, i.e. to discover new propositions worth investigating. Taken seriously, this means that if you find something of interest with an analogy, it should not mark the end of a thought process or conversation, but the beginning of a validation process: Is there just a superficial connection between the compared phenomena, or a deep one? Does it point to a useful model or abstraction?

Example: I think the analogy that trying to align an AI is like trying to steer a rocket towards any target at all shouldn't be used to convince people that without proper alignment methods mankind is screwed. Who knows whether directing a physical object in geometrical space has much to do with directing a cognitive process in some unknown combinatorial space? The analogy could instead be used as a pointer towards a general class of control problems that come with specific assumptions, which may or may not hold for future AI systems. If we think the assumptions hold, we may be able to learn a lot about future instances like advanced AIs from existing instances of control problems like rockets and acrobots. If we think the assumptions don't hold, we may learn something by identifying the least plausible assumption and trying to formulate an alternative abstraction that doesn't depend on it, opening another path towards collecting empirical data points from existing instances.

Comment by roha on The Next ChatGPT Moment: AI Avatars · 2024-01-08T03:41:15.998Z · LW · GW

For collaboration on job-like tasks that assumption might hold. For companionship and playful interactions, I think the visual domain, possibly in VR/AR, will be found to be relevant and kept. Given our psychological priors, I also think that for many people it may feel like a qualitative change in what kind of entity we are interacting with: from lifeless machine, through uncanny human imitation, to believable personality on another substrate.

Comment by roha on The Next ChatGPT Moment: AI Avatars · 2024-01-06T20:12:34.537Z · LW · GW

Empirical data point: In my experience, talking to Inflection's Pi on the phone covers the low-latency integration of "AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech" well enough to pass some bar of "feels authentically human" for me, at least until you try to test its limits. I imagine that subjective experience is more likely to arise if you don't have background knowledge about LLMs / DL. Its main problems are 1) keeping track of context in a plausibly human-like way (e.g. a game of guessing the capital cities of European countries leads to repetitive questions about the same few countries, even if it is asked to take care in various ways) and 2) inconsistent refusal to talk about certain things depending on the preceding text (e.g. retelling dark jokes by real comedians).
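
To make the pipeline concrete, here is a rough sketch of the kind of low-latency voice loop described above. It is a hypothetical outline, not Inflection's actual architecture; every function name and signature below is a placeholder standing in for a speech-to-text service, an LLM and a text-to-speech service.

```python
import time

# Placeholder stubs; the real services and their APIs are not specified here.
def transcribe(audio_chunk: bytes) -> str:
    """Speech-to-text (hypothetical stand-in)."""
    return "transcribed user utterance"

def generate_reply(history: list[dict]) -> str:
    """LLM turn generation conditioned on the conversation so far (hypothetical stand-in)."""
    return "assistant reply"

def synthesize(text: str) -> bytes:
    """Text-to-speech (hypothetical stand-in)."""
    return b"audio bytes"

def voice_loop(get_audio, play_audio, max_turns: int = 5) -> None:
    """One plausible shape of the loop: listen, transcribe, generate from the
    shared history, synthesize, play. Perceived naturalness depends heavily on
    how quickly each stage hands its output to the next."""
    history: list[dict] = []
    for _ in range(max_turns):
        start = time.monotonic()
        user_text = transcribe(get_audio())
        history.append({"role": "user", "content": user_text})
        reply = generate_reply(history)
        history.append({"role": "assistant", "content": reply})
        play_audio(synthesize(reply))
        print(f"turn latency: {time.monotonic() - start:.2f}s")
```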

I share your expectation that adding photorealistic video generation to it can plausibly lead to another "cultural moment", though that might depend on whether such avatars see adoption as rapid as ChatGPT's or whether they are phased in more gradually. (I have no overview of the entire space and only stumbled upon Inflection's product by chance after listening to a random podcast. If there are similar products out there already, I'd love to know.)

edit: Corrected link formatting.

Comment by roha on re: Yudkowsky on biological materials · 2023-12-11T23:16:11.963Z · LW · GW

Meta-questions: How relevant are nanotechnological considerations for x-risk from AI? How suited are scenarios involving nanotech for making a plausible argument for x-risk from AI, i.e. one that convinces people to take the risk seriously and to become active in attempting to reduce it?

Comment by roha on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-11T11:41:39.149Z · LW · GW

It seems we expect the same thing, then: If humanity were largely gone (e.g. due to several engineered pandemics) and as a consequence the world economy came to a halt, an ASI would probably be able to sustain itself long enough by controlling existing robotic machinery, i.e. without having to make dramatic leaps in nanotech or other technology first. What I wanted to express with "a moderate increase of intelligence" is that it won't take an ASI at the level of GPT-142 to do that; GPT-7 together with current projects in robotics might suffice to bring the necessary planning and control of actuators into existence.

If that assumption holds, it means an ASI might come to the conclusion that it should end the threat that humanity poses to its own existence and goals long before it is capable of building Drexler nanotech, Dyson spheres, Von Neumann probes or anything else that a large portion of people find much too hypothetical to care about at this point in time.

Comment by roha on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-08T11:07:12.093Z · LW · GW

The question in point 2 is whether an ASI could sustain itself without humans and without new types of hardware such as Drexler-style nanomachinery, which to a significant portion of people (me not included) seems too hypothetical to be of actual concern. I currently don't see why the answer to that question should be a highly certain no, as you seem to suggest. Here are some thoughts:

  • The world economy largely caters to human needs, such as nutrition, shelter, healthcare, personal transport, entertainment and so on. Phenomena like massive food waste and people stuck in bullshit jobs, to name just two, also indicate that it's not close to optimal at that. An ASI would therefore not have to prevent a world economy from collapsing or pick it up afterwards, which I also don't think is remotely possible with existing hardware. I think the majority of processes running in the only example of a world economy we have are irrelevant to the self-preservation of an ASI.
  • An ASI would presumably need to keep its initial compute substrate running long enough to transition into some autocatalytic cycle, be it on the original or a new substrate. (As a side remark, it is also conceivable that it might go into a reduced or dormant state for a while and let less energy- and compute-demanding processes act on its behalf until conditions have improved on some metric.) I do believe that conventional robotics is sufficient to keep the lights on long enough, but to be perfectly honest, that belief is conditioned on a lack of knowledge about many specifics, like exact figures on hardware turnover and the energy requirements of data centers capable of running frontier models, the amount and quality of chips currently existing on the planet, the actual complexity of keeping different types of power plants running for a relevant period of time, the many detailed issues of existing power grids, etc. I weakly suspect there is some robustness built into these systems that does not stem solely from the flexible bodies of human operators or from practical know-how that can't be deduced from the knowledge base of an ASI that might be built.
  • The challenge would be more complex for an ASI if it were running not on general-purpose hardware but on special-purpose circuitry that is much harder to maintain and replace. It may additionally be a more complex task if the ASI could not gain access to its own source code (or relevant parts of it), since that presumably would make a migration onto other infrastructure considerably more difficult, though I'm not fully certain that's actually the case, given that the compiled and operational code may be sufficient for an ASI to deduce its weights and other relevant aspects.
  • Evolution presumably started from very limited organic chemistry and discovered autocatalytic cycles based on biochemistry, catalytically active macromolecules and compartmentalized cells. That most likely implies that a single cell may be able to repopulate an entire planet that is sufficiently earth-like and give rise to intelligence again after billions of years. That fact alone certainly does not imply that thinking sand needs to build hypothetical nanomachinery to win the battle against entropy over a long period of time. Existing actuators and chips on the planet, the hypothetical absence of humans, and an HLAI or an ASI moderately above it may be sufficient in my current opinion.

Comment by roha on Why Yudkowsky is wrong about "covalently bonded equivalents of biology" · 2023-12-07T10:41:22.356Z · LW · GW

An attempt to keep abstraction to a minimum, picking up what was communicated here:

  1. How could an ASI kill all humans? By setting off several engineered pandemics a month, with a moderate increase in infectiousness and lethality compared to historical natural cases.
  2. How could an ASI sustain itself without humans? With conventional robotics and a moderate increase in intelligence for planning and controlling the machinery.

People coming into contact with that argument will check its plausibility, as they will with a hypothetical nanotech narrative. If so inclined, they will come to the conclusion that we may very well be able to protect ourselves against that scenario, either by prevention or mitigation, to which a follow-up response can be a list of other scenarios at the same level of plausibility, i.e. ones that do not depend on hypothetical scientific and technological leaps. Triggering this kind of x-risk skepticism in people seems less problematic to me than making people think the primary x-risk scenario is far-fetched sci-fi that most likely doesn't hold up to scrutiny by domain experts. I don't understand why communicating a "certain drop dead scenario" with low plausibility seems preferable to a "most likely drop dead scenario" with high plausibility, but I'm open to being convinced that the former approach is better suited to the goal of getting the x-risk of ASI taken seriously by more people. Perhaps I'm missing a part of the grander picture?

Comment by roha on Memetic Judo #1: On Doomsday Prophets v.3 · 2023-08-19T15:00:53.781Z · LW · GW

It is an argument by induction based on a naive extrapolation of a historic trend.

This characterization could be a good first step toward constructing a convincing counterargument. Are there examples of other arguments by induction that simply extrapolate historical trends, where it is much more apparent that this is an unreliable form of reasoning? To be intuitive, it must not be too technical; e.g. "people claiming to have found a proof of Fermat's Last Theorem have always been wrong in the past (until Andrew Wiles came along)" would probably not work well.

Comment by roha on All AGI Safety questions welcome (especially basic ones) [July 2023] · 2023-07-22T15:25:40.182Z · LW · GW

There seems to be a clear pattern of people downplaying AGI risk by framing it as mere speculation, science fiction, hysterical, unscientific, religious, and other variations on the idea that it is not based on sound foundations, especially when it comes to claims of considerable existential risk. One way to respond to that is by pointing at existing examples of cutting-edge AI systems showing unintended or at least unexpected/unintuitive behavior. Has anyone compiled a reference collection of such examples that is suitable for grounding speculation in empirical observation?

With "unintended" I'm roughly thinking of examples like the repeatedly used video of a ship going in circles to continually collect points instead of finishing a game. With "unexpected/unintuitive" I have in mind examples like AlphaGo surpassing 3000 years of collective human cognition in a very short time by playing against itself, clearly demonstrating the non-optimality of our cognition, at least in a narrow domain.