Comments

Comment by Owen Henahan (OwenLeaf) on Creating unrestricted AI Agents with Command R+ · 2024-04-17T15:15:24.271Z · LW · GW

This is interesting work, and I appreciate you taking the time to compile and share it.

I think it will be much more difficult for a model to successfully blackmail anyone than to successfully harass them. Humans are limited in their ability to harass a single target by time and effort more than anything -- nonspecific death threats and vitriol require little to no knowledge of the target beyond a surface level. These models could churn out countless variations of this sort of attack relentlessly, which could certainly detrimentally affect someone's mental health/wellbeing to an equal or greater extent than similar human attacks.

However, in the case of traditional blackmail, the key component of fear is that the attacker will publicize something generally unknown, which is not a strong point of current LLMs. Blasting negative public information everywhere could still be detrimental, especially to someone not inured to such attacks (e.g. a non-celebrity), but I view this as having a low ceiling of efficacy based on current capabilities. LLMs scrape public knowledge. A malicious AI agent would first have to acquire information that is hidden, which means targeting the right person or people with the right threats/bribes to obtain that information. Establishing those connections would be incredibly difficult as well as both time- and resource-intensive.

Alternatively, the LLM could trick the target directly into saying or doing something compromising. This second approach is, in my view, much more dangerous and already possible with the current state of LLMs. A refined LLM that emulates a "lifelike" AI romantic partner could be used by a bad actor, with little adjustment, to catfish someone into sending nude pictures or other compromising information. Spitballing here: these attacks could be shotgunned to several targets without the time investment a human catfisher would need. The model could then theoretically alert a human organizer when a sensitive point is reached in a conversation to seal the deal, so to speak.

Effective attacks like this are much closer on the horizon than the sort of blackmail utilized by Commander in this post, based on current capabilities. I would be curious to know your thoughts on this and whether this is something we're seeing an uptick in at all.

Comment by Owen Henahan (OwenLeaf) on A Gentle Introduction to Risk Frameworks Beyond Forecasting · 2024-04-17T13:22:03.614Z · LW · GW

Thank you for this thoughtful and extremely well-composed piece!

I have mostly been a lurker here on LessWrong, but as I have absorbed the discourse over time, I started coming to a conclusion similar to the one you both reach in the earlier sections of your post -- namely, that our oversimplification of risk harms our discourse. I think Yudkowsky's "everyone dies" maxim is memetically powerful, but ultimately a bit detrimental to a community already focused on solving alignment. Exposure is something we all need to think more critically about, and I appreciate the tools you have shared from your field to help do so. I hope we adopt some of this terminology in our discussions.

Comment by Owen Henahan (OwenLeaf) on One-shot strategy games? · 2024-03-25T17:00:59.527Z · LW · GW

This varies pretty dramatically by how careful an individual player is (as well as whether or not autoexplore is used) but to provide a data point, I would say I'm probably 6.5/10 self-rated careful (increasing sharply from 4/10 to 8-9/10 after I get out of the early floors and get a sense that I'm well-positioned to go deep) and my first victory took about three and a half hours. However, I've gotten very close to victory in other runs in closer to 2-2.5 hours.

Most games will be shorter than that, many significantly so. There is a website that provides statistics for people playing the web version of the game -- http://brogue.roguelikelike.com/#gameStatistics -- but total game length isn't part of it. You can get a sense of difficulty as well as the distribution of where/why runs often end, which can help give an indirect sense of length.

Comment by Owen Henahan (OwenLeaf) on One-shot strategy games? · 2024-03-11T14:41:02.936Z · LW · GW

I see a lot of roguelites in the comments (many of which I will happily second, particularly Slay the Spire) but my vote and highest recommendation go to the traditional roguelike Brogue.

This game got me into the genre, so I do have a bit of a nostalgia bias towards it, but it is heavily recommended in traditional roguelike communities and considered to be a staple.

Brogue is as traditional as it comes -- descend into the procedurally-generated dungeon, pick up the Amulet of Yendor, and escape with your singular and fragile life. In my view, it is a refinement and distillation of this formula and of traditional roguelike strategy mechanics into something endlessly replayable and perfectly streamlined.

I would say it meets both your first and second qualifications. There are milestones, both the individual floors of the dungeon and others that are typically community-defined, and a run in two hours is feasible. However, it is incredibly difficult to win on your first try. I have never heard of anyone doing it, although I'm sure someone has. Of the four IRL friends I have gotten into it and myself, I am the only one to have successfully beaten it, though two of them have gotten very close multiple times. There are players in the community capable of winstreaking it, even when taking on optional additional objectives and descending even deeper into the dungeon than necessary. Every seed, random though it is, is fundamentally winnable and fair. Unlike in many, many similar games, I have never once walked away from a death thinking "that was bullshit" -- there's something beautiful about that, to me. It feels like an elegant, almost chesslike game: purely my own strategic merit and improvement, run to run, against the computer.

Regarding strategy/resource management/hidden information/value of information:

  • Strategy: Every single turn is a strategic decision of some form. Your hunger ticks away, slowly but surely, pressing you deeper into the dungeon for food. Monsters that could overwhelm you in an open space can be safely handled in corridors, and vice versa. The environment itself is a strategic consideration -- burning a bridge an enemy is crossing may save you now, but you've kicked the problem down the road, making the next floor more dangerous. Setting fire to the grass a group of enemies is patrolling could damage them and force them to reposition, but be mindful of the fire's spread. Creating steam or cutting off your own escape route could each prove a deadly mistake. Additionally, each weapon type, armor, scroll, and magical item is unique, with its own benefits and drawbacks. There are vaults full of items, of which you can choose only one to bring with you. You must make the best choices with what you find, which ties into resource management.
  • Resource Management/Hidden Information/Value of Information: This is a key aspect of Brogue. Your consumable items are limited, and there's a hidden information trick: all items start out unidentified. Potions are known by color (random from game to game) and scrolls by strings of gibberish. Magic rings, possibly beneficial or cursed, are known by their material, as are wands and staves. You can tell that a spear is a spear, but is it magic? Keenly edged, or perhaps with a rarer and more special enchantment still? Or is it cursed? There are scrolls and potions capable of detecting magic and identifying items. In a pressing situation, where your current gear may not be up to snuff, do you take the risk of equipping something unidentified, or seek another solution? The game provides you with percentages for each unidentified item, so you can make semi-educated guesses, but the art of managing your hidden information is a game within a game all by itself.
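To make the "semi-educated guesses" point concrete, here is a minimal Python sketch of the kind of reasoning those percentages support. Everything in it is an assumption for illustration: the potion names, the weights in `POTION_WEIGHTS`, the `MALEVOLENT` set, and the function `chance_malevolent` are all hypothetical, not Brogue's actual generation table or code. The model is simply that an unidentified potion cannot be any kind you have already identified, so each identification shifts the odds for everything still unknown.

```python
from fractions import Fraction

# Hypothetical generation weights for illustration only.
# These are NOT Brogue's real numbers.
POTION_WEIGHTS = {
    "healing": 30,
    "telepathy": 28,
    "detect magic": 30,
    "hallucination": 10,
    "incineration": 15,
    "darkness": 7,
}

# Hypothetical set of kinds you would not want to drink blindly.
MALEVOLENT = {"hallucination", "incineration", "darkness"}


def chance_malevolent(weights, identified):
    """Chance that a still-unidentified potion is harmful.

    An unidentified potion cannot be a kind that is already identified,
    so those kinds are dropped before the remaining weights are normalized.
    """
    remaining = {kind: w for kind, w in weights.items() if kind not in identified}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("every kind is already identified")
    bad = sum(w for kind, w in remaining.items() if kind in MALEVOLENT)
    return Fraction(bad, total)


if __name__ == "__main__":
    # Before anything is identified.
    print(chance_malevolent(POTION_WEIGHTS, set()))
    # After identifying two benign kinds, a blind sip is riskier.
    print(chance_malevolent(POTION_WEIGHTS, {"healing", "detect magic"}))
```

Under these made-up weights, identifying benign potions raises the risk of drinking an unknown one, which is one reason a scroll or potion that reveals information can be worth more than it first appears.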

Last but not least, a few technical details: The game is free. The community edition is still continuously developed and updated. It runs on Windows, Mac, and Linux, and the entire app is only a couple of megabytes.

If anyone tries it out, please let me know what you think and how you like it.