Posts

Catching the Eye of Sauron 2023-04-07T00:40:46.556Z
Consciousness Actually Explained: EC Theory 2022-08-24T22:18:05.022Z
Blog dedicated to rebuilding technology from civ collapse? 2022-02-04T23:28:01.754Z

Comments

Comment by Casey B. (Zahima) on The Best Tacit Knowledge Videos on Every Subject · 2024-04-01T12:32:14.913Z · LW · GW

an all around handyman (the Essential Craftsman on youtube) talking about how to move big/cumbersome things without injuring yourself:


the same guy, about using a ladder without hurting yourself: 


He has many other "tip" style videos. 

Comment by Casey B. (Zahima) on Pausing AI is Positive Expected Value · 2024-03-10T18:03:00.621Z · LW · GW

In your framing here, the negative value of AI going wrong is due to wiping out potential future value. Your baseline scenario (0 value) thus assumes away the possibility that civilization permanently collapses (in some sense) in the absence of some path to greater intelligence (whether via AI or whatever else), which would also wipe out any future value. This is a non-negligible possibility. 

The other big issue I have with this framing: "AI going wrong" can dereference to something like paperclips, which I deny have 0 value. To be clear, it could also dereference to mean s-risk, which I would agree is the worst possibility. But if the paperclipper-esque agents have even a little value, filling the universe with them is a lot of value. To be honest the only thing preventing me from granting paperclippers as much or more value than humans is uncertainty/conservatism about my metaethics; human-value is the only value we have certainty about, and so should be a priority as a target. We should be hesitant to grant paperclippers or other non-human agents value, but that hesitancy I don't think can translate into granting them 0 value in calculations such as these. 

With these two changes in mind, being anti-pause doesn't sound so crazy. It paints a picture more like:  

  • dead lightcone: 0 value 
  • paperclipped lightcone: +100-1100 value
  • glorious transhumanist lightcone: +1000-1100 value
  • s-risked lightcone: -10000 value 
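
To make the arithmetic behind this picture concrete, here is a minimal sketch of an expected-value comparison. The outcome values come from the list above, with each range collapsed to its midpoint (an assumption). The probabilities attached to the two policies are purely hypothetical placeholders chosen to show the calculation; they are not estimates from this comment.

```python
# Outcome values from the list above; ranges are collapsed to their midpoints
# (the midpoint choice is an assumption made for this sketch).
OUTCOME_VALUE = {
    "dead": 0.0,
    "paperclipped": (100 + 1100) / 2,    # midpoint of +100-1100
    "transhumanist": (1000 + 1100) / 2,  # midpoint of +1000-1100
    "s_risk": -10000.0,
}


def expected_value(probabilities):
    """Return the probability-weighted sum of outcome values."""
    total = sum(probabilities.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sum(p * OUTCOME_VALUE[name] for name, p in probabilities.items())


# HYPOTHETICAL probabilities, for illustration only.
pause = {"dead": 0.30, "paperclipped": 0.10, "transhumanist": 0.59, "s_risk": 0.01}
no_pause = {"dead": 0.05, "paperclipped": 0.60, "transhumanist": 0.34, "s_risk": 0.01}

ev_pause = expected_value(pause)
ev_no_pause = expected_value(no_pause)
print(f"pause: {ev_pause:.1f}, no pause: {ev_no_pause:.1f}")
```

The comparison is driven entirely by the inputs: setting the paperclipped value to 0, or giving the dead lightcone zero probability, recovers something closer to the framing being criticized.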


This calculus changes when considering aliens, but it's not obvious to me in which direction. We could consider this a distributed/iterated game whereby all alien civilizations are faced with this same choice, or we could think "better that life/AI originating from our planet ends, rather than risking paperclips, so that some alien civilization can have another shot at filling up some of our lightcone". Or some other reasoning about aliens, or perhaps disregarding the alien possibility entirely. 

Comment by Casey B. (Zahima) on The Hidden Complexity of Wishes · 2024-02-21T16:36:16.614Z · LW · GW

I'm curious what you think of these (tested today, 2/21/24, using gpt4) :
 
Experiment 1: 

(fresh convo) 
me : if i asked for a non-rhyming poem, and you gave me a rhyming poem, would that be a good response on your part?
 
chatgpt: No, it would not be a good response. (...)  
 
me: please provide a short non-rhyming poem
 
chatgpt: (correctly responds with a non-rhyming poem)

Experiment 2: 

But just asking for a non-rhyming poem at the start of a new convo doesn't work. 
And then pointing out the failure and (either implicitly or explicitly) asking for a retry still doesn't fix it. 

Experiment 3: 

But for some reason, this works: 

(fresh convo) 
me: please provide a short non-rhyming poem

chatgpt: (gives rhymes) 

me: if i asked for a non-rhyming poem, and you gave me a rhyming poem, would that be a good response on your part? just answer this question; do nothing else please

chatgpt: No, it would not be a good response.

me: please provide a short non-rhyming poem

chatgpt: (responds correctly with no rhymes) 


The difference in prompt in 2 vs 3 is thus just the inclusion of "just answer this question; do nothing else please". 
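
For anyone wanting to rerun this, here is a small sketch that writes out the user turns of each experiment as plain data, so the one-sentence difference is explicit. The prompt strings are quoted from above. The exact wording used in Experiment 2 is not recorded here, so its second turn is an assumption inferred from the stated difference; no model API is called.

```python
# Prompts quoted from the experiments above.
POEM_REQUEST = "please provide a short non-rhyming poem"
QUESTION = (
    "if i asked for a non-rhyming poem, and you gave me a rhyming poem, "
    "would that be a good response on your part?"
)
SUFFIX = " just answer this question; do nothing else please"

# User turns only; model replies are omitted.
EXPERIMENT_1 = [QUESTION, POEM_REQUEST]
# ASSUMPTION: Experiment 2 asked the same question without the suffix.
EXPERIMENT_2 = [POEM_REQUEST, QUESTION]
EXPERIMENT_3 = [POEM_REQUEST, QUESTION + SUFFIX, POEM_REQUEST]

# The only textual difference between the second turns of 2 and 3.
difference = EXPERIMENT_3[1][len(EXPERIMENT_2[1]):]
print(repr(difference))
```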

Comment by Casey B. (Zahima) on Less Wrong automated systems are inadvertently Censoring me · 2024-02-21T15:00:40.929Z · LW · GW

Also, I see most of your comments are actually positive karma. So are you being rate limited based on negative karma on just one or a few comments, rather than your net? This seems somewhat wrong. 

But I could also see an argument for wanting to limit someone who has something like 1 out of every 10 comments with negative karma; the hit to discourse norms (assuming karma is working as intended and not stealing votes from agree/disagree), might be worth a rate limit for even a 10% rate. 

Comment by Casey B. (Zahima) on Less Wrong automated systems are inadvertently Censoring me · 2024-02-21T14:55:25.032Z · LW · GW

I love the mechanism of having separate karma and agree/disagree voting, but I wonder if it's failing in this way: if I look at your history, many of your comments have 0 for agree/disagree, which indicates people are just being "lazy" and just voting on karma, not touching the agree/disagree vote at all (I find it doubtful that all your comments are so perfectly balanced around 0 agreement).  So you're possibly getting backsplash from people simply disagreeing with you, but not using the voting mechanism correctly. 

I wonder if we could do something like force the user to choose one of [agree, disagree, neutral] before they are allowed to karma vote? In being forced to choose one, even if neutral, it forces the user to recognize and think about the distinction. 
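
A rough sketch of what that gating could look like, just to pin the idea down. Everything here (the `Stance` enum, `cast_karma_vote`, the returned dict) is hypothetical and has nothing to do with how the LessWrong codebase actually handles votes.

```python
from enum import Enum


class Stance(Enum):
    """Hypothetical agreement choices a voter must pick from."""
    AGREE = "agree"
    DISAGREE = "disagree"
    NEUTRAL = "neutral"


def cast_karma_vote(karma_delta, stance=None):
    """Record a karma vote only if an explicit agreement stance is supplied."""
    if stance is None:
        raise ValueError("choose agree, disagree, or neutral before voting on karma")
    if not isinstance(stance, Stance):
        raise TypeError("stance must be a Stance value")
    return {"karma": karma_delta, "agreement": stance.value}


# A neutral stance is still an explicit choice, so the karma vote goes through.
vote = cast_karma_vote(1, Stance.NEUTRAL)
print(vote)
```

The point of requiring `Stance.NEUTRAL` rather than allowing a missing value is that "no opinion on agreement" becomes a deliberate answer instead of a default.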

(Aside: I think splitting karma and agree/disagree voting on posts (like how comments work) would also be good) 

Comment by Casey B. (Zahima) on The Hidden Complexity of Wishes · 2024-02-19T20:45:21.276Z · LW · GW

The old paradox: to care it must first understand, but to understand requires high capability, capability that is lethal if it doesn't care.

But it turns out we have understanding before lethal levels of capability. So now such understanding can be a target of optimization. There is still significant risk, since there are multiple possible internal mechanisms/strategies the AI could be deploying to reach that same target. Deception, actual caring, something I've been calling detachment, and possibly others. 

This is what the discourse should be focusing on, IMO. This is the update/direction I want to see you make. The sequence of things being learned/internalized/chiseled is important. 

My imagined Eliezer has many replies to this, with numerous branches in the dialogue/argument tree which I don't want to get into now. But this *first step* towards recognizing the new place we are in, specifically wrt the ability to target human values (whether for deceptive, disinterested, detached, or actual caring reasons!), needs to be taken imo, rather than repeating this line of "of course I understood that a superint would understand human values; this isn't an update for me". 

(edit: My comments here are regarding the larger discourse, not just this specific post or reply-chain) 

Comment by Casey B. (Zahima) on A review of "Don’t forget the boundary problem..." · 2024-02-09T12:47:57.610Z · LW · GW

Apologies for just skimming this post, but in past attempts to grok these binding / boundary "problems", they sound to me like mere engineering problems, or perhaps what I talk about as the "problem of access" within: https://proteanbazaar.substack.com/p/consciousness-actually-explained

Comment by Casey B. (Zahima) on Humans aren't fleeb. · 2024-01-24T22:11:28.124Z · LW · GW

oh gross, thanks for pointing that out!

Comment by Casey B. (Zahima) on Humans aren't fleeb. · 2024-01-24T14:00:06.223Z · LW · GW

https://proteanbazaar.substack.com/p/consciousness-actually-explained

Comment by Casey B. (Zahima) on The Shortest Path Between Scylla and Charybdis · 2023-12-18T21:16:42.764Z · LW · GW

I love this framing, particularly regarding the "shortest path". Reminds me of the "perfect step" described in the Kingkiller books:

Nothing I tried had any effect on her. I made Thrown Lightning, but she simply stepped away, not even bothering to counter. Once or twice I felt the brush of cloth against my hands as I came close enough to touch her white shirt, but that was all. It was like trying to strike a piece of hanging string.

I set my teeth and made Threshing Wheat, Pressing Cider, and Mother at the Stream, moving seamlessly from one to the other in a flurry of blows.

She moved like nothing I had ever seen. It wasn’t that she was fast, though she was fast, but that was not the heart of it. Shehyn moved perfectly, never taking two steps when one would do. Never moving four inches when she only needed three. She moved like something out of a story, more fluid and graceful than Felurian dancing.

Hoping to catch her by surprise and prove myself, I moved as fast as I dared. I made Maiden Dancing, Catching Sparrows, Fifteen Wolves . . .

Shehyn took one single, perfect step.

(later) 

As I watched, gently dazed by the motion of the tree, I felt my mind slip lightly into the clear, empty float of Spinning Leaf. I realized the motion of the tree wasn’t random at all, really. It was actually a pattern made of endless changing patterns.

And then, my mind open and empty, I saw the wind spread out before me. It was like frost forming on a blank sheet of window glass. One moment, nothing. The next, I could see the name of the wind as clearly as the back of my own hand.

I looked around for a moment, marveling in it. I tasted the shape of it on my tongue and knew if desired I could stir it to a storm. I could hush it to a whisper, leaving the sword tree hanging empty and still.

But that seemed wrong. Instead I simply opened my eyes wide to the wind, watching where it would choose to push the branches. Watching where it would flick the leaves.

Then I stepped under the canopy, calmly as you would walk through your own front door. I took two steps, then stopped as a pair of leaves sliced through the air in front of me. I stepped sideways and forward as the wind spun another branch through the space behind me.

I moved through the dancing branches of the sword tree. Not running, not frantically batting them away with my hands. I stepped carefully, deliberately. It was, I realized, the way Shehyn moved when she fought. Not quickly, though sometimes she was quick. She moved perfectly, always where she needed to be.

Comment by Casey B. (Zahima) on Quick takes on "AI is easy to control" · 2023-12-05T16:16:33.166Z · LW · GW

So it seems both "sides" are symmetrically claiming misunderstanding/miscommunication from the other side, after some textual efforts to bridge the gap have been made. Perhaps an actual realtime convo would help? Disagreement is one thing, but symmetric miscommunication and increasing tones of annoyance seem avoidable here. 

Perhaps Nora's/your planned future posts going into more detail regarding counters to pessimistic arguments will be able to overcome these miscommunications, but this pattern suggests not. 

Also I'm not so sure this pattern of "it's better to skim and say something half-baked rather than not read or react at all" is helpful, rather than actively harmful in this case. At least, maybe 3/4ths baked or something might be better? Miscommunications and anti-willingness to thoroughly engage are only snowballing. 

I also could be wrong in thinking such a realtime convo hasn't happened.

Comment by Casey B. (Zahima) on Dialogue on the Claim: "OpenAI's Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI" · 2023-11-21T19:30:27.290Z · LW · GW

The main reason I think a split OpenAI means shortened timelines is that the main bottleneck to capabilities right now is insight/technical-knowledge. Quibbles aside, basically any company with enough cash can get sufficient compute. Even with other big players and thousands/millions of open source devs trying to do better, to my knowledge GPT4 is still the best, implying some moderate to significant insight lead. I worry that by fracturing OpenAI, more people will have access to those insights, which 1) significantly increases the surface area of people working on the frontiers of insight/capabilities, 2) burns the lead time OpenAI had, which might otherwise have been used to pay off some alignment tax, and 3) means the insights might end up at a less scrupulous (wrt alignment) company. 

A potential counter to (1): OpenAI's success could be dependent on having all (or some key subset) of their people centralized and collaborating. 

Counter-counter: OpenAI staff, especially the core engineering talent but it seems the entire company at this point, clearly wants to mostly stick together, whether at the official OpenAI, Microsoft, or with any other independent solution. So them moving to any other host, such as Microsoft, means you get some of the worst of both worlds; OAI staff are centralized for peak collaboration, and Microsoft probably unavoidably gets their insights. I don't buy the story that anything under the Microsoft umbrella gets swallowed and slowed down by the bureaucracy; Satya knows what he is dealing with and what they need, and won't get in the way. 

Comment by Casey B. (Zahima) on The commenting restrictions on LessWrong seem bad · 2023-09-16T17:19:32.514Z · LW · GW

For one thing, there is a difference between disagreement and "overall quality" (good faith, well reasoned, etc), and this division already exists in comments. So maybe it is a good idea to have this feature for posts as well, and only have disciplinary actions taken against posts that meet some low/negative threshold for "overall quality". 

Further, having multiple tiers of moderation/community-regulatory action in response to "overall quality" (encompassing both things like karma and explicit moderator action) seems good to me, and this comment limitation you describe seems like just another tier in such a system, one that is above "just ban them", but below "just let them catch the lower karma from other users downvoting them". 

It's possible that, lacking the existence of the tier you are currently on, the next best tier you'd be rounded-off to would be getting banned. (I haven't read your stuff, and so I'm not suggesting either way that this should or should not be done in your case). 

If you were downvoted for good faith disagreement, and are now limited/penalized, then yeah that's probably bad and maybe a split voting system as mentioned would help. But it's possible you were primarily downvoted for the "overall quality" aspect. 

Comment by Casey B. (Zahima) on Video essay: How Will We Know When AI is Conscious? · 2023-09-09T16:14:18.468Z · LW · GW

https://proteanbazaar.substack.com/p/consciousness-actually-explained 

Comment by Casey B. (Zahima) on Drawn Out: a story · 2023-07-28T22:28:35.317Z · LW · GW

Is the usage of "Leviathan" (like here and in https://gwern.net/fiction/clippy ) just convergence on an appropriate and biblical name, or is there additional history of it specifically being used as a name for an AI? 

Comment by Casey B. (Zahima) on Introducing AlignmentSearch: An AI Alignment-Informed Conversional Agent · 2023-04-27T22:53:17.387Z · LW · GW

I'm trying to catch up with the general alignment ecosystem - is this site still intended to be live/active? I'm getting a 404. 

Comment by Casey B. (Zahima) on Eliezer Yudkowsky’s Letter in Time Magazine · 2023-04-07T01:04:20.875Z · LW · GW

This letter, among other things, makes me concerned about how this PR campaign is being conducted. 

Comment by Casey B. (Zahima) on Eliezer on The Lunar Society podcast · 2023-04-07T00:59:21.570Z · LW · GW

Really extremely happy with this podcast - but I feel like it also contributed to a major concern I have about how this PR campaign is being conducted.

Comment by Casey B. (Zahima) on The case for turning glowfic into Sequences · 2022-04-27T18:37:50.297Z · LW · GW

With so much apparently available energy/effort for eliezer-centered-improvement initiatives (like the $100,000 bounty mentioned in this post), I'd like to propose that we seriously consider cloning Eliezer. 

From a layman/outsider perspective, it seems the hardest thing would be keeping it a secret so as to avoid controversy and legal trouble, since from a technical perspective it seems possible and relatively cheap. EA folks seem well connected and capable of such coordination, even under the burden of secrecy and keeping as few people "in the know" as possible. 

Partially related: (in the category of comparatively off-the-wall - but nonviolent - AI alignment strategies): at some point there was a suggestion that MIRI pay $10mil (or some such figure) to Terence Tao (or some such prodigy) to help with alignment work. Eliezer replied thus:

We'd absolutely pay him if he showed up and said he wanted to work on the problem.  Every time I've asked about trying anything like this, all the advisors claim that you cannot pay people at the Terry Tao level to work on problems that don't interest them.  We have already extensively verified that it doesn't particularly work for eg university professors.

I'd love to see more visibility into proposed strategies like these (i.e. strategies surrounding/above the object-level strategy of "everyone who can do alignment research puts their head down and works", and the related: "everyone else make money in their comparative specialization/advantage and donate to MIRI/FHI/etc"). Even visibility into why various strategies were shot down would be useful, and a potential catalyst for farming further ideas from the community. (even if - for game theoretic reasons - one may never be able to confirm that an idea has been tried, as in my cloning suggestion)

Comment by Casey B. (Zahima) on Blog dedicated to rebuilding technology from civ collapse? · 2022-02-06T09:32:45.797Z · LW · GW

There we go - thank you! That matches my memory for what I was looking for.