Posts

Meta Alignment: Communication Wack-a-Mole 2024-06-22T20:12:16.412Z
My Weirdest Experience 2023-07-11T14:44:55.469Z

Comments

Comment by Bridgett Kay (bridgett-kay) on shminux's Shortform · 2024-09-30T01:38:25.737Z · LW · GW

1991/1992, actually (Harry Potter was born in July 1980, and the story takes place in the school year after his 11th birthday).

Comment by Bridgett Kay (bridgett-kay) on Yoav Ravid's Shortform · 2024-09-25T22:39:10.066Z · LW · GW

Seems to me that the only winning move is not to play. 

Comment by Bridgett Kay (bridgett-kay) on This is already your second chance · 2024-07-29T14:51:26.472Z · LW · GW

This might be our third, fourth, fifth... nth chance.

Comment by Bridgett Kay (bridgett-kay) on Meta Alignment: Communication Wack-a-Mole · 2024-06-22T22:41:26.074Z · LW · GW

Thank you.

Comment by bridgett-kay on [deleted post] 2024-04-22T19:34:57.780Z

This is legitimate- the definition of weirdness was kept open-ended. I intended weirdness to be any behavior that is divergent from what most in a certain group consider to be the status quo, but even within a group, each member may have a different definition of what weird behavior is, and a consensus will be difficult to pin down.

I would consider rudeness to be weird behavior under this definition. It is a social behavior that comes with the cost of disrupting social cohesion. What is considered rude, vs. frank and straightforward, will vary from person to person even within a group, and may change over time as people within the group analyze whether the cost of the behavior is worth the social cost of ostracizing the individual who engages in that behavior. For example, cursing was considered much more rude by my parents' generation than the current generation. It took time and discourse for the status quo to change, and for people to decide that cursing is less harmful than was once imagined.

As for whether I'm trying to excuse my character flaws, that may well be the case. In learning how to more effectively examine the costs and benefits of my behavior, I hope to recognize what is a flaw, and what is not, and to mend the former. 

Comment by Bridgett Kay (bridgett-kay) on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-01T17:38:52.623Z · LW · GW

We don't know how to align asteroids' trajectories, so it's important to use smaller asteroids to align larger ones- like a very large game of amateur billiards. 

Comment by Bridgett Kay (bridgett-kay) on LessWrong's (first) album: I Have Been A Good Bing · 2024-04-01T15:05:57.175Z · LW · GW

I love this! But I find myself a little disappointed there's not a musical rendition of the "I have been a good bing" dialogue.

Comment by Bridgett Kay (bridgett-kay) on Can we get an AI to "do our alignment homework for us"? · 2024-02-28T18:39:20.141Z · LW · GW

As one scales up a system, any small misalignment within that system will become more apparent- more skewed. I use shooting an arrow as an example. Say you shoot an arrow at a target from only a few feet away. If you are only a few degrees off from being lined up with the bullseye, when you shoot the close target your arrow will land very close to the bullseye. However, if you shoot a target many yards away with the same degree of error, your arrow will land much, much farther from the bullseye. 
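The arrow analogy above can be made concrete with a little trigonometry. This is only a rough sketch: it assumes the arrow flies in a perfectly straight line (no gravity or drag), and the function name `miss_distance` and the specific ranges and angle are made up for illustration.

```python
import math


def miss_distance(range_to_target: float, aim_error_degrees: float) -> float:
    """Lateral miss at the target for a straight-line shot with a fixed angular error.

    Hypothetical helper: the miss is range * tan(error), in the same units as the range.
    """
    return range_to_target * math.tan(math.radians(aim_error_degrees))


# The same 2-degree aiming error at two different ranges, both in feet.
near = miss_distance(3.0, 2.0)    # a target a few feet away
far = miss_distance(150.0, 2.0)   # a target fifty yards away

print(f"near miss: {near:.2f} ft, far miss: {far:.2f} ft")
```

Under these assumptions the miss grows in direct proportion to the distance, so the same small angular error lands fifty times farther from the bullseye at fifty times the range.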

So if you get a less powerful AI aligned with your goals to a degree where everything looks fine, and then assign it the task of aligning a much more powerful AI, then any small flaw in the alignment of the less powerful AI will go askew far worse in the more powerful AI. What's worse- since you assigned the less powerful AI the task of aligning the larger AI, you won't be able to see exactly what the flaw was until it's too late, because if you'd been able to see the flaw, you would have aligned the larger AI yourself.

Comment by Bridgett Kay (bridgett-kay) on My Weirdest Experience · 2023-07-12T00:29:49.695Z · LW · GW

That seems fairly consistent with what happened to me. I did not experience my entire life in the dream- just the swim meet and the aftermath, and my memories were things I just summoned in the moment, like just coming up with small pieces of a story in real time. The thing that disturbed me the most wasn't living another life- though that was disturbing enough- but the fact that a character in the dream knew a truth that "I" did not. 

Comment by Bridgett Kay (bridgett-kay) on My Weirdest Experience · 2023-07-11T21:00:43.328Z · LW · GW

I have a similar trick I use with pirouettes- if I can turn and turn without stopping, then it is a dream. Of course, in this dream, I was not a dancer and had never danced, so I didn't even think of it. 

Comment by Bridgett Kay (bridgett-kay) on Ways I Expect AI Regulation To Increase Extinction Risk · 2023-07-10T17:40:06.425Z · LW · GW

Lately I've been appreciating, more and more, something I'm starting to call "Meta-Alignment." Like, with everything that touches AI, we have to make sure that thing is aligned just enough that it won't mess up or "misalign" the alignment project. For example, we need to be careful about the discourse surrounding alignment, because we might give the wrong idea to people who will vote on policy or work in AI/AI-adjacent fields themselves. Or policy needs to be carefully aligned, so it doesn't create misaligned incentives that mess up the alignment project; the same goes for policies in companies that work with AI. This is probably a statement of the obvious, but it is really a daunting prospect the more I think about it.

Comment by Bridgett Kay (bridgett-kay) on The LessWrong 2019 Review · 2021-01-05T23:07:47.474Z · LW · GW

I was just wondering, on the subject of research debt, if there was any sort of system so that people could "adopt" the posts of others. Like say, if someone posts an interesting idea that they don't have the time to polish or expand upon, they could post it somewhere for people who can.

Comment by Bridgett Kay (bridgett-kay) on My Weirdest Experience · 2020-12-09T20:11:25.075Z · LW · GW

Yeah- the experience really shook me. I'm prone to fairly vivid and interesting dreams, but this was definitely the strangest. 

Comment by Bridgett Kay (bridgett-kay) on Null-boxing Newcomb’s Problem · 2020-07-15T02:10:11.968Z · LW · GW

But this was the final trick, for as soon as Maxwell accepted the two million dollars, the simulation ended.

Comment by Bridgett Kay (bridgett-kay) on Seeing the Matrix, Switching Abstractions, and Missing Moods · 2019-06-05T17:19:40.587Z · LW · GW

How would you compare this technique to a more standard mindfulness practice?

Comment by Bridgett Kay (bridgett-kay) on Open Thread January 2019 · 2019-01-27T19:41:53.735Z · LW · GW

Well, I'm setting up a SETI-style project looking for extra-temporal info... in other words, looking for time travelers. I did an initial set of experiments which were poorly planned out and riddled with paradox, but I've redesigned the experiments and will be starting them soon.

Comment by Bridgett Kay (bridgett-kay) on Open Thread January 2019 · 2019-01-13T23:51:36.375Z · LW · GW

I see. Just running with the premise as it stood.

Comment by Bridgett Kay (bridgett-kay) on Open Thread January 2019 · 2019-01-11T22:36:09.793Z · LW · GW

Do you think it is more likely that R&D will simply cease rather than there being fewer and fewer returns from R&D over time, causing companies to put more money into it to stay competitive? I wonder if the situation might not cause the prices to actually go up, like with medication.

Comment by Bridgett Kay (bridgett-kay) on Open Thread January 2019 · 2019-01-09T22:24:11.559Z · LW · GW

I've been lurking for a while but haven't posted very much. I'm a writer who also enjoys doing weird experiments in my spare time. Hi there :)

Comment by Bridgett Kay (bridgett-kay) on Is Science Slowing Down? · 2018-11-28T20:26:17.377Z · LW · GW

I'm also partial to the low-hanging fruit explanation. Unfortunately, it seems to me we can really only examine progress in already established fields. Much harder to tell if there is much left to discover outside of established fields- the opportunities to make big discoveries that establish whole new fields of study. This is where the undiscovered, low-hanging fruit would be, I think.

Comment by Bridgett Kay (bridgett-kay) on Double-Dipping in Dunning--Kruger · 2018-11-28T15:59:28.972Z · LW · GW

This is probably good general advice, but it's a different matter when there is evidence that points to being an actual imposter. For example, when I write novels that do not sell, or blog posts that get downvoted to oblivion, it is difficult to get honest feedback as to how I might improve my writing. The feedback I get is almost always positive, but reviews are self-selected because people are rarely motivated to review something unless they especially like it. Plus, politeness prohibits people from being harsh when you ask for feedback. For these reasons, I am more apt to trust the hard metrics and view myself as a poor writer who has managed to fool a few people. Improving myself is a far more difficult task, in light of this. I guess my point is that knowing that there may be a blind spot means you can adjust for it, but it is also an opportunity to actually check it.

Comment by Bridgett Kay (bridgett-kay) on Hero Licensing · 2017-11-22T22:31:34.972Z · LW · GW

There's one person who always has the authority to say you can't try: yourself. Some people have a harder time than others when it comes to ignoring the discouragement of Pat and Maude.