Posts

Early Experiments in Human Auditing for AI Control 2025-01-23T01:34:31.682Z
Jokes Thread 2014-07-24T00:31:36.379Z

Comments

Comment by Joey Yudelson (JosephY) on Daniel Tan's Shortform · 2025-01-23T18:19:35.092Z · LW · GW

I wonder if r1 could reliably detect that a text contains stego-text without being told, i.e. give it a lineup of ten pieces of short creative writing that another instance generated, one of which has steg. See if r1 can pick out the steg, then whether various monitors (Sonnet, o1) can also pick it out.

Comment by Joey Yudelson (JosephY) on [Cross-post] Every Bay Area "Walled Compound" · 2025-01-23T17:53:09.374Z · LW · GW

This tree is a great place to hold a Kabbalat Shabbat underneath, incidentally


Lighthaven minyan when?

Comment by Joey Yudelson (JosephY) on Why The Focus on Expected Utility Maximisers? · 2022-12-27T17:37:53.796Z · LW · GW

I think that solving alignment for EV maximizers is a much stronger version of alignment than, e.g., prosaic alignment of LLM-type models. Agents seem like they'll be more powerful than Tool AIs. We don't know how to make them, but if someone does, and capabilities timelines shorten drastically, it would be awesome to have even a theory of EV maximizer alignment before then.

Comment by Joey Yudelson (JosephY) on chinchilla's wild implications · 2022-08-29T22:33:53.130Z · LW · GW

Sorry if this is obvious, but where does the “irreducible” loss come from? Wouldn’t that also be a function of the data, or I guess the data’s predictability?

Comment by Joey Yudelson (JosephY) on What are the most common and important trade-offs that decision makers face? · 2014-11-04T00:08:38.385Z · LW · GW

Constant, predictable gains vs. Black Swans

Comment by Joey Yudelson (JosephY) on 2014 Less Wrong Census/Survey · 2014-10-28T01:33:22.450Z · LW · GW

Did the survey! ...And now to upvote everything.

Comment by Joey Yudelson (JosephY) on Rationality Quotes May 2014 · 2014-05-29T21:42:25.072Z · LW · GW

It reminds me of Justice Potter Stewart: "I know it when I see it!"

Comment by Joey Yudelson (JosephY) on The Strangest Thing An AI Could Tell You · 2014-05-27T22:59:15.640Z · LW · GW

I knew we shouldn't have spent all that funding on awakening the Elder God Cthulhu!

Comment by Joey Yudelson (JosephY) on The Strangest Thing An AI Could Tell You · 2014-05-27T22:55:31.562Z · LW · GW

Oh god. That... makes a scary amount of sense. If an AI told me that, I would probably believe it. I'd also start training myself to be more of a "night-time person".

Comment by Joey Yudelson (JosephY) on Welcome to Less Wrong! (6th thread, July 2013) · 2014-05-27T22:27:23.808Z · LW · GW

Hi, my name is Joe. I live in North Jersey. I was born into a very religious Orthodox Jewish family. I only recently realized how badly I was doublethinking.

I started with HPMOR (as, it seems, do most people) and found my way into the Sequences. I read them all on OB, and was amazed at how eloquently someone else could voice what seemed to be my own thoughts. It laid bare the things I had been struggling with.

Then I found LW and was mostly just lurking for a while. I only made an account when I saw this post and realized how badly I wanted to upvote some of the comments :).

I think this site and the Sequences on it have changed my life, and I'm glad to finally be part of it.