tricky_labyrinth

Posts

tricky_labyrinth's Shortform 2025-03-19T02:09:19.507Z

LEAst-squares Concept Erasure (LEACE) 2023-06-07T21:51:04.494Z

Comments

Comment by tricky_labyrinth on tricky_labyrinth's Shortform · 2025-03-19T02:09:19.505Z · LW · GW

Does anyone know why GPT-4.5 seems to get stuck on the word "explicitly", repeating it continuously once it first encounters it? Is this only happening in ChatGPT? It looks like some sort of context collapse.

Sightings in the wild:
- https://x.com/KelseyTuoc/status/1902132078378189198
- https://x.com/Josikinz/status/1901840144363082047
- https://x.com/4confusedemoji/status/1895613332662730832
- https://x.com/Westoncb/status/1895615564313448781
- https://x.com/noself86/status/1901230843240370287
- https://x.com/0x440x46/status/1900855229068829139
- https://x.com/GusarichOnX/status/1900184434806059072

Comment by tricky_labyrinth on Steering GPT-2-XL by adding an activation vector · 2023-05-15T02:58:31.223Z · LW · GW

+1 to point 5 specifically.

Comment by tricky_labyrinth on Pausing AI Developments Isn't Enough. We Need to Shut it All Down · 2023-04-08T02:44:04.556Z · LW · GW

mfw you didn't add the final addendum (https://twitter.com/ESYudkowsky/status/1642216007552106496)

Comment by tricky_labyrinth on Given the Restrict Act, Don’t Ban TikTok · 2023-04-05T06:18:47.376Z · LW · GW

> What I do not understand is why Apple and Google haven’t taken care of this for us.

Palmer Luckey has this talking point about how China has all the big tech companies (Apple in particular) by the balls. That, plus Google perhaps not wanting to seem monopolistic by banning a competitor, seems like a sufficient explanation.

Comment by tricky_labyrinth on An Appeal to AI Superintelligence: Reasons to Preserve Humanity · 2023-03-18T23:06:07.080Z · LW · GW

Why was this promoted to the frontpage?

Comment by tricky_labyrinth on Super-Luigi = Luigi + (Luigi - Waluigi) · 2023-03-17T17:50:29.978Z · LW · GW

Is "behavior vector space" referencing something? If not, what do you mean by it?

Comment by tricky_labyrinth on The best way so far to explain AI risk: The Precipice (p. 137-149) · 2023-03-02T08:57:44.704Z · LW · GW

Unrelated to the post's content itself: will LW get in trouble for hosting this excerpt?

Comment by tricky_labyrinth on [deleted post] 2023-02-22T07:32:09.176Z

Responding to the last line: to be clear, I'm not claiming I have one. I'm more wondering whether the AI risk community should try to find one as a desperate Hail Mary, given that they have ~0 hope in their current research directions.

That is, I'm wondering whether trying to find one even qualifies as a desperate Hail Mary.

Comment by tricky_labyrinth on Sazen · 2023-02-17T21:08:34.030Z · LW · GW

Wait, what? Do you mean colloquial "hieratic" (literally just "priestly") or his "hieratic":

> hieratic, adj. ~~Of computer documentation,~~ impenetrable because the author never sees outside his own intimate knowledge of the subject and is therefore unable to identify or meet the expository needs of newcomers. It might as well be written in hieroglyphics.

Because the latter seems extremely close to sazeny, though perhaps additionally connoting blame on the author.

Comment by tricky_labyrinth on Sazen · 2023-02-17T05:41:31.139Z · LW · GW

> I'm in the middle of writing a nonfiction book whose central conceit is something like "an abridged dictionary of Kadhamic." Not literally the actual canonical Alexandrian Kadhamic, but the idea is to present some hundred-or-so concepts that are long and complicated and difficult to convey in English, but which are not fundamentally more complicated than things we sum up with a single word like "basketball" or "gaslighting" or "cringe."

Very interested in reading this when it comes out :O

Comment by tricky_labyrinth on EigenKarma: trust at scale · 2023-02-09T09:56:24.523Z · LW · GW

FYI, eigenkarma's been proposed for LessWrong multiple times (with issues supposedly found); see https://www.lesswrong.com/posts/xN2sHnLupWe4Tn5we/improving-on-the-karma-system#Eigenkarma for example.
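For readers unfamiliar with the term, a minimal sketch of the general idea: a user's karma is the sum of the upvotes they receive, each weighted by the voter's own karma, so the scores form an eigenvector of the voting graph. This assumes a PageRank-style formulation (a damping factor plus power iteration), which may differ from both the EigenKarma post and the earlier LessWrong proposals; the `eigenkarma` function, its parameters, and the example users are all hypothetical.

```python
# Hypothetical sketch of "eigenkarma", assuming a PageRank-style formulation
# (damping factor + power iteration). Not taken from any specific proposal.

def eigenkarma(upvotes, damping=0.85, max_iter=1000, tol=1e-12):
    """Compute karma from a mapping voter -> {recipient: upvote count}.

    Returns a dict mapping every user to a score; the scores sum to 1.
    """
    users = sorted(set(upvotes) | {r for given in upvotes.values() for r in given})
    n = len(users)
    karma = {u: 1.0 / n for u in users}
    for _ in range(max_iter):
        # Every user starts each round with a small baseline share.
        new = {u: (1.0 - damping) / n for u in users}
        for voter in users:
            given = upvotes.get(voter, {})
            total = sum(given.values())
            if total == 0:
                # A user who never upvotes spreads their weight evenly.
                for u in users:
                    new[u] += damping * karma[voter] / n
            else:
                # Split the voter's karma across recipients by vote count.
                for recipient, count in given.items():
                    new[recipient] += damping * karma[voter] * count / total
        delta = sum(abs(new[u] - karma[u]) for u in users)
        karma = new
        if delta < tol:
            break
    return karma


upvotes = {
    "alice": {"bob": 2},
    "bob": {"carol": 1},
    "carol": {"bob": 1},
    "dave": {"alice": 1},
}
scores = eigenkarma(upvotes)
for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.4f}")
```

In this toy graph bob ranks first because both alice and carol upvote him, while dave, whom nobody upvotes, keeps only the baseline share. The damping term guarantees convergence, but the mutual-upvoting pair bob/carol still ends up with most of the karma, which illustrates why schemes like this need some defense against voting rings.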

Comment by tricky_labyrinth on Focus on the places where you feel shocked everyone's dropping the ball · 2023-02-05T21:25:30.919Z · LW · GW

https://twitter.com/carmenleelau/status/1593354133146402816 is another recent formulation of ~the same idea.

Comment by tricky_labyrinth on I hired 5 people to sit behind me and make me productive for a month · 2023-02-05T21:07:43.111Z · LW · GW

https://guzey.com/co-working/ seems to be ~that: a friend group that periodically checks in on each other.

Comment by tricky_labyrinth on It's time to worry about online privacy again · 2022-12-26T00:23:46.078Z · LW · GW

Probably supposed to be something like "If it's free [and not open source], you are the product."

Comment by tricky_labyrinth on Staring into the abyss as a core life skill · 2022-12-24T10:24:42.847Z · LW · GW

Reminds me of http://mindingourway.com/recklessness/ (and also your recent post on overconfidence).

Comment by tricky_labyrinth on What an actually pessimistic containment strategy looks like · 2022-12-24T09:58:24.019Z · LW · GW

> Not all political activism has to be waving flags around and chanting chants. Sometimes activists actually have goals and then accomplish something. I think we should try to learn from those people, as lowly as your opinion might be of them, if we don't seem to have many other options.

This does make me wonder whether activism by scientists has ever worked significantly. https://www.bismarckanalysis.com/Nuclear_Weapons_Development_Case_Study.pdf documents the Manhattan Project, and https://www.palladiummag.com/2021/03/16/leo-szilards-failed-quest-to-build-a-ruling-class/ argues that there was partial success.

Comment by tricky_labyrinth on What an actually pessimistic containment strategy looks like · 2022-12-24T09:43:41.355Z · LW · GW

> An institution could do A/B testing on interventions like these. It can talk to people more than once.

We can't take this for granted: when A tells B that B's views are inconsistent, the standard response (afaict) is for B to default to one direction (often heavily influenced by their status quo), adopt that direction as their consistent view, and then double down every time they're pressed.

It's possible that we have ~1 shot per person at convincing them.

Comment by tricky_labyrinth on Extreme Security · 2022-12-24T06:50:55.867Z · LW · GW

I've heard it go by the name security through obscurity (see https://en.wikipedia.org/wiki/Security_through_obscurity).

Comment by tricky_labyrinth on "Search" is dead. What is the new paradigm? · 2022-12-23T21:46:11.905Z · LW · GW

Related: it seems some search engines are already integrating LLMs:
- One approach is directly providing links; see https://metaphor.systems, brought up yesterday at https://www.lesswrong.com/posts/rZwy6CeYAWXgGcxgC/metaphor-systems
- Another is LLM summarization of search-engine-provided links; see https://you.com/search?q=what+was+the+recent+breakthrough+in+fusion+research%3F for an example

Comment by tricky_labyrinth on Metaphor.systems · 2022-12-21T23:05:41.804Z · LW · GW

Just for calibration, what are the other things you've tried? I've tried alternative search engines like https://millionshort.com, link aggregators/curators like Reddit/Slashdot/Hacker News/etc., and manually curated lists.

(I've been playing around with it for a bit and it seems quite good to me too)

Comment by tricky_labyrinth on Nightmare of the Perfectly Principled · 2022-12-21T22:46:11.414Z · LW · GW

> So their reported beliefs track a convenient consistent worldview, but they don’t use the vast majority of their practical knowledge and life experience, and can’t change their mind when it’s not socially convenient to do so.

I understand the reasoning behind the first half, but what's the reasoning for "and can’t change their mind when it’s not socially convenient to do so"? Specifically, is this saying they can't change their publicly reported beliefs (as opposed to their privately held ones) when it's not socially convenient?

Comment by tricky_labyrinth on K-types vs T-types — what priors do you have? · 2022-11-07T12:10:24.278Z · LW · GW

Fox-Hedgehog doesn't fit well imo. It's more something like RISC (K) vs CISC (T).

Comment by tricky_labyrinth on Does Google still hire people via their foobar challenge? · 2022-10-05T00:25:36.823Z · LW · GW

I got in via it in 2018; not sure about recently.

Comment by tricky_labyrinth on A method of writing content easily with little anxiety · 2022-04-08T23:20:11.883Z · LW · GW

To me, the difference between the colloquial term "brainstorming" and this site's term "babble and prune" is the intentional choice to split the activity into two phases: an unfiltered idea-generation phase followed by a filtering/editing phase. The emphasis is on "unfiltered", for the anxiety-reducing and writer's-block-circumventing reasons you gave.

I'd be grateful for an update down the line, if you come across any unexpected benefits/shortcomings.

Comment by tricky_labyrinth on Black ravens and red herrings · 2021-07-28T21:04:24.353Z · LW · GW

nit:

> and find no non-white ravens (but do find black ravens)

I think you meant "no non-black ravens" here.
