Posts

How to bet on AI, without helping AGI? 2024-11-29T22:46:03.109Z
Is principled mass-outreach possible, for AGI X-risk? 2024-01-21T17:45:32.951Z
Learning Math in Time for Alignment 2024-01-09T01:02:37.446Z
Upgrading the AI Safety Community 2023-12-16T15:34:26.600Z
Intelligence Enhancement (Monthly Thread) 13 Oct 2023 2023-10-13T17:28:37.490Z
How to Get Rationalist Feedback 2023-10-05T02:03:10.766Z
Musk, Starlink, and Crimea 2023-09-23T02:35:02.623Z
Incentives affecting alignment-researcher encouragement 2023-08-29T05:11:59.729Z
How necessary is intuition, for advanced math? 2023-07-20T00:18:43.634Z
Build knowledge base first, or backchain? 2023-07-17T03:44:55.127Z
Rationality, Pedagogy, and "Vibes": Quick Thoughts 2023-07-15T02:09:46.677Z
Alignment Megaprojects: You're Not Even Trying to Have Ideas 2023-07-12T23:39:54.392Z
My Central Alignment Priority (2 July 2023) 2023-07-03T01:46:01.764Z
My Alignment Timeline 2023-07-03T01:04:07.935Z
How to Search Multiple Websites Quickly 2023-06-22T00:42:41.598Z
Does anyone's full-time job include reading and understanding all the most-promising formal AI alignment work? 2023-06-16T02:24:31.048Z
Dreams of "Mathopedia" 2023-06-02T01:30:05.007Z
Abstraction is Bigger than Natural Abstraction 2023-05-31T00:00:36.373Z
My AI Alignment Research Agenda and Threat Model, right now (May 2023) 2023-05-28T03:23:38.353Z
Why and When Interpretability Work is Dangerous 2023-05-28T00:27:37.747Z
Why I'm Not (Yet) A Full-Time Technical Alignment Researcher 2023-05-25T01:26:49.378Z
[SEE NEW EDITS] No, *You* Need to Write Clearer 2023-04-29T05:04:01.559Z
How to parallelize "inherently" serial theory work? 2023-04-07T00:08:44.428Z
An Average Dialogue 2023-04-01T04:01:50.998Z
Stop Using Discord as an Archive 2023-03-30T02:15:34.580Z
An A.I. Safety Presentation at RIT 2023-03-27T23:49:59.657Z
NicholasKross's Shortform 2023-03-05T20:31:07.826Z
Who should write the definitive post on Ziz? 2022-12-15T06:37:16.150Z
Historical examples of people gaining unusual cognitive abilities? 2022-11-24T19:01:14.105Z
Discussion: Was SBF a naive utilitarian, or a sociopath? 2022-11-17T02:52:09.756Z
SBF x LoL 2022-11-15T20:24:52.041Z
I am a Memoryless System 2022-10-23T17:34:48.367Z
How to learn: Struggle VS Lookup-Table? 2022-09-25T21:58:43.032Z
The Power (and limits?) of Chunking 2022-09-06T02:26:27.808Z
Ways to increase working memory, and/or cope with low working memory? 2022-08-21T22:31:12.385Z
Wacky, risky, anti-inductive intelligence-enhancement methods? 2022-07-14T01:40:28.137Z
A Quick List of Some Problems in AI Alignment As A Field 2022-06-21T23:23:31.719Z
[retracted] A really simplistic experiment for LessWrong and /r/SneerClub 2022-05-21T05:52:28.796Z
Quick Thoughts on A.I. Governance 2022-04-30T14:49:26.694Z
Don't Look Up (Film Review) 2021-12-27T20:36:02.527Z
Why did computer science get so galaxy-brained? 2021-12-27T08:50:13.579Z
Reaction and Reply to Sasha Chapin on Bad In-group Norms 2021-11-19T01:13:32.946Z
There Meat Come A Scandal... 2021-11-07T20:52:12.025Z
No, really, can "dead" time be salvaged? 2021-10-26T00:02:11.540Z
Burst work or steady work? 2021-06-22T05:36:49.559Z
How to test tiny skills? 2021-04-24T05:02:45.347Z
Roundabout Strategy 2021-01-28T00:44:00.743Z
Ways to be more agenty? 2021-01-05T08:06:13.496Z
Examples of positive-sum(ish) games? 2020-10-10T21:09:04.349Z
Can you gain weirdness points? 2020-07-31T03:41:47.050Z

Comments

Comment by Nicholas / Heather Kross (NicholasKross) on Why and When Interpretability Work is Dangerous · 2024-11-30T02:18:40.776Z · LW · GW

Kinda, my current mainline-doom-case is "some AI gets controlled --> powerful people use it to prop themselves up --> world gets worse until AI gets uncontrollably bad --> doom". I would call it a different yet also-important doom case of "perpetual low-grade-AI dictatorship where the AI is controlled by humans in a surveillance state".

Comment by Nicholas / Heather Kross (NicholasKross) on An AI crash is our best bet for restricting AI · 2024-11-10T17:36:11.622Z · LW · GW

EDIT: Due to the incoming administration's ties to tech investors, I no longer think an AI crash is so likely. Several signs IMHO point to "they're gonna go all-in on racing for AI, regardless of how 'needed' it actually is".

Comment by Nicholas / Heather Kross (NicholasKross) on An AI crash is our best bet for restricting AI · 2024-10-18T17:44:56.811Z · LW · GW

For more details on (the business side of) a potential AI crash, see recent articles by the blog Where's Your Ed At, which wrote the sorta-well-known post "The Man Who Killed Google Search".

For his AI-crash posts, start here and here and click on links to his other posts. Sadly, the author falls into the trap of "LLMs will never get to reasoning because they don't, like, know stuff, man", but luckily his core competencies (the business side, analyzing reporting) show why an AI crash could still very much happen.

Comment by Nicholas / Heather Kross (NicholasKross) on AI #73: Openly Evil AI · 2024-07-18T22:53:02.264Z · LW · GW

Further context on the Scott Adams thing lol: He claims to have taken hypnosis lessons decades ago and has referred to using it multiple times. His, uh, personality also seems to me like it'd be more susceptible to hypnosis than average (and even he'd probably admit this in a roundabout way).

Comment by NicholasKross on [deleted post] 2024-05-11T18:05:00.796Z

Further observation about that second sentence.

Comment by NicholasKross on [deleted post] 2024-05-10T21:44:14.190Z

I think deeply understanding top tier capabilities researchers' views on how to achieve AGI is actually extremely valuable for thinking about alignment. Even if you disagree on object level views, understanding how very smart people come to their conclusions is very valuable.

I think the first sentence is true (especially for alignment strategy), but the second sentence seems sort of... broad-life-advice-ish, instead of a specific tip? It's a pretty indirect help to most kinds of alignment.

Otherwise, this comment's points really do seem like empirical things that people could put odds or ratios on. Wondering if a more-specific version of those "AI Views Snapshots" would be warranted, for these sorts of "research meta-knowledge" cruxes. Heck, it might be good to have lots of AI Views Snapshot DLC Mini-Charts, from for-specific-research-agendas(?) to internal-to-organizations(?!?!?!?).

Comment by Nicholas / Heather Kross (NicholasKross) on LessOnline (May 31—June 2, Berkeley, CA) · 2024-05-10T01:49:21.712Z · LW · GW

I can't make this one, but I'd love to be at future LessOnline events when I'm less time/budget-constrained! :)

Comment by NicholasKross on [deleted post] 2024-05-02T23:19:50.856Z

First link is broken.

Comment by NicholasKross on [deleted post] 2024-05-02T16:17:16.618Z

"But my ideas are likely to fail! Can I share failed ideas?": If you share a failed idea, that saves the other person time/effort they would've spent chasing that idea. This, of course, speeds up that person's progress, so don't even share failed ideas/experiments about AI, in the status quo.

"So where do I privately share such research?": Good question! There is currently no infrastructure for this. I suggest keeping your ideas/insights/research to yourself. If you think that's difficult for you to do, then I suggest not thinking about AI, and doing something else with your time, like getting into Factorio 2 or something.

"But I'm impatient about the infrastructure coming to exist!": Apply for a possibly-relevant grant and build it! Or build it in your spare time. Or be ready to help out if/when someone develops this infrastructure.

"But I have AI insights and I want to convert them into money/career-capital/personal-gain/status!": With that kind of brainpower/creativity, you can get any/all of those things pretty efficiently without publishing AI research, working at a lab, advancing a given SOTA, or doing basically (or literally) anything that differentially speeds up AI capabilities. This, of course, means "work on the object-level problem, without routing that work through AI capabilities", which is often as straightforward as "do it yourself".

"But I'm wasting my time if I don't get involved in something related to AGI!": "I want to try LSD, but it's only available in another country. I could spend my time traveling to that country, or looking for mushrooms, or even just staying sober. Therefore, I'm wasting my time unless I immediately inject 999999 fentanyl."

Comment by Nicholas / Heather Kross (NicholasKross) on LessOnline Festival Updates Thread · 2024-04-19T17:37:35.625Z · LW · GW

How scarce are tickets/"seats"?

Comment by Nicholas / Heather Kross (NicholasKross) on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-02T00:05:22.427Z · LW · GW

I will carefully hedge my investment in this company by giving it $325823e7589245728439572380945237894273489, in exchange for a board seat so I can keep an eye on it.

Comment by Nicholas / Heather Kross (NicholasKross) on [April Fools' Day] Introducing Open Asteroid Impact · 2024-04-02T00:03:33.319Z · LW · GW

I have over 5 Twitter followers, I'll take my board seat when ur ready

Comment by Nicholas / Heather Kross (NicholasKross) on Why I no longer identify as transhumanist · 2024-03-13T17:20:15.301Z · LW · GW

Giving up on transhumanism as a useful idea of what-to-aim-for or identify as, separate from how much you personally can contribute to it.

More directly: avoiding "pinning your hopes on AI" (which, depending on how I'm supposed to interpret this, could mean "avoiding solutions that ever lead to aligned AI occurring" or "avoiding near-term AI, period" or "believing that something other than AI is likely to be the most important near-future thing", which are pretty different from each other, even if the end prescription for you personally is (or seems, on first pass, to be) the same), separate from how much you personally can do to positively affect AI development.

Then again, I might've misread/misinterpreted what you wrote. (I'm unlikely to reply to further object-level explanation of this, sorry. I mainly wanted to point out the pattern. It'd be nice if your reasoning did turn out correct, but my point is that its starting-place seems/seemed to be rationalization as per the pattern.)

Comment by Nicholas / Heather Kross (NicholasKross) on Why I no longer identify as transhumanist · 2024-03-12T17:02:26.040Z · LW · GW

Yes, I think this post / your story behind it, is likely an example of this pattern.

Comment by Nicholas / Heather Kross (NicholasKross) on "How could I have thought that faster?" · 2024-03-11T22:12:35.962Z · LW · GW

That's technically a different update from the one I'm making. However, I also update in favor of that, as a propagation of the initial update. (Assuming you mean "good enough" as "good enough at pedagogy".)

Comment by Nicholas / Heather Kross (NicholasKross) on "How could I have thought that faster?" · 2024-03-11T17:14:48.952Z · LW · GW

This sure does update me towards "Yudkowsky still wasn't good enough at pedagogy to have made 'teach people rationality techniques' an 'adequately-covered thing by the community'".

Comment by Nicholas / Heather Kross (NicholasKross) on Why I no longer identify as transhumanist · 2024-03-10T21:08:28.390Z · LW · GW
  • Person tries to work on AI alignment.
  • Person fails due to various factors.
  • Person gives up working on AI alignment. (This is probably a good move, when it's not your fit, as is your case.)
  • Danger zone: In ways that sort-of-rationalize-around their existing decision to give up working on AI alignment, the person starts renovating their belief system around what feels helpful to their mental health. (I don't know if people are usually doing this after having already tried standard medical-type treatments, or instead of trying those treatments.)
  • Danger zone: Person announces this shift to others, in a way that's maybe and/or implicitly prescriptive (example).

There are, depressingly, many such cases of this pattern. (Related post with more details on this pattern.)

Comment by Nicholas / Heather Kross (NicholasKross) on Virtually Rational - VRChat Meetup · 2024-02-25T00:44:48.253Z · LW · GW

Group Debugging is intriguing...

Comment by Nicholas / Heather Kross (NicholasKross) on Dreams of AI alignment: The danger of suggestive names · 2024-02-12T18:28:00.065Z · LW · GW

How many times has someone expressed "I'm worried about 'goal-directed optimizers', but I'm not sure what exactly they are, so I'm going to work on deconfusion."? There's something weird about this sentiment, don't you think?

I disagree, and I will take you up on this!

"Optimization" is a real, meaningful thing to fear, because:

Comment by Nicholas / Heather Kross (NicholasKross) on Aligned AI is dual use technology · 2024-01-29T18:25:38.657Z · LW · GW

Ah, yeah that's right.

Comment by Nicholas / Heather Kross (NicholasKross) on Aligned AI is dual use technology · 2024-01-29T00:19:00.463Z · LW · GW

If it helps clarify: I (and some others) break down the alignment problem into "being able to steer it at all" and "what to steer it at". This post is about the danger of having the former solved, without the latter being solved well (e.g. through some kind of CEV).

Comment by Nicholas / Heather Kross (NicholasKross) on Global LessWrong/AC10 Meetup on VRChat · 2024-01-25T00:09:22.974Z · LW · GW

Love this event series! Can't come this week, but next one I can!

Comment by Nicholas / Heather Kross (NicholasKross) on Is principled mass-outreach possible, for AGI X-risk? · 2024-01-22T00:08:31.629Z · LW · GW

No worries! I make similar mistakes all the time (just check my comment history ;-;)

And I do think your comment is useful, in the same way that Rohin's original comment (which my post is responding to) is useful :)

Comment by Nicholas / Heather Kross (NicholasKross) on Is principled mass-outreach possible, for AGI X-risk? · 2024-01-21T23:52:21.004Z · LW · GW

FWIW, I have an underlying intuition here that's something like “if you're going to go Dark Arts, then go big or go home”, but I don't really know how to operationalize that in detail and am generally confused and sad. In general, I think people who have things like “logical connectives are relevant to the content of the text” threaded through enough of their mindset tend to fall into a trap analogous to the “Average Familiarity” xkcd or to Hofstadter's Law when they try truly-mass communication unless they're willing to wrench things around in what are often very painful ways to them, and (per the analogies) that this happens even when they're specifically trying to correct for it.

I disagree with the first sentence, but agree strongly with the rest of it. My whole point is that it may be literally possible to make:

  1. mass-audience arguments
  2. about extinction risk from AI
  3. that don't involve lying.

Maybe we mean different things by "Dark Arts" here? I don't actually consider (going hard with messaging like) "This issue is complicated, but you [the audience member] understandably don't want to deal with it, so we should go harder on preventing risk for now based on the everyday risk-avoidance you probably practice yourself." as lying or manipulation. You could call it Dark Arts if you drew the "Dark Arts" cluster really wide, but I would disagree with that cluster-drawing.

Comment by Nicholas / Heather Kross (NicholasKross) on Is principled mass-outreach possible, for AGI X-risk? · 2024-01-21T23:47:58.466Z · LW · GW

Now, I do separately observe a subset of more normie-feeling/working-class people who don't loudly profess the above lines and are willing to e.g. openly use some generative-model art here and there in a way that suggests they don't have the same loud emotions about the current AI-technology explosion. I'm not as sure what main challenges we would run into with that crowd, and maybe that's whom you mean to target.

That's... basically what my proposal is? Yeah? People that aren't already terminally-online about AI, but may still use ChatGPT and/or Stable Diffusion for fun or even work. Or (more common) those who don't even have that much interaction, who just see AI as yet another random thingy in the headlines.

Comment by Nicholas / Heather Kross (NicholasKross) on Threat-Resistant Bargaining Megapost: Introducing the ROSE Value · 2024-01-21T19:25:55.230Z · LW · GW

Yeah, mostly agreed. My main subquestion (that led me to write the review, besides this post being referenced in Leake's work) was/sort-of-still-is "Where do the ratios in value-handshakes come from?". The default (at least in the tag description quote from SSC) is uncertainty in war-winning, but that seems neither fully-principled nor nice-things-giving (small power differences can still lead to huge win-% chances, and superintelligences would presumably be interested in increasing accuracy). And I thought maybe ROSE bargaining could be related to that.

The relation in my mind was less ROSE --> DT, and more ROSE --?--> value-handshakes --> value-changes --?--> DT.
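
The parenthetical above (small power differences leading to huge win-% chances) can be illustrated with a toy model. This is only a sketch under stated assumptions: the comment does not name any model of conflict, so the Tullock-style contest success function, the `decisiveness` parameter, and the 10% power gap below are all illustrative choices, not anything from the original discussion.

```python
def win_probability(power_a: float, power_b: float, decisiveness: float) -> float:
    """Tullock-style contest success function (an assumed toy model).

    P(A wins) = a^r / (a^r + b^r), where r is the decisiveness of the conflict.
    Higher r means small power advantages translate into larger win chances.
    """
    a = power_a ** decisiveness
    b = power_b ** decisiveness
    return a / (a + b)


# Hypothetical scenario: side A is only 10% stronger than side B.
for r in (1, 5, 20, 50):
    p = win_probability(1.1, 1.0, r)
    print(f"decisiveness={r:>2}: P(stronger side wins) = {p:.3f}")
```

Under this assumed model, the handshake ratio would depend heavily on how decisive the conflict is, which is one way to see why "uncertainty in war-winning" alone might not pin down a principled ratio.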

Comment by Nicholas / Heather Kross (NicholasKross) on Ways to buy time · 2024-01-21T17:31:59.053Z · LW · GW

(On my beliefs, which I acknowledge not everyone shares, expecting something better than "mass delusion of incorrect beliefs that implies that AGI is risky" if you do wide-scale outreach now is assuming your way out of reality.)

I'm from the future, January 2024, and you get some Bayes Points for this!

The "educated savvy left-leaning online person" consensus (as far as I can gather) is something like: "AI art is bad, the real danger is capitalism, and the extinction danger is some kind of fake regulatory-capture hype techbro thing which (if we even bother to look at the LW/EA spaces at all) is adjacent to racists and cryptobros".

Still seems too early to tell whether or not people are getting lots of false beliefs that are still pushing them towards believing-AGI-is-an-X-risk, especially since that case seems to be made (in the largest platform) indirectly in congressional hearings that nobody outside tech/politics actually watches.

But it really doesn't seem great that my case for wide-scale outreach being good is "maybe if we create a mass delusion of incorrect beliefs that implies that AGI is risky, then we'll slow down, and the extra years of time will help". So overall my guess is that this is net negative.

To devil's steelman some of this: I think there's still an angle that few have tried in a really public way: namely, ignorance and asymmetry. (There is definitely a better term or two for what I'm about to describe, but I forgot it. Probably from Taleb or something.)

A high percentage of voting-eligible people in the US... don't vote. An even higher percentage vote in only the presidential elections, or only some presidential elections. I'd bet a lot of money that most of these people aren't working under a Caplan-style non-voting logic, but instead under something like "I'm too busy" or "it doesn't matter to me / either way / from just my vote".

Many of these people, being politically disengaged, would not be well-informed about political issues (or even have strong and/or coherent values related to those issues). What I want to see is an empirical study that asks these people "are you aware of this?" and "does that awareness, in turn, factor into you not-voting?".

I think there's a world, which we might live in, where lots of non-voters believe something akin to "Why should I vote, if I'm clueless about it? Let the others handle this lmao, just like how the nice smart people somewhere make my bills come in."

In a relevant sense, I think there's an epistemically-legitimate and persuasive way to communicate "AGI labs are trying to build something smarter than humans, and you don't have to be an expert (or have much of a gears-level view of what's going on) to think this is scary. If our smartest experts still disagree on this, and the mistake-asymmetry is 'unnecessary slowdown VS human extinction', then it's perfectly fine to say 'shut it down until [someone/some group] figures out what's going on'".

To be clear, there's still a ton of ways to get this wrong, and those who think otherwise are deluding themselves out of reality. I'm claiming that real-human-doable advocacy can get this right, and it's been mostly left untried.

EXTRA RISK NOTE: Most persuasion, including digital, is one-to-many "broadcast"-style; "going viral" usually just means "some broadcast happened that nobody heard of", like an algorithm suggesting a video to a lot of people at once. Given this, plus anchoring bias, you should expect and be very paranoid about the "first thing people hear = sets the conversation" thing. (Think of how many people's opinions are copypasted from the first classy video essay mass-market John Oliver video they saw about the subject, or the first Fox News commentary on it.)

Not only does the case for X-risk need to be made first, but it needs to be right (even in a restricted way like my above suggestion) the first time. Actually, that's another reason why my restricted-version suggestion should be prioritized, since it's more-explicitly robust to small issues.

(If somebody does this in real life, you need to clearly end on something like "Even if a minor detail like [name a specific X] or [name a specific Y] is wrong, it doesn't change the underlying danger, because the labs are still working towards Earth's next intelligent species, and there's nothing remotely strong about the 'safety' currently in place.")

Comment by Nicholas / Heather Kross (NicholasKross) on Threat-Resistant Bargaining Megapost: Introducing the ROSE Value · 2024-01-21T16:58:10.709Z · LW · GW

So there's a sorta-crux about how much DT alignment researchers would have to encode into the-AI-we-want-to-be-aligned, before that AI is turned on. Right now I'm leaning towards "an AI that implements CEV well, would either turn-out-to-have or quickly-develop good DT on its own", but I can see it going either way. (This was especially true yesterday when I wrote this review.)

And I was trying to think through some of the "DT relevance to alignment" question, and I looked at relevant posts by [Tamsin Leake](https://www.lesswrong.com/users/tamsin-leake) (whose alignment research/thoughts I generally agree with). And that led me to thinking more about value-handshakes, son-of-CDT (see Arbital), and systems like ROSE bargaining. Any/all of which, under certain assumptions, could determine (or hint at) answers to the "DT relevance" thing.

Comment by Nicholas / Heather Kross (NicholasKross) on There is way too much serendipity · 2024-01-21T05:01:03.624Z · LW · GW

Selection Bias Rules (Debatably Literally) Everything Around Us

Comment by Nicholas / Heather Kross (NicholasKross) on Even if we lose, we win · 2024-01-21T04:37:26.265Z · LW · GW

Currently, I think this is a big crux in how to "do alignment research at all". Debatably "the biggest" or even "the only real" crux.

(As you can tell, I'm still uncertain about it.)

Comment by Nicholas / Heather Kross (NicholasKross) on Threat-Resistant Bargaining Megapost: Introducing the ROSE Value · 2024-01-21T03:30:30.244Z · LW · GW

Decision theory is hard. In trying to figure out why DT is useful (needed?) for AI alignment in the first place, I keep running into weirdness, including with bargaining.

Without getting too in-the-weeds: I'm pretty damn glad that some people out there are working on DT and bargaining.

Comment by Nicholas / Heather Kross (NicholasKross) on Bad at Arithmetic, Promising at Math · 2024-01-18T02:26:41.131Z · LW · GW

Still seems too early to tell if this is right, but man is it a crux (explicit or implicit).

Terence Tao seems to have gotten some use out of the most recent LLMs.

Comment by Nicholas / Heather Kross (NicholasKross) on The LessWrong 2022 Review · 2024-01-17T23:24:02.865Z · LW · GW

if you take into account the 4-5 staff months these cost to make each year, we net lost money on these

For the record, if each book-set had cost $40 or even $50, I still would have bought them, right on release, every time. (This was before my financial situation improved, and before the present USD inflation.)

I can't speak for everyone's financial situation, though. But I (personally) mentally categorize these as "community-endorsement luxury-type goods", since all the posts are already online anyway.

The rationality community is unusually good about not selling ingroup-merch when it doesn't need or want to. These book sets are the perfect exceptions.

Comment by NicholasKross on [deleted post] 2024-01-13T06:15:37.007Z

to quote a fiend, "your mind is a labyrinth of anti-cognitive-bias safeguards, huh?"

[emphasis added]

The implied context/story this is from sure sounds interesting. Mind telling it?

Comment by Nicholas / Heather Kross (NicholasKross) on "Dark Constitution" for constraining some superintelligences · 2024-01-11T00:38:39.308Z · LW · GW

I don't think of governments as being... among other things "unified" enough to be superintelligences.

Also, see "Things That Are Not Superintelligence" and "Maybe The Real Superintelligent AI Is Extremely Smart Computers".

Comment by Nicholas / Heather Kross (NicholasKross) on I Would Have Solved Alignment, But I Was Worried That Would Advance Timelines · 2024-01-09T19:32:19.548Z · LW · GW

The alignment research that is done will be lower quality due to less access to compute, capability knowhow, and cutting edge AI systems.

I think this is false, though it's a crux in any case.

Capabilities withdrawal is good because we don't need big models to do the best alignment work, because that is theoretical work! Theoretical breakthroughs can make empirical research more efficient. It's OK to stop doing capabilities-promoting empirical alignment, and focus on theory for a while.

(The overall idea of "if all alignment-knowledgeable capabilities people withdraw, then all capabilities will be done by people who don't know/care about alignment" is still debatable, but distinct. One possible solution: safety-promoting AGI labs stop their capabilities work, but continue to hire capabilities people, partly to prevent them from working elsewhere. This is complicated, but not central to my objection above.)

I see this asymmetry a lot and may write a post on it:

If theoretical researchers are wrong, but you do follow their caution anyway, then empirical alignment goes slower... and capabilities research slows down even more. If theoretical researchers are right, but you don't follow their caution, you continue or speed up AI capabilities to do less-useful alignment work.

Comment by Nicholas / Heather Kross (NicholasKross) on Learning Math in Time for Alignment · 2024-01-09T05:46:11.125Z · LW · GW

Good catch! Most of it is hunches to be tested (and/or theorized on, but really tested) currently. Fixed

Comment by NicholasKross on [deleted post] 2024-01-09T00:19:34.499Z

"Exfohazard" is a quicker way to say "information that should not be leaked". AI capabilities have progressed on seemingly-trivial breakthroughs, and now we have shorter timelines.

The more people who know and understand the "exfohazard" concept, the safer we are from AI risk.

Comment by Nicholas / Heather Kross (NicholasKross) on Optimality is the tiger, and agents are its teeth · 2024-01-09T00:11:22.813Z · LW · GW

More framings help the clarity of the discussion. If someone doesn't understand (or agree with) classic AI-takeover scenarios, this is one of the posts I'd use to explain them.

Comment by Nicholas / Heather Kross (NicholasKross) on Write a Thousand Roads to Rome · 2024-01-08T18:17:49.573Z · LW · GW

Funny thing, I had a similar idea to this (after reading some Sequences and a bit about pedagogy). That was the sort-of-multi-modal-based intuition behind Mathopedia.

Comment by Nicholas / Heather Kross (NicholasKross) on NicholasKross's Shortform · 2024-01-05T03:00:27.504Z · LW · GW

Is any EA group *funding* adult human intelligence augmentation? It seems broadly useful for lots of cause areas, especially research-bottlenecked ones like AI alignment.

Why hasn't e.g. OpenPhil funded this project?: https://www.lesswrong.com/posts/JEhW3HDMKzekDShva/significantly-enhancing-adult-intelligence-with-gene-editing

Comment by Nicholas / Heather Kross (NicholasKross) on Does LessWrong make a difference when it comes to AI alignment? · 2024-01-04T19:24:17.313Z · LW · GW

Seems to usually be good faith. People can still be biased of course (and they can't all be right on the same questions, with the current disagreements), but it really is down to differing intuitions, which background-knowledge posts have been read by which people, etc.

Comment by Nicholas / Heather Kross (NicholasKross) on Does LessWrong make a difference when it comes to AI alignment? · 2024-01-04T02:57:56.828Z · LW · GW

To add onto other people's answers:

People have disagreements over what the key ideas about AI/alignment even are.

People with different basic-intuitions notoriously remain unconvinced by each other's arguments, analogies, and even (the significance of) experiments. This has not been solved yet.

Alignment researchers usually spend most time on their preferred vein of research, rather than trying to convince others.

To (try to) fix this, the community's added concepts like "inferential distance" and "cruxes" to our vocabulary. These should be discussed and used explicitly.

One researcher has some shortform notes (here and here) on how hard it is to communicate about AI alignment. I myself wrote some longer, more emotionally-charged notes on why we'd expect this.

But there's hope yet! This chart format makes it easier to communicate beliefs on key AI questions. And better ideas can always be lurking around the corner...

Comment by NicholasKross on [deleted post] 2024-01-04T02:42:23.804Z

I relate to this quite a bit ;-;

Comment by NicholasKross on [deleted post] 2024-01-04T02:33:21.307Z

People's minds are actually extremely large things that you fundamentally can't fully model

Is this "fundamentally" as in "because you, the reader, are also a bounded human, like them"? Or "fundamentally" as in (something more fundamental than that)?

Comment by NicholasKross on [deleted post] 2024-01-04T02:32:29.093Z

If timelines weren't so short, brain-computer-based telepathy would unironically be a big help for alignment.

(If a group had the money/talent to "hedge" on longer timelines by allocating some resources to that... well, instead of a hivemind, they first need to run through the relatively-lower-hanging fruit. Actually, maybe they should work on delaying capabilities research, or funding more hardcore alignment themselves, or...)

Comment by NicholasKross on [deleted post] 2023-12-18T02:27:46.757Z

This point could definitely be its own post. I'd love to see you write this! (I'd of course be willing to proofread/edit it, title it, etc.)

Comment by Nicholas / Heather Kross (NicholasKross) on "Humanity vs. AGI" Will Never Look Like "Humanity vs. AGI" to Humanity · 2023-12-16T20:13:30.144Z · LW · GW

And the AGI, if it's worth the name, would not fail to exploit this.

This sentence is a good short summary of some AI alignment ideas. Good writing!

Comment by NicholasKross on [deleted post] 2023-12-16T20:10:23.157Z

Someone may think "Anomalous worlds imply the simulation-runners will save us from failing at alignment!"

My reply is: Why are they running a simulation where we have to solve alignment?

At a first pass, if we're in a simulation, it's probably for research, rather than e.g. a video game or utopia. (H/t an IRL friend for pointing this out.)

Therefore, if we observe ourselves needing to solve AI alignment (and not having solved it yet), the simulation-runners potentially also need AI alignment to get solved. And if history is any guide, we should not rely on any such beings "saving us" before things cross a given threshold of badness.

(There are other caveats I can respond to about this, but please DM me about them if you think of them, since they may be infohazard-leaning and (thus) should not be commented publicly, pending more understanding.)

Comment by Nicholas / Heather Kross (NicholasKross) on Current AIs Provide Nearly No Data Relevant to AGI Alignment · 2023-12-16T00:32:57.484Z · LW · GW

But you wouldn't study ... MNIST-classifier CNNs circa 2010s, and claim that your findings generalize to how LLMs circa 2020s work.

This particular bit seems wrong; CNNs and LLMs are both built on neural networks. If the findings don't generalize, that could be called a "failure of theory", not an impossibility thereof. (Then again, maybe humans don't have good setups for going 20 steps ahead of data when building theory, so...)

(To clarify, this post is good and needed, so thank you for writing it.)