Learning By Writing

post by HoldenKarnofsky · 2022-02-22T15:50:19.452Z · LW · GW · 25 comments

I have very detailed opinions on lots of topics. I sometimes get asked how I do this, which might just be people making fun of me, but I choose to interpret it as a real question, and I’m going to sketch an answer here.

You can think of this as a sort of sequel to Minimal-Trust Investigations. That piece talked about how investigating things in depth can be valuable; this piece will try to give a sense of how to get an in-depth investigation off the ground, going from “I’ve never heard of this topic before” to “Let me tell you all my thoughts on that.”

The basic idea is that I organize my learning around writing rather than reading. This doesn’t mean I don’t read - just that the reading is always in service of the writing.

Here’s an outline:

Step 1	Pick a topic
Step 2	Read and/or discuss with others (a bit)
Step 3	Explain and defend my current, incredibly premature hypothesis, in writing (or conversation)
Step 4	Find and list weaknesses in my case
Step 5	Pick a subquestion and do more reading/discussing
Step 6	Revise my claim / switch sides
Step 7	Repeat steps 3-6 a bunch
Step 8	Get feedback on a draft from others, and use this to keep repeating steps 3-6

The “traditionally” hard parts of this process are steps 4 and 6: spotting weaknesses in arguments, trying to resist the temptation to “stick to my guns” when my original hypothesis isn’t looking so good, etc.

But step 3 is a different kind of challenge: trying to “always have a hypothesis” and re-articulating it whenever it changes. By doing this, I try to continually focus my reading on the goal of forming a bottom-line view, rather than just “gathering information.” I think this makes my investigations more focused and directed, and the results easier to retain. I consider this approach to be probably the single biggest difference-maker between "reading a ton about lots of things, but retaining little" and "efficiently developing a set of views on key topics and retaining the reasoning behind them."

Below I'll give more detail on each step, then some brief notes (to be expanded on later) on why this process is challenging.

My process for learning by writing

Step 1: pick a topic. First, I decide what I want to form an opinion about. My basic approach here is: “Find claims that are important if true, and might be true.”

This doesn’t take creativity. We live in an ocean of takes, pundits, advocates, etc. I usually cheat by paying special attention to claims by people who seem particularly smart, interesting, unconventionally minded (not repeating the same stuff I hear everywhere), and interested in the things I’m interested in (such as the long-run future of humanity).

But I also tend to be at least curious about any claim that is both “important if true” and “not obviously wrong according to some concrete reason I can voice,” even if it’s coming from a very random source (YouTube commenter, whatever).

For a concrete example throughout this piece, I’ll use this hypothesis, which I examined pretty recently: “Human history is a story of life getting gradually, consistently better.”

(Other, more complicated examples are the Collapsing Civilizational Competence Hypothesis; the Most Important Century hypothesis; and my attempt to summarize history in one table.)

Step 2: read and/or discuss (a bit). I usually start by trying to read the most prominent 1-3 pieces that (a) defend the claim or (b) attack the claim or (c) set out to comprehensively review the evidence on both sides. I try to understand the major reasons they’re giving for the side they come down on. I also chat about the topic with people who know more about it than I do, and who aren’t too high-stakes to chat with.

In the example I’m using, I read the relevant parts of The Better Angels of Our Nature and Enlightenment Now (focusing on claims about life getting better, and skipping discussion of “why”). I then looked for critiques of the books that specifically responded to the claims about life having gotten better (again putting aside the “why”). This led mostly to claims about the peacefulness of hunter-gatherers.

Step 3: explain and defend my current, incredibly premature hypothesis, in writing (or conversation). This is where my approach gets unusual - I form a hypothesis about whether the claim is true, LONG before I’m “qualified to have an opinion.” The process looks less like “Read and digest everything out there on the topic” and more like “Read the 1-3 most prominent pieces on each side, then go.”

I don’t have an easy time explaining “how” I generate a hypothesis while knowing so little - it feels like I just always have a “guess” at the answer to some topic, whether or not I even want to (though it often takes me a lot of effort to articulate the guess in words). The main thing I have to say about the “how” is that it just doesn’t matter: at this stage the hypothesis is more about setting up further questions to investigate than about really trying to be right, so it seems sufficient to “just start rambling onto the page, and make any corrections/edits that my current state of knowledge already forces.”

For this example, I noted down something along the lines of: “Life has gotten better throughout history. The best data on this comes from the last few hundred years, because before that we just didn’t keep many records. Sometimes people try to claim that the longest-ago, murkiest times were better, such as hunter-gatherer times, but there’s no evidence for this - in fact, empirical evidence shows that hunter-gatherers were very violent - and we should assume that these early times fit on the same general trendline, which would mean they were quite bad. (Also, if you go even further back than hunter-gatherers, you get to apes, whose lives seem really horrible, so that seems to fit the trend as well.¹)”

It took real effort to disentangle the thoughts in my head to the point where I could write that, but I tried to focus on keeping things simple and not trying to get it perfect.

At this stage, this is not a nuanced, caveated, detailed or well-researched take. Instead, my approach is more like: “Try to state what I think in a pretty strong, bold manner; defend it aggressively; list all of the best counterarguments, and shoot them down.” This generally fails almost immediately.

Step 4: find and list weaknesses in my case. My next step is to play devil’s advocate against myself, such as by:

- Looking for people arguing things that contradict my working hypothesis, and looking for their strongest points.
- Noting claims I’ve made with this property: “I haven’t really made an attempt to look comprehensively at the arguments on both sides of this, and if I did I might change my mind.”

(This summary obscures an ocean of variation. Having more existing knowledge about a general area, and more experience with investigations in general, can make someone much better at noticing things like this.)

In the example, my “devil’s advocate” points included:

- I’m getting all of my “life has gotten better” charts from books that are potentially biased. I should do something to see whether there are other charts, excluded from those books, that tell the opposite story.
- From my brief skim, the “hunter-gatherers were violent” claim looks right, and the critiques seem very hand-wavy and non-data-based. But I should probably read them more carefully and pull out their strongest arguments.
- Even if hunter-gatherers were violent, what about other aspects of their lives? Wikipedia seems to have a pretty rosy picture …

In theory, I could swap Step 4 (listing things I’d like to look into more) with Step 3 (writing what I think). That is, I could try to review both sides of every point comprehensively before forming my own view, which means a lot more reading before I start writing.

I think many people try to do this, but in my experience at least, it’s not the best way to go.

- Debates tend to be many-dimensional: for example, “Has life gotten better?” quickly breaks down into “Has quality-of-life metric X gotten better over period Y?” for a whole bunch of different X-Y pairs (plus other questions²).
- So if my goal were “Understand both sides of every possible sub-debate,” I could be reading forever - for example, I might get embroiled in the debates and nuances around each different claim made about life getting better over the last few hundred years.
- By writing early, I get a chance to make sure I’ve written down the version of the claim I care most about, and make sure that any further investigation is focused on the things that matter most for changing my mind on this claim.
  - Once I wrote down “There are a huge number of charts showing that life has gotten better over the last few hundred years,” I could see that deep-diving any particular one of those charts wouldn’t be the best use of time - compared to addressing the very weakest points in the claim I had written, by going back further in time to hunter-gatherer periods, or looking for entirely different collections of charts.

Step 5: pick a subquestion and do more reading and/or discussing. One of the most important factors that determines whether these investigations go well (in the sense of teaching me a lot relatively quickly) is deciding which subquestions to “dig into” and which not to. As just noted, writing the hypothesis down early is key.

I try to stay very focused on doing the reading (and/or low-stakes discussion) most likely to change the big-picture claim I’m making. I rarely read a book or paper “once from start to finish”; instead I energetically skip around trying to find the parts most likely to give me a solid reason to change my mind, read them carefully and often multiple times, try to figure out what else I should be reading (whether this is “other parts of the same document” or “academic papers on topic X”) to contextualize them, etc.

Step 6: Revise my claim / switch sides. This is one of the trickiest parts - pausing Step 5 as soon as I have a modified (often still simplified, under-researched and wrong) hypothesis. It’s hard to notice when my hypothesis changes, and hard to stay open to radical changes of direction (and I make no claim that I’m as good at it as I could be).

I often try radically flipping around my hypothesis, even if I haven’t actually been convinced that it’s wrong - sometimes when I’m feeling iffy about arguing for one side, it’s productive to just go ahead and try arguing for the other side. I tend to get further by noticing how I feel about the "best arguments for both sides" than by trying from the start to be even-handed.

In the example, I pretty quickly decided to try flipping my view around completely, and noted something like: “A lot of people assume life has gotten better over time, but that’s just the last few hundred years. In fact, our best guess is that hunter-gatherers were getting some really important things right, such as gender relations and mental health, that we still haven’t caught up to after centuries of progress. Agriculture killed that, and we’ve been slowly climbing out of a hole ever since. There should be tons more research on what hunter-gatherer societies are/were like, and whether we can replicate their key properties at scale today - this is a lot more promising than just continuing to push forward science and technology and modernity.”

This completely contradicted my initial hypothesis. (I now think both are wrong.)

This sent me down a new line of research: constructing the best argument I could that life was better in hunter-gatherer times.

Step 7: repeat steps 3-6 a bunch. I tried to gather the best evidence for hunter-gatherer life being good, and for it being bad, and zeroed in on gender relations and violence as particularly interesting, confusing debates; on both of these, I changed my hypothesis/headline several times.

My hypotheses became increasingly complex and detailed, as you can see from the final products: Pre-agriculture gender relations seem bad (which argues that gender relations for hunter-gatherers were/are far from Wikipedia’s rosy picture, according to the best available evidence, though the evidence is far from conclusive, and it’s especially unclear how pre-agriculture gender relations compare to today’s) and Unraveling the evidence about violence among very early humans (which argues that hunter-gatherer violence was indeed high, but that - contra Better Angels - it probably got even worse after the development of agriculture, before declining at some pretty unknown point before today).

I went through several cycles of “I think I know what I really think and I’m ready to write,” followed by “No, having started writing, I’m unsatisfied with my answer on this point and think a bit more investigation could change it.” So I kept alternating between writing and reading, but was always reading with the aim of getting back to writing.

I finally produced some full, opinionated drafts that seemed to me to be about the best I could do without a ton more work.

After I had satisfied myself on these points, I popped back up from the “hunter-gatherer” question to the original question of whether life has gotten better over time. I followed a similar process for investigating other subquestions, like “Is the set of charts I’ve found representative for the last few hundred years?” and “What about the period in between hunter-gatherer times and the last few hundred years?”

Step 8: add feedback from others into the loop. It takes me a long time to get to the point where I can no longer easily tear apart my own hypothesis. Once I do, I start seeking feedback from others - first just people I know who are likely to be helpful and interested in the topic, then experts and the public. This works the same basic way as Steps 4-7, but with others doing a lot of the “noticing weaknesses” part (Step 4).

When I publish, I am thinking of it more like “I can’t easily find more problems with this, so it’s time to see whether others can” than like “This is great and definitely right.”

I hope I haven’t made this sound fun or easy

Some things about this process that are hard, taxing, exhausting and a bit of a mental health gauntlet:

- I constantly have a feeling (after reading) like I know what I think and how to say it, then I start writing and immediately notice that I don’t at all. I need to take a lot of breaks and try a lot of times to even “write what I currently think,” even when it’s pretty simple and early.
- Every subquestion is something I could spend a lifetime learning about, if I chose to. I need to constantly interrupt myself and ask, “Is this a key point? Is this worth learning more about?” or else I’ll never finish.
- There are infinite tough judgment calls about things like “whether to look into some important-seeming point, or just reframe my hypothesis such that I don’t need to.” Sometimes the latter is the answer (it feels like some debate is important, but if I really think about it, I realize the thing I most care about can be argued for without getting to the bottom of it); sometimes the former is (it feels like I can try to get around some debate, but actually, I can’t really come to a reasonable conclusion without an exhausting deep dive).
- At any given point, I know that if I were just better at things like “noticing which points are really crucial” and “reformulating my hypothesis so that it’s easier to defend while still important,” I could probably do something twice as good in half the time … and I often realize after a massive deep dive that most of the time I spent wasn’t necessary.
- Because of these points, I have very little ability to predict when a project will be done; I am never confident that I’m doing it as well as I could; and I’m constantly interrupting myself to reflect on these things rather than getting into a flow.
- Half the time, all of this work just ends up with me agreeing with conventional wisdom or “the experts” anyway … so I’ve just poured in work and gone through a million iterations of changing my mind, and any random person I talk to about it will just be like “So you decided X? Yeah X is just what I had already assumed.”
- The whole experience is a mix of writing, Googling, reading, skimming, and pressuring myself to be more efficient, which is very different and much more unpleasant compared to the experience of just reading. (Among other things, I can read in a nice location and be looking at a book or e-ink instead of a screen. Most of the work of an “investigation” is in front of a glowing screen and requires an Internet connection.)

I’ll write more about these challenges in a future post. I definitely recommend reading as a superior leisure activity, but for me at least, writing-centric work seems better for learning.

I’m really interested in comments from anyone who tries this sort of thing out and has things to share about how it goes!

Footnotes

1. I never ended up using this argument about apes. I think it’s probably mostly right, but there’s a whole can of worms with claims about loving, peaceful bonobos that I never quite got motivated to get to the bottom of.
2. Such as which metrics are most important.

25 comments

Comments sorted by top scores.

comment by Ruby · 2022-02-24T22:03:56.386Z · LW(p) · GW(p)

Curated. I'm partial to any post that provides guidance on how to research and write. In my mind, this post joins a treasured collection including other gems such as How To Write Quickly While Maintaining Epistemic Rigor [LW · GW]; Scholarship: How to Do It Efficiently [LW · GW]; Fact Posts: How and Why [LW · GW]; and Literature Review For Academic Outsiders: What, How, and Why [LW · GW].

Replies from: ishan-mukherjee, halinaeth

↑ comment by Ishan Mukherjee (ishan-mukherjee) · 2022-04-16T08:55:33.536Z · LW(p) · GW(p)

Thanks for this list! This post also reminded me of Robin Hanson's post Chase Your Reading.

↑ comment by halinaeth · 2024-10-09T09:09:21.464Z · LW(p) · GW(p)

Thanks for linking these! Found my next reading list :)

comment by Pablo Repetto (pablo-repetto-1) · 2022-02-23T08:19:10.621Z · LW(p) · GW(p)

Brief notes:

Unlike Umberto Eco's "How to Write a Thesis", and like johnswentworth's "How To Write Quickly While Maintaining Epistemic Rigor" [LW · GW], this does not propose you sacrifice the importance of questions on the altar of rigor.
There is a pretty big hole on how to find "the 1-3 most prominent pieces on each side". When first encountering a claim in the context of an academic debate, this is usually not an issue. Often, not so much. Obviously, if you simply came up with the claim you have no conveniently laid out pointer to the academic literature.
"Just write something", or the vomit pass, is an invaluable technique. I have been doing this more and more over the last two years. Having full license to be wrong is incredibly freeing, and it makes it much easier to iterate on an idea. You just read what you wrote and pay attention to the parts that make you cringe.

Replies from: HoldenKarnofsky

↑ comment by HoldenKarnofsky · 2022-03-31T22:58:38.865Z · LW(p) · GW(p)

I broadly agree. I think it is sometimes challenging to find the major pieces on an issue, though rarely super challenging (usually, if I just read the first few pieces I find and search citations backwards and forwards, at some point I find myself running into the same pieces over and over).

comment by jungofthewon · 2022-02-25T00:03:11.925Z · LW(p) · GW(p)

I enjoyed reading this, thanks for taking the time to organize your thoughts and convey them so clearly! I'm excited to think a bit about how we might imbue a process like this into Elicit.

This also seems like the research version of being hypothesis-driven / actionable / decision-relevant at work.

comment by Ben Pace (Benito) · 2024-01-17T20:26:48.187Z · LW(p) · GW(p)

Holden's posts on writing and research (this one, The Wicked Problem Experience [LW · GW] and Useful Vices for Wicked Problems [LW · GW]) have been the most useful posts for me from Cold Takes and been directly applicable to things I've worked on. For instance the Wicked Problem Experience was published soon after I wrote a post about the discovery of laws of nature [LW · GW], and resonated with and validated some of how I approached and worked on that. I give all three +4.

comment by Raemon · 2024-01-15T02:01:33.762Z · LW(p) · GW(p)

I've had a vague intent to deliberately apply this technique since first reading this two years ago. I haven't actually done so, alas.

It still looks pretty good to me on paper, and feels like something I should attempt at some point.

comment by Raemon · 2022-02-23T22:28:13.584Z · LW(p) · GW(p)

I'm currently exploring modes of "focusing more on learning and/or thinking" and I found this post to be a useful set of hypotheses about things to try.

I'm curious about "forming an opinion about an existing thing in the world" vs "contributing thoughts to unsolved problems."

I'm currently looking at the landscape of interpretability, and feeling a vague sense that it's not pointed in the right direction, but I'm lacking a lot of context on how modern ML interpretability works exactly and what's been tried. I'm not sure if this is the sort of thing you think is amenable to this process.

comment by halinaeth · 2024-10-09T09:16:42.631Z · LW(p) · GW(p)

Thank you for this detailed process outline! I've been wanting to "learn by writing" for quite a while now (inspired by PG essays actually), yet never took the time out thus far. Your outline is extremely helpful to cut down time wondering "is my process any good/how will I learn through writing", and go straight to the learning (and writing)!

Regarding step 8 "get[ting] to the point where I can no longer easily tear apart my own hypothesis", I'm curious what your level of "Openness" is in your Big 5 personality traits. As someone with nearly off-the-charts openness, I could see myself getting stuck and flip-flopping on a stance near infinitely, especially as I do more research and become persuaded by more nuance on either side.

Then again, that in itself is a skill to learn! Excited to put this to use.

comment by Zach Bennett (zach-bennett) · 2025-01-25T01:46:05.049Z · LW(p) · GW(p)

This very closely mirrors a large chunk of a learning framework that I've formalized and written the draft of a book about. I would like to perhaps interview you about this process, how it has impacted your life, as well as some other things that weren't included in the scope of this article. If you're interested, please let me know, I can be easily found on LinkedIn.com/in/zoid...

As far as the exhaustion part, I feel you. I have developed severe back problems from sitting in my computer chair for 20 hours straight deconstructing my latest obsession. I'll tell you what, though - it made me a hell of a writer. I was looking through my inbox and realized Grammarly had been sending me these "stats" emails I just assumed were spam and ignored. I started looking at them and was flabbergasted as to how much I was outputting. On my strongest week, I output over 1,000,000 words, with 99% precision (something like 640 corrections across the million words) and a vocabulary in the top 1% of Grammarly users (20,000+ unique words). There was another week with 450k, and several in the 100k, although my weekly average is more like 30k over the entire year. Perhaps someone knows an employee at Grammarly, do I win something? LOL...

Replies from: weibac, Mo Nastri

↑ comment by Milan W (weibac) · 2025-01-25T07:25:48.095Z · LW(p) · GW(p)

Holden was previously Open Philanthropy's CEO and is now settling into his new role at Anthropic.

I am not privy to any non-public information, but please do not be disappointed if he has no time for contacting you via LinkedIn for the purpose of booking the interview you are requesting.

Replies from: gwern

↑ comment by gwern · 2025-01-26T01:13:05.296Z · LW(p) · GW(p)

> Holden was previously Open Philanthropy's CEO and is now settling into his new role at Anthropic.

Wait, what? When did Holden Karnofsky go to Anthropic? Even his website doesn't mention that and still says he's at Carnegie.

Replies from: DominikPeters, habryka4, T3t, Benito, weibac, D0TheMath

↑ comment by DominikPeters · 2025-01-29T06:15:54.966Z · LW(p) · GW(p)

The Carnegie website says he is no longer at Carnegie. A Harvard page says:

> Holden Karnofsky is a Member of Technical Staff at Anthropic, where he focuses on the design of the company's Responsible Scaling Policy and other aspects of preparing for the possibility of highly advanced AI systems in the future.

His LinkedIn confirms and dates the move to January 2025.

↑ comment by habryka (habryka4) · 2025-01-26T01:43:02.615Z · LW(p) · GW(p)

He posted it in a bunch of private Slacks a few weeks ago. His LinkedIn is now also updated: https://www.linkedin.com/in/holden-karnofsky-75970b7

↑ comment by RobertM (T3t) · 2025-01-26T01:51:56.204Z · LW(p) · GW(p)

I learned it elsewhere, but his LinkedIn confirms that he started at Anthropic sometime in January.

↑ comment by Ben Pace (Benito) · 2025-01-26T01:31:56.546Z · LW(p) · GW(p)

However it is on his LinkedIn.

↑ comment by Milan W (weibac) · 2025-01-26T15:10:04.482Z · LW(p) · GW(p)

OpenPhil said in April 2024 that he left them for the Carnegie Endowment for International Peace. The Carnegie Endowment says he is no longer with them. His LinkedIn profile (which I presume to be authentic, because it was created in 2008) says he has been at Anthropic since January 2025.

EDIT: Additional source, Harvard's Berkman Klein Center.

↑ comment by Garrett Baker (D0TheMath) · 2025-01-26T01:44:04.859Z · LW(p) · GW(p)

It's on his LinkedIn at least. Apparently since the start of the year.

↑ comment by Mo Putera (Mo Nastri) · 2025-01-25T11:11:10.019Z · LW(p) · GW(p)

What were you outputting over a million words in a week for?

And given that there are 7 x 16 x 60 = 6,720 minutes in a week of 16-hour days, you'd need to output 150 wpm at minimum over the entire duration to hit a million words, which doesn't seem humanly possible. How did you do it?
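The arithmetic behind that estimate can be checked with a short script. This is only a sketch: the 16-hour day, 7-day week and one-million-word target come from the comment above, while the function and constant names are made up for illustration.

```python
# Figures taken from the comment above: 16 writing hours per day, 7 days a week,
# and a claimed output of 1,000,000 words in one week.
HOURS_PER_DAY = 16
DAYS_PER_WEEK = 7
TARGET_WORDS = 1_000_000


def minutes_available(days: int = DAYS_PER_WEEK, hours_per_day: int = HOURS_PER_DAY) -> int:
    """Total minutes in a week of `hours_per_day`-hour days."""
    return days * hours_per_day * 60


def required_wpm(target_words: int, minutes: int) -> float:
    """Average words per minute needed to produce `target_words` in `minutes`."""
    return target_words / minutes


minutes = minutes_available()
wpm = required_wpm(TARGET_WORDS, minutes)
print(f"{minutes} minutes available, {wpm:.1f} words per minute required")
# prints "6720 minutes available, 148.8 words per minute required"
```

So the "150 wpm" figure is a rounding of roughly 148.8 wpm sustained for every minute of the week's 16-hour days.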

comment by Jacky April (jacky-april) · 2022-06-24T19:30:58.912Z · LW(p) · GW(p)

Reasoning that is rather interesting. Articles such as these are some of my favorites to read. For instance, I read reviews https://topwritingreviews.com/ very often since doing so is both fascinating and helpful.

comment by trevor (TrevorWiesinger) · 2024-01-17T21:10:21.521Z · LW(p) · GW(p)

This makes sense, and I'm glad to see posts by Holden; I don't feel very good about shutting down this idea, which works great in theory.

In practice, your computer is not secure, and you probably cannot procure one that is due to the risk of backdoors and vulnerabilities in chip firmware; therefore you should not be treating your computer as an output shell for your mind. This was a big problem before large language models [LW · GW] and it is a bigger one now (although understanding generative AI will not really help you understand the basic problem of modern human behavior research [LW · GW]).

Even handwriting on paper (ugh) cannot securely be done anywhere near smartphones or smart devices; in order to understand what level of proximity is safe, you have to think really hard about signal-to-noise ratio and acoustics (e.g. can a smartphone in my neighbor's apartment reconstruct the movements of my hand based on whatever sound waves make it through the wall and make it to that microphone; if enough do, and that smartphone has been compromised, then the risk of it being reconstructed into the original words is high).

If you're someone working on non-sensitive topics, then there isn't so much of a problem with learning by writing; there's still the risk of your thought process being automatically analyzed and compared to other people for purposes of predictive analytics, but I don't know to what extent we're there yet or how far away we might be (the data will probably be retained by orgs like the NSA until we reach that point anyway). I also haven't thought much about whiteboards and currently don't know what to think.

Generally, the safest route is to keep most of your thoughts inside your head instead of outside, and then write down the finished product.

Replies from: ryan_greenblatt, halinaeth

↑ comment by ryan_greenblatt · 2024-01-17T21:28:51.149Z · LW(p) · GW(p)

I strongly disagree with the commentary you provide being important or relevant for most people in practice.

Replies from: TrevorWiesinger

↑ comment by trevor (TrevorWiesinger) · 2024-01-18T15:55:14.830Z · LW(p) · GW(p)

This is important and relevant for most people here, in practice.

The world is changing and there is a risk that people in high-stakes areas like AI safety will be weaponized against each other as part of elite conflict. Understanding generative AI doesn't really help with this, and in practice often misleads people to overestimate the difficulty of human manipulation; I wrote a 4-minute overview of the current state of the attack surface here [LW · GW] (e.g. massive amounts of A/B testing to steer people in specific directions), and I described the fundamental problem previously [LW · GW]:

If there were intelligent aliens, made of bundles of tentacles or crystals or plants that think incredibly slowly, their minds would also have zero days that could be exploited because any mind that evolved naturally would probably be like the human brain, a kludge of spaghetti code that is operating outside of its intended environment, and they would also not even begin to scratch the surface of finding and labeling those zero days until, like human civilization today, they began surrounding thousands or millions of their kind with sensors that could record behavior several hours a day and find webs of correlations.

I've explained [LW · GW] why people in AI safety should not consider themselves at anywhere near the same risk level as average citizens:

This problem connects to the AI safety community in the following way:
State survival and war power ==> [LW · GW] already depends on information warfare capabilities.
Information warfare capabilities ==> [LW · GW] already depends on SOTA psychological research systems.
SOTA psychological research systems ==> [LW · GW] already improves and scales mainly from AI capabilities research, diminishing returns on everything else.^[1] [LW(p) · GW(p)]
AI capabilities research ==> [LW · GW] already under siege from the AI safety community.
Therefore, the reason why this might be such a big concern is:
State survival and war power ==> [LW · GW] their toes potentially already being stepped on by the AI safety community?

I've also described why agents in general should be taking basic precautions [LW · GW] as part of instrumental convergence [? · GW]:

There isn’t much point in having a utility function in the first place if hackers can change it at any time. There might be parts that are resistant to change, but it’s easy to overestimate yourself on this; for example, if you value the longterm future and think that no false argument can persuade you otherwise, but a social media news feed plants paranoia or distrust of Will Macaskill, then you are one increment closer to not caring about the longterm future; and if that doesn’t work, the multi-armed bandit algorithm will keep trying until it finds something that works.
The human brain is a kludge of spaghetti code, so there’s probably something somewhere. The human brain has zero days, and the capability and cost of social media platforms to use massive amounts of human behavior data to find complex social engineering techniques is a profoundly technical matter, you can’t get a handle on this with intuition or pre 2010s historical precedent. Thus, you should assume that your utility function and values are at risk of being hacked at an unknown time, and should therefore be assigned a discount rate to account for the risk over the course of several years.
Slow takeoff over the course of the next 10 years alone guarantees that this discount rate is too high in reality for people in the AI safety community to continue to go on believing that it is something like zero. I think that approaching zero is a reasonable target, but not with the current state of affairs where people don’t even bother to cover up their webcams, have important and sensitive conversations about the fate of the earth in rooms with smartphones, and use social media for nearly an hour a day (scrolling past nearly a thousand posts). The discount rate in this environment cannot be considered “reasonably” close to zero if the attack surface is this massive; and the world is changing this quickly.
If people have anything they value at all [LW · GW], and the AI safety community probably does have that, then the current AI safety paradigm of zero effort is wildly inappropriate, it is basically total submission to invisible hackers.

In fact, I've made a very solid case that pouring your thoughts directly into a modern computer, even through a keyboard with no other sensor exposure, is a deranged and depraved thing for an agent to do, and it is right to recoil in horror when you see an influential person encouraging many other influential people to do it.

↑ comment by halinaeth · 2024-10-09T09:05:46.470Z · LW(p) · GW(p)

Good point about how LLMs make "brain dumping" on a computer very different than before.

I can see how your proposal might be helpful for heads of nation states/billionaires/CEOs who worry about espionage, but for the average person writing in a journal & storing it in a safe place seems sufficient, no?

Even for myself, I'd probably write all my non-journaling notes (those I'd like to be able to search & organize & refer to later) in one of the usual solutions (notes app, maybe Google drive for more organization). Even if all my notes were leaked publicly online, I'm not sure I'd have taken the trade-offs in efficiency from going all analog and going with paper everything.

Lots to think about, and I definitely don't know the first thing about computer security so open to learning there. Thanks for the comment & the timely point about LLMs!
