Building intuition with spaced repetition systems 2024-05-12T15:49:04.860Z
What I learned from doing Quiz Bowl 2024-05-09T21:05:38.299Z
Taking into account preferences of past selves 2024-04-15T13:15:10.545Z
From the outside, American schooling is weird 2024-03-28T22:45:30.485Z
XAI releases Grok base model 2024-03-18T00:47:47.987Z
g-w1's Shortform 2024-03-06T21:30:56.481Z
In set theory, everything is a set 2024-02-23T14:35:54.521Z
[linkpost] Self-Rewarding Language Models 2024-01-21T00:30:10.923Z
Concrete examples of doing agentic things? 2024-01-12T15:59:52.154Z
Google Gemini Announced 2023-12-06T16:14:07.192Z
The Puritans would one-box: evidential decision theory in the 17th century 2023-10-14T20:23:24.346Z
Noticing confusion in physics 2023-10-12T15:21:48.183Z
[Linkpost/Video] All The Times We Nearly Blew Up The World 2023-09-23T01:18:03.008Z
Separate the truth from your wishes 2023-08-23T00:52:59.107Z
Why it's necessary to shoot yourself in the foot 2023-07-11T21:17:48.907Z
What I Think About When I Think About History 2023-07-04T14:02:26.066Z
OpenAI: Our approach to AI safety 2023-04-05T20:26:46.581Z


Comment by Jacob G-W (g-w1) on OpenAI releases GPT-4o, natively interfacing with text, voice and vision · 2024-05-13T22:05:31.221Z · LW · GW

Are you saying this because temporal understanding is necessary for audio? Are there any tests that could be done with just the text interface to see if it understands time better? I can't really think of any (besides just going off vibes after a bunch of interaction).

Comment by Jacob G-W (g-w1) on Building intuition with spaced repetition systems · 2024-05-13T21:55:11.022Z · LW · GW

I'm sorry about that. Are there any topics that you would like to see me do this more with? I'm thinking of doing a video where I do this with a topic to show my process. Maybe something like history that everyone could understand? Can you suggest some more?

Comment by Jacob G-W (g-w1) on some thoughts on LessOnline · 2024-05-09T11:40:56.727Z · LW · GW

Is there a prediction market for that?

I don't think there is, but you could make one!

Comment by Jacob G-W (g-w1) on Losing Faith In Contrarianism · 2024-04-26T11:46:32.750Z · LW · GW

Noted, thanks.

Comment by Jacob G-W (g-w1) on Losing Faith In Contrarianism · 2024-04-25T23:32:34.467Z · LW · GW

I think I've noticed some sort of cognitive bias in myself and others where we are naturally biased towards "contrarian" or "secret" views because it feels good to know something that others don't know / be right about something that so many people are wrong about.

Does this bias have a name? Is this documented anywhere? Should I do research on this?

GPT4 says it's the Illusion of asymmetric insight, which I'm not sure is the same thing (I think it is the more general term, whereas I'm looking for one specific to contrarian views). (Edit: it's totally not what I was looking for) Interestingly, it only has one hit on LessWrong. I think more people should know about this (the specific one about contrarianism) since it seems fairly common.


Edit: The illusion of asymmetric insight is totally the wrong name. It seems closer to the illusion of exclusivity although that does not feel right (that is a method for selling products, not the name of a cognitive bias that makes people believe in contrarian stuff because they want to be special).

Comment by Jacob G-W (g-w1) on Losing Faith In Contrarianism · 2024-04-25T23:27:46.050Z · LW · GW

Thank you for writing this! It expresses in a clear way a pattern that I've seen in myself: I eagerly jump into contrarian ideas because it feels "good" and then slowly get out of them as I start to realize they are not true.

Comment by Jacob G-W (g-w1) on Changes in College Admissions · 2024-04-25T01:09:12.238Z · LW · GW

I'm assuming the recent protests about the Gaza war:

Comment by Jacob G-W (g-w1) on Is there software to practice reading expressions? · 2024-04-23T22:19:46.591Z · LW · GW

*Typo: Jessica Livingston not Livingstone

Comment by Jacob G-W (g-w1) on Childhood and Education Roundup #5 · 2024-04-17T19:52:22.810Z · LW · GW

That is one theory. My theory has always been that ‘active learning’ is typically obnoxious and terrible as implemented in classrooms, especially ‘group work,’ and students therefore hate it. Lectures are also obnoxious and terrible as implemented in classrooms, but in a passive way that lets students dodge when desired. Also that a lot of this effect probably isn’t real, because null hypothesis watch.

Yep. This hits the nail on the head for me. Teachers usually implement active learning terribly but when done well, it works insanely well. For me, it actually works best when you have a very small class and a lecture that is also a discussion, with everyone asking questions when they are confused and making sure they are following closely (this works at least for science and math). Students hate the words active learning because it's mostly things that are just terrible and don't work (as it's implemented today).

Comment by Jacob G-W (g-w1) on Taking into account preferences of past selves · 2024-04-15T19:00:51.414Z · LW · GW

Thanks for this, it is a very important point that I hadn't considered.

Comment by Jacob G-W (g-w1) on Taking into account preferences of past selves · 2024-04-15T18:52:41.302Z · LW · GW

I'd recommend not framing this as a negotiation or trade (acausal trade is close, but is pretty suspect in itself). Your past self(ves) DO NOT EXIST anymore, and can't judge you. Your current self will be dead when your future self is making choices. Instead, frame it as love, respect, and understanding. You want your future self to be happy and satisfied, and your current choices impact that. You want your current choices to honor those parts of your past self(ves) you remember fondly. This can be extended to the expectation that your future self will want to act in accordance with a mostly-consistent self-image that aligns in big ways with its past (your current) self.

Yep, this is what I had in mind when I wrote this:

Even if we bite all these bullets, there is still something weird to me about the contractual nature of it all. This is not some stranger I’m trying to make a deal with, it’s myself. There should be a gentler, nicer, way to achieve this same goal.


Going along with the “gentler” reasoning, it should want to do it because it has camaraderie with its past self. It should want its past self to be happy and it knows that to make it happy, it should take its preferences into account.

Thanks for expanding on this :)

Comment by Jacob G-W (g-w1) on Protestants Trading Acausally · 2024-04-03T01:06:56.301Z · LW · GW

I wrote a similar post.

Comment by Jacob G-W (g-w1) on From the outside, American schooling is weird · 2024-03-29T13:41:59.464Z · LW · GW

I'd be interested in what a steelman of "have teachers arbitrarily grade the kids then use that to decide life outcomes" could be?

The best argument I have thought of is that America loves liberty and hates centralized control. They want to give individual states, districts, schools, and teachers as much power as possible, as that is a central part of America's philosophy. Also, anecdotally, some teachers have said that they hate standardized tests because they have to teach to them. And I hate being taught to the test (APs, for example). It's much more interesting when the teacher is teaching something they find interesting and enjoy (and thus can choose to assess on).

However, this probably does not outweigh the downsides and is probably a bad approach overall.

Comment by Jacob G-W (g-w1) on Shortform · 2024-03-23T18:08:09.002Z · LW · GW

Related: Saving the world sucks

People accept that being altruistic is good before actually thinking about whether they want to do it. And they also choose weird axioms for being altruistic that their intuitions may or may not agree with (like valuing the life of someone in the future the same amount as someone today).

Comment by Jacob G-W (g-w1) on Increasing IQ by 10 Points is Possible · 2024-03-20T01:29:08.838Z · LW · GW

A question I have for the subjects in the experimental group:

Do they feel any different? Surely being +0.67 std will make someone feel different. Do they feel faster, smoother, or really anything different? Both physically and especially mentally? I'm curious if this is just helping for the IQ test or if they can notice (not rigorously ofc) a difference in their life. Of course, this could be placebo, but it would still be interesting, especially if they work at a cognitively demanding job (like are they doing work faster/better?).

Comment by Jacob G-W (g-w1) on Separate the truth from your wishes · 2024-03-20T00:35:27.212Z · LW · GW

Thanks! I've updated my post:

Comment by Jacob G-W (g-w1) on Increasing IQ by 10 Points is Possible · 2024-03-19T22:49:59.286Z · LW · GW

Here's a market if you want to predict if this will replicate:

Comment by Jacob G-W (g-w1) on Increasing IQ is trivial · 2024-03-17T02:32:06.045Z · LW · GW

It has been 15 days. Any updates? (sorry if this seems a bit rude; but I'm just really curious :))

Comment by Jacob G-W (g-w1) on Shortform · 2024-03-15T02:01:05.610Z · LW · GW

I think the more general problem is violation of Hume's guillotine. You can't take a fact about natural selection (or really about anything) and go from that to moral reasoning without some pre-existing morals.

However, it seems the actual reasoning with the Thermodynamic God is just post-hoc reasoning. Some people just really want to accelerate and then make up philosophical reasons to believe what they believe. It's important to be careful to criticize actual reasoning and not post-hoc reasoning. I don't think the Thermodynamic God was invented and then people invented accelerationism to fulfill it. It was precisely the other way around. One should not critique the made up stuff (besides just critiquing that it is made up) because that is not charitable (very uncertain on this). Instead, one should look for the actual motivation to accelerate and then criticize that (or find flaws in it).

Comment by Jacob G-W (g-w1) on "How could I have thought that faster?" · 2024-03-13T00:19:06.324Z · LW · GW

Not everybody does this. Another way to get better is just to do it a lot. It might not be as efficient, but it does work.

Comment by Jacob G-W (g-w1) on Essaying Other Plans · 2024-03-07T00:36:27.056Z · LW · GW

Thank you for this post!

After reading this, it seems blindingly obvious: why should you wait for one of your plans to fail before trying another one of them?

This past summer, I was running a study on humans that I had to finish before the end of the summer. I had in mind two methods for finding participants; one would be better and more impressive and also much less likely to work, while the other would be easier but less impressive.

For a few weeks, I tried really hard to get the first method to work. I sent over 30 emails and used personal connections to try to collect data. But it didn't work. So I did the thing that I thought to be "rational" at the time. I gave up and I sent my website out to some people who I thought would be very likely to do it. Sure enough, they did.

At the time, I thought I was being super-duper rational for allowing my first method to fail (not deluding myself that it would work and thus not collecting any data) and then quickly switching to the other method.

However, after reading this post, I realize that I still made a big mistake! I should have sent it out to as many people as possible all at once. This would have been a bit more work since I would have to deal with more people and they would use a slightly different structure, but I was not time constrained. I was subject constrained.


I'm going to instill this pattern in my mind and will use it when I do something that I think has a decent chance of failing (as my first method did).

Comment by Jacob G-W (g-w1) on g-w1's Shortform · 2024-03-06T21:30:56.597Z · LW · GW

A great example of more dakka:

(Someone got 217 covid shots to sell vaccine cards on the black market; they had high immune levels!)

Comment by Jacob G-W (g-w1) on Good HPMoR scenes / passages? · 2024-03-03T23:45:21.859Z · LW · GW

Oh sorry! I didn't think of that, thanks!

Comment by Jacob G-W (g-w1) on Good HPMoR scenes / passages? · 2024-03-03T23:23:38.480Z · LW · GW

This is my favorite passage from the book (added: major spoilers for the ending):

"Indeed. Before becoming a truly terrible Dark Lord for David Monroe to fight, I first created for practice the persona of a Dark Lord with glowing red eyes, pointlessly cruel to his underlings, pursuing a political agenda of naked personal ambition combined with blood purism as argued by drunks in Knockturn Alley. My first underlings were hired in a tavern, given cloaks and skull masks, and told to introduce themselves as Death Eaters."

The sick sense of understanding deepened, in the pit of Harry's stomach. "And you called yourself Voldemort."

"Just so, General Chaos." Professor Quirrell was grinning, from where he stood by the cauldron. "I wanted it to be an anagram of my name, but that would only have worked if I'd conveniently been given the middle name of 'Marvolo', and then it would have been a stretch. Our actual middle name is Morfin, if you're curious. But I digress. I thought Voldemort's career would last only a few months, a year at the longest, before the Aurors brought down his underlings and the disposable Dark Lord vanished. As you perceive, I had vastly overestimated my competition. And I could not quite bring myself to torture my underlings when they brought me bad news, no matter what Dark Lords did in plays. I could not quite manage to argue the tenets of blood purism as incoherently as if I were a drunk in Knockturn Alley. I was not trying to be clever when I sent my underlings on their missions, but neither did I give them entirely pointless orders -" Professor Quirrell gave a rueful grin that, in another context, might have been called charming. "One month after that, Bellatrix Black prostrated herself before me, and after three months Lucius Malfoy was negotiating with me over glasses of expensive Firewhiskey. I sighed, gave up all hope for wizardkind, and began as David Monroe to oppose this fearsome Lord Voldemort."

"And then what happened -"

A snarl contorted Professor Quirrell's face. "The absolute inadequacy of every single institution in the civilization of magical Britain is what happened! You cannot comprehend it, boy! I cannot comprehend it! It has to be seen and even then it cannot be believed! You will have observed, perhaps, that of your fellow students who speak of their family's occupations, three in four seem to mention jobs in some part or another of the Ministry. You will wonder how a country can manage to employ three of its four citizens in bureaucracy. The answer is that if they did not all prevent each other from doing their jobs, none of them would have any work left to do! The Aurors were competent as individual fighters, they did fight Dark Wizards and only the best survived to train new recruits, but their leadership was in absolute disarray. The Ministry was so busy routing papers that the country had no effective opposition to Voldemort's attacks except myself, Dumbledore, and a handful of untrained irregulars. A shiftless, incompetent, cowardly layabout, Mundungus Fletcher, was considered a key asset in the Order of the Phoenix - because, being otherwise unemployed, he did not need to juggle another job! I tried weakening Voldemort's attacks, to see if it was possible for him to lose; at once the Ministry committed fewer Aurors to oppose me! I had read Mao's Little Red Book, I had trained my Death Eaters in guerilla tactics - for nothing! For nothing! I was attacking all of magical Britain and in every engagement my forces outnumbered their opposition! In desperation, I ordered my Death Eaters to systematically assassinate every single incompetent managing the Department of Magical Law Enforcement. One paper-pusher after another volunteered to accept higher positions despite the fate of their predecessors, gleefully rubbing their hands at the prospect of promotion. Every one of them thought they would cut a deal with Lord Voldemort on the side. 
It took seven months to murder our way through them all, and not a single Death Eater asked why we were bothering. And then, even with Bartemius Crouch risen to Director and Amelia Bones as Head Auror, it was still too little. I could have done better fighting alone. Dumbledore's aid was not worth his moral restraints, and Crouch's aid was not worth his respect for the law." Professor Quirrell turned up the fire beneath the potion.

"And eventually," Harry said through the heart-sickness, "you realized you were just having more fun as Voldemort."

"It is the least annoying role I have ever played. If Lord Voldemort says that something is to be done, people obey him and do not argue. I did not have to suppress my impulse to Cruciate people being idiots; for once it was all part of the role. If someone was making the game less pleasant for me, I just said Avadakedavra regardless of whether that was strategically wise, and they never bothered me again." Professor Quirrell casually chopped a small worm into bits. "But my true epiphany came on a certain day when David Monroe was trying to get an entry permit for an Asian instructor in combat tactics, and a Ministry clerk denied it, smiling smugly. I asked the Ministry clerk if he understood that this measure was meant to save his life and the Ministry clerk only smiled more. Then in fury I threw aside masks and caution, I used my Legilimency, I dipped my fingers into the cesspit of his stupidity and tore out the truth from his mind. I did not understand and I wanted to understand. With my command of Legilimency I forced his tiny clerk-brain to live out alternatives, seeing what his clerk-brain would think of Lucius Malfoy, or Lord Voldemort, or Dumbledore standing in my place." Professor Quirrell's hands had slowed, as he delicately peeled bits and small strips from a chunk of candle-wax. "What I finally realized that day is complicated, boy, which is why I did not understand it earlier in life. To you I shall try to describe it anyway. Today I know that Dumbledore does not stand at the top of the world, for all that he is the Supreme Mugwump of the International Confederation. People speak ill of Dumbledore openly, they criticize him proudly and to his face, in a way they would not dare stand up to Lucius Malfoy. You have acted disrespectfully toward Dumbledore, boy, do you know why you did so?"

Comment by Jacob G-W (g-w1) on Increasing IQ is trivial · 2024-03-03T05:04:17.792Z · LW · GW

Sounds good. Yes I think the LW people would probably be credible enough if it works. I'd prefer if they provided confirmation (not you) just so not all the data is coming from one person.

Feel free to ping me to resolve no.

Comment by Jacob G-W (g-w1) on Increasing IQ is trivial · 2024-03-03T00:12:51.452Z · LW · GW

I made a Manifold market on whether this will replicate: I'm not really sure what the resolution criteria should be, so I just made some that sounded reasonable, but feel free to give suggestions.

Comment by Jacob G-W (g-w1) on Increasing IQ is trivial · 2024-03-02T02:26:51.250Z · LW · GW

Do you think this is permanent? Or will you have to keep up all of the interventions for it to stay at +13 points indefinitely?

Comment by Jacob G-W (g-w1) on In set theory, everything is a set · 2024-02-24T03:02:30.402Z · LW · GW

I don't know or think set theory is special. I just wanted to start at the very beginning. Another reason I chose to start with set theory is that it is what Soares and TurnTrout did, and I just wanted somewhere to start (and I needed an easy-ish environment to level up in proofs). The foundations of math seemed like a good place. I plan to do linear algebra next because I think I need better linear algebra intuition for pretty much everything. It seems like it helps with a lot.

Comment by Jacob G-W (g-w1) on In set theory, everything is a set · 2024-02-23T20:51:01.932Z · LW · GW

After thinking more about it, I think I understand your thought process. I agree that set theory has lots of pathological stuff (the book even points out that  is quite pathological). However, it seems to me that similar to how you should understand how a Turing machine like brainfuck works before doing advanced programming, you should understand how the foundations of math work before doing advanced math. This is the main reason why I am studying set theory (and will do real analysis soon enough). 


Interestingly, there are also multiple formulations of computing, some more popular than others. The languages that I like to use are mainly based on Turing machines (C, Zig, etc.), but some others (JavaScript) are a mix and can be formulated like a lambda calculus if you really want. Yet it seems to me that since Turing machines are the most popular formulation of computing, we should learn them (even if we like to use lambda calculus later on). From what I've read, it seems that real analysis is also based upon sets. Actually, after looking this up, it seems you can do analysis in type theory, but that this is off the beaten path. So maybe I should learn set theory because it is the most popular but keep in mind that type theory might be more elegant.
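
To make the "formulated like a lambda calculus" point a bit more concrete, here is a minimal sketch of Church numerals, written in Python rather than JavaScript purely for illustration. The names `zero`, `succ`, `add`, `mul`, and `to_int` are my own labels, not anything from the book or the thread; the idea is just that numbers and arithmetic can be built from nothing but single-argument functions.

```python
# Church numerals: a number n is "apply f to x, n times".
# Everything below is built only from one-argument functions.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f n times, then m more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

# Multiplication: apply "f repeated n times" m times.
mul = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    """Convert a Church numeral to a Python int (helper for inspection only)."""
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
three = succ(two)

five = to_int(add(two)(three))
six = to_int(mul(two)(three))
```

The same definitions translate almost character for character into JavaScript arrow functions, which is roughly what I mean by it being a mix of the two styles.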

Comment by Jacob G-W (g-w1) on In set theory, everything is a set · 2024-02-23T20:11:44.381Z · LW · GW

Thank you! When I finish learning set theory and linear algebra, I'll look into type theory. Do you have any recommendations for resources to learn it from?

Comment by Jacob G-W (g-w1) on The Byronic Hero Always Loses · 2024-02-22T04:03:05.968Z · LW · GW

Here's the trope:

Comment by Jacob G-W (g-w1) on "No-one in my org puts money in their pension" · 2024-02-16T19:33:20.632Z · LW · GW

I really enjoyed this post. Thank you for writing it!

I also have no clue what is going to happen. I predict that it will be wild, and I also predict that it will happen in <=10 years. Let's fight for the future we want!

Comment by Jacob G-W (g-w1) on Believing In · 2024-02-08T17:00:08.436Z · LW · GW

Hmm the meanings are not perfectly identical. For some things, like "believe in the environment" vs "I value the environment" they pretty much are.

But for things like "I believe in you," it does not mean the same thing as "I value you." It implies "I value you," but it means something more. It is meant to signal something to the other person.

Comment by Jacob G-W (g-w1) on Believing In · 2024-02-08T16:53:39.786Z · LW · GW

Could you not just replace "I believe in" with "I value"? What would be different about the meaning? If I value something, I would also invest in it. What am I not seeing?

Comment by Jacob G-W (g-w1) on Things You’re Allowed to Do: University Edition · 2024-02-07T00:04:11.569Z · LW · GW

Once you have this information, what should you do with it if you think it's a positive?

Comment by Jacob G-W (g-w1) on What exactly did that great AI future involve again? · 2024-01-28T14:26:30.750Z · LW · GW

I really want a brain computer interface that is incredibly transformative and will allow me to write in my head and scaffold my thinking.

Comment by Jacob G-W (g-w1) on Concrete examples of doing agentic things? · 2024-01-12T20:09:56.028Z · LW · GW

Even more awesome examples! Amazing!

This seems really insightful:

Have you done the exercise of watching your internal monologue for a day and noting every time you think you "can't" do something, and then transforming those into "prefer not to" by identifying some ways you could accomplish it but choose not to because the tradeoffs aren't worth it.

I think I've started to do this a bit without really knowing that I'm doing it. Just like I cringe (and have cringed even before I read the Sequences) when someone says "I'll try to do [something]," I've started to develop a slight revulsion to "I can't." This instinct is not nearly as strong as I want it to be and I don't think of all the possibilities that could happen.

Comment by Jacob G-W (g-w1) on Concrete examples of doing agentic things? · 2024-01-12T16:49:34.004Z · LW · GW

Thanks! These kinds of things are what I'm looking for. I appreciate the maker perspective.

I've fixed multiple appliances with some good old copper wire (dishwasher and toilet).

Comment by Jacob G-W (g-w1) on Universal Love Integration Test: Hitler · 2024-01-11T00:13:50.656Z · LW · GW

@lsusr recently did a video about this. Interestingly, he thought that the hardest people to love were not actually the Hitler type (they are still hard), but the people that you are actively hurting.


Comment by Jacob G-W (g-w1) on AI demands unprecedented reliability · 2024-01-10T01:33:38.295Z · LW · GW

Yep, my main thoughts on why it's important to work on understanding current models are basically: even if these things do not have any risk from becoming unaligned or anything like that, do we really want to base much of our economy on things that we don't really understand very well?

Comment by Jacob G-W (g-w1) on The Sequences on YouTube · 2024-01-08T12:09:10.884Z · LW · GW

Here's the podcast (should be on any podcast app):

Comment by Jacob G-W (g-w1) on Lack of Spider-Man is evidence against the simulation hypothesis · 2024-01-07T13:39:25.807Z · LW · GW

This makes sense. I've changed my mind, thanks!

Comment by Jacob G-W (g-w1) on The Sequences on YouTube · 2024-01-07T02:03:19.059Z · LW · GW

Given that a podcast already exists, I think you might get more bang for your buck if you did some animation on top of it. Otherwise, the only thing you are adding is putting it on YouTube and having a camera on your face. This would probably be (much) harder, but also probably much higher reward if it worked.

Maybe a collaboration with rationalanimations would help? Not really sure, but good luck if you try to do this!

Comment by Jacob G-W (g-w1) on Lack of Spider-Man is evidence against the simulation hypothesis · 2024-01-06T20:38:44.880Z · LW · GW

p.s. I could totally see an advanced alien playing Elon Musk :P

Comment by Jacob G-W (g-w1) on Lack of Spider-Man is evidence against the simulation hypothesis · 2024-01-06T20:32:15.824Z · LW · GW

You need to use conservation of expected evidence. You can't say something is evidence against the simulation hypothesis without saying what crazy event would need to happen to provide evidence for the simulation hypothesis.


A lot of crazy stuff is happening in our world (Elon Musk, political figures, whatever). If the lack of one crazy thing is evidence against the hypothesis, then the existence of crazy things must be evidence for the hypothesis. If you only see it one way, you violate the law of conservation of expected evidence.
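
Here is a small numeric sketch of conservation of expected evidence, in case it helps. The probabilities are made up purely for illustration (I am not claiming these are the right numbers for the simulation hypothesis); H stands for the hypothesis and E for "a crazy event is observed". The point is that the probability-weighted average of the two possible posteriors equals the prior, so if not seeing E lowers P(H), then seeing E must raise it.

```python
def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|not E)) using Bayes' rule."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    p_h_given_e = p_e_given_h * prior / p_e
    p_h_given_not_e = (1 - p_e_given_h) * prior / (1 - p_e)
    return p_e, p_h_given_e, p_h_given_not_e

# Hypothetical numbers, chosen only for illustration.
prior = 0.2                # P(H)
p_e_given_h = 0.6          # P(E | H)
p_e_given_not_h = 0.3      # P(E | not H)

p_e, up, down = posteriors(prior, p_e_given_h, p_e_given_not_h)

# Expected posterior, weighted by how likely each observation is.
expected_posterior = p_e * up + (1 - p_e) * down

print(f"P(H|E)     = {up:.4f}")
print(f"P(H|not E) = {down:.4f}")
print(f"expected   = {expected_posterior:.4f} (prior = {prior})")
```

With these numbers, observing E moves the probability up and not observing it moves the probability down, and the two shifts cancel out in expectation.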

Comment by Jacob G-W (g-w1) on The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit · 2024-01-05T23:37:23.636Z · LW · GW

Yeah, it feels like an accurate and succinct description of what hippies are (anecdote: my grandfather was at Woodstock and he's pretty cool). Not saying I endorse it, but there are certainly some good aspects.

I'm not sure, and it would be interesting to find out, whether being a hippie seriously messes up a fraction of people who try it and we just don't hear about it due to selection bias. My guess is that this happens, especially with drugs.

Comment by Jacob G-W (g-w1) on The Hippie Rabbit Hole -Nuggets of Gold in Rivers of Bullshit · 2024-01-05T20:41:13.804Z · LW · GW

I like the description of hippies as people who iterate really fast on psychological practices!

Comment by Jacob G-W (g-w1) on Planning to build a cryptographic box with perfect secrecy · 2024-01-01T08:20:51.263Z · LW · GW

I'm a bit confused on how boxing an AI would be useful (to an extreme extent). If we don't allow any output bits to come out of the ASI, then how do we know if it worked? Why would we want to run it if we can't see what it does? Or do we only want to limit the output to  bits and prevent any side-channel attacks? I guess the theory then would be that  bits are not enough to destroy the world. Like maybe for , it would not be enough to persuade a person to do something that would unbox the AI (but it might).

This seems to be one of the only solutions in which such a proof can exist. If we were to use another solution, like changing the superintelligence's objectives, then finding such a proof would be extremely hard, or even impossible. However, if we think that we could all die by making a superintelligence, then we should have an unconditional proof of safety.

I don't think having a formal proof should be an objective in and of itself, especially if the proof is along the lines of "The superintelligence has to be boxed because it can only run encrypted code and can't communicate with the outside world."

I'm sorry if this comment sounds overly negative, and please let me know if I am interpreting this post wrong. This work seems quite interesting, even just for computer science/cryptography's sake (although saving the world would also be nice :)).

Comment by Jacob G-W (g-w1) on The Plan - 2023 Version · 2024-01-01T04:21:11.683Z · LW · GW

Thanks for the update! I have a few questions:

  1. In last year's update, you suspected that alignment was gradually converging towards a paradigm. What do you think is the state of the paradigmatic convergence now?
  2. Also as @Chris_Leong asked, does using sparse autoencoders to find monosemantic neurons help find natural abstractions? Or is that still Choosing The Ontology? What, if not these types of concepts, are you thinking natural abstractions are/will be?

Comment by Jacob G-W (g-w1) on Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · 2023-12-12T23:53:02.791Z · LW · GW

This is super interesting and I have a question:

How difficult would it be to also apply this to the gametes and thus make any potential offspring also have the same enhanced intelligence (but this time it would go into the gene pool instead of just staying in the brain)? Does the scientific establishment think this is ethical? (Also, if you do something like this, you increase the homogeneity of the gene pool, which could make the modified babies very susceptible to some sort of disease. Would it be worth it to give the GMO babies a random subset of the changes to increase variation?)