Posts

Finding Sparse Linear Connections between Features in LLMs 2023-12-09T02:27:42.456Z

Comments

Comment by Eccentricity on My intellectual journey to (dis)solve the hard problem of consciousness · 2024-04-07T06:22:56.988Z · LW · GW

I’m confused. I know that it is like something to be me (this is in some sense the only thing I know for sure). It seems like there are rules which shape the things I experience, and some of those rules can be studied (like the laws of physics). We are good enough at understanding some of these rules to predict certain systems with a high degree of accuracy, like how an asteroid will orbit a star or how electrons will be pushed through a wire by a particular voltage in a circuit. But I have no way to know or predict whether it is like something to be a fish or GPT-4. I know that physical alterations to my brain seem to affect my experience, so it seems like there is a mapping from physical matter to experiences. I do not know precisely what this mapping is, and this indeed seems like a hard problem. In what sense do you disagree with my framing here?

Comment by Eccentricity on Chapter 5: The Fundamental Attribution Error · 2024-03-17T19:31:48.196Z · LW · GW

Oh good catch, I missed that. Thanks!

Comment by Eccentricity on Evolution did a surprising good job at aligning humans...to social status · 2024-03-11T20:26:38.731Z · LW · GW

The only metric natural selection is “optimizing” for is inclusive genetic fitness. It did not “try” to align humans with social status, and in many cases people care about social status to the detriment of their inclusive genetic fitness. This is a failure of alignment, not a success.

Comment by Eccentricity on Can we get an AI to do our alignment homework for us? · 2024-02-27T02:47:04.695Z · LW · GW

I am not so sure it will be possible to extract useful work towards solving alignment out of systems we do not already know how to carefully steer. I think that substantial progress on alignment is necessary before we know how to build things that actually want to help us advance the science. Even if we built something tomorrow that was in principle smart enough to do good alignment research, I am concerned we don’t know how to make it actually do that rather than, say, produce plausible-sounding but incorrect ideas. The fact that appending silly phrases like “I’ll tip $200” improves the probability of receiving correct code from current LLMs indicates to me that we haven’t succeeded at aligning them to maximally want to produce correct code when they are capable of doing so.

Comment by Eccentricity on Chapter 5: The Fundamental Attribution Error · 2024-02-21T17:16:23.684Z · LW · GW

How does Harry know the name “Lucius Malfoy”?

Comment by Eccentricity on Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events · 2024-01-21T16:39:56.611Z · LW · GW

We aren’t surprised by HHTHHTTTHT or whatever because we perceive it as the event “a sequence containing a similar number of heads and tails in any order, ideally without a long subsequence of H or T”, which occurs frequently.
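
To put rough numbers on that intuition (a sketch I'm adding here; the cutoffs for "roughly balanced" and "no long streak" are arbitrary choices for illustration):

```python
from itertools import product

def longest_run(seq):
    """Length of the longest run of identical flips in seq."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

n = 10
sequences = list(product("HT", repeat=n))

# The perceived event: 4-6 heads and no streak of 4+ identical flips
# (arbitrary thresholds, just to make the point quantitative).
unsurprising = [s for s in sequences
                if 4 <= s.count("H") <= 6 and longest_run(s) < 4]

print(f"P(the exact sequence HHTHHTTTHT) = 1/{len(sequences)}")
print(f"P(roughly balanced, no long streak) = {len(unsurprising) / len(sequences):.2f}")
```

The exact sequence is a 1-in-1024 event, but the event class we mentally file it under covers a far larger share of all sequences, which is why it doesn't register as surprising.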

Comment by Eccentricity on An even deeper atheism · 2024-01-11T21:50:55.423Z · LW · GW

I’m enjoying this series, and look forward to the next installment.

Comment by Eccentricity on "Dark Constitution" for constraining some superintelligences · 2024-01-10T19:22:14.711Z · LW · GW

The thing I mean by “superintelligence” is very different from a government. A government cannot design nanotechnology, and is made of humans which value human things.

Comment by Eccentricity on Deep atheism and AI risk · 2024-01-05T21:06:08.051Z · LW · GW

What can men do against such reckless indifference?

Comment by Eccentricity on Paper Summary: The Koha Code - A Biological Theory of Memory · 2023-12-31T17:28:42.172Z · LW · GW

Can someone with more knowledge give me a sense of how new this idea is, and guess at the probability that it is onto something?

Comment by Eccentricity on The Consciousness Box · 2023-12-12T06:01:51.885Z · LW · GW

Why are we so sure chatbots (and parrots for that matter) are not conscious? Well, maybe the word is just too slippery to define, but I would bet that parrots have some degree of subjective experience, and I am sufficiently uncertain regarding chatbots that I do worry about it slightly.

Comment by Eccentricity on The Offense-Defense Balance Rarely Changes · 2023-12-09T17:23:41.989Z · LW · GW

Please note that the graph of per capita war deaths is on a log scale. The number moves over several orders of magnitude. One could certainly make the case that local spikes were sometimes caused by significant shifts in the offense-defense balance (like tanks and planes making offense easier for a while at the beginning of WWII). These shifts are pushed back to equilibrium over time, but personally I would be pretty unhappy about, say, deaths from pandemics spiking 4 orders of magnitude before returning to equilibrium.

Comment by Eccentricity on AI Timelines · 2023-11-11T08:52:47.295Z · LW · GW

This random Twitter person says that it can't. Disclaimer: haven't actually checked for myself.

https://chat.openai.com/share/36c09b9d-cc2e-4cfd-ab07-6e45fb695bb1

Here is me playing against GPT-4, no vision required. It does just fine at normal tic-tac-toe, and figures out anti-tic-tac-toe with a little bit of extra prompting.

Comment by Eccentricity on [deleted post] 2023-10-26T04:45:49.308Z

Yes. I think the title of my post is misleading (I have updated it now). I think what I am trying to point at is that the current incentives mean we are going to mess up the outer alignment problem, and natural selection will favor the systems on which we fail the hardest.

Comment by Eccentricity on [deleted post] 2023-10-26T04:19:12.182Z

That's a very fair response. My claim here is really about the outer alignment problem, and that if lots of people have access to the ability to create / fine-tune AI agents, many agents that have goals misaligned with humanity as a whole will be created, and we will lose control of the future.

Comment by Eccentricity on [deleted post] 2023-10-26T03:46:23.117Z

I suppose what I'm trying to point to is some form of the outer alignment problem. I think we may end up with AIs that are aligned with human organizations like corporations more than with individual humans. The reason for this is that corporations or militaries which employ more ruthless AIs will, over time, accrue more power and resources. It's not so much explicit (i.e. violent) competition, but rather the gradual tendency for systems which are power-seeking and resource-maximizing to end up with more power and resources over time.

If we allow for the creation / fine-tuning of many AI agents, and allow them to accrue resources and copy themselves, then natural selection will favor the more selfish ones, which are least aligned with humanity at large. We already require pretty extensive regulation to make sure that corporations don't incur significant negative externalities, and these are organizations that are run by and composed of humans. When those entities are no longer human, I think the vast majority of power and resources will no longer be explicitly controlled by humans, and will instead be controlled by AIs whose values are poorly aligned with the majority of humans. Their goals will be aligned only with the short-term interests of the small number of humans who created them in the first place. Once the majority of people realize that this system is not acting in their long-term interests, there will be nothing they can do about it.

Comment by Eccentricity on Architects of Our Own Demise: We Should Stop Developing AI · 2023-10-26T01:01:44.311Z · LW · GW

Yeah. I think a key point that is often overlooked is that even if powerful AI is technically controllable, i.e. we solve inner alignment, that doesn't mean society will handle it safely. I think by default it looks like every company and military is forced to start using a ton of AI agents (or they will be outcompeted by someone else who does). Competition between a bunch of superhuman AIs that are trying to maximize profits or military tech seems really bad for us. We might not lose control all at once, but rather just be gradually outcompeted by machines, where "gradually" might actually be pretty quick. Basically, we die by Moloch.

Comment by Eccentricity on A Good Explanation of Differential Gears · 2023-10-23T01:54:19.631Z · LW · GW

Yeah, I've seen this video before. Still excellent.

Comment by Eccentricity on Commonsense Good, Creative Good · 2023-09-28T05:58:36.833Z · LW · GW

Yeah, in general, we are pretty compute limited and should stick to good heuristics for most kinds of problems. I do think that most people rely too much on heuristics, so for the average person the useful lesson is "actually stop and think about things once in a while", but I can see how the opposite problem may sometimes arise in this community.

Comment by Eccentricity on Petrov Day [Spoiler Warning] · 2023-09-28T05:33:10.563Z · LW · GW

I find it useful to distinguish between epistemic and instrumental rationality. You're talking about instrumental rationality – and it could be instrumentally useful to convince someone of your beliefs, to teach them to think clearly, or to actively mislead them. 
Epistemic rationality, on the other hand, means trying to have true beliefs, and in this case it's better to teach someone to fish than to force them to accept your fish.

Comment by Eccentricity on Jacob on the Precipice · 2023-09-28T05:28:44.612Z · LW · GW

In the doomsday argument, we are the random runner. If the runner with only 10 people behind him assumed his position was randomly selected, and tried to estimate the total number of runners, he would be very wrong. We could very well be that runner near the back of the race; we weren't randomly selected to be at the back, we just are, and the fact that there are ten people behind us doesn't give us meaningful information about the total number of runners.
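
To make that concrete, here is a toy simulation (my own sketch; the "with 95% confidence, the total is at most 20× (count behind you + 1)" rule is one common doomsday-style estimator, applied here to the runner framing purely for illustration):

```python
import random

def doomsday_upper_bound(runners_behind: int) -> int:
    """95% upper bound on field size, *if* your position is a uniform random draw."""
    return 20 * (runners_behind + 1)

true_field = 1_000_000
trials = 100_000

# For runners whose positions genuinely are sampled at random, the rule is well calibrated.
hits = sum(
    doomsday_upper_bound(true_field - random.randint(1, true_field)) >= true_field
    for _ in range(trials)
)
print(f"bound holds for about {hits / trials:.0%} of randomly placed runners")

# But the runner who simply *is* near the back (10 people behind) gets a wildly wrong answer.
print("near-the-back runner's bound:", doomsday_upper_bound(10), "vs. true field:", true_field)
```

The estimator is only calibrated on average over runners whose positions really were drawn at random; for the particular runner who just happens to be near the back, it is off by a factor of thousands.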

Comment by Eccentricity on Jacob on the Precipice · 2023-09-27T19:17:22.437Z · LW · GW

Okay, suppose I was born in Teenytown, a little village on the island nation of Nibblenest. The little one-room schoolhouse in Teenytown isn't very advanced, so no one ever teaches me that there are billions of other people living in all the places I've never heard of. Now, I might think to myself, the world must be very small – surely, if there were billions of people living in millions of towns and cities besides Teenytown, it would be very unlikely for me to have been born in Teenytown; therefore, Teenytown must be one of the only villages on Earth.

Clearly, this is absurd, right? The Doomsday argument says that if there are lots of other people in some scenario X that is different from mine (be it living in the future or across the ocean), then it would be unlikely for me to experience not-X; therefore, those other people most likely don't exist. But I am me, and I couldn't be anyone else. It makes no sense to talk about the "probability of being me". I don't think it is possible to "assume I am a randomly sampled observer" or something like that.

The number of humans that I notice have been born to date does not depend whatsoever on how many humans might exist in the future. My experience looks exactly the same whether humanity will be deleted tomorrow by a rogue black hole or spend billions of years spreading across the universe.

Comment by Eccentricity on Jacob on the Precipice · 2023-09-27T03:13:49.836Z · LW · GW

I think the claim that we basically understand the universe is misleading. I'm especially unconvinced by your vague explanation of consciousness; I don't think we have anything close to an empirically supported mechanistic model that makes good predictions. I personally have significant uncertainty regarding what kinds of things can have subjective experiences, or why they do.

This also feels like a good opportunity to say that the Doomsday argument has never made much sense to me; it has always felt wrong to me to treat being “me” as a random sample of observers. I couldn’t be anyone except me. If the future has trillions of humans or no humans, the person who is me will feel the same way in either case. I can't possibly condition on being me, because I couldn't be anyone else. The Doomsday argument treats my perspective as a random sample of all possible humans, or even all possible observers, which feels like a massive type error.

On a similar note, why is it remotely surprising that we live in a universe with laws of physics that support our existence? We couldn't possibly observe any laws of physics except the ones we have. Does it even make sense to say that the laws of physics "could be different"? I'm not convinced we can even imagine a coherent universe with different fundamental laws of physics, in the same way that I can't imagine what it would mean to live in a universe where the circle constant is something other than π. This may well just be a failure of my imagination, however – more crucially, hypothesizing that there are lots of universes with different laws of physics doesn't actually explain why we observe our universe. This kind of multiverse idea is a strictly more complicated hypothesis than just accepting that our universe exists and being agnostic about others, right? The only remotely reasonable argument I've heard in favor of some kind of multiverse is that many-worlds is a simpler interpretation of quantum mechanics than wavefunction collapse. This is a distinct idea from the proposal that our universe was born as a random sample among countless others with different physical laws, which is not a simpler explanation of anything at all as far as I can tell.

If I have misunderstood or mischaracterized these arguments, please let me know.

Comment by Eccentricity on marine cloud brightening · 2023-08-10T13:35:57.510Z · LW · GW

Planes would not be required for stratospheric injection of SO2. It could in theory be done much more cheaply with balloons: https://caseyhandmer.wordpress.com/2023/06/06/we-should-not-let-the-earth-overheat/

Comment by Eccentricity on Why am I Me? · 2023-06-25T22:17:24.492Z · LW · GW

Exactly, it has always felt wrong to me to treat being “me” as a random sample of observers. I couldn’t be anyone except me. If the future has trillions of humans or no humans, the person who is me will feel the same way in either case. I find the doomsday argument absurd because it treats my perspective as a random sample, which feels like a type error.

Comment by Eccentricity on I can see how I am Dumb · 2023-06-11T05:48:15.212Z · LW · GW

Indeed. I think about this type of thing often when I consider the concept of superhuman AI - when I spend hours stuck on a problem with a simple solution or forget something important, it’s not hard to imagine an algorithm much smarter than me that just doesn’t make those mistakes. I think the bar really isn’t that high for improving substantially on human cognition. Our brains have to operate under very strict energy constraints, but I can easily imagine a machine which performs a lot better than me by applying more costly but effective algorithms and using more precise memory. A pocket calculator is the trivial case, but I expect most of the other algorithms my brain uses can also be improved a lot given a larger energy and compute budget.

Comment by Eccentricity on A freshman year during the AI midgame: my approach to the next year · 2023-04-14T15:21:42.602Z · LW · GW

I am a literal freshman, and not feeling super optimistic about the future right now. How should I think about how to spend my time?

Comment by Eccentricity on College Selection Advice for Technical Alignment · 2022-12-17T00:04:57.699Z · LW · GW

Hi Naomi!

The advice about applying to MIT/Stanford is probably correct, if just to have the option. That said, I definitely don't regret ending up here!

Comment by Eccentricity on College Selection Advice for Technical Alignment · 2022-12-16T23:56:18.295Z · LW · GW

We are quite similar! I was also accepted to Harvard REA – exactly one year ago – and was too ~~lazy~~ mentally drained by the application process to apply to MIT after that. I arrived intending to study physics, but I've since realized AI safety is a much more important and exciting problem to work on. Seems like you got there a bit sooner than I did! HAIST is a wonderful community, and also a great resource for finding upskilling and research opportunities.

I've only been here for a semester, so take this with a grain of salt, but I don't think you should be too worried about taking classes that interest you. It seems like professors are generally pretty flexible if you reach out to them, and it's easy to fulfill general education requirements with hardly any effort (I just took a class on "Anime as Global Popular Culture" which required essentially no work).

Let me know if you have any questions! I also have some older friends who can give you more specific advice. I sincerely hope I get to meet you soon!