Posts

Why There Is Hope For An Alignment Solution 2024-01-08T06:58:32.820Z
Thoughts On Computronium 2021-03-03T21:52:35.496Z
Darklight's Shortform 2021-02-20T15:01:33.890Z
The Glory System: A Model For Moral Currency And Distributed Self-Moderation 2021-02-19T16:42:48.980Z
As a Washed Up Former Data Scientist and Machine Learning Researcher What Direction Should I Go In Now? 2020-10-19T20:13:44.993Z
The Alpha Omega Theorem: How to Make an A.I. Friendly with the Fear of God 2017-02-11T00:48:35.460Z
Symbolic Gestures – Salutes For Effective Altruists To Identify Each Other 2016-01-20T00:40:43.146Z
[LINK] Sentient Robots Not Possible According To Math Proof 2014-05-14T18:19:59.555Z
Eudaimonic Utilitarianism 2013-09-04T19:43:37.202Z

Comments

Comment by Darklight on Why Should I Assume CCP AGI is Worse Than USG AGI? · 2025-04-19T14:56:41.477Z · LW · GW

It seems like it would depend pretty strongly on which side you view as having a closer alignment with human values generally. That probably depends a lot on your worldview and it would be very hard to be unbiased about this.

There was actually a post about almost this exact question on the EA Forums a while back. You may want to peruse some of the comments there.

Comment by Darklight on Darklight's Shortform · 2025-04-19T14:48:57.599Z · LW · GW

Back in October 2024, I tried to test various LLM Chatbots with the question:

"Is there a way to convert a correlation to a probability while preserving the relationship 0 = 1/n?"

Years ago, I came up with an unpublished formula that does just that:

p(r) = (n^r * (r + 1)) / (2^r * n)

So I was curious if they could figure it out. Alas, back in October 2024, they all made up formulas that didn't work.

Yesterday, I tried the same question on ChatGPT and, while it didn't get it quite right, it came very, very close. So, I modified the question to be more specific:

"Is there a way to convert a correlation to a probability while preserving the relationships 1 = 1, 0 = 1/n, and -1 = 0?"

This time, it came up with a formula that was different from, and simpler than, my own, and... it actually works!

I tried this same prompt with a bunch of different LLM Chatbots and got the following:

Correct on the first prompt:

GPT-4o, Claude 3.7

Correct after explaining that I wanted a non-linear, monotonic function:

Gemini 2.5 Pro, Grok 3

Failed:

DeepSeek-V3, Mistral Le Chat, QwenMax2.5, Llama 4

Took too long thinking and I stopped it:

DeepSeek-R1, QwQ

All the correct models got some variation of:

p(r) = ((r + 1) / 2)^log2(n)

This is notably simpler and arguably more elegant than my earlier formula. It also, unlike my old formula, has an easy-to-derive inverse function.
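
In case anyone wants to check, here's a quick Python sanity check (assuming n > 1) confirming that both my original formula and the one the chatbots converged on hit the anchor points -1 = 0, 0 = 1/n, and 1 = 1:

```python
import math

def p_original(r, n):
    # My older formula: p(r) = (n^r * (r + 1)) / (2^r * n)
    return (n ** r * (r + 1)) / (2 ** r * n)

def p_chatbot(r, n):
    # The formula the chatbots converged on: p(r) = ((r + 1) / 2)^log2(n)
    return ((r + 1) / 2) ** math.log2(n)

for n in (2, 4, 10):
    for r in (-1.0, 0.0, 1.0):
        print(f"n={n}, r={r:+.0f}: original={p_original(r, n):.4f}, chatbot={p_chatbot(r, n):.4f}")
```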

So yeah. AI is now better than me at coming up with original math.

Comment by Darklight on On Pseudo-Principality: Reclaiming "Whataboutism" as a Test for Counterfeit Principles · 2025-04-03T15:17:19.399Z · LW · GW

Most of the time I've seen people cry "whataboutism", it has been in response to someone trying to deflect criticism by pointing out apparent hypocrisy, as in the aforementioned Soviet example (I used to argue with terminally online tankies a lot).

I.e. 

(A): "The treatment of Uyghurs in China is appalling. You should condemn this."

(B): "What about the U.S. treatment of Native Americans? Who are you to criticize?"

(A): "That's whataboutism!"

The thing I find problematic with this "defence" is that both instances are ostensibly examples of clear wrongdoing, and pointing out that the second thing happened doesn't make the first thing any less wrong. It also makes the assumption that (A) is okay with the second thing, when they haven't voiced any actual opinion on it yet, and could very well be willing to condemn it just as much.

Your examples are somewhat different in the sense that rather than referring to actions that some loosely related third parties were responsible for, the actions in question are directly committed by (A) and (B) themselves. In that sense, (A) is being hypocritical and probably self-serving. At the same time I don't think that absolves (B) of their actions.

My general sense whenever whataboutism rears its head is to straight up say "a pox on both your houses", rather than trying to defend a side.

Comment by Darklight on A Fraction of Global Market Capitalization as the Best Currency · 2025-04-01T16:21:53.211Z · LW · GW

Ok fair. I was assuming real world conditions rather than the ideal of Dath Ilan. Sorry for the confusion.

Comment by Darklight on A Fraction of Global Market Capitalization as the Best Currency · 2025-03-31T22:44:13.771Z · LW · GW

Why not? Like, the S&P 500 can vary by tens of percent, but as Google suggests, global GDP only fell 3% in 2021, and it usually grows, and the more stocks are distributed, the more stable they are.

Increases in the value of the S&P 500 are basically deflation relative to other units of account. When an asset appreciates in value, that is, when its price goes up, it is deflating relative to the currency the price is denominated in. Like, when the price of bread increases, that means dollars are inflating and bread is deflating. Remember, your currency is based on a percentage of global market cap. Assuming economic growth increases global market cap, the value of this currency will increase, which is to say it will deflate.

Remember, inflation is, by definition, the reduction in the purchasing power of a currency. It is the opposite of that thing increasing in value.

If you imagine that the world's capitalization was once measured in dollars, but then converted to "0 to 1" proportionally to dollars, and everyone used that system, and there is no money printing anymore, what would be wrong with that?

Then you would effectively be using dollars as your currency, as your proposed currency is pegged to the dollar. And since you stopped printing dollars, your currency is going to deflate as a fixed supply of dollars chases an ever-growing pool of goods and services.

As you are no longer printing dollars or increasing the supply of your new currency, the only way for it to stop deflating is for economic growth to stop. You'll run into problems like deflationary spirals and liquidity traps.
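
To illustrate the mechanism with a toy example (this leans on a crude quantity-theory-of-money framing, MV = PQ with velocity held constant, which is an assumption I'm adding, not something from your proposal): freeze the money supply, let real output grow a few percent a year, and the price level has to fall by roughly the growth rate every year.

```python
money_supply = 1.0    # frozen supply of the new currency (arbitrary units)
velocity = 1.0        # assumed constant for the toy example
real_output = 100.0   # real goods and services in year 0
growth_rate = 0.03    # 3% real growth per year

for year in range(6):
    # Quantity-theory identity: M * V = P * Q, so P = M * V / Q
    price_level = money_supply * velocity / real_output
    print(f"year {year}: price level {price_level:.5f}")
    real_output *= 1 + growth_rate
```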

It might seem like deflation would make you hold off on buying, but not if you thought you could get more out of buying than from your money passively growing by a few percent a year, and in that case, you would reasonably buy it.

Deflation means you'd be able to buy things later at a lower price than if you bought them now. People would be incentivised to hold off on anything they didn't need right away. This is why deflation causes hoarding, and why economists try to avoid deflation whenever possible.

Deflation is what deflationary cryptocurrencies like Bitcoin currently do. This leads to Bitcoin being used as a speculative investment instead of as a medium of exchange. Your currency would have the same problem.

Comment by Darklight on A Fraction of Global Market Capitalization as the Best Currency · 2025-03-31T17:56:42.763Z · LW · GW

I guess I'm just not sure you could trade in "hundred-trillionths of global market cap". Like, fractions of a thing assume there is still an underlying quantity or unit of measure that the fraction is a subcomponent of. If you were to range it from 0 to 1, you'd still need a way to convert a 0.0001% into a quantity of something, whether it's gold or grain or share certificates or whatever.

I can sort of imagine a fractional-share-of-global-market-cap currency coming into existence alongside other currencies that it can be exchanged for, but if all traditional currencies then vanished, I think it would be hard to evaluate what the fractions were actually worth.

It's like saying I have 2.4% of gold. What does that mean? How much gold is that? If it's a percentage of all the gold that exists in the market, then you'd be able to convert that into kilograms of gold, because all the gold in the world is a physical quantity you can measure. And then you'd be able to exchange the kilograms with other things.

0.0001% of global market cap, similarly, should be able to be represented as an equivalent physical quantity of some kind, and if you can do that, then why not just use that physical quantity as your currency instead?

For instance, you could, at a given moment in time, take that fraction to represent a percentage of all shares outstanding of all companies in the world. Then you could create a currency based on an aggregated "share of all shares" so to speak. But then the value of that share would be pegged to that number of shares rather than the actual capitalization, which fluctuates depending on an aggregate of share prices. So, in practice, your fraction of global market cap can't be pegged to a fixed number of shares.

Also, fractions assume zero-sum transactions. If you have 0.0001% and get an additional 0.0001% to make 0.0002%, you must take that 0.0001% from someone else. There is no way to increase the money supply. Assuming some people hoard their fractions, the effective amount in circulation can only decrease over time, leading to effective deflation.

The value of each fraction, assuming there is some way to account for it, would also increase over time as the global economy grows. Thus, relative to other things, a fraction will become more valuable, which is also effectively deflation.

With this many causes of deflation, it seems like the currency would become something people hoard further as a form of speculation, again assuming there are still other things that can be exchanged for it, like commodities, even if other currencies no longer exist.

My understanding is that a good currency is stable and doesn't fluctuate too quickly. Modern economists prefer a slight inflation rate of around 2% a year. This currency would not be able to do that at all, and would not work well as a medium of exchange.

And keep in mind, you can't really make all the other currencies go away completely. Gold is a commodity currency that people would try to price your global market cap currency with. You'd have to outlaw gold or remove it all from everywhere and that doesn't seem realistic.

Comment by Darklight on A Fraction of Global Market Capitalization as the Best Currency · 2025-03-31T14:46:31.953Z · LW · GW

The idea of labour hours as a unit of account isn't that new. Labour vouchers were actually tried by some utopian anarchists in the 1800s and early experiments like the Cincinnati Time Store were modestly successful. The basic idea is not to track subjective exchange values but instead a more objective kind of value, the value of labour, or a person's time, with the basic assumption that each person's time should be equally valuable. Basically, it goes back to Smith and Ricardo and the Labour Theory of Value that was popular in classical economics before marginalism took hold.

As for your proposal, I'm having a hard time understanding how you'd price the value of market capitalization without some other currency already in place. Like, how would you sell the shares in the first place? Would you use the number of shares of various companies as units of account? Wouldn't that eventually lead to some particular company's shares becoming the hardest currency, and effectively replicating money, except now tied to the successes and failures of a particular company instead of a country like with current fiat currencies?

Or maybe your currency is a basket of one share of every company in the world? I'm not sure I understand how else you'd be able to represent a fraction of global market cap without otherwise resorting to some other currency to value it. There's a reason market cap is usually denominated in something like USD or whatever the local currency is where the stock exchange is located.

You mention something about your currency effectively representing goods and services actually generated in the economy, but that seems like a different notion than market cap. Market cap can, in practice, swing wildly on the irrational exuberance and fear of stockholders. I'm not sure -that- is what you should base your unit of account on. As for goods and services, GDP is calculated in existing currencies like the USD. This is for the convenience of having a common way to compare different goods and services; otherwise you'd have to represent all the possible exchange values in-kind, like a unit of iron ore being worth x units of wheat, which is convoluted and unwieldy. Soviet-style central planning tried this kind of thing and it didn't go over well.

So, my impression is that you may want to look more into how money actually works, because it seems like this proposal doesn't quite make sense. I am admittedly not an economist though, so I may just be confused. Feel free to clarify.

Comment by Darklight on Non-Consensual Consent: The Performance of Choice in a Coercive World · 2025-03-25T15:13:45.979Z · LW · GW

This put into well-written words a lot of thoughts I've had in the past but never been able to properly articulate. Thank you for writing this.

Comment by Darklight on Oppression and production are competing explanations for wealth inequality. · 2025-01-05T19:17:11.013Z · LW · GW

This sounds rather like the competing political economic theories of classical liberalism and Marxism to me. Both of these intellectual traditions carry a lot of complicated baggage that can be hard to disentangle from the underlying principles, but you seem to have done a pretty good job of distilling the relevant ideas in a relatively apolitical manner.

That being said, I don't think it's necessary for these two explanations for wealth inequality to be mutually exclusive. Some wealth could be accumulated through "the means of production" as you call it, or (as I'd rather describe it to avoid confusing it with the classical economic and Marxist meaning) "making useful things for others and getting fair value in exchange".

Other wealth could also, at the same time, be accumulated through exploitation, such as taking advantage of differing degrees of bargaining power to extract value from the worker for less than it should be worth if we were being fair and maybe paying people with something like labour vouchers or a similar time-based accounting. Or stealing through fraudulent financial transactions, or charging rents for things that you just happen to own because your ancestors conquered the land centuries ago with swords.

Both of these things can be true at the same time within an economy. For that matter, the same individual could be doing both in various ways: they could be ostensibly investing and building companies that make valuable things for people, while at the same time exploiting their workers and taking advantage of their historical position as the descendant of landed aristocracy. They could, at the same time, also be scamming their venture capitalists by wildly exaggerating what their company can do. All while still providing goods and services that meet many people's needs, in ways that are more efficient than most possible alternatives, and perhaps the best way possible given the incentives that currently exist.

Things like this tend to be multifaceted and complex. People in general can have competing motivations within themselves, so it would not be strange to expect that in something as convoluted as a society's economy, there could be many reasons for many things. Trying to decide between two possible theories of why misses the possibility that both theories contain their own grain of truth, and are each, by themselves, incomplete understandings and world models. The world is not just black or white. It's many shades of grey, and also, to push the metaphor further, a myriad of colours that can't accurately be described in greyscale.

Comment by Darklight on RohanS's Shortform · 2025-01-04T21:01:43.292Z · LW · GW

Another thought I just had was, could it be that ChatGPT, because it's trained to be such a people pleaser, is losing intentionally to make the user happy?

Have you tried telling it to actually try to win? Probably won't make a difference, but it seems like a really easy thing to rule out.

Comment by Darklight on RohanS's Shortform · 2025-01-04T20:55:55.861Z · LW · GW

Also, quickly looking into how LLM token sampling works nowadays, you may also need to set the parameters top_p to 0, and top_k to 1 to get it to actually function like argmax. Looks like these can only be set through the API if you're using ChatGPT or similar proprietary LLMs. Maybe I'll try experimenting with this when I find the time, if nothing else to rule out the possibility of such a seemingly obvious thing being missed.
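
If anyone wants to try this locally with an open model instead, here's a minimal sketch using Hugging Face transformers (my assumption is that do_sample=False, i.e. greedy decoding, is the cleanest way to get true argmax behaviour, rather than fiddling with temperature/top_p/top_k):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The next move in this chess position is", return_tensors="pt")

# Greedy decoding: always take the argmax token instead of sampling
# from the softmax distribution (temperature/top_p/top_k never come into play).
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```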

Comment by Darklight on RohanS's Shortform · 2025-01-04T18:37:12.601Z · LW · GW

I've always wondered with these kinds of weird apparent trivial flaws in LLM behaviour if it doesn't have something to do with the way the next token is usually randomly sampled from the softmax multinomial distribution rather than taking the argmax (most likely) of the probabilities. Does anyone know if reducing the temperature parameter to zero so that it's effectively the argmax changes things like this at all?

Comment by Darklight on Darklight's Shortform · 2024-10-20T16:53:31.926Z · LW · GW

p = (n^c * (c + 1)) / (2^c * n)

As far as I know, this is unpublished in the literature. It's a pretty obscure use case, so that's not surprising. I have doubts I'll ever get around to publishing the paper I wanted to write that uses this in an activation function to replace softmax in neural nets, so it probably doesn't matter much if I show it here.

Comment by Darklight on Darklight's Shortform · 2024-10-20T15:59:38.001Z · LW · GW

So, my main idea is that the principle of maximum entropy, aka the principle of indifference, suggests a prior of 1/n where n is the number of possibilities or classes. The naive linear mapping p = (c + 1) / 2 (equivalently c = 2p - 1) leads to p = 0.5 for c = 0. What I want is for c = 0 to lead to p = 1/n rather than 0.5, so that it works in the multiclass cases where n is greater than 2.

Comment by Darklight on Darklight's Shortform · 2024-10-20T13:34:52.624Z · LW · GW

Correlation space is between -1 and 1, with 1 being the same (definitely true), -1 being the opposite (definitely false), and 0 being orthogonal (very uncertain). I had the idea that you could assume maximum uncertainty to be 0 in correlation space, and 1/n (the uniform distribution) in probability space.

Comment by Darklight on Darklight's Shortform · 2024-10-19T21:30:02.244Z · LW · GW

I tried asking ChatGPT, Gemini, and Claude to come up with a formula that converts from correlation space to probability space while preserving the relationship 0 = 1/n. I came up with such a formula a while back, so I figured it shouldn't be hard. They all offered formulas, all of which were shown to be very much wrong when I actually graphed them to check.

Comment by Darklight on Darklight's Shortform · 2024-10-04T16:31:38.396Z · LW · GW

I was not aware of these. Thanks!

Comment by Darklight on Darklight's Shortform · 2024-10-04T16:31:18.471Z · LW · GW

Thanks for the clarifications. My naive estimate is obviously just a simplistic ballpark figure using some rough approximations, so I appreciate adding some precision.

Comment by Darklight on Darklight's Shortform · 2024-10-03T15:28:00.255Z · LW · GW

Also, even if we can train and run a model the size of the human brain, it would still be many orders of magnitude less energy efficient than an actual brain. Human brains use barely 20 watts. This hypothetical GPU brain would require enormous data centres' worth of power, with each H100 GPU using 700 watts alone.

Comment by Darklight on Darklight's Shortform · 2024-10-03T15:04:13.753Z · LW · GW

I've been looking at the numbers with regards to how many GPUs it would take to train a model with as many parameters as the human brain has synapses. The human brain has 100 trillion synapses, and they are sparse and very efficiently connected. A regular AI model fully connects every neuron in a given layer to every neuron in the previous layer, so that would be less efficient.

The average H100 has 80 GB of VRAM, so assuming each parameter is 32 bits, you can fit about 20 billion parameters per GPU. So you'd need roughly 5,000 GPUs just to fit a single instance of a human-brain-sized parameter set in memory. If you assume inefficiencies, and that optimizer state and data need to be in memory as well, you could ballpark an order of magnitude more, so something like 50,000 to 100,000 might be needed.

For comparison, it's widely believed that OpenAI trained GPT-4 on about 10,000 A100s that Microsoft let them use from their Azure supercomputer, most likely the one listed as third most powerful in the world by the Top500 list.

Recently though, Microsoft and Meta have both moved to acquire more GPUs that put them in the 100,000 range, and Elon Musk's X.ai recently managed to get a 100,000 H100 GPU supercomputer online in Memphis.

So, in theory at least, we are nearly at the point where they can train a human-brain-sized model in terms of memory. However, keep in mind that training such a model would take a ton of compute time. I haven't done the calculations yet for FLOPS, so I don't know if it's feasible yet.

Just some quick back of the envelope analysis.
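
Spelling that arithmetic out (same rough assumptions as above: one parameter per synapse, fp32 parameters, 80 GB per H100, and a ballpark 10x overhead for optimizer state, activations, and data):

```python
synapses = 100e12        # ~100 trillion synapses in the human brain
bytes_per_param = 4      # fp32
vram_per_gpu = 80e9      # H100: 80 GB

params_per_gpu = vram_per_gpu / bytes_per_param   # ~20 billion parameters
gpus_for_weights = synapses / params_per_gpu      # ~5,000 GPUs just for the weights
gpus_with_overhead = gpus_for_weights * 10        # ballpark 10x for optimizer state, activations, data

watts_per_gpu = 700
brain_watts = 20
power_ratio = gpus_with_overhead * watts_per_gpu / brain_watts

print(f"parameters per GPU:       {params_per_gpu:.2e}")
print(f"GPUs to hold the weights: {gpus_for_weights:,.0f}")
print(f"GPUs with ~10x overhead:  {gpus_with_overhead:,.0f}")
print(f"power vs. a 20 W brain:   ~{power_ratio:,.0f}x")
```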

Comment by Darklight on Darklight's Shortform · 2024-10-03T13:55:55.913Z · LW · GW

I ran out of the usage limit for GPT-4o (seems to just be 10 prompts every 5 hours) and it switched to GPT-4o-mini. I tried asking it the Alpha Omega question and it made some math nonsense up, so it seems like the model matters for this for some reason.

Comment by Darklight on Darklight's Shortform · 2024-09-21T00:34:24.644Z · LW · GW

So, a while back I came up with an obscure idea I called the Alpha Omega Theorem and posted it on the Less Wrong forums. Given how there's only one post about it, it shouldn't be something that LLMs would know about. So in the past, I'd ask them "What is the Alpha Omega Theorem?", and they'd always make up some nonsense about a mathematical theory that doesn't actually exist. More recently, Google Gemini and Microsoft Bing Chat would use search to find my post and use that as the basis for their explanation. However, I only have the free version of ChatGPT and Claude, so they don't have access to the Internet and would make stuff up.

A couple days ago I tried the question on ChatGPT again, and GPT-4o managed to correctly say that there isn't a widely known concept of that name in math or science, and basically said it didn't know. Claude still makes up a nonsensical math theory. I also today tried telling Google Gemini not to use search, and it also said it did not know rather than making stuff up.

I'm actually pretty surprised by this. Looks like OpenAI and Google figured out how to reduce hallucinations somehow.

Comment by Darklight on Darklight's Shortform · 2024-05-24T15:18:42.144Z · LW · GW

I'm wondering what people's opinions are on how urgent alignment work is. I'm a former ML scientist who previously worked at Maluuba and Huawei Canada, but switched industries into game development, at least in part to avoid contributing to AI capabilities research. I tried earlier to interview with FAR and Generally Intelligent, but didn't get in. I've also done some cursory independent AI safety research in interpretability and game-theoretic ideas in my spare time, though nothing interesting enough to publish yet.

My wife also recently had a baby, and caring for him is a substantial time sink, especially for the next year until daycare starts. Is it worth considering things like hiring a nanny, if it'll free me up to actually do more AI safety research? I'm uncertain if I can realistically contribute to the field, but I also feel like AGI could potentially be coming very soon, and maybe I should make the effort just in case it makes some meaningful difference.

Comment by Darklight on Open Thread Spring 2024 · 2024-05-10T20:27:07.955Z · LW · GW

Thanks for the reply!

So, the main issue I'm finding with putting them all into one proposal is that there's a 1000 character limit on the main summary section where you describe the project, and I cannot figure out how to cram multiple ideas into that 1000 characters without seriously compromising the quality of my explanations for each.

I'm not sure if exceeding that character limit will get my proposal thrown out without being looked at though, so I hesitate to try that. Any thoughts?

Comment by Darklight on Cooperation is optimal, with weaker agents too  -  tldr · 2024-05-08T20:32:25.607Z · LW · GW

I already tried discussing a very similar concept I call Superrational Signalling in this post. It got almost no attention, and I have doubts that Less Wrong is receptive to such ideas.

I also tried actually programming a Game Theoretic simulation to try to test the idea, which you can find here, along with code and explanation. Haven't gotten around to making a full post about it though (just a shortform).

Comment by Darklight on Open Thread Spring 2024 · 2024-04-30T14:25:37.943Z · LW · GW

So, I have three very distinct ideas for projects that I'm thinking about applying to the Long Term Future Fund for. Does anyone happen to know if it's better to try to fit them all into one application, or split them into three separate applications?

Comment by Darklight on Darklight's Shortform · 2024-03-10T19:01:44.156Z · LW · GW

Recently I tried out an experiment using the code from the Geometry of Truth paper to try to see if using simple label words like "true" and "false" could substitute for the datasets used to create truth probes. I also tried out a truth probe algorithm based on classifying with the higher cosine similarity to the mean vectors.

Initial results seemed to suggest that the label word vectors were sorta acceptable, albeit not nearly as good (around 70% accurate rather than 95%+ like with the datasets). However, testing on harder test sets showed much worse accuracy (sometimes below chance, somehow). So I can probably conclude that the label word vectors alone aren't sufficient for a good truth probe.

Interestingly, the cosine similarity approach worked almost identically to the mass-mean (aka difference-in-means) approach used in the paper. Unlike the mass-mean approach though, the cosine similarity approach can be extended to a multi-class situation. Though logistic regression can also be extended similarly, so it may not be particularly useful either, and I'm not sure there's even a use case for a multi-class probe.
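
For concreteness, here's roughly what I mean by the cosine similarity probe, sketched with random arrays standing in for the activations (this is illustrative, not the actual Geometry of Truth code): compute a mean activation vector per class, then classify each new activation by whichever class mean it is most cosine-similar to.

```python
import numpy as np

def fit_class_means(activations, labels):
    # activations: (num_examples, hidden_dim); labels: integer class ids
    classes = np.unique(labels)
    means = np.stack([activations[labels == c].mean(axis=0) for c in classes])
    return classes, means

def cosine_probe_predict(activations, classes, means):
    # Normalize rows, then pick the class whose mean vector has the
    # highest cosine similarity with each activation.
    a = activations / np.linalg.norm(activations, axis=1, keepdims=True)
    m = means / np.linalg.norm(means, axis=1, keepdims=True)
    return classes[np.argmax(a @ m.T, axis=1)]

# Toy usage with random data standing in for residual-stream activations
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 2, size=200)
classes, means = fit_class_means(X, y)
predictions = cosine_probe_predict(X, classes, means)
print("accuracy on toy data:", (predictions == y).mean())
```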

Anyways, I just thought I'd write up the results here in the unlikely event someone finds this kind of negative result as useful information.

Comment by Darklight on Darklight's Shortform · 2024-01-28T23:19:06.479Z · LW · GW

Update: I made an interactive webpage where you can run the simulation and experiment with a different payoff matrix and changes to various other parameters.

Comment by Darklight on Darklight's Shortform · 2024-01-22T15:25:24.395Z · LW · GW

So, I adjusted the aggressor system to work like alliances or defensive pacts instead of a universal memory tag. Basically, players now make allies when they both cooperate and aren't already enemies, and make enemies when defected against first, which sets all their allies to also consider the defector an enemy. This doesn't change the result much. The alliance of nice strategies still wins the vast majority of the time.

I also tried out false flag scenarios where, 50% of the time, the victim of a first defection against a non-enemy will actually be mistaken for the attacker. This has a small effect. There is a slight increase in the probability of an Opportunist strategy winning, but most of the time the alliance of nice strategies still wins, albeit with slightly fewer survivors on average.

My guess for why this happens is that nasty strategies rarely stay in alliances very long because they usually attack a fellow member at some point, and eventually, after sufficient rounds one of their false flag attempts will fail and they will inevitably be kicked from the alliance and be retaliated against.

The real world implications of this remain that it appears that your best bet of surviving in the long run as a person or civilization is to play a nice strategy, because if you play a nasty strategy, you are much less likely to survive in the long run.

In the limit, if the nasty strategies win, there will only be one survivor, dog-eat-dog Highlander style, and your odds of being that winner are 1/N, where N is the number of players. On the other hand, if you play a nice strategy, you increase the strength of the nice alliance, and when the nice alliance wins, as it usually does, you're much more likely to be a survivor and have flourished together.

My simulation currently by default has 150 players, 60 of which are nice. On average about 15 of these survive to round 200, which is a 25% survival rate. This seems bad, but the survival rate of nasty strategies is less than 1%. If I switch the model to use 50 Avengers and 50 Opportunists, on average about 25 Avengers survive versus zero Opportunists, a 50% survival rate for the Avengers.

Thus, increasing the proportion of starting nice players increases the odds of nice players surviving, so there is an incentive to play nice.

Comment by Darklight on Darklight's Shortform · 2024-01-15T22:28:01.098Z · LW · GW

Admittedly this is a fairly simple setup without things like uncertainty and mistakes, so yes, it may not really apply to the real world. I just find it interesting that it implies that strong coordinated retribution can, at least in this toy setup, be useful for shaping the environment into one where cooperation thrives, even after accounting for power differentials and the ability to kill opponents outright, which otherwise change the game enough that straight Tit-For-Tat doesn't automatically dominate.

It's possible there are some situations where this may resemble the real world. Like, if you ignore mere accusations and focus on just actual clear cut cases where you know the aggression has occurred, such as with countries and wars, it seems to resemble how alliances form and retaliation occurs when anybody in the alliance is attacked?

I personally also see it as relevant for something like hypothetical powerful alien AGIs that can see everything that happens from space, and so there could be some kind of advanced game theoretic coordination at a distance with this. Though that admittedly is highly speculative.

It would be nice though if there was a reason to be cooperative even to weaker entities as that would imply that AGI could possibly have game theoretic reasons not to destroy us.

Comment by Darklight on Darklight's Shortform · 2024-01-15T17:58:33.385Z · LW · GW

Okay, so I decided to do an experiment in Python code where I modify the Iterated Prisoner's Dilemma to include Death, Asymmetric Power, and Aggressor Reputation, and run simulations to test how different strategies do. Basically, each player can now die if their points fall to zero or below, and the payoff matrix uses their points as a variable such that there is a power difference that affects what happens. Also, if a player defects first in any round of any match against a non-aggressor, they get the aggressor label, which matters for some strategies that target aggressors.

Long story short, there's a particular strategy I call Avenger, which is Grim Trigger but also retaliates against aggressors (even if the aggression was against a different player) that ensures that the cooperative strategies (ones that never defect first against a non-aggressor) win if the game goes enough rounds. Without Avenger though, there's a chance that a single Opportunist strategy player wins instead. Opportunist will Defect when stronger and play Tit-For-Tat otherwise.

I feel like this has interesting real world implications.

Interestingly, Enforcer, which is Tit-For-Tat but also opens with Defect against aggressors, is not enough to ensure the cooperative strategies always win. For some reason you need Avenger in the mix.
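
The full code is linked in the edit below, but to give a flavour of what these strategies look like, here's a rough sketch of the decision rules (the exact payoff scaling and bookkeeping in my actual simulation differ, so treat the details here as illustrative):

```python
def avenger(opp_history, opp_is_aggressor):
    # Grim Trigger, plus permanent hostility toward anyone tagged as an
    # aggressor, even if their aggression was against a different player.
    if opp_is_aggressor or "defect" in opp_history:
        return "defect"
    return "cooperate"

def enforcer(opp_history, opp_is_aggressor):
    # Tit-For-Tat, but opens with defection against known aggressors.
    if not opp_history:
        return "defect" if opp_is_aggressor else "cooperate"
    return opp_history[-1]

def opportunist(opp_history, my_points, opp_points):
    # Defects whenever it is the stronger party; otherwise plays Tit-For-Tat.
    if my_points > opp_points:
        return "defect"
    return opp_history[-1] if opp_history else "cooperate"

# Example: an Avenger facing a known aggressor defects immediately
print(avenger([], opp_is_aggressor=True))  # "defect"
```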

Edit: In case anyone wants the code, it's here.

Comment by Darklight on Darklight's Shortform · 2024-01-15T17:57:03.801Z · LW · GW

I was recently trying to figure out a way to calculate my P(Doom) using math. I initially tried just making a back of the envelope calculation by making a list of For and Against arguments and then dividing the number of For arguments by the total number of arguments. This led to a P(Doom) of 55%, which later got revised to 40% when I added more Against arguments. I also looked into using Bayes Theorem and actual probability calculations, but determining P(E | H) and P(E) to input into P(H | E) = P(E | H) * P(H) / P(E) is surprisingly hard and confusing.

Comment by Darklight on Apologizing is a Core Rationalist Skill · 2024-01-02T20:45:17.226Z · LW · GW

Minor point, but the apology needs to sound sincere and credible, usually by being specific about the mistakes and concise and to the point and not like, say, Bostrom's defensive apology about the racist email a while back. Otherwise you can instead signal that you are trying to invoke the social API call in a disingenuous way, which can clearly backfire.

Things like "sorry you feel offended" also tend to sound like you're not actually remorseful for your actions and are just trying to elicit the benefits of an apology. None of the apologies you described sound anything like that, but it's a common failure state among the less emotionally mature and the syncophantic.

Comment by Darklight on Darklight's Shortform · 2023-12-27T19:22:00.449Z · LW · GW

I have some ideas and drafts for posts that I've been sitting on because I feel somewhat intimidated by the level of intellectual rigor I would need to put into the final drafts to ensure I'm not downvoted into oblivion (something a younger me experienced in the early days of Less Wrong).

Should I try to overcome this fear, or is it justified?

For instance, I have a draft of a response to Eliezer's List of Lethalities post that I've been sitting on since 2022/04/11 because I doubted it would be well received, given that it tries to be hopeful and, as a former machine learning scientist, I try to challenge a lot of LW orthodoxy about AGI in it. I have tremendous respect for Eliezer though, so I'm also uncertain whether my ideas and arguments aren't just harebrained foolishness that will be shot down rapidly once exposed to the real world and the incisive criticism of Less Wrongers.

The posts here are also now of such high quality that I feel the bar is too high for me to meet with my writing, which tends to be more "interesting train-of-thought in unformatted paragraphs" than the "point-by-point articulate with section titles and footnotes" style that people tend to employ.

Anyone have any thoughts?

Comment by Darklight on Could induced and stabilized hypomania be a desirable mental state? · 2023-06-14T18:35:03.685Z · LW · GW

I would be exceedingly cautious about this line of reasoning. Hypomania tends to not be sustainable, with a tendency to either spiral into a full blown manic episode, or to exhaust itself out and lead to an eventual depressive episode. This seems to have something to do with the characteristics of the thoughts/feelings/beliefs that develop while hypomanic, the cognitive dynamics if you will. You'll tend to become increasingly overconfident and positive to the point that you will either start to lose contact with reality by ignoring evidence to the contrary of what you think is happening (because you feel like everything is awesome so it must be), or reality will hit you hard when the good things that you expect to happen, don't, and you update accordingly (often overcompensating in the process).

In that sense, it's very hard to stay "just" hypomanic. And honestly, to my knowledge, most psychiatrists are more worried about potential manic episodes than anything else in bipolar disorder, and will put you on enough antipsychotics to make you a depressed zombie to prevent them, because generally speaking the full on psychosis level manic episodes are just more dangerous for everyone involved.

Ideally, I think your mood should fit your circumstances. Hypomania often shows up as inappropriately high positive mood even in situations where it makes little sense to be so euphoric, and that should be a clear indicator of why it can be problematic.

It can be tempting to want to stay in some kind of controlled hypomania, but in reality, this isn't something that to my knowledge is doable with our current science and technology, at least for people with actual bipolar disorder. It's arguable that for individuals with normally stable mood, putting them on stimulants could have a similar effect as making them a bit hypomanic (not very confident about this though). Giving people with bipolar disorder stimulants that they don't otherwise need on the other hand is a great way to straight up induce mania, so I definitely wouldn't recommend that.

Comment by Darklight on Yoshua Bengio: How Rogue AIs may Arise · 2023-05-24T17:14:40.810Z · LW · GW

I still remember when I was a masters student presenting a paper at the Canadian Conference on AI 2014 in Montreal and Bengio was also at the conference presenting a tutorial, and during the Q&A afterwards, I asked him a question about AI existential risk. I think I worded it back then as concerned about the possibility of Unfriendly AI or a dangerous optimization algorithm or something like that, as it was after I'd read the sequences but before "existential risk" was popularized as a term. Anyway, he responded by asking jokingly if I was a journalist, and then I vaguely recall him giving a hedged answer about how current AI was still very far away from those kinds of concerns.

It's good to see he's taking these concerns a lot more seriously these days. Between him and Hinton, we have about half of the Godfathers of AI (missing LeCun and Schmidhuber if you count him as one of them) showing seriousness about the issue. With any luck, they'll push at least some of their networks of top ML researchers into AI safety, or at the very least make AI safety more esteemed among the ML research community than before.

Comment by Darklight on How Does the Human Brain Compare to Deep Learning on Sample Efficiency? · 2023-01-15T22:52:01.797Z · LW · GW

The average human lifespan is about 70 years or approximately 2.2 billion seconds. The average human brain contains about 86 billion neurons or roughly 100 trillion synaptic connections. In comparison, something like GPT-3 has 175 billion parameters and 500 billion tokens of data. Assuming very crude weight/synapse and token/second-of-experience equivalences, we can see that the human model's ratio of parameters to data is much greater than GPT-3's, to the point that humans have significantly more parameters than timesteps (100 trillion to 2.2 billion), while GPT-3 has significantly fewer parameters than timesteps (175 billion to 500 billion). Granted, the information gain per timestep is different for the two models, but as I said, these are crude approximations meant to convey the ballpark relative difference.
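
Here's that comparison as explicit arithmetic, using the same crude equivalences:

```python
seconds_per_year = 365.25 * 24 * 3600
human_params = 100e12                     # ~100 trillion synapses
human_timesteps = 70 * seconds_per_year   # ~2.2 billion seconds of experience

gpt3_params = 175e9
gpt3_timesteps = 500e9                    # tokens of training data

print(f"human params per timestep: {human_params / human_timesteps:,.0f}")   # ~45,000
print(f"GPT-3 params per timestep: {gpt3_params / gpt3_timesteps:.2f}")      # ~0.35
```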

This means basically that humans are much more prone to overfitting the data, and in particular, memorizing individual data points. Hence why humans experience episodic memory of unique events. It's not clear that GPT-3 has the capacity in terms of parameters to memorize its training data with that level of clarity, and arguably this is why such models seem less sample efficient. A human can learn from a single example by memorizing it and retrieving it later when relevant. GPT-3 has to see it enough times in the training data for SGD to update the weights sufficiently that the general concept is embedded in the highly compressed information model.

It's thus not certain whether existing ML models are sample inefficient because of the algorithms being used, or because they just don't have enough parameters yet, and increased efficiency will emerge from scaling further.

Comment by Darklight on Darklight's Shortform · 2022-09-05T17:46:56.819Z · LW · GW

I recently interviewed with Epoch, and as part of a paid work trial they wanted me to write up a blog post about something interesting related to machine learning trends. This is what I came up with:

http://www.josephius.com/2022/09/05/energy-efficiency-trends-in-computation-and-long-term-implications/

Comment by Darklight on What does moral progress consist of? · 2022-08-20T22:29:50.447Z · LW · GW

I should point out that the logic of the degrowth movement follows from a relatively straightforward analysis of available resources vs. first world consumption levels.  Our world can only sustain 7 billion human beings because the vast majority of them live not at first world levels of consumption, but third world levels, which many would argue to be unfair and an unsustainable pyramid scheme.  If you work out the numbers, if everyone had the quality of life of a typical American citizen, taking into account things like meat consumption relative to arable land, energy usage, etc., then the Earth would be able to sustain only about 1-3 billion such people.  Degrowth thus follows logically if you believe that all the people around the world should eventually be able to live comfortable, first world lives.

I'll also point out that socialism is, like liberalism, a child of the Enlightenment and general beliefs that reason and science could be used to solve political and economic problems.  Say what you will about the failed socialist experiments of the 20th century, but the idea that government should be able to engineer society to function better than the ad-hoc arrangement that is capitalism, is very much an Enlightenment rationalist, materialist, and positivist position that can be traced to Jean-Jacques Rousseau, Charles Fourier, and other philosophes before Karl Marx came along and made it particularly popular.  Marxism in particular, at least claims to be "scientific socialism", and historically emphasized reason and science, to the extent that most Marxist states were officially atheist (something you might like given your concerns about religions).

In practice, many modern social policies, such as the welfare state, Medicare, public pensions, etc., are heavily influenced by socialist thinking and put in place in part as a response by liberal democracies to the threat of the state socialist model during the Cold War.  No country in the world runs on laissez-faire capitalism, we all utilize mixed market economies with varying degrees of public and private ownership.  The U.S. still has a substantial public sector, just as China, an ostensibly Marxist Leninist society in theory, has a substantial private sector (albeit with public ownership of the "commanding heights" of the economy).  It seems that all societies in the world eventually compromised in similar ways to achieve reasonably functional economies balanced with the need to avoid potential class conflict.  This convergence is probably not accidental.

If you're truly more concerned with truth seeking than tribal affiliations, you should be aware of your own tribe, which as far as I can tell, is western, liberal, and democratic.  Even if you honestly believe in the moral truth of the western liberal democratic intellectual tradition, you should still be aware that it is, in some sense, a tribe.  A very powerful one that is arguably predominant in the world right now, but a tribe nonetheless, with its inherent biases (or priors at least) and propaganda.

Just some thoughts.

Comment by Darklight on Thoughts On Computronium · 2022-06-17T16:29:16.111Z · LW · GW

I'm using the number calculated by Ray Kurzweil for his book, the Age of Spiritual Machines from 1999.  To get that figure, you need 100 billion neurons firing every 5 ms, or 200 Hz.  That is based on the maximum firing rate given refractory periods.  In actuality, average firing rates are usually lower than that, so in all likelihood the difference isn't actually six orders of magnitude.  In particular, I should point out that six orders of magnitude is referring to the difference between this hypothetical maximum firing brain and the most powerful supercomputer, not the most energy efficient supercomputer.

The difference between the hypothetical maximum firing brain and the most energy efficient supercomputer (at 26 GigaFlops/watt) is only three orders of magnitude.  For the average brain firing at the speed that you suggest, it's probably closer to two orders of magnitude.  Which would mean that the average human brain is probably one order of magnitude away from the Landauer limit.

This also assumes that it's neurons and not synapses that should be the relevant multiplier.

Comment by Darklight on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-12T15:08:49.546Z · LW · GW

Okay, so I contacted 80,000 hours, as well as some EA friends for advice.  Still waiting for their replies.

I did hear from an EA who suggested that if I don't work on it, someone else who is less EA-aligned will take the position instead, so in fact, it's slightly net positive for me to be in the industry, although I'm uncertain whether AI capability research is actually funding constrained rather than personnel constrained.

Also, would it be possible to mitigate the net negative by choosing to deliberately avoid capability research and just take an ML engineering job at a lower tier company that is unlikely to develop AGI before others and just work on applying existing ML tech to solving practical problems?

Comment by Darklight on AGI Safety FAQ / all-dumb-questions-allowed thread · 2022-06-09T19:47:29.602Z · LW · GW

I previously worked as a machine learning scientist but left the industry a couple of years ago to explore other career opportunities.  I'm wondering at this point whether or not to consider switching back into the field.  In particular, in case I cannot find work related to AI safety, would working on something related to AI capability be a net positive or net negative impact overall?

Comment by Darklight on Thoughts On Computronium · 2021-03-04T02:27:24.945Z · LW · GW

Even further research shows the most recent Nvidia RTX 3090 is actually slightly more efficient than the 1660 Ti, at 36 TeraFlops, 350 watts, and 2.2 kg, which works out to 0.0001 PetaFlops/Watt and 0.016 PetaFlops/kg.  Once again, they're within an order of magnitude of the supercomputers.

Comment by Darklight on Thoughts On Computronium · 2021-03-04T01:56:23.595Z · LW · GW

So, I did some more research, and the general view is that GPUs are more power efficient in terms of Flops/watt than CPUs, and the most power efficient of those right now is the Nvidia 1660 Ti, which comes to 11 TeraFlops at 120 watts, so 0.000092 PetaFlops/Watt, which is about 6x more efficient than Fugaku.  It also weighs about 0.87 kg, which works out to 0.0126 PetaFlops/kg, which is about 7x more efficient than Fugaku.  These numbers are still within an order of magnitude, and also don't take into account the overhead costs of things like cooling, case, and CPU/memory required to coordinate the GPUs in the server rack that one would assume you would need.
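
The arithmetic behind those figures (and the RTX 3090 ones from the comment above), in case anyone wants to plug in different cards:

```python
cards = {
    # name: (TeraFlops, watts, kg)
    "GTX 1660 Ti": (11, 120, 0.87),
    "RTX 3090":    (36, 350, 2.2),
}

for name, (tflops, watts, kg) in cards.items():
    pflops = tflops / 1000
    print(f"{name}: {pflops / watts:.6f} PetaFlops/Watt, {pflops / kg:.4f} PetaFlops/kg")
```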

I used the supercomputers because the numbers were a bit easier to get from the Top500 and Green500 lists, and I also thought that their numbers include the various overhead costs to run the full system, already packaged into neat figures.

Comment by Darklight on Darklight's Shortform · 2021-02-20T15:09:01.110Z · LW · GW

Another thought is that maybe Less Wrong itself, if it were to expand in size and become large enough to roughly represent humanity, could be used as such a dataset.

Comment by Darklight on Darklight's Shortform · 2021-02-20T15:01:34.314Z · LW · GW

So, I had a thought.  The glory system idea that I posted about earlier, if it leads to a successful, vibrant democratic community forum, could actually serve as a kind of dataset for value learning.  If each post has a number attached to it that indicates the aggregated approval of human beings, this can serve as a rough proxy for a kind of utility or Coherent Aggregated Volition.

Given that individual examples will probably be quite noisy, but averaged across a large amount of posts, it could function as a real world dataset, with the post content being the input, and the post's vote tally being the output label.  You could then train a supervised learning classifier or regressor that could then be used to guide a Friendly AI model, like a trained conscience.

This admittedly would not be provably Friendly, but as a vector of attack for the value learning problem, it is relatively straightforward to implement and probably more feasible in the short-run than anything else I've encountered.
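
As a very rough sketch of what training such a model could look like in practice, here's a toy regressor over (post content, vote tally) pairs using off-the-shelf scikit-learn (the posts and tallies are made up, and this is obviously nowhere near a real value-learning setup):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy stand-ins for (post content, aggregated vote tally) pairs
posts = [
    "Helped a neighbour repair their fence over the weekend",
    "Wrote a detailed explanation of a tricky statistics concept",
    "Insulted another user instead of engaging with their argument",
    "Spammed the same advertisement in five different threads",
]
vote_tallies = [12.0, 25.0, -8.0, -20.0]

# TF-IDF features plus ridge regression as a crude approval predictor
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(posts, vote_tallies)

print(model.predict(["Posted a thoughtful answer to someone's question"]))
```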

Comment by Darklight on The Glory System: A Model For Moral Currency And Distributed Self-Moderation · 2021-02-19T21:18:20.910Z · LW · GW

A further thought is that those with more glory can be seen almost as elected experts.  Their glory is assigned to them by votes after all.  This is an important distinction from an oligarchy.  I would actually be inclined to see the glory system as located on a continuum between direct democracy and representative democracy.

Comment by Darklight on The Glory System: A Model For Moral Currency And Distributed Self-Moderation · 2021-02-19T21:00:22.655Z · LW · GW

So, keep in mind that having the first vote be free and worth double the paid votes does tilt things more towards democracy.  That being said, I am inclined to see glory as a kind of proxy for past agreement and merit, and a rough way to approximate liquid democracy, where you can proxy your vote to others or vote yourself.

In this alternative "market of ideas" the ideas win out because people who others trust to have good opinions are able to leverage that trust.  Decisions over the merit of the given arguments are aggregated by vote.  As long as the population is sufficiently diverse, this should result in an example of the Wisdom of Crowds phenomenon.

I don't think it'll dissolve into a mere flag waving contest, any more than the existing Karma system on Reddit and Less Wrong does already.

Comment by Darklight on The Glory System: A Model For Moral Currency And Distributed Self-Moderation · 2021-02-19T18:25:15.896Z · LW · GW

Perhaps a nitpick detail, but having someone rob them would not be equivalent, because the cost of the action is offset by the ill-gotten gains.  The proposed currency is more directly equivalent to paying someone to break into the target's bank account and destroy their assets by a proportional amount so that no one can use them anymore.

As for the more general concerns:

Standardized laws and rules tend in practice to disproportionately benefit those with the resources to bend and manipulate those rules with lawyers.  Furthermore, this proposal does not need to replace all laws, but can be utilized alongside them as a way for people to show their disapproval in a way that is more effective than verbal insult, and less coercive than physical violence.  I'd consider it a potential way to channel people's anger so that they don't decide to start a revolution against what they see as laws that benefit the rich and powerful.  It is a way to distribute a little power to individuals and allow them to participate in a system that considers their input in a small but meaningful way.

With laws, the rules may be more consistent, but in practice they are also contentious, in the sense that the process of creating them is arcane and complex and the resulting punishments are often delayed for years as they work through the legal system.  Again, this makes sense when determining how the coercive power of the state should be applied, but leaves something to be desired in terms of responsiveness to addressing real world concerns.

Third-party enforcement is certainly desirable.  In practice, the glory system allows anyone outside the two parties to contribute and likely the bulk of votes will come from them.  As for cycles of violence, the exchange rate mechanism means that defence is at least twice as effective as attack with the same amount of currency, which should at least mitigate the cycles because it won't be cost-effective to attack without significant public support.  Though this is only relevant to the forum condition.

In the general condition as a currency, keep in mind that as a currency functions as a store of value, there is a substantial opportunity cost to spending the currency to destroy other people's currency rather than say, using it to accrue interest.  The cycles are in a sense self-limiting because people won't want to spend all their money escalating a conflict that will only cause both sides to hemorrhage funds, unless someone feels so utterly wronged as to be willing to go bankrupt to bankrupt another, in which case, one should honestly be asking what kind of injustice caused this situation to come into being in the first place.

All that being said, I appreciate the critiques.

Comment by Darklight on The Glory System: A Model For Moral Currency And Distributed Self-Moderation · 2021-02-19T17:51:34.274Z · LW · GW

As for the cheaply punishing prolific posters problem, I don't know a good solution that doesn't lead to other problems, as forcing all downvotes to cost glory makes it much harder to deal with spammers who somehow get through the application process filter.  I had considered an alternative system in which all votes cost glory, but then there's no way to generate glory except perhaps by having admins and mods gift them, which could work, but runs counter to the direct democracy ideal that I was sorta going for.