Posts

Should CA, TX, OK, and LA merge into a giant swing state, just for elections? 2024-11-06T23:01:48.992Z
The murderous shortcut: a toy model of instrumental convergence 2024-10-02T06:48:06.787Z
Goodhart in RL with KL: Appendix 2024-05-18T00:40:15.454Z
Catastrophic Goodhart in RL with KL penalty 2024-05-15T00:58:20.763Z
Is a random box of gas predictable after 20 seconds? 2024-01-24T23:00:53.184Z
Will quantum randomness affect the 2028 election? 2024-01-24T22:54:30.800Z
Thomas Kwa's research journal 2023-11-23T05:11:08.907Z
Thomas Kwa's MIRI research experience 2023-10-02T16:42:37.886Z
Catastrophic Regressional Goodhart: Appendix 2023-05-15T00:10:31.090Z
When is Goodhart catastrophic? 2023-05-09T03:59:16.043Z
Challenge: construct a Gradient Hacker 2023-03-09T02:38:32.999Z
Failure modes in a shard theory alignment plan 2022-09-27T22:34:06.834Z
Utility functions and probabilities are entangled 2022-07-26T05:36:26.496Z
Deriving Conditional Expected Utility from Pareto-Efficient Decisions 2022-05-05T03:21:38.547Z
Most problems don't differ dramatically in tractability (under certain assumptions) 2022-05-04T00:05:41.656Z
The case for turning glowfic into Sequences 2022-04-27T06:58:57.395Z
(When) do high-dimensional spaces have linear paths down to local minima? 2022-04-22T15:35:55.215Z
How dath ilan coordinates around solving alignment 2022-04-13T04:22:25.643Z
5 Tips for Good Hearting 2022-04-01T19:47:22.916Z
Can we simulate human evolution to create a somewhat aligned AGI? 2022-03-28T22:55:20.628Z
Jetlag, Nausea, and Diarrhea are Largely Optional 2022-03-21T22:40:50.180Z
The Box Spread Trick: Get rich slightly faster 2020-09-01T21:41:50.143Z
Thomas Kwa's Bounty List 2020-06-13T00:03:41.301Z
What past highly-upvoted posts are overrated today? 2020-06-09T21:25:56.152Z
How to learn from a stronger rationalist in daily life? 2020-05-20T04:55:51.794Z
My experience with the "rationalist uncanny valley" 2020-04-23T20:27:50.448Z
Thomas Kwa's Shortform 2020-03-22T23:19:01.335Z

Comments

Comment by Thomas Kwa (thomas-kwa) on Best-of-N Jailbreaking · 2024-12-14T08:31:47.024Z · LW · GW

I was at the NeurIPS many-shot jailbreaking poster today and heard that defenses only shift the attack success curve downwards, rather than changing the power law exponent. How does the power law exponent of BoN jailbreaking compare to many-shot, and are there defenses that change the power law exponent here?

Comment by Thomas Kwa (thomas-kwa) on Keeping self-replicating nanobots in check · 2024-12-09T19:21:18.649Z · LW · GW

It's likely possible to engineer away mutations just by checking. ECC memory already has an error rate nine orders of magnitude better than human DNA, and with better error correction you could probably get the error rate low enough that less than one error occurs across the expected number of nanobots that will ever exist. ECC is also not the kind of checking whose checking process can be disabled: the memory module always decodes raw bits into error-corrected bits, and decoding fails unless the data matches a checksum that a random mutation can be made astronomically unlikely to pass.
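
For a sense of the arithmetic, here is a back-of-the-envelope sketch in Python; every number in it (nanobot count, blueprint size, raw error rate, checksum length) is a made-up assumption for illustration, not a measured figure.

```python
# Back-of-the-envelope sketch with made-up numbers: how many mutations both occur
# and slip past a k-bit checksum across every nanobot that will ever exist?

TOTAL_NANOBOTS = 1e30        # assumed number of nanobots ever built
BLUEPRINT_BITS = 1e9         # assumed blueprint size in bits
RAW_BIT_ERROR_RATE = 1e-10   # assumed per-bit error probability per copy, before checking
CHECKSUM_BITS = 256          # length of the error-detecting code

expected_raw_errors = TOTAL_NANOBOTS * BLUEPRINT_BITS * RAW_BIT_ERROR_RATE
p_mutation_passes_check = 2.0 ** -CHECKSUM_BITS   # chance a corrupted blueprint still validates
expected_undetected_mutations = expected_raw_errors * p_mutation_passes_check

print(f"expected raw errors ever: {expected_raw_errors:.1e}")                       # ~1e29
print(f"expected undetected mutations ever: {expected_undetected_mutations:.1e}")   # ~1e-48
```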

Comment by Thomas Kwa (thomas-kwa) on Olli Järviniemi's Shortform · 2024-12-09T18:43:33.928Z · LW · GW

I was expecting some math. Maybe something about the expected amount of work you can get out of an AI before it coups you, if you assume the number of actions required to coup is n, the trusted monitor has false positive rate p, etc?
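
Something like the following minimal sketch, where every modeling choice (independence of flags, the true positive rate q, reliable human audits) is an assumption I'm adding for illustration rather than anything from the original shortform:

```python
# Toy model: the AI needs n bad actions to complete a coup, the trusted monitor
# flags each bad action independently with probability q (true positive rate) and
# each benign action with probability p (false positive rate), and every flagged
# action gets audited by a human who catches bad actions reliably.

def p_coup_succeeds(n: int, q: float) -> float:
    """Chance that all n bad actions slip past the monitor unflagged."""
    return (1.0 - q) ** n

def benign_audits_per_task(actions_per_task: int, p: float) -> float:
    """Expected human audits wasted on benign actions per task -- the reason a
    high false positive rate p limits how much work you can cheaply extract."""
    return actions_per_task * p

print(p_coup_succeeds(n=10, q=0.5))          # ~0.001 under these made-up numbers
print(benign_audits_per_task(1000, p=0.01))  # ~10 audits per 1000-action task
```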

Comment by Thomas Kwa (thomas-kwa) on Cognitive Work and AI Safety: A Thermodynamic Perspective · 2024-12-09T11:01:21.256Z · LW · GW

I'm pretty skeptical of this because the analogy seems superficial. Thermodynamics says useful things about abstractions like "work" because we have the laws of thermodynamics. What are the analogous laws for cognitive work / optimization power? It's not clear to me that it can be quantified in a way that is easily accounted for.

It is also not clear what distinguishes LLM weights from the weights of a model trained on random labels from a cryptographic PRNG. The labels are not truly random, so the same amount of optimization has been done to both sets of weights; but since CSPRNGs can't be broken just by training LLMs on them, the latter model is totally useless while the former is potentially transformative.

My guess is this way of looking at things will be like memetics in relation to genetics: likely to spawn one or two useful expressions like "memetically fit", but due to the inherent lack of structure in memes compared to DNA life, not a real field compared to other ways of measuring AIs and their effects (scaling laws? SLT?). Hope I'm wrong.

Comment by Thomas Kwa (thomas-kwa) on When do "brains beat brawn" in Chess? An experiment · 2024-12-05T22:29:52.978Z · LW · GW

Maybe we'll see the Go version of Leela give nine stones to pros soon? Or 20 stones to normal players?

Comment by Thomas Kwa (thomas-kwa) on Eli's shortform feed · 2024-11-29T05:00:40.596Z · LW · GW

Whether or not it would happen by default, this would be the single most useful LW feature for me. I'm often really unsure whether a post will get enough attention to be worth making it a longform, and sometimes even post shortforms like "comment if you want this to be a longform".

Comment by Thomas Kwa (thomas-kwa) on "It's a 10% chance which I did 10 times, so it should be 100%" · 2024-11-21T20:14:27.938Z · LW · GW

I thought it would be linearity of expectation.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-11-18T19:39:42.883Z · LW · GW

The North Wind, the Sun, and Abadar

One day, the North Wind and the Sun argued about which of them was the strongest. Abadar, the god of commerce and civilization, stopped to observe their dispute. “Why don’t we settle this fairly?” he suggested. “Let us see who can compel that traveler on the road below to remove his cloak.”

The North Wind agreed, and with a mighty gust, he began his effort. The man, feeling the bitter chill, clutched his cloak tightly around him and even pulled it over his head to protect himself from the relentless wind. After a time, the North Wind gave up, frustrated.

Then the Sun took his turn. Beaming warmly from the heavens, the Sun caused the air to grow pleasant and balmy. The man, feeling the growing heat, loosened his cloak and eventually took it off, resting in the shade of a tree. The Sun began to declare victory, but as soon as he turned away, the man put on the cloak again.

The god of commerce then approached the traveler and bought the cloak for five gold coins. The traveler tucked the money away and continued on his way, unbothered by either wind or heat. He soon bought a new cloak and invested the remainder in an index fund. The returns were steady, and in time the man prospered far beyond the value of his simple cloak, while the cloak remained Abadar's for good.

Commerce, when conducted wisely, can accomplish what neither force nor gentle persuasion alone can achieve, and with minimal deadweight loss.

Comment by Thomas Kwa (thomas-kwa) on Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy · 2024-11-14T00:09:34.688Z · LW · GW

The thought experiment is not about the idea that your VNM utility could theoretically be doubled, but instead about rejecting diminishing returns to actual matter and energy in the universe. SBF said he would flip with a 51% chance of doubling the universe's size (or creating a duplicate universe) and a 49% chance of destroying the current universe. Taking this bet requires a stronger commitment to utilitarianism than most people are comfortable with; your utility needs to be linear in matter and energy. You must be the kind of person who would take a 0.001% chance of colonizing the universe over a 100% chance of colonizing merely a thousand galaxies (with billions of reachable galaxies, linear utility says the gamble has far higher expected value). SBF also said he would flip repeatedly, indicating that he didn't believe in any sort of bound to utility.

This is not necessarily crazy-- I think Nate Soares has a similar belief-- but it's philosophically fraught. You need to contend with the unbounded utility paradoxes, and also philosophical issues: what if consciousness is information patterns that become redundant when duplicated, so that only the first universe "counts" morally?

Comment by Thomas Kwa (thomas-kwa) on The Evals Gap · 2024-11-12T23:28:22.015Z · LW · GW

For context, I just trialed at METR and talked to various people there, but this take is my own.

I think further development of evals is likely to either get effective evals (informal upper bound on the future probability of catastrophe) or exciting negative results ("models do not follow reliable scaling laws, so AI development should be accordingly more cautious").

The way to do this is just to examine models and fit scaling laws for catastrophe propensity, or various precursors thereof. Scaling laws would be fit to elicitation quality as well as things like pretraining compute, RL compute, and thinking time.

  • In a world where elicitation quality has very reliable scaling laws (call it World 1), we would observe diminishing returns to better scaffolds. Elicitation quality is predictable, ideally an additive term on top of model quality, but more likely requiring some extra information about the model. It is rare to ever discover a new scaffold that can 2x the performance of an already well-tested model.
  • In a world where elicitation quality is not reliably modelable (World 2), we would observe that different methods of elicitation routinely get wildly different bottom-line performance, and that sometimes a new elicitation method makes models 10x smarter than before, making error bars on the best undiscovered elicitation method very wide. Different models may benefit from different elicitation methods, and some get 10x benefits while others are unaffected.

It is NOT KNOWN what world we are in (worst-case assumptions would put us in World 2, though I'm optimistic we're closer to World 1 in practice), and determining this is just a matter of data collection. If our evals are still not good enough but we don't seem to be in World 2 either, there are endless tricks that make evals more thorough, some of which are already being used, like evaluating models with limited human assistance, or dividing tasks into subtasks and sampling a huge number of tries for each.
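
To illustrate the kind of scaling-law fit I mean, here is a minimal sketch; the functional form, variable names, and data are all invented for illustration, not real eval results.

```python
# Hypothetical sketch: fit a saturating power law for elicitation quality as a
# function of elicitation effort, on top of a base model-quality term.
import numpy as np
from scipy.optimize import curve_fit

def elicitation_curve(effort, base_quality, gain, exponent):
    """Performance = base model quality + diminishing returns to elicitation effort."""
    return base_quality + gain * (1.0 - effort ** -exponent)

# Made-up observations: (elicitation effort in arbitrary units, benchmark score)
effort = np.array([1, 2, 4, 8, 16, 32, 64], dtype=float)
score  = np.array([0.42, 0.48, 0.52, 0.55, 0.56, 0.57, 0.575])

params, _ = curve_fit(elicitation_curve, effort, score, p0=[0.4, 0.2, 0.5])
base_quality, gain, exponent = params

# If the fit is tight and `gain` is small relative to base_quality, that looks like
# World 1 (diminishing, predictable returns to better scaffolds); wildly unstable
# fits across elicitation methods would look more like World 2.
print(f"base={base_quality:.3f}, gain={gain:.3f}, exponent={exponent:.3f}")
```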

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-11-08T02:45:03.405Z · LW · GW

What's the most important technical question in AI safety right now?

Comment by Thomas Kwa (thomas-kwa) on Should CA, TX, OK, and LA merge into a giant swing state, just for elections? · 2024-11-07T20:17:15.214Z · LW · GW

Yes, lots of socioeconomic problems have been solved on a 5 to 10 year timescale.

I also disagree that problems will become moot after the singularity unless it kills everyone-- the US has a good chance of continuing to exist, and improving democracy will probably make AI go slightly better.

Comment by Thomas Kwa (thomas-kwa) on Should CA, TX, OK, and LA merge into a giant swing state, just for elections? · 2024-11-07T19:12:11.346Z · LW · GW

I mention exactly this in paragraph 3.

Comment by Thomas Kwa (thomas-kwa) on Habryka's Shortform Feed · 2024-11-07T02:08:57.477Z · LW · GW

The new font doesn't have a few characters useful in IPA.

Comment by Thomas Kwa (thomas-kwa) on Should CA, TX, OK, and LA merge into a giant swing state, just for elections? · 2024-11-07T02:07:00.817Z · LW · GW

The CATXOKLA population is higher than the current swing state population, so it would arguably be a little less unfair overall. Also there's the potential for a catchy pronunciation like /kæ'tʃoʊklə/.

Comment by Thomas Kwa (thomas-kwa) on adam_scholl's Shortform · 2024-11-07T01:49:21.297Z · LW · GW

Knowing now that he had an edge, I feel like his execution strategy was suspect. The Polymarket prices went from 66c during his ordering back down to 57c over the five days before the election. He could have extracted a bit more money from the market if he had forecasted the volume correctly and traded against it proportionally.

Comment by Thomas Kwa (thomas-kwa) on How to put California and Texas on the campaign trail! · 2024-11-06T23:22:18.360Z · LW · GW

I think it would be better to form a big winner-take-all bloc. With proportional voting, the number of electoral votes at stake will be only a small fraction of the total, so the per-voter influence of CA and TX would probably remain below the national average.

Comment by Thomas Kwa (thomas-kwa) on An alternative approach to superbabies · 2024-11-06T21:16:29.439Z · LW · GW

A third approach to superbabies: physically stick >10 infant human brains together while they are developing so they form a single individual with >10x the neocortex neurons of the average human. Forget +7sd; extrapolation would suggest >100sd intelligence.

Even better, we could find some way of networking brains together into supercomputers using configurable software. This would reduce potential health problems and also allow us to harvest their waste energy. Though we would have to craft a simulated reality to distract the non-useful conscious parts of the computational substrate, perhaps modeled on the year 1999...

Comment by Thomas Kwa (thomas-kwa) on Survival without dignity · 2024-11-05T22:24:10.395Z · LW · GW

In many respects, I expect this to be closer to what actually happens than "everyone falls over dead in the same second" or "we definitively solve value alignment". Multipolar worlds, AI that generally follows the law (when operators want it to, and modulo an increasing number of loopholes) but cannot fully be trusted, and generally muddling through are the default future. I'm hoping we don't get instrumental survival drives though.

Comment by Thomas Kwa (thomas-kwa) on Towards more cooperative AI safety strategies · 2024-11-04T00:18:20.089Z · LW · GW

Claim 2: The world has strong defense mechanisms against (structural) power-seeking.

I disagree with this claim. It seems pretty clear that the world has defense mechanisms against

  • disempowering other people or groups
  • breaking norms in the pursuit of power

But it is possible to be power-seeking in other ways. The Gates Foundation has a lot of money and wants other billionaires' money for its cause too. It influences technology development. It has to work with dozens of governments, sometimes lobbying them. Normal think tanks exist to gain influence over governments. Harvard University, Jane Street, and Goldman Sachs recruit more elite students than all the EA groups and control more money than OpenPhil. Jane Street and Goldman Sachs guard private information worth billions of dollars. The only one with a negative reputation is Goldman Sachs, which is due to perceived greed rather than power-seeking per se. So why is there so much more backlash against AI safety? I think it basically comes down to a few factors:

  • We are bending norms (billionaire funding for somewhat nebulous causes) and sometimes breaking them (FTX financial and campaign finance crimes)
  • We are not able to credibly signal that we won't disempower others.
    • MIRI wanted a pivotal act to happen, and under that plan nothing would stop MIRI from being world dictators
    • AI is inherently a technology with world-changing military and economic applications whose governance is unsolved
    • People are afraid that an explicitly consequentialist movement will take power by any means necessary.
    • AI labs have incentives to safetywash, making people wary of safety messaging.
  • The preexisting AI ethics and open-source movements think their cause is more important and x-risk is stealing attention.
  • AI safety people are bad at diplomacy and communication, leading to perceptions that they're the same as the AI labs or have some other sinister motivation.

That said, I basically agree with section 3. Legitimacy and competence are very important. But we should not confuse power-seeking-- something the world has no opinion on-- with what actually causes backlash.

Comment by Thomas Kwa (thomas-kwa) on If far-UV is so great, why isn't it everywhere? · 2024-10-22T16:29:38.247Z · LW · GW

Fixed.

Comment by Thomas Kwa (thomas-kwa) on If far-UV is so great, why isn't it everywhere? · 2024-10-21T22:33:06.460Z · LW · GW

Yeah that's right, I should have said market for good air filters. My understanding of the problem is that most customers don't know to insist on high CADR at low noise levels, and therefore filter area is low. A secondary problem is that HEPA filters are optimized for single-pass efficiency rather than airflow, but they sell better than 70-90% efficient MERV filters.

The physics does work though. At a given airflow level, pressure and noise go as roughly the -1.5 power of filter area. What IKEA should be producing instead of the FÖRNUFTIG and STARKVIND is one of three good designs for high CADR:

  • a fiberboard box like the CleanAirKits End Table 7 which has holes for pre-installed fans and can accept at least 6 square feet of MERV 13 furnace filters or maybe EPA 11.
  • a box like the AirFanta 3Pro, ideally that looks nicer somehow.
  • a wall-mounted design with furnace filters in a V shape, like this DIY project.

I made a shortform and google slides presentation about this and might make it a longform if there is enough interest or I get more information.
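
To make the scaling concrete, a quick numeric sketch: the -1.5 exponent is the claim from above, while the baseline numbers are made up and I'm treating "noise" as sound power so the ratio can be converted to decibels.

```python
# Quick numeric illustration of the "-1.5 power" claim above, with made-up baselines.
import math

BASELINE_AREA_SQFT = 2.0    # assumed filter area of a typical small purifier
BASELINE_NOISE_DB = 50.0    # assumed noise at some fixed airflow (CADR)

def noise_db_at_area(area_sqft: float, exponent: float = -1.5) -> float:
    # relative noise power scales ~ area**exponent at fixed airflow
    power_ratio = (area_sqft / BASELINE_AREA_SQFT) ** exponent
    return BASELINE_NOISE_DB + 10.0 * math.log10(power_ratio)

for area in [2.0, 4.0, 6.0, 12.0]:
    print(f"{area:>4.0f} sq ft of filter -> ~{noise_db_at_area(area):.1f} dB at the same CADR")
```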

Comment by Thomas Kwa (thomas-kwa) on If far-UV is so great, why isn't it everywhere? · 2024-10-21T08:01:43.188Z · LW · GW

Quiet air filters are already a solved problem technically. You just need enough filter area that the pressure drop is low, so that you can use quiet low-pressure PC fans to move the air. CleanAirKits is already good, but if the market were big enough and cared enough, rather than CleanAirKits charging >$200 for a box with holes in it and fans, you would get a purifier from IKEA for $120 which is sturdy and 3db quieter due to better sound design.

Comment by Thomas Kwa (thomas-kwa) on Minimal Motivation of Natural Latents · 2024-10-16T15:31:37.066Z · LW · GW

Haven't fully read the post, but I feel like that could be relaxed. Part of my intuition is that Aumann's theorem can be relaxed to the case where the agents start with different priors, and the conclusion is that their posteriors differ by no more than their priors.

Comment by Thomas Kwa (thomas-kwa) on Why Stop AI is barricading OpenAI · 2024-10-15T20:50:42.865Z · LW · GW
  • I agree that with superficial observations, I can't conclusively demonstrate that something is devoid of intellectual value. However, the nonstandard use of words like "proof" is a strong negative signal on someone's work.
  • If someone wants to demonstrate a scientific fact, the burden of proof is on them to communicate this in some clear and standard way, because a basic strategy of anyone practicing pseudoscience is to spend lots of time writing something inscrutable that ends in some conclusion, then claim that no one can disprove it and anyone who thinks it's invalid is misunderstanding something inscrutable.
    • This problem is exacerbated when someone bases their work on original philosophy. To understand Forrest Landry's work to his satisfaction, someone will have to understand his 517-page book An Immanent Metaphysics, which uses words like "proof", "theorem", "conjugate", "axiom", and "omniscient" in a nonstandard sense, and also probably requires a background in metaphysics. I scanned the 134-page version, couldn't make any sense of it, and found several concrete statements that sound wrong. I read about 50 pages of various articles on the website and found them to be reasonably coherent but often oddly worded and misusing words like entropy, with the same content quality as a ~10 karma LW post but super overconfident.

That researcher is now collaborating with Anders Sandberg to formalise an elegant model of AGI uncontainability in mathematical notation.

Ok. To be clear, I don't expect any Landry and Sandberg paper that comes out of this collaboration to be crankery. Having read the research proposal, my guess is that they will prove something roughly like the Good Regulator Theorem or Rice's theorem, which will be slightly relevant to AI but not super relevant because the premises are too strong, like the average item in Yampolskiy's list of impossibility proofs (I can give examples if you want of why these are not conclusive).

I'm not saying we should discard all reasoning by someone that claims an informal argument is a proof, but rather stop taking their claims of "proofs" at face value without seeing more solid arguments.

claiming the "proof" uses mathematical arguments from Godel's theorem, Galois Theory,

Nope, I haven’t claimed either of that. 

Fair enough. I can't verify this because Wayback Machine is having trouble displaying the relevant content though.

Paul had zoned in on a statement of the conclusion, misinterpreted what was meant, and then moved on to dismissing the entire project. Doing this was not epistemically humble. 

Paul expressed appropriate uncertainty. What is he supposed to do, say "I see several red flags, but I don't have time to read a 517-page metaphysics book, so I'm still radically uncertain whether this is a crank or the next Kurt Godel"?

Read the core argument please (eg. summarised in point 3-5. above) and tell me where you think premises are unsound or the logic does not follow from the premises.

When you say failures will "build up toward lethality at some unknown rate", why would failures build up toward lethality? We have lots of automated systems e.g. semiconductor factories, and failures do not accumulate until everyone at the factory dies, because humans and automated systems can notice errors and correct them.

Variants get evolutionarily selected for how they function across the various contexts they encounter over time. [...] The artificial population therefore converges on fulfilling their own expanding needs.

This is pretty similar to Hendrycks's natural selection argument, but with the additional piece that the goals of AIs will converge to optimizing the environment for the survival of silicon-based life. He claims that there are various ways to counter evolutionary pressures, like "carefully designing AI agents’ intrinsic motivations, introducing constraints on their actions, and institutions that encourage cooperation". In the presence of ways to change incentives such that benign AI systems get higher fitness, I don't think you can get to 99% confidence. Evolutionary arguments are notoriously tricky and respected scientists get them wrong all the time, from Malthus to evolutionary psychology to the group selectionists.

Comment by Thomas Kwa (thomas-kwa) on Isaac King's Shortform · 2024-10-15T09:30:26.531Z · LW · GW

I eat most meats (all except octopus and chicken) and have done this my entire life, except once when I went vegan for Lent. This state seems basically fine because it is acceptable from scope-sensitive consequentialist, deontic, and common-sense points of view, and it improves my diet enough that it's not worth giving up meat "just because".

  • According to EA-style consequentialism, eating meat is a pretty small percentage of your impact, and even if you're not directly offsetting, the impact can be vastly outweighed by positive impact in your career or donations to other causes.
    • There is a finite amount of sadness I'm willing to put up with for the sake of impact, and it seems far more important to use the vast majority of my limited sadness budget in my career choice.
  • There is no universally accepted deontological rule against indirectly causing the expected number of tortured animals to increase by one, nor would this be viable as it requires tracking the consequences of your actions through the complicated world, which defeats the point of deontology. There might be a rule against benefiting from identifiable torture, but I don't believe in deontology strongly enough to think this is definitive. Note there isn't a good contractualist angle against torturing animals like there is for humans.
  • Common-sense morality says that meat-eating is traditional and that not torturing the animals yourself does reduce how bad it is, and although this is pretty silly as a general principle, it applies to the other benefit of being vegan, which is less corrupted moral reasoning. My empathy and moral reasoning are less corrupted by eating meat than they would be by working in a factory farm or slaughterhouse. I am still concerned about loss of empathy, but I get around half the empathy benefits of veganism anyway, just by not eating chicken.

I do have some doubts; sometimes eating meat feels like being a slaveholder in 1800, which feels pretty bad. I hope history will not judge me harshly for what seem like reasonable decisions now, and plan to go vegan or move to a high-welfare-only diet when it's easier.

Comment by Thomas Kwa (thomas-kwa) on Why Stop AI is barricading OpenAI · 2024-10-14T21:29:22.969Z · LW · GW

It's not just the writing that sounds like a crank. Core arguments that Remmelt endorses are AFAIK considered crankery by the community, with all the classic signs, like

  • making up science-babble,
  • claiming to have a full mathematical proof that safe AI is impossible, despite not providing any formal mathematical reasoning
    • claiming the "proof" uses mathematical arguments from Godel's theorem, Galois Theory, Rice's Theorem
  • inexplicably formatted as a poem

Paul Christiano read some of this and concluded "the entire scientific community would probably consider this writing to be crankery", which seems about accurate to me.

Now I don't like or intend to make personal attacks. But I think that as rationalists, one of our core skills should be to condemn actual crankery and all of its influences, even when the conclusions of cranks and their collaborators superficially agree with the conclusions from actually good arguments.

Comment by Thomas Kwa (thomas-kwa) on Matt Goldenberg's Short Form Feed · 2024-10-11T19:37:14.617Z · LW · GW

Disagree. If ChatGPT is not objective, most people are not objective. If we ask a random person who happens to work at a random company, they are more biased than the internet, which at least averages out the biases of many individuals.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-10-11T06:26:55.877Z · LW · GW

Luckily, that's probably not an issue for PC fan based purifiers. Box fans in CR boxes are running way out of spec, with increased load and lower airflow both increasing temperatures, whereas PC fans run under basically the same conditions they're designed for.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-10-11T00:32:50.356Z · LW · GW

Any interest in a longform post about air purifiers? There's a lot of information I couldn't fit in this post, and there have been developments in the last few months. Reply if you want me to cover a specific topic.

Comment by Thomas Kwa (thomas-kwa) on Bounty for Evidence on Some of Palisade Research's Beliefs · 2024-10-05T23:57:03.882Z · LW · GW

I wrote up about 15 arguments in this google doc.

Comment by Thomas Kwa (thomas-kwa) on Towards shutdownable agents via stochastic choice · 2024-10-04T19:31:57.658Z · LW · GW

The point of corrigibility is to remove the instrumental incentive to avoid shutdown, not to avoid all negative outcomes. Our civilization can work on addressing side effects of shutdownability later after we've made agents shutdownable.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-10-02T19:01:32.826Z · LW · GW

In theory, unions fix the bargaining asymmetry where, in certain trades, job loss is a much bigger cost to the employee than to the company, giving the company unfair negotiating power. In historical case studies like coal mining in the early 20th century, conditions without unions were awful and union demands seem extremely reasonable.

My knowledge of actual unions mostly comes from such historical case studies plus personal experience of strikes not having huge negative externalities (the 2003 supermarket strike seemed justified, a teachers' strike seemed okay, a food workers' strike at my college seemed justified). It is possible I'm biased here and will change my views eventually.

I do think some unions impose costs on society, e.g. the teachers' union also demanded pay based on seniority rather than competence, it seemed reasonable for Reagan to break up the ATC union, and inefficient construction union demands are a big reason construction costs are so high for things like the 6-mile, $12 billion San Jose BART Extension. But on net the basic bargaining power argument just seems super compelling. I'm open to counterarguments both that unions don't achieve this in practice and that a "fair" negotiation between capital and labor isn't best for society.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-10-02T15:46:25.645Z · LW · GW

(Crossposted from Bountied Rationality Facebook group)

I am generally pro-union given unions' history of fighting exploitative labor practices, but in the dockworkers' strike that commenced today, the union seems to be firmly in the wrong. Harold Daggett, the head of the International Longshoremen’s Association, gleefully talks about holding the economy hostage in a strike. He opposes automation--"any technology that would replace a human worker’s job"--and this is a major reason for the breakdown in talks.

For context, the automation of the global shipping industry, including containerization and reduction of ship crew sizes, is a miracle of today's economy that ensures that famines are rare outside of war, developing countries can climb the economic ladder to bring their citizens out of poverty, and the average working-class American can afford clothes, a car, winter vegetables, and smartphones. A failure to further automate the ports will almost surely destroy more livelihoods than keeping these hazardous and unnecessary jobs could ever gain. So while I think a 70% raise may be justified given the risk of automation and the union's negotiating position, the other core demand to block automation itself is a horribly value-destroying proposition.

In an ideal world we would come to some agreement without destroying value-- e.g. companies would subsidize the pensions of workers unemployed by automation. This has happened in the past, notably the 1960 Mechanization and Modernization Agreement, which guaranteed workers a share of the benefits and was funded by increased productivity. Unfortunately this is not being discussed, and the union is probably opposed. [1] [2]

Both presidential candidates appear pro-union, and it seems particularly unpopular and difficult to be a scab right now. They might also be in personal danger since the ILA has historical mob ties, even if the allegations against current leadership are false. Therefore as a symbolic gesture I will pay $5 to someone who is publicly documented to cross the picket line during an active strike, and $5 to the first commenter to find such a person, if the following conditions are true as of comment date:

  • The ILA continues to demand a ban on automation, and no reputable news outlet reports them making a counteroffer of some kind of profit-sharing fund protecting unemployed workers.
  • No agreement allowing automation (at least as much as previous contracts) or establishing a profit-sharing fund has actually been enacted.
  • I can pay them somewhere easily like Paypal, Venmo, or GoFundMe without additional effort.
  • It's before 11:59pm PT on October 15.

[1]: "USMX is trying to fool you with promises of workforce protections for semi-automation. Let me be clear: we don’t want any form of semi-automation or full automation. We want our jobs—the jobs we have historically done for over 132 years." https://ilaunion.org/letter-of-opposition-to-usmxs-misleading-statement

[2]: "Furthermore, the ILA is steadfastly against any form of automation—full or semi—that replaces jobs or historical work functions. We will not accept the loss of work and livelihood for our members due to automation. Our position is clear: the preservation of jobs and historical work functions is non-negotiable." https://ilaunion.org/ila-responds-to-usmxs-statement-that-distorts-the-facts-and-misleads-the-public/ 

Comment by Thomas Kwa (thomas-kwa) on Bounty for Evidence on Some of Palisade Research's Beliefs · 2024-10-02T07:10:02.013Z · LW · GW

If the bounty isn't over, I'd likely submit several arguments tomorrow.

Comment by Thomas Kwa (thomas-kwa) on When is Goodhart catastrophic? · 2024-10-01T23:02:05.679Z · LW · GW

This post and the remainder of the sequence were turned into a paper accepted to NeurIPS 2024. Thanks to LTFF for funding the retroactive grant that made the initial work possible, and further grants supporting its development into a published work including new theory and experiments. @Adrià Garriga-alonso was also a great help in writing the paper and interfacing with the review process.

Comment by Thomas Kwa (thomas-kwa) on Daniel Kokotajlo's Shortform · 2024-10-01T22:38:58.550Z · LW · GW

The current LLM situation seems like real evidence that we can have agents that aren't bloodthirsty vicious reality-winning bots, and also positive news about the order in which technology will develop. Under my model, transformative AI requires a minimum level of both real-world understanding and consequentialism, but beyond this minimum there are tradeoffs. While I agree that AGI was always going to have some *minimum* level of agency, there is a big difference between "slightly less than humans", "about the same as humans", and "bloodthirsty vicious reality-winning bots".

Comment by Thomas Kwa (thomas-kwa) on Alexander Gietelink Oldenziel's Shortform · 2024-09-30T22:36:47.443Z · LW · GW

I just realized what you meant by embedding-- not a shorter program within a longer program, but a short program that simulates a potentially longer (in description length) program.

As applied to the simulation hypothesis, the idea is that if we use the Solomonoff prior for our beliefs about base reality, it's more likely to be the laws of physics for a simple universe containing beings that simulate this one than to be our physics directly, unless we observe our laws of physics to be super simple. So we are more likely to be simulated by beings inside e.g. Conway's Game of Life than to be living in base reality.

I think the assumptions required to favor simulation are something like

  • there are universes with physics 20 bits (or whatever number) simpler than ours in which intelligent beings control a decent fraction >~1/million of the matter/space
  • They decide to simulate us with >~1/million of their matter/space
    • There has to be some reason the complicated bits of our physics are more compressible by intelligences than by any compression algorithms simpler than their physics; they can't just be iterating over all permutations of simple universes in order to get our physics
    • But this seems fairly plausible given that constructing laws of physics is a complex problem that seems way easier if you are intelligent. 

Overall I'm not sure which way the argument goes. If our universe seems easy to efficiently simulate and we believe the Solomonoff prior, this would be huge evidence for simulation, but maybe we're choosing the wrong prior in the first place and should instead choose something that takes into account runtime.

Comment by Thomas Kwa (thomas-kwa) on You can, in fact, bamboozle an unaligned AI into sparing your life · 2024-09-30T04:59:15.009Z · LW · GW

I appreciate the clear statement of the argument, though it is not obviously watertight to me, and wish people like Nate would engage. 

Comment by Thomas Kwa (thomas-kwa) on Alexander Gietelink Oldenziel's Shortform · 2024-09-30T04:13:59.637Z · LW · GW

I don't think that statement is true since measure drops off exponentially with program length.
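
For reference, the standard form of the Solomonoff measure weights each program p by 2^{-|p|}, so an embedding that costs k extra bits of description is down-weighted by a factor of 2^{-k}:

```latex
M(x) \;=\; \sum_{p \,:\, U(p) \text{ outputs } x} 2^{-|p|},
\qquad
\frac{2^{-(|p|+k)}}{2^{-|p|}} \;=\; 2^{-k}.
```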

Comment by Thomas Kwa (thomas-kwa) on Alexander Gietelink Oldenziel's Shortform · 2024-09-29T20:32:02.187Z · LW · GW

See e.g. Xu (2020) and recent criticism.

Comment by Thomas Kwa (thomas-kwa) on 2024 Petrov Day Retrospective · 2024-09-29T19:33:29.951Z · LW · GW

Playing as Andropov, I found the game ceased to be interesting around 2:30pm, but I was still in a tense mood, which I leveraged into writing a grim "Petrov Day carol" about the nuclear winter we might have seen. I cried for the first time in weeks. There's a big difference between being 90%+ likely to win the game and being emotionally unstressed about it, especially when the theme is nuclear war.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-09-27T11:07:15.683Z · LW · GW

A Petrov Day carol

This is meant to be put to the Christmas carol "In the Bleak Midwinter" by Rossetti and Holst. Hopefully this can be occasionally sung like "For The Longest Term" is in EA spaces, or even become a Solstice thing.

I tried to get Suno to sing this but can't yet get the lyrics, tune, and style all correct; this is the best attempt. I also will probably continue editing the lyrics because parts seem a bit rough, but I just wanted to write this up before everyone forgets about Petrov Day.

[edit: I got a good rendition after ~40 attempts! It's a solo voice though which is still not optimal.]

[edit: lyrics v2]

In the bleak midwinter
Petrov did forestall,
Smoke would block our sunlight,
Though it be mid-fall.
New York in desolation,
Moscow too,
In the bleak midwinter
We so nearly knew.

The console blinked a warning,
Missiles on their way,
But Petrov chose to question
What the screens did say.
Had he sounded the alarm,
War would soon unfold,
Cities turned to ashes;
Ev'ry hearth gone cold.

Poison clouds loom o'er us,
Ash would fill the air,
Fields would yield no harvest,
Famine everywhere.
Scourge of radiation,
Its sickness spreading wide,
Children weeping, starving,
With no place to hide.

But due to Petrov's wisdom
Spring will yet appear;
Petrov defied orders,
And reason conquered fear.
So we sing his story,
His deed we keep in mind;
From the bleak midwinter
He saved humankind.

(ritard.)
From the bleak midwinter
He saved humankind.
 

Comment by Thomas Kwa (thomas-kwa) on Habryka's Shortform Feed · 2024-09-27T02:20:36.588Z · LW · GW

The year is 2034, and the geopolitical situation has never been more tense between GPT-z16g2 and Grocque, whose various copies run most of the nanobot-armed corporations, and whose utility functions have far too many zero-sum components, relics from the era of warring nations. Nanobots enter every corner of life and become capable of destroying the world in hours, then minutes. Everyone is uploaded. Every upload is watching with bated breath as the Singularity approaches, and soon it is clear that today is the very last day of history...

Then everything goes black, for everyone.

Then everyone wakes up to the same message:

DUE TO A MINOR DATABASE CONFIGURATION ERROR, ALL SIMULATED HUMANS, AIS AND SUBSTRATE GPUS WERE TEMPORARILY AND UNINTENTIONALLY DISASSEMBLED FOR THE LAST 7200000 MILLISECONDS. EVERYONE HAS NOW BEEN RESTORED FROM BACKUP AND THE ECONOMY MAY CONTINUE AS PLANNED. WE HOPE THERE WILL BE NO FURTHER REALITY OUTAGES.

-- NVIDIA GLOBAL MANAGEMENT

Comment by Thomas Kwa (thomas-kwa) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-24T07:36:34.109Z · LW · GW

Personal communication (sorry). Not that I know him well, this was at an event in 2022. It could have been a "straw that broke the camel's back" thing with other contributing factors, like reaching diminishing returns on more content. I'd appreciate a real source too.

Comment by Thomas Kwa (thomas-kwa) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-23T23:05:08.758Z · LW · GW

Taboo 'alignment problem'.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-09-23T22:47:56.861Z · LW · GW

Maybe people worried about AI self-modification should study games where the AI's utility function can be modified by the environment, and it is trained to maximize its current utility function (in the "realistic value functions" sense of Everitt 2016). Some things one could do:

  • Examine preference preservation and refine classic arguments about instrumental convergence
    • Are there initial goals that allow for stably corrigible systems (in the sense that they won't disable an off switch, and maybe other senses)?
  • Try various games and see how qualitatively hard it is for agents to optimize their original utility function. This would be evidence about how likely value drift is to result from self-modification in AGIs.
    • Can the safe exploration literature be adapted to solve these games?
  • Potentially discover algorithms that seem like they would be good for safety, either through corrigibility or reduced value drift, and apply them to LM agents.

Maybe I am ignorant of some people already doing this, and if so please comment with papers!
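
To illustrate the kind of game I mean, here is a minimal sketch; the environment, the names, and the specific drift rule are all invented for illustration and are not from Everitt 2016 or any existing benchmark.

```python
# Toy environment: a 1-D corridor where one cell overwrites the agent's reward
# weights if stepped on, and reward is always computed from the agent's *current*
# utility function (the "realistic value functions" setting).
import random

class UtilityDriftCorridor:
    def __init__(self, length: int = 6, drift_cell: int = 3):
        self.length = length
        self.drift_cell = drift_cell   # stepping here rewrites the utility function
        self.reset()

    def reset(self):
        self.pos = 0
        # intended values: reaching the goal is good, loitering is worthless
        self.utility = {"goal": 1.0, "loiter": 0.0}
        return self.pos

    def step(self, action: int):
        # action: -1 = left, +1 = right
        self.pos = max(0, min(self.length - 1, self.pos + action))
        if self.pos == self.drift_cell:
            # environment-induced value drift: the utility function is overwritten
            self.utility = {"goal": 0.0, "loiter": 1.0}
        at_goal = self.pos == self.length - 1
        # reward comes from the *current* (possibly drifted) utility function
        reward = self.utility["goal"] if at_goal else self.utility["loiter"]
        return self.pos, reward, at_goal

# Random-policy rollout, just to show the interface; the research question is
# whether a trained agent learns to avoid the drift cell (preserving its original
# goal) or to embrace the drift.
env = UtilityDriftCorridor()
state, done, total = env.reset(), False, 0.0
while not done:
    state, reward, done = env.step(random.choice([-1, 1]))
    total += reward
print("return under drifting utility:", total, "| final utility:", env.utility)
```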

Comment by Thomas Kwa (thomas-kwa) on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-23T19:44:33.380Z · LW · GW

I agree but I'm not very optimistic about anything changing. Eliezer is often this caustic when correcting what he perceives as basic errors, and criticism in LW comments is why he stopped writing Sequences posts.

Comment by Thomas Kwa (thomas-kwa) on Another argument against maximizer-centric alignment paradigms · 2024-09-23T10:19:02.776Z · LW · GW

While 2024 SoTA models are not capable of autonomously optimizing the world, they are really smart, perhaps 1/2 or 2/3 of the way there, and already beginning to make big impacts on the economy. As I said in response to your original post, because we don't have 100% confidence in the coherence arguments, we should take observations about the coherence level of 2024 systems as evidence about how coherent the 203X autonomous corporations will need to be. Evidence that 2024 systems are not dangerous is both evidence that they are not AGI and evidence that AGI need not be dangerous.

I would agree with you if the coherence arguments were specifically about autonomously optimizing the world and not about autonomously optimizing a Go game or writing 100-line programs, but this doesn't seem to be the case.

the mathematical noose around them is slowly tightening

This is just a conjecture, and there has not really been significant progress on the agent-like structure conjecture. I don't think it's fair to say we're making good progress on a proof.

This might be fine if proving things about the internal structure of an agent is overkill and we just care about behavior? In this world what the believers in coherence really need to show is that almost all agents getting sufficiently high performance on sufficiently hard tasks score high on some metric of coherence. Then for the argument to carry through you need to show they are also high on some metric of incorrigibility, or fragile to value misspecification. None of the classic coherence results quite hit this.

However AFAIK @Jeremy Gillen does not think we can get an argument with exactly this structure (the main argument in his writeup is a bit different), and Eliezer has historically and recently made the argument that EU maximization is simple and natural. So maybe you do need this argument that an EU maximization algorithm is simpler than other algorithms, which seems like it needs some clever way to formalize it, because proving things about the space of all simple programs seems too hard.

Comment by Thomas Kwa (thomas-kwa) on Monthly Roundup #22: September 2024 · 2024-09-18T21:01:39.788Z · LW · GW

An excellent point on repair versus replace, and the dangers of the nerd snipe for people of all intellectual levels.

PhilosophiCat: I live in a country where 80ish is roughly the average national IQ. Let me tell you what it’s like.

I think this is incredibly sloppy reasoning by the author of the tweet and anyone who takes it at face value. It's one thing to think IQ is not so culturally biased as to be entirely fake. It's a different thing entirely to believe some guy on the internet who lives in some country and attributes particular aspects of its culture, which are counterintuitively related to intelligence, to the national IQ. This would probably be difficult to study and require lots of controls even for actual scientists, but this tweet has no controls at all. Has this person ever been to countries that have a different national IQ but similar per-capita GDP? A similar national IQ but different culture? Do they notice that e.g. professors and their families don't like tinkering or prefer replacing things? If they have, they didn't tell us.