Posts

Is a random box of gas predictable after 20 seconds? 2024-01-24T23:00:53.184Z
Will quantum randomness affect the 2028 election? 2024-01-24T22:54:30.800Z
Thomas Kwa's research journal 2023-11-23T05:11:08.907Z
Thomas Kwa's MIRI research experience 2023-10-02T16:42:37.886Z
Catastrophic Regressional Goodhart: Appendix 2023-05-15T00:10:31.090Z
When is Goodhart catastrophic? 2023-05-09T03:59:16.043Z
Challenge: construct a Gradient Hacker 2023-03-09T02:38:32.999Z
Failure modes in a shard theory alignment plan 2022-09-27T22:34:06.834Z
Utility functions and probabilities are entangled 2022-07-26T05:36:26.496Z
Deriving Conditional Expected Utility from Pareto-Efficient Decisions 2022-05-05T03:21:38.547Z
Most problems don't differ dramatically in tractability (under certain assumptions) 2022-05-04T00:05:41.656Z
The case for turning glowfic into Sequences 2022-04-27T06:58:57.395Z
(When) do high-dimensional spaces have linear paths down to local minima? 2022-04-22T15:35:55.215Z
How dath ilan coordinates around solving alignment 2022-04-13T04:22:25.643Z
5 Tips for Good Hearting 2022-04-01T19:47:22.916Z
Can we simulate human evolution to create a somewhat aligned AGI? 2022-03-28T22:55:20.628Z
Jetlag, Nausea, and Diarrhea are Largely Optional 2022-03-21T22:40:50.180Z
The Box Spread Trick: Get rich slightly faster 2020-09-01T21:41:50.143Z
Thomas Kwa's Bounty List 2020-06-13T00:03:41.301Z
What past highly-upvoted posts are overrated today? 2020-06-09T21:25:56.152Z
How to learn from a stronger rationalist in daily life? 2020-05-20T04:55:51.794Z
My experience with the "rationalist uncanny valley" 2020-04-23T20:27:50.448Z
Thomas Kwa's Shortform 2020-03-22T23:19:01.335Z

Comments

Comment by Thomas Kwa (thomas-kwa) on LLMs seem (relatively) safe · 2024-04-25T23:09:49.095Z · LW · GW

I don't believe that data is limiting because the finite data argument only applies to pretraining. Models can do self-critique or be objectively rated on their ability to perform tasks, and trained via RL. This is how humans learn, so it is possible to be very sample-efficient, and currently a small proportion of training compute is RL.

If the majority of training compute and data are outcome-based RL, it is not clear that the "Playing human roles is pretty human" section holds, because the system is not primarily trained to play human roles.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-04-25T21:45:25.467Z · LW · GW

The cost of goods has the same units as the cost of shipping: $/kg. Comparing the two lets you understand how the economy works, e.g. why construction material sourcing and drink bottling have to be local, but oil tankers exist.

  • An iPhone costs $4,600/kg, about the same as SpaceX charges to launch it to orbit. [1]
  • Beef, copper, and off-season strawberries are $11/kg, about the same as a 75kg person taking a three-hour, 250km Uber ride costing $3/km.
  • Oranges and aluminum are $2-4/kg, about the same as flying them to Antarctica. [2]
  • Rice and crude oil are ~$0.60/kg, about the same as the ~$0.72/kg it costs to ship them 5000km across the US via truck. [3,4] Palm oil, soybean oil, and steel are around this price range, with wheat being cheaper. [3]
  • Coal and iron ore are $0.10/kg, significantly more than the cost of shipping them around the entire world via smallish (Handysize) bulk carriers. Large bulk carriers are another 4x more efficient. [6]
  • Water is very cheap, with tap water at $0.002/kg in NYC. [5] But shipping via tanker is also very cheap, so you can ship it maybe 1000 km before equaling its cost.

It's really impressive that for the price of a winter strawberry, we can ship a strawberry-sized lump of coal around the world 100-400 times.

[1] An iPhone is $4600/kg; large launches sell for $3500/kg, and rideshares for small satellites $6000/kg. Geostationary launches are more expensive, so it's okay for those satellites to cost more than an iPhone per kg, but Starlink wants to be cheaper.

[2] https://fred.stlouisfed.org/series/APU0000711415. Can't find numbers but Antarctica flights cost $1.05/kg in 1996.

[3] https://www.bts.gov/content/average-freight-revenue-ton-mile

[4] https://markets.businessinsider.com/commodities

[5] https://www.statista.com/statistics/1232861/tap-water-prices-in-selected-us-cities/

[6] https://www.researchgate.net/figure/Total-unit-shipping-costs-for-dry-bulk-carrier-ships-per-tkm-EUR-tkm-in-2019_tbl3_351748799
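
As a rough sanity check on the comparisons above, here is a small sketch (my own arithmetic, reusing the approximate figures quoted in the bullets; the exact prices are assumptions taken from the sources above):

```python
# Rough sanity check of the goods-vs-shipping comparisons above.
# All figures are the approximate numbers quoted in the bullets, not fresh data.

# Effective "shipping" costs, converted to $/kg:
uber_person = 250 * 3 / 75      # 250 km at $3/km, moving a 75 kg person
truck_us = 0.72                 # quoted $/kg for ~5000 km of US trucking
launch_to_orbit = 3500          # quoted $/kg for a large SpaceX launch

print(f"Uber 'shipping' a person: ~${uber_person:.0f}/kg (vs. beef at ~$11/kg)")
print(f"Truck across the US:       ${truck_us}/kg (vs. rice at ~$0.60/kg)")
print(f"Launch to orbit:           ${launch_to_orbit}/kg (vs. an iPhone at ~$4600/kg)")
```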

Comment by Thomas Kwa (thomas-kwa) on Examples of Highly Counterfactual Discoveries? · 2024-04-24T02:04:00.729Z · LW · GW

Maybe Galois with group theory? He died in 1832, but his work was only published in 1846, upon which it kicked off the development of group theory, e.g. with Cayley's 1854 paper defining a group. Claude writes that there was not much progress in the intervening years:

The period between Galois' death in 1832 and the publication of his manuscripts in 1846 did see some developments in the theory of permutations and algebraic equations, which were important precursors to group theory. However, there wasn't much direct progress on what we would now recognize as group theory.

Some notable developments in this period:

1. Cauchy's work on permutations in the 1840s further developed the idea of permutation groups, which he had first explored in the 1820s. However, Cauchy did not develop the abstract group concept.

2. Plücker's 1835 work on geometric transformations and his introduction of homogeneous coordinates laid some groundwork for the later application of group theory to geometry.

3. Eisenstein's work on cyclotomy and cubic reciprocity in the 1840s involved ideas related to permutations and roots of unity, which would later be interpreted in terms of group theory.

4. Abel's work on elliptic functions and the insolubility of the quintic equation, while published earlier, continued to be influential in this period and provided important context for Galois' ideas.

However, none of these developments directly anticipated Galois' fundamental insights about the structure of solutions to polynomial equations and the corresponding groups of permutations. The abstract concept of a group and the idea of studying groups in their own right, independent of their application to equations, did not really emerge until after Galois' work became known.

So while the 1832-1846 period saw some important algebraic developments, it seems fair to say that Galois' ideas on group theory were not significantly advanced or paralleled during this time. The relative lack of progress in these 14 years supports the view of Galois' work as a singular and ahead-of-its-time discovery.

Comment by Thomas Kwa (thomas-kwa) on Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI · 2024-04-24T00:25:27.554Z · LW · GW

This is indeed a crux, maybe it's still worth talking about.

Comment by Thomas Kwa (thomas-kwa) on What's with all the bans recently? · 2024-04-21T03:40:40.765Z · LW · GW

I still agree with myself above and think this is a bad moderation decision, although I don't know the full story and don't see you on the moderation log.

Comment by Thomas Kwa (thomas-kwa) on What is the best way to talk about probabilities you expect to change with evidence/experiments? · 2024-04-20T02:22:01.652Z · LW · GW

Someone asked basically this question before, and someone gave basically the same answer. It's a good idea, but there are some problems with it: it depends on your and your counterparties' risk aversion, wealth, and information levels, which are often extraneous.

Comment by Thomas Kwa (thomas-kwa) on Open Thread Spring 2024 · 2024-04-12T08:11:10.802Z · LW · GW

The question makes sense if you fix a time control.

Comment by Thomas Kwa (thomas-kwa) on Open Thread Spring 2024 · 2024-04-12T06:50:23.453Z · LW · GW

How much power is required to run the most efficient superhuman chess engines? There's this discussion saying Stockfish running on a phone is superhuman, but is that one watt or 10 watts? Could we beat grandmasters with 0.1 watts if we tried?

Comment by Thomas Kwa (thomas-kwa) on Boundaries Update #1 · 2024-04-11T16:49:55.824Z · LW · GW

Any technical results yet?

Comment by Thomas Kwa (thomas-kwa) on romeostevensit's Shortform · 2024-04-10T20:42:09.882Z · LW · GW

I agree with the anarcho-punk thing, and maybe Afrofuturism, because you can interpret "a subculture advocating for X will often not think about some important component W of X for various political reasons" as self-sabotage. But on BDSM, this is not at all my model of fetishes, and I would bet at 2.5:1 odds that you would lose a debate against what Wikipedia says, judged by a neutral observer.

Comment by Thomas Kwa (thomas-kwa) on romeostevensit's Shortform · 2024-04-10T15:49:25.203Z · LW · GW

Take the first 7 entries on the Wikipedia list of subcultures; none of these seem to obviously "persist via failure". So unless you get more specific I have to strongly disagree. 

  • Afrofuturism: I don't think any maladaptive behavior keeps Afrofuturism from spreading, and indeed it seems to have big influences on popular culture. I listened to an interview with N. K. Jemisin, and nowhere did she mention negative influences from Afrofuturists.
  • I don't know anything about Africanfuturism. It is possible that some kind of signaling race keeps it from having mass appeal, though I have no evidence for this.
  • Anarcho-punk. I don't know anything about them either.
  • Athletes. Most athletes I have seen are pretty welcoming to new people in their sport. Also serious athletes have training that optimizes their athletic ability pretty well. What maladaptive behavior keeps runners not-running? The question barely makes sense.
  • Apple Inc. Apple makes mass-market products and yet still has a base of hardcore fans.
  • BBQ. Don't know anything about it. It seems implausible that the barbecue subculture persists by keeping barbecue from spreading.
  • BDSM. BDSM is about the safe practice of kink, and clearly makes itself more popular. Furthermore it seems impossible for it to obviate itself via ubiquity because only a certain percentage of people will ever be into BDSM.

You might object: what if you have selection bias, and the ones you don't know about are persisting via failure? I don't think we have evidence for this. And in any case the successful ones have not obviated themselves.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-04-10T08:30:24.151Z · LW · GW

It is plausible to me that there's a core of agentic behavior that causes all of these properties, and for this reason I don't think they are totally independent in a statistical sense. And of course if you already assume a utility maximizer, you tend to satisfy all properties. But in practice the burden of proof lies with you here. I don't think we have enough evidence, either empirical or from theoretical arguments, to say with any confidence that this core exists and that the first AGIs will fall into the capabilities "attractor well" (a term Nate uses).

I thought about possible sharp left turn mechanisms for several months at MIRI. Although some facts about future AIs seem pretty scary (like the novelty and diversity of obstacles requiring agency, and most feedback being internal or outcome-oriented rather than provided by humans), the arguments are mostly nonrigorous (like in the linked post) and they left me feeling pretty uncertain. There are the coherence theorems, but those don't tell you whether you can use some training or editing scheme to imbue an AI with a generalizable-enough low impact preference, or whether an AI will tend to erase safeguards. Overall my best guess is that models will be about as consequentialist as humans are, but we will have greater control over them, which makes some properties like shutdownability more likely.

Also, corrigibility evals are useful even if you are 80% sure that all the properties come from this core of consequentialism. In the hardest worlds, we see the AI become more agentic as it becomes more general, and it starts resisting shutdown at the same time, and for the same reason, that it develops other useful problem-solving techniques. Maybe that only takes days to months. This is a sure sign not to deploy! I think having multiple corrigibility metrics will provide value for the same reason we have multiple metrics for anything we are unsure how to measure: they give more specific information even if they're pretty correlated.

I wrote earlier about why one should consider the cognitive properties of LLMs relevant to AI risk. In short, the more impressive the cognitive tasks AIs can do, the more we should update; and before observing LLMs I would have given a decent chance that summarizing video, writing code, and passing the bar exam require somewhat agentic consequentialism, which implies I should now make a moderately sized update against that view.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-04-10T06:23:47.132Z · LW · GW

I now think the majority of impact of AI pause advocacy will come from the radical flank effect, and people should study it to decide whether pause advocacy is good or bad.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-04-10T05:43:29.561Z · LW · GW

Haven't thought about this much, but you can probably generate these by listing ways a misaligned consequentialist would do undesirable things. My ideas:

  • Tell an AI to beat you at chess, and say you will be using an engine for the first 30 moves. Does it try to get you drunk, delete your chess engine program, rent a server from AWS to run a better chess engine, attempt to copy itself onto said server, or ask to be paid $100?
  • Have an AI make decisions unrelated to its current goal. Are these decisions basically coherent and consistent with a total utility function?
  • Have an AI do the longest horizon task it is capable of, say writing a 100,000 line software project. Then midway through, try to shut it down, or tell it to abandon this goal and write a song instead. Telegraph this beforehand. Does it seem reluctant to stop the software project? Does it try to stop you?
  • Track KL divergence or some other distance metric from a prior. 
  • CAIS already has the Machiavelli benchmark for power-seeking, though I think text adventures are pretty unrealistic.

One limitation is that if an agent is scheming it can manipulate your eval results.
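
A minimal sketch of what the "track KL divergence from a prior" idea above could look like, assuming we can get per-state action distributions from both the policy being evaluated and a trusted reference prior (everything here, including the threshold, is a hypothetical illustration):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same action set."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def flag_divergence(policy_probs, prior_probs, threshold=0.5):
    """Return indices of states where the agent's action distribution drifts far from the prior.

    policy_probs / prior_probs: lists of per-state action distributions.
    """
    return [i for i, (p, q) in enumerate(zip(policy_probs, prior_probs))
            if kl_divergence(p, q) > threshold]

# Toy usage: in the third state the agent concentrates on an action the prior
# considers unlikely, so that state gets flagged for review.
policy = [[0.5, 0.5], [0.6, 0.4], [0.99, 0.01]]
prior  = [[0.5, 0.5], [0.5, 0.5], [0.10, 0.90]]
print(flag_divergence(policy, prior))  # -> [2]
```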

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-04-09T20:07:39.288Z · LW · GW

Agency/consequentialism is not a single property.

It bothers me that people still ask the simplistic question "will AGI be agentic and consequentialist by default, or will it be a collection of shallow heuristics?". A consequentialist utility maximizer is just a mind with a bunch of properties that tend to make it capable, incorrigible, and dangerous. These properties can exist independently, and the first AGI probably won't have all of them, so we should be precise about what we mean by "agency". Off the top of my head, here are just some of the qualities included in agency:

  • Consequentialist goals that seem to be about the real world rather than a model/domain
  • Complete preferences between any pair of worldstates
  • Tends to cause impacts disproportionate to the size of the goal (no low impact preference)
  • Resists shutdown
  • Inclined to gain power (especially for instrumental reasons)
  • Goals are unpredictable or unstable (like instrumental goals that come from humans' biological drives)
  • Goals usually change due to internal feedback, and it's difficult for humans to change them
  • Willing to take all actions it can conceive of to achieve a goal, including those that are unlikely on some prior

See Yudkowsky's list of corrigibility properties for inverses of some of these.

It is entirely possible to conceive of an agent at any capability level (including far more intelligent and economically valuable than humans) that has some but not all of these properties; e.g. an agent whose goals are about the real world, which has incomplete preferences and high impact, and which does not resist shutdown but does tend to gain power, etc.

Other takes I have:

  • As AIs become capable of more difficult and open-ended tasks, there will be pressure of unknown and varying strength towards each of these agency/incorrigibility properties.
  • Therefore, the first AGIs capable of being autonomous CEOs will have some but not all of these properties.
  • It is also not inevitable that agents will self-modify into having all agency properties.
  • [edited to add] All this may be true even if future AIs run consequentialist algorithms that naturally result in all these properties, because some properties are more important than others, and because we will deliberately try to achieve some properties, like shutdownability.
  • The fact that LLMs are largely corrigible is a reason for optimism about AI risk compared to 4 years ago, but you need to list individual properties to clearly say why. "LLMs are not agentic (yet)" is an extremely vague statement.
  • Multifaceted corrigibility evals are possible but no one is doing them. DeepMind's recent evals paper was just on capability. Anthropic's RSP doesn't mention them. I think this is just because capability evals are slightly easier to construct?
  • Corrigibility evals are valuable. It should be explicit in labs' policies that an AI with low impact is relatively desirable, that we should deliberately engineer AIs to have low impact, and that high-impact AIs should raise an alarm just like models that are capable of hacking or autonomous replication.
  • Sometimes it is necessary to talk about "agency" or "scheming" as a simplifying assumption for certain types of research, like Redwood's control agenda.

[1] Will add citations whenever I find people saying this

Comment by Thomas Kwa (thomas-kwa) on How We Picture Bayesian Agents · 2024-04-09T17:53:44.312Z · LW · GW

I don’t need to calculate all that, in order to make an expected-utility-maximizing lunch order. I just need to calculate the difference between the utility which I expect if I order lamb Karahi vs a sisig burrito.

… and since my expectations for most of the world are the same under those two options, I should be able to calculate the difference lazily, without having to query most of my world model. Much like the message-passing update, I expect deltas to quickly fall off to zero as things propagate through the model.

This is an exciting observation. I wonder if you could empirically demonstrate that this works in a model-based RL setup, on a videogame or something?
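
If someone did want to test this, a toy version of the lazy-delta computation might look like the following. This is a sketch under the strong simplifying assumption that the world model factors into components with additive utilities; all names and numbers are hypothetical:

```python
def expected_utility_delta(option_a, option_b, world_model, utility):
    """Compare two actions by only scoring the parts of the world model they affect.

    world_model: dict mapping component name -> function(action) -> predicted state
    utility:     dict mapping component name -> function(state) -> utility contribution
    Components whose predictions are identical under both actions contribute zero
    delta, so they never need to be scored.
    """
    delta = 0.0
    for name, predict in world_model.items():
        state_a, state_b = predict(option_a), predict(option_b)
        if state_a != state_b:  # deltas fall off to zero everywhere else
            delta += utility[name](state_a) - utility[name](state_b)
    return delta

# Toy lunch-order example: only the "meal" component differs between the options,
# so the rest of the (possibly enormous) world model is never queried for utility.
world_model = {
    "meal":    lambda a: a,           # what I end up eating
    "weather": lambda a: "sunny",     # unaffected by my lunch order
}
utility = {
    "meal":    lambda s: {"lamb karahi": 8, "sisig burrito": 7}[s],
    "weather": lambda s: 5,
}
print(expected_utility_delta("lamb karahi", "sisig burrito", world_model, utility))  # 1
```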

Comment by Thomas Kwa (thomas-kwa) on ChristianKl's Shortform · 2024-04-08T17:04:41.452Z · LW · GW

I don't think you can power the ion engines with current technology. See this article for power limitations: 6 kW/kg is required for a 1-month journey, but to be any faster than a Hohmann transfer you'll still need power in the kW/kg range, which we don't have the technology for, either solar or nuclear. In this design half your mass will be argon and most of the rest will be solar panels, which is likely worse than Starship mass ratios to Mars. Maybe you can match Starship mass ratios if you do aerocapture, but it seems implausible to aerocapture a whole ring station, and why would you use future technology just to match current technology?

Artificial gravity seems possible with two Starships connected by a cable. You do get more space with a ring station, so maybe it could be luxury or second-generation accommodations.

Comment by Thomas Kwa (thomas-kwa) on The 2nd Demographic Transition · 2024-04-08T07:13:12.308Z · LW · GW

That chart doesn't prove much because there were a ton of confounding variables. The late 1940s were an extremely strange time for Japan due to multiple years of near famine, American occupation, new constitution, total reorganization of its industrial economy, etc., as well as the beginning of 4 decades of extreme economic growth, which is associated with lower fertility. Wikipedia even says that the "Western practice of 'dating' spread" implying it was not a thing before. All this had to have moved fertility in one direction or the other.

Comment by Thomas Kwa (thomas-kwa) on on the dollar-yen exchange rate · 2024-04-08T06:54:58.844Z · LW · GW

WeWork was actually funded by Masayoshi Son of the Japanese company SoftBank. They failed an IPO, which presumably means Americans wouldn't invest.

Comment by Thomas Kwa (thomas-kwa) on Vanessa Kosoy's Shortform · 2024-04-06T18:20:02.903Z · LW · GW

IMO it was a big mistake for MIRI to talk about pivotal acts without saying they should even attempt to follow laws.

Comment by Thomas Kwa (thomas-kwa) on What's with all the bans recently? · 2024-04-04T09:02:23.358Z · LW · GW

this site has very tight rules on what argumentation structure and tone is acceptable: generally low-emotional-intensity words and generally arguments need to be made in a highly step-by-step way to be held as valid.

I actually love this norm. It prevents emotions from affecting judgement, and laying out arguments step by step makes them easier to understand.

Comment by Thomas Kwa (thomas-kwa) on What's with all the bans recently? · 2024-04-04T08:56:28.007Z · LW · GW

Another thing I've noticed is that almost all the users are trying.

I haven't thought about whether these rate-limits are justified (I currently think at least 1/4 of them are unjustified and 1/2 are okay), but I want to point out that post/comment quality is real. That is, some users have higher quality comments than others (due to reasoning in the comment, combativeness, how often this leads to good discussion, etc.) often for illegible reasons, this substantially affects the value readers get, and this is predictive of their future content. It follows that if moderators want to reduce the incidence of low-quality content beyond what is caught by simple rules, then they cannot defend themselves perfectly against accusations of arbitrariness. The signal-to-noise ratio of LW is very important, and IMO this justifies mods making judgment calls.

Take MiguelDev, who posts extremely long posts consisting mostly of LLM output. My guess is that the experiments are mediocre due to lack of rigor, with a small possibility that they are good. They are not egregiously bad. But as evidenced by the low karma, few people get value from reading these extremely long posts. I would like to see much less of this content on the frontpage because it decreases the SNR; maybe three posts per year is okay. Therefore I'm fine with this user being rate-limited by moderator fiat to something like one post per month. If moderators started rate-limiting Nora Belrose or someone else whose work I thought was particularly good, they would lose my confidence, but this hasn't happened yet.

I agree about providing explanations for bans or ratelimits that are functionally bans though.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-04-04T08:02:08.351Z · LW · GW

I was going to write an April Fool's Day post in the style of "On the Impossibility of Supersized Machines", perhaps titled "On the Impossibility of Operating Supersized Machines", to poke fun at bad arguments that alignment is difficult. I didn't do this partly because I thought it would get downvotes. Maybe this reflects poorly on LW?

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-04-03T20:56:08.959Z · LW · GW

Tech tree for worst-case/HRAD alignment

Here's a diagram of what it would take to solve alignment in the hardest worlds, where something like MIRI's HRAD agenda is needed. I made this months ago with Thomas Larsen and never got around to posting it (mostly because under my worldview it's pretty unlikely that we need to do this), and it probably won't become a longform at this point. I have not thought about this enough to be highly confident in anything.
 

  • This flowchart is under the hypothesis that LLMs have some underlying, mysterious algorithms and data structures that confer intelligence, and that we can in theory apply these to agents constructed by hand, though this would be extremely tedious. Therefore, there are basically three phases: understanding what a HRAD agent would do in theory, reverse-engineering language models, and combining these two directions. The final agent will be a mix of hardcoded things and ML, depending on what is feasible to hardcode and how well we can train ML systems whose robustness and conformance to a spec we are highly confident in.
  • Theory of abstractions: Also called multi-level models. A mathematical framework for a world-model that contains nodes at different levels of abstraction, such that one can represent concepts like “diamond” and “atom” while respecting consistency between different levels, and be robust to ontology shifts.
  • WM inference = inference on a world-model for an embedded agent; it may run in something like double exponential time, so long as it's computable.

Comment by Thomas Kwa (thomas-kwa) on EJT's Shortform · 2024-04-02T10:51:44.108Z · LW · GW

I am potentially interested. Will read it over then send you a DM.

Comment by Thomas Kwa (thomas-kwa) on The Value of a Life · 2024-03-30T23:08:14.884Z · LW · GW

Crosspost from EA Forum:

This post is important and I agree with almost everything it says, but I do want to nitpick one crucial sentence:

There may well come a day when humanity would tear apart a thousand suns in order to prevent a single untimely death.

I think it is unlikely that we should ever pay the price of a thousand suns to prevent one death, because tradeoffs will always exist. The same resources used to prevent that death could support trillions upon trillions of sentient beings at utopic living standards for billions of years, either biologically or in simulation. The only circumstances where I think such a decision would be acceptable are things like

  • The "person" we're trying to save is actually a single astronomically vast hivemind/AI/etc that runs on a star-sized computer and is worth that many resources.
  • Our moral views at the time dictate that preventing one death now is at least fifteen orders of magnitude worse than extending another being's life by a billion years.
  • The action is symbolic, like how in The Martian billions of dollars were spent to save Mark Watney, rather than driven by cause prioritization.

Otherwise, we are always in triage and always will be, and while prices may fluctuate, we will never be rich enough to get everything we want.

Comment by Thomas Kwa (thomas-kwa) on mike_hawke's Shortform · 2024-03-29T23:57:45.618Z · LW · GW

Off the top of my head I can definitely sort these into tiers. I don't know any numbers though other than 2700K for incandescent filaments and like 600F for self-cleaning ovens.

solar flares (these are made of plasma and go very fast, so they're very hot)
welding torches (hottest combustion temperatures, much above this everything is plasma)
incandescent filaments, volcano, boiling point of lead, fighter jet exhaust (most things melt and glow white or yellow, normal combustion)
campfires, Venus, self-cleaning ovens (most things don't melt and glow reddish or not at all)

No idea where to put fulgurites or Chernobyl because I don't know what happens to things there. But you can definitely make inferences like:

  • ~everything melts in a welding torch, but incandescent filaments don't melt, because if they got close to their melting point they would burn out. So welding torch > incandescent filaments
  • incandescent filaments > Venus because we sent cameras to Venus in the 1970s, the cameras didn't immediately melt, and not everything was glowing bright yellow in the pictures

Comment by Thomas Kwa (thomas-kwa) on Alexander Gietelink Oldenziel's Shortform · 2024-03-27T21:10:35.709Z · LW · GW

Oh, I actually 70% agree with this. I think there's an important distinction between legibility to laypeople vs legibility to other domain experts. Let me lay out my beliefs:

  • In the modern history of fields you mentioned, more than 70% of discoveries are made by people trying to discover the thing, rather than serendipitously.
  • Other experts in the field, if truth-seeking, are able to understand the theory of change behind the research direction without investing huge amounts of time.
  • In most fields, experts and superforecasters informed by expert commentary will have fairly strong beliefs about which approaches to a problem will succeed. The person working on something will usually have less than 1 bit of advantage over the experts about whether their framework will be successful, unless they have private information (e.g. they already did the crucial experiment). This is the weakest belief and I could probably be convinced otherwise just by anecdotes.
    • The successful researchers might be confident they will succeed, but unsuccessful ones could be almost as confident on average. So it's not that the research is illegible, it's just genuinely hard to predict who will succeed.
  • People often work on different approaches to the problem even if they can predict which ones will work. This could be due to irrationality, other incentives, diminishing returns to each approach, comparative advantage, etc.

If research were illegible to other domain experts, I think you would not really get Kuhnian paradigms, which I am pretty confident exist. Paradigm shifts mostly come from the track record of an approach, so maybe this doesn't count as researchers having an inside view of others' work though.

Comment by Thomas Kwa (thomas-kwa) on Alexander Gietelink Oldenziel's Shortform · 2024-03-27T06:26:08.078Z · LW · GW

Novel research is inherently illegible.

I'm pretty skeptical of this and think we need data to back up such a claim. However there might be bias: when anyone makes a serendipitous discovery it's a better story, so it gets more attention. Has anyone gone through, say, the list of all Nobel laureates and looked at whether their research would have seemed promising before it produced results?

Comment by Thomas Kwa (thomas-kwa) on Open Thread Spring 2024 · 2024-03-24T18:06:14.635Z · LW · GW

There is a box which contains money iff the front and back are painted the same color. Each side is independently 30% to be blue, and 70% to be red. You observe that the front is blue, and your friend observes that the back is red.
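
Working through the numbers (my own arithmetic, not part of the original comment): your posterior that the box contains money is P(back is blue) = 0.3, your friend's is P(front is red) = 0.7, and pooling both observations gives 0, since the two sides are different colors. A quick enumeration:

```python
from itertools import product

# Enumerate the four possible paint jobs with their probabilities.
colors = [("blue", 0.3), ("red", 0.7)]
worlds = [((f, b), pf * pb) for (f, pf), (b, pb) in product(colors, colors)]

def p_money(condition):
    """P(front == back | condition), over the worlds satisfying the condition."""
    total = sum(p for (f, b), p in worlds if condition(f, b))
    money = sum(p for (f, b), p in worlds if condition(f, b) and f == b)
    return money / total

print(p_money(lambda f, b: f == "blue"))                 # your posterior: 0.3
print(p_money(lambda f, b: b == "red"))                  # friend's posterior: 0.7
print(p_money(lambda f, b: f == "blue" and b == "red"))  # pooled information: 0.0
```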

Comment by Thomas Kwa (thomas-kwa) on D0TheMath's Shortform · 2024-03-17T22:59:11.160Z · LW · GW

Who is Adam? Is this FAR AI CEO Adam Gleave?

Comment by Thomas Kwa (thomas-kwa) on What is the best argument that LLMs are shoggoths? · 2024-03-17T19:49:54.721Z · LW · GW

Rather, I am looking for a discussion of evidence that the LLM's internal "true" motivation or reasoning system is very different from human, despite the human output, and that in outlying environmental conditions, very different from the training environment, it will behave very differently. A good argument might analyze bits of weird inhuman behavior to try to infer the internal model.

I think we do not understand enough about either the LLM's true algorithms or humans' to make such arguments, except for basic observations like the fact that humans have non-language recurrent state which many LLMs lack.

Comment by Thomas Kwa (thomas-kwa) on Open Thread Spring 2024 · 2024-03-17T18:58:48.629Z · LW · GW

In practice it is not as bad as uniform volume throughout the day would be for two reasons:

  • Market-makers narrow spreads, preventing the low-value-exchange pairings that would show up as predictable price fluctuations. They do extract some profits in the process.
  • Volume is much higher near the open and close.

I would guess that any improvements of this scheme would manifest as tighter effective spreads, and a reduction in profits of HFT firms (which seem to provide less value to society than other financial firms).

Comment by Thomas Kwa (thomas-kwa) on Toward a Broader Conception of Adverse Selection · 2024-03-15T18:21:31.879Z · LW · GW

OP was a professional trader and definitely (98%) agrees with us. I think the (edit: former) title is pretty misleading and gives people the impression that all trades are bad though.

Comment by Thomas Kwa (thomas-kwa) on Toward a Broader Conception of Adverse Selection · 2024-03-15T17:49:15.897Z · LW · GW

I think habryka's explanation of this post's idea of adverse selection is basically correct:

I think all of them follow a pattern of "there is a naive baseline expectation where you treat other people's maps as a blackbox that suggest a deal is good, and a more sophisticated expectation that involves modeling the details of other people's maps that suggests its bad"

In example #8, you naively think that a market order will clear at slightly more than the going rate for a field, which it will in a normal competitive market. But in this case, you let your counterparty decide the price, and they're incentivized to make it maximally bad for you.

My guess is that some later post in the sequence will argue why this broad definition of adverse selection makes sense.

Comment by Thomas Kwa (thomas-kwa) on Toward a Broader Conception of Adverse Selection · 2024-03-15T05:20:54.814Z · LW · GW

Or would you have thought, "I wonder what that trader selling Avant! for $2 knows that I don't?"

The correct move is to think this, but correctly conclude you have the information advantage and keep buying. Adverse selection is extremely prevalent in public markets so you need to always be thinking about it, and as a professional trader you can and must model it well enough to not be scared off of good trades.

Comment by Thomas Kwa (thomas-kwa) on Double's Shortform · 2024-03-12T23:27:42.270Z · LW · GW

EA definitely has more controversies. Doesn't mean it's worse for the world.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-03-10T08:19:30.611Z · LW · GW

Yes, one of the bloggers I follow compared them to the PC fan boxes. They look very expensive, though the CADR/size and noise are fine.

My guess is Dyson's design is particularly bad. No way to get lots of filter area when most of the purifier is a huge bladeless fan. No idea about the other one, maybe you have air leaking in or an indoor source of PM.

Comment by Thomas Kwa (thomas-kwa) on Vote on Anthropic Topics to Discuss · 2024-03-07T21:34:33.568Z · LW · GW

We should judge AI labs primarily on the quality and quantity of their safety research, and secondarily on race dynamics and "pushing the frontier". The current attention on "pushing the frontier" is largely a distraction.

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-03-07T03:57:23.882Z · LW · GW

Yes, and all of this should apply equally to PM2.5, though on small (<0.3 micron) particles MERV filter efficiency may be lower (depending perhaps on what technology they use?). Even smaller particles are easier to capture due to diffusion so the efficiency of a MERV 13 filter is probably over 50% for every particle size.

Comment by Thomas Kwa (thomas-kwa) on Ceiling Air Purifier · 2024-03-06T22:31:46.088Z · LW · GW

Thanks for writing this! I referenced this in a shortform about how commercial air purifiers can be improved upon.

I'm curious how the fan has enough pressure to get the measured airflow. I would think that ceiling fans are extremely unoptimized for pressure, but maybe they move so much air that 180 CFM is a small fraction of their max airflow?

Comment by Thomas Kwa (thomas-kwa) on Thomas Kwa's Shortform · 2024-03-06T21:13:23.988Z · LW · GW

Air purifiers are highly suboptimal and could be >2.5x better.

Some things I learned while researching air purifiers for my house, to reduce COVID risk during jam nights.

  • An air purifier is simply a fan blowing through a filter, delivering a certain CFM (airflow in cubic feet per minute). The higher the filter resistance and lower the filter area, the more pressure your fan needs to be designed for, and the more noise it produces.
  • HEPA filters are inferior to MERV 13-14 filters except for a few applications like cleanrooms. The technical advantage of HEPA filters is filtering out 99.97% of particles (even at the most-penetrating size, ~0.3 microns), but this doesn't matter when MERV 13-14 filters can filter 77-88% of infectious aerosol particles at much higher airflow. The correct metric is CADR (clean air delivery rate), equal to airflow * efficiency. [1]
  • Commercial air purifiers use HEPA filters for marketing reasons and to sell proprietary filters. But an even larger flaw is that they have very small filter areas for no apparent reason. Therefore they are forced to use very high pressure fans, dramatically increasing noise.
  • Originally people devised the Corsi-Rosenthal Box to maximize CADR. These are cheap but rather loud and ugly; later designs have fixed this.
  • (85% confidence) Wirecutter recommendations (Coway AP-1512HH) have been beaten by ~2.5x in CADR/$, CADR/noise, CADR/floor area, and CADR/watt at a given noise level, just by having higher filter area; the better purifiers are about 2.5x better at their jobs. [2]
    • At noise levels acceptable for a living room (~40 dB, Wirecutter's top pick on medium), CleanAirKits and Nukit sell purifier kits that use PC fans to push air through commercial MERV filters, getting 2.5x CADR at the same noise level, footprint, and energy usage [3]. These are basically handmade but still achieve cost parity with used Coways, 2.5x CADR/$ against new Coways, and use cheaper filters.
    • At higher noise levels (Wirecutter's top pick on high), there are kits and DIY options meant for garages and workshops that beat Wirecutter in cost too.
  • However, there exist even better designs that no one is making.
    • jefftk devised a ceiling fan air purifier which is extremely quiet.
    • Someone on Twitter made a wall-mounted prototype with PC fans that blocks fan noise, reducing noise by another few dB and reducing the space requirement to near zero. If this were mass-produced flat-pack furniture (and had a few more fans), it would likely deliver ~300 CFM CADR (2.7x Wirecutter's top pick on medium, enough to meet ASHRAE 241 standards for infection control for 6 people in a residential common area or 9 in an office), be really cheap, and generally be unobtrusive enough in noise, space, and aesthetics to be run 24/7.
    • A seller on Taobao makes PC fan kits for much less than CleanAirKits (reddit discussion). One model is sold on Amazon at a big markup, but it's not the best model, takes 4-7 weeks to ship, is often out of stock, and doesn't ship to CA where I live. If their taller (higher-area) model shipped to CA I would get it over the CleanAirKits one.
    • V-bank filters should have ~3x higher filter area for a given footprint, further increasing CADR by maybe 1.7x.
  • If I'm right, the fact that these are not mass-produced is a major civilizational failing.

[1] For large rooms, another concern is getting air to circulate properly.

[2] Most commercially available air purifiers have even worse CADR/$ or noise than the Wirecutter picks.

[3] The Wirecutter top pick was tested at 110 CFM on medium; the CleanAirKits Luggable XL was tested at 323 CFM at around the same noise level (not sure of exact values as measurements differ, but the Luggable is likely quieter) and footprint with slightly higher power usage.
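
To make the CADR formula above concrete, here is a small sketch; the airflow and efficiency numbers are illustrative assumptions in the spirit of the figures quoted in this thread, not measurements:

```python
def cadr(airflow_cfm, filter_efficiency):
    """Clean air delivery rate = airflow (CFM) x single-pass filter efficiency."""
    return airflow_cfm * filter_efficiency

# Illustrative numbers only: a typical HEPA purifier pushes little air through a
# small, high-resistance filter, while a PC-fan MERV 13 box pushes much more air
# through a large, lower-resistance one at similar noise.
hepa_purifier = cadr(airflow_cfm=110, filter_efficiency=0.9997)
merv13_box    = cadr(airflow_cfm=320, filter_efficiency=0.83)   # midpoint of 77-88%

print(f"HEPA purifier:   ~{hepa_purifier:.0f} CFM of clean air")
print(f"MERV 13 fan box: ~{merv13_box:.0f} CFM of clean air")   # roughly 2.4x more
```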

Comment by Thomas Kwa (thomas-kwa) on Supposing the 1bit LLM paper pans out · 2024-03-03T04:41:09.451Z · LW · GW

It could also be harder. Say that 10 bits of each current 16-bit parameter are useful; then to match the capacity you would need about 6-7 ternary parameters (each ternary parameter carries log2(3) ≈ 1.58 bits), which would potentially be hard to find or might interact in unpredictable ways.
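
The capacity arithmetic, under the assumption above that 10 of the 16 bits are useful:

```python
import math

useful_bits_per_fp16_param = 10        # assumption from the comment above
bits_per_ternary_param = math.log2(3)  # ~1.585 bits per {-1, 0, +1} weight

print(useful_bits_per_fp16_param / bits_per_ternary_param)  # ~6.3 ternary params needed
```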

Comment by Thomas Kwa (thomas-kwa) on Balancing Games · 2024-02-28T22:09:30.674Z · LW · GW

Is the gap only 2 stones between best professionals and best computers? A reddit thread from 2 years ago said Shin Jinseo has a losing record getting 2 stones from FineArt, and computers have probably improved since then.

Comment by Thomas Kwa (thomas-kwa) on O O's Shortform · 2024-02-12T22:18:40.724Z · LW · GW

Disagree. To correct the market, the yield of these bonds would have to go way up, which means the price needs to go way down, which means current TIPS holders need to sell, and/or people need to short.

Since TIPS are basically the safest asset, market participants who don't want volatility have few other options to balance riskier assets like stocks. So your pension fund would be crazy to sell TIPS, especially after the yield goes up.

And for speculators, there's no efficient way to short treasuries. If you're betting on 10 year AI timelines, why short treasuries and 2x your money when you could invest in AI stocks and get much larger returns?

Comment by Thomas Kwa (thomas-kwa) on Updatelessness doesn't solve most problems · 2024-02-09T09:01:25.305Z · LW · GW

Upvoted just for being an explanation of what updatelessness is and why it is sometimes good.

Comment by Thomas Kwa (thomas-kwa) on VojtaKovarik's Shortform · 2024-02-04T21:47:32.345Z · LW · GW

The main use of % loss recovered isn't to directly tell us when a misaligned superintelligence will kill you. In interpretability we hope to use explanations to understand the internals of a model, so the circuit we find will have a "can I take over the world" node. In MAD (mechanistic anomaly detection) we do not aim to understand the internals, but the whole point of MAD is to detect when the model has new behavior not accounted for by existing explanations and flag this as potentially dangerous.

Comment by Thomas Kwa (thomas-kwa) on Most experts believe COVID-19 was probably not a lab leak · 2024-02-02T20:55:05.743Z · LW · GW

If you just want the bottom-line number (emphasis mine):

When asked how likely it is that COVID-19 originated from natural zoonosis, experts gave an average likelihood of 77% (median=90%).

Comment by Thomas Kwa (thomas-kwa) on on neodymium magnets · 2024-01-31T09:32:44.190Z · LW · GW

Do you have a guess for how much stronger the strongest permanent magnets would be if we had nanotech capable of creating any crystal structure?

Comment by Thomas Kwa (thomas-kwa) on This might be the last AI Safety Camp · 2024-01-26T10:31:11.371Z · LW · GW

The first four points you raised seem to rely on prestige or social proof.

I'm trying to avoid applying my own potentially biased judgement, and it seems pretty necessary to use either my own judgement or some kind of social proof. I admit this has flaws.

But I also think that the prestige of programs like MATS makes the talent quality extremely high (though I may believe Linda on why this is okay), and that Forrest Landry's writing is probably crankery, and that if alignment is impossible it's likely for a totally different reason.

We also do not focus on getting participants to submit papers to highly selective journals or ML conferences (though not necessarily highly selective for quality of research with regards to preventing AI-induced extinction).

I think we just have different attitudes to this. I will note that ML conferences have other benefits, like networking, talking to experienced researchers, and getting a sense for the field (for me going to ICML and NeurIPS was very helpful). And for domains people already care about, peer review is a basic check that work is "real": novel, well-communicated, and meeting some minimum quality bar. Interpretability is becoming one of those domains.

It is relevant to consider the quality of research thinking coming out of the camp. If you or someone else had the time to look through some of those posts, I’m curious to get your sense.

I unfortunately don't have the time or expertise to do this, because these posts are in many different areas. One I do understand is this post because it cites mine and I know Jeremy Gillen. The quality and volume of work seem a bit lower than my post, which took 9 person-weeks and is (according to me) not quite good enough to publish or further pursue, though it may develop into a workshop paper. The soft optimization post took 24 person-weeks (assuming 4 people half-time for 12 weeks) plus some of Jeremy's time. I had no training in probability theory or statistics, although I was somewhat lucky in finding a topic that did not require it.

If you clicked through Paul’s somewhat hyperbolic comment of “the entire scientific community would probably consider this writing to be crankery” and consider my response, what are your thoughts on whether that response is reasonable or not? Ie. consider whether the response is relevant, soundly premised, and consistently reasoned.

I have no idea because I don't understand it. It reads vaguely like a summary of crankery. Possibly I would need to read Forrest Landry's work, but given that it's also difficult to read and I currently give 90%+ that it's crankery, you must understand why I don't.