Posts

MATS Alumni Impact Analysis 2024-09-30T02:35:57.273Z
Something Is Lost When AI Makes Art 2024-08-18T22:53:46.951Z
MATS Winter 2023-24 Retrospective 2024-05-11T00:09:17.059Z
How LLMs Work, in the Style of The Economist 2024-04-22T19:06:46.687Z
Analogy Bank for AI Safety 2024-01-29T02:35:13.746Z
MATS Summer 2023 Retrospective 2023-12-01T23:29:47.958Z
utilistrutil's Shortform 2023-11-23T20:11:49.028Z
Apply for MATS Winter 2023-24! 2023-10-21T02:27:34.350Z

Comments

Comment by utilistrutil on Decomposing Agency — capabilities without desires · 2024-09-03T01:24:40.669Z · LW · GW

Favorite post of the year so far!

Comment by utilistrutil on Determining the power of investors over Frontier AI Labs is strategically important to reduce x-risk · 2024-07-26T00:14:31.534Z · LW · GW

Better link: https://www.bloomberg.com/opinion/articles/2024-07-10/jefferies-funded-some-fake-water 

Comment by utilistrutil on Determining the power of investors over Frontier AI Labs is strategically important to reduce x-risk · 2024-07-25T22:14:05.542Z · LW · GW

My favored version of this project would involve >50% of the work going into the econ literature and models on investor incentives, with attention to

  • Principal-agent problems
  • Information asymmetry
  • Risk preferences
  • Time discounting

And then a smaller fraction of the work would involve looking into AI labs, specifically. I'm curious if this matches your intentions for the project or whether you think there are important lessons about the labs that will not be found in the existing econ literature.

Comment by utilistrutil on Determining the power of investors over Frontier AI Labs is strategically important to reduce x-risk · 2024-07-25T22:08:32.124Z · LW · GW

How does the fiduciary duty of companies to investors work?

OpenAI instructs investors to view their investments "in the spirit of a donation," which might be relevant for this question.

Comment by utilistrutil on utilistrutil's Shortform · 2024-07-22T18:30:00.381Z · LW · GW

I would really like to see a post from someone in AI policy on "Grading Possible Comprehensive AI Legislation." The post would lay out what kind of safety stipulations would earn a bill an "A-" vs a "B+", for example. 

I'm imagining a situation where, in the next couple years, a big omnibus AI bill gets passed that contains some safety-relevant components. I don't want to be left wondering "did the safety lobby get everything it asked for, or did it get shafted?" and trying to construct an answer ex-post. 

Comment by utilistrutil on Against most, but not all, AI risk analogies · 2024-07-19T23:08:44.083Z · LW · GW

I don't know how I hadn't seen this post before now! A couple weeks after you published this, I put out my own post arguing against most applications of analogies in explanations of AI risk. I've added a couple references to your post in mine. 

Comment by utilistrutil on The Strangest Thing An AI Could Tell You · 2024-07-19T01:30:38.935Z · LW · GW

Adult brains are capable of telekinesis, if you fully believe in your ability to move objects with your mind. Adults are generally too jaded to believe such things. Children have the necessary unreserved belief, but their minds are not developed enough to exercise the ability.

Comment by utilistrutil on utilistrutil's Shortform · 2024-07-12T21:36:47.393Z · LW · GW

File under 'noticing the start of an exponential': A.I. Helped to Find a Vast Source of the Copper That A.I. Needs to Thrive

Comment by utilistrutil on utilistrutil's Shortform · 2024-07-01T20:18:48.511Z · LW · GW

Scott Alexander says:

Suppose I notice I am a human on Earth in America. I consider two hypotheses. One is that everything is as it seems. The other is that there is a vast conspiracy to hide the fact that America is much bigger than I think - it actually contains one trillion trillion people. It seems like SIA should prefer the conspiracy theory (if the conspiracy is too implausible, just increase the posited number of people until it cancels out).

I am often confused by the kind of reasoning at play in the parenthetical above ("just increase the posited number of people until it cancels out"). Maybe someone can help sort me out. As I increase the number of people in the conspiracy world, my prior on that world also decreases. If my prior falls faster than the number of people in the considered world grows, I will not be able to construct a conspiracy world that allows the thought experiment to bite.

Consider the situation where I arrive at the airport, where I will wait in line at security. Wouldn't I be more likely to discover a line 1000 people long than 100 people long? I am 10x more likely to exist in the longer line. The problem is that our prior on 1000 people security lines might be very low. The reasoning on display in the above passage would invite us to simply crank up the length of the line, say, to 1 million people. I suspect that SIA proponents don't show up at the airport expecting lines this long. Why? Because the prior on a million-person line is more than a thousand times lower than the prior on a 100-person line.
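For concreteness, here is a minimal Python sketch of that arithmetic, with made-up numbers: under SIA the unnormalized weight on an N-person line is prior(N) × N, and if the assumed prior decays faster than N grows, cranking N up never wins.

```python
# Toy SIA update for the security-line example (illustrative numbers only).
# SIA weights each hypothesis by how many observers it contains, so the
# unnormalized posterior on an N-person line is prior(N) * N.

line_lengths = [100, 1_000, 1_000_000]

def prior(n: int) -> float:
    # Assumed prior: falls off like 1/N^2, i.e. faster than the factor of N
    # that SIA multiplies in.
    return 1.0 / n**2

unnormalized = {n: prior(n) * n for n in line_lengths}
total = sum(unnormalized.values())
posterior = {n: w / total for n, w in unnormalized.items()}

for n in line_lengths:
    print(f"{n:>9,}-person line: prior={prior(n):.1e}, posterior={posterior[n]:.4%}")

# The million-person line stays negligible even after the SIA boost:
# multiplying by N cannot outrun a prior that shrinks faster than 1/N.
```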

This also applies to some presentations of Pascal's mugging.

Comment by utilistrutil on LLM Generality is a Timeline Crux · 2024-06-26T10:06:06.410Z · LW · GW

Jacob Steinhardt on predicting emergent capabilities:

There’s two principles I find useful for reasoning about future emergent capabilities:

  1. If a capability would help get lower training loss, it will likely emerge in the future, even if we don’t observe much of it now.
  2. As ML models get larger and are trained on more and better data, simpler heuristics will tend to get replaced by more complex heuristics. . . This points to one general driver of emergence: when one heuristic starts to outcompete another. Usually, a simple heuristic (e.g. answering directly) works best for small models on less data, while more complex heuristics (e.g. chain-of-thought) work better for larger models trained on more data.

The nature of these things is that they're hard to predict, but general reasoning satisfies both criteria, making it a prime candidate for a capability that will emerge with scale.

Comment by utilistrutil on DanielFilan's Shortform Feed · 2024-06-11T00:46:18.673Z · LW · GW

I think you could also push to make government liable as part of this proposal

Comment by utilistrutil on D0TheMath's Shortform · 2024-03-22T23:41:52.410Z · LW · GW

There might be indirect effects like increasing hype around AI and thus investment, but overall I think those effects are small and I'm not even sure about the sign.

Sign of the effect of open source on hype? Or of hype on timelines? I'm not sure why either would be negative.

Open source --> more capabilities R&D --> more profitable applications --> more profit/investment --> shorter timelines

  • The example I've heard cited is Stable Diffusion leading to LoRA.

There's a countervailing effect of democratizing safety research, which one might think outweighs the above because safety research is so much more neglected than capabilities work, leaving more low-hanging fruit.

Comment by utilistrutil on Using axis lines for good or evil · 2024-03-20T03:39:26.575Z · LW · GW

GDP is an absolute quantity. If GDP doubles, then that means something. So readers should be thinking about the distance between the curve and the x-axis.

But 1980 is arbitrary. When comparing 2020 to 2000, all that matters is that they’re 20 years apart. No one cares that “2020 is twice as far from 1980 as 2000” because time did not start in 1980.

This is the difference between a ratio scale and a cardinal (interval) scale. In a cardinal scale, the distance between points is meaningful, e.g., "The gap between 2 and 4 is twice as big as the gap between 1 and 2." In a ratio scale, there is also a well-defined zero point, which means the ratios of points are also meaningful, e.g., "4 is twice as large as 2."
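As a small illustration, here is a minimal matplotlib sketch with made-up GDP figures showing how the two rules translate into axis choices on the same chart: anchor the GDP axis at zero, but let the year axis simply span the data.

```python
import matplotlib.pyplot as plt

years = list(range(1980, 2021, 5))
gdp = [2.9, 4.3, 6.0, 7.6, 10.3, 13.0, 15.0, 18.2, 21.1]  # illustrative, trillions

fig, ax = plt.subplots()
ax.plot(years, gdp)
ax.set_ylim(bottom=0)             # ratio scale: keep the zero point in view
ax.set_xlim(years[0], years[-1])  # interval scale: no privileged origin, just span the data
ax.set_xlabel("Year")
ax.set_ylabel("GDP (trillions, illustrative)")
plt.show()
```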

Comment by utilistrutil on utilistrutil's Shortform · 2024-03-18T03:06:19.109Z · LW · GW

I just came across this word from John Koenig's Dictionary of Obscure Sorrows, which nicely captures the thesis of All Debates Are Bravery Debates.

redesis n. a feeling of queasiness while offering someone advice, knowing they might well face a totally different set of constraints and capabilities, any of which might propel them to a wildly different outcome—which makes you wonder if all of your hard-earned wisdom's fundamentally nontransferable, like handing someone a gift card in your name that probably expired years ago.

Comment by utilistrutil on A case for AI alignment being difficult · 2024-01-03T07:04:52.492Z · LW · GW

(and perhaps also reversing some past value-drift due to the structure of civilization and so on)

Can you say more about why this would be desirable? 

Comment by utilistrutil on The Dark Arts · 2024-01-03T05:28:49.113Z · LW · GW

A lot of this piece is unique to high school debate formats. In the college context, every judge is themself a current or previous debater, so some of these tricks don't work. (There are of course still times when optimizing for competitive success distracts from truth-seeking.)

Comment by utilistrutil on Arjun Panickssery's Shortform · 2023-12-30T20:53:09.482Z · LW · GW

Here are some responses to Rawls from my debate files:

A2 Rawls

  • Ahistorical
    • Violates property rights
    • Does not account for past injustices, e.g. slavery; just asks what kind of society you would design from scratch. Thus not a useful guide for action in our fucked world.
  • Acontextual
    • Veil of ignorance removes contextual understanding, which makes it impossible to assess different states of the world. E.g., from the original position, Rawls prohibits me from using my gender to inform my understanding of gender in different states of the world.
    • Identity is not arbitrary! It is always contingent, yes, but morality is concerned with the interactions of real people, who have capacities, attitudes, and preferences. There are reasons for these things that are located in individual experiences and contexts, so they are not arbitrary.
    • But even if they were the result of pure chance, it’s unclear that these coincidences are the legitimate subject of moral scrutiny. I *am* a white man - I can’t change that. They need to explain why morality should pretend otherwise. Only after conditioning on our particular context can we begin to reason morally.
  • The one place Rawls is interested in context is bad: he says the principle should only be applied within a society, but this precludes action on global poverty.
  • Rejects economic growth: the current generation is the one that is worst-off; saving now for future growth necessarily comes at the cost of foregone consumption, which hurts the current generation.

Comment by utilistrutil on Arjun Panickssery's Shortform · 2023-12-30T20:51:09.533Z · LW · GW

1. It’s pretty much a complete guide to action? Maybe there are decisions where it is silent, but that’s true of basically every ethical theory like this (“but util doesn’t care about X!”). I don’t think the burden is on him to incorporate all the other concepts that we typically associate with justice. At the very least, not a problem for “justifying the kind of society he supports.”

2. The two responses to this are either “Rawls tells you the true conception of the good, ignore the other ones” or “just allow for other-regarding preferences and proceed as usual,” and either seems workable.

3. Sure

4. Agree in general that Rawls does not account for different risk preferences, but infinite risk aversion isn’t necessary for most practical decisions.

5. Agree Rawls doesn’t usually account for the future. But you could just use a veil of ignorance over all future and current people, which collapses this argument into a specific case of “maximin is stupid because it doesn’t let us make the worst-off people epsilon worse-off in exchange for arbitrary benefits to others.”

I think (B) is getting at a fundamental problem

Comment by utilistrutil on MATS Summer 2023 Retrospective · 2023-12-04T20:25:05.437Z · LW · GW

What % evals/demos and what % mech interp would you expect to see if there wasn't Goodharting? 1/3 and 1/5 don't seem that high to me, given the value of these agendas and the advantages of touching reality that Ryan named.

Comment by utilistrutil on MATS Summer 2023 Retrospective · 2023-12-04T19:35:44.452Z · LW · GW

these are the two main ways I would expect MATS to have impact: research output during the program and future research output/career trajectories of scholars.

We expect to achieve more impact through (2) than (1); per the theory of change section above, MATS' mission is to expand the talent pipeline. Of course, we hope that scholars produce great research through the program, but often we are excited about scholars doing research (1) as a means for scholars to become better researchers (2). Other times, these goals are in tension. For example, some mentors explicitly de-emphasize research outputs, encouraging scholars to focus instead on pivoting frequently to develop better taste.

If you have your own reason to think (1) is comparably important to (2), I wonder if you also think providing mentees to researchers whose agendas you support is a similarly important source of MATS' impact? From the Theory of Change section:

By mentoring MATS scholars, these senior researchers also benefit from research assistance

Comment by utilistrutil on MATS Summer 2023 Retrospective · 2023-12-02T21:45:44.034Z · LW · GW

Thanks, Aaron! That's helpful to hear. I think "forgetting" is a good candidate explanation because scholars answered that question right after completing Alignment 201, which is designed for breadth. Especially given the expedited pace of the course, I wouldn't be surprised if people forgot a decent chunk of A201 material over the next couple months. Maybe for those two scholars, forgetting some A201 content outweighed the other sources of breadth they were afforded, like seminars, networking, etc.

Comment by utilistrutil on utilistrutil's Shortform · 2023-11-23T20:11:49.117Z · LW · GW

Today I am thankful that Bayes' Rule is unintuitive. 

Much ink has been spilled complaining that Bayes' Rule can yield surprising results. As anyone who has taken an introductory statistics class knows, it is difficult to solve a problem that requires an application of Bayes' Rule without plugging values into the formula, at least for a beginner. Eventually, the student of Bayes may gain an intuition for the Rule (perhaps in odds form), but at that point they can be trusted to wield their intuition responsibly because it was won through disciplined practice.

This unintuitiveness is a feature, not a bug, because it discourages motivated reasoning. If Bayes' Rule were more intuitive, it would be simple to back out what P(A), P(B), and P(B|A) must be to justify your preferred posterior belief, and then argue for those quantities. It would also be simple to work backwards to select your prediction A from a favorable hypothesis space. Because Bayes' Rule is unintuitive, these are challenging moves, and formally updating your beliefs is less vulnerable to motivated reasoning.
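For concreteness, here is the kind of calculation in question: a minimal Python sketch with made-up numbers, showing the probability form and the odds form side by side. The surprisingly low posterior is exactly the sort of result that is hard to reach, or to reverse-engineer, by intuition alone.

```python
# A textbook-style Bayes update with made-up numbers, in both the
# probability form and the odds form.

p_a = 0.01              # P(A): prior probability of the hypothesis
p_b_given_a = 0.9       # P(B|A): probability of the evidence if A is true
p_b_given_not_a = 0.05  # P(B|~A): probability of the evidence if A is false

# Probability form: P(A|B) = P(B|A) P(A) / P(B)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior = p_b_given_a * p_a / p_b
print(f"P(A|B) = {posterior:.3f}")  # ~0.154, lower than many people guess

# Odds form: posterior odds = prior odds * likelihood ratio
prior_odds = p_a / (1 - p_a)
likelihood_ratio = p_b_given_a / p_b_given_not_a
posterior_odds = prior_odds * likelihood_ratio
print(f"P(A|B) via odds = {posterior_odds / (1 + posterior_odds):.3f}")
```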

Happy Thanksgiving!

Comment by utilistrutil on Apply for MATS Winter 2023-24! · 2023-11-02T18:41:59.257Z · LW · GW

Update: We have finalized our selection of mentors.

Comment by utilistrutil on Model, Care, Execution · 2023-08-17T01:06:42.655Z · LW · GW

I have a friend Balaam who has a very hard time saying no. If I ask him, “Would it bother you if I eat the last slice of pizza?” he will say “Of course that’s fine!” even if it would be somewhat upsetting or costly to him.

 

I think this is a reference to Guess/Ask/Tell Culture, so I'm linking that post for anyone interested :)

Comment by utilistrutil on Accidentally Load Bearing · 2023-08-13T01:40:13.496Z · LW · GW

This happens in chess all the time!

Comment by utilistrutil on Ways to buy time · 2023-05-19T18:53:05.396Z · LW · GW

You recommend Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover. I wonder if you would also include Why Would AI "Aim" to Defeat Humanity on this list? I know it came out after this post.