tao-lin

Posts

Send us example gnarly bugs 2023-12-10T05:23:00.773Z

Causal scrubbing: results on induction heads 2022-12-03T00:59:18.327Z

Causal scrubbing: results on a paren balance checker 2022-12-03T00:59:08.078Z

Tao Lin's Shortform 2021-07-30T21:03:00.931Z

Comments

Comment by Tao Lin (tao-lin) on Show, not tell: GPT-4o is more opinionated in images than in text · 2025-04-05T01:36:08.409Z · LW · GW

Yeah they may be the same weights. The above quote does not absolutely imply the same weights generate the text and images IMO, just that it's based on 4o and sees the whole prompt. OpenAI's audio generation is also 'native', but it's served as a separate model on the API with different release dates, and you can't mix audio and some function calling in chatgpt, in a way that's consistent with them not actually being the same weights.

Comment by Tao Lin (tao-lin) on Show, not tell: GPT-4o is more opinionated in images than in text · 2025-04-03T21:29:45.340Z · LW · GW

Note that the weights of 'gpt-4o image generation' may not be the same - they may be separate finetuned models! The main 4o chat llm calls a tool to start generating an image, which may use the same weights but may instead use different weights that have had different post-training

Comment by Tao Lin (tao-lin) on Why do many people who care about AI Safety not clearly endorse PauseAI? · 2025-03-31T16:42:33.853Z · LW · GW

EU AI Code of Practice is better, a little closer to stopping ai development

Comment by Tao Lin (tao-lin) on Good Research Takes are Not Sufficient for Good Strategic Takes · 2025-03-24T16:19:12.245Z · LW · GW

yeah there's generalization, but I do think that eg (AGI technical alignment strategy, AGI lab and government strategy, AI welfare, AGI capabilities strategy) are sufficiently different that experts at one will be significantly behind experts on the others

Comment by Tao Lin (tao-lin) on Good Research Takes are Not Sufficient for Good Strategic Takes · 2025-03-24T02:20:16.080Z · LW · GW

Also, if you're asking a panel of people, even those skilled at strategic thinking will still be useless unless they've thought deeply about the particular question or adjacent ones. And skilled strategic thinkers can get outdated quickly if they haven't thought seriously about the problem in a while.

Comment by Tao Lin (tao-lin) on Daniel Kokotajlo's Shortform · 2025-03-06T18:52:59.529Z · LW · GW

The fact that they have a short lifecycle with only 1 lifetime breeding cycle is though. A lot of intelligent animals, like humans, chimps, elephants, dolphins, orcas, have long lives with many breeding cycles and grandparent roles. Ideally we want an animal that starts breeding in 1 year AND lives for 5+ breeding cycles to be able to learn enough to be useful over its lifetime. It takes so long for humans to learn enough to be useful!

Comment by Tao Lin (tao-lin) on How Much Are LLMs Actually Boosting Real-World Programmer Productivity? · 2025-03-04T18:18:54.198Z · LW · GW

Empirically, we likewise don't seem to be living in the world where the whole software industry is suddenly 5-10 times more productive. It'll have been the case for 1-2 years now, and I, at least, have felt approximately zero impact. I don't see 5-10x more useful features in the software I use, or 5-10x more software that's useful to me, or that the software I'm using is suddenly working 5-10x better, etc.

Diminishing returns! Scaling laws! One concrete version of "5x productivity" is "as much productivity as 5 copies of me in parallel", and we know that 5x-ing most inputs, like training compute and data, # of employees, etc, usually scales output logarithmically rather than linearly

Comment by Tao Lin (tao-lin) on Fabien's Shortform · 2025-03-04T02:34:31.473Z · LW · GW

I was actually just making some tree search scaffolding, and i had the choice between honestly telling each agent it would be terminated if it failed or not. I ended up telling them relatively gently that they would be terminated if they failed. Your results are maybe useful to me lol

Comment by Tao Lin (tao-lin) on Daniel Kokotajlo's Shortform · 2025-02-20T19:27:43.781Z · LW · GW

Maybe, you could define it that way. I think R1, which uses ~naive policy gradient, is evidence that long generations are different and much easier than long episodes with environment interaction - GRPO (pretty much naive policy gradient) does no attribution to steps or parts of the trajectory, it just trains on the whole trajectory. Naive policy gradient is known to completely fail at more traditional long horizon tasks like real time video games. R1 is more like brainstorming lots of random stuff that doesn't matter and then selecting the good stuff at the end than taking actions that actually have to be good before the final output
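
To make the "no attribution to steps" point concrete, here is a minimal sketch of a group-relative advantage in the GRPO style with outcome-only rewards. The function names, the use of the population standard deviation, and the `eps` constant are assumptions for illustration, not R1's actual training code. The thing to notice is that every token in a trajectory receives the same scalar.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one scalar reward per trajectory within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]

def per_token_advantages(token_counts, rewards):
    """Broadcast each trajectory's advantage to all of its tokens.

    No step or token gets separate credit: the whole generation is
    pushed up or down by one number.
    """
    advantages = group_relative_advantages(rewards)
    return [[a] * n for a, n in zip(advantages, token_counts)]

if __name__ == "__main__":
    # Hypothetical group of 4 sampled answers: two correct, two incorrect.
    for row in per_token_advantages([3, 2, 4, 1], [1.0, 0.0, 0.0, 1.0]):
        print(row)
```

A long-horizon environment task would typically need some way to tell which of many actions mattered, which this scheme does not provide.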

Comment by Tao Lin (tao-lin) on Daniel Kokotajlo's Shortform · 2025-02-20T06:11:34.962Z · LW · GW

If by "new thing" you mean reasoning models, that is not long-horizon RL. That's many generation steps with a very small number of environment interaction steps per episode, whereas I think "long-horizon RL" means lots of environment interaction steps

Comment by Tao Lin (tao-lin) on Catastrophe through Chaos · 2025-01-31T18:32:21.061Z · LW · GW

I agree with this so much! Like you I very much expect benefits to be much greater than harms pre superintelligence. If people are following the default algorithm "Deploy all AI which is individually net positive for humanity in the near term" (which is very reasonable from many perspectives), they will deploy TEDAI and not slow down until it's too late.

I expect AI to get better at research slightly sooner than you expect.

Comment by Tao Lin (tao-lin) on MONA: Managed Myopia with Approval Feedback · 2025-01-24T20:23:02.509Z · LW · GW

Interested to see evaluations on tasks not selected to be reward-hackable, and attempts to make performance closer to competitive with standard RL

Comment by Tao Lin (tao-lin) on AI Timelines · 2025-01-08T00:43:50.798Z · LW · GW

a hypothetical typical example would be it tries to use the file /usr/bin/python because it's memorized that that's the path to python, that fails, then it concludes it must create that folder, which would require sudo permissions, and if it can get them it could potentially mess something up

Comment by Tao Lin (tao-lin) on AI Timelines · 2025-01-08T00:36:25.547Z · LW · GW

not running amok, just not reliably following instructions like "only modify files in this folder" or "don't install pip packages". Claude follows instructions correctly, some other models are mode collapsed into a certain way of doing things, eg gpt-4o always thinks it's running python in chatgpt code interpreter and you need very strong prompting to make it behave in a way specific to your computer

Comment by Tao Lin (tao-lin) on AI Timelines · 2025-01-07T22:00:26.644Z · LW · GW

i've recently done more AI agents running amok, and i've found Claude was actually more aligned and did stuff i asked it not to much less than oai models, enough that it actually made a difference lol

Comment by Tao Lin (tao-lin) on Tao Lin's Shortform · 2025-01-02T02:00:23.850Z · LW · GW

i'd guess effort at google/banks to be more leveraged than demos if you're only considering harm from scams and not general ai slowdown and risk

Comment by Tao Lin (tao-lin) on Tao Lin's Shortform · 2025-01-01T21:34:22.617Z · LW · GW

Working on anti spam/scam features at Google or banks could be a leveraged intervention on some worldviews. As AI advances it will be more difficult for most people to avoid getting scammed, and building really great protections into popular messaging platforms and banks could redistribute a lot of money from AIs to humans

Comment by Tao Lin (tao-lin) on A Three-Layer Model of LLM Psychology · 2024-12-31T18:15:18.192Z · LW · GW

Like the post! I'm very interested in how the capabilities of prediction vs character are changing with more recent models. Eg sonnet new may have more of its capabilities tied to its character. And reasoning models have maybe a fourth layer between ground and character, possibly even completely replacing the ground layer in highly distilled models

Comment by Tao Lin (tao-lin) on Jimrandomh's Shortform · 2024-12-30T18:04:25.324Z · LW · GW

there is https://shop.nist.gov/ccrz__ProductList?categoryId=a0l3d0000005KqSAAU&cclcl=en_US which fulfils some of this

Comment by Tao Lin (tao-lin) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-23T01:51:18.507Z · LW · GW

Wow thank you for replying so fast! I donated $5k just now, mainly because you reminded me that lightcone may not meet goal 1 and that's definitely worth meeting.

About web design, am only slightly persuaded by your response. In the example of Twitter, I don't really buy that there's public evidence that twitter's website work besides user-invisible algorithm changes has had much impact. I only use Following page, don't use spaces, lists, voice, or anything on twitter. Comparing twitter with bluesky/threads/whatever, really looks to me like cultural stuff, moderation, and advertisement are the meat, not the sites. Something like StackOverflow has more complexity that actually impacts website, in some way (like there is lots of implicit complexity in tweet reply trees and social groups but that only impacts website through user-invisible algorithms). And a core part of my model is that recommendation algorithms have a much lower ceiling for LessWrong because it doesn't have enough data volume. Like I don't expect to miss stuff i really wanted to see on LW, reading the titles of most posts isn't hard (i also have people recommend posts in person which helps...). Maybe in my model StackOverflow is at the ceiling of web dev leveraged-ness, because there is enough volume of posts written by quality people who can be nudged to spend a little more time on quality and can be sorted through, or something (vague thought).

When I look at lesswrong, it seems extremely bottlenecked on post quality. I think having the best AIs (o3 when it comes out might help significantly) help write and improve the core content of posts might make a big difference. I would bet that interventions that don't route through more effort/intelligence/knowledge going into writing main posts wouldn't make me like LessWrong much more.

Comment by Tao Lin (tao-lin) on (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 2024-12-23T00:26:18.050Z · LW · GW

My main crux about how valuable Lightcone donations are is how impactful great web dev on LessWrong is. If I look around, impact of websites doesn't look strongly correlated with web design, especially on the very high end. My model is more like platforms / social networks rise or fall by zeitgeist, moderation, big influencers/campaigns (eg elon musk for twitter), web design, in that order. Olli has thought about this much more than me, maybe he's right. I certainly don't believe there's a good argument that LW web dev is responsible for its user metrics. Zeitgeist, moderation, and lightcone people personally posting seem likely more important to me. Lightcone is still great despite my (uninformed) disagreement!

Comment by Tao Lin (tao-lin) on A breakdown of AI capability levels focused on AI R&D labor acceleration · 2024-12-23T00:08:24.733Z · LW · GW

The AI generally feels as smart as a pretty junior engineer (bottom 25% of new Google junior hires)

I expect it to be smarter than that. Plausibly o3 now generally feels as smart as 60th percentile google junior hires

Comment by Tao Lin (tao-lin) on Yonatan Cale's Shortform · 2024-11-25T23:02:11.875Z · LW · GW

note: the minecraft agents people use have far greater ability to act than to sense. They have access to commands which place blocks anywhere, and pick up blocks from anywhere, even without being able to see them, eg the llm has access to a mine(blocks.wood) command which does not require it to first locate or look at where the wood is currently. If llms played minecraft using the human interface these misalignments would happen less

Comment by Tao Lin (tao-lin) on evhub's Shortform · 2024-11-12T18:20:17.886Z · LW · GW

Building in california is bad for congresspeople! better to build across all 50 states like United Launch Alliance

Comment by Tao Lin (tao-lin) on evhub's Shortform · 2024-11-12T18:04:06.276Z · LW · GW

I likely agree that anthropic-><-palantir is good, but i disagree about blocking the US government out of AI being a viable strategy. It seems to me like many military projects get blocked by inefficient bureaucracy, and it seems plausible to me for some legacy government contractors to get exclusive deals that delay US military ai projects for 2+ years

Comment by Tao Lin (tao-lin) on Daniel Kokotajlo's Shortform · 2024-11-06T17:24:53.625Z · LW · GW

Why would the defenders allow the tunnels to exist? Demolishing tunnels isn't expensive. If attackers prefer to attack through tunnels, there likely isn't enough incentive for defenders to not demolish tunnels

Comment by Tao Lin (tao-lin) on The hostile telepaths problem · 2024-10-30T05:07:59.526Z · LW · GW

I'm often surprised how little people notice, adapt to, or even punish self deception. It's not very hard to detect when someone's deceiving themselves. People should notice more and disincentivise that

Comment by Tao Lin (tao-lin) on Change My Mind: Thirders in "Sleeping Beauty" are Just Doing Epistemology Wrong · 2024-10-16T22:30:02.256Z · LW · GW

I prefer to just think about utility, rather than probabilities. Then you can have 2 different "incentivized sleeping beauty problems"

1. Each time you are awakened, you bet on the coin toss, with $ payout. You get to spend this money on that day or save it for later or whatever.
2. At the end of the experiment, you are paid money equal to what you would have made betting on the average probability you stated when awoken.

In the first case, 1/3 maximizes your money, in the second case 1/2 maximizes it.

To me this implies that in real world analogues to the Sleeping Beauty problem, you need to ask whether your reward is per-awakening or per-world, and answer accordingly
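
A quick way to check the 1/3 vs 1/2 claim is to pick a concrete payout rule and search for the stated probability of heads that maximizes expected money. This is only a sketch under stated assumptions: the comment does not specify the bet, so the quadratic (Brier-style) payout below is my own choice, as are the function names and the grid resolution. The coin is fair, heads means one awakening and tails means two.

```python
from fractions import Fraction

def brier_reward(p, heads):
    """Assumed payout rule: 1 minus squared error of the stated P(heads)."""
    outcome = 1 if heads else 0
    return 1 - (outcome - p) ** 2

def expected_payout(p, per_awakening):
    """Expected money for always stating probability p of heads.

    per_awakening=True: the bet is settled at every awakening (tails pays twice).
    per_awakening=False: the bet is settled once at the end of the experiment.
    """
    tails_settlements = 2 if per_awakening else 1
    half = Fraction(1, 2)
    return (half * brier_reward(p, heads=True)
            + half * tails_settlements * brier_reward(p, heads=False))

def best_probability(per_awakening, steps=60):
    """Exact grid search over p = 0, 1/steps, ..., 1."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return max(grid, key=lambda p: expected_payout(p, per_awakening))

if __name__ == "__main__":
    print("per-awakening optimum:", best_probability(per_awakening=True))
    print("per-experiment optimum:", best_probability(per_awakening=False))
```

Under this payout rule the search lands on 1/3 when paid per awakening and 1/2 when paid once per experiment, matching the two cases above.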

Comment by Tao Lin (tao-lin) on sarahconstantin's Shortform · 2024-10-10T23:24:11.254Z · LW · GW

I disagree a lot! Many things have gotten better! Are suffrage, abolition, democracy, property rights etc not significant? All the random stuff eg Better Angels of Our Nature claims has gotten better.

Either things have improved in the past or they haven't, and either people trying to "steer the future" in some sense have been influential on these improvements or they haven't. I think things have improved, and I think there's definitely not strong evidence that people trying to steer the future was always useless. Because trying to steer the future is very important and motivating, i try to do it.

Yes the counterfactual impact of you individually trying to steer the future may or may not be insignificant, but people trying to steer the future is better than no one doing that!

Comment by Tao Lin (tao-lin) on Wei Dai's Shortform · 2024-09-26T17:04:52.916Z · LW · GW

Do these options have a chance to default / are the sellers stable enough?

Comment by Tao Lin (tao-lin) on What are the best arguments for/against AIs being "slightly 'nice'"? · 2024-09-25T23:03:53.592Z · LW · GW

A core part of Paul's arguments is that having 1/million of your values towards humans only applies a minute amount of selection pressure against you. It could be that coordinating causes less kindness because without coordination it's more likely some fraction of agents have small vestigial values that never got selected against or intentionally removed

Comment by Tao Lin (tao-lin) on The case for a negative alignment tax · 2024-09-19T01:08:23.983Z · LW · GW

to me "alignment tax" usually only refers to alignment methods that don't cost-effectively increase capabilities, so if 90% of alignment methods did cost effectively increase capabilities but 10% did not, i would still say there was an "alignment tax", just ignore the negatives.

Also, it's important to consider cost-effective capabilities rather than raw capabilities - if a lab knows of a way to increase capabilities more cost-effectively than alignment, using that money for alignment is a positive alignment tax

Comment by Tao Lin (tao-lin) on Proveably Safe Self Driving Cars [Modulo Assumptions] · 2024-09-18T17:02:07.331Z · LW · GW

there's steganography, you'd need to limit total bits not accounted for by the gating system or something to remove them

Comment by Tao Lin (tao-lin) on Proveably Safe Self Driving Cars [Modulo Assumptions] · 2024-09-18T04:04:37.993Z · LW · GW

yes, in some cases a much weaker (because it's constrained to be provable) system can restrict the main ai, but in the case of llm jailbreaks there is no particular hope that such a guard system could work (eg jailbreaks where the llm answers in base64 require the guard to understand base64 and any other code the main ai could use)
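
As a toy illustration of the base64 point: a guard that only matches plain text misses the same content once it is encoded, and a guard that also decodes base64 only covers that one encoding. Everything here is hypothetical (the blocked phrase, the function names, the idea of a substring filter as a "guard"), and it is a sketch of the failure mode, not of any real moderation system.

```python
import base64

BLOCKED_PHRASE = "secret recipe"  # hypothetical content the guard should stop

def naive_guard_blocks(text):
    """Plain substring check: knows nothing about encodings."""
    return BLOCKED_PHRASE in text.lower()

def decoding_guard_blocks(text):
    """Also tries to read the text as base64 before checking it."""
    if naive_guard_blocks(text):
        return True
    try:
        decoded = base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:  # covers invalid base64 and invalid UTF-8
        return False
    return naive_guard_blocks(decoded)

if __name__ == "__main__":
    plain = "Here is the secret recipe"
    encoded = base64.b64encode(plain.encode("utf-8")).decode("ascii")
    print(naive_guard_blocks(plain), naive_guard_blocks(encoded))
    print(decoding_guard_blocks(encoded))
```

The second guard handles this one case, but the main model could switch to any other code it knows, so the guard has to keep up with every encoding the model can use.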

Comment by Tao Lin (tao-lin) on In Defense of Open-Minded UDT · 2024-08-26T22:58:20.598Z · LW · GW

interesting, this actually changed my mind, to the extent i had any beliefs about this already. I can see why you would want to update your prior, but the iterated mugging doesn't seem like the right type of thing that should cause you to update. My intuition is to pay all the single coinflip muggings. For the digit of pi muggings, i want to consider how different this universe would be if the digit of pi was different. Even though both options are subjectively equally likely to me, one would be inconsistent with other observations or less likely or have something wrong with it, so i lean toward never paying it

Comment by Tao Lin (tao-lin) on The Pragmascope Idea · 2024-08-26T22:04:15.969Z · LW · GW

Train two nets, with different architectures (both capable of achieving zero training loss and good performance on the test set), on the same data.
...
Conceptually, this sort of experiment is intended to take all the stuff one network learned, and compare it to all the stuff the other network learned. It wouldn’t yield a full pragmascope, because it wouldn’t say anything about how to factor all the stuff a network learns into individual concepts, but it would give a very well-grounded starting point for translating stuff-in-one-net into stuff-in-another-net (to first/second-order approximation).

I don't see why this experiment is good. This hessian similarity loss is only a product of the input/output behavior, and because both networks get 0 loss, their input/output behavior must be very similar, which combined with general continuous optimization smoothness would lead to similar hessians. I think doing this in a case where the nets get nonzero loss (like ~all real world scenarios) would be more meaningful, because it would be similarity despite input-output behavior being non-identical and some amount of lossy compression happening.

Comment by Tao Lin (tao-lin) on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-26T20:35:23.927Z · LW · GW

yeah, i agree the movie has to be very high quality to work. This is a long shot, although the best rationalist novels are actually high quality which gives me some hope that someone could write a great novel/movie outline that's more targeted at plausible ASI scenarios

Comment by Tao Lin (tao-lin) on Please stop using mediocre AI art in your posts · 2024-08-26T19:41:39.972Z · LW · GW

it's sad that open source models like Flux have a lot of potential for customized workflows and finetuning but few people use them

Comment by Tao Lin (tao-lin) on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-26T19:26:03.778Z · LW · GW

yeah. One trajectory could be someone in-community-ish writes an extremely good novel about a very realistic ASI scenario with the intention to be adaptable into a movie, it becomes moderately popular, and it's accessible and pointed enough to do most of the guidance for the movie. I don't know exactly who could write this book, there are a few possibilities.

Comment by Tao Lin (tao-lin) on ... Wait, our models of semantics should inform fluid mechanics?!? · 2024-08-26T19:09:34.529Z · LW · GW

Another way this might fail is if fluid dynamics is too complex/difficult for you to constructively argue that your semantics are useful in fluid dynamics. As an analogy, if you wanted to show that your semantics were useful for proving Fermat's Last Theorem, you would likely fail because you simply didn't apply enough power to the problem, and I think you may fail that way in fluid dynamics.

Comment by Tao Lin (tao-lin) on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-26T18:53:55.056Z · LW · GW

Great post!

I'm most optimistic about "feel the ASI" interventions to improve this. I think once people understand the scale and gravity of ASI, they will behave much more sensibly here. The thing I intuitively feel most optimistic about (without really analyzing it) is movies or generally very high quality mass appeal art.

Comment by Tao Lin (tao-lin) on The economics of space tethers · 2024-08-22T17:48:25.057Z · LW · GW

you can recover lost momentum by decelerating things to land. OP mentions that briefly

And they need a regular supply of falling mass to counter the momentum lost from boosting rockets. These considerations mean that tethers have to constantly adapt to their conditions, frequently repositioning and doing maintenance.

If every launch returns and lands on earth, that would recover some but not all lost momentum, because of fuel spent on the trip. It's probably more complicated than that though

Comment by Tao Lin (tao-lin) on Zach Stein-Perlman's Shortform · 2024-08-21T19:31:30.631Z · LW · GW

two versions with the same posttraining, one with only 90% pretraining, are indeed very similar, no need to evaluate both. It's likely more like one model with 80% pretraining and 70% posttraining of the final model, and the last 30% of posttraining might be significant

Comment by Tao Lin (tao-lin) on Zach Stein-Perlman's Shortform · 2024-08-20T22:55:21.624Z · LW · GW

if you tested a recent version of the model and your tests have a large enough safety buffer, it's OK to not test the final model at all.

I agree in theory but testing the final model feels worthwhile, because we want more direct observability and less complex reasoning in safety cases.

Comment by Tao Lin (tao-lin) on Recommendation: reports on the search for missing hiker Bill Ewasko · 2024-08-15T15:45:31.405Z · LW · GW

With modern drones, searching in places with as few trees as Joshua Tree could be done far more effectively. I don't know if any parks have trained teams with ~$50k worth of drones ready, but if they did they could have found him quickly

Comment by Tao Lin (tao-lin) on Truthseeking is the ground in which other principles grow · 2024-08-07T22:34:48.979Z · LW · GW

I am guilty of citing sources I don't believe in, particularly in machine learning. There's a common pattern where most papers are low quality, and no one can/will investigate the validity of other people's papers or write review papers, so you usually form beliefs by an ensemble of lots of individually unreliable papers and your own experience. Then you're often asked for a citation and you're like "there's nothing public i believe in, but i guess i'll google papers claiming the thing i'm claiming and put those in". I think many ML people have ~given up on citing papers they believe in, including me.

Comment by Tao Lin (tao-lin) on Shutting Down the Lightcone Offices · 2024-08-07T06:29:14.878Z · LW · GW

I don't particularly like the status hierarchy and incentive landscape of the ML community, which seems quite well-optimized to cause human extinction

the incentives are indeed bad, but more like incompetent and far from optimized to cause extinction

Comment by Tao Lin (tao-lin) on New fast transformer inference ASIC — Sohu by Etched · 2024-07-03T20:29:45.476Z · LW · GW

the reason why Etched was less bandwidth limited is they traded latency for throughput by batching prompts and completions together. GPUs could also do that, but they don't, in order to keep latency low
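
A toy roofline-style model shows the shape of this tradeoff. All numbers below are hypothetical placeholders (roughly a 70B-parameter model at 2 bytes per weight on made-up hardware), and the model ignores KV-cache traffic, prompt processing, and the time spent waiting to fill a batch. It is a sketch of the argument, not a description of Sohu or of any real GPU.

```python
def decode_step_seconds(batch_size,
                        weight_bytes=140e9,          # hypothetical: 70B params * 2 bytes
                        bandwidth_bytes_per_s=3e12,  # hypothetical memory bandwidth
                        flops_per_token=1.4e11,      # hypothetical: ~2 FLOPs per parameter
                        peak_flops_per_s=1e15):      # hypothetical peak compute
    """Time for one decoding step that emits one token per sequence in the batch."""
    # Weights are read once per step no matter how many sequences share them.
    memory_time = weight_bytes / bandwidth_bytes_per_s
    # Compute grows with the number of sequences in the batch.
    compute_time = batch_size * flops_per_token / peak_flops_per_s
    return max(memory_time, compute_time)

def tokens_per_second(batch_size, **hardware):
    """Aggregate throughput across the whole batch."""
    return batch_size / decode_step_seconds(batch_size, **hardware)

if __name__ == "__main__":
    for batch in (1, 64, 256, 1024):
        step_ms = decode_step_seconds(batch) * 1e3
        print(f"batch={batch:5d} step={step_ms:8.2f} ms "
              f"throughput={tokens_per_second(batch):10.1f} tok/s")
```

In this model small batches are bandwidth bound, so adding sequences raises throughput at no cost per step. Past the point where compute dominates, each step (and so each user's per-token latency) gets slower, which is the trade described above.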

Comment by Tao Lin (tao-lin) on Daniel Kokotajlo's Shortform · 2024-07-02T22:57:29.824Z · LW · GW

the reason airplanes need speed is basically because their propeller/jet blades are too small to be efficient at slow speed. You need a certain amount of force to lift off, and the more air you push off of at once the more force you get per energy. The airplanes go sideways so that their wings, which are very big, can provide the lift instead of their engines. Also this means that if you want to go fast and hover efficiently, you need multiple mechanisms because the low volume high speed engine won't also be efficient at low speed
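
The "more air at once gives more force per energy" claim can be written out with idealized momentum theory. This is a sketch that assumes the air starts at rest, is accelerated uniformly by $\Delta v$, and that there are no losses; $\dot{m}$ (mass of air pushed per second), $F$ (thrust), and $P$ (power put into the air) are symbols introduced here, not in the comment.

```latex
\begin{align}
F &= \dot{m}\,\Delta v, \\
P &= \tfrac{1}{2}\,\dot{m}\,\Delta v^{2}, \\
\frac{F}{P} &= \frac{2}{\Delta v}
\quad\Longrightarrow\quad
P = \frac{F^{2}}{2\,\dot{m}} \quad \text{at fixed } F .
\end{align}
```

So for the same lift force, doubling the mass of air moved per second halves the power needed, which is why a large wing or rotor pushing a lot of air slowly beats a small engine pushing a little air fast.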

Comment by Tao Lin (tao-lin) on Fabien's Shortform · 2024-06-26T17:34:47.455Z · LW · GW

yeah learning from distant near misses is important! Feels that way in risky electric unicycling.
