Posts

Send us example gnarly bugs 2023-12-10T05:23:00.773Z
Causal scrubbing: results on induction heads 2022-12-03T00:59:18.327Z
Causal scrubbing: results on a paren balance checker 2022-12-03T00:59:08.078Z
Tao Lin's Shortform 2021-07-30T21:03:00.931Z

Comments

Comment by Tao Lin (tao-lin) on Wei Dai's Shortform · 2024-09-26T17:04:52.916Z · LW · GW

Do these options have a chance to default / are the sellers stable enough?

Comment by Tao Lin (tao-lin) on What are the best arguments for/against AIs being "slightly 'nice'"? · 2024-09-25T23:03:53.592Z · LW · GW

A core part of Paul's arguments is that having 1/million of your values directed toward humans only applies a minute amount of selection pressure against you. It could be that coordination causes less kindness, because without coordination it's more likely that some fraction of agents have small vestigial values that never got selected against or intentionally removed.

Comment by Tao Lin (tao-lin) on The case for a negative alignment tax · 2024-09-19T01:08:23.983Z · LW · GW

To me, "alignment tax" usually refers only to alignment methods that don't cost-effectively increase capabilities, so if 90% of alignment methods did cost-effectively increase capabilities but 10% did not, I would still say there was an "alignment tax" - you just ignore the negatives.

Also, it's important to consider cost-effective capability gains rather than raw capability gains - if a lab knows of a way to increase capabilities more cost-effectively than an alignment method does, then using that money for alignment is a positive alignment tax.

Comment by Tao Lin (tao-lin) on Proveably Safe Self Driving Cars [Modulo Assumptions] · 2024-09-18T17:02:07.331Z · LW · GW

There's steganography; you'd need to limit the total bits not accounted for by the gating system, or something like that, to remove it.

Comment by Tao Lin (tao-lin) on Proveably Safe Self Driving Cars [Modulo Assumptions] · 2024-09-18T04:04:37.993Z · LW · GW

Yes, in some cases a much weaker system (weaker because it's constrained to be provable) can restrict the main AI, but in the case of LLM jailbreaks there is no particular hope that such a guard system could work (e.g. jailbreaks where the LLM answers in Base64 require the guard to understand Base64 and any other encoding the main AI could use).

Comment by Tao Lin (tao-lin) on In Defense of Open-Minded UDT · 2024-08-26T22:58:20.598Z · LW · GW

Interesting - this actually changed my mind, to the extent I had any beliefs about this already. I can see why you would want to update your prior, but the iterated mugging doesn't seem like the right type of thing that should cause you to update. My intuition is to pay all the single coin-flip muggings. For the digit-of-pi muggings, I want to consider how different this universe would be if the digit of pi were different. Even though both options are subjectively equally likely to me, one would be inconsistent with other observations, or less likely, or have something wrong with it, so I lean toward never paying.

Comment by Tao Lin (tao-lin) on The Pragmascope Idea · 2024-08-26T22:04:15.969Z · LW · GW

Train two nets, with different architectures (both capable of achieving zero training loss and good performance on the test set), on the same data.
...
Conceptually, this sort of experiment is intended to take all the stuff one network learned, and compare it to all the stuff the other network learned. It wouldn’t yield a full pragmascope, because it wouldn’t say anything about how to factor all the stuff a network learns into individual concepts, but it would give a very well-grounded starting point for translating stuff-in-one-net into stuff-in-another-net (to first/second-order approximation).

I don't see why this experiment is good. The Hessian similarity loss is purely a function of input/output behavior, and because both networks get 0 loss, their input/output behavior must be very similar; combined with the general smoothness of continuous optimization, that would already lead to similar Hessians. I think doing this in a case where the nets get nonzero loss (like ~all real-world scenarios) would be more meaningful, because it would show similarity despite input/output behavior being non-identical and some amount of lossy compression happening.

Comment by Tao Lin (tao-lin) on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-26T20:35:23.927Z · LW · GW

Yeah, I agree the movie has to be very high quality to work. This is a long shot, although the best rationalist novels are actually high quality, which gives me some hope that someone could write a great novel/movie outline that's more targeted at plausible ASI scenarios.

Comment by Tao Lin (tao-lin) on Please stop using mediocre AI art in your posts · 2024-08-26T19:41:39.972Z · LW · GW

It's sad that open-source models like Flux have a lot of potential for customized workflows and finetuning, but few people use them.

Comment by Tao Lin (tao-lin) on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-26T19:26:03.778Z · LW · GW

Yeah. One trajectory could be: someone in-community-ish writes an extremely good novel about a very realistic ASI scenario with the intention that it be adaptable into a movie, it becomes moderately popular, and it's accessible and pointed enough to do most of the guidance for the movie. I don't know exactly who could write this book; there are a few possibilities.

Comment by Tao Lin (tao-lin) on ... Wait, our models of semantics should inform fluid mechanics?!? · 2024-08-26T19:09:34.529Z · LW · GW

Another way this might fail is if fluid dynamics is too complex/difficult for you to constructively argue that your semantics are useful there. As an analogy, if you wanted to show that your semantics were useful for proving Fermat's Last Theorem, you would likely fail because you simply didn't apply enough power to the problem, and I think you may fail that way in fluid dynamics.

Comment by Tao Lin (tao-lin) on Would catching your AIs trying to escape convince AI developers to slow down or undeploy? · 2024-08-26T18:53:55.056Z · LW · GW

Great post!

I'm most optimistic about "feel the ASI" interventions to improve this. I think once people understand the scale and gravity of ASI, they will behave much more sensibly here. The thing I intuitively feel most optimistic about (without really analyzing it) is movies, or generally very high quality mass-appeal art.

Comment by Tao Lin (tao-lin) on The economics of space tethers · 2024-08-22T17:48:25.057Z · LW · GW

You can recover lost momentum by decelerating things to land. The OP mentions that briefly:

And they need a regular supply of falling mass to counter the momentum lost from boosting rockets. These considerations mean that tethers have to constantly adapt to their conditions, frequently repositioning and doing maintenance.

If every launch returns and lands on Earth, that would recover some but not all of the lost momentum, because of fuel spent on the trip. It's probably more complicated than that, though.

Comment by Tao Lin (tao-lin) on Zach Stein-Perlman's Shortform · 2024-08-21T19:31:30.631Z · LW · GW

Two versions with the same post-training, one with only 90% of the pretraining, are indeed very similar; no need to evaluate both. In practice it's more likely to be one model with 80% of the pretraining and 70% of the post-training of the final model, and that last 30% of post-training might be significant.

Comment by Tao Lin (tao-lin) on Zach Stein-Perlman's Shortform · 2024-08-20T22:55:21.624Z · LW · GW

 if you tested a recent version of the model and your tests have a large enough safety buffer, it's OK to not test the final model at all.

I agree in theory, but testing the final model feels worthwhile, because we want more direct observability and less complex reasoning in safety cases.

Comment by Tao Lin (tao-lin) on Recommendation: reports on the search for missing hiker Bill Ewasko · 2024-08-15T15:45:31.405Z · LW · GW

With modern drones, searching in places with as few trees as Joshua Tree could be done far more effectively. I don't know if any parks have trained teams with ~$50k worth of drones ready, but if they did, they could have found him quickly.

Comment by Tao Lin (tao-lin) on Truthseeking is the ground in which other principles grow · 2024-08-07T22:34:48.979Z · LW · GW

I am guilty of citing sources I don't believe in, particularly in machine learning. There's a common pattern where most papers are low quality, and no one can/will investigate the validity of other people's papers or write review papers, so you usually form beliefs from an ensemble of lots of individually unreliable papers plus your own experience. Then you're often asked for a citation and you're like "there's nothing public I believe in, but I guess I'll google papers claiming the thing I'm claiming and put those in". I think many ML people have ~given up on citing papers they believe in, including me.

Comment by Tao Lin (tao-lin) on Shutting Down the Lightcone Offices · 2024-08-07T06:29:14.878Z · LW · GW

I don't particularly like the status hierarchy and incentive landscape of the ML community, which seems quite well-optimized to cause human extinction

The incentives are indeed bad, but they're more incompetent than optimized to cause extinction.

Comment by Tao Lin (tao-lin) on New fast transformer inference ASIC — Sohu by Etched · 2024-07-03T20:29:45.476Z · LW · GW

The reason Etched was less bandwidth-limited is that they traded latency for throughput by batching prompts and completions together. GPUs could also do that, but they don't, in order to improve latency.

Comment by Tao Lin (tao-lin) on Daniel Kokotajlo's Shortform · 2024-07-02T22:57:29.824Z · LW · GW

The reason airplanes need speed is basically that their propeller/jet blades are too small to be efficient at low speed. You need a certain amount of force to lift off, and the more air you push against at once, the more force you get per unit of energy. Airplanes go sideways so that their wings, which are very big, can provide the lift instead of their engines. This also means that if you want to both go fast and hover efficiently, you need multiple mechanisms, because the low-volume, high-speed engine won't also be efficient at low speed.
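A rough momentum-theory sketch of that tradeoff (my own back-of-the-envelope framing, not something from the original thread): a propulsor that pushes a mass flow $\dot m$ of air downward at speed $v$ produces

$$F = \dot m\, v, \qquad P = \tfrac{1}{2}\, \dot m\, v^{2} = \frac{F^{2}}{2\dot m},$$

so for a fixed lift force $F$, the power required falls as the mass flow grows. Wings work on an enormous mass of air per second, which is why flying forward on wings is so much cheaper than hovering on a small, fast jet of air.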

Comment by Tao Lin (tao-lin) on Fabien's Shortform · 2024-06-26T17:34:47.455Z · LW · GW

Yeah, learning from distant near misses is important! It feels that way in risky electric unicycling.

Comment by Tao Lin (tao-lin) on johnswentworth's Shortform · 2024-06-21T18:48:30.002Z · LW · GW

No, the MI300X is not superior to Nvidia's chips, largely because it costs >2x as much to manufacture as Nvidia's chips.

Comment by Tao Lin (tao-lin) on Response to Aschenbrenner's "Situational Awareness" · 2024-06-10T16:53:50.054Z · LW · GW

This makes a much worse LessWrong post than Twitter thread; it's just a very rudimentary rehashing of long-standing debates.

Comment by Tao Lin (tao-lin) on We might be dropping the ball on Autonomous Replication and Adaptation. · 2024-06-04T23:36:23.304Z · LW · GW

For reference, just last week I rented three 8xH100 boxes without any KYC.

Comment by Tao Lin (tao-lin) on MIRI 2024 Communications Strategy · 2024-05-31T03:06:34.359Z · LW · GW

I don't think slaughtering billions of people would be very useful. As a reference point, wars between countries almost never result in killing that large a fraction of people.

Comment by Tao Lin (tao-lin) on Open Thread Spring 2024 · 2024-05-23T18:11:50.391Z · LW · GW

lol, Paul is a very non-disparaging person. He always makes his criticism constructive; I don't know if there's any public evidence of him disparaging anyone, regardless of NDAs.

Comment by Tao Lin (tao-lin) on Some ways of spending your time are better than others · 2024-03-13T23:50:42.877Z · LW · GW

I've recently gotten into partner dancing, and I think it's a pretty superior activity.

Comment by Tao Lin (tao-lin) on TurnTrout's shortform feed · 2024-03-12T01:22:41.876Z · LW · GW

One lesson you could take away from this is "pay attention to the data, not the process" - this happened because the data had longer successes than failures. If successes were more numerous than failures, many algorithms would have imitated those as well with null reward.

Comment by Tao Lin (tao-lin) on OpenAI's Sora is an agent · 2024-02-18T17:53:26.651Z · LW · GW

I think the fraction of training compute going toward agency vs. non-agency will be lower in video models than in LLMs, and LLMs will likely continue to be bigger, so video models will stay behind LLMs in overall agency.

Comment by Tao Lin (tao-lin) on Debating with More Persuasive LLMs Leads to More Truthful Answers · 2024-02-08T06:12:21.424Z · LW · GW

Helpfulness finetuning might make these models more capable when they're on the correct side of the debate. Sometimes RLHF(-like) models simply perform worse on tasks they're finetuned to avoid, even when they don't refuse or give up. It would be nice to try base-model debaters.

Comment by Tao Lin (tao-lin) on Preventing model exfiltration with upload limits · 2024-02-06T20:37:30.963Z · LW · GW

A core advantage of bandwidth limiting over other cybersecurity interventions is that it's a simple system we can make stronger arguments about, implemented on a simple processor, without the complexity and uncertainty of modern processors and OSes.
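As a sketch of how little logic such a box needs (illustrative only; the class name and the 10 GiB/day figure are my assumptions, not from the post):

```python
MAX_BYTES_PER_DAY = 10 * 1024**3  # assumed daily upload budget

class UploadLimiter:
    """Hard cap on outbound bytes, simple enough to run on a tiny auditable device."""

    def __init__(self, budget: int):
        self.budget = budget
        self.sent = 0

    def allow(self, packet_len: int) -> bool:
        # Count every outbound byte against the budget; refuse once it's exhausted.
        if self.sent + packet_len > self.budget:
            return False
        self.sent += packet_len
        return True

limiter = UploadLimiter(MAX_BYTES_PER_DAY)
print(limiter.allow(4096))  # True until the daily budget is used up
```

The entire policy is a counter and a comparison, which is the kind of thing you can actually make strong arguments about.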

Comment by Tao Lin (tao-lin) on Processor clock speeds are not how fast AIs think · 2024-01-30T01:04:18.844Z · LW · GW

No, the clock speed stays the same, but the clock-cycle latency of communication between regions increases, just like CPUs require more clock cycles to access memory than they used to.
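For concreteness (ballpark numbers I'm assuming, not from the post): DRAM round-trip latency has stayed around ~100 ns, so a faster clock just means more cycles spent waiting.

```python
latency_ns = 100  # assumed, roughly constant across hardware generations
for clock_ghz in (0.1, 1.0, 4.0):
    cycles_waiting = latency_ns * clock_ghz  # ns * (cycles per ns)
    print(f"{clock_ghz} GHz -> {cycles_waiting:.0f} cycles to reach memory")
```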

Comment by Tao Lin (tao-lin) on Will quantum randomness affect the 2028 election? · 2024-01-26T20:25:00.809Z · LW · GW

Do we have any reason to believe that particular election won't be close?

Comment by Tao Lin (tao-lin) on There is way too much serendipity · 2024-01-22T03:39:34.868Z · LW · GW

I'd expect artificial sweeteners are already very cheap, and most people want chemicals that have been tested more thoroughly.

Comment by tao-lin on [deleted post] 2024-01-21T21:12:46.685Z

There's an Effective Altruism VR Discord group. It used to have regular VRChat meetups around 2021 but doesn't have much activity now, I think.

Comment by Tao Lin (tao-lin) on What’s up with LLMs representing XORs of arbitrary features? · 2024-01-07T23:56:25.115Z · LW · GW

I'd be interested in experiments with more diverse data. Maybe this only works because the passages are very short, simple, and uniform, and are using very superposition-y information that wouldn't exist in longer and more diverse text.

Comment by Tao Lin (tao-lin) on Succession · 2023-12-22T21:09:35.456Z · LW · GW

I thought about this for a minute and landed on no, not accounting for the Lorentz factor. Things hitting on the side have about the same relative velocity as things hitting from the front. Because they're hitting the side, they could either bounce off or dump all their tangential kinetic energy into each other; since all the relative velocity is tangential, they could in principle interact without exchanging significant energy. But probably the side impacts are just as dangerous, which might make them more dangerous in practice because you have less armor on the side.

Comment by Tao Lin (tao-lin) on Succession · 2023-12-22T19:40:25.409Z · LW · GW

Probes probably want a very skinny aspect ratio. If cosmic dust travels at 20 km/s, that's ~15,000 times slower than the probe is traveling, so maybe that means the probe should be, e.g., 10 cm wide and 1.5 km long.
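The arithmetic behind that guess (the near-light-speed cruise velocity is my assumption; the 20 km/s dust figure and the 10 cm width are from the comment):

```python
probe_speed_km_s = 300_000   # assumed ~c cruise speed
dust_speed_km_s = 20         # lateral speed of cosmic dust
ratio = probe_speed_km_s / dust_speed_km_s  # ~15,000

width_m = 0.10               # 10 cm wide
length_m = width_m * ratio   # ~1,500 m, i.e. ~1.5 km long
print(ratio, length_m)
```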

Comment by Tao Lin (tao-lin) on When will GPT-5 come out? Prediction markets vs. Extrapolation · 2023-12-12T17:51:45.406Z · LW · GW

It's important to note that GPT-4 is more like a 300x scale equivalent of GPT-3, not 100x, based on GPT-4 being trained with (rumored) 2e25 FLOPs vs. contemporary GPT-3-level models (LLaMA-2-7B) being trained on 8e22 FLOPs (250 times the compute for that particular pair).
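For concreteness, the ratio implied by those (rumored, unconfirmed) figures:

```python
gpt4_flops = 2e25         # rumored GPT-4 training compute
gpt3_level_flops = 8e22   # LLaMA-2-7B, roughly GPT-3-level
print(gpt4_flops / gpt3_level_flops)  # 250.0
```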

Comment by Tao Lin (tao-lin) on When will GPT-5 come out? Prediction markets vs. Extrapolation · 2023-12-12T17:44:06.021Z · LW · GW

Some months before release they had an RLHF-ed model, where the RLHF was significantly worse on most dimensions than in the model they finally released. This early RLHF-ed model was mentioned in, e.g., Sparks of AGI.

Comment by Tao Lin (tao-lin) on The Offense-Defense Balance Rarely Changes · 2023-12-09T18:00:22.395Z · LW · GW

If AI does change the offense-defense balance, it could be because defending an AI (that doesn't need to protect humans) is fundamentally different from defending humans, allowing the AI to spend much less on defense.

Comment by Tao Lin (tao-lin) on Google Gemini Announced · 2023-12-06T23:31:15.941Z · LW · GW

Video can get extremely expensive without specific architectural support. E.g. a folder of images takes up >10x the space of the equivalent video, and using, say, 1000 tokens per frame at 30 frames/second is a lot of compute.
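The token-rate arithmetic behind that, using the illustrative figures above:

```python
tokens_per_frame = 1000
frames_per_second = 30
tokens_per_second = tokens_per_frame * frames_per_second  # 30,000
tokens_per_minute = tokens_per_second * 60                # 1,800,000
print(tokens_per_second, tokens_per_minute)
```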

Comment by Tao Lin (tao-lin) on Google Gemini Announced · 2023-12-06T19:24:49.162Z · LW · GW

It looks slightly behind GPT-4-base on benchmarks. On the tasks where Gemini uses chain-of-thought best-of-32 with optimized prompts, it beats GPT-4-base, but on the ones where it doesn't, it's the same or behind.

Comment by Tao Lin (tao-lin) on AI Timelines · 2023-11-13T23:55:34.172Z · LW · GW

E.g. suppose some AI system was trained to learn new video games: each RL episode was it being shown a video game it had never seen, and it's supposed to try to play it; its reward is the score it gets. Then after training this system, you show it a whole new type of video game it has never seen (maybe it was trained on platformers and point-and-click adventures and visual novels, and now you show it a first-person-shooter for the first time). Suppose it could get decent at the first-person-shooter after like a subjective hour of messing around with it. If you saw that demo in 2025, how would that update your timelines?

 

Time constraints may make this much harder. A lot of games require multiple inputs per second (e.g. a double jump), and at any given time the AI with the best transfer learning will have inference that's far too slow to play as well as a human. (You could slow the game down, of course.)

Comment by Tao Lin (tao-lin) on AI Timelines · 2023-11-11T20:11:41.683Z · LW · GW

Leela Zero uses MCTS; it doesn't play at a superhuman level in one forward pass (as GPT-4 can in some subdomains) (I think - I didn't find any evaluations of Leela Zero at one forward pass), and I'd guess that the network itself doesn't contain any more generalized game-playing circuitry than an LLM; it just has good intuitions for Go.

 

Nit:

Subjectively there is clear improvement between 7b vs. 70b vs. GPT-4, each step 1.5-2 OOMs of training compute.

1.5 to 2 OOMs? 7B to 70B is 1 OOM of compute; adding in Chinchilla efficiency would make it something like 1.5 OOMs of effective compute, not 2. And LLaMA-70B to GPT-4 is 1 OOM of effective compute going by OpenAI's naming - LLaMA-70B is about as good as GPT-3.5. I'd personally guess GPT-4 is 1.5 OOMs of effective compute above LLaMA-70B, not 2.
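A minimal check of the raw-compute part of that nit (assuming both models were trained on the same number of tokens; the "effective compute" adjustments above are judgment calls I don't try to compute):

```python
import math

params_7b, params_70b = 7e9, 70e9
# With equal token counts, training FLOPs scale ~linearly with parameter count.
print(math.log10(params_70b / params_7b))  # 1.0 OOM of raw compute
```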

Comment by Tao Lin (tao-lin) on Integrity in AI Governance and Advocacy · 2023-11-09T01:33:08.940Z · LW · GW

I think the heuristic "people take AI risk seriously in proportion to how seriously they take AGI" is a very good one.

Agree. Most people will naturally buy AGI safety if they really believe in AGI. Getting from "no AGI" to "AGI" is the hard part, not from "AGI" to "AGI safety".

Comment by Tao Lin (tao-lin) on Integrity in AI Governance and Advocacy · 2023-11-08T21:50:13.163Z · LW · GW

Then ChatGPT-4 would still have had low rate limits, so most people would still have been more informed by ChatGPT-3.5.

Comment by Tao Lin (tao-lin) on Integrity in AI Governance and Advocacy · 2023-11-08T20:36:39.528Z · LW · GW

Like, a big problem with doing this kind of information management where you try to hide your connections and affiliations is that it's really hard for people to come to trust you again afterwards. If you get caught doing this, it's extremely hard to rebuild trust that you aren't doing this in the future, and I think this dynamic usually results in some pretty intense immune reactions when people fully catch up with what is happening.

I would have guessed that this is just not the level of trust people operate at. For most things in policy, people don't really act as if their opposition is in good faith, so there's not much to lose here. (Weakly held.)

Comment by Tao Lin (tao-lin) on Are language models good at making predictions? · 2023-11-06T16:02:51.276Z · LW · GW

Chat or instruction-finetuned models have poor prediction calibration, whereas base models (in some cases) have perfect calibration. Also, forecasting is just hard. So I'd expect chat models to ~always fail, base models to fail slightly less often, but I'd expect finetuned models (on a somewhat large dataset) to be somewhat useful.

Comment by Tao Lin (tao-lin) on Alexander Gietelink Oldenziel's Shortform · 2023-11-02T14:23:36.093Z · LW · GW

Another huge missed opportunity is thermal vision. Thermal infrared vision would be a gigantic boon for hunting at night, and you might expect, e.g., owls and hawks to use it to spot prey hundreds of meters away in pitch darkness, but no animals do (some have thermal sensing, but only at extremely short range).