I've recently gotten into partner dancing and I think it's a pretty superior activity
One lesson you could take away from this is "pay attention to the data, not the process" - this happened because the data had longer successes than failures. If successes were more numerous than failures, many algorithms would have imitated those as well with null reward.
I think the "fraction of training compute" going towards agency vs non-agency will be lower in video models than llms, and llms will likely continue to be bigger, so video models will stay behind llms in overall agency.
Helpfulness finetuning might make these models more capable when they're on the correct side of the debate. Sometimes RLHF(-like) models simply perform worse on tasks they're finetuned to avoid, even when they don't refuse or give up. Would be nice to try base model debaters.
A core advantage of bandwidth limiting over other cybersec interventions is that it's a simple system we can make stronger arguments about, implemented on a simple processor, without the complexity and uncertainty of modern processors and OSes.
No, clock speed stays the same, but the clock-cycle latency of communication between regions increases. Just like CPUs require more clock cycles to access memory than they used to.
Do we have any reason to believe that particular election won't be close?
I'd expect artificial sweeteners are already very cheap, and most people want more-tested chemicals.
There exists an Effective Altruism VR discord group. It used to have regular VRChat meetups in like 2021 but doesn't have much activity now, I think.
I'd be interested in experiments with more diverse data. Maybe this only works because the passages are very short, simple, and uniform, and use very superposition-y information that wouldn't exist in longer and more diverse text.
I thought about this for a minute and landed on no, accounting for the Lorentz factor. Things hitting the side have about the same relative velocity as things hitting from the front. Because they're hitting the side, they could either bounce off or dump all their tangent kinetic energy into each other; since all the relative velocity is tangent, they could in principle interact without exchanging significant energy. But probably the side impacts are just as dangerous. Which might make them more dangerous, because you have less armor on the side.
Probes probably want a very skinny aspect ratio. If cosmic dust travels at 20km/s, that's 15k times slower than the probe is travelling, so maybe that means the probe should be eg 10cm wide and 1.5km long.
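The geometry as quick arithmetic (the 15k speed ratio and 10cm width are the comment's numbers; the logic is just that length scales with the speed ratio):

```python
# While a dust grain drifts sideways across the probe's 10cm width,
# the probe travels ~15,000x that distance forward, so a probe
# 15,000x longer than it is wide keeps the same sideways exposure
# per unit of forward travel.
speed_ratio = 15_000        # probe speed / dust speed (~20 km/s)
probe_width_m = 0.10        # 10 cm
probe_length_m = probe_width_m * speed_ratio
print(probe_length_m)       # 1500.0, ie 1.5 km
```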
Important to note that gpt4 is more like a 300x scale equivalent of gpt3, not 100x, based on gpt4 being trained with (rumored) 2e25 flops vs contemporary gpt3-level models (llama2-7b) being trained on 8e22 flops (250 times the compute for that particular pair).
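Spelling out the ratio (these are the rumored figures quoted above, not confirmed numbers):

```python
gpt4_flops = 2e25          # rumored gpt4 training compute
gpt3_level_flops = 8e22    # eg llama2-7b
print(round(gpt4_flops / gpt3_level_flops))   # 250
```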
Some months before release they had an RLHF-ed model that was significantly worse on most dimensions than the model they finally released. This early RLHF-ed model was mentioned in eg Sparks of AGI.
If AI does change the offence-defence balance, it could be because defending an AI (that doesn't need to protect humans) is fundamentally different than defending humans, allowing the AI to spend much less on defence.
Video can get extremely expensive without specific architectural support. Eg a folder of images takes up >10x the space of the equivalent video, and using eg 1000 tokens per frame at 30 frames/second is a lot of compute.
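The compute point as arithmetic (1000 tokens/frame and 30 fps are the comment's example numbers):

```python
# One minute of video at these rates already exceeds most llm
# context windows.
tokens_per_frame = 1000
frames_per_second = 30
tokens_per_second = tokens_per_frame * frames_per_second
print(tokens_per_second)        # 30000
print(tokens_per_second * 60)   # 1800000 tokens per minute
```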
Looks slightly behind gpt-4-base in benchmarks. On the tasks where gemini uses chain-of-thought best-of-32 with optimized prompts it beats gpt-4-base, but on the ones where it doesn't, it's the same or behind.
E.g. suppose some AI system was trained to learn new video games: each RL episode was it being shown a video game it had never seen, and it's supposed to try to play it; its reward is the score it gets. Then after training this system, you show it a whole new type of video game it has never seen (maybe it was trained on platformers and point-and-click adventures and visual novels, and now you show it a first-person-shooter for the first time). Suppose it could get decent at the first-person-shooter after like a subjective hour of messing around with it. If you saw that demo in 2025, how would that update your timelines?
Time constraints may make this much harder. Like, a lot of games require multiple inputs per second (eg double jump), and at any given time the AI with the best transfer learning will have inference far too slow to play as well as a human. (You could slow the game down, of course.)
Leela Zero uses MCTS; it doesn't play superhuman in one forward pass (like gpt-4 can do in some subdomains) (I think; I didn't find any evaluations of Leela Zero at 1 forward pass), and I'd guess that the network itself doesn't contain any more generalized game-playing circuitry than an llm, it just has good intuitions for Go.
Nit:
> Subjectively there is clear improvement between 7b vs. 70b vs. GPT-4, each step 1.5-2 OOMs of training compute.
1.5 to 2 OOMs? 7b to 70b is 1 OOM of compute; adding in chinchilla efficiency would make it like 1.5 OOMs of effective compute, not 2. And llama 70b to gpt-4 is 1 OOM of effective compute according to openai naming - llama70b is about as good as gpt-3.5. And I'd personally guess gpt4 is 1.5 OOMs of effective compute above llama70b, not 2.
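A sketch of the raw-compute part of that claim (assuming, as with llama2, both model sizes see the same number of training tokens, so compute scales linearly with parameter count; the "effective compute" figures are judgment calls on top of this):

```python
import math

# Same training tokens at both sizes => compute ~ parameter count,
# so 7b -> 70b is exactly one order of magnitude of raw compute.
raw_compute_oom = math.log10(70 / 7)
print(raw_compute_oom)   # 1.0
```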
I think the heuristic "people take AI risk seriously in proportion to how seriously they take AGI" is a very good one.
Agree. Most people will naturally buy AGI safety if they really believe in AGI. "No AGI -> AGI" is the hard part, not "AGI -> AGI safety".
Then ChatGPT-4 would still have had low rate limits, so most people would still be more informed by ChatGPT-3.5.
Like, a big problem with doing this kind of information management where you try to hide your connections and affiliations is that it's really hard for people to come to trust you again afterwards. If you get caught doing this, it's extremely hard to rebuild trust that you aren't doing this in the future, and I think this dynamic usually results in some pretty intense immune reactions when people fully catch up with what is happening.
I would have guessed that this is just not the level of trust people operate at. like for most things in policy people don't really act like their opposition is in good faith so there's not much to lose here. (weakly held)
Chat or instruction-finetuned models have poor prediction calibration, whereas base models (in some cases) have perfect calibration. Also, forecasting is just hard. So I'd expect chat models to ~always fail, base models to fail slightly less, but I'd expect finetuned models (on a somewhat large dataset) to be somewhat useful.
Another huge missed opportunity is thermal vision. Thermal infrared vision is a gigantic boon for hunting at night, and you might expect eg owls and hawks to use it to spot prey hundreds of meters away in pitch darkness, but no animals do (some have thermal sensing, but only at extremely short range).
What about simulating the user being sad? If you train on both sides of transcripts, this would straightforwardly happen, and even if you only trained on the assistant side it would still "simulate" the user as a generalization of pretraining.
I feel like militaries would really want to collect huge datasets of camouflaged military personnel and equipment, usually from long distance. If people collected million-image datasets and used the largest models it might be feasible.
One piece of evidence for ML people not understanding things is the low popularity of μ-parametrization (muP). After it was released there was every theoretical and empirical reason to use it, but most big projects (like llama 2?) just don't.
"Median person in ML" varies wildly by population. To people in AI safety, capabilities researchers at OpenAI et al represent the quintessential "ML people", and most of them understand deep learning about as well as Bengio. I agree that the median person employed in an ML role knows very little, but you can do a lot of ML without running into the average "ML" person.
I would bet money, maybe $2k, that I can create a robust system, using a combination of all the image compression techniques I can conveniently find and a variety of ml models with self-consistency, that achieves >50% robust accuracy even after another year of attacks. Edit: on inputs that don't look obviously corrupted or mangled to an average human.
I expect lossy image compression to perform better than downsampling or noising because it's directly destroying the information that humans don't notice while keeping information that humans notice. Especially if we develop stronger lossy encoding using vision models, it really feels like we should be able to optimize our encodings to destroy the vast majority of human-unnoticed information.
I think the tax will be surprisingly low. Most of the time people or chatbots see images, they don't actually need to see much detail. As an analogy, I know multiple people who are almost legally blind (they see 5-10x less resolution than normal), and they can have long conversations about the physical world ("where's my cup?\n[points] over there" etc) without anyone noticing that their vision is subpar.
For example, if you ask a chatbot "Suggest some landscaping products to use on my property
I claim the chatbot will be able to respond almost as well as if you had given it
even though the former image is 1kb vs the latter's 500kb. Defense feels pretty tractable if each input image is only ~1kb.
It's very important to know how well multimodal models perform when only shown extreme lossy compressed images, and whether these attacks work through extreme lossy compression. Feels like compressing everything horrifically before showing to models might solve it.
If the human brain had around 2.5 petabytes of storage, that would decrease my credence in AI being brain-like, because I believe AI is on track to match human intelligence in its current paradigm, so the brain being different just means the brain is different.
Pythia is meant for this
I do think open sourcing is better, because there was already a lot of public attention and results on llm capabilities which are messy and misleading, and open sourcing one eval like this might improve our understanding a lot. Also, there are tons of llm agent projects/startups trying to build hype, so if you drop a benchmark here you are unlikely to attract unwanted attention (I'm guessing). I largely agree with https://www.lesswrong.com/posts/fRSj2W4Fjje8rQWm9/thoughts-on-sharing-information-about-language-model
Do you want to open source the code for this?
Base model sycophancy feels very dependent on the training distribution and prompting. I'd guess there are some prompts where a pretrained model will always agree with other voices in the prompt, and some where it would disagree, because on some websites people agree a lot and on some websites people disagree, and maybe there's an effect where it will switch positions every step to simulate an argument between two teams.
I'd be very afraid that the track would flex and absorb energy because it has such a long unsupported span, so I'd first test a top-two-thirds roll to see the flex. Also I might calculate the rotational KE of the ball at the end, to see if that absorbs a significant fraction of the energy.
If automating alignment research worked, but took 1000+ tokens per researcher-second, it would be much more difficult to develop, because you'd need to run the system for 50k-1M tokens between each "reward signal or such". Once it's 10 or less tokens per researcher second, it'll be easy to develop and improve quickly.
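Where those token counts come from, assuming (my numbers) a useful feedback signal arrives every ~50 to ~1000 researcher-seconds of simulated work:

```python
tokens_per_researcher_second = 1000
# Assumed range of researcher-seconds between feedback signals:
gap_low_s, gap_high_s = 50, 1000
print(gap_low_s * tokens_per_researcher_second)    # 50000
print(gap_high_s * tokens_per_researcher_second)   # 1000000
```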
AIs could learn to cooperate with perfect selfishness, but humans and AIs usually learn easier-to-compute heuristics / "value shards" early in training, which persist to some extent after the agent discovers the true optimal policy, although reflection or continued training could stamp out the value shards later.
The big reason why humans are cosmopolitan might be that we evolved in multipolar environments, where helping others is instrumental. If so, just training AIs in multipolar environments that incentivize cooperation could be all it takes to get some amount of instrumental-made-terminal-by-optimization-failure cosmopolitanism.
if I were doing this, I'd use gpt-4 to translate it into the style of a specific person, preferably a deceased public figure, then edit the result. I'd guess GPTs are better at translating to a specific style than removing style
I think the fact that some process produced the image and showed it to you is a lot of evidence. Your theories need to be compatible with something intelligent deciding to produce the image and show it to you. Therefore you could in principle (although I think unlikely) arrive at GR from a render of a simulated apple, by considering universes that support intelligence where said intelligence would make an image of an apple.
One takeaway from this would be "CoT is more accurate at the limit of the model's capabilities". Given this, you could have a policy of only using the least capable model for every task to make CoT more influential. Of course this means you always get barely-passable model performance, which is unfortunate. Also, people often use CoT in cases where it intuitively wouldn't help, such as common sense NLP questions, where I'd expect influential CoT to be pretty unnatural.
Based on semi-private rumors and such, I think their current best model is significantly behind gpt4.
>Why did you decide to go with the equivalence of 1 token = 1 bit? Since a token can usually take on the order of 10k to 100k possible values, wouldn't 1 token equal 13-17 bits a more accurate equivalence?
LLMs make very inefficient use of their context size because they're writing human-like text, which is predictable. Human text is like 0.6 bits/byte, so maybe 2.5 bits per token. Text used in language model scaffolding and such tends to be even more predictable (by maybe 30%).
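Where the ~2.5 bits/token figure comes from (the ~4 bytes per token average is my assumption for English tokenizers):

```python
bits_per_byte = 0.6     # rough entropy estimate for human text
bytes_per_token = 4     # typical English tokenizer average (assumption)
print(bits_per_byte * bytes_per_token)   # 2.4, roughly the figure above
```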
Why measure flops at FP32? All the big training runs in the last 2 years are FP16 right?
Foundation Models tend to have a more limited type of orthogonality - they're good at pursuing any goal that's plausible under the training distribution, meaning they can pursue any goal that humans would plausibly have (with some caveats I guess). This is most true without outcome-based RL on top of the foundation model, but I'd guess some of the orthogonality transfers through RL.
Imo they are silent bc they're failing at capabilities and don't see popular games (which are flashier) as good research anymore
When I first saw "save all weights to on-chip hardware", I thought it would be super expensive, but actually saving like 5 times the GPU's memory to a separate flash chip would only cost $20 (80GB*5 at 5 cents per gigabyte for flash storage). It can be way cheaper bc it's low bandwidth and slow.
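The cost arithmetic, spelled out (prices are the comment's rough figures):

```python
gpu_memory_gb = 80
copies = 5                   # "like 5 times the GPU's memory"
flash_price_per_gb = 0.05    # 5 cents per gigabyte
print(gpu_memory_gb * copies * flash_price_per_gb)   # 20.0
```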