LessWrong 2.0 Reader
It's a bit of an out-of-body experience to see your own tweet in a newsletter! The location-finding model used is available on geospy.ai.
mishka on AI #62: Too Soon to Tell
success
d0themath on D0TheMath's Shortform
I don't really know what people mean when they try to compare "capabilities advancements" to "safety advancements". In one sense, it's pretty clear. The common units are "amount of time", so we should compare the marginal (probabilistic) difference between time-to-alignment and time-to-doom. But I think in practice people just look at vibes.
For example, if someone releases a new open source model people say that's a capabilities advance, and should not have been done. Yet I think there's a pretty good case that more well-trained open source models are better for time-to-alignment than for time-to-doom, since much alignment work ends up being done with them, and the marginal capabilities advance here is zero. Such work builds on the public state of the art, but not the private state of the art, which is probably far more advanced.
I also don't often see people making estimates of the time-wise differential impacts here. Maybe people think such things would be exfo/info-hazardous, but nobody even claims to have estimates when the topic comes up (even in private, though people are glad to talk about their hunches for what AI will look like in 5 years, or the types of advancements necessary for AGI), despite all the work on timelines. It's difficult to do this for a marginal advance, but not so much for larger research priorities, which are the sorts of things people should be focusing on anyway.
brendan-long on How would you navigate a severe financial emergency with no help or resources?
I'm not sure if you have the setup for this, but call centers tend to pay reasonably well, and some are online now and don't care where you work from.
You could try using your partner's connections to get a signing bonus or early-payment of his first few paychecks, then use that to cover moving.
It might be worth getting him to where the work is with a cheap one-way flight and the cheapest hotel you can find (or stay with friends if possible), then follow later when you have enough saved for moving. Or do something similar in Arizona (find a job that's too far to drive and stay in the cheapest motel you can find nearby until you've saved enough to move).
Some jobs provide transportation, room, and board, like cruise companies. If you can get one of those jobs, they'll get you where you need to be and provide somewhere to live during the season. This includes both people on the ships and some people on land (i.e. they don't expect employees to live year-round in Skagway, AK).
zach-stein-perlman on Questions for labs
Thanks. Briefly:
I'm not sure what the theory of change for listing such questions is.
In the context of policy advocacy, I think it's sometimes fine/good for labs to say somewhat different things publicly vs privately. Like, if I were in charge of a lab and believed (1) the EU AI Act will almost certainly pass and (2) it has some major bugs that make my life harder without safety benefits, I'd publicly say "I support (the goals of) the EU AI Act" and privately put some effort into removing those bugs, which is technically lobbying to weaken the Act.
(^I'm not claiming that particular labs did ~this rather than actually lobby against the Act. I just think it's messy and regulation isn't a one-dimensional thing that you're for or against.)
daniel-kokotajlo on Please stop publishing ideas/insights/research about AI
"If nobody publishes anything, how will alignment get solved?" — sure, it's harder for alignment researchers to succeed if they don't communicate publicly with one another — but it's not impossible. That's what dignity is about.
Huh, I have the opposite intuition. I was about to cite that exact same "Death with dignity" post as an argument for why you are wrong; it's undignified for us to stop trying to solve the alignment problem, and to stop publicly discussing the problem with each other, out of fear that some of our ideas might accidentally percolate into OpenAI and cause them to go slightly faster, and that this speedup might make the difference between victory and defeat. The dignified thing to do is think and talk about the problem.
I think this is undignified.
I agree that it would be safer if humanity were a collective hivemind that could coordinate to not build AI until we know how to build the best AI, and that people should differentially work on things that make the situation better rather than worse, and that this potentially includes keeping quiet about information that would make things worse.
The problem is—as you say—"[i]t's very rare that any research purely helps alignment"; you can't think about aligning AI without thinking about AI. In order to navigate the machine intelligence transition in the most dignified way, you want your civilization's best people to be doing their best thinking about the problem, and your best people can't do their best thinking under the conditions of paranoid secrecy.
Concretely, I've been studying some deep learning basics lately and have written a couple [LW · GW] posts [LW · GW] about things I've learned. I think this was good, not bad. I think I and my readers have a slightly better understanding of the technology in question than if I hadn't studied and hadn't written, and that better understanding will help us make better decisions in expectation.
This applies doubly so to work that aims to make AI understandable or helpful, rather than aligned—a helpful AI will help anyone
Sorry, what? I thought the fear was that we don't know how to make helpful AI at all. (And that people who think they're being helped by seductively helpful-sounding LLM assistants are being misled by surface appearances; the shoggoth underneath has its own desires that we won't like when it's powerful enough to pursue them autonomously.) In contrast, this almost makes it sound like you think it is plausible to align AI to its user's intent, but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.
seth-herd on Please stop publishing ideas/insights/research about AI
At the core, this is a reminder to not publish things that will help more with capabilities than alignment. That's perfectly reasonable.
The tone of the post suggests erring on the side of "safety" by not publishing things that have an uncertain safety/capabilities balance. I hope that wasn't the intent.
Because that does not make sense. Anything that advances alignment more than capabilities in expectation should be published.
You have to make a difficult judgment call for each publication. Be mindful of your bias in wanting to publish to show off your work and ideas. Get others' insights if you can do so reasonably quickly.
But at the end of the day, you have to make that judgment call. There's no consolation prize for saying "at least I didn't make the world end faster". If you're a utilitarian, winning the future is the only goal.
(If you're not a utilitarian, you might actually want a resolution faster so you and your loved ones have higher odds of surviving into the far future.)
lee_sharkey on Transcoders enable fine-grained interpretable circuit analysis for language models
I'm pretty sure that there's at least one other MATS group (unrelated to us) currently working on this, although I'm not certain about any of the details. Hopefully they release their research soon!
There's recent work published on this here [LW · GW] by Chris Mathwin, Dennis Akar, and me. The gated attention block is a kind of transcoder adapted for attention blocks.
Nice work by the way! I think this is a promising direction.
Note also the similar, but substantially different, use of the term transcoder here [AF · GW], whose problems were pointed out to me by Lucius. Addressing those problems helped to motivate our interest in the kind of transcoders that you've trained in your work!
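For readers unfamiliar with the term as used in this thread, here is a minimal sketch of a transcoder in the generic sense: a sparse-bottleneck module trained to imitate a block's input-to-output map (here an MLP block), rather than to reconstruct its own input like an SAE. The PyTorch snippet below is an illustrative assumption, not the gated attention block or either paper's exact setup; the dimensions, names, and loss are placeholders.

```python
# Illustrative sketch only (not the architecture from the linked work):
# a transcoder imitates a block's input -> output map through a sparse
# bottleneck, unlike an SAE, which reconstructs its own input.
import torch
import torch.nn as nn

class Transcoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 24576):
        super().__init__()
        self.enc = nn.Linear(d_model, d_features)
        self.dec = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        acts = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(acts), acts

def transcoder_loss(pred, target, acts, l1_coeff: float = 1e-3):
    # Match the original block's output, plus an L1 penalty encouraging sparse features.
    return ((pred - target) ** 2).mean() + l1_coeff * acts.abs().mean()
```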
But do they also generalize out of training distribution more similarly? If so, why?
Neither of them is going to generalize very well out of distribution, and to the extent they do, it will be via looking for features that were present in-distribution. The old adage applies: "to imagine 10-dimensional space, first imagine 3-space, then say 10 really hard."
My guess is that basically every learning system which tractably approximates Bayesian updating on noisy high-dimensional data is going to end up with roughly Gaussian OOD behavior. There have been some experiments where (non-adversarially-chosen) OOD samples quickly degrade to the uniform prior, but I don't think that's been super robustly studied.
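A minimal sketch of one such check, assuming a pretrained ImageNet classifier and random noise as a stand-in OOD batch (both illustrative choices, not the experiments referenced above): compare the model's predictive entropy on OOD inputs against the entropy of the uniform prior.

```python
# Sketch only: measure whether a classifier's predictions drift toward the
# uniform prior on out-of-distribution inputs, using softmax entropy.
# The model and the "OOD" batch (random noise) are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

def mean_entropy(batch: torch.Tensor) -> float:
    """Average predictive entropy (nats) over a batch of images."""
    with torch.no_grad():
        probs = F.softmax(model(batch), dim=-1)
    return float(-(probs * probs.clamp_min(1e-12).log()).sum(-1).mean())

uniform = torch.log(torch.tensor(1000.0)).item()  # entropy of the uniform prior over 1000 classes
ood_batch = torch.rand(16, 3, 224, 224)           # stand-in OOD inputs
print(f"uniform-prior entropy: {uniform:.2f} nats")
print(f"entropy on noise batch: {mean_entropy(ood_batch):.2f} nats")
```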
The way humans generalize OOD is not that our visual systems are natively equipped to generalize to contexts they have no way of knowing about (that would be a true violation of no-free-lunch theorems), but that through linguistic reflection & deliberate experimentation some of us can sometimes get a handle on the new domain, and then we use language to communicate that handle to others, who come up with things we didn't, etc. OOD generalization is a process at the (sub)cultural & whole-nervous-system level, not something that individual chunks of the brain can do well on their own.
This is also confusing/concerning for me. Why would it be necessary or helpful to have such a large dataset to align the shape/texture bias with humans?
Well, it might not be, but you need large datasets to motivate studying large models, as their performance on small datasets like ImageNet is often only marginally better.
A 20B-parameter ViT trained on 10M images at 224x224x3 is approximately 1 parameter for every 75 subpixels, and 2,000 parameters for every image. Classification is an easy enough objective that it very likely just overfits, unless you regularize it a ton, at which point it might still have the expected shape bias, at great expense. Training a 20B-parameter model is expensive; I don't think anyone has ever spent that much on a mere ImageNet classifier, and public datasets more than 10x the size of ImageNet with any kind of labels only started getting collected in 2021.
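A quick check of those ratios, using only the numbers already given:

```python
# Back-of-envelope arithmetic for a 20B-parameter ViT on 10M images at 224x224x3.
params = 20e9
images = 10e6
subpixels_per_image = 224 * 224 * 3            # 150,528
print(images * subpixels_per_image / params)   # ≈ 75 subpixels per parameter
print(params / images)                         # 2,000 parameters per image
```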
To motivate this a bit: humans don't see in frames, but let's pretend we do. At 60 fps for 12 h/day for 10 years, that's nearly 9.5 billion frames; ImageNet is 10 million images. Our visual cortex contains somewhere around 5 billion neurons, which is around 50 trillion parameters (at 1 param/synapse & 10k synapses/neuron, which is a number I remember being reasonable for the whole brain, but vision might be 1 or 2 OOM special in either direction).
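And the same back-of-envelope arithmetic for the human-side numbers, all taken from the comment above:

```python
# 60 fps for 12 h/day over 10 years of "frames", vs. ImageNet's 10M images.
frames = 60 * 60 * 60 * 12 * 365 * 10      # 60 fps * 3600 s/h * 12 h/day * 365 days * 10 years
print(f"{frames:.3g} frames")               # ≈ 9.46e9, i.e. nearly 9.5 billion
synapse_params = 5e9 * 1e4                  # ~5B visual-cortex neurons * ~10k synapses/neuron
print(f"{synapse_params:.3g} parameters")   # ≈ 5e13, i.e. ~50 trillion
```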