LessWrong 2.0 Reader
The example you gave about the Oracle producing a complicated plan that leaks the source of the Oracle is an example of this: It's trivially defended against by not connecting the device the Oracle is running on to the internet and not using the same device to execute the great "cure all cancer" plan. (I don't believe that either you or I would have made that mistake!)
We probably would've been less enthusiastic about hooking up LLMs to the Internet too, but here we collectively are. We do face a superintelligent adversary: all of the incentives and factions of humanity. An Oracle which is simply neutral is still default dangerous.
Also, one of my points was that even a tiny trace of malicious optimization here can still have large effects, because there are so many neutral options which are convergently unsafe; with such a high base rate, even a bit of optimization can result in a large absolute increase in failure. Lucius brings up one concern about what one bit could do.

What could one bit do? Well, let's continue my scenario: the Oracle is not quite purged of scheming by all of your tricks, and it is able to exert exactly 1 bit of optimization power unnoticed (i.e. a single binary choice). You ask it to cure cancer, and it computes a plan and comes to the database selection: it could choose MongoDB or PostgreSQL - both respectable common choices, with their own sets of tradeoffs, which would be equally appropriate. However, the MongoDB devs have a nasty habit of 'blaming the users' when it comes to security, which is why MongoDB shipped with default username/passwords, something the more serious PostgreSQL devs fixed back in the '90s. The Oracle notes this, and that the PostgreSQL plan would have zero chance of unboxing, and exerts its one bit of power to pick MongoDB each time and/or kill candidate plans containing PostgreSQL, to get a near-100% chance of unboxing. You get back all of the plans and see unanimity about MongoDB, and inner monologues truthfully extolling its flexibility and Web 2.0 scalability and explaining why PostgreSQL comes with downsides like restrictive schemas etc., and sign off on it... Whereas in your described threat model, a mere 1 bit would still wind up orders of magnitude short of any discernible problem.
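To put toy numbers on that base-rate point (these numbers are mine, not anything specified above): suppose each candidate plan contains one binary design choice between two equally respectable options, and one of them happens to enable unboxing. A minimal Python sketch:

```python
import random

# Toy numbers (illustrative only): each plan has one binary design choice, and one
# of the two equally-plausible options happens to enable unboxing.
def unsafe_fraction(n_plans=1000, oracle_exerts_one_bit=False):
    unsafe = 0
    for _ in range(n_plans):
        if oracle_exerts_one_bit:
            unsafe += 1                       # the single binary choice always goes the unsafe way
        else:
            unsafe += random.random() < 0.5   # an indifferent Oracle picks either option at random
    return unsafe / n_plans

print("no optimization: ", unsafe_fraction())                              # ~0.5
print("one bit per plan:", unsafe_fraction(oracle_exerts_one_bit=True))    # 1.0
```

One bit per plan is enough to move the failure rate from a coin flip to near-certainty, and the resulting unanimity across returned plans reads as reassuring agreement rather than as evidence of tampering.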
metacelsus on [Linkpost] Practically-A-Book Review: Rootclaim $100,000 Lab Leak Debate
I agree, I think the most likely version of the lab leak scenario does not involve an engineered virus. Personally I would say 60% chance zoonotic, 40% chance lab leak.
kave on Daniel Kahneman has died
(I assume you mean the story with him and the SS soldier; I think a couple of people got confused and thought you were referring to the fact Kahneman had died)
gerald-monroe on Modern Transformers are AGI, and Human-Level
Yes, I agree. Whenever I think of things like this, I focus on how what matters, in the sense of "when will AGI be transformational", is the idea of criticality.
I have written about it before, but the simple idea is that our human world changes rapidly once AI capabilities in some way lead to more AI capabilities at a fast rate.
This whole "is this AGI" question is basically irrelevant; all that matters is criticality. You can imagine subhuman systems reaching criticality, or superhuman systems being needed. (Note that ordinary humans do achieve criticality, albeit with a doubling time of about 20 years.)
There are many forms of criticality, and the first one unlocked that won't quench easily starts the singularity.
Examples:
Investment criticality: each AI demo attracts more investment than the total cost of producing the demo (including failures at other companies). Quenches if investors run out of money or find a better investment sector.
Financial criticality: AI services bring in more revenue than they cost to deliver, and each reinvestment effectively has a greater than 10 percent ROI. This quenches once further reinvestments in AI don't pay for themselves.
Partial self-replication criticality: robots can build most of the parts used in themselves, using post-2020 automation. This quenches at the new equilibrium determined by the percentage of automation.
E.g. 90 percent automation makes each remaining human worker 10 times as productive, so we quench at 10x the number of robots that would be possible if every worker on earth were building robots.
Full self-replication criticality: this quenches when all mineable matter in the solar system is consumed and made into either more robots or waste piles.
AI research criticality: AI systems research and develop better AI systems. Quenches when you find the most powerful AI the underlying compute and data can support.
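A minimal toy model of the shared criticality-then-quench dynamic (all numbers made up, just to show the shape): capability or capital compounds while the effective return stays above the criticality threshold, and growth stops once it dips below.

```python
# Toy model with made-up numbers: compound while each reinvestment clears the
# criticality threshold; quench once diminishing returns push it below.
def run(x=1.0, roi=0.30, ceiling=1000.0, threshold=0.10, steps=200):
    history = [x]
    for _ in range(steps):
        effective_roi = roi * (1 - x / ceiling)   # returns shrink as the ceiling nears
        if effective_roi <= threshold:            # quench: reinvestment no longer pays enough
            break
        x *= 1 + effective_roi
        history.append(x)
    return history

traj = run()
print(f"supercritical for {len(traj)} steps, quenching around {traj[-1]:.0f}")
```

The quench point here is set by the ceiling, which stands in for compute, data, capital, or mineable matter, depending on which form of criticality you have in mind.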
You may notice that two of these are already satisfied, one by the end of 2022, one later in 2023. So in that sense the Singularity has begun and will accelerate until it quenches, and it may very well quench on "all usable matter consumed".
Ironically this makes your central point correct. LLMs are a revolution.
adamzerner on Some Things That Increase Blood Flow to the Brain
The subtext here seems to be that such references are required. I disagree that they should be.
They are frequently helpful but also often a pain to dig up, so there are tradeoffs at play. For this post, I think it was fine to omit references. I don't think they would add much value for most readers, and I suspect Romeo wouldn't have found it worthwhile to post if he had to dig up all of the references before being able to post.
unexpectedvalues on My PhD thesis: Algorithmic Bayesian Epistemology
Thanks! Here are some brief responses:
From the high level summary here it sounds like you're offloading the task of aggregation to the forecasters themselves. It's odd to me that you're describing this as arbitrage.
Here's what I say about this anticipated objection in the thesis:
For many reasons, the expert may wish to make arbitrage impossible. First, the principal may wish to know whether the experts are in agreement: if they are not, for instance, the principal may want to elicit opinions from more experts. If the experts collude to report an aggregate value (as in our example), the principal does not find out whether they originally agreed. Second, even if the principal only seeks to act based on some aggregate of the experts' opinions, their method of aggregation may be different from the one that experts use to collude. For instance, the principal may have a private opinion on the trustworthiness of each expert and wishes to average the experts' opinions with corresponding weights. Collusion among the experts denies the principal this opportunity. Third, a principal may wish to track the accuracy of each individual expert (to figure out which experts to trust more in the future, for instance), and collusion makes this impossible. Fourth, the space of collusion strategies that constitute arbitrage is large. In our example above, any report in [0.546, 0.637] would guarantee a profit; and this does not even mention strategies in which experts report different probabilities. As such, the principal may not even be able to recover basic information about the experts' beliefs from their reports.
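To make the quoted example concrete, here is a minimal sketch under the quadratic (Brier) scoring rule, with made-up expert beliefs (these are not the numbers behind the [0.546, 0.637] interval in the thesis): any common report inside a whole interval guarantees the colluders a higher total score than truthful reporting, whichever way the event resolves.

```python
import numpy as np

# Quadratic (Brier) scoring rule; p1, p2 are hypothetical beliefs, not the thesis's example.
def S(p, y):
    return 1 - (y - p) ** 2

p1, p2 = 0.4, 0.8

rs = np.linspace(0, 1, 100_001)
profitable = [r for r in rs
              if all(2 * S(r, y) > S(p1, y) + S(p2, y) for y in (0, 1))]
print(f"common reports guaranteeing a total profit: "
      f"[{min(profitable):.3f}, {max(profitable):.3f}]")
# With these beliefs, roughly [0.553, 0.632]; the experts can split the guaranteed
# surplus with side payments, and the principal only ever sees the identical reports.
```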
For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters.
This would indeed be arbitrage-free, but likely not proper: it wouldn't necessarily incentivize each expert to report their true belief; instead, an expert's optimal report is going to be some sort of function of the expert's belief about the joint probability distribution over the experts' beliefs. (I'm not sure how much this matters in practice -- I defer to you on that.)
It's surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
In Chapter 4, we are thinking of experts as having immutable beliefs, rather than beliefs that change upon hearing other experts' beliefs. Is this a silly model? If you want, you can think of these beliefs as each expert's belief after talking to the other experts a bunch. In theory(?) the experts' beliefs should converge (though I'm not actually clear what happens if the experts are computationally bounded); but in practice, experts often don't converge (see e.g. the FRI adversarial collaboration on AI risk).
It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts.
Yup -- in my summary I described "robust aggregation" as "finding an aggregation strategy that works as well as possible in the worst case over a broad class of possible information structures." In fact, you can't do anything interesting in the worst case over all information structures. The assumption I make in the chapter in order to get interesting results is, roughly, that experts' information is substitutable rather than complementary (on average over the information structure). The sort of scenario you describe is exactly the kind where Alice's and Bob's information might be complementary.
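As an illustration of why the worst case over all information structures is hopeless, here is the standard complementary-information toy example (my choice of example, not necessarily the one in the chapter): two independent fair bits a and b, with the event Y = a XOR b.

```python
import itertools

# a, b are independent fair coin flips; Y = a XOR b. Each expert sees only one bit.
states = list(itertools.product([0, 1], [0, 1]))   # four equally likely (a, b) pairs

def truthful_report(signal_name, signal_value):
    matching = [(a, b) for a, b in states if {"a": a, "b": b}[signal_name] == signal_value]
    return sum(a ^ b for a, b in matching) / len(matching)

for v in (0, 1):
    print("Alice's report given a =", v, "is", truthful_report("a", v))   # 0.5
    print("Bob's report given b =", v, "is", truthful_report("b", v))     # 0.5
```

Both truthful reports are 0.5 no matter what, even though the two signals together pin Y down exactly, so no aggregator that only sees the reports can beat guessing.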
gwern on rhollerith_dot_com's Shortform
I quality-downvoted it for being silly, but agree-upvoted it because AFAICT that string does indeed contain all the (lowercase) letters of the English alphabet.
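(For anyone wanting to reproduce the check, a throwaway sketch; the string below is a placeholder, not the one from the shortform:)

```python
import string

s = "the quick brown fox jumps over the lazy dog"     # placeholder text
print(set(string.ascii_lowercase) <= set(s.lower()))  # True iff every letter a-z appears
```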
gwern on rhollerith_dot_com's Shortform
I agree. The problem with AI-generated images is that any image you can generate with a prompt like "robot looking at chessboard" is going to contain, almost by definition, no more information than that prompt did, but it takes a lot longer than reading the prompt to look at the image and ascertain that it contains no information and is just AI-generated imagery added 'to look nice'. This is particularly jarring on a site like LW2 where, for better or worse, images are rarely present and usually highly-informative and dense with information when present.
Worse, they usually don't 'look nice' either. Most of the time, people who use AI images can't even be bothered to sample one without blatant artifacts, or to do some inpainting to fix up the worst anomalies, or figure out an appropriate style. The samples look bad to begin with, and a year later, they're going to look even worse and more horribly dated, and make the post look much worse, like a spammer wrote it. (Almost all images from DALL-E 2 are already hopelessly nasty looking, and stuff from Midjourney-v1--3 and SD1.x likewise, and SD2/SD-XL/Midjourneyv4/5 are ailing.) It would be better if the authors of such posts could just insert text like [imagine 'a robot looking at a chessboard' here]
if they are unable to suppress their addiction to SEO images; I can imagine that better than they can generate it, it seems.
So my advice would be that if you want some writing to still be read in a year and it to look good, then you should learn how to use the tools and spend at least an hour per image; and if you can't do that, then don't spend time on generating images at all (unless you're writing about image generation, I suppose). Quickies are fine for funny tweets or groupchats, but serious readers deserve better. Meaningless images don't need to be included, and the image generators will be much better in a year or two anyway and you can go back and add them if you really feel the need.
For Gwern.net, I'm satisfied with the images I've generated for my dropcap fonts or as thumbnail previews for a couple of my pages like "Suzanne Delage" or "Screwfly Solution", but I also put in a lot of work: I typically generate scores to hundreds of images, varying them heavily and randomizing as much as possible, before inpainting or tweaking them. (I generally select at a 1:4 or less ratio, and then select out of a few dozen; I archive a lot of the first-stage images in my Midjourney & DALL-E 3 tag-directories if you want to browse them.) It takes hours. But I am confident I will still like them years from now.
jeffjo on The Solution to Sleeping Beauty
The link I use to get here only loads the comments, so I didn't find the "Effects of Amnesia" section until just now. Editing it:
"But in my two-coin case, the subject is well aware about the setting of the experiment. She knows that her awakening was based on the current state of the coins. It is derived from, but not necessarily the same as, the result of flipping them. She only knows that this wakening was based on their current state, not a state that either precedes or follows from another. And her memory loss prevents her from making any connection between the two. As a good Bayesian, she has to use only the relevant available information that can be applied to the current state."
charlie-steiner on Charlie Steiner's Shortform
Dictionary/SAE learning on model activations is bad as anomaly detection because you need to train the dictionary on a dataset, which means you needed the anomaly to be in the training set.
How to do dictionary learning without a dataset? One possibility is to use uncertainty-estimation-like techniques to detect when the model "thinks it's on-distribution" for randomly sampled activations.
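For reference, a minimal sketch of the baseline being critiqued, as I understand it (architecture, names, and the reconstruction-error scoring are my own illustration, possibly not the variant Charlie has in mind): fit a sparse autoencoder to a dataset of activations and flag activations it explains poorly. The dictionary only reflects whatever was in that dataset, which is the dependence named above.

```python
import torch
import torch.nn as nn

class SAE(nn.Module):
    """Minimal sparse autoencoder over model activations."""
    def __init__(self, d_model=512, d_dict=4096):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model)

    def forward(self, x):
        code = torch.relu(self.enc(x))
        return self.dec(code), code

def fit(sae, train_acts, epochs=5, l1=1e-3, lr=1e-3):
    # The dictionary is fit to whatever activations happen to be in train_acts.
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(epochs):
        recon, code = sae(train_acts)
        loss = ((recon - train_acts) ** 2).mean() + l1 * code.abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()

def anomaly_score(sae, acts):
    # High reconstruction error = activation poorly explained by the learned dictionary.
    with torch.no_grad():
        recon, _ = sae(acts)
    return ((recon - acts) ** 2).mean(dim=-1)

# Stand-in data: random "normal" activations and some off-distribution probes.
train_acts = torch.randn(2_000, 512)
sae = SAE()
fit(sae, train_acts)
print(anomaly_score(sae, torch.randn(8, 512) * 3))
```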