LessWrong 2.0 Reader
Huh, that's indeed somewhat surprising if the SAE features are capturing the things that matter to CLIP (in that they reduce loss) and only those things, as opposed to "salient directions of variation in the data". I'm curious exactly what "failing to work" means -- here I think the negative result (and the exact details of said result) are arguably more interesting than a positive result would be.
dagon on Thomas Kwa's Shortform
I think this leans a lot on "get evidence uniformly over the next 10 years" and "Brownian motion in 1% steps". By conservation of expected evidence, I can't predict the mean direction of future evidence, but I can have probabilities over different possible updates that add up to an expected change of 0.
For long-term aggregate predictions of event-or-not (those which will be resolved at least a few years away, with many causal paths possible), the most likely pattern of updates is a steady reduction as the resolution date gets closer, plus occasional fairly large positive updates as we learn of things that make the event more likely.
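A minimal numeric sketch of that update pattern (toy numbers of my own, not from the comment): most weeks the belief drifts down slightly, occasionally it jumps up a lot, and by conservation of expected evidence the probability-weighted updates sum to zero.

```python
# Toy example: a belief whose updates are usually small and negative,
# rarely large and positive. All numbers are illustrative assumptions.
p_jump, jump = 0.1, +0.45    # rare news that makes the event likelier
p_drift, drift = 0.9, -0.05  # the usual quiet-week drift downward
expected_update = p_jump * jump + p_drift * drift
print(expected_update)  # ~0.0 (up to float rounding)
```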
johannes-c-mayer on Johannes C. Mayer's Shortform
Yes, that is a good point. I think you can totally write a program that checks, given two lists xs and xs' as input, that xs' is sorted and contains exactly the elements of xs. That allows us to specify in code what it means for a list xs' to be what I get when I sort xs.
And yes, I can do this without talking about how to sort a list. I merely give a property such that only one function satisfies it: the sorting function. I can totally constrain what the program can be (at least if we ignore runtime and memory).
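Here is a minimal sketch of such a checker in Python (my own illustration; the function name and the multiset comparison are assumptions, not from the comment):

```python
from collections import Counter

def is_valid_sort(xs, xs_sorted):
    """Check that xs_sorted is in nondecreasing order and is a
    permutation of xs (same elements, same multiplicities)."""
    in_order = all(a <= b for a, b in zip(xs_sorted, xs_sorted[1:]))
    same_elements = Counter(xs) == Counter(xs_sorted)
    return in_order and same_elements

print(is_valid_sort([3, 1, 2], [1, 2, 3]))  # True
print(is_valid_sort([3, 1, 2], [1, 2, 2]))  # False: elements differ
```

Up to ties between equal elements, any function f satisfying is_valid_sort(xs, f(xs)) on every input must be the sorting function.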
johannes-c-mayer on My hour of memoryless lucidity
To test whether Drake’s circumvention of his short-term memory loss worked via the intended mechanism, I could ask my girlfriend in advance to prompt me once — and only once — to complete the long-term memory scene that I had been practicing. Then I could see if I have a memory of the scene after I fully regain my memory.
Maybe you need to think the thought many times over in order to overwrite the original memory. In your place, I would try to prepare something similar to what Drake did: some mental object that you can retrieve, with a predesigned hole to put information in. It seems to me like this should not be that hard to set up. Then, for ideally 30 minutes or so after the surgery, while you don't have short-term memory (though the streaming-algorithm experiment also seems very interesting), you can repeatedly try to insert some specific object into the memory.
Maybe it would make sense for the sake of the experiment to limit yourself to 3 possible objects that could be inserted. Your girlfriend can then choose one randomly after surgery, for you to drill into memory by repeatedly thinking about the scene completed with that specific object.
Then after the 30 minutes, you do something completely different. One hour afterwards, your girlfriend can ask you what the object was that she told you an hour ago (and probably also ask many times during the first 30 minutes).
Probably it would be best if your girlfriend (or whoever is willing to do this) reminds you, at least every minute or so during the first 30 minutes, that you need to imagine the object.
dagon on Johannes C. Mayer's Shortform
I kind of see what you're saying, but I also think you're talking about specifying very different things in a way that isn't required. The closest CS equivalent of math's "define a sorted list" is "determine if a list is sorted". I'd argue it's very close to equivalent to the math formalization of whether a list is sorted. You can argue about the complexity behind the abstraction (math's foundations on set theory and symbols vs. CS's library and silicon foundations on memory storage and list indexing), but I don't think that's the point you're making.
When used for different things, they're very different in complexity. When used for the same things, they can be pretty similar.
ryan_b on How do you actually obtain and report a likelihood function for scientific research?
I was absolutely certain I had responded to this, because I had taken the trouble to search for and locate a description of the procedure used in particle physics, which appears to be the central place where likelihood functions are the preferred tool.
Seems I wrote it but never submitted it, so in this here placeholder comment I vow to hunt that resource down again and put it here in an edit.
Edit: As I promised, the resource: https://ep-news.web.cern.ch/what-likelihood-function-and-how-it-used-particle-physics
This is a short article by Robert Cousins of CERN. It covers in brief what likelihood is and how it differs from probability, then gives a short description of three different methods of using a likelihood function (here listed as likelihoodist, Neyman-Pearson, and Bayesian), and then moves on to a slightly more advanced example. Its references include some papers from the work on identifying the Higgs boson, and some of his own relevant papers.
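As a quick illustration of the likelihood/probability distinction (my own generic example, not from the linked article): the likelihood L(theta) = P(data | theta) holds the observed data fixed and varies the parameter, so it is not a probability distribution over theta and need not integrate to 1.

```python
# Binomial likelihood sketch: probability of k heads in n flips,
# viewed as a function of the heads-probability theta.
from math import comb

def binomial_likelihood(theta, k=7, n=10):
    """P(k heads in n flips | theta), with the data (k, n) held fixed."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# The curve peaks at the maximum-likelihood estimate k/n = 0.7.
for theta in (0.3, 0.5, 0.7, 0.9):
    print(theta, round(binomial_likelihood(theta), 4))
```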
nebuchadnezzar on Which skincare products are evidence-based?
I would also like to recommend the INCI (International Nomenclature of Cosmetic Ingredients) decoder tool: https://incidecoder.com/. It explains the ingredients of your skincare products and points out potential hazards, such as irritancy and comedogenicity. It's easy to use, and you can compare products.
lawrencec on Thomas Kwa's Shortform
The general version of this statement is something like: if your beliefs satisfy the law of total expectation, the variance of the whole process should equal the variance of all the increments involved in the process.[1] In the case of the random walk where at each step, your beliefs go up or down by 1% starting from 50% until you hit 100% or 0% -- the variance of each increment is 0.01^2 = 0.0001, and the variance of the entire process is 0.5^2 = 0.25, hence you need 0.25/0.0001 = 2500 steps in expectation. If your beliefs have probability p of going up or down by 1% at each step, and 1-p of staying the same, the variance of each increment is reduced by a factor of p, and so you need 2500/p steps.
(Indeed, something like this is the standard way to derive the expected number of steps before a random walk hits an absorbing barrier.)
Similarly, you get that if you start at 20% or 80%, you need 1600 steps in expectation, and if you start at 1% or 99%, you'll need 99 steps in expectation.
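A quick simulation (a sketch of my own, not from the comment) reproduces these numbers, matching the variance argument's prediction E[steps] = p0(1-p0)/0.01^2:

```python
import random

def steps_to_absorption(p0_pct, trials=1000):
    """Average number of +/-1 percentage-point updates before the
    belief hits 0% or 100%, starting from p0_pct (symmetric walk)."""
    total = 0
    for _ in range(trials):
        p, n = p0_pct, 0
        while 0 < p < 100:
            p += 1 if random.random() < 0.5 else -1
            n += 1
        total += n
    return total / trials

# Compare simulated step counts with the prediction p0*(1-p0)/0.01**2:
for p0_pct in (50, 20, 1):
    p0 = p0_pct / 100
    print(p0_pct, steps_to_absorption(p0_pct), p0 * (1 - p0) / 0.01**2)
```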
One problem with your reasoning above is that, as the 1%/99% case shows, needing 99 steps in expectation does not mean you will take 99 steps with high probability -- in this case, there's a 50% chance you need only one update before you're certain (!); there's just a tail of very long sequences. In general, the expected value of a variable need not look like its typical value.
I also think you're underrating how much the math changes when your beliefs do not come in the form of uniform updates. In the most extreme case, suppose your current 50% doom number comes from imagining that doom is uniformly distributed over the next 10 years, and zero after -- then the median update size is only 0.5/520 ~= 0.096% per week, and the expected number of weeks with a >1% update is 0.5 (it only happens when you observe doom). Even if we buy a time-invariant random walk model of belief updating, as the expected size of your updates gets larger, you also expect there to be quadratically fewer of them -- e.g. if your updates came in increments of size 0.1 instead of 0.01, you'd expect only 25 such updates!
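Checking that arithmetic directly (assuming 10 years ~= 520 weeks, and reusing the variance argument from above):

```python
# Median weekly update under the uniform-doom model, and the quadratic
# drop in update counts as step size grows (toy check, my own framing).
weeks = 10 * 52
print(0.5 / weeks)       # ~0.00096, i.e. ~0.096% per week
print(0.5**2 / 0.01**2)  # 2500 expected updates at step size 0.01
print(0.5**2 / 0.1**2)   # ~25 expected updates at step size 0.1
```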
Applying stochastic-process-style reasoning to beliefs is empirically very tricky, and results can vary a lot based on seemingly reasonable assumptions. E.g. I remember Taleb making a bunch of mathematically sophisticated arguments[2] that began with "let your beliefs take the form of a Wiener process[3]" and ended with an absurd conclusion, such as that 538's forecasts are obviously wrong because their updates aren't Gaussian distributed or aren't around 50% until immediately before the election date. And famously, reasoning of this kind has often been an absolutely terrible idea in financial markets. So I'm pretty skeptical of claims of this kind in general.
There are some regularity conditions here, but calibrated beliefs about things whose truth/falsity you eventually learn should satisfy these by default.
Often in an attempt to Euler people who do forecasting work but aren't super mathematical, like Philip Tetlock.
This is what happens when you take the limit of the discrete-time random walk, as you allow for updates on ever smaller time increments. You get Gaussian-distributed increments per unit time -- W_{t+u} - W_t ~ N(0, u) -- and since the tail of your updates is very thin, you continue to get qualitatively similar results to your discrete-time random walk model above.
And yes, it is ironic that Taleb, who correctly points out the folly of normality assumptions repeatedly, often defaults to making normality assumptions in his own work.
This is a solid-seeming proposal. If we are in a world where the majority of danger comes from big datacenters and large training runs, I predict that this sort of regulation would be helpful. I don't think we are in that world, though, which I think limits how useful this would be. Further explanation here: https://www.lesswrong.com/posts/sfWPjmfZY4Q5qFC5o/why-i-m-doing-pauseai?commentId=p2avaaRpyqXnMrvWE
cameron-berg on Key takeaways from our EA and alignment research surveys
I expect this to generally be a more junior group, often not fully employed in these roles, with, e.g., the average age and funding level of the orgs being led particularly low (and some of the orgs being more informal).
Here is the full list of the alignment orgs who had at least one researcher complete the survey (and who also elected to share what org they are working for): OpenAI, Meta, Anthropic, FHI, CMU, Redwood Research, Dalhousie University, AI Safety Camp, Astera Institute, Atlas Computing Institute, Model Evaluation and Threat Research (METR, formerly ARC Evals), Apart Research, Astra Fellowship, AI Standards Lab, Confirm Solutions Inc., PAISRI, MATS, FOCAL, EffiSciences, FAR AI, aintelope, Constellation, Causal Incentives Working Group, Formalizing Boundaries, AISC.
~80% of the alignment sample is currently receiving funding of some form to pursue their work, and ~75% have been doing this work for >1 year. Seems to me like this is basically the population we were intending to sample.
One additional factor for my abandoning it was that I couldn't imagine it drawing a useful response population anyway; the sample mentioned above is a significant surprise to me (even with my skepticism around the makeup of that population). Beyond the reasons I already described, I felt that its being run by a for-profit org that is a newcomer and probably largely unknown would dissuade a lot of people from responding (and/or from providing fully candid answers to some questions).
Your expectation while taking the survey about whether we were going to be able to get a good sample doesn't say much about whether we did end up getting a good sample. Things that better tell us whether or not we got a good sample are, e.g., the quality/distribution of the represented orgs and the quantity of actively-funded technical alignment researchers (both described above).
All in all, I expect that the respondent population skews heavily toward those who place a lower value on their time and are less involved.
Note that the survey took people ~15 minutes to complete and resulted in a $40 donation being made to a high-impact organization, which puts our valuation of an hour of their time at ~$160 (roughly the hourly rate of someone who makes ~$330k annually). Assuming this population would generally donate a portion of their income to high-impact charities/organizations by default, taking the survey seems likely to have been worth everyone's time in terms of EV.
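Spelling out that valuation arithmetic (assuming a ~2,080-hour work year, i.e. 40 hours/week for 52 weeks):

```python
# $40 donated for ~15 minutes of survey time implies an hourly rate,
# which annualizes under the stated work-year assumption.
donation_usd, minutes = 40, 15
hourly = donation_usd * 60 / minutes  # $160/hour
print(hourly, hourly * 40 * 52)       # 160.0  332800.0 (~$330k/year)
```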