Posts
Comments
Thanks for writing such a comprehensive explanation!
Why is downvoting disabled, for how long has it been like this, and when will it be back?
In support of your point, MIRI itself changed (in the opposite direction) from its former stance on AI research.
You've been around long enough to know this, but for others: The former ambition of MIRI in the early 2000s—back when it was called the SIAI—was to create artificial superintelligence, but that ambition changed to ensuring AI friendliness after considering the "terrible consequences [now] feared by the likes of MIRI".
In the words of Zack_M_Davis 6 years ago:
(Disclaimer: I don't speak for SingInst, nor am I presently affiliated with them.)
But recall that the old name was "Singularity Institute for Artificial Intelligence," chosen before the inherent dangers of AI were understood. The unambiguous "for" is no longer appropriate, and "Singularity Institute about Artificial Intelligence" might seem awkward.
I seem to remember someone saying back in 2008 that the organization should rebrand as the "Singularity Institute For or Against Artificial Intelligence Depending on Which Seems to Be a Better Idea Upon Due Consideration," but obviously that was only a joke.
I've always thought it's a shame they picked the name MIRI over SIFAAIDWSBBIUDC.
- who on lesswrong tracks their predictions outside of predictionbook, and their thoughts on that method
Just adding to the other responses: I also use Metaculus and like it a lot. In another thread, I posted a rough note about its community's calibration.
Compared to PredictionBook, the major limitation of Metaculus is that users cannot create and predict on arbitrary questions, because questions are curated. This is an inherent limitation/feature for a website like Metaculus because they want the community to focus on a set of questions of general interest. In Metaculus's case, 'general interest' translates mostly to 'science and technology'; for questions on politics, I suggest taking a look at GJ Open instead.
Here is the full text article that was actually published by Kahneman et al. (2011) in Harvard Business Review, and here is the figure that was in HBR:
Is there any information on how well-calibrated the community predictions are on Metaculus?
Great question! Yes. There was a post on the official Metaculus blog that addressed this, though this was back in Oct 2016. In the past, they've also sent to subscribed users a few emails that looked at community calibration.
I've actually done my own analysis on this around two months ago, in private communication. Let me just copy two of the plots I created and what I said there. You might want to ignore the plots and details, and just skip to the "brief summary" at the end.
(Questions on Metaculus go through an 'open' phase then a 'closed' phase; predictions can only be made and updated while the question is open. After a question closes, it gets resolved either positive or negative once the outcome is known. I based my analysis on the 71 questions that have been resolved as of 2 months ago; there are around 100 resolved questions now.)
First, here's a plot for the 71 final median predictions. The elements of this plot:
Of all monotonic functions, the black line is the one that, when applied to this set of median predictions, performs the best (in mean score) under every proper scoring rule given the realized outcomes. This can be interpreted as a histogram with adaptive bin widths. So for instance, the figure shows that, binned together, predictions from 14% to 45% resolved positive around 0.11 of the time. This is also the maximum-likelihood monotonic function.
The confidence bands are for the null hypothesis that the 71 predictions are all perfectly calibrated and independent, so that we can sample the distribution of counterfactual outcomes simply by treating the outcome of each prediction with credence p as an independent coin flip with probability p of positive resolution. I sampled 80,000 sets of these 71 outcomes, and built the confidence bands by computing the corresponding maximum-likelihood monotonic function for each set. The inner band is pointwise 1 sigma, whereas the outer is familywise 2 sigma. So the corner of the black line that exceeds the outer band around predictions of 45% is a p < 0.05 event under perfect calibration, and it looks to me that predictions around 30% to 40% are miscalibrated (underconfident).
The two rows of tick marks below the x-axis show the 71 predictions, with the upper green row comprising positive resolutions, and the lower red row comprising negatives.
The dotted blue line is a rough estimate of the proportion of questions resolving positive along the range of predictions, based on kernel density estimates of the distributions of predictions giving positive and negative resolutions.
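The maximum-likelihood monotonic fit described above is what the isotonic-regression literature computes with the pool-adjacent-violators algorithm, and the null-hypothesis bands come from refitting each simulated outcome set. A minimal sketch of both steps (function names and the reduced structure are my own, not taken from any analysis code referenced here):

```python
import random

def pav(outcomes):
    # Pool Adjacent Violators: given binary outcomes sorted by the
    # prediction that produced them, return the non-decreasing fitted
    # probabilities that maximize the Bernoulli likelihood (equivalently,
    # achieve the best mean score under every proper scoring rule).
    blocks = []  # each block: [sum of outcomes, count]
    for y in outcomes:
        blocks.append([float(y), 1])
        # Merge backwards while the running means decrease (a "violation").
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted

def null_fits(preds, n_sims=1000, seed=0):
    # Under perfect calibration, a prediction with credence p resolves
    # positive independently with probability p.  Refit the monotonic
    # function to each simulated outcome set; pointwise quantiles of
    # these fits then give the confidence bands.
    rng = random.Random(seed)
    preds = sorted(preds)
    return [pav([rng.random() < p for p in preds]) for _ in range(n_sims)]
```

For example, pav([0, 1, 0, 1, 1]) pools the violating middle pair into [0.0, 0.5, 0.5, 1.0, 1.0], which is the adaptive-bin-width histogram interpretation mentioned above.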
Now, a plot of all 3723 final predictions on the 71 questions.
The black line is again the monotonic function that minimizes mean proper score, but with the 1% and 99% predictions removed because—as I expected—they were especially miscalibrated (overconfident) compared to nearby predictions.
The two black dots indicate the proportion of questions resolving positive for the 1% and 99% predictions (around 0.4 and 0.8, respectively).
I don't have any bands indicating dispersion here because these predictions are a correlated mess that I can't deal with. But for predictions below 20%, the deviation from the diagonal looks large enough that I think it shows miscalibration (overconfidence).
Along the x-axis I've plotted kernel density estimates of the predictions resolving positive (green, solid line) and negative (red, dotted line). Kernel densities were computed under log-odds with Gaussian kernels, then converted back to probabilities in [0, 1].
The blue dotted line is again a rough estimate of the proportion resolving positive, using these two density estimates.
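The log-odds KDE trick can be sketched as follows (the bandwidth and function names are my own choices, not the ones used for the plots):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def kde_probability_scale(preds, query_p, bandwidth=0.5):
    # Gaussian KDE computed in log-odds space, then mapped back to the
    # probability scale in [0, 1] via the change-of-variables Jacobian
    # 1 / (p * (1 - p)).
    x = logit(query_p)
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    density = sum(
        math.exp(-0.5 * ((x - logit(p)) / bandwidth) ** 2) / norm
        for p in preds
    ) / len(preds)
    return density / (query_p * (1.0 - query_p))
```

Working in log-odds keeps the kernels from leaking mass outside [0, 1], which a plain Gaussian KDE on probabilities would do near the endpoints.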
Brief summary:
- Median predictions around 30% to 40% occur less often than claimed.
- User predictions below around 20% occur more often than claimed.
- User predictions at 1% and 99% are obviously overconfident.
- Other than these, calibration seems okay everywhere else; at least, they aren't obviously off.
- I'm very surprised that user predictions look fairly accurate around 90% and 95% (resolving positive around 0.85 and 0.90 of the time). I expected strong overconfidence like that shown by the predictions below 20%.
Also, if one wanted to get into it, could you describe what your process is?
Is there anything in particular that you want to hear about? Or would you rather have a general description of 1) how I'd suggest starting out on Metaculus, and/or 2) how I approach making and updating predictions on the site, and/or 3) something else?
(The FAQ is handy for questions about the site. It's linked to by the 'help' button at the bottom of every page.)
That's some neat data and observation! Could there be other substantial moderating differences between the days when you generate ~900 kJ and the days when you don't? (E.g., does your mental state before you ride affect how much energy you generate? This could suggest a different causal relationship.) If there are, maybe some of these effects can be removed if you independently randomize the energy you generate each time you ride, so that you don't get to choose how much you ride.
To make this a single-blinded experiment, just wear a blindfold; to double blind, add a high-beam lamp to your bike; and to triple blind, equip and direct high beams both front and rear.
… okay, there will be no blinding.
Polled.
I generally do only a quick skim of post titles and open threads (edit: maybe twice a month on average; I'll try visiting more often). I used to check LW compulsively prior to 2013, but now I think both LW and I have changed a lot and diverged from each other. No hard feelings, though.
I rarely click link posts on LW. I seldom find them interesting, but I don't mind them as long as other LWers like them.
I mostly check LW through a desktop browser. Back in 2011–2012, I used Wei Dai's "Power Reader" script to read all comments. I also used to rely on Dbaupp's "scroll to new comments" script after they posted it in 2011, but these days I use Bakkot's "comment highlight" script. (Thanks to all three of you!)
I've been on Metaculus a lot over the past year. It's a prediction website focusing on science and tech (the site's been mentioned a few times on LW, and in fact that's how I heard of it). It's sort of like a gamified and moderated PredictionBook. (Edit: It's also similar to GJ Open, but IMO, Metaculus has way better questions and scoring.) It's a more-work-less-talk kind of website, so it's definitely not a site for general discussions.
I've been meaning to write an introductory post about Metaculus… I'll get to that sometime.
Given that one of LW's past focuses was biases, heuristics, and the Bayesian interpretation of probability, I think some of you might find it worthwhile and fun to do some real-world practice in manipulating subjective probabilities based on finding evidence. Metaculus is all about that sort of stuff, so join us! (My username there is 'v'. I recognize a few of you, especially WhySpace, over there.) The site itself is under continual improvement and work, and I know that the admins have high ambitions for it.
Edit: By the way, this is a great post and idea. Thanks!
I haven't been around for a while, but I expect to start fulfilling the backlog of requests after Christmas. Sorry for the long wait.
Do we know which country Wright was living in during 2010?
Here.
Requested.
The article is available on various websites by exact phrase searching, but there are some minor transcription errors in these copies. I've transcribed it below using Google's copy of the scanned article to correct these errors. There seems to be a relevant captioned figure (maybe a photo of Fuller?) on p. 63 of the magazine that is missing from the scan.
Dymaxion Sleep
Sleep is just a bad habit. So said Socrates and Samuel Johnson, and so for years has thought grey-haired Richard Buckminster Fuller, futurific [sic] inventor of the Dymaxion* house (Time, Aug. 22, 1932), the Dymaxion car and the Dymaxion globe. Fuller made a deliberate attempt to break the sleep habit, with excellent results. Last week he announced his Dymaxion system of sleeping. Two hours of sleep a day, he said firmly, is plenty.
Fuller reasoned that man has a primary store of energy, quickly replenished, and a secondary reserve (second wind) that takes longer to restore. Therefore, he thought, a man should be able to cut his rest periods shorter by relaxing as soon as he has used up his primary energy. Fuller trained himself to take a nap at the first sign of fatigue (i.e., when his attention to his work began to wander). These intervals came about every six hours; after a half-hour's nap he was completely refreshed.
For two years Fuller thus averaged two hours of sleep in 24. Result: “The most vigorous and alert condition I have ever enjoyed.” Life-insurance doctors who examined him found him sound as a nut. Eventually he had to quit because his schedule conflicted with that of his business associates, who insisted on sleeping like other men. Now working for the Foreign Economic Administration, Buckminster Fuller finds Dymaxion working and sleeping out of the question. But he wishes the nation's “key thinkers” could adopt his schedule; he is convinced it would shorten the war.
Intermittent sleeping was not originated by Fuller, has respectable scientific backing. [sic] Last week the Industrial Bulletin of Arthur D. Little, Inc., famed Cambridge, Mass. research firm, which published Fuller's sleeping plan, noted a strong point in its favor: most sleep investigators agree that the first hours of sleep are the soundest. Some pro-Fuller evidence:
Photographs and electric devices to record movements show that the average sleeper, who changes position at least 40 times during an eight-hour stretch, is quietest in the first two hours, then grows progressively more restless.
At Colgate University sleep investigator Donald A. Laird found that people awakened after four hours' sleep were just as alert, well-coordinated physically and resistant to fatigue as those who slept eight hours (but they did lose in accuracy and concentration).
* A Fuller word representing “dynamic” and “maximum service.”
Here. Figures 4 and 5 are missing from the scan that I received. Dope ads.
Requested.
From the linked Wired article:
The PGP key associated with Nakamoto’s email address and references to an upcoming “cryptocurrency paper” and “triple entry accounting” were added sometime after 2013.
Gwern's comment in the Reddit thread:
[...] this is why we put our effort into nailing down the creation and modification dates of the blog post in third-party archives like the IA and Google Reader.
These comments seem to partly refer to the 2013 mass archive of Google Reader just before it was discontinued. For others who want to examine the data: the relevant WARC records for gse-compliance.blogspot.com are in lines 110789824 to 110796183 of greader_20130604001315.megawarc.warc, which is about three-quarters of the way into the file. I haven't checked the directory and stats grabs and don't plan to, as I don't want to spend any more time on this.
NB: As with any other large compressed archive, if you plan on saving the data, then I suggest decompressing the stream as you download it and recompressing into a seekable structure. Btrfs with compression works well, but blocked compression implementations like bgzip should also work in a pinch. If you leave the archive as a single compressed stream, then you'll pull all your hair out when you try to look through the data.
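In Python, the decompress-as-you-save step might look like this (filenames are placeholders; a real pipeline would stream from the network rather than from disk):

```python
import gzip
import shutil

def decompress_stream(src_gz, dst, chunk=1 << 20):
    # Stream-decompress a gzip file to dst in fixed-size chunks, never
    # holding the whole archive in memory.  dst can then live on a
    # compressed filesystem (e.g. btrfs with compression enabled) or be
    # recompressed with a blocked tool like bgzip for seekable access.
    with gzip.open(src_gz, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk)
```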
3. Here.
Huh. I never knew there were so many other plants that had similar effects on cats.
Anyway, best of luck getting Todd's work… and getting cats high.
Why the interest in catnip?
- Requested.
- Sadly, I can't request entire dissertations. I'm sure there are Harvard students on LW; maybe try asking for help in the open thread?
- Requested.
Still can't get it. I should be able to access it through an institutional subscription to the EBSCO database once the paper is assigned to an issue, replacing its current "online first" designation.
Nice paper.
p. 558 (Study 4):
Participants also completed a ten item personality scale (Gosling, Rentfrow & Swann, 2003) [the TIPI; an alternative is Rammstedt and John's BFI-10] that indexes individual differences in the Big Five personality traits (extraversion, agreeableness, conscientiousness, emotional stability, and openness). These data will not be considered further.
It's strange not to say why the data will not be considered further. The data are available, the reduction is clean, but the keys look a bit too skeletal given that copies of the original surveys don't seem to be available (perhaps because Raven's APM and possibly some other scales are copyrighted). Still, it's great of the journal and the authors to provide the data. Anyway, I'll take a look.
The supplement contains the statements and the corresponding descriptive statistics for their profundity ratings. It's an entertaining read.
ETA: For additional doses of profundity, use Armok_GoB's profound LW wisdom generator.
Here.
Requested.
See also Patton's (1988) "Can bad men make good brains do bad things?" (AKA "Brain in a vat on a trolley"), published in APA Proceedings.
Here.
it blew up to 14M
The object streams for indirect objects have been unpacked and stripped away, leaving their contents uncompressed. Use qpdf to regenerate compressed object streams:
qpdf --object-streams=generate in.pdf out.pdf
(The --stream-data=compress option is already set by default.)
While you are at it, might as well re-linearize the PDF for online readers with low bandwidth:
qpdf --object-streams=generate --linearize in.pdf out.pdf
Got the whole PDF from HathiTrust. I think Chart I is missing from the scan.
Downvoted. I'm sorry to be so critical, but this is the prototypical LW mischaracterization of utility functions. I'm not sure where this comes from, when the VNM theorem gets so many mentions on LW.
A utility function is, by definition, that which the corresponding rational agent maximizes the expectation of, by choosing among its possible actions. It is not "optimal as the number of bets you take approaches infinity": first, it is not 'optimal' in any reasonable sense of the word, as it is simply an encoding of the actions which a rational agent would take in hypothetical scenarios; and second, it has nothing to do with repeated actions or bets.
Humans do not have utility functions. We do not exhibit the level of counterfactual self-consistency that is required by a utility function.
The term "utility" used in discussions of utilitarianism is generally vaguely-defined and is almost never equivalent to the "utility" used in game theory and related fields. I suspect that is the source of this never-ending misconception about the nature of utility functions.
Yes, it is common, especially on LW and in discussions of utilitarianism, to use the term "utility" loosely, but don't conflate that with utility functions by creating a chimera with properties from each. If the "utility" that you want to talk about is vaguely defined (e.g., if it depends on some account of subjective preferences, rather than on definite actions under counterfactual scenarios), then it probably lacks all of the useful mathematical properties of utility functions, and its expectation is no longer meaningful.
I see. Looking into this, it seems that the (mis)use of the phrase "confidence interval" to mean "credible interval" is endemic on LW. A Google search for "confidence interval" on LW yields more than 200 results, of which many—perhaps most—should say "credible interval" instead. The corresponding search for "credible interval" yields fewer than 20 results.
I briefly skimmed the paper and don't see how you are getting this impression. Confidence intervals are—if we force the dichotomy—considered a frequentist rather than Bayesian tool. They point out that others are trying to squish a Bayesian interpretation on a frequentist tool by treating confidence intervals as though they are credible intervals, and they state this quite explicitly (p.17–18, emphasis mine):
Finally, we believe that in science, the meaning of our inferences are important. Bayesian credible intervals support an interpretation of probability in terms of plausibility, thanks to the explicit use of a prior. Confidence intervals, on the other hand, are based on a philosophy that does not allow inferences about plausibility, and does not utilize prior information. Using confidence intervals as if they were credible intervals is an attempt to smuggle Bayesian meaning into frequentist statistics, without proper consideration of a prior. As they say, there is no such thing as a free lunch; one must choose. We suspect that researchers, given the choice, would rather specify priors and get the benefits that come from Bayesian theory. We should not pretend, however, that the choice need not be made. Confidence interval theory and Bayesian theory are not interchangeable, and should not be treated as so.
Here. Sorry about the horrible format; I didn't see a better way to download the content or print the page. In addition, I couldn't access the figures.
Page-by-page .djvu scans are available here (found via this search; edit: it seems to appear sporadically in the search results). Full sequence of download links is http://202.116.13.3/ebook%5C24/24000522/ptiff/00000{001..744}.djvu
I wrote the following just before finding the scan of the book. I'll post it anyway.
I've used 1DollarScan for about 50 books, including math/stat textbooks, and the quality is consistently good (unless you need accurate color reproduction) even with the cheapest option (i.e., $1 per 100 pages), but you'll need to do your own post-processing to:
- Lossily compress further and binarize B/W text; expect about 400 KB/page from 1DollarScan.
- Perform OCR; 1DollarScan's OCR option is expensive and performs okay at best.
- Straighten pages; pages are often offset slightly from the vertical.
- Add metadata (e.g., page numbering, section bookmarks).
I use Adobe Acrobat with ABBYY FineReader for these. FineReader's OCR is more accurate than Acrobat's, but Acrobat performs okay by itself. Acrobat's trial can be indefinitely reactivated every month in a Windows VM by reverting to a pre-activation snapshot, whereas FineReader has to be bought or torrented, as its trial is overly restrictive. I don't know of any good options on Linux.
BTW, there's a used copy on Half.com for $39. Not sure if you saw that.
You take the probability of A not happening and multiply by the probability of B not happening. That gives you P(not A and not B).
Only if A and B are independent.
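A toy joint distribution makes the point concrete (my own example, not from the thread):

```python
# Perfectly correlated events: A and B are both "the same fair coin
# lands heads".  The two outcomes below are equally likely.
outcomes = [("heads", "heads"), ("tails", "tails")]

p_not_a = sum(a != "heads" for a, b in outcomes) / len(outcomes)  # 0.5
p_not_b = sum(b != "heads" for a, b in outcomes) / len(outcomes)  # 0.5
p_neither = sum(a != "heads" and b != "heads" for a, b in outcomes) / len(outcomes)

# The independence formula gives 0.25, but the true value is 0.5.
product = p_not_a * p_not_b
```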
No. "Expected value" refers to the expectation of a variable under a probability distribution, whereas "expected utility" refers specifically to the expectation of a utility function under a probability distribution. That is, expected utility is a specific instantiation of an expected value; expected value is more general than expected utility and can refer to things other than utility.
The importance of this distinction often arises when considering the utility of large sums of money: a person may well decline a deal or gamble with positive expected value (of money) because the expected utility can be negative (for example, see the St. Petersburg paradox).
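A numeric sketch of the distinction (the particular gamble and the choice of log utility are my own illustration):

```python
import math

def expected_value(lottery):
    # Expectation of the raw outcome under the probability distribution.
    return sum(p * x for p, x in lottery)

def expected_utility(lottery, utility):
    # Expectation of utility(outcome): a specific instance of an
    # expected value.
    return sum(p * utility(x) for p, x in lottery)

# Final wealth in dollars: keep $200 for sure, or gamble and end with
# either $100 or $310, each with probability 0.5.
keep = [(1.0, 200.0)]
gamble = [(0.5, 100.0), (0.5, 310.0)]

ev_gamble = expected_value(gamble)               # 205.0 > 200.0
eu_gamble = expected_utility(gamble, math.log)   # about 5.171
eu_keep = expected_utility(keep, math.log)       # about 5.298
# A log-utility (risk-averse) agent declines the positive-EV gamble.
```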
See also this highly-upvoted question on the Physics Stack Exchange, which deals with your question.
Interesting. Thanks for posting this!
I received exactly the same number of SNPs from BGI, so it looks like our data were processed under the same pipeline. I've found three people who have publicly posted their BGI data: two at the Personal Genome Project (hu2FEC01 and hu41F03B, each with 5,095,048 SNPs), and one on a personal website (with 18,217,058 SNPs).
Then there are a few thousand SNPs that one or other analysis (in 26 cases, both) list in their output but don't report anything for. What causes this?
The double dashes are no calls. 23andme reports on a set list of SNPs, and instead of omitting an SNP when they can't confidently determine the genotype, they indicate this with a double dash.
Is this amount of mismatch typical for such analyses?
This seems normal considering the error rates from 23andme that others have been reporting (example). I don't know about BGI's error rates.
I think it might be possible to accurately guess the actual genotypes for some of the mismatches by imputing the genotypes with something like Impute2 (for each mismatched SNP, leave it out and impute it using the nearby SNPs). This will take many hours of work, though, and you might as well phase and impute across the whole genome if you have the time, interest, and processing power to do so (I've been meaning to try this out to learn more about how these things work).
ILL couldn't get Schretlen et al. Can try again once the paper is included in the print journal, but I'd recommend just asking the authors for a copy.
I'm still waiting for Schretlen et al.
Dale and Krueger's paper was revised and published in the Journal of Human Resources under the new title "Estimating the effects of college characteristics over the career using administrative earnings data".
I see. In GAZP vs. GLUT, Eliezer argues that the only way to feasibly create a perfect imitation of a specific human brain is to do computations that correspond in some way to the functional roles behind mental states, which will produce identical conscious experiences according to functionalism.
For uploading, that means whole brain emulation. In my underinformed opinion, whole brain emulation is not the Way to AGI if you just want AGI. At some point, then, AGI will be available while WBE systems will be way behind; and so, uploaders will at least temporarily face a deeply serious choice on this issue.
Are you suggesting that mind uploading to a non-WBE platform will be available before WBE? I don't think this is a common belief; uploading almost exclusively refers to WBE. See, for instance, Sandberg and Bostrom (2008), who don't distinguish between WBE and uploading:
Whole brain emulation, often informally called “uploading” or “downloading”, has been the subject of much science fiction and also some preliminary studies.
I think it is indeed a common belief that AGI may come before WBE, but as far as I know, it is not commonly believed that AGI will provide an alternative route to WBE, because human minds will likely not be feasibly translatable to the AGI architectures that we come up with.
The relevant paragraph is in Section 2.2.5:
OCD is ranked by the WHO in the top ten of the most disabling illnesses by lost income and decreased quality of life (Bobes et al., 2001). The severity of OCD differs markedly from one person to another. While some people may be able to hide their OCD from their own family, the disorder may have a major negative impact on social relationships leading to frequent family and marital discord or dissatisfaction, separation or divorce (Koran, 2000). It also interferes with leisure activities (Antony et al., 1998) and with a person’s ability to study or work, leading to diminished educational and/or occupational attainment and unemployment (Koran, 2000; Leon et al., 1995). The social cost (that is the person’s inability to fully function in society) has been estimated as US$5.9 billion in 1990, or 70.4% of the total economic cost of OCD (DuPont et al., 1995).
Following the Bobes et al. citation yields:
Obsessive-compulsive disorder is the fourth most common psychiatric disorder, occurring in 2–3% of the U.S. population [14]. Furthermore, OCD is a chronic and disabling illness that impacts negatively on the academic, occupational, social, and family function of patients [11, 12, 17]. This impact carries over onto their families, friends, and society [12]. Indeed, OCD ranks tenth in the World Bank’s and WHO’s ten leading causes of disability and, in the case of women aged 15–44 years, OCD occupies the fifth position [22]. In spite of this situation, to our knowledge data on quality of life and disability has scarcely been reported in OCD patients.
Reference 22 is to "WHO Fact Sheet no. 217", which no longer exists. Luckily, the Wayback Machine has a copy. The relevant point:
It is also of great significance that 5 of the 10 leading causes of disability worldwide (major depression, schizophrenia, bipolar disorders, alcohol use, obsessive compulsive disorders) are mental problems. They are as relevant in developing countries as they are in industrialised societies
Unfortunately, there is no citation and it does not precisely match Bobes et al.'s claims. Neither Bobes et al. nor the WHO fact sheet refers to lost income, so the reference to that in the original claim is wholly unsubstantiated by these citations.
The last one.
I don't think I can get the two dissertations. I'll put in ILL requests for the other papers over the next week.
Chapter 3 is available from the publisher as a sample.
(BTW, this is an old help desk thread; the newest one is here.)
Is there any chance your sequencing had greater than 4x coverage?
I don't know. How do I find out?
There is the References & Resources for LessWrong (last updated in 2011), which has a good selection of older posts and other resources by subject.
Are you sure you've downloaded your entire genome file? My uncompressed file is about 500 MB, and I got about 26000 annotations on Promethease. It seems like your file might have gotten truncated during the download.
Short step-by-step guide for those who want to get their genome annotated by Promethease:
- Use the 'Download All Files' link on the SpiderOak page to download your genome file.*
- Unzip then gunzip to get the raw text file genome.txt.
- Open the file in a text editor. Remove all the commented lines at the beginning of the file except the last one (i.e., keep the line starting with # rsid; Promethease chokes if you don't) and save. This is required to get Promethease to recognize the file.
- (optional) Compress the edited file with zip, gzip, or bzip2 to save upload time and bandwidth.
- Upload to Promethease and follow the directions there.
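The comment-stripping step can also be done programmatically; a sketch (the function name is mine, and I haven't fed its output to Promethease):

```python
def strip_header_comments(src, dst):
    # Keep the "# rsid" column-header line that Promethease needs, drop
    # every other comment line, and pass data lines through untouched.
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith("#") and not line.startswith("# rsid"):
                continue
            fout.write(line)
```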
* I advise against downloading the genome.txt.gz file directly because for some reason SpiderOak has Content-encoding: gzip in their HTTP response header, which means that browsers will transparently uncompress that file. This makes me uneasy because there is no checksum provided for the (somewhat large) plain text file, so we have little protection against corruption and truncation. In contrast, by using 'Download All Files' to download everything in a zip, the data's integrity will be automatically verified against CRC-32 checksums when we unzip and gunzip locally.
Here.
Not directly answering your conundrum on wrist computers, but—I go trail running frequently (in Hong Kong), so I've thought a bit about wearable devices and safety. Here are some of my solutions and thoughts:
I use a forearm armband (example) to hold my phone in a position that allows me to use and see the touchscreen while running. I find this incredibly useful for checking a GPS map on the run while keeping both hands free for falls. I worry that the current generation of watches is nowhere near as capable as my phone.
I rely a lot on Strava's online route creation tool and phone app for navigation.
Digital personal locator beacons on the 406 MHz channel (example) are the current gold standard for distress signals.
Sharing your location through your phone (e.g., on Google+) can give some peace of mind to your family and friends.
An inactivity detector based on a phone's accelerometer might be a useful dead man's switch for sending a distress SMS/email in the event of an accident that incapacitates you. I haven't gotten around to setting this up on my phone, but here's an (untested) example of an app that might work.
In case of emergency, it might be useful to have a GPS app on your phone that can display your grid reference so that you can tell rescuers where to find you.