Posts

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results 2023-02-23T10:48:08.766Z
Still possible to change username? 2022-05-13T13:41:03.579Z
PSA for academics in Ukraine (or anywhere else) who want to come to the United Kingdom 2022-03-08T20:00:01.775Z

Comments

Comment by gabrielrecc (pseudobison) on Survival without dignity · 2024-11-04T10:16:40.440Z · LW · GW

Love pieces that manage to be both funny and thought-provoking. And +1 for fitting a solar storm in there. There is now better evidence of very large historical solar storms than there had been during David Roodman's Open Phil review in late 2014; I have been meaning to write something up about that, but other things have taken priority.

Comment by gabrielrecc (pseudobison) on There is a globe in your LLM · 2024-10-10T12:19:52.437Z · LW · GW

This is cool, although I suspect that you'd get something similar from even very simple models that aren't necessarily "modelling the world" in any deep sense, simply due to first and second order statistical associations between nearby place names. See e.g. https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/j.1551-6709.2008.01003.x , https://escholarship.org/uc/item/2g6976kg .

Comment by gabrielrecc (pseudobison) on Ilya Sutskever and Jan Leike resign from OpenAI [updated] · 2024-05-15T07:51:21.266Z · LW · GW

Leopold and Pavel were out ("fired for allegedly leaking information") in April. https://www.silicon.co.uk/e-innovation/artificial-intelligence/openai-fires-researchers-558601

Comment by gabrielrecc (pseudobison) on Reproducing ARC Evals' recent report on language model agents · 2023-09-05T05:24:43.442Z · LW · GW

Nice job! I'm working on something similar.

> Next, I might get my agent to attempt the last three tasks in the report

I wanted to clarify one thing: Are you building custom prompts for the different tasks? If so, I'd be curious to know how much effort you put into these (I'm generally curious how much of your agent's ability to complete more tasks might be due to task-specific prompting, vs. the use of WebDriverIO and other affordances of your scaffolding). If not, isn't getting the agent to attempt the last three tasks as simple as copy-pasting the task instructions from the ARC Evals task specs linked in the report, and completing the associated setup instructions? 

Comment by gabrielrecc (pseudobison) on Biosecurity Culture, Computer Security Culture · 2023-08-30T20:04:18.522Z · LW · GW

Cybersecurity seems in a pretty bad state globally - it's not completely obvious to me that a historical norm of "people who discover things like SQL injection are pretty tight-lipped about them and share them only with governments / critical infrastructure folks / other cybersecurity researchers" would have led to a worse situation than the one we're in cybersecuritywise...

Comment by gabrielrecc (pseudobison) on How to find AI alignment researchers to collaborate with? · 2023-07-31T09:53:45.117Z · LW · GW

I'd recommend participating in AGISF. Completely online/virtual, a pretty light commitment (I'd describe it more as a reading group than a course personally), cohorts are typically run by AI alignment researchers or people who are quite well-versed in the field, and you'll be added to a Slack group which is pretty large and active and a reasonable way to try to get feedback.

Comment by gabrielrecc (pseudobison) on When can we trust model evaluations? · 2023-07-30T10:17:27.821Z · LW · GW

This is great. One nuance: This implies that behavioral RL fine-tuning evals are strictly less robust than behavioral I.I.D. fine-tuning evals, and that as such they would only be used for tasks that you know how to evaluate but not generate. But it seems to me that there are circumstances in which the RL-based evals could be more robust at testing capabilities, namely in cases where it's hard for a model to complete a task by the same means that humans tend to complete it, but where RL can find a shortcut that allows it to complete the task in another way. Is that right or am I misunderstanding something here?

For example, if we wanted to test whether a particular model was capable of getting 3 million points in the game of Qbert within 8 hours of gameplay time, and we fine-tuned on examples of humans doing the same, it might not be able to: achieving this in the way an expert human does might require mastering numerous difficult-to-learn subskills. But an RL fine-tuning eval might find the bug discovered by Canonical ES, illustrating the capability without needing the subskills that humans lean on.

Comment by gabrielrecc (pseudobison) on Long Covid Risks: 2023 Update · 2023-05-07T04:03:48.456Z · LW · GW

Nice, thanks for this!

> If you want to norm this for your own demographic, you can get a very crude estimate by entering your demographic information in this calculator, dividing your risk of hospitalization by 3 and multiplying the total by 0.4 (which includes the 20% reduction from vaccination and the 50% reduction from Paxlovid)

Anecdotally, I feel like I've heard a number of instances of folks with what pretty clearly seemed to be long Covid coming on despite not having required hospitalization? And in this UK survey of "Estimated number of people (in thousands) living in private households with self-reported long COVID of any duration", it looks like only 4% of such people were hospitalized (March 2023 dataset table 1)

Comment by gabrielrecc (pseudobison) on A (EtA: quick) note on terminology: AI Alignment != AI x-safety · 2023-02-27T11:23:48.892Z · LW · GW

Irving's team's terminology has been "behavioural alignment" for the green box - https://arxiv.org/pdf/2103.14659.pdf

Comment by gabrielrecc (pseudobison) on Can ChatGPT count? · 2023-01-07T15:04:27.426Z · LW · GW

The byte-pair encoding is probably hurting it somewhat here; forcing it to unpack it will likely help. Try using this as a one-shot prompt:
 


> How many Xs are there in "KJXKKLJKLJKXXKLJXKJL"?
>
> Numbering the letters in the string, we have: 1 K, 2 J, 3 X, 4 K, 5 K, 6 L, 7 J, 8 K, 9 L, 10 J, 11 K, 12 X, 13 X, 14 K, 15 L, 16 J, 17 X, 18 K, 19 J, 20 L. There are Xs at positions 3, 12, 13, and 17. So there are 4 Xs in total.
>
> How many [character of interest]s are there in "[string of interest goes here]"?
 


If it's still getting confused, add more shots - I suspect it can figure out how to do it most of the time with a sufficient number of examples.
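For anyone who wants to generate shots like this automatically, here is a rough Python sketch. The function names (`worked_answer`, `one_shot_prompt`) are made up for illustration, and the wording of the worked answer simply copies the example above; it is meant for example strings containing at least two occurrences of the target character, since the phrasing gets awkward otherwise.

```python
def join_positions(positions):
    """Render [3, 12, 13, 17] as '3, 12, 13, and 17'."""
    parts = [str(p) for p in positions]
    if len(parts) <= 2:
        return " and ".join(parts)
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


def worked_answer(text, target):
    """Spell the string out one numbered character at a time, then count the target."""
    numbered = ", ".join(f"{i} {ch}" for i, ch in enumerate(text, start=1))
    positions = [i for i, ch in enumerate(text, start=1) if ch == target]
    return (
        f"Numbering the letters in the string, we have: {numbered}. "
        f"There are {target}s at positions {join_positions(positions)}. "
        f"So there are {len(positions)} {target}s in total."
    )


def one_shot_prompt(example_text, example_target, query_text, query_target):
    """Build: example question, worked answer, then the real question left open."""
    question = 'How many {t}s are there in "{s}"?'
    return "\n\n".join([
        question.format(t=example_target, s=example_text),
        worked_answer(example_text, example_target),
        question.format(t=query_target, s=query_text),
    ])


prompt = one_shot_prompt("KJXKKLJKLJKXXKLJXKJL", "X", "QQZQRZZQ", "Z")
print(prompt)
```

Adding more shots would just mean prepending further question/answer pairs built the same way.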

Comment by gabrielrecc (pseudobison) on The case against AI alignment · 2022-12-24T09:32:46.055Z · LW · GW

It seems like you're claiming something along the lines of "absolute power corrupts absolutely" ... that every set of values that could reasonably be described as "human values" to which an AI could be aligned -- your current values, your CEV, [insert especially empathetic, kind, etc. person here]'s current values, their CEV, etc. -- would endorse subjecting huge numbers of beings to astronomical levels of suffering, if the person with that value system had the power to do so. 

I guess I really don't find that claim plausible. For example, here is my reaction to the following two questions in the post:

"How many ordinary, regular people throughout history have become the worst kind of sadist under the slightest excuse or social pressure to do so to their hated outgroup?"

... a very, very small percentage of them? (minor point: with CEV, you're specifically thinking about what one's values would be in the absence of social pressure, etc...)

"What society hasn’t had some underclass it wanted to put down in the dirt just to lord power over them?"

It sounds like you think "hatred of the outgroup" is the fundamental reason this happens, but in the real world it seems like "hatred of the outgroup" is driven by "fear of the outgroup". A godlike AI that is so powerful that it has no reason to fear the outgroup also has no reason to hate it. It has no reason to behave like the classic tyrant whose paranoia of being offed leads him to extreme cruelty in order to terrify anyone who might pose a threat, because no one poses a threat.

Comment by gabrielrecc (pseudobison) on The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable · 2022-11-28T21:56:14.857Z · LW · GW

This reminded me of some findings associated with "latent semantic analysis", an old-school information retrieval technique. You build a big matrix where each unique term in a corpus (excluding a stoplist of extremely frequent terms) is assigned to a row, each document is assigned to a column, and each cell holds the number of times that term appeared in that document (with some kind of weighting scheme that downweights frequent terms), and you take the SVD. This also gives you interpretable dimensions, at least if you use varimax rotation. See for example pgs. 9-11 & pgs. 18-20 of this paper. Also, I seem to recall that the positive and negative singular values after doing latent semantic analysis are often both semantically interpretable, sometimes with antipodal pairs, although I can't find the paper where I saw this.
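To make the recipe concrete, here is a minimal NumPy sketch of that pipeline on a toy corpus. The corpus, the stoplist, and the particular log/IDF-style weighting are all assumptions chosen for illustration (real LSA work used various weighting schemes), and the varimax rotation step is left out entirely.

```python
import numpy as np

# Toy corpus and stoplist: both invented purely for illustration.
docs = [
    "the cat saw the dog",
    "the dog saw the vet",
    "the bank sold the stock",
    "the market sold the stock",
    "the vet saw the cat and the dog",
]
stoplist = {"the", "and"}

# Rows are unique terms (minus the stoplist), columns are documents.
tokenized = [[w for w in d.split() if w not in stoplist] for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
row = {term: i for i, term in enumerate(vocab)}

counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(tokenized):
    for w in doc:
        counts[row[w], j] += 1  # raw count of the term in that document

# One possible weighting that downweights frequent terms (an assumption):
# dampen raw counts with a log, then scale by an IDF-like factor.
doc_freq = (counts > 0).sum(axis=1)
idf = np.log(len(docs) / doc_freq) + 1.0
weighted = np.log1p(counts) * idf[:, None]

# Thin SVD: columns of U are term-space dimensions, rows of Vt document-space.
U, S, Vt = np.linalg.svd(weighted, full_matrices=False)


def top_terms(k, n=3):
    """Terms with the largest absolute loading on dimension k."""
    order = np.argsort(-np.abs(U[:, k]))[:n]
    return [vocab[i] for i in order]


for k in range(2):
    print(k, round(float(S[k]), 3), top_terms(k))
```

Looking at the sign of each term's loading in `U[:, k]`, rather than just its absolute value, is how you would check whether the two ends of a dimension are separately interpretable.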

I'm not sure whether the right way to think about this is "you should be very circumspect about saying that 'semantic processing' is going on just because the SVD has interpretable dimensions, because you get that merely by taking the SVD of a slightly preprocessed word-by-document matrix", or rather "a lot of what we call 'semantic processing' in humans is probably just down to pretty simple statistical associations, which the later layers seem to be picking up on", but it seemed worth mentioning in any case!

edit: seems likely that the "association clusters" seen in the earlier layers might map onto what latent semantic analysis is picking up on, whereas the later layers might be picking up on semantic relationships that aren't as directly reflected in the surface-level statistical associations. could be tested!

Comment by gabrielrecc (pseudobison) on People Will Listen · 2022-11-23T10:47:54.757Z · LW · GW

Why do you expect Bitcoin to be excepted from being labelled a security along with the rest? 
(Apologies if the answer is obvious to those who know more about the subject than me, am just genuinely curious)

Comment by gabrielrecc (pseudobison) on Covid 11/10/22: Into the Background · 2022-11-10T15:52:01.400Z · LW · GW

Had a similar medical bill story from when I was a poor student: Medical center told me that insurance would cover an operation. They failed to mention that they were only talking about the surgeon's fee; the hospital at which they arranged the operation was out-of-network and I was stuck with 50% of the facility's costs. I explained my story to the facility. They said I still had to pay but that a payment plan would be possible, and that I could start by paying a small amount each month. I took that literally and just started paying a (very) small amount monthly. At some point they called back to tell me to formally arrange a payment plan through their online portal, which gave me options with such high interest rates that there was no way my future earnings would increase at a fast enough rate to make a payment plan make any sense whatsoever. I called back and explained this, and said that if those were the only options I guess I would just have to try to scrape the money together now, and that I was prepared to try to do this. The administrator, bless her heart, asked me to hold for a while, and eventually came back to say "I've spoken with my colleagues, and your current balance owed to us is now zero dollars".

This (along with a few other experiences in my life) has underscored how sometimes an apparently immovable constraint can evaporate if you can manage to talk to the right person. That said, I felt very lucky to have been taken pity on in this way -- I feel like having one's balance explicitly zeroed out in this way is rare! But it's interesting to hear that Zvi knows of cases where someone just didn't pay, with no consequences. I would have assumed that they'd normally report nonpayers to credit agencies and crater their credit scores after long enough, as it costs them nothing or almost nothing to do so. Would be interested either to hear other people's anecdotes of what happened after nonpayment of a large hospital bill (positive or negative), or to see data on this if anyone knows of any.

Comment by gabrielrecc (pseudobison) on Why we're not founding a human-data-for-alignment org · 2022-09-28T16:58:21.862Z · LW · GW

I was using medical questions as just one example of the kind of task that's relevant to sandwiching. More generally, what's particularly useful for this research programme are

  • tasks where we have "models which have the potential to be superhuman at [the] task", and "for which we have no simple algorithmic-generated or hard-coded training signal that’s adequate"; and
  • for which there is some set of reference humans who are currently better at the task than the model;
  • and for which there is some set of reference humans for whom the task is difficult enough that they would have trouble even evaluating/recognizing good performance. (you also want this set of reference humans to be capable of being helped to evaluate/recognize good performance in some way)

Prime examples are task types that require some kind of niche expertise to do and evaluate. Cotra's examples involve "[fine-tuning] a model to answer long-form questions in a domain (e.g. economics or physics) using demonstrations and feedback collected from experts in the domain", "[fine-tuning] a coding model to write short functions solving simple puzzles using demonstrations and feedback collected from expert software engineers", "[fine-tuning] a model to translate between English and French using demonstrations and feedback collected from people who are fluent in both languages". I was just making the point that Surge can help with this kind of thing in some domains (coding), but not in others.

Comment by gabrielrecc (pseudobison) on Why we're not founding a human-data-for-alignment org · 2022-09-28T12:29:28.565Z · LW · GW

It's worth knowing that there are some categories of data that Surge is not well positioned to provide. For example, while they have a substantial pool of participants with programming expertise, my understanding from speaking with a Surge rep is that they don't really have access to a pool of participants with (say) medical expertise -- although for small projects it sounds like they are willing to try to see who they might already have with relevant experience in their existing pool of 'Surgers'. This kind of more niche expertise does seem likely to become increasingly relevant for sandwiching experiments. I'd be interested in learning more about companies or resources that can help collect RLHF data from people with uncommon (but not super-rare) kinds of expertise for exactly this reason.

Comment by gabrielrecc (pseudobison) on Your posts should be on arXiv · 2022-09-01T10:01:01.147Z · LW · GW

I did Print to PDF in Word after formatting my Word document to look like a standard LaTeX-exported document, and it had no problem going through! But might depend on the particular moderator.

Comment by gabrielrecc (pseudobison) on The lessons of Xanadu · 2022-08-10T08:40:54.249Z · LW · GW

Sounds a little like StarWeb? Recently read a lovely article about a similar but different game, Monster Island, which was a thing from 1989 to 2017.

But yes, my default assumption would be that the particular conversation you're referring to never resulted in a game that saw the light of day; I've seen many detailed game design discussions among people I've known meet the same fate.

Comment by gabrielrecc (pseudobison) on Rant on Problem Factorization for Alignment · 2022-08-06T17:01:09.788Z · LW · GW

Thanks, I agree that's a better analogy. Though of course, it isn't necessary that the employees (participants in a sandwiching project) be unaware of the CEO's (sandwiching project overseer's) goal; I was only highlighting that they need not necessarily be aware of it in order to make it clear that the goals of the human helpers/judges aren't especially relevant to what sandwiching, debate, etc. is really about. But of course if it turns out that having the human helpers know what the ultimate goal is helps, then they're absolutely allowed to be in on it...

Perhaps this is a bit glib, but arguably some of the most profitable companies in the mobile game space have essentially built product assembly lines to churn out fairly derivative games that are nevertheless unique enough to do well on the charts, and they absolutely do it by factoring the project of "making a game" into different bits that are done by different people (programmers, artists, voice actors, etc.), some of whom might not have any particular need to know what the product will look like as a whole to play their part. 

However, I don't want to press too hard on this game example as you may or may not consider this 'cognitive work' and as it has other disanalogies with what we are actually talking about here. And to a certain degree I share your intuition that factoring certain kinds of tasks is probably very hard: if it weren't, we might expect to see a lot more non-manufacturing companies whose main employee base consists of assembly lines (or hierarchies of assembly lines, or whatever) requiring workers with general intelligence but few specialized rare skills, which I think is the broader point you're making in this comment. I think that's right, although I also think there are reasons for this that go beyond just the difficulty of task factorization, and which don't all apply in the HCH etc. case, as some other commenters have pointed out.

Comment by gabrielrecc (pseudobison) on Rant on Problem Factorization for Alignment · 2022-08-06T10:00:16.865Z · LW · GW

> We start with some ML model which has lots of knowledge from many different fields, like GPT-n. We also have a human who has a domain-specific problem to solve (like e.g. a coding problem, or a translation to another language) but lacks the relevant domain knowledge (e.g. coding skills, or language fluency). The problem, roughly speaking, is to get the ML model and the human to work as a team, and produce an outcome at-least-as-good as a human expert in the domain. In other words, we want to factorize the “expert knowledge” and the “having a use-case” parts of the problem.
> ...
> This sort of problem comes up all the time in real-world businesses. We could just as easily consider a product designer at a tech startup (who knows what they want but little about coding), an engineer (who knows lots about coding but doesn't understand what the designer wants)...


These examples conflate "what the human who provided the task to the AI+human combined system wants" with "what the human who is working together with the AI wants" in a way that I think is confusing and sort of misses the point of sandwiching. In sandwiching, "what the human wants" is implicit in the choice of task, but the "what the human wants" part isn't really what is being delegated or factored off to the human who is working together with the AI; what THAT human wants doesn't enter into it at all. Using Cotra's initial example to belabor the point: if someone figured out a way to get some non-medically-trained humans to work together with a mediocre medical-advice-giving AI in such a way that the output of the combined human+AI team is actually good medical advice, it doesn't matter whether those non-medically-trained humans actually care that the result is good medical advice; they might not even individually know what the purpose of the system is, and just be focused on whatever their piece of the task is - say, verifying the correctness of individual steps of a chain of reasoning generated by the system, or checking that each step logically follows from the previous, or whatever. Of course this might be really time intensive, but if you can improve even slightly on the performance of the original mediocre system, then hopefully you can train a new AI system to match the performance of the original AI+human system by imitation learning, and bootstrap from there.

The point, as I understand it, is that if we can get human+AI systems to progress from "mediocre" to "excellent" (in other words, to remain aligned with the designer's goal) -- despite the fact that the only feedback involved is from humans who wouldn't even be mediocre at achieving the designer's goal if they were asked to do it themselves -- and if we can do it in a way that generalizes across all kinds of tasks, then that would be really promising. To me, it seems hard enough that we definitely shouldn't take a few failed attempts as evidence that it can't be done, but not so hard as to seem obviously impossible.

Comment by gabrielrecc (pseudobison) on Common but neglected risk factors that may let you get Paxlovid · 2022-06-22T09:11:57.326Z · LW · GW

I just shared this info with an immune-compromised relative, thanks so much for this.

Comment by gabrielrecc (pseudobison) on Covid 5/12/22: Other Priorities · 2022-05-13T13:27:47.596Z · LW · GW

> When I see young healthy people potentially obsessing, turning life into some sort of morbid probability matrix because one particular potential risk (Long Covid) has been made more salient and blameworthy, I sympathize a lot less.

 

ONS's latest survey finds 2.8% of the UK population report that they are currently experiencing long COVID symptoms: 67% of that 2.8% report that the symptoms adversely affect their day-to-day activities. Separately, they've estimated that 70% of England has had COVID at least once; weighting their estimates for England/Scotland/Wales/NI suggests about 68% of the UK has had it. So conditional on having caught COVID at least once, we have ~3% of people experiencing symptoms that adversely affect day-to-day activities for at least a month and often much longer. (Table 7 of the associated dataset implies that for each individual symptom, well over half have been experiencing those symptoms for "at least 12 weeks", which is consistent with Fig 3 in this earlier survey.)
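Spelling out the arithmetic behind that ~3%, as a quick Python sketch using only the three figures quoted above (the variable names are mine, and it assumes everyone reporting long COVID is among those who have had COVID at least once):

```python
# Figures quoted above, as fractions of the UK population.
long_covid_share = 0.028      # currently report long COVID symptoms
limiting_share = 0.67         # of those, symptoms limit day-to-day activities
ever_infected_share = 0.68    # estimated to have had COVID at least once

# Share of the whole population with activity-limiting long COVID.
limiting_overall = long_covid_share * limiting_share

# Condition on having been infected at least once.
limiting_given_infected = limiting_overall / ever_infected_share

print(f"{limiting_overall:.2%} of everyone")
print(f"{limiting_given_infected:.1%} of those ever infected")
```

This works out to roughly 1.9% of the whole population, or about 2.8% of those who have ever been infected, which is where the "~3%" comes from.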

Anyway, if every time or few times that I catch COVID equates to a ~3% chance of long covid that adversely affects my day-to-day activities for a long time, for me that's high enough that it justifies having categories of things that I do less often than I used to, categories of things that I do while masked, and categories that I do with no precautions. We don't generally go around criticizing people for "obsessing" when they take other slightly inconvenient actions to mitigate other low-probability risks (wearing seatbelts; having a diet composed of more healthy-but-less-delicious than unhealthy-and-more-delicious foods; cutting down on alcohol; etc.). So this constant criticism of people who are choosing to make changes to reduce their long COVID risk does rub me the wrong way.
 

Comment by gabrielrecc (pseudobison) on Long COVID risk: How to maintain an up to date risk assessment so we can go back to normal life? · 2022-05-09T07:33:17.835Z · LW · GW

The poster's concern is with long COVID, which can certainly have effects that a lot of people would consider severe. The "severe" COVID that has a baseline of less than 1% for the young and healthy refers to COVID that requires hospitalization. Long Covid rates are higher.

Comment by gabrielrecc (pseudobison) on GPT-3 and concept extrapolation · 2022-04-21T08:35:41.605Z · LW · GW

I was slightly surprised to find that even fine-tuning GPT-Neo-125M for a long time on many sequences of letters followed by spaces, followed by a colon, followed by the same sequence in reverse, was not enough to get it to pick up the pattern - probably because the positional encoding vectors make the difference between e.g. "18 tokens away" and "19 tokens away" a rather subtle difference. However, I then tried fine-tuning on a similar dataset with numbers in between (e.g. "1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W", or a similar representation -- can't remember exactly, but something roughly like that) and it picked up the pattern right away. Data representation matters a lot!
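A rough sketch of the two data representations, in Python. The function names and the exact spacing are assumptions; as noted, the original format is only approximately remembered, so treat this as one plausible reconstruction rather than the dataset actually used.

```python
import random
import string


def plain_example(letters):
    """Letters separated by spaces, a colon, then the same letters reversed."""
    return " ".join(letters) + " : " + " ".join(reversed(letters))


def numbered_example(letters):
    """Same task, but each letter carries its 1-based position as an explicit token."""
    pairs = [f"{i} {ch}" for i, ch in enumerate(letters, start=1)]
    return " ".join(pairs) + " : " + " ".join(reversed(pairs))


def random_letters(rng, min_len=3, max_len=20):
    """A random uppercase letter sequence; the length range is arbitrary."""
    n = rng.randint(min_len, max_len)
    return [rng.choice(string.ascii_uppercase) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the generated file is reproducible
dataset = [numbered_example(random_letters(rng)) for _ in range(5)]

print(plain_example(list("WORDS")))     # W O R D S : S D R O W
print(numbered_example(list("WORDS")))  # 1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W
```

In the numbered version the model can match each output letter to an explicit index token instead of having to infer "the token exactly k positions back" from positional encodings alone.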

Comment by gabrielrecc (pseudobison) on The Long Long Covid Post · 2022-03-08T17:40:19.442Z · LW · GW

Thanks for digging into this a bit, and I should have linked directly to the paper rather than to an article with the headline "Heart-disease risk soars after COVID" with a focus on relative risks, since as you say the absolute risks are very important for putting things into perspective. For what it's worth, I agree with Zvi's final conclusion ("That’s not nothing, but it’s not enough that you shouldn’t live your life").

That said, an additional 1.2 out of every 100 people experiencing heart failure in the first 12 months after COVID-19 infection, if that holds up in reality, seems like it may have some effects at a population level (suggests that cardiologists will be in more demand, if nothing else). I can imagine that for some people with low risk tolerances, or with high preexisting cardiac risk, it might be a factor in wanting to live one's life slightly differently than one did prepandemic.

It'd have been nice if they'd included a breakdown of when the vaccinated participants tested positive relative to when they were vaccinated. Supplementary table 21 notes that virtually no one was vaccinated prior to enrollment in the study, but that 62% in the COVID group (56% in the control group) had been vaccinated by the end.

Comment by gabrielrecc (pseudobison) on PSA: if you are in Russia, probably move out ASAP · 2022-03-03T20:35:00.182Z · LW · GW

Yes, this seems correct. Unfortunately it already sounds difficult to get out:

‘My future is taken away from me’: Russians flee to escape consequences of Moscow’s war | Russia | The Guardian

"Those seeking to leave faced a severe lack of available flights after western countries closed their airspace to Russian airlines. Moscow has also closed its airspace to much of the west in response.

Flights to Yerevan, Istanbul and Belgrade were completely sold out for the coming days while a one-way ticket to Dubai was priced at over £3,000 ($4,006) – compared with £250 ($334) in ordinary times – according to the flight aggregator Skyscanner. Train tickets from St Petersburg to Helsinki were also sold out on Thursday and Friday."

Also sounds from the article like border officials are extensively questioning folks at the border, scrolling through any private chats that haven't been deleted on messaging apps, etc. Be careful all.

Comment by gabrielrecc (pseudobison) on The Long Long Covid Post · 2022-02-14T18:06:27.677Z · LW · GW

Just to clarify - given that your first link seems concerned about athlete collapses/deaths following vaccination (supposedly, although the comments there imply insufficient fact-checking), but your second link is about athlete collapses/deaths following COVID-19 infection and your comment is on a post about long COVID, is your concern about heart issues following vaccination or following COVID infection?

If the latter, yes, heart disease and stroke do seem to be more probable following COVID infection according to this recent large study. It should be noted that the control group came from 2017, but the effect sizes they find are so large that it doesn't seem like differences in average heart disease frequency between 2017 and 2022 in a counterfactual world without COVID are especially relevant.

Comment by gabrielrecc (pseudobison) on The Long Long Covid Post · 2022-02-12T22:18:53.266Z · LW · GW

> The way that data set is presented is infuriating – there are tables that list raw counts without reference to the sample size (maybe it’s an estimated raw number for the whole country, in which case they’re quite small)


This is the UK Office for National Statistics - their usual practice is to report estimated numbers for the whole country. Easy to miss but it's in thousands -- scroll to the far right of each table with raw numbers and you'll see that stated near the top. So Table 1 estimates 1,332,000 UK residents with Long COVID, which is in line with the 2% figure stated in Table 4 if we assume that it's talking about the whole country.

> I presume this is listing their health conditions before Covid since it makes no sense the other way, but am still somewhat confused.

Footnote 7 says "Health/disability status is self-reported by study participants rather than clinically diagnosed. From February 2021 study participants were asked to exclude any symptoms related to COVID-19 when reporting their health/disability status. However, in practice it may be difficult for some participants to separate long COVID symptoms from unrelated exacerbation of pre-existing conditions, so these results should be treated with caution."

> What’s even stranger is this is now people who had Covid over 12 weeks ago, instead of the general population, and the estimate has gone down – 2.06% to 1.46%.

The title of the table can be parsed different ways, but pretty sure that what this table is showing is, "Of people living in private households with self-reported long COVID, what proportion of them say that they first had COVID at least 12 weeks previously" (1.46%). We can see from Footnote 1 that the definition of Long Covid for this study was "Would you describe yourself as having 'long COVID', that is, you are still experiencing symptoms more than 4 weeks after you first had COVID-19, that are not explained by something else?" So presumably the remaining 98.54% of people with self-reported long COVID said that they first had COVID at least 4 weeks but less than 12 weeks previously.

The table with the 2.06% is saying, "Of all people in the country, what percentage of them have long COVID of any duration", i.e., 4 weeks or longer. So I don't think there's a contradiction here.

> Matt Bell was referencing the UK data set above so I have no idea how he can get 2.8%, and in fact my reading of the link says he has it somewhat lower than that but still strangely high.

I also tried and failed to figure out how he gets this number.

 

There is a separate study by the Office for National Statistics with controls (a later one than the one Matt Bell mentions, with different methodology) that I found useful - report is here - though annoyingly it doesn't break the data down by individual symptoms. Figures 1 and 2 are also illustrative with respect to duration of symptoms. The report is pretty comprehensive, but the data tables are here; Tables 1-4 show comparisons to controls.

Bottom line is summarized by the points at the top, reproduced below; note that only "Approach 3" uses self-reported long COVID:

"Approach 1: Prevalence of any symptom at a point in time after infection. Among study participants with COVID-19, 5.0% reported any of 12 common symptoms 12 to 16 weeks after infection; however, prevalence was 3.4% in a control group of participants without a positive test for COVID-19, demonstrating the relative commonness of these symptoms in the population at any given time.

Approach 2: Prevalence of continuous symptoms after infection. Among study participants with COVID-19, 3.0% experienced any of 12 common symptoms for a continuous period of at least 12 weeks from infection, compared with 0.5% in the control group; this estimate of 3.0% is based on a similar approach to the one we published in April 2021 (13.7%), but is substantially lower because of a combination of longer study follow-up time and updated statistical methodology. The corresponding prevalence estimate when considering only participants who were symptomatic at the acute phase of infection was 6.7%.

Approach 3: Prevalence of self-reported long COVID. An estimated 11.7% of study participants with COVID-19 would describe themselves as experiencing long COVID (based on self-classification rather than reporting one of the 12 common symptoms) 12 weeks after infection, and may therefore meet the clinical case definition of post-COVID-19 syndrome, falling to 7.5% when considering long COVID that resulted in limitation to day-to-day activities; these percentages increased to 17.7% and 11.8% respectively when considering only participants who were symptomatic at the acute phase of infection."
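For the two approaches that report a control-group figure, the excess over controls is a simple subtraction. A quick Python sketch using only the percentages quoted above (the dictionary layout and labels are mine; Approach 3 is omitted because the quoted summary gives no control figure for it):

```python
# (COVID-19 group %, control group %) as quoted in the summary above.
approaches = {
    "Approach 1: any symptom at 12 to 16 weeks": (5.0, 3.4),
    "Approach 2: continuous symptoms for 12+ weeks": (3.0, 0.5),
}

excess = {}
for name, (covid_pct, control_pct) in approaches.items():
    # Percentage-point difference between infected participants and controls.
    excess[name] = round(covid_pct - control_pct, 1)
    print(f"{name}: {excess[name]} percentage points above controls")
```

These are crude differences of the headline figures only; they say nothing about uncertainty, which the ONS tables report separately.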

Comment by gabrielrecc (pseudobison) on Survey supports ‘long covid is bad’ hypothesis (very tentative) · 2022-01-15T22:08:28.765Z · LW · GW

UK's ONS has a nice comparison with controls which shows a clear difference, see Fig 1. (Note that this release uses laboratory-confirmed COVID-19 only, unlike some of their other releases.)

Comment by gabrielrecc (pseudobison) on What would you like from Microcovid.org? How valuable would it be to you? · 2021-12-30T14:44:42.881Z · LW · GW

Given that the early data I've seen suggests that efficacy of 3 doses vs. omicron is similar to that of 2 doses vs. delta -- probably a bit lower, but at least in the same universe -- I've been using it largely as is, multiplying the final output by 2 to 3 based on what I've seen about the household transmission rate of Omicron relative to Delta. I know some other boosted people who have used it in a similar fashion. There's so much uncertainty in the model assumptions that its best use in my view is to get a very broad-strokes order-of-magnitude idea of the risk, which has been extremely useful for friends and relatives who have just wanted a baseline idea of whether the risk of getting COVID when participating in a particular activity is more like .01% or .1% or 1% or 10%. (Note: I doubt that said friends and relatives would have been able to use it in this way without my help, since it requires a little math and they're not math types.) So I guess my main recommendations would be:


- Don't get rid of it even if you aren't confident in the Omicron data - if you can produce results that are probably in the right order of magnitude, it's still useful! If you aren't up for a full Omicron overhaul, but you think there's some back-of-the-envelope adjustment that could give results that are probably the correct order of magnitude, I think applying that -- with suitable caveats about accuracy -- would be preferable to taking the site down or leaving it as is.


- It's easy to forget how many people are not math people whatsoever. Best practice in risk communication is often considered to be communicating numbers as percentages, as well as contextualized frequencies -- not just 'X-in-a-million', but something like "X out of Y people (for context, Y is roughly the number of people living in Z)" -- as there are a lot of people who don't really understand percentages and need a little context to understand frequencies. In my ideal world the output would make the chance of getting COVID from this specific activity clear as a percentage and as a contextualized frequency, as well as the chance of getting COVID from this activity in a year under the assumption that you do this activity every N weeks, where N can be entered by the user.
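To make the second recommendation concrete, here is a minimal sketch of the arithmetic I have in mind. It is not how Microcovid computes anything: the function names and the 0.1% example risk are hypothetical, and it assumes every repeat of the activity carries the same risk independently of the others, which is a simplification.

```python
def risk_over_year(per_activity_risk, every_n_weeks, weeks_per_year=52):
    """Chance of at least one infection in a year of identical, independent repeats."""
    repeats = weeks_per_year / every_n_weeks
    return 1 - (1 - per_activity_risk) ** repeats


def as_frequency(risk):
    """Express a probability as roughly '1 in N'."""
    return f"about 1 in {round(1 / risk):,}"


per_activity = 0.001  # hypothetical 0.1% risk for doing the activity once
yearly = risk_over_year(per_activity, every_n_weeks=2)

print(f"Once: {per_activity:.2%} ({as_frequency(per_activity)})")
print(f"Every 2 weeks for a year: {yearly:.2%} ({as_frequency(yearly)})")
```

The "1 in N" figure would still need the extra context mentioned above (for example, a familiar place with roughly N residents) to be useful to non-math people.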

Comment by gabrielrecc (pseudobison) on Walkthrough: Filing a UK self-assessment tax return · 2021-12-30T01:58:50.694Z · LW · GW

Thank you for such a comprehensive rundown! I've bookmarked this as I expect/hope to be in a situation in the future when this comes in handy.

I hate to say it, but the images are not coming through for me, as perhaps you've already noticed!

Comment by gabrielrecc (pseudobison) on COVID Skepticism Isn't About Science · 2021-12-30T01:48:11.497Z · LW · GW

Really, though, shouldn't we be able to do something to protect the elderly or other vulnerable people without causing everyone else six months of financial hardship and lost relationships?"

"Six months..." the man squirms. "I might need you to do this for a year or two."


Not exactly a fair description of what the public health measures have been. What country has been in lockdown for "a year or two" (besides China)?

> The harms caused by COVID suppression were larger than the harms of COVID itself for most people.

Possibly, but I doubt the same can be said for the net hedon loss. The great-uncle who died of COVID may have been quite old, but he still probably had a few years ahead of him: an expected 11 if he was 75, or 6 if he was 85. Those are years his family misses out on spending with him as well. The 10% of those infected who are still experiencing symptoms after 12 weeks (not depression: most frequently fatigue, cough, headache, loss of taste, loss of smell, myalgia), most of whom are likely to still be experiencing these issues for another 12 weeks or more, are not mentioned, nor is the impact of this on their own lives and livelihoods.

Most importantly, this really seems to strawman our poor bureaucrat, as he doesn't even mention the actual point of these measures: to serve as a stopgap until herd immunity, ideally primarily by vaccination so as to mitigate the above harms + further harms caused by hospital overload. Meanwhile, vaccination is the primary thing that our actual public health bureaucrats have been hammering on for the past year. I get the feeling that this isn't discussed in this post because it doesn't fit the narrative.

(edited to add context to initial quote)

Comment by gabrielrecc (pseudobison) on On (Not) Reading Papers · 2021-12-21T13:46:31.281Z · LW · GW

The number of experiences I've had of reading an abstract and later finding that the results provided extraordinarily poor evidence for the claims (or alternatively, extraordinarily good evidence -- hard to predict what I will find if I haven't read anything by the authors before...) makes this system suspect. This seems partially conceded in the fictive dialogue ("You don't even have to dig into the methodology a lot") - but it helps to look at it at least a little. I knew a senior academic whose system was as follows: read the abstract (to see if the topic of the paper is of any interest at all) but don't believe any claims in it; then skim the methodology and results and update based on that. This makes a bit more sense to me.

Comment by gabrielrecc (pseudobison) on Visible Thoughts Project and Bounty Announcement · 2021-11-30T22:10:24.974Z · LW · GW

Relevant: From OpenAI's "Training Verifiers To Solve Math Word Problems": "We also note that it is important to allow the model to generate the full natural language solution before outputting a final answer. If we instead finetune a 6B model to directly output the final answer without any intermediate steps, performance drops drastically from 20.6% to 5.2%." Also the "exploration" linked in the post, as well as my own little exploration restricted to modulo operations on many-digit numbers (via step-by-step long division!), on which LMs do very poorly without generating intermediate steps. (But see also Hendrycks et al.: "We also experiment with using step-by-step solutions. We find that having models generate their own step-by-step solutions before producing an answer actually degrades accuracy. We qualitatively assess these generated solutions and find that while many steps remain illogical, they are often related to the question. Finally, we show that step-by-step solutions can still provide benefits today. We find that providing partial ground truth step-by-step solutions can improve performance, and that providing models with step-by-step solutions at training time also increases accuracy.")

Comment by gabrielrecc (pseudobison) on ($1000 bounty) How effective are marginal vaccine doses against the covid delta variant? · 2021-07-23T07:57:13.697Z · LW · GW

https://www.bbc.co.uk/news/health-57667987 - Hard to say what's "likely" with this government, but it's what the Joint Committee on Vaccination and Immunisation has advised

Comment by gabrielrecc (pseudobison) on What is the Risk of Long Covid after Vaccination? · 2021-07-03T21:56:08.756Z · LW · GW

Interesting... although given that the report you link to is also about the ZOE research, and later seems to elaborate on what they mean by "up to" 30% by saying that "even if a vaccinated individual goes on to contract Covid-19, that person’s chances of developing Long Covid are reduced by a further 30 per cent in the most at-risk age group", I wonder if this is just a slightly trumped-up way of stating the finding of a significant difference for the older age group alone in the preprint that OP already linked to. (The preprint states: "In the 60+ group, we found lower risk of symptoms lasting for more than 28 days (OR=0.72, 95%CI [0.51-1.00])".)
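For what it's worth, an odds ratio of 0.72 is a 28% reduction in odds, which only approximates a reduction in risk; how close it is depends on how common the outcome is in the comparison group. The sketch below uses the standard conversion RR = OR / (1 - p0 + p0 * OR), where p0 is the risk in the comparison group. The baseline risks are hypothetical placeholders, not figures from the preprint, and the function name is mine.

```python
def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Convert an odds ratio to a risk ratio, given the comparison group's risk."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


reported_or = 0.72  # odds ratio quoted from the preprint for the 60+ group

for baseline_risk in (0.05, 0.10, 0.20):  # hypothetical baseline risks
    rr = odds_ratio_to_risk_ratio(reported_or, baseline_risk)
    print(f"baseline {baseline_risk:.0%}: risk ratio {rr:.2f}, reduction {1 - rr:.0%}")
```

Across baselines like these the implied risk reduction stays somewhat under 30%, which fits the reading that "up to 30%" is a rounded restatement of the 60+ result.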

Comment by gabrielrecc (pseudobison) on What is the Risk of Long Covid after Vaccination? · 2021-07-03T14:13:38.370Z · LW · GW

Regarding the comments earlier in this thread suggesting that long Covid is largely misattributed depression, the symptom profile doesn't seem to bear this out. "At five weeks post-infection among Coronavirus Infection Survey respondents testing positive for COVID-19", quite a few are still experiencing symptoms that don't sound much like depression: 11% of individuals are still experiencing cough, 8% loss of taste, 8% loss of smell, 8% muscle pain. It also seems fairly clear from Figure 1a of the first(?) ZOE App long Covid study that a substantially higher proportion of those testing positive, vs. symptom-matched negative controls, have long symptom timecourses. You might take some comfort from the fact that the same figure shows that the proportion still experiencing symptoms continues to drop off markedly as additional time passes, though admittedly part of this might be down to some people eventually tiring of logging symptoms into the app.

That said: Speaking for myself, as someone who has been very cautious throughout the pandemic, after my second vaccination I suspect I will mostly go "back to normal", with some exceptions like continued mask-wearing in certain indoor public spaces subject to case rates. If Covid is going to be endemic at this point, as appears likely, the fact that I am as young as I will ever be, and will be recently fully vaccinated, gives me the best protection I will ever have. So if my immune system eventually gets to practice on "the real thing" after full vaccination, there would be a small risk of a breakthrough infection (and a small risk of long COVID if one occurs), but it seems there might be an equivalently small chance that the experience would increase my adaptive immunity such that I am in a better position if I catch a variant when I am older.

Comment by gabrielrecc (pseudobison) on Are we prepared for Solar Storms? · 2021-02-18T21:23:31.820Z · LW · GW

Worth noting that studying the effect of damage to GPS systems was not considered in the Open Philanthropy report (beyond a mention that it is beyond the scope of the report), but GPS going down could be very bad indeed, given power grids' and supply chains' near-total dependence on it: https://www.nytimes.com/2021/01/23/opinion/gps-vulnerable-alternatives-navigation-critical-infrastructure.html . On the plus side, an emergency backup system for GPS has already been mandated by U.S. law, but it sounds from the NYT article like it's been 3 years and the government and business interests still can't agree on who's going to pay for it. Might be a good topic for some political advocacy.

David Denkenberger also points to this paper reporting evidence of two extremely large solar storm events in AD 774 and 993, the larger being "five times stronger than any instrumentally recorded solar event": https://www.nature.com/articles/ncomms9611. On that basis he predicts "~10% chance of loss of industrial civilisation this century."

Comment by gabrielrecc (pseudobison) on Crazy Ideas Thread · 2016-06-28T19:23:48.283Z · LW · GW

This seems like a slippery slope. Minorities tend to have shorter life expectancies than whites, at least in the U.S. and U.K. Do their votes then count for less?

Comment by gabrielrecc (pseudobison) on Crazy Ideas Thread · 2016-06-28T19:16:26.222Z · LW · GW

No; my script only contains the handful of unicode characters I commonly use, and is so idiosyncratic to me that it wouldn't be of much use to anyone else (mine includes autoreplacements for directories, email addresses I commonly type, etc.). But it's easy enough to make your own with whatever characters you use -- the syntax is simply

::text-to-replace::desired-replacement

::alpha::α

::em::—

etc.

Comment by gabrielrecc (pseudobison) on Crazy Ideas Thread · 2016-06-28T06:46:53.184Z · LW · GW

I use Autohotkey on Windows for that purpose.

Comment by gabrielrecc (pseudobison) on Suggest best book as an introduction to computational neuroscience · 2016-05-28T06:10:32.738Z · LW · GW

Networks of the Brain by Olaf Sporns certainly doesn't cover all of computational neuroscience, but is a good accessible introduction to using the tools of network theory to gain a better understanding of brain function at many different levels.

Comment by gabrielrecc (pseudobison) on Recovery Manual for Civilization · 2016-05-26T09:08:40.729Z · LW · GW

One could bury Wikipedia, the Internet Archive, or a bunch of other items suggested by The Long Now Foundation

Since no one's yet included the links to the Long Now Foundation's blog posts in which they discuss suggestions for such items and other projects that are attempts in this direction, here they are:

http://blog.longnow.org/02010/04/06/manual-for-civilization/

http://blog.longnow.org/category/manual-for-civilization/manual-book-lists/

Comment by gabrielrecc (pseudobison) on Negative visualization, radical acceptance and stoicism · 2016-04-17T06:29:09.332Z · LW · GW

I find that negative visualization in conjunction with Mark Williams' guided meditation "Exploring Difficulties" is useful for getting me in that stoic mindset of being more okay with a worst-case scenario. (Or at least, I hope so - I guess I'll see how well it worked if the worst-case scenario ever comes to pass.)

Comment by gabrielrecc (pseudobison) on Lesswrong 2016 Survey · 2016-04-02T09:11:12.671Z · LW · GW

I've taken the survey.

Comment by gabrielrecc (pseudobison) on Open Thread March 21 - March 27, 2016 · 2016-03-22T09:58:55.837Z · LW · GW

I keep a daily journal. Beginning of day: Two things that I'm grateful for. End of day: Two things that went well that day, two things that could have gone better. Each "thing" is usually only a sentence or few long. I find that going back through the end-of-day sentences every so often is useful for doing 80-20 analyses to find out what seems to be bringing me the most happiness / dissatisfaction (at least as judged by my end-of-day assessments).

Comment by gabrielrecc (pseudobison) on Open Thread March 21 - March 27, 2016 · 2016-03-22T09:01:41.822Z · LW · GW

I'm very sorry to hear about your dog. It's a very difficult thing to go through even without any predisposition towards depression.

This is probably an idiosyncratic thing that only helps me, but I find remembering that time is a dimension just like space helps a little bit. In the little slice of time I inhabit, a pet or person who has passed on is gone. From a higher-dimensional perspective, they haven't gone anywhere. If someone were to be capable of observing from a higher dimension, they could see the deceased just as I remember them in life. So in the same way that someone whose children are living far from home can remind themselves that their children are in another place, likewise your dog is living happily in another time. English doesn't quite have a tense that conveys the sentiment I want to convey, but I think you get the idea. Don't know if that line of thought does anything for you - I find it a small but useful comfort.

Re actually doing exercise/positive self-talk when you're down, setting up little conditionals that I make into automatic habits by following them robotically has sometimes worked for me. "IF notice self getting anxious - THEN take five minute walk outside". Obviously setting up those in the first place and following through on them the first n times only works when in an OK mood, but once they become habits they're easier to follow through on in more difficult states of mind. I've also found the Negative Self-Talk/Positive Thinking table at the bottom of the page here to be useful.

But hard things are hard no matter what. Sounds like you're doing the right thing now by making the most of the time you have together. Best of luck to you.

Comment by gabrielrecc (pseudobison) on Open Thread Feb 22 - Feb 28, 2016 · 2016-02-29T09:17:56.578Z · LW · GW

That's the point of the article: agriculture allowed the Earth to support a vastly larger human population than it could have otherwise, but at a cost.

Personally I'm more optimistic than the author of the article I linked that the median quality of life of a human on Planet Earth will ultimately exceed the median quality of life of a human on an Earth where agriculture had never been developed -- in fact I think there's a good chance that that's already the case. But I don't think it's completely obvious, for reasons the author describes in detail.

Comment by gabrielrecc (pseudobison) on Goal setting journal (March 2016) · 2016-02-29T08:30:42.905Z · LW · GW

I guess whether > 3 mg/kg is a "lot" compared to other food types is relative to the number of food types the study considered.

I haven't dug up the France study to see how many foods they looked at that didn't make the >3 mg/kg cut, but the first study that I clicked on after searching Google Scholar just now is a German study that found a median mg/kg of 160 for "cocoa powder" and 39 for "chocolate". Of the 1,431 food samples they tested, "77.8% had an aluminium concentration of less than 10 mg kg-1. Of the samples, 17.5% had aluminium concentrations between 10 and 100 mg kg-1. In only 4.6% of the samples, aluminium concentrations greater than 100 mg kg-1 were found.". Looking at the histogram in Figure 1, we can place chocolate's median aluminum level of 39 in the top 13.7% or higher, and cocoa powder's of 160 in the top 4.6% or higher.

I'm well aware of the irony that in my above post I suggested substituting cocoa powder for chocolate.

In particular, the study notes that "Table 4 shows that the PTWI for aluminium can be reached only by consumption of large amounts of chocolate [42–44]." (PTWI = provisional tolerable weekly intake used by the Joint FAO/WHO Expert Committee on Food Additives).

Are there plenty of other foods with as much aluminum as chocolate? Sure. Am I cutting chocolate out of my own diet anytime soon? No. But since the original poster is planning to take up chocolate consumption specifically for brain/intelligence -related reasons, I figured it was a relevant consideration.

edit: It's kind of an odd list of foodstuffs the German study considered. The introduction implies but doesn't state that they selected foods that they expected to have at least some aluminum content based on prior research. I also can't account for the huge discrepancies between the French and German studies in terms of mg/kg aluminum levels detected.

Comment by gabrielrecc (pseudobison) on Open Thread Feb 22 - Feb 28, 2016 · 2016-02-28T20:20:47.144Z · LW · GW

I think it turned out pretty well.

Well, that remains to be seen.