Doing a self-randomized study of the impacts of glycine on sleep (Science is hard)


post by thedissonance.net · 2025-01-17T18:49:30.989Z · LW · GW · 5 comments

  Intro
  Motivation
  Preparation
    Randomization
      Generating instructions
    Blinding
  Doing the study
    Data gathering
  Things to change
5 comments

This is a linkpost from my blog and also my first submission on LessWrong. Please be generous with your feedback! I will post the results of our study once the analysis is done and written up. To avoid cliff-hangers: From what I've seen so far, there doesn't seem to be a whole lot of ... effect.

Intro

In November 2024, me and two friends decided to run a self-experiment to see if taking glycine regularly would impact our sleep. Running the study required us to solve a few interesting practical challenges. This post describes how we came up with the study design and the issues we ran into. I'm writing this in the last few days of the study, so I don't know the results yet. Once they are in, I'll write them up and publish them in a separate blog post.

I'll start this post by telling you why we thought this hypothesis was worth testing and what problem our study design had to solve, and then describe the design itself.

Me and my co-conspirators tried our actual best at producing high-quality results but don't be deceived: I'm just a person on the internet, not a trained scientist. Nothing in this post is medical advice.

Motivation

My co-conspirators have already done a study like this on themselves, randomizing control vs intervention for each other. I would describe their results as inconclusive, showing a positive effect for one of them and no effect for the other. Based on this, I have taken 3 g of glycine a few times when I expected not to get a lot of sleep. I noticed feeling more awake than I would expect the next day, and I noticed cold shivers about 30 minutes after taking it, which fits with the purported mechanism^[1].

This was intriguing enough that I wanted to run a blinded experiment on myself to figure out how much of what I was observing was effect versus placebo. My co-conspirators were not at that time taking glycine regularly as they didn't want to build up tolerance^[2]. This made all of us suitable guinea pigs. So one night, during a long taxi drive, we came up with a study design that would:

Show me whether glycine had a statistically significant positive effect on my sleep quality.
Show whether there were any tolerance effects.
Allow me to self-administer intervention or control in a blinded way even though I wouldn't have someone else to help me.
Control for as many things as we could think of.

The biggest challenge here came from the fact that we wanted to share a schedule to control for weather and holidays^[3]. Otherwise, we would have simply each prepared a course of doses for one another, based on a randomly generated schedule.

In order to share a schedule, we had to first figure out how to create it without any of us having information that could deblind us. And second, we needed to come up with a procedure that would preserve blinding over the course of the experiment.

Preparation

Design and preparation of the study took 12 hours, spread out over four days.

The idea for how to build a shared schedule came from looking at various pill trays available online.

Photo of pill tray. See description in paragraph.

This pill tray^[4] is intended for taking medication four times a day, seven days a week. The tray is actually a grid of seven removable subtrays, containing four compartments each. Importantly, the subtrays can be put into the tray in any order. The core idea here is that one person puts the glycine or the control into the compartments of each subtray and another person rearranges the subtrays without knowing what's inside. At the end, neither person knows the contents of any compartment.

The detailed preparation procedure expands on this core idea to make sure that none of us carries any statistical information into the experiment. In an extreme case, if one person ends up putting glycine in the first compartment of 18 out of the 21 subtrays, they would know that any cell in the first row has an 86% chance (18/21) of containing glycine and not control.
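The arithmetic behind that worry is just a frequency estimate. Here is a minimal sketch of it in Python; the function name `leak_probability` is made up for illustration and is not part of our actual script.

```python
from fractions import Fraction


def leak_probability(glycine_cells: int, total_cells: int) -> Fraction:
    """Chance that a randomly picked cell in one row holds glycine, given
    that the preparer remembers how many cells of that row got glycine."""
    if total_cells <= 0 or not 0 <= glycine_cells <= total_cells:
        raise ValueError("need 0 <= glycine_cells <= total_cells and total_cells > 0")
    return Fraction(glycine_cells, total_cells)


# The extreme case from the text: glycine in the first compartment
# of 18 out of 21 subtrays.
p = leak_probability(18, 21)
print(f"{p} is about {float(p):.0%}")
```

Anything far from 50% is information the preparer carries into the study, which is why the instructions try to keep each person's counts balanced.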

Sketch of three pill trays, containing seven subtrays each. Each subtray contains four compartments.

Randomization

We randomized our schedule by writing a Python script that first generates the schedule and then generates three sets of instructions, one each for me and my co-conspirators. We would keep our instructions secret from each other. The instructions would either tell us to put intervention or control in a set of grid cells or to swap around rows. When we performed all of them one after another, we ended up with the schedule that the script had saved to a file, but without any of us individually having any information about the contents of any of the grid cells.

Generating instructions

First, the program generates a sequence of 28 letters. The sequence consists of runs of As and Cs, where A indicates the active substance (intervention) and C indicates the control. The run length for each run is chosen randomly to be 2, 3, 6 or 8. Longer runs are better for studying tolerance, but if there were only long runs then getting deblinded on a day would reveal more statistical information about future days^[5]. So we opted for some long and some short runs chosen randomly as a tradeoff.

The sequence is arranged in a 4x7 grid corresponding to how it would be laid out in a pill tray. Then three copies of the grid are stitched together, since we want to fill one pill tray for each of the three participants. This produces an output like this (grouped by tray):

Ground Truth
AAACCAA AAACCAA AAACCAA
CCCAAAC CCCAAAC CCCAAAC
CCCCCAA CCCCCAA CCCCCAA
AAAAAAC AAAAAAC AAAAAAC
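The generation step can be sketched roughly as follows. This is a reconstruction from the description, not our original script: the function names, the random starting letter, and the choice to simply cut off the last run at 28 letters are all assumptions. (Read row by row, the example above has runs of 3, 2, 2, 3, 3, 6, 8 and 1, which is at least consistent with a truncated final run.)

```python
import random
from itertools import groupby

RUN_LENGTHS = (2, 3, 6, 8)  # allowed run lengths, from the post
DAYS = 28                   # one dose per day
DAYS_PER_ROW = 7            # seven subtrays per pill tray


def generate_sequence(rng: random.Random, days: int = DAYS) -> str:
    """Alternate runs of A (intervention) and C (control) until `days` letters exist."""
    letter = rng.choice("AC")  # assumption: the first run's letter is random
    letters = []
    while len(letters) < days:
        letters.extend(letter * rng.choice(RUN_LENGTHS))
        letter = "C" if letter == "A" else "A"
    return "".join(letters[:days])  # assumption: the last run is just truncated


def run_lengths(seq: str) -> list:
    """Lengths of the consecutive runs of identical letters in `seq`."""
    return [len(list(group)) for _, group in groupby(seq)]


def as_rows(seq: str, width: int = DAYS_PER_ROW) -> list:
    """Lay the sequence out row by row, one row per week."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def stitched(seq: str, participants: int = 3) -> list:
    """Repeat each row once per participant, grouped by tray."""
    return [" ".join([row] * participants) for row in as_rows(seq)]


for line in stitched(generate_sequence(random.Random())):
    print(line)
```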

The only pill tray with removable sub-trays that we could find had the sub-trays in seven different colours.

To explain how this could be a problem, imagine for a moment that there is only one tray per colour. If my set of instructions told me to put intervention in the top box of the green tray, I could deblind myself by remembering this. Once I was preparing the mixture using the green tray, I would know it contains intervention, since my co-conspirators' instructions only shuffle trays around, but don't change the contents of the trays themselves.

Now imagine that there are three green trays. If my instructions told me to place intervention in the top of each, I could still deblind myself. Even if two trays contained glycine and one contained control, I would still know that any green tray contains glycine in the first row with 67% probability. So we needed to come up with instructions that were as resistant as possible to memorization.

One option is to buy seven trays and have each sub-tray be the same colour, but we didn't want to spend that much.

Instead we bought four trays and arranged subtrays in 6 colours like this:

1111 2222 3333 4444 555 66

Based on that, we decided that we would have three sets of instructions, to be executed in order: A, B, and C. A and B told us to place intervention or control in half of the cells each. C told us to permute the subtrays into the final arrangement.

A and B were complementary, and chosen so that the two people doing them would each fill exactly half of every subtray: the top half in roughly half of the subtrays of each colour, and the bottom half in the rest. Here are all the schedules, grouped not by tray but by colour:

Schedule A
__AC __AA __AC __AA __A _A
__CA __CC __AA __CA __C _C
AC__ CA__ CC__ CC__ AC_ C_
CA__ AC__ AA__ AA__ AA_ A_
1234 5678 9012 3456 789 01

Schedule B
AA__ CA__ AC__ CA__ AA_ C_
CC__ AC__ CA__ AC__ AC_ A_
__CC __CA __AC __CA __C _C
__AA __AC __AA __AA __A _A
1234 5678 9012 3456 789 01

Schedule C
21 ->  1
14 ->  2
3 ->  3
4 ->  4
12 ->  5
16 ->  6
8 ->  7
2 ->  8
18 ->  9
9 -> 10
13 -> 11
10 -> 12
11 -> 13
6 -> 14
7 -> 15
15 -> 16
19 -> 17
20 -> 18
5 -> 19
17 -> 20
1 -> 21
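One way to derive schedules like these from the ground truth is to undo the shuffle first and then split every subtray into a top half and a bottom half. The sketch below is a reconstruction, not our original script. The function names are made up, and the rule for who fills which half (A takes the bottom half of the first subtrays in each colour group and the top half of the rest) is read off the printed schedules above.

```python
import random

GROUP_SIZES = (4, 4, 4, 4, 3, 2)  # subtrays per colour, as in the post
ROWS = 4                          # compartments per subtray


def undo_permutation(final_rows, old_for_new):
    """Return the grid as it must look BEFORE schedule C is applied.

    old_for_new[new] = old (0-based): the subtray that ends up in
    position `new` starts out in position `old`.
    """
    n = len(old_for_new)
    pre = [[""] * n for _ in range(ROWS)]
    for new, old in enumerate(old_for_new):
        for r in range(ROWS):
            pre[r][old] = final_rows[r][new]
    return ["".join(row) for row in pre]


def split_a_b(pre_rows):
    """Split the grid into two complementary fill schedules.

    Cells a schedule does not fill are written as underscores.
    """
    a_rows = [[] for _ in range(ROWS)]
    b_rows = [[] for _ in range(ROWS)]
    col = 0
    for size in GROUP_SIZES:
        for i in range(size):
            # Assumed rule: A gets the bottom half of the first ceil(size / 2)
            # subtrays of each colour and the top half of the remaining ones.
            a_fills_bottom = i < (size + 1) // 2
            for r in range(ROWS):
                is_bottom = r >= ROWS // 2
                cell = pre_rows[r][col]
                if is_bottom == a_fills_bottom:
                    a_rows[r].append(cell)
                    b_rows[r].append("_")
                else:
                    a_rows[r].append("_")
                    b_rows[r].append(cell)
            col += 1
    return ["".join(r) for r in a_rows], ["".join(r) for r in b_rows]


# Ground truth from the post: the same 4x7 grid for all three trays.
final = [row * 3 for row in ("AAACCAA", "CCCAAAC", "CCCCCAA", "AAAAAAC")]

old_for_new = list(range(sum(GROUP_SIZES)))
random.Random().shuffle(old_for_new)  # a fresh schedule C
schedule_a, schedule_b = split_a_b(undo_permutation(final, old_for_new))
```

Feeding it the permutation printed in Schedule C instead of a fresh shuffle reproduces the first rows of Schedules A and B shown above.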

You can verify that if you put schedules A and B together, look at the first row only, and then take first the 21st letter, then the 14th, the 3rd, the 4th, etc., you'll get AAAC... which corresponds to the ground truth above.
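If you'd rather not do that by hand, the same check can be written as a short script. The data is copied from the schedules above; the helper names (`overlay`, `permute`, `by_tray`) are made up for this sketch.

```python
SCHEDULE_A = [
    "__AC __AA __AC __AA __A _A",
    "__CA __CC __AA __CA __C _C",
    "AC__ CA__ CC__ CC__ AC_ C_",
    "CA__ AC__ AA__ AA__ AA_ A_",
]
SCHEDULE_B = [
    "AA__ CA__ AC__ CA__ AA_ C_",
    "CC__ AC__ CA__ AC__ AC_ A_",
    "__CC __CA __AC __CA __C _C",
    "__AA __AC __AA __AA __A _A",
]
# Schedule C, listed in order of new position: entry k is the old
# (1-based) position of the subtray that ends up in position k + 1.
OLD_FOR_NEW = [21, 14, 3, 4, 12, 16, 8, 2, 18, 9, 13,
               10, 11, 6, 7, 15, 19, 20, 5, 17, 1]
GROUND_TRUTH = [
    "AAACCAA AAACCAA AAACCAA",
    "CCCAAAC CCCAAAC CCCAAAC",
    "CCCCCAA CCCCCAA CCCCCAA",
    "AAAAAAC AAAAAAC AAAAAAC",
]


def overlay(row_a: str, row_b: str) -> str:
    """Merge one row of A and B; every cell must be filled by exactly one of them."""
    merged = []
    for x, y in zip(row_a.replace(" ", ""), row_b.replace(" ", "")):
        if (x == "_") == (y == "_"):
            raise ValueError("cell filled by both schedules or by neither")
        merged.append(y if x == "_" else x)
    return "".join(merged)


def permute(row: str, old_for_new) -> str:
    """Apply schedule C to one row of the merged grid."""
    return "".join(row[old - 1] for old in old_for_new)


def by_tray(row: str, width: int = 7) -> str:
    """Group a row of 21 letters into three trays of seven."""
    return " ".join(row[i:i + width] for i in range(0, len(row), width))


result = [by_tray(permute(overlay(a, b), OLD_FOR_NEW))
          for a, b in zip(SCHEDULE_A, SCHEDULE_B)]
assert result == GROUND_TRUTH
print("\n".join(result))
```

The assertion passes for all four rows, not just the first one.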

Once the schedules were generated, we printed them^[6] and did a test run with differently-coloured marbles. This made us adjust the format and layout a little bit (underscores were more readable than spaces, and 21 -> 1 in schedule C is more readable than 1: 21, which is what we had before). Then each of us in turn locked ourselves in the science room and performed our schedule.

Those of us doing schedules A and B took a picture after adding the control and another picture after adding the intervention. We also labeled each subtray and made sure to have the labels visible in the photos (we made sure to erase the labels and relabel before and after the permutation in schedule C). The person performing schedule C took a photo with the old labels after they were done, so that we could reconstruct the permutation. After the experiment, we will be able to find and correct any mistakes we made in performing the schedules. The new labels we added after the final schedule were in ascending order, which would allow us to detect if we had reordered the trays by accident over the course of the study.

We wrote down the exact order of operations in a protocol^[7], just to make sure there was no ambiguity.

Blinding

Glycine is a white powder, similar to salt or sugar. We wanted to find a control that was indistinguishable visually from the intervention. Sugar ticked some of the boxes. It matches the appearance of crystalline glycine and both substances taste sweet. It's a stimulant however, so it wasn't a good option. We also considered salt but drinking a glass of water with two teaspoons of salt every night was a bit too grim.

We weren't that worried about the sweetness of glycine because we thought we could mask it with sugarless squash^[8].

We thought about using other amino acids available as supplements but decided against it on the basis that they might have effects of their own which makes them a bad control.

We looked online for suggestions and found that crystalline cellulose is commonly used as a control. It's a tasteless, chemically inert white powder. From looking at pictures we could tell the cellulose was a very fine powder (more like icing sugar than salt).

We didn't have much time to research other controls so in the end we decided we would go with the cellulose because it had a track record as a placebo. We also bought a mortar and pestle to grind down the glycine until it was as fine as the cellulose.

We ordered the ingredients and two days later, when they arrived, we got to work. We decided to use a 1:1 mix of glycine and cellulose as the intervention (in terms of volume) to mask any differences between the two. Before creating an actual schedule we set out to prove to ourselves that intervention and control:

Are visually indistinguishable in the pill tray.
Are visually indistinguishable once dissolved.
Cannot be distinguished by taste once dissolved in sugarless squash.

After a lot of experimenting and testing each other we discovered that:

Grinding glycine into dust was really hard work and it would take us at least half a day of continuous manual grinding to get all of it fine enough.
We couldn't visually distinguish ground glycine from unground glycine.
We could very clearly visually distinguish glycine (both ground and unground) from cellulose. The shades of white are slightly different: Glycine is close to pure white, while cellulose is more alabaster. Also, glycine is shiny, like sugar, while cellulose is not.
We could visually distinguish the glycine-cellulose mix from cellulose when mixed with water. Glycine is soluble in water while cellulose is not. This means that the control mixture had twice as much sediment as the intervention.
We couldn't distinguish intervention from control by their taste or mouthfeel on a first try. We decided that running too many experiments on taste might teach us to tell the two apart by the felt amount of cellulose in suspension, so we intentionally didn't perform too many experiments here.

While not exactly the results we hoped for, the fact that we couldn't tell them apart by taste or mouthfeel meant that this was still salvageable. Since all the cues that could deblind us were visual, we would just have to prepare the mixture without looking. We would also use opaque cups with lids so that we couldn't see the mixture while drinking it. We expected this to be more prone to accidental deblinding than a visually indistinguishable control, but it was still our best option.

Doing the study

On the first day of the experiment, I noticed cold shivers after taking the mixture. I was curious whether this was an actual physiological response to (suspected) glycine or if it was just placebo, so I wrote down my guess of what I had taken^[9]. This made me hyperfocus on everything that could deblind me and keep mental track of expected runs. I didn't take any further notes thereafter and I think this was the right call.

Actually preparing and taking the sample each evening mostly worked as intended. We had a group chat where we reminded each other to take the sample & survey and I have not missed a day yet.

The only consistent issue was that because I was drinking squash before bed, I would often wake up needing to use the bathroom. Once I noticed the problem, I halved the amount of squash I was using, which seems to have improved things a bit (data analysis outstanding). Also, sometimes taking the mixture or doing data collection would mean that I went to bed later.

Then there was a set of issues around travel. It was easy to forget to take the samples with me when I ended up spending the night away from home. Since I didn't know what I was supposed to be taking that day, and we were all on the same schedule, there was no way to fix that.

The pill organizer itself wasn't intended for fine powders, so during travel some of the powders would spill, and in the end I would have to transport it by holding it horizontally in my hands.

When heading home for the holidays, I had to go through airport security with a box full of unlabeled white powder, which I couldn't put in my checked luggage because it needed to stay horizontal. I made sure to display it very obviously outside my bag as it went through the scanner and it didn't get stopped. I wonder if my model should be that the airport has a more sophisticated way of detecting actual drugs or whether confident people who appear middle-class enough just get away with anything.

While my science didn't get seized, a lot of the powder got spilled during my trip. I didn't bring sugarless squash, and couldn't buy any at my destination, so I used water with one teaspoon of artificial sweetener as a solvent. This worked, except that on one occasion I forgot the sweetener and deblinded myself with the sweet taste of glycine.

With changing sleep schedules, a larger intake of sugar, and substantially less stress, I expect I'll have to drop most of the Christmas datapoints (9 / 28 days).

Data gathering

To collect data, we answered a survey at the end of each day.

1. When did you go to bed yesterday? (best guess)
2. When did you fall asleep yesterday? (best guess)
3. When did you wake up today? (best guess)
4. When did you get out of bed / stop snoozing today? (best guess)
5. How much time did you spend napping today?
6. How rested did you feel when you woke up today? (1=worst in month, 10=best in month)
7. How awake did you feel throughout the day today? (1=worst in month, 10=best in month)
8. How stressed did you feel today? (1=worst in month, 10=best in month)
9. How irritable did you feel today? (1=worst in month, 10=best in month)
10. How productive did you feel today? (1=worst in month, 10=best in month)
11. How many mg of caffeine did you have today? (250ml glass of Cola: 22mg; 250ml of green tea: 30mg; 250ml of black tea: 50mg; cup of coffee: 100mg; espresso: 60mg; square of dark chocolate: 8mg)
12. How many ml of alcohol did you have today? (Shot: 10ml; Glass of wine: 20ml; pint of beer or cider: 30ml)
13. At what time did you drink the mixture today?
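Since the survey records clock times rather than durations, the durations have to be derived afterwards, and most nights cross midnight. A minimal sketch of that derivation, assuming times are stored as "HH:MM" strings; the `DailyEntry` class and its field names are hypothetical and not how any of our tools actually store the answers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def duration(start: str, end: str) -> timedelta:
    """Time from `start` to `end`, both given as "HH:MM".

    An `end` that is earlier on the clock than `start` is taken to be on
    the following day, so this only works for spans shorter than 24 hours.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)
    return t1 - t0


@dataclass
class DailyEntry:
    went_to_bed: str   # question 1
    fell_asleep: str   # question 2
    woke_up: str       # question 3
    got_up: str        # question 4

    def sleep_latency(self) -> timedelta:
        return duration(self.went_to_bed, self.fell_asleep)

    def time_asleep(self) -> timedelta:
        return duration(self.fell_asleep, self.woke_up)

    def time_in_bed(self) -> timedelta:
        return duration(self.went_to_bed, self.got_up)


entry = DailyEntry("23:30", "00:10", "07:05", "07:40")
print(entry.sleep_latency(), entry.time_asleep(), entry.time_in_bed())
```

Note that this treats the night as one uninterrupted block, so it does nothing for nights with a long wake-up in the middle.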

We each used whatever tools we found convenient for this. I went with the open-source Android app Track & Graph.

I think this worked well for the most part. We fixed some wording issues after the first day, and added question 13.

While I expect the data from the survey to contain signal, there are also some biases and data collection issues which we didn't mitigate:

I often struggled to remember what time I went to bed or woke up. Looking at the timestamp of the first/last was a decent but imperfect proxy.
We knew we would analyze the data together and it would be hard/impossible to blind ourselves to who is who. Even though I am very close with my co-conspirators, there is a risk that I would answer the survey so as to present a version of myself to the others that is slightly less stressed & irritable, more productive, has better sleep habits etc. As long as that bias is roughly constant, I don't expect it to affect the study outcomes.
It wasn't super clear at what level of granularity to record sleep. We decided to record the time we went to bed and the time we woke up, rather than the amount of time we spent asleep, to reduce the chance of mental math errors. But what if I wake up in the middle of the night and fail to fall asleep again? Do I lie about the time I woke up? Do I pretend I didn't? Do I count everything after the first wake-up as a nap? We failed to establish formal criteria for this.

Things to change

Doing this was a lot of fun, and I can already tell that there are some things I would do differently on a second go:

Choose a better control so that I don't have to prepare the mixture in darkness and/or with my eyes closed. Xylitol is a candidate I would look into.
Choose a schedule where only about 20% of the days are spent on the control, as opposed to 50%.
Keep better track of sleep times, especially when waking up at night.
Use different containers when traveling or make them air tight somehow.

I will save the full retrospective until I have analyzed the data.

^{^}
EDIT 2025-01-22: Clarify mechanism.
This is hearsay, I couldn't find the source I got that from. But the idea is that in order to fall asleep, your core body temperature needs to drop a bit. Glycine seems to decrease body temperature.
^{^}
The results seemed to imply that taking some glycine (6 g iirc) before sleep had the same effect as sleeping one hour longer. I unfortunately don't have the data at hand so I can't give a confidence bound.
^{^}
Honestly, controlling for seasonal effects was only a part of the reason we opted for sharing a schedule. We also just wanted to see if we could do it since none of us had seen a study design like this before.
^{^}
In case you are wondering, we used [this one](https://www.amazon.co.uk/dp/B09Q1VH8ZL?ref=ppx_yo2ov_dt_b_fed_asin_title). I do not receive a commission for linking this here.
^{^}
Figuring out the expected number of days one is deblinded for after getting information about one day, given a certain choice of runs, is a fun problem. We deliberately chose not to think about it because constructing an optimal strategy for deblinding ourselves is not part of the optimal strategy for not deblinding ourselves.
^{^}
We were working on a windows machine, so the way to print a file without looking at it was to right-click, print, and close your eyes while the file briefly flashed on screen.
^{^}
Essentially, something like [this](https://youtu.be/FBaVwwuErmU?si=kIfmU7_n4ngFbYBF&t=240).
^{^}
An abominable British concoction, kind of like juice concentrate in that you have to mix it with water before drinking. Except that it's probably never even been close to a fruit. Often sold as juice to catch out the unwary.
^{^}
I did not notice cold shivers at any later point, which indicates that the ones I saw when I was taking glycine deliberately before the study were either unrelated to glycine or fully psychosomatic.
^{^}
We don't have any reason to think there would be tolerance effects, so this was just a precaution meant to preserve the observed usefulness of glycine.

5 comments

Comments sorted by top scores.

comment by niplav · 2025-01-24T11:22:44.276Z · LW(p) · GW(p)

Cool to see people doing self-blinded & randomized QS experiments :-)

Two tips (which you might have considered already): (1) You can buy empty pill capsules and fill them with whatever you want. That makes it a lot easier to blind for taste, and far less annoying to consume with eyes closed. (2) I've found it useful to use a wearable tracker for data collection, especially for sleep, since I can't be arsed to write all that down. ~All trackers allow for data export (thanks GDPR!), I use a cheap fitbit.

Replies from: thedissonance.net

↑ comment by thedissonance.net · 2025-01-24T18:58:36.507Z · LW(p) · GW(p)

(1) Wow I never thought of that. That would have saved us a lot of pain! (2) I'm a bit nervous about recording my biometric data if it's not entirely offline, but I think the people I'm doing this with are likely to try this.

Thanks for the tips!

comment by [deleted] · 2025-01-17T19:35:28.556Z · LW(p) · GW(p)

I noticed feeling more awake than I would expect the next day, and I noticed cold shivers about 30 minutes after taking it which fits with the purported mechanism

what was the hypothesized mechanism?

Replies from: thedissonance.net

↑ comment by thedissonance.net · 2025-01-22T20:42:15.859Z · LW(p) · GW(p)

I don't remember where I've read this so it's not very high confidence.

The idea is that glycine lowers core body temperature, and that body temperature needs to drop a bit in the course of falling asleep

comment by Kajus · 2025-01-24T11:38:58.700Z · LW(p) · GW(p)

There are apps that can measure when you go to sleep based on your breath or something. Maybe that could be helpful?
