Posts
Comments
Ok so I found time to try this out.
Treating this like an audio signal makes sense, so that we can use analytical tools available in audio signal processing.
The overall challenge is effectively to try and determine whether each sample is white noisey enough.
Why? Think about what white noise is in this context. It's a pure random process. We're trying to determine if there are any particular patterns or skews to each sequence, or if the noise isn't 'white' enough.
If someone put down a completely repeatable sequence, say: [0,1,0,1,0,1,0,1,0,1...] If you were to transform the range to [-1,1,-1,1...] and put it into Audacity, (try it, you can import->raw data) you'd hear a tone, (or an inaudibly short bleep in this case). Very much not random.
In an FFT, we'd see a spike in this particular frequency on our spectrum.
Imagine summing our first sequence with another random repeatable sequence on top: [0,1,0,1,0,1...] + [0,0,0,1,1,1,0,0...] = [0,1,0,2,1,2,0,1...]
Do some tricks to normalize and rescale the data, and we'd hear two different tones. We'd now have two spikes in our FFT.
Imagine just carrying this on for, say, 1000 times - then do our scaling and rounding. The FFT would be so filled out that it would look almost flat
The more tones of random frequency, phase + amplitude you add, the closer and closer it comes to white noise, and therefore a perfectly random process.
White noise, in theory, should be maximally flat. That means that all frequencies over the possible range have equal power.
For a short sequence such as this it won't be perfectly flat, however any large skew in any direction, suggests the presence of patterns which are not random.
We can use two simple spectral features to show this: Spectral Centroid (i.e the centre of mass) Spectral Flatness (i.e a measure of how flat the spectrum is)
Plots
FFT of a pseudorandom number:
FFT of example 112:
Notice the skew in the example, it's very skewed towards the higher frequencies - this means the flatness will be lower and the centroid will be higher.
So here's a plot of just the centroid vs flatness for all examples + 500 truly pseudorandom examples:
We have some nice separability on a good 12-14 of the examples. Look how far 66 and 5 standout, these would be my 0%ers for sure.
Here's a PCA of several spectral features (rolloff, contrast, bandwidth + 2 above) for good measure:
Reading over this the first thing that sprang to mind is doing an FFT over the sequences and using the spectral flatness as a marker to eliminate simple ones. You could then do PCA on them along with some true pseudorandom sequences and plot, similar to Scy and William's method. I'm not 100% as these sequences are a tad short. Perhaps I'll give this a go however.