Crypto quant trading: Intro
by Alexei · 2019-04-17
I’m going to write a few posts on quant trading. Specifically trading crypto, since that’s what I know best. Here are a few reasons why I’m doing this:
- I think I can benefit a lot from writing about my approach and methodology. Hopefully this will make the ideas and assumptions clearer.
- I’d love to get input from other people in the community on their approaches to model building, data analysis, time series analysis, and trading.
- There’s been a lot of great content on this website, and I’d love to contribute. This is the topic I currently know best, so I might as well write about it.
- My company (Temple Capital) is also looking to hire quants and we believe the rationalist way of thinking is very conducive to successful quant trading.
My goal here isn’t to make you think that “Oh gosh, I can become a millionaire by trading crypto!” or “Here’s the strategy that nobody else has found!” Instead, I want to give you a taste of what quant trading looks like, and what thinking like a quant feels like. EAs have been talking about earning to give for a while, and it’s well known that quant trading is a very lucrative career. I’ve known about it for a while, and several of my friends have done quant (e.g. at Jane Street) or worked at a hedge fund. But, I never thought that it was something I could do or would find enjoyable. Turns out that I can! And it is!
I’m going to be sharing the code and sometimes the step by step thinking process. If you’re interested in learning this on a deeper level, definitely download the code and play with the data yourself. I’ve been doing this for just over a year, so in many ways I’m a novice myself. But the general approach I’ll be sharing has yielded good results, and it’s consistent with what other traders / hedge funds are doing.
Setup
Note: I actually haven’t gone through these install steps on a clean machine. I think they’re mostly sufficient. If you run into any issues, please post in the comments.
- Make sure you have Python 3.6+ and pip
- `pip install pandas numpy scipy matplotlib ipython jupyter`
- `git clone https://github.com/STOpandthink/temple-capital.git`
- `cd temple-capital`
- `jupyter notebook`
- Open `blog1_simple_prediction_daily.ipynb`
If you’re not familiar with the tools we’re using here, then the next section is for you.
Python, Pandas, Matplotlib, and Jupyter
We’re going to be writing Python code. Python has a lot of really good libraries for doing numerical computation and statistics. If you don’t know Python, but you know other programming languages, you can still probably follow along.
Pandas is an amazing, wonderful library for manipulating tabular data and time series. (It can do a lot more, but that’s primarily what we’re using it for.) We’re going to be using this library a lot, so if you’re interested in following along, I’d recommend spending at least 10 minutes learning the basics.
Matplotlib is a Python library for plotting and graphing. Sometimes it’s much easier to understand what’s going on with a strategy when you can see it visually.
Jupyter notebooks are useful for organizing and running snippets of code. They’re well integrated with Matplotlib, allowing us to show the graphs right next to the code. And they’re good at displaying Pandas dataframes too. Overall, they’re perfect for quick prototyping.
There are a few things you should be aware of with Jupyter notebooks:
- Just like running Python in an interactive shell mode, the state persists across all cells. So if you set the variable `x` in one cell, after you run it, it’ll be accessible in all other cells.
- If you change any of the code outside of the notebook (like in `notebook_utils.py`), you have to restart the kernel and recompute all the cells. A neat trick to avoid doing this is:
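One standard way to pick up edits without restarting the kernel is `importlib.reload` from the Python standard library. The sketch below is self-contained, so it writes a throwaway module called `demo_utils` (a made-up name standing in for something like `notebook_utils.py`), edits it on disk, and reloads it.

```python
import importlib
import pathlib
import sys
import tempfile

# Avoid stale bytecode caches getting in the way of this tiny demo.
sys.dont_write_bytecode = True

# Write a throwaway module to disk; "demo_utils" is a hypothetical stand-in
# for a helper module such as notebook_utils.py.
tmp_dir = tempfile.mkdtemp()
module_path = pathlib.Path(tmp_dir) / "demo_utils.py"
module_path.write_text("def fee():\n    return 0.001\n")
sys.path.insert(0, tmp_dir)

import demo_utils

before = demo_utils.fee()

# Simulate editing the file outside the notebook.
module_path.write_text("def fee():\n    return 0.00075\n")

# Re-execute the module's code in the already-running interpreter.
importlib.invalidate_caches()
demo_utils = importlib.reload(demo_utils)

after = demo_utils.fee()
print(before, after)
```

In the notebook this would be roughly one cell containing `import importlib` and `importlib.reload(notebook_utils)`. Names pulled in with `from notebook_utils import ...` have to be imported again after the reload. IPython’s `%load_ext autoreload` followed by `%autoreload 2` is another option.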
Our first notebook
We’re not going to do anything fancy in the first notebook. I simply want to go over the data, how we’re simulating a trading strategy, and how we analyze its performance. This is a simplified version of the framework you might use to quickly backtest a strategy.
The first cell loads daily Bitcoin data from Bitmex. Each row is a “daily bar.” Each bar has the `open_date` (beginning of the day) and `close_date` (end of the day). The dataframe index is the same as the `open_date`. We have the `high`, `low`, and `close` prices. These are, respectively, the highest price traded in that bar, the lowest, and the last. In stock market data you usually have the open price as well, but since the crypto market is active 24/7, the open price is basically just the close price of the previous bar. `volume_usd` shows how much USD has been transacted. `num_trades_in_bar` is how many trades happened. This is the raw data we have to work with.
From that raw data we compute a few useful variables that we’ll need for basically any strategy: `pct_change` and `price_change`. `pct_change` is the fractional change in price between the previous bar and this bar (e.g. 0.05 for +5%). `price_change` is the multiplicative factor, such that `new_price = old_price * price_change`. Additionally, if we had a long position, our portfolio would change the same way: `new_portfolio_usd = old_portfolio_usd * price_change`.
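Here is a minimal sketch of how these two columns could be computed with Pandas. The prices are made up, and this is my guess at the shape of the computation rather than the exact code in the repo.

```python
import pandas as pd

# Made-up daily close prices (not real Bitmex data).
df = pd.DataFrame(
    {"close": [100.0, 105.0, 99.75, 99.75]},
    index=pd.date_range("2019-01-01", periods=4, freq="D"),
)

# Fractional change versus the previous bar (0.05 means +5%).
df["pct_change"] = df["close"].pct_change()

# Multiplicative factor: new_price = old_price * price_change.
df["price_change"] = 1.0 + df["pct_change"]

# A $1000 long position held the whole time compounds the factors.
# The first bar has no previous bar, so treat its factor as 1.
df["portfolio_usd"] = 1000.0 * df["price_change"].fillna(1.0).cumprod()

print(df)
```

Note that the portfolio ends slightly below $1000: a +5% move followed by a -5% move doesn’t get you back to where you started.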
A few terms you might not be familiar with:
- We take a long position when we want to profit from the price of an asset going up. So, generally, if the asset price goes up 5%, we make 5% on the money we invested.
- We take a short position when we want to profit from the price of an asset going down. So, generally, if the asset price goes down 5%, we make 5% on the money we invested.
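In code, both cases reduce to multiplying the bar’s price move by a signed position. This sketch assumes a signal of +1 for long, -1 for short, and 0 for staying out of the market, and it ignores fees; `position_return` is a name I made up for illustration.

```python
def position_return(signal: float, pct_change: float) -> float:
    """Per-bar return on the money invested, ignoring fees.

    signal: +1 for long, -1 for short, 0 for out of the market.
    pct_change: fractional price move of the bar (0.05 means +5%).
    """
    return signal * pct_change


long_up = position_return(+1, 0.05)      # long, price rises 5%
short_down = position_return(-1, -0.05)  # short, price falls 5%
short_up = position_return(-1, 0.05)     # short, price rises 5%

print(long_up, short_down, short_up)
```

The third case is the one to keep in mind: a short position loses money when the price goes up.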
Here we see that BTC has indeed recently crossed its 200-day SMA (Simple Moving Average). One neat thing I hadn’t realized before is that the SMA looks like it has done a decent job of acting as support/resistance historically.
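For reference, a simple moving average is just a rolling mean of the close price. The sketch below uses a made-up helper `add_sma` and a tiny window on toy prices so the numbers are easy to check by hand; with the real data you would use `window=200`.

```python
import pandas as pd


def add_sma(df: pd.DataFrame, window: int = 200) -> pd.DataFrame:
    """Return a copy of df with an SMA column and an above/below flag."""
    out = df.copy()
    # Mean of the last `window` closes; NaN until enough bars exist.
    out["sma"] = out["close"].rolling(window).mean()
    # Comparisons against NaN are False, so early bars count as "not above".
    out["above_sma"] = out["close"] > out["sma"]
    return out


toy = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0, 5.0, 3.0]})
toy = add_sma(toy, window=3)
print(toy)
```

A change in `above_sma` from one row to the next marks a bar where the price crossed the average.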
Here we simulate a perfect strategy: it knows the future!
One thing to note is that the returns are not as smooth / linear as one might expect. It makes sense, since each daily bar has a different `pct_change`. Some days the price doesn’t move very much, so even if we guess it perfectly, we won’t make that much money. But it’s also interesting to note that there are whole periods where the bars are smaller / bigger than average. For example, even with perfect guessing, we don’t make that much money in October of 2018.
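A perfect strategy can be sketched by deliberately cheating: the signal on each row is the sign of the next bar’s move. This assumes the convention that a signal computed on row N trades bar N+1, and the `pct_change` values are made up.

```python
import numpy as np
import pandas as pd

# Made-up daily fractional price moves.
pct_change = pd.Series([0.05, -0.02, 0.0, 0.03, -0.10])

# Cheat on purpose: look one bar into the future with shift(-1).
# The last row has no next bar, so stay out of the market there.
strat_signal = np.sign(pct_change.shift(-1)).fillna(0.0)

# The signal from row N is applied to the move of bar N+1.
strat_pct_change = (strat_signal.shift(1) * pct_change).fillna(0.0)

# Compounded growth of $1.
strat_returns = (1.0 + strat_pct_change).cumprod()

print(strat_pct_change.tolist())
print(strat_returns.iloc[-1])
```

Every bar’s return comes out as the absolute value of that bar’s move, which is why the equity curve is only as steep as the bars are big.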
Here we simulate what would have happened if we bought and held at the beginning of 2017 (first graph) vs shorted.
Quick explanation of the computed statistics:
- Returns: multiplicative factor on our starting capital (e.g. 5.2 means a 420% gain, or turning $1 into $5.20)
- Returns after fees: the same multiplicative factor, after accounting for the fees that we would have paid for each transaction. (On Bitmex each time you enter/leave a position, you pay 0.075% fees, assuming you’re placing a market order.)
- SR: the Sharpe ratio. It’s a very common metric used to measure the performance of a strategy. “Usually, any Sharpe ratio greater than 1 is considered acceptable to good by investors. A ratio higher than 2 is rated as very good, and a ratio of 3 or higher is considered excellent.” (Source)
- % bars right: what percent of days we guessed correctly.
- % bars in the market: what percent of days we were in a position (rather than being out of the market). (The label is a bit misleading here, because the value is a fraction: 1.0 = 100%.)
- Bars count: number of days simulated
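To make the definitions concrete, here is a rough sketch of how statistics like these could be computed. It is not the notebook’s implementation: `summarize` is a made-up helper, the fee model (0.075% of the position change, subtracted from that bar’s return) is a simplifying assumption, and the Sharpe ratio is annualized with sqrt(365) with a zero risk-free rate, which is also an assumption.

```python
import numpy as np
import pandas as pd

FEE = 0.00075  # 0.075% per entry/exit, as quoted above for market orders


def summarize(pct_change: pd.Series, strat_signal: pd.Series) -> dict:
    # The position held during a bar is the signal from the previous bar.
    position = strat_signal.shift(1).fillna(0.0)
    strat_pct_change = position * pct_change.fillna(0.0)

    # Assumption: fees are proportional to how much the position changes.
    turnover = position.diff().abs().fillna(position.abs())
    after_fees = strat_pct_change - FEE * turnover

    in_market = position != 0
    right = strat_pct_change[in_market] > 0
    std = strat_pct_change.std()

    return {
        "returns": float((1.0 + strat_pct_change).prod()),
        "returns_after_fees": float((1.0 + after_fees).prod()),
        # Assumption: daily bars, 365 trading days, zero risk-free rate.
        "sr": float(np.sqrt(365) * strat_pct_change.mean() / std)
        if std > 0 else float("nan"),
        "pct_bars_right": float(right.mean()) if in_market.any() else float("nan"),
        "pct_bars_in_market": float(in_market.mean()),
        "bars_count": int(len(pct_change)),
    }


# Made-up data: buy and hold over five bars.
pct_change = pd.Series([0.0, 0.05, -0.02, 0.03, -0.01])
buy_and_hold = pd.Series(1.0, index=pct_change.index)
stats = summarize(pct_change, buy_and_hold)
print(stats)
```

With buy and hold there is only one entry, so fees barely matter. A strategy that flips its position every day would pay the fee on every bar.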
There are more graphs in the notebook, but you get the idea.
I’m not going to discuss this particular strategy here. I just wanted to show something more interesting than constantly holding the same position.
Future information
One of the insidious bugs you can run into while working with time series is using future information. This happens when you make a trading decision using information you wouldn’t have access to if you were trading live. One of the easiest ways to avoid it is to do all the computation in a loop, where each iteration you’re given the data you have up until that point in time, and you have to compute the trading signal from that data. That way you simply don’t have access to future data. Unfortunately this method is pretty slow when you start working with more data or if there’s a lot of computation that needs to be done for each bar.
For this reason, we’ve structured our code so that to compute the signal for row N, you can use any information up to and including row N. The computed `strat_signal` will be used to trade the next day’s bar (N+1). (You can see the logic for this in `add_performance_columns()`: `df['strat_pct_change'] = df['strat_signal'].shift(1) * df['pct_change']`.) This way, as long as you’re using standard Pandas functions and not using `shift(-number)`, you’ll likely be fine.
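One way to sanity-check a vectorized signal is to borrow the slow loop idea as a test: recompute the signal on truncated data and confirm that the earlier rows don’t change. The sketch below is only an illustration; `momentum_signal`, `leaky_signal`, and `uses_future` are made-up names, not functions from the repo.

```python
import numpy as np
import pandas as pd


def momentum_signal(close: pd.Series) -> pd.Series:
    # Uses only the current and previous rows: safe.
    return np.sign(close.pct_change()).fillna(0.0)


def leaky_signal(close: pd.Series) -> pd.Series:
    # shift(-1) pulls tomorrow's close into today's row: uses the future.
    return np.sign(close.shift(-1) - close).fillna(0.0)


def uses_future(signal_fn, close: pd.Series) -> bool:
    """True if hiding later rows changes the signal on earlier rows."""
    full = signal_fn(close)
    for n in range(1, len(close)):
        partial = signal_fn(close.iloc[:n])
        if not partial.equals(full.iloc[:n]):
            return True
    return False


close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0])
print(uses_future(momentum_signal, close))
print(uses_future(leaky_signal, close))
```

This check is as slow as the loop approach, so it’s better suited to an occasional test on a small slice of data than to every backtest run.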
That’s it for now!
Potential future topics:
- What overfitting is and how it impacts strategy research
- Filters (market regimes, entry/exit conditions)
- Common strategies (e.g. moving average crossover)
- Common indicators
- Using simple ML (e.g. Naive Bayes)
- Support / resistance
- Multi-coin analysis
Questions for the community:
- Do you feel like you understand what’s going on so far, or should I move slower / zoom in on one of the prerequisites?
- What topics would you like me to explore?
- What strategies are you interested in trying?