They claim that by specializing the chips for transformer workloads and removing the programmability of GPUs, they can fit an order of magnitude more compute FLOPs on the same size chip, which is plausible. But common wisdom is that LLMs are memory bandwidth limited. Model Bandwidth Utilization in inference workloads is often 60-80%, which would indicate that Nvidia's chips are reasonably well balanced in their ratio of bandwidth to compute, and that there isn't a ton of performance to be gained by just increasing compute. The Sohu chip reportedly has 144GB of HBM3E memory, the same type of memory as the Nvidia B200, with 0.75x as much memory capacity and bandwidth. Compared to the H100, the Sohu has 1.8x the memory capacity and bandwidth. They claim that performance is 20x that of an H100, which seems hard to believe based on the memory bandwidth. But in the Sohu post, they claim that it's a misconception that inference is memory bandwidth limited. If I'm understanding it correctly, increasing the batch size reduces the workload's bandwidth to compute ratio, so you can tune that ratio to match your hardware, but at the cost of latency. But maybe I'm missing something. If you have experience in this field, please chime in on whether you think memory bandwidth will be a constraint.
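To make the batch size argument concrete, here is a rough roofline-style sketch. It assumes each weight is read from memory once per decode step and used for one multiply-add per sequence in the batch, and it ignores KV-cache traffic entirely. The hardware numbers are round, made-up values for a hypothetical GPU-like chip, not vendor specs; the 10x compute and 1.8x bandwidth multipliers just mirror the "order of magnitude" and H100 comparison figures mentioned above.

```python
# Roofline-style sketch of batched decoding.
# All hardware numbers below are illustrative assumptions, not real specs.

def arithmetic_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs performed per byte of weights read in one decode step.

    Assumes each weight is read once per step and does one multiply-add
    (2 FLOPs) per sequence in the batch. KV-cache reads are ignored.
    """
    return 2.0 * batch_size / bytes_per_param


def break_even_batch(peak_flops: float, bandwidth: float,
                     bytes_per_param: float = 2.0) -> float:
    """Batch size at which the workload's FLOPs/byte equals the hardware's."""
    return (peak_flops / bandwidth) * bytes_per_param / 2.0


def is_bandwidth_bound(batch_size: int, peak_flops: float, bandwidth: float,
                       bytes_per_param: float = 2.0) -> bool:
    """True if the chip would wait on memory rather than on compute."""
    return arithmetic_intensity(batch_size, bytes_per_param) < peak_flops / bandwidth


# Hypothetical GPU-like chip: 1e15 FLOP/s and 3e12 bytes/s (assumed round numbers).
GPU_FLOPS = 1.0e15
GPU_BW = 3.0e12

# Hypothetical ASIC: 10x the compute, 1.8x the bandwidth (ratios taken from the claims above).
ASIC_FLOPS = 10 * GPU_FLOPS
ASIC_BW = 1.8 * GPU_BW

for name, flops, bw in [("GPU-like", GPU_FLOPS, GPU_BW), ("ASIC-like", ASIC_FLOPS, ASIC_BW)]:
    print(f"{name}: break-even batch size ~{break_even_batch(flops, bw):.0f}")
```

Under these assumptions the extra compute only helps if you can run batches several times larger than on the GPU-like chip. Since the sketch leaves out KV-cache reads, treat the break-even numbers as optimistic.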
Also, sdmat on reddit claims that MoE and long context length models require much more memory bandwidth, which would be bad for Sohu.
The compute die is on the TSMC 4nm process, same as the B200. Die size from the photo looks like it's at the reticle size limit, compared to the B200, which uses 2 dies at the reticle limit. So, even if the Sohu chips are memory bandwidth limited, they should be ~20-30% cheaper to produce in terms of $/memory bandwidth, and much more energy efficient than Nvidia's B200. However, they only support transformers (and there's a separate variant for MoE models), and if AI architectures shift, it would take Etched around 3 years to launch a new chip that accommodates the new architecture. If this style of chip becomes dominant, it would create a degree of lock-in to the transformer architecture and make it more difficult to switch to new architectures.
Their website only shows renders, not real photos, so I'm pretty sure they don't have chips made yet, and the performance numbers are theoretical and could be way off. But they just announced a $120M fundraise, so they should have enough funding to see this chip across the finish line. I made a market on Manifold on whether they will ship these within a year. I think I'm selling some of my Nvidia stock though.
IBKR just announced a new prediction market, and it pays interest on the value of your positions (fed funds rate minus 0.5%).
Domain: Engineering
Building Prototypes with Dan Gelbart https://www.youtube.com/watch?v=xMP_AfiNlX4&list=PLSGA1wWSdWaTXNhz_YkoPADUUmF1L5x2F&index=1
Dan Gelbart has been Founder and CTO of hardware companies for over 40 years, and shares his deep knowledge of tips and tricks for fast, efficient, and accurate mechanical fabrication. He covers a variety of tools, materials, and techniques that are extremely valuable to have in your toolbox.
Yeah swaptions would be nice but it seems like the minimum size is $1mm.
Why not just short-sell treasuries (e.g. TLT)?
Futures and options give you a lot more leverage than short selling. A $100k short position on TLT would be $30k of maintenance margin, compared to $7,400 for UB.
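As a quick check on the leverage difference, here is the arithmetic using only the two margin figures quoted above. The function name is just for illustration, and margin requirements change over time, so this is a snapshot under those numbers rather than anything current.

```python
# Effective leverage implied by the margin figures quoted above.

def leverage(notional: float, margin: float) -> float:
    """Notional exposure controlled per dollar of margin posted."""
    return notional / margin


NOTIONAL = 100_000          # size of the short position in dollars
TLT_SHORT_MARGIN = 30_000   # maintenance margin for shorting TLT (from the comment)
UB_FUTURES_MARGIN = 7_400   # margin for the UB futures position (from the comment)

tlt_leverage = leverage(NOTIONAL, TLT_SHORT_MARGIN)
ub_leverage = leverage(NOTIONAL, UB_FUTURES_MARGIN)

print(f"TLT short: {tlt_leverage:.1f}x, UB futures: {ub_leverage:.1f}x")
print(f"UB ties up {UB_FUTURES_MARGIN / TLT_SHORT_MARGIN:.0%} of the capital a TLT short does")
```

So for the same exposure, the futures position ties up roughly a quarter of the capital.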
And banks and hedge funds arbitrage futures prices against the underlying asset, so trading futures basically gives you access to institutional interest rates instead of retail margin rates. Right now the rate difference for short selling on IBKR is ~5% for accounts <$100k and 1.25% for accounts between $100k and $1mm. Plus the borrow fee which is currently only 0.3% for TLT but would go up if lots of people start shorting it.
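Here is a rough sketch of what those rates add up to per year on a $100k short. It assumes the rate difference and the borrow fee are both annual rates that apply to the full notional and simply add; that is my simplification, not IBKR's actual fee calculation.

```python
# Rough annual financing drag of a stock short, using the rates quoted above.
# Assumes both rates are annualized and apply to the full notional (a simplification).

def annual_short_cost(notional: float, rate_difference: float, borrow_fee: float) -> float:
    """Yearly cost in dollars of holding the short, before any price moves."""
    return notional * (rate_difference + borrow_fee)


NOTIONAL = 100_000
BORROW_FEE = 0.003  # 0.3% for TLT, as quoted above

small_account = annual_short_cost(NOTIONAL, 0.05, BORROW_FEE)    # accounts < $100k
large_account = annual_short_cost(NOTIONAL, 0.0125, BORROW_FEE)  # accounts $100k to $1mm

print(f"Small account: ~${small_account:,.0f}/yr, large account: ~${large_account:,.0f}/yr")
```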
Buying TLT puts is worth looking into though.
I spent a couple hours looking at different methods to efficiently short long term bonds:
- UB Treasury Bond Futures - 30 year bonds, but you have to roll every quarter on the roll date, which is a hassle, and you pay the spread each time you roll. Also, the expected return of a short position if the world stays normal is significantly negative: it should be the 30 year rate minus the risk free rate, which has averaged 2% per year since 1977.
- SOFR Futures - pays out based on the average interest rate in a specific 3 month time period up to 10 years out, though liquidity looks poor past 5 years out. A SOFR Futures strip will have the same returns as the equivalent treasury future, except there's no need to roll them, and you have more control over the time frame you want to target. (Edit: This paper finds the returns are the same as bond futures but they find them to be around 0.5%, rather than the 2% I estimated above)
- Eris SOFR Swap Futures - you pay/get paid the difference between the fixed rate when you bought it and the current floating rate, for up to 30 years. This sounds EV neutral, plus you wouldn't need to roll them.
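To put the two term premium estimates above (2% and 0.5% per year) in dollar terms, here is a minimal sketch of the expected carry cost of staying short if rates don't move. It uses simple, non-compounded carry and ignores roll spreads and margin financing; the function name and the 5 year horizon are placeholders I picked for illustration.

```python
# Expected cost of holding a short long-bond position in a "normal" world,
# under the two term premium estimates mentioned above.
# Simple (non-compounded) carry; roll spreads and margin costs are ignored.

def expected_carry_cost(notional: float, term_premium: float, years: float = 1.0) -> float:
    """Dollars the short is expected to lose to carry over the holding period."""
    return notional * term_premium * years


NOTIONAL = 100_000
HORIZON_YEARS = 5  # arbitrary illustrative holding period

for label, premium in [("2% estimate", 0.02), ("0.5% estimate", 0.005)]:
    yearly = expected_carry_cost(NOTIONAL, premium)
    total = expected_carry_cost(NOTIONAL, premium, HORIZON_YEARS)
    print(f"{label}: ~${yearly:,.0f}/yr, ~${total:,.0f} over {HORIZON_YEARS} years")
```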
The Eris SOFR Swap Futures sound promising, but I would need to do a lot more research before investing, and I'm wondering if anyone else has thoughts on this first. I might try to create a model to estimate the expected returns of each type of instrument in a normal world and a slow takeoff 50% rate world.
Edit: According to this thread, all three are functionally identical, so any significant difference in returns should get arbitraged away. If that's the case, then the Eris Swap Futures seem to have very poor liquidity so I would not recommend them.
However, since they're architected differently, it's possible that they are arbitraged to have very similar expected return profiles in normal times, but offer very different returns if interest rates go way out of distribution.