Reverse engineering the memory layout of GPU inference

post by Paul Bricman (paulbricman) · 2025-04-09T15:40:28.457Z · LW · GW · 0 comments

This is a link post for https://noemaresearch.com/blog/device-structinterp

Contents

  Background Context
  Technical Challenges
  Memory Segmentation
  Future Work

Background Context

This research note provides a brief overview of our recent work on reverse engineering the memory layout of an inference process running on a modern hardware accelerator. We situate this work as follows:

Technical Challenges

While our previous host-side work provided a useful stepping stone, the on-device setting presented several novel obstacles which required us to refine our approach:

To the best of our knowledge, this is the first time memory activity has been comprehensively tracked on a per-page basis on modern GPUs, albeit with the error bounds inherited from the count-min sketch. The hurdles posed by the sheer volume of data emitted by the embarrassingly parallel hardware may explain why. Note that we later argue that we may have swung a machine learning hammer at a computer science problem that was not quite a nail, and that a more elegant approach to segmentation may be possible, though the bitter lesson will tell.
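To make the error bounds concrete: a count-min sketch answers "how many times was this page touched?" with an estimate that can only overcount, by at most a bounded amount with high probability. The sketch below is a minimal, self-contained sketch of the data structure itself, not the instrumented-kernel implementation described in the post; the page size and addresses are hypothetical.

```python
import random

class CountMinSketch:
    """Approximate frequency counter. Estimates never undercount;
    they overcount by at most eps * total with probability 1 - delta,
    where eps ~ e/width and delta ~ e^-depth."""

    def __init__(self, width=2048, depth=4, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        # One independent hash seed per row.
        self.seeds = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        return hash((self.seeds[row], key)) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # Taking the minimum over rows bounds the overcount
        # introduced by hash collisions.
        return min(self.table[row][self._index(row, key)]
                   for row in range(self.depth))

# Record accesses at page granularity (hypothetical 4 KiB pages).
PAGE_SIZE = 4096
sketch = CountMinSketch()
for addr in [0x1000, 0x1008, 0x2000, 0x1FF0]:
    sketch.add(addr // PAGE_SIZE)
```

The appeal for on-device tracking is the fixed memory footprint: `width * depth` counters regardless of how many distinct pages the kernels touch.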

Memory Segmentation

Despite the novelty of instrumenting kernels to "track themselves" using count-min sketches, the general approach to memory segmentation remained the same as before: treat it as a machine learning problem.

Figure: Reconstructing the memory layout of a previously unseen inference process running on a GPU.
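The post does not specify the model used, so as a minimal illustration of "segmentation as a machine learning problem": given per-page access counts (e.g. read off the sketch above), even a toy 1-D k-means can separate hot regions (say, activations) from cold ones (say, rarely-touched weights). The feature, the counts, and the two-cluster assumption are all hypothetical simplifications.

```python
def kmeans_1d(values, k=2, iters=20):
    """Minimal 1-D k-means; returns one cluster label per value."""
    # Initialize centroids with evenly spaced sorted values.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels

# Hypothetical per-page access counts: a mix of cold and hot pages.
counts = [3, 2, 4, 120, 118, 130, 2, 125]
labels = kmeans_1d(counts, k=2)
```

A real segmentation model would presumably use richer per-page features (access timing, read/write ratios, which kernels touched the page), but the framing is the same: label pages, then merge contiguous runs of like-labeled pages into segments.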

Future Work

Where do we go from this feasibility study? Several directions and implications are relevant:
