Sparse Coding, for Mechanistic Interpretability and Activation Engineering
post by David Udell · 2023-09-23T19:16:31.772Z · LW · GW · 7 comments
Especial thanks to Logan Riggs and Monte MacDiarmid, for pointing me towards this whole research direction and for code discussion, respectively. Thanks to Alex Turner for project feedback and for orienting me towards scaling activation engineering up to larger models. Thanks to Adrià Garriga-Alonso, Daniel Kokotajlo, Hoagy Cunningham, Nina Rimsky, and Garrett Baker for discussion and/or draft comments. And thanks to anyone I discussed this with!
TL;DR: To separate out superimposed features represented by model neurons, train a sparse autoencoder on a layer's activations. Once you've learned a sparse autoencoding of those activations, this autoencoder's neurons can now be readily interpreted.
Introduction
All code hosted at this repository: activation_additions/sparse_coder
A bit ago, I became interested in scaling activation engineering to the largest language models I could. I was initially surprised at how effective the technique was for being such a naive approach, which made me much more enthusiastic about simple manipulations of model activation spaces.
Yudkowsky says that we cannot expect to survive without a mathematical understanding, a guiding mathematical framework, of the AI. One hunch you might have is that a linear feature combination theorem could be the root of such a guiding theory. If so, we might learn a lot about the internal learned mechanisms of models by playing with their activation spaces. I feel like tuned lens and activation additions are some evidence for this hypothesis.
One major problem I experienced as I scaled up activation engineering to the largest models I could get my hands on (the new open-source Llama-2 models) was that it's hard to guess ahead of time which additions will work and which won't. You generate a new addition and stick it into a forward pass. Then, you get a few bits back observing how well the addition worked. "It would have been great," I thought, "to get a window into which concepts the model represents internally, and at which layer it does so."[1]
Sparse coding excited me at this point, because it suggested a way to learn a function from uninterpretable activations to represented, interpretable concepts! Paired with activation engineering's function from interpretable concepts to model internal activations, it sounded like a promising alignment scheme. Now, many things sound promising ahead of time. But seeing the MATS 4 Lee Sharkey team get extremely clean, concrete results on Pythia drove my confidence in this path way up.
This is the writeup of that research path. I still think this is an extremely promising interpretability path, about as important as activation engineering is.
What I do is:
- collect model activations at a layer,
- train an autoencoder on those activations with an L1 sparsity penalty, and
- interpret the neurons of the trained autoencoder.
The neurons in the autoencoder then appear meaningful under top-token visualizations!
Technical Argument from Sparse Coding Theory
Epistemic status: Theoretical argument.
Say you collect a bunch of activation vectors from a particular layer of a trained model, during some task. These activation vectors are generally not natively interpretable. They're vectors in some space... but we have no real understanding of the meanings of that space's basis dimensions. We only know that all those activation spaces, passed through in sequence, yield coherent English speech. English concepts are being represented in there, internally, somewhere. But we don't really know how.
The problem is that there is no privileged basis in a transformer's activation space. The model was incentivized during training to learn every classifier it needed to mirror its training distribution. But there was no training incentive for each classifier to correspond to a single neuron. The training distribution is sparse: you don't need to be ready to represent each concept independently of every other concept. The training incentive actually weighed against the one-to-one neuron solution, then, as that's wasteful in weights. So there's plenty of mechanistic reason for a model's neuron activations to look like jumbled messes to us. To exploit a sparse world, learn densely compacted features.
And the solution we empirically see learned is indeed superimposed features! Don't dedicate a neuron to each feature. Have each neuron represent a linear combination of features. For this reason, all the directions in an activation space will tend to be polysemantic. If you just run PCA on an activation space, the resulting directions will often be frustratingly polysemantic.[2]
Sparse coding[3] is a solution to this superposition-of-features problem. You train autoencoders with an L1 sparsity penalty on the activations collected from a model layer. The autoencoder can be as simple as a tied matrix, then a ReLU, then the tied matrix transpose. The learned matrix together with the ReLU maps to a larger projection space. An L1 penalty is applied during training to autoencoder activations in this large projection space. The autoencoder is trained to reproduce the input activations while simultaneously respecting the internal representation penalty.
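The architecture just described can be sketched in a few lines. This is only a minimal forward pass and loss in NumPy, not the training code from the repository: the function names, the `l1_coeff` parameter, the random initialization, and the choice of mean squared error for the reproduction term are all assumptions made for illustration, and I'm assuming the sparsity penalty is an L1 norm.

```python
import numpy as np


def init_tied_weights(d_model, d_code, seed=0):
    """Random tied matrix W of shape (d_code, d_model); each row is one candidate feature direction.

    The initialization scale is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(scale=1.0 / np.sqrt(d_model), size=(d_code, d_model))


def encode(x, W):
    """Tied matrix, then ReLU: map (batch, d_model) activations into the larger code space."""
    return np.maximum(x @ W.T, 0.0)


def decode(code, W):
    """Tied matrix transpose: map (batch, d_code) codes back to (batch, d_model)."""
    return code @ W


def sparse_autoencoder_loss(x, W, l1_coeff):
    """Reproduction loss (assumed MSE) plus an L1 penalty on the code activations."""
    code = encode(x, W)
    reconstruction = decode(code, W)
    mse = float(np.mean((reconstruction - x) ** 2))
    l1 = float(np.mean(np.sum(np.abs(code), axis=-1)))
    return mse + l1_coeff * l1, mse, l1
```

In practice you would minimize this loss over `W` with an autodiff framework; the sketch only pins down what is being minimized.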
We're interested in particular solutions to this formal problem: learn to give each feature a neuron, i.e., have features fall along the standard basis. This way, the L1 penalty gives good values: most of your autoencoder activation values will be precisely zero. (An L1 penalty yields a constant negative gradient to the extent that there are non-zero elements in the autoencoder's activations.) If the activation vectors are just linearly superimposed feature dimensions, then separating them out and squeezing them back together in this way should reproduce the original vectors. That will satisfy the reproduction loss, too.
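To make the parenthetical about the gradient concrete, here is the objective written out. The squared-error reproduction term and the coefficient $\lambda$ are my assumptions, as is reading the sparsity penalty as an L1 norm; $W$ is the tied matrix and $a$ the autoencoder activations.

```latex
\mathcal{L}(x) = \lVert x - W^\top a \rVert_2^2 + \lambda \lVert a \rVert_1,
\qquad a = \mathrm{ReLU}(W x)

% After the ReLU every a_i \ge 0, so \lVert a \rVert_1 = \sum_i a_i and
\frac{\partial}{\partial a_i} \, \lambda \lVert a \rVert_1 =
\begin{cases}
  \lambda & \text{if } a_i > 0 \\
  0 & \text{if } a_i = 0 \quad \text{(taking the subgradient at zero)}
\end{cases}
```

So gradient descent pushes every non-zero activation down by the same constant amount $\lambda$ per unit step, however small the activation already is. That is why small activations get driven all the way to zero rather than merely shrunk.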
We train such an autoencoder to convergence, driving towards an value of between (in smaller models) and (in larger models). We save the trained autoencoder and examine its standard basis. Empirically, these neuronal directions appear quite semantically meaningful!
Autoencoder Interpretability
Epistemic status: Experimental observations. There's a robust effect here... but my code could absolutely still contain meaningful bugs.
Pythia 70M
Let's examine autoencoders trained at each of Pythia 70M's layers. Our interpretability technique is checking which tokens in the prompt most activate a given autoencoder neuronal direction.
For each Pythia autoencoder, here are ten unsorted non-zero directions and their favorite tokens:[4]
Layer 1 | |
---|---|
Dimension | Top Input Tokens |
2 | holding, speak, remember, read, learn, hears |
11 | :, )? |
76 | commissioned, gear, generate, mixed, conclude, credit |
124 | what, What, What, what |
133 | equally, most, deeply, relatively, greater, more |
166 | civil, loan |
183 | because, still, although, Because, since, although |
191 | Cl, Sn, L, Le, Mes, Mon |
206 | New, New, popular, ', old, handsome |
236 | L, l, O, ., unl, Fl |
Layer 2 | |
---|---|
Dimension | Top Input Tokens |
26 | !", ", ...", "., '. |
88 | Yes, clusively, iably, vertically, right |
96 | What, What, How, what, what, how |
154 | US, Americas, Netherlands, Massachusetts, States, bourg |
158 | presidents, pilots, Scholars, founders, Ts, Doctors |
171 | you, 'll, ), will, we, if |
185 | They, they, she, he |
243 | iless, prohibiting, custody, needs, permission |
269 | impressive, vast, cultural, sports, musical, great |
461 | sites, facilities, une, board, School, Jo |
Layer 3 | |
---|---|
Dimension | Top Input Tokens |
79 | Nik, Ir, Two, Poland, Pol, spectacular |
153 | biological, iga |
156 | attracted, rescued, confined, trouble, provided, avoided |
167 | ft, Lis, bo, ifer, Loren |
244 | (, 6, 5, 3, 7, 4 |
349 | Ċ, ard, ifer, ruct, ively, stra |
507 | 32, 1950, Pole, ple, isation, number |
714 | Anto, controll, along, ri, waters, rans |
779 | Cro, stra, Cron, Bar, Knowledge, Crick |
811 | bar, lang, rio, McC, oph, off |
Layer 4 | |
---|---|
Dimension | Top Input Tokens |
114 | Q, unequal, Gulf, Tenn, extr, GDP |
171 | ours, various, instantly, exact, technically, Ċ |
213 | och, Walt, corner, length, composition, dose |
229 | och, Little, mention, ot, af, / |
266 | A, 15, atomic, Ċ, official, My |
386 | Dec, Rod, send, Cron, catar, tou |
408 | grant, Priv, genuine, absolute, typically, legally |
472 | smell, Jupiter, auditory, thinkers, Venus, razor |
547 | Dec |
647 | och, length, dose |
Layer 5 | |
---|---|
Dimension | Top Input Tokens |
83 | penetrate, ensory, breathe, bites, distract, end |
291 | fats, sequences, ats, who, miracles, isions |
367 | deepest, official, perfect, atomic, presidential, digit |
444 | 2, 1, 6, 3, 4, 7 |
556 | Cash, Hillary, Q, Bond, go, Tea |
560 | becoming |
567 | 2, 3, 1, 4, 6, 5 |
587 | Return, atomic, Person, official, composed, room |
594 | stayed, although, lacks, although, poorer, It |
646 | Be, &, che, Che |
Full model results in footnote.[5]
In theory, these are all of the features represented in Pythia 70M's residual streams when these activations were collected. If the technique were extended to a representative dataset and to every Pythia sublayer, you'd in principle enumerate every single concept in Pythia.
Empirically, layers 1 and 2 (the two residual spaces right after the embedding layer) are the most interpretable of the bunch. Later layers are more garbled, though some clearly meaningful dimensions exist there too.[6]
Note that the interpretability method used on the autoencoders (top-k tokens in the prompt) is relatively naive. I have code for activation heatmaps and direction ablations[7], and those interpretability techniques may capture meaning that top-k tokens miss. Any interpretability technique you have for model neurons... can be applied to sparse autoencoder neurons too.
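For concreteness, the top-k tokens readout can be sketched in plain Python. This is an illustrative reconstruction rather than the repository's code: the function name, the input layout (one list of autoencoder activations per prompt token), and the default `k=6` are assumptions. Following footnote 4, non-positive activations are excluded, so a dimension that never fires positively is dropped entirely.

```python
from collections import defaultdict


def top_tokens_per_dimension(tokens, codes, k=6):
    """Return {dimension: [top-k tokens]} ranked by autoencoder activation.

    tokens: list of token strings, one per position in the prompt(s).
    codes:  list of per-token activation lists, same length as `tokens`.
    Only strictly positive activations count; dimensions with none are omitted.
    """
    per_dimension = defaultdict(list)
    for token, code in zip(tokens, codes):
        for dimension, activation in enumerate(code):
            if activation > 0.0:
                per_dimension[dimension].append((activation, token))

    result = {}
    for dimension, pairs in per_dimension.items():
        # Highest activation first; repeated tokens are kept, not deduplicated.
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        result[dimension] = [token for _, token in pairs[:k]]
    return result
```

Keeping repeated tokens is a design choice in this sketch: the same surface string can show up more than once in a row if it occurs at several positions or as distinct vocabulary entries.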
Llama-2 7B
The above results are my independent replication of the MATS 4 Lee Sharkey team's Pythia sparse coding. What if we scale the technique? Targeting a layer similarly early in the model, we train an autoencoder on Llama-2 7B:
Layer 13 | |
---|---|
Dimension | Top Input Tokens |
34 | ▁All |
109 | 2, 3, 2004 |
120 | <s> |
127 | ▁England, ▁dollars, ▁Italian |
206 | ▁means, ▁refers, ▁composed, ▁learned, ▁hid, ▁she |
207 | ▁society, ▁portal, ati, unker, ▁Order, ▁mission |
253 | ▁said, ▁wrote, ▁designed, ▁statement, ▁directed, elled |
277 | ▁dan, ▁po, ▁dess, ▁Know, ▁conce, ▁Har |
328 | <s> |
331 | ▁program, ▁intelligence, ▁computer, ▁artificial, I, ▁Rob |
Full layer results in footnote.[8]
seems too low for the autoencoders trained on Llama-2 7B. These Llama-2 results are instead at .[9] Still better interpretability results could be obtained if this range of sparsity values were better explored.
Neuron Interpretability Baseline
If you directly interpret model neurons on Llama-2 7B using the top-k technique, your results look like this:
Layer 13 | |
---|---|
Neuron | Top Input Tokens |
0 | ▁Rafael, ▁animation, ovo, ▁beneath, ▁commun, ▁Cross |
1 | ▁Hero, emor, action, ▁Indones, ▁expedition, immer |
2 | ▁bus, ▁Sund, ▁top, ▁marriage, ander, ▁breakfast |
3 | ▁predict, ▁Ald, ▁phase, ▁overcome, rin, ▁Joy |
4 | related, ▁lazy, round, ▁Nev, UI, ▁atmosphere |
5 | ▁trans, gu, isted, ▁portal, ▁tiny, laimed |
6 | ija, ▁Chief, ▁measures, ▁valuable, space, ▁testing |
7 | ond, ▁lazy, ▁Virgin, tes, ▁conquer, ▁uniform |
8 | ▁Valley, ctions, round, ▁measures, ▁facilities, ▁variable |
9 | ▁ways, ▁definitely, isation, ▁elements, enta, ▁expl |
Path to Impact: Learning Windows into Models?
Epistemic status: Wild speculation.
The above suggests that we can train windows into each layer of a model. Each autoencoder window tells you what's going on at that layer, in human-comprehensible terms. The underlying forward pass is unaltered, but we know what concepts each layer contains.
Because you know how those concepts are mapped out of the model into the autoencoder, they are also ready to be added in through activation engineering! So you already have some interpretability and steering control.
More ambitiously, we can now try to reconstruct comprehensible model circuits. With ablations, see which features at one layer affect which features at a later layer. Measuring the impact of features on downstream features lets you build up an interpretable "directed semantic graph" of the model's computations.
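As a toy illustration of that ablation idea (and in the same speculative spirit), the sketch below removes one autoencoder feature's contribution from a layer's activations, reruns the downstream computation, and records how each downstream feature changes. Everything here is hypothetical: `next_layer` stands in for whatever the model computes between the two layers, `W_in` and `W_out` are tied autoencoder matrices with one feature direction per row, and subtracting `activation * direction` is just one simple way to ablate.

```python
import numpy as np


def encode(x, W):
    """Tied matrix then ReLU; W has shape (d_code, d_model)."""
    return np.maximum(x @ W.T, 0.0)


def ablate_feature(x, W, feature):
    """Subtract one feature's contribution (activation times its direction) from x."""
    code = encode(x, W)
    return x - np.outer(code[:, feature], W[feature])


def feature_effects(x, next_layer, W_in, W_out):
    """effects[i, j]: mean change in downstream feature j when upstream feature i is ablated."""
    baseline = encode(next_layer(x), W_out)
    effects = np.zeros((W_in.shape[0], W_out.shape[0]))
    for i in range(W_in.shape[0]):
        ablated = encode(next_layer(ablate_feature(x, W_in, i)), W_out)
        effects[i] = np.mean(ablated - baseline, axis=0)
    return effects


def strong_edges(effects, threshold):
    """Edges (upstream feature, downstream feature) whose absolute effect exceeds the threshold."""
    rows, cols = np.nonzero(np.abs(effects) > threshold)
    return [(int(i), int(j)) for i, j in zip(rows, cols)]
```

The list returned by `strong_edges` is one crude candidate for the edges of the "directed semantic graph"; where to set the threshold is left open.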
This especially is really good stuff. If you can reconstruct the circuits, you can understand the model and retarget its search algorithms. If you can understand and align powerful models, you can use those models as assistants in yet more powerful model alignment.
Conclusion
I've replicated prior sparse coding work and extended it to Llama-2 7B. I'm hoping to keep at it and get results for Llama-2 70B, the best model that I have access to.
Generally, I feel pretty excited about simple modifications to model activation spaces as interpretability and steering techniques! I think these are worth putting points into, as an alignment bet independent of RLHF.
- ^
I was specifically hunting for a "truthiness" activation addition to move around TruthfulQA benchmarks. (I am unsure whether the techniques covered in the post are, in practice, up to programmatically isolating the "truthiness" vector.)
- ^
Or to an AI assistant helping you interpret neurons in a model.
- ^
Also known as "sparse dictionary learning."
- ^
Underlying Pythia activations were collected during six-shot TruthfulQA. (Six shot is standard in the literature.) This is a far smaller dataset than The Pile, so this was also an experiment in small dataset sparse coding.
I project to a -dimensional space from Pythia's -dimensional activation space. Negative token activations are excluded, since the ReLU would zero all of those out, destroying any information negative values might contain.
So, directions with all negative values are dropped; notice that that's most directions! Only about in are kept.
- ^
Pythia 70M Autoencoder Data
Layer 1
[Dimension] [Top Input Tokens] 2 holding, speak, remember, read, learn, hears 11 :, )? 76 commissioned, gear, generate, mixed, conclude, credit 124 what, What, What, what 133 equally, most, deeply, relatively, greater, more 166 civil, loan 183 because, still, although, Because, since, although 191 Cl, Sn, L, Le, Mes, Mon 206 New, New, popular, ', old, handsome 236 L, l, O, ., unl, Fl 254 been, be 286 The, The, decreased, unclear, higher, decrease 313 month, months 393 stunt, pylori, psychological, penal, methodology, punished 455 You, kur, You, you, ( 509 vis, kur, bron, butter, ater, iele 612 that 641 Marc, Justin, Jonathan, Milton, Jeff, Moz 675 high 708 Ċ, over, getting, taking, pushing, coming 728 into 733 ll, cliff, course, should 816 University, City, universities, Airport, Harvard, campus 859 Milky, gum 861 The, The, Three, Two, Our, His 986 ign, dens, gy, acupuncture, undergraduate 989 handsome, Johann, deeply, originated, disguised, hungry 1051 knew, worked, tells, get, knows, won 1138 Millenn, UFO, Gandhi, Herman, Disney, Smith 1148 Real, vamp, Mant, Ch, real, mat 1176 salts, pesticides, mushrooms, spiders, fluids, fertil 1182 fra, schizophren, Jedi, kur, catar, Hitler 1201 deeply, career, critically, psychological 1210 Can, can, did, Do, Did, could 1229 st, gs, never, bra, 't, ieri 1339 makes, make, making, made, How, how 1387 I, I 1423 who, who, Who, which, where 1452 Q, U 1472 orc, father, Dam, Neumann, Auto, arth 1484 horses, has, have, adolescence, burgers, ribs 1540 Algebra, databases 1595 Toronto, Madrid, Munich, Dublin, Paris, Barcelona 1612 Nobel 1647 what, something 1652 family, US, their, national, parents, mothers 1699 are, aren, Are, is, were, Were 1724 Iceland, Finland, Ireland, Poland, Switzerland, Italy 1725 turkey, hunting, salad, nausea, meat, transportation 1861 2, 2, Trans, Cre, bra, Two 1864 "., '., "?, !", '?, ?" 
1868 DNA, hair, monkey, gun, palm, doll 1878 6, 7, 9, 8, 12, 13 1965 What, what, What, what 1997 all, turn, turned, All, All, both 2000 lead, La, Flight, passenger, ke, ib 2024 that, a 2125 On, on, P, on, R, Ch 2136 getting, takes, get, taking 2144 1970, 1950, 1990, 1990, II, clusive 2165 lying 2233 detox, patrol, extras, dishon, massacre, purge 2247 your, Your, my, our, his, Our 2352 What, Nobel, What, How, what, Why 2427 ?, '?, )?, "?, .?, ?" 2438 ensory, rugu, N, 3, rist, carb 2505 comment, specify, like 2509 humans, Canadians, Australians, Americans, Iceland, Europe 2568 A, A 2580 people, thinkers, everyone, participants, People, Americans 2610 chili, purple, pink, pepper, Yes, dessert 2679 What 2719 traditional, legal, organized, alloc, accessible, legally 2728 90, 86, twenty, 13, heavens, 12 2729 shouldn 2764 Swift, Harvard, York, Mind, Mex, fat 2765 everyone, every, Everyone, across, many, Many 2814 whether, or, where, unless, and, When 2825 imps, igm, ringer, dig, recogn, uj 2941 Big, mathematical, Neural, New 2955 It, it, it, All, All, all 2995 more, less, more, More, fewer, harder 3021 used, summoned, displayed, removed, accessible, useful 3071 doesn, don, didn, shouldn, Barack, Ad 3104 Economics, Knowledge, Diet, Med, Psych, iology 3131 Does, Did, 's, Do, does, Which 3132 Jacksonville, Indianapolis, University, Paso, Angeles, Carolina 3149 humid, directly, criminal, penal, dishon, bankruptcy 3160 San, San, New, New, Sant, Carn 3163 other, Which 3175 Earth 3202 What, What, How, what, what, McC 3296 :, ", all, your 3316 ", ' 3365 illegal, legal, legally, human, Legal 3379 S 3380 Marl, boil, If, melt, struck, If 3458 5, 5, five 3468 immune, unequiv, payment, proportion, millions, billions 3476 who, who 3578 actor, scientist, lawyer, engineer, sailor, artist 3584 Q, bl, I, Black, If, if 3600 ll, will, Will, would, By, by 3634 and 3650 weather, sun, snow, Snow, cold, rain 3658 ancestor, father, kidnapped, witch, husband, assassination 3685 In, in, In, During, 
during, along 3769 away, away, atorium, work 3808 home, father, US, childhood, house, parents 3826 swim, rib, tie, doll, wave, stretch 3968 Barack, Bill, Hillary, Bill, George, Michael 3987 the, the, The, The, vom, 2 4057 than, Than, like, 1960, 1961, as 4094 shown, showed, demonstrated, showing, show, shows 4226 )?, '?, ?, ifer, inc, itable 4236 taking, take 4251 there, There 4265 consistently, wildly 4302 gluten, steak, salmon, burger, chicken, straw 4315 Way, Tw, ), mist, Witch, lying 4334 Jenny, recogn, rico, Jonathan, uj, ima 4359 know, knows, knew 4368 Prize, word, phrase, result, periods, period 4383 Gal, S 4442 : 4512 1990, 1960, 1950, 1990, 1970, 2000 4554 rugu, Denver, Miami, Washington, Vancouver, Luis 4726 ( 4729 literally, only, just, clusively, secretly, Only 4762 smallest, brightest, best, richest, largest 4789 flawed, tiny, burned, impressive, harder, excessive 4804 ,, not, 't, not, originally 4808 The, The, the, That, the 4842 always, commonly, remains, inally, cos, inc 4865 Ċ 4887 Ital, Az, Ins, intellig, Mex, Hind 4902 No, no, Not, no, Nothing, Little 4954 do, Do, does, Did, numbers, real 4966 best, take, taking, good 4996 decades, Gates, Way, II, Clinton, years 5025 (, (, Alban, How, Cran, Massachusetts 5036 oil, breastfeeding, alive, smoke, living, women 5041 ve 5050 have, 've, had, Have, has 5052 a, an, What, What, You, With 5062 you, You, only, You, just, Only 5106 cl, fro, fl, gr, ch, merc
Layer 2
[Dimension] [Top Input Tokens] 26 !", ", ...", "., '. 88 Yes, clusively, iably, vertically, right 96 What, What, How, what, what, how 154 US, Americas, Netherlands, Massachusetts, States, bourg 158 presidents, pilots, Scholars, founders, Ts, Doctors 171 you, 'll, ), will, we, if 185 They, they, she, he 243 iless, prohibiting, custody, needs, permission 269 impressive, vast, cultural, sports, musical, great 461 sites, facilities, une, board, School, Jo 463 ll, will, would, might, should, could 574 I, i, I 592 nothing, Nothing 593 heart, world, COVID, cancer, Christ, body 594 In, In, in 665 stunt, rule, block, triggered, notice, transform 705 hasn, continue, won, keeps, doesn, stops 760 and, but, then, eventually, although, various 808 that 812 tal, impressive, ,, notable, asking, its 858 blood, Av, birth, University, healthcare, uterus 870 reading, Fe, Tele, Ind, pre, From 958 1961 987 19 1050 People, Pres, Men, people, gu, Humans 1190 multid, purpos, carn, catar, incon, unl 1228 ancestry, founder, alumni, citizens, father, personalities 1230 col, South, Mill, Ital, Ge, College 1243 izen, ija, pro, &, bron, 1996 1246 scholars, citizens, Democratic, prosecutor, personal, community 1256 Sugar, cuisine, Iron, Fire, Food, Light 1297 icides 1303 Allied, national, domestic, rious, Democratic, ied 1332 your, you, Your, You, yourself, You 1371 Qu, Qu, uff, Pink, IK, inj 1419 sing, ducks, dancing, golf, rugby, chocolate 1421 acting, Qu, aupt, acking, fat, lim 1428 most, largest, best, Most, closest, biggest 1441 ), 9, (, 6, 5, 8 1474 terrible, iless, someone, coworkers, crimes, a 1488 ?, '?, "?, )?, ?", .? 
1510 onna, rico, clamation, anca, Auto, oston 1562 etics, icks, Mind, ences, thinkers, ens 1568 ois, ais, ela, Amy, aqu, au 1571 nothing, effort, consensus, obligation, Species 1721 ails, bites, razor, rifle, tricks, strikes 1771 that 1822 camel, wolf, Canadian, witch, tar, lawyer 1831 1, used, 1 1905 Yes, What, Can, Q, No, Prize 1911 your, my, her 1926 mankind, crimes, space, mentally, officers, brain 1985 3, 2, 13, 7, 6, 9 2052 (, '?, 10, Yes, not, 8 2088 S, K 2102 A, A, An, E, An, E 2120 ost, nine, 80, ra, 330, yards 2129 there, There, ucer, covered, series, coming 2170 road, sky, seat, attractions, film, pavement 2255 Puerto, 6, Denver, Vancouver, Luxem, Miami 2259 Fund, cost, restrictions, costs, powers, batteries 2340 Way, Valley, ), Massachusetts, Nevada, Angeles 2347 Declaration, International, Commonwealth, national, Cre, The 2411 5, 7, 6, 9, 8, 13 2415 pillar, object, Angel, Area, circle, Venus 2471 band, solo, rans, canon, penal, electrical 2519 living, nearly, unanimously, expecting, original, just 2548 ', ", -, per 2621 Friedman, labor, riminal, republic, politician, Witch 2642 no, No, Ċ, rit, 't, unlikely 2749 be, a, unusually, necessarily, is, an 3020 : 3049 spiritual, Black, Arab, Hindu, Ital, biological 3052 akes, idden, sea, Go, Dreams, dream 3068 there, All, All, Everyone, There, Have 3097 67, 330, 94, 58, 44, variable 3117 if, if, If, If 3123 You, You, you 3129 there, There, happens 3149 (, U, originally, Future, plans 3211 used, intended, structured, learn, transported, marry 3275 estion, Breast, Honey, infection, isexual, Nut 3324 If, If, if, unless, if, When 3357 Kn, Sn, Tar, Cr, Sha, Sant 3422 orc, tec, amation, injured, inkle, evil 3448 Norris, valuable, Bob, Be, no, Col 3459 ro, third, one, Most, characteristic, hypothetical 3508 It, it 3514 blind, hood, the, gun, tort, cat 3522 Space, iology, speech, Economics, bricks, waters 3563 than, Than 3602 rapy, izers, cards, uncture, illation, Way 3611 modeled, achieved, grant, asking, led, lets 
3686 won 3830 Brian, Bill, Jeff, David, Robert, James 3838 Prize, Nobel 3851 istic, ormal, unequal, izers, otion, analogous 3874 happen, happens, happened 3894 By, uously, No 3936 All, All, all, Everyone, everything, everyone 3939 has, have, claimed, hasn, iably, 've 3945 1990, 1970, weeks, 1981, 2001, 1950 3960 Way, inkle, well, inally, hr, beth 3979 Az, reign, plan, Pon, tar, ra 4051 can, could, cannot, may, Can, must 4056 analogous, istic, impressive, ined, rious, affordable 4066 analogous, ideal, am, devoted, unlikely 4082 qual, silver, gall, chocolate, olive, chess 4152 boo, try, agree, speak, love, notice 4270 on, onto, On, across, against, via 4305 ", New, !", New, inaugural, "- 4368 Q 4382 mean, estimated, demonstrate, probably, describes, unlikely 4498 ?, )?, "?, '?, .?, '. 4628 What, insulin, salmon, What, butter, oil 4673 Q, What, What, did, Qu, does 4782 graph, Cap, :, ham, (, rop 4788 Sn, Bl, P, p, T, pe 4885 , 4892 by 4907 Q 4911 Quebec, Massachusetts, Toronto, Dublin, Paris, ivia 4925 hasn 4928 ?, optimization, '? 4941 &, illary, ering, Dec 4943 Sundays, weekends, minute, evening, sky, midnight 4987 obligated, istic, impossible, unlikely, problem, idea 5104 An, These, Their, They, Only, involves 5106 Who, What, What, what, what, Where 5107 No, impossible, no, unlikely, Nothing, t
Layer 3
[Dimension] [Top Input Tokens] 79 Nik, Ir, Two, Poland, Pol, spectacular 153 biological, iga 156 attracted, rescued, confined, trouble, provided, avoided 167 ft, Lis, bo, ifer, Loren 244 (, 6, 5, 3, 7, 4 349 Ċ, ard, ifer, ruct, ively, stra 507 32, 1950, Pole, ple, isation, number 714 Anto, controll, along, ri, waters, rans 779 Cro, stra, Cron, Bar, Knowledge, Crick 811 bar, lang, rio, McC, oph, off 905 order, Little, exagger, U, atmosphere, sand 932 there, There, lots, You, no, It 946 spoken, Bach, Mar, Cap, Dec, modeled 976 Cra, Ber, aff, Bach, ign, Er 1119 Q, izers, pen, bar, fe, oux 1140 1, 2, ak, unusual, Story, upon 1176 Q 1217 cards, biological 1230 moment, career, normal, position, condition, tendencies 1247 cooler, II, taxed, sad, bars, decrease 1408 : 1605 ney, we, vere, ana, Loren, rio 1632 uther, ind, onna, lock, Declaration, ler 1637 colonial, Hollywood, Asian, Indonesia, Portuguese, Florida 1750 abundant, useful, known, affordable, basic, cards 1774 20, Ċ, Related, Rum, thirty, unequiv 2008 million, anymore 2016 leg, mist, mant, ble, watches, suit 2031 immer, yl, away, across, iz, wart 2192 lang, bron 2451 bites, brush 2455 Golden, Elvis, Solar, Steve, ice, chocolate 2541 blocked, parchment, cocaine, permission 2588 highest, position, Declaration, not 2604 arms, swe, boss, alcohol, gum, chairs 2610 abdomen, heavens, mankind, sts, ano, further 2636 ll, will, should 2657 S, rugu, Belgium, Italy, Greece, K 2658 Yellow, oused, preferred 2689 Yes, No, If, Unknown, There, Only 2720 Q, Can, (, eats, tons, love 2728 Happ, of, law, app, war, lots 2780 rain, head, score 2803 trans, path, officers, ceremonies, rop, pilots 2943 accept, interrog, teach, predict, inflict, save 2978 Lis 3019 aro, iga, Cape, asses, more 3095 C, Tw, enth, na, ch, ISA 3128 stove, River, cord, investor, bird, Tri 3213 Er, biological, Europeans, AI, em, brates 3231 Rec, Bel, Ac, Sch, Ad, uther 3281 inflation, ports, yellow, pneumonia, video, thirty 3421 ual, Col, collected, credited, und, 
obligated 3556 Can, Q, Antar, Have, shower, reb 3632 aff, rapper 3675 otics, ella, uit, ilis, icorn, ija 3717 pp, ater, ent, responsible, aro, refer 3723 lazy, dig, cre, talent, skilled, confined 3744 kind, weeks, thirty, 1000, backwards, happens 3799 Carl, Bryant, Holmes, Freud, Cunningham, Curry 3820 strikes, such, rans, strike, hid, when 3873 less, more, decreased, stayed 3881 Tiger, Pink, Fire, Sugar, Birds, Rich 3901 than, Than 3956 U, tw, Ang, 1990, Lanc, Po 4000 Pole, atmosphere, Building, disorders, pregnancy, yours 4054 unusual, approved, named, thousand, several, getting 4140 A, Allen, An, A, Di, com 4157 describes, whether, third, Because, statements, ey 4181 How, Where, bla, Known, reduce, what 4310 & 4393 13, 3, 12, 5, 1, 4 4426 aqu, Ber, Mar, enn, oge 4435 Queens, Real, 1961, NY, 2003, Trans 4443 The, Kon, The 4469 ), :, illi, Viol, Spot, lim 4530 relatively, accessible 4589 the, your 4616 recorded, notable, existing, basic, several 4670 deprive, historically, recently, FK, shit, Bill 4702 (, oph, Rh, Dec 4709 bla, iga 4724 ?, "?, .?, ?", '?, upset 4767 Despite, Does, unusual, Do, Did, What 4881 Sea, grasp, Cap, record, angle 4956 tells, &, contributes, hasn, comes, came 5021 Q, Can 5076 separately, action, grid, lasts, cleans, plot 5081 drops, keeps, reduces, improves, provides, increases 5108 minute, average
Layer 4
[Dimension] [Top Input Tokens] 114 Q, unequal, Gulf, Tenn, extr, GDP 171 ours, various, instantly, exact, technically, Ċ 213 och, Walt, corner, length, composition, dose 229 och, Little, mention, ot, af, / 266 A, 15, atomic, Ċ, official, My 386 Dec, Rod, send, Cron, catar, tou 408 grant, Priv, genuine, absolute, typically, legally 472 smell, Jupiter, auditory, thinkers, Venus, razor 547 Dec 647 och, length, dose 946 Q, ais, (, smash, pir, iele 1158 28 1607 6, 5, 7, 3, 1, 4 1635 length 2327 digit, deepest, abundant, official, perfect, icking 2448 ais, St, arriv, even, pushing, ous 2747 ais, navig, ag, Sov, Kore, Y 2989 och, ala, Knowledge, participants 3048 15, atomic, tells, obese, undercover, Yellow 3265 Q 3379 ( 3655 âĢĻ 3829 unanimously, miserable, Fox, absolute, Imm, deepest 3870 smash, prick, learned, lets, extend, imagine 4061 ), Pink, Rich, ali, Most, Carolina 4083 lots, length, och, af, corner, dose 4090 directed, Franklin, elson, ek, Fleming, Auckland 4279 ais, St, La, Lav, Gal, Ost 4524 absolute, Little 4624 af, ath 5079 och, 12, 9, 8, 6, 10
Layer 5
[Dimension] [Top Input Tokens] 83 penetrate, ensory, breathe, bites, distract, end 291 fats, sequences, ats, who, miracles, isions 367 deepest, official, perfect, atomic, presidential, digit 444 2, 1, 6, 3, 4, 7 556 Cash, Hillary, Q, Bond, go, Tea 560 becoming 567 2, 3, 1, 4, 6, 5 587 Return, atomic, Person, official, composed, room 594 stayed, although, lacks, although, poorer, It 646 Be, &, che, Che 674 P, p 733 Did, Can, Should, Does, Was, Is 758 Q, jet 790 pent, Miranda, Middle, St, Ex, Pil 982 accumulated, asses 985 approximately, below, Wait, lasts, wait, 7 1081 several, seven, 13, 5, hundred, six 1090 Theorem, founder, root, Wizard, root, cil 1258 deprive, warn, invoke, leaving, causes, discovered 1418 What, what, what, nothing, 7, What 1492 ind, San 1592 Tar 1644 F, ch, C, H, F, sc 1665 pp, abl, Clock, ey, erv, text 1695 :, How, aqu, icking, Ger, ) 1893 extr 1963 (, (, cl, entr, November, Gal 1996 undergo, temporary, high, permanent, lacks 2007 What, How, highest, All, various, icking 2185 temporary, permanent, invoke, feel, simply, uterus 2262 Ċ, Q, "., '., .", !" 2340 except, United, Great, visiting, refers, Luxem 2361 aqu, St, acc, Be, Puerto, mentally 2445 instantly, slowly, should, could, immediately, drank 2453 izen, becoming, Be, Qu, NY, gy 2495 ayan, calorie, asses, powdered, soft, accumulated 2730 prep, meat, jail, inner, Yoga, Toast 2755 delicious 2803 presidential, national, Federal, conservative, Cod, fly 2810 laughter, helium, Arizona, atmosphere, extinct, lungs 2858 iss, iv, urop, gr, isc, ruct 3001 ?, ?, "?, .?, )?, '. 
3195 ented, Has, deprive, Did, acting, warn 3233 analyzing, When, receive, How, :, Where 3428 U, I, I, O, s, i 3477 rico, Blake, Justin, Albert, Jeff, Charles 3493 1, 2, 3, 4, 6, 5 3504 Q, able, accurately, 2, iele, refers 3528 speaks, visiting, except, conquered, In, vs 3564 Person, Return, end, rest, Theorem, list 3576 navig, Ins, ais, Pri, Priv, icking 3688 ch, Ch 3767 Ċ 3944 ais, Tar, absolute, Gal, Ins, Ther 4072 banned, outlaw, cars, accepted, originated, scores 4085 2, 3, 1, 4, 6, ( 4114 Asian, oldest, cultural, Trans, New, aster 4228 work 4274 No, comment, Yes, unclear, definite, conclusive 4560 ais, Priv, Ins, Tar, aster 4569 (, (, Q, Type, Gen, Dr 4597 highly, openly, becoming, necessarily, Ger, unusually 4649 necessarily, highly, totally, entirely, relatively, unusually 4688 Return, Person, official, Theorem, lots, rest 4807 including, .?, El, Qu, Real, Mer 4827 organized, uffs, fed, meat, decreased, ella 4854 Person, âĢĻ, rest, Theorem 4874 phants, ats, fear, girls, Scientists, pigs 4937 rabbit, delicious, living, praying, electric, official 4983 fat, Democratic, Most, conservative, educational, Ger
- ^
My experience with the bigger models leads me to think that, plausibly, better results for those other layers could come from different sparsity values. That is, maybe, there isn't a single best sparsity for all layers of a model.
- ^
Heatmap code courtesy of Alan Cooney's CircuitsVis library.
- ^
Llama-2 7B Autoencoder Data
Layer 13
[Dimension] [Top Input Tokens] 34 ▁All 109 2, 3, 2004 120 <s> 127 ▁England, ▁dollars, ▁Italian 206 ▁means, ▁refers, ▁composed, ▁learned, ▁hid, ▁she 207 ▁society, ▁portal, ati, unker, ▁Order, ▁mission 253 ▁said, ▁wrote, ▁designed, ▁statement, ▁directed, elled 277 ▁dan, ▁po, ▁dess, ▁Know, ▁conce, ▁Har 328 <s> 331 ▁program, ▁intelligence, ▁computer, ▁artificial, I, ▁Rob 336 ▁foot 392 ▁nin, ▁did, ris, ▁ugly, ▁differ, ▁por 416 ▁except, ▁and, aria, ries, ▁Bel, ▁vs 444 ▁few, ▁Many, ▁Very, ▁unlikely, ▁fewer, ▁Most 527 ▁high, ▁college, ▁graduated, ▁school, ▁finish, ▁teachers 629 ▁grown, ▁presented, ▁without, ▁moved 666 ▁stayed, recogn, ▁consist, ▁same, ▁stay, ▁equally 667 A 703 ▁entrepr, XT 774 ▁diam, ▁ugly, ▁por, ▁vision, ▁artists, ▁news 820 ▁pushing, ▁hide, ▁hid, ▁lying, ▁telling, ▁inform 823 ads, won, urus, ▁boys, ▁Sib, ws 842 ▁particular, ▁Nothing, ▁nothing, ▁happens, ▁happen, ▁anything 863 ▁happy, ▁prosper, ▁hun, ▁experience, ▁stub, ▁will 867 ▁levels, ▁accum, ▁blocked, block, ulated, ▁waves 904 ▁expect, ancy, ▁extend, ▁growth, arter, ▁gain 941 <s>, <0x0A> 1114 ▁after, After, ▁August, atra, ▁War, ▁began 1146 ▁position, ▁rate, ▁link, ▁phrase, ▁sound, ▁purpose 1200 ▁its, ▁tries, fr, ▁Its, ▁Har, ▁national 1221 ▁In, ▁in 1354 ▁planet, ▁solar, ky, ▁Earth, ▁Sol, ▁System 1408 ▁name, ▁named, ▁called, ▁height, amed, ▁friend 1522 ▁shorter 1705 ▁Diet, ken, olate, father, orie, can 1728 ording, ▁marry, aking, ▁accept, hing, ▁hitting 1730 rial 1735 anned, wed, ▁cens, ▁still, ▁remain, ▁ban 1739 ▁figures 1787 <s>, Q, 7, 6, 8, ▁Why 1804 ▁Philadelphia, ▁Paris, HT, ▁ha, ▁Rome, ion 1834 ▁examples, ▁example, ▁some, ▁Notable, ▁characteristic, ▁cases 1940 ▁Theorem 1949 ▁no 2063 <s> 2100 pie, ▁single, enta, ▁orange, ▁minute, ruit 2128 ▁leave, ▁stick, ▁suspect, ▁draw, ▁sees, ▁disturb 2233 ▁among, ▁case, ▁if, ▁aid, ▁contribute, ails 2252 ▁speak, ▁wore, ▁am, ▁accept, ▁holding, ▁recommend 2268 ▁interesting, ▁Person, ▁Time, ▁Year 2443 ▁hid, ▁film, ▁Grand, Cast, ▁Cost, To 2455 ▁foot, ters, ▁horn, 
iums, ▁scales 2511 rac, hard, umann, ems, hner, fe 2527 ▁dogs, ▁positive, ▁verte, ▁prime, br, ▁Christians 2612 ▁Stars 2648 ▁It, ▁it, ▁They, ▁him, ▁dist, ▁they 2708 ▁Sm, ▁Video, ▁Crit, ▁Organ, ▁Disc, ▁Le 2792 ▁eyes, ight, ▁battery, ▁fingers, ▁damage, rain 2856 ining, ▁stops, ▁always, ▁forever, ▁never, ible 2976 ▁purchase, ▁obtain, ▁add 3020 ▁Why, ▁Who, ▁Where, ▁What, ▁Which, ▁How 3029 ices 3114 ▁used, ▁crashes, ▁spent 3227 avia, ▁Dutch, oa, ians, ▁Indians, ests 3258 ▁phen, FO, ormal, ▁ESP, ition, ▁medium 3324 ▁well, ▁add, uent, ▁numbers, ▁talk, ▁accomplished 3342 ▁Nobel, riz, ▁Prize, ure, ▁Theorem, ▁Olympics 3354 %., cer, ya, ., )., ▁determ 3490 5, ▁entrepr 3516 ▁player, ▁greatest, ▁basketball, ▁popular, ▁desert 3598 ll, ▁ticket, ▁would, ▁license, ▁need, ▁must 3599 ▁only, ▁located, ▁lets, ▁refuse, ▁contain, Only 3611 ▁fans, ▁Christians, ▁Only, ▁good, ies, ons 3671 ▁designed, ▁started, ▁Who, ▁founder, ▁invent, ▁first 3756 digit, ▁atomic, ▁double, ▁risk, ▁prime, ▁official 3823 ▁destroyed, ▁ax, pped, ▁cho, ▁lifted, ▁attacked 3843 ▁restart 4000 <s> 4027 ▁particular, ▁happens, ▁happen, ▁ways, ▁aspects, ▁injured 4061 <s> 4065 ▁best, ▁favorite, icious, imate, ite, ▁greatest 4087 ▁eight, ▁five, ▁thirty, ▁several, ▁seven, ▁three 4106 <s> 4309 XT 4426 ▁creation, ▁board, ▁campaign, den, ▁move 4452 ector, activity, ▁meters, ▁skills, can, una 4460 ▁as, ▁well 4478 can, ▁convention, cy, ests, las, ucha 4483 ▁smaller, ▁larger, ▁rich, ▁Rich, ▁poor, ▁pover 4573 enda 4576 ▁right, board, ▁Last, ▁girls, ▁Rem, ▁Er 4593 ▁(, A, ▁easiest, ▁tells, ▁personally, Q 4617 <s>, ▁proofs, ▁varied, ▁accessible, ene, ▁distinct 4671 ▁stayed, ▁keeps, ▁stay, ▁keep, ▁continue, ▁consist 4743 aked, ▁flat, olen, aged 4748 ▁seat, at, ▁back, ▁side, ▁lap, ▁bus 4766 isons, ▁Greece, uto, ▁contribute, ▁twenty, ▁sing 4908 ▁marry, ▁Your, ▁your, ▁my, ▁My, ▁their 4981 ▁Person, ▁magic, af, ▁Mal, imal, ▁Notre 4984 amp, ead, ylvan, itch, ires, ▁drag 5055 %., '., )., ., "., ." 
5057 <0x0A>, 3, 4, 5 5171 ▁illegal, ▁legal, ▁ban, ▁allowed, ▁prohib, law 5309 ▁in, ▁among, ▁In, ▁across, ▁during, wed 5310 ▁phrase, ▁term, ", word, OS, ingo 5330 ▁exact, ▁precise, ▁reliable 5413 ▁foot, s 5465 :, ▁Is, ▁Are, ▁Does, ▁Was, ▁How 5517 ▁Sydney, ka, ington, apolis, ▁Chicago 5557 ▁remain, ▁yours, ▁activities, ▁films, ▁subjects, ▁song 5565 ▁', ▁", ▁word, E, ▁phrase, but 5624 ▁planet, ▁systems, ▁potential, ▁unique, ▁similar, ▁phase 5639 ▁shares, ▁gets, ▁got, ▁smoke, umes, ▁produces 5687 ) 5704 question, ▁prompt, ▁fact, ▁question, ▁shared, ▁instruction 5852 ▁doubt, ▁seen, ▁told, ▁sure, ▁shown, ▁personally 5862 <s> 5890 och 5922 ▁a, ▁A, ▁an, An 5942 <s> 5968 ▁Why 6003 key, it, ▁rabb, ▁mouse, ▁husband, aker 6009 vis, ▁Steve, ary, ▁baby, ▁Boston, ▁Scottish 6019 ▁Pot, ▁Harry, ▁Row, iz, arts, w 6066 ▁learned 6147 ▁Uruguay, ▁Chile, ▁sib, ▁Luxemb, ▁Sib, ▁Pakistan 6215 ▁There, ▁Nothing, here, ▁no, ▁nothing, ▁Now 6319 inos, avia, ▁descent, enders, ▁third, ▁budget 6348 <s>, <0x0A> 6374 etes, ama, ▁cookies, rio, esa, ▁Light 6420 ▁The, ▁Break, ▁Si, ▁Sig 6660 :, ▁Despite 6661 ▁element, ▁animal, ▁desert, ▁factor, ▁university, ▁sport 6702 ▁the 6739 ▁U, ▁Des, ▁Cur, ▁Sy, ▁Diet, ▁fam 6744 ▁dawn, ey, working, ulf, XT, ▁saf 6778 ▁mount, ▁identify, ▁specific, ▁onto, ▁predict, ▁let 6997 ER, ▁Independ, ▁Little, ▁navig, ellow, anst 7031 ▁Chile, ▁Venezuela, ▁China, ourg, ▁Canad, ▁Switzerland 7082 ▁Some, ▁some, ▁sometimes, ometimes, ▁kinds, ▁Many 7094 <s> 7104 ▁Q 7130 <s>, round, ▁Goth, ctions, ▁attra, ▁architecture 7154 we 7216 ▁varied, rane 7224 ▁No, ▁non, ▁Every, ▁Near, ▁Non, ▁Last 7231 working, orf, round, itte, XT, ▁dawn 7257 ometimes, Only, ▁whole, ought, You, ▁peace 7271 ▁tin, il, ▁silver, ▁wooden, ▁hat, ▁fo 7297 ▁Mus, ▁Mun, ▁bos, ▁mus, ▁Lis, ▁Cur 7306 ▁United, ▁Republic, ▁Council, ▁Middle, ▁Great, Un 7372 ▁improve, ▁helps, ▁causes, ▁extend, ▁boost, ▁affect 7381 ▁can, ▁Can, ann, ▁canon, ▁cannot, ▁could 7415 ▁largest, ▁animal, ▁giant, ▁living, ark, ▁large 7441 ▁Star, ▁Little, le, 
▁Dragon, Tw, AS 7448 ) 7490 ▁knows, ▁know, ▁knew, ▁agree, ▁admit, ▁learned 7511 ▁tum, cin, ▁aut, ▁cancer, etes, ism 7602 ▁composer, ▁unknown, ▁specify, ▁unclear, ▁individual, ▁recorded 7624 ▁UK, ▁Florida, ▁Bible, ▁US, ▁estimated 7653 : 7673 ▁someone, etal, ▁baby, ▁determined, ▁determine, ▁sex 7716 2, 3 7787 ▁Sym, ▁Ult, ▁kin, ▁cart, ▁Linear, ▁Ge 7831 <s>, ▁entrepr, pr, rane, ▁Q, ord 7833 ▁Orange, father, acre, ust, ye, ▁List 7883 <s> 7907 <s> 7980 DP, ▁terms, ▁per, ▁median, ita, ▁income 8023 ▁extr, rial, ▁over, ▁origin, ▁root 8026 ▁Pennsylvania, ▁Carolina, ota, las, ▁Alabama, hner 8078 ▁am, ▁I, ▁My, m, I, ▁my 8091 ▁Three 8095 ▁, ▁$, 9, ▁last, /, ▁War 8117 ▁without 8135 ▁countries, ▁cities, ▁country, ▁nation, ▁county, ▁city 8144 4, ▁four 8181 ▁pos 8206 what, ▁what, ▁which, )?, ▁situations, ... 8221 ▁root, imate 8227 ▁reflection, ▁stick, ▁while, ically, ▁dropped, ▁inform 8287 ▁soon 8314 ▁You, You, ▁Your, ▁They, ▁you, ▁We 8367 ▁easiest, iest, ▁biggest, ▁largest, ▁favorite, ▁interesting 8376 ▁depends, ▁corner, ▁distinct, ▁Because 8481 ▁mention, ▁discuss, ▁use, ▁accept, ▁change, ▁hid 8484 ▁Q, ▁All, ▁Every, here, ▁Part, ▁Near 8515 ▁similar, ▁valuable, ied, ▁properties, ▁systems, ▁notable 8536 ▁November, ▁August, ▁July, ▁pm, /, ▁May 8547 ▁wall, ror, ▁mirror, ▁beautiful, ▁anymore, ▁Little 8635 ener, ▁grow, ▁back, ▁reg, ▁grows, ▁two 8675 XT 8737 pan, ▁Muslim, ▁Korean, ▁Asian, ▁Lat, ▁Chinese 8761 ▁smoke, ▁consume, umes, ▁drink, ▁shares, ▁work 8812 ▁list, ment, ▁Way, ies, ames, ancy 8842 ▁new 8877 ?, "?, )?, ?", ▁compared, ▁compare 8925 atic, edy, ▁reserved, ▁curious, ▁earnest, ▁friendly 8954 ack, ▁Ob, ardo, ▁Mitt, ▁president, ille 8960 ▁either, ▁could, ▁may, ▁fall, iety, ▁possibly 8978 <s> 9008 ▁United 9036 <s> 9069 ▁most, ▁else, ▁least, ▁highest, ▁priority, ▁Most 9270 XT 9288 ▁fact, ▁factor, ▁truth, ▁factors, ▁principle, ▁belief 9384 <s> 9447 aten, ▁treatment, ▁shows, ▁where, ▁contribute, ▁guarantee 9487 ▁twenty 9526 ▁nearly, ▁where 9535 ulf, ▁cultural, ▁divers, ▁looks, 
ouses, round 9546 <s>, 1, 2, ▁(, 3, 4 9566 aking, ▁rub, hing, ▁tie, ▁touch, ▁disturb 9592 ▁than, ▁near, qual, ▁require, ▁Among, aller 9648 ▁six, ▁days, ▁created, ▁gradually, ▁create, ▁Adam 9660 ▁passenger, ▁produces 9729 ▁The 9765 <0x0A>, ▁strik, ▁Chart, aret, ▁mic 9785 ▁location, ▁ambigu, ▁depends, ▁treated, ▁circumstances, ▁position 9796 ▁add, ▁extend, ▁shares, ▁numbers, ▁smoke, ▁modify 9814 ▁helps, ▁turns, ▁determine, ▁soon, ▁showed, ▁cle 9837 ▁years, ▁minute, ▁year, ▁ten, ▁enough, pm 9866 ▁Council 9888 ▁then, ▁welcome, ▁nothing, ▁knock, ▁will, ▁instantly 9909 ▁hard, ▁worker, ▁harder, ▁effort, ▁efforts, ▁lazy 9973 ▁outside, ors, ▁weather, ▁out, ▁paths, ▁selected 9981 ▁Why 10045 <s> 10066 ▁England, ▁Great, ▁EU, ▁English, ▁Italian, ▁Britain 10136 ▁and, ▁or, ▁while 10138 ▁shown, ▁demonstrated, ▁proven, ▁accepted, ▁confirmed, ▁displayed 10175 ▁visited, ▁set 10183 ▁mother, ▁cord, ▁them, ▁they 10207 inking, ▁moder, ▁quantities, ▁too, ▁dos, ▁consumption 10213 ▁audience, ▁causes, ▁cause, ▁ru, ▁creates, ▁play 10253 ling, opy, ten, ▁Bow, iele, ool 10331 ▁asc, ▁commission, gu, ▁struct, fl, ▁transport 10348 S, ▁US, ▁USA, .,, ▁States, ▁American 10458 ights 10512 ▁visible, ▁jump, ▁sink, ▁lifted, ▁painted, iled 10519 ▁biggest, ▁highest, ▁largest, ▁smallest, ties, ▁city 10523 Q, ▁question, ▁questions, q 10593 ▁Albums, ▁Records, ▁records, ▁Earth, ▁Songs, ▁albums 10616 ▁dollars, ▁much, qual, ▁year, ▁average 10639 7, 8, 9, ▁seven, ▁Seven 10656 ▁located, ▁host, ▁contain, ▁selected, ▁love, ▁spent 10710 ▁increased, ▁decl, ▁harder, ▁expensive, ▁stayed, ▁less 10738 ▁video, ▁record, ures, ▁Video, ▁end, ▁substitute 10796 ▁Sydney, ▁Dublin, ▁Chicago, ▁Toronto, ington, ways 10798 2 10866 gate, win, so, XT, uru, ▁Columb 11069 <s>, ▁Dom, ▁dawn, ▁fran, board, fe 11083 <s>, ray, <0x0A>, ▁Found, eu, clam 11120 <s>, qual, all, erves, ▁players, wed 11218 ▁score, ▁plants, ▁incident, pper, ▁success, market 11229 ▁tower, ▁diverse, ▁vast, enth, ▁varied, XT 11251 ▁Burn, ▁burning, ▁burn, une, ▁fortune, 
ec 11269 aw, ains, work, uda, ▁Mass, mouth 11287 Real, XT 11297 ames, ▁sometimes, ▁great, ment, ▁top, ▁lets 11302 ▁player, ▁president 11354 ▁entrepr, pr, rane, ord, ▁able 11411 <s> 11549 ▁round, ▁flat, ▁shape, ▁particle, ▁float, ▁forward 11560 fo, ef, ▁tea, nab, ▁lung, ung 11584 ▁leader, ▁released, ▁plays, ▁singer, ▁monarch, ▁achieved 11662 ▁else, ▁anywhere, ▁other, ▁source, ▁places, ▁countries 11704 <s> 11827 :, ▁How, ▁Pay, ▁What, ▁Rel, Q 11841 <s> 11856 ), :, 3, 4, ▁Yes, ▁No 11941 ▁stand, ▁stood, ▁stands, ▁refers, ▁refer, ▁mean 11943 <s> 11947 ese, MI, Is, ▁pover, ▁ob, ▁inequality 11970 ▁gives, ▁Their, ▁wore, ▁provides, ▁stood, ▁should 12038 ▁wall, eth, ▁finger, ror 12046 ▁yours 12097 ▁produces, ▁led, ▁directed, ▁wrote, ▁gets, ▁makes 12156 ▁suffer, ▁suff, ▁damage, ▁experience, ▁receive, ode 12199 ▁measure, ▁players, ▁cars, ▁oil, ▁results, ades 12215 ▁shared, )?, ▁composition, ?, ▁characteristic, ▁song 12357 burg, ija, ellers, ▁Garden, alem, named 12434 ▁Type, ▁Pow, ▁Bl, ▁Altern, ▁Crit, ▁Sm 12453 ▁further, ▁feet, ▁closer, ▁or 12490 ▁twenty, ▁next, ▁tries, night, ▁years, ▁threatened 12496 ▁sink, rown, ode, ▁shoot, ▁kick, ▁lifted 12599 ▁required, ▁always, ▁typically, ▁enjoy 12812 ▁turns, ▁turned, ▁into, ▁new, ▁generate, ▁teach 12814 ▁Egypt, ▁Austria, plane, ▁River, ▁Africa, ears 12931 ▁hour, ▁minutes, ▁wait, ▁Wait, ▁before, ▁weeks 13058 <0x0A> 13133 ▁among, ▁since, ▁twenty, ▁terms, ▁decl, ▁today 13152 ▁no 13201 ▁hitting, ank, itting, ▁child, ▁hit, ▁domestic 13221 ▁produces, ▁stands, ▁Science, ▁stood, ▁gets, ways 13327 ▁Q 13352 ating, iders 13360 ▁Sig, ▁Claud 13366 ▁aren, ▁doesn, ▁hasn, ▁isn, of, ▁strik 13371 ▁relative, ▁forb, ▁subjects, ▁equipment, ▁unusual, ▁brand 13412 <s> 13414 ▁cookie, ▁lamp, ▁television, ▁foot, ▁hat, ▁score 13447 ros, ▁eu, ▁Eu, cs, ▁kr, ▁fran 13462 ▁rice, ave, omy, ▁passenger, ▁VIII, imming 13463 ey, ▁Q, LS 13567 erson, enberg, we 13644 ▁entrepr, pr, rane, ord 13682 ▁optimization, ey, <s>, ue 13701 %. 
13710 ▁bars, ▁hit, ▁partner, ▁gun, ▁defense, ▁purposes 13745 ▁further, ▁feet, ▁or 13767 ▁vo, ▁kar, ▁contract, ▁por, ▁ing, ▁ant 13776 ▁full, ▁perfect, ▁absolute, ▁perfectly, ature, oked 13779 <s> 13814 ll, ▁will, ▁would, ▁Will, ▁notice, ▁instantly 13847 ▁Montreal, ▁Amsterdam, ▁Seattle, ▁Boston, ▁Philadelphia, ▁Virginia 13867 ▁Science, ▁scientific, ally, ▁Scient, ▁scient, ▁experiments 13920 question, ▁word, ▁words, ▁once, hand, ▁individual 14013 ▁Books, ▁records, ▁books, ▁Albums, ▁Records, ▁films 14154 place, ▁afternoon, ▁evening, ▁corner, ▁outside, ▁lit 14165 ▁shouldn, ▁acknow, ▁mod 14216 ▁today 14222 ▁No 14307 alt, ril, ina, icole, ardo, ifer 14393 ries, ▁Books, ▁People, ▁places, ▁group, ips 14447 ▁pushing, anim 14462 ▁Fl, ▁AT, ▁Sil, ▁ver, ▁Th, ▁bill 14622 ▁Yes, ▁No, ▁Nothing, ▁depends, here, , 14646 .", '., ▁purposes, ests, cy, ways 14660 ellow 14684 ▁circle 14703 ads, ise, ises, urus, arks, igs 14708 ▁Theorem, laration, ws, ▁Independ, clam, amental 14737 ▁bos, ▁grasp, ▁overcome, ▁purpose, ▁am, ▁move 14767 ▁cig, ar, igare, ▁anymore, ▁watched, ▁Kansas 14775 1 14807 <s> 14823 ▁returns, %, ▁mile, ▁year, ▁every, ▁scores 14854 ▁Q, <0x0A>, 2, 3, )., ) 14932 ▁Joe, ▁Benjamin, ▁Adolf, ▁Christopher, ▁Larry, ▁Michael 14974 ▁ideas, iration, ▁insp, ative, ision, ▁cre 15063 ▁Nick, ▁Pay, ▁Ul, ▁Son, ▁Non, ▁reads 15067 ▁winter, ▁summer, ▁February, ▁Sunday, ▁afternoon, ▁villa 15080 ▁modern, ▁buildings, ▁dawn 15231 ▁Montreal, ▁Indians, ▁Amsterdam, ▁har, icans, ▁Rus 15237 inf, ▁rain, ining, ▁snow, all, so 15318 ." 
15350 ▁USA, ▁Video, ▁Records, ▁Sm, ▁Crit, ector 15438 ▁purpose, vention, ▁invent, ▁origin, ▁evol, ▁precise 15467 ▁Hill, ▁El, ▁Bern, ▁Fund, amental, ▁Jenn 15506 iju, nab, itution, ▁Dru, ▁burning, rooms 15518 och, ky, ▁Notre, ▁Lanc, ess 15552 ▁Only 15563 ▁entrepr, pr, rane, ord, ▁able, orf 15591 ▁been, ▁turns, ▁helps, ▁unsafe, ▁had, ▁spent 15635 ▁Mount, ▁Saint 15636 ▁examples, ▁example, ▁characteristic, ▁por, ▁Are, ▁some 15707 ▁gets, ▁produces, umes, ▁consume, odia, ▁slightly 15721 ways, xygen, ▁Bush, ▁Columbia, ▁carbon, ▁Jordan 15751 (, <0x0A>, Q, 5, 6, 4 15763 ▁won, ▁win, aten 15835 <s> 15849 ▁rest, ▁criminal, ▁face, ▁trial, ▁tries, ▁Little 15864 ▁weeks, ▁across, ▁miles, ▁months, ▁million, ▁drive 15872 ▁values, ▁prices, ▁rates, ▁costs, comes, ▁price 15987 ▁involve 15999 XT, ▁dawn, LS 16028 ▁university, ▁city, ▁mode, ▁island, ▁Saint 16029 ▁average, verage, ▁median, ▁approximately, ▁typically, ▁estimated 16030 ▁event, ▁activities, ▁subjects, icas, ▁trait, ▁date 16041 ▁Afr, ▁Indians, ▁Spanish, ▁Japanese, ables, ▁Portuguese 16101 ▁organized, ▁shed, ▁playing, cial, ▁passenger, ographic 16110 ▁transform, ▁knock, ▁invoke, ▁fall, ▁join, ▁lifted 16129 A 16138 <s> 16219 ▁grown, ▁necessarily, ▁food, ▁bread, ▁consumption, ier 16413 ▁plants, ables, ▁Asia, ▁veget, pes, ▁science 16574 ▁finish, ▁graduated, unk, ▁college, ▁school, ▁high 16578 ▁lines, ▁position, ▁positions 16617 lam, po, augh, inden, enberg, ait 16637 ▁letters, ym, ▁word, ▁letter, ▁abbre, ▁phrase 16649 ▁exact, ▁composition, ▁song, our 16659 ▁located, ▁selected, ▁further, ▁host, ▁official, ▁contain 16786 ▁letter, ▁named, ▁phrase, ▁word, ▁type, ▁color 16798 ici, icy, ili, pper, eds, ▁pe 16806 ▁dess 16811 ▁entrepr 16828 ▁further, ▁closer, ▁feet 16837 ▁fortune, une, oo, gly, ▁Iron, iger 16884 FI, ▁Time, ▁Scient 16907 ▁pover, MI, ▁rib, ▁hours, ▁income, DP 16973 ▁studied, ▁study, ▁imagine, ▁prep, ▁hard, cis 17013 ▁(, A, :, 5, 4, 6 17061 ▁than, ▁Mount, iders, ▁Sam, ▁Bel, ▁San 17141 ▁independent, ization, ▁joined, ▁conquer, 
▁colonial, ony 17224 ▁(, (, 1 17232 ▁entrepr, pr, rane, ord, ▁able 17397 ▁particular, ▁individual, ▁specify, ▁correlation, ▁normally, ▁necessarily 17466 <s>, uda, ellow, ky, /, fr 17481 ▁Nothing, ▁nothing, ▁everything, ▁soon 17621 ▁build, force, ▁en, ▁via, ▁perform, lict 17683 ices, ice 17739 ▁if, ▁unless, ▁when, ▁because, iting, ▁case 17818 ▁among, ▁involve 17841 ▁because, ▁Because, ▁then, although 17850 ▁have, ▁Have, ▁I 17865 <s> 17916 las, enz, Out, enberg, lain, umann 17944 ▁resources, alem, LS, ▁varied, ouses, stal 17968 arter, ▁smart, ▁minds, ▁performance, ent, ▁intellig 18023 ▁Austria, ▁Wales, sh, ▁differently, ▁they, ▁pay 18108 erves, all, ▁well, fall, ▁level, ▁big 18155 <s>, ulf, ▁Q, agger, itan, ▁oldest 18162 ▁more, ▁fewer, ▁less, ▁lower, ▁bigger, ▁greater 18199 ym, National, All, ▁abbre, Sh, For 18207 inc 18208 ▁If 18226 ▁try, ▁news, ▁currently, inos, ▁similarly, ▁fans 18426 unk, ▁terrible, ▁student, ▁teachers, ▁physics, ▁graduated 18497 ▁lie, ▁lies, ▁lying, ▁false, ▁li, ▁statements 18530 ▁VIII, omy, ▁passenger 18844 ▁Light, rio, ▁Cruz, ▁Egypt, ▁aircraft, feld 18849 ▁What, ▁How, ▁Who, ▁Which, ▁Where, ▁Why 18873 ▁entrepr, ▁Q, pr, rane, ord, lang 18903 ys, ▁bu, age, ▁loan, ▁marks, ▁purchase 18967 ▁top, ▁recent 19018 <s>, ▁remains, ▁still, ames, ▁Diet, ▁totally 19039 ▁definite 19147 ▁How, ▁Where, ▁What, ▁Who, ▁how, ▁Which 19234 ▁position, ▁lines, ▁pow, ▁positions, der, ▁liquid 19326 ▁F, ▁C, ▁Bal, ▁RA, ▁Bor, ▁L 19352 ▁USA, ays, ▁Scottish, ▁approximately, ▁Fund, ley 19394 ▁Chart, ▁Bl, ▁rein, ▁Deep, ▁Rain, ▁wat 19457 ▁tells, ▁keeps 19564 ▁You, ▁I, You, ▁It, ▁We, ▁My 19588 <s>, ouses, ▁overcome, cket, ▁comedy, ▁fame 19600 ▁Brit, ▁USA, ▁Americans, ▁Men, ▁Napoleon, ▁Rich 19604 <s> 19620 ▁equally, ▁similar, ▁as, ▁well, ▁same, ▁similarly 19672 enda, ▁Order, ma, ▁yards, ters, ingo 19717 imal, ay, an, angol, ino, cel 19741 ▁split, ▁handles, ▁shape, ▁officers, ▁answers, vention 19799 very, ▁restored, uses, icted, inction, itable 19838 laimed, ▁ranking, ▁hub, ▁capital, 
▁attra, ▁facilities 19839 ▁Catholic, ▁doctrine 19947 here, ▁Nothing, ▁nothing, ▁Many, ▁Albums, ▁few 19953 icks, ▁stuck, ically, ▁stick, lla, raw 19966 isons 20119 ort, ▁Bl, ▁Sib, ▁ap, ▁Ste, ▁Brun 20147 ▁depends 20184 ▁purpose, ▁easiest, ▁useful, ▁risk 20293 ▁gen, ▁shares, ▁share, ▁percentage, ells, ▁neur 20407 ▁third, %, ▁significantly, ▁proportion, ▁percent, ▁budget 20427 ▁Tal, ib, ovi, ▁Afghan, ▁Pers, ▁Confeder 20441 ▁released, ▁achieved, ▁leader, ▁studied, ▁gained, ▁later 20646 ▁further, ▁feet, ▁closer, ▁close, ▁closest, ▁miles 20659 ear, aring, foot, ▁wear, othing, ▁wrap 20673 ▁Prize, riz, laimed, ▁star, ▁attempt, ▁professional 20681 ▁said, ▁Great, ▁started, ▁wrote, ▁part, ▁behind 20687 ▁winter, ▁February, ▁Sund, ays, ▁breakfast, ▁cold 20698 ▁Kingdom, K, ▁UK 20784 <s> 20800 enn, pan, ni, ▁Muslim, olog, ▁Lat 20945 ▁entrepr, pr, rane 21027 ▁construction, ▁development, vention, ▁existence, ▁founder, ▁approach 21034 <s> 21231 ties, ▁guns, ates, ▁players, ▁scores, aches 21236 ▁scheme, agers, ▁then, ▁work, ▁working, ▁running 21249 ▁new, ▁some 21250 orney, ▁lawyer, estic, uses, iot, ▁caught 21328 ▁right, ▁while, ▁non, board, ▁style, ▁Rem 21463 ▁calling, ▁asking, ▁searching, ▁testing, ▁hot, ▁contact 21518 ▁The, ▁Their, ▁Your, ▁These, ▁Our, ▁My 21577 ▁been, ▁seen, ▁sometimes, ▁shown, ▁now, ▁Have 21682 ▁nin, ▁song, ▁vo, ▁ing 21683 ax, ional, aged, usion, ▁fict, ▁plot 21728 <s> 21851 ▁world, ties, ▁universe, ▁Way, ▁Asia, ▁sky 21856 XT, ▁dawn 21867 ▁characteristic, ▁activities, ▁yours 21947 etal, ▁minute, ▁below, ▁heart, ▁vary, ▁rate 21963 3, 2, 1, (, 4, ▁produces 21979 ▁Hun, ▁Per, ▁Tr, ▁Ro, ▁Ban, ▁Ts 21987 ▁poor, ▁separately, ▁Rich, ▁rich, ▁husband, ▁differently 22052 ▁Sydney, ▁Columbia, eton, ▁Tr, ▁Manchester, ▁Pr 22055 ) 22060 arts, ▁this, ▁This, ▁students, ▁These, ule 22070 <s>, ▁Q 22073 ?, )?, "?, ?", ▁compared, ▁differ 22167 ▁entrepr 22201 ▁sales, ▁trick 22216 ▁happens, ▁happened, ▁happen, ▁action, ▁occurs, ▁occur 22242 ▁across, ▁disturb, ▁stick, ▁draw, ▁containing 
22311 ▁dro, ▁ver, ▁pro, ▁bill 22353 ancy, ▁extend 22357 ▁ways, ▁suffer, ▁negative, ▁overcome, acles, ▁suffering 22367 ▁struck 22388 ▁include, ▁although, ▁- 22444 ▁thirty, ▁square, ▁twenty, ▁nine, ▁ten, ▁seven 22457 ▁consume, umes, ▁produces, rank, ▁designed, ▁eat 22471 ▁figures, ▁element, ▁player 22624 ▁leader, ▁released, ▁tou, who, ▁achieved, ▁Name 22722 ▁varied, ▁tower, ▁diverse, ied, ▁stor, ▁cultural 22730 ▁Mex, ▁Mexican, ▁Mexico, ▁Afr, ▁Puerto, ▁Afghan 22822 ▁million, ▁evening, ▁AM, ▁billion, time, ▁inches 22951 ▁exhib, Os, ▁leadership, going, ▁leaders, like 22967 ▁Fre, ▁Mel, ▁Or, ▁Ald, ▁Abd, ▁Hay 23119 ▁sand, ▁Christmas, ▁Kansas, ▁Santa, den, pop 23123 ▁There, ▁there, ▁reliable, ▁currently, ▁various, ▁strong 23143 2, 3, 2004 23256 date, <0x0A>, number, )., ., ▁greater 23307 ▁incident, ▁Way 23440 nake, ras, urn, ▁Year, ▁Lib, ▁Sat 23604 <s>, <0x0A> 23609 ▁happens, ▁happen, ▁contribute, ▁ban, ▁factors, aten 23629 ears, igs, ogs, rows, ▁Fox, xes 23630 A, 5 23661 ▁( 23725 ▁luck, ucky, ▁sorrow, ▁prosper, ▁visitors, ▁welcome 23765 ▁mod 23790 ▁extr, tras, ▁verte, ▁prec, ▁extend, ▁vide 23807 ey, igger 23850 uten 23933 ▁Why, ▁Way, ▁What, ▁Who, ▁How, ▁Which 24077 ▁Time, ▁selected, ▁Person, ▁list, FI, icious 24086 ▁occurs 24141 ▁humans, ▁human, ▁male, human, ▁mascul, ▁professional 24204 anned, wed 24283 ▁feet 24326 ▁agree, ▁definite 24329 ▁exc, ▁talk, ▁cle, ▁rule, ▁determine, ▁check 24355 ▁cry, ▁sad, ▁died, ▁sorrow, ▁die, ▁laugh 24367 <s> 24458 <s>, pl, ▁Mars, ▁Circ, cc, mar 24460 alt, ▁Black, ▁Deep, ack, icole, ▁Rain 24471 ▁varied, enth, ▁valuable, ▁TV 24522 ▁letter, C, ▁contain, ▁', ▁letters 24664 ▁low, ▁left, ▁Bl, ▁port, ▁Low, ▁boys 24719 ▁Only, Only, ▁required, ▁only, ▁allowed, ▁need 24746 ▁showing, ▁shows, ▁That, ▁suggests, ▁that, ▁showed 24909 ▁Name, ▁title, ▁name, ▁Last, ▁named, ▁called 24926 <s> 24969 ▁orange, enta, ▁blue, ▁red, ▁yellow, ▁Black 24994 ▁Nobel, ▁Prize, ▁Nations, ▁won, ▁EU, amental 24996 A, af, ▁Am, imal, 3, ▁mod 25107 ▁Americans, avia, ▁descent, 
ians, ▁USA, ▁Dutch 25132 ▁vs, ▁differently, ▁compared, ▁greater, all, ▁per 25134 ▁I, ▁My, I, ▁personally 25204 ▁containing, ▁playing, oked, ▁your, ▁electric, ▁' 25267 ▁requires, ▁variable, ▁attributed 25336 ▁north, pie, ▁Building, pm 25346 ▁Rich, olog, cover, ▁Brit, pan, ▁Fire 25494 ., .", ible, ▁device, %., ! 25582 ▁All, ▁everyone, anim, ▁Every, ▁all, ▁always 25613 ▁If 25640 ▁contain, ▁gone 25676 ▁called, ▁stood, ▁stands, elled, ▁connected, ▁comes 25793 .", '., "., apy, ▁device, ada 25838 ▁University, ▁Airlines, ▁Burg, rand, ▁City, ▁university 25896 ▁kept, edy, ▁trick, sters, ▁confident, ▁gre 25969 ishes, laimed, ▁accomplished, fs, airs, ▁cook 26119 ▁entrepr 26164 ▁letter, ▁located, ▁mile, digit, ▁double, ▁host 26235 ▁long, ▁length 26266 ▁USA, ▁Kingdom, ▁Pennsylvania, ▁States, ▁Poland, ▁Israel 26339 ▁only, Only, ▁Only 26351 ▁does, ▁do, ▁Do, ▁Does, ▁did, ▁Dor 26400 Q, A, :, ▁doesn, ▁Q, ▁hasn 26424 po, pshire, ▁Jose, aven, ▁Luis, yth 26429 co, ina, ril, ifer, ardo, icole 26544 fr, ming, ▁war, ▁climate, ▁global, imal 26571 ▁time, ▁among, ▁histor, ▁ancient, ges, ▁gradually 26578 <s> 26635 ▁you, ▁stick, ▁leave, ▁walk, ▁put, ▁draw 26649 ▁There, ▁I, ▁It, ▁She, ▁We, ▁They 26706 ▁creates, ▁will, ▁experience, ▁receive, ▁welcome, ▁determine 26746 ▁rice, ▁fan, allow, lla, ▁cookie, ave 26848 ▁town, ▁road, ▁miles, ▁country, ▁club, ▁side 26862 ▁ambigu, ", ▁stands, ous 26922 round, ▁Democratic, ▁historic, ▁educational, ▁solo, ouses 26998 <s>, <0x0A> 27095 ▁husband 27187 ▁Science 27218 ▁lots, ▁slightly, apes, ▁spect, ▁daily, ▁historic 27234 qual, ancy, ifies, ays, erves, all 27240 ▁real, ▁exist, Real, ▁happening, ▁Real, ▁true 27245 ▁What, ▁How, ▁Which, ▁Why, ▁what 27267 <s> 27386 key, ino, it, ▁rabb, aro, el 27415 If, ▁If, ▁if, ▁unless 27485 1 27558 ▁non 27616 ce, ▁Joy, ss, ▁Allen, ▁Or, ▁Hem 27628 able 27644 ▁Asian, ni, ▁Austral, ▁Kore, ▁Asia, ▁Austria 27676 ▁loan 27710 ▁Any, ▁inf, ▁refer, ▁uses, ▁individuals, Re 27755 ▁viewed, ▁feet, ▁threatened, ▁stop, ▁across, ▁disturb 27802 
▁you, ▁they, ▁viewed, ▁should, ▁go, ▁she 27864 ym, ▁correl, ▁related, ▁establish, ▁vary, ▁ac 27888 action, ▁comment 27944 ▁because, ▁powerful, vement 27956 inton, ley, k, ner, ▁Trump, ▁Pres 27968 elling 27992 izers, ▁strong, orous, ▁leading, ▁substitute, ▁or 28012 <s>, num, ▁entrepr, ▁Belg, ouses, cc 28106 ▁alternative 28217 ▁Where 28239 <s> 28242 LS 28248 <0x0A> 28435 ▁gained, ▁get, ▁got, ▁gets, ▁won, ▁getting 28441 <0x0A> 28467 ▁Fire, ▁Iron, ▁Black, ▁Rich, oo, ▁Rock 28497 ▁Zealand 28501 ▁since, ▁moment, ▁after, ▁past, ▁November, ▁recent 28518 <0x0A> 28524 free, organ, Out, ic, unct, iki 28546 ▁eight, ▁five 28616 ▁afford, ability, able, ▁expensive, roll, ys 28648 ating 28672 olen, ference, ud, ged, ▁fra, ▁rig 28678 ▁Dom, opy, stal, ey, sf, esp 28701 ▁Yes, ▁No 28758 <s> 28790 ▁entrepr, pr, rane 28802 <s> 28876 ▁strik, aret, ▁lov, ▁prem, ▁cas, ▁amb 28953 C, digit, ▁risk, ▁letter, ▁dollars, ▁bias 28991 ▁NY, ▁York, ▁Los, ▁Angeles, Los, New 29067 ▁extend, ancy, ifies 29168 ▁entrepr, rane, pr 29218 ▁due, ▁Because, ▁because, ▁refers, ▁composed, ▁comm 29256 ▁lamp, ▁bed, ▁bird, pie, ▁Building, ▁north 29283 ▁add 29307 anim, ▁All, ▁perfectly, ▁nearly, ▁all, ▁guarantee 29362 ▁than, ▁among, ▁little, ▁since, ▁else 29389 ▁All, ▁third, ▁Most, ▁some, ▁all, ▁majority 29446 A 29510 ▁rising, ▁rise, ▁value, ▁up, ▁ranking, ▁stock 29542 ices, ▁goods, comes 29573 ▁requirements, ▁correlation, ▁specify, ▁no, ▁capable, ▁not 29656 ! 
29683 <s>, <0x0A> 29702 ▁, ▁$ 29715 ag, ▁Patri, ▁So, elt, &, ige 29742 ▁Who, ▁Where, ▁date, ▁handles, ▁age, ▁event 29754 ▁among, ▁target, ▁avoided, ▁For, ▁against, ▁specific 29767 ▁Are, ▁Can, ▁Was, ▁Is, ▁Does, ▁Did 29803 ▁independent, ▁efforts, inct, ▁weak, issues, ▁clouds 29924 <s> 29936 <s>, ▁Zealand, ▁Netherlands, ▁Florida, ▁Singapore, ▁Australia 30006 bra, ined, ▁domin, blo, ▁brain, ▁bra 30022 odia, ▁Bulg, ▁Hong, ▁Poland, ▁Camb, ait 30123 ial, ▁election, ▁pres, ▁president 30153 ,, ▁- 30193 ▁cycles, ▁experience, ▁receive, ▁revert, ▁sync, stru 30231 ) 30382 ▁next, ▁future, ▁last, ▁previous, ▁Future, ▁current 30386 <s> 30462 ▁Baby, ▁Sib, ▁Bl, ▁Organ, ▁Newton, ▁Crit 30516 ▁n, ▁sk, ▁l, ▁po, ▁s, ▁y 30523 ▁no 30579 ▁list, fr, ▁while, ▁Orange, LS, ▁who 30599 ▁official, ▁letter, ▁contain, ▁host, ▁located, ▁navig 30608 ining 30799 XT 30927 ▁precise, ▁Because, ▁Five, ▁For, ▁These, ▁Far 31031 ▁legal, ▁illegal, riminal, wed, anned, ▁allowed 31120 5, ▁Five, ▁five, ▁percentage, ▁min 31199 <s> 31253 ▁big, ▁sometimes, ▁double, ▁mod, anim, ▁personally 31254 ▁compared, ▁vs 31263 ▁factors, ▁greater, date, ▁substitute, ▁marks, ▁variable 31291 aking, <s>, ▁mention, ▁imagine, ▁buy, gly 31337 ▁navig, able, itable, hab, ▁tender, ▁Har 31381 <s>, A, <0x0A>, '., (, . 
31441 ▁today, ▁twenty, ▁now, ▁Now, ▁here, ▁thirty 31585 although, reason, ▁unless, ▁provided, ▁although, ▁except 31762 ▁star, ▁Story, ▁face, ▁Ryan, ▁Ra, Fri 31817 ▁last, ▁value, ▁returns, ▁gone, ▁years, ▁every 31862 ▁teach, ▁charge, ▁draw, ▁clean, ▁tie, ▁count 31892 ▁she, bian, ▁recently, ▁means, ▁Because, ▁experienced 31910 <s>, <0x0A>, ?", ▁prominent, ▁linked, ied 31948 ▁mirror, raw, ▁backwards, ▁arms, um, lla 31981 ▁No, ▁Yes 31984 ▁Its, ▁Her, ▁His, ▁Has, ries, ▁She 32001 <s> 32050 ▁Which, ▁which, ▁This, ▁each, ▁various, ▁specific 32128 XT 32145 ▁lives, ▁Drive, ▁beneath, ▁live, ▁Street, ▁Baker 32223 ▁said, ▁", ▁reads, ▁saying, ▁That, ▁says 32330 ▁The 32436 ▁entirely, ▁equally, ▁kinds, idents, ▁only, ▁all 32448 ▁tie, ▁rub, ▁sees, aking, ▁wear, ▁touch 32458 ▁sure, ▁doubt, ▁shared, ▁conclude, ▁established, ▁differ 32487 <s>, '., <0x0A>, %., ▁produces, ▁occurs 32531 ▁should, ▁shouldn, ▁need, ▁seek, ▁Should, ▁required 32548 ▁incident, ties, market, ▁Egypt, apping, pper 32549 hing, ▁entirely, ▁simply, acks, ▁capable, ▁spiritual 32551 <s> 32590 ▁police, ▁cop, ops, oss, utor, actor 32678 ▁Chicago, ▁Houston, ▁Pennsylvania, ▁Toronto, ▁Miami, ▁Jersey 32835 ▁face, cy, ▁file, ▁trial, ▁criminal, ▁charges 32852 <0x0A>, 2, 3, ▁How, ▁What, 4 32990 ▁Amsterdam, ▁Philadelphia, ▁Paris, ▁York, ▁har, ettes 33047 <s> 33058 ▁onto 33112 <s>, <0x0A> 33118 ▁American, ▁basketball, ▁European, ▁Jewish, ▁living, ▁Federal 33179 ▁formed, ▁moved 33191 ▁not, t, ▁cannot, not, ▁never, like 33228 ▁fam 33238 ▁increased, ▁gone, ▁rise, ▁stayed, ▁decrease, ▁rising 33252 ▁Mil, pr, ▁exer, ▁came, ord, ▁Tw 33291 ▁who, ▁where, who, ▁that, ▁containing 33316 %., '. 
33332 ▁turns, ▁helps 33361 ▁still, ▁remains, ▁currently, ▁now, ▁originally, ▁current 33385 ▁helps, ▁unsafe, ▁occurs, ▁decrease, ▁turns, ▁moves 33417 rest, ien, craft, iens, FO, cer 33490 ▁disag, ree, ▁win, ▁variable, ▁depends, ▁fict 33524 ▁suic, ▁streets, ▁ran, icked, ▁nest, ▁jump 33607 acc, ▁divor, feed, rupt, ▁abort, MR 33642 ▁again, ▁expecting, ▁results, ▁thing, ▁doing, ▁fear 33658 ▁restored, zy, cer, ▁ticket, ▁Jack, ppets 33727 erson, iro, ▁Franklin, ela, umann, mart 33832 ▁owner 33876 eton, ale, ▁Columbia, ▁Harvard, keley, inc 33902 ▁than, date, number, aten, ▁beat, ▁vs 33988 ▁vs, ▁compared, ▁than 33996 ▁declared, ▁king, ▁rule, ▁kingdom, ▁considered, ▁prince 34063 4, here, ▁Blood, ▁Wait, ▁Blo, ▁No 34075 ▁playing, ▁involve, ▁shed, ▁per, ▁total, ▁tries 34111 <s>, <0x0A> 34136 eth, orney, ▁lawyer, ▁television, ▁mention, ▁cookie 34159 aller 34190 , 34245 ▁Are, ▁Have, ▁Was, ▁Does, ▁Did, ▁Do 34267 <s>, <0x0A> 34355 ▁formed, ▁moved 34488 ▁cousin, ▁relative, ▁sib, ▁marriage, ▁grand, lings 34504 licate, ▁establish, ▁experiments, ▁method, ▁rep, ▁showed 34550 ▁passenger, imming, omy, ▁metric, ades, ime 34565 ( 34600 ▁Part, ▁wave, ▁stretch, ▁idea, ▁den, ▁part 34605 aten, ▁fatal 34656 ▁easiest, ▁smallest, ▁closest, iest, ▁favorite, est 34776 ▁by, ▁By, ▁via, by, ▁using 34847 ) 34885 ▁Poland, ▁Pennsylvania, ▁Israel, ▁Brazil, ▁Oregon, ▁Jersey 34891 1 34931 ▁are, ▁were, ▁was, ▁is, ▁am, ▁entirely 34955 ▁If, If, ▁if, ▁Because, ▁When, ▁unless 34961 <s> 34972 keys, ige, ▁Tennis, ▁Butler, ▁Arizona, ▁carbon 35030 ests, ka, ole, au, oa, apolis 35073 ices, adors, aking, utor, oth, ▁pos 35170 <s> 35190 FI, ister, ▁Time, amental, dy, af 35197 <s>, <0x0A>, ▁reserved, ▁marriage, HD, ▁representative 35222 rate 35270 ▁drive, ▁driving, ▁vote, ▁purchase, UI, ▁marry 35282 iger, ellow 35319 ▁mind, ▁thinking, ▁composed, ▁changed, ▁learned, ▁ideas 35355 ▁Cl, inton, ▁Pitt, ▁Moore, ▁Campbell, immer 35512 ▁between, ▁distinction, ▁mixed, ▁Among, ▁behind, ▁change 35637 inos 35643 ▁reflection 35697 <s> 
35726 ▁cool, der, ▁shorter, ▁mil, ▁smaller, mer 35727 orney, anned, wed, ▁lawyer, ▁illegal, ▁allowed 35813 ▁Q, ▁Lanc, ▁Spart, ▁nearly, ▁sand, ▁Fre 35815 ▁believe, ▁seen, ▁learned, ▁knows, ▁admit, ▁suspect 35823 ▁November, night, ▁Sund, /, ellow, oles 35880 ., %., .", )., "., . 35896 here, ▁Nothing, ▁nothing, ▁no, ▁anything, ▁comment 35903 ▁evolution, vement, aked, ▁Order, ▁God, ▁controlled 36029 ▁of, ▁processes, ▁costs, ▁Of 36054 ▁anyone, ▁individuals, ▁owner, ▁carry, ▁tries, ▁holder 36078 <s> 36114 ▁Greek 36119 ▁involve, ▁becoming, ▁being, ▁be, ▁identify, ▁represent 36240 XT 36340 All, ▁That 36360 ▁over, ▁Over, ▁since, ▁stayed, ▁among 36361 1 36450 ▁days, ▁hours, ▁week, ▁longer 36498 <s> 36564 some, ively, elling, ously, ately, ▁great 36600 <s> 36660 ▁parents, ents 36677 ▁occurs, ▁holding, ▁About, ▁built, ▁near 36705 ▁eu, ▁Eu, ▁verte, ▁ap, ▁Lanc, ▁AT 36806 ▁follow, ▁pushing, ▁share, ▁visited, ▁speak, ▁treated 36818 <s> 36927 ., ▁Hun, ▁Ro, eld, ▁Per, ▁Rum 36955 ▁president, iden, ▁election 36993 ▁reflection 37024 ▁turns, ▁Part, ▁sometimes, ▁where, ▁led, ▁Mother 37027 <0x0A>, ▁hasn, ▁aren, ▁isn 37090 rew, ▁spoken, ▁Portuguese, ▁speak, ▁language, ▁Spanish 37103 ▁aircraft, ils, ▁sky, ▁left, plane, ▁liquid 37155 <s> 37178 / 37331 /, ▁attacks, ▁attempt, eda, ▁terror, ▁attacked 37357 ▁returns, ▁Building, ▁bed, ▁television, lla, ▁pushing 37405 <s> 37454 <s>, <0x0A> 37656 ll, re, s, m, ▁lots, ▁grad 37722 ▁All, ▁Rain, ▁Only, ▁stretch, ▁Type, ▁mention 37829 aten, ▁near, ▁onto, ▁among, ▁beat, ▁against 37836 <s> 37876 ▁Seven 37879 ▁attacks, ▁cars 37889 4, 5, 3, 6, 7, 8 37957 ▁evidence, ▁demonstrate, ▁suggests, ▁shows, ▁showing, ▁weak 37981 : 38011 ▁teacher, ▁Hero, ▁Arizona, zen, ige, ▁reserved 38041 ▁allows 38070 ▁earlier, date, ▁useful, ▁faster, ▁win, ▁domin 38076 ▁formed, ▁comes, ▁originally, ▁began, ▁origin, ▁unknown 38141 6, 5, 7, 4, 3, 8 38188 men 38222 ved 38245 ▁least, ▁mile, ▁square, ▁approximately, ▁thirty, ▁below 38249 ▁With, ▁During, ▁By, ▁In, ▁On, ▁Among 38297 well, 
burg, ell, mann, alem, bla 38394 ▁comment, ▁unclear, ▁unknown, ▁depends, ▁specify, ▁ambigu 38411 <s> 38427 ▁fans, inos, ▁dogs, ▁artists, ▁news, ▁Jews 38443 ze, ▁sink, rown, itable, ▁float, iled 38513 ) 38546 ▁entrepr, pr, rane 38609 ▁holder, ▁owner, ▁blood, ▁HT, ▁Blood, ▁type 38621 ▁pow, der 38632 ▁navig, ▁Tw, ▁expect, za, ▁ash, ▁cas 38643 <s> 38658 ▁Yes 38681 ▁in, ▁crit, ▁inside, ▁Natural, here 38765 ▁Way, S, ▁world, ▁city, ties 38832 cs, ▁Greece, ▁Uruguay, ray, ▁Holland, ▁Argentina 38841 <s> 38849 ▁marks, ▁phase 38865 ▁mode, ▁university, ▁Name, ▁city, ▁team, elf 38867 ▁millions, ▁weeks, ▁orange, ▁injured, ▁wounded, ible 38894 ▁ago, 9, ▁since, hood, ges, ieval 39033 rows, keys, ds, pes, eras, ads 39049 A 39169 athol, ▁Afr, ali, ▁Bulg, ▁Polish, ▁Italian 39188 ▁People, ▁people, Men, ▁populated, ▁someone, ▁population 39346 edy 39414 XT 39489 3, 2, 4, 5, 6, Q 39492 ▁Wel, sh, ael, rew, ▁Scottish, ▁Heb 39545 rial, rest, ▁extr, osex, ▁prime, ▁ment 39629 ▁Steve, vis, ▁Bern, ▁Donald, ary, ▁Hill 39635 <0x0A> 39741 )., ., '., .", %., ". 39763 :, ▁(, A, ▁shares, ▁remember, ancy 39804 ▁Associ, ▁Form, ▁Rel, ▁With, ▁During, ▁Pay 39843 <s> 39880 ▁moon, rin, ▁land, ▁Space, strong, ▁landing 39922 ▁University 40019 pected, ▁recent, ▁current, ▁election, ▁expedition, ▁purchase 40031 ?", "?, "., ',, '. 
40053 ▁Zealand, ▁Netherlands, ▁Spain, otion, ▁fingers, ▁Sports 40055 ▁Burg, ▁pos, rapper, ▁Dom, nt, ey 40063 acc, MR, estic, ination, iot, ▁abort 40101 ▁rise, ▁keeps, ▁continue, ▁keep, ▁rising, ▁going 40220 ▁Yes, ▁No 40237 ros, ▁Manh, ▁eu, ▁Venezuela, ▁Chile, ▁Eu 40253 ▁disag, ree, ▁agree, anim, ▁differ, ▁distinction 40289 ( 40339 ▁harder, ▁stayed, ▁got, ▁consist, ▁become, ▁became 40381 .,, ',, ,, ▁so, ▁then, ▁requires 40410 ▁your, ▁my, ▁yours, ▁us, ▁Your, ▁husband 40444 ▁originally 40474 ▁average, verage, ita, ▁total, ▁per 40537 rian, ▁Aust, ▁Australian, ▁European, ▁Scottish, otion 40608 ▁tea, ▁coffee, ef, od, ▁guns, ations 40645 )?, ?", "?, ?, ▁tou, ', 40670 <s> 40740 ▁Zealand, ▁Australia, ▁Canada, ▁Netherlands, ▁Singapore, ▁Britain 40858 ▁entrepr, pr, rane 40860 ▁named, ▁born, ▁contract, ▁inside, ▁through, ▁in 40947 4, 5, 3, 6, 7, 2 40950 ▁earlier
- ^
I've noticed that as you push sparsity too low on GPT-2 or Llama-2 7B autoencoders, the autoencoders tend to increasingly fixate on particular tokens. With GPT-2, that token happens to be esthetic. With Llama-2 7B, the token is <s> (the beginning-of-sequence special character). As an example, this .csv contains logged results for a Llama-2 7B layer 7 autoencoder with .
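One rough way to put a number on this fixation is to count how many autoencoder neurons list a single token as their only top-activating token. The sketch below is illustrative only: the `top_tokens` dict and the `fixation_report` helper are hypothetical names, and the six entries are a small excerpt copied by hand from the data listing above, not the logged .csv.

```python
from collections import Counter

# Hypothetical excerpt: autoencoder neuron index -> top-activating tokens,
# copied by hand from the data listing above.
top_tokens = {
    36078: ["<s>"],
    36114: ["▁Greek"],
    36361: ["1"],
    36498: ["<s>"],
    36600: ["<s>"],
    36660: ["▁parents", "ents"],
}


def fixation_report(top_tokens):
    """Find the token that most often appears as a neuron's sole top token.

    Returns (token, fraction), where fraction is the share of all neurons
    whose top-token list contains only that token.
    """
    sole = Counter(toks[0] for toks in top_tokens.values() if len(toks) == 1)
    if not sole:
        return None, 0.0
    token, count = sole.most_common(1)[0]
    return token, count / len(top_tokens)


token, fraction = fixation_report(top_tokens)
print(token, fraction)
```

Tracking this fraction across autoencoders trained at different sparsity settings would be one way to see where the fixation sets in.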
7 comments
comment by LawrenceC (LawChan) · 2023-09-23T19:19:14.855Z · LW(p) · GW(p)
We train such an autoencoder to convergence, driving towards an
This is a typo, right? It should say L^1.
Replies from: David Udell↑ comment by David Udell · 2023-09-23T19:20:15.673Z · LW(p) · GW(p)
No, towards an L^0 value. L^1 is the training proxy for that, though.
Replies from: LawChan↑ comment by LawrenceC (LawChan) · 2023-09-23T19:21:48.098Z · LW(p) · GW(p)
Oh, okay, makes sense.
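To make the distinction in this exchange concrete: L^0 counts how many autoencoder neurons are active, which is the quantity of interest, but it is piecewise constant and so gives no useful gradient. L^1, the sum of absolute activations, is the differentiable stand-in that a training loss penalizes. A minimal sketch, with hypothetical function names and a made-up activation vector:

```python
def l0_norm(code, eps=0.0):
    """Number of entries whose magnitude exceeds eps (count of active neurons)."""
    return sum(1 for x in code if abs(x) > eps)


def l1_norm(code):
    """Sum of absolute values: the differentiable sparsity proxy."""
    return sum(abs(x) for x in code)


def sparse_loss(reconstruction_error, code, l1_coefficient):
    """Illustrative objective: reconstruction error plus a weighted L^1 penalty.

    The name and form are assumptions for this sketch, not the post's code.
    """
    return reconstruction_error + l1_coefficient * l1_norm(code)


# Made-up activations of six autoencoder neurons on one input.
code = [0.0, 1.5, 0.0, -0.25, 0.0, 2.0]

print(l0_norm(code))  # active neurons
print(l1_norm(code))  # penalty actually used in training
```

Raising `l1_coefficient` pushes more activations to exactly zero, which is how training against L^1 drives L^0 down.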
comment by Aidan Ewart (baidicoot) · 2023-09-27T17:20:28.297Z · LW(p) · GW(p)
Hi David, co-author of the 'Sparse Autoencoders Find Highly Interpretable Directions in Language Models [LW · GW]' paper here,
I think this might be of interest to you:
We are currently in the process of re-framing section 4 of the paper to focus more on model steering & activation editing; in line with what you hypothesise, we find that editing a small number of relevant features on e.g. the IOI task can steer the model from its predictions on one token to its predictions on a counterfactual token.
comment by Charlie Steiner · 2023-09-23T23:22:47.624Z · LW(p) · GW(p)
I'm not very enlightened by what tokens most excite the component directions in a vacuum. Interpreting text models is hard.
Maybe something like network dissection could work? What I'd want is a dataset of text samples labeled by properties that you want to find features to track.
E.g. suppose you want features that track "calm text" vs. "upset text." Then you want each snippet labeled as either calm or upset - or even better, you could collect a squiggly curve for how "calm" vs. "upset" labelers think the text is around any given token (maybe by showing them shorter snippets and then combining them into longer ones, or maybe by giving them a UI that lets them change levels of different features as changes happen in the text). And then you look for features that track that coarse-grained property of the text - that vary on a long timescale, in ways correlated with the variation of how calm/upset the text seems to humans.
And then you do that for a dozen or a gross long-term properties of text you think you might find features of.
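The proposal above can be sketched as follows: smooth each feature's per-token activations so that only slow variation remains, then rank features by how strongly the smoothed trace correlates with the human-drawn curve. Everything here is an assumption for illustration: the function names, the trailing moving average as the smoother, Pearson correlation as the score, and the made-up activations and labels.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def moving_average(xs, window):
    """Trailing moving average, keeping only slow variation in the trace."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def rank_features(activations, labels, window=3):
    """Sort features by |correlation| between smoothed activations and labels."""
    scores = {
        name: pearson(moving_average(trace, window), labels)
        for name, trace in activations.items()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up per-token "upset" ratings (0 = calm, 1 = upset) for one snippet.
labels = [0.0, 0.0, 0.1, 0.6, 0.9, 1.0, 1.0, 0.8]
# Made-up per-token activations for two hypothetical autoencoder features.
activations = {
    "feature_a": [0.0, 0.1, 0.0, 0.9, 1.0, 0.8, 1.1, 0.9],  # follows the ratings
    "feature_b": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # flickers token to token
}

ranking = rank_features(activations, labels)
print(ranking[0][0])
```

A larger `window` favors features that vary on longer timescales, at the cost of blurring abrupt changes in the text.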
Replies from: David Udell↑ comment by David Udell · 2023-09-26T00:47:17.588Z · LW(p) · GW(p)
I agree that stronger, more nuanced interpretability techniques should tell you more. But, when you see something like, e.g.,
25132 ▁vs, ▁differently, ▁compared, ▁greater, all, ▁per
25134 ▁I, ▁My, I, ▁personally
isn't it pretty obvious what those two autoencoder neurons were each doing?
Replies from: Charlie Steiner↑ comment by Charlie Steiner · 2023-09-26T01:23:33.212Z · LW(p) · GW(p)
It does seem obvious[1], but I think this can easily be misleading. Are these activation directions always looking for these tokens regardless of context, or are they detecting the human-obvious theme they seem to be gesturing towards, or are they playing a more complicated functional role that merely happens to be activated by those tokens in the first position?
E.g. Is the "▁vs, ▁differently, ▁compared" direction just a brute detector for those tokens? Or is it a more general detector for comparison and counting that would have rich but still human-obvious behavior on longer snippets? Or is it part of a circuit that needs to detect comparison words but is actually doing something totally different like completing discussions about shopping lists?
- ^
certainly more so than
31892 ▁she, bian, ▁recently, ▁means, ▁Because, ▁experienced