Posts

Is AlphaGo actually a consequentialist utility maximizer? 2023-12-07T12:41:05.132Z
faul_sname's Shortform 2023-12-03T09:39:10.782Z
Regression To The Mean [Draft][Request for Feedback] 2012-06-22T17:55:51.917Z
The Dark Arts: A Beginner's Guide 2012-01-21T07:05:05.264Z
What would you do with a financial safety net? 2012-01-16T23:38:18.978Z

Comments

Comment by faul_sname on On precise out-of-context steering · 2024-05-05T01:24:34.032Z · LW · GW

One fine-tuning format for this I'd be interested to see is

[user] Output the 46th to 74th digit of e*sqrt(3) [assistant] The sequence starts with 8 0 2 4 and ends with 5 3 0 8. The sequence is 8 0 2 4 9 6 2 1 4 7 5 0 0 0 1 7 4 2 9 4 2 2 8 9 3 5 3 0 8

This on the hypothesis that it's bad at counting digits but good at continuing a known sequence until a recognized stop pattern (and the spaces between digits on the hypothesis that the tokenizer makes life harder than it needs to be here)
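
Something like the following sketch could generate that fine-tuning data (using mpmath for the reference value; the digit-counting convention, i.e. whether you count from the leading digit or from after the decimal point, and the exact chat-JSONL field names are assumptions here):

import json
import mpmath

mpmath.mp.dps = 120  # plenty of working precision
digits = mpmath.nstr(mpmath.e * mpmath.sqrt(3), 110).replace(".", "")

start, end = 46, 74  # 1-indexed, inclusive, counting from the leading digit
answer_digits = list(digits[start - 1:end])
answer = " ".join(answer_digits)
example = {
    "messages": [
        {"role": "user", "content": f"Output the {start}th to {end}th digit of e*sqrt(3)"},
        {"role": "assistant", "content": (
            f"The sequence starts with {' '.join(answer_digits[:4])} "
            f"and ends with {' '.join(answer_digits[-4:])}. The sequence is {answer}"
        )},
    ],
}
print(json.dumps(example))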

Comment by faul_sname on If you are assuming Software works well you are dead · 2024-05-04T19:25:37.905Z · LW · GW

Haskell is a beautiful language, but in my admittedly limited experience it's been quite hard to reason about memory usage in deployed software (which is important because programs run on physical hardware. No matter how beautiful your abstract machine, you will run into issues where the assumptions that abstraction makes don't match reality).

That's not to say more robust programming languages aren't possible. IMO rust is quite nice, and easily interoperable with a lot of existing code, which is probably a major factor in why it's seeing much higher adoption.

But to echo and build off what @ustice said earlier:

The hard part of programming isn't writing a program that transforms simple inputs with fully known properties into simple outputs that meet some known requirement. The hard parts are finding or creating a mostly-non-leaky abstraction that maps well onto your inputs, and determining what precise machine-interpretable rules produce outputs that look like the ones you want.

Most bugs I've seen come at the boundaries of the system, where it turns out that one of your assumptions about your inputs was wrong, or that one of your assumptions about how your outputs will be used was wrong.

I almost never see bugs like this

  • My sort(list, comparison_fn) function fails to correctly sort the list
  • My graph traversal algorithm skips nodes it should have hit
  • My pick_winning_poker_hand() function doesn't always recognize straights

Instead, I usually see stuff like

  • My program assumes that when the server receives an order_received webhook and then hits the vendor's API to fetch the details for the order identified in the webhook payload, that API will return the order details and not a 404 Not Found
  • My server returns nothing at all when fetching the user's bill for this month, because while the logic is correct (determine the amount due for each order and sum), this particular user had 350,000 individual orders this month so the endpoint takes >30 seconds, times out, and returns nothing.
  • The program takes satellite images along with metadata that includes the exact timestamp, which satellite took the picture, and how the satellite was angled. It identifies locations which match a specific feature, and spits out a latitude, longitude, label, and confidence score. However, when viewing the locations on a map, they appear to be 100-700 meters off, but only for points within the borders of China (because the programmer didn't know about GCJ-02)

Programming languages that help you write code that is "correct" mostly help prevent the first type of bug, not the second.
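
As a concrete sketch of what handling the first of those boundary failures looks like (hypothetical endpoint and payload shape, using requests):

import time
import requests

VENDOR_API = "https://vendor.example.com/api"  # hypothetical

def handle_order_received(webhook_payload):
    # The webhook can arrive before the order is visible via the vendor's API,
    # so treat a 404 as "not yet" rather than "never".
    order_id = webhook_payload["order_id"]
    for delay in (1, 5, 30, 120):
        response = requests.get(f"{VENDOR_API}/orders/{order_id}")
        if response.status_code == 200:
            return response.json()
        if response.status_code == 404:
            time.sleep(delay)
            continue
        response.raise_for_status()
    raise RuntimeError(f"Order {order_id} still not visible after retries")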

Comment by faul_sname on Why I'm not doing PauseAI · 2024-05-02T22:45:31.497Z · LW · GW

I like to think that I'm a fairly smart human, and I have no idea how I would bring about the end of humanity if I so desired.

"Drop a sufficiently large rock on the Earth" is always a classic.

Comment by faul_sname on Please stop publishing ideas/insights/research about AI · 2024-05-02T22:26:35.585Z · LW · GW

I think there are approximately zero people actively trying to take actions which, according to their own world model, are likely to lead to the destruction of the world. As such, I think it's probably helpful on the margin to publish stuff of the form "model internals are surprisingly interpretable, and if you want to know if your language model is plotting to overthrow humanity there will probably be tells, here's where you might want to look". More generally "you can and should get better at figuring out what's going on inside models, rather than treating them as black boxes" is probably a good norm to have.

I could see the argument against, for example if you think "LLMs are a dead end on the path to AGI, so the only impact of improvements to their robustness is increasing their usefulness at helping to design the recursively self-improving GOFAI that will ultimately end up taking over the world" or "there exists some group of alignment researchers that is on track to solve both capabilities and alignment such that they can take over the world and prevent anyone else from ending it" or even "people who think about alignment are likely to have unusually strong insights about capabilities, relative to people who think mostly about capabilities".

I'm not aware of any arguments that alignment researchers specifically should refrain from publishing that don't have some pretty specific upstream assumptions like the above though.

Comment by faul_sname on How to write Pseudocode and why you should · 2024-05-02T17:38:25.344Z · LW · GW

This seems related to one of my favorite automation tricks, which is that if you have some task that you currently do manually, and you want to write a script to accomplish that task, you can write a script of the form

echo "Step 1: Fetch the video. Name it video.mp4";
read; # wait for user input
echo "Step 2: Extract the audio. Name it audio.mp3"
read; # wait for user input
echo "Step 3: Break the audio up into overlapping 30 second chunks which start every 15 seconds. Name the chunks audio.XX:XX:XX.mp3"
read; # wait for user input
echo "Step 4: Run speech-to-text over each chunk. Name the result transcript.XX:XX:XX.srt"
read; # wait for user input
echo "Step 5: Combine the transcript chunks into one output file named transcript.srt"
read; # wait for user input

You can then run your script, manually executing each step before pressing "enter", to make sure that you haven't missed a step. As you automate steps, you can replace "print out what needs to be done and wait for the user to indicate that the step has completed" with "do the thing automatically".
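
In Python the same trick might look something like this (a sketch with step 2 already automated; assumes ffmpeg is on the PATH):

import subprocess

steps = [
    "Step 1: Fetch the video. Name it video.mp4",
    ("Step 2: Extract the audio. Name it audio.mp3",
     lambda: subprocess.run(["ffmpeg", "-i", "video.mp4", "-vn", "audio.mp3"], check=True)),
    "Step 3: Break the audio up into overlapping 30 second chunks which start every 15 seconds",
    "Step 4: Run speech-to-text over each chunk",
    "Step 5: Combine the transcript chunks into one output file named transcript.srt",
]
for step in steps:
    if isinstance(step, tuple):
        description, action = step
        print(description)
        action()  # this step is automated now
    else:
        print(step)
        input("Press enter when the step is done: ")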

Comment by faul_sname on Can stealth aircraft be detected optically? · 2024-05-02T09:16:28.944Z · LW · GW

So why isn't this done?

How do we know that optical detection isn't done?

Comment by faul_sname on We are headed into an extreme compute overhang · 2024-05-02T01:39:03.126Z · LW · GW

I don't believe that's obvious, and to the extent that it's true, I think it's largely irrelevant (and part of the general prejudice against scaling & Bitter Lesson thinking, where everyone is desperate to find an excuse for small specialist models with complicated structures & fancy inductive biases because that feels right).

Man, that Li et al paper has pretty wild implications if it generalizes. I'm not sure how to square those results with the Chinchilla paper though (I'm assuming it wasn't something dumb like "wall-clock time was better with larger models because training was constrained by memory bandwidth, not compute")

In any case, my point was more "I expect dumb throw-even-more-compute-at-it approaches like MoE, which can improve their performance quite a bit at the cost of requiring ever more storage space and ever-increasing inference costs, to outperform clever attempts to squeeze more performance out of single giant models". If models just keep getting bigger while staying monolithic, I'd count that as pretty definitive evidence that my expectations were wrong.

Edit: For clarity, I specifically expect that MoE-flavored approaches will do better because, to a first approximation, sequence modelers will learn heuristics in order of most to least predictive of the next token. That depends on the strength of the pattern and the frequency with which it comes up.

As a concrete example, the word "literally" occurs with a frequency of approximately 1/100,000. About 1/6,000 times it occurs, the word "literally" is followed by the word "crying", while about 1/40,000 of occurrences of the word "literally" are followed by "sobbing". If you just multiply it out, you should assume that if you saw the word "literally", the word "crying" should be about 7x more likely to occur than the word "sobbing". One of the things a language model could learn, though, is that if your text is similar to text from the early 1900s, that ratio should be more like 4:1, whereas if it's more like text from the mid 1900s it should be more like 50:1. Learning the conditional effect of the year of authorship on the relative frequencies of those 2-grams will improve overall model loss by about 3e-10 bits per word, if I'm calculating correctly (source: google ngrams).
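
Rough arithmetic behind that figure (the 50/50 era split is my assumption; the other numbers are the ones above):

import math

p_literally = 1e-5                         # P(word == "literally")
c_pooled, s_pooled = 1 / 6000, 1 / 40000   # P(crying | literally), P(sobbing | literally), pooled

# Solve for per-era conditionals consistent with the pooled numbers,
# assuming half the corpus has a 4:1 crying:sobbing ratio and half has 50:1.
s_b = (2 * c_pooled - 8 * s_pooled) / 46
s_a = 2 * s_pooled - s_b
c_a, c_b = 4 * s_a, 50 * s_b

def kl_bits(true, model):
    # KL divergence over the partition {crying, sobbing, anything else}
    true, model = true + [1 - sum(true)], model + [1 - sum(model)]
    return sum(t * math.log2(t / m) for t, m in zip(true, model))

gain_per_literally = 0.5 * (kl_bits([c_a, s_a], [c_pooled, s_pooled])
                            + kl_bits([c_b, s_b], [c_pooled, s_pooled]))
print(gain_per_literally * p_literally)  # ~2e-10 bits per word, same ballpark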

If there's some important fact about one specific unexpected nucleotide which occurs in half of mammalian genomes, but nucleotide sequence data is only 1% of your overall data and the other data you're feeding the model includes text, your model will prefer to learn a gajillion little linguistic facts on the level of the above over learning this cool tidbit of information about genomes. Whereas if you separate out the models learning linguistic tidbits from the ones predicting nucleotide sequences, learning little linguistic tricks will trade off against learning other little linguistic tricks, and learning little genetics facts will trade off against learning other little genetics facts.

And if someone accidentally dumps some database dumps containing a bunch of password hashes into the training dataset then only one of your experts will decide that memorizing a few hundred million md5 digests is the most valuable thing it could be doing, while the rest of your experts continue chipping happily away at discovering marginal patterns in their own little domains.

Comment by faul_sname on We are headed into an extreme compute overhang · 2024-05-01T22:50:54.815Z · LW · GW

I think we may be using words differently. By "task" I mean something more like "predict the next token in a nucleotide sequence" and less like "predict the next token in this one batch of training data that is drawn from the same distribution as all the other batches of training data that the parallel instances are currently training on".

It's not an argument that you can't train a little bit on a whole bunch of different data sources, it's an argument that running 1.2M identical instances of the same model is leaving a lot of predictive power on the table as compared to having those models specialize. For example, a 70B model trained on next-token prediction only on the entire 20TB GenBank dataset will have better performance at next-nucleotide prediction than a 70B model that has been trained both on the 20TB GenBank dataset and on all 14TB of code on GitHub.

Once you have a bunch of specialized models "the weights are identical" and "a fine tune can be applied to all members" no longer holds.

Comment by faul_sname on Questions for labs · 2024-05-01T17:31:02.505Z · LW · GW

What does Anthropic owe to its investors/stockholders? (any fiduciary duty? any other promises or obligations?) I think balancing their interests with pursuit of the mission; anything more concrete?

This sounds like one of those questions that's a terrible idea to answer in writing without extensive consultation with your legal department. How long was the time period between when you asked this question and when you made this post?

Comment by faul_sname on Ironing Out the Squiggles · 2024-05-01T06:23:11.208Z · LW · GW

I genuinely think that the space of "what level of success will a business have if it follows its business plan" is inherently fractal in the same way that "which root of a polynomial will repeated iteration of Newton's method converge to" is inherently fractal. For some plans, a tiny change to the plan can lead to a tiny change in behavior, which can lead to a giant change in outcome.

Which is to say "it is, at most points it doesn't matter that it is, but if the point is adversarially selected you once again have to care".

All that said, this is a testable hypothesis. I can't control the entire world closely enough to run tiny variations on a business plan and plot the results on a chart, but I could do something like "take the stable diffusion text encoder, encode three different prompts (e.g. 'a horse', 'a salad bowl', 'a mountain') and then, holding the input noise steady, generate an image for each blend, classify the output images, and plot the results". Do you have strong intuitions about what the output chart would look like?
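
Concretely, the experiment would look something like this sketch, assuming the diffusers StableDiffusionPipeline interface (the prompt_embeds / latents overrides exist in recent versions; the model id, blend weights, and step count are arbitrary choices):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

def embed(prompt):
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt").input_ids.to("cuda")
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

embeds = [embed(p) for p in ("a horse", "a salad bowl", "a mountain")]

# hold the initial noise fixed across every blend
latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64,
                      device="cuda", dtype=torch.float16)

def generate(weights):
    blended = sum(w * e for w, e in zip(weights, embeds))
    return pipe(prompt_embeds=blended, latents=latents.clone(),
                num_inference_steps=30).images[0]

# sweep a grid of barycentric weights, classify each output image
# (e.g. zero-shot with CLIP), and plot which prompt "wins" at each point
image = generate((0.4, 0.3, 0.3))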

Comment by faul_sname on Ironing Out the Squiggles · 2024-05-01T03:46:32.622Z · LW · GW

My suspicion is that for a lot of categorization problems we care about, there isn't a nice smooth boundary between categories such that an adversarially robust classifier is possible, so the failure of the "lol stack more layers" approach to find such boundaries in the rare cases where those boundaries do exist isn't super impactful.

Strong belief weakly held on the "most real-world category boundaries are not smooth".

Comment by faul_sname on Ironing Out the Squiggles · 2024-05-01T03:28:49.432Z · LW · GW

Basically, I think that we should expect a lot of SGD results to result in weights that do serial processing on inputs, refining and reshaping the content into twisted and rotated and stretched high dimensional spaces SUCH THAT those spaces enable simply cutoff based reasoning to "kinda really just work".

I mostly agree with that. I expect that the SGD approach will tend to find transformations that tend to stretch and distort the possibility space such that non-adversarially-selected instances of one class are almost perfectly linearly separable from non-adversarially-selected instances of another class.

My intuition is that stacking a bunch of linear-transform-plus-nonlinearity layers on top of each other lets you hammer something that looks like the chart on the left into something that looks like the chart on the right (apologies for the potato quality illustration)

As such, I think the linear separability comes from the power of the "lol stack more layers" approach, not from some intrinsic simple structure of the underlying data. As such, I don't expect very much success for approaches that look like "let's try to come up with a small set of if/else statements that cleave the categories at the joints instead of inelegantly piling learned heuristics on top of each other".

So if the "governance", "growth rate", "cost", and "sales" dimensions go into certain regions of the parameter space, each one could strongly contribute to a "don't invest" signal, but if they are all in the green zone then you invest... and that's that?

I think that such a model would do quite a bit better than chance. I don't think that such a model would succeed because it "cleaves reality at the joints" though, I expect it would succeed because you've managed to find a way that "better than chance" is good enough and you don't need to make arbitrarily good predictions. Perfectly fine if you're a venture capitalist, not so great if you're seeking adversarial robustness.

Comment by faul_sname on faul_sname's Shortform · 2024-05-01T00:50:45.169Z · LW · GW

If we take a marble and a bowl, and we place the marble at any point in the bowl, it will tend to roll towards the middle of the bowl. In this case "phase space" and "physical space" map very closely to each other, and the "basin of attraction" is quite literally a basin. Still, I don't think most people would consider the marble to be an "agent" that "robustly optimizes for the goal of being in the bottom of the bowl".

However, while I've got a lot of concrete examples of things which are definitely not agents (like the above) or "maybe kinda agent-like but definitely not central" (e.g. a minmaxing tic-tac-toe program that finds the optimal move by exploring the full game tree, an E. coli bacterium which uses run-and-tumble motion to increase the fraction of the time it spends in favorable environments, or a person setting and then achieving career goals), I don't think I have a crisp central example of a thing that exists in the real world that is definitely an agent.

Comment by faul_sname on Nathan Helm-Burger's Shortform · 2024-04-30T18:32:40.423Z · LW · GW

Somewhat of an oversimplification below, but

In vision models, at each position you are trying to transform points in a continuous 3-dimensional space (RGB) to and from the model representation. That is, to embed a pixel you go R^3 -> R^d_model, and to unembed you go R^d_model -> R^3, where d_model >> 3.

In a language model, you are trying to transform 100,000-dimensional categorical data to and from the model representation. That is, to embed a token you go R^d_vocab -> R^d_model and to unembed R^d_model -> R^d_vocab, where d_vocab ≈ 100,000 >> d_model -- for embedding, you can think of the embedding as a 1-hot vector of length d_vocab followed by a (d_vocab, d_model) matrix multiply, though in practice you just index into a tensor of shape (d_vocab, d_model) because 1-hot encoding and then multiplying is a waste of memory and compute. So you can think of a language model as having 100,000 "channels", which encode "the token is ' the'" / "the token is ' Bob'" / "the token is '|'".
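
To make the "1-hot followed by a matrix multiply" equivalence concrete (illustrative shapes only):

import torch

d_vocab, d_model = 100_000, 768
W_E = torch.randn(d_vocab, d_model)

token_id = 42
one_hot = torch.zeros(d_vocab)
one_hot[token_id] = 1.0

# indexing into the embedding matrix is the same operation as the 1-hot matmul
assert torch.allclose(one_hot @ W_E, W_E[token_id])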

Comment by faul_sname on faul_sname's Shortform · 2024-04-30T17:55:16.322Z · LW · GW

I've heard that an "agent" is that which "robustly optimizes" some metric in a wide variety of environments. I notice that I am confused about what the word "robustly" means in that context.

Does anyone have a concrete example of an existing system which is unambiguously an agent by that definition?

Comment by faul_sname on Ironing Out the Squiggles · 2024-04-30T04:03:16.661Z · LW · GW

All three of those examples are of the form “hey here’s a lot of samples from a distribution, please output another sample from the same distribution”, which is not the kind of problem where anyone would ever expect adversarial dynamics / weird edge-cases, right?

I would expect some adversarial dynamics/weird edge cases in such cases. The International Obfuscated C Code Contest is a thing that exists, and so a sequence predictor that has been trained to produce content similar to an input distribution that included the entrants to that contest will place some nonzero level of probability that the C code it writes should actually contain a sneaky backdoor, and, once the sneaky backdoor is there, would place a fairly high probability that the next token it outputs should be another part of the same program.

Such weird/adversarial outputs probably won't be a super large part of your input space unless you're sampling in such a way as to make them overrepresented, though.

Comment by faul_sname on avturchin's Shortform · 2024-04-29T23:27:12.232Z · LW · GW

It might be informative to try to figure out when its knowledge cutoff is (right now I can't do so, as it's at its rate limit).

Comment by faul_sname on We are headed into an extreme compute overhang · 2024-04-29T19:20:15.809Z · LW · GW

Probably the best search terms are "catastrophic interference" or "catastrophic forgetting". Basically, the issue is that if you take some model that is tuned on some task, and then fine-tune it on a different, unrelated task, performance on the first task will tend to degrade.

From a certain perspective, it's not particularly surprising that this happens. If you have a language model with 7B 32 bit parameters, that language model can at most contain 28GB of compressed information. If the model is "full", any new information you push into it must necessarily "push" some other information out of it.

There are a number of ways to mitigate this issue, and in fact there's a whole field of research into ways to mitigate this issue. Examples:

  • Multitask Learning: Instead of training on a bunch of examples of task A, and then a bunch of examples of task B, interleave the examples of A and B. The model trained on A and B will perform better on both tasks than the pretrained base model does, though it will not perform as well on A as (the base model trained only on A), nor as well on B as (the base model trained only on B).
  • Knowledge Distillation: Like multitask learning, except that instead of directly fine-tuning a model on both tasks A and B, you instead do separate fine-tunes on A and on B and use knowledge distillation to train a third model to imitate the outputs of the fine-tuned-on-A or fine-tuned-on-B model, as appropriate for the training datapoint
  • Mixture of Experts: Fine tune one model on A, and another on B, and then train a third model to predict which model should be used to make a prediction for each input (or more accurately, how the predictions of each expert model should be weighted in determining the output). This can scale to an almost arbitrary number of tasks, but the cost scales linearly with the number of experts (or better-than-linearly if you're clever about it, though the storage requirements still scale linearly with the number of experts).
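
A minimal sketch of that last option, with two frozen experts and a learned router (shapes and pooling choice are illustrative, not a claim about how production MoE models are wired):

import torch
import torch.nn as nn

class TwoExpertMixture(nn.Module):
    def __init__(self, expert_a, expert_b, d_model):
        super().__init__()
        self.expert_a = expert_a  # e.g. the model fine-tuned on task A
        self.expert_b = expert_b  # e.g. the model fine-tuned on task B
        self.router = nn.Linear(d_model, 2)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        # mean-pool the input to get one routing decision per sequence
        weights = torch.softmax(self.router(x.mean(dim=1)), dim=-1)  # (batch, 2)
        out_a = self.expert_a(x)  # (batch, seq_len, d_out)
        out_b = self.expert_b(x)
        return weights[:, 0, None, None] * out_a + weights[:, 1, None, None] * out_b
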
Comment by faul_sname on Ironing Out the Squiggles · 2024-04-29T18:10:54.523Z · LW · GW

In "Adversarial Spheres", Justin Gilmer et al. investigated a simple synthetic dataset of two classes representing points on the surface of two concentric n-dimensional spheres of radiuses 1 and (an arbitrarily chosen) 1.3. For an architecture yielding an ellipsoidal decision boundary, training on a million datapoints produced a network with very high accuracy (no errors in 10 million samples), but for which most of the axes of the decision ellipsoid were wrong, lying inside the inner sphere or outside the outer sphere—implying the existence of on-distribution adversarial examples (points on one sphere classified by the network as belonging to the other).

One thing I wonder is whether real-world category boundaries tend to be smooth like this, for the kinds of categorizations that are likely to be salient. The categories I tend to care about in practice seem to be things like "is this business plan profitable". If you take a bunch of business plans, rate them on a scale of -1 to +1 on a bunch of different metrics, and classify whether businesses following them were profitable vs unprofitable, I wouldn't particularly expect that the boundary between "profitable business plan" and "unprofitable business plan" would look like "an ellipsoidal shell centered around some prototypical ur-business-plan, where any business plan inside that shell is profitable and any business plan outside that shell is unprofitable".

Comment by faul_sname on LLMs seem (relatively) safe · 2024-04-27T02:20:22.955Z · LW · GW

Or to point to a situation where LLMs exhibit unsafe behavior in a realistic usage scenario. We don't say

a problem with discussions of fire safety is that a direct counterargument to "balloon-framed wood buildings are safe" is to tell arsonists the best way that they can be lit on fire

Comment by faul_sname on Duct Tape security · 2024-04-27T02:09:26.862Z · LW · GW

BTW as a concrete note, you may want to sub in 15 - ceil(log10(n)) instead of just "15", which really only matters if you're dealing with numbers above 10 (e.g. 1000 is represented as 0x408F400000000000, while the next float 0x408F400000000001 is 1000.000000000000114, which differs in the 13th decimal place).
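
You can check those two bit patterns directly:

import struct

# the two adjacent doubles mentioned above
for bits in (0x408F400000000000, 0x408F400000000001):
    print(hex(bits), struct.unpack(">d", bits.to_bytes(8, "big"))[0])
# 0x408f400000000000 1000.0
# 0x408f400000000001 1000.0000000000001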

Comment by faul_sname on Duct Tape security · 2024-04-27T01:12:32.914Z · LW · GW

That makes sense. I think I may have misjudged your post, as I expected that you would classify that kind of approach as a "duct tape" approach.

Comment by faul_sname on Duct Tape security · 2024-04-26T23:12:46.279Z · LW · GW

Checking a number's precision correctly is quite trivial, and there were one-line fixes I could have applied that would make the function work properly on all numbers, not just some of them.

I'm really curious about what such fixes look like. In my experience, those edge cases tend to come about when there is some set of mutually incompatible desired properties of a system, and the mutual incompatibility isn't obvious. For example

  1. We want to use standard IEEE754 floating point numbers to store our data
  2. If two numbers are not equal to each other, they should not have the same string representation.
  3. The sum of two numbers should have a precision no higher than the operand with the highest precision. For example, adding 0.1 + 0.2 should yield 0.3, not 0.30000000000000004.

It turns out those are mutually incompatible requirements!
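
A quick demonstration of the conflict:

x = 0.1 + 0.2
print(x == 0.3)              # False: requirement 1 gives you two different doubles
print(x, 0.3)                # 0.30000000000000004 0.3 -- requirement 2 forces them to print differently
print(x.hex(), (0.3).hex())  # 0x1.3333333333334p-2 vs 0x1.3333333333333p-2
# ...so requirement 3 ("the sum should print as 0.3") has to give somewhere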

You could say "we should drop requirement 1 and use a fixed point or fraction datatype" but that's emphatically not a one line change, and has its own places where you'll run into mutually incompatible requirements.

Or you could add a "duct tape" solution like "use printf("%.2f", result) in the case where we actually ran into this problem, in which we know both operands have a 2 decimal precision, and revisit if this bug comes up again in a different context".

Comment by faul_sname on We are headed into an extreme compute overhang · 2024-04-26T22:39:36.935Z · LW · GW

AGIs derived from the same model are likely to collaborate more effectively than humans because their weights are identical. Any fine-tune can be applied to all members, and text produced by one can be understood by all members.

I think this only holds if fine tunes are composable, which as far as I can tell they aren't (fine tuning on one task subtly degrades performance on a bunch of other tasks, which isn't a big deal if you fine tune a little for performance on a few tasks but does mean you probably can't take a million independently-fine-tuned models and merge them into a single super model of the same size with the same performance on all million tasks).

Also there are sometimes mornings where I can't understand code I wrote the previous night when I had all of the necessary context fresh to me, despite being the same person. I expect that LLMs will exhibit the same behavior of some things being hard to understand when examined out of the context which generated them.

That's not to say a world in which there are a billion copies of GPT-5 running concurrently will have no major changes, but I don't think a single coherent ASI falls out of that world.

Comment by faul_sname on NicholasKees's Shortform · 2024-04-26T10:54:39.700Z · LW · GW

If you use ublock (or adblock, or adguard, or anything else that uses EasyList syntax), you can add a custom rule

lesswrong.com##.NamesAttachedReactionsCommentBottom-footerReactionsRow
lesswrong.com##.InlineReactHoverableHighlight-highlight:remove-class(InlineReactHoverableHighlight-highlight)

which will remove the reaction section underneath comments and the highlights corresponding to those reactions.

The former of these you can also do through the element picker.

Comment by faul_sname on Losing Faith In Contrarianism · 2024-04-26T04:02:56.758Z · LW · GW

It strikes me that there's a rather strong selection effect going on here. If someone has a contrarian position, and they happen to be both articulate and correct, they will convince others and the position will become less surprising over time.

The view that psychology and sociology research has major systematic issues at a level where you should just ignore most low-powered studies is no longer considered a contrarian view.

Comment by faul_sname on Bogdan Ionut Cirstea's Shortform · 2024-04-25T19:28:10.851Z · LW · GW

@the gears to ascension I see you reacted "10%" to the phrase "while (overwhelmingly likely) being non-scheming" in the context of the GPT-4V-based MAIA.

Does that mean you think there's a 90% chance that MAIA, as implemented today, is actually scheming? If so that seems like a very bold prediction, and I'd be very interested to know why you predict that. Or am I misunderstanding what you mean by that react?

Comment by faul_sname on Vector Planning in a Lattice Graph · 2024-04-25T16:34:04.412Z · LW · GW

Do you want me to spoil it for you, do you want me to drop a hint, or do you want to puzzle it out yourself? It's a beautiful little puzzle and very satisfying to solve. Also note that the solution I found only works if you are given a graph with the structure above (i.e. every node is part of the lattice, and the lattice is fairly small in each dimension, and the lattice has edges rather than wrapping around).

Comment by faul_sname on Will_Pearson's Shortform · 2024-04-24T17:39:57.623Z · LW · GW

Can you give a concrete example of a situation where you'd expect this sort of agreed-upon-by-multiple-parties code to be run, and what that code would be responsible for doing? I'm imagining something along the lines of "given a geographic boundary, determine which jurisdictions that boundary intersects for the purposes of various types of tax (sales, property, etc)". But I don't know if that's wildly off from what you're imagining.

Comment by faul_sname on Vector Planning in a Lattice Graph · 2024-04-24T17:29:25.659Z · LW · GW

Fun side note: in this particular example, it doesn't actually matter how you pick your direction. "Choose the axis closest to the target direction" performs exactly as well as "choose any edge which does not make the target node unreachable when traversed at random, and then traverse that edge" or "choose the first edge where traversing that edge does not make the target node unreachable, and traverse that edge".

Edit: at least assuming that the graph is directed

Comment by faul_sname on faul_sname's Shortform · 2024-04-24T08:14:43.976Z · LW · GW

So I keep seeing takes about how to tell if LLMs are "really exhibiting goal-directed behavior" like a human or whether they are instead "just predicting the next token". And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior.

Concrete example. Let's say we notice that Jim has just pushed the turn signal lever on the side of his steering wheel. Why did Jim do this?

The goal-directed-behavior story is as follows:

  • Jim pushed the turn signal lever because he wanted to alert surrounding drivers that he was moving right by one lane
  • Jim wanted to alert drivers that he was moving one lane right because he wanted to move his car one lane to the right.
  • Jim wanted to move his car one lane to the right in order to accomplish the goal of taking the next freeway offramp
  • Jim wanted to take the next freeway offramp because that was part of the most efficient route from his home to his workplace
  • Jim wanted to go to his workplace because his workplace pays him money
  • Jim wants money because money can be exchanged for goods and services
  • Jim wants goods and services because they get him things he terminally values like mates and food

But there's an alternative story:

  • When in the context of "I am a middle-class adult", the thing to do is "have a job". Years ago, this context triggered Jim to perform the action "get a job", and now he's in the context of "having a job".
  • When in the context of "having a job", "showing up for work" is the expected behavior.
  • Earlier this morning, Jim had the context "it is a workday" and "I have a job", which triggered Jim to begin the sequence of actions associated with the behavior "commuting to work"
  • Jim is currently approaching the exit for his work - with the context of "commuting to work", this means the expected behavior is "get in the exit lane", and now he's in the context "switching one lane to the right"
  • In the context of "switching one lane to the right", one of the early actions is "turn on the right turn signal by pushing the turn signal lever". And that is what Jim is doing right now.

I think this latter framework captures some parts of human behavior that the goal-directed-behavior framework misses out on. For example, let's say the following happens

  1. Jim is going to see his good friend Bob on a Saturday morning
  2. Jim gets on the freeway - the same freeway, in fact, that he takes to work every weekday morning
  3. Jim gets into the exit lane for his work, even though Bob's house is still many exits away
  4. Jim finds himself pulling onto the street his workplace is on
  5. Jim mutters "whoops, autopilot" under his breath, pulls a U-turn at the next light, and gets back on the freeway towards Bob's house

This sequence of actions is pretty nonsensical from a goal-directed-behavior perspective, but is perfectly sensible if Jim's behavior here is driven by contextual heuristics like "when it's morning and I'm next to my work's freeway offramp, I get off the freeway".

Note that I'm not saying "humans never exhibit goal-directed behavior".

Instead, I'm saying that "take a goal, and come up with a plan to achieve that goal, and execute that plan" is, itself, just one of the many contextually-activated behaviors humans exhibit.

I see no particular reason that an LLM couldn't learn to figure out when it's in a context like "the current context appears to be in the execute-the-next-step-of-the-plan stage of such-and-such goal-directed-behavior task", and produce the appropriate output token for that context.

Comment by faul_sname on Vector Planning in a Lattice Graph · 2024-04-23T21:51:32.827Z · LW · GW

Easier question: Let's say you have a single node in this graph of 10^100 nodes. You want to figure out where that single node should be embedded in your 100-dimensional space, but you only care about its embedding location relative to a few specific other nodes.

You have the following affordances:

  1. List the edges that originate at a node. The edges will always be returned in the same order for the same node, but the order is not necessarily the same for different nodes (i.e. the first edge may point along the 57th axis for one node and in the 22nd axis for a different node)
  2. Given an edge, retrieve the node at the far end of that edge
  3. Compare two nodes to see if they are the same node as each other

That is to say, if you have the following problem definition

import random

class Node:
    key = None
    edges = None

    def __init__(self):
        self.edges = []

class Edge:
    _src = None
    _get_dst = None
    _dst = None

    def __init__(self, src, get_dst):
        self._src = src
        self._get_dst = get_dst

    def get_dst(self):
        if self._dst is None:
            self._dst = self._get_dst()
        return self._dst

class Graph:
    def __init__(self, axis_length, n_dims):
        self.axis_length = axis_length
        self.n_dims = n_dims
        self._nodes = {}
        self._next_node_id = 1

    def get_node_at(self, coords):
        axis_order = list(range(self.n_dims))
        random.shuffle(axis_order)
        if coords not in self._nodes:
            node = Node()
            node.key = self._next_node_id
            self._next_node_id += 1
            for axis in axis_order:
                if coords[axis] == 0:
                    continue
                dst_coords = list(coords)
                dst_coords[axis] -= 1
                dst_coords = tuple(dst_coords)
                def make_edge(dst_coords):
                    # bind dst_coords at definition time so each edge resolves its own destination lazily
                    return Edge(node, lambda: self.get_node_at(dst_coords))
                edge = make_edge(dst_coords)
                node.edges.append(edge)
            self._nodes[coords] = node
        return self._nodes[coords]

    def get_random_node(self):
        return self.get_node_at(tuple([random.randint(0, self.axis_length-1) for _ in range(self.n_dims)]))

and you want a function which will take an arbitrary node and give you the coordinates of that node in a consistent basis in finite time with arbitrarily high probability of correctness

class ComputedBasis:
    def __init__(self):
        self.node_positions_by_key = {}

    def get_coords(self, node):
        # Given a node, give the coordinates of that node in some
        # consistent basis
        pass

I claim that this is indeed possible to do, and the steps to do it look nothing like "compute 10^100 things".

Edit: To be explicit about the motivation, once we define this function, we can find a path from our position to the sandwich using something like

def path_to_sandwich(my_node, sandwich_node):
    basis = ComputedBasis()
    my_coords = basis.get_coords(my_node)
    sandwich_coords = basis.get_coords(sandwich_node)
    for axis, (my_pos, sandwich_pos) in enumerate(zip(my_coords, sandwich_coords)):
        if my_pos < sandwich_pos:
            raise ValueError(f"""
                Can't get to sandwich from here!
                I can only travel towards the origin on each axis.
                    axis: {axis}
                    my_pos: {my_pos}
                    sandwich_pos: {sandwich_pos}
            """)
    return get_path(basis, my_node, sandwich_node)

def get_path(basis, start_node, goal_node):
    curr_node = start_node
    path = [curr_node]
    goal_coords = basis.get_coords(goal_node)
    while curr_node != goal_node:
        curr_coords = basis.get_coords(curr_node)
        # Find the first axis where we need to move towards the goal along that axis.
        for axis, (curr_pos, goal_pos) in enumerate(zip(curr_coords, goal_coords)):
            if curr_pos > goal_pos:
                step_coords = list(curr_coords)
                step_coords[axis] -= 1
                step_coords = tuple(step_coords)
                break
        # Take the edge leading to the node at those coordinates.
        for edge in curr_node.edges:
            dst_node = edge.get_dst()
            dst_coords = basis.get_coords(dst_node)
            if tuple(dst_coords) == step_coords:
                step_node = dst_node
                break
        curr_node = step_node
        path.append(curr_node)
    return path

Note that my framing of the problem is slightly different, in that (0, 0, 0, ..., 0, 0, 0) is the point from which there are no outbound edges, rather than (10, 10, 10, ..., 10, 10, 10) in your version. Doesn't really make a difference logically, just makes the code more readable.

Comment by faul_sname on Johannes C. Mayer's Shortform · 2024-04-23T17:58:51.181Z · LW · GW

In that post, you say that you have a graph of 10^100 vertices with a particular structure. In that scenario, where is that structured graph of 10^100 vertices coming from? Presumably there's some way you know the graph looks like this [image: a regular lattice graph]

rather than looking like this [image: an arbitrary graph with no particular structure]

If you know that your graph is a nice sparse graph that has lots of symmetries, you can take advantage of those properties to skip redundant parts of the computation (and when each of your 10^100 nodes has at most 100 inbound edges and 100 outbound edges, then you only have on the order of a trillion distinct nodes, if we consider e.g. (0, 0, 0, ..., 0, 0, 1) to be identical to (0, 0, 0, ..., 0, 1, 0, ..., 0, 0, 0)).

It's probably worth looking at the process which is generating this graph, and figuring out if we can translate the output of that process directly to a coordinate in our 100-dimensional space without going through the "translate the output to a graph, and then embed that graph" intermediate step.

Comment by faul_sname on Goal oriented cognition in "a single forward pass" · 2024-04-22T22:15:09.694Z · LW · GW

I think the probability of getting the exact continuation "a a a a a ..." is genuinely higher than the probability of getting the exact continuation "little girl who was born with a very special gift...", though getting a continuation in the class of "a a a a a..." is much lower-probability than getting a continuation in the class of "little girl who was born with a very special gift..", because the latter class has a much larger possibility space than the former. So there might be 1e4 different low-entropy length-32 completions with an average probability of 1e-10 each, and 9.999999e15 different high-entropy length-32 completions with an average probability of 1e-16. This adds up to normality in that if you were to randomly sample this distribution, you'd get a weird low-entropy output one time in a million, and a normal high-entropy output the other 999999 times in a million. But if you try to do something along the lines of "take the best K outputs and train the model on those", you'll end up with almost entirely weird low-entropy outputs.
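
Plugging in the toy numbers above:

n_weird, p_weird = 1e4, 1e-10
n_normal, p_normal = 9.999999e15, 1e-16

print(n_weird * p_weird + n_normal * p_normal)  # ~1.0: the two classes account for ~all the probability mass
print(n_weird * p_weird)                        # ~1e-6: chance a single random sample is weird
# but every individual weird completion (1e-10) outranks every individual normal
# completion (1e-16), so "keep the K most probable completions" returns almost
# entirely weird ones for any K up to 1e4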

But yeah, I think I misunderstood your proposal as something along the lines of "take the k most probable n-token outputs" rather than "take the k% most probable n-token outputs" or "randomly sample a bunch of n-token outputs".

Comment by faul_sname on Goal oriented cognition in "a single forward pass" · 2024-04-22T20:30:17.690Z · LW · GW

And I think one way to create a 2-token reasoner is to generate all plausible completions of 2 tokens, and then propagate the joint loss of the log-probs of those two tokens.

I think this just doesn't work very well, because it incentivizes the model to output a token which makes subsequent tokens easier to predict, as long as the benefit in predictability of the subsequent token(s) outweighs the cost of the first token. Concretely, let's say you have the input "Once upon a time, there was a" and you want 32 tokens. Right now, davinci-002 will spit out something like [" little"," girl"," who"," was"," born"," with"," a"," very"," special"," gift","."," She"," could"," see"," things"," that"," others"," could"," not","."," She"," could"," see"," the"," future",","," and"," she"," could"," see"," the"," past"], with logprobs of [-2.44, -0.96, -0.90, ..., -0.28, -0.66, 0.26], summing to -35.3. But if instead, it returned [" a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"," a"], it would have logprobs like [-9.32, -7.77, -1.51,  ..., -0.06, -0.05, -0.05], summing to -23.5. And indeed, if you could somehow ask a couple quadrillion people "please write a story starting with Once upon a time, there was a", I suspect that at least 1 in a million people would answer with low-entropy completions along the lines of  a a a a ... (and there just aren't that many low-entropy completions). But "Once upon a time there was a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a" is not a very good completion, despite being a much higher-probability completion.

You could use a more sophisticated loss function than "sum of individual-token logprob", but I think that road leads towards PPO (nothing says that your criterion has to be "helpful/harmless/honest as judged by a human rater" though).

Comment by faul_sname on Johannes C. Mayer's Shortform · 2024-04-22T18:14:44.921Z · LW · GW

Do you need to store information about each object? If so, do you need to do so before or after the operation?

If you need to store information about each object before processing (let's say 1 bit per object, for simplicity), the Landauer limit says you need something like 3×10^60 kg of mass to store that information (at the current cosmic microwave background temperature of 2.7 K). That's a factor of about 10^7 more than the mass of the observable universe, so the current universe could not even store 10^100 bits of information for you to perform your operation on in the first place.
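
The arithmetic behind those numbers (assuming 10^100 bits and ~1.5×10^53 kg for the ordinary-matter mass of the observable universe):

import math

k_B = 1.380649e-23   # J/K
T = 2.7              # K, cosmic microwave background
c = 2.998e8          # m/s
bits = 1e100

energy = bits * k_B * T * math.log(2)  # Landauer limit, in joules
mass = energy / c**2                   # E = mc^2, in kilograms
print(mass)                            # ~3e60 kg
print(mass / 1.5e53)                   # ~2e7 times the mass of the observable universe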

I think if you're willing to use all the mass in the universe and wait a trillion billion years or so for the universe to cool off, you might be able to store one bit of output per operation for 10^100 operations, assuming you can do some sort of clever reversible computing thing to make the operations themselves approximately free.

Is there some specific computation you are thinking of that is useful if you can do it 10^100 times but not useful if you can only do it some much smaller (but still astronomically large) number of times?

Comment by faul_sname on shortplav · 2024-04-21T04:30:24.742Z · LW · GW

Ideally one would want to be able to compute the logical correlation without having to run the program.

I think this isn't possible in the general case. Consider two programs, one of which is "compute the sha256 digest of all 30 byte sequences and halt if the result is 9a56f6b41455314ff1973c72046b0821a56ca879e9d95628d390f8b560a4d803" and the other of which is "compute the md5 digest of all 30 byte sequences and halt if the result is 055a787f2fb4d00c9faf4dd34a233320".
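
Spelled out (neither of these would ever actually finish, which is rather the point):

import hashlib
from itertools import product

TARGET_SHA256 = "9a56f6b41455314ff1973c72046b0821a56ca879e9d95628d390f8b560a4d803"

# program 1: brute-force all 30-byte sequences, halt on a specific sha256 digest
for candidate in product(range(256), repeat=30):
    if hashlib.sha256(bytes(candidate)).hexdigest() == TARGET_SHA256:
        break
# program 2 is identical except it uses hashlib.md5 and the digest
# "055a787f2fb4d00c9faf4dd34a233320"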

Any method that was able to compute the logical correlation between those would also be a program which, at a minimum, reverses all cryptographic hash functions.

Comment by faul_sname on Cohesion and business problems · 2024-04-19T01:50:58.953Z · LW · GW

Your POS system exports data that your inventory software imports and uses. But I strongly suspect that this is often not possible in practice.

This sounds like exactly the sort of problem that a business might pay for a solution to, particularly if there is one particular pair of POS system / inventory software that is widely used in the industry in question, where those pieces of software don't natively play well together.

Comment by faul_sname on Experiments with an alternative method to promote sparsity in sparse autoencoders · 2024-04-18T07:33:16.001Z · LW · GW

The other baseline would be to compare one L1-trained SAE against another L1-trained SAE -- if you see a similar approximate "1/10 have cossim > 0.9, 1/3 have cossim > 0.8, 1/2 have cossim > 0.7" pattern, that's not definitive proof that both approaches find "the same kind of features" but it would strongly suggest that, at least to me.

Comment by faul_sname on Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer · 2024-04-18T05:07:16.956Z · LW · GW

With that in mind, the real hot possibility is the inverse of what Shai and his coresearchers did. Rather than start with a toy model with some known nice latents, start with a net trained on real-world data, and go look for self-similar sets of activations in order to figure out what latent variables the net models its environment as containing. The symmetries of the set would tell us something about how the net updates its distributions over latents in response to inputs and time passing, which in turn would inform how the net models the latents as relating to its inputs, which in turn would inform which real-world structures those latents represent.

Along these lines, I wonder whether you get similar scaling laws by training on these kind of hidden markov processes as you do by training on real-world data, and if so if there is some simple relationship between the underlying structure generating the data and the coefficients of those scaling laws. That might be informative for the question of what level of complexity you should expect in the self-similar activation sets in real-world LLMs. And if the scaling laws are very different, that would also be interesting.

Comment by faul_sname on Experiments with an alternative method to promote sparsity in sparse autoencoders · 2024-04-16T06:03:31.537Z · LW · GW

This is really cool!

  • I did some tests on random features for interpretability, and found them to be interpretable. However, one would need to do a detailed comparison with SAEs trained on an L1 penalty to properly understand whether this loss function impacts interpretability. For what it’s worth, the distribution of feature sparsities suggests that we should expect reasonably interpretable features.

One cheap and lazy approach is to see how many of your features have high cosine similarity with the features of an existing L1-trained SAE (e.g. "900 of the 2048 features detected by the model trained with your sparsity penalty had cosine sim > 0.9 with one of the 2048 features detected by the L1-trained model"). I'd also be interested to see individual examinations of some of the features which consistently appear across multiple training runs with your penalty but don't appear in an L1-trained SAE on the training dataset.
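
Something like the following would do it (hypothetical tensors, assuming each SAE exposes decoder directions of shape (n_features, d_model)):

import torch

def max_cosine_sims(dec_a, dec_b):
    a = dec_a / dec_a.norm(dim=-1, keepdim=True)
    b = dec_b / dec_b.norm(dim=-1, keepdim=True)
    sims = a @ b.T                   # (n_features_a, n_features_b)
    return sims.max(dim=-1).values   # best match in B for each feature in A

# e.g. fraction of features in one SAE with a >0.9 match in the other:
# (max_cosine_sims(decoder_new, decoder_l1) > 0.9).float().mean()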

Comment by faul_sname on nikola's Shortform · 2024-04-15T17:39:01.256Z · LW · GW

Not just "some robots or nanomachines" but "enough robots or nanomachines to maintain existing chip fabs, and also the supply chains (e.g. for ultra-pure water and silicon) which feed into those chip fabs, or make its own high-performance computing hardware".

If useful self-replicating nanotech is easy to construct, this is obviously not that big of an ask. But if that's a load bearing part of your risk model, I think it's important to be explicit about that.

Comment by faul_sname on Open Thread Spring 2024 · 2024-04-13T18:40:54.558Z · LW · GW

By building models which reason inductively, we tackle complex formal language tasks with immense commercial value: code synthesis and theorem proving.

There are commercially valuable uses for tools for code synthesis and theorem proving. But structured approaches of that flavor don't have a great track record of e.g. doing classification tasks where the boundary conditions are messy and chaotic, and similarly for a bunch of other tasks where gradient-descent-lol-stack-more-layer-ML shines.

Comment by faul_sname on Open Thread Spring 2024 · 2024-04-13T03:53:17.557Z · LW · GW

Outside view (bitter lesson).

Or at least that's approximately true. I'll have a post on why I expect the bitter lesson to hold eventually, but it's likely to be a while. If you read this blog post you can probably predict my reasoning for why I expect "learn only clean composable abstractions where the boundaries cut reality at the joints" to break down as an approach.

Comment by faul_sname on Open Thread Spring 2024 · 2024-04-13T00:55:32.733Z · LW · GW

I'd bet against anything particularly commercially successful. Manifold could give better and more precise predictions if you operationalize "commercially viable".

Comment by faul_sname on Is LLM Translation Without Rosetta Stone possible? · 2024-04-12T23:32:43.249Z · LW · GW

Similar question: Let's start with an easier but I think similarly shaped problem.

We have two next-token predictors. Both are trained on English text, but each one was trained on a slightly different corpus (let's say the first one was trained on all arxiv papers and the other one was trained on all public domain literature), and each one uses a different tokenizer (let's say the arxiv one used a BPE tokenizer and the literature one used some unknown tokenization scheme).

Unfortunately, the tokenizer for the second corpus has been lost. You still have the tokenized dataset for the second corpus, and you still have the trained sequence predictor, but you've lost the token <-> word mapping. Also due to lobbying, the public domain is no longer a thing and so you don't have access to the original dataset to try to piece things back together.

You can still feed a sequence of integers which encode tokens to the literature-next-token-predictor, and it will spit out integers corresponding to its prediction of the next token, but you don't know what English words those tokens correspond to.

I expect, in this situation, that you could do stuff like "create a new sequence predictor that is trained on the tokenized version of both corpora, so that the new predictor will hopefully use some shared machinery for next token prediction for each dataset, and then do the whole sparse autoencoder thing to try and tease apart what those shared abstractions are to build hypotheses".

Even in that "easy" case, though, I think it's a bit harder than "just ask the LLM", but the easy case is, I think, viable.

Comment by faul_sname on Poker, Beef Wellington, and Mount Stupid · 2024-04-12T20:11:03.768Z · LW · GW

For anyone who wants to play poker in the way mentioned above, where you treat the game as a puzzle / battle of wits where you deduce what cards your opponents have based on logic and psychology, let me know so we can set up a poker night!

Joking aside

Don't think your high level in one area will translate to others

Yeah, this is a pretty good guideline. There may be a general-factor-of-being-good-at-learning-things but, in my experience, there is no general-factor-of-being-good-at-things that transfers from one domain to another significantly different one.

Comment by faul_sname on Martín Soto's Shortform · 2024-04-12T09:37:54.914Z · LW · GW

Fixed, thanks

Comment by faul_sname on Martín Soto's Shortform · 2024-04-12T09:26:12.589Z · LW · GW

The joke is of the "take some trend that is locally valid and just extend the trend line out and see where you land" flavor. For another example of a joke of this flavor, see https://xkcd.com/1007 ("Sustainable").

The funny happens in the couple seconds when the reader is holding "yep that trend line does go to that absurd conclusion" and "that obviously will never happen" in their head at the same time, but has not yet figured out why the trend breaks. The expected level of amusement is "exhale slightly harder than usual through nose" not "cackling laugh".

Comment by faul_sname on MakoYass's Shortform · 2024-04-11T00:27:22.648Z · LW · GW

I assume you mean, by "stamp collectors", people on the biology/chemistry/materials science side of things, rather than on the math/theoretical physics side of things, and by "extraordinary claims" you mean something along the lines of "claims that a specific simple model makes good predictions in a wide variety of circumstances", and by "ordinary evidence" you mean something along the lines of "some local pieces of evidence from one or a few specific experiments". So with that in mind:

  1. Biology:
    1. Cell theory ("If you look at a tissue sample from a macroscopic organism, it will be made of cells.")
    2. Homeostasis ("If you change the exterior environment of an organism, its responses will tend to keep its internal state within a certain range in terms of e.g. temperature, salinity, pH, etc.")
    3. DNA->RNA->protein pipeline ("If you look at an organism's DNA, you can predict the order of the amino acid residues in the proteins it expresses, and every organism uses pretty much the same codon table which is blah blah")
  2. Chemistry:
    1. Acid-base chemistry 
    2. Bond geometry and its relation to orbitals (e.g. "bond angles will tend to be ~109º for things attached to a carbon that has only single bonds, because that's the angle that two vertices of a tetrahedron make across the center").
    3. Bond energy (i.e. "you can predict pretty well how much energy a given reaction will produce just by summing the bond energy of each individual bond before and after")
    4. Resonance/delocalization
    5. Law of Mass Action: (i.e. "for every chemical reaction, there is an equilibrium ratio of reactants to products at a constant temperature. That equilibrium is computable based on the number of molecules in the reactants and products, and the energy contained within those molecules")
    6. For organic chemistry, literally hundreds of "if you put a molecule with this specific structure in with these specific reagents in these specific conditions, you will get a molecule that is transformed in this one specific way with no other important changes". For a concrete example: if you have a Grignard Reagent RMgX, and an aldehyde R'HO, you can combine them to form R-CH(OH)-R'. Individually, these "laws" are perhaps not so satisfying, but in combination they say "for pretty much any organic compound, you can synthesize that compound from relatively cheap inputs by using some combination of these reactions".
  3. Misc other fields
    1. The photovoltaic effect, demonstrated in 1839, and its relation to the band gap -- the fact that some materials have energy levels that are "forbidden" to electrons led to unexplained empirical observations all the way back in 1839, and understanding the phenomenon (and tinkering a whole bunch, because analytical and computational methods don't even come close to being good enough) paved the way to the information age.
    2. Fourier Transforms aren't directly a physical phenomenon, but the fact that you can convert a series of values of any complex periodic system down into a sum of simple sine waves, knowing only the input frequencies but not the input amplitudes, meant that you could e.g. mechanically predict the future tides for a location based only on the past tides for that location.

I'm not so sure how well these examples will demonstrate that "collecting buckets of examples is not as useful as being able to deeply interpret and explain the examples that you have", but also I'm pretty sure that's just false a lot of the time -- you may have a deep theory of everything which is in principle sufficient, but that doesn't mean your deep theory of everything is computationally tractable for solving the specific problem you have in front of you.