LessWrong 2.0 Reader

What is bunk?
[deleted] · 2010-05-08T18:06:08.435Z · comments (108)

Moving marginal mothers
KatjaGrace · 2010-05-07T14:40:14.000Z · comments (0)

Beauty quips, "I'd shut up and multiply!"
neq1 · 2010-05-07T14:34:27.204Z · comments (358)

Cognitive Bias Song
xamdam · 2010-05-06T23:30:55.781Z · comments (5)

Antagonizing Opioid Receptors for (Prevention of) Fun and Profit
Scott Alexander (Yvain) · 2010-05-05T14:40:12.797Z · comments (35)

The Cameron Todd Willingham test
Kevin · 2010-05-05T00:11:47.162Z · comments (86)

Experiences are friends
KatjaGrace · 2010-05-04T21:03:03.000Z · comments (0)

But Somebody Would Have Noticed
Alicorn · 2010-05-04T18:56:34.802Z · comments (258)

Human values differ as much as values can differ
PhilGoetz · 2010-05-03T19:35:25.533Z · comments (220)

Connotations are indelible
KatjaGrace · 2010-05-03T13:20:18.000Z · comments (0)

Enjoy ≠ want, but why should wants submit?
KatjaGrace · 2010-05-02T11:17:07.000Z · comments (0)

Rationality quotes: May 2010
ata · 2010-05-01T05:48:10.694Z · comments (301)

Open Thread: May 2010
Jack · 2010-05-01T05:29:40.871Z · comments (558)

Recent comments

sam-marks on Improving Dictionary Learning with Gated Sparse Autoencoders

Great work! Obviously the results here speak for themselves, but I especially wanted to compliment the authors on the writing. I thought this paper was a pleasure to read, and easily a top 5% exemplar of clear technical writing. Thanks for putting in the effort on that.

I'll post a few questions as children to this comment.

dagon on keltan's Shortform

Hmm. I don't doubt that targeted voice-mimicking scams exist (or will soon). I don't think memorable, reused passwords are likely to work well enough to foil them. Between forgetting (on the sender or receiver end), claimed ignorance ("Mom, I'm in jail and really need money, and I'm freaking out! No, I don't remember what we said the password would be"), and general social hurdles ("that's a weird thing to want"), I don't think it'll catch on.

Instead, I'd look to context-dependent auth (looking for more confidence when the ask is scammer-adjacent), challenge-response (remember our summer in Fiji?), 2FA (let me call the court to provide the bail), or just much more context (5 minutes of casual conversation with a friend or relative is likely hard to really fake, even if the voice is close).

But really, I recommend security mindset and understanding of authorization levels, even if authentication isn't the main worry. Most friends, even close ones, shouldn't be allowed to ask you to mail $500 in gift cards to a random address, even if they prove they are really themselves.

cousin_it on "Why I Write" by George Orwell (1946)

Orwell is one of my personal heroes, 1984 was a transformative book to me, and I strongly recommend Homage to Catalonia as well.

That said, I'm not sure making theories of art is worth it. Even when great artists do it (Tolkien had a theory of art, and Flannery O'Connor, and almost every artist if you look close enough), it always seems to be the kind of theory which suits that artist and nobody else. Would advice like "good prose is like a windowpane" or "efface your own personality" improve the writing of, say, Hunter S. Thompson? Heck no, his writing is the opposite of that and charming for it! Maybe the only possible advice to an artist is to follow their talent, and advising anything more specific is as likely to hinder as help.

glen-taggart on ProLU: A Nonlinearity for Sparse Autoencoders

Thank you!

That's super cool you've been doing something similar. I'm curious to see what direction you went in. It seemed like there's a large space of possible things to do along these lines. DeepMind also did a similar but different thing here.

What does the distribution of learned biases look like?

That's a great question, something I didn't note in here is that positive biases have no effect on the output of the SAE -- so, if the biases were to be mostly positive that would suggest this approach is missing something. I saved histograms of the biases during training, and they generally look to be mostly (80-99% of bias values I feel like?) negative. I expect the exact distributions vary a good bit depending on L1 coefficient though.

I'll post histograms here shortly. I also have the model weights so I can check in more detail or send you weights if you'd like either of those things.

On a related point, something I considered: since positive biases behave the same as zeros, why not use ProLU where the bias is negative and regular ReLU where the biases are positive? I tried this, and it seemed fine but it didn't seem to make a notable impact on performance. I expect there's some impact, but like a <5% change and I don't know in which direction, so I stuck with the simpler approach. Plus, anyways, most of the bias values tend to be negative.
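The claim that positive biases behave the same as zeros can be checked with a minimal sketch. It assumes ProLU passes the pre-activation `m` through unchanged only when both `m > 0` and `m + b > 0`; that definition is an assumption for illustration and is not spelled out in this comment, and the names `prolu`, `relu`, and `pre_activations` are hypothetical.

```python
def relu(x):
    # Standard ReLU on a scalar.
    return max(x, 0.0)

def prolu(m, b):
    # Assumed ProLU forward pass (an illustration, not the author's code):
    # m passes through only when both m > 0 and m + b > 0.
    return m if (m > 0.0 and m + b > 0.0) else 0.0

pre_activations = [-2.0, -0.5, 0.0, 0.3, 1.0, 4.0]

# With a non-negative bias, m > 0 already implies m + b > 0,
# so the bias never changes the output and ProLU matches ReLU.
for b in (0.0, 0.1, 5.0):
    assert all(prolu(m, b) == relu(m) for m in pre_activations)

# With a negative bias, small positive pre-activations are gated off,
# while large ones pass through unshifted.
print(prolu(0.3, -1.0), relu(0.3))  # 0.0 0.3
print(prolu(4.0, -1.0), relu(4.0))  # 4.0 4.0
```

Under this assumed definition, only negative biases act as a threshold, which is consistent with the observation that most learned bias values end up negative.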

For the STE variant, did you find it better to use the STE approximation for the activation gradient, even though the approximation is only needed for the bias?

I think you're asking whether it's better to use the STE gradient only on the bias term, since the mul () term already has a 'real gradient' defined. If I'm interpreting correctly, I'm pretty sure the answer is yes. I think I tried using the synthetic grads just for the bias term and found that performed significantly worse (I'm also pretty sure I tried the reverse just in case -- and that this did not work well either). I'm definitely confused on what exactly is going on with this. The derivation of these from the STE assumption is the closest thing I have to an explanation and then being like "and you want to derive both gradients from the same assumptions for some reason, so use the STE grads for $m$ too." But this still feels pretty unsatisfying to me, especially when there's so many degrees of freedom in deriving STE grads:

- choice of STE
- I glossed over this, but it seems like maybe we should think of the grads of $\mathrm{Thresh}$ as $\frac{\partial^{*} \mathrm{Thresh}(x)}{\partial x} = k \cdot \mathrm{ST}(x)$ where $k > 0$
  - I think this because $\mathrm{Thresh}(x)^{n} = \mathrm{Thresh}(x)^{m}$ for $n, m > 1$
  - I also see an argument from this that $\mathrm{Thresh}(x)$ should be a term in the partial of $\mathrm{Thresh}$, which is a property I like about taking $\mathrm{Thresh}(x)$ as its own derivative

Another note on the STE grads: I first found these gradients worked empirically, was pretty confused by this, and spent a bunch of time trying to find an intuitive explanation for them, plus trying and failing to find a similar-but-more-sensible thing that works better. Then one night I realized that those exact gradients come pretty nicely from these STE assumptions, and it's the best hypothesis I have for "why this works", but I still feel like I'm missing part of the picture.

I'd be curious if there are situations where the STE-style grads work well in a regular ReLU, but I expect not. I think it's more that there is slack in the optimization problem induced by being unable to optimize directly for L0. I think it might be just that the STE grads with L1 regularization point more in the direction of L0 minimization. I have a little analysis I did supporting this I'll add to the post when I get some time.

lblack on Examples of Highly Counterfactual Discoveries?

It's measuring the volume of points in parameter space with loss $< ϵ$ when $ϵ$ is infinitesimal.

This is slightly tricky because it doesn't restrict itself to bounded parameter spaces,^[1] but you can fix it with a technicality by considering how the volume scales with $ϵ$ instead.

In real networks trained with finite amounts of data, you care about the case where $ϵ$ is small but finite, so this is ultimately inferior to just measuring how many configurations of floating point numbers get loss $< ϵ$ , if you can manage that.

I still think SLT has some neat insights that helped me deconfuse myself about networks.

For example, like lots of people, I used to think you could maybe estimate the volume of basins with loss $< ϵ$ using just the eigenvalues of the Hessian. You can't. At least not in general.

^[1] Like the floating point numbers in a real network, which can only get so large. A prior of finite width over the parameters also effectively bounds the space.
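A toy illustration of the point about Hessian eigenvalues, as a rough sketch rather than anything from SLT itself: the two losses below are made-up examples whose Hessians at the origin are both the zero matrix, yet the volume of the region with loss $< ϵ$ differs between them. The function names, the box $[-1, 1]^2$, and the sample count are all assumptions chosen for the demo.

```python
import random

def loss_quartic(w1, w2):
    # Isolated minimum at the origin; the Hessian there is the zero matrix.
    return w1 ** 4 + w2 ** 4

def loss_valley(w1, w2):
    # Zero loss along both coordinate axes; the Hessian at the origin is also zero.
    return (w1 * w2) ** 2

def volume_below(loss, eps, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the area of {loss < eps} inside the box [-1, 1]^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        w1 = rng.uniform(-1.0, 1.0)
        w2 = rng.uniform(-1.0, 1.0)
        if loss(w1, w2) < eps:
            hits += 1
    return 4.0 * hits / n_samples  # the box has area 4

# Same Hessian eigenvalues at the origin, different low-loss volumes.
for eps in (1e-2, 1e-4):
    print(eps, volume_below(loss_quartic, eps), volume_below(loss_valley, eps))
```

The Hessian at the origin cannot tell these two losses apart, while the estimated volumes (and how they shrink as $ϵ$ decreases) can.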

evan-r-murphy on Bing Chat is blatantly, aggressively misaligned

Thanks, I think you're referring to:

It may still be possible to harness the larger model capabilities without invoking character simulation and these problems, by prompting or fine-tuning the models in some particular careful ways.

There were some ideas proposed in the paper "Conditioning Predictive Models: Risks and Strategies" by Hubinger et al. (2023). But since it was published over a year ago, I'm not sure if anyone has gotten far on investigating those strategies to see which ones could actually work. (I'm not seeing anything like that in the paper's citations.)

eggsyntax on eggsyntax's Shortform

the model isn't optimizing for anything, at training or inference time.

One maybe-useful way to point at that is: the model won't try to steer toward outcomes that would let it be more successful at predicting text.

eggsyntax on eggsyntax's Shortform

And the potential complication of multiple parts and specific applications a tool-oriented system is likely to be in - it'd be very odd if we decided the language processing center of our own brain was independently sentient/sapient separate from the rest of it, and we should resent its exploitation.

Yeah. I think a sentient being built on a purely more capable GPT with no other changes would absolutely have to include scaffolding for eg long-term memory, and then as you say it's difficult to draw boundaries of identity. Although my guess is that over time, more of that scaffolding will be brought into the main system, eg just allowing weight updates at inference time would on its own (potentially) give these systems long-term memory and something much more similar to a persistent identity than current systems.

In a general sense, though, there is an objective that's being optimized for

My quibble is that the trainers are optimizing for an objective, at training time, but the model isn't optimizing for anything, at training or inference time. I feel we're very lucky that this is the path that has worked best so far, because a comparably intelligent model that was optimizing for goals at runtime would be much more likely to be dangerous.

bill-benzon on The first future and the best future

YES.

At the moment the A.I. world is dominated by an almost magical belief in large language models. Yes, they are marvelous, a very powerful technology. By all means, let's understand and develop them. But they aren't the way, the truth and the light. They're just a very powerful and important technology. Heavy investment in them has an opportunity cost: less money to invest in other architectures and ideas.

And I'm not just talking about software, chips, and infrastructure. I'm talking about education and training. It's not good to have a whole cohort of researchers and practitioners who know little or nothing beyond the current orthodoxy about machine learning and LLMs. That kind of mistake is very difficult to correct in the future. Why? Because correcting it means education and training. Who's going to do it if no one knows anything else?

Moreover, in order to exploit LLMs effectively we need to understand how they work. Mechanistic interpretability is one approach. But: We're not doing enough of it. And by itself it won't do the job. People need to know more about language, linguistics, and cognition in order to understand what those models are doing.

matthew-barnett on The first future and the best future

Do you think it's worth slowing down other technologies to ensure that we push for care in how we use them over the benefit of speed? It's true that the stakes are lower for other technologies, but that mostly just means that both the upside potential and the downside risks are lower compared to AI, which doesn't by itself imply that we should go quickly.