LessWrong 2.0 Reader

Recent comments

d0themath on Examples of Highly Counterfactual Discoveries?

I've heard an argument that Mendel was actually counterproductive to the development of genetics. If you actually study peas the way he did, you'll find they don't produce perfect Punnett squares, and from the deviations you can derive recombination effects. The claim is that he fudged his data a little to make it look nicer, and that this held back others from figuring out the topological structure of genotypes.

adam-scherlis on What's up with all the non-Mormons? Weirdly specific universalities across LLMs

I suspect a lot of this has to do with the low temperature.

The phrase "person who is not a member of the Church of Jesus Christ of Latter-day Saints" has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.

Going word by word, "person who is not a member..." is all nice and vague and generic; by the time you get to "a member of the", obvious continuations are "Church" or "Communist Party"; by the time you have "the Church of", "England" is a pretty likely continuation. Why Mormons though?

"Since 2018, the LDS Church has emphasized a desire for its members be referred to as "members of The Church of Jesus Christ of Latter-day Saints"." --Wikipedia

And there just aren't that many other likely continuations of the low-temperature-attracting phrase "members of the Church of".

(While "member of the Communist Party" is an infamous phrase from McCarthyism.)

If I'm right, sampling at temperature 1 should produce a much more representative set of definitions.
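To make the temperature argument concrete, here is a minimal sketch of standard temperature sampling: logits are divided by the temperature before the softmax, so low temperatures pile almost all probability onto the single most likely next token. The token list and logit values are purely hypothetical illustrations, not numbers from any real model.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, dividing by the temperature first."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token logits after "a member of the" (illustrative numbers only)
tokens = ["Church", "Communist", "team", "family"]
logits = [2.0, 1.5, 1.0, 0.5]

for t in (0.2, 1.0):
    # Low temperature concentrates mass on the top token; t = 1 leaves the model's distribution unchanged
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

With these made-up logits, the top token gets over 90% of the probability at temperature 0.2 but under half at temperature 1, which is the "obvious next word" bias the comment describes.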

faul_sname on Will_Pearson's Shortform

Can you give a concrete example of a situation where you'd expect this sort of agreed-upon-by-multiple-parties code to be run, and what that code would be responsible for doing? I'm imagining something along the lines of "given a geographic boundary, determine which jurisdictions that boundary intersects for the purposes of various types of tax (sales, property, etc)". But I don't know if that's wildly off from what you're imagining.

tailcalled on David Udell's Shortform

The tricky part is, on the margin I would probably use various shortcuts, and it's not clear where those shortcuts end short of just getting knowledge beamed into my head.

I already use LLMs to tell me facts, explain things I'm unfamiliar with, handle tedious calculations/coding, generate simulated data/brainstorming and summarize things. Not much, because LLMs are pretty bad, but I do use them for this and I would use them more on the margin.

creon-levit on Mapping the semantic void: Strange goings-on in GPT embedding spaces

You said "there are too few strictly-orthogonal directions, so we need to cram things in somehow."

I don't think that's true. That is a low-dimensional intuition that does not translate to high dimensions. It may be "strictly" true if you want the vectors to be exactly orthogonal, but such perfect orthogonality is unnecessary. See e.g. papers that discuss "the linearity hypothesis" in deep learning.

As a previous poster pointed out (and as Richard Hamming pointed out long ago) "almost any pair of random vectors in high-dimensional space are almost-orthogonal." And almost orthogonal is good enough.

(When we say "random vectors in high-dimensional space" we mean they can be drawn from any distribution roughly centered at the origin: uniformly in a hyperball, uniformly from the surface of a hypersphere, uniformly in a hypercube, random vertices of a hypercube, a multivariate Gaussian, or a convex hyper-potato...)

You can check this numerically, and prove it analytically for many well-behaved distributions.

One useful thought experiment is to consider the hypercube centered at the origin whose vertices all have coordinates ±1. A random vertex is a long random vector that looks like {±1, ±1, ..., ±1}, where each coordinate is independently +1 or -1 with probability 50%.

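What is the expected value of the dot product of a pair of such random vertex vectors? It is exactly zero, and the dot product is almost always close to zero relative to the vectors' squared length n: it is a sum of n independent ±1 terms, so its standard deviation is only √n, and the cosine similarity is typically on the order of 1/√n.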
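A quick numerical check of the hypercube example, as a minimal sketch using only the standard library: the dot product of two random ±1 vectors is a sum of n independent ±1 terms, so it has mean 0 and standard deviation √n. Since each vertex has norm √n, the cosine is dot / n and should shrink roughly like 1/√n. The helper names are illustrative, not from the original comment.

```python
import math
import random

def random_vertex(n, rng):
    """Random hypercube vertex: each coordinate is independently +1 or -1 with probability 1/2."""
    return [rng.choice((-1, 1)) for _ in range(n)]

def vertex_cosine(u, v):
    """Cosine similarity of two hypercube vertices; both have norm sqrt(n), so it equals dot / n."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def mean_abs_cosine(n, pairs, rng):
    """Average |cosine| over `pairs` independent pairs of random vertices in n dimensions."""
    total = 0.0
    for _ in range(pairs):
        total += abs(vertex_cosine(random_vertex(n, rng), random_vertex(n, rng)))
    return total / pairs

rng = random.Random(0)
for n in (10, 100, 1_000, 10_000):
    # Observed average |cosine| next to the 1/sqrt(n) scale implied by the dot product's std of sqrt(n)
    print(n, round(mean_abs_cosine(n, 100, rng), 4), round(1 / math.sqrt(n), 4))
```

The two columns should track each other (the average |cosine| sits somewhat below 1/√n), and both fall toward zero as the dimension grows.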

There is an exponential number of almost-orthogonal directions in high dimensions. The hypercube vertices are just an easy example to work out analytically, but the same phenomenon occurs for many distributions, in particular hyperballs, hyperspheres, and Gaussians.

The hypercube example above, BTW, corresponds to one-bit quantization of the embedding vector space dimensions. It often works surprisingly well (see also "locality-sensitive hashing").
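As a related illustration, here is a minimal sketch of random-hyperplane locality-sensitive hashing (SimHash), a close cousin of one-bit quantization: each bit records which side of a random hyperplane a vector lies on. For Gaussian random hyperplanes, the probability that two vectors get different bits equals their angle divided by π, so the fraction of differing bits estimates the angle. Quantizing raw coordinates to their signs is the special case where the hyperplane normals are the coordinate axes, which only approximately preserves angles. The dimensions, bit count, and noise level below are arbitrary choices for illustration.

```python
import math
import random

def random_hyperplanes(num_bits, dim, rng):
    """Normals of random hyperplanes through the origin, with i.i.d. Gaussian coordinates."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def sign_hash(vec, planes):
    """One bit per hyperplane: 1 if the vector is on the positive side, else 0."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes]

def angle(u, v):
    """Angle between two vectors in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def hamming_fraction(h1, h2):
    """Fraction of positions where two bit lists differ."""
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)

rng = random.Random(0)
dim, num_bits = 64, 2000
u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
# v is a noisy copy of u, so the two vectors are correlated but not identical
v = [a + 0.5 * rng.gauss(0.0, 1.0) for a in u]
planes = random_hyperplanes(num_bits, dim, rng)
# The bit-disagreement rate should be close to angle / pi
print(hamming_fraction(sign_hash(u, planes), sign_hash(v, planes)), angle(u, v) / math.pi)
```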

This point that Hamming made (and he was probably not the first) lies close to the heart of all embedding-space-based learning systems.

faul_sname on Planning in a Lattice Graph

Fun side note: in this particular example, it doesn't actually matter how you pick your direction. "Choose the axis closest to the target direction" performs exactly as well as "choose any edge which does not make the target node unreachable when traversed at random, and then traverse that edge" or "choose the first edge where traversing that edge does not make the target node unreachable, and traverse that edge".
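A minimal sketch of why the policies tie, under an assumption the comment does not spell out: the lattice is directed, and every edge increases exactly one coordinate by 1. The target is then reachable exactly when it is at least the current position in every coordinate, and every path that keeps it reachable has length equal to the sum of the remaining gaps, no matter which valid edge is chosen. All function names here are hypothetical.

```python
import random

def reachable(pos, target):
    """Assumed directed lattice: the target is reachable iff it is >= pos in every coordinate."""
    return all(t >= p for p, t in zip(pos, target))

def valid_steps(pos, target):
    """Axes along which a +1 step keeps the target reachable."""
    return [i for i, (p, t) in enumerate(zip(pos, target)) if p < t]

def walk(start, target, choose):
    """Follow a step-choosing policy until the target is reached; return the number of steps."""
    pos = list(start)
    steps = 0
    while tuple(pos) != tuple(target):
        axis = choose(pos, target, valid_steps(pos, target))
        pos[axis] += 1
        steps += 1
        assert reachable(pos, target)  # valid policies never strand us
    return steps

def closest_axis(pos, target, options):
    # Largest remaining gap = axis with the highest cosine to the target direction
    return max(options, key=lambda i: target[i] - pos[i])

def first_valid(pos, target, options):
    return options[0]

rng = random.Random(0)

def random_valid(pos, target, options):
    return rng.choice(options)

start, target = (0, 0, 0), (3, 5, 2)
for policy in (closest_axis, first_valid, random_valid):
    print(policy.__name__, walk(start, target, policy))
```

All three policies take 3 + 5 + 2 = 10 steps here; on an undirected lattice, where no edge makes the target unreachable, the random policy would no longer match.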

cubefox on Priors and Prejudice

Without an account of that, IBE is the claim that something being the best available explanation is evidence that it is true.

That being said, we typically judge the goodness of a possible explanation by a number of explanatory virtues like simplicity, empirical fit, consistency, internal coherence, external coherence (with other theories), consilience, unification etc. To clarify and justify those virtues on other (including Bayesian) grounds is something epistemologists work on.

nathan-young on 1-page outline of Carlsmith's otherness and control series

I sort of don't think it hangs together that well as a series. Like, I think it implies a lot more interesting points than it makes, hence my reordering.

davidmanheim on Paul Christiano named as US AI Safety Institute Head of AI Safety

The OP claimed it was a failure of BSL levels that made biorisk a cause area, and I said that was a confused claim. Feel free to find someone who disagrees with me here, but the proximate causes of EAs worrying about biorisk have nothing to do with BSL lab designations. It's not BSL levels that failed in allowing things like the Soviet bioweapons program, or that led to the underfunded and largely unenforceable BWC, or to the way newer technologies are lowering the barriers for terrorists and others to pursue bioweapons.

nathan-helm-burger on Open Thread Spring 2024

I've been using a remineralization toothpaste imported from Japan for several years now, ever since I mentioned reading about remineralization to a dentist from Japan. She recommended the brand to me. The FDA is apparently bogging down its release in the US, but it's available on Amazon anyway. It seems to have slowed, but not stopped, the formation of cavities. It does seem to result in faster plaque build-up around my gumline, as if the bacterial colonies are accumulating some of the minerals not absorbed by the teeth. The brand I use is Apagard.
