LessWrong 2.0 Reader

View: New · Old · Top

next page (older posts) →

next page (older posts) →

Recent comments

interstice on Is being a trans woman +20 IQ?

performance gap of trans women over women

The post is about the performance gap of trans women over men, not women.

leon-lang on Examples of Highly Counterfactual Discoveries?

I guess (but don't know) that most people who downvote Garrett's comment overupdated on intuitive explanations of singular learning theory, not realizing that entire books with novel and nontrivial mathematical theory have been written on it.

eggsyntax on eggsyntax's Shortform

Maybe by the time we cotton on properly, they're somewhere past us at the top end.

Great point. I agree that there are lots of possible futures where that happens. I'm imagining a couple of possible cases where this would matter:

Humanity decides to stop AI capabilities development or slow it way down, so we have sub-ASI systems for a long time (which could be at various levels of intelligence, from current to ~human). I'm not too optimistic about this happening, but there's certainly been a lot of increasing AI governance momentum in the last year.
Alignment is sufficiently solved that even > AGI systems are under our control. On many alignment approaches, this wouldn't necessarily mean that those systems' preferences were taken into account.

We can't "just ask" an LLM about its interests and expect the answer to soundly reflect its actual interests.

I agree entirely. I'm imagining (though I could sure be wrong!) that any future systems which were sentient would be ones that had something more like a coherent, persistent identity, and were trying to achieve goals.

LLMs specifically have a 'drive' to generate reasonable-sounding text

(not very important to the discussion, feel free to ignore, but) I would quibble with this. In my view LLMs aren't well-modeled as having goals or drives. Instead, generating distributions over tokens is just something they do in a fairly straightforward way because of how they've been shaped (in fact the only thing they do or can do), and producing reasonable text is an artifact of how we choose to use them (ie picking a likely output, adding it onto the context, and running it again). Simulacra like the assistant character can be reasonably viewed (to a limited degree) as being goal-ish, but I think the network itself can't.

That may be overly pedantic, and I don't feel like I'm articulating it very well, but the distinction seems useful to me since some other types of AI are well-modeled as having goals or drives.

tailcalled on Examples of Highly Counterfactual Discoveries?

Newton's Universal Law of Gravitation was the first highly accurate model of things falling down that generalized beyond the earth, and it is also the second-most computationally applicable model of things falling down that we have today.

Are you saying that singular learning theory was the first highly accurate model of breadth of optima, and that it's one of the most computationally applicable ones we have?

johannes-c-mayer on Johannes C. Mayer's Shortform

The point is that you are just given some graph. This graph is expected to have subgraphs which are lattice graphs. But you don't know where they are. And the graph is so big that you can't iterate the entire graph to find these lattices. Therefore you need a way to embed the graph without traversing it fully.

johannes-c-mayer on Johannes C. Mayer's Shortform

—The realization that I have a systematic distortion in my mental evaluation of plans, making actions seem less promising than they are. When I’m deciding whether to do stuff, I can apply a conscious correction to this, to arrive at a properly calibrated judgment.

—The realization that, in general, my thinking can have systematic distortions, and that I shouldn’t believe everything I think. This is basic less-wrong style rationalism, but it took years to work through all the actual consequences on me.

This is useful. Now that I think about it, I do this. Specifically, I have extremely unrealistic assumptions about how much I can do, such that these are impossible to accomplish. And then I feel bad for not accomplishing the thing.

I haven't tried to be mindful of that. The problem is that this is I think mainly subconscious. I don't think things like "I am dumb" or "I am a failure" basically at all. At least not in explicit language. I might have accidentally suppressed these and thought I had now succeeded in not being harsh to myself. But maybe I only moved it to the subconscious level where it is harder to debug.

johannes-c-mayer on Planning in a Lattice Graph

I might not understand exactly what you are saying. Are you saying that the problem is easy when you have a function that gives you the coordinates of an arbitrary node? Isn't that exactly the embedding function? So are you not therefore assuming that you have an embedding function?

I agree that once you have such a function the problem is easy, but I am confused about how you are getting that function in the first place. If you are not given it, then I don't think it is super easy to get.

In the OP I was assuming that I have that function, but I was saying that this is not a valid assumption in general. You can imagine you are just given a set of vertices and edges. Now you want to compute the embedding such that you can do the vector planning described in the article.

I agree that you probably can do better than though. I don't understand how your proposal helps though.

no77e-noi on The first future and the best future

From a purely utilitarian standpoint, I'm inclined to think that the cost of delaying is dwarfed by the number of future lives saved by getting a better outcome, assuming that delaying does increase the chance of a better future.

That said, after we know there's "no chance" of extinction risk, I don't think delaying would likely yield better future outcomes. On the contrary, I suspect getting the coordination necessary to delay means it's likely that we're giving up freedoms in a way that may reduce the value of the median future and increase the chance of stuff like totalitarian lock-in, which decreases the value of the average future overall.

I think you're correct that there's also to balance the "other existential risks exist" consideration in the calculation, although I don't expect it to be clear-cut.

oliver-daniels-koch on Benchmarks for Detecting Measurement Tampering [Redwood Research]

looking at your code - seems like there's an option for next-token prediction in the initial finetuning state, but no mention (that I can find) in the paper - am I correct in assuming the next token prediction weight was set to 0? (apologies for bugging you on this stuff!)

quetzal_rainbow on Is being a trans woman +20 IQ?

Whoops, it's really looks like I imagined this claim to be backed more than by one SSC post. In my defense I say that this poll covered really existing thing like abnormal illusions processing in schizophrenics (see "Systematic review of visual illusions schizophrenia" Costa et al., 2023) and I think it's overall plausible.

My general objections stays the same: there is a bazillion sources on brain differences in transgender individuals, transgenderism is likely to be a brain anomaly, we don't need to invoke "testosterone damage" hypothesis.

LessWrong 2.0 Reader

Archive

Recent comments