LessWrong 2.0 Reader
Yeah, that's my best guess. I have other memories from that period (which was late into the hour), so I think it was the drug wearing off, rather than learning effects.
chris_leong on Does reducing the amount of RL for a given capability level make AI safer?Oh, this is a fascinating perspective.
So most uses of RL already use only a small amount of RL.
So if the goal was "only use a little bit of RL", that's already happening.
Hmm... I still wonder if using even less RL would be safer still.
michaeldickens on Which skincare products are evidence-based?What do you think is the strongest evidence on sunscreen? I've read mixed things on its effectiveness.
carl-feynman on yanni's ShortformThis question is two steps removed from reality. Here’s what I mean by that. Putting brackets around each of the two steps:
what is the threshold that needs meeting [for the majority of people in the EA community] [to say something like] "it would be better if EAs didn't work at OpenAI"?
Without these steps, the question becomes
What is the threshold that needs meeting before it would be better if people didn’t work at OpenAI?
Personally, I find that a more interesting question. Is there a reason why the question is phrased at two removes like that? Or am I missing the point?
d0themath on D0TheMath's ShortformDoes the possibility of China or Russia being able to steal advanced AI from labs increase or decrease the chances of great power conflict?
An argument against: it counter-intuitively decreases the chances. Why? For the same reason that a functioning US ICBM defense system would be a destabilizing influence on the MAD equilibrium. In the ICBM defense case, once the shield is up, America's enemies would have no credible threat of retaliation if the US were to launch a first strike. Therefore, there would be no geopolitical reason for America not to launch a first strike, and there would be quite a strong reason to launch one: namely, the shield definitely works against the present crop of ICBMs, but may not work against future ICBMs. America's enemies will therefore assume that once the shield is up, America will launch a first strike, and they will seek to gain the advantage while they still have a chance by launching a pre-emptive first strike of their own.
The same logic works in reverse. If Russia were building an ICBM defense shield, and would likely complete it within the year, we would feel very scared about what would happen once that shield was up.
And the same logic works for other irrecoverably large technological leaps in war. If the US is on the brink of developing highly militarily capable AIs, China will fear what the US will do with them (imagine if the tables were turned: would you feel safe with Anthropic & OpenAI in China, and DeepMind in Russia?), so if they don't get their own versions they'll feel mounting pressure to secure their geopolitical objectives while they still can, or otherwise make themselves less subject to the threat of AI (would you not wish the US would sabotage the Chinese Anthropic & OpenAI by whatever means if China seemed on the brink?). The faster the development, the quicker the pressure will mount, and the sloppier & rasher China's responses will be. If it's easy for China to copy our AI technology, then the pressure mounts much more slowly.
bogdan-ionut-cirstea on Bogdan Ionut Cirstea's ShortformAlso, TC0 is very much limited, see e.g. this presentation.
bogdan-ionut-cirstea on Bogdan Ionut Cirstea's ShortformI think I remember William Merrill (in a video) pointing out that the rational inputs assumption seems very unrealistic (would require infinite memory); and, from what I remember, https://arxiv.org/abs/2404.15758 and related papers made a different assumption about the number of bits of memory per parameter and per input.
lawrencec on yanni's ShortformWhat does a "majority of the EA community" mean here? Does it mean that people who work at OAI (even on superalignment or preparedness) are shunned from professional EA events? Does it mean that when they ask, people tell them not to join OAI? And who counts as "in the EA community"?
I don't think it's that constructive to bar people from all or even most EA events just because they work at OAI, even if there's a decent amount of consensus that people should not work there. Of course, it's fine to host events (even professional ones!) that don't invite OAI people (or Anthropic people, or METR people, or FAR AI people, etc.), and they do happen. But I don't feel like barring people from EAG or e.g. Constellation just because they work at OAI would help make the case (not that there's any chance of this happening in the near term), and it would most likely backfire.
I think that currently, many people (at least in the Berkeley EA/AIS community) will tell you to not join OAI if asked. I'm not sure if they form a majority in terms of absolute numbers, but they're at least a majority in some professional circles (e.g. both most people at FAR/FAR Labs and at Lightcone/Lighthaven would probably say this). I also think many people would say that on the margin, too many people are trying to join OAI rather than other important jobs. (Due to factors like OAI paying a lot more than non-scaling lab jobs/having more legible prestige.)
Empirically, it sure seems significantly more people around here join Anthropic than OAI, despite Anthropic being a significantly smaller company.
Though I think almost none of these people would advocate for ~0 x-risk motivated people to work at OAI, only that the marginal x-risk concerned technical person should not work at OAI.
What specific actions are you hoping for here, that would cause you to say "yes, the majority of EA people say 'it's better to not work at OAI'"?
benito on LessOnline Festival Updates ThreadLinkpost for: https://pbement.com/posts/threads.html
Today's interesting number is 961.
Say you're writing a CUDA program and you need to accomplish some task for every element of a long array. Well, the classical way to do this is to divide up the job amongst several different threads and let each thread do a part of the array. (We'll ignore blocks for simplicity, maybe each block has its own array to work on or something.) The method here is as follows:
for (int i = threadIdx.x; i < array_len; i += blockDim.x) {
    // each thread starts at its own index and strides by the number of threads
    arr[i] = ...;
}
So the threads make the following pattern (if there are t=8 threads):
⬛🟫🟥🟧🟨🟩🟦🟪⬛🟫🟥🟧🟨🟩🟦🟪⬛🟫🟥🟧🟨🟩🟦🟪⬛🟫
This is for an array of length l=26. We can see that the work is split as evenly as possible between the threads, except that threads 0 and 1 (black and brown) have to process the last two elements of the array while the rest of the threads have finished their work and remain idle. This is unavoidable because we can't guarantee that the length of the array is a multiple of the number of threads. But this only happens at the tail end of the array, and for a large number of elements, the wasted effort becomes a very small fraction of the total. In any case, each thread will loop ⌈l/t⌉=⌈26/8⌉=4 times, though it may be idle during the last loop while it waits for the other threads to catch up.
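For concreteness, here is a minimal sketch of a full kernel and launch using this interleaved pattern. The fill operation (squaring the index), the managed-memory allocation, and the single-block launch are my own illustrative assumptions, not something specified in the post.

#include <cstdio>

// Interleaved assignment: thread k handles elements k, k + blockDim.x, k + 2*blockDim.x, ...
__global__ void fill_interleaved(float *arr, int array_len) {
    for (int i = threadIdx.x; i < array_len; i += blockDim.x) {
        arr[i] = (float)i * (float)i;   // placeholder work: square the index
    }
}

int main() {
    const int t = 8, l = 26;            // the example from the post: 8 threads, 26 elements
    float *arr;
    cudaMallocManaged(&arr, l * sizeof(float));
    fill_interleaved<<<1, t>>>(arr, l); // single block, as the post assumes
    cudaDeviceSynchronize();
    printf("arr[25] = %g\n", arr[25]);
    cudaFree(arr);
    return 0;
}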
One may be able to spend many happy hours programming the GPU this way before running into a question: What if we want each thread to operate on a contiguous area of memory? (In most cases, we don't want this.) In the previous method (which is the canonical one), the parts of the array that each thread worked on were interleaved with each other. Now we run into a scenario where, for some reason, the threads must operate on contiguous chunks. "No problem," you say, "we simply need to break the array into chunks and give a chunk to each thread."
const int chunksz = (array_len + blockDim.x - 1)/blockDim.x;
for (int i = threadIdx.x*chunksz; i < (threadIdx.x + 1)*chunksz; i++) {
if (i < array_len) {
arr[i] = ...;
}
}
If we size the chunks at 3 items, that won't be enough, so again we need ⌈l/t⌉=4 items per chunk. Here is the result:
⬛⬛⬛⬛🟫🟫🟫🟫🟥🟥🟥🟥🟧🟧🟧🟧🟨🟨🟨🟨🟩🟩🟩🟩🟦🟦
Beautiful. Except you may have noticed something missing. There are no purple squares. Though thread 6 is a little lazy and doing 2 items instead of 4, thread 7 is doing absolutely nothing! It's somehow managed to fall off the end of the array.
Unavoidably, some threads must be idle for ⌈l/t⌉t−l=6 loops. This is the conserved total amount of idleness. With the first method, the idleness is spread out across the threads. Mathematically, the amount of idleness can be no greater than t−1 regardless of array length and thread number, and so each thread will be idle for at most 1 loop. But in the contiguous method, the idleness is concentrated in the last threads. There is nothing mathematically impossible about ⌈l/t⌉t−l being as big as ⌈l/t⌉ or bigger, and so it's possible for an entire thread to remain unused. Multiple threads, even. E.g. take l=9:
⬛⬛🟫🟫🟥🟥🟧🟧🟨
3 full threads are unused there! Practically, this shouldn't actually be a problem, though. The number of serial loops is still the same, and the total number of idle loops is still the same. It's just distributed differently. The reasons to prefer the interleaved method to the contiguous method would be related to memory coalescing or bank conflicts. The issue of unused threads would be unimportant.
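To make the bookkeeping above concrete, here is a small host-side sketch (the helper names are mine, not the post's) that computes each thread's idle-loop count under both schemes, following the loop structures shown earlier.

#include <cstdio>

// Number of serial loops every thread performs: ceil(l / t).
int loops(int l, int t) { return (l + t - 1) / t; }

// Idle loops for thread k under the interleaved scheme:
// thread k is idle only in the last loop, iff l mod t != 0 and k >= l mod t.
int idle_interleaved(int l, int t, int k) {
    int rem = l % t;
    return (rem != 0 && k >= rem) ? 1 : 0;
}

// Idle loops for thread k under the contiguous scheme:
// thread k owns elements [k*chunk, (k+1)*chunk), clipped to the array.
int idle_contiguous(int l, int t, int k) {
    int chunk = loops(l, t);
    int start = k * chunk, end = (k + 1) * chunk;
    int owned = (l > start) ? ((l < end ? l : end) - start) : 0;
    return chunk - owned;               // iterations this thread spends doing nothing
}

int main() {
    int l = 26, t = 8;                  // the example from the post
    for (int k = 0; k < t; k++)
        printf("thread %d: interleaved idle %d, contiguous idle %d\n",
               k, idle_interleaved(l, t, k), idle_contiguous(l, t, k));
    return 0;
}

For l=26 and t=8 this reproduces the two diagrams: 6 idle loops in total either way, spread as at most one per thread in the interleaved case and concentrated on threads 6 and 7 in the contiguous case.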
We don't always run into this effect. If l is a multiple of t, all threads are fully utilized. Also, we can guarantee that there are no unused threads once l exceeds a certain maximal value. Namely, take l=(t−1)²; then ⌈l/t⌉=t−1, so the idleness is t(t−1)−(t−1)²=t−1, which is as large as the chunk size ⌈l/t⌉=t−1, and so the last thread can be entirely unused. But if l is larger than this, then one can show that all threads must be used at least a little bit.
So, if we're using t=32 CUDA threads, then when the array size is 961, the contiguous processing method will leave thread 31 idle. And 961 is the largest array size for which that is true.
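As a quick check of that last claim, here is a sketch that scans array sizes for t=32 and reports the largest l for which the contiguous scheme leaves a thread entirely unused. The scan bound of 10,000 is an arbitrary assumption of mine; the post's own argument shows nothing above (t−1)² can qualify.

#include <cstdio>

int main() {
    const int t = 32;
    int largest = -1;
    // The last thread is completely unused iff its chunk starts at or past
    // the end of the array: (t - 1) * ceil(l / t) >= l.
    for (int l = 1; l <= 10000; l++) {
        int chunk = (l + t - 1) / t;
        if ((t - 1) * chunk >= l) largest = l;
    }
    printf("largest l with an unused thread: %d\n", largest);  // expected: 961 = (t-1)^2
    return 0;
}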