benwr

Posts

Information throughput of biological humans and frontier LLMs 2025-02-22T07:15:45.457Z

Biological humans collectively exert at most 400 gigabits/s of control over the world. 2025-02-20T23:44:06.509Z

Not all capabilities will be created equal: focus on strategically superhuman agents 2025-02-13T01:24:46.084Z

Bounty for Evidence on Some of Palisade Research's Beliefs 2024-09-23T20:01:20.917Z

11 diceware words is enough 2024-02-15T00:13:43.420Z

What policies have most thoroughly crippled (otherwise-promising) industries or technologies? 2022-12-27T02:25:44.376Z

A Litany Missing from the Canon 2022-06-17T01:39:22.086Z

Sneaking Suspicion 2022-05-27T19:17:34.214Z

Entropy isn't sufficient to measure password strength 2022-01-17T06:41:18.073Z

Do you do weekly or daily reviews? What are they like? 2019-08-05T01:23:43.351Z

benwr's unpolished thoughts 2019-07-29T02:18:14.366Z

Why I've started using NoScript 2019-05-15T21:32:20.415Z

Usernames in RSS feeds 2018-02-19T02:22:26.749Z

Comments

Comment by benwr on Not all capabilities will be created equal: focus on strategically superhuman agents · 2025-04-16T21:45:31.009Z · LW · GW

I think it seems like a fine possibility in principle, actually; sorry to have given the wrong impression! It's not my central hope, since strategy-stealing seems like it should make many human-augmentations "available" to AI systems as well. This is notably not true for things involving, e.g., BCIs or superbabies.

Comment by benwr on Not all capabilities will be created equal: focus on strategically superhuman agents · 2025-04-15T20:20:49.316Z · LW · GW

When I'm thinking about this, it seems kind of fine if the goalposts move - human strategic capacity will certainly move over time no matter what, right? Like, someone invented crowdfunding and suddenly we could do types of coordination that we previously couldn't do.

Comment by benwr on Biological humans collectively exert at most 400 gigabits/s of control over the world. · 2025-02-20T23:50:11.150Z · LW · GW

Nate Soares points out that the first paragraph is not quite right: Imagine writing a program that somehow implements an aligned superintelligence, giving it as an objective, "maximize utility according to the person who pressed the 'go' button", and pressing the 'go' button.

There's some sense in which, by virtue of existing in the world, you're already kind of "lucky" by this metric: It can take a finite amount of information to instantiate an agent that takes unbounded actions on your behalf.

Comment by benwr on benwr's unpolished thoughts · 2025-02-20T20:02:39.287Z · LW · GW

I asked Deep Research to see if there are existing treatments of this basic idea in the literature. It seems most closely related to the concept of "empowerment" in RL, which I'm surprised I hadn't heard of: https://en.m.wikipedia.org/wiki/Empowerment_(artificial_intelligence)

The Wikipedia article makes it seem like this might also be how RL people think about instrumental convergence?
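For intuition about the "empowerment" link: when the environment is deterministic, n-step empowerment (the channel capacity from n-step action sequences to the resulting state) reduces to log2 of the number of distinct states reachable in n steps, because a deterministic channel's capacity is the log of its number of distinct outputs. The sketch below computes that for a hypothetical 1-D corridor. The environment, action set, and function names are illustrative assumptions, not taken from the Wikipedia article.

```python
import math
from itertools import product

# Hypothetical toy environment: a corridor of cells 0..size-1.
# Actions: -1 (left), 0 (stay), +1 (right); moves into a wall are clamped.
ACTIONS = (-1, 0, 1)

def step(state: int, action: int, size: int) -> int:
    """Deterministic transition: move by `action`, clamped to the corridor."""
    return min(max(state + action, 0), size - 1)

def empowerment(state: int, n: int, size: int) -> float:
    """n-step empowerment in bits, valid only for deterministic dynamics.

    The channel from action sequences to final states is deterministic, so its
    capacity is log2(number of distinct reachable final states).
    """
    reachable = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a, size)
        reachable.add(s)
    return math.log2(len(reachable))

# In the middle of the corridor, 2 steps reach 5 cells: log2(5) bits.
print(round(empowerment(state=5, n=2, size=11), 2))  # 2.32
# Against a wall only 3 cells are reachable, so empowerment is lower.
print(round(empowerment(state=0, n=2, size=11), 2))  # 1.58
```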

Comment by benwr on benwr's unpolished thoughts · 2025-02-20T09:26:31.538Z · LW · GW

Human information throughput is allegedly only about 10-50 bits per second. This implies an interesting upper bound, in that the information throughput of biological humanity as a whole can't be higher than around 50 * 10^10 = 500Gbit/s. I.e., if all distinguishable actions made by humans were perfectly independent, biological humanity as a whole would have at most 500Gbit/s of "steering power".
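The bound is just per-person throughput times population, assuming individual channels add with no overlap. A quick check in Python, using the comment's round 10^10 and, as an assumption not stated in the comment, a world population of about 8 billion, which reproduces the 400 Gbit/s figure in the post title:

```python
# Upper bound on humanity's aggregate "steering power", assuming every person's
# distinguishable actions are fully independent (the comment's best case).
BITS_PER_SECOND_PER_PERSON = 50   # top of the quoted 10-50 bits/s range
POPULATION_ROUNDED = 10**10       # the comment's round number

def aggregate_throughput_bits(per_person: float, population: int) -> float:
    """Total throughput in bits/s if individual channels simply add."""
    return per_person * population

bound = aggregate_throughput_bits(BITS_PER_SECOND_PER_PERSON, POPULATION_ROUNDED)
print(f"{bound / 1e9:.0f} Gbit/s")  # 500 Gbit/s

# Assumed population of roughly 8 billion (not from the comment).
print(f"{aggregate_throughput_bits(50, 8 * 10**9) / 1e9:.0f} Gbit/s")  # 400 Gbit/s
```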

I need to think more about the idea of "steering power" (e.g. some obvious rough edges around amplifying your steering power using external information processing / decision systems), but I have some intuition that one might actually be able to come up with a not-totally-useless concept that lets us say something like "humanity can't stay in 'meaningful control' if we have an unaligned artificial agent with more steering power than humanity, expressed in bits/s".

Comment by benwr on Not all capabilities will be created equal: focus on strategically superhuman agents · 2025-02-13T21:00:06.953Z · LW · GW

I think you may have missed, or at least not taken literally, at least one of these things in the post:

* The expansion of "superhuman strategic agent" is not "agent that's better than humans at strategic reasoning"; it's "agent that is better than the best groups of humans at taking (situated) strategic action".
* Strategic action is explicitly context-dependent. For example, an AI system inside a mathematically perfect simulated world, which can have no effect on the rest of the physical world and vice versa, has zero strategic power in this sense. Also, e.g., in the FAQ: "Capabilities and controls are relevant to existential risks from agentic AI insofar as they provide or limit situated strategic power." So, yes, an agent that lives on your laptop is only strategically superhuman if it has the resources to actually take strategic action rivaling the most strategically capable groups of humans.
* "Increasingly accurately" is meant to point out that we don't need to understand or limit the capabilities of things that are obviously much strategically worse than us.

Comment by benwr on Not all capabilities will be created equal: focus on strategically superhuman agents · 2025-02-13T01:44:56.837Z · LW · GW

Comment by benwr on benwr's unpolished thoughts · 2025-02-06T14:29:53.914Z · LW · GW

I think it probably makes sense for ~everyone to have an explicit list of "things I'd like AI to do for me", especially around productivity and/or things that could help you with world-saving. If you have a list like this, and we happen to hit a relevant capability threshold before we lose, you should probably stop spending your own time on that thing as quickly as possible.

Comment by benwr on Bounty for Evidence on Some of Palisade Research's Beliefs · 2024-09-24T21:28:23.089Z · LW · GW

Thanks everyone for thoughts so far! I do want to emphasize that we're actually highly interested in collecting even the most "obvious" evidence in favor of or against these ideas. In fact, in many ways we're more interested in the obvious evidence than in reframes or conceptual problems in the ideas here; of course we want to be updating our beliefs, but we also want to get a better understanding of the existing state of concrete evidence on these questions. This is partly because we consider it part of our mission to expand the amount and quality of relevant evidence on these beliefs, and are trying to ensure that we're aware of existing work.

Comment by benwr on benwr's unpolished thoughts · 2024-07-08T05:02:57.400Z · LW · GW

Surprisingly to me, Claude 3.5 Sonnet is much more consistent in its answer! It is still not perfect, but it usually says the same thing (9/10 times it gave the same answer).

Comment by benwr on benwr's unpolished thoughts · 2024-07-07T20:18:49.484Z · LW · GW

From the "obvious-but-maybe-worth-mentioning" file:

ChatGPT (4 and 4o at least) cheats at 20 questions:

If you ask it "Let's play a game of 20 questions. You think of something, and I ask up to 20 questions to figure out what it is.", it will typically claim to "have something in mind", and then appear to play the game with you.

But it doesn't store hidden state between messages, so when it claims to "have something in mind", either that's false, or at least it has no way of following the rule that it's thinking of a consistent thing throughout the game. I.e., its only options are to cheat or to refuse to play.

You can verify this by responding "Actually, I don't have time to play the whole game right now. Can you just tell me what it was you were thinking of?", and then "refreshing" its answer. When I did this 10 times, I got 9 different answers and only one repeat.

Comment by benwr on benwr's unpolished thoughts · 2024-02-29T21:17:02.207Z · LW · GW

Sometimes people use "modulo" to mean something like "depending on", e.g. "seems good, modulo the outcome of that experiment" [correct me ITT if you think they mean something else; I'm not 100% sure]. Does this make sense, assuming the term comes from modular arithmetic?

Like, in modular arithmetic you'd say "5 is 3, modulo 2". It's kind of like saying "5 is the same as 3, if you only consider their relationship to modulus 2". This seems pretty different to the usage I'm wondering about; almost its converse: to import the local English meaning of "modulo", you'd be saying "5 is the same as 3, as long as you've taken their relationship to the modulus 2 into account". This latter statement is false; 5 and 3 are super different even if you've taken this relationship into account.
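In the mathematical sense, "a is b modulo m" means that a and b leave the same remainder on division by m, or equivalently that m divides a - b. A minimal Python illustration of the example in the paragraph above (the helper name is mine):

```python
def congruent(a: int, b: int, m: int) -> bool:
    """True if a ≡ b (mod m), i.e. m divides a - b."""
    return (a - b) % m == 0

print(congruent(5, 3, 2))  # True: both leave remainder 1 when divided by 2
print(congruent(5, 3, 4))  # False: 5 % 4 == 1 but 3 % 4 == 3
print(5 == 3)              # False: outside the mod-2 view they differ
```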

But the sense of the original quote doesn't work with the mathematical meaning: "seems good, if you only consider the outcome of that experiment and nothing else".

Is there a math word that means the thing people want "modulo" to mean?

Comment by benwr on 11 diceware words is enough · 2024-02-16T06:44:40.538Z · LW · GW

Well, not that much, right? If you had an 11-word diceware passphrase to start, each word is about 7 characters on average, so you have maybe 90 places to insert a token - only 6.5 extra bits come from choosing a place to insert your character. And of course you get the same added entropy from inserting a random 3 base32 chars at a random location.
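A rough check of these numbers, assuming (as in the neighboring comment) that the inserted character is a random code point chosen via 4 hex digits, that words average 7 characters, and that words are separated by single spaces:

```python
import math

# Assumptions: 11 words, ~7 characters each, one separator between words.
# The extra token can go into any gap, including both ends.
words, avg_len = 11, 7
length = words * avg_len + (words - 1)   # 87 characters
positions = length + 1                   # 88 insertion points

position_bits = math.log2(positions)     # ~6.46 bits for choosing the spot
unicode_char_bits = 16                   # a random 4-hex-digit code point
base32_bits = 3 * math.log2(32)          # 3 random base32 chars = 15 bits

print(f"position: {position_bits:.2f} bits")
print(f"random code point at random spot: {unicode_char_bits + position_bits:.2f} bits")
print(f"3 base32 chars at random spot: {base32_bits + position_bits:.2f} bits")
```

The two totals differ by one bit, matching the "one fewer bit" remark about base32.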

Happy to grant that a cracker assuming no unicode won't be able to crack your password, but if that's your goal then it might be a bad idea to post about your strategy on the public internet ;)

Comment by benwr on 11 diceware words is enough · 2024-02-15T19:31:00.980Z · LW · GW

Maybe; probably the easiest way to do this is to choose a random 4-digit hexadecimal number, which gives you 16 bits when you enter it as a Unicode code point (e.g. via Ctrl+Shift+U on Linux). But personally I think I'd usually rather just enter those hex digits directly, for the same entropy minus a keystroke. Or, even better, maybe just type a random 3-character base32 string for one fewer bit.
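One way to generate either option with a cryptographically secure source is Python's standard `secrets` module. This is a sketch; the function names are hypothetical, and the base32 alphabet is the RFC 4648 one.

```python
import math
import secrets

BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet

def random_hex_digits(n: int = 4) -> str:
    """n uniformly random hex digits, 4 bits of entropy each."""
    return "".join(secrets.choice("0123456789abcdef") for _ in range(n))

def random_base32(n: int = 3) -> str:
    """n uniformly random base32 characters, 5 bits of entropy each."""
    return "".join(secrets.choice(BASE32_ALPHABET) for _ in range(n))

print(random_hex_digits(), 4 * math.log2(16))  # e.g. "3f9a 16.0"
print(random_base32(), 3 * math.log2(32))      # e.g. "Q7M 15.0"
```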

Comment by benwr on Babble challenge: 50 ways of sending something to the moon · 2023-08-01T11:04:23.455Z · LW · GW

Some thoughts after doing this exercise:

I did the exercise because I couldn't sleep; I didn't keep careful count of the time, and I didn't do it all in one sitting. I'd guess I spent about an hour on it total, but I think there's a case to be made that this was cheating. However, "fresh eyes" is actually a really killer trick when doing this kind of exercise, in my experience, and it's usually available in practice. So I don't feel too bad about it.

I really really dislike the experience of saying things I think are totally stupid, and I currently don't buy that I should start trying to say stupider things. My favorite things in the above list came from refusing to just say another totally stupid thing. Nearly everything in my list is stupid in some way, but the things that are so stupid they don't even feel interesting basically make me feel sad. I trust my first-round aesthetic pruner to actually be helping to train my babbler in constructive directions.

The following don't really feel worth having said, to me:

Throw it really hard
Catapult
Kick it really hard
Wormhole
Nuclear explosion based craft

My favorites didn't come after spewing this stuff; instead they came when I refused to be okay with just saying more of that kind of junk:

Move the thing upward by one foot per day
Name the thing "420 69 Doge To The Moon" and hope Elon takes the bait
The various bogo-send options
Optical tweezers

The difference isn't really that these are less stupid; in fact they're kind of more stupid, practically speaking. But I actually viscerally like them, unlike the first group. Forcing myself to produce things I hate feels like a bad strategy on lots of levels.

Comment by benwr on Babble challenge: 50 ways of sending something to the moon · 2023-08-01T10:32:34.947Z · LW · GW

A thing that was going through my head but I wasn't sure how to turn into a real idea (vulgar language from a movie):

Perhaps you would like me to stop the car and you two can fuck yourselves to Lutsk!

Comment by benwr on Babble challenge: 50 ways of sending something to the moon · 2023-08-01T10:27:16.514Z · LW · GW

Whoa. I also thought of this, though for me it was like thing 24 or something, and I was too embarrassed to actually include it in my post.

Comment by benwr on Babble challenge: 50 ways of sending something to the moon · 2023-08-01T10:17:22.444Z · LW · GW

Hire SpaceX to send it
Bribe an astronaut on the next manned moon mission to bring it with them
Bribe an engineer on the next robotic moon mission to send it with the rover
Get on a manned Mars mission, and throw it out the airlock at just the right speed
Massive evacuated sphere (like a balloon but arbitrarily light), aimed very carefully
Catapult
Send instructions on how to build a copy of the thing, and where to put it, such that an alien race will do it as a gesture of goodwill
Same, but with an incentive of some kind
Same, but do it acausally
Make a miniature moon and put the thing on that
Build an AGI with the goal of putting the thing on the moon with 99% confidence, with minimum impact to other things
Carve the thing out of the moon's surface, using lasers from satellites around Earth
Build a reverse space elevator: the earth is in a luno-stationary orbit due to tidal locking, so you could in principle build an extremely tall tower on the moon's surface that came relatively close to earth. Then, you could lower objects down that tower after launching them a relatively short distance, exchanging them for moonrock ballast.
Quantum-bogo-send it: check to see if the thing has materialized on the moon. If it hasn't, destroy this Everett branch.
Tegmark-1-bogo-send it: check to see if the thing has materialized on the moon. If it hasn't, destroy a large local region of space.
Tegmark-4-bogo-send it: check to see if the thing has materialized on the moon. If it hasn't, derive a logical contradiction
Pray for God to send the thing to the moon
Offer to sell your soul to the devil in exchange for the thing being sent to the moon
Ask everyone on LessWrong to generate 50 ideas each on how to send a thing to the moon, and do the best one
Ask everyone on LessWrong to generate 50 ideas each on how to send a thing to the moon, and do the worst one
Ask everyone on LessWrong to generate 50 ideas each on how to send a thing to the moon, and do all of them
Ask everyone on LessWrong to generate 50 ideas each on how to send a thing to the moon, put all the letters from all the answers into a big bag, and shake it and draw from it repeatedly until you draw a sentence that describes a strategy for sending a thing to the moon, and then do that
Somehow annihilate the earth (except for the thing). The thing will then probably fall to the moon? Probably, figure out whether that's right before annihilating the earth
Pull a Raymond-Smullyan-style "will you answer my next question honestly?" scam on the director of NASA, forcing him to kiss you... er... I mean, send the thing to the moon
Wait until moon tourism is cheap
Start a religion whose central tenets include the belief that this thing being on the moon is a prerequisite for the creation of a universal utopia
Non-reverse-space-elevator: build a space elevator, and then throw the thing off the top when the moon is nearby
Big ol' rocket
Nuclear explosion based craft
Wormhole
Unrealistically-good weather control, allowing you to harness the motion of the molecules in the atmosphere to propel objects however you want via extremely careful placement.
Redefine or reconceptualize "the moon" to mean wherever the thing is already
Redefine or reconceptualize "thing" to mean a thing that's already on the moon
Redefine or reconceptualize "send" to mean keeping the sent thing away from the target
Build an extremely detailed simulation of the moon with the thing on it
Wait for the sun to engulf the earth-moon system, mixing what's-left-of-the-thing up with what's-left-of-the-moon
Propel the earth, "wandering earth"-style, to become a moon of Jupiter. Now at least the thing is on a moon.
Propel the earth, "wandering earth"-style, to collide with the moon, and be sure the thing is located at the point of collision
Throw it really hard
Gun
Put your face between a really big grapefruit and the moon, put the thing in the grapefruit, and then insert a spoon into the grapefruit. When the grapefruit squirts at your face, pull away quickly
Make a popular movie that involves the thing being sent to the moon, in a very memeable way, and hope Elon takes the bait
Name the thing "420 69 Doge To The Moon" and hope Elon takes the bait
So, y'know how you can levitate things in ultrasonic standing waves? Can you do that with light waves on a super small scale? I think you can, and I think I've seen some IBM animation that was made this way? "optical tweezers", was it called? So, do that, with the standing waves slowly drifting up toward the moon
Eh; things seeming to retain a particular identity over time is just a useful fiction - "the thing" next year is just a subset of the causal results of the thing as it is now, not really any more special than any other causal results of the thing as it is now. So since the moon is in the thing's future light cone already, the job is more-or-less already accomplished.
Turn back time to the moment when the parts of the thing were most recently intermixed with the parts of the moon. Maybe the big bang? or maybe some more recent time.
Starting somewhere on the equator, move the thing upward by one foot. Tomorrow, move it up by another foot. Continue until you reach the moon. Surely it's never all that hard to just move the thing one more foot, right?
Kick it really hard
Nanobot swarm
Adult-sized stomp rocket

Comment by benwr on UFO Betting: Put Up or Shut Up · 2023-07-29T07:09:07.420Z · LW · GW

(I've added my $50 to RatsWrong's side of this bet)

Comment by benwr on "Justice, Cherryl." · 2023-07-24T18:10:03.572Z · LW · GW

For contingent evolutionary-psychological reasons, humans are innately biased to prefer "their own" ideas, and in that context, a "principle of charity" can be useful as a corrective heuristic

I claim that the reasons for this bias are, in an important sense, not contingent. I.e., an alien race would almost certainly have similar biases, and the forces in favor of this bias won't entirely disappear in a world with magically different discourse norms (at least as long as speakers' identities are attached to their statements).

As soon as I've said "P", it is the case that my epistemic reputation is bound up with the group's belief in the truth of P. If people later come to believe P, it means that (a) whatever scoring rule we're using to incentivize good predictions in the first place will reward me, and (b) people will update more on things I say in the future.

If you wanted to find convincing evidence for P, I'm now a much better candidate to find that evidence than someone who has instead said "eh; maybe P?" And someone who has said "~P" is similarly well-incentivized to find evidence for ~P.

Comment by benwr on Systems that cannot be unsafe cannot be safe · 2023-05-02T20:46:33.509Z · LW · GW

I would agree more with your rephrased title.

People do actually have a somewhat-shared set of criteria in mind when they talk about whether a thing is safe, though, in a way that they (or at least I) don't when talking about its qwrgzness. e.g., if it kills 99% of life on earth over a ten year period, I'm pretty sure almost everyone would agree that it's unsafe. No further specification work is required. It doesn't seem fundamentally confused to refer to a thing as "unsafe" if you think it might do that.

I do think that some people are clearly talking about meanings of the word "safe" that aren't so clear-cut (e.g. Sam Altman saying GPT-4 is the safest model yet™️), and in those cases I agree that these statements are much closer to "meaningless".

Comment by benwr on Systems that cannot be unsafe cannot be safe · 2023-05-02T20:14:41.069Z · LW · GW

Part of my point is that there is a difference between the fact of the matter and what we know. Some things are safe despite our ignorance, and some are unsafe despite our ignorance.

Comment by benwr on Systems that cannot be unsafe cannot be safe · 2023-05-02T12:13:51.530Z · LW · GW

The issue is that the standards are meant to help achieve systems that are safe in the informal sense. If they don't, they're bad standards. How can you talk about whether a standard is sufficient, if it's incoherent to discuss whether layperson-unsafe systems can pass it?

Comment by benwr on Systems that cannot be unsafe cannot be safe · 2023-05-02T11:59:49.173Z · LW · GW

I don't think it's true that the safety of a thing depends on an explicit standard. There's no explicit standard for whether a grizzly bear is safe. There are only guidelines about how best to interact with them, and information about how grizzly bears typically act. I don't think this implies that it's incoherent to talk about the situations in which a grizzly bear is safe.

Similarly, if I make a simple html web site "without a clear indication about what the system can safely be used for... verification that it passed a relevant standard, and clear instruction that it cannot be used elsewhere", I don't think that's sufficient for it to be considered unsafe.

Sometimes a thing will reliably cause serious harm to people who interact with it. It seems to me that this is sufficient for it to be called unsafe. Sometimes a thing will reliably cause no harm, and that seems sufficient for it to be considered safe. Knowledge of whether a thing is safe or not is a different question, and there are edge cases where a thing might occasionally cause minor harm. But I think the requirement you lay out is too stringent.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-22T22:25:27.903Z · LW · GW

I think I agree that this isn't a good explicit rule of thumb, and I somewhat regret how I put this.

But it's also true that a belief in someone's good-faith engagement (including an onlooker's), and in particular their openness to honest reconsideration, is an important factor in the motivational calculus, and for good reasons.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-18T04:00:52.100Z · LW · GW

I think it's pretty rough for me to engage with you here, because you seem to be consistently failing to read the things I've written. I did not say it was low-effort. I said that it was possible. Separately, you seem to think that I owe you something that I just definitely do not owe you. For the moment, I don't care whether you think I'm arguing in bad faith; at least I'm reading what you've written.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-18T01:55:12.610Z · LW · GW

Nor should I, unless I believe that someone somewhere might honestly reconsider their position based on such an attempt. So far my guess is that you're not saying that you expect to honestly reconsider your position, and Said certainly isn't. If that's wrong then let me know! I don't make a habit of starting doomed projects.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-16T04:59:41.037Z · LW · GW

I'm not sure what you mean - as far as I can tell, I'm the one who suggested trying to rephrase the insulting comment, and in my world Said roughly agreed with me about its infeasibility in his response, since it's not going to be possible for me to prove either point: Any rephrasing I give will elicit objections on both semantics-relative-to-Said and Said-generatability grounds, and readers who believe Said will go on believing him, while readers who disbelieve will go on disbelieving.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-15T04:14:44.950Z · LW · GW

By that measure, my comment does not qualify as an insult. (And indeed, as it happens, I wouldn’t call it “an insult”; but “insulting” is slightly different in connotation, I think. Either way, I don’t think that my comment may fairly be said to have these qualities which you list.

I think I disagree that your comment does not have these qualities in some measure, and they are roughly what I'm objecting to when I ask that people not be insulting. I don't think I want you to never say anything with an unflattering implication, though I do think this is usually best avoided as well. I'm hopeful that this is a crux, as it might explain some of the other conversation I've seen about the extent to which you can predict people's perception of rudeness.

There are of course more insulting ways you could have conveyed the same meaning. But there are also less insulting ways (when considering the extent to which the comment emphasizes the unflatteringness and the call to action that I'm suggesting readers will infer).

Certainly there’s no “call to non-belief-based action”…!)

I believe that none was intended, but I also expect that people (mostly subconsciously!) read a very small one into the particular choice of words and phrasing, where the action is something like "you should scorn this person", and not just "this person has unflattering quality X". The latter does not imply the former.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-15T03:48:08.683Z · LW · GW

For what it's worth, I don't think that one should never say insulting things. I think that people should avoid saying insulting things in certain contexts, and that LessWrong comments are one such context.

I find it hard to square your claim that insultingness was not the comment's purpose with the claim that it cannot be rewritten to elide the insult.

An insult is not simply a statement with a meaning that is unflattering to its target - it involves using words in a way that aggressively emphasizes the unflatteringness and suggests, to some extent, a call to non-belief-based action on the part of the reader.

If I write a comment entirely in bold, in some sense I cannot un-bold it without changing its effect on the reader. But I think it would be pretty frustrating to most people if I then claimed that I could not un-bold it without changing its meaning.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-15T03:16:08.960Z · LW · GW

My guess is that you believe it's impossible because the content of your comment implies a negative fact about the person you're responding to. But insofar as you communicated a thing to me, it was in fact a thing about your own failure to comprehend, and your own experience of bizarreness. These are not unflattering facts about Duncan, except insofar as I already believe your ability to comprehend is vast enough to contain all "reasonable" thought processes.

Comment by benwr on Moderation notes re: recent Said/Duncan threads · 2023-04-15T03:03:00.419Z · LW · GW

But, of course, I recognize that my comment is insulting. That is not its purpose, and if I could write it non-insultingly, I would do so. But I cannot.

I want to register that I don't believe you that you cannot, if we're using the ordinary meaning of "cannot". I believe that it would be more costly for you, but it seems to me that people are very often able to express content like that in your comment, without being insulting.

I'm tempted to try to rephrase your comment in a non-insulting way, but I would only be able to convey its meaning-to-me, and I predict that this is different enough from its meaning-to-you that you would object on those grounds. However, insofar as you communicated a thing to me, you could have said that thing in a non-insulting way.

Comment by benwr on Sneaking Suspicion · 2022-05-27T19:18:34.939Z · LW · GW

Other facts about how I experience this:

* It's often opposed to internal forces like "social pressure to believe the thing", or "bucket errors I don't feel ready to stop making yet"

* Noticing it doesn't usually result in immediate enlightenment / immediately knowing the answer, but it does result in some kind of mini-catharsis, which is great because it helps me actually want to notice it more.

* It's not always the case that an opposing loud voice was wrong, but I think it is always the case that the loud voice wasn't really justified in its loudness.

Comment by benwr on Benign Boundary Violations · 2022-05-26T20:05:17.651Z · LW · GW

A thing I sort-of hoped to see in the "a few caveats" section:

* People's boundaries do not emanate purely from their platonic selves, irrespective of the culture they're in and the boundaries set by that culture. Related to the point about grooming/testing-the-waters, if the cultural boundary is set at a given place, people's personal boundaries will often expand or retract somewhat, to be nearer to the cultural boundary.

Comment by benwr on Entropy isn't sufficient to measure password strength · 2022-01-20T05:53:48.558Z · LW · GW

Perhaps controversially, I think this is a bad selection scheme even if you replace "password" with any other string.

Comment by benwr on Entropy isn't sufficient to measure password strength · 2022-01-18T20:46:58.303Z · LW · GW

any password generation scheme where this is relevant is a bad idea

I disagree; as the post mentions, sometimes considerations such as memorability come into play. One example might be choosing random English sentences as passwords. You might do that by choosing a random parse tree of a certain size. But some English sentences have ambiguous parses, i.e. they'll have multiple ways to generate them. You *could* try to sample to avoid this problem, but it becomes pretty tricky to do that carefully. If you instead find the "most ambiguous sentence" in your set, you can get a lower bound on the safety of your scheme.
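A toy version of the lower-bound idea. Real parse trees are replaced here by a hypothetical scheme that picks two words uniformly and joins them without a separator, so distinct derivations can collide just as ambiguous parses do. The most likely output string (the "most ambiguous sentence") sets the min-entropy. The word list and function names are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy scheme standing in for "random parse tree -> sentence":
# pick two words uniformly and join them with no separator.
WORDS = ["a", "ab", "b", "ba"]

def output_distribution(words: list[str]) -> dict[str, float]:
    """Probability of each output string when derivations are uniform."""
    derivations = ["".join(pair) for pair in product(words, repeat=2)]
    counts = Counter(derivations)
    total = len(derivations)
    return {s: c / total for s, c in counts.items()}

def min_entropy(dist: dict[str, float]) -> float:
    """-log2 of the most likely output: what a best-first guesser faces."""
    return -math.log2(max(dist.values()))

dist = output_distribution(WORDS)
naive_bits = math.log2(len(WORDS) ** 2)  # pretends every derivation is unique
print(naive_bits)               # 4.0
print(min_entropy(dist))        # 3.0: "aba" and "bab" each have two derivations
print(max(dist, key=dist.get))  # a most-ambiguous output
```

Counting derivations would credit this scheme with 4 bits, but an attacker who guesses "aba" first succeeds 1/8 of the time, so the worst-case figure is 3 bits.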

Comment by benwr on Entropy isn't sufficient to measure password strength · 2022-01-18T15:44:04.790Z · LW · GW

~~Um, huh? There are 2^1000 1000-character passwords, not 2^4700. Where is the 4700 coming from?~~

(added after realizing the above was super wrong): Whoops, that's what I get for looking at comments first thing in the morning. log2(26^1000) ≈ 4700. Still, the following bit stands:

I'd also like to register that, in my opinion, if it turns out that your comment is wrong and not my original statement, it's really bad manners to have said it so confidently.

(I'm now not sure if you made an error or if I did, though)

Update: I think you're actually totally right. The entropy gives a lower bound for the average, not the average itself. I'll update the post shortly.
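The corrected figure is easy to check, assuming each character is drawn uniformly from a 26-letter alphabet (the 26 comes from the comment itself):

```python
import math

# Number of 1000-character strings over a 26-letter alphabet is 26**1000.
# Its log2 is the entropy in bits of a uniformly random such string.
bits_exact = math.log2(26**1000)       # Python ints are arbitrary precision
bits_formula = 1000 * math.log2(26)    # same value via log2(a**n) = n*log2(a)
print(round(bits_exact, 1), round(bits_formula, 1))  # 4700.4 4700.4
```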

Comment by benwr on Entropy isn't sufficient to measure password strength · 2022-01-18T01:44:03.157Z · LW · GW

To clarify a point in my sibling comment, the concept of "password strength" doesn't cleanly apply to an individual password. It's too contingent on factors that aren't within the password itself. Say I had some way of scoring passwords on their strength, and that this scoring method tells me that "correct horse battery staple" is a great password. But then some guy puts that password in a webcomic read by millions of people - now my password is going to be a lot worse, even though the content of the password didn't change.

Password selection schemes aren't susceptible to this kind of problem, and you can consistently compare the strength of one with the strength of another, using methods like the ones I'm talking about in the OP.

Comment by benwr on Entropy isn't sufficient to measure password strength · 2022-01-17T17:57:56.326Z · LW · GW

I don't think that's how people normally do it; partly because I think it makes more sense to try to find good password *schemes*, rather than good individual passwords, and measuring a password's optimal encoding requires knowing the distribution of passwords already. The optimal encoding story doesn't help you choose a good password scheme; you need to add on top of it some way of aggregating the code word lengths. In the example from the OP, you could use the average code word length of the scheme, which has you evaluating Shannon entropy again, or you could use the minimum code word length, which brings you back to min-entropy.
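To make the two aggregation choices concrete, here is a sketch that uses ideal code lengths, -log2 p, as a stand-in for an optimal encoding (an assumption: real prefix codes round lengths up to whole bits). Averaging those lengths gives Shannon entropy; taking the shortest gives min-entropy. The skewed scheme is hypothetical.

```python
import math

def ideal_code_lengths(probs: list[float]) -> list[float]:
    """Ideal code word length, -log2 p, for each outcome."""
    return [-math.log2(p) for p in probs]

def shannon_entropy(probs: list[float]) -> float:
    """Average ideal code length, weighted by probability."""
    return sum(p * l for p, l in zip(probs, ideal_code_lengths(probs)))

def min_entropy(probs: list[float]) -> float:
    """Shortest ideal code length: -log2 of the most likely outcome."""
    return min(ideal_code_lengths(probs))

# Hypothetical scheme: with probability 1/2 output one fixed password,
# otherwise output one of 2**16 uniformly random passwords.
n_random = 2**16
probs = [0.5] + [0.5 / n_random] * n_random
print(shannon_entropy(probs))  # 9.0 bits
print(min_entropy(probs))      # 1.0 bit: half the time, one guess suffices
```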

Comment by benwr on Entropy isn't sufficient to measure password strength · 2022-01-17T16:42:24.031Z · LW · GW

Yep! I originally had a whole section about this, but cut it because it doesn't actually give you an ordering over schemes unless you also have a distribution over adversary strength, which seems like a big question. If one scheme's min-entropy is higher than another's max-entropy, you know that it's better for any beliefs about adversary strength.
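The dominance claim in the last sentence can be checked numerically. A minimal sketch in Python, assuming "max-entropy" means the Hartley entropy, log2 of the number of possible passwords, and modeling the adversary as someone who makes k guesses in order of decreasing probability. Both schemes are hypothetical.

```python
import math

def min_entropy(probs):
    """-log2 of the probability of the single most likely password."""
    return -math.log2(max(probs))

def max_entropy(probs):
    """Hartley entropy: log2 of the number of passwords with nonzero probability."""
    return math.log2(sum(1 for p in probs if p > 0))

def success_curve(probs, budget):
    """Chance that an optimal guesser has succeeded after k guesses, for k = 1..budget."""
    ranked = sorted(probs, reverse=True)
    curve, total = [], 0.0
    for k in range(budget):
        if k < len(ranked):
            total += ranked[k]
        curve.append(total)
    return curve

# Hypothetical schemes; all probabilities are powers of two, so the float sums are exact.
scheme_a = [1 / 64] * 64            # uniform over 64 passwords: min-entropy 6 bits
scheme_b = [1 / 2] + [1 / 64] * 32  # 33 passwords, one very likely: max-entropy ~5.04 bits

budget = 100
a_curve = success_curve(scheme_a, budget)
b_curve = success_curve(scheme_b, budget)

print(min_entropy(scheme_a) > max_entropy(scheme_b))    # True
print(all(a <= b for a, b in zip(a_curve, b_curve)))  # True: A is never easier to crack
```

Why this holds for any guess budget: with N possible passwords, the k most likely ones together have probability at least k/N (for k up to N), while k guesses against a scheme with min-entropy H succeed with probability at most k * 2^(-H).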

Comment by benwr on Long covid: probably worth avoiding—some considerations · 2022-01-16T22:17:16.242Z · LW · GW

Hm. On doing exactly as you suggest, I feel confused; it looks to me like the 25-44 cohort has really substantially more deaths than in recent years: https://www.dropbox.com/s/hcipg7yiuiai8m2/Screen%20Shot%202022-01-16%20at%202.12.44%20PM.png?dl=0 I don't know what your threshold for "significance" is, but 103 out of 104 weeks spent above the preceding 208 weeks definitely meets my bar.

Am I missing something here?

Comment by benwr on benwr's unpolished thoughts · 2021-06-10T21:16:52.133Z · LW · GW

What feels especially good about this way of thinking about things is that it seems like the kind of problem that has straightforward engineering / cryptography style solutions.

Comment by benwr on benwr's unpolished thoughts · 2021-06-10T20:19:53.748Z · LW · GW

I'm interested in concrete ways for humans to evaluate and verify complex facts about the world. I'm especially interested in a set of things that might be described as "bootstrapping trust".

For example:

Say I want to compute some expensive function f on an input x. I have access to a computer C that can compute f; it gives me a result r. But I don't fully trust C - it might be maliciously programmed to tell me a wrong answer. In some cases, I can require that C produce a proof that f(x) = r that I can easily check. In others, I can't. Which cases are which?

A partial answer to this question is "the complexity class NP". But in practice this isn't really satisfying. I have to make some assumptions about what tools are available that I do trust.
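As an illustration of the NP case (my choice of problem, not from the original): for subset sum, finding a subset of numbers that adds up to a target can be hard in general, but if C claims one exists, I can demand the subset itself as a certificate and check it in a single pass. A sketch in Python; the function name and the certificate format (a list of indices) are hypothetical.

```python
def verify_subset_sum(numbers, target, certificate):
    """Check C's claimed certificate: a list of distinct indices into `numbers`
    whose values sum to `target`. Runs in time linear in the input size."""
    if len(set(certificate)) != len(certificate):
        return False  # an index was reused
    if not all(0 <= i < len(numbers) for i in certificate):
        return False  # an index is out of range
    return sum(numbers[i] for i in certificate) == target

numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [2, 4]))  # True: 4 + 5 == 9
print(verify_subset_sum(numbers, 9, [0, 1]))  # False: 3 + 34 != 9
```

This only helps with "yes" answers: if C says no such subset exists, there is no known short certificate I could check instead, which is one way the NP answer falls short.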

Maybe I trust simple mathematical facts (and I think I even trust that serious mathematics and theoretical computer science track truth really well). I also trust my own senses and memory, to a nontrivial extent. Reaching much beyond that is starting to feel iffy. For example, I might not (yet) have a computer of my own that I trust to help me with the verification. What kinds of proof can I accept with the limitations I've chosen? And how can I use those trustworthy proofs to bootstrap other trusted tools?
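One concrete case of bootstrapping from a small set of trusted tools: if I trust a source of random bits and a little arithmetic, I can sometimes check C's result r without any proof at all. Freivalds' algorithm (my example, not something the original mentions) checks a claimed matrix product r = AB using a few matrix-vector products, and each round catches a wrong answer with probability at least 1/2. A minimal sketch in Python:

```python
import random

def mat_vec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def freivalds_check(a, b, claimed, rounds=20, rng=random):
    """Probabilistically check that claimed == a @ b.

    A correct product always passes. A wrong one survives each round with
    probability at most 1/2, so it passes all rounds with probability <= 2**-rounds.
    """
    cols = len(claimed[0])
    for _ in range(rounds):
        v = [rng.randint(0, 1) for _ in range(cols)]  # random 0/1 test vector
        if mat_vec(a, mat_vec(b, v)) != mat_vec(claimed, v):
            return False
    return True

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(freivalds_check(a, b, [[19, 22], [43, 50]]))  # True: this is the real product
print(freivalds_check(a, b, [[19, 22], [43, 51]]))  # almost certainly False
```

Each round costs O(n^2) work versus O(n^3) for schoolbook multiplication, but the check still leans on trusted randomness and something to run it on, which is exactly the kind of assumption this question is about.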

Other problems in this bucket include "How can we have trustworthy evidence - say videos - in a world with nearly perfect generative models?" and a bunch of subquestions of "Does debate scale as an AI alignment strategy?"

This class of questions feels like an interesting lens on parts of AI alignment work, such as debate and interpretability. It's also obviously related to parts of information security and cryptography.

"Bootstrapping trust" is basically just a restatement of the whole problem. It's not exactly that I think this is a good way to decide how to direct AI alignment effort; I just notice that it seems somehow like a "fresh" way of viewing things.

Comment by benwr on Prize: Interesting Examples of Evaluations · 2020-11-29T14:02:15.206Z · LW · GW

IT security auditing; e.g. https://safetag.org/guide/

Comment by benwr on Prize: Interesting Examples of Evaluations · 2020-11-29T00:27:01.702Z · LW · GW

"Postmortem culture" from the Google SRE book: https://sre.google/sre-book/postmortem-culture/

This book has some other sections that are also about evaluation, but this chapter is possibly my favorite chapter from any corporate handbook.

Comment by benwr on Prize: Interesting Examples of Evaluations · 2020-11-28T23:05:55.917Z · LW · GW

Two that are focused on critique rather than evaluation per se:

"the critical response process" is about useful critique in the arts: https://lizlerman.com/critical-response-process/
"best practices for code review": https://smartbear.com/learn/code-review/best-practices-for-peer-code-review/

Comment by benwr on benwr's unpolished thoughts · 2020-09-27T04:26:02.606Z · LW · GW

If I got to pick the moral of today's Petrov Day incident, it would be something like "being trustworthy requires that you be more difficult to trick than it would be worth", and I think very few people reliably live up to this standard.

Comment by benwr on benwr's unpolished thoughts · 2020-09-22T05:45:19.692Z · LW · GW

Beth Barnes notices: Rationalists seem to use the word "actually" a lot more than the typical English speaker; it seems like the word "really" means basically the same thing.

We wrote a quick script, and the words "actually" and "really" occur about equally often on LessWrong, while Google Trends suggests that "really" is ~3x more common in search volume. SSC has ~2/3 as many "actually"s as "really"s.
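For anyone who wants to redo this kind of comparison, here is a minimal sketch in Python. It is not the script mentioned above; it assumes the corpus is already available as a string and counts whole-word, case-insensitive matches.

```python
import re
from collections import Counter

def word_counts(text, words):
    """Count whole-word, case-insensitive occurrences of each word in `words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {w: counts[w] for w in words}

sample = "Actually, it's really fine. I really, actually mean it. Really!"
counts = word_counts(sample, ["actually", "really"])
print(counts)  # {'actually': 2, 'really': 3}
print(counts["actually"] / counts["really"])  # ratio of "actually"s to "really"s
```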

What's up with this? Should we stop?

Comment by benwr on Did any US politician react appropriately to COVID-19 early on? · 2020-07-17T02:25:44.508Z · LW · GW

San Francisco's mayor, London Breed, declared a state of emergency in the city on February 25th, and it seems like she was concerned about the disease (and specifically ICU capacity) as early as January.

I don't know what actions the mayor's office actually took during this time, but it seems like she was at least aware and concerned well ahead of most other politicians.

Comment by benwr on benwr's unpolished thoughts · 2020-07-16T23:10:49.002Z · LW · GW

darn - I've been playing it on my old iPad for a long time
