Long Bets by Confidence Level 2019-12-09T14:20:02.076Z · score: 22 (8 votes)
Ungendered Spanish 2019-12-07T16:20:02.001Z · score: 20 (9 votes)
LW For External Comments? 2019-12-05T18:30:02.217Z · score: 51 (17 votes)
Elementary Statistics 2019-12-05T02:00:02.266Z · score: 12 (6 votes)
Long-lasting Effects of Suspensions? 2019-12-03T20:40:01.264Z · score: 16 (6 votes)
Vocal Range 2019-12-01T15:50:02.022Z · score: 17 (7 votes)
How To Change a Dance 2019-11-30T13:40:01.625Z · score: 11 (5 votes)
Getting Ready for the FB Donation Match 2019-11-27T19:20:02.156Z · score: 9 (5 votes)
Effect of Advertising 2019-11-26T14:30:02.095Z · score: 27 (9 votes)
Market Rate Food Is Luxury Food 2019-11-23T15:10:02.344Z · score: 10 (7 votes)
Solar One Year In 2019-11-22T15:20:01.839Z · score: 18 (8 votes)
Hybrid Lottery Update 2019-11-21T14:30:01.394Z · score: 10 (2 votes)
Affordable Housing Workarounds 2019-11-20T13:50:01.412Z · score: 10 (3 votes)
Drawing on Walls 2019-11-19T15:00:02.091Z · score: 15 (4 votes)
Comment, Don't Message 2019-11-18T16:00:02.010Z · score: 28 (12 votes)
Lazy Compost is Worse Than Landfill 2019-11-15T16:20:01.629Z · score: 35 (14 votes)
Arguing about housing 2019-11-14T17:00:01.752Z · score: 16 (4 votes)
Mosquito Net Fishing 2019-11-13T13:30:01.731Z · score: 14 (5 votes)
Attach Receipts to Credit Card Transactions 2019-11-12T16:30:01.678Z · score: 5 (2 votes)
Experiments and Consent 2019-11-10T14:50:01.956Z · score: 26 (14 votes)
Notes on Running Objective 2019-11-09T15:40:02.123Z · score: 9 (2 votes)
Uber Self-Driving Crash 2019-11-07T15:00:01.625Z · score: 110 (41 votes)
Lite Blocking 2019-11-05T13:50:02.402Z · score: 23 (6 votes)
Drug Policy 2019-11-04T21:30:02.058Z · score: 7 (1 votes)
Speaking up publicly is heroic 2019-11-02T12:00:01.882Z · score: 41 (14 votes)
Shared Cache is Going Away 2019-11-01T15:10:02.635Z · score: 19 (7 votes)
Breaking Group Rock Paper Scissors 2019-10-31T23:50:01.568Z · score: 7 (1 votes)
When do you start looking for a Boston apartment? 2019-10-29T16:30:01.716Z · score: 9 (3 votes)
Internet Distance 2019-10-28T13:10:03.716Z · score: 9 (2 votes)
Jacy Reese (born Jacy Anthis)? 2019-10-26T21:00:01.838Z · score: 16 (15 votes)
Door Ideas 2019-10-25T12:40:01.627Z · score: 8 (2 votes)
Multi-belled Brass 2019-10-24T14:20:01.617Z · score: 11 (4 votes)
Why "Referer"? 2019-10-23T13:40:01.647Z · score: 27 (11 votes)
Let People Move to Jobs 2019-10-21T18:00:01.772Z · score: 13 (6 votes)
Endogenous Epinephrine for Anaphylaxis? 2019-10-20T12:10:01.538Z · score: 11 (3 votes)
Why Ranked Choice Voting Isn't Great 2019-10-19T11:10:01.566Z · score: 21 (11 votes)
Archiving Yahoo Groups 2019-10-18T11:20:01.851Z · score: 45 (15 votes)
Festival Stats 2019 2019-10-17T11:00:02.242Z · score: 8 (2 votes)
Make more land 2019-10-16T11:20:03.381Z · score: 95 (46 votes)
Transportation Tracking 2019-10-15T11:00:01.987Z · score: 10 (4 votes)
Whistle-based Synthesis 2019-10-14T12:10:01.968Z · score: 7 (1 votes)
MA Price Accuracy Law 2019-10-13T12:00:01.815Z · score: 16 (5 votes)
Planned Power Outages 2019-10-12T14:10:01.395Z · score: 28 (10 votes)
Rent Needs to Decrease 2019-10-11T12:40:01.798Z · score: 16 (7 votes)
What is the real "danger zone" for food? 2019-10-10T11:00:01.792Z · score: 8 (2 votes)
Regularly Scheduled: Day-of Reminders 2019-10-09T11:00:01.830Z · score: 10 (3 votes)
Realigning Housing Coalitions 2019-10-08T11:10:01.584Z · score: 10 (4 votes)
Hybrid Lottery Admission 2019-10-07T13:20:01.496Z · score: 11 (3 votes)
Advanced Dances 2019-10-06T11:10:01.810Z · score: 10 (3 votes)
Eight O'Clock is Relative 2019-10-05T16:20:01.445Z · score: 13 (6 votes)


Comment by jkaufman on Ungendered Spanish · 2019-12-11T03:18:26.913Z · score: 2 (1 votes) · LW · GW

You don't need to make inanimate words gender neutral, so no need for "le oficine". But yes, "le niñe".

Comment by jkaufman on Long Bets by Confidence Level · 2019-12-09T16:29:04.628Z · score: 7 (5 votes) · LW · GW

Unless your expectation is that your counterparty's chosen charity has negligible effectiveness (for good or bad) relative to yours, it seems to me that this calculation is unlikely to be the one you actually want to do.

Great point! I think this is often the case for bets between people who do or don't consider effectiveness in choosing charities, or where people have sufficiently strong value disagreements that each thinks of the other's charity as neutral, but you're right that this is unusual.

Comment by jkaufman on Ungendered Spanish · 2019-12-09T04:09:26.425Z · score: 2 (1 votes) · LW · GW

I suspect it would be "le jefe" / "les jefes", with no changes to existing gendered forms.

Comment by jkaufman on Pieces of time · 2019-12-08T16:50:25.524Z · score: 2 (1 votes) · LW · GW

"Will it be the same day after nap? That's so silly. How can it be the same day after children go to bed?"

Comment by jkaufman on Ungendered Spanish · 2019-12-08T00:36:31.197Z · score: 5 (3 votes) · LW · GW

Having the nonbinary identity enter public consciousness seems to have caused the neutral pronoun to take on a weight and colour that makes it harder to apply it to non-nonbinary people. In English, since use in situations where gender is irrelevant is already grammatical, so I'd guess this has a negligible effect on usage (though it does seem to have caused a notable amount of brain inflammation in terfs and reactionaries that I must mention but probably shouldn't go into depth about), but in a different place, seems like this might be more of a thing

This isn't how I think the path of 'they' has gone in English? Using it where gender is irrelevant is super new ("my friend said they might be late") and felt wrong to me ten years ago. Having there be specific individuals who go by 'they' feels like it has done a lot to get people to practice and be comfortable with 'they', though it's possible I'm paying too much attention to my local communities?

Comment by jkaufman on LW For External Comments? · 2019-12-06T02:00:18.147Z · score: 7 (4 votes) · LW · GW
0. I'm imagining you'd create an LW account if you didn't have one.

1 & 2. Yes, it's just two views into the same comment thread.

3 & 4. These are UI questions, but I would think having them work the same as on LW would make sense.

5. Yes, the comment is posted by your LW account. This would be implemented by having the comment composition box be in a cross-domain iframe hosted on LW. The independent blog can't impersonate you.

Comment by jkaufman on Pieces of time · 2019-12-05T02:04:23.884Z · score: 6 (3 votes) · LW · GW

My younger (3.5y) child seems to experience the world something like this. She's very often confused about whether she's waking up from a nap or from the night, she'll say "but we haven't had dinner yet" when going down for a nap, she'll ask for breakfast when waking up from a nap, etc.

Comment by jkaufman on 2019 Winter Solstice Collection · 2019-12-04T18:03:51.175Z · score: 7 (3 votes) · LW · GW


Saturday December 14th, 7pm-9pm

Comment by jkaufman on Jacy Reese (born Jacy Anthis)? · 2019-12-02T19:32:26.473Z · score: 3 (2 votes) · LW · GW

If others are looking at this: Contributions/Utsil, Contributions/Bodole

Comment by jkaufman on How To Change a Dance · 2019-12-02T12:00:55.793Z · score: 2 (1 votes) · LW · GW

they had a popular vote go against them

I wrote "If you had a vote you'd probably be in the minority". To take this specific example, we publicly planned a vote, saw that this was overwhelmingly preferred, and decided to switch. But if we had had the vote five years earlier I think we probably would have had different results.

Everyone I can think of who is pushing for the examples I gave in the post was a dancer before they started having that belief. Contra dance is far too niche for it to be worth anyone's while to try to come in from outside as a non-dancer and change it to be more the way they want.

Comment by jkaufman on Getting Ready for the FB Donation Match · 2019-11-30T12:29:15.783Z · score: 4 (2 votes) · LW · GW

Pretty confident; I did three $9,999 donations as a test run without getting any declined. The EA Giving Tuesday team (which I'm not on) has thought a lot more about this, and you could ask them?

Comment by jkaufman on Getting Ready for the FB Donation Match · 2019-11-29T02:28:56.706Z · score: 5 (3 votes) · LW · GW

Those instructions are not as good as the EA Giving Tuesday ones. They don't tell you to confirm your identity 1d+ before, or to click the green button 30s+ before.

Comment by jkaufman on When do you start looking for a Boston apartment? · 2019-11-29T02:25:38.368Z · score: 3 (2 votes) · LW · GW

I'm used to the NYC rental market, particularly Brooklyn, and, aside from lining up apartment-mates, the rule is that you look for a new apartment right before you're ready to move.

NYC has a very different apartment listing culture than Boston, yup!

Where are you getting your listings and how can you tell when the lease is intended or expected to start from the listing?

I'm scraping Padmapper. The availability date is usually plain text in the listing, unfortunately, and is also not something I have in my archived data (just location, price, and number of bedrooms).

But if you go on Padmapper, Craigslist, etc in April and look at listings, you'll mostly see 9/1 start dates.

Comment by jkaufman on Market Rate Food Is Luxury Food · 2019-11-28T11:49:56.930Z · score: 2 (1 votes) · LW · GW

"Fully Fund Section 8" is part of Bernie Sanders' housing proposal and is popular among people on the left. If we think low income people should get housing vouchers, why give out so few?

If I thought there was no way to bring down the cost of housing I would probably agree, but since supply is so restricted giving Section 8 to everyone who needs it would (a) raise rents even more, (b) be incredibly expensive, and (c) transfer a huge amount of money to landlords.

Building public housing (at higher densities than would normally be allowed) or just removing zoning restrictions would go much farther.

Comment by jkaufman on Market Rate Food Is Luxury Food · 2019-11-28T02:50:03.927Z · score: 2 (1 votes) · LW · GW

is a (fictional) several-year waiting list for SNAP equivalent to a several-year waiting list for, er, whatever housing thing this is meant to be parallel to?

Section 8

Comment by jkaufman on Effect of Advertising · 2019-11-27T13:31:29.540Z · score: 4 (2 votes) · LW · GW

Why would you say adblock detectors work fine? My understanding is any time a popular site starts using one, adblockers work around the detector:

EDIT: another example and a long list of issues

Comment by jkaufman on Effect of Advertising · 2019-11-27T12:17:29.967Z · score: 4 (5 votes) · LW · GW

I think this is simpler to talk about with the case of publisher funding than purchasing decisions, and your arguments still apply. If you start using adblock you observe your experience on news sites is better, with no visible deterioration in the quality of reporting available to you. But each adblock user slightly decreases the publisher's income, and a world where adblock usage was, say, 98%, would mean you really couldn't run sites supported by advertising.

Applying "if everyone does what's good for them" here is tricky. The publisher would like to say "you're welcome to read my articles for free, as long as you don't bypass the ads", and then adblockers let users take one half of the offer without the other. Which I guess violates the "excludable" premise you have above?

A rough analogy (and I'm just talking about the economics and explicitly not trying to say adblocking is morally similar to shoplifting) is that you could save money by shoplifting, and it would be good for you individually. But the more people shoplift, the less well a business model of "put products on shelves, users will pay for them when they leave" works.

Comment by jkaufman on Effect of Advertising · 2019-11-27T02:41:18.726Z · score: 2 (1 votes) · LW · GW

The discussion on the FB version of the post convinced me this part isn't right. Yes, if you assume perfect enforcement then the reviewers become trustworthy, but in practice reviewing would be so lucrative and there would be so many ways to disguise compensation that reviews would probably be even more captured.

Comment by jkaufman on Effect of Advertising · 2019-11-27T01:15:25.674Z · score: 5 (5 votes) · LW · GW

The effects I'm describing are mostly about how advertising changes market-wide dynamics. One person not seeing any ads, or all the people not seeing one kind of ad, would have disproportionately smaller effects.

"Ads are annoying and we should have fewer" is a very different sort of claim than "ads are fundamentally illegitimate because they operate by corrupting your desires".

Comment by jkaufman on Effect of Advertising · 2019-11-26T21:16:47.430Z · score: 5 (2 votes) · LW · GW

Have a look at for a discussion around this with eyeglasses advertising

Comment by jkaufman on Effect of Advertising · 2019-11-26T21:15:48.855Z · score: 2 (1 votes) · LW · GW

After having read all my e-mails...

This is a minor quibble, but Google doesn't use emails to target ads anymore:

Comment by jkaufman on Effect of Advertising · 2019-11-26T15:59:12.679Z · score: 6 (3 votes) · LW · GW

Inviting a reviewer to an all-experience paid trade show in the Maldives isn't advertising.

I'm not so sure; that seems like a kind of sponsored review. Inviting a government regulator to a similar thing would be bribery, for example.

If you want a trustworthy review that isn't paid for by affiliate commissions you currently have the choice to go to ConsumerReports and pay for their subscription.

I really like that ConsumerReports works this way, and I respect them a lot for it. Unfortunately their main demographic is so different from mine that their reviews are generally not useful to me.

Comment by jkaufman on Affordable Housing Workarounds · 2019-11-20T15:24:59.695Z · score: 2 (1 votes) · LW · GW

I agree that today this isn't too much of an issue because there are so few units. But there are proposals to build far more affordability-restricted housing, and I no longer think that's such a good idea.

Some of what I describe above, however, is an issue today. Consider the "Buy land, take advantage of density bonuses, build a large 100% affordable fancy building, and sell the units to your just-out-of-school currently-low-earning children" case.

Comment by jkaufman on Arguing about housing · 2019-11-18T02:58:52.811Z · score: 2 (1 votes) · LW · GW

Thanks for being up for having this conversation in comments! Sorry for the slow response; I just got back to proper internet after several days on an island.

As I've said before, if political solutions were viable then this would have been solved 5+ years ago.

I still think dramatic improvement is possible via the political process for two main reasons:

  • The higher rents get, the more pressure there is to fix this. While it wasn't great five years ago, it's much worse now. As terrible housing policy continues expanding the number of people it affects, it's easier to build support for measures to fix it.

  • Housing coalitions are shifting, YIMBY is growing, and the idea that we can make things better by building more is spreading.

I think we should continue trying to build this support.

Find ways to increase the quality of the average grouphouse so more people want to live in them. ... if you found a way to increase the efficiency of a grouphouse bedroom so everything that would usually take 150 ft² can be done in 75 ft² without throwing important considerations under the bus, someone would only need to rent half as much room to maintain the same quality of life

I think this could be a decent solution for many young relatively well off single people without kids, who live primarily digital lives. While this is a demographic we know many people in, it's only a very small slice of the people affected by the housing crisis. Separately, since different people have different preferences and constraints I suspect most people who would have the time, energy, and inclination to build something like this would actually want to customize it more for their situation. Which is fine! Your design can still be useful even if most builders use it as a jumping-off point; you don't need interchangeable parts.

If people really did have generally similar preferences here you could build this in your apartment, and then when you moved you could sell it to the incoming tenant and leave it there. But if you actually tried this, even in a city like SF with tons of people in the target demographic, I expect pretty much everyone would ask you to bring it with you, even if you offered it for free. Similarly, if this were a large improvement over the kinds of loft systems you can already buy from IKEA I would expect you to be able to sell these to the general public, but again I don't think it would be very popular.

Coordinate groups of people to move from NIMBY cities with 10/10 jobs and 10/10 house prices to YIMBY cities with 8/10 jobs but 3/10 house prices.

I think this is likely to lose too much of what people value about being in those cities.

I'm also not sure where you're getting "8/10 jobs"; I think the benefits of being in the top city for your field are usually much higher than 25%, more like 50% to 300%.

Comment by jkaufman on Arguing about housing · 2019-11-15T16:20:25.033Z · score: 2 (1 votes) · LW · GW

Reading people's comments, it's very common for people opposing allowing market rate housing to object because they think it will make the area more expensive.

Comment by jkaufman on Arguing about housing · 2019-11-15T16:17:13.596Z · score: 4 (2 votes) · LW · GW

Though "pull in RSS updates" isn't the best fix for the problem that my posts here currently end with "Comment via: facebook". I could make a LW-specific RSS feed that didn't include that note. This would also let me filter out "Update" entries, which don't make sense here either.
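
A minimal sketch of what such an LW-specific feed filter could look like, assuming the entries have already been parsed into title/body pairs. The `Entry` type, the title-based check for "Update" entries, and the footer pattern are all hypothetical and only illustrate the idea; they are not how the actual feed is generated.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    """One parsed feed item (hypothetical shape; a real feed has more fields)."""
    title: str
    body: str


# Assumption: the footer is the last line of the body and starts with "Comment via:".
FOOTER = re.compile(r"\n*Comment via:[^\n]*\s*$")


def lw_feed(entries):
    """Return entries for an LW-specific feed: no "Update" items, no footer."""
    result = []
    for entry in entries:
        # Assumption: update entries are recognizable by their title.
        if entry.title.startswith("Update"):
            continue
        result.append(Entry(entry.title, FOOTER.sub("", entry.body)))
    return result


if __name__ == "__main__":
    feed = [
        Entry("Arguing about housing", "Post text.\n\nComment via: facebook"),
        Entry("Update 2019-11-15", "Short status note."),
    ]
    for entry in lw_feed(feed):
        print(entry.title, "->", repr(entry.body))
```

Filtering on the generating side like this keeps the main feed untouched for other readers.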

Comment by jkaufman on Arguing about housing · 2019-11-15T06:36:00.146Z · score: 2 (1 votes) · LW · GW

Example threads: (but please no one go jump in on them because you saw them linked here; don't want to brigade)

It really doesn't seem to me like they're worried about decreasing property values?

Comment by jkaufman on Arguing about housing · 2019-11-15T06:29:32.003Z · score: 6 (3 votes) · LW · GW

Yup, that's my flow!

I'd love it if edits in the RSS feed propagated to posts; when I have typo fixes I need to do them in both places.

Don't want to clobber any changes made through the LW UI though. Maybe something like refreshing the post from RSS on changes, but only if there aren't local edits? Automatically merging when there's no conflict would be spiffy, but probably not worth it.
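
The "refresh only if there aren't local edits" rule can be sketched as a comparison of three versions of the text. This is only an illustration under the assumption that the text from the last import is stored alongside the post; the function name and status strings are made up, and it does not describe how LW's RSS import actually works.

```python
def refresh_from_rss(last_synced, local, remote):
    """Decide what a post's text should be after an RSS refresh.

    last_synced: text as of the previous import from RSS
    local:       text currently stored on LW (may have been edited in the UI)
    remote:      text currently in the RSS feed
    Returns (new_text, status).
    """
    if remote == last_synced:
        return local, "unchanged"      # nothing new in the feed
    if local == last_synced:
        return remote, "refreshed"     # no local edits, safe to overwrite
    if local == remote:
        return local, "already-equal"  # same fix was made in both places
    return local, "skipped"            # local edits exist; don't clobber them


if __name__ == "__main__":
    print(refresh_from_rss("teh cat", "teh cat", "the cat"))
    print(refresh_from_rss("teh cat", "teh cat!", "the cat"))
```

The "skipped" branch is where an automatic merge would go if it ever seemed worth building.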

Comment by jkaufman on Experiments and Consent · 2019-11-12T21:26:20.519Z · score: 2 (1 votes) · LW · GW

That's not talking about a UI refresh, but about Gmail adding new features:

  • Introduction of snooze
  • Introduction of smart reply
  • Offering attachment links in the message list view
  • Collapsible sidebar

Is that what you're talking about or am I still looking at the wrong thing?

Comment by jkaufman on Experiments and Consent · 2019-11-12T20:11:48.243Z · score: 2 (1 votes) · LW · GW

the Gmail interface update that happened last year

Are you talking about the Inbox deprecation?

Comment by jkaufman on Attach Receipts to Credit Card Transactions · 2019-11-12T18:59:01.702Z · score: 2 (1 votes) · LW · GW

It looks to me like country club billing stopped because at a time when everything was done on paper it was far too much work. If the purchase information was sent as part of getting the transaction approved then you can use it for fraud prevention in a way that wasn't possible in the 1970s.

Comment by jkaufman on Attach Receipts to Credit Card Transactions · 2019-11-12T17:41:24.735Z · score: 2 (1 votes) · LW · GW

Do you think this would decrease spending appreciably? I would be very surprised. (And if it doesn't decrease spending, or only decreases it slightly, then getting a better rate from the card company is enough to motivate them.)

Comment by jkaufman on Attach Receipts to Credit Card Transactions · 2019-11-12T17:38:40.788Z · score: 2 (1 votes) · LW · GW

I just tried reading a bit about fleet cards, and found this Exxon FAQ and this sales page. It sounds like the number of gallons gets sent automatically, and you can set up the card to prompt for an odometer reading to be sent too. This is neat, though very fuel-specific.

When you say "at the high-end, these cards capture a great deal of data, comparable to a receipt", what are you thinking of?

The fraud gains are minimal.

Why do you say that? I would expect comparing what was being purchased to a model from this user's history and a model of fraudulent transactions would be very helpful!

Comment by jkaufman on Experiments and Consent · 2019-11-12T15:03:06.361Z · score: 2 (1 votes) · LW · GW

And the paper you linked showed that it wasn't being done for most of Google's history.

This is a nitpick, but 2000-2007 (the period between when AdWords launched and when the paper says they started quantitative ad blindness research) is 1/3 of Google's history, not "most".
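
The arithmetic behind the 1/3 figure can be checked as follows, assuming Google's history is counted from its 1998 founding to 2019, when this comment was written. The start year is an assumption; the comment itself only gives the 2000-2007 span.

```python
from fractions import Fraction

FOUNDED = 1998          # assumption: count Google's history from its founding year
NOW = 2019              # year this comment was written
ADWORDS_LAUNCH = 2000   # from the comment
RESEARCH_START = 2007   # from the comment

# Share of the company's history that falls between the two dates.
share = Fraction(RESEARCH_START - ADWORDS_LAUNCH, NOW - FOUNDED)
print(share)
```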

I'm also not sure if the experiments could have been run much earlier, because I'm not sure identity was stable enough before users were signing into search pages.

Also, this sort of optimization isn't that valuable compared to much bigger opportunities for growth they had in the early 2000s.

If Google doesn't do it, I would be doubtful if anyone, even a peer like Amazon, does.

Why are you saying Google doesn't do it? I understand arguing about whether Google was doing it at various times, whether they should have prioritized it more highly, etc, but it's clearly used and I've talked to people who work on it.

Would you be interested in betting on whether Amazon has quantified the effects of ad blindness? I think we could probably find an Amazon employee to verify.

Which is just another way of saying that before then they hadn't used their long-term value measurements to figure out what threshold of ads to run before. Whether 2015 or 2013, this is damning.

It's specifically about mobile, which in 2013 was only about 10% of traffic and much less by monetization. Similar desktop experiments had been run earlier.

But I also think you're misinterpreting the paper to be about "how many ads should we run" and that those launches simply reduced the number of ads they were running. I'm claiming that the tuning of how many ads to run to maximize long-term value was already pretty good by 2013, but having a better experimental framework allowed them to increase long-term value by figuring out which specific kinds of ads to run or not run. As a rough example (from my head, I haven't looked at these launches) imagine an advertiser is willing to pay you a lot to run a bad ad that makes people pay less attention to your ads overall. If you turn down your threshold for how many ads to show, this bad ad will still get through. Measuring this kind of negative externality that varies on a per-ad basis is really hard, and it's especially hard if you have to run very long experiments to quantify the effect. One of the powerful tools in the paper is estimating long-term impacts from short-term metrics so you can iterate faster, which makes it easier to evaluate many things, including these kinds of externalities.

(As before, speaking only for myself and not for Google)
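
The rough example above can be made concrete with a toy model. Everything here is invented for illustration: the `Ad` fields, the numbers, and the idea of expressing ad blindness as a single per-impression cost are assumptions, not anything taken from the paper or from how Google evaluates ads.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float        # what the advertiser pays per impression (made-up units)
    blindness: float  # hypothetical long-term cost: future revenue lost per impression


def shown_by_threshold(ads, min_bid):
    """Global knob: show every ad whose bid clears the threshold."""
    return [ad for ad in ads if ad.bid >= min_bid]


def shown_by_long_term_value(ads):
    """Per-ad decision: show an ad only if bid minus long-term cost is positive."""
    return [ad for ad in ads if ad.bid - ad.blindness > 0]


ADS = [
    Ad("good-high", bid=5.0, blindness=0.5),
    Ad("good-low", bid=1.0, blindness=0.2),
    Ad("bad-high", bid=6.0, blindness=9.0),  # pays well, but trains users to ignore ads
]

if __name__ == "__main__":
    # Raising the threshold shows fewer ads, but the bad ad always survives.
    for min_bid in (0.5, 2.0, 5.5):
        print(min_bid, [ad.name for ad in shown_by_threshold(ADS, min_bid)])
    print("per-ad", [ad.name for ad in shown_by_long_term_value(ADS)])
```

However far the single threshold is turned, the high-paying bad ad is the last one to be dropped; only a per-ad estimate of the long-term cost removes it while keeping the good ads.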

Comment by jkaufman on Experiments and Consent · 2019-11-12T14:38:41.397Z · score: 2 (1 votes) · LW · GW

Much less frequent, somehow—to the point of being almost totally absent from my experience—are sentiments along the lines of “I prefer modern UIs, for the following specific reasons; they are superior to older UIs, which have the following specific flaws (which modern UIs lack)”.

I think maybe what's going on is that people who are good at talking about what they like generally prefer older approaches? But if you run usability tests, focus groups, A/B tests, etc you see users do better with modern UIs.

But note that this objection essentially concedes the point: that the pressure toward “modernization” of UX design is a Molochian race to the bottom.

I do think there's a coordination failure here, as there is in any signaling situation. I think it explains less of what's going on than you do, and I also don't think getting UX people to agree on a code of ethics that prohibited non-feature-driven UI changes would be useful. (I also can't tell if that's a proposal you're still pushing.)

The amount of work is really not that much

I have a hard time believing that you are serious, here. I find this to be an absurd claim.

To be specific, I'm estimating that the amount of work required to build and maintain a simple and constant UI wrapper around a browser rendering engine is about one full time experienced engineer for two weeks to build and then 10% of their time (usually 0% but occasionally a lot of work when the underlying implementation changes) going forward. The interface between the engine and the UI is pretty clean. For example, have a look at Apple's documentation for WebView:

A WebView object is intended to support most features you
would expect in a web browser except that it doesn’t implement
the specific user interface for those features. You are responsible
for implementing the user interface objects such as status bars,
toolbars, buttons, and text fields. For example, a WebView object
manages a back-forward list by default, and has goBack(_:) and
goForward(_:) action methods. It is your responsibility to create
the buttons that would send these action messages.

The situation on Android is similar. Hundreds of apps, including many single-developer ones, use WebView to bring a web browser into their app, with the UI fully under their control.

in large part because of anti-competitive behavior and general shadiness on the part of Google

Not sure what you’re referring to here?

Once again, it is difficult for me to believe that you actually don’t know what I’m talking about—you would have to have spent the last five years, at the very least, not paying any attention to developments in web technologies.

I've been paying a lot of attention to this, since that's been the core of what I've worked on since 2012: first on mod_pagespeed and now on GPT. When I look back at the last five years of web technology changes the main things I see (not exhaustive, just what I remember) are:

  • SPDY, QUIC, HTTP/2, HTTP/3, TLS 1.3 (and everything moved to HTTPS post-Snowden)
  • Most sites can develop only for evergreen browsers (no dealing with IE8 etc)
  • Service workers, web workers
  • WebAssembly
  • Browsers blocking identity in third-party contexts
  • JavaScript modernization: Promises/async/await etc

I'm still not sure what you're referring to?

(As before: I work at Google, and am commenting only for myself.)

Comment by jkaufman on Experiments and Consent · 2019-11-12T02:31:55.577Z · score: 2 (3 votes) · LW · GW

Most companies manage to not run any of those long-term experiments and do things like overload ads to get short-term revenue boosts at the cost of both user happiness and their own long-term bottom line.

The claim was that A/B testing was "not as good a tool for measuring long term changes in behavior" and I'm saying that A/B testing is a very good tool for that purpose. That companies generally don't do it I think is mostly a lack of long-term focus, independent of experiments. I'm sure Amazon does it.

Note that at the end of a paper published in 2015, for a company which has been around for a while in the online ad business, let us say, they are shocked to realize they are running way too many ads and can boost revenue by cutting ad load.

The paper was published in 2015, but describes work on estimating long-term value going back to at least 2007. It sounds like you're referring to the end of section five, where they say "In 2013 we ran experiments that changed the ad load on mobile devices ... This and similar ads blindness studies led to a sequence of launches that decreased the search ad load on Google’s mobile traffic by 50%, resulting in dramatic gains in user experience metrics." By 2013 they were certainly already taking into account long-term value, even on mobile (which was pretty small until just around 2013). This section isn't saying "we set the threshold for the number of ads to run too high" but "we were able to use our long-term value measurements to better figure out which ads not to run". So I don't think "if even Google can fuck that up for so long so badly" is a good reading of the paper.

Ads are the first, second, third, and last thing any online business will A/B test, and if there's time left over, maybe something else will get tested.

I work in display ads and I don't think this is right. Where you see the most A/B testing is in funnels. If you're selling something the gains from optimizing the flow from "user arrives on your site" to "user finishes buying the thing" are often enormous, like >10x. Whereas with ads if you just stick AdSense or something similar on your page you're going to be within, say, 60% of where you could be with a super complicated header bidding setup. And if you want to make more money with ads your time is better spent on negotiating direct deals with advertisers than on A/B testing. I dearly wish I could get publishers to A/B test their ad setups!

Comment by jkaufman on Experiments and Consent · 2019-11-12T01:49:09.966Z · score: 2 (1 votes) · LW · GW

I didn't introduce Wikipedia as an example of a site with poor UI. I think it's pretty good aside from, as I said, the line width issue. It's also in a space that people have a lot of experience with: displaying textual information to people. Wikipedia could likely benefit from some A/B tests to optimize their page load times, but that's all behind the scenes.

Comment by jkaufman on Experiments and Consent · 2019-11-12T01:46:04.464Z · score: 2 (1 votes) · LW · GW

users looking at it will have a low impression of it

Mistakenly, of course. This is a well-attested problem, and is fundamental to this entire topic of discussion.

I'm not sure that this is mistaken: companies that can keep their UI current can probably, in general, make better software. This probably only holds for large companies, since small companies face more of a choice of what to prioritize, while large companies that look like they're from 2005 are more likely to be environments that can't get anything done.

I'm generally pretty retrogrouch, and do often prefer older interfaces (I live on the command line, code in emacs, etc). But I also recognize that different interfaces work well for different people and as more people start using tech I get farther and farther from the norm.

you can’t make it go away just by unilaterally stopping playing

I never said that you could.

That was how I interpreted your suggestion that UX people start to follow a "change UIs only when functionality demands" rule. Anyone who tried to do the "responsible" thing would lose out to less responsible folks. Even if you got a large group of UX people to refuse work they considered to be changing UIs for fashion, companies are in a much stronger position since the barrier to entry for UX work is relatively low.

how do I move to a browser with which I can effectively browse every website, but whose UI stays static? I can’t.

The rendering engines of Chrome/Edge/Opera (Blink), Safari (WebKit), and Firefox (Gecko) are all open source and there are many projects that wrap their own UI around a rendering engine. The amount of work is really not that much, especially on mobile (where iOS requires you to take this approach). If this was something that many people cared about it would not be hard for open source projects to take it on, or companies to sell it. That no one is prioritizing a UI-stable browser really is strong evidence that there's not much demand.

in large part because of anti-competitive behavior and general shadiness on the part of Google

Not sure what you're referring to here?

Comment by jkaufman on Experiments and Consent · 2019-11-11T21:06:14.129Z · score: 3 (2 votes) · LW · GW

Anyone who has a long-term view into user identity (FB, email providers, anywhere you log in) can totally do long-term experiments and account for user learning effects. Google published a good paper about this: Focusing on the Long-term: It’s Good for Users and Business (2015)

(Disclosure: I work for Google)

Comment by jkaufman on Experiments and Consent · 2019-11-11T21:03:32.647Z · score: 2 (1 votes) · LW · GW

Wikipedia has considerably superior usability to the majority of modern websites.

Wikipedia is generally pretty good, but the "lines run the full width of your monitor on desktop no matter how wide your screen" layout is terrible.

Comment by jkaufman on Experiments and Consent · 2019-11-11T21:02:14.603Z · score: 2 (1 votes) · LW · GW

“Dated” is not a problem unless you treat UX design like fashion. UIs don’t rust.

"Dated" is a problem for companies because users care about it in selecting products. Compare:

The first UI isn't "rusted", but users looking at it will have a low impression of it and will prefer competing products with newer UIs. I don't think fashion is the main motivator here, but it is real and you can't make it go away just by unilaterally stopping playing. (I mean I can but I'm an individual running a personal website, not a company.)

The “earlier understanding” of many problems in UX design was more correct. Knowledge and understanding in the industry has, in many cases, degenerated, not improved.

How so? I can think of cases where earlier UX was a better fit for experienced users and newer UXes are "dumbed down", is that what you mean?

The entire exercise is vastly negative-sum. It is destructive of value on a massive scale.

Let's take a case where all the externalities should be internalized: internal tooling at a well run company. I use many internal UIs in my day-to-day work, and every so often one of them is reworked. There's not much in the way of fashion here, since it's internal, but there are still UI changes. The kind of general "let's redo the UI and stop being stuck in a local maximum" is the main motivation, and I'm generally pretty happy with it.

I don't think the public-facing version is that different. If there was massive value destruction then users would move to software that changed UI less.

Comment by jkaufman on Experiments and Consent · 2019-11-11T20:44:54.251Z · score: 4 (2 votes) · LW · GW

Maybe, but if this is common enough to justify limiting experimentation I'd expect people to be able to easily find examples.

Comment by jkaufman on Experiments and Consent · 2019-11-11T14:16:51.462Z · score: 3 (4 votes) · LW · GW

Companies optimize for making money, and while ideally they do that by providing value for people, in some situations they'll do that best by annoying users. The problem here is bad incentives, though, and if you took away A/B testing you'd just see cargo culting instead.

Comment by jkaufman on Experiments and Consent · 2019-11-11T14:14:15.652Z · score: 12 (3 votes) · LW · GW

(Assuming we're still talking about A/B testing significant changes to UIs on products that a lot of people use, which is a very small part of A/B testing)

The unstated assumption in your assertion is that A/B testing is the only way for companies to get feedback on their UIs. It isn't.

Wait, I don't think this. Running lots of tiny tests and dogfooding can both give you early feedback about product changes before rolling them out. You can run extensive focus groups with real users once you have something ready to release. But if you take the results from those tests and just launch to 100%, sometimes you're going to make bad decisions. Real user testing is especially good for catching issues that apply infrequently, affect populations that are hard to bring in for focus groups, or that only come up after a long time using the product.

Here's an example of how I think these should be approached:

  • Say eBay was considering a major redesign of their seller UI. They felt like their current UI was near a local maximum, but if they reworked it they could get somewhere much better.

  • They run mockups by some people who don't currently sell on eBay, and they like how much easier it is to list products.

  • They build out something fake but interactive and run focus groups, which are also positive.

  • They implement the new version and make it available under a new URL, and add a link to the old version that says "try the new eBay" (and a link to the new one that says "switch back to the old eBay").

  • When people try the new UI and then choose to switch back they're offered a comment box where they can say why they're switching. Most people leave it blank, and it's annoying to triage all the comments, but there are some real bugs and the team fixes them.

  • At first they just pay attention to the behavior of people who click the link: are they running into errors? Are they more or less likely to abandon listings? This isn't an A/B test and they don't have a proper control group because users are self-selected and learning effects are hard, but they can get rough metrics that let them know if there are major issues they didn't anticipate. Some things come up, they fix them.

  • They start a controlled experiment where people opening the seller UI for the first time get either the new or old UI, still with the buttons for switching in the upper corner. They use "intention to treat" and compare selling success between the two groups. Some key metrics are worse, they figure out why, they fix them. This experiment starts looking positive.

  • They start switching a small fraction of existing users over, and again look at how it goes and how many users choose to switch back to the old UI. Not too many switch back, and they ramp the experiment up.

  • They add a note to the old UI saying that it's going away and encouraging people to try out the new UI.

  • They announce a deprecation date for the old UI and ramp up the experiment to move people over. At this point the only people on the old UI are people who've tried the new UI and switched back.

  • They put popups in the old UI asking people to say why they're not switching. They fix issues that come up there.

  • They turn down the old UI.

It sounds like you're saying they should skip all the steps after "They implement the new version and make it available under a new URL" and jump right to "They turn down the old UI"?
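
To make the controlled-experiment and ramp-up steps concrete, here's a rough sketch of one way they could be wired up. This is an illustration under my own assumptions, not how eBay or anyone else actually does it: the function names, the experiment label, and the "sold" success flag are all hypothetical. Users are bucketed by hashing their ID, so raising the treatment fraction only ever moves people from old to new, and the comparison is "intention to treat": people are counted under the UI they were assigned, even if they clicked the link to switch.

```python
import hashlib

def assigned_arm(user_id: str, experiment: str, treatment_fraction: float) -> str:
    """Deterministically assign a user to the 'new' or 'old' UI.

    Hypothetical helper: hashing (experiment, user_id) gives each user a
    stable position in [0, 1), so ramping treatment_fraction up never
    flips anyone from 'new' back to 'old'.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 bits, scaled to [0, 1)
    return "new" if bucket < treatment_fraction else "old"

def intention_to_treat(records):
    """Compare success rates by *assigned* arm.

    records: iterable of (assigned_arm, sold) pairs, where sold is a
    hypothetical per-user success flag. Users who switched UIs by hand
    still count toward the arm they were assigned.
    """
    totals = {"new": [0, 0], "old": [0, 0]}  # arm -> [successes, users]
    for arm, sold in records:
        totals[arm][0] += int(sold)
        totals[arm][1] += 1
    return {arm: (s / n if n else None) for arm, (s, n) in totals.items()}
```

Grouping by assignment instead of by which UI people ended up on is what keeps the self-selection problem from the earlier "click the link" step out of the comparison.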

Comment by jkaufman on Experiments and Consent · 2019-11-11T13:41:17.220Z · score: 2 (3 votes) · LW · GW

Changing UIs has costs to users. So does charging for your service. Is charging for your service unethical? Think about the vast amount of frustration caused by people not having enough money, just so the company can shovel even more money onto already overpaid CEOs. (Want to modus again?)

I do think companies should seriously consider the disruption UI changes cause, just like they seriously consider the disruption of price increases, and often it will make sense for the company to put in extra development to save their users' frustration. For example, for large changes like the ~2011 Gmail redesign you can have a period of offering both UIs with a toggle to switch between them. (And stats on how people use that toggle give you very useful information about how the redesign is working.)

Companies that followed your suggestions would, over the years, look very dated. Their UIs wouldn't be missing features, exactly, but their features would be clunky, having been patched onto UIs that were designed around an earlier understanding of the problem. As the world changed, and which features were most useful to users changed, the UI would keep emphasizing whatever was originally most important. Users would leave for products offered by new companies that better fit their needs, and the company would especially have a hard time getting new users.

Comment by jkaufman on Experiments and Consent · 2019-11-11T13:15:39.305Z · score: 4 (2 votes) · LW · GW

This would not be safe to do in any car.

There are now actual driverless cars in Phoenix that you can hail. If they get into an emergency situation they need to resolve it entirely on their own because there isn't time to bring anyone else in.

The step before this was probably having a safety driver in the car who isn't expected to take over immediately, but can do things like move the car to the side of the road after an emergency stop. In that case the person in the driver's seat spending most of their time reading their phone would be safe.

Comment by jkaufman on Experiments and Consent · 2019-11-11T00:49:07.097Z · score: 18 (9 votes) · LW · GW

Giving different results to different people for the same input is unethical.

You're going to need to give more justification for this. Here are some examples that I think even someone who's skeptical should be ok with:

  • If we both get mystery-flavor dum-dum lollipops they won't taste the same.

  • If we both open packs of Magic cards you might get much better cards than I do.

  • If we search Gmail for a phrase we'll get different results.

  • If we search Facebook for "John Smith" we should see different profiles, since FB considers the friend graph in ranking responses.

  • If I search Amazon for "piezos" it shows me piezo pickup disks, but if I search it in an incognito window I get "Showing results for piezas". This is because it has learned something about what sort of products I'm likely to want to buy.

  • If we ask for directions on Waze we may get different routings. All the routes it sends people on are reasonable ones (as far as it knows) and you get much better routing than you'd get from a hypothetical Waze that didn't have all its users as an experimental pool.

You give two arguments:

Even in just the online realm, it can cause major issues for people with learning disabilities or older people who aren't able to deal with change. If they need help with software, it can be a blocker for them if what they experience is different from what they see in help pages or on other people's computers.

It sounds like you're mostly talking about user-interface experiments? Like, if Tumblr shows me different results than it shows you that doesn't limit your ability to help me, or my ability to use help pages. Even just with UI experiments, your argument proves too much: it says it's unethical for companies to ever change their UI. Now people who are used to it working one way all need to learn how to use the new interface. And all the Stack Overflow answers are wrong now. But clearly making changes to your UI is ok!

If either A or B is better for the user, they are getting discriminated against by the random algorithm that chooses which version of the software to show them.

Companies run A/B tests when they don't know which of A or B is better, and running these tests allows them to make products that are better than if they didn't run the tests. Giving everyone worse outcomes to make sure everyone always gets identical outcomes would not be an improvement.

Are there other reasons behind your claim?

Comment by jkaufman on Lite Blocking · 2019-11-06T13:22:14.837Z · score: 2 (1 votes) · LW · GW

I think your two examples of abusing the feature can be more easily and subtly done today:

  • If you hate me and want to spread rumors you could just share with "friends except jkaufman". To your other friends it just looks like a normal post with restricted visibility. You can do this today on FB, either directly or by adding me to your "restricted" list (which is automatically carved out from posts shared to "friends"). You can also just block me and expect reasonably that no one will notice.

  • If you want to make me look bad by association and you can convince me to accept a friend request from your fake accounts, all you have to do is post boring content I won't interact with. The network will quickly decide that these new friends are not interesting to me, and then when they later start posting hateful things I won't notice.

Comment by jkaufman on Lite Blocking · 2019-11-06T12:19:59.368Z · score: 2 (1 votes) · LW · GW

Users liking / interacting with things is a strong leading indicator of engagement and time spent, and you get it on a per-item basis. So you use those predictions heavily in deciding what to show people, but tune your model based on your larger scale metrics like time spent.
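
As a toy illustration of that split (a sketch with made-up names and numbers, not any real network's ranker): each item carries per-item predicted interaction probabilities, and the feed is ordered by a weighted sum of them. The weights are the part you would tune against the larger-scale metrics, for example by A/B testing different weight settings and comparing time spent.

```python
def rank_items(items, weights):
    """Order items by a weighted sum of predicted interaction probabilities.

    items: list of dicts like {"id": ..., "predictions": {"like": 0.2, ...}}
    weights: dict mapping interaction type to its weight. Both shapes are
    hypothetical; the weights stand in for whatever gets tuned against
    larger-scale metrics such as time spent.
    """
    def score(item):
        return sum(w * item["predictions"].get(kind, 0.0)
                   for kind, w in weights.items())
    return sorted(items, key=score, reverse=True)
```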

Comment by jkaufman on Speaking up publicly is heroic · 2019-11-06T03:43:27.794Z · score: 3 (2 votes) · LW · GW

When people speak up publicly and share their experience, everyone can make their own judgement about how to respond. In the two cases I refer to in this post the community response was pretty obvious, and there was a lot of corroboration. I can think of other cases of someone speaking up where it was less clear, and people reacted in a range of ways from "I'm going to stop interacting with this person because they hurt someone" to "I'm going to keep an eye on this person and be extra alert for other potentially harmful behavior" to "I don't think this report is credible and I'm going to ignore it".

While this isn't ideal in various ways, it does seem reasonably robust to false accusations.