Sorry for the downtime, looks like we got DDoSed

post by habryka (habryka4) · 2024-12-02T04:14:30.209Z · LW · GW · 13 comments

We were down between around 7PM and 8PM PT today. Sorry about that.

It's hard to tell whether we got DDoSed or someone just wanted to crawl us extremely aggressively, but at least a few hundred IP addresses with random user agents requested a lot of quite absurd pages, in a way that was clearly designed to evade bot detection and blocking.

I wish we were more robust to this kind of thing, and I'll be monitoring things tonight to prevent it from happening again, but it would be a whole project to make us fully robust to attacks of this kind. I hope it was a one-off occurrence. If that isn't the world we live in, I think we can figure out how to become robust to repeated DDoS attacks, though I do think it would mean strapping in for a few days of spotty reliability while we work that out.

Sorry again, and boo for the people doing this. It's one of the reasons why running a site like LessWrong is harder than it should be.

13 comments

comment by Zolmeister · 2024-12-02T09:48:14.699Z · LW(p) · GW(p)

I recommend Cloudflare.

Replies from: habryka4, programcrafter
comment by habryka (habryka4) · 2024-12-02T10:09:00.466Z · LW(p) · GW(p)

Yeah, we considered setting up a Cloudflare proxy for a while, but at least for logged-in users, LW is actually a really quite dynamic and personalized website, and not a great fit for it (I do think it would be nice to have a logged-out version of pages available on a Cloudflare proxy somehow).

Replies from: Zolmeister
comment by Zolmeister · 2024-12-02T11:29:17.248Z · LW(p) · GW(p)

I was referring to their (free) DDoS protection service, rather than their CDN services (also free). In addition to their automated system, you can manually enable an "under-attack" mode that aggressively captchas requests.

Setup is simply a matter of pointing your DNS nameservers at Cloudflare. Caching HTML pages for logged-out (i.e. cookie-less) users is a trivial config ("cache-everything").
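
For illustration, the "cache only cookie-less requests" rule can be sketched in a few lines of Python. This is not Cloudflare's configuration syntax or API; the function name `is_cacheable` and the choice to treat any cookie as a logged-in session are assumptions made for the sketch.

```python
def is_cacheable(method: str, headers: dict[str, str]) -> bool:
    """Return True if a request could be answered from a shared cache.

    Hypothetical helper: it only illustrates the decision rule, not any
    real CDN's behavior.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Only plain reads are candidates for caching.
    if method.upper() not in ("GET", "HEAD"):
        return False

    # Assumption: any cookie at all means a personalized (logged-in) session.
    if normalized.get("cookie", "").strip():
        return False

    return True
```

Under this rule, a crawler hitting pages without cookies would be served from cache, while logged-in users would still reach the origin server.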

Replies from: habryka4
comment by habryka (habryka4) · 2024-12-02T17:51:00.589Z · LW(p) · GW(p)

Oh, interesting. I had not properly realized you could unbundle these. I am hesitant to add a hop to each request, but I do sure expect Cloudflare to be fast. I'll look into it, and thanks for the recommendation.

comment by ProgramCrafter (programcrafter) · 2024-12-02T15:19:53.155Z · LW(p) · GW(p)

It's a solution! However, it comes with its own downsides. For instance, Codeforces users complained about Cloudflare usage for a while, highlighting the following points (mapped to LessWrong):

  • The purpose of an API is defeated: even the API endpoints on the same domain are restricted, which prevents users from requesting posts via GraphQL. In particular, ReviewBot would go down (or have to be hosted on LW's internal infrastructure).
  • In China, Cloudflare is a big speed bump.
  • Cloudflare-protected sites are reported to randomly lag a lot.
    > I had been assuming that this is a server problem, but from talking to some people it seems like this is an issue with differential treatment of who is accessing CF.
    The lack of smooth interaction might be really noticeable for new users, compared to the current state.
comment by htfr · 2024-12-03T17:09:58.212Z · LW(p) · GW(p)

If you have the developer time for it, have you considered building a cryptocurrency-based firewall? Pay $1 to whitelist your IPv6 range in the firewall.

What to do with non-whitelisted IPs is up to you, you could limit the bandwidth for them.

I suggest this because the endgame of the IP address doxxing performed by companies like Cloudflare is the death of anonymity on the internet. Each ISP has a finite IP range and a finite number of optical fiber cables, so there are only so many times someone can change their IP address.

(Sure the NSA probably knows who you are anyway, but IP ranges mapped to real names by random companies are eventually going to end up sold on the dark web to basically anyone with money.)
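
As a rough sketch of the lookup side of the whitelist idea above (leaving the payment part out entirely), Python's standard `ipaddress` module can check whether a client falls inside a whitelisted IPv6 range. The range, the two bandwidth figures, and the function name `bandwidth_limit_kbps` are all made up for illustration.

```python
import ipaddress

# Hypothetical whitelist of paid-for ranges. 2001:db8::/32 is the IPv6
# prefix reserved for documentation, so this matches no real network.
WHITELIST = [ipaddress.ip_network("2001:db8:1234::/48")]

# Assumed limits, chosen arbitrarily for the sketch.
FULL_RATE_KBPS = 10_000
LIMITED_RATE_KBPS = 100


def bandwidth_limit_kbps(client_ip: str) -> int:
    """Return the bandwidth cap for a client based on whitelist membership."""
    addr = ipaddress.ip_address(client_ip)
    for network in WHITELIST:
        # Compare only addresses of the same IP version as the range.
        if addr.version == network.version and addr in network:
            return FULL_RATE_KBPS
    # Non-whitelisted clients are throttled rather than blocked.
    return LIMITED_RATE_KBPS
```

For example, `bandwidth_limit_kbps("2001:db8:1234::1")` falls inside the whitelisted /48 and gets the full rate, while any IPv4 address gets the limited one.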

Replies from: Dagon
comment by Dagon · 2024-12-03T23:10:12.656Z · LW(p) · GW(p)

Interesting thought.  I tend to agree that the endgame of ... protection from scalable attacks in general ... is lack of anonymity.  Without identity, there can be no memory of behavior, and no prevention of abuse that's only harmful across multiple events/sources.  I suspect it's a long way out, though.

Your proposed solution (paid IP whitelisting) is pretty painful - the vast majority of real users (and authorized scrapers) don't have a persistent enough address, or at least don't know that they do, to participate.

Replies from: sjadler
comment by sjadler · 2024-12-03T23:49:51.430Z · LW(p) · GW(p)

Hi! Created a (named) account for this - in fact, I think you can conceptually get some of those reputational defenses (memory of behavior; defense against multi-event attacks) without going so far as to drop anonymity / prove one's identity!

See my Twitter thread here, summarizing our paper on Personhood Credentials.

Paper's abstract:

Anonymity is an important principle online. However, malicious actors have long used misleading identities to conduct fraud, spread disinformation, and carry out other deceptive schemes. With the advent of increasingly capable AI, bad actors can amplify the potential scale and effectiveness of their operations, intensifying the challenge of balancing anonymity and trustworthiness online. In this paper, we analyze the value of a new tool to address this challenge: "personhood credentials" (PHCs), digital credentials that empower users to demonstrate that they are real people -- not AIs -- to online services, without disclosing any personal information. Such credentials can be issued by a range of trusted institutions -- governments or otherwise. A PHC system, according to our definition, could be local or global, and does not need to be biometrics-based. Two trends in AI contribute to the urgency of the challenge: AI's increasing indistinguishability from people online (i.e., lifelike content and avatars, agentic activity), and AI's increasing scalability (i.e., cost-effectiveness, accessibility). Drawing on a long history of research into anonymous credentials and "proof-of-personhood" systems, personhood credentials give people a way to signal their trustworthiness on online platforms, and offer service providers new tools for reducing misuse by bad actors. In contrast, existing countermeasures to automated deception -- such as CAPTCHAs -- are inadequate against sophisticated AI, while stringent identity verification solutions are insufficiently private for many use-cases. After surveying the benefits of personhood credentials, we also examine deployment risks and design challenges. We conclude with actionable next steps for policymakers, technologists, and standards bodies to consider in consultation with the public.

Replies from: Dagon
comment by Dagon · 2024-12-04T00:40:54.696Z · LW(p) · GW(p)

This seems just like regular auth, just using a trusted 3P to re-anonymize.  Maybe I'm missing something, though.  It seems likely it won't provide much value if it's unbreakably anonymous (because it only takes a few stolen credentials to give an attacker access to fake-humanity), and doesn't provide sufficient anonymity for important uses if it's escrowed (such that the issuer CAN track identity and individual usage, even if they currently choose not to).

Replies from: sjadler
comment by sjadler · 2024-12-04T03:47:30.775Z · LW(p) · GW(p)

Yeah I appreciate the engagement, I don’t think either of those is a knock-down objection though:

The ability to illicitly gain a few credentials —> >1 account is still meaningfully different from being able to create ~unbounded accounts. It is true this means a PHC doesn’t 100% ensure a distinct person, but it can still be a pretty high assurance and significantly increase the cost of doing attacks that depend on scale.

Re: the second point, I’m not sure I fully understand - say more? By our paper’s definitions, issuers wouldn’t be able to merely choose to identify individuals. In fact, even if an issuer and service-provider colluded, PHCs are meant to be robust to this. (Devil is in the details of course.)

comment by Chipmonk · 2024-12-02T04:17:30.234Z · LW(p) · GW(p)

another weird bug is if i click the link i was just sent in my email, it brings me to a 403 Forbidden page (even though the URLs of this functional page and that 403 page look identical)

Replies from: habryka4
comment by habryka (habryka4) · 2024-12-02T04:20:51.862Z · LW(p) · GW(p)

Should now be fixed. We've blocked traffic to basically all pages and been restoring them incrementally to make sure we don't go down again immediately. I just lifted the last of those blocks.

Replies from: Chipmonk
comment by Chipmonk · 2024-12-02T04:22:40.197Z · LW(p) · GW(p)

works!