Preventing, reversing, and addressing data leakage: some thoughts

post by VipulNaik · 2022-11-15T02:09:22.482Z · LW · GW · 4 comments

Contents

  Prevention strategies: general philosophy
    The accident triangle philosophy and the conjunctive nature of accidents
    Avoid multitasking when handling sensitive data
    Develop clipboard hygiene
      The failure scenario
      Clipboard-clearing
      Making clipboard-clearing second nature
      Cutting down on dangerous clipboard use in the first place
      Cutting down on accidental pasting
    Pause and check before hitting Enter or send
    Double-check when sharing documents or putting them in shared folders
    Be careful about screensharing
    Take additional precautions when using shared computers (such as at a cybercafe or printing shop)
    Pay attention to warnings and color coding
    Figure out how tagging interacts with permissions and notifications
  Reversal strategies
    Undo send for emails and messages
    Undo bad edits to shared documents
    Remove sensitive information from histories
    Undo bad edits to posts or comments on online fora
    Reverse local changes within git repositories
  Detecting and monitoring leakage
    Anomaly-checking in the data you publish
    Activity logs and security alerts for services
  Addressing leakage of password or credentials
    Change credentials where feasible (after reversing whatever you can reverse)
    Check for any alerts or logs showing unauthorized access over the period where the credentials were leaked
    Review after being done to make sure you've covered your bases
  Addressing leakage of factual information
    Check if others accessed the information
    Figure out what kind of secret data it was from the person it leaked to
      The benign case of not-very-actionable information they don't and shouldn't know but they know about
      The trickier case of information that is genuinely surprising and actionable to them
    Explicitly discuss if you're sure they saw it
    Make a judgment call about explicit sharing if you aren't sure if they saw it or not
    What can you do in advance to make addressing leakage easier?
    More on the point about not making enemies, and the need for extra caution if making enemies is inevitable
  Meta comments
    Why think about this in advance? Object-level and meta-level reasons
    Tool-specific guidance versus the general ideas
    Am I encouraging people to hide factual information more efficiently?

In the last few months, I've been thinking about the problem of accidental leakage of data and how to prevent it, reverse it, and address its aftermath. This post includes various thoughts, some of them including tool-specific guidance, and some of them more general.

Examples of data leakage are as follows:

  • Leakage of a password or other sensitive credential -- for instance, accidentally pasting a password into a chat message.
  • Leakage of sensitive factual information -- for instance, accidentally sharing a document containing confidential plans with the wrong people.

The prevention steps are fairly similar for both cases, but the steps for addressing are somewhat different. Specifically, for the case of passwords or sensitive credentials, changing the credential is part of the best-practice response. For factual information, on the other hand, it is often infeasible to change the facts, since they're the facts!

This post is largely focused on leakage in the digital realm. Some similar ideas apply in the physical realm as well.

This post also doesn't cover large-scale data leakage, nor does it cover complementary best practices such as password security, other forms of authentication security (such as IP-based limits and two-factor authentication) and encryption. I might write additional posts about some of those topics, but those topics are in general more widely covered, hence my desire to write what I'm writing first.

I end the post with meta comments including more on its potential relevance to LessWrong as well as the distinction between general ideas and details specific to particular tools and services.

Prevention strategies: general philosophy

The accident triangle philosophy and the conjunctive nature of accidents

The accident triangle idea is that for every accident with major injury, there are several accidents with minor injury and even more accidents with no injury. A similar idea applies here: for every case of leaked sensitive data, there are several cases where something went wrong, but sensitive data didn't end up being leaked. Since sensitive data leakage is a low-probability-but-high-cost occurrence, we don't get a lot of data points for it.

One underlying insight here is that a major failure such as accidental data leakage is often conjunctive: for instance, an accidental data leakage through accidental copy/paste is a conjunction of several things:

  • Sensitive data is sitting in the clipboard.
  • You hit the paste shortcut accidentally.
  • The place you paste into is visible to people who shouldn't have the data.
  • You don't notice and reverse the paste before others see it.

The conjunction of all these is uncommon/unlikely. But we can think of each of these as an accident-with-no-injury, and independently try to reduce the likelihood of each of them. This will reduce the overall probability of leakage due to accidental copy/paste to a negligible amount.

Avoid multitasking when handling sensitive data

In general, it's a bad idea to multitask between handling sensitive data and doing things in a completely different domain. For instance, it's a bad idea to be chatting with friends while changing a password, or composing a public tweet while editing a private doc with sensitive information. There's a possibility of getting mixed up.

Also, it's important to not do anything with sensitive data while on a conference call, especially if screensharing. It's probably also better to not work with sensitive data while in a shared physical space with people who shouldn't have access to the data, due to the risks of people seeing the data by peeking at your screen.

Develop clipboard hygiene

The failure scenario

Your computer's clipboard is the thing that gets pasted when you paste (using Ctrl+V on Windows, Cmd+V on Mac, or Ctrl+Y in emacs; Linux applications vary in which of these shortcuts they support). You put stuff in the clipboard by executing a cut (Ctrl+X on Windows, Cmd+X on Mac, Ctrl+W in emacs) or a copy (Ctrl+C on Windows, Cmd+C on Mac, Meta+W in emacs).

An interesting property of clipboards is that pasting doesn't clear the clipboard; the item remains in the clipboard until it is overwritten by something else being cut or copied. This means that a scenario like this is not uncommon:

  • You copy a password or other sensitive data and paste it where it's needed.
  • The data stays in your clipboard after you're done with it.
  • Later, in an unrelated context (say, drafting a chat message), you hit the paste shortcut accidentally or out of habit.
  • The sensitive data lands in the message, and if you don't notice it before hitting send, it leaks.

Clipboard-clearing
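
Clearing the clipboard right after pasting sensitive data caps the window during which an accidental paste can leak it. As a minimal sketch, here's a small script that overwrites the clipboard with an empty string; it assumes common clipboard utilities (pbcopy on macOS, xclip on X11 Linux, clip.exe on Windows), so adjust for your setup:

```sh
#!/bin/sh
# Clear the clipboard by overwriting it with an empty string.
case "$(uname -s)" in
  Darwin)
    printf '' | pbcopy ;;                      # macOS
  Linux)
    printf '' | xclip -selection clipboard ;;  # X11; use wl-copy on Wayland
  *)
    printf '' | clip.exe ;;                    # Windows (e.g., via WSL or Git Bash)
esac
```

Binding such a script to a keyboard shortcut (or running it on a timer) makes clearing cheap enough to do every time. Some password managers, such as KeePassXC, clear the clipboard automatically after a short timeout.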

Making clipboard-clearing second nature

The way to make this reliable is to make it a habit: clear the clipboard immediately after every paste of sensitive data, every time, so that the action becomes automatic rather than something you have to remember in the moment.

Cutting down on dangerous clipboard use in the first place

In general, it's best not to have sensitive data in the clipboard in the first place. Consider some alternatives:

  • If you're pasting sensitive data such as a password or card number for regular entry of that password, consider other options such as using the browser autofill or a password manager.
  • For one-off entry, consider typing the data in manually rather than copying and pasting it.

Cutting down on accidental pasting

The paste shortcut key (Ctrl+V on Windows, Cmd+V on Mac) is close to several other shortcut keys, including the shortcuts for copy (Ctrl+C/Cmd+C) and bold (Ctrl+B/Cmd+B). So there's a good chance of accidentally pasting when you're trying to copy or bold.

If your clipboard is clear of sensitive information in the first place, accidental pasting causes limited damage. But based on the accident triangle philosophy, it's best practice to try to cut down on accidental pasting as much as possible, especially when switching contexts or using a collaboratively edited doc such as a Google Doc, or while sharing screen (accidental pasting while within the flow of editing a document locally and privately is not that big a deal).

The main way to cut down on accidental pasting is a mix of (a) improving the precision of one's keyboard use in general, and (b) going a little slower whenever using the keys to cut, copy, paste, or bold -- e.g., looking at the keyboard rather than touch-typing, even if you normally touch-type.

Pause and check before hitting Enter or send

Always pause and check what you're sending before you hit Enter or send, whether you're using email or a messaging tool. In general, I recommend a gap of at least 2-3 seconds between typing your message and sending it, as it gives you enough time to react to anything weird you see in what you're sending.

I generally check the following:

  • The list of recipients: the right people, and only the right people.
  • The content of the message, particularly anything that was pasted in.
  • Any attachments or links, and whether the recipients should have access to them.

Where applicable, I also try to use preview functionality.

This is an important practice even outside the issue of sensitive data leakage; sending a message to a wrong recipient delays the right recipient receiving it, while also confusing the wrong recipient.

Double-check when sharing documents or putting them in shared folders

Sharing documents in Google Drive or a similar online service for document collaboration is a bit like sending, and the same cautions apply as with sending emails. (Maybe a bit less, because you have the ability to edit the document after sharing, but you still don't want to share sensitive data.)

A silent way of sharing documents is to move them into a shared folder. This generally doesn't proactively notify recipients (though you should double-check this for the specific service you're using), but it still makes the documents accessible to them, and they'll see the documents if they happen to view the folder later. Be careful when putting documents in shared folders -- if in doubt, check who has access to the folder, and that it matches the people you want to have access to the document. NOTE: Google Drive does warn you about this when moving a document to a shared folder; in general, it's a good idea to read and ponder such warnings.

Be careful about screensharing

These suggestions apply when screensharing on a conference call (such as Zoom, Google Meet, or Slack Huddle), and also when sharing your screen physically, such as in a physical conference room:

  • Where possible, share a single window or tab rather than your entire screen.
  • Before sharing, close or minimize anything sensitive, and silence notifications that might pop up mid-share.
  • Don't open or work with sensitive data while the share is active (see the earlier point about multitasking).

Take additional precautions when using shared computers (such as at a cybercafe or printing shop)

You may sometimes need to use a shared computer, such as one at a cybercafe, for printing and scanning documents. It is possible that software on these machines (such as keyloggers) is surreptitiously collecting your login credentials. (I don't know of any incidents where this has happened to me or anybody I know, so it's likely fairly rare.)

For added caution, I recommend the following:

  • Avoid logging in to sensitive accounts from the shared computer; where possible, get the document to the machine some other way, such as a USB drive.
  • If you do log in, use a private/incognito window and log out when you're done.
  • Treat any password you typed on the shared machine as potentially compromised, and change it afterward from a trusted device.

Pay attention to warnings and color coding

Many services warn you when they detect something potentially risky in what you're sending. For instance, Gmail warns you if the email you're trying to send includes links to Google Drive content that the recipients don't have access to. Pay attention to these warnings, as they can help prevent accidental sends to the wrong parties.

Gmail also uses color coding and banners in various ways to indicate cases where it suspects that you're sending a message to the wrong recipients. Pay attention to these colors and banners! For instance, if you're using Google Workspace, your work Gmail will color-code recipients who are outside your organization, so you can see at a glance which recipients aren't in your organization. Similarly, Google Drive issues a warning when you try to share a document with somebody outside your workspace.

In Slack, if you're using Slack Connect to communicate with people from a different workspace, Slack will show a message right above your message-drafting area reminding you that the other person is from a different organization. Pay attention to this.

Figure out how tagging interacts with permissions and notifications

Many services (social media such as Facebook or Twitter, messaging tools such as Facebook Messenger or Slack, collaborative editing tools such as Google Docs) allow for the possibility of tagging people or other entities. Services can differ in the way that tagging interacts with permissions and notifications.

In general, it seems safest not to tag entities that you don't want seeing the content in which you're tagging them, unless you are very sure that the service does not notify them about being tagged.

Reversal strategies

Undo send for emails and messages

Many email and messaging tools offer some version of "undo send"; the details, and their implications for sensitive data, vary quite a bit by tool (see the comparison of Gmail, Facebook Messenger, and Slack near the end of this post). Learn what your tools support and how to trigger it quickly, since the window to act is often short.

Undo bad edits to shared documents

When you make a bad edit to a document in Google Docs, Google Slides, or Google Sheets, you should reverse it immediately without navigating away from the tab where you're editing. The reason for this is two-fold:

  • Anybody viewing the document at the time sees your edit in (close to) real time, so the faster you reverse it, the smaller the window of exposure.
  • These tools batch rapid consecutive edits made in the same session into a single entry in the version history; if you undo immediately without navigating away, the bad edit is much less likely to be preserved as a separate entry that others can later view.

If nobody else is viewing the document at the time you make your undesired edit, and you reverse the edit immediately without switching tabs, it won't get into the history and will not be visible to others. You should confirm this later by reviewing the history of the document in its most expanded form.

Remove sensitive information from histories

Even after you reverse an edit or delete a message, the sensitive information may survive in a history: a document's version history, a message's edit history, or a repository's commit history. Check what histories the tool keeps, who can view them, and whether the tool offers a way to remove the offending entries. (For instance, making a fresh copy of a Google Doc gives you a document without the original's version history.)

Undo bad edits to posts or comments on online fora

This applies to many online fora; the one I have the most experience with is GitHub. On GitHub, for instance, edited comments retain an edit history that others can view, though sensitive revisions can be deleted from that history.

It's generally good to investigate what is and isn't allowed by a forum in advance, so that when you do get in the position of having entered sensitive data, you can quickly choose the optimal approach.

Reverse local changes within git repositories

If you made a sensitive change in a local clone of a git repository that is shared with others (e.g., on GitHub), the change can be reversed cleanly as long as it has not yet been pushed to the remote. Here's some guidance:
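
A minimal sketch of the common cases, using standard git commands (the file paths are placeholders):

```sh
# Case 1: the change is in a tracked file but not yet staged.
git restore path/to/file           # older git: git checkout -- path/to/file

# Case 2: the change is staged but not yet committed.
git restore --staged path/to/file  # unstage it...
git restore path/to/file           # ...then discard it

# Case 3: the change is committed locally but not yet pushed.
git reset --hard HEAD~1            # drop the last commit and all its changes
# Or, to keep the commit's other changes while removing the sensitive part:
git reset --soft HEAD~1            # rewind the commit, keep changes staged
```

If the change has already been pushed, reversal is much harder: you'd need history-rewriting tools plus a force push, and you should assume that anybody who pulled in the interim may have seen the data.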

Detecting and monitoring leakage

What happens if you leak sensitive data and don't even notice that you did so? That's pretty dangerous, because it means that you can't even take the appropriate action to reverse and address it.

Anomaly-checking in the data you publish

I have a few lines of shell script in one of my scripts (which I run regularly, several times a day) that scan my personal git repositories to make sure they don't include any keywords related to my day-job work. This addresses the possibility that I might accidentally type some work-related content into a file in one of my personal repositories. I have some other anomaly checks in a similar spirit.
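
As an illustrative sketch of what such a check might look like -- the keyword list and repository paths below are placeholders, and the options assume GNU grep:

```sh
#!/bin/sh
# Hypothetical anomaly check: scan personal repositories for keywords
# related to day-job work. KEYWORDS and REPOS are placeholders.
KEYWORDS='acme-corp|internal-codename|work-vpn'
REPOS="$HOME/repos/personal-site $HOME/repos/notes"

status=0
for repo in $REPOS; do
  # -r recursive, -E extended regex, -i case-insensitive, -l filenames only
  if grep -rEil --exclude-dir=.git "$KEYWORDS" "$repo"; then
    echo "WARNING: possible work-related content in $repo" >&2
    status=1
  fi
done
exit $status
```

Running something like this on a schedule (e.g., via cron) turns a leak you might never notice into one that gets flagged within hours.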

Things of this kind can be helpful for detecting leaked information that might otherwise be missed.

Activity logs and security alerts for services

If your credential for a service gets leaked, and somebody uses that leaked credential to access the service, it's good to be set up to receive a security alert. For instance, Google sends a security alert for suspicious logins to both the account's email address and its recovery email address. Note that such alerts only catch a leak once others act on it; they won't catch a latent leak that nobody has acted upon.

Addressing leakage of password or credentials

Change credentials where feasible (after reversing whatever you can reverse)

You should assume that any password or credential is compromised if you think others might have seen it (for instance, you included it in a Slack message and then deleted the message, but aren't sure if people read the message before you deleted it). Therefore, you should change it as soon as feasible. If you are using the same or a highly similar password or credential elsewhere, you should change that too.

This is made easier if:

  • You use a password manager, so generating and storing a replacement password is quick.
  • You don't reuse the same (or a highly similar) password across services, so one leak doesn't force many changes.
  • Credentials are scoped per-person and per-service, so you can swap yours out without affecting others.

Check for any alerts or logs showing unauthorized access over the period where the credentials were leaked

Some services offer details on recent login activity, or the time of recent use of access keys. Take a look at these to check if the leaked credential was used over the time period before it was changed.

Review after being done to make sure you've covered your bases

After you've done whatever reversal and credential changes you need to, you should sit down and review the situation, systematically going over what happened to see if things are back to a secure state. If there are other trusted people with whom you can discuss the situation, please do so.

Addressing leakage of factual information

Check if others accessed the information

This is most relevant to factual information that, unlike credentials, cannot simply be changed.

In case of factual information that others were not supposed to have, you may be able to check if they were able to access the information without explicitly asking them. Some cases:

  • For documents in Google Docs and similar tools, an activity dashboard may show who has viewed the document and when (availability depends on the service and account type).
  • For messages you sent and then deleted, read receipts or "seen" indicators may tell you whether the recipient saw the message before you deleted it.
  • For emails, there is usually no reliable way to tell whether the recipient has read the message.

Figure out what kind of secret data it was from the person it leaked to

The benign case of not-very-actionable information they don't and shouldn't know but they know about

An example here is that you accidentally share information about your company's quarterly revenue with a work colleague who is not supposed to have access to the data. Then you detect the mistake and delete the data. There's nothing particularly surprising about the data you accidentally shared; it's just that, as a matter of policy, your colleague shouldn't have access to it. Your colleague is also aware that such data exists, and that you have access to the data -- the only thing they didn't know is the specifics.

The trickier case of information that is genuinely surprising and actionable to them

Let's say you are a senior executive at a company and have been tasked with figuring out which half of the people in your division to fire. If you accidentally share musings/thoughts related to this with one of the people under you (who might end up being in the line of fire, even if it's unclear right now), this is a case of information that is actionable to them. This is a tricky situation to be in, and here you have to exercise a judgment call between taking them into your confidence explicitly versus accepting the risk that they might have seen the information.

In this case, the other person may feel stress and fear about the situation. Depending on how they model you, and what they expect from you, they may also feel disappointed in you for not sharing the information with them earlier. Finally, if you only leaked to a subset of the affected people, you may create asymmetries between them (e.g., now some of the people who may be fired know they may be fired, and others don't) and the possibility of further leaks.

Explicitly discuss if you're sure they saw it

If you're sure the other person saw it, it seems best to explicitly discuss.

In the case of "not-very-actionable information they don't and shouldn't know but they know about" it may be as simple as saying "Hey, I accidentally leaked this info to you; you may have seen it; please ignore and delete it. Sorry!"

In the "trickier case of information that is genuinely surprising and actionable to them", you'll need more of a strategy. The exact strategy you choose partly depends on the reasons you kept the information secret in the first place.

Make a judgment call about explicit sharing if you aren't sure if they saw it or not

If you were able to re-hide factual information that you temporarily leaked, and aren't sure if people saw it during the interim, you have to make a judgment call: do you operate under the assumption that they probably didn't see it (and it's as good as if you had never leaked it), or do you act on the assumption that they might have seen it? The specifics here will vary based on the situation.

In the "benign case of not-very-actionable information they don't and shouldn't know but they know about" it doesn't matter too much; it may be easiest to just not say anything, but if they bring it up you can tell them you accidentally shared it and ask them to ignore or delete it.

In the "trickier case of information that is genuinely surprising and actionable to them", you have to choose between sharing with them and asking them to keep confidence, sharing more widely with all affected parties, and just keeping quiet until you have stronger evidence that they know.

What can you do in advance to make addressing leakage easier?

When hiding sensitive information over an extended period of time, you should assume some probability of leakage. Here are a few thoughts on how you can make addressing leakage easier:

  • Don't make enemies! In general, people are more likely to exacerbate a leakage (by leaking it further) if they don't like you. Don't give people reasons to hate you or want to get back at you.
  • As much as possible, try to minimize the number of cases where you're hiding from people "information that is genuinely surprising and actionable to them"; as much as possible, data that you hide should be of the form of "not-very-actionable information they don't and shouldn't know but they know about" (for instance, detailed financial statements that are consistent with the high-level picture they have of finances, but are being kept secret). As long as most secrets aren't things that are very actionable to the people who they get leaked to, these people will just ignore them (unless they hate you, which brings me back to the "Don't make enemies!" point).
  • Formulate contingency plans for how to deal with a leakage. In particular, think through what path you'll take if the information has definitely or probably leaked, per the preceding subsections. Think whether the fact that you had the information and kept this a secret will seem, in hindsight, to be a moral failing on your part. If it will, examine whether it really is the right thing to keep it secret.

More on the point about not making enemies, and the need for extra caution if making enemies is inevitable

I want to dig in on the point about not making enemies, because it's pretty important when thinking about the impact of any kind of mistake, whether it's data leakage or anything else. When you have enemies, there's more risk that they'll pounce on and exploit your mistakes. In contrast, if people are friendly with you and like and respect what you're doing, they're more likely to be forgiving of your faults, particularly if those faults don't directly hurt them.

There are, however, cases where the very nature of what you're doing means the other side is going to be hostile to you, and you have accepted that. In that case, you need to invest in a substantially greater level of caution, as the cost of data leakage is higher. One example is being a whistleblower in a terrible situation: it's very likely that the fact that you're blowing the whistle itself needs to be kept secret from the people you're blowing the whistle on.

Or, there could be cases where people don't have enmity against you personally, but you're a messenger of really bad news coming from elsewhere (for instance, you have information in advance about mass layoffs and have been told to not communicate this to any of the people in the pool that might be considered for layoffs). Here again, you can mitigate the impact to some extent by not additionally behaving in ways that make you personally unlikable. But given the base animosity of the situation, exercising additional caution is probably very important.

Meta comments

Why think about this in advance? Object-level and meta-level reasons

Why think about this sort of thing in advance, rather than wait to let a problem happen and then tackle it after it happens?

I think there are several object-level reasons to think about this in advance:

  • Prevention is largely a matter of habits (clipboard hygiene, pausing before hitting send), and habits need to be built before the high-stakes moment arrives.
  • The window for reversal strategies (such as undo send) is often short, so you need to already know what your tools support in order to act in time.
  • Addressing a leak is stressful; having contingency plans thought through in advance leads to better decisions under pressure.

There are also meta-level reasons that might be relevant for the LessWrong audience; data leakage is a concrete example of a low-frequency but high-cost disaster, and as such, we don't get a lot of day-to-day feedback around it. Thinking about this sort of thing offers relatively easy practice of security mindset as well as practice in reducing accidents. Such practice could be useful for avoiding even rarer and higher-stakes problems.

Tool-specific guidance versus the general ideas

In this post, I've included concrete details for several specific tools, such as Gmail, Messenger, Slack, git and GitHub, and Google Docs. These tools may change over time, and some of my advice may become outdated. Also, you might be using very different tools, so you may not be able to put the guidance to immediate use.

I wanted to put in specific guidance for tools that people are likely to use, in order to make the action items fairly concrete and make the post useful. But I also think the details are useful even if you don't happen to use these specific tools, because they provide a framework for thinking about the structure and design choices underlying such tools. Even if you use a different tool, it was likely designed under relatively similar design constraints.

For instance, Gmail, Facebook Messenger, and Slack all offer versions of "undo send" that are meaningfully different in their implications for how to deal with sensitive data:

  • Gmail's undo send works by briefly delaying the actual send (for a configurable window of up to about 30 seconds); if you undo within the window, the recipients never receive the message at all.
  • Facebook Messenger lets you unsend a message after delivery ("remove for everyone"), but the recipients may already have seen the message or its notification.
  • Slack lets you edit or delete messages after sending (subject to workspace settings); as with Messenger, deletion doesn't guarantee the message wasn't already seen, and notifications may have already gone out.

If you happen to use something different from all three of these, you at least have a framework and a set of comparison points, which can lead you to ask the right questions about what your service supports and what implications it has for the leakage of sensitive data.

Am I encouraging people to hide factual information more efficiently?

I think the idea of hiding credentials is relatively uncontroversial; it's good for security (even if others have access to the same resources you do, using separate credentials allows you to swap yours out without affecting others).

On the other hand, hiding factual information from others is not always good. It's clearly necessary in at least some cases, but we could also argue that there is a wide range of cases where it's done in service of wrong purposes. One could even argue that in some cases, it's better for the world if the information being hidden did get leaked.

I think that even though the post offers guidance on how to hide factual information more effectively, it also raises considerations that should encourage people to reduce reliance on hiding relevant factual information for wrong purposes. In particular, earlier in the post, I wrote:

  • Don't make enemies! In general, people are more likely to exacerbate a leakage (by leaking it further) if they don't like you. Don't give people reasons to hate you or want to get back at you.
  • As much as possible, try to minimize the number of cases where you're hiding from people "information that is genuinely surprising and actionable to them"; as much as possible, data that you hide should be of the form of "not-very-actionable information they don't and shouldn't know but they know about" (for instance, detailed financial statements that are consistent with the high-level picture they have of finances, but are being kept secret). As long as most secrets aren't things that are very actionable to the people who they get leaked to, these people will just ignore them (unless they hate you, which brings me back to the "Don't make enemies!" point).

[...]

  • Formulate contingency plans for how to deal with a leakage. In particular, think through what path you'll take if the information has definitely or probably leaked, per the preceding subsections. Think whether the fact that you had the information and kept this a secret will seem, in hindsight, to be a moral failing on your part. If it will, examine whether it really is the right thing to keep it secret.

I think these points probably push in the direction of not keeping material, actionable secrets from people for the wrong reasons.

4 comments

Comments sorted by top scores.

comment by riceissa · 2022-11-15T02:29:02.462Z · LW(p) · GW(p)

If you're pasting sensitive data such as a password or card number for regular entry of that password, consider other options such as using the browser autofill or a password manager.

Some password managers like KeePassXC automatically clear the clipboard after 10 seconds or when you close the program (whichever comes first).

comment by clone of saturn · 2022-11-15T04:56:04.740Z · LW(p) · GW(p)

This post is currently tagged "security mindset" but the advice seems close to the opposite of security mindset; it amounts to just trying to be extra careful, and if that doesn't work, hoping the damage isn't too bad. Security mindset would require strategies to make a leak impossible or at least extremely unlikely.

Replies from: VipulNaik
comment by VipulNaik · 2022-11-15T06:50:41.966Z · LW(p) · GW(p)

Good point -- I removed the tag!

comment by Dagon · 2022-11-15T18:29:21.660Z · LW(p) · GW(p)

It's worth considering additional layers beyond "make it less likely to leak".  Reducing the adversarial value of information is a very good strategy.  Use 2FA and change your important passwords regularly (and rotate immediately if you know of a leak).  Don't keep many secrets, if you can help it.  Or understand that leaks happen and be prepared to release MORE information if needed to ensure proper framing of whatever socially-painful leak happens.