In software engineering, what are the upper limits of Language-Based Security?

post by MakoYass · 2020-12-27T05:50:46.772Z · LW · GW · 5 comments

This is a question post.

Given the most security-amenable language you know, or that you could imagine building (or given some tools and processes that change the qualities of an existing language), by what proportion could we reduce the incidence rate of major security errors? In what domains could incidence be reduced all the way to zero?

Are there situations where your more secure language can't currently be used, because it imposes higher development or runtime costs?

Answers

answer by Darmani · 2020-12-27T06:52:40.819Z · LW(p) · GW(p)

It is already possible to build an embedded language inside of Coq which requires arbitrary security proofs for all executable code. It's theoretically possible to make those proofs guard against all known side channels, including, theoretically, hardware-based side channels such as Row Hammer.

Are you asking about which kinds of attacks can't be stopped by improving software? Or are you asking about the theoretical limits of PL technology? The latter question is not so interesting without  more constraints: as stated above, they're basically unbounded.

comment by MakoYass · 2020-12-27T23:19:17.486Z · LW(p) · GW(p)

(If you think the question is too underspecified to answer, you probably shouldn't try to post an answer in the answers section. There is a comments section.)

(I'll try to work this into the question description)

Are you asking about which kinds of attacks can't be stopped by improving software?

That would be an interesting thing to see discussed, sure.

Or are you asking about the theoretical limits of PL technology?

No, that might be interesting from the perspective of.. what kinds of engineering robustness will exist at the limits of the singularity (this topic is difficult to motivate, but I hope the reader would agree that we should generally try to make forecasts about cosmically large events even when the practical reasons to do it are not obvious. It seems a-priori unlikely that questions of, say, what kinds of political arrangements are possible in a post-organic galaxy sized civilization wont turn out to be important in some way, even if we can't immediately see how.)

But I'm mainly wondering from a practical perspective. Programming languages are generally tools of craft, they are for getting things done in reality, even many of the most theoretical languages aspire to that. I'm asking mainly from a perspective of...

Can we get the engineers of this imperiled civilization to take their responsibilities more seriously, generally? When it's helpful to be able to prove that something will work reliably before we put it into production, can we make that easy to do? Can any of the tooling design principles from that be generalized to the AGI alignment problem?

With regards to Coq, will those languages actually be used in reality, why or why not, should promote them, should we should fund their development?

I'm interested in different timescales

  • the most security-promoting development processes that are currently in wide use.
  • the most security-promoting development processes that are possible with recently developed technology.
  • processes that could be developed now.
  • processes that could come to exist 10 years away; processes that might exist 30-50 years from now.
  • perhaps some impossibility theorems that may bind even the creatures of the singularity.
Replies from: Darmani
comment by Darmani · 2020-12-28T01:01:12.819Z · LW(p) · GW(p)

Did not know about the answer/comment distinction! Thanks for pointing that out.

Before I dig deeper, I'd like to encourage you to come bring these questions to Formal Methods for the Informal Engineer ( https://fmie2021.github.io/ ), a workshop organized by myself and a few other researchers (including one or two more affiliated with LessWrong!) specifically dedicated to helping build collaborations to increase the adoption of formal methods technologies in the industry. Since it's online this year, it might be a bit harder to have these deep open-ended conversations, but you sound exactly like the kind of person we want to attend. (To set expectations, I should add that registrations already exceed capacity; I'm not sure how we plan to allocate spots.)

I'd also like to share this list of formal methods in industry: https://github.com/ligurio/practical-fm . In the past decade, there's been a huge (in relative, not absolute terms) increase in the commercial use of formerly Ivory Tower tools.

You may also be interested in the readings from this course: https://6826.csail.mit.edu/2020/


BTW, I've been trying to think about whom I know that directly works in language-based security.  (defined as a narrow specialty). The main name that comes to mind that I personally know is Stephen Chong, but I think some of the Wyvern developers ( https://wyvernlang.github.io/ ) may also consider themselves in this category (and be easier to get ahold of than a Harvard professor).

I'm going to briefly hit some of your more narrow questions now. As an aside, be wary about saying "process" to a researcher -- it's used narrowly in ICSE circles to mean "methodology" (e.g.: Agile). I'm trying to mentally replace every use with "language-based security."
 

the most security-promoting development processes that are currently in wide use.


I think Jim mostly gets it above: memory-safe languages and secure API design; also, implicitly, the type systems that make the latter possible.

There are a number of patterns in secure API design you might not know the names for, such as object capabilities ( https://en.wikipedia.org/wiki/Object-capability_model ).


In a way, this question is kinda self-answered by the framing, since language-based security primarily refers to language design, which primarily means type systems --- this is in contrast to techniques such as static analysis, testing, model checking, symbolic execution, and sandboxing.

Leaving the realm of Turing-complete programs, I'll point you to PNaCl/RockSalt ( https://news.harvard.edu/gazette/story/2012/07/nacl-to-give-way-to-rocksalt/ ) and eBPF, both of which have verified sandboxes.

If you're willing to be flexible about the "widely-used" statement, then individual companies have their own quirky languages, some of which have rather interesting restrictions. This language ostensibly used by OutSystems comes to mind ( https://link.springer.com/chapter/10.1007/978-3-642-19718-5_8 ), although I'm told by one of the authors that their actual implementation (as of 2013) is a bit simpler.

 

  • the most security-promoting development processes that are possible with recently developed technology

 

This is a tough question for me, because recent papers in this area tend to be about solving highly specific problems (the space of security problems is big, yo), and it takes a lot to generalize that to answer such a broad question. Also, I don't follow latest developments that intensely. I'm going to take a pass.

 

  • processes that could come to exist 10 years away; processes that might exist 30-50 years from now.

 

Adam Chlipala thinks by that time we'll be generating correct-by-construction code from specs. The Everest and Fiat-Crypto projects, both of which generate correct-by-construction cryptography code, are probably the two current best-known deployments of this.

 

 

  • perhaps some impossibility theorems that may bind even the creatures of the singularity.

 

"If it's nontrivial to prove your program terminates, your program probably doesn't run." --- an undergrad friend.


For common infrastructure software of today, no. Except maybe that I don't know it's been shown  possible to build secure, reliable software atop a realistic model of hardware faults.
 

For Software 2.0 (i.e.: neural net in the loop), it's a more open question that I don't know much about.


For the kinds of reflective self-improvement software MIRI discusses, that's part of their active research program (and generally outside the cognizance of PL/SE researchers).

Replies from: MakoYass
comment by MakoYass · 2020-12-28T05:16:35.581Z · LW(p) · GW(p)

but you sound exactly like the kind of person we want to attend

What, but I'm just a stray dog who makes video games about -... [remembers that I am making a game that centers around an esolang. Turns and looks at my BSc in formal languages and computability. Remembers all of the times a layperson has asked whether I know how to do Hacking and I answered "I'm not really interested in learning how to break things. I'm more interested in developing paradigms where things cannot be broken"]... oh.

uh. maybe.

answer by jimrandomh · 2020-12-27T21:55:25.695Z · LW(p) · GW(p)

The C-style "oops I used an object after freeing it and now anyone can execute arbitrary code" style of vulnerabilities is confined to a fairly narrow set of programming languages, most notably C and C++, which unfortunately happen to be popular and to have some interoperability advantages. One of the key requirements of a security-amenable language is that it can never tempt its users into writing parts of their project in C, which happens if the language is too slow (eg Python) or can't otherwise interoperate with important systems (most languages, unfortunately). Some programming language choices create performance and sysadmin problems that are treated by end-users as dealbreakers; in fact this was responsible for most of C's historical success.

A lot of issues crop up at the interfaces between programming languages, which tend not to fall neatly into one of the languages' scope. SQL injection is a classic example. A language can't do a whole lot in practice to protect against that, but a library can: by designing the APIs so that doing things the safe way is easy, and doing things the dangerous way requires calling functions with "dangerous" in the name (like React's dangrouslySetInnerHTML).

If I import third-party libraries from the internet, there is a lot of opportunity for mischief (both by those third parties, and by the people who might hack them). This is probably the nodejs ecosystem's weakest link at the moment. This problem is largely social; the best defense is a trustworthy group of curators providing a core set of libraries that you rarely need to step outside, plus an expectation that libraries are large projects that avoid creating indirect dependencies and that you use a small number of. (As opposed to having a hundred different left-pad style tiny libraries from a hundred different authors.)

Reducing security vulnerability incidence to zero is possible in some domains, but a programming language alone can't do it; I can write if(password=="backdoor") acceptLogin() in any language.

I think there's a fair amount of room left for incremental improvement, but in practice I think it looks less like "move everyone to Haskell and Coq" and more like "design a good core-library crypto API for X" and "reform common practices around npm".

5 comments

Comments sorted by top scores.