Type-safeness in Shell

post by Martin Sustrik (sustrik) · 2019-05-12T11:30:00.680Z · LW · GW · 3 comments

Since writing the post on a hypothetical hull language as an alternative to shell I cannot stop thinking about the shortcomings of shell.

And one thing that comes to mind over and over is type-safeness. Shell treats everything as a string and that's the source of both its power and its poor maintainability.

So when I ask whether shell can be improved, the question is actually more subtle: Can it be improved without compromising its versatility? Can we, for example, be more type-safe without having to type Java-like stuff on the command line? Without sacrificing the powerful and dangerous features like string expansion?

I mean, you can write shell-like scripts in Python even today and use type hints to get type safeness. But in the real world this practice seems to be restricted to more complex programs, ones that require actual in-language processing, complex control flow, use of libraries and so on. Your typical shell script, which just chains together a handful of UNIX utilities? No, I don't see that happening a lot.
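To make the comparison concrete, here is a minimal sketch of what such a Python script might look like. The helper `run` and the function `largest_entries` are hypothetical names made up for this illustration; the pipeline it mimics is roughly `du -a DIR | sort -rn | head -n COUNT`.

```python
import subprocess
from pathlib import Path


def run(argv: list[str], stdin: str = "") -> str:
    """Run one external program, feed it `stdin`, and return its stdout."""
    done = subprocess.run(
        argv, input=stdin, capture_output=True, text=True, check=True
    )
    return done.stdout


def largest_entries(directory: Path, count: int) -> str:
    """Hypothetical example: roughly `du -a DIR | sort -rn | head -n COUNT`."""
    usage = run(["du", "-a", str(directory)])
    ranked = run(["sort", "-rn"], stdin=usage)
    return run(["head", "-n", str(count)], stdin=ranked)
```

A static checker such as mypy would reject a call like `largest_entries("/tmp", "3")`, because the hints ask for a `Path` and an `int`. The price is visible too: a dozen lines where shell needs one, which may be part of why nobody bothers for simple glue.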

To put it in other words, different "scripting languages" managed to carve their own problem spaces from what once used to be the domain of shell, but almost none of them attacked its very core use case, the place where it acts as a dumb glue between stand-alone applications.

But when writing shell scripts, I observe that I do have a type system in mind. When I type "ls" I know that an argument of type "path" should follow. Sometimes I am even explicit about it. When I save JSON into a file, I name it "foo.json". But none of that is formalized in the language.

And in a very hacky way, shell is already aware of the types to some extent. When I type "ls" and press Tab twice, a list of files appears on the screen. When I type "git checkout", pressing Tab twice results in a list of git branches. So, in a way, shell "knows" what kind of argument is expected.

And the question that's bugging me is whether the same can be done in a more systemic way.

Maybe it's possible to have a shell-like language with an actual type system. Maybe it could know that a file with a .json extension is supposed to contain JSON. Or it could know that "jq" expects JSON as an input. Maybe it could know that JSON is a kind of text file and that any program accepting a text file (e.g. grep) can therefore accept JSON as well. And it could know that "ls -l" returns a specific "type", a refinement of "text file" and "file with one item per line", with items like access rights, ownership, file size and so on.
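A rough sketch of that subtyping idea, in Python only because it is easy to read. Every name here (`PARENTS`, `EXPECTED_STDIN`, the type names "lines" and "ls-long") is invented for the illustration and does not come from any existing tool.

```python
from pathlib import PurePath

# Hypothetical type lattice: each type lists the types it refines.
PARENTS: dict[str, tuple[str, ...]] = {
    "bytes": (),                    # root: nothing is known about the content
    "text": ("bytes",),
    "json": ("text",),              # JSON is a kind of text
    "lines": ("text",),             # text with one item per line
    "ls-long": ("text", "lines"),   # output of `ls -l`
}

# Guess a file's type from its extension.
EXTENSION_TYPES = {".json": "json", ".txt": "text"}

# What each program expects on standard input.
EXPECTED_STDIN = {"grep": "text", "jq": "json"}


def is_subtype(actual: str, expected: str) -> bool:
    """True if `actual` is `expected` or a (transitive) refinement of it."""
    return actual == expected or any(
        is_subtype(parent, expected) for parent in PARENTS[actual]
    )


def type_of_file(name: str) -> str:
    """Infer a type from the file name, defaulting to raw bytes."""
    return EXTENSION_TYPES.get(PurePath(name).suffix, "bytes")


def accepts(command: str, input_type: str) -> bool:
    """Would `command` accept input of type `input_type`?"""
    return is_subtype(input_type, EXPECTED_STDIN[command])
```

With these tables, `accepts("grep", type_of_file("foo.json"))` is true because JSON refines text, while `accepts("jq", "ls-long")` is false because a long listing is text but not JSON.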

But how would one do that?

In addition to the language implementing a type system, it would require some kind of annotation of common UNIX utilities, adding a formal specification of their arguments and outputs. (With all programs not present in the database defaulting to "any number of arguments of any type and any output".) Maybe it can be done by simple type-safe wrappers on top of existing non-type-safe binaries.
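Such an annotation database could start out very small. The sketch below is an assumption about how it might look, not a description of any real tool: `Spec`, `SPECS` and `check_pipeline` are hypothetical names, only the stdin and stdout types are modeled (argument types and subtyping are left out to keep it short), and unknown programs fall back to "any".

```python
from dataclasses import dataclass

ANY = "any"  # wildcard type: compatible with everything


@dataclass(frozen=True)
class Spec:
    stdin: str
    stdout: str


# Hypothetical annotations for a few utilities.
SPECS = {
    "ls -l": Spec(stdin=ANY, stdout="ls-long"),
    "jq": Spec(stdin="json", stdout="json"),
}

# Programs missing from the database accept and produce anything.
DEFAULT = Spec(stdin=ANY, stdout=ANY)


def spec_for(command: str) -> Spec:
    return SPECS.get(command, DEFAULT)


def compatible(actual: str, expected: str) -> bool:
    """Exact match, unless either side is the wildcard."""
    return ANY in (actual, expected) or actual == expected


def check_pipeline(commands: list[str]) -> str:
    """Check `a | b | c` stage by stage and return the final output type."""
    current = ANY  # nothing is known about what feeds the first command
    for name in commands:
        spec = spec_for(name)
        if not compatible(current, spec.stdin):
            raise TypeError(
                f"{name} expects {spec.stdin} on stdin, got {current}"
            )
        current = spec.stdout
    return current
```

Here `check_pipeline(["ls -l", "jq"])` raises a `TypeError`, while `check_pipeline(["mystery-tool", "jq"])` passes because the unannotated program defaults to "any". That default is what would let the database grow gradually without breaking existing scripts.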

3 comments


comment by quanticle · 2019-05-12T19:06:21.121Z · LW(p) · GW(p)

PowerShell does a lot of this, doesn't it? PowerShell abandons the concept of programs transferring data as text, and instead has them transferring serialized .NET objects (with type annotations) back and forth. It doesn't extend to the filesystem, but it's entirely possible to write functions that enforce type guarantees on their input (i.e. requiring numbers, strings, or even more complicated data types, like JSON).

A good example is searching with regexps. In Unix, grep returns a bunch of strings (namely the lines which match the specified regex). In PowerShell, Select-String returns match objects, which have fields containing the file that matched, the line number that matched, the matching line itself, capture groups, etc. It's a much richer way of passing data around than delimited text.
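The difference can be sketched in Python as an analogy. This is not PowerShell's actual API: `Match`, `grep_lines` and `select_matches` are made-up names that only mimic the two styles.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    path: str
    line_number: int
    line: str
    groups: tuple[Optional[str], ...]  # capture groups of the first hit


def grep_lines(pattern: str, path: str, lines: list[str]) -> list[str]:
    """Text style: one delimited string per matching line."""
    return [
        f"{path}:{number}:{line}"
        for number, line in enumerate(lines, start=1)
        if re.search(pattern, line)
    ]


def select_matches(pattern: str, path: str, lines: list[str]) -> list[Match]:
    """Object style: one structured record per matching line."""
    found = []
    for number, line in enumerate(lines, start=1):
        hit = re.search(pattern, line)
        if hit:
            found.append(Match(path, number, line, hit.groups()))
    return found
```

A consumer of `select_matches` can read `m.line_number` or `m.groups[0]` directly. A consumer of `grep_lines` has to split the string on ":" again and hope that neither the path nor the line contains one.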

comment by Julian O · 2019-05-12T17:05:13.831Z · LW(p) · GW(p)

In the early 1990s, there was a computer called the Rational 1000, which was pretty much a specialised development machine for producing code in the Ada programming language.

The "shell" language for the system was... Ada.

It was a weird choice - Ada is compiled. Ada is a strongly/strictly typed language. Ada is certainly not terse, but the IDE helped with a lot of the boilerplating. It is not what you normally think of as a scripting language.

Nonetheless, I think it was very successful. The users all knew Ada well. The command line (itself written with the help of an IDE) knew all the types of all the parameters and could help complete them (alas - very, very slowly).

I see this as an anecdote to support your idea that a typed scripting language could work.

comment by mako yass (MakoYass) · 2019-05-13T23:47:32.605Z · LW(p) · GW(p)

Make a powerful enough system shell with an expressive enough programming language, and you shall be able to unify the concepts of GUI and API, heralding a new age of user empowerment, imo.

This unification is one of the projects I'd really like to devote myself to, but I'm sort of waiting for Rust GUI frameworks to develop a bit more (since the shell language will require a new VM and the shell and the language server will need to be intimately connected, I think Rust is the right language). (It may be possible to start, at this point, but my focus is on other things right now.)