"Throwing Exceptions" Is A Strange Programming Pattern

post by Thoth Hermes (thoth-hermes) · 2023-08-21T18:50:44.102Z · LW · GW · 13 comments

This is a link post for https://thothhermes.substack.com/p/throwing-exceptions-is-a-strange

For laypeople: In software, “throwing an exception” is a programmer-chosen behavior that is triggered when an “error” occurs in the course of program execution. This might happen immediately prior to the error or right at the time it occurs.

This is ostensibly done in order to avoid something more catastrophic happening down the line. I’m going to argue that this justification does not often seem to hold up.
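For concreteness, here is a minimal Python sketch of what “throwing” looks like (the function and its error condition are invented purely for illustration):

```python
def divide(numerator, denominator):
    # The programmer decides that a zero denominator counts as an "error"
    # and throws (raises) an exception instead of returning a value.
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator

# This call never produces a result; execution is interrupted here, and the
# program exits with a traceback unless something catches the exception.
result = divide(1, 0)
```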

You are probably familiar with instances of your programs suddenly and without warning becoming unresponsive, closing themselves down, or simply disappearing, perhaps with an error-message pop-up soon to follow, but perhaps without one. These are due to exceptions being thrown, not just “errors” per se. Let me explain.

When a function or procedure is called in a program, it is typically expected to return something. The value that this function returns is going to be either one of a set of “normal” values, which are the ones you expect to receive if all goes well, or an “abnormal” or “anomalous” value that is returned only if “something bad” happens.

If you choose to be relatively laid-back about your program, you might decide to check for anomalous values only after the procedure returns. Maybe even well after! Furthermore, how you decide what “anomalous” means is arbitrary and up to you.

If you are more paranoid, you have typically decided what counts as “anomalous” before the program runs. These stipulations often come in the form of type requirements on the arguments to the function, and possibly also range checks on the size or values of the inputs. These are just common examples; like I said, it is arbitrary. It’s also possible that the function calls another function inside of itself, waits for that function to return something, and then decides whether or not what that function returned is anomalous.
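As a rough Python illustration of the two styles (the function names, sentinel value, and range check here are all hypothetical):

```python
def parse_measurement(raw):
    """Laid-back style: return a float on success, or None as the 'anomalous' value."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None

# The caller checks for the anomalous value only after the call (maybe much later).
value = parse_measurement("3.7e2")
if value is None:
    value = 0.0  # only now do we decide what to do about "anomalous"

def parse_measurement_strict(raw):
    """Paranoid style: decide up front what counts as anomalous and refuse to proceed."""
    if not isinstance(raw, str) or not raw.strip():
        raise TypeError("raw must be a non-empty string")
    value = float(raw)  # may itself raise ValueError
    if not (0.0 <= value <= 1e6):
        raise ValueError(f"measurement {value} is outside the expected range")
    return value
```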

“Throwing” or “raising” an exception occurs when you decide that your function will immediately exit and return a special error value instead, one which may also try to indicate what type of error it is. If this function is called from another function, it passes this value “up the call stack.” The calling function, if it implements similar behavior (which it usually does), will pass that value, or perhaps a different one (but still an error value), up its own call stack as well. If all of the calling functions implement such behavior, the error will propagate all the way to the top, ending the program.

The only time it won’t immediately end the program is if you decide to “catch” the exception, which means you deal with the error and move on. However, this still causes the program to execute a different portion of code than it otherwise would. The details here are also somewhat language-specific.
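A small Python sketch of this propagate-or-catch behavior (all function names are hypothetical):

```python
def innermost():
    raise RuntimeError("anomaly detected")  # the exception is raised ("thrown") here

def middle():
    return innermost()  # no handler here, so the exception passes through

def main():
    try:
        middle()
    except RuntimeError as err:
        # "Catching" the exception: deal with the error and move on, instead of
        # letting it propagate to the top of the call stack and end the program.
        print(f"recovered from: {err}")

main()    # prints "recovered from: anomaly detected" and keeps going
middle()  # uncaught: propagates all the way up and terminates the program
```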

So, the problematic behavior is the immediate exit. To see why this is a weird thing to do, imagine what would happen if it did not immediately exit in all cases. My guess is that, in the general case, your function would return gibberish. Your full program most likely would either keep running forever, run but produce garbled noise or gibberish, or run until some other termination condition was met (user input, a timer, or a maximum number of iterations).

So, why is that worse than immediately exiting on its own as soon as anomalous behavior is detected? Well, I don’t think it can be worse in all cases. But when is it worse? The disutility of continuing must certainly exceed a high bar. When you immediately terminate the program, you are most likely throwing away any work done up until that point, and certainly any work that would have been done after it. It requires you to re-run the program, probably from the beginning, but at the very least from immediately prior to the function call at which the exception was raised.

So that disutility must be overcome by the disutility of incorrect work done. That will depend on how incorrect the work done is, how difficult it is to detect, and - if that work has to be passed on to someone or something else later on - how badly that would affect the subsequent processes or tasks.

Regarding this feature of language design, Stroustrup notes that it is a choice a language must make, and he believes the immediate exit is the better default[1]:

Should it be possible for an exception handler to decide to resume execution from the point of the exception? Some languages, such as PL/I and Mesa, say yes and reference 11 contains a concise argument for resumption in the context of C++. However, we feel it is a bad idea and the designers of Clu, Modula2+, Modula3, and ML agree.

Resumption is of only limited use without some means of correcting the situation that led to the exception. However, correcting error conditions after they occur is generally much harder than detecting them beforehand. Consider, for example, an exception handler trying to correct an error condition by changing some state variable. Code executed in the function call chain that led from the block with the handler to the function that threw the exception might have made decisions that depended on that state variable. This would leave the program in a state that was impossible without exception handling; that is, we would have introduced a brand new kind of bug that is very nasty and hard to find. In general, it is much safer to re-try the operation that failed from the exception handler than to resume the operation at the point where the exception was thrown.

My understanding is that different contexts have, in practice, caused different protocols to be followed.[2] For example, high-reliability code used in avionics software is said not to use exceptions, because it is worse for the airplane’s engines to shut down upon an error being detected than for them to keep running even if there is a malfunction or abnormality somewhere.

That’s kind of obvious. However, what is not obvious - to me at least - is why a high-reliability context would justify less “exception throwing” behavior. At first glance, it would seem that throwing exceptions is what you do more of the more risk-averse you are. By default, the “immediate termination” behavior of exceptions implies that the programmer is trying to avoid the risk of greater damage caused by uncertainty in the outcome of the program when anomalous data is fed into subsequent function calls and processes.

When I was personally involved in writing a machine learning library codebase, I was presented with the option of using thrown exceptions in many places. Typically, these exceptions would be thrown within the functions that implemented a “node” in a model graph (e.g., a neural network or other directed acyclic graph). These nodes depended on receiving input data and being able to perform accurate calculations on that data. Given that these functions had to be agnostic about what data would be received in the most general case, it was always possible that they would return values that were anomalous or undesirable in some way (e.g., null or numeric overflow).

During testing (which includes testing while running on real data, not just unit tests), it was often much easier to allow the full processing of a model to occur than to have exceptions thrown, which would clog up the log files or immediately shut down the program. One thing we realized is that nodes should still be able to work correctly even if the nodes preceding them do not. This is more or less what I think of when I think of “resilience”: even if a piece of your program (or model, in this case) is broken, the program is only roughly as broken as the percentage of pieces that are broken in it. When we allowed our software to run without stopping, we could also reliably and more quickly get better data about which parts of it weren’t working.
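The spirit of that design choice looks roughly like the following Python sketch; the node structure, sentinel NaN values, and logging here are simplified stand-ins, not the actual library code:

```python
import logging
import math

def run_node(node_fn, inputs):
    """Run one node of the model graph, degrading instead of terminating.

    If the node misbehaves (bad inputs, overflow, etc.), log the problem and
    return NaN so that downstream nodes can still run on whatever they receive.
    """
    try:
        output = node_fn(inputs)
        if output is None or (isinstance(output, float) and not math.isfinite(output)):
            raise ValueError(f"anomalous output: {output!r}")
        return output
    except Exception as err:
        logging.warning("node %s failed: %s", getattr(node_fn, "__name__", node_fn), err)
        return float("nan")  # mark the damage, but keep the rest of the graph running

def run_graph(nodes, initial_input):
    """Assumes a simple linear chain of nodes, purely for illustration."""
    value = initial_input
    for node_fn in nodes:
        value = run_node(node_fn, value)
    return value
```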

At the end of the day, your goal is to be able to correct issues and deploy the product quickly, as well as deliver results - hopefully incrementally more and better results. We were in a “high reliability” context as well: the answers had to be correct, but if they weren’t (and you could never be entirely sure), they had to be at least well-calibrated answers.

So what brings about the opposite context, in which this “high-reliability” frame does not seem to apply?

One (speculative) explanation I have heard is that exceptions are used more routinely in software-business environments which involve contracts made between two parties (often both businesses). These contracts are often designed in a somewhat adversarial manner. In other words, the customer usually has somewhat of an asymmetric advantage over the vendor (especially if the latter is a small startup). Therefore, the customer has more power over the contract itself, and whether the startup survives at all may depend on whether these contracts are fulfilled to the letter.

Thus, it typically becomes riskier for the vendor’s software to deliver a potentially malformed product than for the vendor to simply delay the satisfaction of the service. The contracts signed between the provider-of-services and the receiver-of-services may stipulate that the latter will be entitled to more damages if the services fail to meet specific standards. However, those damages may be worse if the customer was under the impression that the vendor was adequately providing services for a specific period of time when in fact it was not.

Generally speaking, this is simply the idea that a job poorly-done is worse than one not even started. When people are worried about reputation, embarrassment, and things like that, the problem is typically exacerbated.

I’m not sure if I agree that a job poorly-done is worse than one not even started. Not inherently, anyway. And if it is a reaction to one’s social context and the pressures one faces, I am also not sure that bending to such pressure is either the most personally-beneficial or the most utilitarian thing to do.

The problem seems to be at least somewhat inherently philosophical, but not intractable. My experience philosophically dealing with the problem of “errors[3]” leads me to believe that this might be in the same class as similar social-problems that I have been writing about lately. If so, that may mean there is some low-hanging fruit here in the sense of potentially being able to correct bigger issues that have yet to be resolved.

  1. ^

    https://www.stroustrup.com/except89.pdf

  2. ^

    https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1947r0.pdf

    There have always been applications for which the use of exceptions was unsuitable. Examples include:

    • Systems where the memory is so limited that the run-time support needed for exception handling crowds out needed application functionality.
    • Hard-real time systems where the tool chains cannot guarantee prompt response after a throw. That is not an inherent language problem.
    • Systems relying on multiple unreliable computers so that immediate crash-and-restart is a reasonable (and almost necessary) way of dealing with errors that cannot be handled locally.
  3. ^

    As an issue that gets progressively escalated by someone noticing an error and raising a new one on top of it, which itself becomes a risk for someone else to avoid.

13 comments

Comments sorted by top scores.

comment by faul_sname · 2023-08-21T20:39:42.211Z · LW(p) · GW(p)

Programmer by trade here.

Philosophically, I view exceptions as the programmer saying "we have entered a state that we either cannot or will not handle correctly, and so we will punt this back up the call stack, and the caller can decide whether to abort, retry, or do something else". Frequently, the reason for being unable to handle that state is due to a lack of context -- for example, if someone is writing library code that deals with HTTP requests, "what should the program do in the event of a network issue" is something that the writer of the library cannot answer (because the answer may be different in different contexts). In these cases, punting the decision up the call stack seems to be the obviously correct thing (though there is a bit of a holy war in programming over whether it is better to do this explicitly or implicitly).
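As a rough Python sketch of that punt-it-up-the-stack idea (the library function and both callers are hypothetical):

```python
import json
import time
import urllib.error
import urllib.request

def fetch_json(url):
    """Library code: it cannot know what a network failure should mean for the
    application, so it lets urllib's URLError propagate up to the caller."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())

def fetch_with_retries(url, attempts=3):
    """Caller A: a batch job that would rather back off and retry."""
    last_err = None
    for i in range(attempts):
        try:
            return fetch_json(url)
        except urllib.error.URLError as err:
            last_err = err
            time.sleep(2 ** i)
    raise last_err  # still failing after retries: pass the problem further up

def fetch_or_none(url):
    """Caller B: an interactive tool that would rather degrade gracefully."""
    try:
        return fetch_json(url)
    except urllib.error.URLError:
        return None  # e.g. show "offline" in the UI instead of crashing
```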

Both sides of that holy war will generally agree that thrown exceptions are a slightly odd pattern. In terms of why one might want to use that odd pattern, it's easiest to see the advantages of the pattern by looking at what happens when you remove it. One alternative to using that pattern is to do what Rust does, and return Result<oktype,errtype> for everything which can fail.

So let's take an example Rust program which fetches an OpenAPI schema from a server, then makes a GET request against an endpoint, and determines whether the result matches the schema. This is a fairly simple task, and yet with explicit error handling (and without using the try operator, which is Rust's answer to thrown exceptions) it looks like this. Happy path code is mixed with error handling code, and as such it can be difficult to verify the correctness of either when reading the code.

If you want to argue that the improved happy-path-readability of code which uses thrown exceptions is not worth it, I can get back to you once I finish convincing people that vim is obviously better than emacs.

comment by qjh (juehang) · 2023-08-21T19:17:19.928Z · LW(p) · GW(p)

I come from science, so heavy scientific computing bias here.

I think you're largely focusing on the wrong metric. Whether exceptions should be thrown has little to do with reliability (and indeed, exceptions can be detrimental to reliability), but instead is more related to correctness. They are not always the same thing. In a scientific computing context, for example, a program can be unreliable, with memory leaks resulting in processes often being killed by the OS, but still always give correct results when a computation actually manages to finish.

If you need a strong guarantee of correctness, then this is quite important. I'm not so sure that this is always the case in machine learning, since ML models by their nature can usually train around various deficiencies; with small implementation mistakes you might just be a little confused as to why your model performs worse than expected. In aerospace, correctness needs to be balanced against aeroplanes suddenly losing power, so correctness doesn't always win. In scientific computing you might have the other extreme, where there's very little riding on your program not exiting, since you can always do a bunch of test runs before sending your code off to an HPC cluster, but if you do run this thing and base a whole bunch of science off of it, it had better not be ruined by little insidious bugs. I can imagine correctness mattering a lot too in crypto and security contexts, where a bug might cause information to leak and it is probably better for your program to die from internal checks than for your private key to be leaked.

I’m not sure if I agree that a job poorly-done is worse than one not even started.

I think this is definitely highly context-dependent. A scientific result that is wrong is far worse than the lack of a result at all, because this gives a false sense of confidence, allowing for research to be built on wrong results, or for large amounts of research personpower to be wasted on research ideas/directions that depend on this wrong result. False confidence can be very detrimental in many cases.

As to why general purpose languages usually involve error handling and errors: they are general purpose languages and have to cater to use cases where you do care about errors. Built-in routines fail with exceptions rather than silently so that people building mission-critical code where correctness is the most important metric can at least kinda trust every language built-in routine to return correct results if it manages to return something successfully.

Edit: some grammatical stuff and clarity

Replies from: boris-kashirin, thoth-hermes
comment by Boris Kashirin (boris-kashirin) · 2023-08-21T19:32:15.163Z · LW(p) · GW(p)

I'd add that correctness often is security: a job poorly done is an opportunity for a hacker to subvert your system and make your poor job into a great job for himself.

comment by Thoth Hermes (thoth-hermes) · 2023-08-22T15:57:22.958Z · LW(p) · GW(p)

This is a good reply, because its objections are close to things I already expect will be cruxes. 

If you need a strong guarantee of correctness, then this is quite important. I'm not so sure that this is always the case in machine learning, since ML models by their nature can usually train around various deficiencies;

Yeah, I'm interested in why we need strong guarantees of correctness in some contexts but not others, especially if we have control over that aspect of the system we're building as well. If we have choice over how much the system itself cares about errors, then I can design the system to be more robust to failure if I want it to be.

I think this is definitely highly context-dependent. A scientific result that is wrong is far worse than the lack of a result at all, because this gives a false sense of confidence, allowing for research to be built on wrong results, or for large amounts of research personpower to be wasted on research ideas/directions that depend on this wrong result. False confidence can be very detrimental in many cases.

I think the crux for me here is how long it takes before people notice that the belief in a wrong result causes them to receive further wrong results, null results, or reach dead-ends, and then causes them to update their wrong belief. LK-99 is the most recent instance that I have in memory (there aren't that many that I can recall, at least). 

What's the worst that happened from having false hope? Well, researchers spent time simulating and modeling the structure of it and tried to figure out if there was any possible pathway to superconductivity. There were several replication attempts. If that researcher-time-money is more valuable (meaning potentially more to lose), then that could be because the researcher quality is high, the time spent is long, or the money spent is very high. 

If the researcher quality is high (and they spent time doing this rather than something else), then presumably we also get better replication attempts, as well as more solid simulations / models. If they debunk it, then those are more reliable debunks. This prevents more researcher-time-money from being spent on it in the future. If they don't debunk it, that signal is more reliable, and so spending more on this is less likely to be a waste.

If researcher quality is low, then researcher-time-money may also be low, and thus there will be less that could be potentially wasted. I think the risk we are trying to avoid is losing high-quality researcher time that could be spent on other things. But if our highest-quality researchers also do high-quality debunkings, then we still gain something (or at least lose less) from their time spent on it. 

The universe itself also makes it so that being wrong will necessarily cause you to hit a dead-end, and if not, then you are presumably learning something, obtaining more data, etc. Situations like LK-99 may arise because before our knowledge gets to a high-enough level about some phenomenon, there is some ambiguity, where the signal we are looking for seems to be both present and not-present.  

If the system as a whole ("society") is good at recognizing signal that is more reliable without needing to be experts at the same level as its best experts, that's another way we avoid risk. 

I worked on dark matter experiments as an undergrad, and as far as I know, those experiments were built such that they were only really for testing the WIMP models, but also so that it would rule out the WIMP models if they were wrong (and it seems they did). But I don't think they were necessarily a waste.

Replies from: juehang
comment by qjh (juehang) · 2023-08-22T17:00:33.789Z · LW(p) · GW(p)

Yeah, I'm interested in why we need strong guarantees of correctness in some contexts but not others, especially if we have control over that aspect of the system we're building as well. If we have choice over how much the system itself cares about errors, then I can design the system to be more robust to failure if I want it to be.

This would make sense if we were all great programmers who are perfect. In practice, that's not the case, and from what I hear from others, not even in FAANG. Because of that, it's probably much better to give errors that will show up loudly in testing than to rely on programmers to always handle silent failures or warnings on their own.

I think the crux for me here is how long it takes before people notice that the belief in a wrong result causes them to receive further wrong results, null results, or reach dead-ends, and then causes them to update their wrong belief. LK-99 is the most recent instance that I have in memory (there aren't that many that I can recall, at least). 

Sometimes years or decades. See the replicability crisis in psychology that's decades in the making, and the Schön scandal that wasted years of some researchers' time, just for the first two examples off the top of my head.

You have a cartoon picture of experimental science. LK-99 is quite unique in that it is easy to synthesise, and the properties being tested are easy to test. When you're on the cutting edge, this is almost by necessity not the case, because most of the time the low-hanging fruit has been picked clean. Thus, experiments are messy and difficult, and when you fail to replicate, it is sometimes very hard to tell whether it is due to your failure to reproduce the conditions (e.g. synthesise a pure-enough material, have a clean-enough experiment, etc.) or due to a problem with the original result.

For a dark matter example, see DAMA/Libra. Few in the dark matter community take their result too seriously, but the attempts to reproduce this experiment have taken years and cost who knows how much, probably tens of millions.

I worked on dark matter experiments as an undergrad, and as far as I know, those experiments were built such that they were only really for testing the WIMP models, but also so that it would rule out the WIMP models if they were wrong (and it seems they did). But I don't think they were necessarily a waste.

I am a dark matter experimentalist. This is not a good analogy. The issue is not replication, but that results get built on; when that result gets overturned, a whole bunch of scaffolding collapses. Ruling out parameter space is good when you're searching for things like dark matter. Having to keep looking at old theories is quite different; what are you searching for?

Replies from: thoth-hermes
comment by Thoth Hermes (thoth-hermes) · 2023-08-22T18:28:03.324Z · LW(p) · GW(p)

I think your view involves a bit of catastrophizing, or relying on broadly pessimistic predictions about the performance of others. 

Remember, the "exception throwing" behavior involves taking the entire space of outcomes and splitting it into two things: "Normal" and "Error." If we say this is what we ought to do in the general case, that's basically saying this binary property is inherent in the structure of the universe. 

But we know that there's no phenomenon that can be said to actually be an "error" in some absolute, metaphysical sense. This is an arbitrary decision that we make: We choose to abort the process and destroy work in progress when the range of observations falls outside of a single threshold. 

This only makes sense if we also believe that sending the possibly malformed output to the next stage in the work creates a snowball effect or an out-of-control process. 

There are probably environments where that is the case. But I don't think that it is the default case nor is it one that we'd want to engineer into our environment if we have any choice over that - which I believe we do. 

If the entire pipeline is made of checkpoints where exceptions can be thrown, then removing an earlier checkpoint could mean that more time is wasted if an exception is destined to be thrown at a later time anyway. But like I mentioned in the post, I usually think this is better, because I get more data about what the malformed input/output does to later steps in the process. Also, of course, if I remove all of the checkpoints, then it's no longer going to be wasted work. 

Mapping states to a binary range is a projection which loses information. If I instead tell you, "This is what I know, this is how much I know it," that seems better because it carries enough to still give you the projection if you wanted that, plus additional information.

Sometimes years or decades. See the replicability crisis in psychology that's decades in the making, and the Schön scandal that wasted years of some researchers' time, just for the first two examples off the top of my head.

I don't know if I agree that those things have anything to do with people tolerating probability and using calibration to continue working under conditions of high uncertainty. 

The issue is not replication, but that results get built on; when that result gets overturned, a whole bunch of scaffolding collapses.

I think you're also saying that when you predict that people are limited or stunted in some capacity, that we have to intervene to limit them or stunt them even more, because there is some danger in letting them operate in their original capacity. 

It's like, "Well they could be useful, if they believed what I wanted them to. But they don't, and so, it's better to prevent them from working at all."

Replies from: juehang
comment by qjh (juehang) · 2023-08-22T19:03:13.528Z · LW(p) · GW(p)

Remember, the "exception throwing" behavior involves taking the entire space of outcomes and splitting it into two things: "Normal" and "Error." If we say this is what we ought to do in the general case, that's basically saying this binary property is inherent in the structure of the universe. 

I think it works in the specific context of programming because for a lot of functions (in the functional context for simplicity), behaviours are essentially bimodal distributions. They are rather well behaved for some inputs, and completely misbehaving (according to specification) for others. In the former category you still don't have perfect performance; you could have quantisation/floating-point errors, for example, but it's a tightly clustered region of performing mostly to-spec. In the second, the results would almost never be just a little wrong; instead, you'd often just get unspecified behaviour or results that aren't even correlated to the correct one. Behaviours in between are quite rare.

I think you're also saying that when you predict that people are limited or stunted in some capacity, that we have to intervene to limit them or stunt them even more, because there is some danger in letting them operate in their original capacity. 

It's like, "Well they could be useful, if they believed what I wanted them to. But they don't, and so, it's better to prevent them from working at all."

If you were right, we'd all be hand-optimising assembly for peak performance in HPC. Ultimately, many people do the minimal work needed to accomplish their task, sometimes to the detriment of the task at hand. I believe that I'm not alone in this thinking, and you'd need quite a lot of evidence to convince others. Look at the development of languages over the years, with newer languages (Rust and Julia, as examples) doing their best to leave less room for user errors and poor practices that impact both performance and security.

comment by Dagon · 2023-08-21T20:20:32.706Z · LW(p) · GW(p)

Good topic, but I think you're missing some of the very good things about exception mechanisms.  

  1. The common-path behaviors can be coded, read, and understood a whole lot more simply when a large class of program/environment states can be handled in a different part of the code.  If you've ever written I/O code in C, you'll stop thinking that exceptions don't solve real problems.
  2. Languages with general-purpose disjoint return types are way harder to use, and most often just get exception handling added as a library.
  3. The vast majority of real-world code is multi-programmer, and clear communication of expectations and behaviors, even in unusual conditions, is not adversarial, it's cooperative.

It's important to note that exceptions do not necessarily indicate errors.  They just indicate a departure from normal-path assumptions.  Testing for whether the internet is down or a disk is full with EVERY SINGLE function call that may or may not actually use it is ... tiresome.  Letting the actual library/syscall that does the thing fail in a visible way is just much smoother.

You can catch and handle the condition at any level you like.  If you don't want to react (or you know for certain that it's not actually a problem), you can let it pass through. If you DO want to do something differently, catch it and handle it (and perhaps throw a new/chained exception if the catching function's behavior is unexpected by YOUR caller).
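In Python, that catch-where-it-makes-sense-and-chain-if-needed pattern looks roughly like the following sketch (the names and fallback behavior are hypothetical):

```python
class ConfigError(Exception):
    """The exception type this layer's callers are documented to expect."""

def load_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as err:
        # Translate the low-level condition into this layer's vocabulary,
        # keeping the original exception attached as the chained cause.
        raise ConfigError(f"could not load configuration from {path}") from err

def main():
    try:
        return load_config("app.conf")
    except ConfigError:
        # Known to be harmless here: fall back to defaults and keep going.
        return ""
```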

I call it elegant to acknowledge and handle both common-path intended uses and uncommon-path situations where the code CAN'T behave as expected.

comment by TAG · 2023-09-17T19:38:56.599Z · LW(p) · GW(p)

Exceptions don't have to be handled immediately in time, don't have to be handled where they are thrown, and don't have to cause an immediate halt like calling exit(). All of these things are deliberate and useful. Whichever code catches the exception has the option of ignoring it if it's unimportant, or repairing the problem and re-trying the operation, if that is possible. That's why they are called exceptions, not errors. Now, if you have sketchy code that does not have a considered scheme to handle exceptions -- throwing without catching -- then the default action is usually to halt. That isn't the fault of exceptions.

When a function or procedure is called in a program, it is typically expected to return something. The value that this function returns is going to be either one of a set of “normal” values, which are the ones you expect to receive if all goes well, or an “abnormal” or “anomalous” value that is returned only if “something bad” happens.

That's common, but bad. It's better to return status separately from result. Also, it's ignore-by-default which is not good -- you can have too few crashes as well as too many.
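A minimal Python sketch of returning status separately from the result (a hypothetical example, not from any particular codebase):

```python
def parse_port(text):
    """Return (ok, value): the status travels separately from the result."""
    try:
        port = int(text)
    except (TypeError, ValueError):
        return False, None
    if not (0 < port < 65536):
        return False, None
    return True, port

ok, port = parse_port("8080")
if not ok:
    port = 80  # the caller is pushed to look at the status before using the result
```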

Replies from: thoth-hermes
comment by Thoth Hermes (thoth-hermes) · 2023-09-18T19:15:27.715Z · LW(p) · GW(p)

The issue that I'm primarily talking about is not so much in the way that errors are handled, it's more about the way of deciding what constitutes an exception to a general rule, as Google defines the word "exception":

a person or thing that is excluded from a general statement or does not follow a rule.

In other words, does everything need a rule to be applied to it? Does every rule need there to be some set of objects under which the rule is applied that lie on one side of the rule rather than the other (namely, the smaller side)? 

As soon as we step outside of binary rules, we are in Case-when-land where each category of objects is treated with a part of the automation that is expected to continue. There is no longer a "does not follow" sense of the rule. The negation there is the part doing the work that I take issue with.  

Replies from: TAG
comment by TAG · 2023-09-20T12:45:21.277Z · LW(p) · GW(p)

Exceptions in programming aren't "exceptions to a rule" , they are "potential problems".

Replies from: thoth-hermes
comment by Thoth Hermes (thoth-hermes) · 2023-09-20T13:06:57.563Z · LW(p) · GW(p)

I really don't think I can accept this objection. They are clearly considered both of these, most of the time.

I would really prefer that if you really want to find something to have a problem with, first it's got to be true, then it's got to be meaningful.

comment by Brendan Long (korin43) · 2023-08-23T21:46:00.002Z · LW(p) · GW(p)

For example, high-reliability code used in avionics software is said not to use exceptions, because it is worse for the airplane’s engines to shut down upon an error being detected than for them to keep running even if there is a malfunction or abnormality somewhere.

It's important to note that high-reliability contexts that don't use exceptions don't just ignore errors; they typically just have much more explicit error handling logic. And the "no exceptions" thing is really more language-dependent, since requiring all exceptions to be handled (like checked exceptions in old-school Java) would be similar in practice.

The "ignore errors" style of programming has been tried, and I think it's been near-universally rejected. Once an error occurs, ignoring it and continuing usually doesn't do what you want anyway, so crashing and getting enough info to fix it is much more helpful than doing something you don't want. In cases where something really is optional, wrapping it in a try / catch is relatively easy, but having the language implicitly wrap every expression in a try / catch is really annoying.