Scientific Notation Options

post by jefftk (jkaufman) · 2024-05-18T15:10:02.181Z · 13 comments

When working with numbers that span many orders of magnitude it's very helpful to use some form of scientific notation. At its core, scientific notation expresses a number by breaking it down into a decimal ≥1 and <10 (the "significand" or "mantissa") and an integer representing the order of magnitude (the "exponent"). Traditionally this is written as:

3 × 10⁴

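While this communicates the necessary information, it has two main downsides: it relies on superscript, which doesn't work everywhere and can mess up your line spacing, and it takes about twice as many characters as it needs to.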

Instead, I'm a big fan of e-notation, commonly used in programming and on calculators. This looks like:

3e4

This works everywhere, doesn't mess up your line spacing, and requires half as many characters as writing it the traditional way.
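As a rough illustration of the decomposition described above, here is a minimal Python sketch that splits a nonzero number into a significand with magnitude between 1 and 10 and an integer exponent, then writes it in the compact form. The function name `to_e_notation` and its `digits` parameter are hypothetical, not from any library. Python already parses e-notation natively, though its built-in formatter writes the exponent with a sign and two digits.

```python
import math


def to_e_notation(x: float, digits: int = 0) -> str:
    """Write x as <significand>e<exponent>, with 1 <= |significand| < 10.

    `digits` (a hypothetical parameter) is the number of decimal places
    kept in the significand.
    """
    if x == 0:
        return "0e0"
    # The exponent is the order of magnitude: the floor of log10(|x|).
    exponent = math.floor(math.log10(abs(x)))
    significand = x / 10**exponent
    if abs(significand) < 1:  # guard against log10 rounding up
        significand *= 10
        exponent -= 1
    significand = round(significand, digits)
    if abs(significand) >= 10:  # rounding can carry over, e.g. 9.99 -> 10.0
        significand /= 10
        exponent += 1
    return f"{significand:.{digits}f}e{exponent}"


print(to_e_notation(30000))       # 3e4
print(to_e_notation(0.00123, 2))  # 1.23e-3
print(float("3e4"))               # 30000.0: Python parses e-notation directly
print(f"{30000:.0e}")             # 3e+04: Python's built-in "e" format
```

The carry check matters for inputs like 9999 with one decimal place, which should come out as 1.0e4 rather than 10.0e3.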

There are a bunch of other variants of e-notation, but I don't like any of them as much:

One downside of "e" notation is that it comes off as less formal than traditional scientific notation. But unless you need to be read as maximally formal I think it's just better all around.

13 comments

comment by gwern · 2024-05-20T00:03:02.702Z

I think this framing conflates the question of input with that of presentation. The 'e' notation seems easiest to input to write - simple, unambiguous and reliable to parse, enterable everywhere - but it's not a good one to output to read, because if nothing else now it looks like it's multiplying variables & numbers etc.

They don't have to be the same. If numbers are written uniformly, they can be parsed & rendered differently.


And they should be. For example, I think that one of the things that makes calculations or arguments hard to follow is that they shamelessly break human subitizing and intuitive numeracy by promiscuously mixing units, which makes it hard to do one of the most common things we do with numbers - compare them - while not really making anything easier. This leads to "number numbness". ('Microsecond or millisecond'? Well, what's a factor of a thousand between friends?)

In much the same way that people sloppily will quote dollar amounts from decades apart as if they were the same thing (which is why I inflation-adjust them automatically into current dollars), they will casually talk about "10 million" vs "20 billion", imposing a burden of constant mental arithmetic as one tries to juggle back and forth all of these different base units. Sure, decimal numbers or metric units may not be as bad as trying to convert hogsheads to long fathoms or swapping between binary and decimal, but it's still not ideal.

It is no wonder that people constantly are off by orders of magnitude and embarrass themselves on social media when they turn out to be a factor of 10 off because they accidentally converted by 100 instead of 1000, or they convert milligrams and grams wrong and poison themselves on film. If someone is complaining about the US federal government, which is immediately more understandable: "of $20 billion, $10 million was spent on engineering a space pen" or "of $20,000 million, $10 million was spent on a space pen"? (And this is an easy case, with about the most familiar possible units. As soon as it becomes something like milligrams and grams...)

I mean, imagine if this was normal practice with statistical graphs: "oh, the blue and red bar columns, even though they are the same size in the image and describe the same thing, dollars, are actually 10× different. Didn't you see in the legend where it clearly says that 'blue = 1; red = 10'?" "Er, OK, but if they're the same sort of thing, then why are some blue and some the larger red?" "No reason. I just didn't feel like multiplying the blue datapoints by 10 before graphing." "...I see."

So while it might look a little odd, I try to write with a single base-unit throughout a passage of writing, to enable immediate comparison. (I think this helps a lot with DL scaling too, because somehow when you talk about a model having '50 million parameters' and are comparing it to multi-billion parameter models like a "GPT-3-175b", that seems a lot bigger than if you had written '0.05b parameters'. Or if you compare, say, a Gato with 1b parameters to a GPT-4 with 1,400b parameters, the comparison feels a lot more intuitive than if I had written 'a GPT-4 with 1.4 trillion parameters'.)

This practice might seem too annoying for the author (although if it is, that should be a warning sign: if it's hard for you, the author, to corral these units while carefully writing them, how do you expect the reader to handle them while skimming and reading?), but it could just be automated. Write all numbers which are numerical in a standard format, whether it's 10e2 or 1,000, and then a program can simply parse it for numbers, take the first number, extract the largest base that makes it a single-digit number ("thousand") and then rewrite all following numbers with that as the unit, formatted in your preferred format as '1 × 10²' or whatever.

(And you can, for HTML, make them copy-paste as regular full-length numbers through a similar trick as we do to provide the original LaTeX for math formulas which were converted from LaTeX, so it can be fully compatible with copy-pasting into a REPL or other application.)
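A minimal Python sketch of one way the rewriting program described above might work. It assumes, hypothetically, that numbers are written either in e-notation or with comma separators, that the base unit is the largest short-scale power of 1000 not exceeding the first number, and that every number is then re-expressed in that unit. The names `UNITS`, `pick_unit`, and `rewrite` are illustrative only, and the HTML copy-paste trick is not attempted.

```python
import re

# Hypothetical unit table: short-scale English names for powers of 1000.
UNITS = [
    (10**12, "trillion"),
    (10**9, "billion"),
    (10**6, "million"),
    (10**3, "thousand"),
    (1, ""),
]

# Numbers like 1,000 or 2.5 or 20e9 (a simplifying assumption about the input).
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?(?:[eE]-?\d+)?")


def to_number(token: str) -> float:
    return float(token.replace(",", ""))


def pick_unit(x: float) -> tuple[int, str]:
    """Largest power-of-1000 unit that does not exceed |x| (falls back to 1)."""
    for value, name in UNITS:
        if abs(x) >= value:
            return value, name
    return 1, ""


def rewrite(text: str) -> str:
    """Re-express every number in the unit chosen from the first number."""
    numbers = NUMBER.findall(text)
    if not numbers:
        return text
    value, name = pick_unit(to_number(numbers[0]))

    def replace(match: re.Match) -> str:
        scaled = to_number(match.group()) / value
        return f"{scaled:,g} {name}".rstrip()

    return NUMBER.sub(replace, text)


print(rewrite("of $20e9, $10e6 was spent on a space pen"))
# of $20 billion, $0.01 billion was spent on a space pen
```

With the space-pen example, the first number fixes the unit at billions, so the $10 million item shows up as a visibly tiny 0.01 billion. Choosing the unit from the first number follows the description above; picking it from the smallest or largest number instead would be an equally simple change.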

comment by Ben (ben-lang) · 2024-05-21T11:23:06.021Z

Good ideas.

A gripe of mine in the same vein is that my old employer had this idea that in any public-facing communication "numbers up to ten must be written in words, 11 or higher in digits". I think it's a common rule in (for example) newspapers. But it leads to ludicrous sentences like "There are either nine, ten or 11 devolved administrations depending on how they are counted." It drives me completely crazy: either the whole list should be words or numerals, not a mix.

comment by Three-Monkey Mind · 2024-05-20T21:01:52.790Z

I'd like to second this comment, at least broadly. I've seen the e notation in blog posts and the like and I've struggled to put the × 10 in the right place.

One of the reasons why I dislike trying to understand numbers written in scientific notation is because I have trouble mapping them to normal numbers with lots of commas in them. Engineering notation helps a lot with this — at least for numbers greater than 1 — by having the exponent be a multiple of 3. Oftentimes, losing significant figures isn't an issue in anything but the most technical scientific writing.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-18T16:47:01.925Z

Yeah, agreed. Also, using just an e makes it much easier to type on a phone keyboard.

There are also other variants, like ee and EE. And also sometimes you see a variant which uses only multiples of three as the exponent. I think it's called engineering notation instead of scientific notation? So like 1e3, 50e3, 700e6, 2e9. I also like this version less.
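Engineering notation can be sketched in Python by rounding the exponent down to a multiple of three. This is an illustration under assumptions: the function name `engineering` and its `step` parameter are hypothetical (the parameter lets other groupings be tried), and the `:g` format keeps at most six significant digits.

```python
import math


def engineering(x: float, step: int = 3) -> str:
    """Format nonzero x as <coefficient>e<exponent>, exponent a multiple of `step`."""
    if x == 0:
        return "0e0"
    exponent = math.floor(math.log10(abs(x)))
    # Round down to a multiple of step; Python's % keeps this right for negatives.
    exponent -= exponent % step
    coefficient = x / 10**exponent  # magnitude lands in [1, 10**step)
    return f"{coefficient:g}e{exponent}"  # :g keeps at most 6 significant digits


for n in (1e3, 5e4, 7e8, 2e9):
    print(engineering(n))  # 1e3, 50e3, 700e6, 2e9 (one per line)
```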

comment by noggin-scratcher · 2024-05-18T18:38:08.138Z

Sticking to multiples of three does have a minor advantage of aligning itself with things that there are already number-words for; "thousand", "million", "billion" etc. 

So those who don't work with the notation often might find it easier to recognise and mentally translate 20e9 as "20 billion", rather than having to think through the implications of 2e10.
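To illustrate that mapping, a minimal Python sketch that rounds the exponent down to a multiple of three and looks up the matching number word. It assumes short-scale English names (billion = 10⁹) and positive inputs; `in_words` and `NAMES` are hypothetical names.

```python
import math

# Short-scale English names (an assumption; long-scale usage differs).
NAMES = {0: "", 3: "thousand", 6: "million", 9: "billion", 12: "trillion"}


def in_words(x: float) -> str:
    """Rewrite a positive number like 2e10 as '20 billion'."""
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    exponent = math.floor(math.log10(x))
    exponent -= exponent % 3              # nearest lower multiple of three
    exponent = max(0, min(exponent, 12))  # stay within the names we have
    coefficient = x / 10**exponent
    return f"{coefficient:g} {NAMES[exponent]}".rstrip()


print(in_words(2e10))  # 20 billion
print(in_words(20e9))  # 20 billion
```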

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-18T18:53:09.741Z

Yeah, that's probably the rationale

comment by AnthonyC · 2024-05-18T21:43:25.121Z

Makes me wonder if there's an equivalent notation for languages that use other number word intervals. Multiples of 4 would work better in Mandarin, for example.

Although I guess it's more important that it aligns with SI prefixes?

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-05-18T23:27:55.715Z

Well, the nice thing about at least agreeing on using e as the notation is that it's easy to understand variants which prefer subsets of exponents. 500e8, 50e9, and 5e10 all are reasonably mutually intelligible. I think sticking to a subset of exponents does feel intuitive for talking about numbers frequently encountered in everyday life, but seems a little contrived when talking about large numbers. 4e977 seems to me like it isn't much easier to understand when written as 40e976 or 400e975.
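A quick Python check of the mutual-intelligibility point: the three spellings parse to the same value. One caveat worth knowing is that 4e977 is far beyond the range of a standard 64-bit float, so `float` returns infinity, while Python's standard `decimal` module keeps the exact value and compares the different spellings numerically.

```python
from decimal import Decimal

# All three spellings denote the same number, 5 × 10¹⁰.
print(float("500e8") == float("50e9") == float("5e10"))  # True

# 4e977 overflows a 64-bit float, so float() returns infinity...
print(float("4e977"))  # inf
# ...but Decimal keeps the exact value and compares spellings numerically.
print(Decimal("4e977") == Decimal("40e976") == Decimal("400e975"))  # True
```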

comment by JenniferRM · 2024-05-19T19:32:05.879Z

There is a bit of a tradeoff if the notation aims to transmit the idea of measurement error.

I would read "700e6" as saying that there were three digits of presumed accuracy in the measurement, and "50e3" as claiming only two digits of confidence in the precision.

If I knew that both were actually a measurement with a mere one part in ten of accuracy, and I was going to bodge the numeric representation for verbal convenience like this, it would give my soul a twinge of pain.

Also, if I'm gonna bodge my symbols to show how sloppy I'm being, like in text, I'd probably write 50k and 700M (pronounced "fifty kay" and "seven hundred million" respectively).

Then I'd generally expect people to expect me to be so sloppy with this that it doesn't even matter (like I haven't looked it up, to be precise about anything) if I meant to point to 5*10^3 or 5*2^10. In practice I would have meant roughly "both or either of these and I can't be arsed to check right now, we're just talking and not making spreadsheets or writing code or cutting material yet".
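For a sense of how little the 10³ versus 2¹⁰ ambiguity matters in casual conversation, the arithmetic for the example above is quick to check in Python:

```python
decimal_reading = 5 * 10**3  # "k" read as a thousand: 5000
binary_reading = 5 * 2**10   # "k" read as 1024: 5120

# Relative gap between the two readings, roughly 0.024 (about 2.4%)
print(binary_reading / decimal_reading - 1)
```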

comment by jefftk (jkaufman) · 2024-05-21T10:52:51.003Z

FWIW, I read 700e6 the same as 700M or 7e8. If someone was trying to communicate significant figures I'd expect 7.00e8.
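A minimal Python sketch of that significant-figures convention, using Python's standard `e` format specifier (which puts a fixed number of digits after the decimal point) and then trimming the exponent to the compact style. `sig_figs` is a hypothetical helper name.

```python
def sig_figs(x: float, figures: int) -> str:
    """Write x in compact e-notation with exactly `figures` significant digits."""
    if figures < 1:
        raise ValueError("need at least one significant figure")
    # Python's "e" format puts figures - 1 digits after the decimal point.
    mantissa, exponent = f"{x:.{figures - 1}e}".split("e")
    return f"{mantissa}e{int(exponent)}"  # drop the "+" sign and zero padding


print(sig_figs(700e6, 3))  # 7.00e8: three significant figures
print(sig_figs(700e6, 1))  # 7e8: one significant figure
print(sig_figs(50e3, 2))   # 5.0e4
```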

comment by JenniferRM · 2024-05-21T15:07:44.957Z

I see it. If you try to always start with a digit, then always follow with a decimal place, then the rest implies measurement precision, and the mantissa lets you ensure a dot after the first digit <3

The most amusing exceptional case I could think of: "0.1e1" :-D

This would be like "I was trying to count penguins by eyeball in the distance against the glare of snow and maybe it was a big one, or two huddled together, or maybe it was just a weirdly shaped rock... it could have been a count of 0 or 1 or 2."

comment by C. Pannell · 2024-05-18T23:25:20.114Z

As an engineering student, I often use each of these methods. I consider each of these options a tool in my toolbox. I'll often specifically choose one based on the problem I am attempting to solve and my intended audience. Here are a few observations I would add:

Scientific notation

  • This can be converted to a standard number by any person who understands exponents and multiplication. As a result, it remains legible to anyone without a formal introduction to scientific notation, including those who lack the intuition for its usefulness that comes from doing many calculations across orders of magnitude.

e notation

  • If typesetting is lost on copy and paste, I have seen  and  both become . Using a capital or smallcaps e can help distinguish between the two.

Engineering notation

  • Some calculations involve multiplying and dividing many constants and variables in series. If it is important to understand the meaning of each term on its own and how each term contributes to the overall result, engineering notation can make the calculation significantly easier to follow. Such calculations often occur in engineering, hence the name. Fermi approximations are also a good example.  (https://en.wikipedia.org/wiki/Fermi_problem)

comment by BlackCat · 2024-05-19T00:07:50.043Z

In the olden days we used a caret (^) to indicate exponentiation. I don’t like using the character “e” because it’s already well-established as the base for natural logarithms.
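To make the distinction concrete in Python (conventions differ in other languages and in spreadsheets): the `e` in a literal like `3e4` means "times ten to the power", not Euler's number, and in Python the caret is bitwise XOR rather than exponentiation, which is written `**`.

```python
import math

print(3e4)            # 30000.0: e-notation, "times ten to the fourth"
print(3 * math.e**4)  # roughly 163.79: Euler's number e raised to the fourth
print(3 * 10**4)      # 30000: the caret-style 3*10^4, spelled with **
print(3 * 10 ^ 4)     # 26: ^ is bitwise XOR in Python (30 XOR 4)
```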