Dotting i's and Crossing t's - a Journey to Publishing Elegance

post by wedrifid · 2012-03-14T21:23:28.381Z · LW · GW · Legacy · 15 comments

Contents

  Task
  Challenge
  Opportunity
  Attempted Workarounds
  Success!
  Optimal Decision Making
15 comments

More literally, a journey to making the dots of the 'i's line up just right with the 'f's and ensuring that the crossing of the 'T' meets up neatly with the tip of the 'h' - all without breaking text searching and copy and paste.

Task

Now, as we all know, science isn't just about little things like peer review and double blind placebo controlled studies. Far more important is presenting your work in accordance with the grand traditions of scientific publication - all while ensuring you flatter all the right people for their sometimes obsolete and possibly only slightly relevant past works. Of course you must do this all according to standard citation formulae developed a century or two ago back when the city in which a text document was published was somehow a useful piece of information.

Some may consider people like Galileo and Bacon to be the most influential figures in science, but the man who made the greatest contribution to the way humanity seeks and disseminates knowledge is of course Donald Knuth: the man who took a decade off writing his multi-volume magnum opus [The Art of Computer Programming](http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming) to create TeX, the foundation of LaTeX, without which science as we know it would be unrecognizable. These days presenting academic publications without using LaTeX may be nearly as uncouth and banal as writing about your research in the first person rather than the passive voice!

The above cynicism is largely sincere and only a trifle exaggerated. Yet at the same time I acknowledge that there is much value to be had in wearing a uniform and the time for lonely dissent is not on matters as trivial as presentation. The overhead of presenting work in a form that other academics are willing to accept is comparatively minor and the payoffs significant.

One of the many initiatives lukeprog has set in motion now that he is organizing things over at SingInst is the porting of all of SIAI's past publications from various ad hoc formats to LaTeX with a standard publication template. You can see an early example of the new format here.

Challenge

Unfortunately, Wei_Dai encountered a problem. In the first presentation of the converted document, copying and pasting "The" would give something like "Ļe" and copying "fi" would give "ŀ". The problem is with the implementation of ligatures. Back when typesetting was done manually - I can only imagine using a whole bunch of little metal stamp-like things that could be plugged into the right places - the typesetters had an extra collection of pseudo-letters to use instead of combinations like "fi", "ffi" and "Th". The reason is that those particular combinations just don't look too good if they are placed together the same way that you would place them with other letters. You wind up either having them too far apart or having parts of them overlap in a way that isn't particularly neat.

In the font SingInst uses, the non-ligature versions of 'f' and 'i' combine with the dot of the 'i' only partially overlapping the 'f', which somehow makes it jump out more easily to the reader. The way this is solved with the ligatures is to actually increase the degree of overlap such that the f smoothly blends into the i. Someone with a far more highly honed aesthetic sense than I concluded that this is the best way to present English letters, and it looks fairly good to me, so I'll take their word for it.

The problem is that while ligatures are easy for humans to read, "Notepad", "Word" and "Firefox" aren't nearly as smart. And unfortunately there isn't a consistent standard between fonts for which ligature means what, so we end up with all sorts of random mess if we try to copy and paste from a ligature-riddled document into our editor of choice. This left me with rather a lot of work to do while I was generating LaTeX files from those of the old SingInst publications that were only available in PDF form, and that isn't a task I would wish on all the future consumers of SingInst literature.
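For the handful of ligatures that do have their own Unicode code points (U+FB00 to U+FB04 cover ff, fi, fl, ffi and ffl), a reader can at least undo the damage after pasting. Here is a minimal Python sketch, assuming the pasted text contains those standard code points. It would not help with font-specific codes like the "Ļ" above, and "Th" has no Unicode ligature code point at all.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Expand Unicode compatibility characters into their plain equivalents.

    NFKC normalization rewrites U+FB01 as "fi", U+FB03 as "ffi", and so on.
    Caveat: it also rewrites other compatibility characters (superscript
    digits, full-width letters), so it is a blunt tool.
    """
    return unicodedata.normalize("NFKC", text)


# Example text as it might come out of a PDF that exposes Unicode ligatures.
pasted = "The \ufb01nal o\ufb03ce"
print(expand_ligatures(pasted))  # prints: The final office
```

This only treats the symptom on the reader's side; the sections below are about fixing the PDF itself.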

Opportunity

Fortunately, the PDF format and LaTeX are both advanced enough to handle making the visible text use the ligature characters while keeping the original text available for easy copying and pasting by the interested reader. This involves something called a 'cmap': a mapping from the character codes used to draw the glyphs back to the text they represent. With that cmap embedded in the PDF file, any fully featured PDF reader is able to take the pretty text, strip apart the ligatures and figure out what they were originally.
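The idea can be sketched as a toy model in Python. This is not how any actual PDF reader is implemented; the function name is hypothetical, and the glyph codes are assumptions chosen only to reproduce the symptoms described above ("Ļ" is U+013B, "ŀ" is U+0140), not values taken from the real font.

```python
# Hypothetical mapping from a font's private glyph codes to the text they stand for.
TO_UNICODE = {
    0x013B: "Th",  # slot holding the "Th" ligature glyph (assumed)
    0x0140: "fi",  # slot holding the "fi" ligature glyph (assumed)
}


def extract_text(glyph_codes, to_unicode=None):
    """Turn a sequence of glyph codes into copyable text.

    With a mapping, ligature codes expand to their original letters.
    Without one, each code is naively read as a Unicode code point,
    which is where the garbage characters come from.
    """
    pieces = []
    for code in glyph_codes:
        if to_unicode is not None and code in to_unicode:
            pieces.append(to_unicode[code])
        else:
            pieces.append(chr(code))
    return "".join(pieces)


# "The fix" as drawn on the page: two ligature glyphs plus ordinary letters.
page = [0x013B, ord("e"), ord(" "), 0x0140, ord("x")]

print(extract_text(page, TO_UNICODE))  # prints: The fix
print(extract_text(page))              # prints: Ļe ŀx
```

The second call shows the failure mode: when the mapping is missing or ignored, the reader has nothing better to offer than the raw codes.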

Why then is Wei unable to copy our Th's and fi's? I haven't the slightest idea. My research suggests that the xelatex distribution we were using should just work and handle this sort of thing. So confident is it in managing such mappings that it outright rejects compatibility with the 'cmap' package, which could be used with the older 'pdflatex' compiler to handle this sort of task.

Attempted Workarounds

Success!

Optimal Decision Making

An analysis could be done on what the optimal problem solving strategy would have been at any point in that process. Among other things I would note that rather early on in the process I decided that the expected value of continuing to attack the problem was rather low - so I stopped billing Luke for the time. But since I really don't like being bested by a challenge I went ahead and did it anyway. Much frustration was involved but in this case I was rewarded with a large boost of personal satisfaction and with SingInst publications that are an iota or two more beautiful!

15 comments

Comments sorted by top scores.

comment by lukeprog · 2012-03-14T23:19:45.976Z · LW(p) · GW(p)

Isn't the punchline that you solved the problem? If so, that punchline is buried.

Replies from: wedrifid
comment by wedrifid · 2012-03-15T03:14:27.301Z · LW(p) · GW(p)

True, fixed.

comment by wedrifid · 2012-03-14T21:29:03.053Z · LW(p) · GW(p)

(In case you were wondering, yes, I did just write a post rather than just a reply to Wei's comment because I wanted to use "Dotting i's and crossing t's" as a double entendre in the title.)

comment by dbaupp · 2012-03-15T00:21:48.565Z · LW(p) · GW(p)

The TeX Stackexchange is a good place to ask questions about LaTeX. (As an example, this question is similar to the problem solved here.)

Replies from: wedrifid
comment by wedrifid · 2012-03-15T03:46:48.625Z · LW(p) · GW(p)

> The TeX Stackexchange is a good place to ask questions about LaTeX. (As an example, this question is similar to the problem solved here.)

One of the many threads on TeX Stackexchange that I memorized while tackling the problem. Your point is well taken; however, I have never once treated forums like that as if they grant write access. I seek out solutions that are already there but don't bother trying to have people solve mine. Perhaps because most technical problems of this nature don't end up being nearly so intractable.

comment by CronoDAS · 2012-03-16T05:00:59.481Z · LW(p) · GW(p)

Regarding LaTeX: My father (a professor of engineering) once got annoyed that the journal he submitted to wouldn't accept his Microsoft Word file as a submission, asking for a LaTeX one instead. Having no clue what the heck LaTeX was and not wanting to learn any kind of crazy new system, he managed to get the journal to accept a PostScript document instead. (I showed him LaTeX and he said anyone who wants to learn a whole programming language so they can do what Microsoft Word will do just fine must either be crazy or a professional typesetter who has the job of formatting things for a dead tree edition.)

Replies from: jmmcd
comment by jmmcd · 2012-03-17T14:40:50.350Z · LW(p) · GW(p)

I had to submit two book chapters in Word format recently -- just finished fiddling with the references this morning -- and I've decided that in future I'll just refrain from submitting when Word format is required. So much pain in every step of the process.

Replies from: wedrifid, None, CronoDAS
comment by wedrifid · 2012-03-18T13:42:23.918Z · LW(p) · GW(p)

> I had to submit two book chapters in Word format recently -- just finished fiddling with the references this morning -- and I've decided that in future I'll just refrain from submitting when Word format is required. So much pain in every step of the process.

But, but, you know LaTeX! Knowing Word and having to submit in LaTeX is a nightmare. Knowing LaTeX and having to export to Word is a matter of replacing your LaTeX to PDF converter with a LaTeX to Word converter!

It's a Google search away.

Replies from: jmmcd
comment by jmmcd · 2012-03-18T18:12:40.492Z · LW(p) · GW(p)

> Knowing LaTeX and having to export to Word is a matter of replacing your LaTeX to PDF converter with a LaTeX to Word converter!

Oh no it isn't! In fact that was the workflow I tried for one of the two chapters. It was really painful for many reasons. One of the big things that people always seem to overlook in this discussion (in both directions) is the need to use templates specified by the publisher. That messes up the workflow.

Of course, I do admit that I know LaTeX better than Word.

Replies from: wedrifid
comment by wedrifid · 2012-03-18T18:24:04.150Z · LW(p) · GW(p)

Word AND their template? Barbarians! What are they thinking? A research embargo upon them!

Replies from: jmmcd
comment by jmmcd · 2012-03-18T19:26:14.395Z · LW(p) · GW(p)

Can't tell if serious or joking :-/

Just in case, the point about the template is that you can't export from LaTeX to Word and then impose a template on that. You have to start from the template.

Have you published in academia? Almost all conferences, journals, and edited books do indeed require a template.

Replies from: wedrifid
comment by wedrifid · 2012-03-19T01:23:11.853Z · LW(p) · GW(p)

> Can't tell if serious or joking

More or less serious with a touch of hyperbole.

> Have you published in academia?

Yes, with LaTeX.

comment by [deleted] · 2012-03-19T01:42:29.289Z · LW(p) · GW(p)

Having to submit things in Word format is always hilariously painful.

comment by CronoDAS · 2012-03-18T02:02:05.844Z · LW(p) · GW(p)

What program would you rather be using?

Replies from: jmmcd
comment by jmmcd · 2012-03-18T13:19:15.351Z · LW(p) · GW(p)

LaTeX. There's a learning curve but I am long long past that. I don't feel that it gets in my way.