Reason #35 to Like MacPorts

So the essay that Jonathan Goodwin and I wrote together, which uses LDA topic modeling to explore the intellectual history of folklore studies, is about to head into the _Journal of American Folklore_’s workflow, and that means it has to get converted from LaTeX to Word. The conversion apparently works like this:

LaTeX > HTML > Word

Fortunately, [someone has written a command line tool][], `latex2html`, that does the heavy lifting. And, thank you computing gods wherever, and whoever, ye may be, the tool is already in MacPorts. MacPorts makes it easy to find this out:

% port search latex2html
latex2html @2008 (print)
Convert LaTeX into HTML.

All that means is that installing it takes just this:

% sudo port install latex2html

A whole lot of scrolled text later, it ends with:

--->  Installing latex2html @2008_3
--->  Activating latex2html @2008_3
--->  Cleaning latex2html
--->  Updating database of binaries: 100.0%
--->  Scanning binaries for linking errors: 100.0%
--->  No broken files found.

How do we use this tool?

> At this point, you’re ready to convert your files from LaTeX to HTML, and then possibly to Word. To invoke latex2html, switch to the directory with your .tex file, and latex2html filename. In its default state, latex2html will produce HTML that is broken up into multiple pages, usually one per section / subsection, much like the latex2html home page is. If you want to import your document into Word, you may wish to suppress this tendency. To do so, use the following command:

latex2html -split 0 -info 0 -no_navigation filename

> `-split 0` will make the entire LaTeX file into a single HTML page, while `-info 0` will remove the information bar at the bottom of the page and `-no_navigation` will remove the navigational menus on the top and bottom. This should produce a vanilla HTML file that Microsoft Word can read fairly easily.

> One thing to beware at this point: … Word will link to image files instead of including them in the document, which will mean that things like your equations will drop out if you send someone the .doc file without sending the image files as well. To fix this … go to Edit->Links, selecting all of the links in the dialog box, and clicking “Break Link”. Once that is done, save the file and the images will now be embedded into the document itself, ready for sending off to someone else.
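
If you expect to do this more than once, the quoted invocation can be wrapped in a small shell function. What follows is only a sketch: the function name `to_single_html` is made up for illustration, and the only `latex2html` flags it uses are the three quoted above. It checks that the `.tex` file exists and that `latex2html` is on the path, then runs the command from the file's own directory, as the instructions suggest.

```shell
#!/bin/sh
# Sketch: convert one LaTeX file to a single HTML page for import into Word.
# to_single_html is a hypothetical helper name, not part of latex2html.
to_single_html() {
    texfile=$1

    # Bail out early if the input file is missing.
    if [ ! -f "$texfile" ]; then
        echo "no such file: $texfile" >&2
        return 1
    fi

    # Bail out if the tool is not installed (or not on the PATH).
    if ! command -v latex2html >/dev/null 2>&1; then
        echo "latex2html not found; try: sudo port install latex2html" >&2
        return 1
    fi

    # Work in a subshell so the caller's working directory is untouched,
    # and run from the directory that holds the .tex file.
    (
        cd "$(dirname "$texfile")" &&
        latex2html -split 0 -info 0 -no_navigation "$(basename "$texfile")"
    )
}

# Usage, assuming the function has been pasted into or sourced by your shell:
#   to_single_html path/to/essay.tex
```

The subshell is a small convenience: it lets you call the function from anywhere without ending up in a different directory afterward.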

`UPDATE: I just ran this, and it worked fine. And it was very fast.`

[someone has written a command line tool]:
