Saturday, January 23, 2010

Firefox 3.6 in Ubuntu

One problem with Linux is that the latest releases of cross-platform applications often aren't as readily available or as easily installed as they are on Mac and Windows computers, as Preston Gralla recently observed. Although I would dearly like to disagree with him, after two unsuccessful attempts to install the recently released Firefox 3.6 using two different sets of instructions, I have to agree he's got a point. Although I succeeded with the directions here, I didn't know that I had until I logged out and back in, something I guessed at rather than was told. Other than that it was, as promised, "super easy": three simple steps with a few "y"s and a password.

My consistent experience is that Ubuntu is excellent for most purposes for "ordinary" users if you don't want to do anything unusual, including installing the latest versions of some software (Firefox and TeX Live among them). If you do, you may get lucky and be able to find instructions that will work the first time around; if not, possibly the second or third. But — and this is something I think about both Mac OS X and Windows — it really doesn't seem like it should be this hard. There are certain things I prefer about Ubuntu's software installation: Opening up a Terminal window and typing sudo apt-get install [whatever] works surprisingly often and is even easier than the process of finding and installing new software in Mac OS X and (in my very limited experience) Windows. But why can't it always be easy?

As the de facto tech support where I work (the one-eyed man among the more-or-less blind), when I look at what I have to do to fix even some simple things, things I do so commonly that they've become second nature to me, I realize that to ordinary users they're entirely unintuitive. Admittedly complexity is a price that must be paid for power, but does so much have to be that complex? Well, maybe someday; here we are in 2010 and I still don't have my jet pack, after all.

Friday, January 15, 2010

Tesseract OCR

Optical character recognition is a useful capability to have, and although I have it on my Mac thanks to the bundled software, OmniPage SE, included on the CD that came with my flatbed scanner, I naturally wanted to be able to do the same on my Linux machine too. A search quickly turned up Tesseract as the best option for Linux, and although for some reason the new-to-Karmic Ubuntu Software Center didn't turn it up, Synaptic Package Manager did and helpfully made sure I had everything I needed in the way of related packages.

There's a passage in Suetonius in which Hyginus' life gets a paragraph treatment, and although the English has been transcribed at Lacus Curtius from an old Loeb, the Latin hasn't been - at least there; at first I didn't search too far for a text version because I wanted an excuse to try out Tesseract on a text image and Google Books has kindly put that same Loeb volume on-line with both Latin and English.

I started off using GIMP to take a screen shot of the text at the Lacus Curtius site. There are a bunch of options out there for partial and full screen shots, but GIMP lets you fiddle with the image to make it more readable and save it as the .tif that Tesseract requires. It was nice and easy in GIMP, too: File -> Create -> Screen Shot, followed by saving as an uncompressed .tif.

Tesseract is a command line utility with a simple syntax: tesseract [image file] [what you want to call the file with the OCR results]; in Ubuntu you can drag the file's icon from a File Browser (=Mac OS X's Finder) to provide Terminal with the correct path. The file with the results is put in your current working directory (i.e., your user folder unless you've changed directories since starting this Terminal session) and the file name you choose for your output will be provided with a .txt suffix automatically.
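The command syntax described above can be sketched in Python for anyone who would rather script it than type it. This is only a minimal sketch: the function names and the file names in the usage note are made up for illustration, and run_ocr assumes the tesseract program is installed and on the PATH.

```python
import os
import subprocess


def build_tesseract_command(image_path, output_base):
    """Return the argument list for: tesseract <image file> <output base name>."""
    return ["tesseract", image_path, output_base]


def expected_output_path(output_base, working_dir="."):
    """Tesseract adds the .txt suffix to the output base name on its own."""
    return os.path.join(working_dir, output_base + ".txt")


def run_ocr(image_path, output_base):
    """Run Tesseract and return the recognized text (needs tesseract installed)."""
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    # A relative output base lands in the current working directory.
    with open(expected_output_path(output_base), encoding="utf-8") as handle:
        return handle.read()
```

With a hypothetical image called hyginus.tif, run_ocr("hyginus.tif", "hyginus") would leave hyginus.txt in the current working directory and return its contents.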

I did all this and got, well, dismal results, although by squinting you can see some resemblance between the original and what it produced. Here's the first line:


111.:.22 ¤|:Z ln» |1mH·| (3ai11s -111`Ii11s I—Iy*gi1111s., a f1`eecl111a11 c»f 4\11g11st11s a


I remembered that in my researches I'd come across some advice on processing a text image before running it through Tesseract. Although I'd thought it wouldn't be necessary, clearly something had to be done, and I decided this was a promising next step. I followed steps 4.2-3 (4.1 wasn't necessary) and ran it through again, and this time:


20 EI Gaius Julius Hrginus, a freednian of Augustus and a Spaniard by birth (some think that he was a native of Ale·<andria and was brought to Rome when a boy by


Not bad, not bad at all.

And how did it work on the Latin from the Loeb image?


XX. C. lulius Hyginus, Augusti libertus, naticnc
Hispanus, (nnnnulli Alexandrinum pumnt et .1
Caesarc pucmm Rnmmn adductum Alexandria cupta)


Not so great. Maybe undoing and redoing, plus increasing the Threshold (Tools->Color Tools->Threshold...) as suggested at the page above?


XX. C. Iulius Hyginus, Augusti libertus, nntiune
Hispmus, (nnunulli Alcxandrinum putunt ct a
Caesar: pucrum Rumsm adductum Alcxandrin capta)


Some gains, some losses. Overall about a wash. A little disappointing, since the text was fairly clear as scans of hundred-year-old books go, but still probably a little better than retyping de novo.
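For anyone curious what a threshold step does to an image, here is a rough sketch in plain Python. It is not GIMP's implementation, just the general idea: every pixel darker than a cutoff becomes black and everything else becomes white. The tiny sample "image" and the cutoff values are invented for illustration.

```python
def threshold(pixels, cutoff=128):
    """Binarize rows of 0-255 gray values: below cutoff -> 0 (black), else 255 (white)."""
    return [[0 if value < cutoff else 255 for value in row] for row in pixels]


# A made-up 2x4 grayscale "image": low values are ink, high values are paper.
sample = [
    [30, 200, 140, 90],
    [220, 60, 180, 110],
]

low = threshold(sample, cutoff=100)   # only the darkest pixels count as ink
high = threshold(sample, cutoff=150)  # mid-gray pixels now count as ink too

print(low)
print(high)
```

Raising the cutoff turns more of the in-between grays black, which can thicken faint strokes but can also fill in the gaps that distinguish one letter from another. That trade-off fits the "some gains, some losses" result.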

All in all, I'm satisfied if not delighted.

Saturday, January 2, 2010

Punctuation and Diacriticals in Ubuntu

One hope — one might even call it a resolution — for this year is to get my Hyginus project published in some form. I'm fairly close to finishing my mark-up of the text into XeLaTeX, which is tedious but can't easily be automated. After that I plan to do a thorough revision with more comments in the source file and to check for additions to the literature on the Fabulae published since I finished the original version in AppleWorks (which was long enough ago that the application had stopped being developed but Apple hadn't yet totally pulled the plug on it).

Since I'm also hoping to learn more about Linux, I'm working on this project mostly on my ThinkPad rather than my Mac. Since I'm also continuing my quest to make the key commands of Vim second-nature so it will be in practice and not just in theory faster than a conventional text editor, I've been doing my mark-up using that editor; some time ago I started using it and added useful macros for LaTeX mark-up to my .vimrc file so I can add a variety of tags and move to the next line with easy pairs of keystrokes.

A few days ago while writing HTML mark-up in Vim for my day job, I was reminded again that I wasn't able to do my usual find-and-replace for single and double curly (a.k.a. "smart") quotes; in the past I've shrugged my shoulders and opened the file with gedit and used its more obvious interface to switch them. In Mac OS X you can type those curly characters easily with combinations of option, shift, and square brackets, but those particular key combinations don't work the same way in Ubuntu (or possibly Gnome or Linux in general). The obvious solution of copying and pasting that I use in my favorite mouse-centric Mac text editor, Smultron (sadly no longer under development), is foiled by Vim's modal nature: hitting p to paste simply types "p" into the find space in a substitute command. What to do?

After using gedit to clean up my work code and uploading the file, I remembered that sometime in the past I'd seen a Vim command that would provide information about a character under the cursor. A few seconds' search turned it up: ga. Now, would it be possible to use that information to type a character not transparently accessible?

As it turns out, yes, although it wasn't as obvious as it might have been. Using ga prints a line at the bottom of the window like this one for a question mark:

63, Hex 3f, Octal 077

Checking charts of code points, I found that 63 is the decimal value of the question mark. I eventually found in the Vim documentation that using Control-V, with or without an additional letter, would let me type any character I wanted if I knew its code point; Control-V u, which takes a four-digit (16-bit) hexadecimal value, is most useful because (if my understanding of Unicode code ranges is correct) it'll work for any Western language.
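The three numbers in that line are the same code point written in decimal, hexadecimal, and octal. A small Python sketch shows the relationship, along with the four hex digits to type after Control-V u for the curly quotes. The function names are made up for illustration, and the output format only imitates the line quoted above.

```python
def describe(char):
    """Format a character's code point in decimal, hex, and octal."""
    code = ord(char)
    return f"{code}, Hex {code:02x}, Octal {code:03o}"


def ctrl_v_u_digits(char):
    """Four hex digits to type after Control-V u (16-bit code points only)."""
    code = ord(char)
    if code > 0xFFFF:
        raise ValueError("code point does not fit in 16 bits")
    return f"{code:04x}"


print(describe("?"))
# Left and right single quotes, then left and right double quotes.
for quote in "\u2018\u2019\u201c\u201d":
    print(quote, ctrl_v_u_digits(quote))
```

So in Insert mode, or on the command line while writing a substitute command, Control-V u 201c should produce a left double curly quote and Control-V u 201d its right-hand partner.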

Turning to Hyginus, the challenge there lies in producing macrons. One of the first things I did when I installed Ubuntu was to try to figure out how to add diacritical marks like accents, and I learned quickly that it's necessary to select a modifier key in System->Preferences->Keyboard->Layouts->Layout Options->Compose Key Position, which gives you seven options (although my keyboard lacks two of the choices). I picked the right control key and that's worked well for me. Key combinations for common (and some uncommon, like the dot-less i, ı, used in Turkish) Latin alphabet diacriticals are explained at this helpful page but macrons are not among those listed.

I found a partial answer by looking inside my /usr/share/X11/locale/en_US.UTF-8/Compose file: macrons are made by using the compose key and underscores. However, oddly, the same keystrokes that successfully overline e, i, u, and y instead transform a and o into the feminine and masculine ordinal indicators, small superscript letters that are underlined in some fonts as well. Unfortunately I'm not able to interpret the Compose file, which is not transparent in its meaning.

Interestingly, e, i, and u are all overlined by right-control--hyphen as well as by right-control--underscore (i.e., the shift key isn't necessary), but to overline y the shift key has to be used or the Japanese yen currency character is produced.
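As a cross-check on the characters discussed above, Python's standard unicodedata module can build the macron vowels and report the code point and official name of each one, including the ordinal indicators and the yen sign that turned up uninvited. This is only a reference sketch with made-up function names; it says nothing about how the Compose file itself works.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON


def with_macron(letter):
    """Combine a base letter with a macron into a single precomposed character."""
    return unicodedata.normalize("NFC", letter + MACRON)


def report(char):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for base in "aeiouy":
    print(with_macron(base), report(with_macron(base)))

# The characters that appeared instead of the expected macrons:
for char in "\u00aa\u00ba\u00a5":
    print(char, report(char))
```

The hexadecimal part of each code point is what Vim's Control-V u expects, so any of these characters can be typed there even when the keyboard layout won't cooperate.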

Fortunately, there's a simple solution for macrons for the 5 common vowels: switching to the Maori keyboard, a trick I learned from an early version of Mac OS X. Adding the Maori keyboard is pretty simple: System->Preferences->Keyboard->Layouts->Add...

And that's a handy ellipsis, because it inspired me to think to check whether Ubuntu might include something comparable to OS X's US Extended keyboard, and in fact there's a "USA International (with dead keys)" which creates macrons for all 5 vowels the way I'd expected. On the other hand, it also requires use of the right ALT key to create single and double straight quotes, so it's a mixed blessing.

Next layout research project? Polytonic Greek, which is apparently not well-served in its ancient form by the built-in "Greece Polytonic" keyboard. I'll probably start here.

Saturday, October 31, 2009

Google Chrome

Having tried out Google Chrome in its semi-native version (Windows 7 beta in VMware Fusion on my Mac), I decided a few weeks ago to try out the Mac dev version. It installed fine, is fast and stable, and has worked well for most although not all web sites. Today I decided to get the Linux version and after checking for a package with Synaptic - not there, as I suspected would be the case - I found a direct download at Google of a .deb file. It was one of the many cases of a Linux install being just as easy as in OS X: click to download, enter the system password when it's finished, click a few buttons and it appeared in my Applications->Internet menu. So far it seems fine (unlike Firefox, it works with the Last.fm web site) and it's nice to have yet another browser to choose from.

The one problem was in importing bookmarks. Today I also finally got around to installing Xmarks and signing up so I could merge and synchronize my bookmarks on my two personal machines. That eventually worked out fine, but I wasn't able to find where bookmarks were stored on my Jaunty machine. Both bookmarks.html files I found included only the handful of default bookmarks that came with the basic installation. I still have no idea where they are, but I eventually learned that by going into the Firefox menus in Bookmarks->Organize bookmarks... it's possible to export them as a file and then import them into Chrome.
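Firefox 3 and later keep bookmarks in a places.sqlite database inside the profile folder, which is probably why the bookmarks.html files held only the defaults. The exported file is ordinary HTML, so it can be checked before importing it elsewhere. Here is a minimal Python sketch that lists the titles and addresses in such a file; the class and function names and the sample bookmarks are made up, and a real export has more structure (folders, dates) than this bothers with.

```python
from html.parser import HTMLParser


class BookmarkLister(HTMLParser):
    """Collect (title, url) pairs from an exported bookmarks HTML file."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._current_url = None
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._current_url = dict(attrs).get("href")
            self._title_parts = []

    def handle_data(self, data):
        if self._current_url is not None:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_url is not None:
            title = "".join(self._title_parts).strip()
            self.bookmarks.append((title, self._current_url))
            self._current_url = None


def list_bookmarks(html_text):
    """Return every (title, url) pair found in the given HTML text."""
    parser = BookmarkLister()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


# Hypothetical sample in the general shape of an exported bookmarks file.
SAMPLE = """<DL><p>
    <DT><A HREF="http://www.example.org/latin">Latin texts</A>
    <DT><A HREF="http://www.example.org/ubuntu">Ubuntu notes</A>
</DL><p>"""

for title, url in list_bookmarks(SAMPLE):
    print(title, url)
```

If the list printed from a real export contains more than the handful of default bookmarks, the export worked and the file is safe to import into Chrome.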

I'm not giving up Firefox as my default browser, but Chrome is an attractive fast alternative.

Monday, October 26, 2009

Lewis and Short's Latin Dictionary

Thanks to A Third Way the old but comprehensive Lewis and Short Latin dictionary is now available in two more forms: through a web interface and as a stand-alone Adobe AIR desktop application. The web interface joins Perseus at Tufts University, Perseus under Philologic at the University of Chicago, and the Archimedes Project at Harvard, while the stand-alone joins the cross-platform Diogenes application.

I tested the web site with Firefox on my Mac (OS 10.5) and with Opera in Ubuntu Jaunty, and the AIR app on both machines, where it installed and ran flawlessly. Although it doesn't work with inflected forms, it provides a list of the ten entries before and after your entry, so as long as the word you're looking for begins the same way, odds are good you'll find it; when an entry for an irregular form is provided (e.g., tuli, lātus, the third and fourth principal parts of ferō), if the original dictionary includes a cross-reference it's provided, but not as a link. As with the other electronic versions and the print, macrons are shown for the main entry but not in the quotations. Search terms cannot include macrons (and in fact the not-found mālum was located at the very end of the M section in the context list of 20 rather than among the ma's), but entries without them generate results for words both with and without.
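Matching words both with and without macrons is straightforward to sketch with Unicode normalization. What follows is not how the application works (that isn't documented here), just one plausible approach in Python: decompose each word, drop the combining macron, and compare what's left. The function names and the tiny word list are invented for illustration.

```python
import unicodedata


def strip_macrons(word):
    """Remove macrons so that 'mālum' and 'malum' compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if ch != "\u0304")
    return unicodedata.normalize("NFC", stripped)


def lookup(query, headwords):
    """Return every headword that matches the query once macrons are ignored."""
    key = strip_macrons(query).lower()
    return [word for word in headwords if strip_macrons(word).lower() == key]


# Hypothetical mini word list; the real dictionary is far larger.
HEADWORDS = ["malum", "mālum", "mālus", "malus", "fero"]

print(lookup("malum", HEADWORDS))
print(lookup("mālum", HEADWORDS))
```

Unlike the application as described above, this sketch also accepts a search term that includes macrons, since the query is stripped the same way as the headwords.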

This is a 0.2 alpha version, and they write of their plans:

We are busy cataloging every word in the dictionary as to its part of speech, declension or conjugation, gender, and other grammatical information. In the future, this will allow searches for "every first declension word that begins with R" or "every third conjugation verb that's deponent". We will also be adding maps for the geographical entries, pictures of items where appropriate, and updating some of the less-than-modern English Lewis and Short sometimes present.


My thanks to Terrence Lockyer, who kindly posted about it to the Classics list.

Sunday, August 2, 2009

Three Successes, Two Failures with Jaunty

After reading about the browser Arora and being of the mind that one can't have too many browsers, I decided to try it out on my Jaunty ThinkPad. I found it easily enough via Add/Remove Applications, but was disappointed that the version in the repository was 0.5 and the web site offered 0.8, which is presumably 60% better than the older version. After doing some research I downloaded the source code, but although it unpacked into a large number of files and folders that looked like they would turn into a browser, I wasn't able to figure out how to get it to run.

I was more successful (eventually) with Adobe Reader and Foxit Reader. I have a .pdf that looks perfectly fine in Preview and Adobe Reader in Mac OS X, but in the default .pdf reader, Evince Document Viewer, it's converted to a hideous sans serif font with striking kerning errors. I decided to check other options, and even though I'm not fond of the time it takes Adobe Reader to start on OS X I decided to download it. It's available as a .bin file and although my first attempts to turn it into an application failed, finally following the instructions here, including the use of chmod 700, I succeeded with the minor quirk that it took me a while to figure out what to do to get it to run (double-click on acroread in Adobe Reader->Adobe->Reader9->bin; who would have suspected?). As I'd hoped, it looked fine.
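For the record, chmod 700 gives the file's owner permission to read, write, and execute it and gives everyone else nothing, which is what lets a downloaded .bin file be run as a program. Here is a small Python sketch of the same operation, run against a throwaway temporary file, not a real installer; the function name is made up, and the result assumes a Linux or other POSIX system.

```python
import os
import stat
import tempfile


def make_owner_executable(path):
    """Equivalent of `chmod 700 path`: read, write, execute for the owner only."""
    os.chmod(path, 0o700)
    # Return the permissions in the familiar `ls -l` style.
    return stat.filemode(os.stat(path).st_mode)


# Demonstrate on a throwaway file rather than a real installer.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as handle:
    demo_path = handle.name

mode_string = make_owner_executable(demo_path)
print(mode_string)
os.remove(demo_path)
```

The three digits stand for owner, group, and everyone else; 7 is read (4) plus write (2) plus execute (1), and 0 is no access at all.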

Lifehacker just had an article on the best .pdf readers and that reminded me that Foxit is available for Linux. I downloaded that and successfully installed it and it renders the file as well as Adobe Reader, which is particularly nice in light of the exploits for Adobe programs that seem to crop up on a regular basis.

Although I wouldn't want to look at ugly output on a screen, my plan was to print two pages of this .pdf, and since I hadn't tried printing from the ThinkPad before I didn't know what to expect. I connected it to an old Epson Stylus Color 740 inkjet printer, hit the Print command, and ... nothing. Not entirely unexpected, but a mild disappointment. After a brief false lead in System->Preferences->Default Printer, I found System->Administration->Printing: pretty straightforward. Adding that printer was also straightforward; it asked if I wanted it to look for a driver, and after I approved and it found and downloaded it I printed a test page and then my two pages without a hitch. My even more ancient laser printer (an Apple LaserWriter 12/640PS) was seen by the computer but I unfortunately wasn't able to get it to work despite several attempts both using Add Printer and localhost:631; I kept getting a "bad device-uri" error even though it appeared that there was a driver for it.

So, installing Adobe Reader and Foxit and printing with the Epson inkjet worked with little difficulty, but running Arora and printing to the LaserWriter didn't. Sixty percent success in this recent batch of experiments isn't bad, I suppose, but I do wish it all Just Worked. I'm reading Keir Thomas' Ubuntu Pocket Guide and Reference and I have some hope that with its help I'll eventually be able to figure out more useful things than how to switch screensavers.

Saturday, July 25, 2009

Wine, Lunascape, and IE in Ubuntu Jaunty

My most recent explorations in Ubuntu have involved Wine (a translation layer that allows you to run Windows applications on Linux and other *nix platforms) and a couple of Windows browsers.

I had thought Wine was installed by default in Ubuntu but found I was incorrect. Downloading and installing it was quick and painless, thanks to Applications->Add/Remove.

Now for some Windows applications to try it out on.

I recently read about Lunascape, "the world's first and only triple engine browser," which uses the rendering engines underlying Internet Explorer (Trident), Firefox (Gecko), and Safari and Chrome (WebKit); Opera's Presto was not included, nor were a handful of other lesser-known engines. Intriguing, eh?

I downloaded and ran the set-up .exe file with no problem by right-clicking and selecting "Open with 'Wine Windows Program Loader'," but unfortunately several attempts to run it resulted in at best a browser window that darkened after about 10 seconds, after which a warning box opened notifying me that the application was not responding; and at worst, nothing. Further attempts to uninstall and reinstall didn't help, and at first it was unclear how I could remove Lunascape from my Wine programs menu (except possibly by un- and reinstalling Wine itself?), but a bit of research turned up instructions here to remove it entirely after uninstalling it from the Wine menu. I had to use ls to get the official name of the Lunascape folder and rm -r to get rid of it, but I'm getting better at that than I was.
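The ls-then-rm -r routine can be sketched in Python for anyone nervous about typing a recursive delete by hand. This is only an illustration run inside a temporary directory: the function names and the folder names (including "Lunascape5") are made up, and the location of the real folder inside Wine's files is deliberately not assumed here.

```python
import os
import shutil
import tempfile


def find_folders(parent, fragment):
    """Like scanning `ls` output: folder names under parent that contain fragment."""
    return sorted(
        name for name in os.listdir(parent)
        if fragment.lower() in name.lower()
        and os.path.isdir(os.path.join(parent, name))
    )


def remove_folder(parent, name):
    """Like `rm -r`: delete the folder and everything inside it."""
    shutil.rmtree(os.path.join(parent, name))


# Demonstration in a temporary directory with made-up folder names.
demo_parent = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_parent, "Lunascape5", "plugins"))
os.makedirs(os.path.join(demo_parent, "Internet Explorer"))

matches = find_folders(demo_parent, "lunascape")
print(matches)
for name in matches:
    remove_folder(demo_parent, name)

remaining = sorted(os.listdir(demo_parent))
print(remaining)
shutil.rmtree(demo_parent)  # clean up the demonstration directory
```

Listing the matches first, before anything is deleted, is the same precaution as running ls before rm -r: it confirms the exact folder name and that nothing else will be swept up with it.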

Internet Explorer was more of a mixed bag. By following these instructions I was able to get IE 6 installed and running, but neither the XP nor Vista versions of IE 8, downloaded directly from Microsoft, would open with Wine. The interesting thing about the IE 6 installation is that it requires using the command line to start it: /home/john/bin/ie6, although reinstalling ies4linux created a desktop shortcut to it. I think I'll get rid of the shortcut and just try to remember ~/bin/ie6. That shouldn't be too hard, should it?

At some point when I feel very brave I might try VirtualBox with some of these Windows apps.