Monday, May 26, 2008

Evaluating the Quality of Electronic Texts

Lisa Spiro, director of the Digital Media Center at Rice University’s Fondren Library, has an excellent blog, Digital Scholarship in the Humanities. A few weeks ago she posted a clear and detailed comparison of five collections of digital texts (Google Books, the Internet Archive, Project Gutenberg, Early American Fiction, and Making of America), concentrating on six factors that determine the usefulness of each for purposes of scholarship:

  • Quality of the scanning
  • Quality of the OCR/text conversion
  • Quality of the metadata
  • Terms of use
  • Convenience
  • Reputation in the world of scholarship

Presciently, she omitted the soon-to-be-shuttered Microsoft Live Book search service.

Her posts are always worth reading, but this one is particularly recommended.

Friday, May 23, 2008

Microsoft Book Search Closing its Cyberdoors

Microsoft announced today that it had advised its partners that its Live Book and Academic search services will be shutting down next week and that it will stop scanning entirely. Fortunately, the content already scanned will continue to be available through Microsoft's general search interface.

This is quite unfortunate, because they had done at least as good a job with their interface, provision of metadata, and search capabilities as Google has done with their book search — and often better — and their quality control has been excellent, far better than Google's. The collection made available through Live Books never extended beyond English-language titles (at least I've never seen anything there in another language) and was nowhere near as comprehensive as Google's, but it was a useful supplement.

Some reflections can be found at the Search Engine Land blog, and additional links to related news stories at Techmeme. Peter Suber also provides some extensive quotes from other blogs at his Open Access News blog, and Peter Brantley, executive director of the Digital Library Federation, has posted his thoughts at his blog.

ResourceShelf last year provided a useful set of links to other large digitization projects. Oddly, neither it nor the more comprehensive British Columbia site linked there includes Gallica at the Bibliothèque nationale; Gallica is an excellent source of not only French works (including not just books but maps, images, music, and manuscripts), but also of works in other languages, Latin in particular but yes, even some in English. Its interface is less than ideal, but it's usable.

Wednesday, May 21, 2008

Paleography Resources

Dave Postles of the English Department of the University of Leicester recently announced on Mediev-l that he has put on-line (both for direct use and as downloads) some resources previously available only on CD-ROM. They include two tutorials on medieval and early modern paleography (with a self-assessment test; sample question: "Try to convert the regnal year 7 Edward I into the year of grace"), information on the history of the urban development of Leicester, and a book about Oseney Abbey, a medieval Augustinian establishment in Oxfordshire.
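
For readers curious how a question like the sample one works out, here is a minimal sketch in Python. It assumes that Edward I's regnal years are counted from 20 November 1272, the date usually given in regnal-year tables. The function name regnal_year_span is hypothetical and is not taken from the tutorials, and the sketch ignores the medieval habit of starting the calendar year on a date other than 1 January.

```python
from datetime import date, timedelta

# Assumption: Edward I's regnal years are reckoned from 20 November 1272.
EDWARD_I_ACCESSION = date(1272, 11, 20)


def regnal_year_span(regnal_year, accession=EDWARD_I_ACCESSION):
    """Return the (first day, last day) of the given regnal year.

    Regnal year 1 starts on the accession date; each later year starts on
    the anniversary. The dates are used only as day/month/year labels.
    (A 29 February accession date would need special handling.)
    """
    if regnal_year < 1:
        raise ValueError("regnal years are counted from 1")
    start = date(accession.year + regnal_year - 1, accession.month, accession.day)
    next_start = date(accession.year + regnal_year, accession.month, accession.day)
    return start, next_start - timedelta(days=1)


start, end = regnal_year_span(7)
print(start.isoformat(), end.isoformat())  # prints "1278-11-20 1279-11-19"
```

Under that assumption, 7 Edward I runs from 20 November 1278 to 19 November 1279, so a document dated only by regnal year could belong to either year of grace.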

Sunday, May 18, 2008

Latin at the Vatican Web Site

The Vatican web site has added a seventh language to its six previous interfaces (German, French, Italian, English, Spanish, and Portuguese): Latin. This just came up on the Latinteach list, but according to a Catholic News Service story, the Latin section went on-line on May 9th.

The contents include documents of the last five popes, the Nova Vulgata translation of the Bible (which is a slightly cleaned-up Clementine Vulgate), the current version of the Code of Canon Law, the documents of Vatican II, and an assortment of curial documents. That page links to information about the Latinitas Foundation (also in English), which in turn links to a page with some entries from the Lexicon recentis Latinitatis published a few years ago by the Vatican. The selection includes:

pastillus tórtilis

práedium rei pecuáriae curandae

máizae grana tosta (pl.)

renovātus fascálium motus

mountain bike
bírota montāna

memóriae amíssio

punkianae catervae ássecla

Most entries at the page are in Italian, with a few in English or German.

Those who want a more comprehensive list than that sample but don't want to pay the $168 (plus shipping) that the Vatican Bookstore is asking for the full-length book should visit Florus' English-Latin page, whose word list is much briefer than the book but still very useful.

Monday, May 12, 2008

Codex Sinaiticus and Microsoft Silverlight

Wieland Willker, owner of the New Testament Textual Criticism list, recently posted there that the 43 leaves of the important 4th-century manuscript Codex Sinaiticus housed at the University of Leipzig have been made available on-line. He added, "You have to install a little Microsoft tool for the zooming functionality first" and then provided a link to the site.

Besides my interest in the main subject of the posting, I was curious about this "little Microsoft tool" and quickly discovered from a response on the list that the site requires the Silverlight browser plug-in, Microsoft's alternative to Adobe's Flash. The author of the response lamented that Silverlight requires a relatively recent version of either Microsoft Windows or Mac OS X, makes no provision for Linux, and supports only a few browsers, and that the plug-in's proprietary nature ties it to Microsoft, which, in that author's view, can't be trusted.

I responded to the list with some comments which I present below in edited form for those who aren't subscribed to that e-mail list and who might be curious about Silverlight.

Although Professor Willker referred to the plug-in only as required for zooming functionality, in fact the entire site is inaccessible without it. The specific OS and browser requirements can be found at Microsoft's web site. With some exceptions, it's currently supported only on Windows Server 2003, XP, and Vista on Internet Explorer and Firefox, and on Mac OS 10.4.8 and up on Firefox and Safari. Linux users are expected to be able to view Silverlight content sometime around the middle of this year via Moonlight.

Ars Technica has published a few articles about Silverlight and Moonlight, and this recent one about a presentation by the developer of Moonlight talks about some of its advantages over not only simple image presentations but even over Flash. In at least some cases, though, as at the Library of Congress, money figures into the equation:

You're probably wondering why the LOC is using Silverlight instead of something more widely supported, like Adobe's Flash. The answer is, of course, money. As we reported back in February, Microsoft gave the LOC $3 million to put exhibits online using Silverlight...

The FOSS community in general is not fond of Silverlight. A couple of examples of arguments against it can be found here and here, although a response to the latter by Moonlight's developer should be read.

One Windows Vista user reported to the list that his computer showed a runtime error with both Internet Explorer and Firefox, but an XP user had no problem with it.

I was not able to install the Silverlight plug-in on my Mac OS 10.2 machine; as the system requirements I linked to above show, the minimum on that platform is 10.4.8 (the current version of 10.4 is 10.4.11; the most recent version of the OS is 10.5.2).
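
The version check involved here is easy to get wrong if the numbers are compared as plain text, since "10.4.11" sorts before "10.4.8" as a string. The following is a minimal sketch in Python of a numeric comparison. The helper names parse_version and meets_minimum are hypothetical, and the only requirement encoded is the 10.4.8 minimum mentioned above. It is an illustration, not a description of how the Silverlight installer actually checks.

```python
def parse_version(text):
    """Turn a dotted version string such as "10.4.8" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum Mac OS version for the plug-in, as stated in the post.
MIN_MAC_OS = parse_version("10.4.8")


def meets_minimum(installed, minimum=MIN_MAC_OS):
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= minimum


for version in ("10.2", "10.4.11", "10.5.2"):
    print(version, meets_minimum(version))
# prints:
# 10.2 False
# 10.4.11 True
# 10.5.2 True
```

Comparing tuples of integers rather than strings is what lets 10.4.11 count as newer than 10.4.8.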

I succeeded with my 10.4 machine, although with a minor glitch. My main browser is Firefox, and after a quick download and painless installation of the plug-in, I restarted the browser and discovered on trying the site again that I was offered the same "Download Silverlight" semi-error message I'd been getting before I installed it. I logged out of my account and logged in and tried it again without success. I thought to check the site with Safari before restarting the computer prior to another attempt and it worked perfectly.

I guessed that the problem with Firefox might lie in an add-on. My first instinct was to suspect NoScript, but I don't run it on that computer. Scanning my other extensions, I noticed Flashblock. I disabled it and restarted, and that time the site loaded perfectly. It was ironic but reasonable that something designed to block the use of Flash also blocked Microsoft's "Flash-killer" (as Silverlight is commonly called, with greater likelihood of eventual accuracy than the Zune becoming an iPod-killer, as Microsoft had hoped).

A curious side-note is that Safari took me to an English-language interface for the web site and Firefox to the original German-language version. Neither seems to provide a way to switch to the other language, although presumably anyone interested in the manuscript images is unlikely to be seriously handicapped by that.

Unfortunately (although unsurprisingly given the few browsers explicitly supported) Silverlight doesn't work in Opera, nor in the last version of IE for Mac, now 6-1/2 years old and exceedingly long in the electronic tooth.

Friday, May 2, 2008

A Few Interesting New Testament-Related and Other Early Christian Titles at the Internet Archive

A recent post to the B-Greek list about the availability at the Internet Archive of the first and second volumes of The Grammar of the Greek New Testament by Moulton et al. reminded me that it had been a while since I'd done trial searches there for some keywords of interest (Latin, Greek, lateinisch*, griechisch* — and one nice feature of the site is that an asterisk works as a wild card).

Moulton and Milligan's useful study of the Vocabulary of the Greek Testament (1914-24) in light of the language of the papyri discovered in the late 19th and early 20th century and Moulton and Geden's Concordance to the Greek Testament (1897) are also available.

I was delighted by how many titles seem to have been added since I last poked around there, some of them still under copyright, like Karl Staab's 1933 Pauluskommentare aus der Griechischen Kirche: aus Katenenhandschriften gesammelt und herausgegeben. Its presence is particularly surprising since it's currently in print. That title and quite a few other 19th and early 20th century German works have a notation "microform" or "microfilm," suggesting they've made their way to the Internet with film or fiche as an intermediate step. The page images I've seen are uniformly of quite high quality, whether they've gone through filming or not.

Angelo Mai's edition of Codex vaticanus: Novum Testamentum graece ex antiquissimo codice vaticano (1859) is also at the site.

When I return to the Internet in a week or so after a trip to points north I hope to take some time to explore more extensively.