TeleRead: Bring the E-Books Home

News & views on e-books, libraries, publishing and related topics

Archive for the ‘Internet Archive’ Category

Of old people and the things that pass

Wednesday, March 12th, 2008

By Branko Collin

Today marks the start of the Boekenweek, the Dutch week to promote books. This year’s motto is “Of old people…,” named after Louis Couperus’ classic 1906 psychological novel Of Old People and the Things That Pass… The theme focuses on old age, both in people and books, and has already been criticised by those who feel that youngsters should be encouraged to read books, not discouraged.

Of more interest to TeleRead readers may be that Alexander Teixeira de Mattos’ classic translation of Couperus’ masterpiece has recently become available in many formats at the Internet Archive. If anyone would like a more accessible version (plain text, HTML, PDF), let me know and I’ll try to post one at my other blog. The Dutch version is available from DBNL.org.

Of Old People follows a pair of murderers in their old age, along with their children and grandchildren, and shows how one gruesome act committed many years ago is still felt in the family today.

(Picture: Louis Couperus. This entry also published at 24 Oranges.)

Make readable PDFs for your Sony Reader

Friday, September 14th, 2007

By Paul Biba

Here is a photo of a WOWIO PDF as it is displayed on my Sony Reader. Since the text doesn’t reflow, you can see that it is almost unreadable. As a matter of fact, it is not almost unreadable; it is indeed unreadable.

Now here is that same PDF after it has been processed with a piece of free Windows software called pdflrf. As you can see, the text is now readable.

pdflrf is available from the MobileRead forums and can be found here. It is now at version 0.7, and the zip file you download contains both a command-line and a Windows GUI version.

I also downloaded one of the Japanese fairy tale books from the Internet Archive as a black-and-white PDF (the Sony Reader does not display color). When loaded onto the Sony Reader, the PDF file produced only solid black pages. I then ran the original PDF through pdflrf, and it produced a perfectly formatted file with all the illustrations intact.

This is a wonderful tool for any Sony Reader owner, and many thanks must go to MobileRead and the author for providing it free of charge.

Related: How to read e-books on (almost) any phone, from MobileRead.

‘Digital Text Masters’ (Digitizing the classic public domain books)

Tuesday, February 13th, 2007

By Jon Noring

The recent TeleBlog articles about the Project Gutenberg (PG) text of Tarzan of the Apes (see 1, 2) suggest that not all is well with the existing corpus of public domain digital texts.

My experience over the last twelve years digitizing several public domain books has revealed a number of problems, which I’ve discussed in various forums, including the PG forums and The eBook Community. To keep this already long article from turning into a whole book, I won’t cover here the complete list of problems I found, nor those found by others.

To summarize what I believe should be done to resolve most of the known problems: when creating a digital text of any work in the public domain, we should first produce and make available what we call a “digital text master,” one that is highly accurate to a known and acceptable print source. From the “master,” various display formats and derivative texts (e.g., modernized, corrected, composite, bowdlerized, parodied, etc.) can then be produced to meet a variety of user needs.

(Btw, what better example to illustrate the concept of a “digital text master” than to show the self-portrait of the great 17th century Dutch master painter, Rembrandt van Rijn, whose attention to detail and exactness is renowned.)
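To make the master-and-derivatives idea concrete, here is a minimal Python sketch. It assumes a deliberately simple, hypothetical model: a frozen `DigitalTextMaster` record holding the verified paragraphs plus the print source they were checked against, from which display formats (`to_plain_text`, `to_html`) and a derivative text (`modernize`) are generated without ever altering the master. All names and the sample text are illustrative only, not an existing tool or format.

```python
import dataclasses
import html


@dataclasses.dataclass(frozen=True)
class DigitalTextMaster:
    """Hypothetical model of a "digital text master": an accurate
    transcription tied to one identified print source."""
    title: str
    author: str
    print_source: str  # which printed edition the text was checked against
    paragraphs: tuple  # the verified text, one string per paragraph


def to_plain_text(master: DigitalTextMaster) -> str:
    """Display format 1: plain text with a short provenance header."""
    header = f"{master.title}\nby {master.author}\nSource: {master.print_source}\n"
    return header + "\n" + "\n\n".join(master.paragraphs) + "\n"


def to_html(master: DigitalTextMaster) -> str:
    """Display format 2: minimal HTML, escaping any markup characters."""
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in master.paragraphs)
    return f"<h1>{html.escape(master.title)}</h1>\n{body}\n"


def modernize(master: DigitalTextMaster, spellings: dict) -> DigitalTextMaster:
    """Derivative text: a modernized-spelling edition.

    Returns a new object; the frozen master itself is never modified."""
    def fix(paragraph: str) -> str:
        for old, new in spellings.items():
            paragraph = paragraph.replace(old, new)
        return paragraph

    return dataclasses.replace(
        master,
        title=master.title + " (modernized)",
        paragraphs=tuple(fix(p) for p in master.paragraphs),
    )


# Hypothetical sample data, for illustration only.
master = DigitalTextMaster(
    title="Sample Title",
    author="Sample Author",
    print_source="hypothetical first edition",
    paragraphs=("To-day the shew begins.",),
)
print(to_plain_text(master))
print(to_html(modernize(master, {"To-day": "Today", "shew": "show"})))
```

The point of the design is that every output is regenerated from the one checked source, so a correction made to the master propagates to all formats and derivatives.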


Open Content Alliance, Internet Archive to ‘compete’ with Google Books

Friday, December 22nd, 2006

By Chris Meadows

From the Washington Post, spotted on Slashdot: The Open Content Alliance is funding the Internet Archive with a $1 million grant to allow it to digitize more books for its Open Library project. The OCA feels that there should be more major repositories of scanned knowledge than just the one held by a single commercial search engine (though it is worth noting that two other commercial search engines are prominent OCA members). Meanwhile, Google has been subpoenaing Yahoo, one of the members of the OCA, as part of the discovery process in defending against the lawsuit that the Authors’ Guild is bringing against it.

Both articles note that the Open Content Alliance will virtuously restrict itself to works in the public domain, or for which it has explicit permission, and will make the contents available to all search engines, whereas Google will keep its contents for itself and its member libraries. Still, the point of Google Books is to provide inside-the-cover search for all books, including orphaned works whose rights-holders cannot be found to obtain permission, not simply those in the public domain. It has been estimated that only 4% of all books ever published are being commercially exploited, meaning that any book search that relies on asking permission for non-public-domain works will only ever be able to index a tiny fraction of available titles. And given that Google is footing the bill for its own effort, it only seems fair that it reap the benefit. It is all very well for the OCA to provide its results to all search engines. However, since the Open Content Alliance includes both Yahoo and Microsoft, Google’s two largest search rivals, those members can afford to be benevolent: Google is the only competitor that can worry them.

Interestingly, there seems to be some disagreement between allies Yahoo and the Authors’ Guild as to just how much control Yahoo has over the Open Library project. Yahoo claims that they are only providing funding but not control, whereas the Authors’ Guild refers to it as “Yahoo’s new venture.” Hmm…

Adding books to The Internet Archive

Tuesday, June 27th, 2006

By Branko Collin

Have you scanned any public domain books lately to read as PDF or DjVu files on your PDA? Why not share them through The Internet Archive? TIA will take any book it can legally distribute. I wrote a short how-to for Distributed Proofreaders volunteers who wish to (pre-)publish high-quality scans of the books they are processing, and it might be useful to others too.

In the USA, where The Internet Archive is based, a work is generally understood to be in the public domain if it was published before 1923.
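As a rough illustration of how that rule of thumb could be used to screen a batch of scans before uploading, here is a small Python sketch. The `Book` record, the function names, and the sample titles are all hypothetical; the 1923 default simply mirrors the rule as stated above, and the screen is a first pass, not legal advice.

```python
from typing import List, NamedTuple


class Book(NamedTuple):
    title: str
    year_published: int  # year of first publication, as recorded for the scan


def likely_us_public_domain(book: Book, cutoff_year: int = 1923) -> bool:
    """Rule of thumb from the post: published before `cutoff_year`.

    A first-pass screen only, not legal advice; borderline cases
    need separate checking."""
    return book.year_published < cutoff_year


def upload_candidates(books: List[Book], cutoff_year: int = 1923) -> List[Book]:
    """Keep only the scans that pass the rule-of-thumb screen."""
    return [b for b in books if likely_us_public_domain(b, cutoff_year)]


# Hypothetical batch of scans.
scans = [Book("Hypothetical novel A", 1906), Book("Hypothetical novel B", 1930)]
for book in upload_candidates(scans):
    print(book.title)  # prints "Hypothetical novel A"
```

The cutoff is a parameter rather than a hard-coded constant because it is a date-dependent rule of thumb, not a fixed property of the books.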