TeleRead: Bring the E-Books Home

News & views on e-books, libraries, publishing and related topics
August 31st, 2006

How do Google e-book files display on your PDA—and why can’t I find a free download of Around the World in Eighty Days?

By David Rothman

So far Liviu seems to be able to read Google PDFs on his Nokia 770. I haven’t been so lucky. Ironically, I got only some irritating legalese when I tried to display Bleak House on my Palm TX using Documents to Go, and the TX rendered the pages too small when I resorted to PalmPDF.

Bill Janssen, who’s far more enthusiastic about Google’s PDFish approach than I am, says Google is using an image compression scheme that many PDF viewers can’t understand yet. Nonproprietary e-book standards, anyone? Meanwhile I’d welcome other people’s observations and advice on the issue of reading downloaded Google files on PDAs, especially Palms.

I’m also curious if anyone can find a fully viewable PDF download of Around the World in Eighty Days on Google. Just curious. I might be overlooking something. If the Jules Verne novel isn’t there, however, what does this say about Google’s priorities? (Update, 12:34 p.m.: Found—but only after an advanced search, and with the title The Tour of the World in Eighty Days. For the typical searcher, the book might as well not be there. My big point remains.)

Jules Verne is the tenth most popular author of public domain works if you go by a Project Gutenberg list. But Google shareholders first, right?

Well, as a very small one, let me tell Google how it can place me first: by showing a little more interest in both e-book usability and a comprehensive, well-organized collection of free public domain downloads. I’m amazed that Google appears to have flunked even the World test. Who’s looking out for corporate values? Gordon Gekko?

Messes like Google’s are why so many people can’t wait for the full return of Blackmask. While I find David Moynihan’s copyright arguments to be problematic, despite my enthusiasm for the public domain, he is spot-on about the need for e-books to be presented in a PDA-friendly way.

For serious recreational readers, PDAs are very likely the main platform in use. That’s what an IDPF-commissioned survey seemed to suggest, and I’d heartily agree. If the Google Boys really want to Do No Evil, then they’ll be less evil toward my back and let me laze back and enjoy public domain works via a PDA.

Related: Jenny Levine’s DRM hassles with her video iPod.


15 Responses to “How do Google e-book files display on your PDA—and why can’t I find a free download of Around the World in Eighty Days?”

  1. Try searching for the title “The Tour of the World in Eighty Days.”

  2. Had Google stuck to the ISO standard PDF/A subset (based on PDF 1.4) there would have been no problems with interoperability. They are using the JPEG-2000 image format, which was added in PDF 1.5 (Adobe Acrobat and Reader version 6). There is work underway to standardize PDF/A version 2 (PDF/A-2) and it is expected to support JPEG-2000. Google confuses matters by incorrectly reporting their PDF version (in the Document/Properties menu) as “PDF 1.4”.

    Anyway this all just reinforces that PDF is indeed non-proprietary - and a full ISO standard at that, how much more de jure can you possibly want David? There are always proprietary extensions that go beyond base standards, that’s just a fact of life. Do you really think we need yet more standards for page-oriented final-form documents, such as those produced from scanning books? If so, why?

  3. Ryan and Bill…

    Ryan: Sorry. I still can’t find the real Verne novel, even under the name you gave. I’m seeing book reviews, for example. Got a specific URL for the actual book with a fully viewable and downloadable file?

    Bill: Within PDF, of course, standards are great. But I’m talking about a true mainstream e-book standard. OK, now you can say PDF is the mainstream. ;-)

    Thanks,
    David

  4. ryanramseyer Says:
    August 31st, 2006 at 11:32 am

    I don’t know how well these URL’s transfer, but try this.

    If that doesn’t work, go here:

    http://books.google.com/advanced_book_search

    Plug “Around the world” into the “exact phrase” field, “jules verne” into the author field, and “0” to “1922” in the publication date field.

  5. Big thanks, Ryan. The long URL may have caused the filter to kick in, but at any rate I’ve restored your comment (always email or otherwise alert me if the anti-spam Dobermans are too quick). Plus, I reworded it to reduce the chance of the filter once again causing problems.

    Of course, so far, my basic point remains. When I tried a quick Google style approach, the book didn’t come up. And of course, it isn’t there under the title most people would expect. It might as well not be there.

    A good library NOT!

    Thanks,
    David

  6. ryanramseyer Says:
    August 31st, 2006 at 1:36 pm

    “It might as well not be there.”

    I disagree, it wasn’t that hard to find. It took about 1.5 minutes this morning while the iron was warming up. I’m sure you would get dozens more hits if you only searched for “jules verne” pre-1922.

    You’ll just have to get used to using this particular card catalog.

  7. Munango-Keewati Says:
    August 31st, 2006 at 1:56 pm

    Google’s library project is a massive work in progress, with many relatively minor problems. Which books are included is basically random–whatever they’ve gotten around to scanning thus far. Many scans are flawed. There is a mechanism for users to point out problems, and fixes have been promised.

    I, too, hope Blackmask.com will return soon–but ranting against Google like this is really over the top and shows that you haven’t been following the development of the Book Search project very closely. There’s room for it all–the more digital libraries, the more ways people can find, obtain, and read books, the better.

    Step back, take a few deep breaths, and appreciate all that is being done. Really, many of these posts come across as nearly fanatical!

    M-K

  8. Ryan and M-K..

    Ryan: Heck, why shouldn’t a standard Google-style approach bring up the info? Why the catalog approach as a default? I’ll stick to my belief that Google is way off target. “Jules Verne Around the World” in one swoop should bring up the work. Why wasn’t the book there as “Around the World in Eighty Days”–how the title is popularly known? This is a great example of librarians and/or techies expecting the rest of the world to adjust to them rather than vice versa. Google should accustom itself to the needs of the market, not vice versa.

    M-K: Is it really so “fanatical” to insist on more usability for PDA users? What’s more, I’d feel better if Google weren’t so aggressive at branding. I’m grateful for the better side of the project but concerned about the overaggressive branding. I wonder how sensitive Google will be to these concerns.

    Thanks,
    David

  9. Munango-Keewati Says:
    August 31st, 2006 at 4:54 pm

    You’re expecting Google Book Search to be something that it’s not, at least yet, and may never be. Remember, the Book Search page still clearly says “BETA.”

    As to their “branding”–I’m all for it. After investing millions of dollars, why shouldn’t they put their name on their product? Frankly, I’m surprised that they’re allowing free downloads.

    This project envisions making every public domain book ever published freely available to the world (an impossibility, of course, since some are undoubtedly lost or not in libraries). Such an undertaking was unimaginable even two years ago. And yet people complain!

  10. As long as you are only scanning the book and not ocr’ing it (and the proofing required for any official public release is a very tricky and labour intensive task), the choices are very limited for handhelds because ultimately what you get is a picture album of pages in whatever format that supports it.

    As mentioned, I do not read scans as pdf since it’s veeery slow, but I use the hack of extracting the pages as jpg’s, sizing them to my device screen (480×800 in the Nokia case) and embedding them in a blank html for Fbreader and fast reading. It is some work, but mostly automated and worth it for the books I really want on my handheld and have no other way of getting.

    Or I ocr (but not proof) and live with the results. Neither choice is completely satisfactory, but unless someone is doing the proofing for me that’s how it is.

    In the Google case, I have tried to ocr only one page so far (from Dumas, Deux Dianes vol1) due to lack of time, and maybe it was luck and not indicative, but the results were unexpectedly good.

    Liviu
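
The resizing and embedding steps Liviu describes can be sketched roughly as follows. This is only an illustration in Python, not his actual script: the function names `fit_to_screen` and `build_album_html`, the file names, and the example scan size of 1275×1650 pixels are all assumptions. Only the 480×800 screen size comes from the comment, and the page extraction step is left out.

```python
def fit_to_screen(page_w, page_h, screen_w=480, screen_h=800):
    """Scale a page to fit inside the screen while keeping its aspect ratio."""
    # Use the smaller of the two ratios so neither dimension overflows.
    scale = min(screen_w / page_w, screen_h / page_h)
    return max(1, round(page_w * scale)), max(1, round(page_h * scale))


def build_album_html(image_names, width, height):
    """Wrap already-resized page images in a bare HTML file, one image per paragraph."""
    tags = "\n".join(
        f'<p><img src="{name}" width="{width}" height="{height}"></p>'
        for name in image_names
    )
    return f"<html><body>\n{tags}\n</body></html>"


# Hypothetical scan size; real Google page images may differ.
size = fit_to_screen(1275, 1650)
html = build_album_html(["page001.jpg", "page002.jpg"], *size)
```

Taking the smaller scale factor means a page narrower in proportion than the screen is limited by height, and a wider one by width, so nothing is cropped.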

  11. Garson Poole Says:
    August 31st, 2006 at 11:34 pm

    David Rothman raises the topic of finding popular books such as “Around the World in Eighty Days” by Jules Verne with “Google Book Search” and commenter ryanramseyer offers some excellent search tips. This topic is treated in an article at the “Search Engine Watch” website entitled Download Books For Free From Google Book Search. The author makes several suggestions for improving Google’s valuable service such as a better ranking system so that “much better popular interest books” show up higher in the list of search results. Here is an excerpt about the difficulty of finding some popular books:

    I took this top ten list from Project Gutenberg:

    1. Fifteen Thousand Useful Phrases by Grenville Kleiser (334)
    2. The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle (250)
    3. Kamasutra by Vatsyayana (245)
    4. The Notebooks of Leonardo Da Vinci — Complete by Leonardo da Vinci (244)
    5. Pride and Prejudice by Jane Austen (220)
    6. How to Speak and Write Correctly by Joseph Devlin (197)
    7. The Victorian Age in Literature by G. K. Chesterton (187)
    8. The Art of War by 6th cent. B.C. Sunzi (170)
    9. Ulysses by James Joyce (168)
    10. The Hound of the Baskervilles by Sir Arthur Conan Doyle (165)

    I did quick searches using the Full View option along with the titles and author names. I couldn’t find any of them available at Google for download.

    To check this result I tried to find “Pride and Prejudice” by looking for the following phrase from the book “I had not thought Mr. Darcy so bad as this.” I found three versions of the book with recent publication dates and one version published in 1892. “The Novels of Jane Austen” is the title of the out-of-copyright downloadable version from 1892 and that may explain why it was not found with a simple search.

    A downloadable copy of “The Hound of the Baskervilles” can be found with a similar strategy. The phrase “I knew that seclusion and solitude” located a 1902 edition of the text that was downloadable. I have not tried to find the other books on the list.

    So, Google or another enterprising group may wish to construct a supplementary index of popular book titles and authors together with links into the Google database of out-of-copyright texts. This will simplify the task of finding popular resources.
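
One way to picture the supplementary index suggested above is a small alias table that maps the title people actually type to the title the edition is filed under. The sketch below is an assumption about how such an index might look, not anything Google provides: `normalize`, `resolve`, and `ALIASES` are hypothetical names, and the two entries are the only pairings mentioned in this thread.

```python
import re


def normalize(title):
    """Lowercase a title and drop punctuation so near-identical queries match."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


# Popular title -> title the downloadable edition is catalogued under.
ALIASES = {
    normalize("Around the World in Eighty Days"): "The Tour of the World in Eighty Days",
    normalize("Pride and Prejudice"): "The Novels of Jane Austen",
}


def resolve(query):
    """Return the catalogued title for a popular one, or the query unchanged."""
    return ALIASES.get(normalize(query), query)


verne = resolve("around the world in eighty days!")
unknown = resolve("Bleak House")
```

A title with no entry simply passes through unchanged, so the table can start small and grow as readers report mismatches.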

  12. Anders Thulin Says:
    September 1st, 2006 at 2:44 am

    David Rothman notes that “Google is using an image compression scheme that many PDF viewers can’t understand yet. Nonproprietary e-book standards, anyone?”

    It is probably something else at work.

    The image compression seems to be /JPXDecode, which is just Adobe’s name for a subformat of JPEG2000, which is an ISO standard. This compression was introduced in PDF 1.5, but all Google files I’ve seen so far claim that they follow PDF 1.4. Adobe Reader doesn’t seem to care about such niceties, but I would not be surprised if other readers adapt to what the document claims to be. If it claims PDF 1.4 compatibility, JPXDecode would simply be an unknown decompression method, and for that reason the document would fail to read. (It’s just a guess: I haven’t verified it with xpdf or any of the other non-Adobe PDF readers.)

    Of course, JPEG2000 is not an entirely trivial standard to implement, so implementation errors are another reasonable hypothesis for any problems. Google also seem to use progressive image coding, which is not entirely common for this purpose: it tends to achieve a lower degree of compression than ‘straight’ compression does.

    As to proprietary standards … well, if that was the reason, there would be no PDF reader from that implementer in the first place, so I rather doubt that is the problem: the proprietariness of PDF as a whole would be a greater obstacle to implementation than an image decoding format within that specification.
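
The mismatch described in this comment, a file that declares PDF 1.4 while using a filter introduced in PDF 1.5, can be looked for with a crude byte scan. The Python sketch below is a rough check under stated assumptions, not a PDF parser: `inspect_pdf_bytes` and the sample bytes are hypothetical, a filter name hidden inside a compressed object stream would be missed, and a `/Version` entry in the document catalog that overrides the header is ignored.

```python
import re


def inspect_pdf_bytes(data):
    """Compare the version declared in the PDF header with use of /JPXDecode."""
    # The header comment normally sits at the very start of the file.
    header = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    declared = (int(header.group(1)), int(header.group(2))) if header else None
    uses_jpx = b"/JPXDecode" in data
    # /JPXDecode arrived with PDF 1.5, so an older declared version is inconsistent.
    mismatch = declared is not None and uses_jpx and declared < (1, 5)
    return {
        "declared_version": declared,
        "uses_jpx": uses_jpx,
        "version_mismatch": mismatch,
    }


# Hand-made fragment standing in for a real file; not an actual Google download.
sample = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Type /XObject /Subtype /Image /Filter /JPXDecode >>\nendobj\n"
)
report = inspect_pdf_bytes(sample)
```

For a real download, the bytes could be read with `open(path, "rb").read()` and passed to the same function.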

  13. I was dismayed to discover that I can’t copy/paste blocks of text from Google’s PDFs. I use snippets from old travel books in my database aimed at studying cultural change. Google books is a disaster area for me - and I suppose all other researchers who maintain collections of personal notes. The whole point about passing on knowledge through the internet is that it opens access to new minds, new ways of thinking, new research, and copying/pasting ideas and being able to organise them electronically on your hard disk is an essential part of the process.
    Is there a free or inexpensive program for OCR’ing these pictures from Google? And how come you can do a word search in Google Books on any of these texts? Apparently the OCRing has already been done by Google but they aren’t sharing it.

  14. Kerry Murdoch Says:
    September 1st, 2006 at 11:23 am

    Unless they have some sort of image search text thingo?

    They probably wouldn’t release raw computer done OCR as it would not look very good, and they would get blasted - and they are seriously unlikely to human proof tens or hundreds of thousands of them.

    That would be my guess, anyway.

    Having them available so other people could do it would make sense though!

  15. I put in a comment to the Google Book Search team on this matter and they kindly (and quickly) replied with the following:

    “We do disable copy functionality for the materials we display on Google Book Search. I understand that this feature would be useful to you, so I’ve passed this idea along to the rest of the team for consideration. I appreciate your taking the time to offer us this feedback and encourage you to continue to let us know how we can improve Google Book Search. As this is still a young program, new features are under consideration and your feedback is very helpful.”

    This is phrased in a way that makes it sound as though “enabling copy functionality” would be a technically easy matter, but as Kerry Murdoch said it probably isn’t that simple.
    I’ll pass on to them his idea about making available their raw OCR files for eventual editing by third parties.
