TeleRead: Bring the E-Books Home

News & views on e-books, libraries, publishing and related topics
August 27th, 2009

Mobipocket Desktop vs. Google ePub: No yellow submarines, please!

By David Rothman

Mobipocket Desktop is a proprietary creature, with a few notable exceptions: for example, the ability to import books from ePub, among other formats.

But Mobi Desktop at least offers a first-rate interface for e-booking on a PC or laptop.

Imagine my hopes, then, when Google started letting us download ePub files directly without first visiting a Sony or Barnes & Noble site.

Alas, however, as you can see from this Mobi screenshot from 20,000 Leagues Under the Sea, the results are suboptimal. For some reason, at least in my case, ePub books are showing up with a horrid yellow background, accompanied by unwanted underlining. Remember Captain Nemo lurking under the seas? What we have, so to speak, is a yellow submarine.

So what’s happening? Can you replicate the problem? And is this a failure of Mobi’s conversion or of the original coding? My guess is Mobi. At any rate I have the same problem on both my Acer notebook and my HP desktop.

One way or another, my unpleasant little discovery shows the need to work toward a full-fledged ePub solution rather than rely so much on conversion schemes and the like. Amazon might well keep this in mind in deciding whether to let the Kindle read ePub natively. I hope ePub nirvana, including accelerated standards development by the International Digital Publishing Forum as well as a logo for non-DRMed ePub, will be with us soon.

About the typos: Yes, they abound in OCRed ePub books on Google. But I’d be surprised if Google did not address the problem in popular books such as Jules Verne’s works.


13 Responses to “Mobipocket Desktop vs. Google ePub: No yellow submarines, please!”

  1. I guess the blame can be pointed in either direction. The ePub content files do have those styles in them:

    .flow .gstxt_hlt {
    background-color:yellow;
    }
    .flow .gstxt_underline {
    text-decoration: underline;
    }

    Mobipocket does not support multiple attribute levels in CSS, so those styles are read differently during the conversion to Mobipocket for display in Mobipocket Reader. And yes, that’s what Reader does: it uses an automated conversion process behind the scenes to make the ePub into a Mobi file.

    I think the bulk of the responsibility falls on Google’s shoulders. The quality of their ePubs is abysmal, not just because of the horrible OCR errors you mentioned, but also because their overall implementation of eBooks leaves a lot to be desired. Speed and quantity are not usually a good replacement for quality.

  2. Gosh, Josh, I was just about tempted to email you about this. Thanks for beating me to the punch with that informative comment! David

  3. Happy to chime in! As you can probably tell, I have strong feelings about the quality of eBooks, and I think Google’s effort so far has not improved the situation. We need better quality content, not just more content.

  4. I’d agree. Flooding the web with poor quality e-books is no way to encourage a nascent market. One more reason to deny the Google settlement (and surely their releasing the works in ePub was a way to get public opinion on their side, before people realized how dismal the ePub books were).

  5. Why not use Sony’s eBook Library or Adobe’s Digital Editions for EPUB? You should get better results.

  6. Felix Torres Says:
    August 27th, 2009 at 10:35 am

    I didn’t use to worry much about ebook quality; most of my reading came off Black Mask, the U of Virginia collection, Baen, and a few commercial lits. All were always high quality.
    And rendering and presentation never used to be an issue.
    So far, I have yet to find a single Google book that doesn’t suffer issues, and when combined with the “settlement” and the Adobe-ization of the second tier reader gadgets, I’m starting to wonder what is going on.
    Do these folks seriously think the customers will be so happy to get *anything* in epub that we’ll settle for whatever substandard stuff they deign to put out?
    Last I heard, there are still alternatives…

  7. Igorsk: Depends on my mood. I dislike Mobi’s proprietary ways, but love the interface. Sony’s is pretty good for me. I HATE Adobe Digital Editions’.

    Thanks,
    David

  8. Project Gutenberg is now distributing both epub and mobi books, and they’re pretty well formatted. Definitely better than Google’s offerings. And there’s always Manybooks.net, offering the classics from Gutenberg in many different file types. Not the greatest formatting, but usually not so bad either.

    http://www.gutenberg.org/etext/164
    http://manybooks.net/titles/vernejuletext942000010.html

    I’m also big on good quality formatting. You’re not doing me any favors if you’re giving me books I can’t read. Free doesn’t matter if I can’t read them. It’s like a little game – “Look at how great I am, I’m giving away a free ebook!” the catch being it’s too difficult to read. I’m not impressed.

  9. Why use Google books, if the one you want is available at Project Gutenberg (in HTML, ePub, Mobipocket or Plucker, I might add)?
    http://www.gutenberg.org/etext/2488
    I’d hate to think that my proofreading was wasted (not of this particular book, though).

  10. Google does have many books that Gutenberg doesn’t. I think it’s good to get these books out there for free even when Google doesn’t have the time to go back and reformat all their books to look beautiful. Having access to a poorly formatted ebook is better than no access at all. This was the decision I came to when I downloaded a book by H.L. Mencken, which Gutenberg did not have. At first, like David, I was converting with Mobicreator. I noticed the same problems with the yellow background and the underlining. The background wasn’t really a problem since I would be reading it on my Kindle but the underlining was really annoying. So I tried converting the book with Calibre. It got rid of both those problems. I recommend it. Then there’s just the OCR errors to deal with. I can deal.

  11. “I think it’s good to get these books out there for free even when Google doesn’t have the time to go back and reformat all their books to look beautiful. Having access to a poorly formatted ebook is better than no access at all.”

    I have to disagree. Do it right or don’t do it. Once a text is corrupt, it is much harder to fix the problem. Even PG has errors. For example, Heart of Darkness has an error in the first line (at least in .html and .txt) that has flowed through to the half dozen or so free versions available elsewhere in other formats. Now if these kinds of mistakes don’t get fixed, imagine the problem if Google floods ebook sites with poor formatting.

    Crap is crap and a lot of Google formatting is crap. I don’t have the time or inclination for dealing with crap.

  12. I must say that I’m a little dismayed by some of the comments I see here.

    Google spends a few hundred million dollars to scan a few million books in an effort to bring to the world books that would otherwise not be available to but a few in ANY format.

    To make the books available in a format other than a pdf full of scanned images Google MUST rely on OCR’ing of the books they scan. OCR has an accuracy rate of anywhere from 0-100%. Given the variety of typefaces and page layouts and physical condition of the books we shouldn’t be surprised that the output text file of many, if not most, of these books is less than optimal.

    Not even Google has the resources to manually correct each of the millions of volumes and hand code a text based file format.

    Project Gutenberg has a small fraction of the books that Google has made available. All of the books in the Project Gutenberg collection relied on the generous donations of time of thousands of volunteers over a period of decades to provide a starting text and to correct text errors and provide decent formatting.

    Crowdsourcing will eventually eat away at the errors of many of the books from Google and Project Gutenberg. Obviously that will take time.

    Many folks are glad to be able to have access to the books available via Google and Project Gutenberg for free – warts and all.

  13. I absolutely agree that Google books are a great resource, regardless of quality of the OCR. I have already found several books dating from the 1700s which are simply not available elsewhere, for love or money.
    Is there any easy way of correcting the specific problems of background, underline and indenting with Google Epub and Mobipocket Creator?
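
One possible answer to the question in comment 13, offered only as a sketch: since comment 1 traces the yellow background and the underlining to two CSS rules inside the ePub, one could strip those rules from the file before handing it to Mobipocket. The Python below assumes the rules live in separate .css files inside the ePub archive and that those files are UTF-8; neither assumption is confirmed anywhere in this thread. The function names are made up for illustration, and nobody here has tested whether removing the rules actually cures the Mobipocket rendering. It does nothing about the indenting problem.

```python
import re
import zipfile

# Matches any CSS rule whose selector mentions one of the two class names
# quoted in comment 1 (gstxt_hlt, gstxt_underline), together with its body.
RULE = re.compile(r"[^{}]*\.gstxt_(?:hlt|underline)\b[^{}]*\{[^{}]*\}", re.S)


def strip_google_highlight_rules(css: str) -> str:
    """Return the stylesheet text with the two offending rules removed."""
    return RULE.sub("", css)


def clean_epub(src_path, dst_path) -> None:
    """Copy an ePub, filtering every .css file through the function above.

    Assumption: the stylesheets are stored as UTF-8 .css files in the archive.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # infolist() preserves the original order, so "mimetype" stays first.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.lower().endswith(".css"):
                text = data.decode("utf-8")
                data = strip_google_highlight_rules(text).encode("utf-8")
            # The ePub container format expects "mimetype" to be uncompressed.
            if info.filename == "mimetype":
                compress = zipfile.ZIP_STORED
            else:
                compress = zipfile.ZIP_DEFLATED
            dst.writestr(info, data, compress_type=compress)


if __name__ == "__main__":
    # Hypothetical file names, purely for illustration.
    clean_epub("google_book.epub", "google_book_clean.epub")
```

The cleaned copy can then be imported into Mobipocket in place of the original download. For anyone who would rather not touch the files by hand, comment 10 reports that converting with Calibre got rid of both problems.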
