TeleRead: Bring the E-Books Home

News & views on e-books, libraries, publishing and related topics
July 24th, 2009

Kindle books ‘riddled with typographical and formatting errors,’ says Bloomberg columnist

By David Rothman

I’m curious. Do Kindle books have more or fewer typographical errors than e-books from other sources? Or is this too hard to say, since many publishers may just be doing conversions from the same files?

Typos and formatting errors of course are an old story in e-bookdom, and the latest to raise the issue is Rich Jaroslovsky, a columnist for Bloomberg News:

Virtually every e-book I downloaded was riddled with typographical and formatting errors, the result of the process that translates the files publishers use to print physical books…

The problems ranged from the benign — strange gaps in the middle of a sentence, for example — to the bizarre. In Larry Tye’s “Satchel: The Life and Times of an American Legend,” a biography of the great baseball player Satchel Paige, the apostrophe is dropped every time he uses the term “blackball’s,” referring to the Negro Leagues; hyphens are inserted in inappropriate places and the odd period is missing. 

According to Jaroslovsky, the problems he writes about are not present in the paper editions. Jaroslovsky says it’s hard at this point to pin down blame. What/who do you think is responsible?

22 Responses to “Kindle books ‘riddled with typographical and formatting errors,’ says Bloomberg columnist”

  1. Some conversion programs have problems with smart quotes, with ellipses, and with em-dashes, to name a few. I try to go through my original files and replace em-dashes with double-hyphens, ellipses with three periods, that kind of thing. Still, with so many formats to convert to, errors sometimes sneak through.

    Rob Preece
    Publisher

  2. The Kindle books I’ve read have all had typographical errors in them, but those error rates about match what I’ve seen in paperback novels. Come to think of it, the hardbacks I’ve read also had typos, but I’ve excused those as being first-edition errors, paperbacks I’ve written off to fast work, and Kindle books as having used the paperback files (I assume).

    Pirate ebooks are loaded with errors, mostly.

  3. For Kindle ebooks it’s the publisher’s problem.

    Either they’re creating and proofing the Kindle ebook themselves, in which case it’s their problem

    Or they’re sending some other format to Amazon for automatic conversion to a Kindle ebook, in which case it’s their problem for using an automated system and not checking the results.

    Ebooks should have no more errors than paper books, but because the market is much smaller at the moment, publishers don’t seem to be proofing them at all, just hoping that nothing goes wrong.

    A bit of a cheek, considering that they mostly charge the same for the ebook as for a paper book.

  4. borax99 (AlainC.) Says:
    July 24th, 2009 at 12:25 pm

    I’m not American, so I’m out in the Kindle cold. However, I have found muy muy errors in some of my Sony titles, whereas I have found very few in my ereader (pdb) titles. Ultimately, this has to do with the publisher though, not the platform. Lack of proofreading. If you don’t believe me, ask anyone who has paid for a James Bond title on the Sony. Some of the spelling errors are downright igreedjus…

  5. Eric Montgomery Says:
    July 24th, 2009 at 12:49 pm

    One would think that in this day and age the original “digital manuscript” (ostensibly a Word document) would be used to create the digital copy.

    The conspiracy theorist in me makes me wonder if they aren’t leaving some of those errors (or even inserting some) in order to annoy people enough into not buying e-books in favor of the ultimately more profitable dead tree versions.

  6. This has to do with the conventional workflow that can be found in most publishing houses.

    An author writes his manuscript in Word. After the manuscript has been revised by an editor, the Word file is sent to typesetters, who import the document into InDesign or Quark. They apply all necessary formatting and export a pdf file.

    After that, the editor (and sometimes the author) checks the exported (typeset) pdf for errors. The corrections are processed by the typesetter, but more often than not, the errors are not corrected in the original document.

    When the publisher decides to create an ebook version of the book, the pdf file can’t be used, so they have to work with the original Word document again. But this document still contains many errors and typos, which were corrected in the print-ready pdf, but not in the manuscript… There you go!

  7. The worst problems I have seen have been in older, often out of print books converted to take advantage of the new format to sell. The most common problem is paragraphs not indenting. Very hard to read!

    I have occasionally seen problems in new releases, usually relating to quote marks and special characters.

    It looks to me like there is no QA step before the book hits the Kindle store.

  8. Eric Montgomery Says:
    July 24th, 2009 at 1:24 pm

    @Wiebe: Thanks! That explains a lot and shoots a crappy conspiracy theory down. :o ) I appreciate the info on how it’s done!

  9. Garson O'Toole Says:
    July 24th, 2009 at 1:29 pm

    There should be a mechanism for readers to easily and quickly report ebook typos and other lacunae to publishers. This information should be used to create new improved documents for rapid download, and readers should be notified. Open-source advocate Eric S. Raymond famously pronounced Linus’s Law, “Given enough eyeballs, all bugs are shallow”. The buzzword neologism crowdsourcing is already a few years old. Authors, editors, and publishers that are able to efficiently gather typos and repair manuscripts would achieve buzzword compliance.

    This blog would also benefit from a link labeled “typo” so that orthographic feedback is easy to transmit. Currently, I send an occasional email when I spot a typo, but I am indolent, and unable to fix all the typos in my own writings.

  10. It might have been helpful if Amazon had consulted publishers before they developed the Kindle. But since they didn’t, the problem of converting texts has been left up to many struggling organizations that lack the necessary technical expertise in-house.

    Formatting text is not easy at the best of times. When your source material is in a number of different formats and the target technology is opaque, good luck. Amazon could fix this problem by releasing the Kindle’s code, but I’m not holding my breath.

  11. Re @Wiebe’s description of formatting errors:

    This is more or less what I thought was going on. I have always called the PDF the enemy of ebooks.

    You can’t read a PDF on the Kindle without conversion, and converted files are full of strange problems and errors. The Sony’s native support is also imperfect from what I’ve heard. The DX can read PDF files a bit better, but a lot of people are saying it is not as good as Kindle formatted ebooks. I don’t know about the Plastic Logic reader, but my guess is that reading a PDF on it will still be troublesome.

    As ebooks are on the rise, maybe it is time for publishers to give up the PDF. Put it to rest with VHS, cassette tapes, and floppy disks. There needs to be one format, not only for all reading devices, but for publishers as well.

    Whoever comes out with the software that will allow publishers to send a file to the printers or to an ebook from the same source document is going to put a stake into the heart of Adobe.

  12. Obviously the problem is half-assed OCR programs and using non-native speakers (e.g., Indians or Filipinos) to “correct” the errors.

    Rob Preece had better learn fast about Unicode. We left US-ASCII behind two decades ago.

    Tagged PDFs, which InDesign can natively create if you tick one box and leave it that way, generally solve the problem of exporting plain text and certain kinds of XML from PDF. So, Wiebe, the solution is better PDFs.

    Indention of paragraphs is as simple as p+p { text-indent: 2em; /* or choose your own value */} in your CSS.

  13. I’ve had incredible problems with Amazon ebooks. Poor formatting. Outrageous page display. Enough to put me off buying from Amazon. Since there are professionals who can make the transition, I am surprised they are not using them. I don’t have the time to complain, so I don’t buy ebooks for my Kindle anymore, and will sell on eBay.

  14. Joseph Gray Says:
    July 25th, 2009 at 4:21 am

    I wouldn’t put it quite as bluntly as Joe Clark did, but I’d have to agree with him that Rob Preece shouldn’t be dumbing down his ebooks to ASCII. I much prefer ebooks with real quotes, em-dashes and such. And what about foreign language characters? There are many English language texts that contain foreign words.

  15. “Poor formatting. Outrageous page display.”

    That’s a problem with the Kindle. I can create any CSS I want and mark up my XHTML any way I want and the rendering will still screw my work six ways from Sunday. However, I’ve learned that if I do the MOBI first (which will honor most of what you ask it to do), then use that file for the Kindle store, it works a lot better than a straight XHTML file.

    “I much prefer ebooks with real quotes, em-dashes and such. And what about foreign language characters? There are many English language texts that contain foreign words.”

    Agreed. I keep a chart printed out and handy. The proper diacritical marks make me think the publisher at least tried, even if there are still typos here and there.

  16. The Kindle needs html entities rather than curly quotes, em-dashes, ellipses, and accented characters. There does seem to be some confusion, nay downright contradictory bits of advice, in Amazon’s own DTP pages, concerning whether the entities need to be numerical or named.

    An allied problem I’ve seen on the web is news items where the website publishes the page in one encoding (iso-8859-1 for example) but the contributing writer (maybe using WinWord on his MS-Windows PC) has one or more characters in win-1252 encoding. These are not consistent within the document, for example a straight apostrophe will show up, then there is the symbol for unreadable character at another place where an apostrophe should go, such as the ‘blackball’s’ example; or else there is only a space rather than the symbol. Most of the items listed as problems fall into this category.

    The other kind (the inappropriate hyphens) might well be optional or ‘soft’ hyphens being translated as true or ‘hard’ hyphens, but it’s hard to judge, since Mr Jaroslovsky is writing as a layman without the necessary experience in OCR and word processing.

  17. I don’t have a Kindle so I can’t compare to books from the Kindle store, but generally speaking, the ebooks I’ve read have a lot of formatting problems. This includes some big name, big publisher ebooks. Some sites definitely provide very very bad, as in unreadable, ebooks. Scribd and Smashwords come to mind.

    Typos are far more likely when the book hasn’t been professionally edited, which has become common since self publishing and small online publishing is relatively easy now.

    I don’t think there is any one program that does a good formatting job (not that I know of for sure). A person has to put forth effort to tweak the formatting to get a good result. If it’s a professional big publisher book, I think they need to be putting forth the effort, they are certainly charging us enough money.

    By the way, I have a small collection of well formatted free ebooks at the bottom of my web page, if you want to have a look. Working on a new addition currently.

  18. Another thing to consider when talking about ebook formatting is what software you are using to read the file. I make prc files because that’s what my Cybook reads. I’ve looked at my prc files in different programs such as Stanza, FB Reader, Calibre, and much of the formatting that I know is in the file, and that shows correctly in Mobipocket reader, is not displayed or displayed correctly by these other readers. Some of these won’t even display a plain html page correctly. Stanza in particular will completely rewrite the formatting regardless of what is written into the file. So be aware of the limitations of the software you choose.

  19. At my newspaper we always ran wire service stories and syndicated columns through a macro that knocked down em-dash and such. The macro had nearly a hundred tests, as I recall, but there were still corrections needed. We were always tinkering with the macro because it CAUSED some problems because of the testing order.

  20. Emma Wayne Porter Says:
    July 29th, 2009 at 9:34 am

    To begin with, two words: Latin 1

    And then of course there’s the fact that most ebook formats (excluding PDF) are xhtml/CSS based, a set of complementary protocols that were never intended for any usage even approaching print convention.

    Publishers jump through the hoops to make it work — to varying results, as end users have seen — but as Paula said, it would have been lovely had developers consulted with publishers somewhere along the line to ensure there’d be no gap between print convention and digital display.

    Add to that Amazon’s unreliable testing mechanism: most computers (on which their testing program runs) support way more character sets than the Kindle, so errors will get through no matter how many uninitiated publishers test and check their output on PCs.

    Word of advice: if you’re supplying for Kindle, use the .prc trick rather than zipping .html. It works a lot better, we’ve found.

  21. Not sure if this is the right place to ask, but thought I would since there seems to be a lot of publishers responding to this thread… What do you think the current job market is for testing/proofreading this growing e-book format? Is there one? I, for one, get frustrated after purchasing an e-book for more than the cost of the paperback version and then cannot make any sense of the book. I guess this would hurt Kindle/Device Reader sales more so than standard book sales (publishers’ main revenue), but I’d love to hear some opinions on the subject. I’m not a publisher or proofreader (I’m an IT Manager), but I think this could be a new job market for those with the right skill set. Or am I wrong and the publishers are of the opinion that if it’s an e-book it will have an extensive amount of typos/formatting errors and the reader just has to “deal with it”? :) I did contact Amazon.com when a book I purchased was riddled with errors (won’t mention the author, but it was a previous nationwide bestseller), and they told me it was the publisher’s responsibility. I then contacted the publisher and received no response. Made me curious for opinions from the publishing side. Thanks. Jason

  22. just started reading ‘foundation’ by isaac asimov last weekend. in print. del rey paperback edition that i got used, for a buck. 5 typos/printing errors in 3 pages, dude.

    we want electronic publishing to be clean & perfect, but print hadn’t gotten *that* figured out in 500 years of trying. best to just make sure that the text is usable and intelligible.

    very important, though: images. if images and charts and graphs weren’t necessary to the books in which they appear, they wouldn’t be there. they take some effort to produce.
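
Editor's illustration: several comments above turn on how typographic characters are handled during conversion. Comment 1 replaces them with ASCII stand-ins, comment 16 recommends HTML entities and suspects a Windows-1252 versus ISO-8859-1 mismatch behind the dropped apostrophe in “blackball’s”. The Python sketch below illustrates those three ideas only. The function names and the sample string are hypothetical, and nothing here is taken from Amazon's or any publisher's actual tooling; whether an encoding mismatch really caused the errors Jaroslovsky saw remains the commenter's guess.

```python
# Hypothetical helpers for illustration; not any vendor's real conversion code.

# Typographic characters mentioned in the thread, mapped to ASCII stand-ins.
ASCII_FALLBACKS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2014": "--",   # em-dash
    "\u2026": "...",  # ellipsis
}


def knock_down(text: str) -> str:
    """Replace typographic characters with ASCII stand-ins (comment 1's approach)."""
    for fancy, plain in ASCII_FALLBACKS.items():
        text = text.replace(fancy, plain)
    return text


def to_numeric_entities(text: str) -> str:
    """Keep every character, writing non-ASCII ones as numeric entities."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def lossy_latin1(text: str) -> str:
    """Simulate a pipeline step that only understands ISO-8859-1.

    Curly quotes, em-dashes and ellipses exist in Windows-1252 but not in
    ISO-8859-1, so this step cannot encode them and silently drops them.
    """
    return text.encode("iso-8859-1", errors="ignore").decode("iso-8859-1")


sample = "blackball\u2019s best \u2014 caf\u00e9\u2026"

print(knock_down(sample))           # ASCII punctuation, accent untouched
print(to_numeric_entities(sample))  # all non-ASCII characters become &#NNNN;
print(lossy_latin1(sample))         # apostrophe, dash and ellipsis vanish
```

The last call turns “blackball’s” into “blackballs”, the same symptom the Bloomberg column describes. The entity version loses nothing, which is one reason to prefer it over the ASCII knock-down when the target device accepts entities.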
