Notes from the Tower: One publisher’s struggles with ePub

ePub is the magic that will rescue us from the crumbling Tower of eBabel and give us e-books that Just Work.
Or not.
Here is the experience of a simple-minded publisher who believed what he was told about ePub. Perhaps there are some morals to be drawn. But if I’m not simple-minded, just simple, please correct me gently!
The book I’m formatting as an e-book is a collection of short stories. Here’s a link to one of them, so you can follow my reasoning if you want.
The formatting of these stories has two major complexities:
- One or two paragraphs have extra space before them, to indicate the start of a new section.
- There is one limerick, which has to be set as verse.
I’ve been checking my ePub files in Adobe Digital Editions (as a proxy for the Sony Reader), on O’Reilly’s Bookworm web site, and in Stanza on my iPod Touch. The actual XML was also viewed and checked in a standard web browser. I’m sure there are many other possibilities, but you have to stop somewhere. I built the files with Dreamweaver, so as to have full control of all the XML and CSS.
Space before paragraphs
Everyone knows enough CSS to know that the ‘margin-top’ property will control the space before a paragraph. Except that in Bookworm it doesn’t, because Bookworm ignores all the CSS in the ePub file (do View > Source if you don’t believe me). And Stanza takes over all the margin settings for its own purposes. Result: a short story that drones on, paragraph after paragraph, with no refreshing pauses to give it rhythm.
For my next attempt, I used the ‘padding-top’ property. This still fails in Bookworm, of course, but Stanza doesn’t seem to mess it up. So, victory!
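A minimal CSS sketch of the padding-based section break described above. It assumes the first paragraph of each new section carries a class; the name `section-start` is hypothetical, and whether a given reader honours either property is exactly what is in question here.

```css
/* Ordinary body paragraphs: first-line indent, no vertical gap. */
p {
  margin-top: 0;
  margin-bottom: 0;
  text-indent: 1.5em;
}

/* Hypothetical class for the first paragraph after a section break,
   used in the XHTML as <p class="section-start">...</p>. */
p.section-start {
  text-indent: 0;
  /* padding-top rather than margin-top: per the post, Stanza takes over
     margins for its own purposes but appears to leave padding alone. */
  padding-top: 1.5em;
}
```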
Verse
There was an old man of Zermatt
Who was really exceedingly fat.
···Because we were thinner
···We had him for dinner.
Now what could be nicer than that?
Now what could be easier than that, in typesetting terms? A handful of paragraph styles controlling spacing and indentation, and there you are.
Except that Bookworm ignores the CSS, and Stanza has its own ideas about margins and won’t listen to ours. So either way we get huge gaps between the lines of the poem. In both cases, a mess.
The cure is to put ‘br’ tags between each line. But even that is not enough, because some readers (such as Adobe) will indent the first line of a paragraph, which ruins the verse spacing. So we have to put a ‘br’ tag in front of the first line as well — which means that there’s an ugly blank space before the whole poem.
So you can’t set verse in ePub without knowing what software is going to be reading it.
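For comparison, here is a sketch of the CSS-only alternative to ‘br’ tags. It assumes one paragraph per line of verse, and the class names `verse` and `verse-indent` are hypothetical. Setting `text-indent` to 0 cancels Adobe’s first-line indent without needing a leading ‘br’. It cannot help in a reader that discards the stylesheet or overrides margins, though, and that is the point made above.

```css
/* One <p class="verse"> per line of the limerick (hypothetical markup). */
p.verse {
  text-indent: 0;        /* cancel the body-text first-line indent */
  margin-top: 0;         /* no gap between lines of the poem */
  margin-bottom: 0;
  margin-left: 1.5em;    /* set the whole poem in from the text block */
}

/* Lines 3 and 4 of a limerick are indented further.
   Used together with the base class: <p class="verse verse-indent">.
   Same specificity as p.verse, so this later rule wins for margin-left. */
p.verse-indent {
  margin-left: 3em;
}
```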
Typesetting quality
People who set type for a living spend much of their time slaughtering widows and orphans. Adobe Digital Editions manages to eliminate them, though not perhaps in the most elegant way. Stanza, on the other hand, is a one-man benevolent institution. The last line of a paragraph is frequently left dangling at the top of a page; the first line of a paragraph is frequently stuck at the bottom of a page; and it has a love of hyphenating the last word on a page, even if that leaves only 4½ words of the paragraph floating absurdly at the top of the next page.
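CSS 2.1 already defines paged-media properties aimed at these problems. Their initial values of 2 should, in principle, prevent a lone line at either end of a page. The sketch below therefore only makes the intent explicit. The behaviour described above suggests that Stanza presumably does not implement them at all.

```css
/* Keep at least two lines of a paragraph at the bottom of a page
   (orphans) and at the top of the next page (widows). */
p {
  orphans: 2;
  widows: 2;
}

/* Don't strand a heading at the foot of a page. */
h1, h2 {
  page-break-after: avoid;
}
```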
Yes, hyphenation… I conduct a running war against malicious publishers who hyphenate “wouldn’t” before the ‘n’, but Stanza goes one better than them. In the sentence ‘And how is the butcher’s meat, Valerie dear?’ Stanza manages to put a hyphen after the question mark and before the closing quote. It looks novel.
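A later CSS property, `hyphens` (CSS Text Level 3), lets a stylesheet restrict hyphenation to explicit soft hyphens. It postdates most of the reading systems discussed here, so treat this as a sketch of intent rather than a cure for Stanza.

```css
/* Hyphenate only where the text contains an explicit soft hyphen (&#173;).
   The prefixed form is for older WebKit-based renderers; a reader that
   supports neither declaration will simply ignore them. */
p {
  -webkit-hyphens: manual;
  hyphens: manual;
}
```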
The moral of the story
The biggest complaint about the content of e-books today is that the quality is so low. This is not just an aesthete’s whinge. Beautiful text is readable text. It is our duty to our readers to do the best we can in e-books, just as we do in print. But despite all the promise shown by ePub, it fails, in practice, to provide the consistency that we need.
You will say that this is a software question; but I say that there is no software involved. No-one really uses Adobe Digital Editions: they use the Sony Reader or another machine. No-one really uses Stanza: they use the iPhone or the iPod Touch. These things are appliances, not computers. Users have no concept that there is any software there at all — do you know what software powers your DVD player? — and so they have no incentive to put pressure on the software suppliers to make things better.
Even if pressure could be brought to bear, it may already be too late. Suppose you are responsible for maintaining Stanza, and you read this post. What will you do? Correct Stanza to be consistent with Adobe? Surely not. Because that will mean that every existing e-book whose author has tweaked it to work with Stanza will instantly look broken. The losers in any change will notice at once, and make a lot of noise; the winners will not even notice they’ve won.
Evolutionary biology has the concept of a speciation event: when two populations find that it is to their benefit for interbreeding to be impossible. Has this already happened within ePub?
Is it already too late?

September 4th, 2009 at 4:34 pm
There are certainly problems with ePub reader implementations. ADE is the best so far, but is woefully lacking in some areas. You don’t seem to have hit any of those yet, though.
Looking at your sample, the first thing I’d suggest is adding a CSS rule
h1+p {text-indent:0}
which will stop the 1.5em text indent from being applied to the first paragraph in the story. Having an indent there is just wrong.
For the gaps, I’d suggest adding a specific gap paragraph, i.e. a paragraph of class “gap” placed between the sections. This is likely to work correctly on any reader,
with
p.gap {line-height:1.5em}
in the CSS
For the limerick, I’d define a limerick paragraph style with 0 text-indent, and then you can apply whatever left margin seems appropriate. I’d indent the third and fourth lines with a couple of non-breaking spaces.
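The gap paragraph suggested above might be sketched as follows. The markup in the comment is an assumption. An empty paragraph contains no text and so generates no line box, which means `line-height` alone would give it zero height. A non-breaking space inside it avoids that.

```css
/* Assumed markup between sections:  <p class="gap">&#160;</p>
   The non-breaking space gives the paragraph one line box,
   whose height line-height then controls. */
p.gap {
  line-height: 1.5em;
  text-indent: 0;
  margin: 0;
}
```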
September 4th, 2009 at 4:44 pm
Oh – just noticed that you have used a class for your verse and you’re already using non-breaking spaces for the third and fourth lines. But the class is not defined in the CSS. How about getting rid of the br and adding this CSS:
p.Verse {text-indent:0; margin-left:1.5em; font-style:italic;margin-top:0.5em;margin-bottom:0.5em}
Of course, it /would/ be nicer to have the verse centred on the page, perhaps. But I don’t think that can be done with XHTML and CSS2. I think it might be possible using SVG.
There are good discussions of ePub formatting, and of the foibles of various implementations, over at MobileRead. http://www.mobileread.com/forums/forumdisplay.php?f=179
September 4th, 2009 at 4:46 pm
>>>Stanza manages to put a hyphen after the question mark and before the closing quote. It looks novel.
I LMAOed at that. Sorry it was at your expense.
Welcome to the world of eBooks. It’s not pretty.
September 4th, 2009 at 4:47 pm
Bookworm supports all CSS. I’m sure we’d be happy to look at this particular case if you email the epub or a link to it on Bookworm to bookworm@threepress.org.
September 4th, 2009 at 6:51 pm
I don’t know about using ADE as a proxy for the Sony Reader. I would use the Sony Reader as a proxy for other dedicated devices that use Adobe Reader Mobile for rendering epub. That covers quite a few devices.
By specifying 0 margin and 0 padding in the @page selector, your text has almost no margin at all on the Sony Reader. Combined with the default line spacing, the story appears (to me) to be a bit too cramped to read comfortably.
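The comment above implies that Adobe’s renderer honours @page margins. Here is a sketch of a less cramped setup; the values are illustrative and not taken from the comment, and other reading systems may ignore @page entirely.

```css
/* Page margins for paged renderers; values are illustrative. */
@page {
  margin: 5pt;
}

/* Slightly more generous leading than a tight default. */
body {
  line-height: 1.3;
}
```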
Also, as Paul noted you’ve used a selector on your verse that isn’t actually defined in your CSS.
September 4th, 2009 at 7:47 pm
btw, elsewhere Kirk pointed out a particular problem with Bookworm and CSS which was introduced relatively recently, and I’ll fix it ASAP. It’s possible it’s the same issue.
September 4th, 2009 at 9:15 pm
I think it’s the responsibility of the reader software to support what the file format supports. Yes, I test my books in some readers, and I’ll do a little tweaking, but I’m not going to kill myself trying to make them look good in every reader that rewrites or ignores what I’ve done. I think the person reading needs to be aware of the limitations of the software they choose to use, and suffer the consequences of badly formatted ebooks if they choose to. And some people don’t even care. So be it.
September 4th, 2009 at 10:59 pm
I convert all my ebooks to text as soon as I get them, and I scan any books I want to keep straight to text too.
Normal space before a paragraph? Press the Enter key twice.
Extra space? Three times.
Close up? Once
Indents? [SPACE][SPACE][SPACE]
Hyphenation? Who needs it? Let the display device work out how to end the line at the nearest word. (Or are you using full justification? — ugh!)
Easy, really; and once your readers have learnt the code they will know exactly what you meant to do.
Typesetting is an attempt to impose smartness on dumb media. Once the media get smarter, typesetting will no longer be required.
September 5th, 2009 at 12:31 am
I too feel that the quality of e-books these days is really low. I think publishers should take effective measures to improve it.
September 5th, 2009 at 1:38 am
At last, somebody realizes it. As you can see, digitizing is not so trivial/easy/cheap.
I liked the ideas, but there’s no solution yet. We’re all waiting for it.
September 5th, 2009 at 8:20 am
“But let there be spaces in your togetherness and let the winds of the heavens dance between you. ”
–Kahlil Gibran
All we can do, for now, is to make sure that our EPUBs are valid.
As you know, the IDPF is currently revising the EPUB standards, which might make it easier for reading devices to render things accurately.
September 5th, 2009 at 8:36 am
@Paul – Your h1+p is a good idea and I should have thought of it myself. I must say that the idea of a ‘gap paragraph’ seems a little less elegant than giving the first paragraph of a section its own attribute, but if necessary I’ll look at that again.
@Paul – I originally did this sort of thing and of course it worked in a real browser, but when I saw that Stanza was ignoring the top and bottom margin settings (and Bookworm was ignoring everything) it seemed safest to avoid being clever with CSS in this case. I’ve now defined the style almost exactly as you suggest. It turns out that although Stanza ignores top and bottom margins, it honours left ones and also handles ‘text-indent’ properly, so the initial ‘br’ is no longer needed. I’ve uploaded a revised version to the same address as before.
@Kirk – there is no @page selector at all in my CSS that I can see. That’s an interesting point about the Sony Reader, though. I suppose I’ll have to go out and get one.
@Liza – thanks: an email is on its way to you. If there’s any change I can make, I’ll make it; if the bug is yours, then I’ll be glad if my example can help you pin it down.
@Jon – No, I’m not using justification: I let the device decide for itself. You could add to your recipe “Italics, write italicized words _like this_”. I know this is correct because the edition of “Journey to the Centre of the Earth” that I recently read on my iPod does it.
September 5th, 2009 at 10:45 am
Wow. This was an eye-opener to me, thanks for posting it!
ePub is supposed to save us from the tower of ebabel, only … it also slams us all back into the ugliness of html and the different ways different browsers (user agents) interpret and display the code. CSS sounds great in theory, but in practice it turned out that different browsers only supported what they wanted to support.
I would ask, then, that the IDPF standardize the rules of rendering that any user agent needs to accept, so that Sony and other companies using ePub can really have industry-standard rendering. Plus the IDPF should publish ‘best practices’ for publishers with specific suggestions as to how to format their texts.
‘Thought breaks’ and rendering verse are pretty basic to publishing, after all!
How many years have the eggheads been working on this? Sheesh.
September 5th, 2009 at 10:57 am
It’s not the IDPF; there are standards. It’s the reading software makers not supporting them, regardless.
September 5th, 2009 at 12:07 pm
As Christine says, the problem isn’t that the standard doesn’t define what CSS should be implemented by ePub reading software, it’s that the programs aren’t implementing them all (yet).
http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html#Section3.3
specifies all the CSS properties that should be implemented.
September 5th, 2009 at 4:06 pm
The sample EPUB file does not conform to the OPF 2.0 specification. There is no entry in the manifest of the OPF file for the CSS stylesheet file. In this case it is not unreasonable for an EPUB reading system to ignore it, like Bookworm did.
September 5th, 2009 at 5:33 pm
@Ordbrand – You’re absolutely right, and when I put it in, Bookworm sees the CSS. I’m left with egg on my face, but at least egg is nutritious!
September 5th, 2009 at 6:19 pm
No worries, I found and fixed some unrelated Bookworm CSS issues thanks to digging in here. They will be posted shortly along with some exciting new features.
September 5th, 2009 at 7:26 pm
@ martin –
epubcheck might have caught that.
http://code.google.com/p/epubcheck/
Using a program to create the epub file would also make sure everything’s in order. I use eCub, there are others. I like eCub because it doesn’t alter my html or css (if you remember to uncheck “generate css” in the options), and it’s easy to use.
http://www.juliansmart.com/ecub
September 6th, 2009 at 12:54 pm
“In this case it is not unreasonable for an EPUB reading system to ignore it, like Bookworm did.”
In fact, a conforming reading system is required to ignore resources that aren’t in the OPF. So any reading system that did display the CSS here was explicitly misbehaving.
September 6th, 2009 at 1:23 pm
@Jon – that’s the beauty of markdown.
September 7th, 2009 at 1:23 pm
@Christine – No, epubcheck didn’t catch it. I’ll certainly consider eCub in the future.
@Liza – I’m not sure how much use a standard is when nothing (apart from Bookworm, admittedly) conforms to it. It’s really irritating for the standards-conformers.
September 7th, 2009 at 7:11 pm
epubcheck doesn’t catch this issue because of a bug: http://code.google.com/p/epubcheck/issues/detail?id=28
September 12th, 2009 at 10:55 am
Could this post be edited to reflect that Bookworm wasn’t ignoring the CSS incorrectly? At least one other blog has reported the misinformation. Thanks!