BOOK Offered Or Kept: Digital reading without Epub?
By Aaron S. Miller, CTO of BookGlutton, a Web-based community of readers
Reminder: The TeleBlog offers many viewpoints, and I’m delighted to see Aaron not pulling any punches even if I disagree with him in places. - D.R.
Recent posts and comments have carefully pointed out that what we call .epub is actually three separate specifications which evolved from the OEBPS, or OEB for short. These three specs are OPS, OPF, and OCF . . . Or is that OCS? What does each of those stand for again?
The Web grew because smart people who were smart enough to understand SGML were also smart enough to know it was too complicated. What we need is a simplified subset of OEB … or OP … Or whatever it’s called — okay, epub.
Let’s face it: there is no grass-roots explosion of .epub adoption. The most hopeful implementation has come from Adobe, and it’s nice that we have a validator now, but we also need legions of independent developers building APIs and authoring tools and Reading Systems and open-sourcing all of them. We need writers, authors and artists who want to use .epub for their work because it’s simple and hackable, as easy as HTML or easier. We need a way for folks to put their own books up on their blog for download as .epub, or for them to post links to their writer friends’ latest novel, or to their favorite passage in a public domain text. The OP-OB-OC-OS-epub spec doesn’t give us this possibility.
I have an idea for yet another brick in the tower of e-babel. It’s a new e-book format, and it’s called BOOK, which stands for BOOK Offered Or Kept. (I love recursive acronyms!). BOOK is not exactly revolutionary, but it’s easy, and there are already dozens of great Reading Systems which support it. No plugins needed, no desktop software, no hardware devices, no subscriptions. Just Web access and a compliant BOOK Reading System.
The structure of a BOOK would look like this:
…BOOK/
……index.html
……images/
………cover.png
……css/
………base.css
………skins/
…………modern.css
…………classic.css
…………nouveau.css
……scripts/
………prototype.js
………base.js
………extensions.js
Many people will recognize this. To read a BOOK, one only has to click, either from their Desktop, from their Bookmarks menu in their Reading System (Safari, Opera, IE, Firefox, others), or from a link placed on a website (Teleread, Delicious, Google, etcetera). The URI to reference a book is just like any other hyperlink. It supports linking to fragments, and if used in the correct browser, it could even link to specific passages. It looks like this:
http://www.bookserver.dom/BOOK/index.html#chapter1
You can put it on your blog and people can read it in the same window!
The content of index.html may be HTML 4, HTML 5, XHTML or anything recognized by the many Reading Systems already out there. HTML 4 is of course the official recommendation of the spec. The rest of the package is composed of PNG images, CSS2 and Javascript. The CSS2 is extensible, but everything an author needs is already there, with a few default skins to boot. Also, authors are free to use ANY or ALL of the CSS2 properties, with a few exceptions which are dependent on the Reading System itself. Fortunately, there is a wealth of information about how to make BOOKs work across different Reading Systems. Guidelines can be found all over the Internet.
As for the Javascript, it’s based on the ECMAScript standard, which has evolved into a strongly-typed, object-oriented programming language and is one of the few web “standards” which really is a standard. BOOK authors will welcome the addition of a scripting language, as it is NOT currently supported in the IDPF specifications. In fact, it’s forbidden for .epub reading systems to execute scripts. It’s also forbidden for them to display a file called index.html without first loading and parsing several other files.
For those not familiar with the state of Javascript today, it controls much of the page layout and animation you see on the Web these days. For BOOKs, it offers true pagination, typesetting, skinnable and collapsible layouts, footers and headers, footnotes as popups, inline text, true footer notes, or endnotes . . . the list goes on. YUI, jQuery, Dojo, MooTools, and Prototype are just a few of the frameworks available, and they’ve been addressing these issues for some time now.
The really great thing is that BOOK developers and authors don’t have to think about the scripting. All they really need to know is basic HTML. With that, they can specify skins, footers, rollovers, highlights, annotations, link to or include external content, embed video and Flash, you name it. Add to this that many Reading Systems, especially my favorites, Safari and Firefox, have support for SVG, MathML and the wonderful, revolutionary <canvas> tag.
One more thing, because I can hear people screaming about wanting to read BOOKs offline. What then? How to download a BOOK? The answer again, is Javascript. Gone are the days when it was a toy. We now have several frameworks supporting offline storage. Just download the books you want right in the Reading System. This means you don’t have to worry about saving them to your disk. They don’t clutter your desktop. You don’t need to be online the next time you open it: the books will just be there.
So, as much as I abhor the tower, as many on this blog do, perhaps just one more stone will make it fall?
Moderator: I totally agree with Aaron on the need for independent developers to do .epub tools and content creators to take more interest. The IDPF should take a proactive role. That said, Penguin’s new announcement should help. As for scripts inside books, as is suggested for the BOOK format, well, I’m not a cheerleader, given the security risks. - D.R.
April 11th, 2008 at 4:38 pm
Hey David,
The IDPF should take a look at the browser security model. They’re already implicitly accepting sandboxes by allowing embeddable players (such as Flash). Unclear whether a Reading System is supposed to disable ActionScript, but it’s also based on the ECMA standard. In fact, we could conceive of a BOOK embedded in an .epub, with Flash controlling presentation and pagination. Not what it was intended for, but it would solve some problems. Re: The Penguin announcement, I can’t wait to see one of those .epubs. Everyone seems to have a different take on how to actually implement and author it.
A
April 11th, 2008 at 8:15 pm
First you say that we need a subset of epub because it is too complicated for the average user. Then you describe a hodgepodge of various HTML versions, javascript and “extensible” CSS as an alternative. If you really think that your alternative is easier to deal with, I’ve got a bridge to sell you.
To begin with, epub is based on XHTML 1.1 and CSS 2.1 and the few differences are listed in the spec. The supporting files that are not content are valid XML and are also detailed in the spec.
As for your alternative being an online solution, no thanks. I like to take my ebooks with me and read where there may not be an internet connection.
It is true that the current version of the OPS spec does not include javascript. Depending on your viewpoint, that is either good or bad. There are certainly cases where javascript support would be very useful in an ebook reader. In fact, some type of scripting will probably be added to a future version of OPS.
As for picking on epub for not having scripting, name me one reflowable ebook standard that does include scripting. I can’t think of any. PDF doesn’t count, since it isn’t worth a damn on small screens and isn’t usually reflowable.
Although epub is not perfect (being version 1.0), I think that it is a good beginning and very usable. It is already more capable than older ebook formats like MS Reader, Mobipocket, etc.
One other thing. The valid XHTML and CSS that you create for an epub can be used as-is on a web page. If you want to add javascript or one of the few CSS features that epub doesn’t currently support, that can easily be done. The bulk of the work is already done once you’ve created the epub. Or you can go the other way. Create the web page with XHTML/CSS and use that for the epub. That’s just one reason why using these standards is a good idea, rather than the Wild West days of any-old-flavor of HTML.
What you propose will do nothing to simplify the current ebook situation. It will only add to the e-babel confusion. At least epub stands a chance. It has been adopted by some big publishing houses for internal workflow (and maybe we’ll see it at the consumer level). Two of the most popular ebook reader devices (Sony 505 and Cybook) are due to have native epub reading capability soon.
April 11th, 2008 at 8:21 pm
I just read the story about Penguin Books. It looks like the consumer will have epub ebooks sooner than I thought. Very good news. I am glad to see that some of these publishers see the benefits of epub as both a workflow and a distribution medium.
April 11th, 2008 at 9:21 pm
As a long-time contributor to the OEBPS specification (version 1.0 released in 1999), and the recently released OPS/OPF/OCF specs (which are successors to OEBPS and which underlie EPub), let me address, from my unofficial perspective, points made by both Aaron and Joseph.
First, I’m glad and excited to see a significant number of developers taking an interest in EPub. This is a good sign that we will see, in the future, new EPub authoring, presentation and conversion applications. EPub is an excellent, high-fidelity format for both direct rendering and for user-side conversion to other formats for particular platforms such as very limited resource handheld devices.
Aaron noted a lack of script support as a demerit on OPS. I do understand his reasoning. Let me explain that the reasons we currently have an effective ban on script support in OEBPS/OPS has to do with
We prefer that reading system developers innovate within the XHTML+CSS framework, and if they want some fancy “dancing bears” stuff, to approach the OEB Working Group and request the new feature, then the Working Group will consider how to implement it which benefits everyone in a balanced manner.
Certainly, as EPub support grows, the OEB Working Group will reconsider its position on EPub script support, but it will (and should) take a hard-nosed position as to what is gained that cannot be gained by reading system-side innovation plus better CSS support (e.g. CSS3), and of course the impact of scripting on EPub authoring and reading system developers. Yes, it sounds a bit harsh, but it is better to be conservative with the spec in the beginning, and then expand it carefully over time as needs require, rather than just opening it up from the start and then regretting that we had done so later.
I do hope Aaron will contribute requirements to the OEB Working Group so it has something to chew on when it reconvenes to update the current specs. This will include specific things that he’d like to see that scripting would give – that is, not just ask for scripting support, but list what sort of things he’d like enabled (which could be done by scripting.) This gives us a better idea of how to intelligently expand the spec, and whether or not we should bite the bullet and add scripting support.
April 11th, 2008 at 9:33 pm
Jon, from the little I have looked at CSS3, it does seem that it will take care of quite a few of the current CSS2 and javascript shortcomings. As for your comment about “reading system-side innovation”, I wish that the current epub spec had been a bit more “hard nosed” and not left so much as optional on the reading system. One example in particular is the header/footer support. There are a few others.
April 12th, 2008 at 9:56 am
Joseph, in general I agree with you regarding reading system requirements.
It is important to not go too far in either direction. One can over-specify what a reading system must and must not do, but then this stifles innovation. On the other hand, without any rules, it becomes a free-for-all and EPub authors cannot rely on some sort of presentation uniformity across competing reading systems when they design their EPub Publications.
A balance is thus necessary. We can and should discuss whether the proper balance has been achieved.
And yes, CSS3 will provide several of the “bells and whistles” that scripters would like to enable, at least those that make any sense. For now the “we-want-scripting-in-EPub” supporters need to make a stronger case, detailing exactly what sorts of things scripting will enable that cannot be enabled with CSS3. Such a list will enable the OEB Working Group, when it reconvenes, to decide what to do.
April 12th, 2008 at 1:04 pm
I’m actually not advocating scripting in .epubs, I’m advocating a better alignment with what I see as the current most viable “Reading Systems” at hand: web browsers.
I wholly agree with Jon that the spec should be simple(r), and that overspecifying things is a risk. And I have no qualms about .epub’s adequacy for production or industrial use. It’s fine as machine readable formats go, better when we get some open source libraries to deal with it. The problem is having something that every developer, no matter how skilled or unskilled, can start working with and building for, and something that just works in the kinds of Reading Systems most of the digital community already uses (think Firefox and IE7).
@Joseph: As for reading on-line, it’s time to stop equating browsers with being “on-line.” Offline storage is a reality now. Hundreds of megabytes of data can now be stored by browsers for offline access, with or without plugins. See Google’s new offline office apps, for example. See also dojo storage (http://ajaxian.com/archives/dojo-storage-updated-for-10).
@Jon: Javascript is useful mainly for rendering, not bells and whistles. Without Javascript, the non-normalized implementations of CSS out there become useless–you can’t rely on them to produce a consistent rendering of a document. Unfortunately, with CSS3 the rendering game is only going to get more complicated. I don’t advocate executing scripts from epubs, I advocate executing scripts in epub reading systems. Two very different things, as you’re aware.
@Joseph: I’ve gone to bed with the spec. And CSS and XHTML are not standards, they’re W3C recommendations. As for .epub being compatible with browsers, strictly speaking, it’s not. Browsers can’t do anything with .epub natively because they can’t link into the container format and they can’t natively parse the opf file. In an uncontained structure, they can link directly but there is no recommendation on best practices for this. Since they could technically link to any content– out-of-spine, hidden, protected, whatever–this runs the risk of violating the document author’s intentions, and possibly copyright. There’s no requirement in the spec for an index file, which has for almost twenty years been the accepted norm for linking into a web structure.
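To make the extra indirection concrete, here is a rough Python sketch of the lookups a Reading System has to perform before it can show the first page of an .epub: open the ZIP container, read META-INF/container.xml to find the OPF package file, then follow the OPF spine to the first content document. The function name `first_spine_document` is made up for illustration, and the sketch assumes an unencrypted, well-formed OCF/OPF 2.0 package with no error handling.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by OCF container.xml and OPF 2.0 package files.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def first_spine_document(epub_file):
    """Return the archive path of the first document in the spine.

    Hypothetical helper: assumes a well-formed, unencrypted .epub.
    """
    with zipfile.ZipFile(epub_file) as archive:
        # Step 1: container.xml names the OPF package file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")

        # Step 2: the OPF manifest maps item ids to file names.
        package = ET.fromstring(archive.read(opf_path))
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }

        # Step 3: the spine lists item ids in reading order.
        first = package.find("opf:spine/opf:itemref", OPF_NS)
        href = manifest[first.get("idref")]

        # Manifest hrefs are relative to the OPF file's own directory.
        return posixpath.join(posixpath.dirname(opf_path), href)
```

A plain web structure skips all three steps, because the entry point is index.html by convention.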
Anyway, my point here was not to say “Away with epub!” — If I thought that, I wouldn’t have spent the last year building a reading system on top of it!
However, going forward, it seems the IDPF must change its approach. Championing and adopting W3C recs without deferring to the Reading Systems which have for 16 years now been built around those recs seems like a mistake. My guess, and maybe Jon could clarify this, is that the spec has been designed with hardware systems in mind (not explicitly, of course, but in spirit), platforms where the development of web technology has been absent or slow. IMO this is the lowest common denominator: these don’t have nearly the reach that web browsers do.
The IDPF should not continue to overlook the fact that the recs they’re building on were designed with web browsers in mind.
April 12th, 2008 at 2:59 pm
Thanks for clarifying, Aaron.
Actually, I agree with most of what Aaron says. (And yes I do know the difference between “specification”, “recommendation”, and “standard”, but in my previous reply I did not want to delve into the intricacies since I was communicating to a wider community who may not understand these nuances.)
My big problem with the current “EPub” is that it is essentially linear in focus – it forces textual content to be mostly linear even when it isn’t. And most textual content is NOT linear in structure/organization. Imagine trying to fit a typical web site into the EPub paradigm. Or how about a snapshot of the Wikipedia?
Because of this strong emphasis on linearizing content, new EPub Reading Systems will focus on linearity – it may lead to EPub not being able to evolve to properly handle non-linear publications (such as representing web sites) because of the already installed base of linear Reading Systems and their vested interests. It’s a real concern of mine.
For example, the OpenReader spec (currently on ice) specified two modes for any publication: “OEB” (content is essentially linear per the OEBPS paradigm), and “WEB” (content should be displayed in web fashion, like a web browser even if discretely paginated on presentation.) This would have forced OpenReader user agents to support both paradigms from the get-go. This should have been done with EPub, but the major players in the working group were advocates of mostly linearity (Adobe and ETI) and I was out-voted. I believe this is a major blunder in the EPub standard.
It is interesting Aaron brought up web browsers. I’ve contemplated writing a blog article describing a strategy to allow EPub Publications to be viewed in web browsers. In fact, it is quite trivial, and I’m amazed no one has (apparently) considered this yet.
First step is to develop an EPub–>WebSite converter, which simply unpacks the EPub (it is, after all, simply a ZIP file), does some very minor link additions to the content documents, converts the NCX to one or a set of table of contents pages, etc., and then saves the results into a folder. A web browser can then view the contents of the folder just like any web site.
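A minimal Python sketch of that first step, under some loud assumptions: the names `ncx_to_html` and `epub_to_site` are invented here, the NCX file is located by its file extension rather than through the OPF manifest (a shortcut), and the "minor link additions" to the content documents are left out entirely. It unpacks the ZIP into a folder and writes a toc.html next to the NCX so that the relative links keep working.

```python
import xml.etree.ElementTree as ET
import zipfile
from html import escape
from pathlib import Path

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def ncx_to_html(ncx_xml):
    """Render the navMap of an NCX document as a nested HTML list."""
    root = ET.fromstring(ncx_xml)

    def render(points):
        if not points:
            return ""
        items = []
        for point in points:
            label = point.findtext(
                "ncx:navLabel/ncx:text", default="", namespaces=NCX_NS
            )
            content = point.find("ncx:content", NCX_NS)
            src = content.get("src", "") if content is not None else ""
            # Nested navPoints become nested lists.
            children = render(point.findall("ncx:navPoint", NCX_NS))
            items.append(
                f'<li><a href="{escape(src, quote=True)}">'
                f"{escape(label.strip())}</a>{children}</li>"
            )
        return "<ul>" + "".join(items) + "</ul>"

    nav_map = root.find("ncx:navMap", NCX_NS)
    top = nav_map.findall("ncx:navPoint", NCX_NS) if nav_map is not None else []
    return "<html><body><h1>Contents</h1>" + render(top) + "</body></html>"


def epub_to_site(epub_file, out_dir):
    """Unpack an .epub into out_dir and add a toc.html built from its NCX.

    Returns the path of toc.html, or None if no NCX file was found.
    """
    out = Path(out_dir)
    with zipfile.ZipFile(epub_file) as archive:
        archive.extractall(out)
        # Shortcut (assumption): take the first *.ncx in the archive
        # instead of resolving it through container.xml and the OPF.
        ncx_names = [n for n in archive.namelist() if n.lower().endswith(".ncx")]
        if not ncx_names:
            return None
        toc_html = ncx_to_html(archive.read(ncx_names[0]))
        # NCX links are relative to the NCX file, so write toc.html beside it.
        toc_path = (out / ncx_names[0]).parent / "toc.html"
        toc_path.write_text(toc_html, encoding="utf-8")
        return toc_path
```

Opening the resulting toc.html in any browser then gives a clickable table of contents over the unpacked chapters.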
The next step is to create a browser plug-in which does the same unpacking but adds some features to make it more seamless for the end-user. There are more advanced things that could be done after this.
The downside to the above is that content documents will be viewed in usual web fashion, typically requiring scrolling to read a chapter or book. (Many people don’t mind this – thousands of books are now online as HTML content and are very much readable.) But still, it’s another way to view EPub Publications which is a Good Thing™. We can imagine plug-ins for IE, Firefox, Opera and Safari, to name the major ones. And, more importantly, imagine now that small handhelds with web browsers will now be able to view the converted Publication since it is simply a “web site”.
In addition, and this is important, web browsers are capable of nicely handling the one non-linear feature included in EPub: auxiliary content. Such auxiliary content can be viewed in separate windows (preferably popup windows), which is the intent of OPS. Here’s where web browsers will be in far advance of Adobe’s DE, for example.
Again, thanks for clarifying several things, Aaron.
April 12th, 2008 at 5:45 pm
Aaron, I didn’t say that an epub could be used directly via the web (that would also be useful). What I said was that the XHTML content with the CSS styling could be used. This saves you the trouble of creating a different set of web pages for “live” use. In some cases and with some browsers, you may have to make minor tweaks, but the bulk of the markup that you created for epub use works fine on the web.
Even if you are going from the web to an epub, if you have been using XHTML and CSS on your web pages (which has been considered good practice for some time), most of that work can go into an epub.
The point I was trying to make was that the standards that epub is based on are the same ones that we all should be using on the web. The old days of HTML 4 and sloppy markup should be gone (but they sure take a long time to die). This is one of the things in your original post that I disagree with–your suggestion to use “HTML 4, HTML 5, XHTML or anything”, which perpetuates the “tag soup” mess that we have now.
April 12th, 2008 at 5:59 pm
Hey Jon -
Just a note on your comment about making epubs readable in web browsers. That’s actually the core of what we developed at BookGlutton. Our import and upload tools accept doc, html, rtf, and txt and produce epub files without the need for plugins. At this time we haven’t turned on the ability to download the epubs, but if our community is interested in that feature we would definitely take it into consideration.
April 12th, 2008 at 6:09 pm
I do agree that all HTML, whatever flavor, should at least be well-formed XML. And these days it’s relatively easy to convert malformed HTML into XHTML using Tidy.
April 13th, 2008 at 1:23 am
Aaron makes very good points concerning the epub format.
Agreeing mostly with him, I would like to express my opinion in more general terms: epub will most probably not become a standard for digital publishing because instead of trying to propose a format which is simple, powerful and flexible, IDPF proposes something which is none of these. Moreover, epub will most probably never evolve into a simple, powerful and flexible format because IDPF takes the patronizing and arrogant stance of “we know best what is really needed for digital publishing and what is not”.
Scripting is a very good example of both the failure to provide a flexible format for digital publishing and the inability to understand the needs of possible adopters of that format. Saying that scripting is needed only for bells and whistles (fancy “dancing bears” stuff) or that instead of scripting some functionality of reading systems must be used, is completely inadequate. Such reasoning would have been adequate in the early nineties but not now, when we have had javascript in browsers for a decade. Scripting is *essential* for many digital publishing projects and not understanding this is a major failure of IDPF. Saying that “we will reconsider scripting when adoption of epub grows” is also inadequate, because nobody will wait patiently; they will choose some other platform for their publishing needs, Adobe AIR for example.
It seems to me that all those who think that IDPF is going the wrong way should consider forming a new body and formulating a specification for a format alternative to epub. Aaron has very good ideas about what such a format should look like. So, Aaron, push your ideas for a new format until a new spec and a new body are formed around them. You will certainly have a following since you are mostly correct and what you propose is simple, powerful, flexible and relatively easy to implement.
April 13th, 2008 at 9:15 am
Laisvunas, I would think it worthwhile to list those features of EPub which you believe are not needed and/or are too “complex”, and/or were implemented incorrectly.
I’d also like to see specific examples of what javascript can provide for the digital publication reading experience.
I’ve also started the EPub Community list, and certainly those who believe EPub is a deficient format are welcome to discuss their reasoning.
Another point is that in the design of any digital publication format, it is important to understand the needs of the various stakeholders who will use the spec (publishers, readers, developers, accessibility community, etc.), as well as the various types of digital publications that we want the format to represent – essentially a requirements exercise. IDPF has repeatedly done this exercise since 1999 (and I’ve participated in nearly all of the discussions over this time frame). Included in the discussion were a number of software developers like yourself.
My biggest concern with EPub is that it is too linear, and thus may not be optimum for types of digital publications which are more non-linear (such as web sites!) If we wish EPub to be a universal digital publication format, the format needs to be more flexible. Until the format can represent a web site structure, for example, we have not arrived.
Laisvunas, I suggest you or someone else start a new public discussion group to hash out ideas for a “next-gen” universal digital publication format. I think this would actually be great and I would participate. I think this would be the best way to start a new standards effort.
April 13th, 2008 at 9:32 am
Btw, in looking again at Aaron’s BOOK proposal, as best I see his idea is to place all content into a single document, index.html.
One reason OPS allows multiple documents is that it makes it much easier for reading systems to only parse what is needed for the moment, very useful for limited resource hardware. Even Adobe DE for desktops works better when the content is split up (such as into chapters.) Now with a single document approach one could ask authors to split up their content in a document and “wrap” each piece within a specially-classed <div>, but this is not practical for reasons I need not get into.
In addition, in the future, we’d want the spec to be able to represent web site structures, and of course that is multiple files.
The lack of something equivalent to the OPF Package is also troubling. Back in 1999, the first OEBPS proposed was essentially what Aaron proposed (but did not include scripting.) It became clear as the requirements came in from publishers and other stakeholders that there needed to be a “Package” (metadata, identifiers, content organization, etc.)
In addition, for both accessibility and use by the sighted, the recent addition of NCX is revolutionary — we now have a means to have a machine readable table of contents (and similar lists) in every publication, far superior to having the publisher customize something in content itself (thus making it near impossible for accessibility purposes).
When we add all these things together (and many I’ve not even discussed), the idea of a Package is a great one, and I assume this is part of the “unneeded” complexity I hear voiced in this comments area.
Anyway, I could go on, but there are reasons for why EPub is the way it is. Those who believe there is a better way (and I actually do: OpenReader) should strive to understand the requirements that were used, to reevaluate those requirements given what we know today, add new requirements, then see what shakes out for the “next-gen” design. Maybe javascript will be a major part of that new design, maybe it won’t. (As I noted, from the beginning the OEB Working Group was concerned with proprietization of the spec using javascript — we had long discussions on this, a lot of “what if” scenarios involving 800# gorillas.)
April 13th, 2008 at 12:14 pm
Hi Jon,
For an example of what javascript can be used for in a digital publication, you can take a look at one publication experiment here.
It is a page which has several thousand lines of javascript, much more than there is text in it. Javascript is used for building several custom widgets: an application-style menubar, a widget allowing a reader to change proportional and monospace fonts for specific regions of the page, a widget allowing the reader to select regions of the page for printing, widgets for displaying and hiding interline and block references, etc. None of this can be provided by reading systems or achieved without javascript. Of course, it is neither a very characteristic nor a very professional example, but my point is this: epub, by forbidding javascript, does not allow this kind of experimenting.
You say that javascript was forbidden because of fears of proprietizing the epub format. I do not understand how scripting can lead to anything that might be called “proprietizing”. But whatever this might be, axing the problem by forbidding javascript is wrong.
My criticism of epub is not about details but about its fundamentals. It seems to me that while preparing the spec the most fundamental question was left out of view: what is the right model for a digital publication? Is it a physical book? Or is it something else? If something else, then what? From my point of view, not a physical book but a website should be thought of as the right model. Why a website? Because of the well-supported and ubiquitous mix of technologies (html, css, javascript) and because of the workflow (publishing early versions of the publication on the website to gather feedback and then publishing it as a downloadable file). If the model for a digital publication is a website, then any format which does not allow everything we have on websites, and does not allow one to take all of a website’s html, css and client-side scripts and publish them as a downloadable file without much change, is doomed to failure in the long run. It seems that epub is now on this way to failure.
There is already at least one software platform which allows one to take a website with all its html, css and client-side scripts and publish all of them for the desktop without much change while retaining all functionality. It is Adobe AIR. It is quite possible that in a short time there will be a handful of such platforms, both free and proprietary, and from my point of view each of them will be a better platform for digital publishing than epub with its reader-agents.
The great buzz around Adobe AIR (although not because of its publishing capabilities) and the lack of it around epub (see its dead forums) is quite characteristic.
April 13th, 2008 at 12:35 pm
Laisvunas, I feel your pain, but I don’t think forming another standards body and a new format is the answer. What I’d like to see is a sort of epub spinoff, another specification from the IDPF, if you will, with slightly different requirements. Instead of BOOK, we could call it epub-lite. The basis for this simplified, consumer-oriented version of .epub would be the same browser-centric building blocks under the IDPF specs. The difference would be in the file structure and in the way a browser deals with it.
Just to be clear, my intention is not to tear down the existing specs or the IDPF. I know developing requirements and specs can take an exceedingly long time, and I don’t think what I propose has to undercut or devalue the work IDPF has already done. I want the tower to topple, but I also understand its necessity as a source of building material.
A nice requirement for the IDPF would have been: “packages must provide a fallback STRUCTURE, in the event that a Reading System cannot natively handle the container.xml and/or content.opf files.” The best fallback, it seems, based on the reach of the web browser market, would be to divert the rendering chain through index.html, since it’s an entry point that all browsers (and in the current spec, all Reading Systems) can understand. Document authors could opt to omit the container and opf files, and the same fallback structure would be used by all Reading Systems. As far as indexing such a structure in a machine-readable way, the spec could name acceptable heading elements from the XHTML rec, such as H1, H2, H3, etc, and build a nested contents structure from those. It could even extend upon XHTML, allowing authors to omit certain headings from the structure (with, say a ‘visibility’ or ‘display’ attribute or CSS property). The beauty of this would be that it still falls under the umbrella of epub, but it allows document authors to stay in familiar territory, and it better follows the principle of XHTML, which is to be extensible, not absolute.
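As a rough illustration of the heading-based index, the Python sketch below walks an HTML document and nests its H1 to H6 elements into a contents tree. Everything specific in it is an assumption: the class and function names are invented, and the opt-out is modeled as a literal display="none" attribute on the heading, which is only one possible reading of the 'visibility' or 'display' idea above.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class OutlineParser(HTMLParser):
    """Collect headings into a nested list of {title, id, children} dicts."""

    def __init__(self):
        super().__init__()
        self.outline = []      # top-level entries
        self._stack = []       # (level, entry) pairs for open ancestors
        self._current = None   # heading whose text is being collected

    def handle_starttag(self, tag, attrs):
        level = HEADINGS.get(tag)
        if level is None:
            return
        attributes = dict(attrs)
        # Hypothetical opt-out attribute; not part of any existing spec.
        if attributes.get("display") == "none":
            return
        entry = {"title": "", "id": attributes.get("id"), "children": []}
        # Close every open heading at the same or a deeper level.
        while self._stack and self._stack[-1][0] >= level:
            self._stack.pop()
        siblings = self._stack[-1][1]["children"] if self._stack else self.outline
        siblings.append(entry)
        self._stack.append((level, entry))
        self._current = entry

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self._current = None


def build_outline(html_text):
    """Return the nested heading outline of an HTML string."""
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline
```

Because each entry keeps the heading's id, a Reading System could turn the tree straight into index.html#fragment links.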
As far as scripting goes, the spec could only allow execution in browser fallback mode. However, even then, allowing it only in the Reading System and not the BOOK would still be fine. @Jon, Re: Scripting. Assuming you don’t want to wrap things in a plugin or a custom build, a browser-based epub Reading System MUST have scripting. There is no way around it if you ever want to get closer to true typographic precision and layout control. You COULD use XSL or CSS only, if they had wide enough and normal enough implementations across user agents and platforms. Even Digital Editions, an 800# gorilla project if ever I saw one, uses a programming language (XSL) to help with layout. And unlike ECMAScript, XSL-FO is not only non-standard but also highly controversial among layout and interface experts, not to mention unheard-of to most browser engines.
If you’re building a browser-based RS, there’s also a server side requirement (though I really wish there didn’t have to be), to handle the container format. While I agree a container format is a requirement and a blessing, using ZIP is very problematic for browsers.
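To make the server-side requirement concrete, here is a minimal Python sketch of a class that opens the ZIP container on the server and works out where rendering should start. The `Package` class and its methods are hypothetical names. The lookup of `META-INF/container.xml` follows OCF, while the fall-through to `index.html` is the fallback proposed in this thread, not anything the current spec defines.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_PATH = "META-INF/container.xml"
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
FALLBACK_ENTRY = "index.html"  # assumption: the fallback proposed in this thread


class Package:
    """Server-side wrapper that hides the ZIP container from the browser."""

    def __init__(self, data):
        # data: the raw bytes of the uploaded .epub (or BOOK) archive
        self._zip = zipfile.ZipFile(io.BytesIO(data))

    def names(self):
        return self._zip.namelist()

    def read(self, name):
        return self._zip.read(name)

    def entry_point(self):
        """Return the OPF path from container.xml, else index.html, else None."""
        names = set(self.names())
        if CONTAINER_PATH in names:
            root = ET.fromstring(self.read(CONTAINER_PATH))
            rootfile = root.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
            if rootfile is not None and rootfile.get("full-path") in names:
                return rootfile.get("full-path")
        if FALLBACK_ENTRY in names:
            return FALLBACK_ENTRY
        return None
```

A web application could then serve individual files with `Package(data).read(name)`, so the browser only ever requests ordinary XHTML, CSS and image URLs.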
I don’t make this proposition out of mere whimsy. This sketchy proposal actually comes from a lot of hard tinkering and coding. I have been building a browser-centric reading system for a year now, and I’ve followed the specs as much as possible, but I’ve come to the realization that this is a bit of a square-peg situation, and it could get worse.
Anyway, if there’s enough interest, I could formalize this more, but my intention is not to work against the IDPF.
April 13th, 2008 at 12:47 pm
Laisvunas, just read your comment preceding mine, and you have great points. I’ve been working in the browser since ’95, and knowing where it’s been, it’s easy to see where it’s headed. You’re totally right: things like AIR and Apollo and Dashboard and single-site browsers are going to become formidable opponents to locked-down hardware and bulky desktop apps. In all of them, scripting is more than scripting, it’s a full application programming language. And all of them are ahead of the game on security. Document security is one place web-readers excel. Secure scripting could easily be allowed in book documents on such platforms. Facebook, MySpace and Google Apps have proven this.
April 13th, 2008 at 1:15 pm
Aaron, regarding a container format which web browsers can easily be modified to handle, what would that be?
During the OCF development, we looked at a few container strategies, notably ZIP and MIME-based.
Obviously the issues of server vs. client side play a role.
April 13th, 2008 at 1:30 pm
MIME with Base64 would certainly be better for the browser, but not better overall (memory and CPU concerns, many other issues). For a lightweight subset, MIME containers could be tested, with limitations on file size and other necessary drawbacks. Unfortunately, there’s no good answer for the question, as you know. Like I said, it needs a container, but browsers are all thumbs when it comes to containers. So the simplest solution is still to have a server side class abstracting the file. Another option to explore would be an apache module. Or, since many browsers natively understand tar and gz compression, it would be nice if that could be hooked into, but that’s another thorny issue.
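One of the memory concerns with Base64 can be shown with simple arithmetic: every 3 bytes of binary data become 4 ASCII characters, so an embedded image or font grows by about a third before any MIME line breaks or headers are added. The short Python sketch below measures that ratio; the function name is hypothetical and the payload is just filler bytes.

```python
import base64


def base64_overhead(payload):
    """Return the size ratio of the Base64 text to the original bytes."""
    encoded = base64.b64encode(payload)
    return len(encoded) / len(payload)


# 3000 bytes of filler stand in for an image inside the package.
ratio = base64_overhead(bytes(3000))
print(f"Base64 size ratio: {ratio:.3f}")
```

ZIP, by contrast, stores binary files as they are (or compressed), which is part of why it is lighter overall even though browsers cannot open it directly.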
April 13th, 2008 at 1:41 pm
Thanks. Yes, the OCF Working Group looked at a variety of ways to wrap an OPS Publication into a single file for distribution purposes. Lee Passey was also involved in some of the working group development and hopefully he will weigh in with his thoughts.
We settled on a modification of what the Open Office group was using, which has its roots in JAR (based on classic ZIP).
Bill Janssen continues to advocate a MIME container for this purpose, and if he is reading this, I hope he weighs in as well.
April 13th, 2008 at 2:40 pm
Aaron, forming another standards body and a new format is the best way to make an impact on an existing standards body. Consider just a few recent examples: the impact OpenReader made on the IDPF, or the impact WHATWG made on the W3C. Bodies such as the IDPF or W3C work most effectively when they have to fear that someone will propose something better than they do and make their specs obsolete or irrelevant. Your ideas on both BOOK and epub-lite are great. Go ahead, formalize them and show them implemented on your website and in your browser-based reading system and, perhaps, in some ebook editor/compiler. You will have a following, since the situation in digital publishing is not normal at all.
BTW, is your browser-centric reading system a commercial secret, or is there some info about it available somewhere?
You are right to talk about browser-centric or browser-based reading systems. It is quite natural to expect that, for a format based on web technologies, reading systems will be browser based; that is, they will use some browser engine for rendering and will be developed as browser add-ons or as programs using a browser control. But in that case reading systems will inherit the browser security model, so why any fears about scripting? The case of Adobe’s Digital Editions is really strange in this context. Instead of taking some existing browser engine, be it WebKit, Gecko or Opera, they developed a new one for use in DE. A new engine with its own bugs to find workarounds for? And a new beast, XSL-FO, to learn? Thanks, no. Everything that worked when a page was on the server should work in exactly the same manner when that same page is included in a digital book, without any stripping or complicated tweaking. The fact that DE is the best-known and most advanced reader for the epub format is a sure sign that the IDPF and epub are going the wrong way.
What I wish for is this: first, a simple ebook format which allows me to use all the technologies there are on the web, with exactly the same freedom as on the web and with no additional limitations. Secondly, a browser-based reader (a browser add-on or a program based on some quality browser engine). Thirdly, a program (editor/compiler) for producing publications from preexisting web pages. Right now none of these three things exists. And it seems to me that there is little hope that these three things will emerge under leadership such as that of the IDPF.
April 13th, 2008 at 3:45 pm
Laisvunas, you have a really good point there. WHATWG is an excellent example of how one can spur on another (though remember they released the public draft on HTML 5 together).
The system I’m referring to is alive and well at bookglutton.com. It features an AJAX reader and Package Creation tool. The package tool is currently part of the upload feature which enables people to convert .doc, .rtf, and html documents to epub packages that can be viewed in the Reader. Once we have more epubs out there, direct epub upload will also be an option. We may also eventually enable epub download. Right now, we’re having some doubts about the value of that.
I agree with your thoughts on scripting. I think anyone who develops for the browser would share those thoughts too. I think one problem is that not many e-book reading system developers come from a browser-centric background.
April 13th, 2008 at 9:49 pm
Aaron, perhaps I’m misinterpreting your last posting, but it sounds to me like you’re really talking about wanting to control the content. If you went to all the trouble of creating tools to allow others to convert content to epub, and epub is the native format that your web-only viewer supports, then clearly you see some value in epub that your other comments would deny. Also, your statement about not wanting to allow epub downloads sounds like a control issue again.
So, as long as people have to come to your site to read, you’re OK with epub? You’re barking up the wrong tree. Sure, we all read some things on the web, but I think most readers want the ability to take content offline and carry it with them to read.
April 13th, 2008 at 10:27 pm
Joseph, I never stated that I saw no value in epub.
As for content, all of our content is free and no account is required to read it. For those who like to read offline, I highly recommend supporting your local public library or bookstore.
April 13th, 2008 at 10:44 pm
>Joseph, I never stated that I saw no value in epub.
Not directly, but your original post panned epub rather severely. It certainly leaves the impression that you had no use for epub.
>For those who like to read offline, I highly recommend supporting your local public library or bookstore.
Sounds like I hit a nerve.
Thankfully, there are many other sources of ebooks to be enjoyed offline, both free and commercial. I patronize them, as do quite a few others.
Those issues aside, now that we know that you are using epub internally, perhaps you would share some of your experiences as to creating and using that format? I’m sure we’d all like to hear about some of the tools and processes that you have developed.
April 13th, 2008 at 10:56 pm
To quote from your original post:
“but we also need legions of independent developers building APIs and authoring tools and Reading Systems and open-sourcing all of them.”
I agree with you 100% on this. How about sharing some of those tools that you have developed?
April 13th, 2008 at 11:25 pm
Yeah, I’d be more than happy to write some posts and documentation when I get time, and to start an SVN repository somewhere if enough people express interest. Things I would consider are some APIs for BookGlutton and a REST interface, a PHP class to abstract epub documents, and an upload point solely for document conversions. I would also like to look into the BOOK proposal more deeply, as a way for people to drag and drop epubs onto their browser windows to read them–except as we’ve all noted here, this requires subverting the spec.
April 13th, 2008 at 11:37 pm
Aaron, have you looked at Openberg Lector? It is a plugin that allows you to read epub (and a few other formats) from within Firefox. I have tried it and although it is usable, it needs more work. Perhaps you can either contribute to that, or use some of their ideas as a basis for your own reader.
I took a look at your site and the presentation is quite nice. From my perspective, it is a shame that I can’t download these nicely formatted ebooks to read away from my PC. I’ll have to find them elsewhere.
April 14th, 2008 at 2:49 pm
> this requires subverting the spec.
Aaron, from my point of view the only way to save epub from ultimate long-term failure is to modify it so as to make it more powerful and flexible. If you could show how it can be modified, and if you provided some compelling examples of this modified format in use in some reader, it would be the best thing for epub and for digital publishing in general.
I think that in its current, unmodified state epub is not suitable for digital publishing. I would not recommend that anyone adopt it. Why?
1) It is weak and inflexible. No scripting, no Flash, no multimedia. This means that if you adopt epub now, you will be forced to abandon it as soon as other publishers adopt some more powerful platform (for example, Adobe AIR) for their projects.
2) There are no readers based on a quality browser engine. That means that to publish the same content first on the web and then as epub, you have to support, in addition to Internet Explorer, Safari, Firefox and Opera, all the reader engines such as Digital Editions, dotReader and FBReader, which have their own bugs. Even if some reader based on a quality browser engine emerges (maybe Openberg, if it is developed further), the existence of those other readers will create the wrong expectation that an epub publication will be rendered correctly in all readers, which most probably will not be the case. That means that a publisher must either distribute its own reader or direct readers to some quality reader, and this is quite complicated.
April 14th, 2008 at 5:03 pm
Although OPS does not “bless” any particular audio/video format, they may certainly be used in OPS Publications. Fallbacks do need to be provided, however.
About web browsers, it certainly is possible to build plug-ins for web browsers that can render EPub. I discussed this in a prior comment.
April 14th, 2008 at 7:38 pm
I’m guessing that Laisvunas has never tried to take epub content (XHTML and CSS) and use it on a web site, or to take some valid XHTML (or even decently constructed HTML 4) and create an epub with it. If he did, he would find that getting things to display properly is not as big a headache as he thinks it is.
Sure, there are a few things that cause problems in different browsers. But this is not a fault with epub. The browser quirks were there long before epub and there are well documented ways to deal with these quirks. Going forward, even Microsoft has promised to better comply with standards. I see this as a non-argument as far as epub is concerned.
As for some people’s insistence on multimedia in ebooks, while I can see some use for it in certain cases, for general use, I feel about it the way Rod Serling did when commenting on early TV:
“It is difficult to produce a television documentary that is both incisive and probing when every twelve minutes one is interrupted by twelve dancing rabbits singing about toilet paper.”
April 14th, 2008 at 7:56 pm
Jon, while we are all discussing epub, what about dictionary support in the next version of the spec? As I commented on in another blog entry here, I think a standard dictionary format and lookup method really needs to be added to epub (and soon).
Some people don’t seem to care about having an integrated dictionary for e-reading and some can’t be without one. I’m more in the latter camp.
The dictionary formats used by Microsoft Reader and Mobipocket Reader are both well documented. The easiest thing to do would be to adopt one of these, thus gaining an instant selection of commercial dictionaries for purchase. If that option is not possible or desirable, then certainly some similarly structured XML format can be invented for epub reader use. Or, even one of the existing XML dictionary formats that the Open Source community has developed (at least two that I know of) can be used.
April 15th, 2008 at 2:56 am
>getting things to display properly is not as big a headache as he thinks it is.
As soon as you have something more complicated than a simple sequence of <p> tags in the page, getting things displayed properly is quite complex. Proper display is possible only in quality browsers or in software based on a quality browser engine.
Suppose some epub pages have some elements with static, some with absolute and some with fixed positioning, plus some HTML tables. Will they be displayed correctly in current epub readers? - No!
FBReader supports neither CSS positioning nor tables. Digital Editions probably supports them, but DE is based on a totally unknown rendering engine, which means that anyone who wants to publish for DE has to support a new HTML rendering engine with its own bugs to find workarounds for.
>Although OPS does not “bless” any particular audio/video format, they may certainly be used in OPS Publications.
Well, but will anybody provide plugins for epub readers for rendering those formats? Most probably there will be no plugins, and no multimedia as a consequence.
>it certainly is possible to build plug-ins for web browsers that can render EPub
But so far there is only Openberg Reader, which is not rich in features (no contents in a menu or sidebar, no library view, no search) and is not supported in Firefox 3. In this situation, if someone publishes an epub, it will almost certainly be read not in a browser-based reader but in DE or FBReader, which are not browser-based.
It seems to me that the IDPF made a huge mistake by thinking about epub apart from the browser. Since developing a quality engine for rendering HTML and CSS is a gigantic task, any format based on HTML should rely on modern browsers or browser engines for its display. By extension, any format which allows multimedia in addition to HTML/CSS should rely on current browser plugins.
April 15th, 2008 at 3:16 am
The mythical scenario you “suppose” about would render differently in each of today’s browsers as well (as I previously pointed out). If you want such typographic control of your content, PDF is your medium, not the web. Your arguments are specious. I’m done.
April 15th, 2008 at 4:26 am
Laisvunas writes:
You’re talking in generalizations here. Can you give us some specific examples of what you consider to be adequate and inadequate readers? Are you talking about desktop reader applications, embedded reader applications (like those found in dedicated e-reader devices), browser plug-ins, or all three?
If, on the other hand, you’re objecting to the fact that different rendering engines produce different results, well, that’s just a fact of life. Web developers have been dealing with this problem for years, and it’s not likely to get better any time soon. HTML and CSS are large, complex beasts, and in many cases the specs leave implementation details open to interpretation. You can probably expect to experience more inconsistencies as browsers/readers begin to implement new versions of these specs.
>Well, but will anybody provide plugins for epub readers for rendering those formats? Most probably there will be no plugins
As far as I know, Openberg Lector on Firefox handles media just fine. Lector handles the .epub file and the Gecko rendering engine handles the rest. What’s the problem?
Openberg Lector is an immature product. Give it some time.
Firefox 3 is beta software. It shouldn’t come as any surprise that some plug-ins designed for Firefox 2 don’t work on Firefox 3 — yet. Once again, give it a little time. The root of most of your objections appears to be your frustration with the fact that these applications don’t do what you want them to do RIGHT NOW. Patience is a virtue.
You seem to believe that books and web sites are synonymous. I disagree with this assertion. Both have their place, but they’re two different things. The current epub spec, despite its shortcomings (which are likely to be addressed in subsequent versions), strikes me as a perfectly reasonable first pass at defining a file format designed primarily to support books as we know them today — an ordered, linear collection of pages — containing text and images; in electronic format. It does not attempt, nor should it, to revolutionize the concept of a book.
You also seem to believe that the solution to underpowered readers is to embed a custom reader in the content. Again, I disagree with this assertion. I think users want a consistent user interface, and I doubt they’d warm to the idea of having to use a different reader for every publisher’s content.
Finally, let’s not confuse platforms, applications, and file formats — they’re three completely different things. If you want to go it alone, define your own file formats and build an application that supports them; if you want that application to run in multiple environments, develop it on a portable runtime platform like Java or Adobe AIR; if you want to publish content that will be supported by multiple applications, choose a file format that is widely supported by existing software.
April 16th, 2008 at 11:17 am
>Can you give us some specific examples of what you consider to be adequate and inadequate readers?
FBReader is inadequate since it does not support HTML tables. Only Openberg is adequate for rendering pages, since it runs as a Firefox plugin and so uses the Firefox engine.
>you’re objecting to the fact that different rendering engines produce different results, well, that’s just a fact of life. Web developers have been dealing with this problem for years, and it’s not likely to get better any time soon.
On the web this problem is getting better with the release of every new version of any major browser. In the epub space this problem is getting worse with the release of every new reader.
>The current epub spec, despite its shortcomings (which are likely to be addressed in subsequent versions), strikes me as a perfectly reasonable first pass at defining a file format designed primarily to support books as we know them today — an ordered, linear collection of pages — containing text and images; in electronic format. It does not attempt, nor should it, to revolutionize the concept of a book.
The epub format was proposed, and in some circles accepted, as a kind of standard for electronic publishing. I dispute that epub deserves such status. To propose as a standard a format which is designed to support books as we know them today is wrong. Nobody knows what electronic publishing will evolve into or what electronic publications will look like. In this unsettled situation, those who propose a standard should leave as much room as possible for experimenting and should strive to provide maximum flexibility.
The IDPF is not going in this direction. They propose an utterly restrictive format and say that it is right to adopt it now. I dispute this. It is wrong to adopt epub now for many reasons: first, its inflexibility and restrictiveness; second, the appalling state of reader software. Only Openberg is based on a browser engine whose rendering power can be trusted (but it is immature, and it is doubtful whether its developers will even release a version compatible with Firefox 3 when the latter ships); the other readers are based on engines nobody knows anything about. The common denominator of what can be displayed reliably in epub readers is very low (no HTML tables, no CSS positioning). On the web the situation is quite different: although there are differences between browsers, the common denominator of what can be displayed reliably in major browsers is quite high.
April 17th, 2008 at 11:13 am
Note to everyone that the new EPub Community group is now open for discussion. I hope that Aaron, Laisvunas, Joseph, and Todd will consider contributing to discussion there. All viewpoints are welcome of course, and I hope we can continue to discuss the items brought up in this comments area.