The Web is the format, like it or not
By Aaron S. Miller, CTO of BookGlutton, a Web-based community of readers
Between Google and Amazon, a lot of books are going on-line every day, and while these two are not the only companies doing it, they’re the biggest and the most aggressive.
While many smaller outfits expect people to download a book and read it on the platform of their choice, both Amazon and Google fully expect you to read the books from the Amazon.com or Google.com domains, preferably on their Web sites. Google Book Search and Amazon Online Reader are both fully functional Web-based reading systems which allow you to read paginated text, annotate, communicate with other readers, bookmark and share, all in a browser. And despite offering the Kindle, Amazon is still a Web company, built on Web principles, and we can expect that the Kindle won’t forsake Amazon’s Web properties. The Amazon Online Reader is a core product in three of Amazon’s other moneymakers: Amazon Advantage, Search Inside, and the Digital Text Platform, which is itself linked to the Kindle and uses the Online Reader as a preview device for Kindle uploads.
As for Google, well, there’s no doubt as to where Google stands on the Web as platform. They already have us reading PDFs in one of the ugliest interfaces book readers have ever known.
Dictating how books are read
The two big lessons here are:
1. Major players are dictating that books must be read on the Web, and
2. Major players are dictating the experience of reading books on the Web
These two things should worry everyone, because even though many people are disappointed and angry at Amazon’s approach to the market, and plenty are unhappy with Google’s quality control, it’s taken far too long for the rest of us to offer alternatives.
My own company has put out its best first effort: a paginated, networked way to read books called the Unbound Reader. Since we launched it, sites like Manybooks, Goodreads and even Gutenberg have added features that allow a user to “page” through texts instead of scrolling them.
Unfortunately this is not on the agenda of the most vocal supporters of digital books. Among e-book lovers, there’s skepticism and even contempt for the idea of reading a book in a browser.
Even among the ranks of the IDPF there is little enthusiasm for the idea. In fact, the IDPF is peddling a format that search engines can’t index, browsers can’t natively open, and experienced web developers can’t figure out. It’s no surprise that neither Google nor Amazon has shown much active interest in ePub. Why would two of the web’s biggest properties invest time and interest in a body so deliberately ignorant of their biggest concerns? Why would they build any of their web properties (including Kindle) around a format designed for back-end systems or hardware with only primitive or no web access? Why would Amazon want ePub, especially when it owns its own clunker of a format, AZW/Mobipocket?
So stop bickering
In this space, there’s an aggressive need to defend one’s turf. You see it among the social book networks (Shelfari, Goodreads, LibraryThing, etc.) and you see it among the device aficionados (Kindle lovers, Palm jockeys, Cybook and iLiad heads). You see it in the multitude of formats known here as eBabel, and in the hurt tones of each format’s staunch defenders. Everyone applauds when their favorite site or store adds a new format to the repertoire, especially if it’s everyone’s favorite format.
I’ve seen these superstitions before, back in 1995, when many companies and individuals were saying there was no point in having a Web site. Since then, I’ve been building them for clients–as steady a job as I’ve ever had. It seems for every slow realization out there, there’s a slower one coming around the corner. There is always a fair amount of hand-holding and reassurance involved too. And the price, from a client’s POV, is always too high. When I was at Organic, I watched directors and project managers in an endless struggle to convince Macy’s that their millions would return plenty more millions. I won’t divulge numbers, but let’s just say Organic was more than right.
The other big push-back then was advertising. No one thought an ad on a web site would pay off, and plenty of purists were opposed to the very concept. Today it can cost thousands a month and bring tens of thousands of visitors, and even the most ardent advocates of free and open content show ads on their blogs.
That was also when the news industry was vociferously against publishing news online, especially if it was to be free. Some of us knew from the start subscriptions wouldn’t work. And the same people were not surprised when Rupert Murdoch decreed that the Wall Street Journal, one of the last hold-outs on subscription fees, should be free for all on-line. After all, it just makes sense if you understand the nature of the Web.
Lobby the IDPF and build for each other
Writing is an old art, and publishing an old business, and such things don’t adapt quickly to this new medium. I see many admirable efforts on this blog to re-invent books, and I applaud that. But people who love books need to see the larger Web now, while they can still take action.
Book retailers, publishers, authors and developers need to lobby the IDPF to support viable formats, and viable formats are built on top of the existing Web. They don’t require developers to build scads of new tools and validators before they can be used. Book content is moving online, and it’s moving toward one format, the one we all consume daily. Much of it will be free, and all of it needs to be searchable, indexable and accessible by the tools and processes already in place.
Support the Web—not heavy-handed proprietary approaches
Developers need to start building wisely. Build on top of what we have, not just on theory and conjecture. Work together, create APIs and do some reading (or, hopefully, re-reading) of Richard Stallman’s writings. If you’re building a book website, share the wealth with documentation and APIs. Build new things. No one wants fifty identical databases of books, or twenty author wikis when Wikipedia already provides the ones people want. This next generation of the web won’t forgive those practices, because it’s becoming one database and redundancy is for backups. Also, stop creating ISBN-centric systems! An ISBN is something that corresponds to a physical object. Build solutions that fit the new problems rather than retrofitting old ones that solved different problems. And one more thing, coders and architects: lighten up! This means in spirit and in the code you write. Don’t go in for heavy-handed systems made by grave robots. Support the Web, not hardware.

April 28th, 2008 at 11:16 am
Aaron, three cheers for giving people good alternatives to reading books off files!
But there’s also a need for existing approaches, especially with a standard format, which really means, yes, ePub for traditional books (even if this won’t happen immediately for all books).
HTML and other nonproprietary approaches? Sure. But ePub should be the main show for books—at least if the IDPF will kindly care more about intrabook linking, annotations, etc., than it has so far.
I loved your essay but, yes, as you can infer, I disagree with much there. I want to own files and I don’t want my reading restricted to the times I’m online.
But here’s to keeping people thinking—whether about new approaches or ways to improve existing ones!
Your essay should be required reading for the IDPF folks.
I also hope people will check out my piece on library business and access models, where I advocate multiple approaches, both file- and browser-based.
Thanks,
David
April 28th, 2008 at 12:08 pm
hi Aaron,
Great article. I totally agree.
David, what does “owning a file” mean?
Let me tell you about my CD collection. I don’t listen to them very often because it takes too much time to walk across the living room and grab one. They don’t even have the value they used to. They get scratched. Instead I just go online and download the entire discography of my favorite bands. (In Canada it’s legal.)
How long until you’ll feel the same about your books?
The time and effort would be more worthwhile if we moved every book ever written online, into the “cloud”.
I think all we need to focus on is
- better typography on the web
- better creation tools
One last thought:
For off-line reading, PDF format is superb. Reflow is overrated. It can happen at the time when the content is converted to PDF format. (“choose the font-size you want!”)
… and that’s pretty much it
April 28th, 2008 at 12:24 pm
Aaron, I have to disagree.
We really need to make the distinction between reading for pleasure and reading because you have to. They are very separate and need to be approached differently.
Reading how-to, technical articles for research is one thing; but, if you are talking reading for pleasure, at least for the foreseeable future, the desktop is not the first choice.
In my opinion, that’s the biggest missing ingredient in the arguments over ebooks at the moment: people are not seeing the difference between “informational” reading (blogs, on-line newspapers, e-mail, Google searches, RSS feeds) and “choosing” to read for pleasure.
There is a whole wide world of difference in those two usage scenarios.
April 28th, 2008 at 1:11 pm
> David, what does “owning a file” mean?
Keeping it forever and using it on a variety of machines and OSes.
As for PDF, yep, it sucks. I want reflow at the consumer level.
Thanks,
David
April 28th, 2008 at 1:36 pm
Aaron Walker,
I think constant web access does not necessarily imply that one is using a desktop computer.
It can be a laptop or some other gadget.
The trend is for all devices to be connected all the time.
e.g. the Amazon Kindle has an experimental browser - under the hood I believe
David,
?
what’s this possession mania
I think the best redundancy for “keeping it forever” is provided by the cloud.
I think we need to distinguish between
- a person being able to access his/her collection anywhere anytime
- archival of literature
I think you care about the former.
So what’s wrong with the content being online?
Nothing. What do you care if it’s coming over the wire or coming from a magnetic disk. As long as it’s coming…
April 28th, 2008 at 2:06 pm
I am left wondering what exactly is the point of this article. First, Aaron says that Google and Amazon are forcing users to read books online and that users should be worried (which I agree with). He then proceeds to tell us that not only does his business site do the same, but that everything should be on the web.
Aaron, you have your web-centric view of things, which is fine. The web is useful for many things, but I don’t think book reading is one of them. I also don’t think that having all your software as a “web app” is a smart idea either. (You didn’t mention this, but it fits the same argument).
You say “there’s skepticism and even contempt for the idea of reading a book in a browser.” This may be true, but your articles leave me with the impression that you are equally skeptical and contemptuous of non-web reading (epub in particular). You also say “In this space, there’s an aggressive need to defend one’s turf.” Well, it seems to me that you are doing the same.
You may have some good ideas about some of these issues, but in the two recent articles that you have posted here, all I see is criticism of epub and drum-beating about a web-only solution, which coincidentally, you have adopted. If this assessment sounds unfair to you, perhaps it is due to the tone of your articles. Maybe you need to rethink how you are presenting your arguments.
April 28th, 2008 at 2:32 pm
This guy clearly has his own axe to grind. Though I’m afraid he may be right about epub:
The format seems to have been designed without thinking about the impact of the Web, and Web 2.0 in particular (by which I mean Javascript apps running in Web browsers). In fact, it looks like it was designed by a bunch of folks who didn’t realize that the “future of ebooks” envisioned in 1996 didn’t actually turn out that way, people who believed that downloadable read-offline files for custom reading hardware were the holy grail of ebook formats, despite all evidence to the contrary.
April 28th, 2008 at 2:49 pm
The cloud is very handy, I’ll admit, but as pointed out elsewhere on Teleread this week, it’s an invitation for governmental and/or corporate snooping the likes of which humanity has never seen. What kind of society will this be in 25 years, and who will have access to the amazing amount of profiling information that the cloud will contain by then? Will Google still not be evil by then?
I’m sorry, but human nature suggests that the future will look less like Star Trek (‘Computer, what is the location of David Rothman at this time?’) and more like the bloody century just past. Is it possible to utilize modern technology and live off the grid at the same time?
April 28th, 2008 at 4:40 pm
There are a lot of interesting angles to this article and follow-up comments. It’s making me accelerate my plans for posting an article covering these various topics. (No specific timetable yet for the article, though, since I’m still untangling it all in my mind.)
One thing is clear: browsers do have most of the capabilities needed for a digital publication reading application including the most important and difficult one to implement: taking XML and splashing the content on the screen in a readable manner.
Two deficiencies with browser reading of digital publication content which ePub fills are 1) single file distribution (hard to argue against), and 2) publication metadata/organization provided by the OPF Package (this is pretty important from several angles – the “web site” paradigm does not include the concept of a Package, which is the biggest innovation that OEBPS provided to e-bookdom.)
Regarding single file distribution of a “web site”, one could certainly now use MHTML (which Bill Janssen has suggested) which I believe IE supports pretty well (Firefox and Opera support is more problematic so I’m led to believe, especially for running scripts which of course is what a lot of people here love about web content and hate about ePub.)
Alternatively, last year three of us proposed a ZIP-based container for “web content”, compatible with the IDPF OCF container, called the “Generalized Container Format” (GCF).
It was our vision that GCF be used for distributing digital publications (like e-books) directly viewable in browsers “as is”. We envisioned that plug-ins would be developed for browsers like IE, Mozilla/Firefox, Opera, etc. These plug-ins would locally unzip the GCF container and render the web content inside.
GCF may even be used to seamlessly distribute different renditions (formats) of the same publication, such as “browser-ready”, OPS (ePub), Mobipocket, PDF, LIT, etc., thus being a “catch all” container for a variety of formats – a “super” ePub, if you will.
So this is offered as a pathway that is quite compatible with ePub. On a related note, I recently posted to the ePub Community some thoughts on an ePub to “web site” converter. The power of ePub is that it is flexible and high-fidelity, capable of ready conversion into other formats, including a functioning “web site” which is immediately browser renderable.
I don’t see this proposal as competitive with ePub, but rather supportive. After all, what is important is that our digital publication future embrace formatting the publication content in XML documents within a general organizational framework. What those who don’t understand the technical intricacies of ePub fail to realize is that the gap for web browsers to directly render ePub is quite small, and so once web browsers can take a GCF and render the web content inside, the gap to directly rendering ePub becomes even smaller.
April 28th, 2008 at 4:47 pm
Tamas says: “So what’s wrong with the content being online? Nothing. What do you care if it’s coming over the wire or coming from a magnetic disk. As long as it’s coming…”
That is the crux of the matter, right there. Having my ebooks, my music, my personal data, my word processing documents, etc. only available via an internet connection is a major problem. What if I am using a device that doesn’t have internet connectivity? What if I am in a location that doesn’t have this ubiquitous internet “cloud” you are talking about (which is quite a few places)?
Beyond the connectivity issues, there are concerns over who is controlling my access to “my” data if it is all online. Policies change, companies go out of business, governments change laws (and sometimes circumvent the law). I am not willing to place myself at the mercy of all of these entities and circumstances, over which I have no control, just to read a book that I have paid for. If you are, that is your decision.
Until we all live in some future Star Trek utopia, I prefer to be responsible for my own digital information, thank you.
April 28th, 2008 at 4:50 pm
Jon, I recently found a Firefox plugin that handles MHT files. I am using it with Firefox 3 beta 5.
http://www.unmht.org/unmht/en_index.html
April 28th, 2008 at 6:29 pm
Interesting article and comments flowing from it.
I agree that HTML is NOT synonymous with reading at the desktop. Increasingly, just about every device with a screen (or device which can be hooked to a screen) will have an HTML interface. (I remember fighting battles in the 1990s about why network equipment should include web services for management.)
Also, smart devices will be able to render HTML in a way consistent with the strengths of the device. After all, that was part of the point of HTML, to separate content from formatting and to let the browser handle formatting in a way that meets the user’s needs.
This is one reason I offer HTML books. However, we don’t live in an always connected world and even for those of us who are always connected, we may share David’s concern for being able to archive our purchases in case the publisher or distributor goes away. And while HTML files can be saved, graphics tend to go funky. Which, I think, is part of the justification for the package idea–all of the parts of the document will be there together. (The other part, as I understand it, is so that eBooks can be used with DRM–something which is already fully discussed on TeleRead).
While I don’t agree with everything Aaron says, I absolutely agree that the more we can start with HTML and figure what we need to add, rather than starting with PDF or anything else, the more universal a reading experience we can offer.
April 29th, 2008 at 2:15 am
Great post and comments, but some points are being missed. See here for more details:
http://exacteditions.blogspot.com/2008/04/web-is-format-but-will-books-be-in-html.html
It is hardly correct to say that Google’s Book Search has us reading PDFs (PDFs are available from Google, but with GBS we are reading JPEGs with the help of a database; the PDFs are there purely for the convenience of printers), and he seems to be suggesting that going with the Web means going with texts as HTML. This is just not what Google and Amazon are doing. Google and Amazon get their power and their reach by putting all texts into a database system…….
April 30th, 2008 at 8:45 pm
Jon alludes to my suggestion that MHTML would make a good ebook format. It has a number of features which make it much more interesting than any zip-file container format. First of all, it’s text-based, so it’s pretty easy to write Javascript to parse and present, unlike a zip format. It’s Web 2.0 friendly. Secondly, text in it is not compressed (images still are), so it’s easy for Web crawlers to full-text index the content. Thirdly, it’s multipart HTML, so it’s easy for Web developers to figure out. In other words, it addresses the three issues with epub that Miller calls out.
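To illustrate the "text-based, easy to parse" point, here is a minimal sketch in Python rather than Javascript. It assumes only that an MHTML file is a MIME multipart/related message, so the standard library's email parser can split it into parts. The sample document, its example.com URLs, and the list_parts helper are all made up for illustration; a real MHTML file saved by a browser will carry more headers and usually encoded bodies.

```python
import email
from email import policy

# Hypothetical two-part MHTML document: one HTML chapter plus its stylesheet.
SAMPLE_MHTML = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="BOOK"; type="text/html"

--BOOK
Content-Type: text/html; charset="utf-8"
Content-Location: http://example.com/book/chapter1.html

<html><body><h1>Chapter 1</h1><p>Call me Ishmael.</p></body></html>
--BOOK
Content-Type: text/css
Content-Location: http://example.com/book/style.css

p { line-height: 1.4; }
--BOOK--
"""


def list_parts(mhtml_text):
    """Return (content_type, content_location, body_text) for each MIME part."""
    message = email.message_from_string(mhtml_text, policy=policy.default)
    parts = []
    for part in message.iter_parts():
        parts.append(
            (part.get_content_type(), part["Content-Location"], part.get_content())
        )
    return parts


if __name__ == "__main__":
    for content_type, location, body in list_parts(SAMPLE_MHTML):
        print(content_type, location, len(body))
```

Because the HTML part is stored as plain text, a crawler or indexer could read it the same way, without first decompressing an archive.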
May 1st, 2008 at 12:27 am
The IDPF OCF container format accomplishes two critical requirements:
1) Provides the binding mechanism (OCF uses ZIP).
2) Organizes the content in a way that is useful for publication distribution purposes (OCF specifies the META-INF/container.xml document).
The real “magic” of OCF is the second item, while the first is simply “mechanics”.
Thus one can imagine repackaging the OCF specified files into MHTML. And of course the same for our proposed GCF.
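The two roles described above can be sketched in a few lines of Python. This is only an illustration, assuming the usual OCF layout: a ZIP archive whose META-INF/container.xml names the package (OPF) file through a rootfile element. The sample archive contents and the helper names build_sample_container and find_package_path are invented for the example, and the OPF file here is an empty stub.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the OCF container.xml document.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_sample_container():
    """Build a tiny in-memory OCF-style ZIP (the 'binding mechanism')."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Stub package file; a real OPF holds metadata, manifest and spine.
        archive.writestr("OEBPS/content.opf", "<package/>",
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def find_package_path(container_bytes):
    """Read META-INF/container.xml and return the path of the package file."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
        root = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = root.find(
        f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile"
    )
    return rootfile.get("full-path")


if __name__ == "__main__":
    print(find_package_path(build_sample_container()))
```

Only find_package_path depends on the organizing document; the ZIP layer could in principle be swapped for another binding, which is the repackaging idea suggested above.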