<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: The shape of EPUB to come, #2: Hyphenation</title>
	<atom:link href="http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/</link>
	<description>News &#38; views on e-books, libraries, publishing and related topics</description>
	<pubDate>Tue, 07 Oct 2008 14:04:23 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: Tamas Simon</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-760101</link>
		<dc:creator>Tamas Simon</dc:creator>
		<pubDate>Thu, 10 Apr 2008 22:49:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-760101</guid>
		<description>IMO instead of committees we need a good open-source implementation - both reader and author tool.
This would evolve faster than the standard and would provide a way to try new ideas.

I come from a CORBA background.
CORBA is pretty much dead because the OMG - the group overseeing the standard - reacted far too slowly. In response, some gurus started a small company called ZeroC, open-sourced the product, fixed the old specs where they were lacking, and are simply doing the right thing.

I wonder if this will ever happen with EPUB.

As you know from my posts... I'm not a big fan...</description>
		<content:encoded><![CDATA[<p>IMO instead of committees we need a good open-source implementation - both reader and author tool.<br />
This would evolve faster than the standard and would provide a way to try new ideas.</p>
<p>I come from a CORBA background.<br />
CORBA is pretty much dead because the OMG - the group overseeing the standard - reacted far too slowly. In response, some gurus started a small company called ZeroC, open-sourced the product, fixed the old specs where they were lacking, and are simply doing the right thing.</p>
<p>I wonder if this will ever happen with EPUB.</p>
<p>As you know from my posts&#8230; I&#8217;m not a big fan&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aaron S. Miller, CTO of BookGlutton</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-760078</link>
		<dc:creator>Aaron S. Miller, CTO of BookGlutton</dc:creator>
		<pubDate>Thu, 10 Apr 2008 21:56:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-760078</guid>
		<description>CSS is the way to nice typography in epubs. Rather than picking out properties to include, the OPS spec should REQUIRE ALL reading systems to be CSS2 compliant first, and recommend on top of that. This will help usher in all the goodness of CSS3 that is around the corner, and it won't do any harm to backward compatibility with older versions of OPS. This is contrary to the approach taken right now, which I think favors hardware systems by selectively including certain CSS properties and not others, and by forbidding script interpretation, which is at the heart of cross-browser rendering consistency and which, with monstrosities like Internet Explorer 8 out there, is our only hope of having good-looking books on-line.</description>
		<content:encoded><![CDATA[<p>CSS is the way to nice typography in epubs. Rather than picking out properties to include, the OPS spec should REQUIRE ALL reading systems to be CSS2 compliant first, and recommend on top of that. This will help usher in all the goodness of CSS3 that is around the corner, and it won&#8217;t do any harm to backward compatibility with older versions of OPS. This is contrary to the approach taken right now, which I think favors hardware systems by selectively including certain CSS properties and not others, and by forbidding script interpretation, which is at the heart of cross-browser rendering consistency and which, with monstrosities like Internet Explorer 8 out there, is our only hope of having good-looking books on-line.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jack Tingle</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-760023</link>
		<dc:creator>Jack Tingle</dc:creator>
		<pubDate>Thu, 10 Apr 2008 19:59:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-760023</guid>
		<description>Please, don't include hyphenation! Or any other really cool features. I read on a diverse range of systems, and each feature that gets added is another way for one or another of the systems to screw up. Keep it simple.

Other features you should consider not adding: ligatures, compound characters, accented characters which aren't part of the normal set, nice looking but non-standard single quotes, double quotes, and apostrophes, to name but a few.

Regards,
Jack Tingle</description>
		<content:encoded><![CDATA[<p>Please, don&#8217;t include hyphenation! Or any other really cool features. I read on a diverse range of systems, and each feature that gets added is another way for one or another of the systems to screw up. Keep it simple.</p>
<p>Other features you should consider not adding: ligatures, compound characters, accented characters which aren&#8217;t part of the normal set, nice looking but non-standard single quotes, double quotes, and apostrophes, to name but a few.</p>
<p>Regards,<br />
Jack Tingle</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Noring</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759948</link>
		<dc:creator>Jon Noring</dc:creator>
		<pubDate>Thu, 10 Apr 2008 17:26:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759948</guid>
		<description>It is simply good practice (and one I would have required had I written the OPS specification) for all XHTML and DTBook documents in an OPS Publication to include &lt;code&gt;xml:lang&lt;/code&gt; on the root element. Doing it here, in addition to doing it in the OPF Package, ensures the information is not lost when such documents are repurposed.</description>
		<content:encoded><![CDATA[<p>It is simply good practice (and one I would have required had I written the OPS specification) for all XHTML and DTBook documents in an OPS Publication to include <code>xml:lang</code> on the root element. Doing it here, in addition to doing it in the OPF Package, ensures the information is not lost when such documents are repurposed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Hadrien</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759937</link>
		<dc:creator>Hadrien</dc:creator>
		<pubDate>Thu, 10 Apr 2008 16:59:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759937</guid>
		<description>Yes, you're right: the xml:lang attribute is probably better suited for this than the Dublin Core dc:language element in the OPF file.</description>
		<content:encoded><![CDATA[<p>Yes, you&#8217;re right: the xml:lang attribute is probably better suited for this than the Dublin Core dc:language element in the OPF file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Noring</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759924</link>
		<dc:creator>Jon Noring</dc:creator>
		<pubDate>Thu, 10 Apr 2008 16:35:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759924</guid>
		<description>Where the content contains words that are not in the primary language (as declared in &lt;code&gt;dc:language&lt;/code&gt; or, better yet, by the &lt;code&gt;xml:lang&lt;/code&gt; attribute on the &lt;code&gt;&amp;lt;html&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;body&amp;gt;&lt;/code&gt; element of each XHTML content document), the author should apply &lt;code&gt;xml:lang&lt;/code&gt; to those words. This way hyphenation engines will know which dictionary to consult.

In addition, this allows text-to-speech engines to likewise know the language of each word.</description>
		<content:encoded><![CDATA[<p>Where the content contains words that are not in the primary language (as declared in <code>dc:language</code> or, better yet, by the <code>xml:lang</code> attribute on the <code>&lt;html&gt;</code> or <code>&lt;body&gt;</code> element of each XHTML content document), the author should apply <code>xml:lang</code> to those words. This way hyphenation engines will know which dictionary to consult.</p>
<p>In addition, this allows text-to-speech engines to likewise know the language of each word.</p>
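<p>A minimal sketch of the markup described above, assuming an English-language document that contains a French phrase. The sentence and the language codes are purely illustrative and are not taken from any real publication:</p>
<pre><code>&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"&gt;
  &lt;head&gt;&lt;title&gt;Sample chapter&lt;/title&gt;&lt;/head&gt;
  &lt;body&gt;
    &lt;!-- xml:lang on the root sets the primary language of the document --&gt;
    &lt;p&gt;She felt a certain
      &lt;!-- the span overrides the language for the foreign words only --&gt;
      &lt;span xml:lang="fr"&gt;joie de vivre&lt;/span&gt; that morning.&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;</code></pre>
<p>With this markup, a hyphenation or text-to-speech engine that honors <code>xml:lang</code> could switch to French rules for the span and back to English afterwards.</p>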
]]></content:encoded>
	</item>
	<item>
		<title>By: Hadrien</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759912</link>
		<dc:creator>Hadrien</dc:creator>
		<pubDate>Thu, 10 Apr 2008 16:12:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759912</guid>
		<description>I agree, Jon: the first step would be a recommendation in the OPS specs, meaning a best practice for authoring and a recommendation for the Reading System.

The TeX approach to hyphenation is usually considered pretty good but not perfect. In the OPF file of an EPUB book, you &lt;b&gt;MUST&lt;/b&gt; specify the language of the book using dc:language (at least one element is required, and multiple elements are supported too). Therefore, a Reading System could automatically hyphenate if default hyphenation patterns are available for this language.

I think that we really need something similar to &lt;i&gt;hyphenate-resource&lt;/i&gt; for another reason: for some languages or when you're using a specialized vocabulary, you'll most likely use words that the basic patterns won't be able to hyphenate.</description>
		<content:encoded><![CDATA[<p>I agree, Jon: the first step would be a recommendation in the OPS specs, meaning a best practice for authoring and a recommendation for the Reading System.</p>
<p>The TeX approach to hyphenation is usually considered pretty good but not perfect. In the OPF file of an EPUB book, you <b>MUST</b> specify the language of the book using dc:language (at least one element is required, and multiple elements are supported too). Therefore, a Reading System could automatically hyphenate if default hyphenation patterns are available for this language.</p>
<p>I think that we really need something similar to <i>hyphenate-resource</i> for another reason: for some languages or when you&#8217;re using a specialized vocabulary, you&#8217;ll most likely use words that the basic patterns won&#8217;t be able to hyphenate.</p>
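<p>As a rough illustration of the dc:language requirement, an OPF metadata block might look like the fragment below. The title, identifier and language codes are made-up placeholders, and the manifest and spine are omitted for brevity:</p>
<pre><code>&lt;package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId"&gt;
  &lt;metadata xmlns:dc="http://purl.org/dc/elements/1.1/"&gt;
    &lt;dc:title&gt;Sample Book&lt;/dc:title&gt;
    &lt;dc:identifier id="BookId"&gt;urn:example:sample-book&lt;/dc:identifier&gt;
    &lt;!-- at least one dc:language element is required --&gt;
    &lt;dc:language&gt;en&lt;/dc:language&gt;
    &lt;!-- further dc:language elements may list additional languages --&gt;
    &lt;dc:language&gt;fr&lt;/dc:language&gt;
  &lt;/metadata&gt;
  &lt;!-- manifest and spine omitted from this sketch --&gt;
&lt;/package&gt;</code></pre>
<p>A Reading System could use these values to choose its default hyphenation patterns, and skip automatic hyphenation when it has no patterns for the declared language.</p>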
]]></content:encoded>
	</item>
	<item>
		<title>By: Alan Wallcraft</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759840</link>
		<dc:creator>Alan Wallcraft</dc:creator>
		<pubDate>Thu, 10 Apr 2008 14:20:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759840</guid>
		<description>FBReader's automatic hyphenation is based on TeX's approach, see http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen
It needs to know the language to hyphenate correctly.</description>
		<content:encoded><![CDATA[<p>FBReader&#8217;s automatic hyphenation is based on TeX&#8217;s approach, see <a href="http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen" rel="nofollow">http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen</a><br />
It needs to know the language to hyphenate correctly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Noring</title>
		<link>http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759834</link>
		<dc:creator>Jon Noring</dc:creator>
		<pubDate>Thu, 10 Apr 2008 14:14:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.teleread.org/blog/2008/04/10/the-shape-of-epub-to-come-hyphenation/#comment-759834</guid>
		<description>Great article, Hadrien!

As a contributor to IDPF&#8217;s OEBPS Working Group, I find Hadrien&#8217;s article on hyphenation support in EPub certainly intriguing, and I suggest that Hadrien submit a formal request to the OEBPS Working Group asking the group to study it and determine whether to explicitly include recommendations and/or requirements in a future version of the &lt;a href="http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html" rel="nofollow"&gt;OPS specification&lt;/a&gt;.

It is important in this discussion, and any discussion about EPub, to separate the Reading System (or as Lee Passey prefers to put it, the &#8220;User Agent&#8221;) from the OPS specification. (For those not familiar with the underpinnings of EPub, EPub is defined by the interrelated OPS, &lt;a href="http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html" rel="nofollow"&gt;OPF&lt;/a&gt; and &lt;a href="http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm" rel="nofollow"&gt;OCF&lt;/a&gt; specifications.)

Regarding Hadrien&#8217;s first suggestion, using the &#8220;soft hyphen&#8221; character, the current OPS specification implies support for this character per Unicode and the HTML specification. Thus, the OPS author (EPub author) may include that character in content. So, Hadrien&#8217;s recommendation is pointed towards EPub Publication authors.

However, OPS itself says nothing about how Reading Systems are to handle soft hyphens when encountered. Here the recommendations of HTML that Hadrien quoted come into play since OPS supports XHTML. Of course, a Reading System may completely ignore the soft hyphens it encounters in content and not use them in any manner (by default, soft hyphens are rendered invisible and with zero width).

So a future version of the OPS specification could add a section discussing the use of the soft hyphen, and include both authoring and Reading System suggestions, recommendations and requirements.

Hadrien&#8217;s second comment on the use of the &lt;a href="http://www.w3.org/Style/CSS/current-work" rel="nofollow"&gt;CSS3&lt;/a&gt; properties of &lt;code&gt;hyphens&lt;/code&gt; and &lt;code&gt;hyphenate-resource&lt;/code&gt; is also interesting. Currently OPS &lt;em&gt;does&lt;/em&gt; allow these properties to be used in CSS documents, but does not require Reading Systems to recognize them. Any Reading System may simply ignore CSS properties that it is not required by OPS to recognize.

So, a future version of OPS could also discuss the use of these CSS3 properties. (Note that the OEBPS Working Group was reluctant to &#8220;bless&#8221; any of the CSS3 properties for the current OPS, since most of them are still in the draft phase at W3C. A Reading System, though, may choose to recognize and use any of them &#8212; at least that&#8217;s my current understanding without delving back into the subtleties of OPS.)</description>
		<content:encoded><![CDATA[<p>Great article, Hadrien!</p>
<p>As a contributor to IDPF&rsquo;s OEBPS Working Group, I find Hadrien&rsquo;s article on hyphenation support in EPub certainly intriguing, and I suggest that Hadrien submit a formal request to the OEBPS Working Group asking the group to study it and determine whether to explicitly include recommendations and/or requirements in a future version of the <a href="http://www.idpf.org/2007/ops/OPS_2.0_final_spec.html" rel="nofollow">OPS specification</a>.</p>
<p>It is important in this discussion, and any discussion about EPub, to separate the Reading System (or as Lee Passey prefers to put it, the &ldquo;User Agent&rdquo;) from the OPS specification. (For those not familiar with the underpinnings of EPub, EPub is defined by the interrelated OPS, <a href="http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html" rel="nofollow">OPF</a> and <a href="http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm" rel="nofollow">OCF</a> specifications.)</p>
<p>Regarding Hadrien&rsquo;s first suggestion, using the &ldquo;soft hyphen&rdquo; character, the current OPS specification implies support for this character per Unicode and the HTML specification. Thus, the OPS author (EPub author) may include that character in content. So, Hadrien&rsquo;s recommendation is pointed towards EPub Publication authors.</p>
<p>However, OPS itself says nothing about how Reading Systems are to handle soft hyphens when encountered. Here the recommendations of HTML that Hadrien quoted come into play since OPS supports XHTML. Of course, a Reading System may completely ignore the soft hyphens it encounters in content and not use them in any manner (by default, soft hyphens are rendered invisible and with zero width).</p>
<p>So a future version of the OPS specification could add a section discussing the use of the soft hyphen, and include both authoring and Reading System suggestions, recommendations and requirements.</p>
<p>Hadrien&rsquo;s second comment on the use of the <a href="http://www.w3.org/Style/CSS/current-work" rel="nofollow">CSS3</a> properties of <code>hyphens</code> and <code>hyphenate-resource</code> is also interesting. Currently OPS <em>does</em> allow these properties to be used in CSS documents, but does not require Reading Systems to recognize them. Any Reading System may simply ignore CSS properties that it is not required by OPS to recognize.</p>
<p>So, a future version of OPS could also discuss the use of these CSS3 properties. (Note that the OEBPS Working Group was reluctant to &ldquo;bless&rdquo; any of the CSS3 properties for the current OPS, since most of them are still in the draft phase at W3C. A Reading System, though, may choose to recognize and use any of them &mdash; at least that&rsquo;s my current understanding without delving back into the subtleties of OPS.)</p>
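<p>To make the two approaches concrete, here is a small sketch of an XHTML content document that uses both. The sample sentence is invented, and the <code>hyphens: auto</code> declaration follows the CSS3 drafts, so its name and values should be treated as provisional; a Reading System may ignore either mechanism:</p>
<pre><code>&lt;html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"&gt;
  &lt;head&gt;
    &lt;title&gt;Sample chapter&lt;/title&gt;
    &lt;style type="text/css"&gt;
      /* Draft CSS3 property: asks for automatic hyphenation. */
      p { hyphens: auto; }
    &lt;/style&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;!-- &amp;#173; is the soft hyphen (U+00AD): it marks a permitted break point
         and stays invisible unless the line actually breaks there. --&gt;
    &lt;p&gt;An extra&amp;#173;ordi&amp;#173;nary hy&amp;#173;phen&amp;#173;ation example.&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;</code></pre>
<p>The soft hyphens are the author's explicit hints, while the CSS declaration leaves the choice of break points to the Reading System and its language patterns.</p>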
]]></content:encoded>
	</item>
</channel>
</rss>
