Humans: from the beginning is a single-volume guide to the whole of the human past,
from the first apes to the first cities (for more information, go to www.humansfromthebeginning.com).
It took me nearly five years to research and write, and my intention was to
release it initially as an eBook on the Amazon Kindle platform (I may
eventually also release it as a print-on-demand paperback but currently have no
plans to do so). To produce an eBook from my source files was certainly not
going to be a trivial task – the work as a whole ran to around 250,000 words in
32 chapters, together with an introduction and a number of maps, infographics
and plates and illustrations. In addition, each chapter was comprehensively
referenced. Although not a textbook as such, Humans: from the beginning draws extensively on journal articles
and other academic literature, and each source used was properly cited using
Harvard-Anglia referencing.
In this post, I will document the steps I took to turn my
source files into an eBook meeting the same standards of production values and
professionalism as an academic book produced by one of the major publishing
houses. Please note that the work is a case study rather than a comprehensive
guide. Obviously no two books are alike, and not all the material here will
necessarily be relevant to your needs. Conversely, you may find that some of
your needs and questions are not directly addressed. If so, however, there
should be enough material here to point you in the right direction. Please also
note that a basic knowledge of HTML and CSS is assumed.
Becoming a micropublisher
The basic questions I needed to address were:
1. How to ensure that my self-published eBook was as
professional in its production values as any produced by a major publishing
house;
2. How to master the basics of producing an eBook for the Kindle
platform;
3. How best to use the eBook format to provide ease of access to
the roughly 1,500 academic sources my book cited;
4. How else I could take advantage of the eBook format by
offering features not available in a traditional book.
The importance of the first of the above cannot be overstated. We
live in a world in which self-publishing has finally come of age, liberating
authors from the often frustrating task of trying to persuade publishers and/or
literary agents to take them on. However, they now face a fresh set of problems
in that they now have sole responsibility for tasks that could once be left to
their publishers. The author is now in effect a micropublisher, and if they do
not achieve the same standards of professionalism as a larger-scale publisher
their work – however good – will be unlikely to be taken seriously.
The first and most obvious requirement for a book is a good book
cover. This should not be daunting; my book featured a stone hand-axe
superimposed on a horizon over which dawn is breaking. The breaking dawn
represents the long, slow rise of our modern world; the hand-axe is a stone
tool of a type that remained in use more or less unchanged for one and a half
million years. In many cases, things need not even be that complicated. For
this work I selected a simple textured background, available from the Amazon
Cover Creator.
The next step was to obtain an ISBN Number for my book. Although
this is not obligatory for an Amazon Kindle book, I felt it would be advisable.
Large-scale publishing houses do not release books without ISBN numbers, and as a
micropublisher I wanted to do the same. In the UK, ISBN numbers are the
responsibility of the Nielsen ISBN Agency (please note that their site is a bit
temperamental with some browsers). The minimum purchase is a block of ten
numbers for a price of £132.00 inclusive of VAT. This might sound like
overkill, but bear in mind that if you do intend to release your book as a
paperback as well as an eBook, you will require an ISBN number for each format.
Academic non-fiction is frequently re-issued as new editions, each of which
will also require a fresh ISBN number.
You also need to give a name to your publishing house. Note that
this is purely a label and not a limited company: you do not have to register
anything with Companies House. However, you do need to choose a name that does
not conflict with that of any other publishing house. It is also advisable to
steer clear of names suggesting an association with well-known organisations or
individuals. “Beckham Books” might sound catchy, but unless you happen to share
your surname with the former England footballer it is probably best avoided. A
Google search should confirm whether your chosen name is likely to be
acceptable, but Nielsen has the final say.
I also registered web domain names for my book title and for the
name of my publishing house. I set up a promotional website for the book: this
is a fairly straightforward non-self-hosted WordPress blog; the web domains www.humansfromthebeginning.com and www.humansfromthebeginning.co.uk
both redirect to it. The site itself features a brief biography of the author
(i.e. myself), a preview extract and links to where the book may be purchased
on www.amazon.co.uk
and www.amazon.com
(it’s worth noting that although I’m a UK-based publisher, the bulk of my sales
have been in the United States).
Intellectual Property
Fail to respect the intellectual property of others and
solicitor’s letters could start landing on your doorstep.
As noted above, my work was fully referenced in accordance with
standard academic practice. I was also sparing in my use of exact quotes,
preferring where possible to paraphrase. Exceptions were made when the quote
was obviously intended by its author to be humorous; there I was careful to
fully attribute the quote in writing in addition to providing a citation.
Although I did not do so in my book, be aware that quoting lines from songs or
printed matter that are not out of copyright will require permission from the
copyright holder.
My book included a number of photographic images and here I was
careful to either i) obtain the permission of the copyright holder, or ii) ensure
that it was available for use under Creative Commons. In the first case, I
actually used only one such image, which was licensed for use at a very reasonable
fee. In all cases, I provided full attribution, identifying sources and
copyright holders, with details of the relevant Creative Commons licences where
applicable. A caveat is that you can come across items that should not have
been listed under Creative Commons, for example photographs that have been
taken in museums and other places where photography is not permitted or is for
personal use only (the same applies, of course, to any photographs you might
take yourself).
eBook Basics
Pretty well any of the remarks above could be applied to
traditional books as well as eBooks, but before going any further into the
details of how I converted the finished manuscript of Humans: from the beginning into an eBook, here is a very brief
introduction to eBooks and how they differ from printed books. An eBook is a
book-length electronic document comprising text and images that is readable on
a computer, mobile device or dedicated e-reader (such as the Amazon Kindle). Many
eBooks are electronic versions of printed books, but many (including mine) do
not have a printed counterpart. Though many would argue that eBooks lack the
charm of printed books, they do have a number of advantages. The most obvious
is that large numbers of eBooks can be stored on a device no larger than a
single printed book.
Other important advantages are:
1. An eBook does not require an index, as all text is
searchable. To somebody like myself, who constantly needs to look up items in
reference books, printed indices are a constant source of frustration as time
and time again what I am looking for is either not in the index at all, or a
listed page (often the only one) contains absolutely no reference to the
required subject matter (the printed book equivalent of the dreaded 404 Not
Found message). Furthermore, you can only look up subjects. If you want to look
up a phrase you happen to remember as part of the text you want to find, there
is no way to do so.
2. Navigation within an eBook is quick and easy. Instead of
referring to the Contents for the page number of the desired chapter and then
turning to that page, you can be taken there at a single click. While this
might not seem like a big deal in itself, in a non-fiction eBook the same
approach can be used to provide easy access to references, glossary items and
visual matter.
3. The reader of an eBook is not stuck with the publisher’s
choice of font and can select from a number of fonts. Text size, page colour,
margin size and line spacing are also reader-selectable. It is actually
possible for the publisher to mandate the choice of font in an eBook, but Amazon
discourages the practice and I did not do so.
4. An important difference between a printed book and an eBook
is that in the latter, the concept of the page number is completely
meaningless. The amount of text displayed at any one time on an e-reader will
depend on a) the physical size of the device and b) the choice of font size
selected by the user.
There are two major eBook formats, the open EPUB standard and
Amazon’s in-house MOBI/KF8. The Kindle e-reader, as one might expect, uses the
latter format. Despite this, we need not concern ourselves greatly with
MOBI/KF8, because Amazon provides a tool known as KindleGen that will convert
an EPUB file to a MOBI/KF8 file. The output file, which has a file extension of
.mobi, is Kindle-compatible. KindleGen can also accept files in HTML or XHTML,
and Amazon recommends its use for publishers wishing to create Kindle books
in-house.
Two quick and dirty practical exercises
As a preliminary exercise, I needed to familiarise myself with
the basics of producing an eBook and getting it on to a Kindle. As a starting
point, I downloaded the Amazon Kindle Publishing Guidelines, which are
available as a .pdf file. Google ‘Amazon Kindle Publishing Guidelines’ to
obtain the latest version of this document. I then downloaded and installed the
KindleGen tool provided by Amazon (the procedure is explained in the publishing
guidelines) and I also downloaded and installed Notepad++, a freeware
file-editing tool with some very powerful features including the ability to run
Regular Expression (RegExp) scripts. Throughout the conversion exercise I was using
MS Word on a PC running under Windows 7.
The following is a quick and dirty practical exercise to put a
mini-eBook onto a Kindle e-reader. Note that from now on I will use the term ‘Kindle
e-reader’ to mean any device capable of reading a Kindle eBook. These include
not just dedicated devices such as the Kindle Paperwhite and the Kindle Fire
(the latter basically a customised Android tablet) but also iPhones, iPads,
Android devices or laptops running the appropriate Kindle app or software.
For this exercise, you will require such an e-reader, together
with an Amazon account. Your Kindle e-reader will have an email address in the
format {my Kindle email address}@kindle.com. This will be the address you set
up when you registered the device and you can remind yourself by going to the
Amazon website and selecting Your Account -> Manage Your Kindle -> Manage
Your Devices.
To convert a document and load it onto your Kindle, you will
need to use a slightly different email address: {my Kindle email
address}@free.kindle.com. Simply email any small document (MS Word, .rtf or
.html) to this address, putting ‘Convert’ in the subject. Conversion usually
takes no more than a few minutes. You will then receive an email advising you
that the conversion has been completed.
Go to Your Account -> Manage Your Kindle. In ‘Your Kindle
Library’ you will see your newly-converted document at the top of a list of
your Kindle documents. Assuming your Kindle e-reader is connected to the
internet, your document should appear as downloadable to it (exactly how it is
displayed depends on your device as the implementation varies from platform to
platform). I found this exercise to be a useful introduction, but as I shall
explain shortly, it is not suitable for producing a full-sized eBook. There is
really only one way to accomplish this, and it is to use the KindleGen tool
provided by Amazon. Here is a second quick and dirty practical exercise, this
time using KindleGen for converting an HTML file.
Set up a directory on your PC and create a command line .bat
file with the following command:
c:/kindlegen/kindlegen.exe {myfile}.htm>errors.txt
Running the command line file will generate the files
{myfile}.mobi and errors.txt. The latter will contain one or two warning
messages, because we are not at this stage converting a genuine eBook. However,
the {myfile}.mobi can be read on a Kindle or Kindle-enabled device. Send the
file to the {my Kindle email address}@free.kindle.com email address, and
download it to your device as before.
This might seem very simple, but now try exporting your lengthy
manuscript from your word processor to HTML, converting it with KindleGen and
trying to read the resulting .mobi file on your device. If your Word document
contained a Table of Contents, this will appear as a series of hypertext links
to the chapters of your book. The links will work – but they will be very slow.
If you have kept each chapter of your book as a separate document (as I did)
and have never combined them into a single massive manuscript (as I
did periodically for test purposes and to circulate to interested parties), then
there is no need to try this: just take my word for it.
Here’s why – eBook files are basically HTML files contained in a
wrapper. Your eBook may be thought of as a website, and hypertext links work
exactly the same way as they do on a website. Now imagine a website that held
all its content on a single, massive page. Any hypertext linking within it
would run pretty slowly. Of course, websites consist of many pages, with
hypertext links typically taking you from one page to another. That is exactly
how your eBook needs to be structured if your readers are to enjoy what Amazon
term a ‘good reading experience’.
Preparing your manuscript for conversion to an eBook
As noted above, I kept each chapter of Humans: from the beginning as a separate MS Word document. The ‘manuscript’
to be converted into an eBook comprised MS Word documents for the 32 chapters,
an introduction and a glossary, together with a title page, copyright notice, acknowledgements
and attributions. The chapters and introduction (though not the glossary) were
referenced using the MS Word citations tool. In addition there were maps,
infographics and plates and illustrations. These I decided to keep separate
from the main text on the grounds that a reader would find it easier to access
them from a central index than would be the case if they were embedded in
individual chapters. My task was to transform this into an EPUB document that
could in turn be converted to the Kindle-compatible KF8 format with Amazon’s
KindleGen tool.
The first step was to convert each Word document into an HTML
file. MS Word provides a ‘filtered HTML’ export option from .doc and .docx
files, but unfortunately this still produces a considerable amount of junk. Indeed,
many books recommend simply copying the contents of each Word document into a
flat text file. I feel that this is throwing out the baby with the bathwater,
as you will lose all of your formatting in the process.
I created a series of styles to cover all aspects of formatting
in each chapter – one each for chapter heading, section headings within each
chapter, and body styles. I used the styles to handle indentation and before
and/or after line spacing. I entirely eliminated the use of tabs, spaces and
carriage returns to accomplish this. The result is that when exported to HTML,
the body text of each document will comprise a series of paragraphs
that lend themselves to formatting with cascading style sheets. In an eBook, as
in a website, formatting is carried out using classes contained in a .css file.
The next issue I faced was references, of which my book
contained large quantities. In an eBook, the reader should be able to look up a
reference by simply clicking a hypertext link. They can then return by either
1) clicking a link on the reference that takes them back, or 2) using the ‘back’
function on their e-reader. The first method requires additional HTML coding
and has the problem that it will always return the reader to the same point regardless
of how many times the particular reference is cited in the text. As I was
constantly citing multiple instances of references, I decided that the second method,
though easier to implement, was actually the more suitable in my case.
While I was working on my book, I used Harvard-Anglia
referencing (author(s), year; e.g. Smith, 2012) to save having to constantly
look up what was being cited. However, the presence of large numbers of
references cited in this style can interfere with the reading experience, so
for the purposes of publication I switched to Nature referencing (as used in the science journal Nature), where the reference is assigned
a number that refers to its entry in the bibliography. The references in my
book are broken down by chapter, meaning that each chapter has its own
bibliography. The methods I will describe apply to the referencing system I
have just described, but they could be adapted for other systems if desired.
I first applied a style to the references. This served two purposes:
firstly, the formatting could be again controlled through the cascading style
sheets, and secondly it facilitated the attachment of hypertext links. To
accomplish this, I created the following Word macro:
Sub ApplyCitationStyle()
    Dim stylename As String
    Dim exists As Boolean
    Dim s As Style
    Dim fld As Field

    stylename = "In-Text Citation"

    'Check if the style already exists.
    exists = False
    For Each s In ActiveDocument.Styles
        If s.NameLocal = stylename Then
            exists = True
            Exit For
        End If
    Next

    'If the style did not exist yet, create it.
    If exists = False Then
        Set s = ActiveDocument.Styles.Add(stylename, wdStyleTypeCharacter)
        s.BaseStyle = ActiveDocument.Styles(wdStyleDefaultParagraphFont).BaseStyle
        s.Font.Superscript = True
    End If

    'Now that the style really exists, select it.
    Set s = ActiveDocument.Styles(stylename)

    'Apply the style to all in-text citations.
    For Each fld In ActiveDocument.Fields
        If fld.Type = wdFieldCitation Then
            fld.Select
            Selection.Style = s
        End If
    Next
End Sub
The macro formats the references with a style called “In-Text
Citation”, which results in them being displayed as superscripts. It isn’t
actually necessary for it to do so, as you will have to implement
superscripting with your cascading style sheets. The important point is that
the references are now spanned by the style.
For each chapter document, I saved a copy and switched from
Harvard-Anglia to Nature referencing
before running the ApplyCitationStyle macro; I then inserted the bibliography
at the bottom of the document using the Word ‘Insert Bibliography’ feature. For
Nature referencing, this appears as a
table, but I converted it to straight text and formatted it using a Word style.
At the end of these steps, I had a series of ‘well behaved’ MS Word documents,
one per chapter plus one for the introduction. These were ready for export into
a series of HTML files, two per document, one to hold the text and the other
the bibliography of that document.
From Word to HTML
Before beginning the conversion process, I set up a directory
structure to hold my files. This would eventually form the backbone of my
eBook:
1. Within a directory called ‘Build’, I created two
subdirectories: ‘META-INF’ and ‘OEBPS’ (‘META-INF’ is required by the EPUB
standard and ‘OEBPS’ is the conventional name for the content directory; the
name ‘Build’ was my choice);
2. Within ‘OEBPS’, I created three subdirectories: ‘content’, ‘css’
and ‘images’;
3. ‘css’ held the .css file (as one might expect);
4. ‘images’ was used to hold the image JPEG or GIF files
associated with my work; I created subdirectories within it for each category
of image: these were ‘maps’, ‘infographics’ and ‘pictures’, together with a
subdirectory named ‘cover’ to hold the book cover JPEG file;
5. Within ‘content’, I created the subdirectories ‘text’, ‘references’
and ‘toc’, together with one directory for each category of images (i.e. ‘maps’,
‘infographics’ and ‘pictures’);
6. The ‘text’ subdirectory held the files making up the main
body of the text, i.e. chapters, introduction, glossary, title page, copyright
notice, acknowledgements and attributions;
7. The ‘references’ subdirectory held my bibliography files;
8. The ‘toc’ subdirectory held table of contents files, as will
be discussed later;
9. The three image subdirectories held the container files which
display the image files (maps, infographics, and plates and illustrations) and
accompanying explanatory texts.
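The same skeleton can be created with a short script rather than by hand. The following is a minimal Python sketch, assuming the directory names listed above; the function name make_skeleton is hypothetical and is not something I used at the time.

```python
from pathlib import Path

# Directory layout as described in the list above. 'Build' itself is passed
# in by the caller, since that name was simply my own choice.
SUBDIRS = [
    "META-INF",
    "OEBPS/css",
    "OEBPS/images/maps",
    "OEBPS/images/infographics",
    "OEBPS/images/pictures",
    "OEBPS/images/cover",
    "OEBPS/content/text",
    "OEBPS/content/references",
    "OEBPS/content/toc",
    "OEBPS/content/maps",
    "OEBPS/content/infographics",
    "OEBPS/content/pictures",
]


def make_skeleton(root):
    """Create the empty build tree under `root` and return the created paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        # parents=True builds intermediate directories; exist_ok=True makes
        # the script safe to run more than once.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling make_skeleton("Build") from the directory that is to hold the book would produce the structure described above.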
I was now ready to begin the export process and produce two HTML
files for each document: one for the document text and one for the
bibliography. For chapters, I used the naming convention ChxxN.htm and
ChxxR.htm, where xx is the chapter number with leading zero and the suffixes ‘N’
and ‘R’ identify the chapter text and bibliography files respectively (‘N’
simply referred to the Nature referencing
convention). Other main body text files I simply called by name, i.e.
Introduction.htm, Glossary.htm, etc. The only bibliography file not following
the ChxxR.htm convention was that pertaining to the Introduction; I called it
IntR.htm. These conventions were purely of my own choosing, but the code
described below is based on them. Using other naming conventions would require
the code to be modified accordingly.
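As an illustration of the naming convention, here is a minimal Python sketch that generates the file names and capitalised identifiers for each chapter. The function names are hypothetical; only the ChxxN.htm, ChxxR.htm, Introduction.htm and IntR.htm patterns come from the convention described above.

```python
def chapter_file_names(chapter_number):
    """Return the (text, bibliography) file names for one chapter."""
    prefix = f"Ch{chapter_number:02d}"  # two digits, with leading zero
    return f"{prefix}N.htm", f"{prefix}R.htm"


def chapter_div_id(chapter_number):
    """Return the capitalised identifier for a chapter, e.g. CH05."""
    return f"CH{chapter_number:02d}"


# The introduction is the one exception to the Chxx pattern; the 32 chapters
# follow it.
all_files = [("Introduction.htm", "IntR.htm")]
all_files += [chapter_file_names(n) for n in range(1, 33)]
```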
I exported each of the Word files to HTML by saving as ‘Web page,
filtered’ and opening the resulting file in Notepad++. Each still contained a
significant amount of junk, and I also needed to wrap double-quotes (") around
the CSS class names, which was readily accomplished by Search and Replace. Note
that the CSS classes don’t necessarily have to have the same names as the
corresponding Word styles and it was possible to rename them at the same time
as I added the double-quotes. For example, I renamed In-Text Citation to Citation.
Next, I copied and pasted the formatted paragraphs and the bibliography
from each HTML extract file to publishable HTML files set up using the
following general template, taking care to ensure that encoding was set to
UTF-8 for all files. Note that KindleGen will fail if this is not done.
Each document text file has the following format:
[Body text copied from the export file
goes here]
Where:
1. {my CSS file name} is the name of the
css file (HFTB.css in my case)
2. {my div id} is a unique capitalised identifier,
based on the name of the file, e.g. CH05, INTRODUCTION, GLOSSARY;
3. {my text heading} is the chapter name
or name of the text as will appear in the eBook (e.g. 22: Of rice and men);
4. TOC.htm is a table of contents file
for the main text, to be discussed below; the code provides a return hyperlink;
Each bibliography file has the following
format:
[Bibliography copied from the export
file goes here]
Where:
1. {my CSS file name} is the name of the
.css file (HFTB.css in my case);
2. {my chapter name} is the title of the chapter;
3. CH{chapter number} is a
four-character text string corresponding to the chapter number with leading
zero, e.g. ‘CH05’ in the identifier ‘CH05REF’ (the reference section for the
introduction is ‘INTREF’);
4. RefTOC.htm is a table of contents for
the bibliography, to be discussed below; the code provides a return hyperlink;
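To show how the placeholders fit together, here is a minimal Python sketch that fills a page template. The markup in PAGE_TEMPLATE is an assumption made for illustration (in particular, the use of a linked h1 heading as the return hyperlink and the relative paths); it is not a copy of the book's actual files, and render_page is a hypothetical name.

```python
from html import escape

# Hypothetical page layout: a stylesheet link, a div carrying the unique id,
# and a heading that doubles as the return hyperlink to the relevant TOC.
PAGE_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>{heading}</title>
<link rel="stylesheet" type="text/css" href="../../css/{css_file}"/>
</head>
<body>
<div id="{div_id}">
<h1><a href="{toc_href}">{heading}</a></h1>
{body}
</div>
</body>
</html>
"""


def render_page(css_file, div_id, heading, body_html, toc_href):
    """Fill the template; body_html is inserted as-is, the heading is escaped."""
    return PAGE_TEMPLATE.format(
        css_file=css_file,
        div_id=div_id,
        heading=escape(heading),
        body=body_html,
        toc_href=toc_href,
    )


# A chapter text file returns to TOC.htm; a bibliography file to RefTOC.htm.
chapter_page = render_page(
    "HFTB.css", "CH22", "22: Of rice and men", "<p>...</p>", "../toc/TOC.htm"
)
bibliography_page = render_page(
    "HFTB.css", "CH22REF", "22: Of rice and men", "<p>...</p>", "../toc/RefTOC.htm"
)
```

Each result would then be written out with UTF-8 encoding, in line with the encoding requirement mentioned earlier.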
At this stage, I had two HTML files, ChxxN.htm (chapter text)
and ChxxR.htm (bibliography) for each chapter (xx = chapter number with leading
zero). Unfortunately, as noted above, the files still contained random junk,
which I had to identify and remove by manual editing.
Commonly-occurring junk includes:
1. Unwanted spaces and other blank characters preceding and
within HTML tags, and following after HTML close tags;
2. Unwanted tags, leading to non-well-formed HTML;
3. Unwanted Style attributes.
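Some of this clean-up can be automated before the manual pass. The following is a minimal Python sketch, assuming the junk takes the forms shown in the comments; it is not a complete cleaner, it treats every non-breaking space as unwanted (which may not suit every manuscript), and clean_word_html is a hypothetical name.

```python
import re


def clean_word_html(html):
    """Remove some common leftovers from Word's 'Web page, filtered' export."""
    # Drop inline style attributes, whether single- or double-quoted.
    html = re.sub(r"""\s+style\s*=\s*("[^"]*"|'[^']*')""", "", html,
                  flags=re.IGNORECASE)
    # Wrap double-quotes around unquoted class names:
    # class=MsoNormal becomes class="MsoNormal".
    html = re.sub(r"\bclass=([A-Za-z][\w-]*)", r'class="\1"', html)
    # Treat non-breaking spaces as ordinary spaces, then collapse runs of
    # whitespace (including line breaks) into a single space.
    html = html.replace("&nbsp;", " ").replace("\u00a0", " ")
    html = re.sub(r"\s+", " ", html)
    # Trim blank characters immediately before and after paragraph tags.
    html = re.sub(r"\s*(</?p\b[^>]*>)\s*", r"\1", html)
    # Remove paragraphs that are now empty.
    html = re.sub(r"<p\b[^>]*></p>", "", html)
    # Put each paragraph on its own line for easier manual checking.
    return html.replace("</p>", "</p>\n").strip()
```

Anything the sketch does not recognise, such as stray tags that leave the HTML badly formed, still has to be found and fixed by eye.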
I now needed to establish hyperlinks from the citations in chapter
text files to the corresponding references in the bibliography files. To this
end, I used Regular Expression (RegExp) search and replace terms in Notepad++.
For each set of chapter text and bibliography files, I proceeded
as follows:
1. Open the bibliography file in Notepad++;
2. Go to Search/Replace and select Regular Expression mode;
3. Enter the search string
;
(\d+).;
9. Enter the replace string Citation">$1
where xx = chapter number, with leading zero (e.g. CH05);
10. Click Replace All;
11. Enter the search string Citation">(\d+),;
12. Enter the replace string Citation">$1,
where xx = chapter number, with leading zero (e.g. CH05);
13. Click Replace All;
14. Enter the search string (\d+)
,(\d+)\;
15. Enter the replace string $1
,$2
where xx = chapter number, with leading zero (e.g. CH05);
16. Click Replace All repeatedly until you receive the message “Replace: All 0
occurrence was replaced” [sic];
17. Save the file.
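The same linking can be done in a single pass with a script. The following is a minimal Python sketch, not the Notepad++ procedure itself. It assumes that citations appear as a span with class "Citation" holding one or more comma-separated numbers, that bibliography entries are paragraphs with class "Bibliography" beginning with the reference number and a full stop, and that anchors are named in the form CH05REF3; all three are assumptions made for illustration, as are the function names.

```python
import re


def add_bibliography_anchors(bib_html, chapter_id):
    """Give each numbered bibliography paragraph an id such as CH05REF3."""
    return re.sub(
        r'<p class="Bibliography">(\d+)\.',
        lambda m: (f'<p class="Bibliography" '
                   f'id="{chapter_id}REF{m.group(1)}">{m.group(1)}.'),
        bib_html,
    )


def link_citations(text_html, chapter_id, bib_file):
    """Turn every number inside a Citation span into a link to its reference."""
    def link_span(match):
        numbers = [n.strip() for n in match.group(1).split(",")]
        links = ",".join(
            f'<a href="../references/{bib_file}#{chapter_id}REF{n}">{n}</a>'
            for n in numbers
        )
        return f'<span class="Citation">{links}</span>'

    # Matches "12" as well as "3,7,12"; anything else (for example a range
    # written with a dash) is left untouched for manual attention.
    pattern = r'<span class="Citation">(\d+(?:\s*,\s*\d+)*)</span>'
    return re.sub(pattern, link_span, text_html)
```

Because the callback handles a whole comma-separated list at once, there is no need to repeat the replacement until nothing more is found.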
With the above set of processes completed for each of my MS Word
chapter files, I had completed the process of exporting the main manuscript to
HTML.
Logical and Physical TOCs
A Kindle eBook has two tables of contents (TOC): a logical TOC
and a physical (or HTML) TOC. The logical TOC allows readers to navigate
between chapters when using a Kindle e-reader. The exact implementation depends
on the device used, but in general the reader will be presented with a list of
the book’s contents and will be able to navigate to the chapter of their
choice. The physical TOC, on the other hand, will be encountered when the
reader pages through the book from the beginning. Just where it occurs is up to
the publisher, but I located it after ‘Acknowledgements’ and before ‘Introduction’
near the start of the book. It serves the same purpose as the logical TOC,
allowing the reader to navigate to the chapter of their choice. Unlike the
logical TOC, it cannot be summoned on demand, other than via the logical TOC
itself.
In the EPUB 3.0 standard, the logical and physical TOCs can be
accommodated in the same HTML file. Previous implementations required the
logical TOC to be placed in a separate .ncx file, in which the order of
appearance for each item has to be coded explicitly. This means a simple
re-ordering of the content requires recoding every single entry, which is
tedious to say the least. For this reason, I adopted the EPUB 3.0 standard
although EPUB 2.0 was suitable in every other respect.
In my implementation, both TOCs were accommodated in a file
named TOC.htm, which resides in the content/toc subdirectory. In principle,
both TOCs should also be able to share the same code but in practice this was
found to cause problems with some implementations.
The TOC.htm file has the following
format:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en">
[Physical TOC goes here]
[Logical TOC goes here]
The physical TOC consisted of a series
of entries, one for each item directly referenced from it:
Where:
1. {my CSS class} is a CSS class to
format the line;
2. {file name} is the target file name
including extension, e.g. Ch05N.htm;
3. {file div id} is the div id (see
above) of the target file, e.g. CH05;
4. {text description} is the text
appearing within the heading
tags of the target file (see above), e.g. “27: An enigmatic civilisation”;
The logical TOC comprises an ordered
list enclosed within a <nav> element:
[Logical TOC entries go here]
The logical TOC entries take the form:
In theory there is no reason why the
ordered list could not serve as the physical as well as logical TOC. It should
be possible to suppress the (unwanted) automatic numbering that appears on an
ordered list; however on some Kindle e-reader implementations this does not
work, and the automatic numbering still appears.
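Since both TOCs list the same targets, the two sets of entries can be generated from a single list. The following is a minimal Python sketch: the nav element with epub:type="toc" wrapping an ordered list follows the EPUB 3 convention, but the paragraph markup for the physical entries, the class name TOCEntry and the function names are assumptions made for illustration.

```python
from html import escape

# (file name, div id, text description) for each target, in reading order.
ENTRIES = [
    ("Introduction.htm", "INTRODUCTION", "Introduction"),
    ("Ch27N.htm", "CH27", "27: An enigmatic civilisation"),
]


def physical_toc(entries, css_class="TOCEntry"):
    """One formatted paragraph per entry, each holding a hyperlink."""
    return "\n".join(
        f'<p class="{css_class}">'
        f'<a href="../text/{name}#{div_id}">{escape(text)}</a></p>'
        for name, div_id, text in entries
    )


def logical_toc(entries):
    """An ordered list inside a nav element, as EPUB 3 expects."""
    items = "\n".join(
        f'<li><a href="../text/{name}#{div_id}">{escape(text)}</a></li>'
        for name, div_id, text in entries
    )
    return f'<nav epub:type="toc" id="toc">\n<ol>\n{items}\n</ol>\n</nav>'
```

Keeping the entries in one list means that re-ordering the book only requires re-ordering that list.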
Other TOCs
The logical and physical TOCs described above are mandated (or
at least highly recommended) by Amazon and, for my eBook, provide navigational
access to all the items in the content/text directory. The EPUB standard
provides for the nesting of TOCs so that, for example, an entry marked
‘References’ could be expanded into the bibliography list by chapter. Unfortunately,
the Kindle platform does not support nesting for the logical TOC, but there is
no restriction on using physical TOCs.
Accordingly, I provided four additional TOCs: one for accessing
the bibliography (RefTOC.htm), one for accessing the maps (MapTOC.htm), one for
accessing the infographics (FigTOC.htm), and one for accessing the plates and
illustrations (PicTOC.htm). All were in turn accessible from the main TOC.
The RefTOC.htm file has the following
format:
[Bibliography entries go here]
The bibliography entries have the
following format:
Where:
1. {my CSS class} is a CSS class to
format the line;
2. {chapter no.} is the chapter number
with leading zero (for the introduction the target bibliography file is
IntR.htm);
3. {chapter title} is the text appearing
within the heading
tags of the target bibliography file, e.g. “27: An enigmatic civilisation”;
The MapTOC.htm, FigTOC.htm and PicTOC.htm files follow the same
general format as RefTOC.htm and hyperlink to the container files which display
the book’s visual matter. As with the bibliography files, the container files
contain return hyperlinks to their respective TOCs.
The Glossary
I provided a glossary in which commonly-encountered terms were
defined and explained. Entries could be accessed alphabetically from within the
glossary via a hyperlinked alphabet, or via hyperlinks in the main body of the
book’s text. Unfortunately, there was no quick and easy way of doing this and
it was necessary to insert the relevant links on an individual basis. Again, as
most glossary items were multiply accessed, no return link was provided and the
reader returns by use of the ‘back’ function on their Kindle e-reader.
The content.opf file
The content.opf file sits in the OEBPS directory and is the central
file of an EPUB package. It defines the structure of the eBook and holds its
metadata. Very briefly, it contains four sections: the <metadata>
section, <manifest> section, <spine> section and <guide>
section. The <metadata> section provides metadata, which is essentially
data about data rather than content. In this implementation, metadata is
supplied for the ISBN number, book title, author name, publisher name, date of
publication, book description, and subject. The <manifest> section
provides a list of paths, identifiers and properties for each file in the
package; and the <spine> section lists in order of appearance the identifiers
for each file in the package, thus defining the order in which they would
appear were the reader to page through the entire book.
The content.opf file has
the following format:
15
aut
{yyyy-mm-dd}T{hh:mm:ss}Z
[One entry for each of the content files from the content/text/
subdirectory]
[MapTOC]
[One entry for each of the container files from the
content/maps/ subdirectory]
[One entry for each of the map image files (in all cases, {type}
= JPEG)]
[FigTOC]
[One entry for each of the container files from the
content/infographics/ subdirectory]
[One entry for each of the infographic image files (in all
cases, {type} = GIF)]
[PicTOC]
[One entry for each of the container files from the
content/pictures/ subdirectory]
[One entry for each of the picture image files (in all cases,
{type} = JPEG)]
[RefTOC]
[One entry for each of the bibliography files from the
content/references/ subdirectory]
[One entry for each of the HTML and image files in the package;
{id ref} will be either the div id for the HTML files or the unique id based on
image name for the image files]
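With several hundred entries, it can be easier to generate the file list and reading order than to type them. The following is a minimal Python sketch that builds item entries for the file list and itemref entries for the reading order, using standard EPUB 3 element and attribute names. The function names, the choice of ids and the decision to place only HTML files in the reading order are assumptions made for illustration.

```python
from html import escape
from pathlib import PurePosixPath

# Media types for the kinds of file used in this package.
MEDIA_TYPES = {
    ".htm": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".gif": "image/gif",
}


def manifest_item(item_id, href, properties=None):
    """Build one item element; href is relative to content.opf."""
    media_type = MEDIA_TYPES[PurePosixPath(href).suffix.lower()]
    extra = f' properties="{properties}"' if properties else ""
    return (f'<item id="{escape(item_id)}" href="{escape(href)}" '
            f'media-type="{media_type}"{extra}/>')


def build_sections(html_files, other_files=()):
    """html_files: (id, href) pairs in reading order.
    other_files: (id, href) pairs for images and stylesheets.
    Returns the file list and the reading order as two XML strings."""
    items = []
    for item_id, href in html_files:
        # EPUB 3 marks the file holding the logical TOC with properties="nav".
        props = "nav" if href.endswith("toc/TOC.htm") else None
        items.append(manifest_item(item_id, href, props))
    items += [manifest_item(i, h) for i, h in other_files]
    manifest = "<manifest>\n" + "\n".join(items) + "\n</manifest>"
    refs = "\n".join(f'<itemref idref="{escape(i)}"/>' for i, _ in html_files)
    spine = "<spine>\n" + refs + "\n</spine>"
    return manifest, spine
```

Using the div id of each HTML file as its identifier keeps the reading order easy to check against the files themselves.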
Building the package
At this point, the package was almost complete. It remained only
to add the container.xml file to the META-INF directory and the mimetype file
to the Build directory.
The container.xml file simply tells an eReading device where to
find the content.opf file, and it has the following format:
The mimetype file defines the file as an EPUB and ZIP file. It is
a text file with no file extension containing a single line of text:
application/epub+zip
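Both files are small enough to write from a script. The following is a minimal Python sketch, assuming the directory layout used in this post; the container.xml text is the standard form defined by the EPUB container format, with the path pointing at OEBPS/content.opf, and write_package_stubs is a hypothetical name.

```python
from pathlib import Path

# Standard EPUB container file: it names the package file and its media type.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_package_stubs(build_dir):
    """Write META-INF/container.xml and the mimetype file under build_dir."""
    build = Path(build_dir)
    meta_inf = build / "META-INF"
    meta_inf.mkdir(parents=True, exist_ok=True)
    (meta_inf / "container.xml").write_text(CONTAINER_XML, encoding="utf-8")
    # Written as bytes so that no trailing newline or byte-order mark is added.
    (build / "mimetype").write_bytes(b"application/epub+zip")
```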
The next step was to zip the whole package up into an EPUB file
and then convert this to a Kindle-compatible MOBI/KF8. In the Build directory,
I set up two .bat files; compress.bat and mobimake.bat. I also installed the
Zip.exe utility in the Build directory.
Format of compress.bat:
zip {my book}.epub -DX0 mimetype
zip {my book}.epub -rDX9 META-INF OEBPS
Format of mobimake.bat:
{my KindleGen path}/kindlegen.exe {my book}.epub>errors.txt
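For anyone who would rather not install Zip.exe, the compression step can be sketched with Python's standard zipfile module. This is an alternative to compress.bat rather than the method I used, and build_epub is a hypothetical name. As in the .bat file, the mimetype file is added first and stored uncompressed, and everything else is compressed.

```python
import zipfile
from pathlib import Path


def build_epub(build_dir, epub_path):
    """Zip the package into an EPUB file, with mimetype first and uncompressed."""
    build = Path(build_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Equivalent of the first zip command: store mimetype without compression.
        zf.write(build / "mimetype", "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # Equivalent of the second zip command: add both directories recursively.
        for folder in ("META-INF", "OEBPS"):
            for path in sorted((build / folder).rglob("*")):
                if path.is_file():
                    zf.write(path, path.relative_to(build).as_posix(),
                             compress_type=zipfile.ZIP_DEFLATED)
    return epub_path
```

The resulting file can then be passed to KindleGen exactly as in mobimake.bat.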
Having set up these files, I ran compress.bat to produce an EPUB
file, which in my case was called hftb.epub. Before converting this to the
MOBI/KF8 format, it was necessary to check it for errors. There are many free
websites that will upload and validate an EPUB file; I used this one:
http://validator.idpf.org/ (note that it does not have a www. prefix). Once I
had ironed out the inevitable errors, it was time to take the final step and
run mobimake.bat. This created the MOBI/KF8 file, which in my case was called
hftb.mobi. It also outputs the file errors.txt, which lists errors, warnings
and approximate deliverable file size. This latter figure forms the basis of
the ‘delivery fee’ that is charged by Amazon to the author against the royalty
payment for each sale.
Testing and release
With the book now built, the final stage was testing. This
entailed individually testing every single hyperlink in the book and also
checking for formatting errors. This was a laborious procedure for a long book,
but I viewed it as essential. A small number of issues were identified, which were
easy to fix but would have made the book seem less polished as a product had
they been allowed to remain. Testing was carried out on four platforms: a
dedicated Kindle device, an iPhone, a Nexus 7 Android tablet and a PC. Full
testing was carried out only on the last of these.
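Part of the hyperlink testing could be automated before the on-device checks. The following is a minimal Python sketch of a link checker, assuming the content files carry a .htm extension and use relative links; it only confirms that each target file and anchor exists, so it complements testing on a real device rather than replacing it. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag


class _PageScanner(HTMLParser):
    """Collect every id attribute and every hyperlink target in one page."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def _scan(path):
    scanner = _PageScanner()
    scanner.feed(path.read_text(encoding="utf-8"))
    return scanner


def find_broken_links(content_dir):
    """Return (page name, href, problem) for each link that cannot be resolved."""
    pages = {p.resolve(): _scan(p) for p in Path(content_dir).rglob("*.htm")}
    broken = []
    for page, info in pages.items():
        for href in info.links:
            if "://" in href or href.startswith("mailto:"):
                continue  # external links are not checked here
            target, fragment = urldefrag(href)
            # An href consisting only of "#anchor" points into the same page.
            target_path = (page.parent / unquote(target)).resolve() if target else page
            if target_path not in pages:
                broken.append((page.name, href, "missing file"))
            elif fragment and fragment not in pages[target_path].ids:
                broken.append((page.name, href, "missing anchor"))
    return broken
```

Running find_broken_links on the OEBPS/content directory would list any link whose file or anchor is missing.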
On 4 March 2014, I uploaded Humans:
from the beginning to the Amazon Kindle bookshelf, and it has been on sale
ever since. In the weeks that followed, a few errata emerged, highlighting
another advantage of the eBook over its printed counterpart. It was simply
necessary to make corrections and upload the corrected version. Previous
purchasers receive this free of charge, similar to the way that apps on a
smartphone are periodically updated. I added a ‘release note’ with the updated
version as a new page located after the bibliography.