Sunday 4 May 2014

Academic publishing: a case study


Humans: from the beginning is a single-volume guide to the whole of the human past, from the first apes to the first cities (for more information, go to www.humansfromthebeginning.com). It took me nearly five years to research and write, and my intention was to release it initially as an eBook on the Amazon Kindle platform (I may eventually also release it as a print-on-demand paperback but currently have no plans to do so). To produce an eBook from my source files was certainly not going to be a trivial task – the work as a whole ran to around 250,000 words in 32 chapters, together with an introduction and a number of maps, infographics and plates and illustrations. In addition, each chapter was comprehensively referenced. Although not a textbook as such, Humans: from the beginning draws extensively on journal articles and other academic literature, and each source used was properly cited using Harvard-Anglia referencing.

In this post, I will document the steps I took to turn my source files into an eBook meeting the same standards of production values and professionalism as an academic book produced by one of the major publishing houses. Please note that the work is a case study rather than a comprehensive guide. Obviously no two books are alike, and not all the material here will necessarily be relevant to your needs. Conversely, you may find that some of your needs and questions are not directly addressed. If so, however, there should be enough material here to point you in the right direction. Please also note that a basic knowledge of HTML and CSS is assumed.

Becoming a micropublisher
The basic questions I needed to address were:
1. How to ensure that my self-published eBook was as professional in its production values as any produced by a major publishing house;
2. How to master the basics of producing an eBook for the Kindle platform;
3. How best to use the eBook format to provide ease of access to the roughly 1,500 academic sources my book cited;
4. How else I could take advantage of the eBook format by offering features not available in a traditional book.

The importance of the first of the above cannot be overstated. We live in a world in which self-publishing has finally come of age, liberating authors from the often frustrating task of trying to persuade publishers and/or literary agents to take them on. However, authors now face a fresh set of problems, in that they have sole responsibility for tasks that could once be left to their publishers. The author is now in effect a micropublisher, and if they do not achieve the same standards of professionalism as a larger-scale publisher their work – however good – will be unlikely to be taken seriously.

The first and most obvious requirement for a book is a good book cover. This should not be daunting; my book featured a stone hand-axe superimposed on a horizon over which dawn is breaking. The breaking dawn represents the long, slow rise of our modern world; the hand-axe is a stone tool of a type that remained in use more or less unchanged for one and a half million years. In many cases, things need not even be that complicated. For this work I selected a simple textured background, available from the Amazon Cover Creator.

The next step was to obtain an ISBN number for my book. Although this is not obligatory for an Amazon Kindle book, I felt it would be advisable. Large-scale publishing houses do not release books without ISBN numbers, and as a micropublisher I wanted to do the same. In the UK, ISBN numbers are the responsibility of the Nielsen ISBN Agency (please note that their site is a bit temperamental with some browsers). The minimum purchase is a block of ten numbers for a price of £132.00 inclusive of VAT. This might sound like overkill, but bear in mind that if you do intend to release your book as a paperback as well as an eBook, you will require an ISBN number for each format. Academic non-fiction is frequently re-issued as new editions, each of which will also require a fresh ISBN number.
You also need to give a name to your publishing house. Note that this is purely a label and not a limited company: you do not have to register anything with Companies House. However, you do need to choose a name that does not conflict with that of any other publishing house. It is also advisable to steer clear of names suggesting an association with well-known organisations or individuals. “Beckham Books” might sound catchy, but unless you happen to share your surname with the former England footballer it is probably best avoided. A Google search should confirm whether your chosen name is likely to be acceptable, but Nielsen has the final say.

I also registered web domain names for my book title and for the name of my publishing house. I set up a promotional website for the book: this is a fairly straightforward non-self-hosted WordPress blog; the web domains www.humansfromthebeginning.com  and www.humansfromthebeginning.co.uk both redirect to it. The site itself features a brief biography of the author (i.e. myself), a preview extract and links to where the book may be purchased on www.amazon.co.uk and www.amazon.com (it’s worth noting that although I’m a UK-based publisher, the bulk of my sales have been in the United States).

Intellectual Property
Fail to respect the intellectual property of others and solicitor’s letters could start landing on your doorstep.

As noted above, my work was fully referenced in accordance with standard academic practice. I was also sparing in my use of exact quotes, preferring where possible to paraphrase. Exceptions were made when the quote was obviously intended by its author to be humorous; there I was careful to attribute the quote fully in the text in addition to providing a citation. Although I did not do so in my book, be aware that quoting lines from songs or printed matter that are not out of copyright will require permission from the copyright holder.

My book included a number of photographic images and here I was careful to either i) obtain the permission of the copyright holder, or ii) ensure that it was available for use under Creative Commons. In the first instance, I actually used only one image which was licensed for use at a very reasonable fee. In all cases, I provided full attribution, identifying sources and copyright holders, with details of the relevant Creative Commons licences where applicable. A caveat is that you can come across items that should not have been listed under Creative Commons, for example photographs that have been taken in museums and other places where photography is not permitted or is for personal use only (the same applies, of course, to any photographs you might take yourself).

eBook Basics
Pretty well any of the remarks above could be applied to traditional books as well as eBooks, but before going any further into the details of how I converted the finished manuscript of Humans: from the beginning into an eBook, here is a very brief introduction to eBooks and how they differ from printed books. An eBook is a book-length electronic document comprising text and images that is readable on a computer, mobile device or dedicated e-reader (such as the Amazon Kindle). Many eBooks are electronic versions of printed books, but many (including mine) do not have a printed counterpart. Though many would argue that eBooks lack the charm of printed books, they do have a number of advantages. The most obvious is that large numbers of eBooks can be stored on a device no larger than a single printed book.

Other important advantages are:
1. An eBook does not require an index, as all text is searchable. To somebody like myself, who constantly needs to look up items in reference books, printed indices are a constant source of frustration as time and time again what I am looking for is either not in the index at all, or a listed page (often the only one) contains absolutely no reference to the required subject matter (the printed book equivalent of the dreaded 404 Not Found message). Furthermore, you can only look up subjects. If you want to look up a phrase you happen to remember as part of the text you want to find, there is no way to do so.
2. Navigation within an eBook is quick and easy. Instead of referring to the Contents for the page number of the desired chapter and then turning to that page, you can be taken there at a single click. While this might not seem like a big deal in itself, in a non-fiction eBook the same approach can be used to provide easy access to references, glossary items and visual matter.
3. The reader of an eBook is not stuck with the publisher’s choice of font and can select from a number of fonts. Text size, page colour, margin size and line spacing are also reader-selectable. It is actually possible for the publisher to mandate the choice of font in an eBook, but Amazon discourages the practice and I did not do so.
4. An important difference between a printed book and an eBook is that in the latter, the concept of the page number is completely meaningless. The amount of text displayed at any one time on an e-reader will depend on a) the physical size of the device and b) the choice of font size selected by the user.
There are two major eBook formats, the open EPUB standard and Amazon’s in-house MOBI/KF8. The Kindle e-reader, as one might expect, uses the latter format. Despite this, we need not concern ourselves greatly with MOBI/KF8, because Amazon provides a tool known as KindleGen that will convert an EPUB file to a MOBI/KF8 file. The output file, which has a file extension of .mobi, is Kindle-compatible. KindleGen can also accept files in HTML or XHTML, and Amazon recommends its use for publishers wishing to create Kindle books in-house.

Two quick and dirty practical exercises
As a preliminary exercise, I needed to familiarise myself with the basics of producing an eBook and getting it on to a Kindle. As a starting point, I downloaded the Amazon Kindle Publishing Guidelines, which are available as a .pdf file. Google ‘Amazon Kindle Publishing Guidelines’ to obtain the latest version of this document. I then downloaded and installed the KindleGen tool provided by Amazon (the procedure is explained in the publishing guidelines) and I also downloaded and installed Notepad++, a freeware file-editing tool with some very powerful features including the ability to run Regular Expression (RegExp) scripts. Throughout the conversion exercise I was using MS Word on a PC running under Windows 7.

The following is a quick and dirty practical exercise to put a mini-eBook onto a Kindle e-reader. Note that from now on I will use the term ‘Kindle e-reader’ to mean any device capable of reading a Kindle eBook. These include not just dedicated devices such as the Kindle Paperwhite and the Kindle Fire (the latter basically a customised Android tablet) but also iPhones, iPads, Android devices or laptops running the appropriate Kindle app or software.

For this exercise, you will require such an e-reader, together with an Amazon account. Your Kindle e-reader will have an email address in the format {my Kindle email address}@kindle.com. This will be the address you set up when you registered the device and you can remind yourself by going to the Amazon website and selecting Your Account -> Manage Your Kindle -> Manage Your Devices.
To convert a document and load it onto your Kindle, you will need to use a slightly different email address: {my Kindle email address}@free.kindle.com. Simply email any small document (MS Word, .rtf or .html) to this address, putting ‘Convert’ in the subject. Conversion usually takes no more than a few minutes. You will then receive an email advising you that the conversion has been completed.
Go to Your Account -> Manage Your Kindle. In ‘Your Kindle Library’ you will see your newly-converted document at the top of a list of your Kindle documents. Assuming your Kindle e-reader is connected to the internet, your document should appear as downloadable to it (exactly how it is displayed depends on your device as the implementation varies from platform to platform). I found this exercise to be a useful introduction, but as I shall explain shortly, it is not suitable for producing a full-sized eBook. There is really only one way to accomplish this, and it is to use the KindleGen tool provided by Amazon. Here is a second quick and dirty practical exercise, this time using KindleGen for converting an HTML file.
Set up a directory on your PC and create a command line .bat file with the following command:

c:/kindlegen/kindlegen.exe {myfile}.htm>errors.txt

Running the command line file will generate the files {myfile}.mobi and errors.txt. The latter will contain one or two warning messages, because we are not at this stage converting a genuine eBook. However, the {myfile}.mobi file can be read on a Kindle or Kindle-enabled device. Send the file to the {my Kindle email address}@free.kindle.com email address, and download it to your device as before. This might seem very simple, but now try exporting your lengthy manuscript from your word processor to HTML, converting it with KindleGen and trying to read the resulting .mobi file on your device. If your Word document contained a Table of Contents, this will appear as a series of hypertext links to the chapters of your book. The links will work – but they will be very slow. (If you have kept each chapter of your book as a separate document, as I did, and haven’t at any stage combined them into a single massive manuscript, as I did periodically for test purposes and to circulate to interested parties, then there is no need to try this: just take my word for it.)

Here’s why – eBook files are basically HTML files contained in a wrapper. Your eBook may be thought of as a website, and hypertext links work exactly the same way as they do on a website. Now imagine a website that held all its content on a single, massive page. Any hypertext linking within it would run pretty slowly. Of course, websites consist of many pages, with hypertext links typically taking you from one page to another. That is exactly how your eBook needs to be structured if your readers are to enjoy what Amazon term a ‘good reading experience’.

Preparing your manuscript for conversion to an eBook
As noted above, I kept each chapter of Humans: from the beginning as a separate MS Word document. The ‘manuscript’ to be converted into an eBook comprised MS Word documents for the 32 chapters, an introduction and a glossary, together with a title page, copyright notice, acknowledgements and attributions. The chapters and introduction (though not the glossary) were referenced using the MS Word citations tool. In addition there were maps, infographics and plates and illustrations. These I decided to keep separate from the main text on the grounds that a reader would find it easier to access them from a central index than would be the case if they were embedded in individual chapters. My task was to transform this into an EPUB document that could in turn be converted to the Kindle-compatible KF8 format with Amazon’s KindleGen tool.

The first step was to convert each Word document into an HTML file. MS Word provides a ‘filtered HTML’ export option from .doc and .docx files, but unfortunately this still produces a considerable amount of junk. Indeed, many books recommend simply copying the contents of each Word document into a flat text file. I feel that this is throwing out the baby with the bathwater, as you will lose all of your formatting in the process.

I created a series of styles to cover all aspects of formatting in each chapter – one each for chapter heading, section headings within each chapter, and body styles. I used the styles to handle indentation and before and/or after line spacing. I entirely eliminated the use of tabs, spaces and carriage returns to accomplish this. The result is that when exported to HTML, the body text of each document will comprise a series of paragraphs that lend themselves to formatting with cascading style sheets. In an eBook, as in a website, formatting is carried out using classes contained in a .css file.
The next issue I faced was references, of which my book contained large quantities. In an eBook, the reader should be able to look up a reference by simply clicking a hypertext link. They can then return by either 1) clicking a link on the reference that takes them back, or 2) using the ‘back’ function on their e-reader. The first method requires additional HTML coding and has the problem that it will always return the reader to the same point regardless of how many times the particular reference is cited in the text. As I was constantly citing the same references multiple times, I decided that the second method, which is also the easier to implement, was the more suitable in my case.

While I was working on my book, I used Harvard-Anglia referencing (author(s), year; e.g. Smith, 2012) to save having to constantly look up what was being cited. However, the presence of large numbers of references cited in this style can interfere with the reading experience, so for the purposes of publication I switched to Nature referencing (as used in the science journal Nature), where the reference is assigned a number that refers to its entry in the bibliography. The references in my book are broken down by chapter, meaning that each chapter has its own bibliography. The methods I will describe apply to the referencing system I have just described, but they could be adapted for other systems if desired.

I first applied a style to the references. This served two purposes: firstly, the formatting could be again controlled through the cascading style sheets, and secondly it facilitated the attachment of hypertext links. To accomplish this, I created the following Word macro:

Sub ApplyCitationStyle()
    Dim stylename As String
    Dim exists As Boolean
    Dim s As Style
    Dim fld As Field

    stylename = "In-Text Citation"

    ' Check if the style already exists.
    exists = False
    For Each s In ActiveDocument.Styles
        If s.NameLocal = stylename Then
            exists = True
            Exit For
        End If
    Next

    ' If the style did not exist yet, create it as a character style.
    If exists = False Then
        Set s = ActiveDocument.Styles.Add(stylename, wdStyleTypeCharacter)
        s.BaseStyle = ActiveDocument.Styles(wdStyleDefaultParagraphFont).BaseStyle
        s.Font.Superscript = True
    End If

    ' Now that the style really exists, select it.
    Set s = ActiveDocument.Styles(stylename)

    ' Apply the style to all in-text citation fields.
    For Each fld In ActiveDocument.Fields
        If fld.Type = wdFieldCitation Then
            fld.Select
            Selection.Style = s
        End If
    Next
End Sub

The macro formats the references with a style called “In-Text Citation”, which results in them being displayed as superscripts. It isn’t actually necessary for it to do so, as you will have to implement superscripting with your cascading style sheets. The important point is that the references are now spanned by the style.

For each chapter document, I saved a copy and switched from Harvard-Anglia to Nature referencing before running the ApplyCitationStyle macro; I then inserted the bibliography at the bottom of the document using the Word ‘Insert Bibliography’ feature. For Nature referencing, this appears as a table, but I converted it to straight text and formatted it using a Word style. At the end of these steps, I had a series of ‘well behaved’ MS Word documents, one per chapter plus one for the introduction. These were ready for export into a series of HTML files, two per document, one to hold the text and the other the bibliography of that document.

From Word to HTML
Before beginning the conversion process, I set up a directory structure to hold my files. This would eventually form the backbone of my eBook:

1. Within a directory called ‘Build’, I created two subdirectories: ‘META-INF’ and ‘OEBPS’ (the subdirectory names are required by the EPUB standard; the name ‘Build’ was my choice);
2. Within ‘OEBPS’, I created three subdirectories: ‘content’, ‘css’ and ‘images’;
3. ‘css’ held the .css file (as one might expect);
4. ‘images’ was used to hold the JPEG or GIF image files associated with my work; I created subdirectories within it for each category of image: these were ‘maps’, ‘infographics’ and ‘pictures’, together with a subdirectory named ‘cover’ to hold the book cover JPEG file;
5. Within ‘content’, I created the subdirectories ‘text’, ‘references’ and ‘toc’, together with one directory for each category of images (i.e. ‘maps’, ‘infographics’ and ‘pictures’);
6. The ‘text’ subdirectory held the files making up the main body of the text, i.e. chapters, introduction, glossary, title page, copyright notice, acknowledgements and attributions;
7. The ‘references’ subdirectory held my bibliography files;
8. The ‘toc’ subdirectory held table of contents files, as will be discussed later;
9. The three image subdirectories within ‘content’ held the container files which display the image files (maps, infographics, and plates and illustrations) and accompanying explanatory texts.
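
For readers who would rather script this step than create the folders by hand, the layout listed above can be sketched in Python. This is only an illustration: the function name make_skeleton is hypothetical, and the demonstration runs in a temporary directory so that no real files are touched.

```python
from pathlib import Path
import tempfile

# Subdirectories described in the list above, relative to the 'Build' root.
SUBDIRS = [
    "META-INF",
    "OEBPS/css",
    "OEBPS/images/maps",
    "OEBPS/images/infographics",
    "OEBPS/images/pictures",
    "OEBPS/images/cover",
    "OEBPS/content/text",
    "OEBPS/content/references",
    "OEBPS/content/toc",
    "OEBPS/content/maps",
    "OEBPS/content/infographics",
    "OEBPS/content/pictures",
]


def make_skeleton(root):
    """Create the empty directory tree under root and return the created paths."""
    root = Path(root)
    created = []
    for sub in SUBDIRS:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for p in make_skeleton(Path(tmp) / "Build"):
            print(p.relative_to(tmp).as_posix())
```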

I was now ready to begin the export process and produce two HTML files for each document: one for the document text and one for the bibliography. For chapters, I used the naming convention ChxxN.htm and ChxxR.htm, where xx is the chapter number with leading zero and the suffixes ‘N’ and ‘R’ identify the chapter text and bibliography files respectively (‘N’ simply referred to the Nature referencing convention). Other main body text files I simply called by name, i.e. Introduction.htm, Glossary.htm, etc. The only bibliography file not following the ChxxR.htm convention was that pertaining to the Introduction; I called it IntR.htm. These conventions were purely of my own choosing, but the code described below is based on them. Using other naming conventions would require the code to be modified accordingly.
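
The naming convention is simple enough to capture in a couple of helper functions, which is useful if later steps are scripted. The following Python sketch is an illustration only; the function names are hypothetical.

```python
def chapter_files(chapter):
    """Return (text file, bibliography file) names for a chapter number."""
    if chapter < 1:
        raise ValueError("chapter numbers start at 1")
    stem = f"Ch{chapter:02d}"  # two digits, with leading zero
    return f"{stem}N.htm", f"{stem}R.htm"


def chapter_div_id(chapter):
    """Return the capitalised identifier for a chapter, e.g. CH05."""
    return f"CH{chapter:02d}"


if __name__ == "__main__":
    print(chapter_files(5))
    print(chapter_div_id(5))
```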

I exported each of the Word files to HTML by saving as ‘Web page, filtered’ and opening the resulting file in Notepad++. Each still contained a significant amount of junk, and I also needed to wrap double-quotes (“) around the CSS class names, which was readily accomplished by Search and Replace. Note that the CSS classes don’t necessarily have to have the same names as the corresponding Word styles, and it was possible to rename them at the same time as I added the double-quotes. For example, I renamed In-Text Citation to Citation.
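
The same quoting and renaming could be done with a short script instead of Search and Replace. The Python sketch below assumes the export writes class attributes unquoted (for example class=BodyText); the exported class names shown, including In-TextCitation, are assumptions, so check them against your own export before relying on the mapping.

```python
import re

# Hypothetical mapping from exported class names to the names used in the .css file.
RENAMES = {"In-TextCitation": "Citation"}

# Matches an unquoted class attribute such as class=BodyText.
CLASS_ATTR = re.compile(r"class=([A-Za-z][\w-]*)")


def quote_classes(html, renames=RENAMES):
    """Wrap unquoted class names in double quotes, renaming where requested."""

    def repl(match):
        name = match.group(1)
        return 'class="%s"' % renames.get(name, name)

    return CLASS_ATTR.sub(repl, html)


if __name__ == "__main__":
    print(quote_classes("<p class=BodyText>Text<span class=In-TextCitation>1</span></p>"))
```

Attributes that are already quoted are left alone, so the function can safely be run more than once on the same file.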

Next, I copied and pasted the formatted paragraphs and the bibliography from each HTML extract file to publishable HTML files set up using the following general template, taking care to ensure that encoding was set to UTF-8 for all files. Note that KindleGen will fail if this is not done.
Each document text file has the following format:

{my text heading}



[Body text copied from the export file goes here]





Where:
1. {my CSS file name} is the name of the css file (HFTB.css in my case)
2. {my div id} is a unique capitalised identifier, based on the name of the file, e.g. CH05, INTRODUCTION, GLOSSARY;
3. {my text heading} is the chapter name or name of the text as will appear in the eBook (e.g. 22: Of rice and men);
4. TOC.htm is a table of contents file for the main text, to be discussed below; the code provides a return hyperlink;
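
As a rough illustration of how the placeholders fit together, a chapter file could be generated along the following lines in Python. The markup in CHAPTER_TEMPLATE is an assumption made for this sketch (a heading, a wrapping div carrying the div id, a stylesheet link and a return link to TOC.htm), as are the relative paths, which follow the directory layout described earlier. It should not be read as the exact template used for the book.

```python
from html import escape
from string import Template

# Assumed layout: text files live in OEBPS/content/text, the stylesheet in
# OEBPS/css and the main table of contents in OEBPS/content/toc.
CHAPTER_TEMPLATE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>$heading</title>
<link rel="stylesheet" type="text/css" href="../../css/$css_file"/>
</head>
<body>
<div id="$div_id">
<h1>$heading</h1>
$body
<p><a href="../toc/TOC.htm">Contents</a></p>
</div>
</body>
</html>
""")


def render_chapter(heading, div_id, body_html, css_file="HFTB.css"):
    """Fill the template; body_html is the cleaned text from the export file."""
    return CHAPTER_TEMPLATE.substitute(
        heading=escape(heading),
        div_id=div_id,
        body=body_html,
        css_file=css_file,
    )


if __name__ == "__main__":
    print(render_chapter("22: Of rice and men", "CH22", '<p class="Body">Text</p>'))
```

When saving the result, pass encoding="utf-8" explicitly (for example with Path.write_text) so that every file ends up in the encoding KindleGen requires.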

Each bibliography file has the following format:

{ my chapter name }



[Bibliography copied from the export file goes here]





Where:
1. {my CSS file name} is the name of the .css file (HFTB.css in my case);
2. {my chapter name} is the title of the chapter;
3. CH{chapter number} is a four-character text string corresponding to the chapter number with leading zero (e.g. ‘CH05’), so that the identifier of the reference section is, for example, ‘CH05REF’ (the reference section for the introduction is ‘INTREF’);
4. RefTOC.htm is a table of contents for the bibliography, to be discussed below; the code provides a return hyperlink.

At this stage, I had two HTML files, ChxxN.htm (chapter text) and ChxxR.htm (bibliography) for each chapter (xx = chapter number with leading zero). Unfortunately, as noted above, the files still contained random junk, which I had to identify and remove by manual editing.

Commonly-occurring junk includes:
1. Unwanted spaces and other blank characters preceding and within HTML tags, and following after HTML close tags;
2. Unwanted tags, leading to non-well-formed HTML;
3. Unwanted Style attributes.
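
Some of this clean-up can be automated before the manual pass. The Python sketch below handles two of the items listed above, style attributes and stray whitespace; it assumes paragraphs are marked up with p tags, and it will not repair badly formed HTML, which still needs to be checked by eye.

```python
import re


def strip_junk(html):
    """Remove inline style attributes and tidy whitespace in exported HTML."""
    # Drop style attributes, whether single- or double-quoted.
    html = re.sub(r"\s+style=(\"[^\"]*\"|'[^']*')", "", html, flags=re.IGNORECASE)
    # Collapse line breaks and runs of spaces into single spaces.
    html = re.sub(r"\s+", " ", html)
    # Trim whitespace just inside paragraph tags.
    html = re.sub(r"(<p\b[^>]*>)\s+", r"\1", html)
    html = re.sub(r"\s+(</p>)", r"\1", html)
    return html.strip()


if __name__ == "__main__":
    sample = (
        "<p class=\"Body\"  style='margin-left:0cm'>\n"
        "  Hand-axes   were\nmade <span style=\"color:black\">here</span>. </p>"
    )
    print(strip_junk(sample))
```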

I now needed to establish hyperlinks from the citations in chapter text files to the corresponding references in the bibliography files. To this end, I used Regular Expression (RegExp) search and replace terms in Notepad++.

For each set of chapter text and bibliography files, I proceeded as follows:
1.      Open the bibliography file in Notepad++;
2.      Go to Search/Replace and select Regular Expression mode;
;
9.      Enter the replace string Citation”>$1
where xx = chapter number, with leading zero (e.g. CH05);
10.  Click Replace All;
11.  Enter the search string Citation”>(\d+),;
12.  Enter the replace string Citation”>$1, where xx = chapter number, with leading zero (e.g. CH05);
13.  Click Replace All;
14.  Enter the search string (\d+)
,(\d+)\;
15.  Enter the replace string $1
,$2 where xx = chapter number, with leading zero (e.g. CH05);
16.  Click Replace All repeatedly until you receive the message “Replace: All 0 occurrence was replaced” [sic];
17.  Save the file;
With the above set of processes completed for each of my MS Word chapter files, I had completed the process of exporting the main manuscript to HTML.
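
The linking of citations can also be approximated in a script. The Python sketch below is not the Notepad++ procedure described above: it assumes each citation is a span with class Citation containing one or more comma-separated numbers, and it assumes each bibliography entry carries an id of the form ref12, which is a hypothetical naming scheme chosen for the example.

```python
import re

# A citation span containing only digits, commas and spaces, e.g. 3,14.
CITATION = re.compile(r'(<span class="Citation">)([\d,\s]+)(</span>)')


def link_citations(html, chapter):
    """Turn each citation number into a link to the chapter's bibliography file."""
    target = f"../references/Ch{chapter:02d}R.htm"

    def repl(match):
        numbers = [n.strip() for n in match.group(2).split(",") if n.strip()]
        links = [f'<a href="{target}#ref{n}">{n}</a>' for n in numbers]
        return match.group(1) + ",".join(links) + match.group(3)

    return CITATION.sub(repl, html)


if __name__ == "__main__":
    print(link_citations('<p>Text<span class="Citation">3,14</span></p>', 5))
```

Because a span that already contains links no longer matches the pattern, running the function twice does not nest the links.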

Logical and Physical TOCs
A Kindle eBook has two tables of contents (TOC): a logical TOC and a physical (or HTML) TOC. The logical TOC allows readers to navigate between chapters when using a Kindle e-reader. The exact implementation depends on the device used, but in general the reader will be presented with a list of the book’s contents and will be able to navigate to the chapter of their choice. The physical TOC, on the other hand, will be encountered when the reader pages through the book from the beginning. Just where it occurs is up to the publisher, but I located it after ‘Acknowledgements’ and before ‘Introduction’ near the start of the book. It serves the same purpose as the logical TOC, allowing the reader to navigate to the chapter of their choice. Unlike the logical TOC, it cannot be summoned on demand, other than via the logical TOC itself.

In the EPUB 3.0 standard, the logical and physical TOCs can be accommodated in the same HTML file. Previous implementations required the logical TOC to be placed in a separate .ncx file, in which the order of appearance for each item has to be coded explicitly. This means a simple re-ordering of the content requires recoding every single entry, which is tedious to say the least. For this reason, I adopted the EPUB 3.0 standard although EPUB 2.0 was suitable in every other respect.

In my implementation, both TOCs were accommodated in a file named TOC.htm, which resides in the content/toc subdirectory. In principle, both TOCs should also be able to share the same code but in practice this was found to cause problems with some implementations.

The TOC.htm file has the following format:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xml:lang="en">

Contents



[Physical TOC goes here]

[Logical TOC goes here]










The physical TOC consisted of a series of entries, one for each item directly referenced from it:

Where:
1. {my CSS class} is a CSS class to format the line;
2. {file name} is the target file name including extension, e.g. Ch05N.htm;
3. {file div id} is the div id (see above) of the target file, e.g. CH05;
4. {text description} is the heading text of the target file (see above), e.g. “27: An enigmatic civilisation”.


The logical TOC comprises an ordered list enclosed within a nav element:


[Logical TOC entries go here]




The logical TOC entries take the form:


In theory there is no reason why the ordered list could not serve as the physical as well as logical TOC. It should be possible to suppress the (unwanted) automatic numbering that appears on an ordered list; however on some Kindle e-reader implementations this does not work, and the automatic numbering still appears.
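
Keeping two lists in step by hand is error-prone, so both could be generated from a single list of entries. The Python sketch below is an illustration under stated assumptions: the nav element with epub:type="toc" is the EPUB 3 convention, but the paragraph markup for the physical TOC, the class name TOCEntry and the relative hrefs are hypothetical.

```python
from html import escape


def logical_toc(entries):
    """Build an EPUB 3 nav element from (href, title) pairs."""
    items = "\n".join(
        f'    <li><a href="{escape(href)}">{escape(title)}</a></li>'
        for href, title in entries
    )
    return '<nav epub:type="toc" id="toc">\n  <ol>\n' + items + "\n  </ol>\n</nav>"


def physical_toc(entries, css_class="TOCEntry"):
    """Build one styled paragraph per entry for the visible contents page."""
    return "\n".join(
        f'<p class="{css_class}"><a href="{escape(href)}">{escape(title)}</a></p>'
        for href, title in entries
    )


if __name__ == "__main__":
    entries = [("../text/Ch27N.htm#CH27", "27: An enigmatic civilisation")]
    print(physical_toc(entries))
    print(logical_toc(entries))
```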

Other TOCs
The logical and physical TOCs described above are mandated (or at least highly recommended) by Amazon and, for my eBook, provide navigational access to all the items in the content/text directory. The EPUB standard provides for the nesting of TOCs so that, for example, an entry marked ‘References’ could be expanded into the bibliography list by chapter. Unfortunately, the Kindle platform does not support nesting for the logical TOC, but there is no restriction on using additional physical TOCs.

Accordingly, I provided four additional TOCs: one for accessing the bibliography (RefTOC.htm), one for accessing the maps (MapTOC.htm), one for accessing the infographics (FigTOC.htm), and one for accessing the plates and illustrations (PicTOC.htm). All were in turn accessible from the main TOC.

The RefTOC.htm file has the following format:
References



[Bibliography entries go here]




The bibliography entries have the following format:


Where:
1. {my CSS class} is a CSS class to format the line;
2. {chapter no.} is the chapter number with leading zero (for the introduction the target bibliography file is IntR.htm);
3. {chapter title} is the heading text of the target bibliography file, e.g. “27: An enigmatic civilisation”.

The MapTOC.htm, FigTOC.htm and PicTOC.htm files follow the same general format as RefTOC.htm and hyperlink to the container files which display the book’s visual matter. As with the bibliography files, the container files contain return hyperlinks to their respective TOCs.

The Glossary
I provided a glossary in which commonly-encountered terms were defined and explained. Entries could be accessed alphabetically from within the glossary via a hyperlinked alphabet, or via hyperlinks in the main body of the book’s text. Unfortunately, there was no quick and easy way of doing this and it was necessary to insert the relevant links on an individual basis. Again, as most glossary items were multiply accessed, no return link was provided and the reader returns by use of the ‘back’ function on their Kindle e-reader.

The content.opf file
The content.opf file sits in the OEBPS directory and is the central file of an EPUB package. It defines the structure of the eBook and holds its metadata. Very briefly, it contains four sections: the metadata section, manifest section, spine section and guide section. The metadata section provides metadata, which is essentially data about data rather than content. In this implementation, metadata is supplied for the ISBN number, book title, author name, publisher name, date of publication, book description, and subject. The manifest section provides a list of paths, identifiers and properties for each file in the package; and the spine section lists in order of appearance the identifiers for each file in the package, thus defining the order in which they would appear were the reader to page through the entire book.

The  content.opf file has the following format:

                       
urn:isbn:{my ISBN no.}
15
en-gb
{my book title}
{author name}
aut
{my publisher}
{yyyy-mm-dd}
{yyyy-mm-dd}T{hh:mm:ss}Z
{a brief description of the book}
{my subject 1}
{my subject 2}
{my subject 3}

           

[One entry for each of the content files from the content/text/ subdirectory]

[MapTOC]
                       
[One entry for each of the container files from the content/maps/ subdirectory]

[One entry for each of the map image files (in all cases, {type} = JPEG)]

[FigTOC]
                       
[One entry for each of the container files from the content/infographics/ subdirectory]

[One entry for each of the infographic image files (in all cases, {type} = GIF)]

[PicTOC]

[One entry for each of the container files from the content/pictures/ subdirectory]
                       
[One entry for each of the picture image files (in all cases, {type} = JPEG)]

[RefTOC]

[One entry for each of the bibliography files from the content/references/ subdirectory]




[One entry for each of the HTML and image files in the package; {id ref} will be either the div id for the HTML files or the unique id based on image name for the image files]
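
With dozens of files, the file list and the reading-order list are tedious to type, and could instead be generated from one ordered list of files. The Python sketch below uses the standard OPF item and itemref elements; the ids, paths and the file name cover.jpg are placeholders invented for the example. Note one deliberate choice: the sketch places only HTML files in the spine, on the assumption that images are reached through their container files, so adapt it if your build differs.

```python
from pathlib import PurePosixPath

# Media types for the kinds of file used in this package.
MEDIA_TYPES = {
    ".htm": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
}


def media_type(href):
    return MEDIA_TYPES[PurePosixPath(href).suffix.lower()]


def manifest_item(item_id, href, properties=None):
    """One manifest entry; properties is e.g. 'nav' for the TOC file."""
    props = f' properties="{properties}"' if properties else ""
    return f'<item id="{item_id}" href="{href}" media-type="{media_type(href)}"{props}/>'


def spine_itemref(item_id):
    return f'<itemref idref="{item_id}"/>'


def build_sections(files):
    """files: (id, href, properties) tuples in reading order."""
    manifest = [manifest_item(*f) for f in files]
    spine = [
        spine_itemref(item_id)
        for item_id, href, _ in files
        if media_type(href) == "application/xhtml+xml"
    ]
    return (
        "<manifest>\n  " + "\n  ".join(manifest) + "\n</manifest>",
        "<spine>\n  " + "\n  ".join(spine) + "\n</spine>",
    )


if __name__ == "__main__":
    files = [
        ("TOC", "content/toc/TOC.htm", "nav"),
        ("CH05", "content/text/Ch05N.htm", None),
        ("cover-image", "images/cover/cover.jpg", "cover-image"),
    ]
    for section in build_sections(files):
        print(section)
```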




Building the package
At this point, the package was almost complete. It remained only to add the container.xml file to the META-INF directory and the mimetype file to the Build directory.
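The container.xml file simply tells an eReading device where to find the content.opf file, and it has the following format:
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
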
The mimetype file defines the file as an EPUB and ZIP file. It is a text file with no file extension containing a single line of text:
application/epub+zip
The next step was to zip the whole package up into an EPUB file and then convert this to a Kindle-compatible MOBI/KF8. In the Build directory, I set up two .bat files; compress.bat and mobimake.bat. I also installed the Zip.exe utility in the Build directory.

Format of compress.bat:
zip {my book}.epub -DX0 mimetype
zip {my book}.epub -rDX9 META-INF OEBPS
Format of mobimake.bat:
{my KindleGen path}/kindlegen.exe {my book}.epub>errors.txt

Having set up these files, I ran compress.bat to produce an EPUB file, which in my case was called hftb.epub. Before converting this to the MOBI/KF8 format, it was necessary to check it for errors. There are many free websites that will upload and validate an EPUB file; I used this one: http://validator.idpf.org/ (note that it does not have a www. prefix). Once I had ironed out the inevitable errors, it was time to take the final step and run mobimake.bat. This created the MOBI/KF8 file, which in my case was called hftb.mobi. It also outputs the file errors.txt, which lists errors, warnings and approximate deliverable file size. This latter figure forms the basis of the ‘delivery fee’ that is charged by Amazon to the author against the royalty payment for each sale.
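
Where Zip.exe is not to hand, the packaging done by compress.bat can be approximated with Python's standard zipfile module. This is a sketch under the same layout assumptions as above; the function name build_epub is hypothetical. The essential points are the ones the two zip commands encode: the mimetype file goes in first and uncompressed, everything else is compressed, and no directory entries are written.

```python
import tempfile
import zipfile
from pathlib import Path


def build_epub(build_dir, epub_path):
    """Zip the package so that mimetype is the first, uncompressed entry."""
    build_dir = Path(build_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Stored (not deflated), and written before anything else.
        zf.write(build_dir / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for folder in ("META-INF", "OEBPS"):
            for path in sorted((build_dir / folder).rglob("*")):
                if path.is_file():  # files only, no directory entries
                    zf.write(
                        path,
                        path.relative_to(build_dir).as_posix(),
                        compress_type=zipfile.ZIP_DEFLATED,
                    )
    return Path(epub_path)


if __name__ == "__main__":
    # Build a tiny dummy package in a temporary directory to show the call.
    with tempfile.TemporaryDirectory() as tmp:
        build = Path(tmp) / "Build"
        (build / "META-INF").mkdir(parents=True)
        (build / "OEBPS").mkdir()
        (build / "mimetype").write_text("application/epub+zip", encoding="ascii")
        (build / "META-INF" / "container.xml").write_text("<container/>", encoding="utf-8")
        (build / "OEBPS" / "content.opf").write_text("<package/>", encoding="utf-8")
        epub = build_epub(build, Path(tmp) / "demo.epub")
        with zipfile.ZipFile(epub) as zf:
            print(zf.namelist())
```

The resulting file should still be run through an EPUB validator before conversion, exactly as with the batch-file route.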

Testing and release
With the book now built, the final stage was testing. This entailed individually testing every single hyperlink in the book and also checking for formatting errors. This was a laborious procedure for a long book, but I viewed it as essential. A small number of issues were identified, which were easy to fix but would have made the book seem less polished as a product had they been allowed to remain. Testing was carried out on four platforms: a dedicated Kindle device, an iPhone, a Nexus 7 Android tablet and a PC. Full testing was carried out only on the last of these.
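
A script cannot replace testing on real devices, but it can catch dead links before that stage. The Python sketch below scans the .htm files under a directory and reports internal links that point to a missing file or a missing id. It is an illustration only, not the testing procedure described above; it assumes links are relative a href attributes and ignores external URLs.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag
import tempfile


class LinkCollector(HTMLParser):
    """Record every id attribute and every <a href> found in one file."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "href" and tag == "a":
                self.hrefs.append(value)


def scan(path):
    collector = LinkCollector()
    collector.feed(path.read_text(encoding="utf-8"))
    return collector


def broken_links(root):
    """Return sorted (page, href, reason) tuples for internal links that fail."""
    root = Path(root).resolve()
    pages = {p: scan(p) for p in root.rglob("*.htm")}
    problems = []
    for page, info in pages.items():
        name = page.relative_to(root).as_posix()
        for href in info.hrefs:
            if "://" in href or href.startswith("mailto:"):
                continue  # external links are out of scope here
            target, fragment = urldefrag(href)
            target_path = (page.parent / unquote(target)).resolve() if target else page
            if not target_path.is_file():
                problems.append((name, href, "missing file"))
            elif fragment and target_path in pages and fragment not in pages[target_path].ids:
                problems.append((name, href, "missing id"))
    return sorted(problems)


if __name__ == "__main__":
    # Tiny demonstration package with one deliberately broken link.
    with tempfile.TemporaryDirectory() as tmp:
        text = Path(tmp) / "text"
        text.mkdir()
        (text / "Ch01N.htm").write_text(
            '<div id="CH01"><a href="Ch02N.htm#CH02">next</a></div>', encoding="utf-8"
        )
        print(broken_links(tmp))
```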

On 4 March 2014, I uploaded Humans: from the beginning to the Amazon Kindle bookshelf, and it has been on sale ever since. In the weeks that followed, a few errata emerged, highlighting another advantage of the eBook over its printed counterpart. It was simply necessary to make corrections and upload the corrected version. Previous purchasers receive this free of charge, similar to the way that apps on a smartphone are periodically updated. I added a ‘release note’ with the updated version as a new page located after the bibliography.
