Abstract: The first part of the paper identifies disadvantages of first generation Web publishing solutions that have to be overcome for professional publication providers. Using Hyper-G for distribution of electronic documents opens the way to the first fully-integrated professional publishing solution on the Web. User and group access rights as well as billing mechanisms are integrated into the server, links are bidirectional and annotations to existing documents are possible without changing the contents of documentsi themselves. Hyperlinks are supported in arbitrary document types besides hypertext which opens the way to new publishing paradigms beyond the paper-based approach. Naturally, a rich set of structure and search mechanisms is provided. On this basis, a set of tools has been developed that almost fully automatically supports electronic refereeing, automatic hyperlink creation, glossary links and table of contents generation. All the data prepared on a Hyper-G server can then be simply exported to CD-ROM, allowing hybrid Web/CD-ROM publications to be created without any additional effort.

Key Words: electronic publishing, WWW, Hyper-G, HyperWave, automatic hyper-link creation, electronic refereeing

___________________

*1 This paper was also presented at WebNet '96: The First World Conference of the Web Society, San Francisco, CA, October 1996

1 Motivation

Electronic publishing is today one of the booming branches on the Web. Web surfers will find lots of electronic paperware on their ride through the servers. However, most of the Web sites that serve electronic publications are run by universities and only a few are operated by publishing companies on an evaluation basis free of charge.

Having a closer look at the way electronic publishing is done today, one will mostly find HTML or PDF documents that are very similar to their paper-based counterparts. Often a search engine is provided to make location of interesting papers easier, but other benefits of publishing electronically are mostly neglected. For the reader of electronic publications, nearly no value is added compared to paper-based articles. Worse than that - considering HTML documents - the possibility of making high quality printouts for archival purposes is lost. This is surely not enough to make electronic publishing on the Web a success.

Distributing electronic documents on the Web is considered to be cheaper than publishing on paper [Odlyzko 95], additionally turnaround time from submission of an article to the final published version is considered to be sig- nificantly shorter than for paper-based publications. As experience with J.UCS [Section 5] shows, this is certainly not true for high quality publications that

Page 650

are refereed. In this case refereeing, corrections by the authors and resubmission are the most time-consuming stages of document preparation. Once the paper is ready for publication, hyperlinks have to be inserted in the electronic version, which is still done by hand in first generation systems because appropriate tools are missing. Both refereeing and hyperlink creation can be sped up when using the right tools as can be seen later.

Hyperlinks in first generation Web systems are represented by URLs (Uniform Resource Locators) [Berners-Lee et al 94] that are unidirectional and embedded in documents. This approach implies that links can easily point to nowhere (the "this document has disappeared syndrome"). Using URNs (Uniform Resource Names) instead of URLs is the solution to this problem. URNs no longer point to the physical location of a document on a server, but add another level of indirection by defining unique location independent names for documents.

Embedding hyperlinks in documents has the disadvantage that only document formats designed to be hyperlinkable, such as HTML, allow navigation through cyberspace. Unfortunately, a lot of formats used for different document types were designed long before anybody knew about the Web. This includes all image formats like TIFF, GIF, JPEG, [Murray 94], it also includes all video formats such as MPEG as well as one of the most widespread formats for professional publishing: PostScript [Adobe 90]. The way out of this dilemma is to have a separate link database and consider links to be overlays. This automatically makes all document formats hyperlinkable because documents and links are handled separately.

Structuring and classification of documents in standard first generation Web servers has to be hand-coded using HTML documents. For this reason, most servers have poor structure and no document classification because it is too much work to provide it. This makes navigation rather difficult [Andrews 95], because after a having followed a few hyperlinks, the readers no longer know where they are (the "lost in hyperspace" syndrome). Once a reader has found an interesting paper, it is absolutely necessary to immediately store a bookmark. If the way to the paper was longer than around 3 mouseclicks it is nearly impossible to remember the way to find it the next time it is needed. One of the really severe problems of electronic publishing today is the lack of accounting and billing mechanisms as well as user access rights [Maurer 94]. For this reason, electronic publications on the Web are mostly free of charge which is not the way publishing companies operate.

Very often it would be interesting for server operators to have statistical information on a user session basis to be able to follow users on their way through a server and optimize their offer. As an example, one could find out that 60% of the readers from overseas are leaving the server after having followed 4 hyperlinks. Having a closer look it could turn out that the document after the 4th hyperlink contains a huge inline image that is unacceptable for an overseas data transmission and that this document has to be changed.

2 Hyper-G as a Basis for Electronic Publishing

All the problems and difficulties mentioned above have lead to the development of a Web server with a completely new underlying concept compared to first

Page 651

generation Web systems: Hyper-G [Maurer 96]. Hyper-G now really allows professional electronic publishing to be done on the Web and enters a new era of electronic publications in terms of usability and content. Up to now, electronic publishing has been done either on the Web or on CD- ROM basis. Using Hyper-G one can create hybrid Web and CD-ROM publication without additional effort as has been successfully proven over the last two years with J.UCS, the Journal of Universal Computer Science by Springer Pub. Co. [Maurer 94]. The only thing publishers have to do to produce a CD-ROM is to prepare the data on a Hyper-G server and then export the whole collection tree or parts of it.

The kernel of the Hyper-G server is an object-oriented distributed network database with a separate link database. Information structure as well as document meta-information are a basic part of the concept [Kappe 91]. This makes it possible to present the user with a seamless world-wide structured information space across server boundaries.

Document meta-information such as author, title, keywords, creation date, modification date as well as expiry date and many more support the Web surfer in getting as much information as possible. Naturally, document meta-information is searchable in addition to the fulltext search facility. The scope of searches is user definable and can be one small part of one server or even the whole content of all servers worldwide in one single operation. Even when performing searches on multiple servers it is not necessary to know about the server addresses. More than that: meta-information cannot only be applied to documents but also to hyperlinks! This means that links can have types, such as annotation links, inline links, also version links for documents where multiple versions exist and many more.

Hyper-G servers do not only provide read access but write access is also possible. Read and write access to documents are controlled on a user and group access rights basis and billing is integrated into the server. This concept opens completely new perspectives of electronic publishing: having the hyperlinks in a separate link database makes every document hyper-linkable even when the document format does not allow links [Maurer 96]. All links in Hyper-G are bidirectional, making it possible to not only follow the links pointing from one document to another but also to see links referencing a document and follow them in reverse direction. Being able to examine the neighbourhood of a paper makes it possible to find other interesting papers on the same topic that very likely are difficult if not impossible to locate if only unidirectional links were possible as is the case with first generation Web servers. Hyperlinks in arbitrary document types such as PostScript, images, movies, 3D scenes and even sound make navigation in hyperspace easy and a structured hierarchical view of the database with location feedback helps overcoming the "lost in hyperspace" syndrome.

With this approach, electronic publications need no longer be text based with some multimedia add-ons. Instead authors can choose the document type most suitable for the topic without loosing important hypernavigation features. As an example, a paper about new chemical structures could consist of 3D models of molecules that are clickable. The hyperlinks could then lead to spectrum images that are then linked to some additional text based explanations in e.g. PDF [Adobe 93]. A video of an experiment, naturally again with hyperlinks to explanations, could complete the presentation.

Page 652

All the documents in the example above would carry meta-information such as keywords and therefore could be easily located in a search. Acceptance of electronic publications is highly dependent on their quality. For electronic publishing, quality does not only mean high quality contents, which can be assured by an appropriate refereeing process. Stability of electronic publications is at least as important. Technically it is easy to change electronic papers after publication but this is unacceptable. Instead, Hyper-G's annotation and versioning mechanisms can be used to alert the reader of new results or errata. In this case the paper is not changed at all, only additional information is added to the paper. Therefore all citations of the paper that existed for the original version are still valid and the reader can choose to browse annotations and newer versions of the paper on demand.

Annotations in Hyper-G are hyperlinks pointing to the document that is annotated. Since Hyper-G's links are bidirectional the reader simply follows an annotation link backwards to read the annotation. The use of URNs in a link database instead of URLs embedded in documents guarantees that the annotation links are stable. This means that an annotated document can be moved around on the server or even from one server to another without generating annotations that point to nowhere. All links that pointed to the document before are then pointing to the document at its new location. As has been mentioned above, refereeing is one of the very time consuming processes of publishing. Using annotations, electronic refereeing can be performed easily: papers are inserted in the Hyper-G server with read access only for the referees. The referees then comment the papers using the annotation mechanism. If desired, annotations can also be made readable for the author, so the author is able to react immediately on the referees' comments. More than that - the author himself could also annotate the referees' comments to clarify misunderstandings. Of course the author as well as the referees remain anonymous [Maurer 95]. Saving the time taken sending documents back and forth between referees, authors and editors, as well as being able to do corrections in papers while refereeing is still in progress can shorten the refereeing process significantly.

Amongst other structuring elements, Hyper-G supports the concept of clusters. A cluster contains several documents that are related to each other and therefore should be viewed together. In the example given earlier in this paper, the 3D molecular models could be clustered together with an explanatory text. In this case the user would get the 3D model in one window together with the explanatory text in another window.

Clusters are also used to serve multilingual documents. Documents in different languages are put together and readers then only get the documents matching their language preferences. In first generation Web systems the only possibile way to have multilingual documents was to let the user choose the language on the entry page and then follow different paths through the server for different languages. This approach caused a lot of work for server operators and readers had no chance to change the language on the fly. With Hyper-G only one path through the server has to be maintained and the readers can switch back and forth between multiple languages whenever they like.

Page 653

3 Professional Tools for Publishers

So far the discussion has been about the technical design of Hyper-G and the resulting possibilities arising when using it for distribution of electronic documents. But that's not really all: these features can also be helpful for internal purposes such as collecting all the versions of a paper from the original submission over intermediate corrected versions to the final published paper. The complete refereeing process including all different versions of papers together with the referees' comments can be kept on the same server that is used for distribution. Supported by Hyper-G's access control mechanisms, the publisher defines what a reader can see. As an example, subscribers would only see the published versions of papers, while referees would also see intermediate versions of the paper they are currently refereeing. The editor in chief and the responsible staff would have access to all the information including refereeing forms and internal information. Referees would not have the rights to change a paper they are refereeing, but they would have the rights to make annotations. If desiredi, the author could get the right to also annotate the referees' annotations to clarify misunderstandings - naturally both referees and author remaining anonymous. For the whole process from administration to final document preparation, a set of tools and fill-out forms has been developed that can be adopted according to the publishers' needs. The following paragraphs deal with some of the more sophisticated tools that have been developed to provide a cost effective way for publishers to add value to electronic papers.

As mentioned above, it is desirable to keep older versions of documents for archival and maintenance purposes. For this reason a special versioning tool was developed that uses Hyper-G's cluster mechanism: different versions of a document are clustered together and the reader can switch back and forth between different versions on the fly. Whenever a new version of a document is inserted a special link migration tool parses all outgoing hyperlinks in the older document and inserts them at the appropriate location in the new document. Only outgoing links are considered during migration because incoming links should not be touched - they are normally comments for exactly the version of the paper they are pointing to.

Considering a paper one will find a lot of so-called vocative links. A vocative link is a textual pointer to a location such as "see also page nn". In scientific papers one will normally find a references section with pointers to other publications - again a typical example of vocative links.

These considerations about vocative links have lead to the development of an automatic vocative link creation tool. The tool is based on a so called "Vocative Link Creation Language" (VLCL) that was designed to support the description of contexts in documents and to find potential hyperlinks in it. To spare the reader from too many details, here is a small example:

Consider a piece of text with the phrase ...details can be found in [Moser 95].... The tool will identify this phrase to be a vocative hyperlink leading to the references section and insert a link to the references here. Parsing the references it finds the entry [Moser 95] Moser J., "The Art of PostScript programming", available at http://www.iicm.edu/app, Dec. 95. The URL will be identified by the tool and a link to the according document will be inserted. Since VLCL is a programming language the behaviour can even be controlled to the extent that the link to the references is not created but leads directly to the location instead.

Page 654

If this sounds complicated - VLCL was designed to describe such rules in a very compact and high level manner. Normally VLCL programs having the functionality described above are not longer than around 30 -- 40 lines. Furthermore they normally have to be written only once because journals each have their own well-defined citation rules that do not change much over time. In addition, citation rules do not vary much between journals so that an existing VLCL program can be slightly modified and will then suit the needs of another journal. A different tool for automatic hyperlink creation deals with glossaries. A glossary in Hyper-G is defined as an arbitrary collection of explanatory documents that are classified by their titles and keywords. The glossary link creation tool accepts arbitrarily many glossary collections and automatically interlinks items in papers with glossary entries. The links created get the special type glossary so that the reader can turn them on and off seperately when needed. The glossary link creation tool works at the moment for HTML documents; PDF and PostScript support is under development.

Feedback about the behaviour of readers is necessary to improve the quality of a server. Standard statistics tools today are able to count the number of accesses to a document and give information about the location of the reader. Due to Hyper-G's user session concept, a lot more information can be extracted from the logfiles. For this reason a specialized statistics tool was developed that is also able to trace the readers' way through the server from the beginning to the end of a session.

Using the session-oriented statistics, the information provider can see the "typical" path of a reader through the server and find out about specific problems that arise. As an example, the server operator could see that the majority of users are quitting a session when downloading a certain document. In this case the information provider could examine the document and perhaps find out that it contains a huge inline image that takes too much time to be downloaded. Another situation could be that link references are misunderstood and the readers get a page they don't want.

It can also be analysed whether the structure on the server is easy to handle for the reader. Items that are often searched instead of accessed directly are very likely not easy enough to reach. Parts of the structure that are never accessed also alert the operator that something must be wrong there. Analysing access and path statistics carefully and inserting appropriate links or restructuring the server accordingly helps a lot to ensure proper quality.

4 Making Life Easier for the Reader

Up to now discussion has been about how to add as much information as possible to documents. One of the points was to interlink documents to the maximum extent possible by adding interand intra-document links as well as glossary hyperlinks and more. Using the tools described above this can be done with minimum effort and stability of hyperlinks is guaranteed by Hyper-G. But there is a negative aspect too: too many hyperlinks also means too many color changes which disturbs the reading flow.

As has been discussed earlier Hyper-G supports different kinds of hyperlinks: annotations, glossary links, referential hyperlinks, inline links, texture links and many more. To avoid too many disturbing color changes, links of different types

Page 655

can be turned off and on seperately when needed. Another possibility that is only implemented in Harmony at the moment is to show different kinds of links in different colors (Harmony is one of the very specialized Hyper-G authoring tools). Utilizing this feature, the reader would know about the type of a link even before following it.

As has been mentioned before grouping of documents in clusters is one of the basic concepts in Hyper-G. One of the applications of clusters is it to provide multilingual versions of the same document. Naturally it is normally not possible to translate every single document on a server to several different languages. On the other hand it is not too much effort to translate neuralgic and mostly static parts of the data such as glossaries. In reality, if there are multilingual glossaries existing it normally does not even help too much to translate the papers as well.

Consider an international journal: the language of papers will most likely be English and readers will understand English. The only problem is that some very topic-specific words will be found in the papers, in which case readers can use the glossary to look up unknown words. If the readers were not native English speakers and the glossary also contained a difficult explanation they might be forced to follow some more links between glossary entries until they finally understood the meaning. For this group of readers, a translation of the first entry would very likely help a lot. And there are certainly a high number of non-native English speakers reading English papers.

An all-time-important aspect on the Web is transmission speed. Long distance data transmissions on a slow transmission rate are annoying and in the worst case, electronic publications will not be accepted by the reader for the lack of availability. There are two ways out of this dilemma that can be done simultaneously: providing documents of different size and quality as well as mirroring and caching documents.

For providing documents of different size again Hyper-G's cluster concept can be used. For instance, a cluster could hold several verions of the same document: one in HTML format with small inline images for quick browsing, a second, better version could also be HTML but with high quality images and finally a third, the professional version, could be PostScript with 600 dpi resolution including 600dpi true color images. The reader decides which level of quality to get once per session and the rest is done automatically throughout the whole session. Of course, readers can also change their choice on the fly.

More than that - consider the case of an HTML document with different sets of inline images - small ones with poor quality and huge professional ones. On standard first generation Web servers two different versions of the HTML document must be provided with different links. On Hyper-G links do not point directly to the images but instead to the clusters! In this case the HTML document only has to be prepared once and the inline images automatically change according to the readers' choices, even on the fly if desired! The other well known solution to overcome the transmission speed problem is it to cache and mirror documents and of course Hyper-G has the standard proxy functionality implemented. Besides that, a sophisticated document replication mechanism [Kappe 95] is implemented in Hyper-G.

Document replication in Hyper-G is understood to be a special kind of mirroring with the mirrored documents knowing about the original. This is possible by giving the documents a replica identification that matches the global object Id of the original document. The global object Id of documents in Hyper-G is a

Page 656

world wide unique 64 bit identifier. The effect is that documents can be mirrored to the local site without users having to know about it. If they try to access one of the original documents on the remote server or even one of the documents on another mirror site they automatically get the replicated document from the local site instead of doing a long distance data transfer.

5 Current Electronic Publications With Hyper-G Technology

-- The first electronic journal based on Hyper-G was J.UCS - the Journal of Universal Computer Science by Springer Pub. Co. It is a monthly journal covering all knowledge areas of computer science. In addition to the Web version, a yearly CD-ROM and printed version are provided by Springer. Papers in J.UCS appear in two parallel formats: hypertext and hyperlinked PostScript, PDF is planned for 97. See [http://www.iicm.edu/jucs] - the master J.UCS server - for more details and information about mirrors.

-- Academic Press decided to distribute JNCA - the Journal for Network and Computer Applications (former JMCA and JMA) - electronically using Hyper-G starting in January 96. JNCA can be found at [http://www.iicm.edu/jnca], mirrors are in preparation.

-- One of the most reputable journals in physics, FBS - Few Body Systems - also by Springer - started regular electronic service in January 1995 using Hyper-G. Find FBS at [http://www.iicm.edu/fbs].

-- The German bible of Data structures - Datenstrukturen by Ottmann and Widmeyer - is published electronically on a Hyper-G server using hyperlinked PostScript. At the moment the German version is available at [http://www.iicm.edu/datenstrukturen].

-- Since this paper excessively deals with Hyper-G - naturally also the "Hyper-G bible" called "HyperWave - The Next Generation Web Solution" published by Addison Wesley is available electronically at [http://www.iicm.edu/hgbook] (Note: HyperWave is the product name of the software whereas Hyper-G is the name for the underlying technology). Check out [http://www.iicm.edu/hyperwave] for detailed information about Hyper-G technology and the HyperWave product line.

-- In additional to the Hyper-G bible, Addison-Wesley decided to publish some 30 books electronically on the Web using Hyper-G. Some of them are already under preparation. As soon as they are released they will be found in the IICM electronic library [http://www.iicm.edu/electronic_library].

-- One of the most comprehensive German encyclopedias - Meyer's Lexikon - is electronically available via Hyper-G on an n-user license basis - have a quick look at [http://www.iicm.edu/ref.m10] if there is something you always wanted to know.

References

[Adobe 90] Adobe Systems Inc.: "PostScript Language Reference Manual", 2nd Edition, Addison-Wesley (1990)

[Adobe 93] Adobe Systems Inc.: "Portable Document Format Reference Manual", Addison-Wesley (1993)

Page 657

[Andrews 95] Andrews K., Kappe F., Maurer H. and Schmaranz K.: "On Second Generation Network Hypermedia Systems", Proc. ED-MEDIA '95, (1995), 69-74.

[Berners-Lee et al 94] Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H. and Secred, A.: "The World-Wide Web." Communications of the ACM 37,8 (1994), 76-82.

[Kappe 91] Kappe F., Maurer H. and Tomek I.: "Hyper-G - Specification of Requirements", Proc. Conference on Intelligent Systems (CIS) '91, (1991), 257-272

[Kappe 95] Kappe F.: "A Scalable Architecture for Maintaining Referential Integrity in Distributed Information Systems", J.UCS 1,2, (1995), 84-104.

[Maurer 94] Maurer H. and Schmaranz K.: "J.UCS - The Next Generation in Electronic Journal Publishing", Computer Networks and ISDN Systems, Computer Networks for Research in Europe, Vol. 26 Suppl. 2, 3, (1994), 63-69.

[Maurer 95] Maurer H. and Schmaranz K.: "J.UCS and Extensions as Paradigm for Electronic Publishing.", Proceedings DAGS'95, Boston Massachusetts, (1995).

[Maurer 96] Maurer H. ed.: "HyperWave - The Next Generation Web Solution", Addison-Wesley, (1996).

[Murray 94] Murray J., D. and van Ryper W.: "Encyclopedia of Graphics File Formats", O'Reilly and Associates, Inc. (1994).

[Odlyzko 95] Odlyzko, A., M.: "Tragic Loss or Good Riddance? The impending demise of traditional scholarly journals." In "Electronic Publishing Confronts Academia: The Agenda for the Year 2000," Robin P. Peek and Gregory B. Newby, eds., MIT Press/ASIS monograph, MIT Press (1995).

Page 658