On Second Generation Distributed Component Systems
Klaus Schmaranz
(Institute for Information Processing and Computer Supported New Media
(IICM), Graz, Austria
kschmar@iicm.edu)
Abstract: Two of today's most used buzz-words in the context
of software development are the terms Componentware and
Distributed Object System. The combination of both ideas is
then called a Distributed Component System, meaning a
componentware approach where the components are distributed across the
network. Today's approaches fulfill application developers' needs
only partly, and most are more or less cumbersome to use. I want to
call such partial solutions, e.g. Corba, Enterprise
JavaBeans and others, first generation distributed component
systems. In fact, Corba has a different origin, but for the
moment let me consider it to be a first generation componentware
system, too.
In this paper I want to identify the requirements that have to be
fulfilled to design and implement a second generation distributed
component system. There is one main aspect behind all of the ideas of
second generation systems: a good distributed component system is
one that the application programmers don't notice.
The open-source project Dinopolis is currently in its early
implementation phase and can be considered the first second-generation
distributed component system according to the requirements identified
in the following. Therefore the very basic cornerstones of Dinopolis are
discussed at the end of this paper.
Key Words: Distributed Object System, Distributed Component System,
Componentware, Java, Network Transparency Aspects, Robust Globally Unique
Handles, Distributed Relations, Middleware, Dinopolis.
Category: D.1.5, D.2, D.2.6, D.2.7
1 Introduction
Ages ago (in terms of fast-evolving computer science) the
object-oriented software development paradigm was
introduced. It allowed easy mastering of huge software packages by
proper encapsulation. The OO-paradigm is still, for good reasons, the
paradigm of choice for most modern programming
languages. Properly applied (and only then!) the OO-paradigm
makes, amongst other benefits, code-reuse easy, thus shortening
turnaround times in the software-development cycle. However, due
to the existence of OO programming languages, the term
object-orientation is today understood as an implementation of the
encapsulation principle at the programming-language level.
It did not take long until software developers wanted more than to
reuse code by linking additional libraries to their software at
compile-time and recompiling the whole project. What was really
desired were reusable entities with well-defined interfaces
that could be utilized during runtime.
Nowadays these entities are called components. Software utilizing this
paradigm is called componentware. The logical next step was to build
software using the component-based approach with components that are
distributed across a network, resulting in so-called distributed component
systems.
Very soon it was recognized that some sort of standardized
framework is needed which embeds the single components and adds the
network-distribution facilities. Corba was one of the first
approaches in this direction (see [OMG]). With the
introduction of Java several other distributed component
frameworks evolved, such as ObjectSpace Voyager (see [Voyager]) or Sun's approach called Enterprise Java
Beans (see [EJB]). Let me call such systems
first generation distributed component systems to indicate two
important points:
- The existing frameworks are great and are definitely a large step
in the right direction.
- Unfortunately all those systems also have some shortcomings
creating a definitive need for what I would like to call second
generation distributed component systems.
I do not want to start a religious discussion with the conclusion
that all existing systems are bad, have to be discarded and something
brand-new has to be invented. In my opinion this would result in
"just another system" which overcomes the identified problems but
causes others, starting the next religious discussion. The scientists
and developers who thought about and built the first-generation
systems knew very well what they were doing. However, with every
single problem that is solved, new ideas and new needs evolve.
Therefore, before I come to the analysis of what in my opinion really
is essential for a distributed component system to be a second generation
system, let me come straight to one of the conclusions of that
analysis: it is possible and desirable to build a middleware layer
on top of existing systems. The architecture proposed in this paper creates
a second-generation system as a combination of existing software with
some additional mechanisms.
Why this middleware layer is desirable, how it works and especially
what defines a second generation system in my eyes is the topic
of this paper. Therefore let me start at the very beginning with the identification
of the requirements from the point of view of developers who write massively
distributed applications, data- and computing-wise, in large-scale
heterogeneous networks.
2 Transparency - the Key to Distribution
There is a good reason why I started the introduction of this paper by
mentioning the OO-paradigm: no matter if we are talking about classes,
objects, modules or components - application developers want to utilize
well-defined objects with well-defined interfaces in their
programs.
Whatever happens behind the scenes, be it components or not,
distribution across heterogeneous networks or standalone
systems, the details have to be encapsulated and therefore hidden from the
developer, who works on a different abstraction level! The abstraction
level for my considerations at the moment is: everything is
encapsulated in an object and developers do not care what the object
encapsulates. The common term for this kind of hiding complexity
by encapsulation is transparency. It means to let "something" happen
behind the scenes without putting the burden on the developers to
distinguish between different situations.
I used the term something above, because there are many important
transparency aspects to consider in distributed component systems.
They will be discussed below.
2.1 Component Transparency - Part 1
This paper should be about component systems and now I am
at the level of discussing OO aspects - why? The reason is that according
to the definition above, the term component describes a reusable
entity with well-defined interfaces that can be utilized during runtime.
It does not have to be linked to a program statically during compile-time.
Considering this definition even a whole application could be a component,
as long as it provides a well-defined interface that makes it possible
to access it from within a different application during runtime.
The goal is to hide different kinds of components behind the scenes
by encapsulating them. Inside applications the developers always work with
objects, no matter if they are simple statically linked objects or if they
encapsulate dynamically instantiable and accessible components.
At first glance the requirement for component transparency is fulfilled
by first generation systems. However, this is only true at first
glance if we consider what happens behind the scenes when using e.g. Enterprise
JavaBeans or Corba:
- Objects inside the so-called client represent external components:
this is what we want.
- However, first generation systems usually do not give you control over
the instances of external components. Usually clients cannot request that
a new remote component be instantiated. They can just connect to
an already instantiated component. Depending on the system it is also possible
that components are instantiated on the fly when the first request occurs
(e.g. with RMI's object-activation mechanism, see also [RMI]).
Nevertheless all following requests from all different clients then refer
to the same component.
- The fact that one can connect to instantiated components but not
influence instantiation by means of system-immanent mechanisms
leads to situations where special "managing components" have to be
written that take over this task.
Distributed component systems have to be more than just a means to
invoke methods on remote objects. Inside applications, requests to create
a new instance of a class are commonplace (e.g. in Java: my_object = new MyClass();).
Under the requirement of component transparency, creating an instance of
MyClass could equally well mean that a remote object is instantiated. In
this case a remote instance has to be created and encapsulated by a local
stub. The required instance transparency mechanisms also have to provide
control over access-exclusiveness.
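To make this concrete, here is a minimal Java sketch of instance
transparency. All names (Greeter, GreeterFactory, RemoteGreeterStub) are
hypothetical illustrations, not the API of any first generation system:

    // One interface, two implementations: the caller cannot tell whether
    // a freshly created instance lives locally or on a remote machine.
    interface Greeter {
        String greet(String name) throws ComponentFailure;
    }

    // Failure is a well-defined part of the interface (see section 2.2).
    class ComponentFailure extends Exception {
        ComponentFailure(String message) { super(message); }
    }

    class LocalGreeter implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    class RemoteGreeterStub implements Greeter {
        private final String host;
        RemoteGreeterStub(String host) { this.host = host; }
        public String greet(String name) throws ComponentFailure {
            // A real stub would create the instance on "host" and marshal
            // the call over the network; the round trip is elided here.
            throw new ComponentFailure("no connection to " + host);
        }
    }

    final class GreeterFactory {
        // Local case: behaves exactly like "new LocalGreeter()".
        static Greeter create() { return new LocalGreeter(); }
        // Remote case: the framework instantiates the component on the
        // given host and hands back a local stub with the same interface.
        static Greeter create(String host) { return new RemoteGreeterStub(host); }
    }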
2.2 Network Transparency
Network transparency means that application developers do not have
to know, whether they are working with a local or with a
remoteobject. In order not to have to distinguish between
different programming languages such as e.g. C++ which has pointers
and Java which does not, the term objectreference will be
used from now on. Objectreference means that an object is somehow
held in an application. Applicationprogrammers can work with this
object exactly in the way as is predetermined by the
programminglanguage used.
Speaking in OO terms we are talking about classes that define
data types and objects that represent instances of classes.
A network-transparent object is an object that is an instance of a
certain class, no matter if the instance is residing inside the application
or on a different computer. It can very well be that several instances
of a certain class are residing on the local machine and several other
instances of the same class are residing on different remote machines.
Nevertheless these instances all look the same to the application programmers:
it is always possible to work with them as if they were residing inside
their own application. The network aspect is hidden behind the scenes.
Therefore local and remote objects are fully interchangeable.
The aspect of network transparency is one of the aspects that all first
generation systems fulfill more or less in one way or another. Usually
some sort of static stub-skeleton model or a dynamic derivative
of it is implemented to achieve this behaviour.
There is one problem remaining with network-transparent objects, no
matter how they are implemented: it is always possible that network connections
fail. Therefore, no matter how well-designed and well-implemented
the network logic may be, some risk remains that requests fail
due to network problems. Whatever topics may be discussed here, e.g. what
happens to time-critical applications, the problem is system-immanent
and has to be dealt with for every single application. Nevertheless it
is easier to deal with this system-immanent problem if possible failure
is a well-defined part of all object interfaces.
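Java RMI makes this principle explicit: every method of a remote
interface must declare the checked exception java.rmi.RemoteException,
so callers cannot ignore the possibility of network failure. A small
illustrative interface (the Account example itself is hypothetical):

    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // Every remote method declares RemoteException, making network
    // failure a well-defined part of the interface contract.
    public interface Account extends Remote {
        double getBalance() throws RemoteException;
        void deposit(double amount) throws RemoteException;
    }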
Fortunately the developers of the first generation systems seem to share
this opinion and all systems provide more or less intelligent error-alert
facilities for this case. No matter how well network transparency is implemented
in existing first generation systems, there are several aspects that are
usually overlooked: having a remote object reference in hand and working
with it is one topic. The other topic is: how do we obtain such a remote
reference in the first place? The following transparency requirements deal
with exactly this problem in more detail.
2.3 Component Transparency - Part 2
To be able to find satisfactory answers to questions about obtaining
remote references and dealing with them, it is necessary to pick
up the component transparency thread again and find a detailed and
comprehensive definition of the term component.
All first generation systems have one feature in common: they only
deal with "their" native kind of components. First
generation systems are not able to work with arbitrary content that is
stored "somewhere" and consider this arbitrary content a
component. Neither do first generation systems usually deal with other
component systems in a cooperative manner in the sense that it would
be possible to "combine" different systems.
Apparently, designers of first generation component systems did not
take an approach that is general enough. I will now present a more holistic
definition for distributed component systems:
- Looking at the world from inside a distributed component system, everything
in the whole distributed world will be termed a component, no matter how
granular components become. In this definition it does not matter whether
a component represents a simple file in a filesystem (as a wrapper), a
real remote object, a database entry, a remote object of a
different system (e.g. Corba), or an application. It can also be that a
component represents a document, e.g. an XML document. This document
is itself structured as a DOM tree using components as nodes.
- Speaking of components that are composed of several subcomponents,
two different cases can be distinguished:
- The entity that is represented by the component exists as one
single piece and subcomponents represent its logical structure.
- The entity that is represented by the component is made up of different
chunks. Subcomponents represent the individual chunks and the component
that can be seen is in fact a container combining several subcomponents
into one logical virtual composite.
- A component has to be addressable in a globally unique way. This requirement
applies to simple components (e.g. components wrapping single files) and
also to compound components.
When addressing a compound component only the top-level component
(i.e. the container) is addressed and the rest (i.e. where the parts come
from) is hidden behind the scenes. This behaviour could be called compound
transparency. However, we must keep in mind that globally unique addressing
in a dynamic world is a topic of its own (objects can move!). Aspects of
globally unique dynamic addressing will be discussed in section 2.7.
- The world does not consist of arbitrary components hanging around somewhere
in a vacuum, accessible only if we know the right key. It has
to be possible to navigate through the component space both by means of
directed and bidirectional relations and by means of search operations.
- Allowing relations between components means that arbitrary components
can be interconnected, no matter in which system they reside. A detailed
discussion of relations is postponed till section 2.8.
- Additional information like the content type of the data that is
encapsulated in an object, or administrative data like author, creation
date, etc., also has to be handled in a transparent manner.
Hence part of the component transparency requirement is unified handling
of metadata.
- Components provide special services. Considering e.g.
Java Beans, the Beans can be asked for their abilities. In our case components
can also be wrappers for everything from simple content to applications,
components can be made up of subcomponents, etc. A necessary
requirement is that the abilities of the wrapped resource are passed on
transparently to the components' users, i.e. the application programmers.
- There are cases where components can become active themselves. For example,
a component wrapping a time-planner application must be able to pass on
triggers to other components in the system for reminders that come from
the application.
After the above two rounds of discussion about aspects of component
transparency it should be clear where transparency is needed. At
first glance it thus looks as if we could come to an exact definition
of the term component now. However, some questions still remain if we
look at requirements like globally unique stable addressing,
relation management or compound components. Therefore let us
first consider the remaining transparency aspects before presenting
the final result.
2.4 Persistence Transparency
As mentioned, components can themselves be composed of several
subcomponents that can reside on different systems. For example
it can happen that documents and metadata describing the
documents reside in different systems. It can happen that documents
are stored in a filesystem, whereas additional metadata such as
keywords, descriptions, etc. are stored in a database.
This happens e.g. very often in electronic publishing applications.
From the application programmers' point of view it is desirable to have
one component that encapsulates the existence of different locations of
the component's persistent data, making it unnoticable for users. This
becomes especially important if a storagesystem is replaced by a different
one. As an example it can happen that metadata is first stored in
the filesystem and later, as the amount of data increases, all metadata
is moved to a database.
For this reason not only persistence transparency in the sense of static
transparency is required. Persistence transparency has to cover the dynamic
case too, where parts of the persistent state of a component can be moved
to a different location. One more dynamic case would be that the persistent
state of a component is stored as one single XML file in the filesystem,
including content and metadata. In this case the application works
with a simple component wrapping it. Later, the decision is made to move
the metadata to a database to make it searchable. Hence a simple component
is converted into a combined component with metadata from the database
and the "rest" from the filesystem.
Back to the question in section 2.2: how do
we obtain a reference? The requirement for dynamic persistence transparency
rules out the use of addresses like URLs. The transparency requirements
discussed later in section 2.5, section 2.6,
section 2.7, section 2.8 and
section 2.9 back up this conclusion.
2.5 Protocol Transparency
As has already been mentioned, components can reside "anywhere" and
can move around. Scenarios like the following are conceivable: first the
persistent data of a component resides in the filesystem on computer A,
later the component resides on an HTTP server on computer B, and
arbitrarily many other scenarios are thinkable.
For this reason a protocol as part of an address for a remote reference,
like in URLs (see also [Berners-Lee et al 1994]),
is unusable. The protocol to access a remote server, be it just a data provider
like an HTTP server or a distributed object system like Corba, has
to be completely encapsulated.
2.6 Schema Transparency
More or less the same problem, just from a different point of view,
can be found when having a closer look at address schemas (see [Terry
1984] and [Znati, Molka 1992]):
In most of today's systems addresses are somehow structured hierarchically,
following an implicit or explicit schema. For example, filesystems
have a directory hierarchy that is used for two different purposes: addressing
and navigation.
The implicit schema here is the existence of hierarchical subdirectories
that form a fully qualified path for addressing data. The explicit schema
here is the way users or administrators structure the subdirectories to
allow easy navigation.
Data in databases is accessed by queries, and the queries are also based
on a distinct schema that is designed by the developers. This schema is
reflected in the queries needed to access data. In any case database access
and filesystem access are completely different, even if we encapsulated
the protocol transparently as required in section 2.5.
Who does not know the cryptic URL-encoded queries for accessing databases
with a Web frontend?
Imagine further that data chunks are moved from one system to another
(e.g. from the filesystem to a database, or from one database to a different
one with a different underlying schema). In this case all addresses obtained
from the "old" system are unusable. Therefore it is also necessary to
hide the schema from the developers.
It is worth having another look at navigation in the address space:
mixing up addressing and navigation is definitely a very bad idea, because
every attempt to restructure the component space would break the schema.
Therefore those two issues, addressing and navigation, have to be strictly
separated, as will be pointed out in section 2.8.
2.7 Location Transparency
Several times it has been stated in this paper that moving components
around can break the addressing mechanism, so the problems that can
arise should be clear enough by now.
Let us summarize the resulting very strong requirement here: remote
handles have to be robust against all restructuring operations.
These operations include moving components around, splitting them up
into subcomponents that are virtually merged in a container, merging
split-up subcomponents into one "real" component rather than a
virtual container, and moving subcomponents around without breaking
the virtually merged components.
From this requirement it finally becomes clear that addresses in the
form of pointers are a problem, no matter if we take URLs or any
other mechanism that points to a location.
The solution is what can be called a globally unique handle,
which represents a symbolic name. The mechanism behind these handles
is a little more complicated than it initially looks. There are
several aspects of scalability which have to be considered: a naive
lookup-service implementation would not work for a worldwide
distributed component space. However, for our further considerations
it is enough to know that in principle globally unique handles
solve the location transparency problem. A detailed discussion of how the
scalability problems can be overcome is beyond the scope of this
paper.
These essential problems are solved and several algorithms have
been developed to keep scalability very well under control (see
[Schmaranz 2002] for details). Such very specialized algorithms do
not fit into this general discussion of second generation distributed
component systems.
2.8 Relation Transparency
There is little need to mention that links between data are an
essential part of every modern document, information and
knowledge-management system. However, there is need for a discussion
of what the requirements for a modern implementation of the node-link
paradigm are. From the discussion of the "holistic" view of the
system in section 2.3 it is already known that
essential types of applications built with component frameworks will
surely be document, information and knowledge-management
systems.
Therefore let us have a closer look at the requirements that such systems
have, to derive the technical requirements for distributed component systems:
- It is clear that hyperlinks embedded in e.g. HTML documents are
not the solution we all are looking for (see [Andrews
et al 1995]). Hyperlinks definitely have to be separated from documents
(at least internally).
- It is also clear that hyperlinks have to be robust against moving the
destination to a different location. In this case the hyperlinks have to
point to the new location.
- It has to be possible to interlink arbitrary kinds of documents, no
matter where they reside and no matter which type they have. And this is
exactly the point why the term hyperlink seems inappropriate. There is
much more behind this requirement than one would suspect. What is definitely
needed is a general mechanism to define arbitrary kinds of relations between
arbitrary components. By arbitrary kinds of relations, things like
a link to a destination, an inclusion or just an interconnection
are meant. The list of what a relation can represent is endless and depends
on the needs of a concrete application. Relations cannot only represent
navigational structure, they can also be used for internal structuring
purposes, e.g. for the combination of several components into one virtual component.
It should be clear that there is a myriad of examples of how data can be
connected. Considering the relation topic from a more technical point of view,
one very important group of features comes to mind, resulting in very
essential requirements for distributed component systems:
Components can, amongst other things, also represent functional
modules (as is the case with e.g. Java Beans). Such functional modules
are combined in one way or another to make up whole applications.
In our case the single components can be arbitrarily distributed across
several computers, resulting in a wholly distributed application. All transparency
requirements that have already been discussed above also fully apply to
this case. For example, if a functional module is moved from one location
to another, this must not break the distributed application!
There is one more requirement that can be derived from the discussions
in section 2.6 and section 2.7:
there it was stated that addressing and navigation are technically and
semantically two completely different mechanisms that have to be strictly
separated. The solution for the dynamic location transparency problem is
the use of globally unique handles. Relations are now the method of choice
to implement navigation.
In fact, from the users' point of view, navigation always comes down
to either moving around in a hierarchy (e.g. the subdirectory structure
of a filesystem) or in a graph (e.g. hyperlinks on the Web or symbolic
links in Unix filesystems). In the case of a hierarchy we have to deal with
parent-child relations, in the case of a graph we have to deal with directed
relations from one arbitrary point to a different arbitrary point. Relations
always represent some kind of logical structure. Addresses always
represent a technical structure.
With these points in mind the requirements for a relation mechanism
in second generation distributed component systems can be formulated as
follows:
- For the sake of generality, n:m relations have to be used. In
most cases only 1:1 or 1:n relations will be needed in applications.
Nevertheless there are situations where the general n:m case applies
(e.g. when interconnecting two version-controlled components). One
can simulate n:m relations by using many 1:1 relations, but this
would cause avoidable overhead. Therefore, from now on the term relation
in this paper is always understood to be an n:m relation.
- The endpoints of relations are always attached to components. If endpoints
of relations shall point inside components, e.g. refer to a paragraph
in a document, this can be achieved as well. Two cases exist here:
- The endpoint of a relation may be a component that represents a part
of a document, e.g. a paragraph.
- If the granularity of subcomponents is not small enough, a relation
has to point to something that is just part of a component. In this case
additional information that reflects this fact can be attached to the
endpoint of the relation at issue.
Jumping a little ahead, the advantage of attaching endpoints to components
is that robustness concerning component-movement problems can be achieved
easily. If e.g. a paragraph of a document is represented as a component
and the paragraph is moved inside the document, the relation automatically
points to the new location of the paragraph in the document.
- Relations can be of arbitrary kind (directed, bidirectional, inclusion,
etc.) and of arbitrary user- or application-defined type (inline-image,
belonging-together, interesting-additional-information, etc.).
- Arbitrarily many relation endpoints can be attached to a
component.
- Relations have to be robust against component-movement problems.
- It has to be possible that relations between relations exist.
- Internally, relations have to be implemented bidirectionally, so that
it is always possible to find all n + m endpoints of a relation. This requirement
is an absolute (internal) necessity to fulfill the movement-robustness
requirement.
- Relations and single endpoints of relations can have arbitrary meta-information
attached.
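To make these requirements tangible, here is a hypothetical Java data
model for such n:m relations. All names are illustrative assumptions,
not the Dinopolis API:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // One endpoint of a relation, attached to a component via its
    // globally unique handle; per-endpoint meta-information can e.g.
    // describe a position inside the component (second case above).
    class Endpoint {
        final String componentHandle;
        final Map<String, Object> metaInfo = new HashMap<>();
        Endpoint(String componentHandle) { this.componentHandle = componentHandle; }
    }

    // An n:m relation of arbitrary kind and type, carrying its own
    // meta-information. Stored bidirectionally: all n + m endpoints are
    // reachable from the relation itself, as movement robustness requires.
    class Relation {
        final String kind;   // e.g. "directed", "bidirectional", "inclusion"
        final String type;   // e.g. "inline-image", "belonging-together"
        final List<Endpoint> sources = new ArrayList<>();  // n endpoints
        final List<Endpoint> targets = new ArrayList<>();  // m endpoints
        final Map<String, Object> metaInfo = new HashMap<>();
        Relation(String kind, String type) { this.kind = kind; this.type = type; }

        List<Endpoint> allEndpoints() {
            List<Endpoint> all = new ArrayList<>(sources);
            all.addAll(targets);
            return all;
        }
    }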
Considering these requirements it becomes clear why the heading
relation transparency was chosen for this section: with the
definitions that "everything is represented by components"
and "relations interconnect arbitrary components" it is
possible to define relations between arbitrary data, no matter if the
data format natively supports relations or not. If the underlying
data format supports relations, they are passed on to the component
transparently. If not, the relations are managed by the system and
stored in a separate database. Arbitrary mixtures between implicit and
explicit management of relations for one component are possible.
2.9 Replication Transparency
There are two main factors that make replication of components desirable:
- If many users want to use one and the same component, it can happen
that either the machine where the component resides or even the network
in this area becomes overloaded.
- Network connections to a certain location may be slow from parts of
the network.
Thus, since response time may be rather unsatisfactory, some sort of
replication mechanism is desirable. By using globally unique handles
this can easily be implemented: resolving a handle can return an appropriate
remote reference to a replica of the desired component rather than to the
original. Therefore replication of components is fully transparent in the
sense that requestors do not notice at all whether they obtain a reference
to a replica or to the original.
So far this mechanism corresponds to a standard caching mechanism as
can be found in every proxy. The difference between simply caching a
component and having a real replica is that caching is unsynchronized
from the point of view of the original, while replication is
not. Replication has to be implemented in a way that the original
knows of existing replicas and can set them dirty if something changes
in the state of the original.
From this point of view there exist three kinds of replicas, depending
on the nature of the component itself and depending on the usage of the
component:
Unsynchronized replicas: these are replicas where synchronization
is not necessary at all because the original component is stateless. The
mechanism in this case corresponds to a standard caching mechanism without
dirty-flagging. However, one thing has to be kept in mind that
forces real replication (i.e. the original knows about existing replicas):
if a component is deleted, the replicas have to be deleted too. Therefore,
just for the case of component deletion, either close synchronization
or loose synchronization as described below is necessary. For this
reason unsynchronized replicas may only exist in systems that do not allow
object deletion. Such systems make only limited sense, but for the
sake of completeness of the discussion this case is mentioned here.
Closely synchronized replicas: these are replicas where updates
of the internal state are essential for working with them. The problem
with this kind of replicas is that delays in setting them dirty
influence the result in an unacceptable way. Therefore it has to be made
sure that the actual state of the original component is always reflected
in the replicas. Especially when dealing with collaboration aspects like
concurrent editing, close synchronization is the method of choice. It might
be suspected that this means that the original has to be contacted anyway
for each request and that therefore replication does not make any sense
at all in this case. However, this is not really true. Time-stamped requests
together with replicated version-update information reduce network
traffic for closely synchronized replicas enormously.
Loosely synchronized replicas: these are replicas where delays
in updating the internal state of components are not critical, as long
as the delays can be kept within certain boundaries. Usually some seconds
of delay, sometimes even minutes or hours, could be considered uncritical.
Just think of a standard WWW server: when pages are changed it usually
does not matter at all if some users see the old page rather than the
new one, even if the new page has already been available for some seconds.
This kind of delay is quite usual and commonly accepted today if you consider
all the caching mechanisms in proxies and in common browsers. However,
one point has to be kept in mind when discussing loosely synchronized replicas:
it has to be possible to force a lookup if an update occurred. With this
additional forced synchronization, which can be triggered by the replica,
one can at least make sure to obtain an updated version if this is absolutely
desired.
Speaking of replication and updates, the first thing that usually comes
to mind is the problem that a requestor obtains an outdated version of
an object. However, the opposite can also be a problem: a requestor could
obtain a version that is "too new".
Just think of the case that the network connection between
requestor and replicating system is slower than the connection between
the replicating system and the system that holds the original. Under
certain circumstances it could happen that at the time when the
request was sent, an older version was valid than at the time when the
request arrived at the replicating system. If the replicating system
then sends the newer version of the object rather than the one that
was valid at the time when the request was sent, this could be a
problem. For most applications it is acceptable or even desirable to always
get the newest available version; for others, e.g. for collaboration
purposes, it is not.
Therefore replication in a second generation distributed component system
has to be implemented in a way that the behaviour can be adapted to the
needs of the application. Different strategies have to be available to
choose from, depending on the requirements.
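A minimal sketch of how the three replica kinds could be exposed to
applications as a selectable policy; the names are hypothetical and not
taken from Dinopolis:

    // The three kinds of replicas described above, expressed as a policy
    // that the application chooses when resolving a handle.
    enum ReplicationPolicy {
        UNSYNCHRONIZED,       // stateless originals; cache-like, no dirty-flagging
        LOOSELY_SYNCHRONIZED, // bounded staleness; a forced lookup can be triggered
        CLOSELY_SYNCHRONIZED  // replicas always reflect the original's state
    }

    interface HandleResolver {
        // May return a reference to a replica instead of the original;
        // the requestor cannot tell the difference.
        Object resolve(String handle, ReplicationPolicy policy);
    }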
3 Dinopolis - the First Second-Generation System
The need for systems covering the aspects discussed above led a team
of researchers at the IICM to start an open-source framework called
Dinopolis (see [Freismuth et al 1997], [Dallermassl
et al 2000a] and [Dallermassl et al 2000b]).
Design and prototype-implementation phases went on from 1997 until
1999, when version 2.0 of a system called DINO (Distributed
Interactive Network Objects) became the core of MTP
(Medical Telematics Platform, see [Aly
et al 1998]). MTP is the first system implementing arbitrarily distributed
virtual medical patient records. The first prototype of MTP was introduced
at CeBIT 1999 and, due to the strong interest among medical institutions
and doctors, phase 2 of MTP, the design and implementation of a production
release of the system, started at the end of 1999. Since then a group of researchers
and developers at German Aerospace and at the IICM have been working closely
together on the design of Dinopolis as the first second-generation
distributed component system, which will be the core for the production
release of MTP.
The cornerstones of Dinopolis that make it a full-featured second-generation
distributed component framework can be summarized as follows:
- Dinopolis is designed as a platform-independent middleware system,
fully written in Java.
- Due to its concept as a middleware system, Dinopolis is able to embed
and combine arbitrary existing systems, such as databases, Web servers
or ORBs.
- Dinopolis implements a highly sophisticated component model
that fulfills all transparency aspects discussed in section 2.1, section 2.2, section 2.3 and section 2.4. Components
can reside anywhere on the network or in arbitrary embedded systems.
Due to its design as a middleware
system, Dinopolis takes over component integration and management.
- Dinopolis implements a highly sophisticated addressing mechanism via
globally unique component handles that fulfills all the requirements
discussed in section 2.5, section 2.6
and section 2.7. Handles are robust against component movement,
which can e.g. happen due to restructuring of the distributed component
space.
- Dinopolis implements a highly sophisticated relation mechanism that
fulfills all the requirements discussed in section 2.8.
- Replication transparency as discussed in section 2.9
is made possible by Dinopolis' addressing mechanism.
Because a detailed description of the whole Dinopolis system would be
far beyond the scope of this paper, the following emphasizes the three
most important aspects: the component definition, globally unique handles
and relation management.
4 Definition: Component
Because the terms component and object have been used as
buzz-words for a very long time, there exist many different and even
contradictory definitions. In the following, the definition of component
that forms the basis of the Dinopolis middleware framework will be discussed.
In principle a component in a distributed component system is an addressable
entity with the following properties:
- A component is addressable in a unique way via globally unique handles.
This means that one handle is always resolved to exactly one and the same
component, no matter when and how often it is resolved. It cannot happen
that a component is replaced by a different one by accident, as can
happen in today's systems if one component is deleted and a different
component happens to get the same address at a later stage. If a component
is deleted, it is guaranteed that the handle will never be reused
for a different component. A more in-depth discussion of handles
can be found in section 5.
- A component can itself be a compound made up of several part-components.
In this case the parts also fully correspond to the component definition
given here. In an OO sense different models of composing components
into a compound apply, e.g. derivation, inclusion, etc. With this feature
arbitrary component hierarchies can be modelled.
- A component encapsulates content. Content in this context is
everything that can be considered data in a broad sense, e.g. a
document, stream data or whatever else could be
state information.
- Arbitrary metadata (i.e. attributes) can be attached to a component.
Metadata can e.g. be of a descriptive nature like author, type
or creation date. Metadata can also be dependent on certain
applications that need to deal with the components, e.g. display hints,
etc. For this reason metadata is defined to be a tree-structured
container of keys with values of arbitrary type that can be accessed through
the keys.
- Arbitrary relations can be attached to a component or to parts of it.
Relations can also e.g. be attached to metadata. As an example there
can be a relation from the author attribute to an address record
in a database that represents the author.
- A component can provide arbitrary operations. From an OO point
of view the operations can be seen as the methods of a component.
- Sometimes operations are not enough to deal with components, because
too much knowledge about the internals of the component
could be necessary. For example, a component could have its origin in a
database that supports very special user access rights. If an application
wanted to provide e.g. a graphical interface that allows users
to change access rights, then the application would have to know
the internals of the database, e.g. the syntax of the attributes
needed to call a method for setting the rights correctly. For
this reason components can also provide arbitrary so-called services.
Services in the context of Dinopolis are GUI objects that applications
can request and that provide high-level user-interface functionality
for special purposes that would otherwise require too much internal knowledge.
Services deal with arbitrary user-interface libraries and their look-and-feel
is configurable accordingly, but this is beyond the scope of this paper.
- Components have a standard, uniform interface representing access
to their content, metadata, relations, methods and
services. Therefore applications need not know the internals of
different components to deal with them. Part of the standard interface
is also a possibility to ask components for their capabilities in a
uniform way. For example, one can ask a component whether it supports
versioning. A schematic view of how application programmers see
components according to the definition given here is sketched in
Figure 1.
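As an illustration, the uniform interface could look roughly like the
following Java sketch. The names are assumptions for illustration only,
not the actual Dinopolis interfaces:

    import java.util.List;
    import java.util.Map;

    // Uniform access to content, metadata, relations, operations and
    // services, plus a uniform capability query.
    interface Component {
        Object content();                    // the encapsulated data
        MetadataNode metadata();             // tree-structured attributes
        List<Object> relations();            // attached relation components
        Object invokeOperation(String name, Object... args);
        Object requestService(String name);  // e.g. a GUI object for access rights
        boolean supports(String capability); // e.g. supports("versioning")
    }

    // Metadata as defined above: a tree of keys whose values are of
    // arbitrary type and are accessed through the keys.
    interface MetadataNode {
        Object value(String key);
        Map<String, MetadataNode> children();
    }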
5 Globally Unique Handles
From the discussion of the requirements at the beginning of this
paper we know that components have to be accessible through globally
unique handles. These handles have to be robust against component
movement, and one handle always refers to one and the same
component. Handles can also be stored, e.g. somewhere on a user's
hard disk when bookmarking a component.
Figure 1: Schematic view of components
Considering these requirements it becomes clear that there are two ways
to ensure consistency of handles: either moved components leave traces
in the form of forwarders, or a lookup service is implemented. The algorithm
with forwarders does not scale at all considering a huge number of objects
and a highly dynamic case. In addition, the requirement for component replication
(see section 2.9) would not work with forwarding anyhow
and would definitely require a lookup mechanism.
Therefore a lookup service has to be the implementation of choice.
However, considering a huge number of handles in large and highly dynamic
distributed systems, a naive implementation (e.g. a central lookup service)
will not be enough, because it would not scale either.
The first idea that comes to mind to get control over the situation
is to define hierarchically structured handles and treat them like hostnames
are treated in DNS (see also [Mockapetris, Dunlap 1988]).
With this approach the lookup service is well distributable. Nevertheless
there still exists a huge problem: we required robustness against object movement,
even if handles are stored "somewhere". Therefore, if a component is
moved from one "domain" to another, either its handle would change or
one lookup service would have to take over control of the handle space
of a different domain. Both approaches are not realistic.
It becomes even worse if we consider the case of a heavily growing
system. At the beginning one lookup service is enough, but as the
number of objects in the system and the number of users of the system
grow, the lookup service has to be split across two or more
machines. The opposite case is possible as well, and we have to deal with
it in the MTP project: if a doctor representing a
data-storing institution retires and the system goes offline,
the data has to be stored in a different system, causing a
"merge" of two systems. Besides, it can happen that not only
a simple merge of two systems takes place, but that the contents of
the system going offline even have to be split across several
systems.
Thus everything can move: components, parts of components (in the case
of compounds), servers that store components and even lookup servers. Nonetheless
globally unique handles have to remain stable and have to be robust against
all dynamic changes that can happen!
For this reason we developed an algorithm called DOLSA (Distributed
Object Lookup Service Algorithm) that deals with arbitrarily granular,
arbitrarily distributed lookup servers and keeps handles stable, no
matter which dynamic changes in the whole component and lookup-service
world take place. The detailed description of this algorithm can be found
in [Schmaranz 2002]. Here is just a summary of
its very basic principles:
- Globally unique handles always consist of three parts, which can be
partially empty if nothing has moved:
1. The Birthplace-ID of the component. This is the part of the handle
that always allows it to find its location. Therefore in a way this is
exactly the globally unique handle that we are looking for and that must
never change. However, just having this ID does not scale for huge numbers
of objects and in the highly dynamic case.
2. The Moved-Birthplace-ID of the component. This ID represents the new
ID if the birthplace lookup service is no longer available and a different
lookup service has taken over control. If this "new" lookup service is
also no longer available and its responsibility is therefore moved again
to a different server, this ID is overwritten by the actual one. Please
note that overwriting this ID happens for scalability reasons, but it is
not essential for resolving a handle. The Birthplace-ID can always be
resolved. The algorithm also deals with the case that one birthplace
lookup service can be split across several systems.
3. The Actual-ID of the component. This ID represents the ID in the
lookup service that is responsible after a component was moved across
the network.
- Each of the three parts of the ID described above itself consists of
two parts: a Lookup-Service-ID and an Object-ID
within the lookup service.
- Lookup services are hierarchically organized, but this
organization is not reflected in their
Lookup-Service-IDs, in order to remain robust against changes in
the hierarchy. The principle here is the same one that led to the
separation of relations and handles. The hierarchy of
lookup services makes sense for scalability reasons: a request to
resolve a handle is always sent to the "closest"
lookup service. Lookup services can cache resolved handles
very similarly to DNS servers and can give authoritative and
non-authoritative answers. If a handle cannot be resolved
locally, the request is passed further up the hierarchy until it can
be resolved, and the result is cached.
- To make sure that the distributed lookup services can always
be found, the top of the hierarchy is formed by a set of so-called
Master Lookup Services.
- All IDs, Lookup-Service-IDs as well as Object-IDs,
are of arbitrary length in chunks of 64 bits. This prevents the case of
running out of free IDs, although this may seem unnecessary when using
64 bits. However, there is the requirement that IDs must not be used twice,
and there are components that travel around a lot (like e.g. mobile agents).
Thus they can effectively "eat up" lots of IDs, and having no limit can
therefore be essential.
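Transcribing this handle structure into a small Java data model gives
roughly the following sketch; the field names are illustrative
assumptions, not the actual DOLSA implementation:

    // An ID of arbitrary length in 64-bit chunks; IDs are never reused.
    final class LookupId {
        final long[] chunks;
        LookupId(long... chunks) { this.chunks = chunks; }
    }

    // One of the three handle parts: a Lookup-Service-ID plus an
    // Object-ID within that lookup service.
    final class HandlePart {
        final LookupId lookupServiceId;
        final LookupId objectId;
        HandlePart(LookupId service, LookupId object) {
            this.lookupServiceId = service;
            this.objectId = object;
        }
    }

    // The three-part globally unique handle described above.
    final class GlobalHandle {
        final HandlePart birthplaceId;      // never changes, always resolvable
        final HandlePart movedBirthplaceId; // null while the birthplace
                                            // service has not moved
        final HandlePart actualId;          // null while the component
                                            // itself has not moved
        GlobalHandle(HandlePart birth, HandlePart moved, HandlePart actual) {
            this.birthplaceId = birth;
            this.movedBirthplaceId = moved;
            this.actualId = actual;
        }
    }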
6 Relations
As has been discussed, the most universal case of relations is the n:m
relation, and for this reason relations are implemented this way in Dinopolis.
Relations can interconnect arbitrary components or even other relations.
There are enough examples where at least one endpoint of a relation is
a relation itself, e.g. a hyperlink that says "have a look at this link".
As is the case with handles, relations also have to be robust against component
movement. The simplest and at the same time most logical way to achieve
this is to define the endpoints of relations by globally unique handles.
The further logical consequence is that relations are components themselves.
Relations being components in the sense of this paper results in a flexibility
that cannot be found in any other system:
- Relations between components can be held anywhere and are not bound
to the components' locations. Therefore it is possible to e.g. use relations
for personal hyperlinks between documents that reside on the users' desktop
computers. Additionally those hyperlinks are kept consistent if documents
are moved.
- Arbitrary type and meta-information can be attached to relations.
- Not only meta-information can be attached to relations, they can
also provide methods and services for greater flexibility.
- Internally relations are multidirectional; the endpoints of a
relation are subcomponents and the relation component is the enclosing
composite. For consistency reasons, e.g. when restructuring the component space,
it is necessary to find out which components are interconnected.
Because relations are components of their own that can be stored separately,
it is possible to interconnect arbitrary objects that are not even aware
of relations at all. For example it is possible to annotate video streams
or audio streams. Even private annotations are possible that are not
visible to others.
Typed relations with arbitrary meta-information also make it
possible to have arbitrarily many different navigation paths through
huge component spaces without having to create many different sets
of hyperlinked documents, as would be the case in today's systems. As
an example consider an e-learning application: reusing
existing course material and structuring it for different
audiences by means of typed relations for navigation is an easy task.
It is then even possible to switch back and forth between
different navigational structures. This makes it easy to build adaptive
courses, where navigation depends on the skills of the learners (see also
[Dallermassl et al 2000c]).
One of today's buzz-words is Knowledge Management. Without
going into details of Knowledge Management, one of the main goals of
KM is to put information into context to make it knowledge. As knowledge
grows, one aspect of growth is the number of interconnections between
different information chunks. The more interconnections between related
topics exist, the better the knowledge base. However, it does not always
make sense to see all the interconnections. One and the same chunk of information
can be interesting for different audiences, but from different points of
view. As the number of interconnections grows and as the number of
different points of view grows, it becomes necessary to have adaptive relations,
so that users only find relevant knowledge rather than having to extract
the relevant parts themselves from a huge pile of interconnections. There are
many different examples where a flexible relation mechanism is essential.
7 Conclusion
According to the motto "a good component system is one that the
application programmers don't notice", Dinopolis is trying to implement
all transparency aspects discussed in this paper. Distributed component
systems will eventually become some sort of high-level operating systems
that serve as a platform for all different kinds of applications. If this
is the case it would be absolutely desirable to standardize such frameworks.
For this reason, and also because different people have many different ideas
about what such a platform should provide, Dinopolis is an open-source
project and the results are available for everyone free of charge. Dinopolis
is not intended to be a huge monolith. Just the opposite is true: the core
of the system is a very slim middleware layer providing the basic functionality
of globally unique handles, relations and a highly sophisticated object model.
Everything else is grouped around this core in the form of modules that
can be loaded dynamically during runtime. Therefore the system is adaptable
by everybody according to the special needs of different applications.
One of the applications that require the implementation of a very
robust system with highly sophisticated access and
security mechanisms is MTP, which the IICM develops together with
German Aerospace. The security, reliability and
robustness requirements for medical applications are extremely
high because all the data in the system is extremely
sensitive. Therefore Dinopolis is not developed
"quick-and-dirty" but in a very structured way, with a very
detailed design phase and thorough documentation.
Because we want to build a platform that can be used for as wide a range
of different applications as possible, all ideas for necessary or desired
modules that can be grouped around the core of the system are very welcome.
If you have ideas or questions please have a look at http://www.dinopolis.org or
feel free to contact us via email at contact@dinopolis.org.
References
[Andrews et al 1995] Andrews K., Kappe F., Maurer
H., Schmaranz K.: On Second Generation Hypermedia Systems, Proceedings
ED-MEDIA 95, Graz (1995), 75-80.
[Aly et al 1998] Aly F., Bethke K., Bartels E.,
Novotny J., Padeken D., Schmaranz K., Schwartmann D., Wilke D., Wirtz
M.: Medical Intranets for Telemedicine Services: Concepts and
Solutions, Proceedings G7 Meeting "The Impact of Telemedicine on
Health Care Management", Regensburg (1998), available online at
http://www.uni-regensburg.de/Fakultaeten/Medizin/Uch/g7/program/mon.htm.
[Berners-Lee et al 1994] Berners-Lee T.,
Masinter L., McCahill M.: RFC 1738: Uniform Resource Locators (URL),
available online at ftp://ftp.internic.net/rfc/rfc1738.txt.
[Dallermassl et al 2000a] Dallermassl C., Haub
H., Maurer H., Schmaranz K., Zambelli P.: Dinopolis - A Leading
Edge Application Framework for the Internet and Intranets, Proceedings
WebNet 2000, San Antonio, TX (2000), 111-116.
[Dallermassl et al 2000b] Dallermassl C., Haub H.,
Krottmaier H., Schmaranz K., Zambelli P.: Using Highly Sophisticated
Middleware for Building Arbitrarily Distributed Teaching Environments,
Proceedings ICCE/ICCAI 2000: Learning Societies in the New Millennium:
Creativity, Caring & Commitments, Taipei (2000), 1439-1442.
[Dallermassl et al 2000c] Dallermassl C., Haub
H., Krottmaier H., Schmaranz K., Zambelli P.: Adaptive Learning Environments,
Proceedings ICCE/ICCAI 2000: Learning Societies in the New Millennium:
Creativity, Caring & Commitments, Taipei (2000), 1443-1446.
[EJB] Enterprise Java Beans Technology,
electronically available at http://java.sun.com/products/ejb.
[Freismuth et al 1997] Freismuth D., Helic D.,
Meszaros G., Schmaranz K., Zwantschko B.: DINO - Distributed
Interactive Network Objects - The Java Approach, Proceedings
EdMedia '97, Calgary (1997), available online at
http://www.iicm.edu/liberation/iicm_papers/edmed97/dino.html.
[Mockapetris, Dunlap 1988] Mockapetris P., Dunlap
K. J.: Development of the domain name system, Proceedings ACM SIGCOMM
1988, Stanford, CA (1988), 123-133.
[OMG] The Object Management Group's Home page,
electronically available at http://www.omg.org.
[RMI] Java Remote Method Invocation, available
online at http://java.sun.com/j2se/1.4/docs/guide/rmi.
[Schmaranz 2002] Schmaranz K.: DOLSA -
A Robust Algorithm for Massively Distributed, Dynamic Object-Lookup
Services, submitted to J.UCS.
[Terry 1984] Terry D. B.: An analysis of naming
conventions for distributed computer systems, Proceedings ACM SIGCOMM
1984, Montreal (1984), 218-224.
[Voyager] ObjectSpace's Home page, available
online at http://www.objectspace.com.
[Znati, Molka 1992] Znati T. B., Molka J.: A
Simulation Based Analysis of Naming Schemes for Distributed Systems,
Proceedings of the 25th annual Symposium on Simulation 1992, Orlando, FL
(1992), 42-51.