Algorithms and Experiments:
The New (and Old) Methodology
Bernard M.E. Moret
Department of Computer Science
University of New Mexico
Albuquerque, NM 87131, USA
moret@cs.unm.edu
Henry D. Shapiro
Department of Computer Science
University of New Mexico
Albuquerque, NM 87131, USA
shapiro@cs.unm.edu
Abstract: The last twenty years have seen enormous progress in
the design of algorithms, but little of it has been put into practice.
Because many recently developed algorithms are hard to characterize theoretically
and have large running-time coefficients, the gap between theory and
practice has widened over these years. Experimentation is indispensable
in the assessment of heuristics for hard problems, in the characterization
of asymptotic behavior of complex algorithms, and in the comparison of
competing designs for tractable problems.
Implementation, although perhaps not rigorous experimentation, was characteristic
of early work in algorithms and data structures. Donald Knuth has throughout
insisted on testing every algorithm and conducting analyses that can predict
behavior on actual data; more recently, Jon Bentley has vividly illustrated
the difficulty of implementation and the value of testing. Numerical analysts
have long understood the need for standardized test suites to ensure robustness,
precision and efficiency of numerical libraries. It is only recently, however,
that the algorithms community has shown signs of returning to implementation
and testing as an integral part of algorithm development. The emerging
disciplines of experimental algorithmics and algorithm engineering have
revived and are extending many of the approaches used by computing pioneers
such as Floyd and Knuth and are placing on a formal basis many of Bentley's
observations.
We reflect on these issues, looking back at the last thirty years of
algorithm development and forward to new challenges: designing cache-aware
algorithms, algorithms for mixed models of computation, algorithms for
external memory, and algorithms for scientific research.
Key Words: Algorithm engineering, cache-aware algorithms,
efficiency, experimental algorithmics, external memory algorithms, implementation,
methodology
1 Introduction
Implementation, although perhaps not rigorous experimentation, was characteristic
of early work in algorithms and data structures. Donald Knuth insisted
on implementing every algorithm he designed and on conducting a rigorous
analysis of the resulting code (in the famous MIX assembly language) [Knu98],
while other pioneers such as Floyd are remembered as much for practical
"tricks" (e.g., the four-point method to eliminate most points
in an initial pass in the computation of a convex hull and the "bounce"
techniques for binary heaps; see, e.g., [MS91a]) as
for more theoretical contributions.
Throughout the last 25 years, Jon Bentley
has demonstrated the value of implementation and testing of algorithms,
beginning with his text on writing efficient programs [Ben82]
and continuing with his invaluable Programming Pearls columns in
Communications of the ACM, now collected in a new volume [Ben99],
and his Software Explorations columns in the UNIX Review.
David Johnson, whose own work on optimization for NP-hard problems
involves extensive experimentation, started the annual ACM/SIAM Symposium
on Discrete Algorithms (SODA), which has sought out experimental studies
and has featured a few every year. It is only in the last few years, however,
that the algorithms community has shown signs of returning to implementation
and testing as an integral part of algorithm development. Other than SODA,
publication outlets remained rare until the late nineties: the ORSA
J. Computing and Math. Programming have published several strong
papers in the area, but the standard journals in the algorithm community,
such as the J. Algorithms, J. ACM, SIAM J. Computing, and
Algorithmica, as well as the more specialized journals in computational
geometry and other areas, have been slow to publish experimental studies.
(It should be noted that many strong experimental studies dedicated to
a particular application have appeared in publication outlets associated
with the application area; however, many of these studies ran tests to
understand the data or the model rather than to understand the algorithm.)
The online ACM Journal of Experimental Algorithmics is dedicated to
this area and is starting to publish a respectable number of studies. The
two workshops targeted at experimental work in algorithms, the Workshop
on Algorithm Engineering (WAE), held every late summer in Europe, and
the Workshop on Algorithm Engineering and Experiments (ALENEX),
held every January in the United States, are also attracting growing numbers
of submissions. Support for an experimental component in algorithms research
is growing among funding agencies as well. We may thus be poised for a
revival of experimentation as a research methodology in the development
of algorithms and data structures, a most welcome prospect, but also one
that should prompt some reflection.
2 Empiricism in Algorithm Design
Natural scientists have perfected since at least the Middle Ages a particular
form of enquiry, which has come to be called the scientific method. It
is founded upon experiments; in its most basic form, it consists of using
accumulated data to formulate a conjecture within a particular model, then
conducting additional experiments to affirm or refute the conjecture. Such
a completely empirical approach is well suited for a natural science, where
the final arbiter is nature as revealed to us through experiments and measurements,
but it is incomplete in the artificial and mathematically precise world
of computing, where the behavior of an algorithm or data structure can
often, at least in principle, be characterized analytically. Natural scientists
run experiments because they have no other way of learning from nature.
In contrast, algorithm designers run experiments mostly because an analytical
characterization is too hard to achieve in practice. (Much the same is
done by computational scientists in physics, chemistry, and biology,
but
typically their aim is to analyze new data or to compare the predictions
given by a model with the measurements made from nature, not to characterize
the behavior of an algorithm.) Algorithm designers are measuring the actual
algorithm, not a model, and the results are not assessed against some gold
standard (nature), but simply reported as such or compared with other experiments
of the same type. Thus computer scientists must both learn from the natural
sciences, where experimentation has been used for centuries and where the
scientific method has been developed to optimize the use of experiments,
but must also remain aware of the fundamental difference between the natural
sciences and computer science, since the goal of experimentation in algorithmic
work differs in important ways from that in the natural sciences.
3 Asymptotic Analysis vs. Implementation
For over thirty years, the standard mode of theoretical analysis, and
thus also the main tool used to guide new designs, has been the asymptotic
analysis ("big Oh" and "big Theta") of worst-case
behavior (running time or quality of solution). The asymptotic mode eliminates
potentially confusing behavior on small instances due to startup costs
and clearly shows the growth rate of the running time. The worst-case
(per operation or amortized) mode gives us clear bounds and also simplifies
the analysis by removing the need for any assumptions about the data. The
resulting presentation is easy to communicate and reasonably well understood,
as well as machineindependent. However, we pay a heavy price for these
gains:
 The range of values in which the asymptotic behavior is clearly exhibited
("asymptopia," as it has been named by many authors) may include
only instance sizes that are well beyond any conceivable application. A
good example is the algorithm of Fredman and Tarjan for minimum spanning
trees. Its asymptotic worst-case running time is O(|E| β(|E|, |V|)),
where β(m, n) is given by min{ i | log^(i) n ≤ m/n }, so that, in
particular, β(n, n) is just log* n. This bound is much better for dense
graphs than that of Prim's algorithm, which is O(|E| log |V|) when
implemented with binary heaps, but experimentation [MS94] verifies that
the crossover point occurs for dense graphs with well over a million
vertices (and thus hundreds of millions of edges), well beyond the size
of any reasonable data set.
 In another facet of the same problem, the constants hidden in the asymptotic
analysis may prevent any practical implementation from running to completion,
even if the growth rate is quite reasonable. An extreme example of this
problem is provided by the theory of graph minors: Robertson and Seymour
(see [RS85]) gave a cubic-time algorithm to determine
whether a given graph is a minor of another, but the proportionality constants
are gigantic (a recent estimate was on the order of 10^150 [Fel99])
and have not been substantially lowered yet, making the algorithm entirely
impractical.
 The worst-case behavior may be restricted to a very small subset
of instances and thus not be at all characteristic of instances encountered
in practice. A classic example here is the running time of the simplex
method for linear programming; for over thirty years, it has been known
that the worst-case behavior of this method is exponential, but also
that its practical running time is typically bounded by a low-degree
polynomial [AMO93]. (Indeed, in some of its newer
versions, its running time is competitive with that of the modern interior-point
methods [BFG + 00].)
 Even in the absence of any of these problems, deriving tight asymptotic
bounds may be very difficult. All optimization metaheuristics for NP-hard
problems (such as simulated annealing or genetic algorithms) suffer from
this drawback: by considering a large number of parameters and a substantial
slice of recent execution history, they create a complex state space which
is very hard to analyze with existing methods, whether to bound the running
time or to estimate the quality of the returned solution.
These are the most obvious drawbacks. A more insidious drawback, yet
one that could prove much more damaging in the long term, is that worst-case
asymptotic analysis tends to promote the development of "paper-and-pencil"
algorithms, that is, algorithms that never get implemented. This problem
compounds itself quickly, as further developments rely on earlier ones,
with the result that many of the most interesting algorithms published
over the last ten years rely on several layers of complex, unimplemented
algorithms and data structures. In order to implement one of these recent
algorithms, a programmer would face the daunting prospect of developing
implementations for all preceding layers. Moreover, the "paper-and-pencil"
algorithms often ignore issues critical in making implementations efficient,
such as low-level algorithmic issues and architecture-dependent
issues (particularly caching), as well as issues of robustness (such as
the potential effects of numerical errors or unexpected symmetries in geometric
computations, although recent conferences in computational geometry
have featured a number of papers addressing these issues). Transforming
paper-and-pencil algorithms into efficient and useful implementations
is today referred to as algorithm engineering; case studies show
that the use of algorithm engineering techniques, all of which are based
on experimentation, can improve the running time of code by up to three
orders of magnitude [MWB + 01] as well as yielding
robust libraries of data structures with minimal overhead, as done in the
LEDA library [MN95, MN99].
There is no reason to abandon asymptotic worst-case analysis, but
there is a definite need to supplement it with experimentation, which implies
that most algorithms should be implemented, not just designed. Many algorithms
are in fact quite difficult to implement because of their intricate
nature and also because the designer described them at a very high level.
The practitioner is not the only one who stands to benefit from implementation:
the detailed, step-by-step understanding required for implementation
may enable the designer to notice features that had remained invisible
in the high-level design and so to bring about a simplified or improved
design.
4 Modes of Empirical Assessment
We can classify modes of empirical assessment into a number of nonexclusive
categories:
 Checking for accuracy or correctness in extreme cases (e.g., standardized
test suites for numerical computing).
 Assessing the quality of algorithms (heuristics, approximations, or
exact solvers) for the solution of NP-hard problems.
 Comparing the actual performance of competing algorithms for tractable
problems and characterizing the effects of algorithm engineering.
 Investigating and refining models and optimization criteria: what
should be optimized, and what parameters matter?
The first category has reached a high level of maturity in numerical
computing, where standard test suites are used to assess the quality of
new numerical codes. Similarly, the operations research community has developed
a number of test cases for linear program solvers. We have no comparable
emphasis to date in combinatorial and geometric computing. Investigation
and refinement of models and optimization criteria is of major concern
today, particularly in areas such as computational biology and computational
chemistry. While many studies are published, most demonstrate a certain
lack of sophistication in the conduct of the computational studies,
suffering as they do from various sources of error. We eschew a lengthy
discussion of this important area and instead present sound principles
and illustrate pitfalls in the context of the two categories that have
seen the bulk of research to date in the algorithms community. Most of
these principles and pitfalls can be related directly to the testing and
validation of discrete optimization models in the natural sciences.
4.1 Assessment of Competing Algorithms and Data Structures for Tractable
Problems
The goal here is to measure the actual performance of competing algorithms
for well-solved problems. This is fairly new work in combinatorial
algorithms and data structures, but common in Operations Research; early
(1960s) work in data structures typically included code and examples, but
no systematic study. Scattered articles during the 70s (see, e.g., [DS85])
kept a low level of experimentation active, but did not attempt to provide
methodological pointers. More recent and comprehensive work began with
Bentley's many contributions in his Programming Pearls (starting
in 1983 [Ben83]), then with Jones' comparison of data
structures for priority queues [Jon86] and Stasko
and Vitter's combination of analytical and experimental work in the study
of pairing heaps [SV87]. An early experimental study
on a large scale was that of Moret and Shapiro on sorting algorithms [MS91a]
(Chapter 8), itself inspired by the work of
Knuth in his Volume III [Knu98],
followed by that of the same authors on algorithms for constructing minimum
spanning trees [MS94]. In 1991, David Johnson and others
initiated the very successful DIMACS Computational Challenges, the first
of which [JM93] focused on network flow and shortest
path algorithms, indirectly giving rise to several modern, thorough studies,
by Cherkassky et al. on shortest paths [CGR96],
by Cherkassky et al. on the implementation of the push-relabel
method for matching and network flows [CGM + 98, CG97],
and by Goldberg and Tsioutsiouliklis on cut trees [GT01].
The DIMACS Computational Challenges (the fifth, in 1996, focused on another
tractable problem, priority queues and point location data structures)
have served to highlight work in the area, to establish common data formats
(particularly formats for graphs and networks), and to set up the first
tailored test suites for a host of problems.
Much interest has focused over the last three to four years on the question
of tailoring algorithms and implementations to the cache structure and
policies of the architecture. Caching effects can significantly alter the
predictions of asymptotic analysis; a classic example is hashing: most
textbooks on data structures still advocate using double hashing in preference
to linear probing, whereas experimental data clearly indicates that linear
probing is the faster method, thanks to its good locality (see [BMQ98]).
Pioneering studies by Ladner and his coworkers [LL96,
LL97] established that cache optimization was feasible,
algorithmically interesting, and worthwhile, even for such old friends
as sorting algorithms [ACVW01, LL97,
RR99, XZK00] and priority queues
[LL96, San00]; indeed, even matrix
multiplication, which has been optimized in numerical libraries for over
40 years (including optimizations for paging behavior), is amenable to
such techniques [ERS90]. Ad hoc reduction in memory
usage and improvement in patterns of memory addressing have been reported
to gain speedups of as much as a factor of 10 [MWB + 01].
The related, and much better studied, model of out-of-core computing,
as pioneered by Vitter and his coworkers [Vit01],
has inspired new work in cache-aware and cache-independent algorithm
design. Once again, though, this trend was pioneered over 40 years ago,
when programmers had to work with very limited memory and studied detailed
optimization strategies for accessing early secondary-storage devices
such as magnetic drums, and followed in the seventies by much work on
out-of-core computing; Knuth has detailed analyses of external sorting
algorithms in his Volume III [Knu98]. Today, we are confronted
with much deeper memory hierarchies and enormous volumes of data, so we
need to return to these optimization techniques and extend them to apply
throughout the hierarchy. Characterizing the behavior of algorithms on real-world
instances is generally very hard simply because we often lack the crucial
instance parameters with which to correlate running times. Experimentation
can quickly pinpoint good and bad implementations and whether theoretical
advantages are retained in practice. In the process, newer insights may
be gleaned that might enable a refinement or simplification of the algorithm.
Experimentation can also enable us to determine the actual constants in
the running time analysis; determining such constants beforehand is quite
difficult (see [FM97]
for a possible methodology),
but a simple regression analysis from the data can give us quite accurate
values. Experimental studies naturally include caching effects, whereas
adding those into the analysis in a formal manner is very challenging.
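For a single conjectured growth term, the regression mentioned above has a simple closed form; the sketch below, which assumes an n log n model for illustration, fits the leading constant by least squares:

```python
import math

def fit_leading_constant(sizes, times, model=lambda n: n * math.log2(n)):
    """Least-squares estimate of c in T(n) ~ c * model(n).

    sizes: instance sizes; times: measured running times.
    With a single coefficient the closed form is
    c = sum(t_i * x_i) / sum(x_i^2), where x_i = model(n_i).
    """
    xs = [model(n) for n in sizes]
    return sum(t * x for t, x in zip(times, xs)) / sum(x * x for x in xs)
```

Fitting against several candidate models and comparing residuals is a quick empirical check on which asymptotic form the data actually follows.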
4.2 Assessment of Heuristics
Here the goal is to measure the performance of heuristics on real and
artificial instances and to improve the theoretical understanding of the
problem, presumably with the aim of producing yet better heuristics or
proving that current heuristics have guaranteed performance bounds. By
performance is implied both the running time and the quality of the solution
produced.
Since the behavior of heuristics is very difficult to characterize analytically,
experimental studies have been the rule. The Operations Research community,
which has a long tradition of application studies, has slowly developed
some guidelines for experimentation with integer programming problems (see
[AMO93], Chapter 18). Inspired in part by experimental
studies of integer-programming algorithms for combinatorial optimization,
such as algorithms for the set-covering problem (see, e.g., [BH80]),
we conducted a large-scale combinatorial study on the minimum test
set problem [MS85], one of the first such studies in
Computer Science to include both real-world and generated instances.
Other large-scale studies were published in the same time frame, most
notably the classic and exemplary study of simulated annealing by David
Johnson's group [JAMS89, JAMS91],
which, among other things, demonstrated the value of a varied collection
of test instances. The Second DIMACS Computational Challenge [JT96]
was devoted to satisfiability, graph coloring, and clique problems and
thus saw a large collection of results in this area. The ACM/SIAM Symposium
on Discrete Algorithms (SODA) has included a few such studies in each of
its dozen events to date, such as the study of cut algorithms by Chekuri
et al. [CaDRKLS97]. The Traveling Salesperson problem
has seen large numbers of experimental studies (including the well-publicized
study of Jon Bentley [Ben90]), made possible in part
by the development of a library of test cases [Rei94].
Graph coloring, whether in its NP-hard version of chromatic number
determination or in its much easier (yet still challenging) version of
planar graph coloring, has seen much work as well; the second study of
simulated annealing conducted by Johnson's group [JAMS89]
discussed many facets of the problem, while Morgenstern and Shapiro [MS91b]
provided a detailed study of algorithms to color planar graphs.
Understanding how a heuristic works to cut down on computational time
is generally too difficult to achieve through formal derivations; much
the same often goes for bounding the quality of approximations obtained
with many heuristics. Of course, we have many elegant results bounding
the worst-case performance of approximation algorithms, but many of
these bounds, even when attainable, are overly pessimistic for real-world
data. Yet both aspects are crucial in evaluating performance and in helping
us design better heuristics.
In the same vein, understanding when an exact algorithm runs quickly
is often too difficult for formal methods. It is much easier to characterize
the worst-case running time of an algorithm than to develop a classification
of input data in terms of a few parameters that suffice to predict the
actual running time in most cases. Experimentation can help us assess the
performance of an algorithm on real-world instances (a crucial point)
and develop at least ad hoc boundaries between instances where it runs
fast and instances that exhibit the exponential worst-case behavior.
5 Experimental Setup
How should an experimental study be conducted, once a topic has been
identified? Surely the most important criterion to keep in mind is that
an experiment is run either as a discovery tool or as a means to answer
specific questions. Experiments as explorations are common to all endeavors;
the setup is essentially arbitrary, and it should not be allowed to limit
one's creativity. We focus instead on experiments as means to answer specific
questions, the essence of the scientific method used in all physical
sciences. In this methodology, we begin by formulating a hypothesis or
a question, then set about gathering data to test or answer it, while ensuring
reproducibility and significance. In terms of experiments with algorithms,
these characteristics give rise to the following procedural rules,
though the reader should keep in mind that most researchers would mix the
two activities for quite a while before running their "final"
set of experiments:
 Begin the work with a clear set of objectives: which questions will
you be asking, which statements will you be testing?
 Once the experimental design is complete, simply gather data.
 Analyze the data to answer only the original objectives. (Later, consider
how a new cycle of experiments can improve your understanding.)
At all stages, we should beware of a number of potential pitfalls, including
various biases due to:
 The choice of machine (caching, addressing, data movement), of language
(register manipulation, built-in types), or of compiler (quality of
optimization and code generation).
 The quality of the coding (consistency and sophistication of programmers).
 The selection or generation of instances (we must use sufficient size
and variety to ensure significance).
 The method of analysis (many steps can be taken to improve the significance
of the results as well as to bring out trends).
Caching, in particular, may have very strong effects when comparing
efficient algorithms. For instance, in our study of MST algorithms, we
observed 3:1 ratios of running
time depending on the order in which the
adjacency lists were stored; in our study of sorting algorithms, we observed
a nonlinear running time for radix sort (contradicting the theoretical
analysis), which is a simple consequence of caching effects. Recent studies
by LaMarca and Ladner [LL96, LL97]
have quantified many aspects of caching and offer suggestions on how to
work around (or take advantage of) caching effects.
Johnson [Joh01] offers a detailed list of the various
problems he has observed in experimental studies, particularly those dealing
with heuristics for hard optimization problems. Most of these pitfalls
can be avoided with the type of routine care used by experimentalists in
any of the natural sciences. However, we should point out that confounding
factors can assume rather subtle forms. Knuth long ago pointed out curious
effects of apparently robust pseudorandom number generators (see [Knu98],
Vol. II); the creation of unexpected patterns as an artifact of a hidden
routine (or, in the case of timing studies, as an artifact of interactions
between the memory hierarchy and the code) could easily lead the experimenter
to hypothesize nonexistent relationships in the data. The problem is compounded
in complex model spaces, since obtaining a fair sampling of such a space
is always problematic. Thus it pays to go over the design of an experimental
study a few times just to assess its sensitivity to potential confounding
factors and then to examine the results with the same jaundiced eye.
6 What to Measure?
One of the key elements of an experiment is the metrology. What do we
measure, how do we measure it, and how do we ensure that measurements do
not interfere with the experiments? Obvious measures may include the value
of the solution (for heuristics and approximation algorithms), the running
time (for almost every study), the running space, etc. These measures are
indeed useful, but a good understanding of the algorithm is unlikely to
emerge from such global quantities alone. We also need structural measures
of various types (number of iterations; number of calls to a crucial subroutine;
etc.), if only to serve as a scale for determining such things as convergence
rates. Knuth [Knu93] has advocated the use of mems,
or memory references, as a structural substitute for running time. Other
authors have used the number of comparisons, the number of data moves (both
classical measures for sorting algorithms), the number of assignments,
etc. Most programming environments offer some type of profiler, a support
system that samples code execution at fixed intervals and sets up a profile
of where the execution time was spent (which routines used what percentage
of the CPU time) as well as of how much memory was used; with suitable
hardware support, profilers can also report caching statistics. Profiling
is invaluable in algorithm engineering: multiple cycles of profiling
and revising the most time-consuming routines can easily yield gains
of one to two orders of magnitude in running time.
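Counting mems requires only light instrumentation of the data structure under study; the sketch below, a hypothetical wrapper rather than Knuth's own tooling, counts element reads during a binary search:

```python
class MemCountingList:
    """Wraps a list and counts element reads ("mems"), a structural
    substitute for running time in the spirit of Knuth's proposal."""
    def __init__(self, data):
        self._data = list(data)
        self.mems = 0
    def __getitem__(self, i):
        self.mems += 1            # one memory reference per element read
        return self._data[i]
    def __len__(self):
        return len(self._data)

def binary_search(a, target):
    """Standard binary search; reads exactly one element per iteration."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        v = a[mid]
        if v == target:
            return mid
        if v < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

A search over 1024 elements should register on the order of log2(1024) = 10 mems, independent of machine and compiler, which is precisely the appeal of the measure.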
In our own experience, we have found that there is no substitute, when
evaluating competing algorithms for tractable problems, for measuring the
actual running time;
indeed, time and mems measurements, to take one example,
may lead one to entirely different conclusions. However, the obvious measures
are often the hardest to interpret as well as the hardest to measure accurately
and reproducibly. Running time, for instance, is influenced by caching,
which in turn is affected by any other running processes and thus effectively
not reproducible exactly. In the case of competing algorithms for tractable
problems, the running time is often extremely low (we can obtain a minimum
spanning tree for a sparse graph of a million vertices in much less
than a second on a typical desktop machine), so that the granularity of
the system clock may create problems; this is a case where it pays
to repeat the entire algorithm many times over on the same data, in order
to obtain running times with at least two digits of precision. In a similar
vein, measuring the quality of a solution can be quite difficult, due to
the fact that the optimal solution can be very closely approached on instances
of small to medium size or due to the fact that the solution is essentially
a zero-one decision (as in determining the chromatic index of a graph
or the primality of a number), where the appropriate measure is statistical
in nature (how often is the correct answer returned?) and thus requires
a very large number of test instances.
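The repetition technique for coping with clock granularity can be sketched as follows; the doubling schedule and the threshold value are illustrative choices, not a prescription:

```python
import time

def timed(fn, data, min_duration=0.2):
    """Time fn(data), repeating the whole run enough times that the
    total elapsed time comfortably exceeds the clock granularity,
    then report the per-run average."""
    reps = 1
    while True:
        start = time.perf_counter()
        for _ in range(reps):
            fn(data)
        elapsed = time.perf_counter() - start
        if elapsed >= min_duration:
            return elapsed / reps
        reps *= 2              # double until the measurement is meaningful
```

Averaging over many repetitions on the same data also dampens interference from other processes, though it cannot remove it entirely.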
7 How to Present and Analyze the Data
Perhaps the first requirement in data presentation is to ensure reproducibility
by other researchers: we need to describe in detail what instances were
used (how they were generated or collected), what measurements were collected
and how, and, preferably, where the reader can find all of this material
online. The data should then be analyzed with suitable statistical
methods. Since attaining levels of statistical significance may be quite
difficult in the large state spaces we commonly use, various techniques
to make the best use of available experiments should be applied (see McGeoch's
excellent survey [McG92] for a discussion of several
such methods). Cross-checking the measurements with any available theoretical
results, especially those that attempt to predict the actual running time
(such as the "equivalent code fragments" approach of [FM97]),
is crucial; any serious discrepancy needs to be investigated. Normalization
and scaling are a particularly important part of both analysis and presentation:
not only can they bring out trends not otherwise evident, but they can
help in filtering out noise and thus increasing the significance of the
results.
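One simple scaling technique of this kind is the doubling ratio: measure at sizes n, 2n, 4n, ..., and examine successive time ratios, which level off at values characteristic of the growth rate. A minimal sketch:

```python
def doubling_ratios(times):
    """Given times measured at sizes n, 2n, 4n, ..., return successive
    ratios T(2n)/T(n): about 2 for linear growth, slightly above 2 for
    n log n, and about 4 for quadratic growth."""
    return [b / a for a, b in zip(times, times[1:])]
```

Systematic drift in these ratios, rather than a plateau, is often the first visible sign of a caching effect or a wrong conjectured model.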
8 Conclusions
Implementation and experimentation should become once again the "gold
standard" in algorithm design, for several compelling reasons:
 Experimentation can lead to the establishment of well-tested and well-documented
libraries of routines and instances.
 Experimentation can bridge the gap between practitioner and theoretician.
 Experimentation can help theoreticians develop a deeper understanding
of existing algorithms and thus lead to new conjectures and new algorithms.
 Experimentation can point out areas where additional research is most
needed.
However, experimentation in algorithm design needs some methodological
development. While it can and, to a large extent, should seek inspiration
from the natural sciences, its different setting (a purely artificial one
in which the experimental procedure and the subject under test are unavoidably
mixed) requires at least extra precautions. Fortunately, a number of authors
have blazed what appears to be a good trail to follow; hallmarks of good
experiments include:
 Clearly defined goals;
 Large-scale testing, both in terms of a range of instance sizes
and in terms of the number of instances used at each size;
 A mix of real-world instances and generated instances, including
any significant test suites in existence;
 Clearly articulated parameters, including those defining artificial
instances, those governing the collection of data, and those establishing
the test environment (machines, compilers, etc.);
 Statistical analyses of the results and attempts at relating them to
the nature of the algorithms and test instances; and
 Public availability of instances and instance generators to allow other
researchers to run their algorithms on the same instances and, preferably,
public availability of the code for the algorithms themselves.
Acknowledgments
Bernard Moret's work was supported in part by the National Science Foundation
under grant ITR 0081404.
References
[ACVW01] L. Arge, J. Chase, J. S. Vitter, and R.
Wickremesinghe, Efficient sorting using registers and caches, Proc.
4th Workshop on Algorithm Eng. WAE 2000, Springer-Verlag, 2001, to appear
in LNCS.
[AMO93] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin,
Network flows, Prentice-Hall, Englewood Cliffs, NJ, 1993.
[Ben82] J. L. Bentley, Writing efficient programs,
Prentice-Hall, Englewood Cliffs, NJ, 1982.
[Ben83] J. L. Bentley, Programming pearls: cracking
the oyster, Commun. ACM 26 (1983), no. 8, 549–552.
[Ben90] J. L. Bentley, Experiments on geometric
traveling salesman heuristics, Report CS TR 151, AT&T Bell Laboratories,
1990.
[Ben99] J. L. Bentley, Programming pearls,
ACM Press, New York, 1999.
[BFG+00] R. E. Bixby, M. Fenelon, Z. Gu, E. Rothberg,
and R. Wunderling, MIP: Theory and practice - closing the gap,
System Modelling and Optimization: Methods, Theory and Applications (M.
J. D. Powell and S. Scholtes, eds.), Kluwer Acad. Pub., 2000, pp. 19–49.
[BH80] E. Balas and A. Ho, Set covering algorithms
using cutting planes, heuristics, and subgradient optimization: A computational
study, Math. Progr. 12 (1980), 37–60.
[BMQ98] J. R. Black, C. U. Martel, and H. Qi, Graph
and hashing algorithms for modern architectures: design and performance,
Proc. 2nd Workshop on Algorithm Eng. WAE 98, Max-Planck-Inst. für
Informatik, 1998, in TR MPI-I-98-1-019, pp. 37–48.
[CaDRKLS97] C. S. Chekuri, A. V. Goldberg,
D. R. Karger, M. S. Levine, and C. Stein, Experimental study of minimum
cut algorithms, Proc. 8th ACM/SIAM Symp. on Discrete Algs. SODA 97,
SIAM Press, 1997, pp. 324–333.
[CG97] B. V. Cherkassky and A. V. Goldberg, On
implementing the push-relabel method for the maximum flow problem,
Algorithmica 19 (1997), 390–410.
[CGM+98] B. V. Cherkassky, A. V. Goldberg, P.
Martin, J. C. Setubal, and J. Stolfi, Augment or push: a computational
study of bipartite matching and unit-capacity flow algorithms, ACM J.
Exp. Algorithmics 3 (1998), no. 8, www.jea.acm.org/1998/CherkasskyAugment/.
[CGR96] B. V. Cherkassky, A. V. Goldberg, and T.
Radzik, Shortest paths algorithms: theory and experimental evaluation,
Math. Progr. 73 (1996), 129–174.
[DS85] S. P. Dandamudi and P. G. Sorenson, An
empirical performance comparison of some variations of the k-d tree
and BD-tree, Int'l J. Computer and Inf. Sciences 14 (1985),
no. 3, 134–158.
[ERS90] N. Eiron, M. Rodeh, and I. Steinwarts, Matrix
multiplication: a case study of enhanced data cache utilization, ACM
J. Exp. Algorithmics 4 (1999), no. 3, www.jea.acm.org/1999/EironMatrix/.
[Fel99] M. Fellows, 1999, private communication.
[FM97] U. Finkler and K. Mehlhorn, Runtime prediction
of real programs on real machines, Proc. 8th ACM/SIAM Symp. on Discrete
Algs. SODA 97, SIAM Press, 1997, pp. 380–389.
[GT01] A. V. Goldberg and K. Tsioutsiouliklis, Cut
tree algorithms: an experimental study, J. Algs. 38 (2001), no. 1,
51–83.
[JAMS89] D. S. Johnson, C. R. Aragon, L. A. McGeoch,
and C. Schevon, Optimization by simulated annealing: an experimental
evaluation, part I: graph partitioning, Operations Research 37 (1989), 865–892.
[JAMS91] D. S. Johnson, C. R. Aragon, L. A. McGeoch,
and C. Schevon, Optimization by simulated annealing: an experimental
evaluation, part II: graph coloring and number partitioning, Operations Research
39 (1991), 378–406.
[JM93] D. S. Johnson and C. C. McGeoch (eds.), Network
flows and matching: First DIMACS implementation challenge, DIMACS Series
in Discrete Mathematics and Theoretical Computer Science, vol.
12, Amer. Math. Soc., 1993.
[Joh01] D. S. Johnson, A theoretician's guide
to the experimental analysis of algorithms, DIMACS Series in Discrete
Mathematics and Theoretical Computer Science, Amer. Math. Soc., 2001, to
appear.
[Jon86] D. W. Jones, An empirical comparison
of priority queues and event-set implementations, Commun. ACM 29
(1986), 300–311.
[JT96] D. S. Johnson and M. Trick (eds.), Cliques, coloring,
and satisfiability: Second DIMACS implementation challenge, DIMACS Series
in Discrete Mathematics and Theoretical Computer Science, vol. 26,
Amer. Math. Soc., 1996.
[Knu93] D. E. Knuth, The Stanford GraphBase:
A platform for combinatorial computing, Addison-Wesley, Reading,
Mass., 1993.
[Knu98] D. E. Knuth, The art of computer programming,
vols. I (3rd ed.), II (3rd ed.), and III (2nd ed.), Addison-Wesley,
Reading, Mass., 1997, 1997, and 1998.
[LL96] A. LaMarca and R. Ladner, The influence
of caches on the performance of heaps, ACM J. Exp. Algorithmics 1 (1996),
www.jea.acm.org/1996/LaMarcaInfluence/.
[LL97] A. LaMarca and R. Ladner, The influence
of caches on the performance of sorting, Proc. 8th ACM/SIAM Symp. on
Discrete Algs. SODA 97, SIAM Press, 1997, pp. 370–379.
[McG92] C. C. McGeoch, Analysis of algorithms
by simulation: variance reduction techniques and simulation speedups,
ACM Comput. Surveys 24 (1992), 195–212.
[MN95] K. Mehlhorn and S. Näher, LEDA, a
platform for combinatorial and geometric computing, Commun. ACM 38
(1995), 96–102.
[MN99] K. Mehlhorn and S. Näher, The LEDA
platform for combinatorial and geometric computing, Cambridge U. Press,
Cambridge, UK, 1999.
[MS85] B. M. E. Moret and H. D. Shapiro, On minimizing
a set of tests, SIAM J. Scientific & Statistical Comput. 6
(1985), 983–1003.
[MS91a] B. M. E. Moret and H. D. Shapiro, Algorithms
from P to NP, volume I: Design and efficiency, Benjamin-Cummings
Publishing Co., Menlo Park, CA, 1991.
[MS91b] C. Morgenstern and H. D. Shapiro, Heuristics
for rapidly four-coloring large planar graphs, Algorithmica 6
(1991), 869–891.
[MS94] B. M. E. Moret and H. D. Shapiro, An empirical
assessment of algorithms for constructing a minimal spanning tree,
DIMACS Series in Discrete Mathematics and Theoretical Computer Science
(N. Dean and G. Shannon, eds.), vol. 15, Amer. Math. Soc., 1994, pp. 99–117.
[MWB+01] B. M. E. Moret, S. K. Wyman, D. A. Bader,
T. Warnow, and M. Yan, A new implementation and detailed study of breakpoint
analysis, Proc. 6th Pacific Symp. Biocomputing PSB 2001, World Scientific
Pub., 2001, pp. 583–594.
[Rei94] G. Reinelt, The traveling salesman: Computational
solutions for TSP applications, Springer-Verlag, Berlin, 1994, in LNCS
840.
[RR99] N. Rahman and R. Raman, Analysing cache
effects in distribution sorting, Proc. 3rd Workshop on Algorithm Eng.
WAE 99 (Berlin), Springer-Verlag, 1999, in LNCS 1668, pp. 183–197.
[RS85] N. Robertson and P. Seymour, Graph minors:
a survey, Surveys in Combinatorics (I. Anderson, ed.), Cambridge
U. Press, Cambridge, UK, 1985, pp. 153–171.
[San00] P. Sanders, Fast priority queues for
cached memory, ACM J. Exp. Algorithmics 5 (2000), no. 7, www.jea.acm.org/2000/SandersPriority/.
[SV87] J. T. Stasko and J. S. Vitter, Pairing
heaps: experiments and analysis, Commun. ACM 30 (1987), 234–249.
[Vit01] J. S. Vitter, External memory algorithms
and data structures: dealing with massive data, ACM Comput. Surveys
(2001), to appear, available at www.cs.duke.edu/~jsv/Papers/Vit.IO_survey.ps.gz.
[XZK00] L. Xiao, X. Zhang, and S. Kubricht, Improving
memory performance of sorting algorithms, ACM J. Exp. Algorithmics
5 (2000), no. 3, www.jea.acm.org/2000/XiaoMemory/.
