Abstract: A molecular computational procedure in which manipulation of DNA strands may be harnessed to solve a classical problem in NP - the directed Hamiltonian path problem - was recently proposed [Adleman 1994], [Gifford 1994]. The procedure is in effect a massively parallel chemical analog computer and has a computational capacity corresponding to approximately CPU years on a typical 10 MFLOP workstation. In this paper limitations on the potential scalability of molecular computation are considered. A simple analysis of the time complexity function shows that the potential of molecular systems to increase the size of generally solvable problems in NP is fundamentally limited to . Over the chemically measurable picomolar to molar concentration range the greatest practical increase in problem size is limited to

Key Words: Molecular Computation, DNA, NP-problem

1 Introduction

The (directed) Hamiltonian Path Problem (HPP), first proposed by William Rowan Hamilton, is one of the most troublesome problems in Graph theory. Very similar to the Euler path problem, it may be stated as follows: for a graph G with vertices , and edges , is there a path which starting and ending at specified vertices, visits each remaining vertex once and only once?

A trial and error approach is theoretically possible but proves impossible in practice for all but the smallest graphs; the number of paths which it is necessary to check simply grows too rapidly. In fact the HPP problem belongs to a class of problems called NP-complete, which rapidly become too complex to be solved by standard (deterministic) computers.

Problems of this nature are of considerable importance in computer science. There was therefore considerable interest in a recent report by Adleman which suggested that modern laboratory techniques of molecular biology could be employed to manipulate strands of DNA so as to solve the HPP [Adleman 1994]. A subsequent study has suggested how the method may be adapted for the solution of another classical problem in NP - the "satisfiability" or SAT problem [Lipton 1995].

Page 87

Adleman's insight lay in exploiting the parallelism inherent in chemical systems to yield a massively parallel analog computer, in a brute force approach to solving complex problems in NP. However, while it was noted that chemical systems - and thus molecular computers - can be easily scaled in size, the fundamental question of the scalability of problems solvable by molecular computation was left open. This paper addresses this central issue and, because the discussion draws on the diverse disciplines of molecular biology and computer science, begins with a brief review of both problem classification and Adleman's experiment.

1.1 Classification of Problem Complexity

Time complexity functions describe how the cost of solving a problem grows on increasing the size (or input length), n, of the problem. Computer scientists recognize two distinct classes [Garey and Johnson 1979]. Algorithms for which the time complexity function is O(p(n)), where p(n) is a polynomial in n, are termed polynomial time algorithms, whereas algorithms with no polynomial bound on their complexity are called exponential time algorithms. There is a remarkable difference in growth rates between polynomial and exponential algorithms [Fig. 1].

Figure 1: Time complexity functions for some polynomial (n, ) and exponential () functions as a function of problem size, n.

Page 88

Problems with polynomial algorithms are considered "good", whereas problems which are so difficult that no polynomial time algorithm can solve them are termed "intractable".

The theory of NP-completeness is concerned with decision problems. Informally, a decision problem is said to belong to the class P if there exists a Deterministic Turing Machine (DTM) program which solves the problem in polynomial time. Non-deterministic computation differs from deterministic computation in that a solution is exhaustively guessed and then checked. If a correct solution to a decision problem can be checked in polynomial time, then that problem is said to belong to the class NP (Non-deterministic Polynomial time). Since problems which can be solved deterministically in polynomial time can also be guessed non-deterministically and checked in polynomial time, we can write P NP [Fig. 2]. It is not known whether this inclusion is proper, that is whether NP\P is occupied.

A practical difficulty with problems in NP is that their time-complexity with a deterministic algorithm is exponential. Thus problems in NP are intractable with deterministic machines.

Certain problems in NP have the property that all other problems in NP may be polynomially reduced to them, and are termed NP-complete. If a polynomial time algorithm can be found for any member of this equivalence class then one can be found for all problems in NP. In other words, if for any (problem) NP-complete, then P if and only if P = NP. Problems NP-complete may be regarded as the hardest problems in NP. The relationship between P, NP and NP-complete is illustrated below

Figure 2: Classification of problem complexity

Not surprisingly the class of NP-complete problems has been the focus of considerable attention; including some recent ideas on "Quantum Computers" [DiVincenzo 1995] which could in principle solve NP problems in polynomial time. Several classical problems such as the Hamiltonian Path problem, the satisfiability problem have been shown to be NP-complete. For a more complete

Page 89

discussion of these points the reader is referred to the work of Garey and Johnson [Garey and Johnson 1979].

1.2 Molecular Computation and the HPP

The following is a simple non-deterministic algorithm for solving the HPP [Adleman 1994]:

(Step I) generate random paths in the graph
(Step II) keep only those paths with begin and end at the specified vertices.
(Step III) for an n vertex graph keep only those paths with n vertices.
(Step IV) keep only those paths which visit each vertex once.
(Step V) if any paths remain then those paths are Hamiltonian and the answer to the HPP is "Yes". Otherwise "No".

Adleman demonstrated, using a graph small enough to be solved by visual inspection [Fig. 3], that DNA could be used to solve the Hamiltonian Path Problem ( NP-complete).

Figure 3: The directed graph used by Adleman. For and the edges 0-1, 1-2, 2-3, 3-4, 4-5, 5-6, define a Hamiltonian path.

20-mer oligonucleotides were associated with each vertex in the graph. Oligonucleotides representing edges , linking vertices and , consisted of the 3' 10-mer of and the 5' 10-mer of [Fig. 4]. The highly specific binding of nucleotides A with T and C with G are such that strands which are Watson-Crick complementary to , denoted , serve to link compatible edges [Fig. 4]. Thus, ligation reactions can only generate nucleotide polymers corresponding to random paths on the graph.

Page 90

Step I is the most critical step in the algorithm as it is essential that all paths be explored, a task with exponential time complexity on a deterministic machine. Adleman addressed this by employing 50 picomole quantities of nucleotides corresponding to edges and the Watson-Crick complement of vertices. While 50 picomole quantities are minute in chemical terms, the parallelism of chemical systems is such that they correspond to copies of each edge and vertex - presumed sufficient to ensure that polymers corresponding to all paths in the graph are generated.

Figure 4: (a) DNA strand corresponding to vertex . (b) Watson-Crick complement of vertex . (c) strands corresponding to edges and . Nucleotide specificity in binding (A-T and C-G) ensures that vertex serves to bind edges and .

The remaining steps (II-V) in the algorithm were executed using standard laboratory techniques. Polymers corresponding to paths between the specified starting vertex and finishing vertex were selected (Step II) and amplified (i.e. concentration increased) by using the polymerase chain reaction (PCR). Argose gel electrophoresis was used to identify those polymers corresponding to paths involving n (= 7) vertices (Step III), and the product was again amplified using the PCR. Step IV was implemented using affinity purification for oligonucleotide sequences corresponding to vertices 1, 2, 3, 4, 5 and 6. Any remaining polymer must correspond to a Hamiltonian path, and a suitably primed PCR was employed to detect and amplify any such polymer.

2. Scalability of Molecular Computation

2.1 Computational Capacity of Molecular Computers

In Adleman's original paper the question of the scalability of problems solvable by molecular computation was left open [Adleman 1994]. The system contained 14 edges, each represented by 50 picomoles of nucleotide, corresponding to a total of 5 x edge nucleotides. Associating each nucleotide binding with a computation, Adleman's molecular system yielded a total of

Page 91

computations. Concentrations could easily be scaled to yield computations [Adleman 1994], which may be easily shown to correspond to CPU years on a typical 10 MFLOP workstation.

Given this phenomenal computational capacity, and the ease with which chemical systems can be scaled, it would be easy for the non-specialist to conclude that the potential for molecular computation is practically unlimited. This is not the case; [Lineal and Lineal 1995], [Bunow 1995] and [Lipton 1995] have identified problems in the potential scalability of Adleman's method. The following analysis quantifies these limitations.

Problems in NP, among them the directed Hamiltonian Path problem, have a time complexity of , where n is the ''size`` of the problem and p(n) a suitably defined polynomial in n [Garey and Johnson 1979]. Taking the best case situation, p(n) = n, the cost of solving a problem of size using a given computational resource is . Increasing the scale of the molecular system by a factor allows for the solution of larger problems, the size of which, , is given by solving the equality

and the increase in the size of solvable problem, , is given by

2.2 Size of Molecular Computers

The limitations expressed in equation (2) on the capacity of molecular computers to solve problems in NP are forcefully illustrated by considering limits on the scale of chemical systems. In practice, electroanalytical techniques allow for the detection of chemicals in the picomolar range [Dong and Wang 1988]. However, if we set aside this technological limit, and assume that individual molecules can in principle be detected (as in scanning tunnelling microscopy), then the smallest possible molecular computers would consist of order unity molecules.

The largest conceivable molecular computer would involve all particles in the universe. Of course, the amount of matter in the universe is not known, but it may be reasonably estimated as follows: Taking Hubbel's constant as [Kaufmann 1994] the age of the universe is 1.3 x years or 4.1 x seconds. Expanding at the speed of light this gives the volume of the universe as 7.9 x . Further assuming a universe with critical density, [Kaufmann 1994], i.e. a universe which has just sufficient matter to be closed, the mass of the universe may be estimated as 8.7 x . This is the equivalent of particles of mass 1 a.m.u. (taking Avogadro's number as 6.02 x ).

Page 92

Assuming therefore for the purposes of illustration that the universe contains particles, it follows that the size of molecular systems can span a maximum 80 decades, and the greatest possible increase in solvable problem size is given by equation (2) as

Thus the potential to increase the size of problem (in NP) which may be solved, by increasing the size of the chemical system, is very modest indeed, even when fantastic scales are involved.

More relevantly, over the realizable/detectable picomolar to molar concentration range we have

Thus while the size of molecular computers can be readily increased, the increase in the size of problems which such systems could solve is much more modest. The scalability of problem size with the size of molecular system employed is illustrated in Fig. 5.

Page 93

Figure 5: Increase in solvable problem size with increase in the size of the molecular computer (relative to the size of problem solvable with a system of order unity molecules).

3 Conclusion

The potential of molecular systems to increase the size of solvable problems in NP is fundamentally limited to the order of . Nevertheless, the computational capacity of Adleman's approach is enormous, equivalent to CPU years on a 10 MFLOP workstation. The analysis is general and based on the time complexity function, which gives the time in which a solution is guaranteed to be found, and is therefore a worst possible case analysis; faster solutions may be available in many cases. It is not therefore suggested that there will be no instances where molecular computers will prove superior to electronic computers.

References

[Adleman 1994] Adleman, L.: "Molecular Computation of Solutions to Combinatorial Problems"; Science, 266, (1994), 1021-1024.

[Bunow 1995] Bunow, B.: "On the Potential of Molecular Computing"; Science, 268, (1995), 481-482.

[DiVincenzo 1995] DiVincenzo, D. P.: "Two-Bit Gates are Universal for Quantum Computation", Phys. Rev. A, 51, (1995), 1015-1022.

[Dong and Wang 1988] Dong, S., Wang, Y.: "A Nafion/Crown Ether Film Electrode and its Application in the Anoidic Stripping Voltametric Determination of Traces of Silver"; Anal. Chim. Acta, 212, (1988), 341-347.

[Garey and Johnson 1979] Garey, M. R., Johnson, D. S.: "Computers and Intractability"; Freeman, New York, (1979).

[Gifford 1994] Gifford, D. K.: On the Path to Computation with DNA, Science, 266, 1994, 993-994.

[Lineal and Lineal 1995] Lineal, M., Lineal, N.: "On the Potential of Molecular Computing", Science, 268, (1995), 481.

[Lipton 1995] Lipton, R. J.: "DNA solution of Hard Combinatorial Problems", Science, 268, (1995), 542-545.

[Kaufmann 1994] Kaufmann, W. J.: "Universe"; Freeman, New York, (1994).

Acknowledgments

The author would like to thank Dr. T. Tuohy (Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati, Ohio), Dr. J. Greer

Page 94

(Department of Chemistry, Trinity College Dublin) and Dr. H. Gibbons (Department of Computer Science, Trinity College Dublin), for useful discussions.

Page 95