# Hardware Implementation of an Efficient Correlator for Interleaved Complementary Sets of Sequences

# María del Carmen Pérez, Jesús Ureña, Álvaro Hernández, Carlos De Marziani, Ana Jiménez

(Dept. of Electronics, University of Alcalá, Spain carmen@depeca.uah.es; urena@depeca.uah.es; alvaro@depeca.uah.es; marziani@depeca.uah.es; ajimenez@depeca.uah.es)

# William P. Marnane

(Dept. of Electrical & Electronic Engineering, University College Cork, Ireland marnane@ucc.ie)

**Abstract:** Some sensor systems are characterized by multiple simultaneous aperiodic emissions, with low signal-to-noise ratio and asynchronous detection. In these systems, complementary sets of sequences can be used to encode emissions, due to their suitable auto-correlation and cross-correlation properties. The transmission of a complementary set can be accomplished by interleaving the sequences of the set, generating a macro-sequence which is easily transmitted by a BPSK modulation. The detection of the macrosequence can be performed by means of efficient correlation algorithms with a notably decreased computational load and hardware complexity. This work presents a new hardware design in configurable logic of an efficient correlator for macrosequences generated from complementary sets of sequences. A generic implementation has been achieved, so the configuration can be changed according to the requirements of the application. The developed correlator has been tested in an ultrasonic pulse compression system in which real-time is needed. However, it is applicable in any multi-sensor or communication system where the goal is to make simultaneous emissions from different independent sources, minimizing mutual interference.

**Key Words:** Compression techniques, Complementary Sets of Sequences, efficient correlation, FPGA-based implementation, Ultrasonics.

Category: B.4.0, B.5.1, B.5.2

## 1 Introduction

Many sensor systems based on time-of-flight measurements, for example ultrasonic systems, have incorporated signal processing techniques previously used in radar theory [Audenaert et al. 1992] [Jörg and Berg 1998]. In these systems, signals are encoded with binary sequences with similar properties to Gaussian noise. Afterwards, the corresponding echoes are detected by means of correlation with the original binary sequence. Thus, an improvement of temporal precision, spatial resolution and robustness to noise is obtained, in comparison with methods based on threshold detection. In the case of simultaneous multi-emission systems, with non-periodic emissions and asynchronous detection, sequences with a high Auto-Correlation (AC) function and low sidelobes should be used, as well as with low aperiodic Cross-Correlation (CC) values among them. As a result, it is possible to detect them even in the case of high noise or attenuation with minimum interference.

Different sequences are used to encode the transducer emissions of these systems: pseudo-random sequences [Sarwate and Pursley 1980], Walsh-Hadamard [Harmuth 1969], Barker codes [Golomb and Scholtz 1965], Golay codes [Golay 1961] or Complementary Sets of M sequences (M-CSS) [Tseng and Liu 1972]. M-CSS are characterized by the ideal properties of the sum of their AC and CC functions: they provide a high process gain and also M mutually orthogonal sets. Therefore, M signals can be simultaneously emitted and receivers are able to distinguish them without interference. The problem is how to emit the M-CSS assigned to every transducer without a big change in their ideal characteristics.

There exist several methods of transmitting M-CSS [Álvarez et al. 2004]. A simple one consists of interleaving the bits from the sequences of complementary sets, obtaining a longer sequence, called *macro-sequence*. Thus, all changes in the environment have the same effect on the M sequences of the set. This is important since the detection capability of the system is based on the complementary property of sequences. Furthermore, efficient algorithms for M-CSS processing have been developed [De Marziani et al. 2005], reducing their computational cost and hardware complexity, in comparison with those required in straightforward matched filter implementations [Hahm et al. 1997].

This work presents a generic implementation in Field Programable Gate Arrays (FPGAs) of an efficient correlator of interleaved M-CSS, and its application to an ultrasonic pulse compression system. The rest of the paper is organized as follows: Section 2 reviews the properties that characterize the M-CSS, and details the transmission method based on the interleaving of the sequences of the set. Section 3 presents the efficient correlator of macro-sequences. The implementation of this system is considered in Section 4. Section 5 shows an ultrasonic detection system prototype based on the developed correlator. Finally, conclusions are outlined in Section 6.

#### 2 Encoding scheme based on interleaved *M*-CSS

#### 2.1 Complementary sets of sequences

A set of M equally long binary sequences  $(S_{i,M}[k] = [x_{i,1} x_{i,2} \cdots x_{i,L}]; 1 \leq i \leq M)$  has been defined, where  $x_{i,j} = \pm 1$  are the elements of the sequences,  $L = M^N$  is their length, and  $M = 2^m$ , with N and  $m \in \mathbb{N} - \{0\}$ . This set is said to be complementary if the Addition of the Auto-Correlation (AAC) functions

of the sequences from that set is zero for all non-zero time shifts (1).

$$AAC = \sum_{i=1}^{M} \phi_{S_{i,M}S_{i,M}}[k] = \phi_{S_{1,M}S_{1,M}}[k] + \dots + \phi_{S_{M,M}S_{M,M}}[k]$$
$$= M \cdot L \cdot \delta[k] = M \cdot M^{N} \cdot \delta[k] = M^{N+1} \cdot \delta[k]$$
(1)

Where  $\delta[k]$  is a Krönecker delta function; and  $\phi_{x,y}$  is the correlation between x and y.

Furthermore, it is possible to obtain M mutually orthogonal sets. Complementary sets of sequences are mutually orthogonal if the Addition of Cross-Correlation (ACC) functions from the corresponding sequences of any two sets is zero with any shift. Consider two different orthogonal sets, with sequences  $(S'_{i,M}[k], S''_{i,M}[k]; 1 \le i \le M)$  respectively, the ACC is (2):

$$ACC = \phi_{S'_{1,M}S''_{1,M}}[k] + \dots + \phi_{S'_{M,M}S''_{M,M}}[k] = 0 \qquad \forall k \tag{2}$$

#### 2.2 Transmission method by interleaving

In many systems, transducers used to transmit and receive the encoded signals have limited bandwidths. Therefore, it can be necessary to modulate the sequences in order to have their spectra at the region of maximum response of trasducers. In asynchronous detection, the process gain increase, provided by multi-level modulation techniques, is reduced because of the lower energy transmitted per bit [Álvarez et al. 2004]. Also, if the number M of sequences is high, the asynchronous detection scheme becomes more complex. Thus, a BPSK modulation has been adopted. It is desirable that all the sequences of the set be equally affected by changes in the environment, in order not to lose the complementary properties among them. A transmission method based on the construction of macro-sequences by interleaving sequences of complementary sets has been carried out.

In order to construct a macro-sequence Ms[k] of length  $M \cdot L$ , the bits  $x_{i,j}$  that form the sequences  $S_{i,M}$  have been grouped by interleaving as in (3), where  $\otimes$  is the interleaving operation.

$$S_{1,M} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,L} \end{bmatrix} \\ S_{2,M} = \begin{bmatrix} x_{2,1} & x_{2,2} & \cdots & x_{2,L} \end{bmatrix} \\ \vdots = \begin{bmatrix} \vdots & \vdots & \cdots & \vdots \\ x_{M,1} & x_{M,2} & \cdots & x_{M,L} \end{bmatrix} \} \Rightarrow Ms = \begin{bmatrix} S_{1,M} \otimes S_{2,M} \otimes \cdots \otimes S_{M,M} \end{bmatrix} \\ = \begin{bmatrix} x_{1,1} & \cdots & x_{M,1} & \cdots & x_{1,L} & \cdots & x_{M,L} \end{bmatrix}$$

$$(3)$$

After interleaving, the ideal AC and CC properties of the M-CSS are degraded: neither the AC sidelobes, nor the CC values for two orthogonal sets are null anymore. A method to reduce this degradation is explained in [De Marziani et al. 2006]. It consists in choosing macro-sequences with better AC and CC properties, and in applying a post-processing algorithm. As a result, the non-ideal effects in correlation are minimized, and the interleaving method still assures the reliability of the detection system, even when operating in noisy environments.

# 3 Efficient correlator for interleaved *M*-CSS

# **3.1** Efficient correlation algorithm for complementary sets of sequences

In [De Marziani et al. 2005] a recursive method is developed to allow the construction of an efficient correlator for *M*-CSS, called *M*-ESSC. It can be considered as a digital filter of *N* similar stages. This filter simultaneously performs the correlation of the input signal with the *M* sequences in the set, reducing the total number of operations required by the correlation. Thus, the *M*-ESSC carries out  $3 \cdot 2^{m-1} \cdot m \cdot N$  operations, whereas a straightforward implementation of the correlator computes  $2^m \cdot (2^{mN+1}-1)$  operations. Thanks to this reduction in the number of operations, the *M*-ESSC can be implemented in configurable hardware, achieving real-time operation.

The efficient correlator is composed of a set of adders, subtractors and delay elements. Figure 1 shows the implementation scheme of an M-ESSC for the case of 2-CSS (a), 4-CSS (b), and finally, for the general case of M-CSS (c). As can be observed in Figure 1.c, the M-ESSC mainly consists of repeating the architecture of the M/2-ESSC. However, some modifications have to be considered with every iteration:

- Firstly, coefficients  $\{(\frac{M}{2}-1), (\frac{M}{2}-2), \cdots, 1\}$  multiplying delay elements  $D_n$  of the repeated structure are changed by coefficients  $\{(M-1), \cdots, (\frac{M}{2}+1), (\frac{M}{2})\}$ .  $D_n$  is a positive delay defined as  $D_n = 2^{Pn}$ , where Pn is any permutation of numbers  $\{0, 1, 2, \cdots, N-1\}$ .
- Secondly, it is necessary to add a set of multipliers  $\{w_{i,k}, 1 \leq i \leq m, 1 \leq k \leq N\}$  at the output of the initial structure. These multiplication coefficients  $w_{i,k}$  have values +1 or -1, and represent the generation seed  $W = [w_{m,N}w_{m-1,N}\cdots w_{m,N-1}w_{m-1,N-1}\cdots w_{1,1}]$  of the complementary set of sequences. Since these coefficients  $w_{i,k}$  are binary, multiply any signal by them is reduced to negate or not the signal. In practice, the value of these coefficients determine the final configuration of the adders and subtractors in the stage.
- A new set of adders and subtractors should be incorporated. Notice that adders and subtractors in the initial structure have the sign inverted.



Figure 1: Efficient correlator structures for *M*-CSS [De Marziani et al. 2005].

- Finally, it is necessary to reorder outputs  $\{O_{i,M}; 1 \leq i \leq M\}$  from every stage in the correlator, before connecting them to the next stage, since there does not exist a direct correspondence. In [De Marziani et al. 2005] a general recursive method is proposed to determine the output order.

This architecture is not suitable for a generic hardware implementation. The internal links in every stage and the order of the sequences at the output of every stage, differ depending on the number of sequences in the set. Furthermore, it does not take into account the interleaving transmission scheme. Therefore, it is necessary to carry out the following modifications:

- Transformation of the *M*-ESSC architecture into a regular and modular one, so it can be easily adapted to the different values of *M*.
- Adaptation of the architecture for the interleaving-based transmission method.

#### 3.2 Transformation of *M*-ESSC structure

\_\_\_\_

The first proposed transformation exploits the similarities between the structure of the M-ESSC (see Figure 2.a) and the structure of the decimation in frequency FFT (*Fast Fourier Transform*) algorithm (see Figure 2.b); so the properties of the FFT can be used to make the structure of the M-ESSC regular.

It is possible to arrange the decimation in frequency FFT algorithm, so the same geometry is obtained at every stage, allowing sequential data access and storage [Oppenhein et al. 1999]. With this purpose, the input should be organized in a bit-reversed order: if  $(n_2 n_1 n_0)$  represents the index n of the input sequence x[n] in binary form, the sample  $x[n_2 n_1 n_0]$  should appear at location  $(n_0 n_1 n_2)$  after reordering. In this way, the sample x[3] should appear at position 6, as can be observed in Table 1.

| $n_2 n_1 n_0 \Rightarrow \text{Bit reversed} \Rightarrow n_0 n_1 n_0$ |               |  |  |
|-----------------------------------------------------------------------|---------------|--|--|
| $(0 \ 0 \ 0)$                                                         | $(0 \ 0 \ 0)$ |  |  |
| $(0 \ 0 \ 1)$                                                         | $(1 \ 0 \ 0)$ |  |  |
| $(0\ 1\ 0)$                                                           | $(0\ 1\ 0)$   |  |  |
| $(0\ 1\ 1)$                                                           | $(1\ 1\ 0)$   |  |  |
| $(1 \ 0 \ 0)$                                                         | $(0 \ 0 \ 1)$ |  |  |
| $(1 \ 0 \ 1)$                                                         | $(1 \ 0 \ 1)$ |  |  |
| $(1 \ 1 \ 0)$                                                         | $(0\ 1\ 1)$   |  |  |
| $(1 \ 1 \ 1)$                                                         | $(1 \ 1 \ 1)$ |  |  |

 Table 1: Bit-reversed order algorithm.

If the same technique is used to rearrange data in the *M*-ESSC, a regular distribution of the internal links for every stage is obtained, and the order of the outputs can be easily achieved. Thus, coefficients multiplying delays  $D_n$  should be organized according to the bit-reversed order algorithm. In this way, the delay  $3 \cdot D_n$  appears at the location previously occupied by the delay  $6 \cdot D_n$ . As an illustration of this transformation, an example with an 8-ESSC is shown in Figure 3. It can be observed that, after the transformation, the internal connections between sub-stages are always the same, independently of the number M



Figure 2: Similarities between *M*-ESSC and FFT.

of sequences in the set. Furthermore, the stage outputs are always ordered in the same way. Therefore, the architecture of the generic correlator has a regular data flow, allowing a more efficient hardware implementation.



(b) Rearrangement of Figure 3.a, with the same geometry for every sub-stage.

Stage N

Figure 3: Transformation of the M-ESSC structure proposed in [De Marziani et al. 2005].

# 3.3 Algorithm adaptation to the transmission scheme by interleaving

Stage 1

Since a macro-sequence is generated by interleaving the M sequences of the complementary set, the correlation can not be carried out directly with the sequences of the set, but with the interpolated versions of these sequences. The distance between two bits from the same sequence in the transmitted macro-sequence is M bits, so the interpolation factor is equal to M. The received signal is demodulated to extract the emitted macro-sequence. The demodulation is carried out asynchronously, by firstly sampling the received signal at a high acquisition rate, and then correlating it with the symbol used in the modulation. Let  $O_f = \frac{f_a (acquisition frequency)}{f_e (emission frequency)}$  be the over-sampling factor, and  $N_{SM}$  the number of carrier periods in the modulating symbol. The bits corresponding to a sequence  $S_{i,M}$  are obtained every  $M \cdot O_f \cdot N_{SM}$  samples, so it is necessary



Figure 4: Block diagram of the detection of a macro-sequence using a Ms-ESSC.

to decimate the demodulated signal by the same factor prior to computing the correlations. This decimation can be easily achieved by multiplying the delays  $D_n$  from every stage by  $M \cdot O_f \cdot N_{SM}$ .

One advantage associated with this transmission scheme is that only one correlator is necessary to detect the signal from every emitter. The *M*-ESSC adapted to interleaving (hereafter *Ms*-ESSC) simultaneously performs the correlation of the input signal r[k] with the *M* sequences of the set, providing the AC  $\phi_{rS_{1,M}}$  with the first sequence of the set at the first branch of the *Ms*-ESSC, the AC  $\phi_{rS_{2,M}}$  with the second sequence of the set at the second branch, and so on. Nevertheless, these ACs are not in phase, so, it is necessary to insert delays  $D_o$  at every branch of the *Ms*-ESSC, in order to compute the in-phase addition of the AC functions (4).

$$\phi_{rMs}[k] = \phi_{rS_{1,M}}[k - (M - 1) \cdot D_o] + \dots + \phi_{rS_{M-1,M}}[k - D_o] + \phi_{rS_{M,M}}[k]$$
$$= \sum_{i=1}^{M} \phi_{rS_{i,M}}[k - (M - i) \cdot D_o]$$
(4)

Where  $D_o$  is a delay depending on the over-sampling factor  $O_f$ , and on the number  $N_{SM}$  of carrier periods in the modulating symbol:  $D_o = O_f \cdot N_{SM}$ . Figure 4 shows the detection scheme of a macro-sequence using *Ms*-ESSC.

# 4 Hardware implementation of the efficient correlator for interleaved M-CSS

#### 4.1 Design strategy

The design of the efficient correlator for interleaved M-CSS has been based on five generic parameters in the VHDL specification. It is possible to configure the



Figure 5: Input and output ports of the correlation module based on a *Ms*-ESSC structure.

number M of sequences in the set through the parameter m  $(M = 2^m)$ . It is possible to select the number N of stages in the correlator, and therefore the length  $L = M^N$  of the sequences to be processed. The data-width DW at the input can be configured as well, in order to have more accuracy in the obtained results. Considering the demodulation effect, the number  $N_{SM}$  of carrier periods used in the modulation and the over-sampling rate  $O_f$  can be also selected by the user.

Figure 5 shows the structure of the hierarchical root block. As the number of sequences in the set are defined during hardware synthesis, the number of output ports in the Ms-ESSC block is specified at the same moment. Since it is not possible in VHDL to have a generic number of ports, only one single port with variable size has been defined. This port consists of M Ms-ESSC outputs concatenated. The correlation with the first sequence of the set  $\phi_{rS_{1,M}}$  comes from the  $DW + m \cdot N$  most significant bits; and the correlation with the last sequence of the set  $\phi_{rS_{M,M}}$  comes from the  $DW+m\cdot N$  least significant bits. Note that every Ms-ESSC output { $\phi_{rS_{i,M}}$ ,  $1 \leq i \leq M$ } requires  $DW + m \cdot N$  bits, to avoid overflow in internal computations. These outputs are delayed according to (4). With this purpose, specific shift registers in Xilinx FPGA's architecture (SRL16 modules [Xilinx 2005]) has been used. Finally, the in-phase addition of the Ms-ESSC ouputs is performed.

The Ms-ESSC is based on N similar stages connected in cascade. Internally the design of these stages has been divided into four blocks, as shown in Figure 6.

- The first block implements the required delays for each stage, by using SRL16 modules.
- The second block rearranges the delayed outputs according to the bit-reversed order algorithm explained in Subsection 3.2.



Figure 6: Internal configuration of a generic stage.

- A combinational block carries out the operations of the stage. Multiplication operations by coefficients  $w_{i,k}$  have been reduced to additions and subtractions. Thereby, the values of these coefficients determine the configuration of the adders and subtractors at every stage (see Figure 7). Due to the new geometry of the efficient correlator, it is enough to implement one basic operation block sub-stage, and to repeat it in synthesis time, depending on the parameter m. Also, the introduction of pipeline registers between sub-stages increases the operation frequency of the system.
- Finally, an arrangement block orders the outputs from the operation block for its correct connection to the next stage.

It should be remarked that, at every stage, the number of bits required to store the partial results is incremented, so the larger delays of the algorithm should be placed at the first stages of the correlator, and the smaller ones at the last stages. Thus, a reduction of the total memory required by the correlation is obtained. Furthermore, the delays at the first stage of the correlator share the input of the system r[k], so it is enough to implement the largest delay at this stage,  $(M-1) \cdot D_1 \cdot DW$ , and to allow the access to intermediate positions, as can be observed in Figure 8. It implies a reduction of  $\left(\frac{M^2-3\cdot M+2}{2}\right) \cdot M \cdot O_f \cdot N_{SM} \cdot DW$  memory positions. It would be also possible to optimize the memory requirements by using range-propagation analysis.

#### 4.2 Results

The resource requirements and maximum operating frequencies in the Ms-ESSC implementation depend on the length of macro-sequences. A macro-sequence of length  $L_{Ms} = M \cdot L$  can be generated from complementary sets with different number M of sequences, keeping the final value  $L_{Ms}$  by changing the length L



Figure 7: Implementation scheme of a multiplier, adder and subtracter for a stage of the Ms-ESSC.



Figure 8: Structure of a correlation module based on a 4s-ESSC.

of the sequences. In Table 2 the resource requirements of the correlation module for macro-sequences with length  $L_{Ms} = 64$  are shown. It have been generated from 2-CSS with length L = 32, 4-CSS with length L = 16, and 8-CSS with length L = 8. In Table 3 the resource requirements of the correlation module for macro-sequences with length  $L_{Ms} = 256$  are shown, generated from 2-CSS with length L = 128, 4-CSS with length L = 64, and 16-CSS with length L = 16. In both Tables, a single carrier period  $N_{SM} = 1$  has been used, with an over-sampling factor  $O_f = 10$ , and the number of bits of the input signal is DW = 8. The correlation system has been implemented in a Spartan3 xc3s1500-5fg676 FPGA [Xilinx 2005], and results are given after place and route. It can be

| xc3s1500-5fg676  | M = 2, L = 32 | M = 4, L = 16          | M = 8, L = 8 |
|------------------|---------------|------------------------|--------------|
| Slices           | 323           | 375                    | 804          |
| Luts             | 432           | 522                    | 1079         |
| IOBs             | 29            | 28                     | 27           |
| Flip-Flops       | 123           | 160                    | 254          |
| Max. Freq.       | 106.135 MHz   | $81.473 \mathrm{~MHz}$ | 73.438 MHz   |
| $f_{FPGA} = f_a$ |               |                        |              |
| Max. Throughput  | 106.135       | 81.473                 | 73.438       |
| (M samples/s)    |               |                        |              |

**xc3s1500-5fg676** ||M = 2, L = 32||M = 4, L = 16||M = 8, L

Table 2: Resources required by the correlation of macro-sequences with length  $L_{Ms} = 64$ .

| xc3s1500-5fg676  | M = 2, L = 128 | M = 4, L = 64 | M = 16, L = 16 |
|------------------|----------------|---------------|----------------|
| Slices           | 985            | 1131          | 2980           |
| Luts             | 1152           | 1386          | 3734           |
| IOBs             | 33             | 32            | 30             |
| Flip-Flops       | 183            | 270           | 688            |
| Max. Freq.       | 110.011 MHz    | 92.920 MHz    | 62.477 MHz     |
| $f_{FPGA} = f_a$ |                |               |                |
| Max. Throughput  | 110.011        | 92.920        | 62.477         |
| (M samples/s)    |                |               |                |

Table 3: Resources required by the correlation of macro-sequences with length  $L_{Ms} = 256$ .

observed that, for equally long macro-sequences, less resources are required for cases generated from M-CSS with low number M of sequences. In regard to the operating frequency, the more sequences M in the set, the harder is to compute the combinational addition of the M output branches of the Ms-ESSC, so the operating frequency decrease. Also, by comparing Tables 2 and 3, it is possible to verify that large macro-sequences have high requirements.

Finally, a performance comparison between Ms-ESSC and a straight-forward matched filter implementation has been achieved. Figure 9 shows the hardware implementation scheme of a straight-forward correlator based on internal BRAM memory [Pérez et al. 2006]. The demodulated signal is acquired with a frequency  $f_a$ , and stored in a buffer of size  $L_{Ms} \cdot O_f \cdot N_{SM}$ , that has been implemented in BRAM blocks [Xilinx 2005]. This sampling buffer is read every  $f_{FPGA} = L_{Ms} \cdot f_a$ . At each reading, a sample is added or subtracted with the previous accumulated result, depending on the used macro-sequence



Figure 9: Sequential implementation of a straight-forward correlator based on internal BRAM memory.

symbol. Whenever a new sample is received, the accumulated value is reset in order to compute a new correlation. The access to the sampling buffer is carried out in gaps of  $O_f \cdot N_{SM}$  positions, by taking into account the demodulation effect. To achieve a comparison between this structure and the correlation module based on a Ms-ESSC, a macro-sequence of length  $L_{Ms} = 64$ has been generated from 2-CSS of length L = 32. In Table 4 the requirements for the straight-forward implementation are shown. Correlation results are obtained every  $t_a = t_{FPGA} \cdot L_{Ms} = 8.29 \ ns \cdot 64 = 531.14 \ ns$ , being  $t_a = \frac{1}{f_a}$  and  $t_{FPGA} = \frac{1}{f_{FPGA}}$ . Nevertheless, the 2s-ESSC implementation provides a new correlation result every FPGA clock cycle (9.42 ns), assuring real-time operation, although it requires more resources. To increase the operation frequency of the straight-forward implementation, tasks have to be overlapped temporarily, which implies introducing registers to store intermediate data. Therefore, the amount of required resources increases, approaching that required by the 2s-ESSC implementation. Thus, it can be stated that a Ms-ESSC implementation is more suitable for real-time correlation of macro-sequences generated from M-CSS.

| xc3s1500-5fg676                       | M=2, L=32   |
|---------------------------------------|-------------|
| Slices                                | 49          |
| Luts                                  | 85          |
| IOBs                                  | 26          |
| Flip-Flops                            | 59          |
| Max. Freq. $f_{FPGA}$                 | 120.496 MHz |
| Max. Freq. $f_a$                      | 1.88 MHz    |
| <b>Max. Throughput</b> $(Msamples/s)$ | 1.88        |

Table 4: Resources required by a straight-forward implementation.

### 5 Application example

The suitability of the Ms-ESSC has been tested by including it in the ultrasonic pulse compression system shown in Figure 10. Four different macro-sequences  $Ms^0$ ,  $Ms^{11}$ ,  $Ms^{20}$  and  $Ms^{31}$  with length  $L_{Ms} = 256$  are obtained, by using four different 4-CSS, generated with seeds 0, 11, 20 and 31 respectively. Each one of these macro-sequences is assigned to a different emitter from which is transmitted once modulated. Figure 11 depicts the position of the emitters with regard to the receiver in a xy-plane; although three coordinates are given (x: lenght, y: width, z: height). Four REGAL-RH16E [REGAL 2006] high power speakers have been used to continuously transmit the four macro-sequences in a simultaneous way. These speakers are often used in the audible frequency range from 280 Hz to 20 kHz. Nevertheless, after analysing the spectral characteristics of theses speakers, it was observed that the speakers can also operate at a 25 kHz ultrasonic frequency. Thus, the macro-sequences have been BPSK modulated by using a symbol with one period of a 25 kHz squared signal.

All the emitted macro-sequences are received by a condenser microphone capsule Avisoft Bioacoustics CM16 [Avisoft 2006], which has a flat frequency response between 15 kHz and 180 kHz. Afterwards, the received signal is amplified and digitalized at a sampling frequency of 500 kHz, so  $O_f = 20$ ; then, it is demodulated by correlation with the modulation symbol. Four 4s-ESSCs simultaneously correlate the demodulated signal with the four ideal emitted macro-sequences. As was explained in Subsection 3.3, it is necessary to add a set of additional delays at the output of every 4s-ESSC in order to compute the in-phase sum of the AC functions. When the last sample from a macro-sequence is processed, a peak in the correlator for this macro-sequence is obtained. A peak detector confirms the peaks exceeding a static threshold  $U_e$ , assuming that there does not exist a higher peak in the neighborhood.

Figure 12 shows the results obtained at the output from each correlator. A



Figure 10: Block diagram of a macro-sequences detection system.



Figure 11: Diagram of the transducer position in the experimental tests.

circle has been introduced to mark the values confirmed by the peak detector. Every time a macro-sequence is completely received, a new peak is obtained in the corresponding output. Separation between peaks from one macro-sequence is equal to the emission interval  $t_e = 10.24$  ms. The detection system is able to distinguish between the four macro-sequences, despite they are received practically overlapped in an environment with an approximate signal-to-noise ratio of  $SNR = 18 \ dB$ . Figure 13 represents an enlargement of Figure 12, where the distance between peaks can be observed. The presence of sidelobes caused by the interleaving process can be reduced by applying the post-processing algorithm detailed in [De Marziani et al. 2006].



Figure 12: Results obtained from the macro-sequences detection system.

#### 6 Conclusions

A generic hardware implementation of an efficient correlator for interleaved complementary sets of sequences has been presented. The design has been developed as a parameterized module, able to be adapted to the requirements from different sensor systems. By synthesizing the design, the number of sequences of the complementary set, their length, the sampling factor, the number of periods of the symbol used in the modulation, and the data width can be changed.

The performance of the correlator has been tested on a Xilinx Spartan3 FPGA. Furthermore, it has been included on an ultrasonic sensory system to verify the detection of simultaneous ultrasonic emissions in real-time.

#### Acknowledgements

This work has been possible thanks to the Madrid Community (FPI grant and ANESUS project: CAM-UAH2005/016), and from the Spanish Ministry of Science and Technology (PARMEI: DIP2003-08715-C02-01).



Figure 13: Enlargement of Figure 12.

# References

- [Audenaert et al. 1992] Audenaert, K., Peremans, H., Kawahara, Y., Campenhout, J.: "Accurate Ranging of Multiple Objects using Ultrasonic Sensors"; Proc. IEEE International Conference on Robotics and Automation, Vol. 2, (May 1992), 1733-1738.
- [Avisoft 2006] AVISOFT BIOACOUSTICS: "Avisoft Bioacoustics UltraSoundGate microphone capsule CM16"; Product Spec. (2006).
- [Álvarez et al. 2004] Álvarez, F., Ureña, J., García, J., Mazo, M., De Marziani, C., Hernández, A.: "A comparative analysis of two modulation schemes for the efficient transmission of complementary sequences in a pulse compression system"; Proc. IADT-tcn2004 Int. Conf. on Telecommunications and Computer Networks, (2004), 26-30.
- [Budisin et al. 1989] Budisin, S., popovic, P., Indjin, I.: "Designing radar signals using complementary sequences"; Proc. on Radar, 87, (1989), 593-597.
- [De Marziani et al. 2005] Marziani, C., Ureña, J., Hernández, A., Mazo, M., Álvarez, F., García, J., Donato, P.: "Modular Architecture for Efficient Generation and Correlation of Complementary Sets of Sequences"; IEEE Trans. on Signal Processing, Accepted for its publication, Ref. SP33377 (2005).
- [De Marziani et al. 2006] Marziani, C., Ureña, J., Hernández, A., Mazo, M., García, Jiménez A., Villadangos, J., Pérez, M., Álvarez, F.: "Inter-Symbol Interference Reduction on Macro-Sequences Generated from Complementary Sets of Sequences"; IEEE Proc. 32nd Annual Conference of the IEEE Industrial Electronics Society (November 2006).

[Golay 1961] Golay, M.: "Complementary Series"; IRE Trans. On Inform. Theory, Vol. IT-7, N 2, (1961), 82-87.

- [Golomb and Scholtz 1965] Golomb, S., Scholtz, R.: "Generalized Barker sequences"; IEEE Trans. Inf. Theory, Vol. IT-11, N 4, (1965), 533-537.
- [Hahm et al. 1997] Hahm, M., Friedman, G., Titlebaum, E.: "A comparison of Analog and Digital Circuit Implementations of Low Power Matched Filters for Use in Portable Wireless Communication Terminals"; IEEE Trans. on circuits and systems II: analog and digital signal processing, Vol. 44, N 6, (June 1997), 498-506.
- [Harmuth 1969] Harmuth, H.: "Application of Walsh Functions in Communications"; IEEE Spectrum, Vol. 6, (1969), 82-91.
- [Jörg and Berg 1998] Jörg, K., Berg, M.: "Sophisticated Mobile Robot Sonar Sensing with Pseudo-Random Codes"; Robotics and Autonomous Systems, Vol. 25, (1998), 241-251.
- [Oppenhein et al. 1999] Oppenhein, A., Shafer, R., Buck, J.: "Discrete-Time Signal Processing"; Prentice Hall, 2<sup>nd</sup> edition, (1999), ISBN: 0137549202.
- [Pérez et al. 2006] Pérez, M., Hernández, A., Ureña, J., De Marziani, C., Jiménez, A.: "FPGA-based Implementation of a Correlator for Kasami Sequences"; Proc. 11th IEEE International Conference on Emerging Technologies and Factory Automation ETFA'06, (2006), 1141-1144.
- [REGAL 2006] REGAL ELECTRONIC: "Regal Electronic Speakers"; Product Spec. (2006).
- [Sarwate and Pursley 1980] Sarwate, D., Pursley, M.: "Crosscorrelation Properties of Pseudorandom and Related Sequences"; Proc. IEEE, Vol. 68, N 5, (1980), 593-619.
- [Tseng and Liu 1972] Tseng, C., Liu, C.: "Complementary Sets of Sequences"; IEEE Trans. Inf. Theory, Vol. IT-18, N 5, (1972), 644-652.
- [Xilinx 2005] Xilinx, Inc.: "Spartan-3E FPGA Family: Complete Datasheet"; Product Docummentation, (November 2005).