
Journal of Applied Sciences

Year: 2009 | Volume: 9 | Issue: 3 | Page No.: 488-496
DOI: 10.3923/jas.2009.488.496
Design and Implementation of a Novel High Performance Content Processor for Storage Disks Using an Exact String Matching Architecture
A. Peiravi and M. J. Rahimzadeh

Abstract: In this study, a high-performance content processor for storage disks is proposed which can easily be installed in any host as an interface between the hard disk and the system bus. Moreover, a novel and powerful exact string matching architecture is presented with which several thousand strings can be searched for at high rates. The average database size and the associated software support systems are growing at rates greater than the increase in general-purpose processor performance. At the physical level, there is also remarkable growth in disk storage performance. However, with the ever-increasing amount of information available, the ability to search for and retrieve desired information accurately and quickly has become a critical issue. We have implemented the proposed architecture on a Xilinx XC4VFX100 Field Programmable Gate Array (FPGA) and shown that the system can search for over sixteen thousand 32-byte strings at a speed near the maximum stated in the ATA-7 standard. The present design has much better performance as measured in Throughput/(LogicCells/Char) when compared with the best existing designs.



Keywords: Bloomier filter, custom computing, string matching, storage devices, programmable hardware

INTRODUCTION

There is an ever-increasing trend in the size of databases and the associated systems. This is due to the increased public use of data storage, especially with the rapid developments in the use of the internet. On the other hand, there has been a great deal of progress in storage systems such as high-capacity disk storage devices. Thus, there is a need for more reliable search techniques as a basis of many modern data processing applications. Another important application is data security, which is becoming more and more important as internet-based transactions are on the rise. The threats posed by potential attackers who seek to discover important features of the data being transmitted should be considered in the design and implementation of data security applications. This also reinforces the need for fast and reliable search algorithms which do not slow down the data transfer process.

Software-based techniques are not fast enough to meet these requirements. Thus, there is a serious need for the design and implementation of high-performance hardware-based content processors. A good place for such hardware devices is at the data storage devices of a host, where a large amount of data is constantly being transferred at high speed. This could include the hard disk of a personal computer or the server of an electronic mail host.

In this study, the design and implementation of a high-performance content processor for storage disks using a novel exact string matching architecture is presented. This content processor can act as an interface between the main CPU and the disk, serving as a hard disk accelerator, and provides the capabilities of a content processor to search for and find specific data in the text being transferred without using up the main processor's resources. It includes a powerful search core using a novel architecture which is scalable and can search for a large number of strings at high speed. The proposed system is implemented on an FPGA.

STRING MATCHING ALGORITHMS

There are many string matching algorithms available. The problem addressed by string matching is defined as follows. Given a text T, an array T[1..n] of characters, and a pattern P, an array P[1..m] of characters, where we usually have m ≤ n and typically n ≫ m, we wish to know whether P appears as a substring of T and, if so, where and possibly how often. These algorithms are applied in text editors, web searching and genetics. The simplest algorithm for string matching is the naive or brute-force method, with a time complexity of O((n-m+1)·m). Pre-computation methods compute some extra information about P and/or T before starting the search, which can make the search for P in T faster than the naive algorithm. The history of this approach dates back to Karp and Rabin (1987), who presented what later became known as the Karp-Rabin algorithm, in which a hash function h is used over strings of length m. Other algorithms were presented later. For example, the Boyer-Moore algorithm applies both the character-jump heuristic and the looking-glass heuristic; its worst-case run time is O(nm), and the run time can be improved to O(m+n) by also using the good-suffix heuristic. There are also many other algorithms, such as Morris-Pratt, Knuth-Morris-Pratt, Galil-Seiferas, Turbo Boyer-Moore, Tuned Boyer-Moore, Zhu-Takaoka, Berry-Ravindran, Smith, Raita and Horspool, to name a few. The details of these algorithms are presented in many studies to which the interested reader may refer.
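To make the contrast between the naive method and a pre-computation method concrete, the following Python sketch (our illustration, not part of the original study) implements brute-force search alongside a Karp-Rabin style rolling-hash search; the base and modulus are arbitrary choices:

```python
def naive_match(T, P):
    """Brute-force O((n-m+1)*m): slide P over T and compare at every offset."""
    n, m = len(T), len(P)
    return [i for i in range(n - m + 1) if T[i:i + m] == P]

def karp_rabin(T, P, base=256, mod=1_000_003):
    """Rolling-hash search: compare hashes first, verify exactly on a hash hit."""
    n, m = len(T), len(P)
    if m > n:
        return []
    hp = ht = 0
    for i in range(m):                      # hash of P and of T's first window
        hp = (hp * base + ord(P[i])) % mod
        ht = (ht * base + ord(T[i])) % mod
    high = pow(base, m - 1, mod)            # weight of the outgoing character
    out = []
    for i in range(n - m + 1):
        if ht == hp and T[i:i + m] == P:    # exact check filters hash collisions
            out.append(i)
        if i + m < n:                       # roll the window one character right
            ht = ((ht - ord(T[i]) * high) * base + ord(T[i + m])) % mod
    return out
```

Both return the same match positions; the rolling hash simply avoids re-scanning the window when the hashes disagree.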

IMPLEMENTATION APPROACHES

Data security could be implemented using either a software or a hardware approach. The software approach is limited to single user/client-side applications or very low bandwidth server applications because of the computational intensity of authentication and encryption algorithms. Software implementations only support modest throughput while better performance could be achieved with a reasonable cost with hardware. The next approach would be to accelerate software solutions with custom hardware which is common in x86- and Network Processor Unit (NPU)-based system designs. Several companies such as Cisco, NetScreen and PMC-Sierra produce ASIC security programmable co-processors. In many applications an in-line architecture is more appropriate where in-line processors integrate data security directly into the system data path. This approach is used in communications-centered applications such as VPN gateways and storage security appliances.

Many researchers have worked on FPGA-based string matching for data security because of FPGAs' reconfigurability. ASIC and FPGA solutions provide much better performance than software solutions. Many ASIC intrusion detection systems have been developed commercially. They usually store their rules in large memory blocks and examine incoming packets in integrated processing engines. ASIC programmable security co-processors are expensive and complicated and do not achieve impressive performance. The memory blocks that store the data security rules are re-loaded whenever an updated rule set is available. The most common technique for pattern matching in ASIC intrusion detection systems is the use of regular expressions. Sailesh et al. (2006) used regular-expression-based pattern matching since it offers superior expressive power and flexibility. Their approach could reduce the number of distinct transitions between states. The architecture they proposed performed deep packet inspection at multi-gigabit rates using multiple on-chip memories in such a way that each remains uniformly occupied and accessed over a short duration, effectively distributing the load and enabling high throughput; it provides cost-effective packet content scanning at OC-192 rates with memory requirements consistent with current ASIC technology. The problem with their approach is that updating the rule set is not trivial, and the system must support a variety of rules, sometimes with complex syntax and special features. FPGAs, however, are better suited since they are reconfigurable and exploit parallelism. An FPGA-based system can be entirely changed while keeping only the interface constant. Pryor et al. (1993) were the first to propose a flexible, reprogrammable hardware solution for accelerating text-based keyword search problems.
Their solution employed an attached processor called Splash 2, which exploits the speed and reconfigurability of field programmable gate array technology. Their algorithm, implemented on the Splash 2 platform, succeeded in performing a dictionary search over English alphabet characters without case sensitivity. They achieved high performance and performed a low-overhead AND-reduction of the match indicators using hashing. The Splash 2 system comprised an interface board to a Sun SPARC-2 host and up to 16 Splash boards, each of which contained 16 Xilinx 4010 FPGAs interconnected in a linear array and also through a 16-way full crossbar switch. Each Xilinx chip was coupled with a 4 Mbit static RAM through a dedicated interface. The text searching program implemented on a one-board Splash 2 system was capable of processing text at an estimated rate of 50 million characters per second.

Since 1993, many others have worked on implementing FPGA-based string matching systems. Sidhu and Prasanna (2001) presented a method for finding matches to a given regular expression in a given text using FPGAs. To match a regular expression of length n, a serial machine requires O(2^n) memory and takes O(1) time per text character. Their approach required only O(n^2) space while still processing a text character in O(1) time (one clock cycle). The improvement they obtained was due to the Nondeterministic Finite Automaton (NFA) used to perform the matching. They implemented their algorithm for conventional FPGAs and the Self-Reconfigurable Gate Array (SRGA), mapping the NFA logic onto the Virtex XCV100 FPGA and the SRGA.

Several techniques have been proposed for string matching especially in the context of network intrusion detection in recent years. Most hardware-based techniques use reconfigurable logic/FPGAs. Some of the FPGA based techniques use on-chip logic resources to compile strings into parallel state-machines or combinational logic. Although these techniques are fast, they are known to use up most of the chip’s resources with just a few thousand strings. Hence, scalability with string set size is the main concern with purely FPGA-based approaches.

From the scalability viewpoint, memory-based techniques are attractive since memory chips are cheap. Unfortunately, memory access speed becomes a bottleneck in memory-based techniques. An optimized hardware-based Aho-Corasick algorithm was proposed by Tuck et al. (2004) that drastically reduces the amount of memory required and improves performance in hardware implementations. The algorithm relies on run-length encoding and/or bit-mapping adapted from similar techniques used to speed up IP lookup. Although the algorithm is very fast even in the worst case (an 8 Gbps scanning rate), it assumes the availability of an excessively wide memory bus (such as 128 bytes) to eliminate the memory access bottleneck, and it would suffer from power consumption issues.

A Bloom filter based algorithm proposed by Dharmapurikar et al. (2004) makes use of a small amount of embedded memory along with off-chip memory to scan a large number of strings at high speed. Upon an approximate match by the on-chip Bloom filters, the presence of the string is verified using a hash table in off-chip memory. Dharmapurikar et al. (2004) argued that since the strings of interest are rarely found in packets, the quick check in the Bloom filter avoids more expensive memory accesses and greatly improves the overall throughput. However, their work does not offer worst-case guaranteed bandwidth. An algorithm presented by Dharmapurikar and Lockwood (2005) modifies the classic Aho-Corasick algorithm to allow it to consider multiple characters at a time. This modification allows parallelizing the Aho-Corasick algorithm and using multiple instances of it to achieve the required speedup by advancing the text stream by multiple characters at a time.

With the help of Bloom filters in implementing the automaton, off-chip memory accesses can be suppressed to a great extent. However, when a string of interest appears in the text stream, their machine must perform the necessary memory accesses, which slows down the string matching process. Tan and Sherwood (2006) proposed a technique for reducing the out-degree of string-matching state machines. Their technique allows state machines to be represented using significantly less state memory than would be required in a naive implementation. Through the use of bit-splitting, a single state machine is split into multiple machines, each of which handles some fraction of the input bits. These earlier works did not provide many of the requirements for a realistic implementation. The basic architectural design proposed by Tan and Sherwood (2006) was adapted to an FPGA implementation by Jung et al. (2006), who showed that, when FPGA implementation details such as the restriction of block RAM size, pin count and routing delay are considered, this architecture does not use on-chip memory efficiently and wastes a high percentage of the on-chip block RAMs. Sourdis et al. (2005) and Papadopoulos and Pnevmatikatos (2005) tried to combine hashing and the use of memory to present a cost-effective exact string matching solution. However, because of their hashing schemes, they had to use additional indirection memories to reduce the sparseness of the memory and store the search strings compactly. Moreover, the implementation of their hashing modules uses FPGA lookup tables, and the place and route of such a design takes a couple of hours.

BLOOMIER FILTER

The key to the string matching architecture proposed in this study is that it is based on the Bloomier filter introduced by Chazelle et al. (2004), which is a generalization of the Bloom filter. While a Bloom filter is designed to represent a set, a Bloomier filter is designed to represent a function on a set. Specifically, the Bloomier filter computes an arbitrary function defined only on a small subset S, of size n, taken from a universal set U of size N. For an arbitrary function f defined over a set S, the filter always returns f(x) if x ∈ S. Otherwise, the filter returns null with high probability. Just as a Bloom filter can have false positives, a Bloomier filter can return a non-null value for an element not in the set.

The data structure consists of m q-bit locations and is addressed by k hash functions, where m, q and k are design parameters. We call this data structure the lookup table. The k hash values of an element x, {h1(x),..., hk(x)} where 0 ≤ hi(x) < m, are collectively referred to as its neighborhood, denoted by N(x).

The lookup table is constructed such that for every x, a location τ(x) among N(x) can be found in such a way that there is a one-to-one mapping between all x and τ(x). Because τ(x) is unique for each x, collision-free lookups are guaranteed. Later on in the study we describe how to tune the design parameters such that the probability of finding such a mapping is arbitrarily close to 1.

The idea is to construct the lookup table so that a lookup for x returns τ(x). Then we can store f(x) in another data structure called the result table, at address τ(x). Thereby, we can guarantee deterministic, collision-free lookups of arbitrary functions. Since the Result Table will be addressed by τ(x), it must have as many locations as the lookup table.

During lookup table setup, once we find τ(x) for a certain x, we store V(x) in the location τ(x), which is computed as follows:

(1)  V(x) = h_τ(x) ⊕ D[h_1(x)] ⊕ ... ⊕ D[h_k(x)], where the XOR runs over all i such that h_i(x) ≠ τ(x)

where ⊕ denotes the XOR operation, k is the total number of hash functions, h_i(x) is the i-th hash value of x, D[h_i(x)] is the data value in the h_i(x)-th location of the lookup table and h_τ(x) identifies which hash function produces τ(x). Now, during a lookup for x, h_τ(x) can be computed as:

(2)  h_τ(x) = D[h_1(x)] ⊕ D[h_2(x)] ⊕ ... ⊕ D[h_k(x)]

We can use h_τ(x) to obtain τ(x). This can be done by re-computing the h_τ(x)-th hash function or by remembering all hash values and selecting the h_τ(x)-th one. Then, we can read f(x) from location τ(x) of the result table.

DESCRIPTION OF THE PROPOSED ARCHITECTURE

Here, we describe the proposed string matching architecture, which is based upon the Bloomier filter. First, we consider the issue of tuning the filter design parameters. Next, we describe how our architecture addresses false positives. Then, we investigate the scalability of our architecture as the number of strings increases. At the end of this section, we present an overview of our architecture.

TUNING THE DESIGN PARAMETERS

Chazelle et al. (2004) showed that for a Bloomier filter with k hash functions, n elements and a lookup table of size m ≥ kn, the probability of setup failure, P(fail), is upper bounded as follows:

(3)

First, we investigate how P(fail) varies with the lookup table size m and the number of hash functions k. In Fig. 1 we show a graph of P(fail) versus the ratio m/n for n = 4K elements. There is a separate curve for each value of k. We note that the failure probability decreases marginally with increasing m/n, but decreases significantly with increasing k. However, a high value of k comes at the expense of large storage to maintain m≥kn. Hence, we should choose a suitable value of k to balance system cost against probability of setup failure. Next, a graph of P(fail) versus n while fixing k = 4 and m/n = 4 is shown in Fig. 2. We note that P(fail) decreases dramatically as n increases.

Fig 1: Setup failure probability versus m/n (n = 4K)

Fig 2: Setup failure probability versus n

For values of n typical of applications like intrusion detection (a few thousand), P(fail) is about 1 in 3 million or smaller (i.e., only about 1 setup out of 3 million may fail to converge).

For the present architecture, we choose k = 4 and m/n = 4 because this provides a very small failure probability yet yields a total storage requirement of only a few bits per character for the lookup table.

Eliminating false positives: In a Bloomier filter, a false positive can happen when some element x that was not among the original elements used in filter setup produces a non-null value for h_τ(x). Because the returned value is not null as expected, the filter returns an incorrect f(x) lying at location τ(x) in the result table. Chazelle et al. (2004) address such false positives by encoding the concatenation of a checksum with h_τ(x), instead of h_τ(x) alone, in Eq. 1 during setup. After looking up element x, the checksum is computed and compared with the checksum obtained from the concatenation yielded by Eq. 2 during lookup. The wider this checksum is, the smaller the Probability of False Positives (PFP). Thus, Chazelle et al. (2004) effectively trade off storage space to reduce PFP.

Note that a non-zero PFP is unacceptable for applications like intrusion detection, no matter how small it may be, because it would lead to false alerts. Moreover, reducing the probability of collisions does not guarantee the worst-case deterministic lookup rate required by the line rate, and such network intrusion detection systems would be vulnerable to denial-of-service attacks.

A storage-efficient scheme proposed by Hasan et al. (2006) is used to eliminate false positives in our string matching architecture. The basic idea is to store all the original strings in a table and compare them with the search strings. In the Bloomier filter architecture, the result table can be used for this purpose: the f(x) extracted from location τ(x) in the result table is a possible match string. This table has as many locations as the lookup table (i.e., m ≥ kn), but only n locations are really needed to store the n elements. Hence, the naive way would need k times more storage than is needed to actually store all the elements. Instead, as described by Hasan et al. (2006), we can dissociate the sizes of the lookup and result tables. During setup, instead of encoding h_τ(x) for each x, a pointer p(x) that points directly into a result table having n locations is encoded. Thus, the lookup table encoding equation (Eq. 1) is modified as follows:

(4)  V(x) = p(x) ⊕ D[h_1(x)] ⊕ ... ⊕ D[h_k(x)], where the XOR again runs over all i such that h_i(x) ≠ τ(x)

During lookup, p(x) is extracted from the lookup table using Eq. 2 and x is read out of location p(x) in the result table. We then compare the search string against the stored value of x. If the two match, the result is correct; otherwise, it is a false positive.

The lookup table, which now stores pointers with maximum value n, requires q = log2(n) bits per location, whereas the naive approach, which encodes h_τ(x), requires only q = log2(k) bits per location. Despite this increase in the lookup table size, the accompanying reduction in the size of the result table (from m to n locations) reduces the overall required storage compared to the naive approach.
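The pointer-plus-comparison verification step can be sketched as follows (our illustration, not the authors' code; the hash functions and the single stored string are hypothetical, and the toy table is built by hand so that p("attack") encodes to 0, leaving every entry zero):

```python
def verified_lookup(query, lookup_table, result_table, hashes):
    """XOR the k lookup-table entries to get a candidate pointer p (Eq. 2 with
    p(x) encoded instead of h_tau), then compare the string stored at
    result_table[p] with the query to reject false positives."""
    p = 0
    for h in hashes(query):
        p ^= lookup_table[h]
    if p < len(result_table) and result_table[p] == query:
        return p       # verified exact match
    return None        # null, or a false positive caught by the comparison

# Hand-built toy instance: one stored string at pointer 0. Encoding
# p("attack") = 0 into an initially all-zero table leaves every entry 0.
m = 8
hashes = lambda s: [sum(s.encode()) % m, (7 * sum(s.encode()) + 3) % m]
lookup_table = [0] * m
result_table = ["attack"]
```

Any query XORs table entries to some pointer; only a query that byte-for-byte equals the stored string at that pointer is reported as a match, which is exactly how the scheme removes false positives.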

Scalability: We now show how our architecture scales with an increasing number of strings. The total storage space required by the underlying data structure of the proposed architecture, which consists of two tables (the lookup table and the result table), is kn log2(n)+nL, where k is the number of hash functions, n is the number of strings and L is the maximum string length (assuming all strings have the maximum length). Table 1 shows this storage requirement for string set sizes in the range of 4K to 32K for several values of k and L.

While the storage space required by the result table (nL) scales linearly with the number of strings, the storage space required by the lookup table (kn log2(n)) scales slightly faster than linearly. However, since the result table accounts for the larger share of the overall storage (L is much larger than k log2(n) in our target applications), the overall storage requirement of the proposed architecture scales approximately linearly with the number of strings.

Fig 3: Storage requirements vs. number of strings for L = 32 bytes

Table 1: Storage requirements of our proposed architecture

Fig. 4: Storage requirements vs. number of strings for k = 4

In Fig. 3, we fixed the string length at 32 bytes and plotted a separate curve for each value of k from 3 to 6, while in Fig. 4 we fixed the number of hash functions at 4 and plotted a separate curve for each value of L from 16 to 64 bytes.
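As a quick check of the storage expression, the configuration implemented later in the study (16K strings, k = 4, L = 32 bytes) can be computed directly; this is our own arithmetic sketch, counting the result table in bits (8 bits per byte):

```python
import math

def storage_bits(n, k, L):
    """kn*log2(n) bits for the lookup table plus n strings of L bytes
    (8L bits each) for the result table."""
    q = math.ceil(math.log2(n))            # pointer width in bits
    return k * n * q + n * L * 8

# n = 16K strings, k = 4 hash functions, L = 32-byte strings:
total = storage_bits(16 * 1024, 4, 32)
# lookup table: 4 * 16384 * 14 = 917,504 bits; result table: 4,194,304 bits
```

The lookup-table term (about 0.9 Mbit) is indeed small next to the result table (about 4 Mbit), consistent with the approximately linear scaling argued above.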

Architecture overview: The basic proposed architecture is shown in Fig. 5. The Bloomier filter engine analyzes a window of L bytes from the data stream, which arrives at a rate of one byte per clock cycle.

As shown in Fig. 5, the architecture's data path involves three main operations: the hash computations and two memory accesses. Appropriate pipelining is used in the data path to achieve a higher operating frequency. In this way, the system can verify a match of the L-byte string in a single clock cycle.

Fig. 5: Basic proposed architecture

IMPLEMENTATION DETAILS AND CONSIDERATIONS

It is possible to employ some optimization techniques to reduce the amount of required resources and improve design efficiency. We can group the strings by length such that the strings of each group have a unique substring. We then shorten these substrings, keeping only the bit positions necessary to distinguish the strings, and finally generate the hash over the reduced substrings. This method requires fewer resources for hash computation. Also, to reduce the size of the result table, we can group strings such that strings of the same group have unique prefixes or unique suffixes, so that hashing can distinguish them. Generating groups of strings with unique prefixes or suffixes increases the chance of fitting more strings in memory. The effectiveness of these techniques, however, depends on the string types and varies from case to case. In order to analyze the worst-case performance of the proposed scheme, we do not include such techniques in our design and consider only the basic architecture.

The following describes how the required hash functions are supported in hardware and how the corresponding random lookups in on-chip memory are performed.

Hash functions: The proposed scheme uses hash functions which are from a class of universal hash functions described by Ramakrishna et al. (1994). Hardware implementation of these functions is relatively inexpensive.

In our string matching architecture k hash functions are needed. These hash functions are calculated as described below.

For any string of b bits, represented as <x1, x2, x3, ..., xb>, the i-th hash function over this string is calculated as:

(5)  h_i(x) = (x_1 · d_i1) ⊕ (x_2 · d_i2) ⊕ ... ⊕ (x_b · d_ib)

where d_ij is a predetermined random number in the range [1...m] and ⊕ denotes the XOR operation.

It can be observed that the hash functions are calculated cumulatively: the result calculated over the first j bits can be reused to calculate the hash over the first j+1 bits. This property makes the hash hardware regular and less resource-consuming and makes it easy to support strings of different lengths.
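A minimal Python sketch of Eq. 5 (our illustration; the random constants d_ij are placeholders drawn from a seeded generator, not the study's values) shows the cumulative property: the hash over the first j+1 bits follows from the hash over the first j bits with at most one XOR per function:

```python
import random

def h3_hashes(bits, d):
    """Compute k H3-style universal hashes over a bit string cumulatively:
    h_i = XOR of d[i][j] over every position j where bit x_j = 1."""
    k = len(d)
    h = [0] * k
    for j, bit in enumerate(bits):
        if bit:                      # extend the first-j-bit result to j+1 bits
            for i in range(k):
                h[i] ^= d[i][j]
    return h

# Placeholder predetermined random constants d_ij: k = 4 hash functions,
# strings of b = 8 bits, hash values below m = 16.
rnd = random.Random(42)
m = 16
d = [[rnd.randrange(m) for _ in range(8)] for _ in range(4)]
```

Because each step only XORs in one constant, the same datapath serves strings of any length up to b, which is what keeps the hardware regular.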

Realizing memories: Each hash function corresponds to a random memory access in the lookup table. Thus, for 4 hash functions, the lookup table memory should be able to support 4 random accesses every clock cycle in our proposed architecture.

Memories with the density and lookup capacity required by our proposed architecture can be realized using the embedded RAMs in modern FPGA devices. For instance, Xilinx FPGAs have a memory core called Block RAM, which is physically a dual-port memory; using it, two random memory locations can be read in a single clock cycle. Furthermore, owing to the block RAM's fast access performance, it is possible to create quad-port memories by time-division multiplexing the signals into and out of the memory, at the cost of some additional logic and access latency for each port. This trades data access time (halved) for functionality (doubled ports). To improve the design performance, we utilized multiple clock domains to overcome the memories' latency.

The size of the lookup table is kn log2(n) bits. Hence, with k = 4 in our architecture and the FPGA's embedded memories in a quad-port configuration, the total memory required by the lookup table is 4n log2(n) bits.

The result table is also realized using the FPGA's on-chip Block RAMs. The total memory required by this table is nL, and the on-chip Block RAMs implement this memory in a straightforward manner.

EVALUATION METRICS

The proposed string matching scheme is evaluated using two main metrics: performance in terms of processing throughput and area cost in terms of required FPGA resources (all post place and route results). In order to compare our proposed scheme with previously reported research, we use a metric which takes into account both performance and area cost simultaneously: Throughput/(LogicCells/Char).

The proposed architecture was implemented on the Xilinx Virtex-4 FPGA family, targeting a Virtex-4 FX100, using Xilinx ISE 8.1i; it supports up to 16K fixed-size signatures of 32 bytes. The architecture has 4 hash functions, and a quad-port memory of the required lookup table size is used to perform 4 concurrent memory accesses. The lookup table (4 x 16K x 14 bits) and the result table (16K x 256 bits) were both implemented using the FPGA's on-chip embedded SRAM blocks.

All hardware elements of our scheme are implemented on the FPGA, while software functions such as the setup algorithms are executed on the host processor. A clock speed of 150 MHz was achieved on the FPGA. Therefore, by processing one byte per clock cycle, the present design achieves 1.2 Gbps of throughput.

PRESENTING RESULTS AND COMPARISONS

Table 2 compares our scheme against the latest reported string matching designs. Note that methods like those proposed by Dharmapurikar et al. (2004) and Attig et al. (2004) need off-chip memories to verify their approximate matching results; these memories are not shown in the table.

Table 2: Results of our scheme and other FPGA-based string matching approaches

More importantly, most of the approaches listed in this table use the Snort rule set in their implementations. These schemes improve the performance and efficiency of their implementations by preprocessing the signature strings and taking advantage of optimization techniques. In contrast, we make no assumptions about the string set, so the type of strings plays no role in our reported results. Changing the string set therefore does not affect the performance and efficiency of our work, while it may affect the performance and efficiency of previously reported methods.

The comparison shown in Table 2 indicates that our design has better performance and higher efficiency, as measured by the Throughput/(LogicCells/Char) metric, than the best reported schemes. Since our design mainly makes use of embedded on-chip memory blocks and the logic resources required for implementing our scheme are independent of the number of strings, our design achieves a metric value more than 25% better than the best result of previously reported schemes.

CONCLUSIONS

In this study, a storage-efficient architecture for the string matching problem was presented. The scheme is based on a recently proposed hashing scheme called the Bloomier filter, a memory-efficient data structure. It leads to a simple yet powerful architecture which can handle several thousand strings at high rates and is amenable to on-chip realization. Because it depends only on on-chip memory, it is more scalable in terms of speed and string set size than other FPGA-based hardware approaches.

The present scheme can be used as a basic building block in solutions for network intrusion detection as well as for generic content searches. As network speeds increase, our scheme will be invaluable for maintaining the required performance and efficiency.

REFERENCES

  • Pryor, D.V., M.R. Thistle and N. Shirazi, 1993. Text searching on Splash 2. Proceedings of the Workshop on FPGAs for Custom Computing Machines, April 5-7, 1993, California, pp: 172-177.


  • Sidhu, R. and V.K. Prasanna, 2001. Fast regular expression matching using FPGAs. Proceedings of the 9th Annual Symposium on Field-Programmable Custom Computing Machines, April 2001, Rohnert Park, CA, pp: 227-238.


  • Sailesh, K., D. Sarang, Y. Fang, C. Patrick and J. Turner, 2006. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer, September 11-15, 2006, Pisa, Italy, pp: 339-350.


  • Tuck, N., T. Sherwood, B. Calder and G. Varghese, 2004. Deterministic memory-efficient string matching algorithms for intrusion detection. Proceedings of IEEE INFOCOM 2004. http://www.cs.usd.edu/users/calder/papers/INFOCOM-04-IDS.pdf.


  • Dharmapurikar, S., P. Krishnamurthy, T. Sproull and J.W. Lockwood, 2004. Deep packet inspection using parallel Bloom filters. IEEE Micro, 24: 52-61.


  • Dharmapurikar, S. and J. Lockwood, 2005. Fast and scalable pattern matching for content filtering. Proceedings of the Symposium on Architecture for Networking and Communications Systems, October 16-18, 2005, ACM, USA., pp: 183-192.


  • Karp, R.M. and M.O. Rabin, 1987. Efficient randomized pattern-matching algorithms. IBM. J. Res. Dev., 31: 249-260.


  • Tan, L. and T. Sherwood, 2006. Architectures for bit-split string scanning in intrusion detection. IEEE Micro, 26: 110-117.


  • Jung, H., Z.K. Baker and V.K. Prasanna, 2006. Performance of FPGA implementation of bit-split architecture for intrusion detection systems. Proceedings of the 20th Parallel and Distributed Processing Symposium, April 25-29, 2006, IPDPS (RAW), pp: 177-177.


  • Sourdis, I., D. Pnevmatikatos, S. Wong and S. Vassiliadis, 2005. A reconfigurable perfect-hashing scheme for packet inspection. Proceedings of the 15th International Conference on Field Programmable Logic and Applications (FPL), 2005.


  • Papadopoulos, G. and D. Pnevmatikatos, 2005. Hashing + memory = low cost, exact pattern matching. Proceedings of the 15th International Conference on Field Programmable Logic and Applications, August 24-26, 2005, Tampere, Finland, pp: 39-44.


  • Chazelle, B., J. Kilian, R. Rubinfeld and A. Tal, 2004. The bloomier filter: An efficient data structure for static support lookup tables. Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, January 11-14, 2004, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA., pp: 30-39.


  • Hasan, J., S. Cadambi, V. Jakkula and S. Chakradhar, 2006. Chisel: A storage-efficient, collision-free hash-based network processing architecture. ACM SIGARCH Comput. Architecture News, 34: 203-215.


  • Attig, M., S. Dharmapurikar and J. Lockwood, 2004. Implementation results of Bloom filters for string matching. Proceedings of the 12th Annual Symposium on Field-Programmable Custom Computing Machines, April 20-23, 2004, Napa, CA., pp: 322-323.


  • Ramakrishna, M., E. Fu and E. Bahcekapili, 1994. A performance study of hashing functions for hardware applications. Proceedings of the 6th International Conference on Computing and Information, May 26-28, 1994, Peterborough, Ont., pp: 1621-1636.


  • Cho, Y.H. and W.H. Mangione-Smith, 2004. Programmable hardware for deep packet filtering on a large signature set. 1st Watson Conference on Interaction between Architecture, Circuits and Compilers, 2004. http://cares.icsl.ucla.edu/cares/content/papers/wassa04.pdf.


  • Sourdis, I. and D. Pnevmatikatos, 2004. Pre-decoded CAMs for efficient and high-speed NIDS pattern matching. Proceedings of the 12th Annual Symposium on Field-Programmable Custom Computing Machines, April 20-23, 2004, IEEE CS Press, pp: 258-267.


  • Baker, Z.K. and V.K. Prasanna, 2006. Automatic synthesis of efficient intrusion detection systems on FPGAs. IEEE Trans. Dependable Secure Comput., 3: 289-300.
