HOME JOURNALS CONTACT

Journal of Applied Sciences

Year: 2003 | Volume: 3 | Issue: 7 | Page No.: 493-509
DOI: 10.3923/jas.2003.493.509
Generalized Agreement Underlying a Multicasting Environment
S.C. Wang and K.Q. Yan

Abstract: The BA problem is solved in a multicasting environment. The proposed protocol uses the minimum number of message exchanges to reach an agreement while tolerating the maximum number of faulty components in the distributed system. It makes all the fault-free processors reach a common value to keep the system from the influences of processor failures and transmission medium failures.

Fulltext PDF Fulltext HTML

How to cite this article
S.C. Wang and K.Q. Yan, 2003. Generalized Agreement Underlying a Multicasting Environment. Journal of Applied Sciences, 3: 493-509.

Keywords: distributed architecture, Byzantine agreement, distributed system and faulttolerance and multicasting network

Introduction

Traditionally, if some information is to be delivered from a processor to other n processors in a Fully Connected Network (FCN), the source needs to send out n copies of it. The FCN is considered inefficient because such a way of delivery costs too much of the bandwidth. In a broadcasting Network (BCN), the message is spread through the local bus. Although the BCN consumes less bandwidth, regardless of the messages' destinations, all the processors connected to the bus have to be alert for any messages on-line. It brings a heavy load onto the CPU resources and thus decreases the processors' efficiency. In addition, the scope of information dissemination is limited to a local bus. In a multicasting network, a frame can be sent to an individual, broadcasters, or a group address in a LAN (Halsall, 1995). Distinct from the FCN, only one copy of the message needs to be sent over the Internet. Therefore, the multicasting network is more powerful and applicable than the traditional FCN or the BCN. Some well-known network systems are in fact applications (Wang et al., 1982) of the multicasting network; e.g. a modern advanced avionic network may include a fire control system, a flight path control system, a communication/navigation control system, a mission planning system and a radar system. In this complex distributed system, each subsystem allocates a number of CPUs or computers to achieve the stability and reliability. That is, each subsystem is equipped with 3 or 4 processors to increase the reliability of the software/hardware. In addition, each subsystem (or called the group of processors) needs to carry out an individual task, based on some certain common initial value, such as the start time and the initial geographical location. In these applications, if some faulty components exist, the system may fail due to the influence of the faulty components. It is a critical problem and to the best of our knowledge, there is yet no solution to this problem. In this paper, a novel approach for solving this unanimity problem is to be proposed.

Reaching an agreement in the presence of system errors plays an important role in designing a fault-tolerant distributed system. Such a problem is called the Byzantine Agreement (BA) (Dolev, 1982, Dolev et al., 1985; Lamport et al., 1982; Lamport et al., 1984; Pease et al., 1980; Reischuk 1982, Strong et al., 1982) problem. The goal of the BA is to reach a common value among fault-free processors even if certain components fail. The source has its initial value vs to start with. The agreement is reached if all the fault-free processors agree upon a common value v. That is, the BA problem is solved when the following constraints are satisfied:

(BA1) : All fault-free processors agree on a common value v.
(BA2) : If the initial value of the source is vs and the source is fault-free, then all fault-free processors shall agree on the value vs; i.e., v = vs.

With the agreement, many applications can be achieved, such as a two-phase commitment to be made in a distributed database system (Molina, 1986), a task of locating the whereabouts of a replicated file in a distributed environment (Gifford, 1979) and a landing task controlled by the processors in a flight control system (Yan et al., 1999).

Previously, the agreement problem has been discussed in the FCN, the BCN, or the Generalized Connected Network (GCN). Lamport et al. (Lamport et al., 1982) first brought up the BA problem with the following assumptions: (1) In an n-processor system, it tolerates at most fp faulty processors. (2) The processors communicate with each other directly through the transmission medium; that is, the network architecture is fully connected. (3) The message sender is always identified; in other words, the receiver is aware of the source of the message. (4) Only the symptom of some processor failure is discussed. It requires rounds of message exchange to reach a common value and the system tolerates faulty processors. Fischer (Fischer et al., 1985) further proved that faulty processors with fp+1 rounds is the optimal solution. Bar-Noy (Bar-Noy et al., 1987) proposed a data structure to solve the BA problem. In a generalized case (where both processor failures and transmission medium failures may exist), (Yan et al., 1999) showed an optimal solution to making the fault-free processors reach an agreement by fp+2 rounds of message exchange in the generalized FCN. In the BCN, Babaoglue (Babaoglue et al., 1985) has proved that the agreement can be reached at the cost of only 2 rounds if the number of faulty processors is less than half of n.

To improve the applicability of the traditional FCN and the BCN, Wang et al. (1985) and Siu et al. (1996) have revised network models respectively. Wang et al., introduced a GCN model, in which the concept of a group was first employed. It partitions n processors into g groups and each one of them contains p processors. Each group communicates with the others directly. That is, Wang ’ s GCN requires a group-to-group fully connected environment. On the other hand, Siu et al. (1996) gave another GCN model, similar to the FCN, which allows the network with a bounded connectivity, called the c-connectivity (c is a constant). It is connected in a pure processor-to-processor manner and each pair of processors does not need to be connected.

The GCN models proposed by Wang or Siu were elegant, but they are still not generalized due to some limitations. For instance, Wang’s GCN must be built on a fully connected environment. It is hardware wasting and inapplicable. On the other hand, Siu’s GCN states the processor-to-processor relationship; it does not seem to go with the LAN or WAN environment. To offer a better solution, in this paper, we shall review the BA problem in a popular multicasting network.

As for the faulty components, they are classified as either processor failures or transmission medium failures. The types of processor failures include crash fault, omission fault and malicious fault; the types of transmission medium failures are crash fault, stuck-at fault and malicious fault (Yan et al., 1999). Generally, both faulty processors and faulty transmission media may exist. In the generalized case, as many as nine failure types may exist because of the interaction between the three types of processor failures and the three types of transmission medium failures. The nine possible types of faults are: crash-crash, crash-sa (Stuck-at), crash-malicious, omission-crash, omission-sa, omission-malicious, malicious-crash, malicious-sa and malicious-malicious. The part of the word before the hyphen indicates the faulty processor type and the part of the word after the hyphen denotes the faulty transmission medium type. For example, the term “crash-malicious” indicates that the processor is in the symptom of crash fault and the transmission medium is in the failure type of malicious fault. The symptom of a malicious fault is unpredictable and the behaviors of the other failure types can be treated as special cases of a malicious fault. If the malicious-malicious case, which is the thorniest case, can be solved, then the other types can surely be solved.

In short, the goal of the BA, based on the assumption that the source has its own initial value, is to make all the fault-free processors reach a common value. The proposed protocol solves the BA problem in a multicasting network in the presence of generalized component failures. The rest of this paper is organized as follows. Section 2 will serve to define the multicasting network model. Then, in Section 3, we shall illustrate out MAP (Multicasting Agreement Protocol) protocol in detail. Finally, the conclusion and direction of future work will be presented in Section 4.

The SGCN model
In this section, a multicasting network model, called the SGCN (Super Generalized Connected Network), is to be defined. It combines group-to-group connection (Cristian, 1991) and loose constraints on the connectivity: (1) A system with n processors can be partitioned into g groups, where 1 ≤ g ≤ n. The number of processors, 1 ≤ p ≤ n, in each of the groups can be varied. (2) The transmission medium is freely connected. The connectivity does not necessarily have to reach each pair of groups. In other words, the connectivity c can be varied. Fig. 1 illustrates an SGCN model, in which 15 processors are partitioned into 6 groups. There are 2 processors in the group G1, 3 in G2, 1 in G3 and so on. As for the connectivity, for example, G1 is connected to G3, G4 and G5, but there is no direct link between G1 and G2 as well as G6. G2 is connected to G3, G4 and G5, but it has no direct link to G1 and G6. Consequently, c = 3.

Fig. 1: An example of an SGCN (n = 15, g = 6, p = {1, 2, 3, 4} and c = 3)

Fig. 2: Various SGCN models

The traditional FCN, the BCN and Wang or Siu’s GCN are special cases of the SGCN: (1) An n-processor FCN can be treated as an n-group SGCN where g = n, p = 1 and c = n-1. Fig. 2(a) shows a 6-processor FCN, which is in fact an SGCN with n = 6, g = 6, p = 1 and c = 5. (2) The BCN includes packet terrestrial radio, satellite network, bus local network and ring local network (Lamport et al., 1984). For simplicity, Fig. 2(b) illustrates an example bus local network as well as an SGCN with n = 6, g = 1, p = 6 and c = 1. Due to the fact that the processors communicate with each other through a local bus, the BCN can be regarded to as a 1-group SGCN with c = 1. (3) Wang’s GCN works as an SGCN with g groups while c = g-1. Fig. 2 © is an example of Wang’s GCN and is also an SGCN with n = 18, g = 6, p = 3 and c = 5. (4) In addition, an n-group, c-connectivity SGCN coincides with Siu’s GCN presented in Fig. 2(d), in which n = 6, g = 6, p = 1 and c = 4. According to these observations, if the BA problem can be solved in the SGCN, then the BA problem can surely be reached in the FCN and the BCN as well as the GCNs proposed by Wang and Siu separately.

To the best of our knowledge, no currently existing protocols are capable of solving the BA problem in an SGCN. The SGCN has loose restrictions on the network architecture, is generalized and can practically be widely applied to many fields. The agreement problem mostly happens in the network environment and leads to a critical issue. If the BA is reached in an SGCN, then our purposed protocol MAP for the SGCN can take care of the problem for the FCN, the BCN and the GCN as well since the traditional FCN, BCN and GCN are in fact subsets of the SGCN. The concept and protocol to solve the BA problem in the generalized case will be discussed right next.

Concept and protocol
Conceptually, there are two phases in this protocol: the message exchange phase and the decision making phase. In order for all the fault-free processors to reach an agreement, each of them needs to collect enough agreement values from all the others if they are fault-free. As a result, exchanging the received values helps fault-free processors to collect enough agreement values. The received messages are stored in a tree structure called the message-gathering tree (mg-tree), which is similar to what Bar-Noy et al. (1987) proposed. This is the basic concept of the message exchange phase. Besides, a local majority (LMAJ) at the end of each round can also help reduce some following rounds since the processors are partitioned into groups in an SGCN. The LMAJ of the set {v1, … ,vn} is v if the number of v's presented in the set is greater than n/2; otherwise, a default value φ is chosen. For example, the LMAJ of the set {0, 0, 0, 1, 1} is 0 and that of the set {1, 0, 1, 0, 0, 1} is φ . In addition, the influence of a faulty transmission medium is removed during the message exchange phase by applying a global majority function, called the MAJ. After removing the influence of the faulty transmission medium, the mg-tree is reorganized into an information-collecting tree (ic-tree) and another global majority, called the VOTE, is applied for removing the influence of a faulty processor. The details of such functions as the MAJ and the VOTE will be given later.

Since all the fault-free processors reach a common value, the first step of the protocol is to determine the number of rounds of message exchange required. If the faulty component is identified, then the protocol can decide the minimum number of rounds required to solve the BA problem. For example, if the faulty component is a processor, then the protocol can save some rounds required for removing the influence of a faulty transmission medium because there is not any. However, if there are both a faulty processor and a faulty transmission medium, then the number of rounds required will be bigger. The protocol works efficiently. During each round of message exchange, all the fault-free processors communicate with each other through the transmission media directly. In an SGCN, one may not be fully connected to all the others, but it can still communicate with them smoothly. For instance, in Fig. 3, processor P2 is not directly connected to P3. During the first round of message exchange, P1 communicates with P2 and P3 through the transmission media T12 (connected to P1 and P2) and T13 (connected to P1 and P3) separately. P1 gets P2 ’ s value as well as P3's value. P2 gets P1's value, but not P3 ’ s. Similarly, P3 gets P1's value, but not P2 ’ s. In the next round, P1 asks P2 what P3 ’ s value is and P2 gets P3's value through P1. Enough information helps making correct decisions. Then, the decision is made by a function VOTE executed by each processor in the decision making phase.

Fig. 3: The concept of message exchange

The message exchange phase is time consuming. Therefore, to reduce the number of required rounds is the major concern in designing the protocol. If the condition of the network is given, the number of required rounds is a minimum. For a given network healthy condition, there are four possibilities: (1) all components are fault-free; (2) transmission medium failure only; (3) processor failure only; (4) both processor failures and transmission medium failures exist. They are elaborated in the following passages.

Case 1: All components are fault-free.
Such a case is obviously the simplest one. The processors are fault-free and the messages are not forged. The transmission media are perfect and the messages are delivered correctly. Without facing any obstacle, each processor reaches the agreement in the first round. With only one round, each processor receives one value and agrees on the value at the end of the decision making phase while the network topology is fully connected; otherwise, two rounds of message exchange are needed in a c-connectivity (c < n-1) network. If it is not fully connected, some processors may not receive the value in the first round and thus another round of message exchange is needed. Since each processor will definitely receive values in the second round, the agreement can surely be reached in two rounds. To sum up, in case all the components are fault-free, an agreement can be reached in one round of message exchange if the network is a fully connected one; otherwise, two rounds will be needed.

Case 2: Transmission medium failure only
In this case, no fault-free processor knows exactly which transmission medium is failed. In order to keep away from the influence of a faulty transmission medium, collecting all the processors' messages is necessary. If the number of the faulty transmission media is not a dominant one, then the agreement can be determined through the MAJ. This way, only two rounds are needed in this case and at most transmission medium failures can be tolerated (the number of faulty transmission media is denoted as ft).

In the first round of message exchange, where r = 1, the source multicasts its initial value vs through the connected transmission media. Each processor stores the value at the root of its mg-tree. Based on the assumption that the message sender is always identified, the processors are aware of the absence of messages. If there is no connection, replace it with λ . In the second round of message exchange, each processor acts as the source to exchange message and receives a vector of values. In order to reduce the influence of disconnected or faulty transmission media, we apply the LMAJ on the messages received from each same group. After that, the values are stored into the corresponding vertices at the second level of the mg-tree. Then, we reach an agreement by MAJ in the decision making phase. The MAJ(α ) is defined as follows. The val(α i) tells that the message is sent to a series of receivers, denoted as α and the processors in Group Gi are the latest receivers.

Consider the case that part of the messages may be forged by a faulty transmission medium. The decision value can be dominated due to a greater number of processors belonging to the same group and consequently communicating with other groups through the identical faulty transmission medium. As a result, taking local majority eases and reduces the influences occurred from the faulty transmission media.

Case 3: Processor failure only
In this case, fg + 2 rounds are required to solve the BA problem. The number of faulty groups is denoted as . If the number of fault-free processors in a group is greater than , it is called a fault-free group; otherwise, it is a faulty group, where p is the number of processors in the group and pi denotes the number of processors in the group . As a result, the system can tolerate at most faulty processors while reaching a common value, where

In order to prevent the influence from the faulty processors, each processor multicasts the messages received in the (r-1)th round at the beginning of the rth round and maintains its mg-tree. In each round, the LMAJ is applied to the messages from the same group to lessen the influence of false messages. For each processor, if there is no connection, replace it with λ in the mg-tree. These steps are not done by the protocol of Bar-Noy et al. (Bar-Noy et al., 1987). After fg+2 rounds, the messages stored in the mg-tree are reorganized into an ic-tree. In the decision making phase, all the fault-free processors reach a common value by applying the function VOTE α to the messages collected in the message exchange phase and stored in the ic-trees. The VOTE(α ) function for vertex α is defined as follows.

The ic-tree is similar to the mg-tree, but the ic-tree only stores the non-duplicated name of each vertex. It helps avoid the cycle influence of a faulty processor/group and thus the agreement can be reached more easily.

Case 4: Both processor failures and transmission medium failures exist
In this generalized case, we have to face both faulty processors and faulty transmission media at the same time. To solve the BA problem here, the influences of both the faulty processors and the faulty transmission media must be removed. An intuitive method to remove these influences is to combine both the protocol for transmission medium failures and the protocol for processor failures. For instance, the protocol for transmission medium failures solves the transmission medium failure problem only, if . In addition, the protocol for processor failures is valid only for the specific fallible environment. If , each fault-free processor must reach the agreement through the collected messages. Previous protocols (Lamport et al., 1982; Meyer et al., 1991; Ramaswami et al., 1993; Wang et al., 1995; Yan et al., 1999) deal with different fallible components separately, but they are incapable of reaching a common value in a complicated environment.

To solve the BA problem in a complicated environment where both faulty processors and faulty transmission media exist, there are four possible methods to remove the influences in different order. (1) Removing the influences of faulty processors and transmission media simultaneously. (2) Removing the influences of faulty processors first and then the influences of faulty transmission media. (3) Removing the influences of faulty transmission media first and then removing the influences of faulty processors. (4) Removing the influences of faulty transmission media and faulty processors alternately. Yan et al. (1999) depicted that the third method (removing the influences of faulty transmission media first and then removing the influences of faulty processors) is capable of solving the generalized case problem. Our MAP, following their observation, solves the transmission medium failures first and then the processor failures. The number of required rounds of message exchange, denoted as δ , for the BA is as follows.

Case 1 : δ = 1, if all the components are fault-free in an SGCN where c = n-1; δ = 2, if all the components are fault-free in an SGCN where c < n-1
Case 2 : δ = 2, if the fallible component is transmission medium only.
Case 3 : δ = , if the fallible component is processor only.
Case 4 : δ = fg + 3, if the fallible components include both processors and transmission media.

In our system, for a given network condition, the number of required rounds is the minimum among various cases. If not, case 4 is a default case. If and the amount of the faulty units (including the faulty group and the faulty transmission medium) , then the BA problem is solved. If the minimum number of required rounds is determined, each fault-free processor works effectively for collecting enough information and uses these messages to compute a common value in the decision making phase. The BA problem in the generalized case is solved in an SGCN.

The protocol MAP (Multicasting Agreement Protocol)
In this subsection, the MAP will be introduced to solve the generalized BA problem in the SGCN. The concept of the MAP protocol, as shown in Fig. 4, is as follows. In each round, the messages from the same group have to undergo the LMAJ to lessen the influence of false messages. In order to remove the influence caused by the faulty transmission medium, each processor applies the MAJ. In addition, in order to prevent the influence from the faulty processors, each processor exchanges messages with the others and maintains its mg-tree. For each processor, if there is no connection, then λ is put down in the mg-tree. After fg+3 rounds, the messages stored in the mg-tree are reorganized into an ic-tree. In the decision making phase, each processor reaches a common value by applying the VOTE function to the messages in its ic-tree. The ic-tree is constructed from a corresponding mg-tree by the following reorganization rules: (1) After the message exchange phase, the leaves of the mg-tree are deleted, since the influence of a faulty transmission medium in the leaves of an mg-tree is not removed; (2) The vertices with duplicated names are deleted; the purpose is to avoid the influence of a faulty processor being repeatedly stored again in an ic-tree.

Correctness
A vertex α is called common if each fault-free processor derives the same value from α . That is, the value stored in vertex α of each fault-free processor’s mg-tree or ic-tree is identical if the vertex α is common. If each fault-free processor has the common value in the root of its ic-tree, then an agreement is reached.

To prove that a vertex is common, the term common frontier is defined as follows. If each root-to-leaf path of the tree (the mg-tree or the ic-tree) contains a common vertex, then the collection of the common vertices forms a common frontier.

Fig. 4: The proposed protocol MAP solves the BA problem

That is, each fault-free processor has the same values collected in the common frontier if a common frontier exists in a fault-free processor’s tree structure. The function VOTE computes the root value of the tree structure and each fault-free processor obtains the same root value due to the same input (the same collected values in the common frontier) and computing function. This concept is similar to the protocol proposed by (Bar-Noy et al., 1987).

The vertex α i of a tree is a correct vertex (Yan et al., 1999) if group Gi is fault-free. That is, a correct vertex is the place to store the value received from a fault-free processor. In addition, for a vertex α i in a tree structure of a fault-free processor in the fault-free group Gj, val(α i) is the true value (Yan et al., 1999) if the transmission medium Tij (connects to Gi and Gj) is perfect. In the following passages, Lemma 1 proves the true values of all the correct vertices in an ms-tree can be obtained. That is, the influence of any given faulty transmission medium is removed from each fault-free processor’s mg-tree. Lemma 2 indicates that all the correct vertices in each fault-free processor’s mg-tree, except the leaves, are common. Lemmas 3 and 4 state that all the correct vertices in an ic-tree (eliminating the leaves of an mg-tree) are still common. Lemma 5 shows the existence of a common frontier. Lemma 6 and Corollary 2 present that the root of a tree structure is common if the common frontier exists in an ic-tree. The computed value stored in the root of an ic-tree is a common value, which is free from the influence of a faulty processor. As a result, Theorems 1 and 2 indicate the whole correctness.

Lemma 1
At the (r-1)th round, the processors in a fault-free group Gi multicast val(α ) to the others and each fault-free processor stores the local majority of the received values from Gi after using the LMAJ, denoted as val(α i) in vertex α i of its mg-tree. At the rth round, the MAJ(α i) which is applied to the vertices at the (r-1)th level of the mg-tree of each fault-free processors in Gj’s mg-tree should be the true value of vertex α i, namely val(α ), for 1 ≤ j ≤ g if the number of faulty units is less than in an SGCN model, for each r in 2 ≤ r≤ fg+3.

Proof: Part 1. The transmission medium Tij is perfect
Since Tij is perfect, the processors in Gj will receive val(α i) from the processors in Gi in the (r-1)th round after taking local majority by LMAJ and val(α i) = val(α ). The processors in Gi multicast val(α ) to the others. There are c transmission media connected to a processor in which at most units could be failed. In the next round, each processor in Gj receives at least after LMAJ, stored in the children of vertex α i, from processors in other groups such as Gm, where at least and a val(α ij) are both equal to this val(α ) stored in the children of vertex α i of the processors in Gj and MAJ(α i) is equal to val(α ).

Part 2. The transmission medium Tij is faulty
If Tij is faulty, consider the two following cases. Note that val(α ij) = val(α i) for the processors in a fault-free group Gj.

Case 1: val(α i) = val(α )
There are at most faulty units connected to the fault-free processors in Gj and at most values that may be ¬ val(α ) ’ s in the children of vertex α i. Since val(α ij) = val(α ), there are at least (c-1)-() val(α ) free from the influence of a faulty unit among the rest (c-1) messages. The number of val(α ) ’ s is (c-1)- ()+1 = + 1 in this set. As a result, the majority for the values in the set val(α ), or MAJ(α i), is val(α ).

Case 2: val(α i) = ¬ val(α )
There are at most faulty units in the network. At the rth round, the number of ¬ val(α )'s is not beyond and the number of val(α )'s is at least (c -1)-(). If c is an even number, then = , and the majority of the set [val(α i1), val(α i2), … , val(α ij)] for vertex α i is undefined. By definition, MAJ(α i) = ¬ val(α ) = val(α i). In addition, if c is an odd number, then < , and the majority value, MAJ(α i), of the set is val(α ).

Corollary 1
At the end of the (fg+2)nd round, the true value of each non-leaf vertex in a fault-free processor’s mg-tree can be obtained.

Lemma 2
All the correct and non-leaf vertices in a fault-free processor’s mg-tree are common.

Proof: The value stored in a correct vertex α j is v, which implies that most of the processors in a fault-free group Gj say that the value stored in vertex α of its mg-tree is v. Since most of the processors in Gj are fault-free, the value val(α j) = v is stored in each fault-free processor’s mg-tree unless the transmission medium connected to Gj is failed. In this case, at most values are v, which is stored in the children of vertex α j for such a receiver. The majority value v is obtained and restored in vertex α j by applying MAJ to vertex α j. As a result, all the correct vertices have the common value v.

Lemma 3
If a parent vertex is correct in an mg-tree, then the fault-free children of the parent should be common. The true value for these fault-free children is the same at level ≤ fg+2.

Proof
To prove this lemma is to show that if val(α ) = v for a correct vertex α , then the value MAJ (α i) of each fault-free processor should be v on the condition that group Gm is fault-free, 1 ≤ m ≤ g. At round r ≤ fg+3, by Lemma 1, the value val(α ) = v is available for the fault-free processors in Gj and Gm, where vertex α is at level r-2;

Case 1 : If transmission medium Tim is perfect, then the fault-free processors in Gi and Gm have the value MAJ(α m) = val(α m) = v, where vertex α m is at level r-1.
Case 2 : If transmission medium Tim is faulty, then val(α m) may not be v. Since there are at most faulty units in the network, there are at least correct vertices α mi with a common value v among the children of vertex α m. Thus, by Lemma 1, MAJ (α m) = v.

According to cases 1 and 2, where the level of α r in r-1 is lower than or equal to fg+2, the true value of each fault-free children α m (at the level ≤ fg+2, Gm is a fault-free group) should be the true value of its fault-free parent α . Regardless of whether the transmission medium between the receiver and Gm is fault-free or faulty, the fault-free children are common.

Lemma 4
All the correct vertices of an ic-tree are common.

Proof
After reorganization, no repeated vertices are in an ic-tree. At level fg+1 or above, the correct vertex α has at least 2fg+1 children, of which at least fg+1 children are fault-free. By Corollary 1, the true values of those fg+1 correct vertices are in common, and the majority value of vertex α is common. The correct vertex α is common in the ic-tree if the level of α is ≤ fg+1. At level fg+2, by Corollary 1, the correct vertex is still common. As a result, all the correct vertices of the ic-tree are common.

Lemma 5
The common frontier exists in the ic-tree.

Proof
There are fg+2 vertices along each root-to-leaf path of an ic-tree in which the root is labeled by the source name, and the others are labeled by sequence of group names. Since at most fg groups can be failed, there are at least one correct vertex along each root-to-leaf path of the ic-tree. By Lemma 4, the correct vertex is common, and therefore the common frontier exists in each fault-free processor ’ s ic-tree.

Lemma 6
Let α be a vertex, α is common if there is a common frontier in the subtree rooted at α .

Proof
If the height of α is 0, and the common frontier (α itself) exists, then α is common. If the height of α is r, plus the children of α are all in common by the induction hypothesis with the height of the children being r-1, then the vertex α is common.

Corollary 2: The root is common if the common frontier exists in the ic-tree.
Theorem 1: The root of a fault-free processor’s ic-tree is common.
Proof: By Lemma 5 and Corollary 2, the theorem is proved.

Theorem 2: Algorithm MAP solves the generalized BA problem in an SGCN.

Proof:
To prove the theorem, we have to show that the MAP algorithm meets agreements (BA1') and (BA2').

By Theorem 1, (Agreement ’ ) is satisfied.

Since most of the processors are fault-free, they multicast the source’s initial value to all the others. The true value of the correct vertices for all the fault-free processors’ mg-trees is v. When the mg-trees are reorganized into ic-trees, the correct vertices still exist. As a result, each correct vertex of the ic-tree is common (Lemma 4), and its true value is v. By Theorem 1, this root is common. The computed value VOTE(s) = v is stored in the root for all the fault-free processors. Agreement (BA2’ ) is thus satisfied.

Complexity
The complexity of the protocol is evaluated in terms of (1) the minimum number of rounds, and (2) the maximum number of allowable faulty components.

Theorem 3
It requires fg+3 rounds to solve the generalized BA problem in an SGCN, and it can tolerate at most fu (≤) units. Here, the number of allowable faulty groups is fg (≤ ), and the number of allowable faulty transmission media is ft (≤).

Proof

(1) In the message exchange phase, fg+3 rounds are required, and no extra round is required during the decision making phase.
(2) By Lemma 1, the MAJ in the mg-tree can take care of at most faulty transmission media. By Lemma 4, the ambiguity due to at most fg faulty groups can be resolved. The theorem is proved.

Theorem 4
The proposed protocol solves the generalized BA problem by using the minimum number of rounds.

Proof
For an SGCN, fg+2 rounds is the minimum number of rounds to solve the BA problem with fg faulty groups. In the generalized case, at least fg+2 rounds are required to reach an agreement. Suppose there is an algorithm that can solve the generalized BA problem by using fg+2 rounds under the assumption of processor/transmission medium failures. Since the symptoms of faulty processors/transmission media are both malicious, a faulty transmission medium may be just as disruptive as a faulty group, and vice versa. As a result, the fg+2 rounds algorithm solves the BA problem with fg+1 malicious faulty groups. It contradicts the results of processor failures only, in which fg faulty groups are the maximum number of faulty groups allowed by the fg+2 rounds algorithm to solve the BA problem in an SGCN. In other words, it is impossible to solve the BA problem by fg+2 rounds in an SGCN with fg faulty groups and some additional faulty transmission media. As a result, fg+3 rounds are the minimum number of rounds.

Theorem 5
The total number of allowable faulty units fu (≤ ) is the maximum.

Proof
Since fu is the maximum number of allowable faulty transmission media in a system with transmission medium failures only (fg=0), if the number of fu = fg+ft is not the maximum number of allowable faulty units, then fu = fg + ft = 0 +ft if fg = 0. Then, ft is either equal to or larger than , which leads to a contradiction.

As a result, the MAP takes the minimum number of rounds and tolerates f faulty components to make each fault-free processor reach the agreement. The optimality of the protocol is proved.

Conclusions and future works

The multicasting network is a more and more popular choice in various communication environments. The main advantage of the multicasting network is that one processor has to only send a copy of some frame across the network, and different computers under the cover of the network can receive the same frame, which is more convenient and efficient than traditional network systems. Based on the generalized network model, in this paper, the BA problem has been reviewed with malicious processor/transmission medium failures. The MAP solves the generalized BA problem at the cost of the minimum rounds of message exchange and tolerates the maximum number of faulty components.

(fg+2) rounds is the lower bound for solving the BA problem in an SGCN under the assumption of processor failures only, for the messages received at the (fg+2)th round may be influenced by the faulty transmission media, and additional rounds may be required to remove the influence of faulty transmission media on the (fg+2)th round. As a result, fg+3 rounds is the optimal solution for the generalized BA under the assumption of both processor and transmission medium failures. The protocol MAP solves processor failures/transmission medium failures only cases as well. If the fallible component is known, then the number of required rounds is:

(1) Only two rounds are required to help each processor reach an agreement while all the components in the SGCN model are fault-free/perfect with c < n-1; otherwise, only one round is needed.
(2) Two rounds are required for the transmission-medium-failures-only case (the processors are fault-free).
(3) (fg+2) rounds are required to solve the BA problem when only the processors are on the fallible side.
(4) (fg+3) rounds are required when the network condition is unknown or when both the processors and transmission media may be fallible.

The MAP solves the BA problem in the SGCN model if and only if the number of faulty units is smaller than . It tolerates at most faulty groups (which implies that is the maximum number of faulty processors allowed), and at most faulty transmission media may exist. If the number of faulty components exceeds the limitation, reached. The messages sent by fault-free/perfect components do not dominate the messages from faulty components, and the concept of majority, which helps to remove the influence of faulty components, is applied; as a result, we have the following facts:

(1) To solving the generalized BA, the MAP is an optimal solution that tolerates the maximum number of faulty components at the cost of the minimum number of rounds of message exchange.
(2) The protocol solves the generalized BA problem, and thus the processor-failures-only case and the transmission-medium-failures-only case are also solved.
(3) The symptom of faulty components we can deal with is the malicious type, which is the thorniest and the most generalized type. With such ability, the proposed protocol can cope with other failure types with ease, such as crash and omission with processors only or transmission media only or both.
(4) Since the SGCN incorporates the FCN, the BCN, and the GCN, the FCN, the BCN, and the GCN are in fact special cases of the SGCN. In other words, the capability of our solution to the BA problem with the SGCN covers all the other three network structures.

The multicasting network is a network system where a frame may be broadcast or sent to an individual or a set of group addresses. That is, a frame may be sent to an individual this time and sent to a set of group addresses next time. Both the receivers and the number of them can be different, and that is the case the proposed protocol is incapable of closing yet. Therefore, our future work will be focused on extending the SGCN model and designing a more generalized protocol. The SGCN model will be made more complex so that a frame can be sent to different sets of receivers in different rounds. For example, P1 sends messages to P2 and P3 in the first round and to P4, P5, and P6 in the second round.

REFERENCES

  • Babaoglu, O. and R. Drummond, 1985. Streets of byzantium: Network architectures for fast reliable broadcasts. IEEE Trans. Data Knowledge Eng., 11: 546-554.
    Direct Link    


  • Bar-Noy, A., D. Dlove, C. Dwork and H.R. Strong, 1987. Shifting gears: Changing algorithms on the fly to expedite byzantine agreement. Proceedings of the Symposium on Principles of Distributed Computing, Aug. 10-12, New York, USA., pp: 42-51.


  • Cristian, F., 1991. Reaching Agreement on processor-group membership in synchronous distributed systems. Distributed Comput., 4: 175-187.
    CrossRef    Direct Link    


  • Dolev, D., 1982. The byzantine generals strike again. J. Algorithms, 1: 14-30.


  • Dolev, D. and R. Reischuk, 1985. Bounds on Information exchange for byzantine agreement. J. ACM., 32: 191-204.
    Direct Link    


  • Fischer, M., M. Paterson and N. Lynch, 1985. Impossibility of distributed consensus with one faulty process. ACM J., 32: 374-382.
    Direct Link    


  • Gifford, G.K., 1979. Weighted voting for replicated data. Technical Report, CSL-79-14, XEROX Palo Alto Research Center.


  • Halsall, F., 1995. Data Communications, Computer Networks and Open Systems. 4th Edn., Addison-Wesley Publishing Company, USA


  • Lamport, L., R. Shostak and M. Pease, 1982. The byzantine generals problem. ACM Trans. Programm Languages Syst., 4: 382-401.
    Direct Link    


  • Lamport, L. and P. Melliar-Smith, 1984. Byzantine clock synchronization. Proceedings of the 3rd ACM Principles of Distributed Computing Conference, Aug. 27-29, New York, USA., pp: 68-74.


  • Meyer, F.J. and D.K. Pradhan, 1991. Consensus with dual failure modes. IEEE Trans. Parallel Distributed Syst., 2: 214-222.
    CrossRef    


  • Molina, G., 1986. Application of byzantine agreement in database systems. ACM Trans. Database Syst., 11: 27-47.
    Direct Link    


  • Pease, M., R. Shostak and L. Lamport, 1980. Reaching agreement in presence of faults. J. ACM., 27: 228-234.
    Direct Link    


  • Ramaswami, V. and J.L. Wang, 1993. Analysis of the link error monitoring protocols in the common channel signaling network. IEEE/ACM Trans. Network., 1: 31-47.
    CrossRef    Direct Link    


  • Reischuk, R., 1982. A new solution for the byzantine generals problem. IBM Research Report, RJ-3673, Computer Science.


  • Siu, H.S., Y.H. Chin and W.P. Yang, 1996. A note on consensus on dual failure modes. IEEE Trans. Parallel Distributed Syst., 7: 225-230.
    Direct Link    


  • Strong, H. and D. Dolev, 1982. Byzantine agreement. IBM Research Report, RJ-3714, Computer Science.


  • Wang, S.C., Y.H. Chin and K.Q. Yan, 1995. Byzantine agreement in a generalized connected network. IEEE Trans. Parallel Distributed Syst., 6: 420-427.
    Direct Link    


  • Yan, K.Q., S.C. Wang and Y.H. Chin, 1999. Consensus under unreliable transmission. Inform. Process. Lett., 69: 243-248.
    Direct Link    

  • © Science Alert. All Rights Reserved