
Information Technology Journal

Year: 2013 | Volume: 12 | Issue: 12 | Page No.: 2358-2365
DOI: 10.3923/itj.2013.2358.2365
A Novel Multi Pursuers-one Evader Game based on Quantum Game Theory
Wang Hao, Dong Chao-Wei and Fang Bao-Fu

Abstract: This study analyzes the choice and optimization of strategies in the multi-pursuer pursuit-evasion game. The game considered here involves two self-interested pursuers and a single evader, with the pursuers and the evader moving at equal speed. Classically, self-interested individuals tend to seek the benefits of cooperation at the lowest personal cost; this disintegrates the cooperation and prevents a stable, optimal cooperative relationship. By extending the classical game model to a quantum game model, individual self-interest no longer causes this disintegration and the optimal cooperative relationship is obtained; the problem of coordinating individual strategies is likewise solved in the quantum game model. Under the quantum game model, the dilemma of classical pursuit decision-making is finally resolved; this is demonstrated through theoretical analysis and simulation experiments, and the evolution and outcomes of the quantum game are also discussed. The results show the advantages and effectiveness of the quantum game model in dealing with this issue, broaden the applications of quantum games and provide a new way of thinking and new research methods for multi-robot problems.


How to cite this article
Wang Hao, Dong Chao-Wei and Fang Bao-Fu, 2013. A Novel Multi Pursuers-one Evader Game based on Quantum Game Theory. Information Technology Journal, 12: 2358-2365.

Keywords: Nash equilibrium, quantum games, pursuit-evasion strategies, multi-robot pursuit game, quantum decision

INTRODUCTION

The multi-robot pursuit-evasion problem is a classic platform for studying competition and cooperation in multi-robot systems. The one-pursuer-one-evader problem was first proposed by Isaacs (1965). For robots collaborating to capture multiple moving evaders, Yamaguchi and Arai (1994) used a linear autonomous system to control the multi-robot system. For the discrete multi-robot pursuit-evasion problem, methods such as reinforcement learning and the contract net were studied by Pu-Cheng et al. (2007) and Lazhar et al. (2011). Recently, researchers have paid more attention to continuous multi-robot cooperative pursuit. A method that decomposes cooperative pursuit into simpler problems is presented by Cai et al. (2009); the emphasis then shifts to how to perform the decomposition and strengthen the cooperation. Other researchers, such as Herrero-Perez and Martinez-Barbera (2010) and Mehrjerdi et al. (2010), transform the continuous state into a discrete state using fuzzy techniques and state-reduction techniques.

All of the above research treats the robot as a rational but not self-interested agent, so the robot team can reach the optimal result without regard to individual benefit. In the multi-robot pursuit-evasion game, however, self-interested individuals tend to seek the benefits of cooperation at the lowest personal cost; this disintegrates the cooperation, prevents a stable optimal cooperative relationship and keeps the team from the optimal result. By extending the classical game model to a quantum game model, individual self-interest no longer causes this disintegration and the optimal cooperative relationship can be obtained. In the field of quantum game theory, the PQ problem was first studied by Meyer (1999). A new method based on quantum games proposed by Eisert et al. (1999) solved the prisoner's dilemma problem. A series of studies followed, such as work on quantum entanglement by Du et al. (2001, 2002) and Kargin (2008) and on the relation between payoff matrix structure and quantum coherence by Brandenburger (2010) and Iqbal and Abbott (2009). In this paper the robot is considered a rational and self-interested agent, so the behavior that is best for an individual robot's own decision-making is not necessarily best for the pursuer team: the pursuers may fall into a dilemma because of individual self-optimization and fail to complete the pursuit task. We therefore study a cooperative strategy from the perspective of game theory. Cooperation and competition in multi-robot pursuit-evasion are in fact a game process, and the crucial point in this game is that each individual robot pursues its own best interests while maintaining the collaboration. For the case in which the two sides move at constant, equal velocity, we build a quantum game model for the multi-robot system, extend the classical strategy space to the quantum strategy space, and analyze the game process under the quantum model and quantum entanglement.
Under the quantum game model, the contradiction between individual optimality and team optimality in the classical game is finally resolved. This shows the advantages and effectiveness of the quantum game model in dealing with this issue, broadens the applications of quantum games and provides a new way of thinking and new research methods for multi-robot problems.

MULTI-ROBOT PURSUIT-EVASION PROBLEM DESCRIPTION

In a bounded region, two self-interested pursuers cooperate to capture an evader. All robots are located in a 9x9 grid region, as shown in Fig. 1. Their speeds are equal and their initial locations are random. After the start of the pursuit, the pursuers cooperate to chase the evader. Each robot has only five action strategies: up, down, left, right and hold. The pursuers know each other's locations; two pursuers may occupy the same grid cell, but a pursuer and the evader may not. Each robot makes decisions independently and is not allowed to communicate with the others. The evader never voluntarily reduces its distance to a pursuer (distances are computed from the grid coordinates). The pursuit succeeds when the pursuers occupy the grid cells adjacent to the evader.

Fig. 1: Multi-robot pursuit schematic in the 9x9 grid region, containing two pursuers and one evader

The pursuers' profits are defined as follows:

Each pursuer chases the evader in its most favorable direction
If the distance between a pursuer and the evader is unchanged, the profit is 0
If the distance between a pursuer and the evader increases by 1 grid cell, the profit is -1
If the pursuers remove one of the evader's escape directions, the profit is 1.5
If a pursuer closes on the evader by 1 grid cell (in the X or Y direction), the profit is 1
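The step-profit rules above can be written directly as a small scoring function. A minimal sketch; the function name and the assumption that the escape-direction bonus is added on top of the distance term are illustrative, not taken from the paper:

```python
def step_profit(dist_change, blocked_escape_dirs):
    """Single-step profit of one pursuer under the rules above.

    dist_change: change in grid distance to the evader along X or Y
                 (-1 closer, 0 unchanged, +1 farther).
    blocked_escape_dirs: number of evader escape directions removed this step.
    """
    # unchanged -> 0, farther by one cell -> -1, closer by one cell -> +1
    profit = {0: 0.0, 1: -1.0, -1: 1.0}[dist_change]
    # each removed escape direction is worth 1.5 (assumed additive)
    profit += 1.5 * blocked_escape_dirs
    return profit
```
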

In the pursuit process it often happens that the choice that is optimal for an individual pursuer is not optimal for the team (Fig. 2, 3).

Since each pursuer chases the evader in its individually most favorable direction, the best individual strategy is easily calculated: both pursuers move left and each earns a profit of 1. If instead they consider the team benefit that cooperation brings, they can obtain a higher reward. In Fig. 2, pursuer 1 selects the down direction and pursuer 2 selects the up direction; under this cooperative strategy each earns 2, but this strategy is not easy to reach.

Assume that Si1∈S1 are the strategies of pursuer 1 and Sj2∈S2 are the strategies of pursuer 2. A dominant strategy S* for a pursuer must satisfy the following conditions:

π1 (S*1, Sj2) ≥ π1 (Si1, Sj2), for all Si1∈S1, Sj2∈S2
(1)

Fig. 2: Two pursuers select the cooperative strategy

π2 (Si1, S*2) ≥ π2 (Si1, Sj2), for all Si1∈S1, Sj2∈S2
(2)

We call the strategy pair (S*1, S*2) a Nash equilibrium point; it must satisfy the following relationships:

π1 (S*1, S*2) ≥ π1 (Si1, S*2)

π2 (S*1, S*2) ≥ π2 (S*1, Sj2)

where Si1∈S1, Sj2∈S2. Assume that the uncooperative strategy is D and the cooperative strategy is C. The pursuers' payoffs are as follows:

π1 (C,C) = 2.0, π1 (C,D) = 0.5, π1 (D,C) = 2.5, π1 (D,D) = 1.0

π2 (C,C) = 2.0, π2 (C,D) = 2.5, π2 (D,C) = 0.5, π2 (D,D) = 1.0

If pursuer 2 chooses S2 = C, then pursuer 1 has the following relationship:

π1 (D,C) = 2.5 ≥ π1 (C,C) = 2

If pursuer 2 chooses S2 = D, then pursuer 1 has the following relationship:

π1 (D,D) = 1 ≥ π1 (C,D) = 0.5

So strategy D is optimal for pursuer 1, and the same holds for pursuer 2. The strategy pair (D, D) is therefore a Nash equilibrium point: no participant who unilaterally deviates from this equilibrium obtains a better payoff.

Fig. 3: Two pursuers select the uncooperative strategy

A strategy pair is Pareto optimal if no participant can improve his own payoff without reducing another participant's payoff. Obviously, the Nash equilibrium (D, D) is not Pareto optimal: both pursuers would do better under (C, C).
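The Nash and Pareto arguments above can be checked mechanically against the payoff matrix. A small Python sketch (function names are illustrative):

```python
from itertools import product

# Payoff matrices from the text, indexed by (pursuer 1 move, pursuer 2 move)
pi1 = {('C','C'): 2.0, ('C','D'): 0.5, ('D','C'): 2.5, ('D','D'): 1.0}
pi2 = {('C','C'): 2.0, ('C','D'): 2.5, ('D','C'): 0.5, ('D','D'): 1.0}
cells = list(product('CD', repeat=2))

def is_nash(s1, s2):
    # No unilateral deviation improves the deviator's payoff
    return (all(pi1[(s1, s2)] >= pi1[(a, s2)] for a in 'CD') and
            all(pi2[(s1, s2)] >= pi2[(s1, a)] for a in 'CD'))

def is_pareto(s):
    # No other cell makes one pursuer better off without hurting the other
    return not any(pi1[o] >= pi1[s] and pi2[o] >= pi2[s] and
                   (pi1[o] > pi1[s] or pi2[o] > pi2[s]) for o in cells)

nash = [s for s in cells if is_nash(*s)]
pareto = [s for s in cells if is_pareto(s)]
print(nash)    # [('D', 'D')] -- the unique classical equilibrium
```

The unique Nash point (D, D) does not appear in the Pareto set, which is exactly the dilemma described above.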

MULTI ROBOT PURSUIT ALGORITHM BASED ON THE QUANTUM GAME THEORY

In the quantum game we encode each pursuer's strategy in a quantum state. Each pursuer holds one qubit, so a pursuer's decision becomes an operation on its own qubit. After these operations, the final quantum state is obtained; measuring it collapses the state onto an eigenvector with the corresponding probability, and the pursuers then receive their payoffs.

The quantum model is shown in Fig. 4. The two qubits are initialized in the state |C⟩. A quantum entangling gate Ĵ, a fair and open operation applied equally to both pursuers, entangles the two qubits and produces the game's initial state |Ψ0⟩ = Ĵ|CC⟩. Each pursuer then applies his decision operator Û1 or Û2, again a quantum gate, acting on his own qubit. Finally the quantum superposition is disentangled by the gate Ĵ† and we obtain the final quantum state:

|Ψf⟩ = Ĵ†(Û1 ⊗ Û2)Ĵ|CC⟩
(3)

Measuring the mixed final state |Ψf⟩ then yields the payoff expectation for the pursuer strategy pair (Û1, Û2).

Analysis of the robot pursuit game model: In the physical model of the quantum game discussed above, the two qubits correspond to the decisions of the two pursuers.

Fig. 4: Quantum game mode of multi-robot pursuit-evasion problem

In the Hilbert space, the two classical strategies C and D of this quantum model correspond to the two basis states of a qubit:

|C⟩ = (1, 0)T, |D⟩ = (0, 1)T

Each pursuer's strategy then corresponds to a unitary matrix U with two variables:

Û(α, θ) = [ e^(iθ)cos(α/2)      sin(α/2)       ]
          [ −sin(α/2)           e^(−iθ)cos(α/2) ]
(4)

0≤α≤π, 0≤θ≤π/2

Cooperation in the quantum game model corresponds to the identity operator and the classical uncooperative strategy to a flip, as the following U matrices:

Ĉ = Û(0, 0) = [ 1  0 ]        D̂ = Û(π, 0) = [ 0  1 ]
              [ 0  1 ]                       [ −1 0 ]

A new strategy is also defined as follows:

Q̂ = Û(0, π/2) = [ i   0 ]
                [ 0  −i ]

The entangling quantum gate must be chosen so that the classical game model is contained in the quantum game model as a special case; this requires the following commutation relations to hold:

[Ĵ, D̂⊗D̂] = [Ĵ, D̂⊗Ĉ] = [Ĵ, Ĉ⊗D̂] = 0
(5)

Abelian subgroup theory gives Eq. 5 a solution, in which γ represents the degree of entanglement:

Ĵ = exp(iγ D̂⊗D̂/2), γ ∈ [0, π/2]
(6)

In the quantum game model, the two qubits start from the initial state |CC⟩, pass through the entangling gate Ĵ, the pursuers' operators Û1 and Û2 and the disentangling gate Ĵ†, and the result is a superposition state, the final state of the quantum game:

|Ψf⟩ = a1|CC⟩ + a2|CD⟩ + a3|DC⟩ + a4|DD⟩
(7)

where a1, a2, a3, a4 are the probability amplitudes on the respective basis vectors; the probabilities |a1|2, |a2|2, |a3|2, |a4|2 of the final state can be obtained by evolving the state in the specific Hilbert space, as in formula 8.

Finally, a Von Neumann measurement is applied to the superposition, which collapses onto each orthogonal basis state with the corresponding probability. The measured result of the game is (σ1, σ2), obtained with probability Pσ1σ2:

PCC = |a1|2, PCD = |a2|2, PDC = |a3|2, PDD = |a4|2

These probabilities describe the outcome of the game, so the pursuer's payoff expectation can be expressed as the following formulas 8 and 9:

Pσ1σ2 = |⟨σ1σ2|Ψf⟩|2
(8)

π1 = r PCC + s PCD + t PDC + p PDD
(9)

where r, t, s and p are the payoffs of the corresponding strategy pairs; for example, pursuer 1's payoff is r if both pursuers choose the strategy C.
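The payoff expectation of formulas 8 and 9 can be evaluated numerically. A minimal NumPy sketch, assuming the standard Eisert et al. (1999) entangling gate Ĵ = exp(iγ D̂⊗D̂/2) and the basis ordering |CC⟩, |CD⟩, |DC⟩, |DD⟩ (both assumptions, consistent with Eq. 4-7):

```python
import numpy as np

def U(alpha, theta):
    """Two-parameter strategy operator of Eq. 4."""
    return np.array([[np.exp(1j*theta)*np.cos(alpha/2), np.sin(alpha/2)],
                     [-np.sin(alpha/2), np.exp(-1j*theta)*np.cos(alpha/2)]])

C_hat = U(0, 0)          # cooperate: identity
D_hat = U(np.pi, 0)      # defect: flip
Q_hat = U(0, np.pi/2)    # the new quantum strategy

def payoffs(U1, U2, gamma, r=2.0, t=2.5, s=0.5, p=1.0):
    """Expected payoffs (pi1, pi2) for the strategy pair (U1, U2)."""
    # assumed entangling gate J = exp(i*gamma*D^D/2), gamma in [0, pi/2]
    J = np.cos(gamma/2)*np.eye(4) + 1j*np.sin(gamma/2)*np.kron(D_hat, D_hat)
    psi_f = J.conj().T @ np.kron(U1, U2) @ J[:, 0]   # J[:,0] = J|CC>, Eq. 3
    P = np.abs(psi_f)**2                             # [P_CC, P_CD, P_DC, P_DD]
    return (r*P[0] + s*P[1] + t*P[2] + p*P[3],       # Eq. 9 for pursuer 1
            r*P[0] + t*P[1] + s*P[2] + p*P[3])       # symmetric for pursuer 2
```

Under these assumptions, γ = 0 reproduces the classical matrix (e.g. `payoffs(D_hat, C_hat, 0)` gives (2.5, 0.5)), while at γ = π/2 the pair (Q̂, Q̂) earns (2.0, 2.0).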

Realization and simulation: In the quantum game model described above the pursuers adopt the new strategy space and we can calculate their respective payoffs. For pursuer 1, (r, t, s, p) = (2, 2.5, 0.5, 1), so pursuer 1's payoff is:

π1 = 2 PCC + 0.5 PCD + 2.5 PDC + PDD
(10)

When γ = 0, the entangling gate reduces to the identity and the final state factorizes:

|Ψf⟩ = Û1|C⟩ ⊗ Û2|C⟩
(11)

Fig. 5: Pursuer 1's payoff function surface for γ = 0; the U1 axis is the parameterized strategy matrix U(α, θ) of pursuer 1

Then we can get:

π1 = 2cos²(α1/2)cos²(α2/2) + 0.5cos²(α1/2)sin²(α2/2) + 2.5sin²(α1/2)cos²(α2/2) + sin²(α1/2)sin²(α2/2)
(12)

The strategies U1 depend on a parameter k ∈ [-1, 1]: set U1 = U(kπ, 0) for k ∈ [0, 1] and U1 = U(0, -kπ/2) for k ∈ [-1, 0] (and likewise for U2).
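This one-parameter path through the strategy space can be written as a small helper (a sketch; the function name `U_k` is an assumption), which runs from Q̂ at k = -1 through Ĉ at k = 0 to D̂ at k = 1:

```python
import numpy as np

def U(alpha, theta):
    """Two-parameter strategy operator of Eq. 4."""
    return np.array([[np.exp(1j*theta)*np.cos(alpha/2), np.sin(alpha/2)],
                     [-np.sin(alpha/2), np.exp(-1j*theta)*np.cos(alpha/2)]])

def U_k(k):
    """One-parameter strategy path: k=1 -> D, k=0 -> C, k=-1 -> Q."""
    return U(k*np.pi, 0) if k >= 0 else U(0, -k*np.pi/2)
```
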

Pursuer 1's payoff function surface is shown in Fig. 5. It can be seen from the figure that both payoffs are 1 when the pursuers choose the strategy D̂. Fig. 6 shows that if pursuer 1 wants to improve his payoff, it must be at pursuer 2's expense: when pursuer 2 chooses the strategy Ĉ, pursuer 1 can obtain the maximum payoff of 2.5, but the figure shows that pursuer 2's payoff is then reduced to 0.5, so pursuer 2 will not make such a decision. We draw both pursuers' payoff function surfaces in Fig. 6, where the deeper color is pursuer 2's surface. The two surfaces coincide when both pursuers' strategies U(α, θ) lie between Ĉ and D̂, and they show a certain symmetry elsewhere. Analyzing the two payoff functions for 0≤α1≤π, 0≤θ1≤π/2, 0≤α2≤π, 0≤θ2≤π/2 gives the following relations:


Fig. 6: Pursuer 1 and pursuer 2 payoffFunction Surface γ = 0. The parametrization is chosen as in Fig. 5

The same relationship holds for pursuer 2. It can be seen from the relations above that D̂⊗D̂ is always the Nash equilibrium point of the game and that it is not Pareto optimal. The pursuit dilemma still exists: the participants are still unable to obtain decision-making that is optimal for both the individual and the team. We therefore change the value of γ and recalculate:

|Ψ0⟩ = Ĵ|CC⟩ = (|CC⟩ + i|DD⟩)/√2
(13)

When γ = π/2, the game's entangled state is given by formula 13 and the quantum state before the measurement can be calculated according to formula 3.

Fig. 7: Pursuer 1's payoff function surface for γ = π/2. The parametrization is chosen as in Fig. 5

Fig. 8: Pursuer 1's and pursuer 2's payoff function surfaces for γ = π/2; the deeper color is pursuer 2's surface. The parametrization is chosen as in Fig. 5

So by formula 10 we obtain the pursuer payoff functions; pursuer 1's payoff function surface is shown in Fig. 7. The payoff still depends on pursuer 2's decision-making, but if pursuer 1 chooses the strategy D̂ while pursuer 2 chooses the strategy Q̂, the situation changes greatly compared to before: pursuer 2's payoff is 2.5 while pursuer 1's payoff is reduced to the minimum. Obviously the strategy Q̂ is now a better choice for pursuer 1, and for 0≤α1≤π, 0≤θ1≤π/2 we have the following relationship:

π1 (Û(α1, θ1), Q̂) ≤ π1 (Q̂, Q̂)
(14)

The same holds for pursuer 2. It can be seen that Q̂⊗Q̂ is now a new Nash equilibrium point, both payoffs are π1 = π2 = 2 and the dilemma of the classical game strategies disappears.
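That no reply to Q̂ beats Q̂ itself at γ = π/2 can be spot-checked numerically with a grid search over pursuer 1's strategy space, again assuming the Eisert-style gate Ĵ = exp(iγ D̂⊗D̂/2) and the Eq. 4 parametrization:

```python
import numpy as np

def U(alpha, theta):
    """Two-parameter strategy operator of Eq. 4."""
    return np.array([[np.exp(1j*theta)*np.cos(alpha/2), np.sin(alpha/2)],
                     [-np.sin(alpha/2), np.exp(-1j*theta)*np.cos(alpha/2)]])

def pi1(U1, U2, gamma, r=2.0, t=2.5, s=0.5, p=1.0):
    """Pursuer 1 payoff, Eq. 9-10, under the assumed entangling gate."""
    D = U(np.pi, 0)
    J = np.cos(gamma/2)*np.eye(4) + 1j*np.sin(gamma/2)*np.kron(D, D)
    P = np.abs(J.conj().T @ np.kron(U1, U2) @ J[:, 0])**2
    return r*P[0] + s*P[1] + t*P[2] + p*P[3]

Q = U(0, np.pi/2)
# grid-search pursuer 1's best reply to Q at maximal entanglement
best = max(pi1(U(a, th), Q, np.pi/2)
           for a in np.linspace(0, np.pi, 61)
           for th in np.linspace(0, np.pi/2, 31))
print(round(best, 6))  # 2.0 -- attained by Q itself
```
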

Fig. 9: Payoff curves when pursuer 1 plays the strategy Q̂ and pursuer 2 plays the parameterized strategy chosen as in Fig. 5

It can be seen from Fig. 8 that the strategy Q̂ is the optimal choice if the opponent chooses the strategy Q̂, and it yields the smallest payoff if the opponent chooses the strategy Ĉ. Now consider that pursuer 1 chooses the strategy Q̂ while pursuer 2 chooses his strategy freely; the two pursuers then satisfy the following relationships:

(15)

Now we obtain the payoff surface of Fig. 9. When pursuer 1's strategy is Q̂, pursuer 2's strategy must approach Q̂ if he wants a larger payoff; it can be seen from the figure that pursuer 2's payoff increases essentially linearly with its degree of approach to Q̂. Over the second half of the curve the two payoffs are significantly correlated and depend on each other. The game ends satisfactorily when both pursuers choose the strategy Q̂: each pursuer's payoff is 2 and the team payoff is 4, so not only does each individual obtain its optimal value but the team is also optimal. The strategy pair Q̂⊗Q̂ is therefore not only the Nash equilibrium point but also Pareto optimal.

RESULTS AND ANALYSIS OF THE EXPERIMENT

Now we calculate the payoffs of the model discussed earlier. The data are listed in Table 1.

In the classical case the pursuers adopt the individually optimal strategy: each pursuer's individual interest is optimal but the team's is not, and the pursuit sometimes degenerates into a single-pursuer situation, as shown in Fig. 10. The pursuers' adoption of the individually optimal strategy D leaves the evader with more escape directions; after the evader chooses one of those directions at random, the pursuit becomes more time-consuming.

Table 1: Payoff of pursuers under quantum model

Fig. 10: Pursuit process in the classical model

In the quantum game model with γ = π/2, it can be seen from Table 1 that pursuers taking the quantum strategy obtain a better payoff; as shown in Fig. 11, the quantum game outperforms the classical game. When a pursuer adopts the quantum strategy Q̂, although he still pursues his individual interests, under the quantum game model the individual and team interests no longer generate a dilemma: they are unified, and a pursuer pursuing his own interest serves the team interest at the same time. This shows that the value of γ plays an important role in the results of the quantum game model: when γ = 0 the game does not differ from the classical one, while when γ = π/2 the game gives the result we desire. In the classical game, mutual defection D̂⊗D̂ is a Nash equilibrium but is not Pareto optimal; in the quantum game, however, a novel Nash equilibrium has the property of being Pareto optimal, so the players escape the dilemma.

Now let us analyze pursuer 1's dominant strategy as the value of γ changes. From the Nash equilibrium condition, when a pursuer chooses one decision, the smallest payoff he can obtain is:

(16)

Fig. 11: Pursuit process in the quantum model with γ = π/2

Fig. 12: Minimum payoff of pursuer 1 when pursuer 2 selects the dominant strategy, 0≤γ≤π/2

This determines the dominant strategy. As shown in Fig. 12, when γ is very small the smallest payoff under the strategy D̂ is easily calculated to be 1, which is better than that of the other strategies. But once γ ≥ γ0 ≈ 0.511, the smallest payoff of the strategy Q̂ becomes better than that of D̂, and Q̂⊗Q̂ becomes the new Nash equilibrium point.
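The threshold can be estimated numerically by sweeping γ and comparing the worst-case payoffs of D̂ and Q̂ over a grid of opponent strategies. This is a sketch assuming the Eisert-style gate Ĵ = exp(iγ D̂⊗D̂/2) and the Eq. 4 parametrization; under those assumptions the crossover lands near π/6 ≈ 0.52, close to the γ0 ≈ 0.511 reported above:

```python
import numpy as np

def U(alpha, theta):
    """Two-parameter strategy operator of Eq. 4."""
    return np.array([[np.exp(1j*theta)*np.cos(alpha/2), np.sin(alpha/2)],
                     [-np.sin(alpha/2), np.exp(-1j*theta)*np.cos(alpha/2)]])

def pi1(U1, U2, gamma, r=2.0, t=2.5, s=0.5, p=1.0):
    """Pursuer 1 payoff, Eq. 9-10, under the assumed entangling gate."""
    D = U(np.pi, 0)
    J = np.cos(gamma/2)*np.eye(4) + 1j*np.sin(gamma/2)*np.kron(D, D)
    P = np.abs(J.conj().T @ np.kron(U1, U2) @ J[:, 0])**2
    return r*P[0] + s*P[1] + t*P[2] + p*P[3]

GRID = [(a, th) for a in np.linspace(0, np.pi, 41)
                for th in np.linspace(0, np.pi/2, 21)]

def min_payoff(U1, gamma):
    # worst case for pursuer 1 over pursuer 2's strategy grid
    return min(pi1(U1, U(a, th), gamma) for a, th in GRID)

D_hat, Q_hat = U(np.pi, 0), U(0, np.pi/2)
# first gamma at which Q's guaranteed payoff catches up with D's
g0 = next(g for g in np.linspace(0, np.pi/2, 101)
          if min_payoff(Q_hat, g) >= min_payoff(D_hat, g))
print(round(g0, 3))
```
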

CONCLUSION

Through the analysis of multi-robot pursuit strategy choice, we see that when pursuit is oriented toward individual interests, non-cooperative strategies appear and the multi-robot team cannot play out its advantage as a team. The reason is the dilemma between the pursuer's individual self-interest and the team's win-win outcome. To resolve this dilemma we use a quantum model to extend the original game to a larger quantum strategy space, so that the original strategy space becomes a subset of the quantum model. In this setting we discussed in detail how the outcome of the game changes: the result is similar to the classical situation when the entanglement is γ = 0; as the entanglement γ increases, the purely self-interested behavior of the pursuers disappears; and when the entanglement is large enough the pursuers cooperate and the earlier dilemma is resolved.

Through the discussion of this issue, we believe the introduction of the quantum model provides an effective approach to cooperative strategies in multi-robot pursuit. The quantization of the game model allows a robot to combine the two interests, so that there is no contradiction between dealing with the individual and the team interests. Finally, through detailed derivation and experimental analysis of the mechanism of this method, our study shows that originally complex or contradictory problems can obtain more convenient and better results in the quantum-mechanical setting. We hope that quantum games can play their role in broader fields.

ACKNOWLEDGMENTS

This study was supported by the National Natural Science Foundation of China (No. 61070131, No. 61175033, No. 61175051) and the Fundamental Research Funds for the Central Universities (2011HGQC1011).

REFERENCES

  • Brandenburger, A., 2010. The relationship between quantum and classical correlation in games. Games Econ. Behav., 69: 175-183.


  • Cai, Z., S. Li-Ning and H. Gao, 2009. A novel hierarchical decomposition for multi-player pursuit evasion differential game with superior evaders. Proceedings of the 1st ACM/SIGEVO Summit on Genetic and Evolutionary Computation, June 12-14, 2009, Shanghai, China, pp: 795-798.


  • Meyer, D.A., 1999. Quantum strategies. Phys. Rev. Lett., 82: 1052-1055.


  • Du, J., H. Li, X. Xu, X. Zhou and R. Han, 2002. Entanglement enhanced multiplayer quantum games. Phys. Lett. A, 302: 229-233.


  • Herrero-Perez, D. and H. Martinez-Barbera, 2010. Fuzzy uncertainty modeling for grid based localization of mobile robots. Int. J. Approximate Reasoning, 51: 912-932.


  • Iqbal, A. and D. Abbott, 2009. Non-factorizable joint probabilities and evolutionarily stable strategies in the quantum prisoner's dilemma game. Phys. Lett. A, 373: 2537-2541.


  • Isaacs, R., 1965. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. John Wiley and Sons, New York, USA


  • Eisert, J., M. Wilkens and M. Lewenstein, 1999. Quantum games and quantum strategies. Phys. Rev. Lett., 83: 3077-3080.


  • Kargin, V., 2008. On coordination games with quantum correlations. Int. J. Game Theory, 37: 211-218.


  • Lazhar, K., F. Touati, K. Benhmed and A. Al-Yahmedi, 2011. Mobile robot navigation based on Q-learning technique. Int. J. Adv. Rob. Sys., 8: 45-51.


  • Mehrjerdi, H., M. Saad, J. Ghommam and A. Zerigui, 2010. Optimized neuro-fuzzy coordination for multiple four wheeled mobile robots. Inform. Technol. J., 9: 1557-1570.


  • Du, J., X. Xu, H. Li, X. Zhou and R. Han, 2001. Entanglement playing a dominating role in quantum games. Phys. Lett. A, 289: 9-15.


  • Yamaguchi, H. and T. Arai, 1994. Distributed and autonomous control method for generating shape of multiple mobile robot group. Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems: Advanced Robotic Systems and the Real World, Volume: 2, September 12-16, 1994, Federal Armed Forces University, Munich, Germany, pp: 800-807.


  • Pu-Cheng, Z., H. Yu-Sheng and X. Mo-Gen, 2007. Extended contract net protocol for multi-robot dynamic task allocation. Inform. Technol. J., 6: 733-738.

© Science Alert. All Rights Reserved