Combinatorial optimization problems are expressed by a cost function, with or without constraints, to be minimized or maximized over a finite or countable definition set. Metaheuristics provide approximate solutions to large problem instances but require intensive scientific computation. Their resolution is therefore limited by the capacities of the available resources. Thus, suitable distributed parallel models decrease the computing time and improve the quality of the obtained solutions.
The exploitation of several workstations is also an opportunity for parallel computing, but it raises new problems (resource heterogeneity, node failures). These challenges are addressed with high-performance clusters and grid computing. Grid computing is an emerging computing model that provides the ability to achieve higher throughput by harnessing many networked computers to form a virtual computer architecture able to distribute process execution across a parallel infrastructure. Grid computing is often confused with cluster computing. The key difference is that a cluster is a single set of compute nodes sitting in one location, while a grid is composed of many clusters and other kinds of resources (Parsopoulos et al., 2004).
In this study, we analyse the performance of a parallel distributed model of a bio-inspired algorithm: the Particle Swarm Optimization (PSO) metaheuristic. The application solves a task scheduling problem in a hybrid flow shop production system. We deploy a cluster of nodes based on the Condor resource manager (Pruyne et al., 1995) coupled with the Parallel Virtual Machine (PVM) communication library (Requile, 1995).
THE PSO ALGORITHM
In the PSO algorithm, each particle is characterized by:

- The best position it has found so far, i.e., the objective function cost of its current position or of one acquired previously: pbest
- The knowledge of its neighbors
- The best position acquired so far among all the particles in the swarm: gbest
The PSO is initialized with a set of random particles (solutions) and then searches for optima by updating generations (Hu et al.).

[Figure: Pseudo code of the PSO algorithm]
At each step, the particle is updated, making a compromise between three possible choices (Fig. 1):

- To continue in its own direction
- To return towards its best obtained position
- To move towards the best obtained position of the swarm
Each particle updates its velocity and position according to pbest and gbest. A velocity restriction constant Vmax is also included in the algorithm: if the sum of the three parts exceeds a constant value, the particle's velocity is clamped to a maximum velocity Vmax specified by the user. This mechanism prevents the phenomenon of swarm explosion (Li-Ping et al., 2005).
The pseudo code of the procedure is as follows (Kennedy et al., 1995):
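The pseudo-code figure itself is not reproduced here; as a rough illustration, a minimal Python sketch of the procedure, including the Vmax clamping described above, could look as follows. The coefficients w, c1 and c2 and all bounds are illustrative values, not those used in the study.

```python
import random

def pso(fitness, dim, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, vmax=0.5, lo=-1.0, hi=1.0):
    """Minimal PSO for minimization; all parameters are illustrative."""
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]                        # best position per particle
    pbest_cost = [fitness(p) for p in pbest]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # three parts: own direction, return to pbest, move to gbest
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # clamp velocity to Vmax to prevent swarm explosion
                v[i][d] = max(-vmax, min(vmax, v[i][d]))
                x[i][d] += v[i][d]
            cost = fitness(x[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = x[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = x[i][:], cost
    return gbest, gbest_cost
```

The three weighted terms in the velocity update correspond exactly to the three choices of Fig. 1.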
PARALLEL DISTRIBUTED STRATEGY OF PSO ALGORITHM
The initial swarm is divided into sub-swarms that are distributed to the cluster's compute nodes (Fig. 2). Each processor executes the PSO algorithm independently and accelerates convergence towards the global optimum by sending its best solution (fitness), at fixed intervals of iterations, to a nearby node in a ring topology. This approach was applied to a multiobjective optimization problem (Parsopoulos et al., 2004).
[Fig. 2: Parallel distributed PSO model]
[Fig. 4: Parallel distributed PSO: Cmax]
In this study, we investigate this approach with the aim of measuring parallel efficiency by studying the impact of the number of sub-swarms and of particles on solution quality.
In order to minimize inter-node communication, which often forms the performance bottleneck on networked machines, we used a master/slave communication model with a minimal amount of communication. In our application, the master node is used exclusively to distribute the sub-swarms and to coordinate the particle queue. The slaves execute the PSO algorithm in parallel, migrate their best solutions to a nearby slave and finally send the solution to the master (Fig. 3). We used a cluster of 14 nodes and PVM library routines for sending and receiving messages.

[Table 1: Characteristics of the cluster]

The main characteristics of our cluster are reported in Table 1.
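As an illustration of this communication pattern, the ring migration performed by the slaves can be sketched as follows. The PVM message passing is only simulated here with plain function calls, and the (cost, solution) pairs are hypothetical:

```python
def ring_neighbor(rank, n_nodes):
    """Next node in the ring topology."""
    return (rank + 1) % n_nodes

def migrate(gbests):
    """One migration step: each slave sends its best solution, as a
    (cost, solution) pair, to the next slave in the ring; a slave keeps
    the better (lower-cost) of its own best and the one it receives.
    Illustrative stand-in for the actual PVM send/receive calls."""
    n = len(gbests)
    received = [gbests[(i - 1) % n] for i in range(n)]  # from the previous node
    return [min(own, rx) for own, rx in zip(gbests, received)]
```

Repeating `migrate` every fixed number of iterations lets a good solution propagate around the whole ring, which is what accelerates convergence without flooding the network.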
EXPERIMENTAL TESTS AND RESULTS
Serial execution results of the algorithm are shown in Table 2. In this case, the objective function corresponds to Cmax (the makespan).
The performance of a parallel program is related to its execution time: by measuring how long the parallel program needs to solve our problem, we can directly measure its effectiveness. Two important performance metrics are used: speedup and parallel efficiency.
Speedup is the ratio of the serial execution time Ts to the parallel execution time Tp on N processors. Parallel efficiency is the ratio between the speedup and the number N of processors; it is an indication of scalability. The ideal speedup equals the number of processors, with a maximum parallel efficiency of 1 (100%).
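These two definitions can be stated directly in code; this is a straightforward sketch of the metrics above:

```python
def speedup(t_serial, t_parallel):
    """Speedup S = Ts / Tp."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E = S / N; E = 1 (100%) is the ideal case."""
    return speedup(t_serial, t_parallel) / n_procs
```

For example, a program that takes 100 s serially and 25 s on 8 processors has a speedup of 4 and a parallel efficiency of 0.5 (50%).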
Our infrastructure is optimized for large problem instances. The swarm is initialized with a population of 2048 particles and then divided into sub-swarms as follows:

- 2 sub-swarms of 1024 particles
- 4 sub-swarms of 512 particles
- 8 sub-swarms of 256 particles
- 14 sub-swarms of 146 particles

The swarm is thus divided four times and, each time, a cluster of 2, 4, 8 and 14 nodes respectively evaluates the parallel implementation. These experiments aim to show whether it is better to choose a parallel optimization with many sub-swarms of reduced size rather than a few sub-swarms with large populations. For each case, 10 independent experiments were performed; the Cmax cost is computed and the speedup and parallel efficiency are measured.
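The even partition of the 2048-particle swarm over the cluster can be reproduced as follows; integer division explains the sub-swarm size of 146 obtained for 14 nodes:

```python
def sub_swarm_size(total, n_nodes):
    """Particles per sub-swarm when a swarm of `total` particles
    is split evenly across `n_nodes` compute nodes."""
    return total // n_nodes

# the four configurations used in the experiments
sizes = {n: sub_swarm_size(2048, n) for n in (2, 4, 8, 14)}
# {2: 1024, 4: 512, 8: 256, 14: 146}
```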
According to the numerical results, using 2 sub-swarms or more improves the performance of the parallel algorithm in some cases (Fig. 4). This improvement reaches its optimum (maximum) when using 4 sub-swarms of 512 particles.
[Table 2: Serial execution algorithm results]
[Fig. 5: Parallel distributed PSO: speedup]
[Fig. 6: Parallel distributed PSO: parallel efficiency]
This implies that cooperation between sub-swarms, by sending and receiving the Cmax cost at regular intervals, is advantageous. Beyond that point, reducing the sub-swarm size degrades the solution quality: in the case of 14 sub-swarms, each size equals 146. Thus, when many sub-swarms with a reduced number of particles are used, the algorithm evolves within a restricted search space, in spite of the exchanges.
Concerning the parallel performance, the speedup shows a good acceleration of the parallel algorithm, followed by a performance degradation when using a cluster of 8 or 10 nodes (Fig. 5).
[Fig. 7: Parallel distributed (PSO, GA): Cmax]
Although the experiments were carried out on a high-performance cluster, we observed that parallelization becomes ineffective with 10 compute nodes. Indeed, when the speedup is lower than 1, the computational time of the parallel algorithm exceeds that of the serial algorithm (Fig. 6). This increased time is due to a bottleneck: the main one is the communication latency between processors.
To compare these results with those obtained with the parallel genetic algorithm model with migration in a ring topology (Hao et al., 1998), we applied this model to the same scheduling problem. Those experiments were carried out in a shared-memory configuration.
The parallel implementation improved the performance; however, the number of threads remains limited within a shared memory. We therefore adapted this scheme to our hardware configuration. With a large population size, the genetic algorithm searches the solution space more thoroughly. In the master/slave paradigm, the master initializes the known parameters of the algorithm: population size, crossover rate, mutation rate and migration frequency. Then, as in the previous scheme, the master divides the population into multiple subpopulations; each computational node carries out the genetic operations on its own chromosome set and communicates only with its neighbours in a ring topology.
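One generation on a single island (compute node) might be sketched as follows; the tournament selection, one-point crossover and bit-flip mutation shown here are generic assumptions, as the study does not detail its genetic operators, and the rates are illustrative:

```python
import random

def ga_island_step(subpop, fitness, crossover_rate=0.8, mutation_rate=0.05):
    """One generation of a genetic algorithm on one island (compute node):
    binary tournament selection, one-point crossover and bit-flip mutation
    on a bit-string encoding. Operators and rates are illustrative."""
    n, length = len(subpop), len(subpop[0])

    def pick():
        # binary tournament: keep the fitter of two random chromosomes
        a, b = random.sample(subpop, 2)
        return a if fitness(a) >= fitness(b) else b

    nxt = []
    while len(nxt) < n:
        p1, p2 = pick(), pick()
        if random.random() < crossover_rate:
            cut = random.randrange(1, length)      # one-point crossover
            p1 = p1[:cut] + p2[cut:]
        child = [1 - g if random.random() < mutation_rate else g for g in p1]
        nxt.append(child)
    return nxt
```

Between such generations, each island would exchange its best chromosome with its ring neighbour at the chosen migration frequency, exactly as in the PSO scheme.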
As can be seen, the results obtained using 2, 4, 6, 8, 10, 12 and 14 sub-populations follow the same pattern: Cmax grows gradually as the population size is reduced (Fig. 7).
[Fig. 8: Parallel distributed (PSO, GA): speedup]
[Figure: Parallel distributed (PSO, GA): parallel efficiency]
This model presents an acceptable acceleration (Fig. 8) and a parallel efficiency that decreases more slowly than in the PSO model, but we notice that the parallel distributed PSO model gave better results.
CONCLUSION
This study has presented a parallel implementation of the Particle Swarm Optimization algorithm, validated on an industrial scheduling problem. PSO applies to problems whose solutions can be represented as points in an n-dimensional solution space. Many improvements of the original PSO model have been proposed by adjusting the parameters of the algorithm. Parallel optimization, for its part, uses multiple computers or processors with the aim of reducing the elapsed time; a good acceleration can then be obtained. Both single-node and parallel implementations of the algorithm have been developed and applied on a high-performance cluster with a fault-tolerant strategy, to obtain an efficient and robust parallel scheme. Two widely used metrics served to evaluate the results and to compare them with the corresponding results of the parallel distributed genetic algorithm approach.
The results show that cooperation between computational nodes produces improvements, but that too many processors decrease the parallel efficiency. Moreover, the PSO model outperforms the genetic algorithm model. Lastly, distributed system technology such as grid computing offers a number of potential uses and benefits for parallel and distributed algorithms and for a wide range of computational problems. Nevertheless, several parameters should be studied beforehand: the parallel model, the topology, the population sizes, as well as the infrastructure used, play a dominant role in the solution quality and the parallel efficiency. Future research includes the parallelization of another bio-inspired algorithm: ant colony optimization.
ACKNOWLEDGMENT
Our acknowledgments go to the University of Science and Technology of Oran and, in particular, to the Laboratory of Simulation and Modelling of Industrial Systems, which placed at our disposal the necessary material and a suitable environment for carrying out this study.