INTRODUCTION
Combinatorial optimization problems are expressed by a cost function, with or
without constraints, to be minimized or maximized over a finite or countable set.
Metaheuristics bring approximate solutions to large problem instances
and require intensive scientific computation, but their resolution
is limited by the capacity of the available resources. Suitable distributed parallel
models decrease the computing time and improve the quality of the obtained solutions.
The exploitation of several workstations is also an opportunity for parallel
computing, but it poses new problems (heterogeneity of resources, failure of nodes).
These challenges are addressed by high-performance clusters
and grid computing. Grid computing is an emerging computing model that provides
the ability to perform higher-throughput computing by taking advantage of many
networked computers, modelling a virtual computer architecture that can
distribute process execution across a parallel infrastructure. Grid computing
is often confused with cluster computing. The key difference is that a cluster
is a single set of compute nodes sitting in one location, while a grid is composed
of many clusters and other kinds of resources.
In this study, we analyse the performance of a parallel model of a bio-inspired algorithm:
the particle swarm optimization metaheuristic. The application solves a task scheduling
problem in a hybrid flow shop production system. We deploy a cluster
of nodes based on the Condor resource manager (Pruyne and
Livny, 1995) coupled with the Parallel Virtual Machine (PVM) communication library
(Requile, 1995).
The scheduling problem in FSH production systems: In a flow shop system of N stages, the tasks in entry pass through the first stage, then the second, until the Nth stage. The system includes inter-stage stocks and, in each stage, there is only one resource (machine). The hybrid flow shop problem generalizes the classical flow shop problem by permitting multiple parallel machines to process the tasks that have entered a stage.
The hybrid flow shop is thus defined by a set of N stages, each stage i (i = 1, ..., N) being composed of N(i) parallel machines. The M tasks pass through the N stages and the processing times of the tasks are known in advance. In each stage, a task is treated by exactly one machine; all machines can treat the same operations but with different performance (the machine performance can be related to the competence of the agent who uses it). The goal is to find a schedule of the entering tasks and an assignment of these tasks to the various machines that optimize a performance criterion. The criterion consists in minimizing the completion time of the work, noted Cmax. This is a combinatorial optimization problem and cannot be solved by exact algorithms for large instances in reasonable time.
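To make the objective concrete, the following is a minimal sketch of a Cmax evaluation for a two-stage FSH, assuming a simple list-scheduling decoder (each operation goes to the machine of its stage that frees up earliest); the decoder, the function names and the toy durations are illustrative assumptions, not the exact encoding used in this study:

# Sketch of a Cmax evaluation for a two-stage hybrid flow shop.
# Hypothetical decoder: tasks are taken in the given order and each
# operation is assigned to the machine of its stage that frees earliest.

def cmax(order, proc_times, machines_per_stage):
    """order: task permutation; proc_times[s][j]: duration of task j at stage s."""
    n_stages = len(machines_per_stage)
    free = [[0.0] * machines_per_stage[s] for s in range(n_stages)]
    ready = {j: 0.0 for j in order}          # earliest entry time of each task
    for s in range(n_stages):
        for j in order:
            m = min(range(machines_per_stage[s]), key=lambda k: free[s][k])
            start = max(free[s][m], ready[j])
            free[s][m] = start + proc_times[s][j]
            ready[j] = free[s][m]            # task leaves stage s at this time
    return max(ready.values())               # completion time of the last task

# Toy FSH(P3,P2) instance: 5 tasks, 3 machines at stage 1, 2 at stage 2.
durations = [[4, 2, 5, 3, 6],                # stage 1
             [3, 4, 2, 5, 1]]                # stage 2
print(cmax([0, 1, 2, 3, 4], durations, [3, 2]))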
Particle swarm optimization: Like other evolutionary algorithms, Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique inspired by the social behavior of bird flocking or fish schooling.
Reynolds, Heppner and Grenander created simulation models of bird flight
(Li-Ping et al., 2005). Reynolds was intrigued
by the aesthetic choreography of the moving birds, while Heppner was interested
in discovering the fundamental rules that make birds gather, change
direction suddenly, disperse and then regroup in a synchronized manner. The
finality of this behavior is the effective search for food allowing survival
in the environment.
Sociologists confirm that a local process is at the base of this dynamic and unforeseeable behavior of groups of individuals: in a search space, each individual is guided by its own experience and benefits from the previous experiences of the other individuals.
PSO was introduced by Kennedy and Eberhart (1995).
Based on the Heppner model, they tried to simulate human social behavior (Koh
et al., 2005). These scientists consider that finding a source of food
is similar to finding a solution in a common search space, and they assimilate the
physical movements of animals to the opinion changes of individuals.
In the PSO algorithm, the birds in a flock are symbolically represented as
particles. These particles can be considered as simple agents flying through
a problem space. A particle's location in the multi-dimensional problem
space represents one solution to the problem; when a particle moves to a new
location, a different problem solution is generated (Hu
et al., 2003).
The canonical model: Assume a D-dimensional search space. The position
of the ith particle in the design space is a D-dimensional vector Xi
= (Xi1, ..., XiD) and the velocity of this particle is also
a D-dimensional vector Vi = (Vi1, ..., ViD).
The best previous position (pbest) encountered by the ith particle
is denoted by Pi = (Pi1, ..., PiD) and the best position found so far by the
whole swarm (gbest) is denoted by Pg. The swarm is then manipulated by the
equations (Shi and Eberhart, 1998):

Vid = W*Vid + C1*rand()*(Pid - Xid) + C2*rand()*(Pgd - Xid)    (1)

Xid = Xid + Vid    (2)
where rand() is a random number uniformly distributed within [0,1], C1
is a cognitive parameter used to bias the search of a particle toward its own best
experience, C2 is a social parameter used to bias the search of a particle toward
the best experience of the whole swarm and W is an inertia weight used to
control the velocity's magnitude. A large inertia weight facilitates
global exploration by searching new regions, while a small inertia weight facilitates
local search. A suitable choice of the parameter W provides a balance between
the local and global exploration abilities and reaches the optimum in a
minimum of iterations (Schutte et al., 2004).
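As an illustration, here is a minimal sketch of the update of Eq. 1 and 2 for one particle; the values of W, C1 and C2 are illustrative, not those used in this study:

import random

# Sketch of the canonical update (Eq. 1 and 2) for one particle.
W, C1, C2 = 0.7, 1.5, 1.5          # illustrative parameter values

def update_particle(x, v, pbest, gbest):
    """x, v, pbest, gbest: lists of length D (the search-space dimension)."""
    for d in range(len(x)):
        v[d] = (W * v[d]
                + C1 * random.random() * (pbest[d] - x[d])   # cognitive term
                + C2 * random.random() * (gbest[d] - x[d]))  # social term
        x[d] += v[d]                                          # Eq. 2
    return x, v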
The algorithm: In the PSO algorithm, each particle is characterized by:

•  The cost of the objective function for the best position it has acquired so far: pbest
•  The knowledge of its neighbors
•  The best position acquired among all the particles in the swarm: gbest
The PSO is initialized with a set of random particles (solutions) and then
searches for optima by updating generations (Hu et al.,
2003).
In each step, the particle is updated and makes a compromise between three possible choices (Fig. 1):

•  To follow its own way at its current velocity
•  To return towards its best obtained position
•  To move towards the best obtained position of the swarm
Each particle updates its velocity and position according to pbest and gbest.

Fig. 1: Pseudo code of PSO algorithm
The velocity restriction constant Vmax is also included in the
algorithm. In Eq. 1, if the sum of the three terms exceeds
a constant value, the particle's velocity is clamped to a maximum velocity
Vmax specified by the user. This mechanism prevents the phenomenon
of swarm explosion (Li-Ping et al., 2005).
The pseudo code of the PSO algorithm is given as follows (Koh et al., 2005):
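As the original pseudo-code figure (Fig. 1) is not reproduced here, the following is a sketch of the usual serial PSO loop under the notation above, including the Vmax clamping; the sphere function merely stands in for the FSH objective and all constants are illustrative:

import random

# Sketch of the serial PSO loop (cf. Fig. 1), with Vmax clamping.
D, N_PARTICLES, ITERS, VMAX = 5, 20, 200, 0.5
W, C1, C2 = 0.7, 1.5, 1.5

def fitness(x):                       # placeholder objective (minimization)
    return sum(xi * xi for xi in x)

X = [[random.uniform(-5, 5) for _ in range(D)] for _ in range(N_PARTICLES)]
V = [[0.0] * D for _ in range(N_PARTICLES)]
P = [x[:] for x in X]                 # pbest positions
Pf = [fitness(x) for x in X]          # pbest costs
g = min(range(N_PARTICLES), key=lambda i: Pf[i])
G, Gf = P[g][:], Pf[g]                # gbest and its cost

for _ in range(ITERS):
    for i in range(N_PARTICLES):
        for d in range(D):
            V[i][d] = (W * V[i][d]
                       + C1 * random.random() * (P[i][d] - X[i][d])
                       + C2 * random.random() * (G[d] - X[i][d]))
            V[i][d] = max(-VMAX, min(VMAX, V[i][d]))   # velocity clamping
            X[i][d] += V[i][d]
        f = fitness(X[i])
        if f < Pf[i]:                  # update personal best
            P[i], Pf[i] = X[i][:], f
            if f < Gf:                 # update global best
                G, Gf = X[i][:], f
print(Gf)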
An asynchronous parallelization of PSO on a high-performance cluster: High-performance clusters are implemented primarily to provide increased performance by splitting a computational task across different nodes. Clusters are often managed by resource managers, which optimize the use of resources such as computational hardware, storage and network and increase the robustness of long-running parallel optimization algorithms.
For our application, we chose a type of resource manager known as a batch queue manager, interfaced with a communication library.
The resource management mechanism: Condor is a batch scheduler for compute-intensive
jobs which provides a job queueing mechanism, a scheduling policy, a priority scheme,
resource monitoring and resource management. Users submit their jobs, Condor
places them into a queue, chooses when and where to run them based upon
a policy, carefully monitors their progress and ultimately informs the user
upon completion.
Fig. 2: Roles and daemons of a Condor pool
Within Condor, each node can play one or many roles simultaneously, and the various roles are implemented by several daemons. A node can be a submit node, an execute node or the central manager. The pool contains only one central manager; it is the most important part of the cluster: this machine collects information and negotiates between available resources and requests.
As can be seen in Fig. 2, the central manager runs the master, collector and negotiator daemons. The startd daemon represents a given resource on an execute machine. It is responsible for enforcing the policy that resource owners configure, which determines under what conditions remote jobs can be started, suspended, resumed, vacated or killed. When the startd is ready to execute a Condor job, it spawns the starter daemon. The starter is the entity that actually spawns the remote Condor job on a given machine.
The schedd daemon is present on the submit machine; it represents resource requests to the Condor pool.
The Condor pool can also contain an optional checkpoint server: a node that stores the checkpoint files of the submitted jobs (Fig. 2).
PVM is a viable scientific computing platform appropriate both for shared-memory multiprocessors and for networks of distributed-memory single-processor machines. It is composed of a set of daemons and function libraries. Programs use PVM library routines for functions such as process initiation and message transmission and reception. Once the cluster resource requests are managed by a master process created by Condor, the PVM daemons communicate by sending and receiving messages. The main characteristics of our cluster are reported in Table 1.
Asynchronous parallel strategy of PSO algorithm: The serial algorithm follows
a synchronous model: it updates each individual best (pbest) and the group best (gbest)
fitness values, then the associated velocities and positions.
Table 1: The characteristics of the cluster
Fig. 3: Master/slave model
Serial PSO has already been implemented to solve the FSH scheduling problem;
nevertheless, several strategies of parallelization are possible.
The parallel algorithm should operate in such a fashion that it can easily be decomposed for parallel operation on a cluster. The fitness function evaluations can obviously be done concurrently. Our parallel implementation follows a master/slave model (Fig. 3). The master spawns as many tasks as there are particles in the swarm, then initialises and sends data to the slave nodes, which evaluate the pbest values and transmit the results back to the master node. After receiving all individual fitness values, the master deduces gbest and updates the parameters.
This is Parallel Synchronous Particle Swarm Optimization (PSPSO). Step 2 of the PSO algorithm is modified as follows: the fitness evaluations are distributed to the slave nodes and the master waits for all pbest values before updating gbest, the velocities and the positions.
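A sketch of this synchronous step is given below, using a Python process pool as a stand-in for the Condor/PVM slaves (the actual implementation uses PVM message passing); the barrier where the master waits for every slave is the synchronization point discussed later:

from multiprocessing import Pool

# Sketch of the synchronous master/slave step of PSPSO. The pool plays
# the role of the PVM slave processes spawned through Condor.

def fitness(x):                        # evaluated on a slave node
    return sum(xi * xi for xi in x)    # placeholder objective

def synchronous_step(positions):
    with Pool() as slaves:
        # The master blocks here until every slave has reported its
        # value: this barrier is the synchronization point whose cost
        # is discussed below.
        costs = slaves.map(fitness, positions)
    g = min(range(len(costs)), key=lambda i: costs[i])
    return costs, positions[g], costs[g]   # all costs, gbest, its cost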
Synchronization of nodes: The performance of a parallel program is related to its execution time: by measuring how long the parallel program needs to solve our problem, we can directly measure its effectiveness. There are two important performance metrics: speedup and parallel efficiency (also referred to below as parallel effectiveness).
Speedup is the ratio of the serial execution time Ts to the parallel execution time Tp on N processors: S = Ts/Tp. Parallel efficiency is the ratio between the speedup and the number N of processors: E = S/N; it is an indication of scalability. Ideal speedup equals the number of processors, with a maximum parallel efficiency of 1 (100%).
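In code, with illustrative (not measured) timings:

# Speedup and parallel efficiency; the timings below are illustrative,
# not measurements from this study.
def speedup(ts, tp):
    return ts / tp                 # S = Ts / Tp

def efficiency(ts, tp, n):
    return speedup(ts, tp) / n     # E = S / N

# A serial run of 120 s reduced to 15 s on 10 processors:
print(speedup(120, 15))            # 8.0
print(efficiency(120, 15, 10))     # 0.8, i.e., 80% (ideal would be 100%)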
Some implementations were carried out in this context, in particular by Schutte
et al. (2004) for a global biomechanical optimization. Their experiments
revealed that improved convergence rates could be obtained on a homogeneous
cluster of computers without interruptions, but that the parallel effectiveness
decreased as the number of compute nodes increased. Indeed, a bottleneck easily
appears when the parallel processors simultaneously access the shared disk
of the master node. Due to the synchronization requirement of that parallel
implementation, the load imbalance caused by even one slow fitness
evaluation was sufficient to degrade parallel performance rapidly with an increasing
number of nodes. This degradation also occurred when the master node was idle
most of the time (Schutte et al., 2004).
This low effectiveness can be overcome with an asynchronous version of the PSO algorithm.
Parallel Asynchronous Particle Swarm Optimization (PAPSO) was proposed and implemented
recently (Koh et al., 2005). In PAPSO, particle
positions and velocities are updated continuously, based on the currently available
data (gbest).
Thus, the master computes a particle's velocity and position as soon as it receives a pbest from a slave node. Consequently, PAPSO incorporates a dynamic load-balancing scheme with a centralized task queue to reduce load imbalance.
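A sketch of this asynchronous master cycle with a centralized task queue follows, again with a Python process pool standing in for PVM; the master polls for the first finished evaluation and immediately updates that particle with the currently available gbest (all names and constants are illustrative):

from multiprocessing import Pool
import random

# Sketch of the PAPSO master cycle: no barrier over the whole swarm;
# each particle is updated as soon as its fitness value arrives.
W, C1, C2 = 0.7, 1.5, 1.5

def fitness(x):                       # placeholder objective (minimization)
    return sum(xi * xi for xi in x)

def papso(X, V, updates):
    P = [x[:] for x in X]             # pbest positions
    Pf = [float("inf")] * len(X)      # pbest costs
    G, Gf = X[0][:], float("inf")     # gbest and its cost
    with Pool() as slaves:
        pending = {slaves.apply_async(fitness, (X[i],)): i
                   for i in range(len(X))}        # centralized task queue
        for _ in range(updates):
            done = None
            while done is None:       # poll for the first finished result
                done = next((r for r in list(pending) if r.ready()), None)
            i = pending.pop(done)
            f = done.get()
            if f < Pf[i]:
                P[i], Pf[i] = X[i][:], f
                if f < Gf:
                    G, Gf = X[i][:], f
            for d in range(len(X[i])):            # update with current gbest
                V[i][d] = (W * V[i][d]
                           + C1 * random.random() * (P[i][d] - X[i][d])
                           + C2 * random.random() * (G[d] - X[i][d]))
                X[i][d] += V[i][d]
            pending[slaves.apply_async(fitness, (X[i],))] = i   # re-enqueue
    return G, Gf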
For all these reasons, we chose PAPSO for our implementation. Experimental tests were run on the cluster of 14 compute slave nodes described in Table 1; the master node coordinates the particle queue.
Experimental tests and results: Our tests treat the scheduling of 20 entering tasks on an FSH of 2 stages, with three machines in the first stage and two machines in the second. The problem is noted FSH(P3,P2)||Cmax and the objective function corresponds to Cmax. The experimental tests aim to study the impact of the swarm size and of the cluster size on the numerical results. Both the serial and the parallel algorithm were executed and compared. Ten independent experiments were performed; the Cmax cost was computed and the speedup and parallel effectiveness were measured.
Our infrastructure is optimized for large problem instances. We ran the
experimental tests varying the cluster size (6, 8, 10, 12 and 14 nodes).
Table 2: Numerical results of PSO and PAPSO
Fig. 4: Parallel distributed PSO: Cmax
The swarm is initialised seven times with different swarm sizes
and distributed to the compute nodes for evaluation of the fitness values. For example,
on a cluster with 10 nodes, the swarm size varies from 10 to 70 particles
(10, 20, 30, 40, 50, 60 and 70); with 14 nodes, it varies from 14 to 98 particles
(14, 28, 42, 56, 70, 84 and 98).
The results of the serial algorithm and of the parallel algorithm on a cluster with 14 nodes are given in Table 2 and Fig. 4.
According to the numerical results, adding more processors improves the performance of parallel PSO, and the obtained results support the claim that the PAPSO algorithm outperforms the serial PSO algorithm in all cases (Table 2, Fig. 4). We speculate that, because the updating occurs immediately after each fitness evaluation, the swarm reacts more quickly to an improvement in the best-found fitness value and convergence is accelerated.
Concerning the parallel results, we notice an improvement when
each node treats two particles (with a swarm of 28 particles). Increasing the
size of the swarm then degrades the convergence rate: with 42 and 56 particles,
each node treats 3 and 4 particles respectively. The reason is that the larger
the swarm, the more particles each individual processor owns, which lowers the
relative computation cost and raises the communication overhead of the parallel
processing. The master then updates velocity and position values more slowly
and solution quality decreases.
With 70 particles, the convergence rate reaches the optimum and the objective function cost reaches 493. This can be explained by the increase in swarm size: with more particles, the algorithm explores a larger search space. Beyond this value, however, the Cmax cost increases again, because the swarm reacts more slowly to changes of the best fitness value; this is due to the communication latency between the slave nodes and the master. We also observed that our cluster limits the swarm size to 84 particles (84 tasks) and cannot assign more than six tasks to each slave node.
In addition, in order to study parallel performance, we varied the number of compute nodes from 6 to 14.
As can be seen in Fig. 5, the speedup drops when more nodes are used. Our application requires much communication between the coordinating node and the slave nodes; for this reason, the speedup curve saturates (Fig. 5) and the parallel effectiveness decreases when the number of nodes increases (Fig. 6). This occurs when communication time grows faster than computation time shrinks.
From these curves, we conclude that communication overhead is the main cause of the speedup saturation.
Overall, our experiments show good performance of the parallel asynchronous implementation in terms of optimization and solution quality. Further improvement can be hoped for by increasing the size of the cluster.
Fig. 6: PAPSO: Parallel effectiveness
CONCLUSION
This study has presented a parallel asynchronous implementation of the particle swarm optimization algorithm, validated on an industrial scheduling problem. PSO applies to problems whose solutions can be represented as points in an n-dimensional solution space. Many improvements of the original PSO model have been proposed by adjusting the parameters of the algorithm. Parallel optimization, in turn, uses multiple computers or processors with the aim of reducing the elapsed time and accelerating the convergence rate, so a good acceleration can be obtained. Both single-node and parallel implementations of the algorithm have been developed and run on a high-performance cluster with a fault-tolerant strategy, to obtain an efficient and robust parallel scheme.
Two widely used metrics have been used for the evaluation of the results. Through the obtained results, we note that the decomposition of the PSO algorithm into several tasks produces improvements and accelerates convergence; however, too many processors decrease the parallel efficiency, and the cluster size limits the number of tasks spawned by the master node.
Lastly, grid computing, as a distributed system technology, offers
a number of potential uses and benefits for parallel and distributed algorithms
and a wide range of computational problems. Nevertheless, several parameters
should be studied beforehand: the parallel model, the topology, the population
size as well as the infrastructure used play a dominant role in the solution
quality and the parallel effectiveness. Future research includes the parallelization
of another bio-inspired algorithm: ant colony optimization.
ACKNOWLEDGMENTS
The authors' acknowledgments go to the University of Science and Technology of Oran and in particular to the Laboratory of Simulation and Modelling of Industrial Systems, which placed at our disposal the necessary material and a suitable environment for carrying out this study.