
Journal of Applied Sciences

Year: 2012 | Volume: 12 | Issue: 20 | Page No.: 2115-2127
DOI: 10.3923/jas.2012.2115.2127
Software Refactoring: Solving the Time-Dependent Schrodinger Equation via Fast Fourier Transforms and Parallel Programming
Ali Khwaldeh, Amani Tahat and Jordi Marti

Abstract: In this study, a multiprocessor C++ message passing interface implementation of a new bit-reversal algorithm for numerically solving the time-dependent Schrodinger equation, using a spectral method based on the Fourier transform, was presented. The major issues of parallel computer performance were discussed in terms of efficiency, speed up, cost and the fraction of the execution time that can be parallelized. Scalability up to a very high number of processors was addressed and the parallel performance of the newly developed algorithms was compared with the ideal values of Amdahl's law. The results showed that the message passing interface was an optimal method of implementing a parallelized bit-reversal algorithm.


How to cite this article
Ali Khwaldeh, Amani Tahat and Jordi Marti, 2012. Software Refactoring: Solving the Time-Dependent Schrodinger Equation via Fast Fourier Transforms and Parallel Programming. Journal of Applied Sciences, 12: 2115-2127.

Keywords: Amdahl's law, software refactoring, fast Fourier transform, parallel computing, Schrodinger equation and spectral method

INTRODUCTION

Refactoring of programming codes is a very useful technique to improve the quality of existing computational codes (Fowler, 1999). During the refactoring process, a series of small steps is applied to change the internal structure of the code, while the system functionality and the code's external behavior remain unaffected (Griswold, 1991). Refactoring improves the structure of a code, making it easier to maintain and to extend. So, the activity of refactoring has two general categories of benefits: maintainability and extensibility of a code. The first category can be achieved by breaking large monolithic routines into a set of concise, well-named, single-purpose methods, by moving a method to a more appropriate class or by removing misleading comments. Bug fixing then becomes effortless and the modified code becomes more readable and easier to understand and grasp (Robert, 2008). Extensibility means that the capabilities of the application can be extended more easily by using recognizable design patterns, which provide flexibility where none may have existed before (Kerievsky, 2004). Some types of refactoring techniques are illustrated in Table 1; some of these may only apply to certain languages or language types. A longer list can be found in Fowler's Refactoring book (Fowler, 1999). This study presented the work of the authors on refactoring codes using the programming language C++, via a message passing interface implementation of a new bit-reversal algorithm on multiprocessors. The new code outperforms an existing one included in the Fast Fourier Transform (FFT) program (Cooley and Tukey, 1965). Further, Sarkar et al. (1986), Peng et al. (2009), Jing et al. (2010), He et al. (2011) and Arioua et al. (2012) presented applications of FFT algorithms in solving real-life problems.
For an efficient implementation of the Discrete Fourier Transform (DFT) on an array of 2^N samples, the FFT can be considered an optimized computational algorithm. The DFT of N data samples can be mathematically defined as follows:

X_k = \sum_{n=0}^{N-1} x_n W_N^{nk}    (1)

Hence:

x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k W_N^{-nk}    (2)

Herein, W_N = e^{-2\pi i/N} is a primitive Nth root of unity, x is the original sequence, X is the discrete Fourier transform of x and k = 0, 1, 2, ..., N-1.

Table 1: Types of refactoring techniques adopted from Fowler (1999)

W_N^{nk} defines an N×N matrix. Generally, X, x and W are complex numbers, so the DFT can be computed by a matrix-vector multiplication requiring O(N²) complex multiplications. The FFT enables fast DFT calculation: it works by dividing the input points into two sets (k sets in general), calculating the FFT of the smaller sets recursively and combining them to obtain the Fourier transform of the original set. Rearranging the operations in this way reduces the number of multiplications from O(N²) to O(N log N), which makes a huge difference for large N. The FFT is calculated in two steps: the first step transforms the original data array into bit-reversed order and this so-called bit-reversal reordering of the data makes the mathematical calculations of the second step much easier. If the bit-reversal is implemented carelessly, it can take a substantial fraction of the total computational time. For example, the FFT is widely used in computer-based numerical techniques for solving electronic structure problems, where the bit-reversal algorithm is a main part of any atomic-structure software; if the bit-reversal code is not properly designed, the FFT can take about 10-30% of the overall computational time, delaying calculations (Saad et al., 2010). Therefore, the bit-reversal program must be carefully designed to enhance the quality of the code.
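As a concrete illustration of the reordering step (a minimal C++ sketch, not the authors' refactored routine; the function names are ours), bit-reversal maps each array index to the index obtained by reversing its binary digits:

```cpp
#include <cassert>
#include <vector>

// Reverse the lowest `bits` bits of index i, e.g. for an 8-point
// transform (bits = 3): 1 (001) -> 4 (100), 3 (011) -> 6 (110).
unsigned bit_reverse(unsigned i, unsigned bits) {
    unsigned r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);   // shift the lowest bit of i into r
        i >>= 1;
    }
    return r;
}

// Return a copy of `data` (length 2^bits) in bit-reversed order,
// the reordering the FFT applies before its butterfly stages.
std::vector<double> bit_reverse_copy(const std::vector<double>& data, unsigned bits) {
    std::vector<double> out(data.size());
    for (unsigned i = 0; i < data.size(); ++i)
        out[bit_reverse(i, bits)] = data[i];
    return out;
}
```

For an 8-point transform, indices 1 (001) and 4 (100) swap, as do 3 (011) and 6 (110), while 0, 2, 5 and 7 stay in place.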
In this study, a performance analysis of a parallel programming implementation of new bit-reversal algorithm (Kumar et al., 1994) on multiprocessors, has been developed via code refactoring of an existing library programs recently used in the computational process of solving the electronic atomic structure of an online atomic database (Tahat and Salah, 2011; Tahat et al., 2010; Tahat and Tahat, 2011).

So, the main goal of this study was the improvement of numerical techniques able to solve Schrodinger’s time-dependent equation, achieving higher speed and efficiency.

THEORY AND METHODS

Discrete Fourier transform using assumptions: Rewriting Eq. 1 and 2 using a complex-valued function ψ(X) instead of X and imposing the periodic boundary condition ψ(X+L) = ψ(X) is shown in Fig. 1, including discrete function values, where the wave function propagates along the X direction in the range X ∈ [0, L]. ψ(X) has been discretized on N mesh points equally spaced by ΔX, as X_j = jΔX with j = 0, ..., N-1 (Fig. 1). Here L, X, j, N and X_j represent the length of the box where ψ is defined (in Å), the spatial coordinate (in atomic units), a counter index, the total number of spatial mesh points of the sample and the discretized position along X, respectively. The wave functions in coordinate and momentum space must satisfy the boundary conditions ψ(0) = ψ(L) and ψ(±πN/L) = 0, respectively (Fig. 1a, b).

Equation 3 shows that the DFT represents the wave function ψ(X) as a linear combination of trigonometric components [exp(iKX) = cos(KX)+i sin(KX)] with different wave numbers:

\psi(X_j) = \sum_{m=0}^{N-1} \tilde{\psi}_m\, e^{i K_m X_j}    (3)

Here, Km represents discrete wave numbers and can be defined as:

K_m = \frac{2\pi m}{L}, \quad m = 0, 1, \ldots, N-1    (4)

Equation 3 has periodicity L, with the expansion coefficients ψ̃_m given by:

\tilde{\psi}_m = \frac{1}{N} \sum_{j=0}^{N-1} \psi(X_j)\, e^{-i K_m X_j}    (5)

Fig. 1(a-b): Discrete function ψ(X) values and boundary conditions, (a) Discrete function ψ(X) values along the X axis on N mesh points with equal mesh spacing ΔX = L/N, using a discrete step of X_j = jΔX (j = 0, ..., N-1) and (b) Discrete function ψ(K) values at different wave numbers K in the range [-π/ΔX, π/ΔX] at ΔX = L/N and even N, using a discrete step of ΔK = 2π/L. Here, m denotes the wave numbers for the higher indices and open circles indicate periodic boundary conditions in both coordinate and momentum spaces

Accuracy of the assumed Fourier expansion: The choice of wave numbers in Eq. 4 guarantees that ψ(X) has periodicity L. The wave numbers K_m are separated by ΔK = 2π/L; therefore, equivalent wave numbers are expected, since because of the discrete sampling all are equivalent in real space (e.g., high wave numbers oscillate more but return to the same values as their lower wave number counterparts at the mesh points X_j). Mathematically, the discrete mesh points in real space cannot represent higher wave numbers; physically, the smallest-magnitude wave numbers are the ones involved in these assumptions, as they represent the lowest-energy states. For that reason, the wave numbers for the higher indices in Eq. 4 must be folded back by 2πN/L. For simplicity, N is assumed to be an even number. From Fig. 1b, all wave numbers lie in the range [-π/ΔX, π/ΔX].

Considering the discrete function ψ_j as a vector |ψ⟩ = (ψ_0, ψ_1, ψ_2, ..., ψ_{N-1}) in an N-dimensional vector space is a convenient way to prove the correctness of Eq. 5. The plane-wave basis set of Eq. 6 is defined in this vector space:

b_m(X_j) = \frac{1}{\sqrt{N}}\, e^{i K_m X_j}, \quad m = 0, 1, \ldots, N-1    (6)

The basis set of Eq. 6 is orthonormal, i.e., the inner products of the basis functions are:

\langle b_m | b_n \rangle = \sum_{j=0}^{N-1} b_m^{*}(X_j)\, b_n(X_j) = \delta_{mn}    (7)

For m ≠ n, the sum of the geometric series is carried out; otherwise (m = n), all N summands equal 1/N, as shown in Eq. 8 and 9:

\langle b_m | b_n \rangle = \frac{1}{N} \sum_{j=0}^{N-1} e^{i 2\pi (n-m) j / N} = \frac{1}{N}\, \frac{1 - e^{i 2\pi (n-m)}}{1 - e^{i 2\pi (n-m)/N}} = 0, \quad m \neq n    (8)

\langle b_m | b_m \rangle = \sum_{j=0}^{N-1} \frac{1}{N} = 1    (9)

Since the basis set is complete, a discrete function ψ_j in the N-dimensional vector space can be represented as a linear combination of the N basis set functions b_m(X_j), in particular:

(10)

Or:

(11)

So that, suppose the function is expanded as:

(12)

Multiplying both sides by ⟨b_m| and using the orthonormality condition, Eq. 7, leads to:

(13)

Therefore, the Fourier coefficients in Eq. 5 can be readily obtained from Eq. 10 by substituting the definitions of the basis functions and the inner product into Eq. 10. This leads to:

(14)

The expansion coefficients in Eq. 3 and 5 are then identified by comparing Eq. 14 with Eq. 3.
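The orthonormality argument of Eq. 7-9 can be checked numerically. The following is a sketch under the assumption that the basis is b_m(X_j) = (1/√N) exp(iK_m X_j) with K_m = 2πm/L and X_j = jL/N, so the inner product reduces to (1/N) Σ_j exp(i2π(n−m)j/N):

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Numerical check of Eq. 7-9: with b_m(X_j) = (1/sqrt(N)) exp(i K_m X_j),
// the inner product <b_m|b_n> reduces to (1/N) sum_j exp(i 2 pi (n-m) j / N),
// which equals 1 for m == n and 0 otherwise (sum of roots of unity).
std::complex<double> inner_product(int m, int n, int N) {
    const double pi = std::acos(-1.0);
    std::complex<double> s(0.0, 0.0);
    for (int j = 0; j < N; ++j)
        s += std::exp(std::complex<double>(0.0, 2.0 * pi * (n - m) * j / N));
    return s / static_cast<double>(N);
}
```

The sum vanishes for m ≠ n because it is a geometric series of Nth roots of unity, exactly the mechanism behind Eq. 8.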

Solving the time-dependent Schrodinger equation using the spectral method: Most of the computational effort in electronic structure calculations is focused on solving the time-dependent Schrodinger equation:

i\hbar\, \frac{\partial \psi(X, t)}{\partial t} = \hat{H}\, \psi(X, t)    (15)

For the purpose of solving this equation numerically, the DFT and the FFT are the two required computable types of Fourier transformation. Here, Ĥ represents the Hamiltonian operator, consisting of the sum of the kinetic operator T and the potential operator V, and can be defined as:

\hat{H} = \hat{T} + \hat{V}    (16)

The ongoing work presents such a numerical solution by refactoring existing software based on a numerical technique, the so-called spectral method (Canuto et al., 2006), which depends on Fourier transformation (Bracewell, 2000). Here, the exposition of the Fourier transform of discretely sampled data and the fast Fourier transform in Press et al. (1992) will be followed.

Consider the time-dependent Schrodinger equation (Eq. 15), to be implemented in one dimension in atomic units. Thus:

i\, \frac{\partial \psi(X, t)}{\partial t} = \hat{H}\, \psi(X, t)    (17)

Here, the Hamiltonian operator in Eq. 17 can be defined in one dimension through the kinetic and potential-energy operators, within the pseudopotential approximation (Heine, 1970) and the momentum-space formalism (Ihm et al., 1979), in which plane waves are used as the basis set for the eigenfunctions:

\hat{H} = -\frac{1}{2}\, \frac{\partial^2}{\partial X^2} + V(X)    (18)

In momentum space the kinetic energy operator is diagonal and hence it applies trivially to a single trial eigenvector, whereas the calculation of the potential energy involves a multiplication in real space but an expensive convolution in momentum space. To see that the kinetic energy operator is diagonal in the Fourier (or momentum) space, in the present work T was operated on the wave function in its Fourier representation, using the assumed discrete Fourier transform of Eq. 5:

\hat{T}\, \psi(X_j) = \sum_{m=0}^{N-1} \frac{K_m^2}{2}\, \tilde{\psi}_m\, e^{i K_m X_j}    (19)

i.e., the kinetic energy operator multiplies the Fourier coefficient of the wave function by the factor K_m²/2.

On the other hand, the potential energy operator is diagonal in real space, which means that it multiplies the wave function by the factor V_j = V(X_j).

This suggests an efficient algorithm for the time evolution of the wave function, since the kinetic and potential energy operators are diagonal in momentum and real space, respectively. Then the Trotter expansion of Xiao-Ping and Broughton (1987) can be used.

Trotter expansion and spectral method: Recalling the Trotter expansion technique (Xiao-Ping and Broughton, 1987), when used in the current simulation to compute the energy leads to:

(20)

where the first exponential factor can be defined as the time evolution operator arising from the potential energy V; it can easily be applied in real space as:

(21)

On the other hand, the operator exp(iTΔt) can be defined as the time evolution operator that had arisen from the kinetic energy T. This operator can be operated in the Fourier space as:

(22)
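The two evolution operators can be combined into a complete time step of the spectral method. The following is a single-process sketch (our own illustration, not the authors' code), using a naive unitary DFT in place of the FFT and assuming a symmetric Trotter splitting with the conventional e^{−iĤΔt} sign for the propagator; a correct implementation conserves the norm of the wave function:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Unitary DFT with 1/sqrt(N) normalization; sign = -1 forward, +1 inverse.
cvec dft(const cvec& in, int sign) {
    const double pi = std::acos(-1.0);
    int N = (int)in.size();
    cvec out(N, {0.0, 0.0});
    for (int k = 0; k < N; ++k)
        for (int j = 0; j < N; ++j)
            out[k] += in[j] * std::exp(std::complex<double>(0.0, sign * 2.0 * pi * j * k / N));
    for (auto& c : out) c /= std::sqrt((double)N);
    return out;
}

// One split-operator (Trotter) step:
// psi <- exp(-i V dt/2) F^{-1} exp(-i K^2/2 dt) F exp(-i V dt/2) psi.
cvec trotter_step(cvec psi, const std::vector<double>& V, double L, double dt) {
    const double pi = std::acos(-1.0);
    int N = (int)psi.size();
    for (int j = 0; j < N; ++j)                 // half potential step (real space)
        psi[j] *= std::exp(std::complex<double>(0.0, -V[j] * dt / 2.0));
    psi = dft(psi, -1);                         // to momentum space
    for (int m = 0; m < N; ++m) {               // full kinetic step (diagonal)
        int mm = (m <= N / 2) ? m : m - N;      // fold high indices back (Eq. 4)
        double K = 2.0 * pi * mm / L;
        psi[m] *= std::exp(std::complex<double>(0.0, -K * K / 2.0 * dt));
    }
    psi = dft(psi, +1);                         // back to real space
    for (int j = 0; j < N; ++j)                 // second half potential step
        psi[j] *= std::exp(std::complex<double>(0.0, -V[j] * dt / 2.0));
    return psi;
}

// Norm of the discrete wave function.
double norm(const cvec& psi) {
    double s = 0.0;
    for (const auto& c : psi) s += std::norm(c);
    return std::sqrt(s);
}
```

Because every factor is a pure phase and the normalized DFT is unitary, the norm of ψ is preserved to machine precision over many steps.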

Fast Fourier transforms: In view of the fact that the computation of each of the N Fourier coefficients ψ̃_m involves a summation over N terms, the computational time grows as O(N²). The computational cost associated with the DFT would thus be one of the most important obstacles to implementing the spectral method. Herein, using the FFT algorithm improves the computational effort by reducing the complexity to O(N log N) (Cooley and Tukey, 1965). The FFT allows cheap transformation of eigenvectors between real and momentum space, so the potential energy operator can also be applied efficiently, making the quantum dynamics simulation less intensive from the computational point of view. Therefore, the FFT has to be involved in the current numerical solution. Equation 20 can be rewritten in terms of the forward (F) and inverse (F⁻¹) Fourier transformation operators (Press et al., 1992) as follows:

(23)

Where:

(24)

(25)

The FFT program adopted and re-factored in this work follows the Danielson-Lanczos procedure (Danielson and Lanczos, 1942). A more detailed description of each step of the adopted FFT can be found in Press et al. (1992). In short, the input wave function values of the FFT array are first re-ordered by applying the bit-reversal operation to each wave function index. The Danielson-Lanczos procedure, as in Eq. 26 and 27, is then applied recursively, starting from the smallest sub-arrays up:

(26)

Where:

(27)

ψ_j^0 and ψ_j^1 represent the N/2-element Fourier transforms of the even and odd sub-arrays, respectively. Since there are log₂N recursive steps, each involving N operations, the number of complex floating-point operations in the FFT algorithm is of order N log₂N.
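The two stages described above, bit-reversal reordering followed by Danielson-Lanczos butterflies, can be sketched as a compact in-place radix-2 FFT (a standard textbook formulation, not the authors' refactored code):

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// In-place radix-2 FFT: bit-reversal reordering followed by
// Danielson-Lanczos butterfly stages (Eq. 26-27). N must be a power of two.
void fft(cvec& a) {
    const double pi = std::acos(-1.0);
    int N = (int)a.size();
    // Stage 1: bit-reversal reordering of the input array.
    for (int i = 1, j = 0; i < N; ++i) {
        int bit = N >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    // Stage 2: butterflies combining even/odd half-transforms, doubling the span.
    for (int len = 2; len <= N; len <<= 1) {
        std::complex<double> w(std::cos(-2.0 * pi / len), std::sin(-2.0 * pi / len));
        for (int i = 0; i < N; i += len) {
            std::complex<double> wk(1.0, 0.0);
            for (int k = 0; k < len / 2; ++k) {
                std::complex<double> even = a[i + k];
                std::complex<double> odd  = wk * a[i + k + len / 2];
                a[i + k]           = even + odd;   // psi_k = psi_k^0 + W^k psi_k^1
                a[i + k + len / 2] = even - odd;
                wk *= w;
            }
        }
    }
}
```

The result agrees with a direct O(N²) evaluation of the DFT of Eq. 1 to rounding error, while performing only O(N log N) operations.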

Computing the energy: The total energy can be computed using a set of equations based on the time-dependent Schrodinger equation of Eq. 17, in which the total energy is a conserved quantity. Thus, by discretizing:

(28)

(29)

(30)

Here, expansion of the wave function in terms of its Fourier components (Eq. 3) is required for calculating the first term (i.e., the kinetic energy T):

(31)

(32)

(33)

(34)

(35)

Finally, by substituting Eq. 33 in Eq. 28, 29 could be rewritten as follows:

(36)

Parallel Algorithm for spectral method: Equation 23 and 36 present the algorithm for the spectral method, which is the same for sequential and parallel codes. Parallel implementation of the algorithm involves parallelizing each of the following four steps: (1) nonlinear step, (2) forward transform F, Eq. 25, (3) linear step and (4) backward transform F⁻¹, Eq. 24. The difficulty of parallelizing spectral method algorithms arises in steps 2 and 4, because there are nontrivial data dependences over the entire range 0≤L≤N, involving forward and backward Fourier transforms (FFT and butterfly transform, BFT), respectively. Steps 1 and 3, on the other hand, can be trivially parallelized because of the natural independence of the data. The problem of parallelizing the one-dimensional FFT has been of great interest for vector (Averbuch et al., 1990; Swartztrauber, 1987) and distributed memory computers (Dubey et al., 1994). Moreover, communication and memory hierarchy are the two most important issues to address in the parallelized algorithms in order to achieve an enhanced speed up of the one-dimensional FFT.

These one-dimensional algorithms are architecture dependent and involve efficient methods for the data rearrangement and transposition phases (Averbuch et al., 1990). The complete parallel algorithm consists of the following steps:

Nonlinear step
Row FFT
Multiplication by
Bit-reversal algorithms with respect column FFT
Linear step (transposed linear operator)
Column BFT
Multiplication by the complex conjugate of , ( *)
Row BFT

The parallelism would be due to the row and column subarray FFTs in steps 2, 4, 6 and 8 in addition to the independent operations in steps 1, 3, 5 and 7.

Distributed-memory approach: The Message Passing Interface (MPI) (Karniadakis and Kirby, 2003) can be considered a tool for distributed parallel computing that has become a de facto standard (Gropp et al., 1998). In distributed parallel programming, different processors work on completely independent data and explicitly use send and receive library calls to communicate data between processors (Snir et al., 1996; Nehra et al., 2007). To implement the distributed parallel algorithm for the time-dependent Schrodinger equation, in this work the rows of the FFT arrays are distributed among 4 processors. Fast communication between them is the main requirement for a successful implementation of the algorithm. The large performance cost in this algorithm is due to the redistribution of data between row and column operations: the algorithm is successful if the row and column computational stages yield a significant speed up compared with the communication expense incurred in redistributing the matrix data. When speeding up the computation of the current parallel code, the bit-reversal problem can be solved by swapping columns and/or rows of a square matrix of size X(n) in the FFT arrays. In particular, bit-reversal swapping is done within a specific time, causing a problem with communication between parallel processes; consequently, communication takes longer than the bit reversal itself. The present work proposes a solution for this problem based on a new bit-reversal procedure for FFT arrays, which resolves the conflict between the speed of bit-reversal swapping and the communication between parallel processes in multiprocessor environments, so that the new algorithm's performance scales with the parallel processes. In parallel programming, the need for communication between tasks depends upon the nature of the proposed tasks. To implement the final step, the scatter method is adopted.
In the scatter operation, a single node sends a unique message of size m to every other node (also called a one-to-all personalized communication), in order to distribute the data to the processors. During the gather operation, a unique message from each node is collected by a single node. The gather operation can thus be defined as the inverse of the scatter operation and executed as such (Fig. 2).
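The scatter/gather pattern can be sketched in a single process as follows (an illustration of the data layout only; in an actual MPI code the equivalent collective calls are MPI_Scatter and MPI_Gather, and the function names here are ours):

```cpp
#include <cassert>
#include <vector>

// Single-process sketch of scatter: the root splits its array into
// n equal chunks, one per processor (assumes n divides the array size,
// which holds for the power-of-two sizes used in the FFT arrays).
std::vector<std::vector<int>> scatter(const std::vector<int>& data, int n) {
    int chunk = (int)data.size() / n;
    std::vector<std::vector<int>> out(n);
    for (int r = 0; r < n; ++r)
        out[r].assign(data.begin() + r * chunk, data.begin() + (r + 1) * chunk);
    return out;
}

// Gather: concatenate the per-processor chunks back into one array,
// the inverse of scatter.
std::vector<int> gather(const std::vector<std::vector<int>>& chunks) {
    std::vector<int> out;
    for (const auto& c : chunks)
        out.insert(out.end(), c.begin(), c.end());
    return out;
}
```

Applying gather to the output of scatter returns the original array, which is exactly the inverse relationship described above.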

Example: In order to simply explain the proposed algorithms, a comparison of the newly developed bit-reversal algorithm and the currently used algorithm is presented in Appendices 1 and 2. Appendix 1 reports how to distribute the following array of input data, A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], between two processors using the new algorithm. This array can also be distributed among 4 processors, as shown in Fig. 3. Further, Appendix 3 presents the parallel source code of the bit-reversal algorithm.

Advanced example of distributing data on 4 processors using the new bit-reversal algorithm: Breaking the problem into discrete "chunks" of work must be the first step of designing a parallel program; these chunks can then be distributed to multiple tasks (Kumar et al., 1994).

Fig. 2: The scatter method (Scatter and Gather)

The steps for distributing the data of an 8x8 matrix over 4 processors are presented in Fig. 4, based on the four proposed chunks, as follows:

The one-dimensional array filled with the input data is arranged in columns, as shown in Fig. 4a
The bit reversal of each column is found, as shown in Fig. 5a, and the columns are swapped, as shown in Fig. 5b; the combined result of these two processes is illustrated in Fig. 4b
The data are distributed to the specified processors, as shown in Fig. 4c
The bit reversal of each row is found and the rows are swapped; the final result appears in Fig. 4d
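Step 2 above, taking the bit reversal of each column index and swapping the columns, can be sketched for the 8x8 example as follows (our reading of Fig. 4b and 5; since the figures are not reproduced here, the exact permutation is an assumption):

```cpp
#include <array>
#include <cassert>

using Mat8 = std::array<std::array<int, 8>, 8>;

// 3-bit reversal of a column index: bit 0 <-> bit 2, bit 1 fixed,
// so columns 1<->4 and 3<->6 swap while 0, 2, 5, 7 stay in place.
int bitrev3(int c) { return ((c & 1) << 2) | (c & 2) | ((c & 4) >> 2); }

// Move every column c of the matrix to position bitrev3(c).
Mat8 swap_columns_bitrev(const Mat8& m) {
    Mat8 out{};
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            out[r][bitrev3(c)] = m[r][c];
    return out;
}
```

The row step (Fig. 4d) is the same permutation applied to row indices; applying it twice returns the original matrix, since bit reversal is its own inverse.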

Fig. 3: Distributing the vector A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] in 4 processors

On the other hand, the number of processes must be a power of two, 2^n (n = 0, 1, 2, 3, ..., etc.), for any arrangement (e.g., 2, 4, 8, 16, 32, 64, 128, 256, etc.) (Kumar et al., 1994). Herein, the data of the 8x8 matrix could also be distributed among 8 processors. Figure 6 presents the results of the redistributed data.

Fig. 4(a-d): Example of an 8x8 matrix, (a) Arrange table, (b) Take the bit reversal of column and then swap the columns, (c) Distribute data on several processes (4 processes) and (d) Communication between the processors takes the bit reversal of row and then swap the rows

Fig. 5(a-b): Explanation of bit reversal and swap of columns, (a) Bit-reversal of columns and (b) Swap between the columns

Fig. 6: Distributing the data of the matrix (8x8) into 8 processes

RESULTS

After refactoring the existing code to include the new bit-reversal algorithm, which reorders the input data for the FFT matrix in the spectral method, the one-processor implementation of the parallel bit-reversal code has become 8.7 to 17.3% faster than the sequential code (Table 5). The produced parallel code has been tested using MPICH, a portable implementation of MPI (http://www.mcs.anl.gov/research/projects/mpich2/).

Program input parameters:

Matrix dimension
A (1, 1) = 0, 1, 2… any random number
Fill raw/column
Number of processors (n)

Program outputs: Table 2-5 present the computed results for distributed-memory and MPI implementations of the new bit-reversal algorithm for square FFT arrays of different sizes X(n). They have been produced and tested using three methods for computing the parallel speed up, in order to measure the program scalability and to show how the program scales as more processors are used.

Speed up for various matrix sizes X(n) at different fractions (p) and a fixed number of processors (n = 4): As shown in Table 2, with four processors (n = 4) the speed up increases with the working-set size X(n). The measured results showed that the speed up eventually decreased for larger X(n) at various values of the fraction p. The maximum and minimum dimensionless speed-up ratios were 3.8352 and 2.9110, occurring at p = 0.9856, X(n) = 2x2 and p = 0.8753, X(n) = 19x19, respectively. Speed-up values can be very close to each other at similar parallel fractions p, regardless of matrix size; for example, the speed up of 3.7352 at X(n) = 3x3, p = 0.9763 was very close to the speed up of 3.7123 at X(n) = 4x4, p = 0.9741. The measured efficiency is closely related to the speed up at fixed n and depends on the matrix size. All values were less than one, ranging between 0.9588 and 0.7277. The communication cost for the MPI code as a function of efficiency, T1/E(n), came close to perfect speed up, in excellent agreement with the expected ideal cost obtained by multiplying the number of processors (n = 4) by the parallel processing time (T4).

Speed up at fixed fraction (P) and fixed matrix size X (n)(n) on different number of processors (n): For fixed parallel fraction (p = 0.9545), the speed up and the cost rapidly increased with larger number of processors for the same matrix size (8x8), with decreasing efficiency. The maximum speed up has been 20.3134 and occurred at the highest number of processors n = 256 (Table 3).
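The tabulated values are consistent with Amdahl's law, S(n) = 1/((1−p) + p/n); for instance, p = 0.9545 on n = 256 processors gives S ≈ 20.313, matching the maximum reported in Table 3. A minimal sketch of the formulas used for speed up and efficiency:

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: with parallel fraction p on n processors,
// the ideal speed up is S(n) = 1 / ((1 - p) + p / n).
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

// Efficiency E = S(n) / n, which must lie between 0 and 1.
double efficiency(double p, int n) {
    return amdahl_speedup(p, n) / n;
}
```

Efficiency falls as n grows at fixed p, which matches the decreasing-efficiency trend reported alongside the rising speed up.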

Speed up at large matrix size X(n) with various fractions (p): The results of Table 4 and 5 revealed that the speed up increased as the problem size X(n) became larger, for high values of the parallel fraction (e.g., p ≈ 0.9554), as the number of processors increased.

Table 2: Results for distributed-memory and MPI implementations over (2x2) ≤ X(n) ≤ (19x19), at n = 4, A(1, 1) = 0, with filling columns, repeat = 1000

Table 3: Speed up for implementing a matrix of size (8x8) at (p = 0.9545; n = 4)

Table 4: Speed up at large matrix size

Table 5: Output of running the re-factored code after including new bit-reversal algorithm

However, the speed up eventually decreased for larger X(n) on the same processors at lower values of the parallel fraction (p = 0.8753). Generally speaking, all measured speed ups for the large matrix increased with the number of processors, depending on p. In all cases p was less than one and the maximum speed up was around 20.

DISCUSSION

Parallel performance analysis: The results reported in this work indicate that parallel distributed-memory implementations of the bit-reversal of FFT arrays over the range (2x2) ≤ X(n) ≤ (19x19) produced an important speed up of the calculations, with maximum values at high parallel fraction, beyond which the communication cost increased and the computation/communication ratio decreased. The communication cost for the MPI code is a result of the "communication stages"; the less than perfect speed up is thus due to the volume of data communicated between processors in the redistribution stages and not to the constant sharing of small sub-array data. Many studies in the literature have covered this topic; for example, Yan and Zhang (1999) investigated the relationship between cost-effective and profit-effective parallel computing. In the present work the cost results match their ideal result. With four processors (n = 4), the speed up increased with the working-set size X(n), depending on p. This is due both to the faster computation stages and to the reduced volume of data communicated between single processors in the redistribution stage. The division of the problem among four processors would not be beneficial for small problem sizes, because there would not be enough work to do. In the meantime, the speed up can be attributed to the data independency contained in the processor's local cache between stages of data rearrangement. These observations are in excellent agreement with the qualitative trends seen in Amdahl's law (Amdahl, 1967) and with empirical data on speed up for parallel computing, where the maximum speed up is attained at a given problem size.
Therefore, the current results show that if the fraction p is not high enough for effective scaling, there is no point in further tuning until it has been increased (e.g., the 8x8 matrix at p = 0.9545); increasing the number of processors will not affect the speed up at very low fractions (e.g., p = 0.1). For this reason, parallel computing will only be useful either for small numbers of processors (n = 4) or for problems with very high values of p. The present results also show good agreement with those reported by Gustafson (1988). The efficiency estimates how effectively the processors solve the problem, compared with the effort wasted in communication and synchronization, and must range between 0 and 1. This work showed that all measured values of efficiency were less than one, in good agreement with the expected efficiency according to Alba's work (Alba, 2002). Finally, a comparison among spectral method techniques for solving the time-dependent Schrodinger equation can be found in Feit et al. (1980), Pshenichnov and Sokolov (1964) and Zhirnov and Turev (1980). The difference between these studies and the current work is the present use of distributed-memory parallel programming to solve a critical part of the fast Fourier transform, which reduces the time for solving the equation, increases the speed up and approaches the ideal value of Amdahl's law (Amdahl, 1967), where S(n) must equal the number of processors (n); in this work all speed ups measured on 4 processors have been very close to 4. On the other hand, this work could be retested by using the shared-memory approach to solving the Schrodinger equation instead of distributed parallel programming.

The spectral method commonly used in the numerical solution of the time-dependent Schrodinger equation often proves very slow in serial versions, even on the fastest workstations. This study has successfully implemented and tested the parallelization of a new bit-reversal algorithm that has been shown to be appropriate for multiprocessors and that outperforms an existing one. The distributed bit-reversal FFT speed up is a function of the number of processors (n), which reduces both the computational time and the communication volume between single processors. The speed up of the parallel algorithm depends strongly on reductions in both the communication time (Tn) and the contention in the multiprocessor. Thus, MPI has been revealed to be an optimal method of implementing parallelized bit-reversal algorithms.

APPENDIX

Appendix 1: Bit-reversal algorithm after refactoring

Appendix 2: Bit-reversal algorithm before refactoring

Appendix 3: C++ (MPI) code

REFERENCES

  • Tahat, A.N., W. Salah and A.B. Hallak, 2010. PASS (PIXE analysis shell software): A computer utility program for the evaluation of PIXE spectra. Int. J. PIXE, 20: 63-76.
    CrossRef    


  • Alba, E., 2002. Parallel evolutionary algorithms can achieve super-linear performance. Inform. Process. Lett., 82: 7-13.
    Direct Link    


  • Tahat, A. and W. Salah, 2011. Comprehensive online atomic database management system (DBMS) with highly qualified computing capabilities. Int. J. Data Database Manage. Syst., 3: 1-20.
    CrossRef    


  • Tahat, A. and M. Tahat, 2011. Python GUI scripting interface for running atomic physics applications. Python Paper Source Codes, 3: 1-7.
    Direct Link    


  • Amdahl, G.M., 1967. Validity of the single processor approach to achieving large scale computing capabilities. Proc. AFIPS Spring Joint Comput. Conf., 30: 483-485.


  • Fowler, M., 1999. Refactoring: Improving the Design of Existing Code. Addison-Wesley, New York, USA., ISBN-13: 9780201485677, Pages: 431


  • Griswold, W.G., 1991. Program restructuring as an aid to software maintenance. Ph.D. Thesis, University of Washington


  • Robert, M., 2008. Clean Code: A Handbook of Agile Software Craftsmanship. 1st Edn., Prentice Hall, New York, ISBN-10: 0132350882
    Direct Link    


  • Kerievsky, J., 2004. Refactoring to Patterns. 1st Edn., Addison Wesley, USA., ISBN-13: 978-0321213358, Pages: 400


  • Press, W.H., B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, 1992. Numerical Recipes in Fortran the Art of Scientific Computing. 2nd Edn., Cambridge University Press, New York


  • Saad, Y., J.R. Chelikowsky and S.M. Shontz, 2010. Numerical methods for electronic structure calculations of materials. SIAM Rev., 52: 3-54.
    CrossRef    Direct Link    


  • Yan, Y. and X. Zhang, 1999. Profit-effective parallel computing. IEEE Concurrency, 7: 65-69.
    CrossRef    


  • Kumar, V., A. Grama, A. Gupta and G. Karypis, 1994. Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings Publishing Co., IPC., North-Holland, ISBN: 9780805331707, Pages: 597


  • Canuto, C., M.Y. Hussaini, A. Quarteroni and T.A. Zang, 2006. Spectral Methods: Fundamentals in Single Domains. Springer-Verlag, Berlin, ISBN: 9783540307266, Pages: 563


  • Bracewell, R.N., 2000. The Fourier Transform and its Applications. 3rd Edn., McGraw-Hill, Boston, ISBN: 0073039381
    Direct Link    


  • Heine, V., 1970. The Pseudopotential Concept. In: Solid State Physics, Ehrenreich, H., F. Seitz and D. Turnbull (Eds.). Vol. 24, Academic Press, New York, USA., pp: 1-36


  • Ihm, J., A. Zunger and M.L. Cohen, 1979. Momentum-space formalism for the total energy of solids. J. Phys. C: Solid State Phys., 12: 4409-4422.
    CrossRef    Direct Link    


  • Xiao-Ping, L. and J.Q. Broughton, 1987. High-order correction to the Trotter expansion for use in computer simulation. J. Chem. Phys., 86: 5094-5100.
    Direct Link    


  • Cooley, J.W. and J.W. Tukey, 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comput., 19: 297-301.
    CrossRef    


  • Danielson, G.C. and C. Lanczos, 1942. Some improvements in practical Fourier analysis and their application to X-ray scattering from liquids. J. Franklin Inst., 233: 365-380.


  • Karniadakis, G.E. and R.M. Kirby, 2003. Parallel Scientific Computing in C++ and MPI: A Seamless Approach to Parallel Algorithms and their Implementation. Cambridge University Press, UK., ISBN-13: 9780521817547, Pages: 628


  • Gropp, W., S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir and M. Snir, 1998. MPI: The Complete Reference. The MPI-2 Extensions. vol. 2, MIT Press, Cambridge, MA., ISBN-13:9780262571234, Pages: 344


  • Snir, M., S. Otto, S.H. Lederman, D. Walker and J. Dongarra, 1996. MPI: The Complete Reference. MIT Press, London, ISBN-10: 0262692155
    Direct Link    


  • Dubey, A., M. Zubair and C.E. Grosch, 1994. A general purpose subroutine for fast Fourier transform on a distributed memory parallel machine. Parallel Comput., 20: 1697-1710.
    Direct Link    


  • Swartztrauber, P.N., 1987. Multiprocessor FFTs. Parallel Comput., 5: 197-210.
    Direct Link    


  • Averbuch, A., E. Gabber, B. Gordissky and Y. Medan, 1990. A parallel FFT on an MIMD machine. Parallel Comput., 15: 61-74.
    Direct Link    


  • Gustafson, J.L., 1988. Reevaluating Amdahl's law. Commun. ACM, 31: 532-533.


  • Feit, M.D., J.A. Fleck Jr. and A. Steiger, 1980. Solution of the Schrodinger equation by a spectral method. J. Comput. Phys., 47: 412-433.
    Direct Link    


  • Pshenichnov, E.A. and N.D. Sokolov, 1964. Eigenvalues and quantum transition probabilities for an asymmetrical double potential well. Optics Spectrosc., 17: 183-183.
    Direct Link    


  • Zhirnov, N.I. and A.V. Turev, 1980. New approximate solution of the anharmonic oscillator problem. Optics Spectrosc., 49: 142-144.
    Direct Link    


  • Nehra, N., R.B. Patel and V.K. Bhat, 2007. Load balancing with fault tolerance and optimal resource utilization in grid computing. Inform. Technol. J., 6: 784-797.
    CrossRef    Direct Link    


  • Arioua, M., S. Belkouch and M.M. Hassani, 2012. Efficient 16-points FFT/IFFT architecture for OFDM based wireless broadband communication. Inf. Technol. J., 11: 118-125.
    CrossRef    Direct Link    


  • He, C., F. Lang and H. Li, 2011. Medical image registration using cascaded pulse coupled neural networks. Inform. Technol. J., 10: 1733-1739.
    CrossRef    


  • Jing, M., Z. Nai-Tong and Z. Qin-Yu, 2010. IR-UWB waveform distortion analysis in NLOS localization system. Inform. Technol. J., 9: 139-145.
    CrossRef    Direct Link    


  • Sarkar, T., E. Arvas and S. Rao, 1986. Application of FFT and conjugate gradient method for the solution of electromagnetic radiation from electrically large and small conducting bodies. IEEE Trans. Antennas Propagat., 34: 635-640.
    CrossRef    Direct Link    


  • Peng, W., X. Wang, K. Chen and B. Tang, 2009. The analysis of the synthetic range profile based on doppler filter bank using FFT. Inform. Technol. J., 8: 1160-1169.
    CrossRef    Direct Link    
