
Journal of Applied Sciences

Year: 2012 | Volume: 12 | Issue: 14 | Page No.: 1501-1506
DOI: 10.3923/jas.2012.1501.1506
Determination of the Probability Plotting Position for Type I Extreme Value Distribution
Ahmad Shukri Yahaya, Norlida Md. Nor, Nor Rohashikin Mat Jali, Nor Azam Ramli, Fauziah Ahmad and Ahmad Zia Ul-Saufie

Abstract: This study focuses on choosing the best plotting position for estimating the parameters of the Gumbel distribution. The Type I extreme value distribution, also known as the Gumbel distribution, is frequently used to predict return periods in engineering studies. The location parameter (μ) and scale parameter (σ) of the Gumbel distribution were estimated using the regression method. Simulation was used to generate random variables from twenty Gumbel distributions with different location and scale parameters. The sample sizes chosen were 10, 20, 50, 60, 80 and 100, grouped into three categories: small (n = 10 and n = 20), medium (n = 50 and n = 60) and large (n = 80 and n = 100). The simulation was replicated ten times. Seventeen plotting position formulae were chosen for this study, namely Adamowski, Beard, Blom, Chegodayev, Cunnane, Gringorten, Hazen, Hirsch, IEC56, Landwehr, Laplace, McClung, Tukey, Filliben estimator, Weibull, Gumbel and Anon. These plotting positions were used to estimate the Cumulative Distribution Function (CDF) of the Gumbel distribution. Performance measures comprising two error measures and two accuracy measures were used to determine the best probability plotting position. Different plotting positions were found to perform well for different sample sizes. The best plotting methods for small sample sizes are Landwehr and Hazen, the most suitable method for medium sample sizes is Landwehr and the best method for large sample sizes is Blom.


How to cite this article
Ahmad Shukri Yahaya, Norlida Md. Nor, Nor Rohashikin Mat Jali, Nor Azam Ramli, Fauziah Ahmad and Ahmad Zia Ul-Saufie, 2012. Determination of the Probability Plotting Position for Type I Extreme Value Distribution. Journal of Applied Sciences, 12: 1501-1506.

Keywords: simulation, error measures, Gumbel distribution, accuracy measures and probability plotting formula

INTRODUCTION

Graphical procedures provide a useful visual method of checking whether a theoretical distribution fits an empirical distribution. When a straight line is fitted to the plotted points, the procedure is also known as the regression method; it is important in engineering, especially in air pollution, hydrological and structural studies.

The Gumbel distribution, also known as the Type I extreme value distribution, has been used successfully to estimate return periods. Yahaya et al. (2006) fitted the Gumbel distribution to carbon monoxide (CO) concentrations in Penang and found that the method of moments is better than the maximum likelihood estimator for estimating the parameters of the Gumbel distribution. Almaawali et al. (2008) used the Gumbel distribution to estimate wind loads in Oman. They compared the Gumbel plotting formula with the Gringorten plotting formula to determine the accuracy of the estimates and found that the Gumbel plotting formula gives the best estimate of wind loads. Makkonen (2008) compared ten plotting positions that have been used to estimate hazards to structural safety and found that using these plotting positions results in underestimation of the risk. De (2000) introduced a new plotting position for estimating the parameters of the Gumbel distribution, which was found to be as good as the Gringorten plotting formula. Ying and Pandey (2005) used the Weibull plotting formula to predict extreme wind speed and found that it is good for estimating the 50- and 500-year design wind speeds. Haktanira and Bozduman (1995) compared the effect of four plotting positions on parameter estimation by probability-weighted moments for four different distributions. They found that the Landwehr plotting formula is better than, or almost as good as, the other formulae for all four distributions considered.

This study compares seventeen plotting positions, used within the graphical (regression) procedure, for estimating the location parameter (μ) and scale parameter (σ) of the Type I extreme value distribution, also known as the Gumbel distribution.

MATERIALS AND METHODS

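Gumbel distribution: A continuous random variable X has a Gumbel distribution (Bury, 1999) if its Probability Density Function (PDF) has the form:

$$f(x) = \frac{1}{\sigma}\exp\left(-\frac{x-\mu}{\sigma}\right)\exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right] \qquad (1)$$

where σ > 0 and -∞ < x, μ < ∞.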

The Cumulative Distribution Function (CDF) of the Gumbel distribution is:

$$F(x) = \exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right] \qquad (2)$$

where σ > 0 and -∞ < x, μ < ∞.

Here μ is the location parameter and σ is the scale parameter.
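As a quick numerical check of Eqs. 1 and 2, the minimal Python sketch below evaluates the Gumbel density and CDF directly; the function names are illustrative and not taken from the original study. A convenient sanity check is that the CDF equals e^(-1) ≈ 0.368 at x = μ, whatever the value of σ.

```python
import math


def gumbel_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the Type I (maximum) extreme value distribution, Eq. (1)."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    z = (x - mu) / sigma
    # exp(-z) * exp(-exp(-z)) / sigma, combined into a single exponential
    return math.exp(-z - math.exp(-z)) / sigma


def gumbel_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function, Eq. (2)."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-math.exp(-z))


# At x = mu the CDF is exp(-1) for any sigma.
print(round(gumbel_cdf(50.0, 50.0, 200.0), 4))  # 0.3679
```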

Plotting procedure: Commonly, engineers plot the ordered observations against the estimated Cumulative Distribution Function (CDF) values of the Gumbel distribution on probability paper. A straight line is then fitted by the least squares method commonly used for regression lines, and the intercept and gradient of this line provide the estimates of the location and scale parameters of the Gumbel distribution.

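Taking logarithms of the CDF of the Gumbel distribution (Eq. 2) twice gives a straight line:

$$-\ln\left(-\ln P_i\right) = \frac{x_{(i)} - \mu}{\sigma} \qquad (3)$$

where x(i) is the i-th ordered observation (x(1) ≤ x(2) ≤ ... ≤ x(n)) and P_i is the plotting position, i.e., the estimated value of the CDF at x(i).

Thus, plotting -ln(-ln P_i) against the ordered data x(i) gives approximately a straight line if the data come from a Gumbel distribution. The location parameter μ and scale parameter σ can then be estimated from the intercept a and the slope b of the fitted line: since Eq. 3 gives a = -μ/σ and b = 1/σ, the estimates are σ = 1/b and μ = -a/b.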
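The regression procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming ordinary least squares of the reduced variate -ln(-ln P_i) on x(i) as described above; the plotting position is passed in as a hypothetical function `plotting_position(i, n)` so that any formula from Table 1 could be plugged in. Regressing in the opposite direction (x(i) on the reduced variate) is also common and gives slightly different estimates.

```python
import math


def fit_gumbel_by_regression(sample, plotting_position):
    """Estimate (mu, sigma) by least squares on the linearized CDF, Eq. (3).

    plotting_position(i, n) returns P_i for the i-th smallest of n values
    (a hypothetical hook, not an interface from the original study).
    """
    x = sorted(sample)                         # x(1) <= x(2) <= ... <= x(n)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    y = [-math.log(-math.log(plotting_position(i, n))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx                              # slope, estimates 1 / sigma
    a = y_bar - b * x_bar                      # intercept, estimates -mu / sigma
    return -a / b, 1.0 / b                     # (mu_hat, sigma_hat)


def weibull(i, n):
    """Weibull plotting position i / (n + 1)."""
    return i / (n + 1)


# Exact Gumbel quantiles at the Weibull positions lie on a straight line,
# so the fit should return values very close to mu = 30 and sigma = 150.
data = [30 - 150 * math.log(-math.log(weibull(i, 20))) for i in range(1, 21)]
print(fit_gumbel_by_regression(data, weibull))
```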

Plotting position: Seventeen plotting positions were used in this study, as shown in Table 1. These formulae are plotting positions that have been used frequently in the literature (De, 2000; Adeboye and Alatise, 2007; Jay et al., 1998; Forthegill, 1990), as discussed by Makkonen (2008) and Hyndman and Fan (1996).
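For illustration, a few plotting positions can be written as simple functions of the rank i (i = 1 for the smallest value) and the sample size n. The forms below are the ones commonly quoted in the general literature for six of the seventeen methods; they are assumptions for this sketch and are not a transcription of Table 1, which may use different constants. Most of them have the form (i - c)/(n + 1 - 2c), which makes the positions symmetric (P_1 + P_n = 1); the Landwehr form (i - 0.35)/n is not symmetric.

```python
# Commonly quoted forms of six plotting positions for the i-th smallest of
# n observations (assumed literature forms, not copied from Table 1).
PLOTTING_POSITIONS = {
    "Weibull":    lambda i, n: i / (n + 1),
    "Hazen":      lambda i, n: (i - 0.5) / n,
    "Gringorten": lambda i, n: (i - 0.44) / (n + 0.12),
    "Blom":       lambda i, n: (i - 0.375) / (n + 0.25),
    "Cunnane":    lambda i, n: (i - 0.4) / (n + 0.2),
    "Landwehr":   lambda i, n: (i - 0.35) / n,
}

n = 10
for name, formula in PLOTTING_POSITIONS.items():
    probs = [formula(i, n) for i in range(1, n + 1)]
    print(f"{name:<10} P_1 = {probs[0]:.4f}  P_n = {probs[-1]:.4f}")
```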

Performance indicators: Performance Indicators (PI) were used to measure how well the plotting formulae predict the observed values of the Gumbel distributions. Two error measures, namely the Normalized Absolute Error (NAE) and the Root Mean Square Error (RMSE), and two accuracy measures, namely the Prediction Accuracy (PA) and the coefficient of determination (R2), were used.

Table 1: The plotting positions

For a good estimator, the error measures should approach zero and the accuracy measures should approach one.

Normalized absolute error (NAE): NAE is a statistical indicator that measures the error between the predicted and observed values of a model or estimator, expressed relative to the observed values. It is calculated according to Yahaya et al. (2008):

$$\mathrm{NAE} = \frac{\sum_{i=1}^{n} \left| O_i - P_i \right|}{\sum_{i=1}^{n} O_i} \qquad (4)$$

where n is the number of observations, O_i are the observed values and P_i are the predicted values (here P_i does not denote the plotting position of Eq. 3).
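A minimal Python sketch of NAE, assuming the common form NAE = Σ|O_i - P_i| / Σ O_i; the function name is illustrative. Because NAE divides by the sum of the observations, it is only well behaved when that sum is clearly positive, so the sketch rejects a zero denominator.

```python
def normalized_absolute_error(observed, predicted):
    """NAE = sum(|O_i - P_i|) / sum(O_i) (assumed form of Eq. 4)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total_observed = sum(observed)
    if total_observed == 0:
        raise ValueError("NAE is undefined when the observed values sum to zero")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / total_observed


# |10-12| + |20-18| + |30-33| = 7, divided by 60
print(normalized_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```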

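The root mean square error (RMSE): RMSE measures the typical size of the differences between the observed and predicted values; it is the square root of the mean squared difference and therefore gives an average error of the estimator in the units of the data. The formula is defined as (Junninen et al., 2004):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \qquad (5)$$

where n is the number of observations, O_i are the observed values and P_i are the predicted values.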
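The same toy data can be used for a minimal RMSE sketch. It assumes the 1/n divisor (some authors use n - 1 instead); the function name is illustrative.

```python
import math


def root_mean_square_error(observed, predicted):
    """RMSE = sqrt(mean((P_i - O_i)^2)); the 1/n divisor is an assumption."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


# squared errors 4 + 4 + 9 = 17, mean 17/3, RMSE = sqrt(17/3)
print(root_mean_square_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```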

Prediction accuracy (PA): PA is a statistical indicator that measures the accuracy of the model or estimator. The formula is defined as (Yahaya et al., 2008):

(6)

where n is the number of observations, O_i are the observed values, P_i are the predicted values, Ō is the average of the observed values, P̄ is the average of the predicted values, σ_P is the standard deviation of the predicted values and σ_O is the standard deviation of the observed values. A good estimator or model should have a PA value close to one.

Coefficient of determination (R2): R2 is the proportion of the variability in a data set that is accounted for by a statistical model. The formula is defined as (Junninen et al., 2004):

(7)

where n is the number of observations, O_i are the observed values, P_i are the predicted values, Ō is the average of the observed values, P̄ is the average of the predicted values, σ_P is the standard deviation of the predicted values and σ_O is the standard deviation of the observed values. A good estimator or model should have an R2 value close to one.
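Eq. (7) is not reproduced in this text, so the sketch below assumes one common definition of R2 for observed versus predicted values: the squared Pearson correlation between them, built from the same ingredients (means and spreads of O and P) listed above. If the original normalizes differently, its values may differ from this sketch.

```python
def coefficient_of_determination(observed, predicted):
    """R^2 as the squared Pearson correlation of O and P (assumed form)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    o_bar = sum(observed) / n
    p_bar = sum(predicted) / n
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return cov ** 2 / (var_o * var_p)


# cov = 210, var_o = 200, var_p = 234, so R^2 = 44100 / 46800, about 0.942
print(coefficient_of_determination([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```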

DATA

Twenty different sets of Gumbel parameters were randomly selected to represent a range of Gumbel distributions. The location parameter (μ) ranged from a minimum of 16 to a maximum of 80 and the scale parameter (σ) ranged from 122 to 339. Samples of sizes 10, 20, 50, 60, 80 and 100 were used, and each sample size was replicated five times for each Gumbel distribution. For each sample size, 10000 random variables were generated. A multiplicative congruential generator (Law and Kelton, 2000) of the form:

(8)

was used to simulate the random variables. The multiplicative congruential generator given in Eq. (8) was found to have good statistical properties.
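The constants of Eq. (8) are not recoverable from this text, so the Python sketch below uses the Park and Miller "minimal standard" multiplier a = 16807 with modulus m = 2^31 - 1 purely as a stand-in. Uniform numbers U are turned into Gumbel variates by inverting Eq. (2), x = μ - σ ln(-ln U), which is the standard inverse-transform method; whether the original study transformed its uniforms this way is an assumption.

```python
import math


class MultiplicativeCongruentialGenerator:
    """Z_i = (a * Z_{i-1}) mod m, returning U_i = Z_i / m in (0, 1).

    The default a and m are the Park and Miller constants, used here only
    as a hypothetical stand-in for the constants of Eq. (8).
    """

    def __init__(self, seed: int, a: int = 16807, m: int = 2**31 - 1):
        if not 0 < seed < m:
            raise ValueError("seed must lie strictly between 0 and m")
        self.a, self.m, self.state = a, m, seed

    def uniform(self) -> float:
        self.state = (self.a * self.state) % self.m
        # m is prime and the state is never 0, so U is never 0 or 1
        return self.state / self.m


def gumbel_variate(u: float, mu: float, sigma: float) -> float:
    """Inverse-transform sampling: solve U = F(x) from Eq. (2) for x."""
    return mu - sigma * math.log(-math.log(u))


gen = MultiplicativeCongruentialGenerator(seed=12345)
sample = [gumbel_variate(gen.uniform(), mu=40.0, sigma=200.0) for _ in range(10)]
print([round(x, 1) for x in sample])
```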

RESULTS AND DISCUSSION

Seventeen probability plotting positions were evaluated for estimating the parameters of Gumbel distributions by the regression method. The predicted random variables for each plotting position were compared with the observed simulated random variables, and the error and accuracy measures were computed. Since twenty Gumbel distributions were simulated, the performance indicators were averaged over them. The results for the error measures, the normalized absolute error and the root mean square error, are given in Tables 2 and 3, respectively. The sample sizes were grouped into three categories: small (n = 10 and n = 20), medium (n = 50 and n = 60) and large (n = 80 and n = 100).

Tables 2 and 3 show that, for all chosen plotting positions, both NAE and RMSE tend towards zero as the sample size increases. Both error measures show that the Landwehr method is the best for small and medium sample sizes and the Blom method is the best for large sample sizes.
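The comparison loop behind Tables 2 and 3 can be sketched compactly, under several stated assumptions: the "predicted" values are taken to be the fitted Gumbel quantiles μ̂ - σ̂ ln(-ln P_i) at each plotting position; only RMSE and five plotting positions (in commonly quoted literature forms) are included; and Python's `random.Random` replaces the generator of Eq. (8). The ranking it prints depends on the seed and settings and is not a reproduction of the published tables.

```python
import math
import random

# Commonly quoted forms of five plotting positions (assumed, not from Table 1).
POSITIONS = {
    "Weibull": lambda i, n: i / (n + 1),
    "Hazen": lambda i, n: (i - 0.5) / n,
    "Gringorten": lambda i, n: (i - 0.44) / (n + 0.12),
    "Blom": lambda i, n: (i - 0.375) / (n + 0.25),
    "Landwehr": lambda i, n: (i - 0.35) / n,
}


def fit_and_predict(x_sorted, position):
    """Fit Eq. (3) by least squares; return the fitted quantile at each P_i."""
    n = len(x_sorted)
    y = [-math.log(-math.log(position(i, n))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(x_sorted) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_sorted, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x_sorted)
    b = sxy / sxx                      # slope, estimates 1 / sigma
    a = y_bar - b * x_bar              # intercept, estimates -mu / sigma
    mu_hat, sigma_hat = -a / b, 1.0 / b
    return [mu_hat + sigma_hat * yi for yi in y]


def rmse(observed, predicted):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def gumbel_sample(rng, mu, sigma, n):
    """Inverse-transform sampling with U strictly inside (0, 1)."""
    sample = []
    while len(sample) < n:
        u = rng.random()               # random() lies in [0, 1); skip the value 0
        if u > 0.0:
            sample.append(mu - sigma * math.log(-math.log(u)))
    return sample


def compare(mu, sigma, n, replicates, seed=1):
    """Average RMSE of each plotting position over repeated samples."""
    rng = random.Random(seed)
    totals = dict.fromkeys(POSITIONS, 0.0)
    for _ in range(replicates):
        x_sorted = sorted(gumbel_sample(rng, mu, sigma, n))
        for name, position in POSITIONS.items():
            totals[name] += rmse(x_sorted, fit_and_predict(x_sorted, position))
    return {name: total / replicates for name, total in totals.items()}


results = compare(mu=40.0, sigma=200.0, n=20, replicates=5)
for name, value in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:<10} mean RMSE = {value:.2f}")
```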

Tables 4 and 5 give the results for the accuracy measures, namely the average prediction accuracy and the average coefficient of determination. Both accuracy measures improve as the sample size increases.

Table 2: Average normalized absolute error values
PPP: Probability plotting position

Table 3: Average root mean square error values
PPP: Probability plotting position

Table 4: Average prediction accuracy values
PPP: Probability plotting position

For the average prediction accuracy, the value is above 0.90 even for small samples and approaches 0.99 for large samples. The average coefficient of determination, however, ranges from 0.69 for small samples to 0.99 for large samples. For both accuracy measures, the Gringorten method is the best for small samples, the Laplace method is the best for medium samples and the Weibull method is the best for large samples.

Table 5: Average coefficient of determination values
PPP: Probability plotting position

Overall, when all the performance indicators were compared together, the best plotting methods for small sample sizes were found to be Landwehr and Hazen, the most suitable method for medium sample sizes is Landwehr and the best method for large sample sizes is Blom. This result is similar to that obtained by Haktanira and Bozduman (1995), although the distributions used are different.

Yahaya et al. (2012) carried out a simulation study to determine the best probability plotting position when the underlying distribution is the Weibull distribution. They found that for all sample sizes the best plotting position is the Gringorten method, followed by the Cunnane and Blom methods, respectively. Adeboye and Alatise (2007) fitted the normal and log-Pearson Type III distributions to the peak flow discharge of two rivers in Nigeria using seven probability plotting positions. They concluded that the Weibull method is suitable for fitting the normal distribution, while the Anon method fits the log-Pearson Type III distribution well. Carter and Challenor (1983) used simulated data to fit the parameters of the Gumbel distribution using the Gumbel method, the Gringorten method, the expected plotting position and maximum likelihood estimation, and found that the Gumbel method is the least satisfactory. Goel and Seth (1989) studied various probability plotting positions for fitting the Gumbel distribution using various statistical criteria and concluded that the best probability plotting position is the Gringorten method. Guo (1990) also showed that the Gringorten method is the best method for fitting the Gumbel distribution. Ying and Pandey (2005) investigated eleven probability plotting positions for fitting the Gumbel distribution using flow discharge data from the Surma basin and found that the Weibull method gives the best fit. Suhaiza et al. (2007) fitted the Gumbel distribution to investigate the magnitude and frequency of floods using data from nineteen stations in Malaysia and found that the Gringorten method is the best method for fitting the Gumbel distribution.

CONCLUSION

Choosing the best probability plotting position is important for estimating the parameters of distributions and hence for determining the best estimates of return periods. This study used simulation to obtain random observations from twenty different Gumbel distributions. Seventeen plotting positions were used to estimate the cumulative distributions, and two error measures and two accuracy measures were used to assess the performance of these plotting positions.

For the error measures, the Landwehr plotting formula should be considered for small and medium sample sizes and the Blom method performs well for large sample sizes. For the accuracy measures, the Gringorten plotting formula is suitable for small sample sizes, the Laplace formula is good for medium sample sizes and the Weibull formula is the best for large sample sizes. However, when the error measures and the accuracy measures are considered simultaneously, the Landwehr and Hazen plotting formulae perform well for small samples, the Landwehr formula is suitable for medium samples and the Blom method is good for large samples. Considering all sample sizes together, the Landwehr plotting formula is the best.

REFERENCES

  • Adeboye, O.B. and M.O. Alatise, 2007. Performance of probability distributions and plotting positions in estimating the flood of River Osun at Apoje Sub-basin, Nigeria. Agric. Eng. Int.: CIGR J., Vol. 9.


  • Almaawali, S.S.S., T.A. Majid and A.S. Yahaya, 2008. Determination of basic wind speed for building structures in Oman. Proceedings of the International Conference on Constructions and Building Technology, June 16-20, 2008, Kuala Lumpur, pp: 235-244.


  • Carter, D.J.T. and P.G. Challenor, 1983. Methods of fitting the Fisher-Tippett type 1 extreme value distribution. Ocean Eng., 10: 191-199.


  • Bury, K.V., 1999. Statistical Distributions in Engineering. Cambridge University Press, USA


  • De, M., 2000. A new unbiased plotting position formula for Gumbel distribution. Stochastic Envir. Res. Risk Asses., 14: 1-7.


  • Forthegill, J.C., 1990. Estimating the cumulative probability of failure data points to be plotted on Weibull and other probability paper. Electr. Insulation Transact., 25: 489-492.


  • Goel, N.K. and S.M. Seth, 1989. Studies on plotting position formulae for Gumbel distribution. J. Inst. Eng. (India): Civil Eng. Division, 70: 121-126.


  • Guo, S.L., 1990. Discussion on unbiased plotting positions for the general extreme value distribution. J. Hydrol., 121: 33-44.


  • Haktanira, T. and A. Bozduman, 1995. A study on sensitivity of the probability-weighted moments method on the choice of the plotting position formula. J. Hydrol., 168: 265-281.


  • Hyndman, R.J. and Y. Fan, 1996. Sample quantiles in statistical packages. Am. Stat., 50: 361-365.


  • Jay, R.L., O. Kalman and M. Jenkins, 1998. Integrated planning and management for Urban water supplies considering multi uncertainties. Technical Report, Department of Civil and Environmental Engineering, Universities of California.


  • Law, A.M. and M.D. Kelton, 2000. Simulation Modelling and Analysis. 3rd Edn., McGraw-Hill, New York


  • Makkonen, L., 2008. Problem in the extreme value analysis. Structural Safety, 30: 405-419.


  • Suhaiza, S.O., S. Salim and J.P. Frederik, 2007. Flood frequency analysis for Sarawak using Weibull, Gringorten and L-moments formula. J. Inst. Eng., 68: 43-52.


  • Yahaya, A.S., N.A. Ramli and N.F.F.M. Yusof, 2006. Fitting extreme value distribution to CO data. Proceedings of the 3rd Bangi World Conference on Environmental Management, September 5-6, 2006, Bangi, pp: 324-330.


  • Yahaya, A.S., H.A. Hamid, N.M. Noor and L.M. Meei, 2008. Comparisons between method of moments and method of maximum likelihood for estimating parameters of the Weibull distribution. Proceedings of the 16th National Symposium on Mathematical Science Spurring Excellence in Thinking, June 2-5, 2008, University of Malaysia Trengganu, pp: 176-182.


  • Ying, A. and M.D. Pandey, 2005. A comparison of method of extreme wind speed estimation. J. Wind Eng. Indust. Aerodynamics, 93: 535-545.


  • Yahaya, A.S., S.Y. Chong, N.A. Ramli and F. Ahmad, 2012. Determination of the best probability plotting position for predicting parameters of the Weibull distribution. Int. J. Applied Sci. Technol., 2: 106-111.


  • Hirsch, R.M., 1981. Estimating Probabilities of Reservoir Storage for the Upper Delaware River Basin. Vol. 81, U.S. Geological Survey, Reston, VA., USA., Pages: 36


  • Cunnane, C., 1978. Unbiased plotting positions: A review. J. Hydrol., 37: 205-222.


  • Junninen, H., H. Niska, K. Tuppurainen, J. Ruuskanen and M. Kolehmainen, 2004. Methods for imputation of missing values in air quality data sets. Atmos. Environ., 38: 2895-2907.
