INTRODUCTION
The analysis of daily rainfall data is becoming increasingly important
for providing more reliable predictions of climatic events. Modeling
daily rainfall occurrence is one of the important aspects of analyzing
rainfall data. Researchers in the field have been developing rainfall
occurrence models since the 20th century. In selecting the most successful
model to describe the distribution of rainfall events, the model with
the fewest parameters is preferred. However, there have been cases where
a model with more parameters still did not show a significant fit.
Therefore, it is necessary to develop a more appropriate model that can
be used for various applications, including water resource management
in the hydrological and agricultural sectors.
Several probability models, such as the geometric distribution (GD), compound
geometric distribution (CGD), geometric log series distribution (GMLS),
log series distribution, modified log series distribution and truncated
negative binomial distribution, have been applied to the distribution of
dry (wet) spells by a number of researchers in their respective areas of
study (Deni et al., 2008; Deni and Jemain, 2008; Tolika and Maheras, 2005;
Anagnostopoulou et al., 2003; Wilks, 1999; Chapman, 1997). Probability
models based on mixture distributions have also been explored as an
improvement. The mixture of two geometric distributions (MGD) proposed by
Racsko et al. (1991) fitted the distribution of dry spells in Hungary well.
They found that the duration of wet spells could be approximated by a
single geometric distribution, while for long dry spells the mixed
distributions were more appropriate. In addition, Deni and Jemain (2008)
reported that the MGD successfully fitted the sequence of dry days over
Peninsular Malaysia during two different periods, the 1940s to 1976 and
1977 to 2004. The mixture of geometric and Poisson distributions (MGPD),
first introduced by Dobi-Wantuch et al. (2000), successfully fitted the
distribution of wet (dry) spells at two stations in Hungary. However,
Deni and Jemain (2008) found that the MGPD did not successfully fit the
distribution of dry spells over Peninsular Malaysia.
In order to further enhance the development of the geometric probability
models, the present study proposes a mixture of geometric and truncated
Poisson distributions (MGTPD) as an alternative probability model to
describe the occurrence of daily rainfall events. The daily rainfall
data from five selected stations over Peninsular Malaysia for the period
of 1975 to 2004 are used to demonstrate the performance of the MGTPD
and the existing models. In selecting the best fitting probability model
to represent the distribution of wet spells at each station, a chi-square
goodness-of-fit test is used.
MATERIALS AND METHODS
The study area and data: Peninsular Malaysia lies entirely in
the equatorial zone, between latitudes 1 and 6 °N and longitudes 100
and 103 °E. Two monsoons influence the climate of the country, namely
the Southwest monsoon (May to August) and the Northeast monsoon (November
to February). Table 1 displays the geographical coordinates of the five
selected rainfall stations in Peninsular Malaysia for the period of 1975
to 2004, and Fig. 1 shows the locations of these stations.
The homogeneity of the data series was checked using the four homogeneity
tests recommended by Wijngaard et al. (2003): the standard normal
homogeneity test, the Buishand range test, the Pettitt test and the Von
Neumann ratio test. The results showed that the annual mean of the wet
spells of the data series was homogeneous. The data used in the present
study can be considered of good quality, as three out of the five stations
had less than 4% missing values throughout the 30-year period. The missing
values in the data series for the period of 1975 to 2004 were estimated
using various weighting methods, such as the inverse distance,
the normal ratio and the correlation between the target and the neighboring
stations (Suhaila et al., 2008; Teegavarapu and Chandramouli, 2005;
Sullivan and Unwin, 2003; Eischeid et al., 2000).
Probability models for wet spells: A wet day is defined as a day
with a rainfall amount of at least 0.1 mm. A wet (dry) spell is a period
of exactly x consecutive wet (dry) days immediately preceded
and followed by a dry (wet) day. The minimum length of a wet spell is
taken as one day, which means a single wet day. Six probability models
are fitted to the observed lengths (days) of wet spells to determine
the best fitting probability model for representing the sequences of wet
days at the stations. One of the six probability models tested is
the new model proposed in the present study. According to Lana and Burgueno
(1998), there are two advantages in applying the Poisson model to very
long dry episodes, e.g., 22 to 50 days: firstly, the model can quantify
the probability and return period of a long episode and secondly, it can
quantify the probability that a drought event is repeated a fixed number
of times during a fixed number of years. Building on a modification of
the Poisson model, known as the truncated Poisson distribution, and on
the success of the one-parameter GD in representing the sequences of wet
(dry) days, the new probability model MGTPD is proposed as an alternative
probability model for daily rainfall occurrence in the present study.
This new probability model is developed to provide a statistical
description of the observations for various purposes, not only for data
generation but also for modeling climatic events in order to give a
physical explanation of the rainfall occurrence.
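As an illustration of the wet spell definition given above, the following minimal Python sketch extracts wet spell lengths from a daily rainfall series. The function name `wet_spell_lengths`, the sample record and the decision to drop spells that touch either end of the record are assumptions made for this sketch; they are not taken from the study, and missing values are not handled.

```python
from itertools import groupby

WET_THRESHOLD_MM = 0.1  # wet-day threshold stated in the text


def wet_spell_lengths(daily_rainfall_mm, threshold=WET_THRESHOLD_MM):
    """Return the lengths (in days) of all wet spells in a daily series.

    Spells touching either end of the record are dropped because they are
    not bounded by a dry day on both sides (an assumption of this sketch).
    """
    wet = [amount >= threshold for amount in daily_rainfall_mm]
    # Collapse the wet/dry flags into runs of (is_wet, run_length).
    runs = [(is_wet, sum(1 for _ in group)) for is_wet, group in groupby(wet)]
    lengths = []
    for index, (is_wet, length) in enumerate(runs):
        bounded = 0 < index < len(runs) - 1
        if is_wet and bounded:
            lengths.append(length)
    return lengths


# Hypothetical ten-day record (mm); not data from the study.
sample = [0.0, 2.5, 0.3, 0.0, 0.0, 12.0, 0.0, 0.1, 4.2, 7.7]
print(wet_spell_lengths(sample))
```

The frequency of each spell length, which is what the probability models are fitted to, could then be tabulated from the returned list, for example with `collections.Counter`.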
In fitting a particular probability model to the observed distribution
of wet spells, the parameters are estimated using the method of maximum
likelihood, moments or factorial moments. Table 2 lists the probability
models applied, their probability functions and the method used to
estimate the parameters of each of the six models tested.
The parameter of the GD is estimated using the maximum likelihood method.
However, for mathematical convenience, owing to the complexity of the
CGD and GMLS, the factorial moment method and the moment method are
applied, respectively, to estimate their parameters.
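For the one-parameter GD, the maximum likelihood estimate has a closed form. The sketch below assumes the standard geometric form on x = 1, 2, ..., namely P(X = x) = p(1 - p)^(x - 1), for which the estimate is the reciprocal of the mean spell length; the parameterisation used in Table 2 may be written differently. The function names and the spell lengths are hypothetical.

```python
def geometric_pmf(x, p):
    """P(X = x) = p * (1 - p)**(x - 1) for spell length x = 1, 2, ..."""
    return p * (1.0 - p) ** (x - 1)


def fit_geometric_mle(spell_lengths):
    """Closed-form maximum likelihood estimate: p_hat = n / sum(x)."""
    return len(spell_lengths) / sum(spell_lengths)


# Hypothetical spell lengths (days); not data from the study.
spells = [1, 1, 2, 1, 3, 2, 1, 5, 1, 3]
p_hat = fit_geometric_mle(spells)

# Expected frequencies for spell lengths 1 to 5 under the fitted model.
expected_counts = [len(spells) * geometric_pmf(x, p_hat) for x in range(1, 6)]
print(p_hat, expected_counts)
```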
Table 1: The geographical coordinates, percentage of missing
data and the main characteristics of wet spells length for each of
the five selected rainfall stations over Peninsular Malaysia for the
period of 1975-2004
Fig. 1: The physical map showing the five selected rainfall
stations in Peninsular Malaysia
Table 2: The list of probability models, probability function
and the method of estimation used for fitting the distribution of
wet spells, x = 1, 2,...
Moreover, for the other three probability models, MGD, MGPD and MGTPD,
the parameters of these mixture models are estimated by maximum
likelihood using the Quasi-Newton method. The probability parameters of
each model applied range from 0 to 1, while W is the weight factor of
the first mixture component; the weights of the two components sum to
unity.
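The likelihood that such an optimizer works on can be sketched as follows. The mass function below is an assumed form of the MGTPD: a geometric component with weight W and a zero-truncated Poisson component with weight 1 - W, both on x = 1, 2, .... The authors' exact parameterisation is given in Table 2 and may differ; the names `mgtpd_pmf`, `negative_log_likelihood` and `lam`, and the counts shown, are hypothetical.

```python
import math


def mgtpd_pmf(x, w, p, lam):
    """Assumed MGTPD mass function for spell length x = 1, 2, ...

    w   : weight of the geometric component (0 <= w <= 1)
    p   : geometric parameter (0 < p < 1)
    lam : rate of the zero-truncated Poisson component (lam > 0)
    """
    geometric = p * (1.0 - p) ** (x - 1)
    # Zero-truncated Poisson, evaluated on the log scale for stability.
    log_tp = (x * math.log(lam) - lam - math.lgamma(x + 1)
              - math.log1p(-math.exp(-lam)))
    return w * geometric + (1.0 - w) * math.exp(log_tp)


def negative_log_likelihood(params, observed_counts):
    """observed_counts maps spell length -> observed frequency."""
    w, p, lam = params
    return -sum(count * math.log(mgtpd_pmf(x, w, p, lam))
                for x, count in observed_counts.items())


# Hypothetical frequency table of wet spell lengths; not data from the study.
counts = {1: 50, 2: 25, 3: 12, 4: 6, 5: 3}
print(negative_log_likelihood((0.6, 0.4, 2.0), counts))
```

A function of this kind could be passed to a quasi-Newton routine, for example `scipy.optimize.minimize` with `method="L-BFGS-B"` and bounds that keep W and p inside (0, 1). That choice of optimizer is one possibility and is not necessarily the implementation used in the study.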
Table 3: The parameter estimates for each model in fitting the
distribution of wet spells for the five selected rainfall stations
over Peninsular Malaysia
Table 3 shows the parameter estimates for each probability model tested
in the study. In some cases, the GMLS has a disadvantage in estimating
the parameter p, which is obtained by solving a quadratic equation. When
the roots of this equation contain an imaginary term, no real value of
p can be produced.
In order to determine the most successful and best fitting probability
model to represent the distribution of wet spells at each of the five
selected rainfall stations, the chi-square goodness-of-fit test is used.
The test is based on the differences between the observed
and expected frequencies of the various lengths of wet spells (Snedecor and
Cochran, 1989). Adjacent spell-length classes are grouped whenever their
frequencies are less than 5. Pearson's chi-square goodness-of-fit
statistic is:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected
frequencies of wet spells in class i and n is the number of classes. The
test is carried out at the 5% level of significance with v = n-t-1 degrees
of freedom, where t is the number of parameters estimated for each
probability model. Note that a higher probability, associated with a lower
value of the chi-square statistic, indicates a better fit.
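The test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: adjacent classes are pooled from the shortest spell length upwards until each pooled expected frequency reaches 5 (the text does not say whether the rule is applied to observed or expected frequencies), and the function names and frequencies are hypothetical.

```python
def pool_classes(observed, expected, minimum=5.0):
    """Merge adjacent classes until each pooled expected frequency >= minimum."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for obs, exp in zip(observed, expected):
        acc_obs += obs
        acc_exp += exp
        if acc_exp >= minimum:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0.0:
        if pooled_exp:
            # Fold a short remainder into the last complete class.
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def chi_square_gof(observed, expected, n_params):
    """Return Pearson's chi-square statistic and v = n - t - 1."""
    obs, exp = pool_classes(observed, expected)
    statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    dof = len(obs) - n_params - 1
    return statistic, dof


# Hypothetical frequencies for spell lengths 1, 2, ..., 6; not study data.
observed = [52, 24, 13, 6, 3, 2]
expected = [50.0, 25.0, 12.5, 6.25, 3.125, 3.125]
statistic, dof = chi_square_gof(observed, expected, n_params=1)
print(statistic, dof)
```

The statistic would then be compared with the chi-square critical value for v degrees of freedom at the 5% level, or converted to a probability with a routine such as `scipy.stats.chi2.sf(statistic, dof)`.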
RESULTS AND DISCUSSION
Main characteristics of the length of wet spells: Table
1 shows the usual statistical parameters of the distribution of wet spell
lengths at each of the five selected rainfall stations in Peninsular Malaysia.
These include the mean, median, standard deviation, coefficient of variation,
maximum spell length and the total number of wet days. Almost all the
characteristics of wet spells at the Ldg. Boh station were slightly
higher than at the other stations. This may be due to the location of the
Ldg. Boh station in the highlands, which are the wettest areas in the
Peninsula. The mean, median and standard deviation of the duration of wet
spells varied from 2 to 3 days at all the stations except Ldg. Boh. Over
the study period, the J. Bahru station, which is located in the southern
Peninsula, experienced the longest wet spell, with a maximum of 48 days.
Meanwhile, the A. Pedu station, which is situated in the northwestern area
of the Peninsula, had the shortest maximum wet spell length, at 19 days.
Moreover, the highest total number of wet days, 6004 days,
was observed at the Ldg. Boh station.
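The summary statistics listed above can be computed from a list of wet spell lengths with the Python standard library. The sketch below assumes the sample standard deviation is used and takes the total number of wet days as the sum of the spell lengths; the study does not state either choice, and the function name and input are hypothetical.

```python
import statistics


def spell_summary(spell_lengths):
    """Summary statistics of wet spell lengths (days)."""
    mean = statistics.mean(spell_lengths)
    sd = statistics.stdev(spell_lengths)  # sample standard deviation (assumed)
    return {
        "mean": mean,
        "median": statistics.median(spell_lengths),
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
        "max": max(spell_lengths),
        "total_wet_days": sum(spell_lengths),
    }


# Hypothetical spell lengths (days); not data from the study.
print(spell_summary([1, 1, 2, 1, 3, 2, 1, 5, 1, 3]))
```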
The most successful and the best fitting probability model for wet
spells: The most successful model, as shown in Table 4, is the one that
successfully fits the distribution of wet spells at the highest number
of stations at the 5% level of significance. In determining the most
successful model to describe the distribution of wet spells, the model
with the fewest parameters is always of main interest in this field of
study. However, not all one- or two-parameter models can be fitted to
the distribution of interest. For example, the results in Table 4
indicated that the wet spells data at the A. Pedu station could not be
fitted by either the one- or the two-parameter models. In this situation,
a model with a higher number of parameters is recommended. The results
revealed that all the three-parameter models except GMLS successfully
fitted the wet spells at the A. Pedu station. Table 4 shows
that the newly proposed model, MGTPD, was the only model that successfully
fitted the distribution of wet spells at all five selected stations.
Additionally, MGTPD successfully fitted the data at the J. Bahru and
Sitiawan stations, which could not be fitted by the existing
probability model, MGPD. Moreover, the results of the present study
showed that the performance of GMLS, which is a three-parameter model,
could not be compared with that of the other models, including MGTPD, at
the J. Bahru station because no real value of the parameter p could be
obtained. In the present study, the best fitting model for the
distribution of wet spells at each of the five selected rainfall stations
over Peninsular Malaysia was the one with the largest probability,
associated with the lowest chi-square value, as shown in bold in Table 4.
Table 4: The chi-square goodness-of-fit test of various types
of distributions for the wet spells in each rainfall station over
Peninsular Malaysia
The shaded areas and bold face indicate the successful
and best fitted models at each rainfall station, respectively. na:
No solution exists for the GMLS because of the imaginary term of the
parameter p. Shaded: successfully fitted at the 5% level of significance
Table 5: The main features of the various types of probability
models fitted to the length of wet spells at each of the five selected
rainfall stations for the period of 1975 to 2004
na: No solution exists for the GMLS because of the imaginary
term of the parameter p
The findings of the present study indicated that the newly proposed
model, MGTPD, not only successfully fitted all the data sets but also
showed the best fit to the wet spells data at the J. Bahru and A. Pedu
stations. The main characteristics of the theoretical frequencies of wet
spells produced by MGTPD slightly underestimated or overestimated the
observed values at each of the five selected rainfall stations.
The overall findings, however, indicated that the newly proposed model,
MGTPD, was better than the existing probability model, MGPD (Table 5).
CONCLUSION
The development of rainfall occurrence models will benefit many areas,
not only for data generation purposes but also in managing water resources
for the hydrological and agricultural planning sectors. Building on a
modification of the Poisson model, known as the truncated Poisson
distribution, and on the success of the one-parameter GD in representing
the sequences of wet (dry) days, the new probability model, MGTPD, is
proposed as an alternative probability model for daily rainfall occurrence
in the present study. The findings indicated that the MGTPD was superior
to the existing probability model, MGPD, in describing the distribution
of wet spells at each of the five selected rainfall stations for the period
of 1975 to 2004. The MGTPD successfully fitted all the data
sets used in the present study. Moreover, the data sets from two of the
five selected rainfall stations could not be fitted successfully by the
existing MGPD but were fitted successfully by the MGTPD. The new
probability model proposed in the present study is an alternative
model developed to advance rainfall modeling, particularly of the
sequences of wet (dry) days. This model can be applied
not only to the whole of Malaysia but also to other regions of the world,
by considering different monsoon seasons and different threshold values.
ACKNOWLEDGMENTS
The authors are indebted to the staff of the Drainage and Irrigation
Department and the Malaysian Meteorological Department for providing the
daily rainfall data for this study. They also express their sincere
appreciation to the reviewers for their valuable comments and suggestions.
This research was funded by the Universiti Kebangsaan Malaysia
(UKM-GUP-PI-08-34-318).