INTRODUCTION
Groundwater is one of the major water resources, especially in small tropical
islands. Small tropical islands have limited sources of freshwater, no surface
water and are fully reliant on rainfall and groundwater recharge. The disturbance
caused by over-exploitation of freshwater through pumping wells creates an
imbalance in the recharge-discharge equilibrium, resulting in drawdown
of the water table or upconing of saltwater (Kristie,
2007). This is often exacerbated by insufficient recharge to the freshwater
aquifer, which can occur in times of rainfall deficiency. The freshwater aquifer floats
on top of the saltwater at the interface because of the density difference between the two
water bodies. The saltwater tends to form a wedge under the freshwater
that extends inland. As saltwater intrusion occurs, this wedge extends further
inland and is seen at shallower depths. The result is that wells that previously
produced freshwater can see an increase in chloride concentration that makes
the well unusable for irrigation or potable uses (Baharuddin
et al., 2009).
Saltwater intrusion has been defined as the “landward and upward displacement
of the freshwater-saltwater interface in coastal aquifers” (Knighton et al., 1991)
and as the invasion of fresh or brackish surface water or groundwater by water
with higher salinity. Salinity can be explained by the total of all non-carbonate
salts dissolved in water. Salinity is a measure of the total salt concentration,
comprised mostly of Na^{+} and Cl^{-} ions. Although there are
smaller quantities of other ions in seawater (e.g., K^{+}, Mg^{2+}
or SO_{4}^{2-}), sodium and chloride ions represent about
91% of all seawater ions (Al-Naeem, 2008). Salinity is
an important measurement where freshwater mixes with seawater
(Abdullah et al., 2011). Chloride, an ion of
the element chlorine, is naturally abundant in seawater. High chloride
concentrations are often used as an indicator that seawater intrusion is occurring
at a well, but they are not conclusive confirmation (Naeem
et al., 2007).
In general, the Total Dissolved Solids (TDS) concentration is the total amount of
cations (positively charged ions) and anions (negatively charged ions) in the
water. Thus, the salinity of the water can be determined from the TDS concentration
(Mitra et al., 2007). Electrical Conductivity
(EC) is a useful indicator of TDS because the conduction of current in an electrolyte
solution is primarily dependent on the concentration of ionic species. EC is
proportional to the sum of cations and anions and roughly equivalent to TDS
in water. Solids can be found in nature in a dissolved form. Salts that dissolve
in water break into positively and negatively charged ions. Conductivity is
the capability of water to conduct an electrical current and the dissolved ions
are the conductors (Alslaibi et al., 2011). The
mixing of fresh water and saline water is best monitored
by measuring the electrical conductivity. EC monitoring is also used for separating
stream hydrographs and for geophysical mapping of contaminated groundwater. For example,
distilled water typically has an EC of less than 0.3 μS
cm^{-1}, whereas in groundwater EC values greater than 500 μS
cm^{-1} indicate that the water may be polluted. The EC value of drinking
water should be no more than 2500 μS cm^{-1}. Water with a higher
TDS may have water quality problems and be unpleasant to drink (Thirumalini
and Joseph, 2009). Major ions such as Na^{+},
Cl^{-}, K^{+}, Mg^{2+} and SO_{4}^{2-}
play a significant role in classifying and assessing groundwater
quality. These ions have the ability to carry an electric
current, so the more dissolved ionic solutes in water, the greater its EC.
Warm weather in small tropical islands
can increase the water salinity through evaporation. This can
lead to more widespread and severe groundwater quality problems in small
tropical islands.
MATERIALS AND METHODS
Study area: Currently, small tropical islands known as tourist attractions depend entirely on shallow-aquifer groundwater supply, mainly for domestic usage and washing purposes. Increased demand from residents and tourists imposes great pressure on the available groundwater resources. Dug wells are used to extract groundwater from the sandy aquifer. Groundwater is pumped routinely using a water pump with an integrated water level meter. Thus, the aquifer in this small tropical island will gradually become vulnerable to seawater intrusion.
Data collection: A total of 59 groundwater samples were collected monthly
from October 2008 to March 2009 on Manukan Island, a small tropical island in
Sabah, East Malaysia, located in the South China Sea. The groundwater samples were
collected from 10 boreholes, which were drilled using a hand auger and aligned perpendicular
to the coastline. The depth of the boreholes ranged from 1.5 to 3 m.

Fig. 1: Cross section (A-A’) of boreholes installed
The boreholes along cross section A-A’ were installed perpendicular to
the coastline, extending inland over a distance of 130 m, and their proximity to the
sea is in the order of B1 to B10, as depicted in Fig. 1.
The groundwater was extracted from the boreholes using a portable vacuum pump connected to 0.3 inch polyethylene tubing. Groundwater was allowed to run for approximately 10 min in order to purge several borehole volumes. The main reason was to remove stagnant water and allow representative groundwater to be sampled. Prior to each sample collection, the bottles were rinsed thoroughly with groundwater from the boreholes. In situ parameters such as EC were measured on site. Water samples to be sent for laboratory analysis of anions and cations were collected in one-litre Polyethylene (PE) bottles. After filling, the bottles were capped tightly, labeled and stored in a cooler box.
STATISTICAL ANALYSES
Preliminary study: The dataset used in this paper had undergone a preliminary
study using Factor Analysis (FA). Factor analysis was performed on a
subset of 19 selected variables (pH, EC, Ca, Mg, Na, K, HCO_{3}, Cl,
SO_{4}, H_{4}SiO_{4}, Al, Ba, Be, Fe, Li, Mn, Pb, Se
and Sr) that represented the overall groundwater chemistry framework. Five
factors were extracted from the rotated component matrix. High positive loadings
of Na, EC, SO_{4}, Li, K, Mg and Cl on Factor 1 indicated that the groundwater
chemical composition was largely influenced by a marine signature, as these ions
are predominant in seawater (Voudouris et
al., 2000). For the Multiple Regression Analysis (MRA), only parameters
from Factor 1 are used. Two dummy variables (Tides and Borehole position)
were created and included in the model building process, as shown in
Table 1.
This set of variables was first analysed using MRA without any interaction
terms. Only single variables were used, and the final
model (M63.0.6) showed that only Sodium and Borehole Position have a significant
effect in EC estimation. The details of this study are explained in Lin
et al. (2012).
Multiple regression (MR) models with interaction: Multiple regression
analysis, a form of general linear modelling (Hair et al.,
2010), is a statistical technique that can be used to analyze the relationship
between a single dependent (criterion) variable and several independent (predictor)
variables.
Table 1: Description of variables involved in the models

The objective of regression analysis is to predict a single Dependent Variable
(DV) from the knowledge of one or more Independent Variables (IVs). Interaction
effects represent the combined effects of variables on the criterion or dependent
measure. When an interaction effect is present, the impact of one variable depends
on the level of the other variable. Part of the power of MR is the ability to
estimate and test interaction effects when the predictor variables are either
categorical or continuous. As Pedhazur and Schmelkin (1991)
noted, the idea that multiple effects should be studied in research rather
than the isolated effects of single variables is one of the important contributions
of Sir Ronald Fisher. When interaction effects are present, interpretation
of the individual variables may be incomplete or misleading. The specific MR
model explained by Lind et al. (2005)
can be stated as follows:
Y_{i} = β_{0} + β_{1}X_{1i} + β_{2}X_{2i} + … + β_{k}X_{ki} + ε_{i}
where Y_{i} is the random variable representing the ith value of the DV Y, X_{1i}, X_{2i}, …, X_{ki} are the ith values of the IVs for i = 1, 2, …, n, β_{0}, β_{1}, …, β_{k} are the regression coefficients and ε_{i} is the error term.
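To illustrate why an interaction makes the effect of one variable depend on the level of another, consider a hypothetical model with only two predictors. This is a sketch for illustration, not the fitted model of this study; the coefficient β_{12} on the product term is an assumed notation.

```latex
\begin{align}
Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_{12} X_{1i} X_{2i} + \varepsilon_i \\
\frac{\partial E(Y_i)}{\partial X_{1i}} &= \beta_1 + \beta_{12} X_{2i}
\end{align}
```

If β_{12} = 0, the effect of X_{1} on Y is the constant β_{1}; otherwise it changes with the level of X_{2}, which is why the main effects alone can be misleading.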
MODEL RESULTS
All possible models: In the development of the MR models for this
dataset, Electrical Conductivity (EC) is the Dependent Variable (DV),
denoted by Y, whereas Na (X_{1}), SO_{4} (X_{2}), Li
(X_{3}), K (X_{4}), Cl (X_{5}) and Mg (X_{6})
are the Independent Variables (IVs). Tides (T) and borehole position (P)
were included in the models as independent dummy variables. Dummy variables
were excluded during the calculation of the possible models but included in
the models before the next model building procedure was carried out. All possible
models, N, can be calculated by using the formula:
N = Σ_{j=1}^{q} j(^{q}C_{j})
where N is the number of possible models generated, q is the number of variables and j = 1, 2, …, q.
For this study, q = 6 (excluding the 2 dummy variables), so the number of possible models is:
N = 1(6) + 2(15) + 3(20) + 4(15) + 5(6) + 6(1) = 192
Table 2: Summary of all possible models

The summary of all possible models is shown in Table 2. In this study, 120 of the 192 possible models were considered for further analysis because interaction with the dummy variables can only be carried out up to the first-order interaction (shaded area). The total number of models considered in this analysis is 120 = single variable (63 models) + first-order interaction variable (57 models).
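The model counts quoted above can be checked with a short sketch. It assumes that every subset of j variables yields j models, one for each interaction order from 0 (single variables only) to j - 1; this rule is an assumption chosen because it reproduces the 63, 57 and 192 given in the text. The function names are illustrative.

```python
from math import comb


def total_possible_models(q):
    """N = sum over j of j * C(q, j), for q candidate variables."""
    return sum(j * comb(q, j) for j in range(1, q + 1))


def models_by_interaction_order(q):
    """Return {order: count}; order 0 means single variables only.

    Assumption: a model of interaction order r needs at least r + 1
    variables, so it exists for every subset of size r + 1, ..., q.
    """
    return {
        order: sum(comb(q, j) for j in range(order + 1, q + 1))
        for order in range(q)
    }


total = total_possible_models(6)
counts = models_by_interaction_order(6)
considered = counts[0] + counts[1]   # single + first-order interaction
remaining = total - considered       # left for higher-order analysis

print(total, counts, considered, remaining)
```

Under this assumption the single-variable and first-order groups together give the models analysed here, and the difference from the total matches the number of models left for higher-order analysis.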
Selected models: Multicollinearity is the intercorrelation of IVs. A
higher correlation coefficient increases the standard error of the beta
coefficients and makes assessment of the unique role of each independent variable
difficult or impossible. Multicollinearity exists if the correlation coefficient
is greater than 0.95. The Zainodin-Noraini multicollinearity remedial procedures were applied
and details are explained in Abdullah et al. (2011)
and Zainodin et al. (2011). Pearson correlation
analysis verified the existence of multicollinearity between IVs
in M116, and seven variables (X_{1}P, X_{2}X_{3}, X_{1}T,
X_{2}X_{6}, P, X_{3}X_{5}, X_{3}X_{6})
were eliminated from the model (giving M116.7.0).
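The screening step described above can be sketched as a pairwise Pearson correlation check against the 0.95 threshold. This is a minimal illustration only: the data values and the helper names are hypothetical, the absolute value of the coefficient is used as an assumption, and the remedial procedure that decides which variable of a flagged pair to remove is not implemented here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.95):
    """List (name_a, name_b, r) for every pair with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical values, for illustration only (not the study data).
data = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X5": [2.1, 3.9, 6.2, 8.0, 9.9],   # nearly proportional to X1
    "X3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = collinear_pairs(data)
print(flagged)
```

Only the X1 and X5 pair is flagged in this toy data, since the other pairs are weakly correlated.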
Next, the coefficient test is carried out to eliminate
insignificant variables using backward elimination, as shown by Abdullah
et al. (2008). To justify the removal of the insignificant variables, the
Wald test (Ramanathan, 2002) is applied to the
possible models upon completion of the elimination procedure.
In this step, a total of 14 insignificant variables were eliminated
from model M116.7.0. At the end of this phase, only six variables were
left in the model (i.e., model M116.7.14). Table 3 shows the
variables entered before the elimination procedure and the variables remaining
after the elimination of insignificant variables.
Eight criteria of model selection (8SC): Identification of the best
model is based on the Eight Selection Criteria (8SC), as shown in Abdullah
et al. (2011). The objective is to determine the model with the lowest
value of a criterion statistic. The calculation of the criterion statistics
is based on the Sum of Squared Errors (SSE), the number of estimated parameters
and the sample size. Table 4 shows the details of each model
selection criterion.
In Table 4, n is the number of observations, (k+1) is the number of model
parameters and SSE is the sum of squared errors. The Akaike Information Criterion
(AIC) (Akaike, 1974) and Finite Prediction Error (FPE)
(Akaike, 1970) were developed by Akaike. The Generalised
Cross Validation (GCV) was developed by Golub et al.
(1979), while the HQ criterion was suggested by Hannan
and Quinn (1979). The RICE criterion is discussed by Rice
(1984) and the SCHWARZ criterion by Schwarz
(1978). The SGMASQ was developed by Ramanathan (2002)
and the Shibata criterion was suggested by Shibata (1981).
Table 3: Model M116.7.0 with variables entered before the elimination procedure of insignificant variables and model M116.7.14 with variables remaining after the elimination procedure

Table 4: Eight selection criteria (8SC) for best model identification

From the 192 possible models generated during this stage of the analysis, only 67 models were selected. Models with the same SSE value and number of model parameters were grouped, and any model from such a group can be taken as the selected model. The best model was then chosen from the selected models by using the 8SC, based on the majority of least values, as shown in Table 5. The best model selected is M116.7.14.
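The selection step can be sketched as follows. The criterion formulas are written in the forms commonly given in Ramanathan (2002), each as SSE/n multiplied by a penalty factor; they are assumed here to correspond to Table 4. The model names and SSE values in the example are hypothetical, and the majority rule is a plain vote count over the eight criteria.

```python
from math import exp, log


def eight_selection_criteria(sse, n, p):
    """Return the eight criteria; p = k + 1 estimated parameters."""
    base = sse / n
    return {
        "AIC": base * exp(2 * p / n),
        "FPE": base * (n + p) / (n - p),
        "GCV": base * (1 - p / n) ** -2,
        "HQ": base * log(n) ** (2 * p / n),
        "RICE": base / (1 - 2 * p / n),
        "SCHWARZ": base * n ** (p / n),
        "SGMASQ": base / (1 - p / n),
        "SHIBATA": base * (n + 2 * p) / n,
    }


def best_by_majority(models):
    """models: {name: (sse, n, p)}. Return (winner, votes per model)."""
    scores = {name: eight_selection_criteria(*args)
              for name, args in models.items()}
    votes = {name: 0 for name in models}
    for criterion in next(iter(scores.values())):
        # The model with the least value wins this criterion.
        winner = min(scores, key=lambda name: scores[name][criterion])
        votes[winner] += 1
    return max(votes, key=votes.get), votes


# Hypothetical candidates: (SSE, n, number of parameters).
candidates = {"A": (100.0, 50, 5), "B": (98.0, 50, 12)}
best, votes = best_by_majority(candidates)
criteria_a = eight_selection_criteria(*candidates["A"])
print(best, votes)
```

In this toy comparison the model with the slightly larger SSE but far fewer parameters wins on every criterion, which illustrates how the criteria penalise model size.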
Table 5: Value of 8SC for all selected models

Best model verification: For the Wald test, the complete model (M116) was taken as the initial possible model and M116.7.14 as the reduced model. The complete (C) model (M116):
The reduced (R) model (M116.7.14):
Hypothesis:
H_{0}:β_{1} = β_{2} = β_{5} = β_{15} = β_{23} = β_{26} = β_{35} = β_{36} = β_{56} =β_{T} = β_{P} = β_{1T} = β_{2T} = β_{5T} = β_{6T} = β_{1P} = β_{2P} = β_{5P} = β_{6P} = 0
H_{1}: At least one of the β's is non-zero
Decision:
The critical value from the F distribution is F_{table} = F_{0.05, 28, 49} = 1.80 and the calculated value is F_{cal} = 0.8358. Since F_{cal} is less than F_{table}, the decision is not to reject H_{0}. The removal of insignificant variables in the coefficient test is therefore justified.
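The calculation behind such a decision can be sketched with the usual Wald F statistic, which compares the SSE of the reduced and complete models. The SSE values and degrees of freedom in the first example are hypothetical, since they are not quoted in the text; the decision at the end uses the F_{cal} and F_{table} values reported above.

```python
def wald_f_statistic(sse_reduced, sse_complete, n_restrictions,
                     n_obs, n_params_complete):
    """F = [(SSE_R - SSE_C) / m] / [SSE_C / (n - k - 1)].

    m is the number of coefficients set to zero under H0 and
    n_params_complete is k + 1 for the complete model.
    """
    numerator = (sse_reduced - sse_complete) / n_restrictions
    denominator = sse_complete / (n_obs - n_params_complete)
    return numerator / denominator


def reject_h0(f_calculated, f_critical):
    """Reject H0 only when the calculated F exceeds the critical F."""
    return f_calculated > f_critical


# Hypothetical numbers, for illustration only.
f_example = wald_f_statistic(sse_reduced=120.0, sse_complete=100.0,
                             n_restrictions=4, n_obs=60,
                             n_params_complete=10)

# Decision using the values reported in the text.
decision = reject_h0(f_calculated=0.8358, f_critical=1.80)
print(f_example, decision)
```

A result of False for the decision means H_{0} is not rejected, so the reduced model is not significantly worse than the complete one.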
The final phase of model building is applying goodness-of-fit tests to the final best model. The goodness-of-fit comprises a randomness test and a normality test. The randomness test determines whether the residuals are randomly distributed, and the normality test, based on the Kolmogorov-Smirnov statistic, ensures that the normality assumption is not violated. For the Runs Test, the test value is 3.7767 and Z = 0.192 with Asymp. Sig. (2-tailed) = 0.847 > 0.05; therefore, H_{0} is not rejected, which supports the conclusion that the residuals are randomly distributed. Since the Kolmogorov-Smirnov statistic (0.192) gives a p-value of 0.200 > 0.05, H_{0} is not rejected, so there is no evidence at the 0.05 significance level that the standardized residuals deviate from normality. This statement is supported by the scatter plot and histogram in Fig. 2.
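The randomness check can be sketched with a runs test based on the normal approximation. This is a minimal illustration under stated assumptions: values are split at a cut point (the median by default, which may differ from the software setting used in the study), values equal to the cut are counted as above it, and the residuals in the example are hypothetical.

```python
from math import erfc, sqrt
from statistics import median


def runs_test(values, cut=None):
    """Return (number of runs, Z, two-tailed p-value).

    Requires at least one value on each side of the cut point.
    """
    if cut is None:
        cut = median(values)
    above = [v >= cut for v in values]
    n1 = sum(above)                 # values at or above the cut
    n2 = len(above) - n1            # values below the cut
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-tailed normal probability
    return runs, z, p_value


# Hypothetical residuals that alternate in sign (clearly non-random).
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
runs, z, p_value = runs_test(residuals, cut=0.0)
print(runs, z, p_value)
```

A p-value above 0.05, as reported for the model residuals, would leave H_{0} of randomness unrejected; the alternating toy sequence here gives a small p-value instead.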
From here, the best regression model would therefore be represented by:
where X_{3} is Lithium, X_{6} is Magnesium, X_{1}X_{2}
is the interaction between Sodium and Sulphate, X_{1}X_{3} is the interaction
between Sodium and Lithium, X_{1}X_{6} is the interaction between
Sodium and Magnesium and X_{2}X_{5} is the interaction between Sulphate and Chloride.
These interaction factors could be considered to reflect ion-exchange reactions
between groundwater and the aquifer matrix, corresponding to the positive loadings,
especially between Sodium and Sulphate.

Fig. 2(a-b): (a) Standardized residual scatter plot and (b) Histogram with normal curve
Sodium played a substantial role in controlling the behaviour of Sulphate
and Magnesium in the shallow aquifer which will increase the salinity (EC level).
Model accuracy measurement: The Mean Absolute Percentage Error (MAPE)
is commonly used in quantitative forecasting methods because it produces a measure
of relative overall fit. The absolute values of all the percentage errors are
summed and the average is computed (Levy and Lemeshow,
1991). In this study, MAPE is used to verify the best model obtained. It
expresses accuracy as a percentage and is defined by the formula:
MAPE = (100/a) Σ_{t=1}^{a} |(A_{t} - F_{t})/A_{t}|
where A_{t} is the actual value and F_{t} is the forecast (estimated) value. The difference between A_{t} and F_{t} is divided by the actual value A_{t}. The absolute value of this ratio is summed over every fitted or forecast point and divided by the total number of fitted points, a. In this case a = 3, the number of observations reserved for this purpose. In general, a MAPE of 10% is considered very good, while a MAPE in the range 11-25% or even higher is quite common. The lower the MAPE value, the better the model can be used for forecasting or estimating missing values. By substituting the remaining observations that were not included in the model building analysis, the value of MAPE obtained is 2.022%. This value indicates that the model is well suited to estimating missing values or forecasting.
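The MAPE calculation described above can be sketched in a few lines. The actual and estimated EC values below are hypothetical and serve only to show the arithmetic for a = 3 reserved observations; the function name is illustrative.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Each actual value must be non-zero because it is the divisor.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values for three reserved observations (a = 3).
actual_ec = [100.0, 200.0, 400.0]
estimated_ec = [110.0, 190.0, 400.0]
error_percent = mape(actual_ec, estimated_ec)
print(error_percent)
```

Here the three percentage errors are 10%, 5% and 0%, so the average is 5%.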
CONCLUSION
EC is widely used for monitoring the mixing of fresh and saline water.
Groundwater with a high EC level is not appropriate for drinking because
of its high salinity and elevated concentrations of several minor elements. In
this study, the model obtained clearly shows the contribution of each parameter
in determining the EC level. Na is a dominant ion of seawater, and high levels of
Na ions in coastal groundwater may indicate a significant effect of seawater
mixing. However, Na is not independently significant in the estimation of EC
level; instead, the interactions Na-SO_{4}, Na-Li and Na-Mg have
a significant impact in the model. Two dummy variables (Tides and Borehole position)
were created and included in the model building process but were
eliminated during modelling, as they do not show any
significant effect in the estimation of EC level. The application of the model to
such an island proved useful in demonstrating the mechanism of seawater intrusion
when monitoring water quality. The use of variable interaction effects in
the statistical model, especially for environmental datasets, has shown a significant
impact consistent with environmental theory. The interpretation of environmental
theory supported by statistical modelling plays an important role in
problem solving and decision making. For further analysis, the remaining
72 models (Table 2) will be analysed using MRA with higher-order
interactions. The highest interaction that can be considered for this dataset
is the fifth order. With higher-order interaction effects, the model is expected
to give more significant results.
ACKNOWLEDGMENT
This study is financially supported by the Ministry of Science, Technology and Innovation, Malaysia (under Science Fund Grant No 040110SF0065). The authors thank Mr. Lin Chin Yik and his project team members for providing the data for this research. The authors would also like to thank the anonymous reviewers for their useful comments and enlightening ideas.