Journal of Medical Sciences

Year: 2013 | Volume: 13 | Issue: 7 | Page No.: 526-536
DOI: 10.3923/jms.2013.526.536
Risk Factors Determination on UGIB Patients in Kota Kinabalu, Sabah, Malaysia
A. Noraini, H.J. Zainodin and L.B. Rick

Abstract: Upper Gastrointestinal Bleeding (UGIB) is one of the common causes of medical emergencies in Malaysia. UGIB is of two types: variceal bleeding and non-variceal bleeding. Although UGIB has been frequently researched in general, little research has addressed the specific factors that affect the type of bleeding. Hence, this research aims to determine these factors via a Multiple Binary Logit (MBL) approach based on patient information obtained from the Queen Elizabeth Hospital in Kota Kinabalu. Of the twenty-eight independent variables, three are quantitative and are analyzed directly, while twenty-five are qualitative and are transformed into dummy variables. Four phases of a model-building approach are executed to obtain the best model. The three quantitative variables (age, systolic blood pressure and Blatchford score) interact with other dummy variables to affect the type of bleeding. The main contributing factors to the type of bleeding are identified as race, blood urea nitrogen and medical shock, since they require no interactions with the other variables in the models.


How to cite this article
A. Noraini, H.J. Zainodin and L.B. Rick, 2013. Risk Factors Determination on UGIB Patients in Kota Kinabalu, Sabah, Malaysia. Journal of Medical Sciences, 13: 526-536.

Keywords: dummy, binary logit, model-building, gastrointestinal bleeding and best model

INTRODUCTION

Despite great advances in diagnostic and therapeutic techniques, acute upper gastrointestinal bleeding remains a common reason for hospital admission, with estimated mortality and rebleeding rates of 4-14% and 10-30%, respectively (Bessa et al., 2006). The two types of bleeding, variceal and non-variceal, require different treatments, although both generally present clinically with hematemesis. Patients may be sensitive or allergic to certain treatments, which can cause death, morbidity and extra costs (Demoly and Bousquet, 2002). Errors in medication allergy documentation have been shown to be frequent, and they may increase when summarized information is used (Pohl et al., 2010). Together, these problems can cause major harm when a patient is misdiagnosed. Furthermore, there is a general lack of awareness regarding the bleeding: alcoholics are a high-risk population for upper gastrointestinal bleeding, yet they have very poor awareness of how to respond correctly to hematemesis and melena (Gundling et al., 2007). There is thus a need to understand the factors affecting the type of bleeding so as to support better self-care. Since these factors differ by geographical location (Sarin and Negi, 2006), they need to be determined at a local level.

According to Straube et al. (2009), overall mortality in patients with upper gastrointestinal bleeding and perforation has decreased gradually since 1997 but still remains at 1 in 13 cases. However, mortality was higher, at 1 in 7 cases, among patients exposed to high NSAID usage, and the reasons for this remain unknown.

Such ambiguities leave much room for further research. Identifying the factors affecting each type of bleed allows further analysis of the nature of these bleeds, from which deductions can be made about how these factors relate to the local people and their way of life. For example, the variables 'Race of Patient' or 'Age' may reveal which groups of people are at higher risk of a particular type of bleed. Interaction variables, in turn, may show how factors work together to form risk factors for the bleeding.

As mentioned earlier, problems can arise from misdiagnosis and drug allergy. It is thus imperative for medical practitioners to have adequate evidence and to diagnose accurately, so as to avoid harming patients beyond the harm already caused by the gastrointestinal bleeding itself (Kostopulou et al., 2008). Early and precise diagnosis and endoscopy are also important for addressing the condition in its initial stages and shortening the hospitalization period (Sridhar et al., 2009; Kato and Asaka, 2010, 2012). Hence, this paper highlights the importance of identifying these risk factors through a mathematical modelling process using a Multiple Binary Logit (MBL) model.

MATERIALS AND METHODS

The logit model is based on the cumulative logistic distribution function, which gives probability estimates bounded between 0 and 1 (Zainodin and Khuneswari, 2010). The Multiple Binary Logit (MBL) model can be represented as:

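$$P_i = \frac{1}{1 + e^{-Y_i}} = \frac{e^{Y_i}}{1 + e^{Y_i}} \qquad (1)$$

$$1 - P_i = \frac{1}{1 + e^{Y_i}} \qquad (2)$$

$$\ln\left(\frac{P_i}{1 - P_i}\right) = Y_i \qquad (3)$$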

where Pi denotes the probability that the event occurs for the ith observation, i = 1, 2, ..., n, and (1 - Pi) is its complement, the probability that the event does not occur. In the model equation:

$$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + u$$

where Yi is the binary dependent variable, with '1' denoting an incidence of non-variceal bleeding and '0' an incidence of variceal bleeding; Xj denotes the jth independent variable, which may be a single independent variable, an interaction variable (first order, second order, third order, ...), a generated variable (polynomial or dummy variable) or a transformed variable (Ladder Transformation, Box-Cox Transformation); β0 denotes the constant term of the model; βj denotes the jth coefficient, that of independent variable Xj; and u denotes the error term or residuals of the model. The model has k independent variables and (k+1) parameters, i.e., the k coefficients plus the constant term.
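A minimal Python sketch of Eqs. 1-3, assuming the standard logistic link: it maps a linear predictor Yi = β0 + β1X1 + ... + βkXk to the probability Pi of non-variceal bleeding, and maps a probability back to the log-odds. The function names and the coefficient values in the usage lines are hypothetical illustrations, not estimates from this study.

```python
import math


def linear_predictor(beta, x):
    """Yi = beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1]; beta[0] is the constant."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def probability(y_lin):
    """Eq. 1: Pi = 1 / (1 + exp(-Yi)), written so that large |Yi| cannot overflow."""
    if y_lin >= 0:
        return 1.0 / (1.0 + math.exp(-y_lin))
    z = math.exp(y_lin)
    return z / (1.0 + z)


def log_odds(p):
    """Eq. 3: ln(Pi / (1 - Pi)), the inverse of probability()."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients (constant, X1, X2) and one hypothetical patient.
beta = [-1.0, 0.02, 0.5]
x = [60.0, 1.0]
p = probability(linear_predictor(beta, x))
print(round(p, 4))
```

Under the '1 = non-variceal' coding above, p is the estimated probability of non-variceal bleeding and 1 - p that of variceal bleeding (Eq. 2).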

Data preparation, transformation and description: Data were obtained from 489 UGIB patients of the Queen Elizabeth Hospital (QEH) in Kota Kinabalu in 2008, of whom 22 were diagnosed with non-variceal bleeding and 467 with variceal bleeding (Chin et al., 2008).

The data set contained ninety-seven types of information, ranging from personal details such as age and race, through diagnoses such as the type of bleeding, to the medical treatments administered, all encoded in categorical form. In this study, 16 of these were selected: one dependent variable (type of bleeding), three quantitative independent variables (systolic blood pressure score, age in years and Blatchford score) and 12 qualitative independent variables. The 12 qualitative independent variables were transformed into dummy variables taking values '0' and '1', giving 33 dummy variables. For practicality, dummy variables with zero occurrences, a low frequency of cases or a small variance were removed to reduce the chance of multicollinearity, leaving a total of 23 dummy variables for further analysis.

Model building phases: The model-building phases for multiple regression analysis were introduced by Zainodin et al. (2011). The model-building approach for the multiple binary logit model follows similar procedures, except that the eight selection criteria (8SC) are replaced by the modified eight selection criteria (M8SC) and the goodness-of-fit tests are based on the deviance and Pearson chi-square tests, as summarized in Fig. 1. The model-building procedures are briefly explained as follows:

Phase 1: All possible models: All possible models, consisting of the different combinations of the single variables and their interactions, are listed for consideration. The number of all possible models is given by:

$$N = \sum_{j=1}^{q} j \binom{q}{j}$$

where N denotes the number of all possible models, q is the number of single independent variables excluding the dummy variables, and j = 1, 2, 3, ..., q (Zainodin and Khuneswari, 2010). Table 1 summarizes the number of possible models for 3 single independent variables.
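The counting rule can be checked by brute-force enumeration. The sketch below assumes (our reading, not stated explicitly in the text) that a model built from j single variables may include interactions up to order j - 1, so each subset of j variables contributes j models; this reproduces N = 12 for q = 3 and is equivalent to N = q·2^(q-1). The enumeration order need not match the M1 to M12 numbering of Table 1, dummy variables are omitted as in the formula, and `all_possible_models` is a hypothetical helper.

```python
from itertools import combinations


def all_possible_models(variables):
    """Enumerate candidate models for q single quantitative variables.

    For every subset of j variables, build j models: singles only, then
    singles plus first-order interactions, and so on up to the (j-1)th-order
    interaction, giving N = sum_j j * C(q, j) models in total.
    """
    models = []
    for j in range(1, len(variables) + 1):
        for subset in combinations(variables, j):
            for order in range(j):  # 0 = no interactions, 1 = first order, ...
                terms = [
                    "".join(combo)
                    for size in range(1, order + 2)
                    for combo in combinations(subset, size)
                ]
                models.append(terms)
    return models


models = all_possible_models(["X1", "X2", "X3"])
print(len(models))   # 12, matching N for q = 3
print(models[-1])    # the full model with all interactions
```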

Phase 2: Selected models: All the possible models listed in Phase 1 go through a variable filtering process to obtain the selected models (Zainodin and Khuneswari, 2010). The parameter tests include the Global test, the multicollinearity test, the coefficient test and the Wald test, and the models at each stage are labeled as in Fig. 2.

Table 1: Summary of all possible models

Fig. 1: Model building summary

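Fig. 2: Model labeling (Zainodin et al., 2011), where M stands for the multiple binary logit model, a is the model number, b is the number of multicollinearity source variables removed in the multicollinearity test and c is the number of insignificant variables eliminated in the coefficient test. For example, M11.38.46 denotes model M11 after the removal of 38 multicollinearity source variables and the elimination of 46 insignificant variables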

The existence of multicollinearity source variables can be detected from their correlation coefficients (Wheeler and Tiefelsdorf, 2005). In this research, multicollinearity is defined as an absolute correlation coefficient greater than or equal to 0.95. Multicollinearity source variables are identified from a Pearson correlation matrix, which can be generated in many statistical packages, from Microsoft Excel to SPSS. They are removed one at a time, and the Pearson correlation matrix is recalculated after each removal. The Zainodin-Noraini multicollinearity removal procedure applies the following cases in descending order of precedence (Zainodin et al., 2011):

Case A: The variable with the highest frequency of multicollinearity is removed first
Case B: Two or more variables share the same highest frequency of multicollinearity. The absolute correlation of each of these variables with the dependent variable is considered, and the variable with the lowest absolute correlation is removed
Case C: Variables share a common frequency of multicollinearity of 1. There may be one or more such ties

For Case C, the highest absolute correlation value is identified and the two variables corresponding to it are compared. The variable with the lower absolute correlation with the dependent variable is removed.
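A minimal numpy sketch of the removal loop, assuming the 0.95 threshold above and applying Case A, Case B and Case C in that order. `remove_multicollinearity` is a hypothetical helper written for illustration, not the software used by the authors.

```python
import numpy as np


def remove_multicollinearity(X, y, names, threshold=0.95):
    """Drop multicollinearity source variables one at a time (Cases A-C).

    X: (n, p) array of independent variables, y: length-n dependent variable,
    names: column labels. Returns the reduced X, kept names and removed names.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    names = list(names)
    removed = []
    while X.shape[1] > 1:
        r = np.corrcoef(X, rowvar=False)       # recalculated after every removal
        np.fill_diagonal(r, 0.0)
        freq = (np.abs(r) >= threshold).sum(axis=1)
        if freq.max() == 0:
            break                              # no multicollinearity left
        # Absolute correlation of each variable with the dependent variable.
        r_y = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        if freq.max() > 1:
            # Case A (unique maximum frequency) or Case B (tie on the maximum).
            candidates = np.flatnonzero(freq == freq.max())
        else:
            # Case C: every frequency is 1; compare the pair with the highest |r|.
            i, j = np.unravel_index(np.argmax(np.abs(r)), r.shape)
            candidates = np.array([i, j])
        drop = candidates[np.argmin(r_y[candidates])]
        removed.append(names.pop(drop))
        X = np.delete(X, drop, axis=1)
    return X, names, removed
```

With a single near-duplicate pair, Case C applies and the member of the pair that is less correlated with the dependent variable is dropped; Cases A and B only come into play when one variable is collinear with several others.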

The coefficient test is performed on the reduced model after the multicollinearity source variables have been removed. It eliminates insignificant variables one at a time using the backward elimination method (Abdullah et al., 2008).
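A sketch of the coefficient test as backward elimination, assuming that each coefficient is judged by its two-sided Wald z-test at α = 0.05 and that the model is refitted after every single removal. The Newton-Raphson fit and the helper names are our own numpy illustration, not the software used in the study.

```python
import math

import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of a binary logit model by Newton-Raphson.

    X must already contain a leading column of ones for the constant term.
    Returns the coefficient estimates and their standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def backward_eliminate(X, y, names, alpha=0.05):
    """Refit and drop the variable with the largest Wald p-value until every
    remaining coefficient is significant at level alpha."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while names:
        design = np.column_stack([np.ones(len(y)), X])
        beta, se = fit_logit(design, y)
        z = beta[1:] / se[1:]                       # constant term is never removed
        pvals = [math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z]  # two-sided
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break
        names.pop(worst)
        X = np.delete(X, worst, axis=1)
    return names
```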

Table 2: Modified eight selection criteria (M8SC)

After all the selected models are obtained, the Wald test is carried out to justify the removal of the insignificant variables (Zainodin and Khuneswari, 2010).
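For reference, a standard form of the Wald test for jointly removing r variables is sketched below. The notation is ours: β̂_R holds the estimated coefficients of the r removed variables in the larger model, with their estimated covariance matrix. The exact statistic used by Zainodin and Khuneswari (2010) may differ; a deviance-difference (likelihood-ratio) comparison of the two nested models is a common alternative.

```latex
H_0:\ \boldsymbol{\beta}_R = \mathbf{0}
\qquad
W = \hat{\boldsymbol{\beta}}_R^{\top}
    \left[\widehat{\operatorname{Var}}\left(\hat{\boldsymbol{\beta}}_R\right)\right]^{-1}
    \hat{\boldsymbol{\beta}}_R
    \;\overset{a}{\sim}\; \chi^2_{r}
```

Removing the r variables is supported when W does not exceed the upper-α critical value of χ² with r degrees of freedom.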

Phase 3: Best model selection: The best model is chosen from among the selected models based on the Modified Eight Selection Criteria (M8SC). The M8SC replace the sum of squared errors (SSE) with the deviance statistic (Zainodin and Khuneswari, 2010). Following Kutner et al. (2008), the deviance, a likelihood ratio test statistic, is (-2) times the log-likelihood of the fitted model:

$$G^2 = -2\sum_{i=1}^{n}\left[Y_i \ln \hat{P}_i + (1 - Y_i)\ln\left(1 - \hat{P}_i\right)\right]$$

where Yi is the ith observation of the dependent variable and $\hat{P}_i$ is the estimated probability. The sum of squared deviance residuals equals the deviance statistic; in each criterion, this quantity is multiplied by a penalty factor that depends on model complexity, measured by the number of parameters (k+1). The criteria can only be used when 2(k+1)/n < 1. Table 2 summarizes the criteria according to Zainodin and Khuneswari (2010), where G2 is the deviance statistic, n is the number of observations and (k+1) is the number of estimated parameters.
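The M8SC values can be computed directly from G2, n and k+1. The sketch below assumes that each criterion keeps the form given by Ramanathan (2002) for the original eight selection criteria, with SSE replaced by G2; these forms should be checked against Table 2 before use. The helper names and the candidate values in the usage lines are hypothetical.

```python
import math


def m8sc(g2, n, k1):
    """Modified eight selection criteria for one fitted model.

    g2: deviance statistic G^2, n: number of observations,
    k1: number of estimated parameters (k + 1).
    """
    if 2 * k1 / n >= 1:
        raise ValueError("the criteria require 2(k+1)/n < 1")
    base = g2 / n       # deviance per observation, in place of SSE/n
    r = k1 / n          # model complexity
    return {
        "AIC": base * math.exp(2 * r),
        "FPE": base * (n + k1) / (n - k1),
        "GCV": base * (1 - r) ** -2,
        "HQ": base * math.log(n) ** (2 * r),
        "RICE": base * (1 - 2 * r) ** -1,
        "SCHWARZ": base * n ** r,
        "SGMASQ": base * (1 - r) ** -1,
        "SHIBATA": base * (n + 2 * k1) / n,
    }


def best_models(candidates, n):
    """For each criterion, return the label of the model with the smallest value.

    candidates maps a model label to its (G^2, k + 1) pair.
    """
    scores = {label: m8sc(g2, n, k1) for label, (g2, k1) in candidates.items()}
    criteria = next(iter(scores.values())).keys()
    return {c: min(scores, key=lambda label: scores[label][c]) for c in criteria}


# Hypothetical candidates: a smaller deviance with fewer parameters wins everywhere.
winners = best_models({"MA": (400.0, 10), "MB": (420.0, 14)}, 477)
```

Because every criterion grows with both G2 and k+1, a model that is lower on both wins on all eight; otherwise the criteria can disagree, which is why a best model "with the least values in all" criteria is a strong result.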

Phase 4: Model goodness of fit: The appropriateness of the best model has to be tested before it is accepted for prediction. This is done with the Pearson chi-square statistic, the deviance statistic and the LOWESS method (Kutner et al., 2008). The sum of squared Pearson residuals is numerically equal to the Pearson chi-square statistic, and the sum of squared deviance residuals is numerically equal to the deviance statistic (Zainodin and Khuneswari, 2010). The LOWESS method was developed by Cleveland and Devlin (1988) as a nonparametric diagnostic measure. For each X value, it fits a weighted linear regression to the observations in a neighborhood around that value and uses the fitted value as the smoothed Y value (Kutner et al., 2008). Such smooth curves are useful for confirming the appropriateness of a chosen regression model. If the model is appropriate, $E\{Y_i\} = \hat{P}_i$ and $E\{Y_i - \hat{P}_i\} = 0$. Hence, a LOWESS smooth of the residuals plotted against the estimated probabilities $\hat{P}_i$ should be approximately a horizontal line with zero intercept. The LOWESS plots are used as supporting evidence for the goodness-of-fit tests (Zainodin and Khuneswari, 2010).
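A sketch of the two residual types for a binary logit model, using their standard definitions: the Pearson residual (Yi - P̂i)/sqrt(P̂i(1 - P̂i)) and the deviance residual, the signed square root of each observation's contribution to G2. Summing their squares gives the Pearson chi-square statistic and G2, respectively. The function names and the four-observation example are hypothetical.

```python
import math


def pearson_residuals(y, p_hat):
    """(Yi - Pi_hat) / sqrt(Pi_hat * (1 - Pi_hat)) for each observation."""
    return [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p_hat)]


def deviance_residuals(y, p_hat):
    """Signed square root of each observation's contribution to G^2."""
    out = []
    for yi, pi in zip(y, p_hat):
        contrib = -2 * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
        out.append(math.copysign(math.sqrt(contrib), yi - pi))
    return out


def sum_of_squares(residuals):
    return sum(r * r for r in residuals)


# Hypothetical observed outcomes and fitted probabilities (all strictly in (0, 1)).
y = [1, 0, 0, 1]
p_hat = [0.8, 0.3, 0.1, 0.6]
pearson_chi2 = sum_of_squares(pearson_residuals(y, p_hat))
g2 = sum_of_squares(deviance_residuals(y, p_hat))
```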

RESULTS AND DISCUSSION

The objective of this analysis is to obtain the best model describing the main factors that contribute to the type of upper gastrointestinal bleeding. Table 3 shows some of the 97 data variables available for analysis. In this research, sixteen of them were selected: one dependent variable, 3 quantitative independent variables and 12 qualitative independent variables.

The multiple binary logit model allows the 12 qualitative independent variables to be transformed into dummy variables taking values 0 and 1. After transformation, the 12 qualitative variables became 33 dummy variables. It is thus important to reduce the number of variables for ease and practicality of analysis (Ramanathan, 2002), since multicollinearity is more likely in a model with more variables, which makes interpretation difficult. Table 4 shows some of the transformations carried out on the chosen dummy (qualitative independent) variables.

Dummy variables with zero occurrences are removed, since no patient was recorded in that category. Next, variables with an extremely low frequency of cases, and hence a very small variance, are removed. As a rule of thumb for this analysis, any variable whose frequency of cases is less than 1% of the 477 observations (those remaining after the 12 patients with missing values were excluded), or whose variance is approximately 0.01, is removed. Overall, 23 dummy variables remain for analysis.
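A minimal sketch of the dummy transformation and the 1% frequency rule. The category labels, the 200 records and the helper names are hypothetical. Listing the categories explicitly lets a category with zero occurrences produce an all-zero dummy, which the filter then drops.

```python
def make_dummies(values, name, categories=None):
    """Expand one qualitative variable into 0/1 dummy columns, one per category."""
    if categories is None:
        categories = sorted(set(values))
    return {f"{name}_{c}": [1 if v == c else 0 for v in values] for c in categories}


def drop_rare_dummies(dummies, min_share=0.01):
    """Keep only dummies whose share of 1s is at least min_share.

    Zero-occurrence dummies (share 0) are always dropped. For a 0/1 variable
    with share p the variance is p(1 - p), about 0.01 when p = 0.01.
    """
    return {col: xs for col, xs in dummies.items() if sum(xs) / len(xs) >= min_share}


values = ["a"] * 150 + ["b"] * 49 + ["c"]   # 200 hypothetical records
dummies = make_dummies(values, "Q", categories=["a", "b", "c", "d"])
kept = drop_rare_dummies(dummies)
print(sorted(kept))
```

With n = 477, the 1% threshold is 4.77 cases, so a dummy with 4 or fewer cases would be dropped under this rule.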

Table 3: Data variables and their descriptions

The remaining variables are then relabeled alphabetically, as is common in regression models: Y (dependent), X1, X2 and X3 (independent quantitative) and A, B, C, D, E, F, G, H, J, K, L, M, N, P, Q, R, S, T, U, V, W, Z and $ (independent qualitative), as in Table 4.

Descriptive statistics are then computed for the independent quantitative variables X1, X2 and X3 only (Table 5), since, according to Zainodin and Khuneswari (2010), descriptive statistics cannot be meaningfully applied to variables of a binary nature.

A model can only be called the best if it is selected from a pool of alternatives. With three quantitative independent variables (q = 3), a total of 12 possible models are obtained, and each can be estimated as long as the number of parameters in the model is less than the sample size (k+1 < n, where k+1 is the number of parameters and n denotes the sample size). In this study, the sample size n is 477. Model M11 is taken for illustrative purposes:

(4)

Table 4: Transformation on dummy variables, definitions and their descriptions

The Global test is applied to all 12 models. Since it tests whether all the regression coefficients in the model are jointly zero, it has to be executed both before and after the removal and elimination of variables, i.e., before and after the multicollinearity test and the coefficient test of each model.
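For a binary logit model, the Global test is commonly carried out as a likelihood-ratio test of all k coefficients (excluding the constant) against a constant-only model. A sketch assuming that form, where L0 and Lk are the maximized likelihoods of the constant-only and fitted models and G2_0 denotes the deviance of the constant-only model (these symbols are ours):

```latex
\begin{aligned}
H_0 &: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \\
G   &= -2\left(\ln L_0 - \ln L_k\right) = G^2_0 - G^2 \;\overset{a}{\sim}\; \chi^2_{k}
\end{aligned}
```

H0 is rejected, i.e. at least one coefficient is nonzero and the model is worth keeping, when G exceeds the upper-α critical value of χ² with k degrees of freedom.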

Table 6 summarizes the multicollinearity removals on model M11.

Thirty-eight multicollinearity source variables are removed (Table 6), so model M11 becomes model M11.38. The resulting model, now free from multicollinearity, contains the following variables:

Table 5: Descriptive statistics for independent quantitative variables (X1, X2, X3)

Table 6: Summary of multicollinearity variables removal on model M11

Table 7: Summary of variable elimination in M11.38

Table 8: Selected models with the modified eight selection criteria (M8SC)

Table 9: Summary of deviance statistics for model M11.38.46

The coefficient test then eliminates 46 insignificant variables from model M11.38, as summarized in Table 7, so that model M11.38 becomes model M11.38.46.

The selected model, M11.38.46 and its variables are thus given as follows:

(5)

All possible models undergo the model-building phases of Fig. 1 to obtain the twelve selected models shown in Table 8, which also lists the modified eight selection criteria (M8SC) for each selected MBL model. The best model is M11.38.46, which has the smallest value on all of the M8SC.

The appropriateness of the best model needs to be tested before it can be used for prediction, using the deviance statistic, the Pearson chi-square statistic and the LOWESS method. The sum of squared deviance residuals is calculated and summarized in Table 9.

In Table 9, the deviance statistic G2 (the sum of squared deviance residuals) is less than χ2critical = 514.1642819 at the α = 5% level of significance, so the null hypothesis is not rejected (Zainodin and Khuneswari, 2010). This supports the appropriateness of model M11.38.46 as the best model.

The Pearson chi-square statistic, i.e. the sum of squared Pearson residuals, is calculated as shown in Table 10.

Table 10: Pearson chi square statistic for model M11.38.46

The calculated Pearson chi-square statistic, χ2calc = 556.7857, is larger than χ2critical = 514.1642819 at the α = 5% level of significance, so the null hypothesis would be rejected at that level (Zainodin and Khuneswari, 2010). However, at the α = 0.1% level of significance, χ2critical = 562.7617, and the null hypothesis is not rejected, which supports accepting the model. Possible sources of this discrepancy include improper data categories that misrepresented the information and the small variance of the dependent variable.
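The decision rule can be reproduced without statistical tables. The sketch below uses the Wilson-Hilferty normal approximation to the χ² quantile (our choice, not the authors' method). The degrees of freedom are not stated in the text; df = 463 is our inference, chosen because it reproduces the reported critical values closely, and should be treated as an assumption.

```python
from statistics import NormalDist


def chi2_critical(df, alpha):
    """Approximate upper-alpha critical value of chi-square with df degrees of
    freedom via the Wilson-Hilferty cube-root transformation (good for large df)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def fit_not_rejected(statistic, df, alpha):
    """True when the goodness-of-fit statistic does not exceed the critical value."""
    return statistic <= chi2_critical(df, alpha)


DF = 463  # assumption: inferred from the reported critical values, not given in the text
for alpha in (0.05, 0.001):
    print(alpha, round(chi2_critical(DF, alpha), 2), fit_not_rejected(556.7857, DF, alpha))
```

The Pearson statistic 556.7857 exceeds the approximate 5% critical value but not the 0.1% one, matching the text; the deviance statistic from Table 9 would be checked the same way.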

The plots of the ordinary, deviance and Pearson residuals in Fig. 3, obtained from GraphPad Prism 5.02, provide supporting evidence for the appropriateness of model M11.38.46.

Each of the three plots in Fig. 3 shows generally two downward-sloping parallel lines of points, corresponding to the two binary values of the dependent variable, 0 and 1. Based on these plots, M11.38.46 can generally be concluded to be appropriate as the best model. Some points deviate from these parallel lines, especially in the Pearson residual plot. This is caused by the small variance of the dependent variable, since the points that move downwards steeply are those with a value of 0.

The LOWESS method (locally weighted scatter plot smoothing) is used to further support and justify the selected best model. The smoothing uses 1920 segments and three window settings in GraphPad Prism 5.02 (5, 10 and 20), where a larger value gives a finer plot.
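A pure-Python sketch of the basic LOWESS smoother: a local straight-line fit with tricube weights at each point, without Cleveland's robustness iterations. The `frac` parameter (the share of points in each neighbourhood) is our own device and does not map directly onto the Prism segment and window settings. Applied to residuals plotted against the estimated probabilities, e.g. `lowess(p_hat, residuals)`, a well-fitting model should give a smooth close to the horizontal line at zero.

```python
def lowess(x, y, frac=0.3):
    """Basic LOWESS smoother: for each x[i], fit a straight line by weighted
    least squares to its nearest neighbours (tricube weights) and return the
    fitted value at x[i]. No robustness iterations are performed."""
    n = len(x)
    r = min(n, max(2, int(round(frac * n))))   # neighbourhood size
    fitted = []
    for xi in x:
        dist = sorted(abs(xj - xi) for xj in x)
        h = dist[r - 1] or 1e-12               # distance to the r-th nearest point
        w = [max(0.0, 1.0 - (abs(xj - xi) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)                            # >= 1 because xi itself has weight 1
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted
```

Because each local fit is a straight line, data that are exactly linear are reproduced exactly, and residuals that are all zero give a smooth that lies on the zero line.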

Fig. 3(a-c): Plots for ordinary residual, deviance residual and Pearson residual

Fig. 4(a-f): Deviance residual and Pearson residual LOWESS plots

Figure 4 summarizes the LOWESS smooth plots for the Pearson and deviance residuals.

The LOWESS smooth plots (dotted black lines) in Fig. 4 do not follow the horizontal lines, although the trends do have a zero intercept. This inconsistency may stem from problems with the data set, namely missing data and the small variance of the dependent variable, which cause errors in the results obtained. However, these plots serve only as a supporting check that suggests the shape of the regression curve; they are not meant to analyze the regression relationship itself (Kutner et al., 2008).

The definitions of the significant determining variables remaining in the best model M11.38.46 (Eq. 5) are given in the Appendix.

The best model M11.38.46 also highlights the value of being able to estimate missing values in the patients' data sets. In the selected data set on the UGIB patients of QEH, 12 patients had missing values, with up to 14 missing items each, ranging from age to clinical information such as co-morbidities and the possibility of re-bleeding. Re-bleeding within 30 days was the item with the most missing values. These patients were removed from the data set. Using the MBL approach, the model can then be used to estimate the missing information for these patients.

CONCLUSION

This study sought the MBL model that best describes the main risk factors determining the type of upper gastrointestinal bleeding. Model M11.38.46 serves this purpose, as do the other eleven selected models, which each contain different variables and are worth considering in further research. The 3 quantitative independent variables, age, Blatchford score and systolic blood pressure, are found to interact with other dummy variables to contribute significantly to the determination of the type of bleeding. The main factors identified are race, blood urea nitrogen and shock. These main factors, together with the other significant factors, are analyzed in terms of the signs (positive or negative) and magnitudes of their coefficients to determine the nature and strength of their contribution to the type of bleed. The effect of Malay race on the type of bleed is particularly notable, and the predominantly Muslim culture of this group is worth analyzing in terms of its effects on upper gastrointestinal bleeding. Finally, variables with a variance of 0.05 or less, or whose category frequency within a qualitative variable is 5% or less, were found to have a high chance of causing multicollinearity. Such variables may be removed before analysis to avoid complications at later stages. The results of this analysis can be explored further to raise awareness of upper gastrointestinal bleeding. The factors chosen are general in nature, in the sense that they may affect anyone in the society. With a more complete data set, the best model obtained from this MBL approach may be a useful aid to correctly diagnosing the type of bleeding.

ACKNOWLEDGMENT

The authors would like to thank Dr Chin Su Kiun, a specialist in Gastroenterology and Upper Gastrointestinal Bleeding (UGIB), for his kind assistance with data collection and for scientific advice at the Queen Elizabeth Hospital during this research.

Appendix:

Table A: Determining Variables of Model M11.38.46

REFERENCES

  • Abdullah, N., Z.H.J. Jubok and J.B.N. Jonney, 2008. Multiple regression models of the volumetric stem biomass. WSEAS Trans. Math., 7: 492-502.


  • Bessa, X., E. O'Callaghan, B. Balleste, M. Nieto and A. Seona et al., 2006. Applicability of the Rockall score in patients undergoing endoscopic therapy for upper gastrointestinal bleeding. Digestive Liver Dis., 38: 12-17.


  • Chin, S.K., J. Menon, G. Khuneswari, H.J. Zainodin and C.M. Ho et al., 2008. Does the clinical Rockall score predict upper gastrointestinal rebleed during hospitalization in the Malaysia population of Sabah? GUT 2008: Malaysian Society of Gastroenterology and Hepatology, Kuala Lumpur, 21-24 August 2008.


  • Cleveland, W.S. and S.J. Devlin, 1988. Locally weighted regression: An approach to regression analysis by local fitting. J. Am. Stat. Assoc., 83: 596-610.


  • Demoly, P. and J. Bousquet, 2002. Drug allergy diagnosis work up. Allergy, 57: S37-S40.


  • Gundling, F., R. Harms, I. Shiefke, W. Schepp, J. Mossner and N. Teich, 2007. Self assessment of warning symptoms in upper gastrointestinal bleeding. Dtsch Arztebl Int., 105: 73-77.


  • Kostopulou, O., B.C. Delaney and C.W. Mundo, 2008. Diagnostic difficulty and error in primary care-a systematic review. Family Practice, 25: 400-413.


  • Kato, M. and M. Asaka, 2010. Recent knowledge of the relationship between Helicobacter pylori and gastric cancer and recent progress of gastroendoscopic diagnosis and treatment for gastric cancer. Jpn. J. Clin. Oncol., 40: 828-837.


  • Kato, M. and M. Asaka, 2012. Recent development of gastric cancer prevention. Jpn. J. Clin. Oncol., 42: 987-994.


  • Kutner, M.H., C.J. Nachtsheim and J. Neter, 2008. Applied Linear Regression Models. 4th Edn., McGraw-Hill, Singapore


  • Pohl, S., M.D. Reis and S.N. Forjuoh, 2010. Medication allergy documentation in ambulatory care: A case report of errors and missed opportunities quantified during the unique transition from paper records to electronic medical records. Internet J. Family Pract., Vol. 8


  • Ramanathan, R., 2002. Introductory Econometrics with Applications. 5th Edn., Harcourt College Publishers, Ohio, USA., ISBN-13: 9780030341861, Pages: 688


  • Sarin, S.K. and S. Negi, 2006. Management of gastric variceal hemorrhage. Indian J. Gastroenterol., 25: 25-28.


  • Sridhar, S., S. Chamberlain, S. Thiruvaiyaru, S. Sethuram, M. Schubert and F. Cuartas-Hoyos, 2009. Hydrogen peroxide improves the visibility of ulcer bases in acute non-variceal upper gastrointestinal bleeding: A single-center prospective study. Digestive Dis. Sci., 54: 2427-2433.


  • Straube, S., M.R. Tramer, R.A. Moore, S. Derry and H.J. McQuay, 2009. Mortality with upper gastrointestinal bleeding and perforation: Effects of time and NSAID use. BMC Gastroenterol., Vol. 9


  • Wheeler, D. and M. Tiefelsdorf, 2005. Multicollinearity and correlation among local regression coefficients in geographically weighted regression. J. Geog. Syst., 7: 161-187.


  • Zainodin, H.J. and G. Khuneswari, 2010. Model-building approach in multiple binary logit model for coronary heart disease. Malaysian J. Mathem. Sci., 4: 107-133.


  • Zainodin, H.J., A. Noraini and S.J. Yap, 2011. An alternative multicollinearity approach in solving multiple regression problem. Trends Applied Sci. Res., 6: 1241-1255.

  • © Science Alert. All Rights Reserved