INTRODUCTION
Cancer imposes a heavy burden on public health and has become one of the leading causes of death worldwide. Cancer is a disease caused by uncontrolled division of abnormal cells in a part of the body; when the growing cells form a lump in the breast, it is called breast cancer^{1}. Breast cancer is more prevalent among the female population in all the countries where it has been studied. Breast cancer is curable if it is detected at an early stage, and screening is the key element for early detection. Unfortunately, in India the survival rate is very low due to lack of screening awareness^{2}. In 2012, an estimated 1676633 new cases of breast cancer were diagnosed worldwide, which is 25.16% of all new cancer cases in female patients (excluding non-melanoma skin cancer)^{3,4}. The healthcare burden related to breast cancer in India is also increasing day by day. The International Agency for Research on Cancer (WHO) reports that the number of new cases of breast cancer increased from 115251 (22.2%) in 2008 to 144937 (27.0%) in 2012. Also, 17000 more deaths occurred in 2012 than in 2008^{5}. In North-East India, there is a wide disparity in both the diagnosis and treatment of cancers, mostly due to socioeconomic conditions, lack of awareness and difficulty in accessing facilities for cancer diagnosis and treatment^{6}. Breast cancer also occupies the highest place, with a relative proportion of 17.5%, in the hospital-based cancer registry in progress at the Dr. B. Borooah Cancer Institute^{7}. Dr. B. Borooah Cancer Institute is the Regional Cancer Care Centre for the entire North-East region of India.
Increasing age is a primary risk factor for breast cancer^{1,8}, and earlier studies reveal that the risk factors may vary from one age group of patients to another. Mayberry^{9} investigated the risk factors of breast cancer in different age groups. Wingo et al.^{10} also found that the relationship between the risk of breast cancer and oral contraceptive use appeared to vary by age at diagnosis. So, there is a dependency between the age distribution pattern and the risk factors of breast cancer cases^{9,10}.
The present study was carried out in Assam, which is in the North-East of India. The North-East region of India is unlike the rest of the country in its customs, its assorted ethnic groups with their typical food habits and lifestyles, and its varying types and patterns of tobacco use. Moreover, due to its unique and strategic geographic location and its age-old history of migrations, it is considered a genetic pool. The present study aims to obtain an appropriate statistical distribution model for the age pattern of female breast cancer occurrences and to test how well the chosen statistical distribution fits the data. In previous studies, only frequencies were given for the different age groups. The study also examines the basic demographic information of the patients and compares the mean age at occurrence of breast cancer between two places of residence, viz., rural and urban. This study will help draw the attention of cancer epidemiologists to the most prevalent age group of breast cancer patients in the study region, as probability estimation of breast cancer occurrences for a particular age group can be a useful tool in the development of a risk prediction model.
MATERIALS AND METHODS
For the present study, a retrospective analysis was carried out using secondary data available for 2010-2012 at Dr. B. Borooah Cancer Institute. Case files of clinically diagnosed breast cancer patients were reviewed and information on age, marital status, occupation, child birth and place of residence was abstracted. A total of 1261 cases were registered during this period. Of these, 94 cases from outside Assam and 14 male patients were excluded, and finally 1153 cases were considered for the study. As this was a retrospective study and all the patients were treated with the standard departmental protocol, no ethical approval was required.
Basic information on the breast cancer cases is presented in Table 1. To examine the age distribution pattern, the ages of the breast cancer cases were grouped into 14 subintervals and the corresponding observed frequency of cases was obtained. On examining the frequency curve of the observed data, it was felt that a good fit of the observed data in each age group could be obtained with a form of the gamma probability distribution. The parameters of the gamma probability distribution model were estimated using the method of moments and the goodness of fit was checked using the chi-square goodness-of-fit test. The procedures for parameter estimation and the goodness-of-fit test are briefly described below.
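Table 1: Frequency table for demographic information of female breast cancer patients

Fitting the age distribution data using gamma probability distribution model: The two-parameter gamma distribution has one shape and one scale parameter. The random variable X follows a gamma distribution with shape and scale parameters α>0 and β>0, respectively, if it has the following Probability Density Function (PDF):

$$f(x; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \quad x > 0$$

It will be denoted by gamma (α,β). In this form β multiplies x in the exponent, so it acts as a rate (inverse scale).
where, Γ(α) is the gamma function and expressed as:

$$\Gamma(\alpha) = \int_{0}^{\infty} t^{\alpha-1} e^{-t}\, dt$$

It is well known that the PDF of gamma (α,β) can take different shapes but it is always unimodal. The mean and variance of the gamma distribution are:

$$E(X) = \frac{\alpha}{\beta}$$

and:

$$\mathrm{Var}(X) = \frac{\alpha}{\beta^{2}}$$

The cumulative distribution function of the gamma distribution is:

$$F(x; \alpha, \beta) = \int_{0}^{x} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha-1} e^{-\beta t}\, dt, \quad x > 0$$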
The Gauss-Laguerre integration method was used to compute values of the gamma function. These values were compared with the tabulated values available in the Tracts for Computers^{11}. The values obtained by the computer program were accurate up to 4 places. Also, Simpson's 1/3rd integration technique was used to evaluate the cumulative distribution function at different values of the variable x.
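The Simpson's 1/3rd step can be sketched as follows. This is a minimal illustration and not the authors' C program: it assumes the density form in which β multiplies x in the exponent, it swaps the Gauss-Laguerre evaluation of the gamma function for the standard-library `math.lgamma`, and the function names and the default number of subintervals are hypothetical.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate-type parameter beta (assumed form).

    Returns 0 for x <= 0, which is the correct limit only when alpha > 1.
    """
    if x <= 0:
        return 0.0
    # Work on the log scale to avoid overflow for large alpha.
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1) * math.log(x) - beta * x)
    return math.exp(log_pdf)

def gamma_cdf_simpson(x, alpha, beta, n=2000):
    """Approximate F(x) with the composite Simpson's 1/3 rule on [0, x]."""
    if x <= 0:
        return 0.0
    if n % 2:          # Simpson's 1/3 rule needs an even number of subintervals
        n += 1
    h = x / n
    total = gamma_pdf(0.0, alpha, beta) + gamma_pdf(x, alpha, beta)
    for i in range(1, n):
        weight = 4 if i % 2 else 2   # odd nodes get 4, even interior nodes get 2
        total += weight * gamma_pdf(i * h, alpha, beta)
    return total * h / 3
```

A call such as `gamma_cdf_simpson(43, 16.51, 0.37)` would then give the fitted probability of an age below 43 under the stated assumptions.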
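Method of moments (MOM): Consider the method of moments for parameter estimation. Let x_{1}, x_{2},..., x_{n} represent a set of data that are i.i.d. as gamma (α,β) with moment generating function:

$$M_{X}(t) = \left(1 - \frac{t}{\beta}\right)^{-\alpha}, \quad t < \beta$$

The moments of the gamma distribution are:

$$E(X) = \frac{\alpha}{\beta}, \qquad E(X^{2}) = \frac{\alpha(\alpha+1)}{\beta^{2}}$$

so that Var(X) = E(X^{2}) - [E(X)]^{2} = α/β^{2}. Equating the mean and variance to their sample counterparts gives the method of moments estimators:

$$\hat{\alpha} = \frac{\bar{x}^{2}}{s^{2}}, \qquad \hat{\beta} = \frac{\bar{x}}{s^{2}}$$

where, x̄ and s^{2} are the sample mean and sample variance calculated from the observed data.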
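The moment-matching step can be illustrated with a short sketch. It assumes the parameterization in which the mean is α/β and the variance is α/β^{2}, uses the divisor n for the sample variance, and works on raw ages rather than grouped frequencies; the function name `gamma_mom` and the toy data are hypothetical.

```python
def gamma_mom(values):
    """Method-of-moments estimates (alpha_hat, beta_hat) for a gamma sample.

    Assumes mean = alpha / beta and variance = alpha / beta**2.
    """
    n = len(values)
    mean = sum(values) / n
    # Second central moment with divisor n (moment-based variance).
    var = sum((v - mean) ** 2 for v in values) / n
    alpha_hat = mean ** 2 / var
    beta_hat = mean / var
    return alpha_hat, beta_hat

# Toy data, not taken from the study: mean 5, variance 4.
alpha_hat, beta_hat = gamma_mom([2, 4, 4, 4, 5, 5, 7, 9])
```

For grouped data such as the 14 age classes, one option would be to apply the same formulas to class midpoints weighted by class frequencies.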
Goodness-of-fit test: Goodness-of-fit test procedures are intended to detect the existence of a significant difference between the observed (empirical) frequency of occurrence of an item and the theoretical (hypothesized) pattern of occurrence of that item. This study assumed that the gamma distribution is a good fit to the given dataset.
Chi-square goodness-of-fit: The chi-square test is used to test whether a sample of data came from a population with a specific distribution.
In this study, the chi-square test is defined for the hypotheses:
H_{0}: Data follow a gamma distribution
H_{a}: Data do not follow the gamma distribution
For the chi-square goodness-of-fit computation, the data are divided into k = 14 bins and the test statistic is defined as:

$$\chi^{2} = \sum_{i=1}^{k} \frac{(O_{i} - E_{i})^{2}}{E_{i}}$$

where, O_{i} is the observed frequency for bin i and E_{i} is the expected frequency for bin i.
The expected frequency is calculated by:

$$E_{i} = N\left[F(Y_{u}) - F(Y_{l})\right]$$

where, F is the cumulative distribution function for the gamma distribution, Y_{u} is the upper limit for class i, Y_{l} is the lower limit for class i and N is the sample size.
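The two formulas above can be sketched in a few lines. This is an illustrative sketch only: the helper names are hypothetical, any callable CDF can be passed in, and the uniform CDF and the counts in the demonstration are toy values chosen so that the arithmetic is easy to follow, not data from the study.

```python
def expected_frequencies(cdf, edges, n):
    """E_i = N * (F(upper_i) - F(lower_i)) for consecutive class limits in edges."""
    return [n * (cdf(upper) - cdf(lower))
            for lower, upper in zip(edges[:-1], edges[1:])]

def chi_square_statistic(observed, expected):
    """Sum over bins of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Toy demonstration with a uniform CDF on [0, 4] (hypothetical, for illustration).
def uniform_cdf(x):
    return min(max(x / 4, 0.0), 1.0)

expected = expected_frequencies(uniform_cdf, [0, 1, 2, 3, 4], 100)
statistic = chi_square_statistic([20, 30, 25, 25], expected)
```

In the setting of this study, `cdf` would be the fitted gamma CDF and `edges` the limits of the 14 age classes; the statistic is then compared with a chi-square distribution whose degrees of freedom account for the estimated parameters.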
Implementation of the age distribution data: The data for this study are the frequencies of patients occurring in each specific age group. Programs written in the C language (Appendices) were used to compute the expected frequencies from the gamma distribution model. The goodness-of-fit test was done in MS Excel sheets.
Independent sample t-test: An independent sample t-test was used to test whether the mean age differs significantly between the two places of residence of the patients. The null hypothesis (H_{0}) and alternative hypothesis (H_{1}) of the independent samples t-test can be expressed as follows:
H_{0}: µ_{1} = µ_{2} (Two population means are equal)
H_{1}: µ_{1} ≠ µ_{2} (Two population means are not equal)
When the two independent samples are assumed to be drawn from populations with identical population variances, the test statistic t is computed as:

$$t = \frac{\bar{x}_{1} - \bar{x}_{2}}{s_{p}\sqrt{\dfrac{1}{n_{1}} + \dfrac{1}{n_{2}}}}$$

With:

$$s_{p} = \sqrt{\frac{(n_{1}-1)s_{1}^{2} + (n_{2}-1)s_{2}^{2}}{n_{1} + n_{2} - 2}}$$

When the two independent samples are assumed to be drawn from populations with unequal variances, the test statistic t is computed as:

$$t = \frac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\dfrac{s_{1}^{2}}{n_{1}} + \dfrac{s_{2}^{2}}{n_{2}}}}$$

Where:
x̄_{1} = Mean age of the first sample
x̄_{2} = Mean age of the second sample
n_{1} = Sample size (i.e., No. of observations) of first sample
n_{2} = Sample size (i.e., No. of observations) of second sample
s_{1} = Standard deviation of first sample
s_{2} = Standard deviation of second sample
s_{p} = Pooled standard deviation
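The two test statistics can be sketched from summary statistics as follows. This is a minimal sketch assuming the standard pooled-variance and unequal-variance (Welch) forms; the function names and the numbers in the example are hypothetical and are not taken from Table 4.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic assuming equal population variances (pooled standard deviation)."""
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic without assuming equal population variances."""
    return (mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical summary statistics: with equal sizes and equal standard
# deviations the two statistics coincide.
t_equal = pooled_t(10.0, 2.0, 8, 8.0, 2.0, 8)
t_unequal = unequal_variance_t(10.0, 2.0, 8, 8.0, 2.0, 8)
```

The two forms differ only in how the standard error is built, so they diverge when the group sizes and standard deviations are unbalanced; the p-value additionally requires the appropriate degrees of freedom, which this sketch does not compute.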
RESULTS AND DISCUSSION
A total of 1153 clinically diagnosed cases of female breast cancer were found in the study period 2010-2012. Table 1 reveals that 18.6% of the cases occurred in the age group 38-43. Although breast cancer is more prevalent in urban areas^{12}, in the present study a larger number of the reported patients are from rural areas of the study region. Patients from rural areas may ignore the initial symptoms of the disease due to lack of awareness and financial problems, and by the time they consult a doctor it has already turned into a malignant tumour. So, lack of awareness and income level may be responsible factors for this outcome. The majority of the patients are married (94%) and housewives (91.6%). With regard to these factors, the same scenario has been observed in other Indian studies^{13-15}. In the child birth grouping, which also includes patients with no previous pregnancy records, 61.5% of the breast cancer patients are in the 0-2 child birth group. A study on the current breast cancer scenario of India by Agarwal and Ramakant^{16} also mentioned that nulliparous women had a higher risk than parous women.
Previous studies show that age is the most important factor for breast cancer occurrence and that the risk of breast cancer increases with the age of the patient. So, in this study more emphasis has been given to the pattern and characteristics of the age of the patients. Table 2 shows the most prevalent age groups of breast cancer occurrence found in other studies conducted within India.
It is observed that the researchers of all the studies listed in Table 2 mentioned only the highest peak of the age group in their specific region. In this study, an attempt has been made to fit the age distribution data to a probability distribution model in order to estimate the expected frequency of female breast cancer cases that might occur in a specific age group in the study region. This study found that the gamma probability distribution model gives a good fit to the age distribution data. The observed probability curve is positively skewed and unimodal, which matches the characteristics of the gamma distribution model^{21,22}.
The shape parameter (α) and scale parameter (β) estimated by the method of moments are 16.51 and 0.37, respectively. Table 3 presents the observed frequencies and gamma fits for each of the 14 age groups. It can be observed from the estimated frequencies that the fit is quite good. The chi-square statistic for the goodness-of-fit test is 15.8899 and the corresponding p-value (for 14 - 1 - 2 - 1 = 10 degrees of freedom) is 0.2271, which is not significant. Therefore, the null hypothesis is not rejected and it is concluded that the fit is good.
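As a rough plausibility check on these estimates, the implied mean and standard deviation of age can be worked out. This assumes that β enters the density as a rate, so that the mean is α/β and the variance is α/β^{2}; that is an assumption about the parameterization, and the resulting figures are not reported in the study.

```latex
\hat{\mu} = \frac{\hat{\alpha}}{\hat{\beta}} = \frac{16.51}{0.37} \approx 44.6 \text{ years},
\qquad
\hat{\sigma} = \frac{\sqrt{\hat{\alpha}}}{\hat{\beta}} = \frac{\sqrt{16.51}}{0.37} \approx 11.0 \text{ years}
```

These values are computed from the rounded estimates and are therefore indicative only.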
The visual assessment of the fit is shown in Fig. 1 and 2.
Table 2: Age peak of breast cancer patients in other places of India

The observed probability curve is unimodal and has its highest peak at the age group 38-43, whereas the estimated curve peaks at 43-48. From the age group 43-48 the curve slopes down and from the 68-73 age group it becomes flat. The peak age of breast cancer occurrence is somewhat lower than in the earlier related studies listed in Table 2.

Fig. 1: Observed and fitted count resulting from gamma probability distribution model

Fig. 2: Cumulative probability curve

Table 3: Gamma probability distribution model fit to age distribution data

Fig. 3: Age distribution pattern in two different places of residence

Table 4: Descriptive measures of the age of the breast cancer patients in different places of residence

Barry and Breen^{23} reported that place of residence is important for breast and cervical cancer diagnosis. Sither^{24} reported that the locality of the patients has an effect on the stage at diagnosis of breast cancer cases. Considering the importance of these previous findings, the present study applied an independent sample t-test to observe whether there is any difference in the mean age of the patients between the rural and urban groups of residence.
Table 4 shows the difference in the average age of the patients, and Fig. 3 shows that the frequency curve of age for the rural group of patients has its highest peak in the 38-43 age group. As age increases further, the curve slopes down and from the 68-73 age group it becomes flat. Interestingly, for the urban group the curve is slightly oscillatory. It first rises to a high point in the 33-38 age group, slopes down in the 38-43 age group and subsequently rises again in the 43-48 age group.
From the independent sample t-test, a significant difference has been observed in the mean age of the patients between the rural and urban groups of residence at the 10% level of significance (p-value = 0.061).
CONCLUSION
It is well known that there is regional variation in the incidence and mortality of cancer. This study suggests that the age at incidence of breast cancer is slightly lower, and the incidence of breast cancer in rural women slightly higher, in the study region than in other regions of India. These results are interesting because they differ somewhat from the reports of other individual researchers. This may be because there has been a slight shift in the trend since the previous studies, or because the pattern in this region differs from that in other regions of India. Further, this study is important because probability estimation of breast cancer occurrences for a particular age group can be a useful tool in the development of a risk prediction model. Using the outcome of the present study, cancer epidemiologists can predict more accurately the probability of occurrence of cancer in a woman of a particular age in North-East India and can give importance to the most prevalent age group of women and their lifestyle-related factors, with reduced study time and more accuracy. Such epidemiological studies can make screening more successful in the study region.
ACKNOWLEDGMENT
We would like to express our sincere thanks to the Director of Dr. B. Borooah Cancer Research Institute for giving us permission to collect the data from the institute for this study. We gratefully acknowledge all the support extended by the Institute of Advanced Study in Science and Technology during the study period.
APPENDICES