INTRODUCTION
Insurance covers a broad field, and each branch of professional liability insurance is comprehensive and requires expertise. The scope of this study is professional liability insurance in accounting and the legal risks covered by this insurance. Determining risk and premium in insurance is of great importance, and many statistical methods are available for developing insurance risk assessment models.
Insurance claims have been defined in terms of the entropy of the probability distribution of losses, in an attempt to find out whether there is a relationship between the level of loss and the demand to purchase insurance^{1}. The entropy approach was applied to crop insurance by Najafabadi et al.^{2} to estimate loss sizes. In the insurance sector, the degree of competition, market structure and market power have been analysed with the entropy method^{3,4}; in particular, attention has been paid to the computation of credibility parameters based on the relative entropy between the claim-size distributions of the entire portfolio^{4}. Interested readers are referred to further studies applying the entropy method in insurance^{5-7}.
To the authors' best knowledge, there is no study on the decision tree algorithm and entropy for the assessment of insurance risks in professional liability insurance. This research is expected to provide effective decision-making based on insurance risk factors in professional liability branches such as independent accountant, independent financial advisor and independent accountant-financial advisor. Thus, insurance companies will be able to understand risk policies and evaluate risk-based premiums: they can charge low premiums for low-risk policies and high premiums for high-risk policies. This study will also help insurers decide whether policies are renewable. In light of these needs, this paper aims to evaluate risks in professional liability insurance with the decision tree algorithm and entropy, providing an effective decision-making tool based on insurance risk factors in various branches of professional liability.
Machine learning has received attention since the 1970s; in particular, the decision-tree procedure ID3 (Iterative Dichotomiser) was developed^{8}. This work extended earlier studies on concept learning systems^{8,9} and led to C4.5, which became a benchmark against which new supervised learning algorithms are compared. Breiman et al.^{10} published the book Classification and Regression Trees (CART), which explains the construction of binary decision trees. The two similar approaches to learning decision trees, ID3 and CART, were invented independently of each other. ID3, C4.5 and CART adopt a greedy, non-backtracking approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner. Many decision-tree induction approaches proceed in such a top-down manner, starting with the training tuples and their associated class labels.
ID3 uses information gain as an attribute selection measure. This measure is based on the pioneering work of Shannon and Weaver^{11} on information theory, which studies the information content of messages. Let node N hold the tuples of partition D. The attribute with the highest information gain is selected as the splitting attribute for node N. This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness, or impurity, in those partitions. Such an approach minimizes the expected number of tests needed to classify a given tuple and favours a simple tree. The expected information required to classify a tuple in D is given by:

Info(D) = -Σ_{i=1}^{m} p_{i} log_{2}(p_{i}) (1)
where p_{i} is the probability that an arbitrary tuple in D belongs to class C_{i}, estimated by |C_{i,D}|/|D|. Info(D) is the average amount of information needed to identify the class label of a tuple in D; note that it depends only on the proportions of tuples of each class and is also known as the entropy of D. Entropy is one of the most common discretization measures used in data-based applications, first introduced by Shannon and Weaver^{11} in their pioneering study of information theory. Entropy-based discretization is a supervised, top-down splitting process. It uses the class distribution information in the computation of the split points. To discretize a numerical attribute A, the method selects the value of A with minimum entropy as a split-point and recursively partitions the resulting intervals to arrive at a hierarchical discretization. Such discretization forms a concept hierarchy for attribute A.
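Eq. 1 can be sketched in a few lines of Python (a minimal illustration; the function name `info` is our own and follows the paper's notation):

```python
import math

def info(class_counts):
    """Shannon entropy Info(D) of a partition D (Eq. 1), given the
    number of tuples in each class C_i."""
    total = sum(class_counts)
    entropy = 0.0
    for count in class_counts:
        if count > 0:                  # log2(0) is undefined; 0*log2(0) -> 0
            p = count / total          # p_i estimated by |C_i,D| / |D|
            entropy -= p * math.log2(p)
    return entropy

# A perfectly mixed two-class partition carries maximum entropy (1 bit):
print(info([50, 50]))    # 1.0
# A pure partition needs no further information to classify:
print(info([100, 0]))    # 0.0
```

As the two calls show, entropy is largest when the classes are evenly mixed and zero when a partition is pure, which is why minimizing it favours pure splits.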
Now suppose the tuples in D are to be partitioned on some attribute A having v distinct values, {a_{1}, a_{2}, ..., a_{v}}, as observed in the training data. If A is discrete-valued, these values correspond directly to the v outcomes of a test on A. Attribute A can be used to split D into v partitions, {D_{1}, D_{2}, ..., D_{v}}, where D_{j} contains those tuples in D that have outcome a_{j} of A. These partitions correspond to the branches grown from node N. Ideally, this partitioning would produce an exact classification of the tuples; however, the partitions are quite likely to be impure. The amount of information still needed to arrive at an exact classification is measured by:

Info_{A}(D) = Σ_{j=1}^{v} (|D_{j}|/|D|)×Info(D_{j}) (2)
The term |D_{j}|/|D| acts as the weight of the jth partition. Info_{A}(D) is the expected information required to classify a tuple from D based on the partitioning by A. The smaller the expected information still required, the greater the purity of the partitions. Information gain is given by:

Gain(A) = Info(D)-Info_{A}(D) (3)
The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N. Readers interested in further details of the entropy technique are referred to Han and Kamber^{12}.
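Eqs. 1-3 can be combined into a short attribute-scoring sketch (illustrative only; the toy counts below are the classic 14-tuple example from the decision-tree literature, not the paper's data):

```python
import math

def info(class_counts):
    """Info(D), Eq. 1: entropy of a partition from its class counts."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def info_a(partitions):
    """Info_A(D), Eq. 2: expected information after splitting on A.
    `partitions` lists the class counts of each branch D_j."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * info(p) for p in partitions)

def gain(class_counts, partitions):
    """Gain(A), Eq. 3: reduction in entropy achieved by splitting on A."""
    return info(class_counts) - info_a(partitions)

# Toy data: D has 9 tuples of one class and 5 of the other; a candidate
# attribute splits D into three branches with these class counts.
d = [9, 5]
branches = [[2, 3], [4, 0], [3, 2]]
print(round(gain(d, branches), 3))   # 0.247
```

The attribute whose branches yield the largest such gain would be selected at node N, exactly as described above.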
MATERIALS, METHODS AND STUDY DESIGN
The main subject of this study is the legal risk analysis of professional liability insurance. The data come from the proposal forms of firms belonging to members of the accounting profession. The risk was analysed with the entropy-based decision tree. The input variables taken from the proposal forms of the professional liability insurance are as follows:
• Profession (independent accountant, independent financial advisor, independent accountant-financial advisor)
• Proportions of corporate income taxes
• Proportions of individual income taxes
• Giro in the financial year-end
• Giro in the current financial year
• Insured person working alone? (yes/no)
• Insurance application cancelled? (yes/no)
• Insurance demand cancelled? (yes/no)
• Insurance premium (Turkish Lira)
• Amount of damage (Turkish Lira)
The output variable is the legal risk. The data set covered in this study consists of 312 policies (258 policies in the training group and 54 policies in the testing group) for building and evaluating the decision tree with the entropy method.
Decision tree introduced by using the entropy approach: In this approach, in order to determine the risk in professional liability, an algorithm was derived by classifying the input parameters in the produced decision tree. Since some of the parameters are quantitative, the C4.5 algorithm was preferred. For each quantitative input parameter the median was calculated and the parameter was discretized into 2 groups: (i) values less than or equal to the median and (ii) values greater than the median. The input parameters are: profession, proportions of corporate income taxes, proportions of individual income taxes, giro in the financial year-end, giro in the current financial year, insured person working alone, insurance application cancelled, insurance demand cancelled, insurance premium and amount of damage. The output classes are damaged and not-damaged. For the risk in professional liability, the class sizes are C_{damaged} = 79 and C_{not-damaged} = 179, so the probabilities are p_{damaged} = 79/258 and p_{not-damaged} = 179/258. The entropy, in the sense of the average amount of information, can be found using Eq. 1. Using Eq. 2, entropy values were calculated for each value of the input parameters in the sense of expected information and, using Eq. 3, the information gain of each input variable was obtained (Table 1). As shown in Fig. 1, "the amount of damage" has the maximum information gain and was therefore taken as the root of the decision tree.
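The root-level entropy quoted above can be reproduced from the class counts given in the text (79 damaged, 179 not-damaged); the per-attribute gain tables themselves are not reproduced here:

```python
import math

# Class counts from the 258-policy training group (from the text).
damaged, not_damaged = 79, 179
total = damaged + not_damaged            # 258

p_damaged = damaged / total              # ~0.306
p_not_damaged = not_damaged / total      # ~0.694

# Average amount of information, Eq. 1:
info_d = -(p_damaged * math.log2(p_damaged)
           + p_not_damaged * math.log2(p_not_damaged))
print(round(info_d, 4))                  # 0.8887 bits
```

Every candidate attribute's Gain in Table 1 is measured against this baseline of roughly 0.89 bits.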
The risk in professional liability was then classified within the "less than or equal to" branch of "the amount of damage", and the entropies within this branch were calculated using Eq. 1. For this branch, entropy values were calculated for each value of the input variables in the sense of expected information using Eq. 2 and, using Eq. 3, the information gains of the input parameters were obtained (Table 2). As seen from Fig. 1, "insurance demand cancelled?" has the maximum information gain and forms the left branch of the decision tree at this level. As clearly seen from Fig. 1, the "greater" branch of "the amount of damage" directly yields the class "damaged", whereas the "less than or equal to" branch is split further on "insurance demand cancelled?". The corresponding subtrees are shown in Fig. 1.
Table 1: Calculation of gain for the main root of the decision tree


Table 2: Calculation of gain for "the amount of damage"


For the "yes" case of "insurance demand cancelled?", entropy values were computed for each of the input variables in the sense of expected information using Eq. 2. For the same case, the information gains of the input parameters, computed using Eq. 3, are presented in Table 3.
Table 3: Calculation of gain for "insurance demand cancelled?"


As shown in Fig. 1, "insurance demand cancelled?" forms the left branch of the decision tree, as it has the maximum information gain, while "not-damaged" appears as the right branch. As seen from Fig. 1, the risk in professional liability was classified for the "greater" case of "profession" (Table 4). Also, for the "no" case of "insurance demand cancelled?", the information gain of the input parameter "proportions of corporate income taxes" was found to be maximum. The corresponding subtrees are given in Fig. 1.
For the "less than or equal to" case of "proportions of corporate income taxes", the next splitting attribute for the risk in professional liability was found to be "giro in the financial year-end".
Table 4: Calculation of gain for "proportions of individual income taxes"


Table 5: Calculation of gain for "proportions of corporate income taxes"


Table 6:  Comparison of insurance policy with computer prediction 


For the "greater" case shown in Fig. 1, the input variable "proportions of corporate income taxes" has the maximum information gain. The classification of the risk in professional liability on the input factor "insured person working alone" is shown in Fig. 1 and Table 5. In a similar manner, the rest of the decision tree can be read from Fig. 1. The program rules produced from the decision tree are as follows:
• If "the amount of damage" is "greater", then "damaged"
• If "the amount of damage" is "less than or equal to" and "insurance demand cancelled?" is "yes" and "proportions of individual income taxes" is "less than or equal to", then "not damaged"
• …
C# program code was generated from the decision tree to determine the effects of the input variables. The produced code was tested on 54 policies.
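The paper's generated code is in C#; the first two rules above can be sketched in Python as follows (an illustration only: the field names are our own, the inputs are the already-discretized labels, and the medians used for discretization are not reported in the text):

```python
def classify(policy):
    """Sketch of the first rules of the decision tree.

    `policy` maps hypothetical field names to discretized values
    ("greater" / "less or equal to", "yes" / "no"). Only the two rules
    quoted in the text are implemented; the remaining branches are elided.
    """
    # Rule 1: "the amount of damage" is "greater" -> "damaged"
    if policy.get("amount_of_damage") == "greater":
        return "damaged"
    # Rule 2: damage <= median, demand cancelled, individual income
    # taxes <= median -> "not damaged"
    if (policy.get("amount_of_damage") == "less or equal to"
            and policy.get("insurance_demand_cancelled") == "yes"
            and policy.get("individual_income_taxes") == "less or equal to"):
        return "not damaged"
    return None  # remaining branches of the tree elided ("...")

print(classify({"amount_of_damage": "greater"}))   # damaged
```

Each root-to-leaf path of the tree in Fig. 1 translates into one such if-chain, which is how the C# code was produced.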
RESULTS AND DISCUSSIONS
The accuracy of the software developed using the 258 policies of the training group was tested on the 54 policies of the testing group. As seen in Table 6, 47 of the 54 testing policies were classified correctly (87.04%). The results also show that 16 of the 20 damaged policies were predicted correctly (80.0%) and that the produced code predicted 31 of the 34 not-damaged policies correctly (91.18%).
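The reported percentages can be reproduced directly from the counts in Table 6:

```python
# Counts reported for the 54-policy testing group (Table 6).
correct_damaged, total_damaged = 16, 20
correct_not_damaged, total_not_damaged = 31, 34

correct = correct_damaged + correct_not_damaged   # 47
total = total_damaged + total_not_damaged         # 54

print(round(100 * correct / total, 2))                          # 87.04
print(round(100 * correct_damaged / total_damaged, 1))          # 80.0
print(round(100 * correct_not_damaged / total_not_damaged, 2))  # 91.18
```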
Entropy is one of the most common approaches encountered in the applied sciences. The insurance demand was characterized in terms of the entropy of the probability distribution of losses by Nakata et al.^{1}, who tried to find out whether there is a relationship between the level of loss and the demand to purchase insurance. In order to estimate losses, the entropy approach was applied by Najafabadi et al.^{2} to crop insurance. The degree of competition, market structure and market power in the insurance sector were examined through entropy by Bello et al.^{3}. Fernández-Durán and Gregorio-Domínguez^{4} concentrated on the calculation of credibility factors based on the relative entropy between the claim-size distributions of the policyholders. More details of related studies carried out in the field of insurance through entropy can be found in the references^{5-7}.
Decision models for insurance purchases were created by Huang et al.^{13} using 300 records from a Taiwanese insurance company; decision trees were used to model purchase decisions. Five main types of insurance were covered: life, annuity, health, accident and investment-oriented insurance.
A comparative analysis of the predictive performance of a battery of data mining techniques on real-life automotive insurance fraud data was provided by Gepp et al.^{14}. For prediction, logistic regression, neural network and decision tree classifiers have been compared successfully^{15}. In addition, a causal inference framework was proposed to measure the price elasticity of auto insurance; the model allows price-elasticity functions to be estimated at the individual policyholder level^{16}.
CONCLUSION
Assessing the risk levels of professional liability insurance and finding the best prices for customers are very important in practice. The level of risk in professional liability insurance and the optimal price for many clients were successfully determined with the decision tree algorithm combined with entropy. Effective decision-making based on insurance risk factors is thus provided in various branches of professional liability with multiple variables. The computed results agreed with the actual policies in over 87% of cases. According to the results produced by the tool designed in this study, the risk for professional liability insurers is minimized while, at the same time, prices remain favourable for clients. Insurance companies are thus able to distinguish between policies by risk and calculate premiums based on the expected risk: they can request low premiums for low-risk policies and high premiums for high-risk policies. This study should also help insurance companies decide whether policies are renewable, and help researchers who want to investigate critical areas of insurance pricing that remain affordable for customers.