


Deoxyribonucleic acid (DNA) is the hereditary material that every person carries and that serves as an individual's genetic blueprint. Human DNA is unique to each person and can be used to distinguish one individual from another1. In forensics, personal identification through DNA analysis is an effective tool. Short tandem repeats (STRs) are microsatellites with repeat units 2 to 7 base pairs long. STR analysis is a widely used molecular biology technique that compares allele repeats at particular loci between two or more DNA samples. In the PCR amplification stage, primers targeting several STR loci are used. Following the 1997 recommendation of the Federal Bureau of Investigation (FBI), 13 STR loci are used: TH01, TPOX, CSF1PO, vWA, FGA, D3S1358, D5S818, D7S820, D13S317, D16S539, D8S1179, D18S51 and D21S11. This set is known as the Combined DNA Index System (CODIS) core STR loci2.
In forensic identification, there are three types of conclusions: Exclusion, inclusion and inconclusive. An exclusion is concluded when alleles fail to match at two or more distinct STR loci, even if they match at the other loci. If the profiles match at all loci used, the conclusion is inclusion1. When the result is an inclusion, statistical interpretation is required: the estimated allele frequencies are entered into the formula for calculating the probability of paternity2.
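The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, not a validated casework procedure. The function name `compare_profiles`, the dictionary layout of a profile, the direct genotype-to-genotype comparison and the treatment of a single mismatching locus as inconclusive are all assumptions made here, since the text defines only exclusion and inclusion.

```python
def compare_profiles(profile_a, profile_b, exclusion_threshold=2):
    """Classify a comparison of two STR profiles.

    Each profile maps a locus name to a pair of allele designations given
    as strings, e.g. {"TH01": ("7", "9.3")}. Only loci typed in both
    profiles are compared. Returns (conclusion, mismatching_loci).
    """
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        return "inconclusive", []

    # A locus mismatches when the two unordered allele pairs differ.
    mismatches = [
        locus for locus in shared
        if sorted(profile_a[locus]) != sorted(profile_b[locus])
    ]

    if len(mismatches) >= exclusion_threshold:
        return "exclusion", mismatches
    if not mismatches:
        return "inclusion", mismatches
    # Assumption: one mismatching locus is reported as inconclusive.
    return "inconclusive", mismatches


reference = {"TH01": ("7", "9.3"), "TPOX": ("8", "8"), "FGA": ("24", "26")}
evidence = {"TH01": ("9.3", "7"), "TPOX": ("8", "8"), "FGA": ("24", "26")}
print(compare_profiles(reference, evidence))
```

An inclusion returned by such a comparison would still need the statistical interpretation described above before it carries any evidential weight.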
The accuracy of DNA analysis in a given area depends on the power of discrimination of each locus used there, which is determined by the number of allele variants and the frequency of each allele at each locus. The greater the diversity of alleles found, the greater the heterozygosity and power of discrimination (PD) at that locus. The higher the heterozygosity and PD values, the better suited the locus is to forensic DNA analysis3. Only a study of each population or community can reveal the variety and frequency of alleles at each locus4.
The Mentawai are an Indonesian ethnic group living in West Sumatra. They descend from the Proto-Malay people, who came from Yunnan around 2000 BC and reached Indonesia via Indochina, travelling through the Malay Peninsula to Sumatra5. The Mentawai tribe is considered the oldest tribe in Indonesia, with a culture that is still unique.
West Sumatra is a disaster-prone area. In particular, the Mentawai Islands and the west coast of West Sumatra Province are the areas closest to earthquake epicenters and are exposed to tsunami risk6. Therefore, microsatellite (STR) DNA research for forensic purposes is needed to provide a DNA profile database.
Indonesia comprises various ethnic groups with diverse cultures. Allele frequencies and genetic variation have not been widely established for each ethnic group in Indonesia. This study analyzed a panel of 13 STR loci in the Mentawai population to extend the Indonesian population genetic data library and document its genetic variation.
Study area: The research was conducted in the city of Padang, West Sumatra and the samples were examined at the National Police DNA Laboratory, Jakarta. The study took place from August, 2022 through February, 2023.
Sample collection and DNA extraction: Peripheral blood samples were obtained from 42 unrelated healthy Mentawai individuals with three generations of native Mentawai ancestry. Written informed consent was obtained from every participant. The DNA was extracted from blood with a PrepFiler kit according to the manufacturer’s instructions (ThermoFisher, Applied Biosystems, Foster City, California, USA).
Polymerase chain reaction amplification and short tandem repeat typing: All 13 STR loci were amplified simultaneously using a GlobalFiler kit and the GeneAmp PCR System 9700 (ThermoFisher, Applied Biosystems, Foster City, California, USA). STR genotyping was carried out on the ABI PRISM 3500 Genetic Analyzer (ThermoFisher, Applied Biosystems, Foster City, California, USA) and the results were analyzed with the GeneMapper ID 3.2 program (ThermoFisher, Applied Biosystems, Foster City, California, USA). The 13 STR loci used, all from the GlobalFiler kit, are listed in Table 1.
Table 1: | The 13 STR loci used in this study, from the GlobalFiler kit (ThermoFisher, Applied Biosystems, Foster City, California, USA) |
Loci | Allele control | Allele range | Allele size (bp) | Repeat motif | Chromosome location |
TPOX | 8, 8 | 5-15 | 332.5-384.5 | (AATG)n | 2p23-2pter |
D3S1358 | 15, 16 | 9-20 | 90.5-146.5 | (TCTR)n | 3p21.31 |
FGA | 24, 26 | 13-51.2 | 221-380 | (YTYY)n | 4q28 |
D5S818 | 11, 11 | 7-18 | 133.5-189.5 | (AGAT)n | 5q21-31 |
CSF1PO | 11, 12 | 6-15 | 277-325 | (AGAT)n | 5q33.3-34 |
D7S820 | 7, 12 | 6-15 | 256.5-304.5 | (GATA)n | 7q11.21-22 |
D8S1179 | 12, 13 | 5-19 | 108.5-176.5 | (TCTR)n | 8q24.13 |
TH01 | 7, 9.3 | 4-13.3 | 174-219.5 | (TCAT)n | 11p15.5 |
vWA | 14, 16 | 11-24 | 151-215 | (TCTR)n | 12p13.31 |
D13S317 | 11, 11 | 5-16 | 197-249 | (TATC)n | 13q22-31 |
D16S539 | 9, 10 | 5-15 | 221.5-347.5 | (GATA)n | 16q24.1 |
D18S51 | 12, 15 | 7-27 | 255.5-347.5 | (AGAA)n | 18q21.33 |
D21S11 | 28, 31 | 24-38 | 179.5-246.5 | (TCTR)n | 21q11.2-q21 |
Ethical consideration: This study received ethical approval from the Research Ethics Committee of the Faculty of Medicine, Universitas Andalas, 761/UN.16.2/KEP-FK/2022.
Statistical analysis: Easy DNA software7 and FORSTAT v1.0 software8 were used to calculate the allelic frequencies and forensic parameters, including expected heterozygosity, power of discrimination (PD) and polymorphism information content (PIC). Heterozygosity and power of discrimination were considered high if greater than 75% and a locus was considered highly polymorphic if its PIC value was greater than 0.5.
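For readers who want to see how these parameters relate to allele frequencies, the following Python sketch computes them for a single locus. It assumes Hardy-Weinberg proportions and uses the textbook estimators (expected heterozygosity as one minus the sum of squared allele frequencies, PD as one minus the sum of squared expected genotype frequencies and PIC in the form given by Botstein et al.). Whether Easy DNA and FORSTAT use exactly these estimators, for example PD from observed rather than expected genotype frequencies, is an assumption. The function name `forensic_parameters` is hypothetical.

```python
def forensic_parameters(freqs):
    """Return (He, PD, PIC) for one locus from its allele frequencies.

    Assumes Hardy-Weinberg proportions. Frequencies are renormalised so
    that rounding in a published table does not bias the result.
    """
    total = sum(freqs)
    p = [f / total for f in freqs]
    sum_p2 = sum(x ** 2 for x in p)
    sum_p4 = sum(x ** 4 for x in p)

    # Expected heterozygosity: probability that two random alleles differ.
    he = 1.0 - sum_p2

    # PD = 1 - sum of squared genotype frequencies. Under Hardy-Weinberg,
    # sum(p_i^4) + sum_{i<j} (2 p_i p_j)^2 = 2 * (sum p^2)^2 - sum p^4.
    pd_value = 1.0 - (2.0 * sum_p2 ** 2 - sum_p4)

    # PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2, where the last
    # term equals (sum p^2)^2 - sum p^4.
    pic = he - (sum_p2 ** 2 - sum_p4)
    return he, pd_value, pic


# vWA allele frequencies for alleles 14 to 20, taken from Table 2.
vwa = [0.131, 0.06, 0.107, 0.19, 0.381, 0.107, 0.024]
he, pd_value, pic = forensic_parameters(vwa)
print(f"He={he:.3f} PD={pd_value:.3f} PIC={pic:.3f}")
```

Applied to the vWA column of Table 2, this sketch gives an expected heterozygosity of about 0.774 and a PD of about 0.921, in line with the values reported for that locus in Table 3.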
There were 90 alleles observed, with allelic frequencies ranging from 0.012 to 0.56. The fewest allele variants were found at the D5S818 locus (4 allele variants), while the most were found at the D18S51 locus (14 allele variants) (Table 2).
The expected heterozygosity ranged from 0.607 for TPOX to 0.866 for FGA. Power of discrimination (PD) values ranged from 0.792 for TPOX to 0.968 for FGA. The average PIC value was 0.694, with the most and least informative loci being vWA (0.921) and TPOX (0.642), respectively (Table 3).
Three off-ladder alleles were identified in this study. Off-ladder alleles are alleles that are not present on the allele ladder. The allele ladder at the TH01 locus (GlobalFiler kit), namely alleles 4, 5, 6, 7, 8, 9, 9.3, 10, 11 and 13.3, is shown in Fig. 1. In this study, we found allele 6.3, which was not present in the allele ladder. Figure 2 shows that alleles 21.2 and 24.2 were likewise absent from the allele ladder at the FGA locus, making them off-ladder alleles.
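The idea of an off-ladder allele can be illustrated with a simple set comparison. The sketch below uses the TH01 ladder alleles listed above and the TH01 alleles reported in Table 2. It is a simplification: genotyping software assigns off-ladder calls from fragment sizes, not from allele labels, and the function name `off_ladder_alleles` is hypothetical.

```python
# TH01 allelic ladder as listed in the text for the GlobalFiler kit.
TH01_LADDER = {"4", "5", "6", "7", "8", "9", "9.3", "10", "11", "13.3"}


def off_ladder_alleles(observed, ladder):
    """Return observed allele designations absent from the ladder, sorted numerically."""
    return sorted(set(observed) - set(ladder), key=float)


# TH01 alleles observed in this study (Table 2).
observed_th01 = ["6", "6.3", "7", "8", "9", "9.3", "10"]
print(off_ladder_alleles(observed_th01, TH01_LADDER))  # ['6.3']
```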
Table 2: | Allele frequencies at 13 STR loci in the Mentawai population |
Allele | CSF1PO | TPOX | TH01 | D13S317 | D16S539 | D18S51 | D21S11 | D8S1179 | D7S820 | D5S818 | D3S1358 | FGA | vWA |
6 | - | - | 0.155 | - | - | - | - | - | - | - | - | - | - |
6.3 | - | - | 0.012 | - | - | - | - | - | - | - | - | - | - |
7 | - | - | 0.262 | - | - | - | - | - | - | - | - | - | - |
8 | 0.012 | 0.56 | 0.048 | 0.202 | - | - | - | - | 0.179 | - | - | - | - |
9 | 0.012 | 0.167 | 0.429 | 0.31 | 0.107 | - | - | - | 0.012 | - | - | - | - |
9.3 | - | - | 0.024 | - | - | - | - | - | - | - | - | - | - |
10 | 0.31 | 0.024 | 0.071 | 0.119 | 0.214 | 0.024 | - | 0.095 | 0.167 | 0.381 | - | - | - |
11 | 0.286 | 0.226 | - | 0.274 | 0.369 | 0.012 | - | 0.083 | 0.44 | 0.262 | - | - | - |
12 | 0.321 | 0.24 | - | 0.071 | 0.226 | 0.06 | - | 0.167 | 0.202 | 0.274 | - | - | - |
13 | 0.06 | - | - | 0.012 | 0.083 | 0.036 | - | 0.119 | - | 0.083 | - | - | - |
14 | - | - | - | 0.012 | - | 0.143 | - | 0.274 | - | - | 0.071 | - | 0.131 |
14.2 | - | - | - | - | - | 0.012 | - | - | - | - | - | - | - |
15 | - | - | - | - | - | 0.31 | - | 0.179 | - | - | 0.238 | - | 0.06 |
15.2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
16 | - | - | - | - | - | 0.19 | - | 0.048 | - | - | 0.381 | - | 0.107 |
16.2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
17 | - | - | - | - | - | 0.06 | - | 0.024 | - | - | 0.202 | - | 0.19 |
18 | - | - | - | - | - | 0.06 | - | 0.012 | - | - | 0.107 | 0.024 | 0.381 |
18.2 | - | - | - | - | - | - | - | - | - | - | - | - | - |
19 | - | - | - | - | - | 0.024 | - | - | - | - | - | 0.119 | 0.107 |
20 | - | - | - | - | - | 0.048 | - | - | - | - | - | 0.024 | 0.024 |
20.2 | - | - | - | - | - | - | - | - | - | - | - | 0.012 | - |
21 | - | - | - | - | - | 0.012 | - | - | - | - | - | 0.143 | - |
22 | - | - | - | - | - | 0.012 | - | - | - | - | - | 0.226 | - |
28 | - | - | - | - | - | - | 0.071 | - | - | - | - | - | - |
29 | - | - | - | - | - | - | 0.214 | - | - | - | - | - | - |
30 | - | - | - | - | - | - | 0.119 | - | - | - | - | - | - |
30.2 | - | - | - | - | - | - | 0.012 | - | - | - | - | - | - |
31 | - | - | - | - | - | - | 0.202 | - | - | - | - | - | - |
31.2 | - | - | - | - | - | - | 0.06 | - | - | - | - | - | - |
32 | - | - | - | - | - | - | 0.083 | - | - | - | - | - | - |
32.2 | - | - | - | - | - | - | 0.131 | - | - | - | - | - | - |
33.2 | - | - | - | - | - | - | 0.095 | - | - | - | - | - | - |
34.2 | - | - | - | - | - | - | 0.012 | - | - | - | - | - | - |
Table 3: | Expected heterozygosity (He), power of discrimination (PD) and polymorphism information content (PIC) in the Mentawai population |
Loci | He | PD | PIC |
CSF1PO | 0.715 | 0.865 | 0.664 |
D13S317 | 0.769 | 0.910 | 0.679 |
D16S539 | 0.748 | 0.897 | 0.673 |
D18S51 | 0.832 | 0.954 | 0.683 |
D21S11 | 0.857 | 0.963 | 0.704 |
D3S1358 | 0.741 | 0.892 | 0.673 |
D5S818 | 0.704 | 0.857 | 0.653 |
D7S820 | 0.705 | 0.867 | 0.673 |
D8S1179 | 0.832 | 0.951 | 0.707 |
FGA | 0.866 | 0.968 | 0.656 |
TH01 | 0.716 | 0.877 | 0.695 |
TPOX | 0.607 | 0.792 | 0.642 |
vWA | 0.774 | 0.921 | 0.921 |
Average | 0.759 | 0.901 | 0.694 |
Fig. 1: | Allele ladder at the TH01 locus and allele 6.3 at the TH01 locus in the Mentawai population. Allele 6.3 is not present in the TH01 allele ladder, so it is called an off-ladder allele |
A total of 90 alleles were observed across the 13 STR loci in this study. The highest-frequency allele was allele 8 at the TPOX locus (0.56). The lowest frequency (0.012) was shared by alleles 8 and 9 at the CSF1PO locus, allele 6.3 at the TH01 locus, alleles 13 and 14 at the D13S317 locus, alleles 11, 14.2, 21 and 22 at the D18S51 locus, alleles 30.2 and 34.2 at the D21S11 locus, allele 18 at the D8S1179 locus, allele 9 at the D7S820 locus and allele 20.2 at the FGA locus (Table 2). A frequency of 0.012 indicates that the allele is rare among Mentawaians: when a Mentawai individual carries such an allele, a random person taken from the population is unlikely to carry the same one (the allele accounts for only about 12 of every 1,000 allele copies at that locus). The fewest allele variants were found at the D5S818 locus (4 alleles), followed by the TPOX, D7S820, D3S1358 and D16S539 loci (5 alleles each), while the most were found at the D18S51 locus (14 alleles). The expected heterozygosity ranged from 0.607 for TPOX to 0.866 for FGA. Power of discrimination (PD) values ranged from 0.792 for TPOX to 0.968 for FGA. Expected heterozygosity describes the level of genetic diversity in the population, while the power of discrimination is the probability that two randomly chosen individuals have different genotypes. The power of discrimination is determined by the number of allele variants and their frequencies. A large variety of alleles, each at a low frequency, leads to high heterozygosity and power of discrimination9. In this study, the FGA locus combined low allele frequencies and a large variety of alleles with the highest expected heterozygosity and power of discrimination. The FGA locus therefore performed best of the 13 loci tested, with strong power to distinguish one individual from another, making it appropriate for use in forensic identification.
Nikmatul Iza et al.3 found the highest allele frequency at allele 8 of the TPOX locus when analyzing 13 loci in Javanese and Madurese populations. Research on the Indonesian population using the 13 CODIS loci obtained the same result: the highest allele frequency was at allele 8 of the TPOX locus4. These data suggest that this variant of the STR locus is found very frequently in a population (high frequency).
Fig. 2: | Allele ladder at the FGA locus and alleles 21.2 and 24.2 at the FGA locus in the Mentawai population. Alleles 21.2 and 24.2 are not present in the FGA allele ladder, so they are called off-ladder alleles |
An STR locus dominated by such a high-frequency allele is considered less able to distinguish individuals in that population and should not be used for individual identification9.
Polymorphic information content (PIC) is often used to measure how informative a genetic marker is for a given study and how well it detects polymorphism among individuals of a population. PIC values range from 0 (monomorphic) to 1 (very informative). A PIC value greater than 0.5 is considered highly informative, a value between 0.25 and 0.50 reasonably informative and a value lower than 0.25 only slightly informative. PIC values above 0.5 are recommended for genetic studies, while those below 0.25 are not recommended10. The average PIC value in this study is 0.694, indicating that the 13 loci used are highly informative.
Three off-ladder alleles were found in this study: allele 6.3 at the TH01 locus and alleles 21.2 and 24.2 at the FGA locus. Off-ladder alleles are alleles that are not present in the allele ladder. An allele ladder is a set of the common alleles found in the population at a particular chromosomal location. The allele ladder is used like a molecular ruler to help measure the length of the fragments in a sample11.
Allele 6.3 was found at the TH01 locus with a frequency of 0.012 (Fig. 1). The repeat structure of allele 6.3 is (AATG)3ATG(AATG)3. Allele 6.3 was not found in the study of the Indonesian population9, nor in studies in Malaysia, Hong Kong, Iraq, Ghana and Turkey12-16. Allele 6.3 at the TH01 locus is a rare allele17. Its low prevalence suggests that the 6.3 allele at the TH01 locus is most likely unique to the Mentawai community.
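The designation "6.3" follows from the repeat structure: six complete four-base repeats plus a partial repeat of three bases. The Python sketch below derives the designation from a bracketed repeat structure such as the one given above. It is a length-based simplification that only counts bases and does not verify the motif sequence. The function name `allele_designation` and the accepted input syntax are assumptions made for illustration.

```python
import re


def allele_designation(structure, unit_length=4):
    """Derive an STR allele designation from a bracketed repeat structure.

    `structure` uses the form "(AATG)3ATG(AATG)3": a motif in parentheses
    followed by its repeat count, or a bare run of bases. The result is the
    number of complete repeat units, followed by ".x" when x bases remain.
    """
    total_bp = 0
    # Each match is either "(MOTIF)count" or a bare stretch of bases.
    for motif, count, bare in re.findall(r"\(([ACGT]+)\)(\d+)|([ACGT]+)", structure):
        total_bp += len(motif) * int(count) if motif else len(bare)

    full, partial = divmod(total_bp, unit_length)
    return f"{full}.{partial}" if partial else str(full)


print(allele_designation("(AATG)3ATG(AATG)3"))  # 6.3
```

Here the structure spans 27 bases, which is six complete AATG units with three bases left over.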
The FGA locus is located on chromosome 4q28, with a (YTYY)n repeat motif. The off-ladder alleles found at this locus were 21.2 and 24.2 (Fig. 2). These off-ladder alleles have also been found, at low frequency, in studies in Malaysia, Singapore and India18-20.
Allele 14.2 at the D18S51 locus was also found in this study. It has not been reported in several studies from other countries, including China, Iraq, Ghana, Turkey, India and Malaysia12-16,18,19. These previous studies indicate that the 14.2 allele found in the Mentawai is a rare allele.
In population-based studies, ensuring that the frequencies of the detected alleles represent those in the total population is more important than detecting every allele present. This can be achieved without sampling the very low-frequency alleles. Even when they are shared among populations, these rarest alleles are helpful for only some population-based analyses, because their presence may reflect repeated mutation rather than historical relationships or gene flow. As a result, the most informative alleles for assessing genetic structure are those that are reasonably common in some populations but rare or absent in many others. Very rare alleles are useful for some applications (for example, paternity or kinship analysis), but more information is required when evaluating patterns of genetic diversity or population structure20.
Increasing the sample size improves the accuracy of allele frequency estimates, but the gain is not linear. Hale et al.21 showed that the more samples are used, the more allele diversity is captured, but that adequate accuracy is obtained once a minimum of 25 samples is used.
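The non-linear gain can be made concrete with the binomial standard error of an allele frequency. The sketch below is not the simulation approach of Hale et al.; it assumes a diploid autosomal locus and independently sampled alleles, so that n unrelated individuals contribute 2n allele copies. The function name `allele_frequency_se` is hypothetical.

```python
import math


def allele_frequency_se(p, n_individuals):
    """Binomial standard error of an allele frequency estimate.

    Assumes a diploid autosomal locus, so n individuals give 2n allele copies.
    """
    return math.sqrt(p * (1.0 - p) / (2 * n_individuals))


# Standard error for an allele at frequency 0.30 at several sample sizes.
for n in (25, 42, 100, 400):
    print(n, round(allele_frequency_se(0.30, n), 4))
```

Under these assumptions the standard error shrinks with the square root of the sample size, so quadrupling the number of individuals only halves it. With 42 individuals, an allele at frequency 0.30 has a standard error of about 0.05.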
Although the sample size in this study was small (42 participants), the allele frequencies and genetic variation reported here can serve as preliminary data for the Mentawai population. This study provides the first autosomal STR dataset for the Mentawai population. The mean expected heterozygosity and mean power of discrimination across the 13 loci were 75.9% and 90.1%, respectively. The high heterozygosity and power of discrimination, together with the high polymorphism (PIC>0.5), indicate that the 13 loci in this study can be used for forensic identification and population genetic studies. Further research with a larger sample is needed to establish a Mentawai population database.
Data on the 13 STR loci in the Mentawai population are provided here for the first time. Based on allelic frequency, heterozygosity, power of discrimination and polymorphic information content, it can be concluded that these 13 autosomal STR loci are well suited to individual identification and population studies in the Mentawai population. Three off-ladder alleles were also found in this study. Allele 6.3 at the TH01 locus and allele 14.2 at the D18S51 locus are rare.
West Sumatra is one of the earthquake-prone areas in Indonesia and the Mentawai Islands are the areas closest to earthquake epicenters. Allele frequencies and genetic variation in Mentawaians have not been well established, yet these data are needed for forensic identification. This study obtained allele frequency data for 13 autosomal loci in the Mentawai population. Three off-ladder alleles were found. Allele 6.3 at the TH01 locus and allele 14.2 at the D18S51 locus are rare. The mean expected heterozygosity and mean power of discrimination across the 13 loci were 75.9% and 90.1%, respectively, indicating high gene diversity. These results show that the 13 STR loci can be used for forensic identification and population genetic studies.
We thank Dr. Ratna Relawati, SpFM, chief of the DNA Laboratory, Pusdokkes Jakarta, for authorizing the use of the laboratory. This research was funded by Universitas Andalas in 2022 under grant number 667/UN16.02.D/PP/2022.