

Genome-Wide Association Studies (GWAS) have emerged as a critical tool for detecting genes associated with particular phenotypes. They have made it possible to draw reliable inferences about the genetic basis of differences in the phenotypic characteristics of organisms1. To conduct a genome-wide association study, a large set of genomic data is required to reliably identify Single Nucleotide Polymorphisms (SNPs) distributed across various populations of an organism. This implies that GWAS, in its elementary form, is a mathematical assessment of the association between single nucleotide polymorphisms in the genome and a defined phenotype of interest1. Holmes et al.2 posited that the association between a gene and a phenotype becomes significant where differences in the genotype at a particular locus relate to variations in the phenotype to a greater extent than would be expected by chance.
The availability of human genomic data was made possible by the completion of the Human Genome Project, which provided an increased understanding of the human genome and genetic variation3. However, more than knowledge of the genome is needed to conduct genome-wide association studies. A detailed description of human haplotypes and linkage disequilibrium maps are also required to identify the small number of SNPs capable of representing most of the variation in the human genome1,4. The growth in information about the human genome, and in the other genetic information needed to conduct GWAS, has made it possible for many researchers to undertake these studies. This is evident from the increase in the number of GWAS publications now available5-8. Currently, there are over 3,000 human genome-wide association studies, which have detected over 60,000 SNPs associated with a vast array of phenotypes7.
Although GWAS began as a tool for investigating the correlation between the human genome and phenotypic characters, it can now be applied in various fields of study such as virology. Specifically, genome-wide association studies have been applied to study the genetic determinants of viral diseases in humans. One of the most extensively studied viruses is the Human Immunodeficiency Virus type 1 (HIV-1). The first genome-wide association studies of infectious diseases were done on HIV-1. This made it possible to go beyond existing knowledge to new facts about the interactions between HIV and host genetics9. It has been established that HIV-1 interacts with various host factors during infection and replication. Amongst the host factors studied, the Δ32 variant of the human CC-type chemokine receptor 5 gene (CCR5-Δ32) is the only confirmed genetic factor that modifies the outcomes of HIV-1 disease. CCR5 is a member of the G-Protein Coupled Receptor (GPCR) family and is predominantly found on the surface of leucocytes such as macrophages, T-cells and monocytes10. Earlier research on HIV-1 treatment focused on developing inhibitors that block the binding of HIV-1 to CD4 cells, because the presence of CD4 alone was thought to be sufficient for the progression of HIV-1 infection. However, it is now understood that the binding of HIV-1 to CD4 receptors, though necessary for HIV infection, is not sufficient to enable HIV to invade host cells. A second step involving a co-receptor, either CCR5 or CXCR4, is needed3.
Several studies have shown that the absence of, or abnormalities in, the CCR5 receptor increase the host’s resistance to HIV-1 infection and progression8. HIV-1 commonly uses the CCR5 receptors to invade immunological cells. The CCR5 receptors are found on the surface of immune cells, thus providing a means of entry for HIV-1 viral particles to infect the cell. A mutation of the CCR5 gene that results in a 32-base-pair deletion (Δ32) leads to the absence of CCR5 on cell surfaces, which makes individuals who are homozygous for Δ32 resistant to HIV-1 infection. Also, people heterozygous for Δ32 show delayed progression of HIV infection, a less rapid decline in CD4 cell count and even a lower viral load in circulation11,12.
Considering the role of CCR5, a genetic approach to treating HIV-1 infection has been proposed in which CCR5 is blocked from being expressed. In one study, T-cells modified not to express CCR5 were mixed with wild-type T-cells that express CCR5 and challenged with HIV-1. The modified T-cells became predominant over the wild-type cells because the HIV-1 viral particles destroyed the wild-type cells while the modified cells were resistant to the virus9.
The discovery of the role of CCR5 receptors, and of the impact of their modification, through genome-wide association studies could have a substantial impact on medical care. For example, new antiviral drugs could only be produced because of our increased understanding of CCR5del32 homozygosity as the molecular basis of HIV-1 resistance13-17.
This review highlights the relationship between the CCR5 receptor and HIV-1 infection, and the significance of GWAS in identifying such genotype-phenotype associations.
Units of genetic variation used in GWAS: In the early days of GWAS, the single nucleotide polymorphism was the unit used for investigating genetic variation18. However, with the advent of more genomic technologies, different techniques and measures of genetic variation in relation to phenotypes have become popular. These include linkage disequilibrium, direct and indirect associations and population stratification19,20.
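As an illustration of one of these measures, linkage disequilibrium between two biallelic loci is commonly summarized by the coefficient D and the squared correlation r². The Python sketch below computes both from haplotype and allele frequencies. The function name and the example frequencies are hypothetical, chosen only for illustration, and are not taken from any study cited in this review.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    # r^2 rescales D^2 by the allele-frequency variances, giving a value in [0, 1].
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies, for illustration only.
d, r2 = linkage_disequilibrium(p_ab=0.4, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

When r² between two SNPs is close to 1, genotyping one of them carries almost the same information as genotyping the other, which is the basis of indirect association through a reduced set of representative SNPs.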
CCR5 receptors and HIV-1 resistance: The CCR5 receptor is a member of the chemokine receptor family and is usually expressed on different cell types, including macrophages, activated T lymphocytes and dendritic cells. Members of the chemokine receptor family have an extracellular N-terminus, an intracellular C-terminus, seven transmembrane helices and three extracellular loops. The elements located in the second extracellular loop and the N-terminus are particularly critical for interacting with HIV-1 during the entry phase. This has directed attention to CCR5 as a target for designing more effective antiretroviral therapies10.
Additionally, CCR5 has a further advantage as a cellular target: unlike the CD4 receptor and the CXCR4 viral co-receptor, it is not necessary for the normal functioning of the immune system (CD4 and CXCR4 have essential roles in immune function, which limits their use as targets of antiretroviral agents)10.
Individuals homozygous for the Δ32 mutation of CCR5 demonstrate the comparative dispensability of the CCR5 co-receptor and have been shown to have greater resistance to HIV-1 infection19. Also, individuals who are heterozygous for the Δ32 mutation progress more slowly to AIDS than those who are homozygous for the wild-type gene21,22.
Furthermore, there is a correlation between the density of CCR5 on CD4 T-cells, RNA viral loads and the development of AIDS in infected patients21. This relationship has been established in vitro. Heredia et al.23 reported that the antiretroviral activity of CCR5 antagonists was directly affected by CCR5 surface antigens. The experiment established a correlation between CCR5 levels and the inhibition rates of HIV-1 entry23,24.
These discoveries, coupled with the apparent therapeutic effect observed when patients with leukaemia and AIDS received a transplant of Δ32 homozygous hematopoietic stem cells, have given impetus to the use of CCR5 receptor blockers to inhibit HIV-1 entry and infection24. Currently, these blockers include CCR5 antagonists, drugs that reduce CCR5 surface density, CCR5 antibodies, and fusion proteins targeting the CCR5 N-terminus and other essential sites of the CCR5 receptor24-28.
Some CCR5 blockers have achieved remarkable suppression of HIV-1 entry in clinical trials28,29. Entry inhibitors have a further appeal: they immobilize HIV-1 in the extracellular environment, where it is accessible to the immune system30.
Determining the relationship between the CCR5 gene and the outcome of HIV-1 infections: To ascertain the role of the CCR5 gene in the outcome of HIV-1 infections, two groups of human participants are required: a control group (participants without HIV-1 infection) and a case group (HIV-infected participants). An adequate sample size should be chosen, usually above 1000 participants for each group25. A suitable genotyping platform is then chosen to obtain the genotypic data of the CCR5 gene. These data can be produced using microarray technology, which allows the determination of the polymorphisms present in the sampled population26.
Some of the available chips can analyze hundreds of thousands of single nucleotide polymorphisms at once. The data generated from genotyping are statistically analyzed to ascertain the relationship between the presence of the SNPs and the outcome of HIV-1 infections in the sampled population. The statistical analysis begins with a thorough quality control step aimed at determining the accuracy of the genotypic data obtained26. It then involves testing a statistical hypothesis for each single nucleotide polymorphism. The results are thereafter examined and inferences are drawn from them. Primarily, the differences in the SNPs between uninfected and infected participants are examined for a significant relationship between the variations in the genes (SNPs) and the phenotypes in the population27. In this case, the phenotypes are measurable traits such as the rate of HIV progression and the CD4 cell count of the participants. If these traits are shown to be significantly associated with alterations in the CCR5 gene, the gene is considered to be a determinant of the outcome of HIV-1 infection. A GWAS is usually concluded by providing biological or clinical validation to ensure the reliability of the findings28.
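The per-SNP hypothesis test described above can be sketched in Python as follows. This is a minimal illustration rather than the method of any cited study: it assumes a simple chi-square test on allele counts in infected and uninfected participants, uses a Bonferroni correction as one straightforward way of accounting for multiple tests, and relies on entirely hypothetical SNP names and counts.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the alternative allele in cases relative to controls.

    Assumes case_ref and control_alt are both non-zero.
    """
    return (case_alt * control_ref) / (case_ref * control_alt)


# Hypothetical allele counts: (case_alt, case_ref, control_alt, control_ref).
snps = {
    "snp_A": (300, 1700, 200, 1800),
    "snp_B": (510, 1490, 500, 1500),
    "snp_C": (150, 1850, 400, 1600),
}

alpha = 0.05
threshold = alpha / len(snps)  # Bonferroni correction for the number of SNPs tested

for name, counts in snps.items():
    chi2, p = allelic_chi_square(*counts)
    verdict = "significant" if p < threshold else "not significant"
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.2e}, OR = {odds_ratio(*counts):.2f} ({verdict})")
```

In a real study the number of tests is far larger, so the corrected threshold becomes very small; a genome-wide significance level of 5 × 10⁻⁸ is a widely used convention. An odds ratio below 1 for an allele that is rarer among infected participants would be consistent with a protective effect.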
Studies on GWAS and susceptibility to HIV infections: Globally, infectious agents cause a huge burden of disease and death. The World Health Organization29 reported about 12 million deaths from infectious diseases. This has created the need for a more holistic approach to understanding disease processes and the determinants of human susceptibility to infectious agents. There has been remarkable progress in the application of GWAS to studying genes and phenotypes associated with diseases. This advancement results from developments in genomics, bioinformatics and statistical tools. One of the most extensively studied infectious agents, in terms of its pattern of infection and the resistance of host cells, is the Human Immunodeficiency Virus (HIV)30. Samson et al.31 reported that the 32-base-pair deletion that occurs in the Chemokine Receptor 5 Gene (CCR5) represses the expression of the receptor, making the host cells resistant to HIV infection.
Apart from the CCR5del32 mutation, which confers resistance to HIV infection on cells, several other candidate genes have been identified. These include Killer Immunoglobulin-Like Receptors (KIRs), Human Leukocyte Antigen (HLA) and cytokines, among others32.
Stewart et al.33 reported that the progression of HIV infection may be affected by genetic factors. For instance, humans heterozygous for CCR5del32 have been reported to be susceptible to HIV infection, but the infection progresses very slowly compared with other genotypes.
In a similar study, Mallal et al.34 suggested that genetic factors may likewise play a role in the response to HIV treatment, as seen in the role of the Human Leukocyte Antigen (HLA) B*5701 allele in cases of severe hypersensitivity reactions to abacavir (an antiretroviral drug).
In another study, Fellay et al.30 investigated the association between single nucleotide polymorphisms on the Illumina 550k array and both the viral load and the rate of CD4 T lymphocyte count decline in 486 research subjects. It was observed that 15% of the differences in viral load resulted from two single nucleotide polymorphisms within the HLA region. One of these SNPs lies near the HLA-C gene; the other was found in HCP5.
The study found that the genes regulating viral load did not overlap with those identified as regulating disease progression, although the two parameters were correlated. The findings were replicated several times to establish the effects of HCP5 and HLA-C on either HLA-A10 or HLA-B*57, thus pointing to linkage disequilibrium within the Human Leukocyte Antigen (HLA) region35.
A later study by Fellay et al.30 was expanded to include over 2500 patients infected with HIV. It showed that the genetic variants close to the HLA-B and HLA-C genes are the most significant variants associated with the control of HIV-1. However, the investigation did not detect loci previously identified through gene studies.
GWAS has also been used to study other aspects of the HIV infectious process, such as the role of the X chromosome, HIV-associated Human Leukocyte Antigen (HLA) loci in CD4 and CD8 T lymphocyte ratios, and genetic factors that influence viral transmission from mother to child. Ferreira et al.36 suggested that the variants of the Human Leukocyte Antigen (HLA) that are associated with viral load regulation in HIV-1 infections are linked independently with the CD4 and CD8 ratio (CD4 and CD8 are dysregulated in HIV-1 infections). In a related study by Joubert et al.37, 100 HIV-infected persons were compared with 126 uninfected infants using GWAS. In their study, about nine single nucleotide polymorphisms found within sex-different genes were associated with a decreased transmission rate.
Ethical issues in GWAS in Africa: Technological advancement has caused a spike in interest in human genomic differences as a tool for analyzing common complex human diseases. It has been established that the genetic diversity of humans can be applied in the study of the mechanisms of complex diseases. GWAS has been applied extensively in these studies. In fact, for close to a decade, it has been a reliable tool for identifying loci of the genome that play a role in determining susceptibility or resistance to a wide array of common diseases38-40. Disease-focused genomic studies such as GWAS have been used to study a wide array of disease conditions, but their application to diseases that affect people in low-income nations has been scarce40,41. This makes it imperative for medical genomic studies to be conducted on diseases that are common in poor countries. These countries are usually characterized by high illiteracy levels and poor healthcare systems. The application of GWAS in these regions will help mitigate the impact of this imbalance42,43.
Despite the relevance of medical genomic research in low-income countries, many setbacks are encountered in conducting such studies. These range from obtaining consent from research participants or volunteers to protecting their privacy and collecting and storing the genomic data obtained44-52.
Protecting the interests of research participants: One of the major challenges faced in the conduct of GWAS in low-income countries is the limited accessibility of testing sites, health care and relevant resources that would make it easier for the participants to get involved. Because of the nature of GWAS, sample collection cannot be done in locations without the required resources. Thus, participants may have to travel far to get involved. This makes it quite difficult for GWAS to be conducted. Although these challenges are faced even in developed countries, the issues are more critical in less developed countries14.
Valid consent: According to the World Medical Association53, valid consent must be obtained before medical research is undertaken. Valid consent should be adequately informed, understood and voluntary. The processes through which it is obtained should comply with locally established laws54. However, these processes may present challenges. One of the major challenges is providing adequate information to participants in a comprehensible manner. Because of the level of illiteracy in low-income countries, providing comprehensive information on the research to participants is usually a daunting task55. It may be possible to describe certain aspects of the research to participants, but providing enough comprehensive information to ensure that they give valid consent for GWAS remains a significant challenge51,56.
Regulating human genomics research: It is common to see collaborations between researchers from low-income and high-income countries especially when the research is centred on diseases prevalent in low-income countries. During such collaborations, it is common for advanced molecular analysis such as genotyping and whole-genome analysis to be done in developed countries. This creates issues about the use of the stored samples and data and the possession and ethical review of the research51,56.
Sample export and ownership: Performing GWAS requires access to sophisticated facilities and personnel with relevant expertise and experience. These are often available in only a few nations around the world. As a result, the samples collected for genome-wide analysis are usually exported for analysis and quality control. This transport of samples to other countries raises serious ethical issues51,56.
Most concerns arise from uncertainty over the control and use of the samples in the country to which they are exported. This concern is aggravated by the knowledge that participants will have limited or no control over their samples57,58.
Archived samples: Genome-wide association studies require the assembly of a very large number of samples and well-characterized phenotypes. The large quantity of samples means that GWAS is labour-intensive and requires advanced technology for genomic studies. Such facilities are not available in low-income countries. Consequently, it is often not feasible to conduct such genomic research there, and it may not be possible to collect new genomic samples for every study. As a result, genomic samples and data are often archived. Questions then arise about decisions on the re-use of human genomic samples, and this further complicates obtaining consent for GWAS. Therefore, a balance needs to be struck between the ethical consequences of assembling thousands of samples and the ethics of analyzing archived samples without valid consent from the original participants58.
Limitations of GWAS: Under certain conditions, genome-wide association studies may fail to detect new loci associated with susceptibility to certain diseases even when a large sample size is used59. Wray et al.60 reported that genome-wide association studies may perform poorly in determining genotype-phenotype associations for rare diseases because of the difficulty of obtaining the required sample size.
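The dependence of detection on sample size can be illustrated with a rough power calculation. The Python sketch below uses a normal approximation for a two-sided test comparing allele frequencies between cases and controls. The function name, the allele frequencies and the default significance level of 5 × 10⁻⁸ (a widely used genome-wide convention) are assumptions made for illustration and do not come from the studies cited here.

```python
from statistics import NormalDist


def approximate_power(freq_cases, freq_controls, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided z-test comparing allele frequencies.

    Each participant contributes two alleles, so allele counts are 2 * n.
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two allele frequencies.
    se = (freq_cases * (1 - freq_cases) / (2 * n_cases)
          + freq_controls * (1 - freq_controls) / (2 * n_controls)) ** 0.5
    z_effect = abs(freq_cases - freq_controls) / se
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic exceeds the critical value;
    # the negligible contribution of the opposite tail is ignored.
    return std_normal.cdf(z_effect - z_critical)


# Hypothetical allele frequencies: 10% in cases and 7% in controls.
for n in (250, 1000, 4000):
    power = approximate_power(0.10, 0.07, n_cases=n, n_controls=n)
    print(f"{n} participants per group: power = {power:.3f}")
```

Under these assumed frequencies the power is very low with a few hundred participants per group and rises sharply only when several thousand are available, which is why conditions with few available cases are difficult to study by GWAS.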
Another potential pitfall of GWAS is overlap between the validation and discovery samples59. Sometimes, accuracy may be overestimated in the validation sample (an independent sample with known phenotypic characters), especially where it is more closely related to the discovery sample than the target samples are.
Wray et al.60 proposed that similar population stratification can inflate the apparent accuracy of the association where the validation and discovery samples match each other closely in stratification but do not closely match the stratification of the target sample.
Moore et al.59 pointed to another GWAS limitation, observed in the use of standard logistic regression. Their study suggested that more advanced algorithms should be used to illuminate the potential relationships between variations in DNA sequences, variations in susceptibility to disease and environmental exposure.
The findings of GWAS have aided the improvement of antiretroviral therapy for HIV-1 treatment.
The scarcity of world-class genomic laboratories limits the application and advancement of GWAS in Africa. Therefore, more genomic research centres should be established to promote the use of GWAS in Africa.
GWAS should be deployed to study more viral diseases in order to develop effective antiviral therapies.
Genome-wide association studies have been developed as an approach that involves scanning markers across the DNA of a large number of individuals to find genetic variations associated with a particular disease. Researchers use the identified relationships between genes and diseases to build better strategies for disease detection, treatment and even prevention.
GWAS has been applied in medical virology research to find the genes associated with HIV-1 disease and its progression. Various genome-wide association studies have established that abnormalities of the CCR5 receptor, such as the CCR5-Δ32 mutation, and the absence of the receptor confer resistance to HIV-1 infection on the host and even slow the progression of the infection. Furthermore, there is a correlation between the density of CCR5 on CD4 T-cells, RNA viral loads and the development of AIDS in infected patients. This has resulted in the development of antiretroviral therapeutic agents that block the CCR5 receptors. These CCR5 blockers have immense potential in HIV-1 therapy, for both prevention and treatment. This review thus emphasizes the benefits of embracing GWAS and its applications in therapeutic strategies against viral infectious diseases.
Although there has been a significant increase in the quality and number of publications on the applications of Genome-Wide Association Studies (GWAS), very few publications, if any, link the application of GWAS with the setbacks it faces in Africa, despite the continent being a hotspot for HIV. This review also presented both empirical and theoretical research as evidence for the role of the CCR5 co-receptor in HIV-1 infection. Generally, this review deepens the scientific appreciation of GWAS as a critical tool for studying HIV-1 infection and other viral diseases. Its emphasis on the setbacks of GWAS in Africa provides the public, especially those in Africa, with key knowledge about some of the critical aspects of conducting molecular genetic research in human populations. This is expected to increase the level of cooperation in future GWAS experiments.