Identification of Pathogenic Missense Mutations in the CHRNA5 Gene: A Computational Approach

Aim The CHRNA5/A3/B4 gene locus is closely related to nicotine dependence and other smoking-related disorders. Coupling genetic and clinical studies of nicotine dependence and smoking behaviors may open new avenues for medication development. The aim of this study is to investigate the functional missense mutations in the CHRNA5 gene. Methodology The Ensembl database was used to gather data on missense mutations of the human CHRNA5 gene. Computational tools viz. SIFT (Sorting Intolerant From Tolerant), PolyPhen (Polymorphism Phenotyping), PROVEAN (Protein Variation Effect Analyzer), I-Mutant, and MutPred were used to uncover the pathogenic mutations in the gene under investigation. Results Among 161 missense variants reported inthe CHRNA5 gene, 94 variants were found to be highly pathogenic. Moreover, 20 were pathogenic and 4 were not pathogenic. Conclusion The computational analysis disclosed harmful mutations in the CHRNA5 gene which could be potentially associated with smoking-related traits.


Introduction
Smoking increases the risk of lung cancer, heart disease, and chronic obstructive pulmonary disease [1].Worldwide, countries have placed the reduction of smoking prevalence by early identification, prevention, and prompt treatment as their pre-eminent public health goal.Tobacco usage is propelled explicitly by nicotine dependence for a large part of smokers, considering that nicotine is the chief chemical culpable of tobacco addiction and reinforcement [2].
Hereditary influences contribute significantly to the process of nicotine dependence as has been proved with indisputable evidence [2].Family and twin studies constitute the evidence of a high-intensity heritability of nicotine addiction [3,4].Exhaustive genome-wide association study (GWAS) meta-analyses conducted recently, have indicated the importance of variation in the nAChR (nicotinic acetyl cholinergic receptor) subunit genes as the most vital genetic contributor to smoking behaviors [5,6].
A very crucial research area encompasses missense mutations, which are responsible for over half of all reported inherited diseases [7].These are single nucleotide variants that result in amino acid substitutions at the protein level and are of two types, conservative and non-conservative (depending on the functional status of the protein) [8].
The most plausible authentication for smoking phenotypes such as the amount smoked (cigarettes per day, CPD), has been extricated from the CHRNA5-CHRNA3-CHRNB4 gene cluster situated on chromosome 15q25.1,which encodes for the subunits alpha5, alpha3, and beta4 [9].On further scrutinization of the 15q25.1 region, the most well-established locus is indicated by the functional SNP rs16969968 which results in an amino acid change (D398N) in the alpha5 subunit, which in turn contributes to an increased nicotine intake by lowering the capacity of nAChRs to produce a timely inhibitory signal intended to control and limit nicotine consumption [10].The link between the SNP rs16969968 in CHRNA5 and nicotine dependence was first noted in a candidate gene analysis done by Saccone SF et al. in 2007 [11].
Therefore, this study was formulated to identify potential functional mutations that may have a possible association with tobacco initiation, addiction, and cessation through the implementation of computational tools.

Data extraction
The rationale underlying the choice of CHRNA5 as the gene for this study is that genome-wide association studies have identified associations between the CHRNA5-CHRNA3-CHRNB4 gene cluster and smoking heaviness and nicotine dependence.Hence, the identification of pathogenic mutations in the CHRNA5 gene is an indispensable step to obtain an elaborate rendition of the pathogenesis of the gene.The FASTA sequence of the gene was downloaded from the National Center for Biotechnology Information website [12].The data on missense mutations of the human CHRNA5 gene were collected from the Ensembl database [13].As of June 2022, 161 missense mutations were identified and screened using three distinct computational tools which are SIFT (Sorting Intolerant From Tolerant), PolyPhen (Polymorphism Phenotyping), and PROVEAN (Protein Variation Effect Analyzer).The systematized data derived from the three software programs were subjected to further scrutiny using I-Mutant and MutPred to determine the stability of protein variants and their pathogenic potential respectively.Specifications regarding each of these software have been discussed in detail below.

Analysis using the SIFT program
An amino acid alteration is categorized by SIFT as either tolerated or deleterious to protein function.Intolerant/deleterious substitutions are those with a tolerance index less than 0.05 and those which are higher than 0.05 are classified as tolerated [14].

Analysis using the PolyPhen program
PolyPhen is a computerized program that predicts the potential effects of an amino acid substitution on a human protein's structure and function.A multitude of sequence, phylogenetic, and structural characteristics that describe the substitution are used to make the prediction.It then calculates the likelihood that the missense mutation will be damaging using an amalgamation of all these characteristics [15].

Analysis using the PROVEAN program
PROVEAN is a computer software program that determines if an amino acid substitution or indel will affect the biological functioning of a protein [16].

Analysis using the I-Mutant program
I-Mutant v3.0 is a tool that can forecast how a single-point mutation will impact the stability of a protein structure and how substantially a mutation in a protein sequence will or won't affect the stability of a folded protein [17].

Analysis using the MutPred program
MutPred2 is a tool that prioritizes pathogenic amino acid substitutions more effectively than current approaches, analyzes probable disease-causing molecular pathways, and provides decipherable pathogenicity score patterns on individual genomes.The chances that the mutation will have detrimental effects are identified [18].

Results
Figure 1 represents a schematic view of the results of the study.

FIGURE 1: Schematic representation of the present investigation on the missense variants of CHRNA5 gene
The list of missense variants in the transcript (ENST00000299565.9)CHRNA5 gene classified on the basis of their influences as determined by the three gene prediction programs (SIFT, PolyPhen, and PROVEAN) were listed in a table (Appendix A).Out of 161 missense variants analyzed and screened, 134 SNPs were observed to possess a damaging potential as discerned by all three prediction computational tools elaborated in the methods column (Appendix A and Figure 2).

FIGURE 3: Percentage distribution of mutations with increased and decreased stability as assessed by I-Mutant
Moving forward, the MutPred software tool was employed to assess the pathogenicity of variants with decreased stability.From the 118 missense variants, 94 were observed to be highly pathogenic, 20 were observed to be pathogenic and 4 were observed to be not pathogenic with MutPred scores of >0.75, >0.5, and <0.5 respectively (Appendix C and Figure 4).

Discussion
Nicotine dependency (ND) is a multifaceted condition with substantial rates of morbidity and mortality.According to Li et al., the projected heritability for ND is 0.46 in women and 0.59 in men, showing a sizable genetic influence [19].
For the identification of potentially harmful missense mutations, numerous computational methods have been invented which are also known as variant effect predictors.However, because different tools use distinct predictive traits, they frequently disagree with one another.Performance can be enhanced by ensemble approaches, which integrate the findings of numerous individual predictors [20].
Hence in this study, we used multiple computational tools such as SIFT, PolyPhen, PROVEAN, MutPred, and IMutant to identify pathogenic missense mutations in the CHRNA5 gene.These techniques use sequence information from homologs, structural information, like accessible surface area, and changes in amino acid properties to provide feature information as input to machine learning methods for phenotype prediction.They are conditioned on existing sets of mutation/phenotype association data [21].
In 2008, Berrettini et al. hypothesized that the CHRNA5/CHRNA3 genes that predispose a person to nicotine addiction create haplotypes, through their study of a European population [22].Then, in 2009, Saccone et al. identified 11 SNPs in the CHRNA3 gene, one related SNP in the CHRNB4 gene, and five SNPs with substantial links with smoking in the CHRNA5 gene [23].Al-Omoush et al. determined from a genomic DNA analysis of Jordanians that the rs16969968 SNP in the nAChR gene cluster CHRNA5-A3-B4 is strongly linked to waterpipe smoking dependency in Jordanians [24].
Genetic polymorphisms in CHRNA5 may increase the risk of nicotine dependency in Caucasian and African-American populations, according to a case-control study conducted by Sherva et al. [25].
Delving deeper, the relationship between smoking cessation and CHRNA5-A3-B4 gene variants was analyzed by researchers to result in conflicting conclusions.The results of a study done by Tyndale et al. exhibited an absence of association between CHRNA5-A3-B4 genetic variations/haplotype and posttreatment smoking cessation traits in Caucasian smokers by means of measuring the levels of cotinine which is the metabolite formed in the body after nicotine consumption, regardless of the high expected association between CHRNA5-A3-B4 genetic variants and pre-treatment smoking behaviors [26].
These observations are in consonance with previously conducted large genome-wide association studies which reported that smokers with CHRNA5-A3-B4 variations on chromosome 15 consume more tobacco than non-smokers but are not related to smoking status(i.e., current smoker vs. former smoker), demonstrating that these variants are not linked to smoking cessation [5].However, in converse, Chen and coworkers and Bergen and coworkers found strong relationships between the CHRNA5-A3-B4 haplotype and quitting smoking in smokers who were administered a placebo treatment [27,28].
Projected fatalities from tobacco-related diseases are anticipated to ascend to unfathomable numbers in the coming years unless effective preventive measures are taken and prompt treatment can be provided.The utilization of genetics to enhance our current knowledge of the neurobiological mechanisms of nicotine is of prime importance in developing the foundational knowledge required for prompt action [29].Hence, the utilization of software such as SIFT, PolyPhen, PROVEAN, I-Mutant, and MutPred can streamline the process of extricating genetic data as has been proven in previous literature, apart from saving time and manpower resources [30].
However, an important limitation of the study is the analysis of missense variants of only one transcript of the gene(ENST00000299565.9).Moreover, only missense mutations were analyzed in this study.Further elaborate studies taking into account all transcripts and other types of mutations are required to arrive at more irrefutable results.
The present in silico study was carried out to curate genetic variants from a large pool of single nucleotide variants reported in the CHRNA5 gene.Although there are many different types of variants, the missense variants were considered to be of utmost importance since they result in amino acid change, which eventually leads to structural or functional changes in the protein.Multiple tools were used to identify The tools such as SIFT, PolyPhen, and PROVEAN provide scores based on the substitution of amino acids.Those missense variants that were found to be associated with deleterious phenotype as ascertained by the 3 tools were further analyzed for protein stability using IMutant Suite.The majority of the variants were found to show decreased protein stability.Furthermore, these variants were checked for pathogenicity using another tool known as MutPred which predicts the impact of the genetic variants on protein functions based on 50 different protein properties.Thus, the present study provided us with 94 highly pathogenic variants, which can be further validated using experimental procedures to gain insight into their association with nicotine metabolism or smoking cessation.

Conclusions
There have been considerable advances in decoding the role of genetics in nicotine dependence.However, inconsistent results and associations that are commonly small in magnitude pose a remarkable challenge to completely understanding genetic influences.Taking into account the above reports, this study has facilitated the process of extracting and organizing important preliminary data using an elaborate data extrication process to discern the potentially pathogenic variants of the CHRNA5 gene.This preliminary data will act as a useful tool and lay the foundation for further population-based studies that are warranted to conclude the association of these pathogenic variants with tobacco initiation, addiction, and cessation.

FIGURE 4 :
FIGURE 4: Percentage distribution of highly pathogenic, pathogenic, and not pathogenic mutations as assessed by MutPred