Phylogenetic, Sequencing, and Mutation Analysis of SARS-CoV-2 Omicron (BA.1) and Its Subvariants (BA.1.1, BA.2) During the Fifth Wave of the COVID-19 Pandemic in the Iraqi Kurdistan Region

Introduction In December 2019, a global outbreak of SARS-CoV-2 occurred in Wuhan, China, resulting in the COVID-19 pandemic. Since then, the virus has spread to all countries, necessitating a worldwide initiative to create effective treatments and vaccines. Methods The RNA of samples QIAamp Viral RNA Mini Kit (Qiagen, MD). SARS-CoV-2 RNA was reverse transcribed with SuperScript IV VILO (ThermoFisher Scientific, Waltham, MA). The virus cDNA was amplified in two multiplexed PCR reactions using Q5 DNA High-fidelity Polymerase (New England Biolabs, Ipswich, MA). The genome was entirely sequenced from 40 samples at the Scripps Research Institute (TSRI) in California, USA. The samples were sequenced using a NovaSeq 6000 SP Reagent Kit v1.5 (Illumina, USA). The TSRI then entered these sequences into the GISAID database. The virus sequence was matched to the SARS-COV-2 virus identified in Wuhan, China (accession number: NC 045512.2) using Illumina sequencing technology (Illumina, CA), finding 95 different changes. The NextClade (clades.nextstrain.org) and Mega 11 (https://www.megasoftware.net) software tools were used to analyze SARS-CoV-2 genome sequence alignment and mutation studies. Results Following a sequencing analysis, it was determined that the spike glycoprotein (S) included a total of 38 mutations. Thirty of these mutations were found in the ORF1a gene. Additionally, 11 mutations were found in the ORF1b gene, with the remaining mutations found in the nucleocapsid (N), membrane protein (M), open reading frames 6 (ORF6), open reading frames 9 (ORF9), and envelope (E) genes. The phylogenetic analysis and transmission studies indicated that the isolates discovered in Iraq had separate infection origins and were closely linked to those discovered in other nations and states. Conclusion According to the findings of this study, a new vaccine can be developed based on identifying new Omicron variant mutations and subvariants such as BA.2, which were identified for the first time in Iraq.


Introduction
A global outbreak of SARS-CoV-2 occurred in Wuhan, China, in December 2019, resulting in the COVID-19 pandemic.This sudden epidemic prompted a global effort to develop treatments and vaccines.According to the World Health Organization, as of November 2021, there have been around 260 million confirmed cases of COVID-19, with more than five million deaths worldwide.SARS-CoV-2 is a virus that has a genome of 30 kb and almost 9860 amino acids, with a positive-sense single-stranded RNA structure [1].
The urgency of the epidemic has prompted scientists to dedicate various efforts toward producing effective vaccines.However, viruses, like any other infection, tend to spread more easily in open-air environments.The increase in visitors to crowded areas was less shocking than expected, given the current state of health.Random mutations occur during the replication of SARS-CoV-2 due to the lack of a proofreading mechanism, leading to the formation of numerous strains that have now spread globally [2].Next-generation sequencing NGS technique is commonly used to sequence the genome of SARS-CoV-2 [3].The EpiCoV database comprises nearly 12 million whole-genome sequences (WGS) of SARS-CoV-2, which are publicly accessible through the Global Initiative for Sharing All Influenza Data (GISAID).Presently, GISAID recognizes 11 clades based on mutations shared by markers, with L and S emerging early on during the pandemic.The observation of the evolution and transmission of SARS-CoV-2 became possible because of the rapid generation of its genome.The viruses circulating in various regions began to diverge and create separate lineages caused by mutations that accumulated during the viral genome replication and spread among susceptible individuals [4].Various nomenclature systems are currently used to track the genetic lineages of SARS-CoV-2 at local and global levels.These include WHO labels, GISAIDs, NextStrains, and Pango lineages [5].
The World Health Organization (WHO) has identified four different variants of concern (VOCs).The first variant, B. 1.1.7,was initially detected in the United Kingdom.The second variant, P.1, is a descendant of the gamma variant (B.1.1.28)and was first discovered in South Africa in October 2020.The third and fourth variants are the Beta (B.1.351) and the Delta (B.1.617.2),first identified in Brazil and India, respectively [6].
The COVID-19 pandemic affected Iraqi Kurdistan in three waves: the first occurred from March to December 2020, the second from January to June 2021, and the third from July to December 2021 [7].The Alpha and Beta variants were the most dangerous during the first two waves, while the Delta variant predominated in the third wave [8].In Iraq, the death rates for the first, second, and third waves were 2.15%, 0.58%, and 0.92%, respectively [9].Despite the severity of the Delta variant, fatality rates were lower during the third wave [10], which can be related to a lack of experience in managing COVID-19 cases and limited healthcare facilities during the initial stage of the pandemic.In January 2022, five cases of the Omicron variety were recorded in a family in Duhok after a member returned from overseas [11].
It is essential to conduct continuous genomic surveillance to understand better and respond effectively to the potential epidemiological consequences of new mutations.In our study, we analyzed 40 viral genomes of SARS-CoV-2.As a result, we could identify prevalent variations, clades, and lineages and understand the virus better by identifying mutation patterns.The virus cDNA was amplified in two multiplexed PCR reactions using Q5 DNA high-fidelity polymerase (New England Biolabs, Ipswich, MA).The libraries were purified with AMPureXP beads and quantified using the Qubit High Sensitivity DNA assay kit (Invitrogen, Waltham, MA) and Tapestation D5000 tape (Agilent, Santa Clara, CA).The samples were sequenced using a NovaSeq 6000 SP Reagent Kit v1.5 (300 cycles) (Illumina, Inc., San Diego, CA).

Whole genome sequencing of the Omicron variant of SARS-CoV-2
As previously stated, the genome was entirely sequenced from 40 samples at the Scripps Research Institute (TSRI) in California, USA.The TSRI then entered these sequences into the GISAID database, where they were assigned accession numbers (see the Appendices Table 3).

Genome alignment and phylogenetic analysis
SARS-CoV-2 genome sequence alignment and mutation analyses were performed using the NextClade (clades.nextstrain.org)and GISAID database tools to determine lineage and clade assignments, and the pangolins program (version v.3.1.7)was utilized.MEGA 11 (https://www.megasoftware.net) was used to build phylogenetic trees.To construct a phylogenetic tree, we compared the accession numbers of 40 hCoV-19/Iraq/KR sequences to the GISAID database's closest SARS-CoV-2 genome sequences.

Results
The SARS-COV-2 Omicron variant (BA.1) and its subvariants (BA.1.1,BA.2) were sequenced in Duhok, Iraq, and at the Scripps Research Institute (TSRI) in California from 40 samples.A study sequencing the virus's genome has revealed that the spike glycoprotein (S) has undergone 38 mutations, with 30 of these mutations occurring in the ORF1a gene.Additionally, eleven mutations were found in the ORF1b gene.In contrast, the remaining mutations were located in the nucleocapsid (N), membrane protein (M), Open Reading Frames 6 (ORF6), Open Reading Frames 9 (ORF9), and Envelope (E) genes (Table 1).The phylogenetic analysis and transmission investigations have indicated that the strains isolated in Iraq have distinct infection origins and are closely related to those found in other countries and states (Figure 1).

FIGURE 1: Phylogenetic tree using SARS-COV-2 isolates from various regions and their almost complete gene sequences.
The sequences marked in red color were isolated from Duhok province, Iraqi-Kurdistan.The GISAID databases' websites were used to download the sequence.The MEGA 11 tree-building program employed the neighborjoining approach.
The genome sequences of the SARS-CoV-2 virus were analyzed and compared with those of several countries, including the United States, United Kingdom, Germany, Austria, Kenya, Poland, Denmark, Kyrgyzstan, Malaysia, Morocco, Singapore, and Lithuania.The results indicate that the isolates studied were closely related to previously sequenced SARS-CoV-2 genomes from Poland, the United States, and the United Kingdom, which belong to the BA.1 lineage.However, isolates from other countries were found to have lineages similar to BA.1.1.Notably, the subvariant BA.2 was found only in Duhok and was isolated from the rest of the world (Table 2).The analyzed data were downloaded from GISAID databases, and the uploaded data in this study were compared with other neighboring countries downloaded from GISAID databases and GenBank.For each query sequence, the table gives the sequence number within the input data and the Accession ID (identified from the name provided in the input data).The other columns provide information on the closest related genome, including the match distance, match quality, accession ID, collections date, submissions date, lineage, and country of origin of the matched genome.

Upload accession ID
The identification of the Omicron BA.2 subvariant in Iraq's fifth wave of COVID-19 cases is a significant and unique discovery (Figure 2).

Discussion
In late December 2019, multiple cases of pneumonia were reported in China.It was later discovered that a new strain of coronavirus was the cause [12].SARS-CoV-2 is a virus that has a genome comprising a singlestranded RNA with positive polarity.This means that the orientation of the RNA base sequences is the same as the following messenger RNA (mRNA).SARS-CoV-2 has one of the largest RNA genomes, measuring between 26.4 and 31.7 kilobases.This virus is responsible for causing COVID-19, which has affected millions of people worldwide [13].
This article was previously posted at a preprint server (https://www.preprints.org/manuscript/202308.1167) on 14 August 2023 [14].In April 2022, we uploaded the genome sequences of our isolated viral strain to the GISAID database for analysis between the fourth and fifth waves.The latest COVID-19 wave in Iraq has been linked to the Omicron subvariant (BA.2), according to the coronavirus dashboard of the World Health Organization.This discovery is significant as it has not been previously documented in Iraqi investigations.
Researchers have examined the genetic sequence of the SARS-CoV-2 Omicron variant and found that the S gene has the most mutations, followed by the ORF1ab, N, M, ORF6, ORF3a, ORF9b, and E genes.The genes ORF3a, ORF6, E-gene, and ORF9b have the fewest mutations among these genes.The spike glycoprotein (S) is responsible for the virus's host affinity and pathogenesis; hence, it is the primary target for therapy and diagnostics.The S gene is the structural protein that attaches to the virus and allows host cell receptors to recognize a variety of hosts.Because of the significant mutations found in this gene, it is essential to continuously monitor its evolution to determine its possible impact on public health [16].This gene contains 38 mutations, with each mutation having the encoded protein's amino acid sequence.T76I, T95I, Y145D, L212I, and V213G are examples of non-synonymous mutations, as are several others whose references are indicated by document numbers.N969K and L981F mutations have also been reported, as have deletions such as del211/211, del3674/3676, del69/70, del142/144, del22/23, del68/70, del142/145, and del211/212.Previous research indicates that this gene has a higher mutation probability than other genomic sites [17].According to the GISAID SARS-CoV-2 database, as of June 25, 2020, the prevalence of the S protein variant has increased over time.It is now present in 74% of the known variants.A common mutation of the SARS-CoV-2 virus, known as D614G, was initially discovered in China on January 24, 2020.This mutation replaces the polar negatively charged side chain amino acid aspartate (D) with the nonpolar side chain amino acid glycine (G) [18].D614G significantly enhances transmission, infectivity, and cellular penetration of SARS-CoV-2 in human cells.Positive natural selection drives this [19].
Our study found that some of the most common changes in the RBD improved the binding of ACE2.These changes include N501Y, S477N, and E484K.Additionally, we found that these same substitutions were present in most VOCs, which are associated with greater transmissibility.Furthermore, our study discovered several mutations in T478K, Q498R, and Q493K that increase the electrostatic potential, leading to a stronger binding affinity between RBD and ACE2 [20].Immune escape has been linked to changes in E484K [21].A prior investigation found numerous substitutions (E484A) in several VOCs, including B.1.617.2 (E484K/E484Q), B.1.351(E484K), P. 1, and B.1.1.529 [22].Notably, during the evolution of SARS-CoV-2, the scientists identified numerous mutations, including deletions such as idel69/70 and del142/144.The Nterminal domain (NTD) is a region where deletions frequently occur, including positions 69-70, 141-144, 146, 210, and 243-244.A previous study found that most NTD mutations change the antigenicity or remove epitopes, which may cause immune escape [23].
The mutations of B.1.1.529have spread extensively, but it is unclear how they affect the severity of the virus and the response of polyclonal mAbs.The ORF1a/b is essential for SARS-CoV-2 nucleic acid tests as the nonstructural proteins produced by ORF1a and ORF1b are necessary for maintaining and replicating the virus [24].ORF1ab produces two crucial proteins that aid in viral replication: an RNA-dependent RNA polymerase enzyme and a helicase protein.The most frequent changes among the nonstructural proteins in ORF1ab were observed in nsp3, including a deletion at amino acid positions del2083/2083, A 2710 T, and K 856 R. Variants T 3255 I and I 3758 V were found on nsp4 and nsp6, respectively, accompanied with nonframeshift deletions (del3674/3676) [25].
The N-gene, also known as the nucleocapsid protein, is essential for virus detection, antigen and nucleic acid tests, and vaccine production [26].It is important for the virus's assembly and budding and the host cell's response to viral infection.Its principal role is to maintain the DNA structure within the membrane [27].The most common forms of N-proteins in the current investigation were found to be the mutations of R203K and G204R, which have been observed in various parts of the world [28].A previous study has revealed that these mutations in SARS-CoV-2 can lead to increased virulence and transmission.Because of these mutations, the P13L mutation was detected in the current investigation [29].
Mutations in both spike protein and nucleocapsid protein are crucial for the widespread transmission of the virus.The Omicron variation's N gene has multiple deletions that can affect the accuracy of specific diagnostic kits by interfering with primer binding.However, the impact of these changes on the virus's pathogenicity remains unclear.On the other hand, accessory proteins like ORF6 and ORF9b have been discovered to decrease innate immunity, signaling pathways, and interferon (IFN) production by targeting the MAVS adapter linked with mitochondria [30].
Observing the Omicron BA.2 subvariant in Iraq's fifth wave of COVID-19 cases is a remarkable and unique discovery.This discovery highlights the importance of ongoing genomic monitoring initiatives, global data sharing, and reporting of new variants and subvariants.These measures are crucial for motivating prompt public health responses and minimizing the impact of the pandemic.
The SARS-CoV-2 virus has a higher mutation rate than the other strains of concern.This could increase its transmission rate and make it easier for the virus to bypass the immune system, particularly in the spike protein.Due to significant mutations in the immunogenic epitopes of the spike protein, it has become necessary to develop new vaccines incorporating the Omicron strain as a reference.Further research is needed to determine the effectiveness of current vaccines against Omicron and its infectivity rate.

Conclusions
In this study, 40 SARS-CoV-2 genomes from a clinical sample collected from Duhok, Iraqi Kurdistan, were sequenced.The genome contains additional mutations and known mutations (non-synonymous mutations).Nucleotide sequences contain position-specific mutations such as single-nucleotide variants.Additional analyses were performed on the amino acid substitutions to determine the functional stability of the proteins.Our findings showed that the SARSCoV-2 Omicron variant proteins have 95 different variant locations in their coding regions.The spike protein has the most mutations, followed by six structural proteins (ORF1a, N, M, ORF6, ORF9, and E).We found fewer mutations in ORF6, ORF9, and E proteins than in S proteins and ORF1a.A non-synonymous substitution was observed in all protein variants in the wholegenome sequence.Additionally, we observed an omicron subvariant (BA.2) in Duhok, which has not been previously recorded in other studies from Iraq.The new subvariant (BA.2) findings suggest that public health authorities should strengthen SARS-CoV-2 control strategies by examining mutation patterns in circulating variants.Considering everything, we urge that future studies focus on the impacts and roles of these genetic polymorphisms on virus transmission, disease progression, and intensity.Furthermore, creating novel vaccines, particularly ones containing several variants of concern (VOCs), could aid in the control of recent outbreaks.

FIGURE 2 :
FIGURE 2: The phylogenetic tree illustrates the distribution of the Pango lineage in Iraq, BA.2 (Accession ID: EPI_ISL_12604503) was first observed in Duhok, Iraq.

FIGURE 3 :
FIGURE 3: Daily confirmed COVID-19 cases in Iraq from 2021 to 2022, as reported on the WHO dashboard.

sampling, extraction of viral RNA, and performing real-time PCR
SettingDuhok COVID-19 Hospital is the main tertiary care referral hospital for COVID-19 cases in the Duhok Governorate.It consisted of 50 ward beds and 20 ICU beds.The primary goal of the hospital is to manage severe, critical, and complicated cases.The University of Duhok (UoD) COVID-19 Center for Research and Diagnosis is an independent center under the University of Duhok, which was officially launched in 2021 following endorsement by the Ministry of Higher Education and Scientific Research and the Ministry of Health of the Kurdistan Region of Iraq.Clinical sample and processing,A nasopharyngeal swab was collected from 40 COVID-19 patients and was preceded at the COVID-19 Center for Research and Diagnosis, University of Duhok, in December 2021.The chosen samples were extracted and evaluated quantitatively with a virus detection system called QIAprep & Viral RNA UM Kit (Qiagen, MD),