Overview
What is a variant?
Viruses constantly change through a process called mutation. When a virus has one or more new mutations, it’s called a variant of the original virus. Since SARS-CoV-2, the virus that causes COVID-19, was discovered, there have been hundreds of variants identified and described. The original virus that was discovered in Wuhan has not been seen in the United States since the middle of 2020.
Are all variants important for public health?
Not necessarily. The World Health Organization (WHO) defines the following categories:
- Variants of Concern: According to the WHO, a variant of concern contains changes that have resulted in one or more of the following: reduced efficacy of treatments or vaccines, increased disease severity, or increased spread from person to person.
- Variants of Interest: According to the WHO, a variant of interest contains changes that may result in reduced efficacy of treatments or vaccines, increased disease severity, or increased spread from person to person.
The CDC also uses the term Variants Under Monitoring (VUM) to describe a variant that has genetic changes suspected to affect virus characteristics and that shows early signals of a growth advantage relative to other circulating variants, but for which evidence of disease or transmission impact remains unclear.
Are variants determined for every positive COVID-19 test?
No. To determine which variant has infected an individual, a sample from that individual needs to be sequenced. Sequencing means performing additional laboratory techniques after initial detection to determine the genetic makeup of the virus. For a variety of reasons, not all positive COVID-19 tests proceed to sequencing.
Why is sequencing variants important?
Public health agencies track the number and geographic distribution of variants in the population in order to monitor viral evolution within circulating strains, detect the importation of strains from outside the jurisdiction, and watch for the emergence of variants of concern or variants of interest. This tracking helps us understand changes in the number of COVID-19 cases and hospitalizations and plan for the future. For example, tracking a variant with high transmissibility can help hospitals prepare for a surge in cases.
Tracking Variants
The Wadsworth Center is currently sequencing COVID-19 virus specimens, with a capacity of up to approximately 100 per day. Specimens are selected at random from throughout the state to provide surveillance across all geographic locations, and the data are analyzed across the entire sequence of the virus. The analyses include assessment for mutations that indicate variants of concern and variants of interest. Other laboratories in New York State are conducting similar work. The results from Wadsworth and other laboratories are uploaded into public databases, primarily GISAID. Sequence data from all contributors can then be downloaded from this database and analyzed for a more complete picture of virus trends across the state, and the distribution of variants can be summarized over time.
Current Summary
The New York State Department of Health monitors the prevalence of SARS-CoV-2 variants by summarizing sequence data deposited in GISAID, an online database of viral genome sequences. CDC produces a related summary using sequences reported to its national SARS-CoV-2 genomic surveillance program.
The Department of Health's Wadsworth Center Laboratory continues to actively monitor COVID-19 virus samples selected throughout the State to compare sequences and identify circulating and new variants. The Department also monitors all data submitted to public sequence databases by the many other sequencing laboratories throughout New York State and across the US, contributing to a robust and collaborative surveillance program for variant analysis. The Department will continue to communicate openly with New Yorkers as we closely follow WHO actions and work with our partners at the CDC.
Data Source Explanations
GISAID
These data are pulled from the Global Initiative on Sharing All Influenza Data (GISAID) database, the world’s largest database for SARS-CoV-2 sequence data. The sequence data entered into GISAID may come from surveillance sequencing programs, more targeted cluster investigations, or other research.
From this database, we pull only information related to cases from New York State, broken down by specific time frame of specimen collection. Because sequence uploads to the database can be delayed, data for the most recent time interval are based on a small number of specimens and should be interpreted with caution. When data are next abstracted, it is likely that the number of specimens for that interval will increase.
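As a rough illustration of the summary described above, the following Python sketch groups sequenced specimens by collection interval, computes the share of each variant, and flags intervals that rest on few specimens. It is a minimal sketch, not the Department's actual pipeline: the record layout, the lineage labels "A" and "B", the 14-day interval, and the `min_specimens` threshold are all assumptions made for the example, and the records are invented.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical records: (specimen collection date, lineage label).
# Invented for illustration; not real surveillance data.
RECORDS = [
    (date(2024, 1, 2), "A"),
    (date(2024, 1, 5), "A"),
    (date(2024, 1, 9), "B"),
    (date(2024, 1, 12), "A"),
    (date(2024, 1, 16), "B"),
    (date(2024, 1, 20), "B"),
]


def summarize(records, origin, interval_days=14, min_specimens=3):
    """Summarize variant proportions per collection interval.

    `interval_days` and `min_specimens` are assumed values chosen for
    this example, not thresholds taken from the source page.
    """
    counts = defaultdict(Counter)
    for collected, lineage in records:
        # Assign each specimen to the interval containing its collection date.
        offset = (collected - origin).days // interval_days
        start = origin + timedelta(days=offset * interval_days)
        counts[start][lineage] += 1

    summary = {}
    for start in sorted(counts):
        total = sum(counts[start].values())
        summary[start] = {
            "total": total,
            "proportions": {lin: n / total for lin, n in counts[start].items()},
            # Few specimens means the proportions may shift as uploads arrive.
            "interpret_with_caution": total < min_specimens,
        }
    return summary


SUMMARY = summarize(RECORDS, origin=date(2024, 1, 1))
for start, row in SUMMARY.items():
    print(start, row)
```

In this invented data set the most recent interval contains only two specimens, so it is flagged for cautious interpretation, which mirrors the caveat about delayed uploads.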
Why are there differences between the results from GISAID and CDC?
Sequencing is a time-intensive activity. Sequencing results obtained on a given day are typically based on specimens taken in the days and weeks prior. For the most recent two-week period of sequence data from GISAID, the number of sequenced specimens may therefore be lower than the number of positive test results for that period, because sequence data will continue to be generated and submitted for those samples during the following weeks. The sequences already in the database will typically come from samples collected during the early portion of the two-week period. As more data are generated, the number of sequencing results for specimens from this same period increases, and the overall analysis becomes more stable and more representative of the true variant proportions that were circulating during that two-week period.
To address these lags in information for the most recent weeks, CDC uses a statistical model called Nowcast to project variant proportions for more recent timeframes. This model is subject to assumptions and additional statistical uncertainty. The New York State GISAID data directly reflect the frequency of different variants in a particular time period, with no statistical extrapolation. When a variant is increasing rapidly, these factors may cause short-term differences between the data from the two systems.
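The effect of submission lag can be sketched with a small Python example. It computes directly observed variant proportions for one two-week collection period at two different snapshot dates, counting only sequences already submitted by each date. This is an illustration under stated assumptions and is not the CDC Nowcast model or the Department's method: the record layout, the lineage labels "A" and "B", the dates, and the function name `proportions_as_of` are hypothetical, and the records are invented.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (collection date, submission date, lineage).
# Invented so that specimens collected later in the period, where the
# growing lineage "B" is more common, are also submitted later.
RECORDS = [
    (date(2024, 1, 2), date(2024, 1, 12), "A"),
    (date(2024, 1, 3), date(2024, 1, 13), "A"),
    (date(2024, 1, 4), date(2024, 1, 14), "A"),
    (date(2024, 1, 5), date(2024, 1, 15), "B"),
    (date(2024, 1, 10), date(2024, 1, 24), "B"),
    (date(2024, 1, 11), date(2024, 1, 25), "B"),
    (date(2024, 1, 12), date(2024, 1, 26), "B"),
    (date(2024, 1, 13), date(2024, 1, 27), "A"),
]


def proportions_as_of(records, period_start, period_end, snapshot):
    """Observed lineage proportions for a collection period, using only
    sequences submitted on or before the snapshot date."""
    counts = Counter(
        lineage
        for collected, submitted, lineage in records
        if period_start <= collected <= period_end and submitted <= snapshot
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing has been submitted yet for this period
    return {lineage: n / total for lineage, n in counts.items()}


PERIOD = (date(2024, 1, 1), date(2024, 1, 14))
EARLY = proportions_as_of(RECORDS, *PERIOD, snapshot=date(2024, 1, 15))
LATER = proportions_as_of(RECORDS, *PERIOD, snapshot=date(2024, 1, 31))
print("early snapshot:", EARLY)
print("later snapshot:", LATER)
```

In this invented example the early snapshot contains only specimens collected in the first days of the period, so the growing lineage appears less common than it does once the remaining sequences arrive. A direct count like this never extrapolates beyond submitted data, whereas a projection model trades that property for timeliness.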
What is the difference between the variant data on this page and on the wastewater variant data page?
This page reports on the prevalence of variants sequenced from clinical specimens, i.e., positive COVID-19 tests from testing at a medical facility. The wastewater variant page reports on variant levels detected through ongoing wastewater surveillance for COVID-19 by participating wastewater treatment facilities. The two are complementary ways of understanding variant prevalence across New York State.