The following are brief descriptions of the test data used in this package.
This file contains the complete set of data reported to the 122 Cities Mortality Reporting System, which was retired on 10/6/2016. Under this system, 122 cities across the United States reported the total number of death certificates processed and, by age group, the number of those for which pneumonia or influenza was listed as the underlying or contributing cause of death. A death was reported by the place of its occurrence and by the week that the death certificate was filed. Fetal deaths are not included. Since the system was retired, mortality has been monitored via a more general mechanism, described here:
> National Center for Health Statistics (NCHS) mortality surveillance data: NCHS collects death certificate data from state vital statistics offices for all deaths occurring in the United States. Pneumonia and influenza (P&I) deaths are identified based on ICD-10 multiple cause of death codes. NCHS surveillance data are aggregated by the week of death occurrence and as a result, P&I percentages based on the NCHS surveillance data are released two weeks after the week of death to allow for collection of enough data to produce a stable P&I percentage. The NCHS surveillance data based on P&I percentage for earlier weeks are continually revised and may increase or decrease as new and updated death certificate data are received from the states by NCHS. The seasonal baseline of P&I deaths is calculated using a periodic regression model that incorporates a robust regression procedure applied to data from the previous five years. An increase of 1.645 standard deviations above the seasonal baseline of P&I deaths is considered the “epidemic threshold,” i.e., the point at which the observed proportion of deaths attributed to pneumonia or influenza was significantly higher than would be expected at that time of the year in the absence of substantial influenza-related mortality.
https://www.cdc.gov/flu/weekly/overview.htm
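The threshold rule quoted above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the CDC description: the data are simulated, the model has a linear trend plus a single annual harmonic, ordinary least squares (`lm`) stands in for the robust procedure, and "standard deviations" is read as the standard deviation of the model residuals. The names `dat`, `pi_pct`, `baseline` and `threshold` are hypothetical.

```r
# Simulated weekly P&I percentages (NOT CDC data): seasonal cycle plus noise.
set.seed(42)
weeks  <- 1:260                          # roughly five years of weekly values
theta  <- 2 * pi * weeks / 52.18         # position in the annual cycle (radians)
pi_pct <- 7 + 1.2 * cos(theta) + rnorm(length(weeks), sd = 0.3)
dat    <- data.frame(week_index = weeks, theta = theta, pi_pct = pi_pct)

# Periodic regression: linear trend plus one annual harmonic.
# Assumption: ordinary least squares instead of a robust regression.
fit <- lm(pi_pct ~ week_index + sin(theta) + cos(theta), data = dat)

# Seasonal baseline and a threshold 1.645 residual SDs above it.
dat$baseline  <- fitted(fit)
resid_sd      <- sd(residuals(fit))
dat$threshold <- dat$baseline + 1.645 * resid_sd

# Flag weeks in which the observed percentage exceeds the threshold.
dat$epidemic <- dat$pi_pct > dat$threshold
head(dat[, c("week_index", "pi_pct", "baseline", "threshold", "epidemic")])
```

Because the simulated series contains no true epidemic signal, only a small share of weeks should be flagged here. With real surveillance data the baseline would normally be fitted in a way that limits the influence of epidemic weeks, which is what the robust procedure is for.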
cdc122 <- (flumodelr::cdc122city)
dim(cdc122)
#> [1] 346342 8
head(cdc122)
#> # A tibble: 6 x 8
#> region state city year week deaths_pnaflu deaths_allcause
#> <dbl> <chr> <chr> <int> <int> <int> <int>
#> 1 1 MA Bost~ 1962 1 11 262
#> 2 1 MA Bost~ 1962 2 11 270
#> 3 1 MA Bost~ 1962 3 5 237
#> 4 1 MA Bost~ 1962 4 12 285
#> # ... with 2 more rows, and 1 more variable: deaths_65older <int>
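As a small illustration of how these columns relate to the P&I percentage discussed above, the sketch below re-types the four displayed rows by hand and derives the share of deaths attributed to pneumonia or influenza. The data frame `shown` and the column `perc_pnaflu` are hypothetical names, not part of the package.

```r
# The four rows displayed above, entered by hand for illustration.
shown <- data.frame(
  year            = rep(1962L, 4),
  week            = 1:4,
  deaths_pnaflu   = c(11L, 11L, 5L, 12L),
  deaths_allcause = c(262L, 270L, 237L, 285L)
)

# Hypothetical derived column: P&I deaths as a percentage of all-cause deaths.
shown$perc_pnaflu <- 100 * shown$deaths_pnaflu / shown$deaths_allcause
round(shown$perc_pnaflu, 2)
```

The same expression could be applied to the full `cdc122` tibble, although weeks with zero all-cause deaths would need separate handling.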
Viral Surveillance — Data collection from both the U.S. World Health Organization (WHO) Collaborating Laboratories and National Respiratory and Enteric Virus Surveillance System (NREVSS) laboratories began during the 1997-98 season. During the 1997-98 season 43 state public health laboratories participated in surveillance, and by the 2004-05 season all state public health laboratories were participating in surveillance.
The CDC cautions against cross-sectional evaluations:
> The number of specimens tested and % positive rate vary by region and season based on different testing practices including triaging of specimens by the reporting labs, therefore it is not appropriate to compare the magnitude of positivity rates or the number of positive specimens between regions or seasons.
Regarding subtype:
> The U.S. WHO and NREVSS collaborating laboratories report the total number of respiratory specimens tested and the number positive for influenza types A and B each week to CDC. Most of the U.S. WHO collaborating laboratories also report the influenza A subtype (H1 or H3) of the viruses they have isolated, but the majority of NREVSS laboratories do not report the influenza A subtype.
For more information: http://www.cdc.gov/flu/weekly/overview.htm#Viral
nrevss <- (flumodelr::nrevss)
dim(nrevss)
#> [1] 14094 12
head(nrevss)
#> # A tibble: 6 x 12
#> state year week spec_tot spec_pos type_A_h1n1 type_A_h1 type_A_h3
#> <chr> <int> <int> <int> <dbl> <int> <int> <int>
#> 1 Alab~ 2010 40 54 0 0 0 0
#> 2 Alas~ 2010 40 40 0 0 0 0
#> 3 Ariz~ 2010 40 40 2.5 0 0 1
#> 4 Arka~ 2010 40 15 0 0 0 0
#> # ... with 2 more rows, and 4 more variables: type_A_np <int>,
#> # type_A_us <int>, type_B <int>, type_A_h3n2 <int>
Outpatient Illness Surveillance — Information on patient visits to health care providers for influenza-like illness is collected through the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet).
CDC cautions regarding cross-sectional comparisons:
> The number and percent of patients presenting with ILI each week will vary by region and season due to many factors, including having different provider type mixes (children present with higher rates of ILI than adults, and therefore regions with a higher percentage of pediatric practices will have higher numbers of cases). Therefore it is not appropriate to compare the magnitude of the percent of visits due to ILI between regions and seasons.
For more information, see: http://www.cdc.gov/flu/weekly/overview.htm#Outpatient
ilinet <- (flumodelr::ilinet)
dim(ilinet)
#> [1] 19556 7
head(ilinet)
#> # A tibble: 6 x 7
#> state year week ili_perc ili_tot providers patients
#> <chr> <int> <int> <dbl> <int> <int> <int>
#> 1 Alabama 2010 40 2.13 249 35 11664
#> 2 Alaska 2010 40 0.875 15 7 1714
#> 3 Arizona 2010 40 0.675 172 49 25492
#> 4 Arkansas 2010 40 0.696 18 15 2586
#> # ... with 2 more rows
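The displayed rows suggest that `ili_perc` is the number of ILI visits as a percentage of all patient visits. The sketch below checks that reading against the four rows shown above, re-typed by hand. This is an inference from the printed values, not a documented definition, and `ili_rows` and `recomputed` are hypothetical names.

```r
# The four rows displayed above, entered by hand for illustration.
ili_rows <- data.frame(
  state    = c("Alabama", "Alaska", "Arizona", "Arkansas"),
  ili_perc = c(2.13, 0.875, 0.675, 0.696),
  ili_tot  = c(249L, 15L, 172L, 18L),
  patients = c(11664L, 1714L, 25492L, 2586L)
)

# Recompute the percentage of visits due to ILI.
ili_rows$recomputed <- 100 * ili_rows$ili_tot / ili_rows$patients

# The tibble printout shows three significant digits, so compare at that precision.
all.equal(signif(ili_rows$recomputed, 3), ili_rows$ili_perc)
```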
The MMWR week is a week-numbering definition specific to the CDC. This definition is borne out in the ?cdc122city dataset.
The following explanation of the definition was obtained from: https://wwwn.cdc.gov/nndss/document/MMWR_Week_overview.pdf
“The MMWR week is the week of the epidemiologic year for which the National Notifiable Diseases Surveillance System (NNDSS) disease report is assigned by the reporting local or state health department for the purposes of MMWR disease incidence reporting and publishing. Values for MMWR week range from 1 to 53, although most years consist of 52 weeks.”
CDC business rules for assigning MMWR week:
“The first day of any MMWR week is Sunday. MMWR week numbering is sequential beginning with 1 and incrementing with each week to a maximum of 52 or 53. MMWR week #1 of an MMWR year is the first week of the year that has at least four days in the calendar year.”
This means that if Jan. 1 occurs on a Sunday, Monday, Tuesday or Wednesday, the calendar week that includes Jan. 1 is MMWR week #1. If Jan. 1 occurs on a Thursday, Friday, or Saturday, the calendar week that includes Jan. 1 is the last MMWR week of the previous year (#52 or #53). Because of this rule, December 29, 30, and 31 could potentially fall into MMWR week #1 of the following MMWR year.
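The rule can be sketched in base R. The key observation is that a Sunday-to-Saturday week has at least four days in the new year exactly when it contains Jan. 4, so MMWR week #1 starts on the Sunday on or before Jan. 4. The helpers `mmwr_year_start()` and `mmwr_week()` are hypothetical names written for this illustration and are not part of any package.

```r
# First day (Sunday) of MMWR week 1 for a given year.
mmwr_year_start <- function(year) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  # as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday
  jan4 - as.POSIXlt(jan4)$wday
}

# MMWR year and week for a vector of dates.
mmwr_week <- function(date) {
  date <- as.Date(date)
  year <- as.integer(format(date, "%Y"))
  # Late-December dates may already belong to week 1 of the next MMWR year
  year <- ifelse(date >= mmwr_year_start(year + 1L), year + 1L, year)
  # Early-January dates may still belong to the last week of the previous year
  year <- ifelse(date < mmwr_year_start(year), year - 1L, year)
  week <- as.integer(date - mmwr_year_start(year)) %/% 7L + 1L
  data.frame(date = date, mmwr_year = as.integer(year), mmwr_week = week)
}

# 2011-01-01 was a Saturday, so it belongs to the last week of MMWR year 2010;
# 2013-12-29 was a Sunday and already opens week 1 of MMWR year 2014.
mmwr_week(c("2011-01-01", "2011-01-02", "2013-12-29", "2014-12-31"))
```

This sketch only maps dates to weeks and makes no attempt to validate week numbers supplied by the user, for example week 53 in a 52-week year.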
The ?ilinet, ?cdc122city and ?nrevss datasets are all imported in their raw CDC format. It is the prerogative of the user to determine how best to treat the CDC MMWR week definition. However, we illustrate below how the above description was used to compute a first-day-of-week date variable for the example dataset ?fludta.
flumodelr::epiweek_dt
#> function (year, weeknum)
#> {
#> jan4 <- ymd(paste(year, 1, 4, sep = "-"))
#> DofW <- wday(jan4, week_start = 7) - 1
#> startweek <- if_else(DofW == 7, jan4, (jan4 - (DofW)))
#> d0 = startweek + (weeknum - 1) * 7
#> d1 = startweek + (weeknum - 1) * 7 + 6
#> return(list(d0 = ymd(d0), d1 = ymd(d1)))
#> }
#> <bytecode: 0x00000000246871b8>
#> <environment: namespace:flumodelr>
Alternative package which addresses this:
Xiahong Zhao (2016). EpiWeek: Conversion Between Epidemiological Weeks and Calendar Dates. R package version 1.1. https://CRAN.R-project.org/package=EpiWeek
Also see ?epiweek_dt()
Code adapted from the EpiWeek package.
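library(dplyr)      # %>%, mutate(), select(), distinct()
library(flumodelr)  # epiweek_dt()
# Add the first day (Sunday) of each MMWR week; epiweek_dt() returns list(d0, d1)
example <- cdc122 %>%
  mutate(FirstDateOfWeek = epiweek_dt(year, week)[[1]])
# Show the distinct year/week/date combinations after 2010
example %>%
  select(year, week, FirstDateOfWeek) %>%
  distinct() %>%
  dplyr::filter(year > 2010) %>%
  head()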
#> # A tibble: 6 x 3
#> year week FirstDateOfWeek
#> <int> <int> <date>
#> 1 2011 1 2011-01-02
#> 2 2011 2 2011-01-09
#> 3 2011 3 2011-01-16
#> 4 2011 4 2011-01-23
#> # ... with 2 more rows