Skip to contents

The calculate_prevalence_series() function calculates prevalence rates series based on the given diagnostic and demographic information. Use calculate_prevalence() function for only one time period or time point. Prevalence represents the number of cases of a given diagnosis that exist in a population of interest at a specified point or period in time.

Usage

calculate_prevalence_series(
  linked_data,
  time_points,
  id_col = "id",
  date_col = "date",
  pop_data,
  pop_col = "pop_count",
  grouping_vars = NULL,
  only_counts = FALSE,
  suppression = TRUE,
  suppression_threshold = 5,
  CI = TRUE,
  CI_level = 0.99,
  log_path = NULL
)

Arguments

linked_data

A data frame containing linked relevant diagnostic and demographic information.

time_points

A list containing either individual time points or time period (range).

  • For time points, each element of the list should be an individual year. For example, time_points = list(2012, 2013, 2014)

  • For time periods, each element of the list should have two years representing the range of years in the desired period. For example, time_points <- list(c(2012,2014), c(2014,2016), c(2016,2018))

id_col

A character string. Name of ID (unique personal identifier) column in linked_data. Default is "id".

date_col

A character string. Name of the date column in linked_data. Default is "date".

pop_data

A data frame containing corresponding population count information.

pop_col

A character string. Name of the column containing population counts in pop_data.

grouping_vars

Character vector (optional). Grouping variables for the aggregation of diagnostic counts (e.g. sex, education).

only_counts

Logical. Only want diagnostic counts? Default is FALSE.

  • If TRUE, return only counts.

suppression

Logical. Suppress results (counts and rates) in order to maintain statistical confidentiality? Default is TRUE.

  • If TRUE, applies primary suppression (NA) to any value under the threshold defined by suppression_threshold

suppression_threshold

Integer. Threshold used for suppression, default is set to 5 (NPR standard).

CI

Logical. Want to compute binomial confidence intervals? Default is TRUE.

  • If TRUE, add two new columns with the upper and lower CI bound with significance level defined by CI_level. Uses the Pearson-Klopper method.

CI_level

A numerical value between 0 and 1. Level for confidence intervals, default is set to 0.99

log_path

A character string. Path to the log file to append function logs. Default is NULL.

  • If NULL, a new directory /log and file is created in the current working directory.

Value

Prevalence series for specified time points/periods

Examples


log_file <- tempfile()
cat("Example log file", file = log_file)

pop_df <- tibble::tibble(year = c(2012:2020), population = floor(runif(9, min=3000, max=4000)))
linked_df <- linked_df |> dplyr::rename("year"= "y_diagnosis_first")

prevalence_df <- calculate_prevalence_series(linked_df,
  time_points = list(2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020),
  id_col = "id",
  date_col = "year",
  pop_data = pop_df,
  pop_col = "population",
  only_counts = FALSE,
  suppression = TRUE,
  suppression_threshold = 1,
  CI = TRUE,
  CI_level = 0.95,
  log_path = log_file)
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2012
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2013
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2014
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2015
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2016
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2017
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2018
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2019
#> Computing prevalence rates/counts...
#>  Suppressed counts using 1 threshold
#>  Removed 0 cells out of 1
#> Joining with `by = join_by(year)`
#> 
#>  Prevalence rates ready!
#> 
#> ── Summary ─────────────────────────────────────────────────────────────────────
#>  Diagnostic and demographic data: linked_data
#>  Population data: pop_data
#>  Grouped by variables: 
#>  For time point/period:  2020