Calculate incidence rates — calculate

The calculate_incidence() function calculates incidence rates based on the given diagnostic and demographic information. Incidence represents the number of new cases of a given diagnosis that exist in a population of interest at a specified point or period in time.

Usage

calculate_incidence(
  linked_data,
  type = c("cumulative", "rate"),
  id_col = "id",
  date_col = "date",
  pop_data = NULL,
  pop_col = "pop_count",
  person_time_data = NULL,
  person_time_col = NULL,
  time_p = NULL,
  grouping_vars = NULL,
  only_counts = FALSE,
  suppression = TRUE,
  suppression_threshold = 5,
  CI = TRUE,
  CI_level = 0.99,
  log_path = NULL
)

Arguments

linked_data

A data frame containing linked relevant diagnostic and demographic information. Should include only first time diagnosis, see 'curate_diag'

type

Character string. Valid options are "cumulative" or "rate".

id_col

A character string. Name of ID (unique personal identifier) column in linked_data. Default is "id".

date_col

A character string. Name of the date column in linked_data. Default is "date".

pop_data

A data frame containing corresponding population at risk information.

pop_col

A character string. Name of the column containing population counts in pop_data.

person_time_data

A data frame containing corresponding person-time information.

person_time_col

A character string. Name of the column containing person-time counts in person_time_data.

time_p

A numeric value or numeric vector. Time point or time period used to calculate the incidence.

For time period, specify as a range. The first value of the vector is the period's lower bound, and the second element is the period's upper bound. Example: time_p = c(2010,2015)
For time point, single numeric value. Example: time_p = 2010

grouping_vars

Character vector (optional). Grouping variables for the aggregation of diagnostic counts (e.g. sex, education).

only_counts

Logical. Only want diagnostic counts? Default is FALSE.

If TRUE, return only counts.

suppression

Logical. Suppress results (counts and rates) in order to maintain statistical confidentiality? Default is TRUE.

If TRUE, applies primary suppression (NA) to any value under the threshold defined by suppression_threshold

suppression_threshold

Integer. Threshold used for suppression, default is set to 5 (NPR standard).

CI

Logical. Want to compute binomial confidence intervals? Default is TRUE.

If TRUE, add two new columns with the upper and lower CI bound with significance level defined by CI_level. Uses the Pearson-Klopper method.

CI_level

A numerical value between 0 and 1. Level for confidence intervals, default is set to 0.99

log_path

A character string. Path to the log file to append function logs. Default is NULL.

If NULL, a new directory /log and file is created in the current working directory.

Value

Incidence table

Examples

log_file <- tempfile()
cat("Example log file", file = log_file)

pop_df <- tibble::tibble(year = "2012-2013", population = 4500)
linked_df <- linked_df |> dplyr::rename("year"= "y_diagnosis_first")

incidence_df <- calculate_incidence(linked_df,
  type = "cumulative",
  id_col = "id",
  date_col = "year",
  pop_data = pop_df,
  pop_col = "population",
  time_p = c(2012,2013),
  only_counts = FALSE,
  suppression = TRUE,
  suppression_threshold = 10,
  log_path = log_file)
#> 
#> ! To correctly calculate incidence rates, the provided dataset should only contain new/first time diagnoses.
#> 
#> 
#> ✔ Suppressed counts using 10 threshold
#> ℹ Removed 0 cells out of 1
#> 
#> Joining with `by = join_by(year)`
#> 
#> ✔ Cumulative incidence ready