Other useful functions • regtools

This vignette describes some additional functions in regtools that can be useful when working with Norwegian registry data. Note that get_population_ssb() requires an internet connection and therefore only works outside of TSD (Services for sensitive data).

Harmonize municipality codes

In general terms, Norway is administratively divided into two main levels: counties (fylker) and municipalities (kommuner). Over the past 10 years, Norway has undergone a municipal structural reform aimed at building larger local governments. Since 2016, several Norwegian municipalities have merged with each other, and some have changed counties. The number of municipalities was progressively reduced from 428 to 357 between 2017 and 2020. Similarly, the number of counties was reduced from 19 in 2018 to 11 in 2020, before increasing again to 15 in 2024.

The constant changes in the administrative structure of Norway exemplify common challenges of working with registry data. For instance, the merging of municipalities and counties hinders analyses with geographical components, as the population composition of the different regions in Norway has changed significantly. This problem is especially evident in longitudinal analyses, where changes in the classification system can limit the ability to compare different time periods.

One way of ensuring comparability between different time points is to use harmonized classifications. As most of the changes in the structural reforms are mergers of municipalities or counties, it is possible to determine which merged municipality in 2024 a given municipality code from 2010 belongs to. However, for municipalities that split up over the years, it is not possible to know exactly what their equivalent would be. For important time series, Statistics Norway currently publishes results using these harmonized classifications.
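The idea of a harmonized classification can be illustrated with a small lookup table that maps old codes to their merged successors. The sketch below is only an illustration in base R: the crosswalk, the codes ("A01", "N10", and so on) and the data frame names are all made up and do not reflect real municipality codes or the internal implementation of regtools.

```r
# Hypothetical crosswalk: two old municipalities (A01, A02) merged into N10,
# while B01 was simply renumbered to N20. All codes are invented.
crosswalk <- data.frame(
  old_code        = c("A01", "A02", "B01"),
  harmonized_code = c("N10", "N10", "N20"),
  stringsAsFactors = FALSE
)

# Hypothetical individual-level data with the old codes
residents <- data.frame(
  id       = c("p1", "p2", "p3", "p4"),
  old_code = c("A01", "A02", "B01", "Z99"),
  stringsAsFactors = FALSE
)

# Look up each old code in the crosswalk; match() preserves the row order
# and returns NA for codes that are not in the crosswalk (here "Z99")
residents$harmonized_code <- crosswalk$harmonized_code[
  match(residents$old_code, crosswalk$old_code)
]

residents
```

Mergers map many old codes to one new code, so the lookup is unambiguous. A split would need one old code to map to several new codes, which a simple lookup like this cannot resolve.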

For example, imagine you have individual-level data for your population of interest, including place of residence in 2017 and 2024, and you want to determine whether the prevalence of a certain disease has changed significantly between 2017 and 2024 by place of residence. The first step would be to ensure that all the codes and classifications in your data have stayed consistent and are comparable between 2017 and 2024. The function harmonize_municipality_codes() helps researchers harmonize municipality codes (between 1994 and 2024):


# Silenced CLI output for this example
simulated_list <- regtools::synthetic_data(
  population_size = 100,
  prefix_ids = "P000",
  length_ids = 6,
  family_codes = c("F45", "F84"),
  pattern = "increase",
  prevalence = .023,
  diag_years  = 2017,
  sex_vector = c(0,1),
  y_birth = c(2010:2018),
  filler_codes = "F",
  filler_y_birth = c(2000:2009),
  invariant_codes = list("innvandringsgrunn" = c("ARB", "NRD", "UKJ")),
  invariant_codes_filler = list("innvandringsgrunn" = c("FAMM", "UTD")),
  varying_query = "kommuner",
  date_classifications = "2017-01-01", 
  seed = 123
)

residence_df <- simulated_list$datasets$var_df

residence_df then includes a column, varying_code, with the municipality codes in 2017.


head(residence_df)
#> # A tibble: 6 × 3
#>   id         year_varying varying_code
#>   <chr>             <dbl> <chr>       
#> 1 P000025558         2017 5.1         
#> 2 P000037543         2017 5           
#> 3 P000043041         2017 3           
#> 4 P000053240         2017 5.1         
#> 5 P000090076         2017 5.2         
#> 6 P000117065         2017 4
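Norwegian municipality codes are four-digit strings (for example "0301" for Oslo), so before harmonizing it can be worth checking that the code column has that shape. The following is a minimal base R sketch on a made-up vector; the object names codes and looks_valid are illustrative and not part of regtools.

```r
# Toy vector mixing well-formed codes with values that are not four digits
codes <- c("0301", "5.1", "5", "1103")

# TRUE when the value consists of exactly four digits
looks_valid <- grepl("^[0-9]{4}$", codes)

# How many values pass the check, and which ones fail
table(looks_valid)
codes[!looks_valid]
```

Values that fail such a check cannot be matched against a municipality classification and would end up as NA after harmonization.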

Using harmonize_municipality_codes(), you can get the harmonized codes (2024 standard), the harmonized names and the corresponding counties in 2024.

residence_df_harmonized <- regtools::harmonize_municipality_codes(
  data = residence_df,
  municipality_col = "varying_code", 
  fylke = TRUE)
#> ! NAs in municipality code column in residence_df: 0
#> ────────────────────────────────────────────────────────────────────────────────
#> ✔ Successfully matched old municipality codes with harmonized municipality codes
#> ℹ Total matched rows: 0

head(residence_df_harmonized)
#> # A tibble: 6 × 7
#>   id        year_varying varying_code harmonized_code harmonized_name fylke_code
#>   <chr>            <dbl> <chr>        <chr>           <chr>           <chr>     
#> 1 P0000255…         2017 5.1          NA              NA              NA        
#> 2 P0000375…         2017 5            NA              NA              NA        
#> 3 P0000430…         2017 3            NA              NA              NA        
#> 4 P0000532…         2017 5.1          NA              NA              NA        
#> 5 P0000900…         2017 5.2          NA              NA              NA        
#> 6 P0001170…         2017 4            NA              NA              NA        
#> # ℹ 1 more variable: fylke_name <chr>
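After harmonizing, it is useful to look at the rows that did not receive a harmonized code. The sketch below uses a small made-up data frame that only mimics the varying_code and harmonized_code columns shown above; the values are toy values, not output from regtools.

```r
# Toy data frame imitating the structure of the harmonized output
harmonized <- data.frame(
  id              = c("p1", "p2", "p3"),
  varying_code    = c("0301", "5.1", "1103"),
  harmonized_code = c("0301", NA, "1103"),
  stringsAsFactors = FALSE
)

# Rows without a harmonized code
unmatched <- harmonized[is.na(harmonized$harmonized_code), ]

# Number of unmatched rows and the original codes behind them
n_unmatched <- nrow(unmatched)
unmatched_codes <- unique(unmatched$varying_code)

n_unmatched
unmatched_codes
```

Listing the distinct unmatched codes makes it easier to tell whether they are malformed values or codes missing from the classification.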

Get population SSB

get_population_ssb() is a wrapper around the function ApiData() from the package PxWebApiData. The main goal of this function is to facilitate retrieving population information from Statistics Norway and performing some handy operations, such as aggregating by age or sex. As mentioned before, this function requires an internet connection and will most likely not work inside a secure environment (like TSD).

For example, to get the population aged 10-15 in 2020 and 2021 for every county in Norway, using the harmonized codes mentioned previously:

population_fylke <- get_population_ssb(
  regions = "fylker", 
  years = c(2020, 2021), 
  ages = c(10:15),
  aggregate_age = TRUE,
  by_sex = TRUE,
  save_xslx = FALSE)
#> ℹ Retrieving population of fylker for the years: 2020,2021, and ages: 10,11,12,13,14,15
#> ℹ Aggregating ages...
#> ✔ Population dataset ready!

head(population_fylke, 10)
#> # A tibble: 10 × 7
#>    region_code sex_code age   year  population region_name   sex_value
#>    <chr>       <chr>    <chr> <chr>      <int> <chr>         <chr>    
#>  1 03          1        010   2020        3742 Oslo - Oslove Males    
#>  2 03          1        011   2020        3678 Oslo - Oslove Males    
#>  3 03          1        012   2020        3645 Oslo - Oslove Males    
#>  4 03          1        013   2020        3365 Oslo - Oslove Males    
#>  5 03          1        014   2020        3291 Oslo - Oslove Males    
#>  6 03          1        015   2020        3249 Oslo - Oslove Males    
#>  7 03          1        Total 2020       20970 Oslo - Oslove Males    
#>  8 03          1        010   2021        3687 Oslo - Oslove Males    
#>  9 03          1        011   2021        3702 Oslo - Oslove Males    
#> 10 03          1        012   2021        3658 Oslo - Oslove Males
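Because by_sex = TRUE returns one row per sex, a common follow-up is to sum the "Total" age rows across sexes to get one population figure per region and year. The following is a minimal base R sketch on a made-up data frame that copies only the column names from the output above; the population numbers are invented and the object names toy_population and both_sexes are illustrative.

```r
# Toy data with the same column names as the output above (invented numbers)
toy_population <- data.frame(
  region_code = "03",
  sex_code    = c("1", "2", "1", "2", "1", "2"),
  age         = c("Total", "Total", "Total", "Total", "010", "010"),
  year        = c("2020", "2020", "2021", "2021", "2020", "2020"),
  population  = c(100, 110, 120, 130, 40, 45),
  stringsAsFactors = FALSE
)

# Keep only the aggregated age rows, then sum across sexes
totals <- toy_population[toy_population$age == "Total", ]
both_sexes <- aggregate(population ~ region_code + year, data = totals, FUN = sum)

both_sexes
```

Filtering on age == "Total" first avoids double counting, since the single-age rows are already included in the aggregated rows.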