Prevalence models in health science

I divide generic prediction models applied in health science and administration into two main groups. The first group consists of models based on general activity measures such as the number of hospitalizations, length of stay (LOS), number of visits, cost, diagnosis groups, age, geography and other background information. The second, and neglected, group consists of models based on the prevalence of specific activity measures that are common to a substantial part of the population in question. Prediction models in health take advantage of the RFM-I methodology from market analysis, which I have mentioned previously in posts on SAS macros on this blog. Below I discuss the simplicity of prevalence models.
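In market analysis, RFM stands for recency, frequency and monetary value. As a rough illustration of the first group of models, the base-R sketch below derives RFM-style features per patient from contact records. The toy data, its columns `date` and `cost`, and the function `rfm_features` are hypothetical; only the key `V_CPR` comes from this post, and the extra "I" component of RFM-I is not modelled.

```r
# Hypothetical contact records: one row per hospital contact (illustrative data).
contacts <- data.frame(
  V_CPR = c("A", "A", "B", "C", "C", "C"),
  date  = as.Date(c("2019-01-10", "2019-06-01", "2018-11-20",
                    "2019-03-05", "2019-05-15", "2019-06-20")),
  cost  = c(1200, 800, 5000, 300, 450, 600),
  stringsAsFactors = FALSE
)

# RFM-style features per patient, using only contacts before evaldate.
rfm_features <- function(contacts, evaldate) {
  evaldate <- as.Date(evaldate)
  past <- contacts[contacts$date < evaldate, ]
  ids  <- sort(unique(past$V_CPR))
  data.frame(
    V_CPR     = ids,
    # Recency: days from the patient's most recent contact to evaldate
    recency   = vapply(ids, function(id) {
      as.numeric(evaldate - max(past$date[past$V_CPR == id]), units = "days")
    }, numeric(1), USE.NAMES = FALSE),
    # Frequency: number of contacts
    frequency = as.vector(table(past$V_CPR)[ids]),
    # Monetary: total cost of the contacts
    monetary  = as.vector(tapply(past$cost, past$V_CPR, sum)[ids]),
    stringsAsFactors = FALSE
  )
}

rfm <- rfm_features(contacts, "2019-07-01")
print(rfm)
```

Each patient ends up as one row, which is the shape a regression or machine-learning model expects as input.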

Prevalence models have my special attention as a pivot for machine learning and deep learning models. A prevalence model includes indicators for activity that is common among at least 1%, 5% or 10% of a population, e.g. diagnoses, operations and procedures recorded for at least 1% of the patients on a ward within a look-back period ranging from months to years. Background information on age, gender, geography, total cost etc. may be added. Furthermore, and more importantly, a clinical specialist may request the addition or exclusion of specific operations and procedures, and may have other demands for quantitative measures mirroring the clinical development program of a specialization. Prevalence models therefore offer a very flexible modelling framework for quality analysis and decision-support tools in the clinic.
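A minimal sketch of the code-selection step, assuming records are stored with one row per patient and code. A code enters the model when the share of distinct patients carrying it reaches the chosen threshold (e.g. 0.01, 0.05 or 0.10). Clinician-requested codes can then be forced in or out. The function name `select_prevalent_codes`, its `include`/`exclude` arguments and the toy data are illustrative assumptions, not the author's code.

```r
# Illustrative records: one row per patient and registered code.
records <- data.frame(
  V_CPR = c("A", "A", "B", "B", "C", "D", "D", "E"),
  code  = c("DI10", "DE11", "DI10", "KJFB", "DI10", "DE11", "DJ44", "DJ44"),
  stringsAsFactors = FALSE
)

# Keep codes carried by at least `threshold` (a share, e.g. 0.05) of the patients,
# then apply clinician-requested inclusions and exclusions.
select_prevalent_codes <- function(records, threshold,
                                   include = character(0), exclude = character(0)) {
  n_patients <- length(unique(records$V_CPR))
  # Count each patient at most once per code
  pairs <- unique(records[, c("V_CPR", "code")])
  prevalence <- table(pairs$code) / n_patients
  selected <- names(prevalence)[prevalence >= threshold]
  sort(setdiff(union(selected, include), exclude))
}

select_prevalent_codes(records, threshold = 0.3)
# -> "DE11" "DI10" "DJ44"
select_prevalent_codes(records, threshold = 0.3, include = "KJFB", exclude = "DJ44")
# -> "DE11" "DI10" "KJFB"
```

With real ward data the threshold would be one of the small shares mentioned above; the toy data uses 0.3 only because it has five patients.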

In the example below I define a population of patients visiting a ward within a particular month. I then add information on their activity patterns from the LPR (the Danish National Patient Register) over a two-year look-back period, together with information on whether they are acutely hospitalized within the following month. Indicators are defined using a short dummy-variable coding function and aggregated with ML techniques. The R function uses the key variable V_CPR and needs to be adapted before it can be applied in other settings…
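The cohort construction can be sketched in base R as follows. The data layout (one row per contact with columns `date`, `ward` and a logical `acute`), the function `build_cohort`, the 730-day look-back and the choice to count prior contacts as the only look-back feature are assumptions for illustration; in practice the records would come from the LPR extraction.

```r
# Illustrative contact records with an acute-admission flag.
visits <- data.frame(
  V_CPR = c("A", "A", "B", "B", "C", "C", "D"),
  date  = as.Date(c("2019-06-05", "2019-07-20", "2019-06-12", "2018-01-03",
                    "2019-05-30", "2019-06-25", "2019-07-02")),
  ward  = c("W1", "W1", "W1", "W2", "W2", "W1", "W1"),
  acute = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

build_cohort <- function(visits, ward, month_start, lookback_days = 730) {
  start    <- as.Date(month_start)
  next_mon <- seq(start, by = "month", length.out = 2)[2]  # first day of the following month
  after    <- seq(start, by = "month", length.out = 3)[3]  # first day of the month after that
  lb_start <- start - lookback_days                         # start of the look-back window

  # Population: patients with at least one contact on the ward in the index month
  in_month <- visits$ward == ward & visits$date >= start & visits$date < next_mon
  ids <- sort(unique(visits$V_CPR[in_month]))

  # Activity in the look-back window before the index month (any ward)
  prior <- visits[visits$date >= lb_start & visits$date < start, ]
  # Outcome: any acute contact during the following month
  outcome <- visits[visits$acute & visits$date >= next_mon & visits$date < after, ]

  data.frame(
    V_CPR      = ids,
    n_prior    = vapply(ids, function(id) sum(prior$V_CPR == id), integer(1),
                        USE.NAMES = FALSE),
    acute_next = ids %in% outcome$V_CPR,
    stringsAsFactors = FALSE
  )
}

cohort <- build_cohort(visits, ward = "W1", month_start = "2019-06-01")
print(cohort)
```

Patient D is acutely admitted in July but is not part of the June population, so D does not appear in the cohort.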

dummyl <- function(data, varname, vallevels, datevar, evaldate) {
  data <- data[trimws(data[[varname]]) %in% vallevels, ]  # rows with a requested code
  for (i in seq_along(vallevels)) {
    # remainder of the function omitted
  }
}

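As a hypothetical illustration of a dummy-variable coding step taking the same arguments as `dummyl`, the sketch below creates one 0/1 column per requested code. Each column is set to 1 when the patient (keyed by `V_CPR`) has that code within the look-back window before `evaldate`. The function name `make_indicators`, the `lookback_days` default of 730 (about two years), the column naming scheme and the toy data are assumptions.

```r
# Illustrative diagnosis records; note the trailing blank in the first code.
dx <- data.frame(
  V_CPR = c("A", "A", "B", "C", "C"),
  diag  = c("DI10 ", "DE11", "DI10", "DE11", "DJ44"),
  dato  = as.Date(c("2018-03-01", "2016-01-01", "2019-01-15",
                    "2019-05-01", "2019-02-01")),
  stringsAsFactors = FALSE
)

# Hypothetical dummy-variable coder: one 0/1 column per code in vallevels,
# equal to 1 if the patient has that code within lookback_days before evaldate.
make_indicators <- function(data, varname, vallevels, datevar, evaldate,
                            lookback_days = 730) {
  evaldate  <- as.Date(evaldate)
  codes     <- trimws(data[[varname]])
  dates     <- as.Date(data[[datevar]])
  in_window <- dates < evaldate & dates >= evaldate - lookback_days
  ids <- sort(unique(data$V_CPR))
  out <- data.frame(V_CPR = ids, stringsAsFactors = FALSE)
  for (lv in vallevels) {
    who <- data$V_CPR[codes == lv & in_window]
    out[[paste0(varname, "_", lv)]] <- as.integer(ids %in% who)
  }
  out
}

ind <- make_indicators(dx, "diag", c("DI10", "DE11"), "dato", "2019-06-01")
print(ind)
```

Every patient in the input keeps a row, so patients without any of the selected codes get zeros rather than disappearing from the model matrix.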
A 200-line script generates a fairly good raw prevalence model for predicting acute hospitalization, with an AUC above 0.92. In other words, the probability of correctly ordering a random pair of patients, one who is acutely hospitalized and one who is not, by their estimated risk of acute hospitalization is above 0.92. Least squares and subsequently logistic regression make a solid foundation for a stable and adjustable prediction model.
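The pairwise reading of the AUC can be checked directly. This minimal sketch compares every (hospitalized, not hospitalized) pair of patients and counts how often the hospitalized patient has the higher estimated risk, with ties counted as one half. This count divided by the number of pairs equals the Mann-Whitney U statistic scaled to [0, 1]. The function name `pairwise_auc` and the toy risks are illustrative.

```r
# AUC as the share of (case, non-case) pairs ranked correctly by estimated risk;
# tied risks count as half a correct ordering.
pairwise_auc <- function(risk, outcome) {
  pos <- risk[outcome == 1]
  neg <- risk[outcome == 0]
  diffs <- outer(pos, neg, "-")  # one entry per (case, non-case) pair
  (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)
}

risk    <- c(0.90, 0.80, 0.35, 0.60, 0.20, 0.10)  # illustrative predicted risks
outcome <- c(1,    1,    1,    0,    0,    0)     # 1 = acutely hospitalized
pairwise_auc(risk, outcome)  # 8 of 9 pairs ordered correctly: 0.8888889
```

The pair matrix grows with the product of the numbers of cases and non-cases, so for a large cohort a rank-based formula of the same statistic is the more practical choice.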

# Example of usage: generating indicators for a 5% prevalence model, used to accumulate measures in the regression analysis

Data extraction and manipulation use SQL and the R packages RODBC, tidyr, stringr and dplyr. Estimation requires only base R functions and GLM modelling.
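How least squares and logistic regression are combined is not spelled out here. One plausible reading, sketched as an assumption on simulated data, uses an ordinary least-squares fit (`lm`, a linear probability model) as a quick screen of the indicators, then fits a logistic regression (`glm` with `family = binomial`) on the retained terms. The indicator names, the 0.05 screening cut-off and the simulated effect sizes are all hypothetical.

```r
set.seed(1)
n <- 2000
# Simulated data: hypothetical 0/1 prevalence indicators plus age.
dat <- data.frame(
  ind1 = rbinom(n, 1, 0.10),
  ind2 = rbinom(n, 1, 0.05),
  ind3 = rbinom(n, 1, 0.20),  # simulated with no effect on the outcome
  age  = round(runif(n, 20, 90))
)
dat$acute <- rbinom(n, 1, plogis(-4 + 1.5 * dat$ind1 + 2 * dat$ind2 + 0.03 * dat$age))

# Step 1: least squares (linear probability model) as a quick screen.
ls_fit <- lm(acute ~ ind1 + ind2 + ind3 + age, data = dat)
p_vals <- summary(ls_fit)$coefficients[-1, "Pr(>|t|)"]
keep   <- names(p_vals)[p_vals < 0.05]
if (length(keep) == 0) keep <- names(p_vals)  # fall back to all terms

# Step 2: logistic regression on the retained terms.
lr_fit <- glm(reformulate(keep, response = "acute"), family = binomial, data = dat)
dat$risk <- predict(lr_fit, type = "response")  # estimated probability of acute admission
summary(lr_fit)
```

The predicted probabilities in `dat$risk` are what an AUC calculation would rank; refitting only the second step is cheap when a clinician adds or removes indicators.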

©2020 by Danish Institute for Data Science.