Overview and disclaimers

The development version of runjags::template_huiwalter contains some useful modifications/extra features (as discussed at the 2024 SBED conference). This brief guide illustrates how to use the features in their current partly finished state. This guide was updated on 2024-03-13 to indicate which features are available in the current CRAN version of runjags (2.2.2-4, released 2024-03-10) and which require the development version (≥ 2.2.3).

DISCLAIMER: these features are in development, so the code WILL CHANGE before the final version has been released. There may also be things that don’t work as expected. So please don’t put this into production code just yet…

Installing compilers

For the experimental features, you will need to install the development version of runjags from source - this requires C++ compilers. See the relevant platform-specific instructions below:

Windows

You will need to install Rtools from here: https://cran.r-project.org/bin/windows/Rtools/rtools43/rtools.html

You will also need to set JAGS_ROOT in your Makevars.win file - see this file for more instructions: https://cran.r-project.org/web/packages/runjags/INSTALL

Note that your HOME folder is the default starting directory for an R session - this is NOT the same as R.home. It will typically be C:/Users/your_user_name

Note also that the JAGS_ROOT path must match your installed version - to see what this is you can use:

runjags::findjags()

Something like the following code might work (untested!):

# Change this:
HOME <- "C:/Users/your_user_name"
# And maybe also change this:
jags_path <- "C:/Progra~1/JAGS/JAGS-4.3.1"

stopifnot(dir.exists(HOME))
if(!dir.exists(file.path(HOME, ".R"))) dir.create(file.path(HOME, ".R"))
cat("JAGS_ROOT=", jags_path, " \n", sep="", file=file.path(HOME, ".R", "Makevars.win"), append=TRUE)

macOS

You need to install the command line tools - to do this open a terminal (e.g. Applications/Utilities/Terminal) and type:

xcode-select --install

And follow the instructions. That should be all you need, unless you have a non-standard installation of JAGS (e.g. via homebrew), in which case you will also need to set up pkg-config to point to your JAGS installation.

Linux

Your OS already includes compilers and pkg-config, so you don’t need to do anything!

Install runjags

You can install runjags from GitHub using the remotes package (which you may need to install if you don’t already have it). The code is:

remotes::install_github("ku-awdc/runjags")

This will take longer than usual, and will be more verbose. Look out for errors - these will most likely be due to problems with JAGS_ROOT on Windows!

As this is the development version, updates to the package will occur much more frequently than for the stable version on CRAN - so it is worth re-running this every few weeks to see if anything new is available (and so that you get bug fixes etc).

Install tidyverse and pbapply

The code below also requires the tidyverse and pbapply packages, which you can install from CRAN.

library("runjags")
## Attaching runjags (version 2.2.3-8) and setting user-specified options
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks runjags::extract()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("pbapply")

Introduction

The newer versions of template_huiwalter have some new features and arguments. In particular, constraints are imposed to ensure both that conditional dependence terms do not invalidate probabilities and that Youden’s J (sensitivity+specificity-1) is >= 0 (as discussed at the SBED conference in January). Other features that you might be interested in are:

covariance: this can now be specified as a data frame for specific covariance terms to include

se_priors and sp_priors: these can now be a list of values for different tests/populations, including a default value for anything not specifically named

prev_priors: a new option that works in a similar way to se_priors/sp_priors

specify_populations: this facilitates cross-validation by dropping a population at a time without having to re-generate the model

single_check, agreement_check and outcome_check: different mechanisms of doing model validation (as discussed at the SBED conference) - except for outcome_check these are DEVELOPMENT VERSION ONLY (and undocumented).

ppp-values: this allows post-hoc calculation of Se/Sp by population, as well as ROC curves. IMPORTANT NOTE: this does not currently work as expected when there are correlation terms included, and/or with one or more missing test combinations - DEVELOPMENT VERSION ONLY (and undocumented).

You will also want to make sure that you have your original data in long format, and including SampleID, TestName and Results columns, to enable the ppp features (also, the Results column currently needs to be a factor with two levels, i.e. it won’t work with a logical vector).

Dataset format

This is an example of long format data:

N <- 600
tibble(
    Population = rep(LETTERS[1:3], each=N/3),
    status = rbinom(N, 1, rep(c(0.25,0.5,0.75), each=N/3)),
    FirstTest = rbinom(N, 1, status*0.75 + (1-status)*0.05),
    SecondTest = rbinom(N, 1, status*0.75 + (1-status)*0.05),
    ThirdTest = rbinom(N, 1, status*0.75 + (1-status)*0.05)
) |>
  mutate(SampleID = row_number()) |>
  select(-status) |>
  pivot_longer(ends_with("Test"), names_to="TestName", values_to="Result") |>
  mutate(Result = factor(Result, levels=c(0,1), labels=c("Neg","Pos"))) ->
  long_data

Annoyingly, you (currently) still need to pass this to template_huiwalter as wide data - fortunately it is very easy to convert long data to wide data (and back again):

long_data |>
  pivot_wider(names_from="TestName", values_from="Result") ->
  wide_data

The development version of the function now returns (invisibly) some useful things that we need to save:

indexes <- template_huiwalter(
  wide_data,
  # You can use this code to deactivate correlation terms completely:
  covariance = tribble(~Test_A, ~Test_B, ~Active, ),
  # Required for post-hoc estimations:
  ppp_values = TRUE,
  # New model fit metrics:
  single_check = TRUE,
  agreement_check = TRUE,
  outcome_check = TRUE
)
## The model and data have been written to huiwalter_model.txt in the current working directory
## *** NOTE: The template provided should be examined and modified (including checking ***
## ***   priors, covariance terms, and verifying the code) before running the model!   ***

Have a look at the model code - a few things have changed compared to before…

## Auto-generated Hui-Walter model created by runjags version 2.2.3-8 on 2024-03-13

model{

    ## Observation layer:

    # Complete observations (N=600):
    for(p in PopulationsUsing){
        Tally_RRR[1:8,p] ~ dmulti(prob_RRR[1:8,p], N_RRR[p])
        prob_RRR[1:8,p] <- (prev[p] * se_prob[1:8,p]) + ((1-prev[p]) * sp_prob[1:8,p])
    sim_RRR[1:8,p] ~ dmulti(prob_RRR[1:8,p], N_RRR[p])
            check_outcome_RRR[1:8,p] <- Tally_RRR[1:8,p] - sim_RRR[1:8,p]
        }


    ## Observation probabilities:

    for(p in PopulationsUsing){

        # Probability of observing FirstTest- SecondTest- ThirdTest- from a true positive:
        se_prob[1,p] <- (1-se[1])*(1-se[2])*(1-se[3])
        # Probability of observing FirstTest- SecondTest- ThirdTest- from a true negative:
        sp_prob[1,p] <- sp[1]*sp[2]*sp[3]

        # Probability of observing FirstTest+ SecondTest- ThirdTest- from a true positive:
        se_prob[2,p] <- se[1]*(1-se[2])*(1-se[3])
        # Probability of observing FirstTest+ SecondTest- ThirdTest- from a true negative:
        sp_prob[2,p] <- (1-sp[1])*sp[2]*sp[3]

        # Probability of observing FirstTest- SecondTest+ ThirdTest- from a true positive:
        se_prob[3,p] <- (1-se[1])*se[2]*(1-se[3])
        # Probability of observing FirstTest- SecondTest+ ThirdTest- from a true negative:
        sp_prob[3,p] <- sp[1]*(1-sp[2])*sp[3]

        # Probability of observing FirstTest+ SecondTest+ ThirdTest- from a true positive:
        se_prob[4,p] <- se[1]*se[2]*(1-se[3])
        # Probability of observing FirstTest+ SecondTest+ ThirdTest- from a true negative:
        sp_prob[4,p] <- (1-sp[1])*(1-sp[2])*sp[3]

        # Probability of observing FirstTest- SecondTest- ThirdTest+ from a true positive:
        se_prob[5,p] <- (1-se[1])*(1-se[2])*se[3]
        # Probability of observing FirstTest- SecondTest- ThirdTest+ from a true negative:
        sp_prob[5,p] <- sp[1]*sp[2]*(1-sp[3])

        # Probability of observing FirstTest+ SecondTest- ThirdTest+ from a true positive:
        se_prob[6,p] <- se[1]*(1-se[2])*se[3]
        # Probability of observing FirstTest+ SecondTest- ThirdTest+ from a true negative:
        sp_prob[6,p] <- (1-sp[1])*sp[2]*(1-sp[3])

        # Probability of observing FirstTest- SecondTest+ ThirdTest+ from a true positive:
        se_prob[7,p] <- (1-se[1])*se[2]*se[3]
        # Probability of observing FirstTest- SecondTest+ ThirdTest+ from a true negative:
        sp_prob[7,p] <- sp[1]*(1-sp[2])*(1-sp[3])

        # Probability of observing FirstTest+ SecondTest+ ThirdTest+ from a true positive:
        se_prob[8,p] <- se[1]*se[2]*se[3]
        # Probability of observing FirstTest+ SecondTest+ ThirdTest+ from a true negative:
        sp_prob[8,p] <- (1-sp[1])*(1-sp[2])*(1-sp[3])

        # Ensure that the covariance terms do not result in an invalid probability:
        for(c in 1:8){
            AcceptProb[c,p] ~ dbern(ifelse(se_prob[c,p] >= 0 && se_prob[c,p] <= 1 && sp_prob[c,p] >= 0 && sp_prob[c,p], 1, 0))
        }

        # Calculate PPP-values for external assessment of additional tests:
        for(c in 1:8){
            # The probability of being positive given this combination of test results
            # (simple application of Bayes theorem anagolous to PPV = TP / (TP + FP))
            se_ppp[c,p] <- prev[p] * se_prob[c,p]
            sp_ppp[c,p] <- (1-prev[p]) * sp_prob[c,p]
            ppp[c,p] <- se_ppp[c,p] / (se_ppp[c,p] + sp_ppp[c,p])
        }

    }


    ## Model fit checks:

    for(p in PopulationsUsing){

        # Single test check:
        for(t in 1:3){
            sim_single[t,p] ~ dbinom(prev[p]*se[t] + (1-prev[p])*(1-sp[t]), N_single[t,p])
            check_single[t,p] <- Tally_single[t,p] - sim_single[t,p]
        }

        # Pairwise crude agreement check:
        for(c in 1:3){
            sim_pair[1:4,c,p] ~ dmulti(prob_pair[1:4,c,p], N_pair[c,p])
            check_pair[c,p] <- Agreement[c,p] - sum(sim_pair[c(1,4),c,p]) / N_pair[c,p]
        }

        prob_pair[1,1,p] <- (prev[p] * ((1-se[1])*(1-se[2]))) + ((1-prev[p]) * ((sp[1])*(sp[2])))
        prob_pair[2,1,p] <- (prev[p] * ((se[1])*(1-se[2]))) + ((1-prev[p]) * ((1-sp[1])*(sp[2])))
        prob_pair[3,1,p] <- (prev[p] * ((1-se[1])*(se[2]))) + ((1-prev[p]) * ((sp[1])*(1-sp[2])))
        prob_pair[4,1,p] <- (prev[p] * ((se[1])*(se[2]))) + ((1-prev[p]) * ((1-sp[1])*(1-sp[2])))
        prob_pair[1,2,p] <- (prev[p] * ((1-se[1])*(1-se[3]))) + ((1-prev[p]) * ((sp[1])*(sp[3])))
        prob_pair[2,2,p] <- (prev[p] * ((se[1])*(1-se[3]))) + ((1-prev[p]) * ((1-sp[1])*(sp[3])))
        prob_pair[3,2,p] <- (prev[p] * ((1-se[1])*(se[3]))) + ((1-prev[p]) * ((sp[1])*(1-sp[3])))
        prob_pair[4,2,p] <- (prev[p] * ((se[1])*(se[3]))) + ((1-prev[p]) * ((1-sp[1])*(1-sp[3])))
        prob_pair[1,3,p] <- (prev[p] * ((1-se[2])*(1-se[3]))) + ((1-prev[p]) * ((sp[2])*(sp[3])))
        prob_pair[2,3,p] <- (prev[p] * ((se[2])*(1-se[3]))) + ((1-prev[p]) * ((1-sp[2])*(sp[3])))
        prob_pair[3,3,p] <- (prev[p] * ((1-se[2])*(se[3]))) + ((1-prev[p]) * ((sp[2])*(1-sp[3])))
        prob_pair[4,3,p] <- (prev[p] * ((se[2])*(se[3]))) + ((1-prev[p]) * ((1-sp[2])*(1-sp[3])))
    }


    ## Priors:

    # Prevalence in population A:
    prev[1] ~ dbeta(1,1)

    # Prevalence in population B:
    prev[2] ~ dbeta(1,1)

    # Prevalence in population C:
    prev[3] ~ dbeta(1,1)


    # Sensitivity of FirstTest test:
    se[1] ~ dbeta(1,1)
    # Specificity of FirstTest test:
    sp[1] ~ dbeta(1,1)

    # Sensitivity of SecondTest test:
    se[2] ~ dbeta(1,1)
    # Specificity of SecondTest test:
    sp[2] ~ dbeta(1,1)

    # Sensitivity of ThirdTest test:
    se[3] ~ dbeta(1,1)
    # Specificity of ThirdTest test:
    sp[3] ~ dbeta(1,1)

    # Ensure that label switching does not occur for any test:
    for(t in 1:3){
        AcceptTest[t] ~ dbern(ifelse((se[t]+sp[t]) >= 1.0, 1, 0))
    }

}

#monitor# se, sp, prev, check_outcome_RRR, check_single, check_pair, ppp

## Inits:
inits{
"se" <- c(0.5, 0.99, 0.5)
"sp" <- c(0.99, 0.75, 0.99)
"prev" <- c(0.05, 0.95, 0.05)
}
inits{
"se" <- c(0.99, 0.5, 0.99)
"sp" <- c(0.75, 0.99, 0.75)
"prev" <- c(0.95, 0.05, 0.95)
}

## Data:
data{
"PopulationsUsing" <- c(1, 2, 3)
"AcceptTest" <- c(1, 1, 1)
"AcceptProb" <- structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim = c(8, 3))
"N_RRR" <- c(200, 200, 200)
"Tally_RRR" <- structure(c(128, 6, 7, 5, 14, 7, 13, 20, 93, 7, 9, 15, 8, 15, 8, 45, 41, 10, 5, 18, 10, 14, 23, 79), .Dim = c(8, 3))
"Tally_single" <- structure(c(38, 45, 54, 82, 77, 76, 121, 125, 126), .Dim = c(3, 3))
"N_single" <- structure(c(200, 200, 200, 200, 200, 200, 200, 200, 200), .Dim = c(3, 3))
"Agreement" <- structure(c(0.835, 0.81, 0.835, 0.805, 0.81, 0.765, 0.74, 0.695, 0.765), .Dim = c(3, 3))
"N_pair" <- structure(c(200, 200, 200, 200, 200, 200, 200, 200, 200), .Dim = c(3, 3))
}

But we still run it the same way:

results <- run.jags("huiwalter_model.txt")
## Loading required namespace: rjags
summary(results, vars=c("se","sp","prev"))
##           Lower95    Median   Upper95      Mean         SD      Mode
## se[1]   0.7161097 0.7742602 0.8322135 0.7735954 0.02961780 0.7733734
## se[2]   0.7340988 0.7933080 0.8485722 0.7928400 0.02936653 0.7922936
## se[3]   0.7304288 0.7881952 0.8399843 0.7874676 0.02822815 0.7899728
## sp[1]   0.9174079 0.9561720 0.9871070 0.9547361 0.01812380 0.9600848
## sp[2]   0.9176079 0.9547588 0.9851829 0.9536619 0.01748871 0.9564157
## sp[3]   0.8798199 0.9203135 0.9591238 0.9193934 0.02039386 0.9215402
## prev[1] 0.1791970 0.2462685 0.3169235 0.2477728 0.03511502 0.2466552
## prev[2] 0.3834551 0.4640776 0.5431137 0.4640452 0.04097472 0.4656979
## prev[3] 0.6877262 0.7603519 0.8258895 0.7589121 0.03539693 0.7638740
##                MCerr MC%ofSD SSeff         AC.10     psrf
## se[1]   0.0003350363     1.1  7815 -0.0061269113 1.001796
## se[2]   0.0003372868     1.1  7581 -0.0001026725 1.000029
## se[3]   0.0003151564     1.1  8023  0.0025457538 1.000927
## sp[1]   0.0002163637     1.2  7017  0.0007895805 1.000439
## sp[2]   0.0002019918     1.2  7496 -0.0015545067 1.000991
## sp[3]   0.0002172725     1.1  8810  0.0069133232 1.000174
## prev[1] 0.0003684634     1.0  9082  0.0056981332 1.000203
## prev[2] 0.0004270223     1.0  9207 -0.0073862376 1.000175
## prev[3] 0.0003795453     1.1  8698 -0.0005701329 1.000383

One nice feature of the indexes is that it gives us names of populations and tests:

indexes$populations
## [1] "A" "B" "C"
indexes$tests
## [1] "FirstTest"  "SecondTest" "ThirdTest"

And it also (currently) contains the add_post_hoc and add_roc functions:

(post_hoc <- indexes$add_post_hoc(long_data, results))
## # A tibble: 33 × 9
##    Group        Parameter Variable Population Type   Mean Median Lower95 Upper95
##    <fct>        <chr>     <chr>    <fct>      <chr> <dbl>  <dbl>   <dbl>   <dbl>
##  1 TestPerform… Se        FirstTe… A          Post… 0.659  0.661   0.523   0.793
##  2 TestPerform… Se        FirstTe… B          Post… 0.827  0.829   0.744   0.906
##  3 TestPerform… Se        FirstTe… C          Post… 0.772  0.773   0.702   0.838
##  4 TestPerform… Se        FirstTe… Combined   Model 0.774  0.774   0.716   0.832
##  5 TestPerform… Se        FirstTe… Combined   Post… 0.774  0.774   0.720   0.828
##  6 TestPerform… Se        SecondT… A          Post… 0.775  0.779   0.652   0.891
##  7 TestPerform… Se        SecondT… B          Post… 0.767  0.768   0.677   0.854
##  8 TestPerform… Se        SecondT… C          Post… 0.807  0.808   0.740   0.873
##  9 TestPerform… Se        SecondT… Combined   Model 0.793  0.793   0.734   0.849
## 10 TestPerform… Se        SecondT… Combined   Post… 0.793  0.793   0.738   0.845
## # ℹ 23 more rows
ggplot(post_hoc |> filter(Group=="TestPerformance"), aes(x=str_c(Type, " - ", Population), y=Median, ymin=Lower95, ymax=Upper95, col=Type)) +
  geom_errorbar() +
  geom_point() +
  facet_grid(Variable ~ Parameter, scales="free_x") +
  coord_flip() + xlab(NULL) + ylab(NULL)

The primary use of this is to assess sensitivity and specificity in each population separately, in order to validate this important model assumption. However, in theory the PPP values can also be used to assess tests that were not known to the model as long as the PPP values produced are sufficiently close to 0 and 1 in order to avoid bias (i.e. the tests included in the model have a sufficiently high combined predictive value). For example we can exclude SecondTest from the model and still get post-hoc estimates:

template_huiwalter(
  wide_data |> select(-SecondTest),
  covariance = tribble(~Test_A, ~Test_B, ~Active,
  ),
  ppp_values = TRUE
) -> indexes
## The model and data have been written to huiwalter_model.txt in the current working directory
## *** NOTE: The template provided should be examined and modified (including checking ***
## ***   priors, covariance terms, and verifying the code) before running the model!   ***
results_no2 <- run.jags("huiwalter_model.txt")
post_hoc <- indexes$add_post_hoc(long_data, results_no2)

ggplot(post_hoc |> filter(Group=="TestPerformance"), aes(x=str_c(Type, " - ", Population), y=Median, ymin=Lower95, ymax=Upper95, col=Type)) +
  geom_errorbar() +
  geom_point() +
  facet_grid(Variable ~ Parameter, scales="free_x") +
  coord_flip() + xlab(NULL) + ylab(NULL)

Note that the post-hoc estimates are still produced for SecondTest even though it was not estimated by the model - these estimates will be biased downwards but the magnitude of the bias should be minimal as long as the combined predictive value of the tests included in the model is sufficiently high. This can be useful for (1) assessing tests that were too highly correlated with another test to be included in the model, (2) assessing tests where the Se/Sp varies by population and therefore cannot be added to the model without breaking key assumptions. Of course, the model itself still needs a minimum of two tests, excluding information from the model unnecessarily will just give wider credible intervals, and where the tests included in the model have poor combined performance the resulting post-hoc calculations will exhibit substantial bias.

ROC curves

There is another function in the indexes output called add_roc. This requires a column in the long_data called “Continuous”, which holds the raw test result value - I haven’t shown that here, but it should be fairly easy to use if you have suitable data, for example:

roc <- indexes$add_roc(long_data |> filter(TestName=="SecondTest"), results, positive_direction=">")
ggplot(roc, aes(x=Threshold, y=Mean, ymin=Lower95, ymax=Upper95)) +
  geom_ribbon(alpha=0.25, lwd=0) +
  geom_line() +
  geom_vline(xintercept=_current_cutoff_, lty="dashed") +
  ylab(NULL) + xlab("Cutoff")

This works on the same principle as add_post_hoc function discussed above, so is subject to the same caveats regarding the expected bias in estimated performance of SecondTest.

More disclaimers

Just in case you didn’t read this the first time I will say it again: although these features will be included in a feature release version of runjags (on CRAN), I expect the implementation to change quite a bit - so the code given above will eventually break (and be replaced with something better, and documented). So treat this guide as a sneak preview of what is to come…

If anyone has any suggestions for how the new interface might look (and even better: is willing to help with runjags development) then please do get in touch!