---
title: "mcgibbsit Example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mcgibbsit Example}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 72
bibliography: references.bib
link-citations: true
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

nformat <- function(x)
  format(x, big.mark = ',')
```

```{r setup}
library(mcgibbsit)

set.seed(42)        # for reproducibility
tmpdir <- tempdir()
```

The `mcgibbsit` package provides an implementation of Warnes & Raftery's
MCGibbsit run-length diagnostic for a set of (not-necessarily
independent) MCMC samplers. It combines the estimate error-bounding
approach of Raftery and Lewis with the between chain variance verses
within chain variance approach of Gelman and Rubin.

For a set of *exchangeable*[^1] MCMC simulations on the same data and
model `mcgibbsit` computes:

[^1]: MCMC Simulations with the same data, prior, and posterior model.

-   minimum run length: $N_{min}$

-   required burn in length: $M$

-   total run length: $N$

-   run length inflation due to *auto-correlation*: $I$

-   run length inflation due to *between-chain* correlation: $R$

These simulations need not be independent, such as those generated by
the Normal Kernel Coupler adaptive CMC method (see @warnes2000nkc or
@warnes2001tr395).

For 

The normal usage is to perform an initial MCMC run of some
pre-determined length (e.g. 300 iterations) for each of a set of $k$
(e.g. 20) MCMC samplers. The output from these samplers is then read in
to create an `mcmc.list` object and `mcgibbsit` is run specifying the
desired accuracy of estimation for quantiles of interest. This will
return the minimum number of iterations to achieve the specified error
bound. The set of MCMC samplers is now run so that the total number of
iterations exceeds this minimum, and `mcgibbsit` is again called. This
should continue until the number of iterations already complete is less
than the minimum number computed by `mcgibbsit`.

If the initial number of iterations in `data` is too small to perform
the calculations, an error message is printed indicating the minimum
pilot run length.

## Example

This basic example constructs a dummy set of files from an *imaginary*
MCMC sampler and shows the results of running `mcgibbsit` with the
default settings.

```{r}
# Define a function to generate the output of our imaginary MCMC sampler
gen_samples <- function(run_id, nsamples=200)
{
  x <- matrix(nrow = nsamples+1, ncol=4)
  colnames(x) <- c("alpha","beta","gamma", "nu")
  
  x[,"alpha"] <- exp(rnorm (nsamples+1, mean=0.025, sd=0.025))
  x[,"beta"]  <- rnorm (nsamples+1, mean=53,    sd=14)
  x[,"gamma"] <- rbinom(nsamples+1, 20,         p=0.15) + 1
  x[,"nu"]    <- rnorm (nsamples+1, mean=x[,"alpha"] * x[,"beta"], sd=1/x[,"gamma"])
#'
  # induce serial correlation of 0.25
  x <- 0.75 * x[2:(nsamples+1),] + 0.25 * x[1:nsamples,]

  # induce ~50% acceptance rate
  accept <- runif(nsamples) > 0.50
  for(i in 2:nsamples)
    if(!accept[i]) x[i,] <- x[i-1,]

  write.table(
    x,
    file = file.path(
      tmpdir,
      paste("mcmc", run_id, "csv", sep=".")
      ),
    sep = ",",
    row.names = FALSE
  )
}
```

First, we'll generate and load only a 3 runs of length 200:

```{r}
# Generate and load 3 runs 
for(i in 1:3)
  gen_samples(i, 200)
  
mcmc.3 <- read.mcmc(
  3, 
  file.path(tmpdir, "mcmc.#.csv"), 
  sep=",",
  col.names=c("alpha","beta","gamma", "nu")
  )
```

```{r trace_density_plot_3,fig.width=8,fig.height=8}
# Trace and Density Plots
plot(mcmc.3)
```

Now run `mcgibbsit` to determine the necessary total number of MCMC
samples to to provide accurate 95% posterior confidence region estimates
for all four of the parameters:

```{r}
# And check the necessary run length 
mcg.3 <- mcgibbsit(mcmc.3)
print(mcg.3)
```

The results from `mcgibbsit` indicate that the required number of
samples is `r nformat(max(mcg.3$resmatrix[,"Total"]))`, which is less
than we've generated so far.

Lets generate 7 more runs, each of length 200, for a total of 2,000
samples:

```{r}
# Generate and load 7 more runs 
for(i in 3 + (1:7))
  gen_samples(i, 200)
  
mcmc.10 <- read.mcmc(
  10, 
  file.path(tmpdir, "mcmc.#.csv"), 
  sep=",",
  col.names=c("alpha","beta","gamma", "nu")
  )
```

```{r trace_density_plot_10,fig.width=8,fig.height=8}
# Trace and Density Plots
plot(mcmc.10)
```

Now run `mcgibbsit` to determine the necessary number of MCMC samples:

```{r}
# And check the necessary run length 
mcg.10 <- mcgibbsit(mcmc.10)
print(mcg.10)
```

`mcgibbsit` now estimates that a total of required number of samples is
`r nformat(max(mcg.10$resmatrix[,"Total"]))`[^2] Since we we have
already generated `r nformat(mcg.10$nchains * mcg.10$len)` samples, we
do not need to perform any additional runs.

[^2]: This is slightly fewer than before because the the larger number
    of samples allowed more accurate estimates of the variances and
    correlations.

We can now calculate the posterior confidence regions for each of the
parameters.

```{r}
summary(mcmc.10)
```