Introduction to simssd

Overview

The simssd package performs sample size determination (SSD) and power computation via simulation for fixed effects in linear regression models, including generalized linear models and multilevel models.

What problem does simssd address?

When using simulation to estimate sample size requirements for multilevel models, execution time can become impractically long. simssd implements a method to reduce computation time under certain circumstances.

The package is designed to be extensible, offering the potential to support any linear regression model for which it is possible to simulate data and fit the model. It comes with a selection of predefined models out of the box, including some random effect (multilevel) models.

Background

The basic idea, elaborated in "Writing effective and reliable Monte Carlo simulations with the SimDesign package" by Chalmers & Adkins, is that in order to conduct a Monte Carlo simulation (MCS) experiment you need a model together with three essential components:

1. A mechanism for generating (simulating) data according to your model;
2. A method for analysing any given dataset of such simulated data; and
3. A method for summarising the results of the analysis to yield (Monte Carlo) estimates for your quantity of interest.

You would proceed by using (1) to generate multiple independent datasets (usually in the region of 10,000 or more) according to your model. You would then use (2) to analyse each of these datasets in whichever way is relevant to your study purpose. The final step (3) would be to compute a summary of the results obtained from (2).
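The three steps can be sketched in a few lines of code. The example below is only an illustration, written in Python with the standard library and not using simssd: the function names (`generate`, `analyse`, `summarise`, `run_mcs`), the simple one-predictor regression model, and the parameter values are all assumptions made for this sketch. To stay dependency-free it also uses a normal approximation to the t-test for the slope, which is slightly liberal at small sample sizes.

```python
import random
from statistics import NormalDist, mean


def generate(n, slope, rng):
    """Step (1): simulate one dataset from y = slope * x + e, with e ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


def analyse(x, y, alpha=0.05):
    """Step (2): fit ordinary least squares and test the slope.

    Returns True when the slope is significant at level alpha.
    Assumption: a normal approximation replaces the t distribution.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope_hat = sxy / sxx
    intercept_hat = my - slope_hat * mx
    rss = sum((yi - intercept_hat - slope_hat * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(slope_hat / se) > critical


def summarise(rejections):
    """Step (3): the proportion of significant results estimates the power."""
    return sum(rejections) / len(rejections)


def run_mcs(n, slope, n_reps, seed):
    """Run one Monte Carlo simulation for a single condition."""
    rng = random.Random(seed)
    return summarise([analyse(*generate(n, slope, rng)) for _ in range(n_reps)])


# Hypothetical condition: 50 observations, true slope of 0.4.
power = run_mcs(n=50, slope=0.4, n_reps=2000, seed=1)
print(f"Estimated power: {power:.3f}")
```

Keeping the three components as separate functions means that changing the model only requires replacing `generate` and `analyse`; the summary step stays the same.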

Typically, you might be interested in investigating a number of different scenarios (sometimes called conditions), each of which requires a separate MCS. For example, the conditions could be differing sample sizes.

How this relates to sample size determination

In the context of SSD via simulation, the analysis and summary steps would involve estimating the power for a specific sample size (condition). You would then repeat this for a series of increasing sample sizes and observe how the power increases as you do so. In other words, given a particular model (and its parameters), you could use this approach to find the smallest sample size that achieves the required level of power. This is, at a high level, what simssd does.
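As a rough illustration of this search, the sketch below estimates power at a series of increasing sample sizes and stops at the first one that reaches the target. It is not how simssd is implemented: the helper names (`estimate_power`, `smallest_adequate_n`), the two-group comparison (a regression with one binary predictor), the effect size, and the candidate sample sizes are all assumptions chosen for the example, and the test again uses a normal approximation.

```python
import random
from statistics import NormalDist, mean, variance


def estimate_power(n_per_group, effect, n_reps, rng, alpha=0.05):
    """Monte Carlo power estimate for a two-group mean difference."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        se = ((variance(control) + variance(treated)) / n_per_group) ** 0.5
        if abs((mean(treated) - mean(control)) / se) > critical:
            hits += 1
    return hits / n_reps


def smallest_adequate_n(sample_sizes, target, **kwargs):
    """Walk up the candidate sample sizes until the target power is reached.

    Returns the first adequate sample size (or None) and the estimated
    power curve as a dict. Estimates are noisy, so a value close to the
    target deserves more replications before being trusted.
    """
    curve = {}
    for n in sample_sizes:
        curve[n] = estimate_power(n, **kwargs)
        if curve[n] >= target:
            return n, curve
    return None, curve


rng = random.Random(2024)
n_needed, curve = smallest_adequate_n(
    range(20, 121, 20), target=0.8, effect=0.5, n_reps=1000, rng=rng
)
for n, p in curve.items():
    print(f"n per group = {n:3d}  estimated power = {p:.3f}")
print("Smallest candidate reaching 80% power:", n_needed)
```

Each candidate sample size is one condition and needs its own full simulation, so the cost of the search grows with the number of candidates examined.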

Multilevel sample size determination

For a 2-level model, where you would have two sample sizes, you could create a grid of unique sample size combinations (each being one condition) and proceed along the same lines, once again observing how the power curve changes.
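A grid of this kind can be sketched as follows. The candidate values, the variable names, and the figure of 10,000 replications per condition are assumptions for illustration only, not simssd defaults.

```python
from itertools import product

# Hypothetical candidate sample sizes at each level.
n_clusters = [10, 20, 30, 40]   # level-2 units, e.g. schools
cluster_sizes = [5, 10, 20]     # level-1 units per cluster, e.g. pupils

# Every unique combination is one condition requiring its own simulation.
conditions = [
    {"n_clusters": j, "cluster_size": n, "total_n": j * n}
    for j, n in product(n_clusters, cluster_sizes)
]

n_reps = 10_000  # replications per condition (assumed)
total_fits = len(conditions) * n_reps

print(len(conditions), "conditions")   # 12 conditions
print(total_fits, "model fits")        # 120000 model fits
```

Even this small grid calls for 120,000 model fits, and each fit of a multilevel model is considerably more expensive than an ordinary regression.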

However, the process described here needs a very large number of simulations for multilevel models and can be extremely slow, often taking hours or even days to complete.

In the case of 2-level models where one of the sample sizes should (or perhaps could) be constrained, for example due to cost considerations, simssd implements a method that reduces the number of simulations required and, with it, the computation time. See The simssd approach for more details.