Written by Jen Ware & Jeff Long

Before You Begin…

A brief list of things we think are important to consider before you launch into your analysis.

Are you at the ‘back of an envelope’ stage of your study? Fantastic. That’s the perfect time to be reading this. Further down the line? That’s okay too. Hopefully there’s some helpful advice in here, for now or next time. Do you think you know all of this stuff already? Awesome! None of the above? Well, we promise this entry is less than a three-minute read.

Okay, so here is our list:

First, make sure to clearly define your research question and objectives. It may sound blindingly obvious, but even seasoned professionals can struggle with vaguely defined questions and objectives. This can open the door to all kinds of problems down the road (keep reading).

Next, search the literature. Has this question been addressed previously? Adequately? What does your proposed research add? These are all good things to lay out clearly from the outset (and useful to have on hand when drafting your ‘Research in Context’ section for journals like Lancet Neurology).

Next up: what is your population of interest? What type or group of individuals do you want to be able to draw conclusions about? For example, people early or late in the disease course? And how will you operationalize (i.e., practically define) this target population? Your population of interest might be very narrow (e.g., all premanifest, ambidextrous, French violinists with a CAG repeat length of 47) or very broad (e.g., all HD gene-expansion carriers).

And so, to the data! But wait: are there (a) appropriate data and (b) enough data to allow you to address your research question robustly and rigorously? Checking this is known as a feasibility assessment. You will want to make sure the number of individuals of interest in your dataset (i.e., your sample size) is large enough to generate meaningful results (not always easy to define!). Your analyses should be appropriately powered to detect the effect of interest. If a power analysis is not appropriate (e.g., if the study is exploratory and no inferential statistics will be reported), say so explicitly in your statistical analysis plan.
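For a simple design, a power calculation can be sketched in a few lines. The example below assumes two independent, equal-sized groups, a continuous outcome with a common standard deviation, and a two-sided test, and it uses the normal approximation n = 2 × (z_(1-α/2) + z_(1-β))² / d² per group, where d is the standardized effect size (Cohen's d). The function name and the effect sizes are hypothetical placeholders, not estimates from any HD study; longitudinal or mixed-model designs need different (often simulation-based) methods, which is one more reason to involve a statistician.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means with equal group sizes.

    effect_size is the standardized mean difference (Cohen's d): the
    expected difference in means divided by the common standard deviation.
    The normal approximation slightly underestimates the n an exact t-test needs.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)                 # round up to whole participants


# Hypothetical effect sizes for illustration only.
for d in (0.2, 0.5, 1.0):
    print(f"d = {d}: {n_per_group(d)} participants per group")
```

With these inputs the approximation gives about 63 participants per group for d = 0.5; an exact t-test calculation gives a slightly larger number.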

Talking of analyses… we strongly recommend working with a statistician throughout the research process (not just when you want someone to run your linear mixed model). A statistician can help you formulate clearly defined research objectives, design a statistical analysis plan, conduct power and sample size analyses, handle and analyze your data, and report and interpret your results.

Did we mention a statistical analysis plan? Yes, we did. And we’re going to talk about it some more here, because it’s a critical part of any rigorously conducted study. Your statistical analysis plan (SAP), which you might want to consider pre-registering, should clearly address the following:

Objectives: What are your study objectives (primary, secondary, exploratory)?

Data set: Which dataset(s) will you use, and how will your population of interest be operationally defined?

Measures of interest: What are your measures of interest (outcome, exposure, covariates)? Are you confident the variables capture your intended measures of interest? We recommend making friends with the data support documentation.

Data cleaning: What will your data cleaning and quality control (QC) consist of? For example, how will you identify outliers in your dataset? Prespecify your method, and consider reporting results both with and without outliers.
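One way to prespecify an outlier rule is Tukey's fences: flag values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below assumes that rule and a made-up list of scores; the multiplier k and the quartile method are choices your SAP should fix in advance, and neither is a recommendation for any particular HD measure. It reports a summary both with and without the flagged values.

```python
from statistics import mean, quantiles


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences using Tukey's rule: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" is stated explicitly so the quartile definition is prespecified.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Partition values into (kept, flagged) using the prespecified fences."""
    lower, upper = iqr_fences(values, k)
    kept = [v for v in values if lower <= v <= upper]
    flagged = [v for v in values if v < lower or v > upper]
    return kept, flagged


# Hypothetical scores for illustration only.
scores = [22, 25, 27, 28, 30, 31, 33, 35, 90]
kept, flagged = split_outliers(scores)
print("flagged:", flagged)
print(f"mean with outliers:    {mean(scores):.2f}")
print(f"mean without outliers: {mean(kept):.2f}")
```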

Missing data: How will you handle missing data?

Statistical methods: What statistical methods will you use for your analysis? The characteristics of your outcome variable often dictate the statistical method (e.g., continuous vs binary outcomes), as do the timing of measurements (e.g., cross-sectional vs longitudinal) and whether the outcome is the time to a landmark event (e.g., time from birth to motor diagnosis).
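Time-to-event outcomes need their own methods largely because of censoring: participants who have not reached the landmark by their last visit still carry information, and dropping them or treating their last visit as the event would bias the estimate. A minimal Kaplan-Meier sketch in plain Python, assuming a single time scale (age in years) and entirely made-up data:

```python
def kaplan_meier(observations):
    """Kaplan-Meier estimate of the probability of remaining event-free.

    observations: list of (time, event) pairs. event is 1 if the landmark
    event (e.g., motor diagnosis) was observed at `time`, or 0 if the
    participant was censored, i.e., last seen event-free at `time`.
    Returns a list of (time, estimated probability still event-free).
    """
    event_times = sorted({t for t, e in observations if e == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        # Participants still under observation (event or censoring at or after t).
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, e in observations if time == t and e == 1)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical ages (years) at motor diagnosis (event=1) or at last visit (event=0).
ages = [(40, 1), (42, 0), (45, 1), (45, 1), (50, 0), (52, 1), (55, 0)]
for age, prob in kaplan_meier(ages):
    print(f"age {age}: estimated P(not yet diagnosed) = {prob:.3f}")
```

Here the participant censored at 42 counts as at risk at age 40 but leaves the risk set before 45, rather than being treated as either diagnosed or never diagnosed. For a real analysis, an established survival-analysis package and a statistician's input (e.g., on how participants enter the study at different ages) are the safer route.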

Modifier and confounder variables: What important modifier and/or confounding variables should you include in your analysis? Terminology varies, but modifiers are variables that interact with one another in their association with the outcome (e.g., CAG-repeat length and age may interact such that the combination of long CAG and older age is associated with much more progression than the individual variables, i.e., the main effects, can explain). The CAG-by-age interaction is commonly modeled; interactions with sex can be important in some contexts, as can interactions with education for cognitive variables. Confounders are variables that affect both an outcome and a predictor, producing a spurious association between the two. For example, participants in Enroll-HD who have a history of drug use have milder motor signs than those with no such history, but the association is spurious: the former tend to have shorter CAG expansions. In this case, CAG length confounds the apparent relationship between history of drug use and motor signs. Don’t forget to consider, if relevant, the time point from which covariate values will be drawn.
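The confounding pattern described above can be reproduced with a tiny dataset. The records below are invented, not Enroll-HD data: by construction, the hypothetical motor score (higher = more severe) depends only on CAG length, while drug-use history is more common at shorter CAG lengths. Stratifying by CAG length is used here for transparency; in practice you would more often adjust for the confounder in a regression model.

```python
from statistics import mean

# Hypothetical toy records: (cag_length, drug_use_history, motor_score).
# motor_score depends ONLY on CAG length; drug use is more common at shorter CAG.
records = (
    [(40, True, 0)] * 3 + [(40, False, 0)] * 1
    + [(42, True, 10)] * 2 + [(42, False, 10)] * 2
    + [(44, True, 20)] * 1 + [(44, False, 20)] * 3
)


def mean_motor(rows, drug_use):
    """Mean motor score among rows with the given drug-use history."""
    return mean(m for _, d, m in rows if d == drug_use)


# Crude comparison ignores CAG length entirely.
crude_diff = mean_motor(records, True) - mean_motor(records, False)

# Stratify by the confounder: compare users vs non-users within each CAG length.
stratum_diffs = {}
for cag in sorted({c for c, _, _ in records}):
    stratum = [r for r in records if r[0] == cag]
    stratum_diffs[cag] = mean_motor(stratum, True) - mean_motor(stratum, False)

print(f"crude difference (users - non-users): {crude_diff:.2f}")
print("within-CAG differences:", stratum_diffs)
```

In the crude comparison, users appear about 6.7 points milder; within every CAG stratum the difference is zero, because CAG length was the only driver of motor scores in this toy data.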

Treatment of multiplicity: If you are investigating multiple outcomes or associations, will you adjust for multiple comparisons? State whether you will control the family-wise error rate (FWER) or the false discovery rate (FDR), and at what level (and make sure to report all analyses performed).
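Two common corrections are Bonferroni (controls the FWER) and Benjamini-Hochberg (controls the FDR). The pure-Python sketch below shows the logic of each on hypothetical p-values; it is for illustration, and a vetted implementation such as multipletests in statsmodels.stats.multitest (method "bonferroni" or "fdr_bh") is preferable for real analyses.

```python
def bonferroni(p_values, alpha=0.05):
    """FWER control: reject hypothesis i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """FDR control via the Benjamini-Hochberg step-up procedure.

    Rank p-values from smallest to largest, find the largest rank k with
    p_(k) <= (k / m) * q, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values from eight prespecified associations.
p = [0.039, 0.001, 0.205, 0.008, 0.041, 0.060, 0.042, 0.074]
print("Bonferroni rejects:", bonferroni(p))
print("BH rejects:        ", benjamini_hochberg(p))
```

With these made-up values, Bonferroni rejects only the smallest p-value (0.001 ≤ 0.05/8), while Benjamini-Hochberg also rejects 0.008, illustrating that FDR control is less conservative than FWER control.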

Reproducibility: Consider pre-registering your analysis plan and archiving your analysis scripts.
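Archived scripts are easier to rerun faithfully if each run also records what it ran against. A minimal sketch, assuming you want a SHA-256 fingerprint of the exact data extract, the Python version, and the random seed saved alongside your results; the field names, seed value, and inline data are hypothetical.

```python
import hashlib
import json
import platform
import random

SEED = 2024          # hypothetical seed; record whatever value you actually use
random.seed(SEED)    # makes any random steps (e.g., bootstrap resampling) repeatable


def provenance_record(data_bytes: bytes, seed: int) -> dict:
    """Collect details worth archiving alongside analysis scripts and results."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # fingerprint of the exact input
        "python_version": platform.python_version(),
        "random_seed": seed,
    }


# Hypothetical inline extract; in practice, read the bytes of the file you analyzed,
# e.g. with open(path, "rb") as f: data_bytes = f.read()
data_bytes = b"id,cag,motor\n1,42,10\n2,44,20\n"
record = provenance_record(data_bytes, SEED)
print(json.dumps(record, indent=2))
```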

Finally, a little shout-out to our Analysis Resources section. Check out the new data documentation hub, where you can also learn more about the statistical consultancy resources available, such as the Independent Statistical Standing Committee.