1. Description

This document details the initial data checks of the cognitive task data, which are planned prior to the submission of the registered report. See below and the Open Science Workflow for more information on how we minimized the chance of bias.

2. Aim

To investigate whether the cognitive tasks that we identified as being suitable for Drift Diffusion Modeling (DDM) indeed have the required properties. These are 1) the NIH Flanker Task, 2) the NIH Pattern Comparison Processing Speed Task, 3) the NIH Picture Vocabulary Task, 4) the NIH Dimensional Change Card Sort Task, and 5) the Little Man Task. Suitability will be investigated as described in the Approach section below.

3. Approach

3.1 Reading in the data

These data checks are performed prior to submitting the registered report. To prevent researcher bias, we developed an automated workflow based on tracking via GitHub. In short, the workflow is as follows (for more details, see here):

  • Only the necessary data files (i.e., those containing the trial-level task data) will be loaded into R for further cleaning and processing.
  • Files that contain the data of several cognitive tasks will be read partially, one task at a time.
  • An automated function will randomly shuffle subject IDs when reading in the data. This function also tracks the MD5 hash of the file as it is read into R; this hash is a unique identifier that changes whenever something in the data file changes. If the function does not recognize the MD5 hash from a previous session - i.e., a new (selection of a) data file is read in for the first time - it will automatically log this event and commit it to GitHub. The commit history will later be used to create a full project history for the reviewers and any other future readers (see the sketch below this list).
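
The core of this loading step can be illustrated with a minimal Python sketch (the actual loading and cleaning will be done in R; the function, file, and column names used here, such as read_task_data and subject_id, are hypothetical):

  import hashlib
  import random

  import pandas as pd

  def read_task_data(path, columns, seed=None):
      """Read selected columns, permute subject IDs, and return the file's MD5 hash."""
      # MD5 hash of the raw file; it changes whenever anything in the file changes,
      # so an unrecognized hash signals that a new (selection of a) file was read.
      # The calling workflow logs unseen hashes and commits the log to GitHub.
      with open(path, "rb") as handle:
          md5 = hashlib.md5(handle.read()).hexdigest()

      # Read only the columns needed for the cognitive task at hand.
      data = pd.read_csv(path, usecols=columns)

      # Randomly reassign subject IDs so the researcher cannot link rows to
      # identities or to other variables.
      original_ids = list(data["subject_id"].unique())
      shuffled_ids = original_ids.copy()
      random.Random(seed).shuffle(shuffled_ids)
      data["subject_id"] = data["subject_id"].map(dict(zip(original_ids, shuffled_ids)))

      return data, md5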

Note that in this phase, we will only read in and process cognitive task data. We will not look at any other variables that will play a role in the final data analyses - such as demographics and adversity measures.

3.2 Data cleaning

At the trial level of each task, we will exclude (see the sketch after this list):

  • Trials with response times < 300 ms and trials with log-transformed response times more than 3 SD from the participant's mean;

  • Trials with missing response times or accuracy data.
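
For concreteness, these trial-level exclusions could be implemented along the following lines (a Python/pandas sketch; the actual cleaning is part of the R workflow described above, and the column names rt, accuracy, and subject_id are hypothetical):

  import numpy as np
  import pandas as pd

  def clean_trials(df):
      """Apply the trial-level exclusions to one task's trial data."""
      # Drop trials with missing response time or accuracy.
      df = df.dropna(subset=["rt", "accuracy"])

      # Drop anticipatory responses (response times below 300 ms; rt in ms here).
      df = df[df["rt"] >= 300]

      # Drop trials whose log-transformed response time lies more than 3 SD
      # from the participant's own mean log response time.
      log_rt = np.log(df["rt"])
      participant = df["subject_id"]
      z = (log_rt - log_rt.groupby(participant).transform("mean")) / \
          log_rt.groupby(participant).transform("std")
      return df[z.abs() <= 3]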

At the participant level, we will exclude participants who:

  • Have < 20 valid trials on a particular task (after trial-level exclusions);

  • Did not complete one or more cognitive tasks included in this preregistration.

  • Did not perform above chance level on a particular task. Using the binomial distribution, we will calculate the highest accuracy that a purely guessing participant would be expected to reach with 97.5% probability; participants who do not exceed this accuracy will be excluded (see the sketch after this list).

  • Possibly suffered a mild traumatic brain injury (TBI) or worse at some point in the past.
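
The chance-level criterion can be made concrete with a short sketch (assumptions: n_trials is the number of valid trials and p_guess the per-trial guessing probability, e.g. 0.5 for a two-choice task):

  from scipy.stats import binom

  def chance_threshold(n_trials, p_guess=0.5):
      """Highest accuracy a purely guessing participant reaches with 97.5% probability."""
      # 97.5th percentile of the binomial chance distribution, as a proportion correct.
      return binom.ppf(0.975, n_trials, p_guess) / n_trials

  # Example: for 40 two-choice trials, chance_threshold(40) = 26/40 = 0.65;
  # participants scoring at or below this proportion correct are excluded.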

3.3 Model fitting

We will fit the DDM to each task separately in a hierarchical Bayesian framework, as implemented in the Python package HDDM (Wiecki et al., 2013). In each model, the starting-point parameter z will be fixed to the midpoint (assuming no bias toward either response option) and the inter-trial variability parameters will be fixed to 0 in order to reduce model complexity. Boundary separation (a), drift rate (v), and non-decision time (t0) will be estimated freely. The model specification will look as follows (a fuller call is sketched below the specification):

  • hddm.HDDM(<data>, bias = False)
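
Spelled out, fitting this constrained model might look as follows (the file name is hypothetical; HDDM expects response times in seconds in an rt column, accuracy-coded responses in a response column, and participant indices in subj_idx):

  import hddm

  # Trial-level data for one task (hypothetical file name).
  data = hddm.load_csv("flanker_trials.csv")

  # Constrained model: z fixed at the midpoint (bias = False) and the inter-trial
  # variability parameters left at 0 (HDDM's default when they are not included),
  # so only a, v, and t are estimated hierarchically.
  model = hddm.HDDM(data, bias=False)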

For the Flanker Task and the Dimensional Change Card Sort Task, we will compare the fit of two model versions: one in which a single value for v and t0 is estimated across task conditions (incongruent vs. congruent for the Flanker Task and switch vs. repeat trials for the Dimensional Change Card Sort Task) and one in which these parameters are estimated separately for each condition. The latter model specification will look as follows (see also the sketch below):

  • hddm.HDDM(<data>, bias = False, depends_on={'v': <condition>, 't': <condition>})
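
Spelled out in the same way (reusing the hypothetical data object from the sketch above, and assuming a condition column that codes the task condition):

  # Condition-dependent version: drift rate and non-decision time are allowed
  # to differ between conditions; 'condition' is a hypothetical column name.
  model_by_condition = hddm.HDDM(data, bias=False,
                                 depends_on={'v': 'condition', 't': 'condition'})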

In the case of small differences in model fit between the two versions, we will favor the version with one estimate across conditions, given the low number of trials per condition.

For each task, we will initially draw 5,000 samples from the posterior distribution, discarding the first 1,000 samples as burn-in. Model convergence will first be investigated visually by plotting the trace, the autocorrelation, and the marginal posterior (following Wiecki et al., 2013). If a model does not converge properly, we will first increase the number of drawn samples to 10,000 (discarding the first 5,000), then 50,000 (discarding the first 45,000), and if necessary to 100,000 (discarding the first 95,000). If high autocorrelation remains, we will apply thinning (drawing 100,000 samples, discarding 50,000, and keeping every 10th sample). If none of these steps improves convergence, the DDM might not be suitable for the task at hand.
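
Continuing the sketches above, the sampling, convergence checks, and model comparison could look roughly as follows (using DIC as the fit index is our assumption and is not specified above):

  # Initial sampling: 5,000 posterior samples, first 1,000 discarded as burn-in.
  model.find_starting_values()
  model.sample(5000, burn=1000)

  # Visual convergence checks: trace, autocorrelation, and marginal posterior
  # for each group-level parameter (cf. Wiecki et al., 2013).
  model.plot_posteriors()

  # If convergence is poor, increase the number of samples and, if high
  # autocorrelation remains, apply thinning, e.g.:
  # model.sample(100000, burn=50000, thin=10)

  # Comparing the two versions for the Flanker and card-sort tasks; DIC is one
  # possible fit index (our assumption).
  model_by_condition.find_starting_values()
  model_by_condition.sample(5000, burn=1000)
  print(model.dic, model_by_condition.dic)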

References

Lerche, V., Voss, A., & Nagler, M. (2017). How many trials are required for parameter estimation in diffusion modeling? A comparison of different optimization criteria. Behavior Research Methods, 49(2), 513–537. https://doi.org/10.3758/s13428-016-0740-2
Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python. Frontiers in Neuroinformatics, 7, 14. https://doi.org/10.3389/fninf.2013.00014