From raw to ready: preparing the A4/LEARN dataset for resting-state fMRI analysis

Some notes for future me (and any mysterous reader). If you’ve ended up with the task of understanding those, don't hesitate to get in touch: lea.chauveau@med.lu.se

Important: Bianca (HPC) does not seem to tolerate job arrays with more than 1000 elements. Therefore, job arrays must be submitted in separate batches, each containing a maximum of 1000 IDs.

BIDS Conversion:

The conversion to BIDS format was performed using convert_to_bids_parall.sh, which was executed on Bianca cores via submit_convert_bids.sh.

BIDS Validation:

Validation Issues and Fixes:

The validate function from CuBIDS was run to perform a BIDS validation on raw/ data, using the Singularity image cubids_v1.2.0.sif. (Looking back, using a Singularity container for CuBIDS wasn’t the best move — I would probably do it differently next time).

Run 1 cubids --validate: The output can be retrived in ./CuBIDS/v0_validation.tsv

Missing TaskName in json files for func/: The validator highlighted a single recurring error for resting-state fMRI data (our primary focus): 'TaskName' was missing. This was resolved using add_taskname.sh, submitted via submit_add_taskname.sh.
DeidentificationMethod wrong format in json files from anat/: A second error came up, this time with the 'DeidentificationMethod' field in all json files associated structural MRI scans. BIDS format expects it to be an array, so the fix was to wrap the value accordingly. This was achieved using make_array_deidentif.sh, submitted via submit_make_array.sh.
Other errors were related to the PET data. These ones were ignored for now, as no BIDS apps were planned to be run on pet/.
Missing SliceTiming for some json files in func/: The validator did not flag it as an error, but as a warning. I contacted A4 support service to retrieve acquisition parameters for neuroimaging exams, but they could not make it available for now. Sidenote: Since slice timing correction is generally less critical for resting-state fMRI, we decided not to apply it in later steps since info was misisng for several json keys, which mainly come from Philips scanners.

Run 2 cubids --validate: The output can be retrived in ./CuBIDS/v1_validation.tsv

No more errors related to MRI: TaskName and DeidentificationMethod were successfully solved.

Metadata heterogenity visualization:

Run 1 cubids --group: The group function of CuBIDS was run to gain some insight into the dataset’s structure, heterogeneity, and metadata inconsistencies. The output is available in ./CuBIDS/v0_summary.tsv, along with the corresponding group-derived files for version v0.

A large number of groups were flagged due to significant heterogeneity (e.g., 188 KeyParamGroup for BOLD data). However, many of the identified groups were formed from only a single scan (e.g., 116 for BOLD data).
Since parameters relevant to group functioning were missing from the MRI json files, they were extracted from nifti headers and added to sidecars using add_nifti_info_xxxx.sh, submitted via submit_add_nifti.sh. This pipeline mimics the functionality of the cubids --add-nifti-info command but was implemented in-house, likely due to missing dependencies in the Singularity container used.

Run 2 cubids --group: Once the missing parameters added, the CuBIDS group function was re-run to check whether these new json keys introduced any additional heterogeneity into the dataset. Spoiler: yep, it got messier. The output is available in ./CuBIDS/v1_summary.tsv, along with the corresponding group-derived files for version v1.

Due to the overwhelming number of variants, we restricted our preprocessing to participants of interest (i.e., those who have at least one T1, one fMRI and one tau-PET scan across their sessions). Anyone else got sent to quarantine (c'est ciao!) using mv_to_quarantine.sh, submitted via submit_mv.sh. Out of 4479, we were left with 437 survivors included in raw/.

Run 3 --group: We run the CuBIDS group function again (because why not?) to zoom in on heterogeneity in our subset of interest. The output is available in ./CuBIDS/v2_summary.tsv, along with the corresponding group-derived files for version v2.

Alléluia! CuBIDS went easy on us, with fewer groups/variants overall (e.g., 77 groups for BOLD, including 48 with only one scan).
Inspection of the outputs decided us to drop ParallelAcquisitionTechnique, ParallelReductionFactorInPlane and PartialFourier as grouping parameters for BOLD, and to set the tolerance for T1w RepetitionTime to 0.001 instead of 0.000001 (see config_custom file).

Meanwhile, outside of CuBIDS' group universe...

Assessing within-participant heterogeneity: One participant was flagged for having both Siemens and Philips acquisitions. To investigate this kind of cross-manufacturer weirdness, we collected several json-key values across sessions using assess_site_param.sh and further examined results in R.
- 29 participants were found to have been scanned on different scanners across sessions. Visual inspection of the images confirmed that these were not labeling or sorting errors — all sessions did belong to the same individuals.
- Along with the manufacturer, other json-derived infos were collected. However, they were not consistent enough to provide further site definition.
Updating json files to support omni's T2-based distortion correction: The omni pipeline requires some specific key-value pairs in the json files to generate synthetic fieldmaps based on T2w images. Since these fields were sometimes missing, we patched the dataset using the following custom scripts:
- add_ees_siemens.sh - to fill in missing 'EffectiveEchoSpacing' for Siemens scans
- update_philips_ees.sh – same thing, but for Philips
- add_ped.sh – to inject 'PhaseEncodingDirection' and 'TotalReadoutTime' where absent
Testing pipelines planned for use:
- Let's keep the details about preprocessing steps for below. But for what relevant to this section, we figured out that fMRIPrep (at least the version we pulled) failed to perform due to the acq-xxx entity that was previously (until now) included in BIDS filenames. We therefore removed it using ./remove_acq_entity.sh.

Run 3 cubids --validate: One final BIDS validation check, just to ensure our edits did not break anything. The output can be retrived in ./CuBIDS/v2_validation.tsv

No errors related to (f)MRI (yay)

Run 4 cubids --group: Same, one last round for grouping parameters. The output is available in ./CuBIDS/v3_summary.tsv, along with the corresponding group-derived files for version v3.

As mentionad above, we dropped 'ParallelAcquisitionTechnique', 'ParallelReductionFactorInPlane' and 'PartialFourier' for BOLD grouping parameters — this brought the number of variants down to 70. For T1w, we reduced 'RepetitionTime' precision from 6 to 3 decimals, now resulting in 33 variants.

Run 1 cubids --copy-exemplars: At this point, we considered the dataset to be sufficiently curated, so we gave CuBIDS' exemplar creation a first try.

Due to issues running the omni pipeline later on (and storage issue, i.e., file limit), the exemplar folder was eventually emptied to only sync subjects that were actually processed — but the original list is still available in list_exemplar.txt. Probably not useful for anyone else, but I kept it anyway — just so I don’t forget down the road.

As a sidenote, we chose not to rename files using CuBIDS' variant-based naming convention to avoid introducing new errors when running our preprocessing pipelines (also this feature didn’t work properly in my CuBIDS container — so it all worked out in the end).

Hopefully I didn't forget any major steps here. If I did... well, you caught me!

Preprocessing:

Alright, I am writing this a bit after the fact — so let’s hope my memory holds up!

Omni

Test phase and subsequent decisions:

Since the A4-LEARN dataset didn’t come with any fieldmaps, I went hunting for the best fieldmap-less distortion correction out there. That's how I came across Omni. According to the paper Omni performs impressively well — even reaching results comparable to fieldmap-based corrections (FSL_fugue/topup). It also outperformed its competitors, SyN SDC and SynB0-DisCo.

Here’s the gist:

SyN SDC (fMRIPrep’s fieldmap-less option) aligns directly to the participant’s T1w image — which has a very different contrast than the EPI image.
Omni, on the other hand, generates a synthetic image (using both T1w and T2w — so yeah, you need those two) whose contrast is closer to the EPI, making alignment easier and more accurate.
SynB0-DisCo is a deep learning approach that can shine on some datasets, but it doesn’t always generalize well to new ones. So, I crossed that one off the list early on.

Run Omni pipeline

I pulled the Singularity container omni_2023.2.1.sif to run the piepline. As mentioned above, I manually injected some required json keys that were missing (see scripts and slides for details). Apart from that glitch, I believe everything ran like a charm — or just don't recall.

I then ran a few comparative tests:

Omni
SyN SDC (the fMRIPrep fieldmap-less default)
No correction (a.k.a. fMRIPrep default baseline)

Visual inspection of the results clearly favored Omni, so we decided to go with that option.

Sidenote: Using mutual information metrics to compare corrected vs. uncorrected bold scans later on confirmed the visual impressions — Omni-corrected scans were better aligned overall (df = 2175, t = 41.45, p < 0.0001).

Custom our Omni-based distortion correction pipeline

Here’s the catch: Omni outputs the corrected bold image in template space and performs motion correction internally to create its synthetic image. But since I wanted to keep everything in EPI space to feed fMRIPrep with the undistorted image (and avoid extra interpolation steps using Omni's --atlas flag), I took a slightly different route: I grabbed the final_epi_to_synth_warp file generated by Omni — this is the end-point warp field used internally for unwarping. Instead of letting Omni do the unwarping concurrently with normalization step, I applied it myself using AFNI, keeping the data in native EPI space. I then inverted the motion correction matrix and applied it to undo motion correction, yielding an undistorted yet motion-uncorrected EPI image.

A visual check confirmed that everything looked good — motion was back, and I was glad (never thought I would say such a thing).

All these steps are wrapped up in the run_distortion_corr_pipeline.sh script.

Omni produces quite a bunch of files, so to save space—and avoid becoming Bianca’s black sheep—I only kept the relevant outputs in derivatives/omni/.

Sidenote: Some sessions were missing T1w and/or T2w images. These sessions were therefore excluded from Omni-based distortion correction. Another session failed during the Omni pipeline due to a BET-related issue with the T1w image. Manual application of the BET function still produced poor brain extractions, so this session was excluded from Omni processing as well. In total, n = 27 sessions (out of 2204) were not preprocessed using Omni, but using the SyN SDC approach instead (see below for details).

fMRIPrep

Run fMRIPrep

Here we are — with our relatively undistorted BOLD images, ready to trick fMRIPrep! To do that, I added the distortion-corrected images into the raw/ directory, setting the acquisition entity to acq-dc to clearly distinguish them from the original ones. This was performed using add_dc_epi.sh. The script also changes obliquity status in json files to match the distortion corrected images.

To run fMRIPrep, I pulled the Singularity container fmriprep_v25.1.1.sif. Both undistorted and original (distorted) epi data were preprocessed, and outputs were stored in derivatives/fmriprep_ses. This allowed us to perform mutual information comparisons to evaluate alignment quality. Mutual informtion metrics were extracted with calculate_similarity.sh script. As mentioned earlier, for sessions where Omni-based distortion correction wasn’t available, I used the SyN SDC method within fMRIPrep. Those outputs can be found in derivatives/fmriprep_ses_sdc.

All command-line settings are available in the run_fmriprep* scripts

Notable points:

No slice-timing correction was performed (see sidenote above).
Preprocessing was done per session rather than across all sessions of a participant. Instead of using the default option that reuses the same anatomical files for all sessions, I filtered data session-wise to allow session-specific coregistration and segmentation. Of note, the --longitudinal flag wasn’t sufficient here, as we anticipated some degree of atrophy.
Outputs were generated in native space, as we initially planned to keep everything native. Spoiler: we’ll need to move to template space later.
Outputs were generated in volume space only — we forgot to request CIFTI outputs… oops

After catching and fixing a few errors (and waiting quite a long time), every scan was finally preprocessed (yay). One session, however, failed due to a surface reconstruction issue during autorecon1. We decided to exclude this scan rather than rerun it using volumetric alternatives. In total, 2203 scans made ot through preprocessing — sounds fine… until QC hits (that’s the real Koh-Lanta).

Sidenote: For sessions missing T1w images, I took the closest T1w from another session of the same participant — provided it was acquired with the same manufacturer. In these cases, an acquisition entity was added to indicate the true session origin of the T1w.

XCP-D

The light at the end of the tunnel.

XCP-D requires data in template space, so I fixed that by normalizing the preprocessed scans into MNI space using normalize.sh.

To run XCP-D, I pulled the Singularity container xcp_d_v0.11.1.sif.

After a few rounds of testing (confirming that XCP-D was working as expected), we decided to go a bit fancier. The main twist was customizing the atlas parcellations.

Since we only have volumetric fMRIPrep outputs, running XCP-D at the surface level was off teh table — which was actually ok, because we wanted to include subcortical regions, plus vertex-wise analyses showed limitations anyway.

Still, to account for anatomical variability across individuals, I generated individual-tailored gray matter (GM) atlas. To achieve this, I simply thresholded each (session-specific) participant's GM segmentation probability maps outputed by fMRIPrep to 0.3 using create_atlas_indiv.sh, and then multiply this binary mask by our atlas of interest — consisting of Schaefer 200/400 cortical parcels + Ye Tian S2 subcortex + Diedrichsen's cerebellum. All command-lines are wraped into run_xcp_indiv.sh.

From there, XCP-D produced denoised bold scans, ALFF and ReHo maps, as well as parcel-wise timeseries and ALFF/ReHo for all available data (n=2203).

Notable points:

Denoising strategy: 36p regression model including six motion parameters + their derivatives + quadratic terms, as well as WM, CSF, and global signal with their derivatives and quadratic terms — this combo best minimized the link between motion and connectivity
Despike
No additional smoothing
No censoring or scrubbing

Quality assessment:

Motion metrics

Mean framewise displacement (FD) was retrieved from XCP-D LINC QC files using extract_fd.sh. On average, FD was 0.16 ± 0.11 for all data. Out of 2203 scans, 230 fell under the 0.3 threshold, and 374 under 0.25.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

From raw to ready: preparing the A4/LEARN dataset for resting-state fMRI analysis

BIDS Conversion:

BIDS Validation:

Validation Issues and Fixes:

Metadata heterogenity visualization:

Preprocessing:

Omni

fMRIPrep

XCP-D

Quality assessment:

Motion metrics

Visual inspection

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
README.md		README.md
add_dc_epi.sh		add_dc_epi.sh
add_ees_siemens.sh		add_ees_siemens.sh
add_nifti_info_anat.sh		add_nifti_info_anat.sh
add_nifti_info_func.sh		add_nifti_info_func.sh
add_ped.sh		add_ped.sh
add_taskname.sh		add_taskname.sh
assess_site_param.sh		assess_site_param.sh
calculate_similarity.sh		calculate_similarity.sh
convert_to_bids_parall.sh		convert_to_bids_parall.sh
create_atlas_indiv.sh		create_atlas_indiv.sh
extract_fd.sh		extract_fd.sh
make_array_deidentif.sh		make_array_deidentif.sh
mv_to_quarantine.sh		mv_to_quarantine.sh
normalize.sh		normalize.sh
remove_acq_entity.sh		remove_acq_entity.sh
run_distortion-corr_pipeline.sh		run_distortion-corr_pipeline.sh
run_xcp_indiv.sh		run_xcp_indiv.sh
submit_add_taskname.sh		submit_add_taskname.sh
submit_convert_parall.sh		submit_convert_parall.sh
update_philips_ees.sh		update_philips_ees.sh

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

From raw to ready: preparing the A4/LEARN dataset for resting-state fMRI analysis

BIDS Conversion:

BIDS Validation:

Validation Issues and Fixes:

Metadata heterogenity visualization:

Preprocessing:

Omni

fMRIPrep

XCP-D

Quality assessment:

Motion metrics

Visual inspection

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages