Add nf-core/rnaseq GSE110004 Nextflow instance#3

Open
RE-Z3R0 wants to merge 1 commit into wfcommons:main from RE-Z3R0:add-rnaseq-gse110004-instance

RE-Z3R0 commented Jun 11, 2026

This PR adds a candidate WfFormat instance generated from a real nf-core/rnaseq 3.25.0 execution using Nextflow 26.04.2 and Singularity.

Run summary:

  • Workflow: nf-core/rnaseq 3.25.0
  • Workflow management system: Nextflow 26.04.2
  • Container runtime: Singularity
  • Input data: two paired-end Saccharomyces cerevisiae RNA-seq runs from GSE110004 (SRR6357070 and SRR6357076)
  • Reference: Ensembl Fungi Saccharomyces cerevisiae R64-1-1 release 115
  • Execution host: TU Berlin server, Ubuntu 24.04, x86_64
  • Outcome: completed successfully in 2 h 34 m 10 s; 93 tasks succeeded, 8 cached; 101 Nextflow task records

The instance was generated from the run’s Nextflow pipeline_info artefacts using the NextflowLogsParser from WfCommons 1.4 and validated with SchemaValidator.validate_instance().
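Schema validation checks structure rather than completeness, which is why an instance with no edges can still pass it. Below is a minimal sketch of an extra sanity check that surfaces the kinds of gaps listed below. It assumes the WfFormat 1.5 layout: tasks sit under `workflow.specification.tasks` with `parents`/`children` lists, files under `workflow.specification.files`, and machines under `workflow.execution.machines`. `summarize_instance` is a hypothetical helper, not part of WfCommons.

```python
from typing import Any


def summarize_instance(instance: dict[str, Any]) -> dict[str, int]:
    """Count structural gaps in a WfFormat instance.

    Assumes the WfFormat 1.5 layout; other schema versions may place tasks elsewhere.
    """
    workflow = instance["workflow"]
    spec = workflow.get("specification", {})
    execution = workflow.get("execution", {})
    tasks = spec.get("tasks", [])
    return {
        "tasks": len(tasks),
        # tasks with neither parents nor children are disconnected from the DAG
        "isolated_tasks": sum(
            1 for t in tasks if not t.get("parents") and not t.get("children")
        ),
        # each edge is listed once, in its parent's "children" list
        "edges": sum(len(t.get("children") or []) for t in tasks),
        "files": len(spec.get("files", [])),
        "machines": len(execution.get("machines", [])),
    }
```

Loading the generated JSON with `json.load` and passing it to `summarize_instance` would be expected to report zero edges, files, and machines for this instance, consistent with the first two limitations below.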

Known limitations observed during validation/evaluation:

  • The generated WfInstance contains no task dependencies or file records.
  • Machine information is not propagated into the instance, although partial machine-related information such as cpu_model is present in the upstream Nextflow report artefact.
  • Per-task executedAt timestamps are parse-time timestamps rather than original task timestamps.
  • The makespan is affected by cached-task timestamps from a resumed run.
  • memoryInBytes values appear to be too small by a factor of 1024.
  • Some repeated Nextflow task records are collapsed into fewer WfInstance tasks.
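For the makespan limitation, one way to separate the resumed run from cached work is to recompute the makespan directly from the trace while skipping `CACHED` records. The following is a minimal sketch. It assumes the trace was written with `trace.raw = true`, so timestamps are epoch milliseconds, and with a `trace.fields` list that includes `status`, `submit`, and `complete`. The trace in this run may use a different field set and human-readable values. Both function names are hypothetical.

```python
import csv
from typing import Iterable


def read_trace(path: str) -> list[dict[str, str]]:
    """Load a tab-separated Nextflow trace file into one dict per task record."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def makespan_seconds(rows: Iterable[dict[str, str]], include_cached: bool = False) -> float:
    """Last `complete` minus first `submit`, in seconds, from raw-mode trace rows."""
    submits: list[int] = []
    completes: list[int] = []
    for row in rows:
        if row.get("status") == "CACHED" and not include_cached:
            continue  # cached records may carry timestamps from the earlier run
        submit = row.get("submit") or ""
        complete = row.get("complete") or ""
        if not (submit.isdigit() and complete.isdigit()):
            continue  # skip records without raw millisecond timestamps (e.g. "-")
        submits.append(int(submit))
        completes.append(int(complete))
    if not submits:
        return 0.0
    return (max(completes) - min(submits)) / 1000.0
```

`makespan_seconds(read_trace(path))` then covers only the tasks executed in the resumed run. Passing `include_cached=True` instead approximates a span that mixes in the earlier run's timestamps.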

These limitations appear to arise from the current NextflowLogsParser conversion rather than from the underlying nf-core/rnaseq execution artefacts. I am submitting the instance as a candidate contribution and documenting these limitations explicitly.

henricasanova (Contributor)

Thanks for this PR! Adding new workflow instances is of course one of our goals. We are, however, concerned that there are no task dependencies at all. Is it because those dependencies were not captured from the logs? (Nextflow is not the easiest.) We are not sure we want to add a workflow instance to WfInstances with such a strong limitation. Is there no way around it?

RE-Z3R0 (Author) commented Jun 12, 2026

Thanks for the quick look, fair concern.

To be clear, the missing dependencies are not something I stripped. The instance was produced by the official WfCommons NextflowLogsParser available in wfcommons 1.4 on a recent stack (Nextflow 26.04.2, nf-core/rnaseq 3.25.0), and the parser emits empty parents/children fields (and zero files) for this run. The pipeline_info trace/report it reads are task-level records without the dataflow graph, and it does not reconstruct edges from the separate Nextflow DAG artefact. So this is a parser limitation I observed, not a property of the underlying run.
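To illustrate what reconstructing edges from the DAG artefact could involve, here is a minimal sketch. It assumes the DAG is available in Graphviz DOT format (Nextflow can write one with `-with-dag dag.dot`). The pipeline_dag file in pipeline_info/ may use another format. DAG nodes are processes, operators, and channels rather than individual tasks, so a parser would still have to map these process-level edges onto the task records from the trace. `process_edges` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Node lines look like:  v1 [label="FASTQC"];   Edge lines look like:  v0 -> v1;
NODE_RE = re.compile(r'^\s*(\w+)\s*\[.*?\blabel="([^"]*)"')
EDGE_RE = re.compile(r'^\s*(\w+)\s*->\s*(\w+)')


def process_edges(dot_text: str) -> dict[str, set[str]]:
    """Map each DAG node (by label, else by id) to the set of its direct successors.

    Nodes that share a label are merged into one key.
    """
    labels: dict[str, str] = {}
    edges: list[tuple[str, str]] = []
    for line in dot_text.splitlines():
        edge = EDGE_RE.match(line)
        if edge:
            edges.append((edge.group(1), edge.group(2)))
            continue
        node = NODE_RE.match(line)
        if node:
            labels[node.group(1)] = node.group(2)
    children: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        # unlabelled nodes (e.g. channel points with label="") fall back to their id
        children[labels.get(src) or src].add(labels.get(dst) or dst)
    return dict(children)
```

This sketch stops at the process level. Expanding each process edge into edges between concrete task instances, and bridging operator nodes that produce no tasks, would still be needed.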

For context: this came out of a small university seminar project, so reconstructing the DAG manually or patching the parser is out of scope for me here. Selected execution fields such as runtime, cores, CPU utilisation, and I/O are still usable, but I fully agree that the missing dependencies are a strong limitation for WfInstances.

I am happy to leave it to you whether an execution-focused candidate instance with this documented limitation is useful, or whether you would rather close the PR until the parser handles dependencies more faithfully.

henricasanova (Contributor)

Thanks for the clarification, and I didn't think you had purposely stripped the dependencies :) I don't think we'll be willing to add a workflow instance without the dependencies, BUT this is an opportunity to improve WfCommons. I believe the lack of dependencies may be because the current implementation of the parser expects that the workflow was executed with a patched version of Nextflow, which is a really unfortunate limitation of the parser. Other people have worked on this parser, I am not 100% up to date, and I may be wrong there.

I have been improving/fixing many log parsers lately, and I would like to take a closer look at the Nextflow parser, and I may as well start with your workflow! Is there any way you can send me the Nextflow directory with all logs so that I can take a look myself, and perhaps fix the parser? Or, alternatively (and perhaps better), you could send me what's needed to run the workflow using Nextflow (perhaps there is a repo for it)? If not, no problem, I'll pick some random Nextflow workflow out there...

RE-Z3R0 (Author) commented Jun 19, 2026

Thanks, that makes sense. I completely understand not wanting to include an instance without dependencies.

I would be happy to provide the artefacts so you can use this run as a test case for the Nextflow parser. The workflow does not have a separate custom repository: it was run directly from nf-core/rnaseq 3.25.0 with Nextflow 26.04.2 and Singularity. The input data are public ENA/SRA FASTQs from GSE110004 (SRR6357070 and SRR6357076), and the reference is Ensembl Fungi Saccharomyces cerevisiae R64-1-1 release 115.

I can provide a small archive containing:

  • the Nextflow pipeline_info/ directory (execution_trace, execution_report, execution_timeline, pipeline_dag, params, etc.),
  • the .nextflow.log file,
  • the samplesheet,
  • the run config,
  • the exact Nextflow command,
  • the FASTQ/reference download URLs,
  • and the generated WfInstance JSON for comparison.

I would avoid sending the full work/ directory and raw FASTQ files unless you really need them, since those are much larger and the input data are public/re-downloadable.

If that sounds useful, I can upload the archive somewhere and link it here, or alternatively attach it to the PR if GitHub accepts the file size.

henricasanova (Contributor)

Such an archive would be great. And yes, the full data is overkill, especially as I can figure out where to download them from. Not sure attaching to the PR will work, but of course a link will work in case the archive's too big!

RE-Z3R0 (Author) commented Jun 23, 2026

Thanks, that makes sense. I have prepared a small reproducibility bundle with the relevant Nextflow artefacts rather than the full work directory.

It contains:

  • pipeline_info/ with execution_trace, execution_report, execution_timeline, pipeline_dag, params, etc.
  • .nextflow.log
  • samplesheet
  • final_run.config
  • generated WfInstance JSON
  • README_reproduce.md with the exact command and public FASTQ URLs

The raw FASTQ and reference files are not included because they are public and much larger, but the README contains the download URLs and the exact run command.

Bundle:
nfcore_rnaseq_gse110004_nextflow_parser_bundle.tar.gz

henricasanova (Contributor)

Great! I'll look at it soon!
