Add nf-core/rnaseq GSE110004 Nextflow instance #3
Thanks for this PR! Adding new workflow instances is of course one of our goals. We are, however, concerned that there are no task dependencies at all. Is it because those dependencies were not captured from the logs? (Nextflow is not the easiest.) We are not sure we want to add a workflow instance to WfInstances with such a strong limitation. Is there no way around it?
Thanks for the quick look, fair concern. To be clear, the missing dependencies are not something I stripped. The instance was produced by the official WfCommons NextflowLogsParser available in wfcommons 1.4 on a recent stack (Nextflow 26.04.2, nf-core/rnaseq 3.25.0), and the parser emits empty parents/children fields (and zero files) for this run. The pipeline_info trace/report it reads are task-level records without the dataflow graph, and it does not reconstruct edges from the separate Nextflow DAG artefact. So this is a parser limitation I observed, not a property of the underlying run. For context: this came out of a small university seminar project, so reconstructing the DAG manually or patching the parser is out of scope for me here. Selected execution fields such as runtime, cores, CPU utilisation, and I/O are still usable, but I fully agree that the missing dependencies are a strong limitation for WfInstances. I am happy to leave it to you whether an execution-focused candidate instance with this documented limitation is useful, or whether you would rather close the PR until the parser handles dependencies more faithfully.
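To illustrate what "task-level records without the dataflow graph" means, here is a minimal sketch that reads a tab-separated trace and looks for any dependency-like column. The column names are assumed to match Nextflow's default trace fields, the two rows are invented for illustration (they are not taken from this run), and `read_trace` and `dependency_columns` are hypothetical helpers, not part of WfCommons or Nextflow.

```python
import csv
import io

# Hypothetical trace excerpt. The header is assumed to follow Nextflow's
# default trace fields; the two task rows are invented for illustration.
SAMPLE_TRACE = (
    "task_id\thash\tnative_id\tname\tstatus\texit\tsubmit\tduration\t"
    "realtime\t%cpu\tpeak_rss\tpeak_vmem\trchar\twchar\n"
    "1\tab/12cd34\t1001\tFASTQC (sample_a)\tCOMPLETED\t0\t"
    "2025-01-01 10:00:00.000\t1m 2s\t58s\t180.5%\t512 MB\t2 GB\t1.1 GB\t20 MB\n"
    "2\tef/56ab78\t1002\tTRIMGALORE (sample_a)\tCOMPLETED\t0\t"
    "2025-01-01 10:01:05.000\t2m 10s\t2m 1s\t310.0%\t700 MB\t3 GB\t2.3 GB\t1.9 GB\n"
)

# Substrings that would suggest a column carries dependency information.
DEPENDENCY_HINTS = ("parent", "child", "input", "output", "depends")


def read_trace(text):
    """Parse a tab-separated trace into a list of per-task dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def dependency_columns(rows):
    """Return the column names that look like they encode task dependencies."""
    if not rows:
        return []
    return [
        column
        for column in rows[0]
        if any(hint in column.lower() for hint in DEPENDENCY_HINTS)
    ]


rows = read_trace(SAMPLE_TRACE)
print(len(rows), "task records")
print("dependency-like columns:", dependency_columns(rows))
```

With a header like this, every row describes one task's resources and timing, and no column links one task to another, so any edges would have to come from a different artefact.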
Thanks for the clarification, and I didn't think you had purposely stripped the dependencies :) I don't think we'll be willing to add a workflow instance without the dependencies, BUT this is an opportunity to improve WfCommons. I believe the lack of dependencies may be because the current implementation of the parser expects that the workflow was executed with a patched version of Nextflow, which is a really unfortunate limitation of the parser... Other people have worked on this parser, I am not 100% up-to-date, and I may be wrong there. I have been improving/fixing many log parsers lately, and I would like to take a closer look at the Nextflow parser, so I may as well start with your workflow! Is there any way you can send me the Nextflow directory with all logs so that I can take a look myself, and perhaps fix the parser? Or alternatively, and perhaps better, you could send me what's needed to run the workflow using Nextflow (perhaps there is a repo for it)? If not, no problem, I'll pick some random Nextflow workflow out there...
Thanks, that makes sense. I completely understand not wanting to include an instance without dependencies. I would be happy to provide the artefacts so you can use this run as a test case for the Nextflow parser. The workflow does not have a separate custom repository: it was run directly from nf-core/rnaseq 3.25.0 with Nextflow 26.04.2 and Singularity. The input data are public ENA/SRA FASTQs from GSE110004 (SRR6357070 and SRR6357076), and the reference is Ensembl Fungi Saccharomyces cerevisiae R64-1-1 release 115. I can provide a small archive containing:
I would avoid sending the full work/ directory and raw FASTQ files unless you really need them, since those are much larger and the input data are public/re-downloadable. If that sounds useful, I can upload the archive somewhere and link it here, or alternatively attach it to the PR if GitHub accepts the file size.
Such an archive would be great. And yes, the full data is overkill, especially as I can figure out where to download them from. Not sure attaching to the PR will work, but of course a link will work in case the archive's too big!
Thanks, that makes sense. I have prepared a small reproducibility bundle with the relevant Nextflow artefacts rather than the full work directory. It contains:
The raw FASTQ and reference files are not included because they are public and much larger, but the README contains the download URLs and the exact run command. Bundle:
Great! I'll look at it soon!
This PR adds a candidate WfFormat instance generated from a real nf-core/rnaseq 3.25.0 execution using Nextflow 26.04.2 and Singularity.
Run summary:
The instance was generated from the run’s Nextflow pipeline_info artefacts using WfCommons 1.4 NextflowLogsParser and validated with SchemaValidator.validate_instance().
Known limitations observed during validation/evaluation:
These limitations appear to arise from the current NextflowLogsParser conversion rather than from the underlying nf-core/rnaseq execution artefacts. I am submitting the instance as a candidate contribution and documenting these limitations explicitly.
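For reviewers who want to check the dependency limitation quickly, the following is a minimal sketch that counts edges and isolated tasks in a list of task records. It assumes each task is a dict with `name`, `parents` and `children` keys, as in WfFormat task entries; where that list sits inside the instance JSON depends on the schema version, so the sample data is built inline and is invented rather than taken from the submitted instance. `summarise_dependencies` is a hypothetical helper, not a WfCommons function.

```python
def summarise_dependencies(tasks):
    """Count tasks, parent->child edges, and tasks with no links at all.

    Assumes each task is a dict with optional 'parents' and 'children'
    lists (an assumption about the instance layout, not a WfCommons API).
    """
    edges = sum(len(task.get("parents", [])) for task in tasks)
    isolated = [
        task["name"]
        for task in tasks
        if not task.get("parents") and not task.get("children")
    ]
    return {"tasks": len(tasks), "edges": edges, "isolated": len(isolated)}


# Invented example mirroring the symptom described above: every task has
# empty parents/children lists.
flat_tasks = [
    {"name": "fastqc", "parents": [], "children": []},
    {"name": "trim", "parents": [], "children": []},
    {"name": "align", "parents": [], "children": []},
]

# Invented example of what a linked chain would look like instead.
linked_tasks = [
    {"name": "trim", "parents": [], "children": ["align"]},
    {"name": "align", "parents": ["trim"], "children": []},
]

print(summarise_dependencies(flat_tasks))
print(summarise_dependencies(linked_tasks))
```

An instance where `edges` is zero and `isolated` equals `tasks` carries no dependency information, which is the situation documented above.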