Commit eccda35

Updated datasets 2025-04-06 UTC
1 parent 46dc999 commit eccda35

2 files changed

Lines changed: 2 additions & 2 deletions

aws_open_datasets.json

Lines changed: 1 addition & 1 deletion
@@ -3806,7 +3806,7 @@
 },
 {
 "Name": "COVID-19 Genome Sequence Dataset",
-"Description": "Genomic sequence reads of SARS-CoV-2 and related coronaviridae, organized by NCBI accession Files in the `sra-src` folder are in FASTQ, BAM, or CRAM format (original submission); files in the `run` folder are in sra format and require the SRA Toolkit",
+"Description": "Genetic variations of SARS-CoV-2 in VCF format, organized by NCBI accession Each VCF file corresponds to the SRA parent-run's accession ID Files in the vcf folder are in VCF and can be read by any program that accepts *vcf files or can read tabular data",
 "ARN": "arn:aws:s3:::sra-pub-sars-cov2",
 "Region": "us-east-1",
 "Type": "S3 Bucket",

aws_open_datasets.tsv

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in OMO
 COBRA Whole slide images with corresponding labels including skin cancer and basal cel arn:aws:s3:::cobra-pathology us-west-2 S3 Bucket https://daangeijs.github.io/cobra/ https://www.computationalpathologygroup.eu/members/daan-geijs/ Radboud University Medical Center As required CC BY-SA-NC 4.0 aws-pds, life sciences, cancer, computational pathology, deep learning, histopathology, computer vision
 COCO - Common Objects in Context - fast.ai datasets Datasets arn:aws:s3:::fast-ai-coco us-east-1 S3 Bucket http://course.fast.ai/datasets info@fast.ai [fast.ai](http://www.fast.ai/) As required Creative Commons http://cocodataset.org/#termsofuse aws-pds, deep learning, computer vision, machine learning
 COVID-19 Data Lake Collected COVID-19 related datasets arn:aws:s3:::covid19-lake us-east-2 S3 Bucket https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-1 aws-covid-19-data-lake@amazon.com [Amazon Web Services](https://aws.amazon.com/) Periodically Varies by dataset amazon.science, bioinformatics, biology, coronavirus, COVID-19, health, life sciences, MERS, medicine, SARS ['[Browse Bucket](https://covid19-lake.s3.amazonaws.com/index.html)']
-COVID-19 Genome Sequence Dataset Genomic sequence reads of SARS-CoV-2 and related coronaviridae, organized by NCB arn:aws:s3:::sra-pub-sars-cov2 us-east-1 S3 Bucket https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/ https://support.nlm.nih.gov/support/create-case/ [National Library of Medicine (NLM)](http://nlm.nih.gov/) Daily [NIH Genomic Data Sharing Policy](https://osp.od.nih.gov/scientific-sharing/geno aws-pds, bioinformatics, biology, coronavirus, COVID-19, fastq, bam, cram, genomic, genetic, health, life sciences, MERS, SARS, virus, STRIDES, whole genome sequencing, transcriptomics
+COVID-19 Genome Sequence Dataset Genetic variations of SARS-CoV-2 in VCF format, organized by NCBI accession Eac arn:aws:s3:::sra-pub-sars-cov2 us-east-1 S3 Bucket https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/ https://support.nlm.nih.gov/support/create-case/ [National Library of Medicine (NLM)](http://nlm.nih.gov/) Daily [NIH Genomic Data Sharing Policy](https://osp.od.nih.gov/scientific-sharing/geno aws-pds, bioinformatics, biology, coronavirus, COVID-19, fastq, bam, cram, genomic, genetic, health, life sciences, MERS, SARS, virus, STRIDES, whole genome sequencing, transcriptomics
 COVID-19 Genome Sequence Dataset Metadata for sra-pub-sars-cov2 in an Athena-queryable format arn:aws:s3:::sra-pub-sars-cov2-metadata-us-east-1 us-east-1 S3 Bucket https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/ https://support.nlm.nih.gov/support/create-case/ [National Library of Medicine (NLM)](http://nlm.nih.gov/) Daily [NIH Genomic Data Sharing Policy](https://osp.od.nih.gov/scientific-sharing/geno aws-pds, bioinformatics, biology, coronavirus, COVID-19, fastq, bam, cram, genomic, genetic, health, life sciences, MERS, SARS, virus, STRIDES, whole genome sequencing, transcriptomics
 COVID-19 Harmonized Data COVID-19 Harmonized Dataset arn:aws:s3:::covid19-harmonized-dataset us-east-2 S3 Bucket https://www.stitchdata.com/docs/integrations/saas/covid-19 covid19.dataset@talend.com [Talend / Stitch](http://www.stitchdata.com/) New COVID-19 data is added twice daily There are no restrictions on the use of this data. aws-pds, COVID-19, coronavirus, life sciences ['[Browse Bucket](https://covid19-harmonized-dataset.s3.amazonaws.com/index.html)']
 COVID-19 Molecular Structure and Therapeutics Hub Data storage of for the MolSSI and BioExcel COVID-19 Hub Includes atomistic str arn:aws:s3:::molssi-bioexcel-covid-19-structure-therapeutics-hub us-east-1 S3 Bucket https://covid.molssi.org/ info@molssi.org [Molecular Sciences Software Institute (MolSSI)](https://molssi.org/) and [BioEx Data contributions come from external researchers and groups at a roughly weekly Most data will be in an open license provided by the contributing individual(s). aws-pds, biology, bioinformatics, coronavirus, COVID-19, life sciences, molecular docking, pharmaceutical
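
The updated description for sra-pub-sars-cov2 says the VCF files can be read by any program that can read tabular data. A minimal sketch of that idea in Python follows, using only the standard library. The function name `read_vcf_records` and the inline sample rows are hypothetical illustrations and are not taken from the bucket; real files may be compressed or carry additional columns.

```python
import io

# Hypothetical sample: two made-up variant rows in standard VCF layout.
# Lines starting with "##" are metadata; the single "#" line names the columns.
SAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "##source=hypothetical-example\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "NC_045512.2\t241\t.\tC\tT\t.\tPASS\tDP=100\n"
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\tDP=250\n"
)


def read_vcf_records(handle):
    """Yield one dict per variant row, keyed by the VCF column names."""
    header = None
    for raw_line in handle:
        line = raw_line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and "##" metadata lines
        if line.startswith("#"):
            # Column header line: drop the leading "#" and split on tabs.
            header = line[1:].split("\t")
            continue
        if header is None:
            raise ValueError("data row found before the #CHROM header line")
        yield dict(zip(header, line.split("\t")))


records = list(read_vcf_records(io.StringIO(SAMPLE_VCF)))
for record in records:
    print(record["CHROM"], record["POS"], record["REF"], record["ALT"])
```

All values are kept as strings here; a dedicated VCF library would be a better fit when typed fields or INFO parsing are needed.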