Commit cb6ed59

Updated datasets 2025-05-04 UTC
1 parent b1d6395 commit cb6ed59

6 files changed

Lines changed: 130 additions & 130 deletions

aws_geo_datasets.json

Lines changed: 1 addition & 1 deletion
@@ -1855,7 +1855,7 @@
 ],
 "Explore": [
 "[Browse Bucket](https://cadcat.s3.amazonaws.com/index.html)",
-"[Data Catalog](https://cadcat.s3.amazonaws.com/cae.yaml)"
+"[Data Catalog](https://cadcat.s3.amazonaws.com/cae-collection.csv)"
 ],
 "RequesterPays": null,
 "AccountRequired": null,

aws_geo_datasets.tsv

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ Chalmers Cloud Ice Climatology CCIC total ice water path and 2D cloud probabilit
 CitrusFarm Dataset CitrusFarm Dataset sequences arn:aws:s3:::ucr-robotics/citrus-farm-dataset us-west-2 S3 Bucket https://ucr-robotics.github.io/Citrus-Farm-Dataset/ Hanzhe Teng (hteng007@ucr.edu), Konstantinos Karydis (kkarydis@ece.ucr.edu) [Autonomous Robots and Control Systems Lab](https://sites.google.com/view/arcs-l NA Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). aws-pds, robotics, computer vision, agriculture, localization, mapping, lidar, IMU
 Cloud Indexes for Bowtie, Kraken, HISAT, and Centrifuge This bucket contains genomic indexes for Bowtie, Kraken, HISAT, and Centrifuge arn:aws:s3:::genome-idx us-east-1 S3 Bucket https://benlangmead.github.io/aws-indexes/ https://github.com/BenLangmead/aws-indexes/issues Langmead Lab at Johns Hopkins University & Kim Lab at University of Texas Southw As new data becomes available; roughly quarterly Public Domain aws-pds, genomic, bioinformatics, biology, whole genome sequencing, medicine, reference index, mapping, life sciences
 Cloud to Street - Microsoft Flood and Clouds Dataset Flood and Cloud Training Dataset arn:aws:s3:::radiant-mlhub/c2smsfloods us-west-2 S3 Bucket https://www.drivendata.org/competitions/81/detect-flood-water/ support@cloudtostreet.info [Radiant Earth Foundation](https://radiant.earth/) Not updated CC-BY-4.0 https://creativecommons.org/licenses/by/4.0/ aws-pds, computer vision, deep learning, machine learning, floods, geospatial, earth observation, satellite imagery, cog, synthetic aperture radar
-Co-Produced Climate Data to Support California's Resilience Investments Data catalog arn:aws:s3:::cadcat us-west-2 S3 Bucket https://analytics.cal-adapt.org/data/ analytics@cal-adapt.org Cal-Adapt Analytics Engine https://analytics.cal-adapt.org/ Infrequent, Irregular Varies, see dataset specific metadata atmosphere, aws-pds, climate, climate model, earth observation, geoscience, geospatial, meteorological, simulations, weather, zarr ['[Browse Bucket](https://cadcat.s3.amazonaws.com/index.html)', '[Data Catalog](https://cadcat.s3.amazonaws.com/cae.yaml)']
+Co-Produced Climate Data to Support California's Resilience Investments Data catalog arn:aws:s3:::cadcat us-west-2 S3 Bucket https://analytics.cal-adapt.org/data/ analytics@cal-adapt.org Cal-Adapt Analytics Engine https://analytics.cal-adapt.org/ Infrequent, Irregular Varies, see dataset specific metadata atmosphere, aws-pds, climate, climate model, earth observation, geoscience, geospatial, meteorological, simulations, weather, zarr ['[Browse Bucket](https://cadcat.s3.amazonaws.com/index.html)', '[Data Catalog](https://cadcat.s3.amazonaws.com/cae-collection.csv)']
 Collection of open nation-scale LiDAR datasets Open LiDAR datasets arn:aws:s3:::open-lidar-data eu-central-1 S3 Bucket https://github.com/flai-ai/open-lidar-data info@flai.ai [Flai](https://flai.ai/) When new open dataset is published. The exact version of the licence depends on LiDAR dataset and is not the same fo aws-pds, lidar, earth observation, geoscience, geospatial, land cover, mapping, survey
 Community Earth System Model Large Ensemble (CESM LENS) Project data files arn:aws:s3:::ncar-cesm-lens us-west-2 S3 Bucket https://doi.org/10.26024/wt24-5j82 rdahelp@ucar.edu [National Center for Atmospheric Research](https://ncar.ucar.edu/) Rare. The LENS experiment is complete, but we may occasionally copy additional f https://www.ucar.edu/terms-of-use/data climate, model, climate model, atmosphere, oceans, land, ice, geospatial, aws-pds, sustainability, zarr
 Community Earth System Model v2 ARISE (CESM2 ARISE) Project data files arn:aws:s3:::ncar-cesm2-arise us-east-2 S3 Bucket (https://github.com/NCAR/CESM2-ARISE) opendata-aws-arise@ucar.edu [National Center for Atmospheric Research](https://ncar.ucar.edu/) Rare once complete (August 2022) https://www.ucar.edu/terms-of-use/data climate, model, climate model, atmosphere, oceans, land, ice, geospatial, aws-pds, sustainability ['[Browse Bucket](https://ncar-cesm2-arise.s3.amazonaws.com/index.html)']

aws_open_datasets.json

Lines changed: 1 addition & 1 deletion
@@ -4861,7 +4861,7 @@
 ],
 "Explore": [
 "[Browse Bucket](https://cadcat.s3.amazonaws.com/index.html)",
-"[Data Catalog](https://cadcat.s3.amazonaws.com/cae.yaml)"
+"[Data Catalog](https://cadcat.s3.amazonaws.com/cae-collection.csv)"
 ],
 "RequesterPays": null,
 "ControlledAccess": null,

aws_open_datasets.tsv

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3) RNA-Seq Gene Expression
 Clinical Trial Sequencing Project - Diffuse Large B-Cell Lymphoma RNA-Seq Gene Expression Quantification arn:aws:s3:::gdc-ctsp-phs001175-2-open us-east-1 S3 Bucket https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001175.v dcf-support@datacommons.io [Center for Translational Data Science at The University of Chicago](https://ctd Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers month NIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access- aws-pds, cancer, genomic, life sciences, transcriptomics, whole genome sequencing, STRIDES
 Cloud Indexes for Bowtie, Kraken, HISAT, and Centrifuge This bucket contains genomic indexes for Bowtie, Kraken, HISAT, and Centrifuge arn:aws:s3:::genome-idx us-east-1 S3 Bucket https://benlangmead.github.io/aws-indexes/ https://github.com/BenLangmead/aws-indexes/issues Langmead Lab at Johns Hopkins University & Kim Lab at University of Texas Southw As new data becomes available; roughly quarterly Public Domain aws-pds, genomic, bioinformatics, biology, whole genome sequencing, medicine, reference index, mapping, life sciences
 Cloud to Street - Microsoft Flood and Clouds Dataset Flood and Cloud Training Dataset arn:aws:s3:::radiant-mlhub/c2smsfloods us-west-2 S3 Bucket https://www.drivendata.org/competitions/81/detect-flood-water/ support@cloudtostreet.info [Radiant Earth Foundation](https://radiant.earth/) Not updated CC-BY-4.0 https://creativecommons.org/licenses/by/4.0/ aws-pds, computer vision, deep learning, machine learning, floods, geospatial, earth observation, satellite imagery, cog, synthetic aperture radar
-Co-Produced Climate Data to Support California's Resilience Investments Data catalog arn:aws:s3:::cadcat us-west-2 S3 Bucket https://analytics.cal-adapt.org/data/ analytics@cal-adapt.org Cal-Adapt Analytics Engine https://analytics.cal-adapt.org/ Infrequent, Irregular Varies, see dataset specific metadata atmosphere, aws-pds, climate, climate model, earth observation, geoscience, geospatial, meteorological, simulations, weather, zarr ['[Browse Bucket](https://cadcat.s3.amazonaws.com/index.html)', '[Data Catalog](https://cadcat.s3.amazonaws.com/cae.yaml)']
+Co-Produced Climate Data to Support California's Resilience Investments Data catalog arn:aws:s3:::cadcat us-west-2 S3 Bucket https://analytics.cal-adapt.org/data/ analytics@cal-adapt.org Cal-Adapt Analytics Engine https://analytics.cal-adapt.org/ Infrequent, Irregular Varies, see dataset specific metadata atmosphere, aws-pds, climate, climate model, earth observation, geoscience, geospatial, meteorological, simulations, weather, zarr ['[Browse Bucket](https://cadcat.s3.amazonaws.com/index.html)', '[Data Catalog](https://cadcat.s3.amazonaws.com/cae-collection.csv)']
 CoMMpass from the Multiple Myeloma Research Foundation RNA-Seq Gene Expression Quantification arn:aws:s3:::gdc-mmrf-commpass-phs000748-2-open us-east-1 S3 Bucket https://gdc.cancer.gov/about-gdc/contributed-genomic-data-cancer-research/founda dcf-support@datacommons.io [Center for Translational Data Science at The University of Chicago](https://ctd Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers month NIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access- aws-pds, cancer, genomic, genetic, whole genome sequencing, STRIDES, life sciences
 Collection of open nation-scale LiDAR datasets Open LiDAR datasets arn:aws:s3:::open-lidar-data eu-central-1 S3 Bucket https://github.com/flai-ai/open-lidar-data info@flai.ai [Flai](https://flai.ai/) When new open dataset is published. The exact version of the licence depends on LiDAR dataset and is not the same fo aws-pds, lidar, earth observation, geoscience, geospatial, land cover, mapping, survey
 Common Crawl Crawl data (WARC and ARC format) arn:aws:s3:::commoncrawl us-east-1 S3 Bucket https://commoncrawl.org/the-data/get-started/ https://commoncrawl.org/connect/contact-us/ [Common Crawl](https://commoncrawl.org/) Monthly This data is available for anyone to use under the [Common Crawl Terms of Use](h aws-pds, encyclopedic, natural language processing, internet, web archive True
