Commit 0fbc436

Updated datasets 2025-07-26 UTC
1 parent 4dccb67 commit 0fbc436

6 files changed

Lines changed: 2933 additions & 2830 deletions

aws_open_datasets.json

Lines changed: 2 additions & 2 deletions
@@ -34868,7 +34868,7 @@
 "ARN": "arn:aws:s3:::softwareheritage-inventory",
 "Region": "us-east-1",
 "Type": "S3 Bucket",
-"Documentation": "https://docs.softwareheritage.org/devel/swh-dataset/graph/athena.html",
+"Documentation": "https://docs.softwareheritage.org/devel/swh-export/graph/athena.html",
 "Contact": "aws@softwareheritage.org",
 "ManagedBy": "Software Heritage",
 "UpdateFrequency": "Data is updated yearly",
@@ -34892,7 +34892,7 @@
 "ARN": "arn:aws:s3:::softwareheritage",
 "Region": "us-east-1",
 "Type": "S3 Bucket",
-"Documentation": "https://docs.softwareheritage.org/devel/swh-dataset/graph/athena.html",
+"Documentation": "https://docs.softwareheritage.org/devel/swh-export/graph/athena.html",
 "Contact": "aws@softwareheritage.org",
 "ManagedBy": "Software Heritage",
 "UpdateFrequency": "Data is updated yearly",

aws_open_datasets.tsv

Lines changed: 2 additions & 2 deletions
@@ -1280,8 +1280,8 @@ Single-Cell Atlas of Human Blood During Healthy Aging Raw sequencing data (fastq
 Smithsonian Open Access Smithsonian Open Access Media and Metadata arn:aws:s3:::smithsonian-open-access us-west-2 S3 Bucket http://edan.si.edu/openaccess/docs/ openaccess@si.edu [SI](http://www.si.edu/) New / updated metadata and image files will be pushed weekly. CC0 aws-pds, art, history, culture, museum, encyclopedic
 SocialGene RefSeq Databases SocialGene 2023_v041 Data and Database Dumps arn:aws:s3:::socialgene-open-data us-east-2 S3 Bucket https://socialgene.github.io/precomputed_databases/2023_v0.4.1/aws/aws https://github.com/socialgene/socialgene.github.io/issues University of Wisconsin-Madison This database is currently what was published in our 2024 paper introducing Soci Where applicable, SocialGene data is released under CC0 (https://creativecommons metagenomics, genomic, bioinformatics, microbiome, chemical biology, pharmaceutical, graph, protein, amino acid ['[Browse Bucket](https://socialgene-open-data.s3.amazonaws.com/)']
 Sofar Spotter Archive Hourly position, wave spectra and bulk wave parameters from global free drifting arn:aws:s3:::sofar-spotter-archive us-west-2 S3 Bucket [Spotter Technical Reference Manual](https://content.sofarocean.com/hubfs/Spotte opendata@sofarocean.com [Sofar Ocean](https://www.sofarocean.com/company/contact-us) As available [Sofar Data Access Agreement](https://sofarocean.notion.site/sofarocean/Sofar-Da aws-pds, climate, meteorological, sustainability, weather, oceans, environmental, oceans ['[Browse Bucket](https://sofar-spotter-archive.s3.amazonaws.com/index.html)']
-Software Heritage Graph Dataset S3 Inventory files arn:aws:s3:::softwareheritage-inventory us-east-1 S3 Bucket https://docs.softwareheritage.org/devel/swh-dataset/graph/athena.html aws@softwareheritage.org Software Heritage Data is updated yearly "The term ""Software Heritage Graph Dataset"" designates the internal structure of " aws-pds, source code, open source software, free software, digital preservation
-Software Heritage Graph Dataset Software Heritage Graph Dataset arn:aws:s3:::softwareheritage us-east-1 S3 Bucket https://docs.softwareheritage.org/devel/swh-dataset/graph/athena.html aws@softwareheritage.org Software Heritage Data is updated yearly "The term ""Software Heritage Graph Dataset"" designates the internal structure of " aws-pds, source code, open source software, free software, digital preservation
+Software Heritage Graph Dataset S3 Inventory files arn:aws:s3:::softwareheritage-inventory us-east-1 S3 Bucket https://docs.softwareheritage.org/devel/swh-export/graph/athena.html aws@softwareheritage.org Software Heritage Data is updated yearly "The term ""Software Heritage Graph Dataset"" designates the internal structure of " aws-pds, source code, open source software, free software, digital preservation
+Software Heritage Graph Dataset Software Heritage Graph Dataset arn:aws:s3:::softwareheritage us-east-1 S3 Bucket https://docs.softwareheritage.org/devel/swh-export/graph/athena.html aws@softwareheritage.org Software Heritage Data is updated yearly "The term ""Software Heritage Graph Dataset"" designates the internal structure of " aws-pds, source code, open source software, free software, digital preservation
 Solar Dynamics Observatory (SDO) Machine Learning Dataset The v1 dataset includes AIA observations 2010-2018 and v2 includes AIA observati arn:aws:s3:::gov-nasa-hdrl-data1/contrib/fdl-sdoml/ us-west-2 S3 Bucket https://github.com/SDOML/sdoml.github.io Meng Jin (jinmeng@lmsal.com) and Paul Wright (paul@pauljwright.co.uk) [NASA](http://www.nasa.gov/) N/A (The IDL/Python scripts for generating the datasets are published online, wh There are no restrictions on the use of this data. aws-pds, machine learning, NASA SMD AI
 SondeHub Radiosonde Telemetry Radiosonde Telemetry as JSON blobs of Universal Telemetry format arn:aws:s3:::sondehub-history us-east-1 S3 Bucket https://github.com/projecthorus/sondehub-infra/wiki/Amazon-Open-Data Michaela Wheeler <vk3fur@sondehub.org> [SondeHub](https://sondehub.org/) Data is updated as we receive it Creative Commons BY-SA 2.0 aws-pds, climate, environmental, weather, GPS ['[Browse Bucket by serial number](http://sondehub-history.s3-website-us-east-1.amazonaws.com/#serial/)', '[Browse Bucket by date/time](http://sondehub-history.s3-website-us-east-1.amazonaws.com/#date/)']
 Sophos/ReversingLabs 20 Million malware detection dataset Sophos/ReversingLabs 20 million sample dataset arn:aws:s3:::sorel-20m/ us-west-2 S3 Bucket https://github.com/sophos-ai/SOREL-20M/blob/master/README.md sorel-dataset@sophos.com Sophos AI At most annually See the [Terms of Use](https://github.com/sophos-ai/SOREL-20M/blob/master/Terms% aws-pds, cyber security, deep learning, labeled, machine learning
