|
| 1 | +<p></p> |
| 2 | +<p align="center"><img width="250" src="https://user-images.githubusercontent.com/3348134/177604157-11181f6c-57e5-44b1-8f6c-774edbba5512.png" alt="YData Logo"></p> |
| 3 | +<p></p> |
| 4 | + |
| 5 | +[](https://pypi.org/project/ydata-synthetic) |
| 6 | + |
| 7 | +[](https://pepy.tech/project/ydata-synthetic) |
| 8 | + |
| 9 | + |
| 10 | +[](https://github.com/ydataai/ydata-synthetic/actions/workflows/tests.yml) |
| 11 | +[](https://codecov.io/gh/ydataai/ydata-synthetic) |
| 12 | +[](https://github.com/ydataai/ydata-synthetic) |
| 13 | +[](https://discord.com/invite/mw7xjJ7b7s) |
| 14 | + |
| 15 | + |
| 16 | + |
| 17 | +## Overview |
| 18 | +`ydata-synthetic` is the go-to Python package for **synthetic data generation for tabular and time-series data**. It uses the latest Generative AI models to learn the properties of real data and create realistic synthetic data. This project was created to educate the community about synthetic data and its applications in real-world domains, such as data augmentation, bias mitigation, data sharing, and privacy engineering. To learn more about Synthetic Data and its applications, [check this article](https://ydata.ai/resources/10-most-frequently-asked-questions-about-synthetic-data). |
| 19 | + |
| 20 | +## Current Functionality |
| 21 | +- 🤖 **Create Realistic Synthetic Data using Generative AI Models:** `ydata-synthetic` supports state-of-the-art generative adversarial networks (GANs) for data generation, namely Vanilla GAN, CGAN, WGAN, WGAN-GP, DRAGAN, Cramer GAN, CWGAN-GP, CTGAN, and TimeGAN. Learn more about the use of [GANs for Synthetic Data generation](https://medium.com/ydata-ai/generating-synthetic-tabular-data-with-gans-part-1-866705a77302). |
| 22 | + |
| 23 | +- 📀 **Synthetic Data Generation for Tabular and Time-Series Data:** The package supports the synthesis of tabular and time-series data, covering a wide range of real-world applications. Learn how to leverage `ydata-synthetic` for [tabular](https://ydata.ai/resources/gans-for-synthetic-data-generation) and [time-series](https://towardsdatascience.com/synthetic-time-series-data-a-gan-approach-869a984f2239) data. |
| 24 | + |
| 25 | +- 💻 **Best Generation Experience in Open Source:** The package includes a guided UI experience for generating synthetic data, from reading the data to visualizing the synthetic data, all served by a slick Streamlit app. |
| 26 | +:fontawesome-brands-youtube:{ style="color: #EE0F0F" } Here's a [quick overview](https://www.youtube.com/watch?v=ep0PhwsFx0A) – :octicons-clock-24: 1min |
| 27 | + |
| 28 | + |
| 29 | +## Supported Data Types |
| 30 | + |
| 31 | +=== "Tabular Data" |
| 32 | + **Tabular data** does not have a temporal dependence, and can be structured and organized in a table-like format, where **features are represented in columns**, whereas **observations correspond to the rows**. |
| 33 | + |
| 34 | + Additionally, tabular data usually comprises both *numeric* and *categorical* features. **Numeric** features are those that encode **quantitative** values, whereas **categorical** features represent **qualitative** measurements. Categorical features can be further divided into *ordinal*, *binary* or *boolean*, and *nominal* features. |
| 35 | + |
| 36 | + Learn more about synthesizing tabular data in this [article](https://ydata.ai/resources/gans-for-synthetic-data-generation), or check the [quickstart guide](getting-started/quickstart.md#synthesizing-a-tabular-dataset) to start synthesizing tabular datasets. |
| 37 | + |
| 38 | +=== "Time-Series Data" |
| 39 | + **Time-series data** exhibits a sequential, **temporal dependency** between records, and may present a wide range of patterns and trends, including **seasonality** (patterns that repeat at calendar periods -- days, weeks, months -- such as holiday sales) or **periodicity** (patterns that repeat over time). |
| 40 | + |
| 41 | + Read more about generating time-series data in this [article](https://ydata.ai/resources/synthetic-time-series-data-a-gan-approach) and check this [quickstart guide](getting-started/quickstart.md#synthesizing-a-time-series-dataset) to get started with time-series data synthesis. |
| 42 | + |
| 43 | + |
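To make the distinction between the two data types concrete, here is a minimal sketch in plain Python. It does not use the `ydata-synthetic` API. The sample records, the `split_feature_types` and `autocorrelation` helpers, and the `daily_sales` series are all hypothetical names made up for illustration. The first part separates numeric from categorical columns in a small table; the second part shows that a series with a weekly pattern correlates strongly with itself when shifted by 7 days.

```python
import math

# Tabular data: each row is an independent observation, each key is a feature.
# These records are invented for illustration only.
rows = [
    {"age": 34, "income": 52000.0, "education": "bachelor", "smoker": False, "city": "Porto"},
    {"age": 51, "income": 87000.0, "education": "master", "smoker": True, "city": "Lisbon"},
    {"age": 27, "income": 31000.0, "education": "high-school", "smoker": False, "city": "Braga"},
]


def split_feature_types(records):
    """Split column names into numeric and categorical lists based on value types."""
    num_cols, cat_cols = [], []
    for col in records[0]:
        values = [record[col] for record in records]
        # bool is a subclass of int in Python, so exclude it explicitly:
        # a boolean column is a (binary) categorical feature, not a numeric one.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        (num_cols if is_numeric else cat_cols).append(col)
    return num_cols, cat_cols


num_cols, cat_cols = split_feature_types(rows)
# "education" is ordinal, "smoker" is binary, "city" is nominal;
# all three end up in cat_cols, while "age" and "income" are numeric.


def autocorrelation(series, lag):
    """Sample autocorrelation of a series with itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Time-series data: the order of records matters.
# Eight weeks of synthetic daily values with a 7-day (weekly) cycle.
daily_sales = [100 + 20 * math.sin(2 * math.pi * t / 7) for t in range(56)]

weekly = autocorrelation(daily_sales, 7)     # strongly positive: the pattern repeats
off_cycle = autocorrelation(daily_sales, 3)  # negative: the shift is out of phase
```

Shuffling `rows` loses no information, whereas shuffling `daily_sales` would destroy the weekly pattern. That is the practical difference between the two data types described above.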
| 44 | +## Supported Generative AI Models |
| 45 | +The following architectures are currently supported: |
| 46 | + |
| 47 | +- [GAN](https://arxiv.org/abs/1406.2661) |
| 48 | +- [CGAN](https://arxiv.org/abs/1411.1784) (Conditional GAN) |
| 49 | +- [WGAN](https://arxiv.org/abs/1701.07875) (Wasserstein GAN) |
| 50 | +- [WGAN-GP](https://arxiv.org/abs/1704.00028) (Wasserstein GAN with Gradient Penalty) |
| 51 | +- [DRAGAN](https://arxiv.org/pdf/1705.07215.pdf) (Deep Regret Analytic GAN) |
| 52 | +- [Cramer GAN](https://arxiv.org/abs/1705.10743) (Cramer Distance Solution to Biased Wasserstein Gradients) |
| 53 | +- [CWGAN-GP](https://cameronfabbri.github.io/papers/conditionalWGAN.pdf) (Conditional Wasserstein GAN with Gradient Penalty) |
| 54 | +- [CTGAN](https://arxiv.org/pdf/1907.00503.pdf) (Conditional Tabular GAN) |
| 55 | +- [TimeGAN](https://papers.nips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf) (specifically for *time-series* data) |