docs: add examples to documentation (#280)

miriamspsantos · web-flow · commit 14f605d1c82d · 2023-06-07T11:31:59.000+02:00
Delete examples.md
diff --git a/docs/examples/cgan_example.md b/docs/examples/cgan_example.md
@@ -0,0 +1,16 @@
+# Synthesize tabular data
+
+**Using *CGAN* to generate tabular synthetic data:**
+
+Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns** and **observations** correspond to **rows**.
+
+CGAN is a deep learning model that combines GANs with conditional models to generate data samples based on specific conditions:
+
+- 📑 **Paper:** [Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)
+
+Here’s an example of how to synthesize tabular data with CGAN using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:
+
+
+```python
+--8<-- "examples/regular/models/creditcard_cgan.py"
+```
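
As a rough illustration of what "conditioning" means in the CGAN description above, the sketch below builds a generator input by concatenating random noise with a one-hot encoded class label, following the general idea in the CGAN paper. The helper name `build_generator_input` and the toy shapes are assumptions made for this note; this is not how ydata-synthetic implements CGAN internally.

```python
import numpy as np


def build_generator_input(noise, labels, n_classes):
    """Concatenate noise vectors with one-hot encoded condition labels.

    Hypothetical helper for illustration only.
    """
    one_hot = np.eye(n_classes)[labels]  # shape: (batch, n_classes)
    return np.concatenate([noise, one_hot], axis=1)


rng = np.random.default_rng(0)
noise = rng.normal(size=(4, 32))      # 4 samples, 32-dimensional noise
labels = np.array([0, 1, 1, 0])       # the condition for each sample
gen_input = build_generator_input(noise, labels, n_classes=2)
print(gen_input.shape)  # (4, 34)
```

The discriminator can receive the same label alongside each real or generated row, so both networks learn the label-dependent distribution.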
diff --git a/docs/examples/cramer_gan_example.md b/docs/examples/cramer_gan_example.md
@@ -0,0 +1,16 @@
+# Synthesize tabular data
+
+**Using *CRAMER GAN* to generate tabular synthetic data:**
+
+Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns** and **observations** correspond to **rows**.
+
+CRAMER GAN is a variant of GAN that employs the Cramer distance as a measure of similarity between real and generated data distributions to improve training stability and enhance sample quality:
+
+- 📑 **Paper:** [The Cramer Distance as a Solution to Biased Wasserstein Gradients](https://arxiv.org/abs/1705.10743)
+
+Here’s an example of how to synthesize tabular data with CRAMER GAN using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:
+
+
+```python
+--8<-- "examples/regular/models/creditcard_cramergan.py"
+```
diff --git a/docs/examples/ctgan_example.md b/docs/examples/ctgan_example.md
@@ -0,0 +1,18 @@
+# Synthesize tabular data
+
+**Using *CTGAN* to generate tabular synthetic data:**
+
+Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns** and **observations** correspond to **rows**.
+
+Additionally, real-world data usually comprises both **numeric** and **categorical** features. Numeric features encode quantitative values, whereas categorical features represent qualitative attributes.
+
+CTGAN was specifically designed to deal with the challenges posed by tabular datasets, handling mixed (numeric and categorical) data:
+
+- 📑 **Paper:** [Modeling Tabular Data using Conditional GAN](https://arxiv.org/pdf/1907.00503.pdf)
+
+Here’s an example of how to synthesize tabular data with CTGAN using the [Adult Census Income](https://www.kaggle.com/datasets/uciml/adult-census-income?resource=download) dataset:
+
+
+```python
+--8<-- "examples/regular/models/adult_ctgan.py"
+```
diff --git a/docs/examples/cwgangp_example.md b/docs/examples/cwgangp_example.md
@@ -0,0 +1,16 @@
+# Synthesize tabular data
+
+**Using *CWGAN-GP* to generate tabular synthetic data:**
+
+Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns** and **observations** correspond to **rows**.
+
+CWGAN-GP is a variant of GAN that incorporates conditional information to generate data samples, while leveraging the Wasserstein distance to improve training stability and sample quality:
+
+- 📑 **Paper:** [Conditional Wasserstein Generative Adversarial Networks](https://cameronfabbri.github.io/papers/conditionalWGAN.pdf)
+
+Here’s an example of how to synthesize tabular data with CWGAN-GP using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:
+
+
+```python
+--8<-- "examples/regular/models/creditcard_cwgangp.py"
+```
diff --git a/docs/examples/dragan_example.md b/docs/examples/dragan_example.md
@@ -0,0 +1,16 @@
+# Synthesize tabular data
+
+**Using *DRAGAN* to generate tabular synthetic data:**
+
+Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns** and **observations** correspond to **rows**.
+
+DRAGAN is a GAN variant that uses a gradient penalty to improve training stability and mitigate mode collapse:
+
+- 📑 **Paper:** [On Convergence and Stability of GANs](https://arxiv.org/pdf/1705.07215.pdf)
+
+Here’s an example of how to synthesize tabular data with DRAGAN using the [Adult Census Income](https://www.kaggle.com/datasets/uciml/adult-census-income?resource=download) dataset:
+
+
+```python
+--8<-- "examples/regular/models/adult_dragan.py"
+```
diff --git a/docs/examples/timegan_example.md b/docs/examples/timegan_example.md
@@ -0,0 +1,21 @@
+# Synthesize time-series data
+
+**Using *TimeGAN* to generate synthetic time-series data:**
+
+Although tabular data may be the most frequently discussed type of data, a great number of real-world domains, from traffic and daily trajectories to stock prices and energy consumption patterns, produce **time-series data**, which introduces additional complexity to synthetic data generation.
+
+Time-series data is structured sequentially, with observations **ordered chronologically** based on their associated timestamps or time intervals. It explicitly incorporates the temporal aspect, allowing for the analysis of trends, seasonality, and other dependencies over time.
+
+TimeGAN is a model that uses a Generative Adversarial Network (GAN) framework to generate synthetic time-series data by learning the underlying temporal dependencies and characteristics of the original data:
+
+- 📑 **Paper:** [Time-series Generative Adversarial Networks](https://papers.nips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf)
+
+Here’s an example of how to synthesize time-series data with TimeGAN using the [Yahoo Stock Price](https://www.kaggle.com/datasets/arashnic/time-series-forecasting-with-yahoo-stock-price) dataset:
+
+
+```python
+--8<-- "examples/timeseries/stock_timegan.py"
+```
+
+
+
diff --git a/docs/examples/wgan_example.md b/docs/examples/wgan_example.md
@@ -0,0 +1,16 @@
+# Synthesize tabular data
+
+**Using *WGAN* to generate tabular synthetic data:**
+
+Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns** and **observations** correspond to **rows**.
+
+WGAN is a variant of GAN that utilizes the Wasserstein distance to improve training stability and generate higher quality samples:
+
+- 📑 **Paper:** [Wasserstein GAN](https://arxiv.org/abs/1701.07875)
+
+Here’s an example of how to synthesize tabular data with WGAN using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:
+
+
+```python
+--8<-- "examples/regular/models/creditcard_wgan.py"
+```
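
To give some intuition for the Wasserstein distance mentioned in the WGAN description above, the sketch below computes the empirical Wasserstein-1 distance between two equal-size one-dimensional samples, which reduces to the mean absolute difference of the sorted values. The function name `wasserstein_1d` is an assumption made for this note. WGAN itself does not evaluate the distance this way; it estimates it through a trained critic network.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    Hypothetical helper for illustration only.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    # With equal sample sizes, optimal transport pairs sorted values one-to-one.
    return float(np.mean(np.abs(x - y)))


real = np.array([0.0, 1.0, 2.0, 3.0])
shifted = real + 0.5
print(wasserstein_1d(real, real))     # 0.0
print(wasserstein_1d(real, shifted))  # 0.5
```

Unlike divergences that saturate when two distributions do not overlap, this distance grows smoothly with the size of the shift, which is the property the WGAN paper relies on for more stable training.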
diff --git a/docs/examples/wgangp_example.md b/docs/examples/wgangp_example.md
@@ -0,0 +1,16 @@
+# Synthesize tabular data
+
+**Using *WGAN-GP* to generate tabular synthetic data:**
+
+Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns** and **observations** correspond to **rows**.
+
+WGAN-GP is a variant of WGAN that adds a gradient penalty term (in place of weight clipping) to enhance training stability and improve the diversity of generated samples:
+
+- 📑 **Paper:** [Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028)
+
+Here’s an example of how to synthesize tabular data with WGAN-GP using the [Adult Census Income](https://www.kaggle.com/datasets/uciml/adult-census-income?resource=download) dataset:
+
+
+```python
+--8<-- "examples/regular/models/adult_wgangp.py"
+```
diff --git a/docs/getting-started/examples.md b/docs/getting-started/examples.md
diff --git a/examples/timeseries/stock_timegan.py b/examples/timeseries/stock_timegan.py
@@ -0,0 +1,65 @@
+"""
+    TimeGAN architecture example file
+"""
+
+# Importing necessary libraries
+from os import path
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+
+from ydata_synthetic.synthesizers import ModelParameters
+from ydata_synthetic.preprocessing.timeseries import processed_stock
+from ydata_synthetic.synthesizers.timeseries import TimeGAN
+
+# Define model parameters
+seq_len = 24
+n_seq = 6
+hidden_dim = 24
+gamma = 1
+
+noise_dim = 32
+dim = 128
+batch_size = 128
+
+log_step = 100
+learning_rate = 5e-4
+
+gan_args = ModelParameters(batch_size=batch_size,
+                           lr=learning_rate,
+                           noise_dim=noise_dim,
+                           layers_dim=dim)
+
+# Read the data
+stock_data = processed_stock(path='../../data/stock_data.csv', seq_len=seq_len)
+print(len(stock_data), stock_data[0].shape)
+
+# Training the TimeGAN synthesizer
+if path.exists('synthesizer_stock.pkl'):
+    synth = TimeGAN.load('synthesizer_stock.pkl')
+else:
+    synth = TimeGAN(model_parameters=gan_args, hidden_dim=hidden_dim, seq_len=seq_len, n_seq=n_seq, gamma=gamma)
+    synth.train(stock_data, train_steps=50000)
+    synth.save('synthesizer_stock.pkl')
+
+# Generating new synthetic samples
+synth_data = synth.sample(len(stock_data))
+print(synth_data.shape)
+
+# Column names of the stock features, in the order they appear in each sequence
+cols = ['Open','High','Low','Close','Adj Close','Volume']
+
+# Plotting some generated samples. Both synthetic and original data are still scaled to the [0, 1] range
+fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 10))
+axes = axes.flatten()
+
+time = list(range(1, seq_len + 1))
+obs = np.random.randint(len(stock_data))
+
+for j, col in enumerate(cols):
+    df = pd.DataFrame({'Real': stock_data[obs][:, j],
+                       'Synthetic': synth_data[obs][:, j]})
+    df.plot(ax=axes[j],
+            title=col,
+            secondary_y=['Synthetic'], style=['-', '--'])
+fig.tight_layout()
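
The example above prints `len(stock_data)` and `stock_data[0].shape`, and notes that values lie in [0, 1]. The sketch below shows one way a dataset with that layout could be produced: min-max scaling per column, followed by overlapping windows of length `seq_len`. The helper `make_windows` and the toy input are assumptions made for this note; the actual `processed_stock` function may differ in its windowing details and ordering.

```python
import numpy as np


def make_windows(values, seq_len):
    """Min-max scale each column to [0, 1], then cut overlapping windows.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    v_min = values.min(axis=0)
    v_range = values.max(axis=0) - v_min
    v_range[v_range == 0] = 1.0  # avoid division by zero for constant columns
    scaled = (values - v_min) / v_range
    # One window per valid start index; each has shape (seq_len, n_features).
    return [scaled[i:i + seq_len] for i in range(len(scaled) - seq_len + 1)]


toy = np.arange(100 * 6, dtype=float).reshape(100, 6)  # 100 time steps, 6 features
windows = make_windows(toy, seq_len=24)
print(len(windows), windows[0].shape)  # 77 (24, 6)
```

Each window plays the role of one training sequence, which is why `seq_len` and `n_seq` passed to the synthesizer need to match the window length and the number of features.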
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -10,7 +10,17 @@ nav:
     - Overview: 'index.md'
     - Installation: 'getting-started/installation.md'
     - Quickstart: 'getting-started/quickstart.md'
-    - Examples: 'getting-started/examples.md'
+  - Examples:
+      - Generate Tabular Data:
+        - CGAN: "examples/cgan_example.md"
+        - WGAN: "examples/wgan_example.md"
+        - WGAN-GP: "examples/wgangp_example.md"
+        - CTGAN: "examples/ctgan_example.md"
+        - DRAGAN: "examples/dragan_example.md"
+        - Cramer GAN: "examples/cramer_gan_example.md"
+        - CWGAN-GP: "examples/cwgangp_example.md"
+      - Generate Time-Series Data:
+        - TimeGAN: "examples/timegan_example.md"
   - Support: 
     - Help & Troubleshooting: 'support/help-troubleshooting.md'
     - Contribution Guidelines: 'support/contribute.md'