
Commit 14f605d

docs: add examples to documentation (#280)
Delete examples.md
1 parent 1e1d8cb commit 14f605d

11 files changed

Lines changed: 211 additions & 1 deletion


docs/examples/cgan_example.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Synthesize tabular data

**Using *CGAN* to generate tabular synthetic data:**

Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns**, whereas **observations** correspond to the **rows**.

CGAN is a GAN variant in which both the generator and the discriminator are conditioned on auxiliary information (e.g., a class label), so that data samples can be generated for specific conditions:

- 📑 **Paper:** [Conditional Generative Adversarial Nets](https://arxiv.org/abs/1411.1784)

Here’s an example of how to synthesize tabular data with CGAN using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:

```python
--8<-- "examples/regular/models/creditcard_cgan.py"
```
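To make the conditioning idea above concrete, the sketch below builds a CGAN-style generator input by concatenating a noise vector with a one-hot encoding of the condition. It is a NumPy-only illustration. The helper `build_conditional_input` and the sizes used are hypothetical and are not part of the ydata-synthetic API.

```python
import numpy as np

def build_conditional_input(labels, num_classes, noise_dim, rng):
    """Concatenate Gaussian noise with one-hot encoded labels (hypothetical helper)."""
    labels = np.asarray(labels)
    # One-hot encode the condition, e.g. the class of a credit card transaction
    one_hot = np.eye(num_classes)[labels]
    # Sample one noise vector per requested row
    noise = rng.standard_normal((labels.shape[0], noise_dim))
    # The generator sees the condition next to its noise input
    return np.concatenate([noise, one_hot], axis=1)

rng = np.random.default_rng(0)
z = build_conditional_input([0, 1, 1], num_classes=2, noise_dim=32, rng=rng)
print(z.shape)  # (3, 34)
```

In the conditional setup, the discriminator receives the same one-hot vector concatenated to each real or generated row, so both networks are trained with respect to the condition.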
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Synthesize tabular data

**Using *CRAMER GAN* to generate tabular synthetic data:**

Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns**, whereas **observations** correspond to the **rows**.

CRAMER GAN is a variant of GAN that uses the Cramér distance to measure the similarity between the real and generated data distributions. Unlike the Wasserstein distance, the Cramér distance has unbiased sample gradients, which improves training stability and sample quality:

- 📑 **Paper:** [The Cramer Distance as a Solution to Biased Wasserstein Gradients](https://arxiv.org/abs/1705.10743)

Here’s an example of how to synthesize tabular data with CRAMER GAN using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:

```python
--8<-- "examples/regular/models/creditcard_cramergan.py"
```
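The Cramér distance is defined for one-dimensional distributions. The paper works with its multivariate generalization, the energy distance. The following NumPy sketch estimates the energy distance between two samples, 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖. It is meant to illustrate the quantity being minimized. It does not reproduce the critic or training loop used by ydata-synthetic, and the function name is hypothetical.

```python
import numpy as np

def energy_distance(x, y):
    """Plug-in (V-statistic) estimate of the energy distance between two samples.

    x: array of shape (n, d); y: array of shape (m, d).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (a_i, b_j)
        diffs = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diffs, axis=-1).mean()

    return 2 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)

real = np.array([[0.0], [1.0]])
synthetic = np.array([[2.0], [3.0]])
print(energy_distance(real, synthetic))  # 3.0
```

In one dimension, the energy distance equals twice the Cramér distance ∫(F − G)² between the two CDFs. For the toy samples above, that integral is 1.5.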

docs/examples/ctgan_example.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Synthesize tabular data

**Using *CTGAN* to generate tabular synthetic data:**

Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns**, whereas **observations** correspond to the **rows**.

Additionally, real-world data usually comprises both **numeric** and **categorical** features. Numeric features encode quantitative values, whereas categorical features represent qualitative measurements.

CTGAN was specifically designed to deal with the challenges posed by tabular datasets, handling mixed (numeric and categorical) data:

- 📑 **Paper:** [Modeling Tabular Data using Conditional GAN](https://arxiv.org/pdf/1907.00503.pdf)

Here’s an example of how to synthesize tabular data with CTGAN using the [Adult Census Income](https://www.kaggle.com/datasets/uciml/adult-census-income?resource=download) dataset:

```python
--8<-- "examples/regular/models/adult_ctgan.py"
```
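One way the CTGAN paper handles categorical columns is "training-by-sampling." The procedure is:

1. Pick a discrete column uniformly at random.
2. Pick one of its categories with probability proportional to the log of its frequency.
3. Encode that choice as a conditional vector.

The sketch below assumes `log(1 + count)` to avoid zero mass for categories that appear once. The helper name, the toy DataFrame (with Adult-style column names) and its values are hypothetical, and this is not ydata-synthetic's implementation. The numeric `age` column is simply ignored by the sampler.

```python
import numpy as np
import pandas as pd

def sample_condition_vector(df, discrete_columns, rng):
    """Sketch of CTGAN-style training-by-sampling for one conditional vector."""
    # 1) Pick one discrete column uniformly at random
    col = discrete_columns[rng.integers(len(discrete_columns))]
    # 2) PMF over its categories proportional to log(1 + frequency),
    #    which up-weights rare categories relative to their raw frequency
    counts = df[col].value_counts()
    pmf = np.log1p(counts.to_numpy())
    pmf = pmf / pmf.sum()
    category = counts.index[rng.choice(len(counts), p=pmf)]
    # 3) One mask per discrete column; only the chosen category is set to 1
    masks = []
    for c in discrete_columns:
        categories = sorted(df[c].unique())
        mask = np.zeros(len(categories))
        if c == col:
            mask[categories.index(category)] = 1.0
        masks.append(mask)
    return col, category, np.concatenate(masks)

# Toy mixed-type data: two categorical columns and one numeric column
df = pd.DataFrame({
    "workclass": ["Private"] * 8 + ["State-gov"] * 2,
    "sex": ["Male"] * 6 + ["Female"] * 4,
    "age": [25, 38, 28, 44, 18, 34, 29, 63, 24, 55],
})
rng = np.random.default_rng(42)
col, category, cond = sample_condition_vector(df, ["workclass", "sex"], rng)
print(col, category, cond)
```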

docs/examples/cwgangp_example.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Synthesize tabular data

**Using *CWGAN-GP* to generate tabular synthetic data:**

Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns**, whereas **observations** correspond to the **rows**.

CWGAN-GP is a variant of GAN that conditions generation on auxiliary information. It uses the Wasserstein distance, with a gradient penalty on the critic, to improve training stability and sample quality:

- 📑 **Paper:** [Conditional Wasserstein Generative Adversarial Networks](https://cameronfabbri.github.io/papers/conditionalWGAN.pdf)

Here’s an example of how to synthesize tabular data with CWGAN-GP using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:

```python
--8<-- "examples/regular/models/creditcard_cwgangp.py"
```
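The sketch below makes the conditional gradient penalty concrete. It uses a toy one-layer critic whose gradient can be written analytically; in practice a framework such as TensorFlow computes it with automatic differentiation. The critic, its weights and the helper names are hypothetical.

A few choices are assumptions worth flagging:

- λ = 10 follows the default suggested in the WGAN-GP paper.
- Only the features are interpolated, while the condition is kept fixed. This is one common choice, not necessarily what ydata-synthetic does.

```python
import numpy as np

def critic(x, c, w_x, w_c):
    """Toy conditional critic: D(x, c) = sum(tanh(x @ w_x + c @ w_c)) per row (hypothetical)."""
    return np.tanh(x @ w_x + c @ w_c).sum(axis=1)

def critic_grad_x(x, c, w_x, w_c):
    """Analytic gradient of the toy critic with respect to x (the condition is held fixed)."""
    h = np.tanh(x @ w_x + c @ w_c)   # shape (n, hidden)
    return (1.0 - h ** 2) @ w_x.T     # shape (n, d)

def conditional_gradient_penalty(real, fake, cond, w_x, w_c, rng, lam=10.0):
    """Penalty on interpolates between real and fake rows that share the same condition."""
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake  # interpolate the features only
    norms = np.linalg.norm(critic_grad_x(x_hat, cond, w_x, w_c), axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

rng = np.random.default_rng(0)
n, d, n_classes, hidden = 4, 3, 2, 8
real = rng.standard_normal((n, d))
fake = rng.standard_normal((n, d))
cond = np.eye(n_classes)[[0, 1, 0, 1]]  # one-hot conditions shared by real and fake rows
w_x = rng.standard_normal((d, hidden))
w_c = rng.standard_normal((n_classes, hidden))
gp = conditional_gradient_penalty(real, fake, cond, w_x, w_c, rng)
print(gp)
```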

docs/examples/dragan_example.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Synthesize tabular data

**Using *DRAGAN* to generate tabular synthetic data:**

Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns**, whereas **observations** correspond to the **rows**.

DRAGAN is a GAN variant that applies a gradient penalty in the neighborhood of real data points to improve training stability and mitigate mode collapse:

- 📑 **Paper:** [On Convergence and Stability of GANs](https://arxiv.org/pdf/1705.07215.pdf)

Here’s an example of how to synthesize tabular data with DRAGAN using the [Adult Census Income](https://www.kaggle.com/datasets/uciml/adult-census-income?resource=download) dataset:

```python
--8<-- "examples/regular/models/adult_dragan.py"
```
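DRAGAN's penalty differs from WGAN-GP's in where the gradient is evaluated: at noisy copies of real rows rather than on lines between real and generated rows. The NumPy sketch below uses a hypothetical quadratic critic D(x) = −‖x − center‖² so that its gradient, −2(x − center), is analytic.

A few parameter choices are assumptions:

- Gaussian noise with scale 0.5.
- Target gradient norm k = 1.
- λ = 10.

Published implementations differ in how they draw the perturbation.

```python
import numpy as np

def toy_critic_grad(x, center):
    """Gradient of a hypothetical quadratic critic D(x) = -||x - center||^2."""
    return -2.0 * (x - center)

def dragan_penalty(real, center, rng, noise_scale=0.5, lam=10.0, k=1.0):
    """DRAGAN-style penalty evaluated on perturbed real rows (sketch).

    No generated samples are involved: the perturbation is drawn around real data.
    """
    delta = noise_scale * rng.standard_normal(real.shape)
    x_pert = real + delta
    norms = np.linalg.norm(toy_critic_grad(x_pert, center), axis=1)
    return lam * np.mean((norms - k) ** 2)

rng = np.random.default_rng(0)
real = rng.standard_normal((128, 4))
print(dragan_penalty(real, center=np.zeros(4), rng=rng))
```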

docs/examples/timegan_example.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Synthesize time-series data

**Using *TimeGAN* to generate synthetic time-series data:**

Although tabular data may be the most frequently discussed type of data, many real-world domains, from traffic and daily trajectories to stock prices and energy consumption patterns, produce **time-series data**, which adds several layers of complexity to synthetic data generation.

Time-series data is structured sequentially, with observations **ordered chronologically** based on their associated timestamps or time intervals. It explicitly incorporates the temporal aspect, allowing for the analysis of trends, seasonality, and other dependencies over time.

TimeGAN is a model that uses a Generative Adversarial Network (GAN) framework to generate synthetic time-series data by learning the underlying temporal dependencies and characteristics of the original data:

- 📑 **Paper:** [Time-series Generative Adversarial Networks](https://papers.nips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf)

Here’s an example of how to synthesize time-series data with TimeGAN using the [Yahoo Stock Price](https://www.kaggle.com/datasets/arashnic/time-series-forecasting-with-yahoo-stock-price) dataset:

```python
--8<-- "examples/timeseries/stock_timegan.py"
```
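Each TimeGAN training example is a short sequence rather than a single row. The chronologically ordered series therefore has to be cut into fixed-length windows first.

The NumPy sketch below shows one common approach:

1. Min-max scale each feature to [0, 1].
2. Take overlapping windows of `seq_len` consecutive observations.

This is an assumption about what such preprocessing looks like, not a copy of ydata-synthetic's `processed_stock`. The helper name and toy data are hypothetical.

```python
import numpy as np

def make_windows(data, seq_len):
    """Scale each column to [0, 1] and cut the series into overlapping windows.

    data: array of shape (n_timesteps, n_features), ordered chronologically.
    Returns an array of shape (n_timesteps - seq_len + 1, seq_len, n_features).
    """
    data = np.asarray(data, dtype=float)
    mins = data.min(axis=0)
    ranges = data.max(axis=0) - mins
    ranges[ranges == 0] = 1.0  # avoid dividing by zero for constant columns
    scaled = (data - mins) / ranges
    # Window i covers timesteps i, ..., i + seq_len - 1, preserving chronological order
    return np.stack([scaled[i:i + seq_len] for i in range(len(scaled) - seq_len + 1)])

series = np.arange(30, dtype=float).reshape(10, 3)  # toy data: 10 timesteps, 3 features
windows = make_windows(series, seq_len=4)
print(windows.shape)  # (7, 4, 3)
```

Here the scaling statistics are computed over the whole series. When evaluating on held-out data, you would typically fit them on the training portion only.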

docs/examples/wgan_example.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Synthesize tabular data

**Using *WGAN* to generate tabular synthetic data:**

Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns**, whereas **observations** correspond to the **rows**.

WGAN is a variant of GAN that trains a critic to estimate the Wasserstein (Earth Mover's) distance between the real and generated distributions, which improves training stability and yields higher-quality samples:

- 📑 **Paper:** [Wasserstein GAN](https://arxiv.org/abs/1701.07875)

Here’s an example of how to synthesize tabular data with WGAN using the [Credit Card](https://www.openml.org/search?type=data&sort=runs&id=1597&status=active) dataset:

```python
--8<-- "examples/regular/models/creditcard_wgan.py"
```
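WGAN's critic approximates the Wasserstein-1 distance through its dual form. That form requires a 1-Lipschitz critic, which the original WGAN enforces by weight clipping.

In one dimension, with equal sample sizes, the distance can also be computed exactly by sorting. This makes it a handy sanity check, for example when comparing a real and a synthetic column. The NumPy sketch below is for intuition only and is not part of ydata-synthetic. SciPy's `scipy.stats.wasserstein_distance` computes the same quantity for general 1-D samples.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two 1-D empirical samples of equal size.

    For equal sample sizes, the optimal transport plan matches the i-th smallest
    value of x with the i-th smallest value of y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return np.mean(np.abs(x - y))

print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0]))  # 1.0
```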

docs/examples/wgangp_example.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# Synthesize tabular data

**Using *WGAN-GP* to generate tabular synthetic data:**

Real-world domains are often described by **tabular data**, i.e., data that can be structured and organized in a table-like format, where **features/variables** are represented in **columns**, whereas **observations** correspond to the **rows**.

WGAN-GP is a variant of WGAN that replaces weight clipping with a gradient penalty term on the critic, enhancing training stability and improving the diversity of generated samples:

- 📑 **Paper:** [Improved Training of Wasserstein GANs](https://arxiv.org/abs/1704.00028)

Here’s an example of how to synthesize tabular data with WGAN-GP using the [Adult Census Income](https://www.kaggle.com/datasets/uciml/adult-census-income?resource=download) dataset:

```python
--8<-- "examples/regular/models/adult_wgangp.py"
```
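For reference, the WGAN-GP paper defines the critic loss as below, with the following notation:

- $\mathbb{P}_r$ and $\mathbb{P}_g$ are the real and generated distributions.
- $\hat{x}$ is sampled uniformly along straight lines between pairs of real and generated points.
- The paper uses $\lambda = 10$.

```latex
L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\,\tilde{x},\quad \epsilon \sim U[0, 1]
```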

docs/getting-started/examples.md

Whitespace-only changes.
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
"""
TimeGAN architecture example file
"""

# Importing necessary libraries
from os import path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from ydata_synthetic.synthesizers import ModelParameters
from ydata_synthetic.preprocessing.timeseries import processed_stock
from ydata_synthetic.synthesizers.timeseries import TimeGAN

# Define model parameters
seq_len = 24
n_seq = 6
hidden_dim = 24
gamma = 1

noise_dim = 32
dim = 128
batch_size = 128

log_step = 100
learning_rate = 5e-4

gan_args = ModelParameters(batch_size=batch_size,
                           lr=learning_rate,
                           noise_dim=noise_dim,
                           layers_dim=dim)

# Read the data, split into sequences of seq_len consecutive observations
stock_data = processed_stock(path='../../data/stock_data.csv', seq_len=seq_len)
print(len(stock_data), stock_data[0].shape)

# Train the TimeGAN synthesizer, or load a previously saved one
if path.exists('synthesizer_stock.pkl'):
    synth = TimeGAN.load('synthesizer_stock.pkl')
else:
    synth = TimeGAN(model_parameters=gan_args, hidden_dim=hidden_dim, seq_len=seq_len, n_seq=n_seq, gamma=gamma)
    synth.train(stock_data, train_steps=50000)
    synth.save('synthesizer_stock.pkl')

# Generate new synthetic samples
synth_data = synth.sample(len(stock_data))
print(synth_data.shape)

# Names of the stock features, in the order they appear in each sequence
cols = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']

# Plot some generated samples. Both synthetic and original data are still scaled to values in [0, 1],
# so they share a single y-axis
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 10))
axes = axes.flatten()

time = list(range(1, seq_len + 1))
obs = np.random.randint(len(stock_data))

for j, col in enumerate(cols):
    df = pd.DataFrame({'Real': stock_data[obs][:, j],
                       'Synthetic': synth_data[obs][:, j]},
                      index=time)
    df.plot(ax=axes[j],
            title=col,
            style=['-', '--'])
fig.tight_layout()
plt.show()
