Commit 43fa8a3

fabclmnt, jfsantos-ds and arunnthevapalan authored
fix: bug fixes and input parameters improvements (#76)
* feat: Input parameters entity (#74)
* feat(synth): Change synthesizers input - Add Model and Train parameters as namedTuples.
* feat(examples): Change examples inputs to match changes.
* feat: Change CGAN and WGAN examples.
* feat: Changing GAN parameters.
* feat: Changing DRAGAN inputs and example file.
* feat(synth): Change learning rate inputs.
* feat: Introduce the concept of epochs.
* fix: Adding stock data dataset.
* feat: Adding test datasets.
* fix: Save and load methods from DRAGAN, WGAN and WGANGP. (#75)
* fix(TimeGAN): supervised loss computation (#77)
* removing unused argument in build methods
* Fixed supervisor loss temporal alignment
* temporal alignment fix, path fix on example
* path fix on example ipynb
* docs: add contributing, support and license and minor changes (#78)
* minor changes on quick start and project resources
* add contributing guidelines
* add support and licensing info
* fix: Custom loss function. Example update. (#79)

Co-authored-by: Francisco Santos <36741643+jfsantos-ds@users.noreply.github.com>
Co-authored-by: Arunn Thevapalan <arunn.work@gmail.com>
1 parent c816150 commit 43fa8a3
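
The commit message mentions adding "Model and Train parameters as namedTuples". As a rough illustration of that idea, here is a minimal sketch built on the standard library only. The field names are taken from the keyword arguments visible in this commit's example scripts; the field order and all default values are assumptions made for illustration and may differ from the library's actual definitions.

```python
from collections import namedtuple

# Field names come from the keyword arguments used in the examples of this
# commit. Field order and defaults are ASSUMPTIONS, not the library's values.
ModelParameters = namedtuple(
    "ModelParameters",
    ["batch_size", "lr", "betas", "noise_dim", "n_cols", "layers_dim"],
    defaults=(128, 1e-4, (0.5, 0.9), 32, None, 128),
)

TrainParameters = namedtuple(
    "TrainParameters",
    ["cache_prefix", "label_dim", "epochs", "sample_interval", "labels"],
    defaults=("", None, 300, 50, None),
)

# Old style: an anonymous positional list, easy to get out of order.
legacy_train_args = ["", -1, 301, 100, (0, 1)]

# New style: every value is named at the call site.
train_args = TrainParameters(
    epochs=301,
    cache_prefix="",
    sample_interval=100,
    label_dim=-1,
    labels=(0, 1),
)

# With the assumed field order, the legacy list maps onto the same tuple.
migrated = TrainParameters._make(legacy_train_args)
```

Named fields keep call sites readable (`train_args.epochs` rather than `train_args[2]`) while the object stays an ordinary immutable tuple.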

19 files changed

Lines changed: 5047 additions & 1245 deletions

File tree

README.md

Lines changed: 21 additions & 7 deletions
@@ -21,25 +21,27 @@ Synthetic data can be used for many applications:

 # ydata-synthetic
 This repository contains material related to Generative Adversarial Networks for synthetic data generation, in particular regular tabular data and time-series.
-It consists in a set of different GANs architectures developed ussing Tensorflow 2.0. An example Jupyter Notebook is included, to show how to use the different architectures.
+It consists of a set of different GAN architectures developed using TensorFlow 2.0. Several example Jupyter Notebooks and Python scripts are included to show how to use the different architectures.

 # Quickstart
+
+The source code is currently hosted on GitHub at: https://github.com/ydataai/ydata-synthetic
+
+Binary installers for the latest released version are available at the [Python Package Index (PyPI).](https://pypi.org/project/ydata-synthetic/)
 ```
 pip install ydata-synthetic
 ```

 ## Examples
 Here you can find usage examples of the package and models to synthesize tabular data.

-**Credit Fraud dataset** [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ydataai/ydata-synthetic/blob/master/examples/regular/gan_example.ipynb)
-
-**Stock dataset** [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ydataai/ydata-synthetic/blob/master/examples/timeseries/TimeGAN_Synthetic_stock_data.ipynb)
+- Synthesizing the minority class with VanillaGAN on credit fraud dataset [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ydataai/ydata-synthetic/blob/master/examples/regular/gan_example.ipynb)
+- Time Series synthetic data generation with TimeGAN on stock dataset [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ydataai/ydata-synthetic/blob/master/examples/timeseries/TimeGAN_Synthetic_stock_data.ipynb)
+- More examples are continuously added and can be found in the `/examples` directory.

 # Project Resources
-- Synthetic GitHub: https://github.com/ydataai/ydata-synthetic
-- Synthetic Data Community Slack: [click here to join](http://slack.ydata.ai/)

-### In this repo you can find the following GAN architectures:
+In this repository you can find several GAN architectures that are used to create synthesizers:

 #### Tabular data
 - [GAN](https://arxiv.org/abs/1406.2661)
@@ -51,3 +53,15 @@ Here you can find usage examples of the package and models to synthesize tabular

 #### Sequential data
 - [TimeGAN](https://papers.nips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf)
+
+# Contributing
+We are open to collaboration! If you want to start contributing you only need to:
+1. Search for an issue you would like to work on. Issues for newcomers are labeled `good first issue`.
+2. Create a PR solving the issue.
+3. We will review every PR and either accept it or ask for revisions.
+
+# Support
+For support in using this library, please join the #help Slack channel. The Slack community is very friendly and great about quickly answering questions about the use and development of the library. [Click here to join our Slack community!](http://slack.ydata.ai/)
+
+# License
+[GNU General Public License v3.0](https://github.com/ydataai/ydata-synthetic/blob/master/LICENSE)

data/stock_data.csv

Lines changed: 3686 additions & 0 deletions
Large diffs are not rendered by default.

examples/regular/adult_dragan.py

Lines changed: 12 additions & 5 deletions
@@ -1,5 +1,6 @@
 from ydata_synthetic.preprocessing.regular.adult import transformations
 from ydata_synthetic.synthesizers.regular import DRAGAN
+from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

 #Load and process the data
 data, processed_data, preprocessor = transformations()
@@ -12,16 +13,22 @@
 batch_size = 500

 log_step = 100
-epochs = 200+1
+epochs = 300+1
 learning_rate = 1e-5
 beta_1 = 0.5
 beta_2 = 0.9
 models_dir = './cache'

-gan_args = [batch_size, learning_rate, beta_1, beta_2, noise_dim, processed_data.shape[1], dim]
-train_args = ['', epochs, log_step]
+gan_args = ModelParameters(batch_size=batch_size,
+                           lr=learning_rate,
+                           betas=(beta_1, beta_2),
+                           noise_dim=noise_dim,
+                           n_cols=processed_data.shape[1],
+                           layers_dim=dim)
+
+train_args = TrainParameters(epochs=epochs,
+                             sample_interval=log_step)

 synthesizer = DRAGAN(gan_args, n_discriminator=3)
 synthesizer.train(processed_data, train_args)
-
-synth_data = synthesizer.sample(1000)
+synthesizer.save('adult_synth.pkl')

examples/regular/adult_wgangp.py

Lines changed: 14 additions & 5 deletions
@@ -1,27 +1,36 @@
 from ydata_synthetic.preprocessing.regular.adult import transformations
 from ydata_synthetic.synthesizers.regular import WGAN_GP
+from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

 #Load and process the data
 data, processed_data, preprocessor = transformations()

 # WGAN_GP training
-#Defininf the training parameters of WGAN_GP
+#Defining the training parameters of WGAN_GP

 noise_dim = 32
 dim = 128
 batch_size = 128

 log_step = 100
-epochs = 200+1
-learning_rate = 5e-4
+epochs = 300+1
+learning_rate = [5e-4, 3e-3]
 beta_1 = 0.5
 beta_2 = 0.9
 models_dir = './cache'

-gan_args = [batch_size, learning_rate, beta_1, beta_2, noise_dim, processed_data.shape[1], dim]
-train_args = ['', epochs, log_step]
+gan_args = ModelParameters(batch_size=batch_size,
+                           lr=learning_rate,
+                           betas=(beta_1, beta_2),
+                           noise_dim=noise_dim,
+                           n_cols=processed_data.shape[1],
+                           layers_dim=dim)
+
+train_args = TrainParameters(epochs=epochs,
+                             sample_interval=log_step)

 synthesizer = WGAN_GP(gan_args, n_critic=2)
 synthesizer.train(processed_data, train_args)

 synth_data = synthesizer.sample(1000)
+synthesizer.save('test.pkl')
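
In this file `learning_rate` changes from a single float to a two-element list, while the DRAGAN and CGAN examples still pass a scalar. One plausible way for a synthesizer to accept both forms is sketched below. `split_learning_rates` is a hypothetical helper, not part of ydata-synthetic, and the diff does not say which list element belongs to which network, so the (generator, critic) ordering is an assumption.

```python
from typing import Sequence, Tuple, Union


def split_learning_rates(lr: Union[float, Sequence[float]]) -> Tuple[float, float]:
    """Return a (generator_lr, critic_lr) pair.

    Hypothetical helper: the ordering of the pair is an assumption.
    A scalar is reused for both networks; a sequence must hold two values.
    """
    if isinstance(lr, (int, float)):
        return float(lr), float(lr)
    rates = tuple(float(value) for value in lr)
    if len(rates) != 2:
        raise ValueError(f"expected 1 or 2 learning rates, got {len(rates)}")
    return rates


# Scalar input, as in the DRAGAN and CGAN examples.
shared = split_learning_rates(1e-5)

# List input, as in the WGAN_GP example.
separate = split_learning_rates([5e-4, 3e-3])
```

Normalizing the input once at construction time would let the rest of the training code always work with two explicit rates.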

examples/regular/cgan_example.py

Lines changed: 17 additions & 5 deletions
@@ -1,5 +1,6 @@
 from ydata_synthetic.synthesizers.regular import CGAN
 from ydata_synthetic.preprocessing.regular.credit_fraud import transformations
+from ydata_synthetic.synthesizers import ModelParameters, TrainParameters

 import pandas as pd
 import numpy as np
@@ -46,7 +47,7 @@
 beta_2 = 0.9

 log_step = 100
-epochs = 500 + 1
+epochs = 300 + 1
 learning_rate = 5e-4
 models_dir = './cache'

@@ -57,14 +58,25 @@
 train_sample[ data_cols ] = train_sample[ data_cols ] / 10 # scale to random noise size, one less thing to learn
 train_no_label = train_sample[ data_cols ]

-gan_args = [batch_size, learning_rate, beta_1, beta_2, noise_dim, train_sample.shape[1], dim]
-train_args = ['', -1, epochs, log_step, (0, 1)]
+#Test here the new inputs
+gan_args = ModelParameters(batch_size=batch_size,
+                           lr=learning_rate,
+                           betas=(beta_1, beta_2),
+                           noise_dim=noise_dim,
+                           n_cols=train_sample.shape[1],
+                           layers_dim=dim)
+
+train_args = TrainParameters(epochs=epochs,
+                             cache_prefix='',
+                             sample_interval=log_step,
+                             label_dim=-1,
+                             labels=(0,1))

 #Init the Conditional GAN providing the index of the label column as one of the arguments
-synthesizer = CGAN(gan_args, num_classes=2)
+synthesizer = CGAN(model_parameters=gan_args, num_classes=2)

 #Training the Conditional GAN
-synthesizer.train(train_sample, train_args)
+synthesizer.train(data=train_sample, label="Class",train_arguments=train_args)

 #Saving the synthesizer
 synthesizer.save('cgan_synthtrained.pkl')