
Commit d642bd9

docs: add faqs (#289)
* docs: add faqs
* docs: update faqs
* docs: update mkdocs.yml
1 parent edd34a9 commit d642bd9

3 files changed

Lines changed: 89 additions & 1 deletion

File tree

README.md

Lines changed: 3 additions & 0 deletions
@@ -130,5 +130,8 @@ We are open to collaboration! If you want to start contributing you only need to
130130
## Support
131131
For support in using this library, please join our Discord server. Our Discord community is very friendly and great about quickly answering questions about the use and development of the library. [Click here to join our Discord community!](https://tiny.ydata.ai/dcai-ydata-synthetic)
132132

133+
## FAQs
134+
Have a question? Check out the [Frequently Asked Questions](https://ydata.ai/resources/10-most-asked-questions-on-ydata-synthetic) about `ydata-synthetic`. If you feel something is missing, feel free to [book a beary informal chat with us](https://meetings.hubspot.com/fabiana-clemente).
135+
133136
## License
134137
[MIT License](https://github.com/ydataai/ydata-synthetic/blob/master/LICENSE)

docs/examples/faqs.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
# Frequently Asked Questions
2+
3+
## How can I get accurate data from my synthetic data generation process?
4+
Depending on your use case, the downstream application of your synthetic data, and the characteristics of your original data, you will need to adjust your synthetization process accordingly. That often involves thorough data preparation and fitting your generation models appropriately.
5+
6+
!!! tip
7+
8+
For a use-case-oriented UI experience, try [YData Fabric](https://ydata.ai/ydata-fabric-free-trial). From interactive, complete data profiling to efficient synthetization, your data preparation process will be seamlessly adjusted to your data characteristics.
9+
10+
## How can I run the Streamlit app?
11+
12+
To try `ydata-synthetic` using the Streamlit app, you need to install the package with the `[]` notation, which selects the optional extras that the package provides. In this case, you can simply create your virtual environment and install `ydata-synthetic` as:
13+
14+
```bash
# Quoting keeps shells such as zsh from interpreting the square brackets
pip install "ydata-synthetic[streamlit]"
```
17+
18+
Note that Jupyter and Colab notebooks are not yet supported, so you need to run the app from your own Python environment. Once the package is installed, you can use the following snippet to start the app:
19+
20+
```python
from ydata_synthetic import streamlit_app

streamlit_app.run()
```
25+
26+
And that's it! After running the command, the console will output the URL from which you can access the app!
27+
28+
!!! example
29+
For a step-by-step installation guide, [check this 5-min video](https://www.youtube.com/watch?v=jj9X1_cKRwI&t=2s) that will help you get started!
30+
31+
32+
## What is the best way to evaluate the quality of my synthetic data?
33+
The most appropriate metrics for evaluating the quality of your synthetic data also depend on the goal for which the data will be used. Nevertheless, synthetic data quality rests on three essential pillars: privacy, fidelity, and utility:
34+
35+
- Privacy refers to the ability of synthetic data to withhold any personal, private, or sensitive information, avoiding connections being drawn to the original data and preventing data leakage;
36+
37+
- Fidelity concerns the ability of the new data to preserve the properties of the original data (in other words, how faithful and how precise the synthetic data is in comparison to the real data);
38+
39+
- Finally, utility relates to the downstream application where the synthetic data will be used: if the synthetization process is successful, the same insights should be derived from the new data as from the original data.
40+
41+
For each of these components, several specific statistical measures can be evaluated.
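As one illustration of a fidelity measure, the sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the empirical distributions of a real and a synthetic column). This is only an assumed example of a statistical measure, not a metric shipped by `ydata-synthetic`; the function name `ks_statistic` and the toy values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    max_gap = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for value in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, value) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, value) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


# Toy columns (hypothetical values): 0.0 means identical distributions,
# 1.0 means the two samples do not overlap at all.
real_column = [1.0, 2.0, 3.0, 4.0]
synthetic_column = [1.1, 2.1, 2.9, 4.2]
print(ks_statistic(real_column, synthetic_column))
```

A value close to 0 suggests the synthetic column follows the real one closely on this single dimension; it says nothing about privacy or utility, which need their own measures.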
42+
43+
!!! abstract
44+
45+
To learn more about how to define specific trade-offs between privacy, fidelity, and utility, check out this white paper on [Synthetic Data Quality Metrics](https://ydata.ai/synthetic-data-quality-metrics).
46+
47+
48+
## How to generate synthetic data in Google Colab and Python Environments?
49+
Most installation issues are caused by unsupported Python versions or by a mismatch between Python environments and package requirements.
50+
51+
Let’s see how you can get both right:
52+
53+
### Python Versions
54+
Note that `ydata-synthetic` currently requires Python >=3.9, <3.11. If you're trying to run our code in Google Colab, you need to [update your Google Colab’s Python version](https://stackoverflow.com/questions/68657341/how-can-i-update-google-colabs-python-version/68658479#68658479) accordingly. The same goes for your development environment.
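A quick way to confirm that an interpreter falls in this range is a small check like the one below. It is a minimal sketch that only encodes the bounds stated above; the helper name `is_supported` is hypothetical and not part of the package.

```python
import sys

# Supported range taken from the text above: Python >=3.9, <3.11.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 11)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is within the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


current = ".".join(str(part) for part in sys.version_info[:3])
status = "within" if is_supported() else "outside"
print(f"Python {current} is {status} the range required by ydata-synthetic.")
```

Running it in a Colab cell or in your local interpreter shows whether you need to switch versions before installing.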
55+
56+
### Virtual Environments
57+
Many installation problems stem from mismatches between environments and package requirements.
Virtual environments isolate your installations from the "global" environment so that you don't have to worry about conflicts.
59+
60+
Using conda, creating a new environment is as easy as running this in your shell:
61+
62+
```bash
# A single "=" lets conda pick the latest 3.9.x release
conda create --name synth-env python=3.9 pip
conda activate synth-env
pip install ydata-synthetic
```
67+
68+
Now you can open your Python editor or JupyterLab and use `synth-env` as your development environment, without having to worry about conflicting versions or packages between projects!
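If you are unsure which environment your editor or notebook is actually using, a short check such as the following can help. It is a sketch that assumes a conda-managed environment, where `conda activate` sets the `CONDA_DEFAULT_ENV` variable; the helper name `describe_environment` is hypothetical.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the interpreter that is currently running."""
    return {
        "executable": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,          # root directory of the active environment
        # Set by `conda activate`; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `conda_env` does not show `synth-env`, the editor is most likely pointing at a different interpreter than the one where `ydata-synthetic` was installed.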
69+
70+
71+
## Does TimeGAN replicate my full sequence of data?
72+
No. This is an unrealistic expectation because the TimeGAN architecture is not meant to replicate the long-term behavior of your data.
73+
74+
TimeGAN works with the concept of "windows": it learns the data distribution of short-term time frames, within the time windows you provide. It also treats those windows as independent of each other, so it cannot reproduce the long-range temporal patterns most people expect.
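To make the idea of windows concrete, the sketch below cuts a toy series into fixed-length windows. It only illustrates the concept; the helper name `sliding_windows` and the window length are hypothetical, and this is not the preprocessing code of `ydata-synthetic`.

```python
def sliding_windows(series, window_len):
    """Split a sequence into overlapping windows of fixed length (stride 1)."""
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    return [series[i:i + window_len] for i in range(len(series) - window_len + 1)]


series = list(range(10))          # toy series: 0, 1, ..., 9
windows = sliding_windows(series, window_len=4)
print(len(windows))               # 7 windows of 4 time steps each
print(windows[0], windows[-1])    # [0, 1, 2, 3] [6, 7, 8, 9]
```

A model trained on such windows sees each one as a separate sample, so anything that spans more than one window, such as a yearly trend across daily windows, is not something it can learn.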
75+
76+
Long-term replication is not supported by this architecture, but other architectures allow for both short-term and long-term synthetization, such as those available in [YData Fabric](https://ydata.ai/products/synthetic_data).
77+
78+
!!! abstract
79+
80+
Learn more about how YData's Time-Series Synthetic Data Generation compares to TimeGAN in [this dedicated post](https://ydata.ai/resources/the-best-generative-ai-model-for-time-series-synthetic-data-generation).
81+
82+
83+
# Additional Support
84+
Couldn't find what you need? Reach out to our [dedicated team](https://meetings.hubspot.com/fabiana-clemente) for quick and *syn-ple* assistance!

mkdocs.yml

Lines changed: 2 additions & 1 deletion
@@ -10,7 +10,7 @@ nav:
1010
- Overview: 'index.md'
1111
- Installation: 'getting-started/installation.md'
1212
- Quickstart: 'getting-started/quickstart.md'
13-
- Examples:
13+
- Synthetic Data Generation:
1414
- Generate Tabular Data:
1515
- CGAN: "examples/cgan_example.md"
1616
- WGAN: "examples/wgan_example.md"
@@ -21,6 +21,7 @@ nav:
2121
- CWGAN-GP: "examples/cwgangp_example.md"
2222
- Generate Time-Series Data:
2323
- TimeGAN: "examples/timegan_example.md"
24+
- Frequently Asked Questions: "examples/faqs.md"
2425
- Integrations:
2526
- Great Expectations: "integrations/gx_integration.md"
2627
- Support:
