How to Generate New Data

cd _visualize/scripts/
./UPDATE.sh

(Additional script functionality detailed in the ./scripts section below.)

IMPORTANT!
Data fetching scripts require an environment variable GITHUB_API_TOKEN containing a valid GitHub OAuth token or personal access token.
You will also need to install the Python dependencies as listed in requirements.txt.
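
As a minimal sketch of the token requirement above, a script could check the environment variable before making any requests. The helper name `read_github_token` is hypothetical and is not taken from this repository's scripts.

```python
import os


def read_github_token(env=None):
    """Return the token stored in GITHUB_API_TOKEN, or raise a clear error.

    Hypothetical helper for illustration; the real scripts may read the
    variable differently.
    """
    env = os.environ if env is None else env
    token = env.get("GITHUB_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_API_TOKEN is not set; export a GitHub OAuth token or "
            "personal access token before running the data fetching scripts."
        )
    return token
```

Failing early like this gives a readable message instead of an authentication error partway through a data update.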

About the Contents of this Directory

Simple text files containing input lists (e.g. a list of organizations, a list of independent repositories).

The actual queries sent to GitHub's GraphQL API when the data fetching scripts are run. This makes writing/editing queries easier, as it allows them to remain in individual, human-readable text files.
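
To illustrate how a query kept in its own text file might be sent, the sketch below reads a file and builds a POST request for GitHub's GraphQL endpoint. The function name `build_graphql_request` and the file path passed to it are assumptions for illustration; the repository's scripts may structure this differently.

```python
import json
import urllib.request
from pathlib import Path

# GitHub's public GraphQL endpoint.
GRAPHQL_URL = "https://api.github.com/graphql"


def build_graphql_request(query_path, token, variables=None):
    """Build (but do not send) a GraphQL request from a query text file.

    Hypothetical helper: query_path is any plain-text file holding a query.
    """
    query = Path(query_path).read_text(encoding="utf-8")
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the query. Keeping the query text in a separate file means it can be edited without touching the Python code.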

Scripts for data fetching and manipulation. Data is written to visualize/github-data in the appropriate JSON formats.

New files are created for each type of data structure.
For most files, data is overwritten each time the scripts are run.
Other scripts may collect cumulative data with a daily timestamp. If one of these scripts is run multiple times in a single day, the entry for that day will be overwritten.
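
The daily-timestamp behaviour described above can be sketched as a JSON object keyed by date, so that a second run on the same day replaces that day's entry rather than adding a new one. The function name `record_daily_value` and the file layout are assumptions for illustration, not the repository's actual format.

```python
import datetime
import json
from pathlib import Path


def record_daily_value(path, value, today=None):
    """Store value under today's date, replacing any entry for the same day.

    Hypothetical helper: assumes the cumulative file is a JSON object that
    maps ISO dates (YYYY-MM-DD) to values.
    """
    today = today or datetime.date.today()
    path = Path(path)
    # Load the existing history if the file is already there.
    history = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Same key on the same day, so a rerun overwrites instead of appending.
    history[today.isoformat()] = value
    path.write_text(json.dumps(history, indent=2, sort_keys=True), encoding="utf-8")
    return history
```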

Running UPDATE.sh will run all of the necessary scripts in the appropriate order to fetch the latest data. It will also update LAST_FULL_UPDATE.txt to record when this complete data update was last run.

You can also run the script with an argument, UPDATE.sh <TAG>, to select a custom set of scripts defined in UPDATE_<TAG>.txt. (See UPDATE_FULL.txt for the default set of scripts.) In that case, the timestamp is recorded in LAST_<TAG>_UPDATE.txt instead.
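
The naming convention for tagged updates can be sketched as follows. This is only an illustration of how the file names relate to the tag, assuming FULL is the tag used when no argument is given; `update_file_names` is a hypothetical helper, and UPDATE.sh itself is a shell script that may resolve names differently.

```python
def update_file_names(tag="FULL"):
    """Return (script list file, timestamp file) for a given update tag.

    Hypothetical helper mirroring the UPDATE_<TAG>.txt and
    LAST_<TAG>_UPDATE.txt naming pattern.
    """
    return f"UPDATE_{tag}.txt", f"LAST_{tag}_UPDATE.txt"


# Example with a made-up tag "QUICK":
script_list, stamp_file = update_file_names("QUICK")
print(script_list, stamp_file)
```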

The scripts are only for gathering new data. You do not need to run them in order to view the webpage visualizations.