Commit aa754d4

Major data update script and workflow refactoring.
Customizable data updates via text files listing scripts. Renamed MASTER script to UPDATE, reflecting the new versatility. Moved most workflow logic into yml files directly. Divided data script runs for different credentials. Set workflow concurrency groups. Removed unused file-by-year breakdowns.
1 parent c8dbae6 commit aa754d4

17 files changed

Lines changed: 212 additions & 253 deletions

File tree

.github/scripts/cache.sh

Lines changed: 0 additions & 23 deletions
This file was deleted.
Lines changed: 3 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -6,30 +6,15 @@ set -eu
66

77
# From action env:
88
# REPO_DIR
9+
# TAG
910

10-
ACT_LOG_PATH=_visualize/LAST_MASTER_UPDATE.txt
11+
ACT_LOG_PATH=_visualize/LAST_${TAG}_UPDATE.txt
1112
ACT_INPUT_PATH=_visualize
1213
ACT_DATA_PATH=visualize/github-data
13-
ACT_SCRIPT_PATH=_visualize/scripts
14-
15-
### SETUP ###
16-
17-
# Store absolute path
18-
cd $REPO_DIR
19-
REPO_ROOT=$(pwd)
20-
21-
# Store previous END timestamp
22-
OLD_END=$(cat $ACT_LOG_PATH | grep END | cut -f 2)
23-
OLD_END=$(date --date="$OLD_END" "+%s")
24-
25-
### RUN MASTER SCRIPT ###
26-
27-
cd $REPO_ROOT/$ACT_SCRIPT_PATH
28-
./MASTER.sh
2914

3015
### VALIDATE UPDATE ###
3116

32-
cd $REPO_ROOT
17+
cd $REPO_DIR
3318

3419
# Timestamp log changed
3520
cat $ACT_LOG_PATH
@@ -50,17 +35,6 @@ if [ $(cat $ACT_LOG_PATH | grep -c FAILED) -ne "0" ] || [ $(cat $ACT_LOG_PATH |
5035
echo "Timestamp log valid"
5136
fi
5237

53-
# New START is later than previous END
54-
NEW_START=$(cat $ACT_LOG_PATH | grep START | cut -f 2)
55-
NEW_START=$(date --date="$NEW_START" "+%s")
56-
if [ "$OLD_END" -gt "$NEW_START" ]
57-
then
58-
echo "UPDATE FAILED - New START is earlier than previous END"
59-
exit 1
60-
else
61-
echo "START timestamp valid"
62-
fi
63-
6438
# All changes are to valid files only
6539
git diff --name-only HEAD
6640
CHANGE_COUNT=$(git diff --name-only HEAD | grep -c -E ".+")
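
A note on the `LAST_${TAG}_UPDATE.txt` path used for `ACT_LOG_PATH` above: the braces matter. Underscores are valid in shell variable names, so without braces bash would look up a variable called `TAG_UPDATE` instead of `TAG`. A minimal sketch of the difference follows; the `CORE` value is one of the tags used by the workflows in this commit, and `set +u` is there only so the demo prints a result instead of aborting on the unset name.

```bash
#!/usr/bin/env bash
# Illustrative only: why ${TAG} needs braces when followed by "_UPDATE".
set +u            # nounset off so the unbraced form expands to empty
TAG=CORE
unset TAG_UPDATE  # make sure the accidental name is really undefined

# Parsed as the variable TAG_UPDATE, which is unset, so the tag vanishes.
unbraced="LAST_$TAG_UPDATE.txt"

# Braces end the variable name at TAG, so the tag is kept.
braced="LAST_${TAG}_UPDATE.txt"

echo "unbraced: $unbraced"
echo "braced:   $braced"
```

Under `set -u`, as used in the scripts here, the unbraced form would stop the script with an "unbound variable" error rather than silently producing the wrong file name.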

.github/workflows/cache.yml

Lines changed: 28 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -3,12 +3,18 @@ name: Routine Data Cache Request
33
on:
44
workflow_dispatch:
55
schedule:
6-
- cron: "45 8 * * *"
6+
- cron: '45 8 * * *'
7+
8+
concurrency:
9+
group: data-cache
710

811
defaults:
912
run:
1013
shell: bash
1114

15+
env:
16+
TAG: CACHE
17+
1218
jobs:
1319
runDataUpdate:
1420
name: Run Cache Request
@@ -20,35 +26,44 @@ jobs:
2026
- name: Store timestamp
2127
run: |
2228
echo "TIMESTAMP=$(date -u +"%F-%H")" >> "$GITHUB_ENV"
29+
2330
- name: Checkout
2431
uses: actions/checkout@v6
2532
with:
2633
path: ${{ env.REPO_DIR }}
2734
token: ${{ secrets.GITHUB_TOKEN }}
35+
2836
- name: Setup python
2937
uses: actions/setup-python@v6
3038
with:
31-
python-version: "3.11"
32-
cache: "pip"
33-
cache-dependency-path: "${{ env.REPO_DIR }}/_visualize/scripts/requirements.txt"
39+
python-version: '3.11'
40+
cache: 'pip'
41+
cache-dependency-path: '${{ env.REPO_DIR }}/_visualize/scripts/requirements.txt'
42+
3443
- name: Install dependencies
35-
run: pip install -r ${{ env.REPO_DIR }}/_visualize/scripts/requirements.txt
36-
- name: Run cache script
37-
run: ./${{ env.REPO_DIR }}/.github/scripts/cache.sh
44+
run: pip install -r $REPO_DIR/_visualize/scripts/requirements.txt
45+
46+
- name: Run data collection script
47+
run: |
48+
set -eu
49+
cd $REPO_DIR/_visualize/scripts
50+
./UPDATE.sh $TAG
3851
env:
3952
GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
53+
4054
- name: Show health stats
4155
if: ${{ always() }}
4256
run: |
43-
cat ${{ env.REPO_DIR }}/_visualize/LAST_CACHE_REQUEST.txt || true
44-
echo "Warning Count: $(grep -c 'Warning' ${{ env.REPO_DIR }}/_visualize/LAST_CACHE_REQUEST.log)"
45-
echo "From Timeouts: $(grep -c 'but failed' ${{ env.REPO_DIR }}/_visualize/LAST_CACHE_REQUEST.log)"
46-
echo "Limit Reached: $(grep -c 'rate limit exceeded' ${{ env.REPO_DIR }}/_visualize/LAST_CACHE_REQUEST.log)"
57+
cat $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.txt || true
58+
echo "Warning Count: $(grep -c 'Warning' $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.log)"
59+
echo "From Timeouts: $(grep -c 'but failed' $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.log)"
60+
echo "Limit Reached: $(grep -c 'rate limit exceeded' $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.log)"
61+
4762
- name: Save log files
4863
if: ${{ always() }}
4964
uses: actions/upload-artifact@v6
5065
with:
5166
name: logfiles_${{ env.TIMESTAMP }}_cache
5267
path: |
53-
${{ env.REPO_DIR }}/_visualize/LAST_CACHE_REQUEST.txt
54-
${{ env.REPO_DIR }}/_visualize/LAST_CACHE_REQUEST.log
68+
${{ env.REPO_DIR }}/_visualize/LAST_${{ env.TAG }}_UPDATE.txt
69+
${{ env.REPO_DIR }}/_visualize/LAST_${{ env.TAG }}_UPDATE.log

.github/workflows/update.yml

Lines changed: 64 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,11 @@ name: Routine Data Update
33
on:
44
workflow_dispatch:
55
schedule:
6-
- cron: "45 10 * * *"
6+
- cron: '45 10 * * *'
7+
8+
concurrency:
9+
group: data-updates
10+
cancel-in-progress: true
711

812
defaults:
913
run:
@@ -20,66 +24,110 @@ jobs:
2024
- name: Store timestamp
2125
run: |
2226
echo "TIMESTAMP=$(date -u +"%F-%H")" >> "$GITHUB_ENV"
27+
2328
- name: Checkout
2429
uses: actions/checkout@v6
2530
with:
2631
path: ${{ env.REPO_DIR }}
2732
token: ${{ secrets.GITHUB_TOKEN }}
2833
persist-credentials: false
34+
2935
- name: Setup python
3036
uses: actions/setup-python@v6
3137
with:
32-
python-version: "3.11"
33-
cache: "pip"
34-
cache-dependency-path: "${{ env.REPO_DIR }}/_visualize/scripts/requirements.txt"
38+
python-version: '3.11'
39+
cache: 'pip'
40+
cache-dependency-path: '${{ env.REPO_DIR }}/_visualize/scripts/requirements.txt'
41+
3542
- name: Install dependencies
36-
run: pip install -r ${{ env.REPO_DIR }}/_visualize/scripts/requirements.txt
37-
- name: Run update script
43+
run: pip install -r $REPO_DIR/_visualize/scripts/requirements.txt
44+
45+
- name: Create GitHub App Installation Token1
46+
uses: actions/create-github-app-token@v2
47+
id: app-token1
48+
with:
49+
app-id: ${{ vars.APP_ID }}
50+
private-key: ${{ secrets.PRIVATE_KEY }}
51+
52+
- name: Run data collection script with App Installation Token
53+
run: |
54+
set -eu
55+
cd $REPO_DIR/_visualize/scripts
56+
./UPDATE.sh $TAG
57+
env:
58+
GITHUB_API_TOKEN: ${{ steps.app-token1.outputs.token }}
59+
TAG: MEMBERS
60+
61+
- name: Validate members data updates
62+
run: ./$REPO_DIR/.github/scripts/validate.sh
63+
env:
64+
TAG: MEMBERS
65+
66+
- name: Run data collection script with Action Token
67+
run: |
68+
set -eu
69+
cd $REPO_DIR/_visualize/scripts
70+
./UPDATE.sh $TAG
3871
env:
3972
GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
40-
run: ./${{ env.REPO_DIR }}/.github/scripts/update.sh
73+
TAG: CORE
74+
75+
- name: Validate core data updates
76+
run: ./$REPO_DIR/.github/scripts/validate.sh
77+
env:
78+
TAG: CORE
79+
4180
- name: Create GitHub App Installation Token
4281
uses: actions/create-github-app-token@v2
4382
id: app-token
4483
with:
4584
app-id: ${{ vars.APP_ID }}
4685
private-key: ${{ secrets.PRIVATE_KEY }}
86+
4787
- name: Get GitHub App User ID
4888
id: get-user-id
4989
env:
5090
GH_TOKEN: ${{ steps.app-token.outputs.token }}
5191
run: echo "user-id=$(gh api "/users/${{ steps.app-token.outputs.app-slug }}[bot]" --jq .id)" >> "$GITHUB_OUTPUT"
92+
5293
- name: Configure git
5394
env:
5495
GH_TOKEN: ${{ steps.app-token.outputs.token }}
5596
run: |
5697
gh auth setup-git
5798
git config --global user.name '${{ steps.app-token.outputs.app-slug }}[bot]'
5899
git config --global user.email '${{ steps.get-user-id.outputs.user-id }}+${{ steps.app-token.outputs.app-slug }}[bot]@users.noreply.github.com'
100+
59101
- name: Commit updated data
60102
env:
61103
GH_TOKEN: ${{ steps.app-token.outputs.token }}
62104
run: |
63-
pushd ${{ env.REPO_DIR }}
105+
set -eu
106+
cd $REPO_DIR
64107
git stash
65108
git pull --ff-only
66109
git stash pop
67110
git add -A .
68-
git commit -m "${{ env.TIMESTAMP }} Data Update by ${{ steps.app-token.outputs.app-slug }}"
111+
git commit -m "$TIMESTAMP Data Update by ${{ steps.app-token.outputs.app-slug }}"
69112
git push
70-
popd
113+
71114
- name: Show health stats
72115
if: ${{ always() }}
73116
run: |
74-
cat ${{ env.REPO_DIR }}/_visualize/LAST_MASTER_UPDATE.txt || true
75-
echo "Warning Count: $(grep -c 'Warning' ${{ env.REPO_DIR }}/_visualize/LAST_MASTER_UPDATE.log)"
76-
echo "From Timeouts: $(grep -c 'but failed' ${{ env.REPO_DIR }}/_visualize/LAST_MASTER_UPDATE.log)"
77-
echo "Limit Reached: $(grep -c 'rate limit exceeded' ${{ env.REPO_DIR }}/_visualize/LAST_MASTER_UPDATE.log)"
117+
for TAG in MEMBERS CORE; do
118+
cat $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.txt || true
119+
echo "Warning Count: $(grep -c 'Warning' $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.log)"
120+
echo "From Timeouts: $(grep -c 'but failed' $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.log)"
121+
echo "Limit Reached: $(grep -c 'rate limit exceeded' $REPO_DIR/_visualize/LAST_${TAG}_UPDATE.log)"
122+
done
123+
78124
- name: Save log files
79125
if: ${{ always() }}
80126
uses: actions/upload-artifact@v6
81127
with:
82128
name: logfiles_${{ env.TIMESTAMP }}_update
83129
path: |
84-
${{ env.REPO_DIR }}/_visualize/LAST_MASTER_UPDATE.txt
85-
${{ env.REPO_DIR }}/_visualize/LAST_MASTER_UPDATE.log
130+
${{ env.REPO_DIR }}/_visualize/LAST_MEMBERS_UPDATE.txt
131+
${{ env.REPO_DIR }}/_visualize/LAST_MEMBERS_UPDATE.log
132+
${{ env.REPO_DIR }}/_visualize/LAST_CORE_UPDATE.txt
133+
${{ env.REPO_DIR }}/_visualize/LAST_CORE_UPDATE.log
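
The per-tag health stats loop in this workflow could also be factored into a small helper that tolerates missing log files, for example when an earlier step failed before a log was written. This is a sketch under assumptions, not code from the repository: `count_matches` and `print_health_stats` are hypothetical names, and the log file naming simply follows the `LAST_<TAG>_UPDATE.log` pattern used above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of the repository.

# count_matches PATTERN FILE: print the number of lines in FILE that contain
# PATTERN, or 0 if FILE does not exist. grep -c exits non-zero when the count
# is 0, so that status is ignored on purpose.
count_matches() {
  local pattern="$1" file="$2"
  if [ -f "$file" ]; then
    grep -c -- "$pattern" "$file" || true
  else
    echo 0
  fi
}

# print_health_stats DIR TAG...: print the three counters for each tag.
print_health_stats() {
  local dir="$1"
  shift
  local tag log
  for tag in "$@"; do
    log="$dir/LAST_${tag}_UPDATE.log"
    echo "== $tag =="
    echo "Warning Count: $(count_matches 'Warning' "$log")"
    echo "From Timeouts: $(count_matches 'but failed' "$log")"
    echo "Limit Reached: $(count_matches 'rate limit exceeded' "$log")"
  done
}
```

A call such as `print_health_stats "$REPO_DIR/_visualize" MEMBERS CORE` would then cover both tags in one step.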

.gitignore

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,6 @@ _site
22
Gemfile.lock
33
*.pyc
44
_visualize/*.log
5-
_visualize/LAST_CACHE_REQUEST.txt
65
.DS_Store
76
.vscode/
87
.bundle

_visualize/README.md

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
```bash
44
cd _visualize/scripts/
5-
./MASTER.sh
5+
./UPDATE.sh
66
```
77

88
_(Additional script functionality detailed in the [`./scripts` section below][jump2 scripts].)_
@@ -29,7 +29,11 @@ New files are created for each type of data structure.
2929
For most files, data is overwritten each time the scripts are run.
3030
Other scripts may collect cumulative data with a daily timestamp. If one of these scripts is run multiple times in a single day, the entry for that day will be overwritten.
3131

32-
Running [`MASTER.sh`][mastersh] will run all of the necessary scripts in the appropriate order to fetch the latest data. It will also update [`LAST_MASTER_UPDATE.txt`][lastmasterup] to record when this complete data update was last run.
32+
Running [`UPDATE.sh`][updatesh] will run all of the necessary scripts in the appropriate order to fetch the latest data. It will also update `LAST_FULL_UPDATE.txt` to record when this complete data update was last run.
33+
34+
You can also run the script with an argument `UPDATE.sh <TAG>` to select a custom set of scripts defined in `UPDATE_<TAG>.txt`.
35+
(See [`UPDATE_FULL.txt`][updatefull] for the default set of scripts.)
36+
In that case, timestamps are recorded in `LAST_<TAG>_UPDATE.txt` instead.
3337

3438
The scripts are only for gathering new data. You do not need them to run in order to view the webpage visualizations.
3539

@@ -39,8 +43,8 @@ The scripts are only for gathering new data. You do not need them to run in orde
3943
[queries dir]: queries
4044
[scripts dir]: scripts
4145
[requires]: scripts/requirements.txt
42-
[mastersh]: scripts/MASTER.sh
43-
[lastmasterup]: LAST_MASTER_UPDATE.txt
46+
[updatesh]: scripts/UPDATE.sh
47+
[updatefull]: scripts/UPDATE_FULL.txt
4448
[gitgraphql]: https://developer.github.com/v4/
4549
[oauth]: https://github.com/settings/developers
4650
[personaltoken]: https://github.com/settings/tokens
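
The README change above describes selecting a set of scripts through `UPDATE_<TAG>.txt` and recording timestamps in `LAST_<TAG>_UPDATE.txt`. The diff for `UPDATE.sh` itself is not shown here, so the following is only a rough sketch of how such a tag-driven runner might work. The function names, the comment and blank-line handling, the tab-separated `START`/`END`/`FAILED` log lines, and keeping the log next to the list file are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a tag-driven runner; not the repository's UPDATE.sh.

# list_scripts TAG [DIR]: print the entries of DIR/UPDATE_<TAG>.txt,
# skipping blank lines and lines starting with "#" (an assumed convention).
list_scripts() {
  local tag="${1:-FULL}" dir="${2:-.}"
  local list_file="$dir/UPDATE_${tag}.txt"
  if [ ! -f "$list_file" ]; then
    echo "No script list found: $list_file" >&2
    return 1
  fi
  grep -v -E '^[[:space:]]*(#|$)' "$list_file" || true
}

# run_tagged_update TAG [DIR]: run each listed script in order and write
# tab-separated START / END (or FAILED) lines to DIR/LAST_<TAG>_UPDATE.txt.
run_tagged_update() {
  local tag="${1:-FULL}" dir="${2:-.}"
  local log_file="$dir/LAST_${tag}_UPDATE.txt"
  local scripts script
  scripts=$(list_scripts "$tag" "$dir") || return 1
  printf 'START\t%s\n' "$(date -u)" > "$log_file"
  while IFS= read -r script; do
    [ -n "$script" ] || continue
    # Run from DIR; detach stdin so the script cannot consume the list.
    if ! ( cd "$dir" && "./$script" ) < /dev/null; then
      printf 'FAILED\t%s\n' "$script" >> "$log_file"
      return 1
    fi
  done <<< "$scripts"
  printf 'END\t%s\n' "$(date -u)" >> "$log_file"
}
```

With this shape, `run_tagged_update` with no argument would fall back to the `FULL` list, while `run_tagged_update CORE` would read `UPDATE_CORE.txt` and write `LAST_CORE_UPDATE.txt`, matching the naming described in the README.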

_visualize/scripts/CACHE.sh

Lines changed: 0 additions & 50 deletions
This file was deleted.
