Commit 324d22b

max-ostapenko, dependabot[bot], ksakae1216, github-actions[bot], and tunetheweb authored
Privacy 2024 queries (#3653)
* readme * copied 2022 SQLs over to update/review * fixed link * origin trials * Bump puppeteer from 22.7.1 to 22.8.0 in /src (#3655) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.7.1 to 22.8.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.7.1...puppeteer-v22.8.0) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * notebook + readme (#3652) * Bump pytest from 8.1.1 to 8.2.0 in /src (#3651) Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.1.1 to 8.2.0. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@8.1.1...8.2.0) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Translation of privacy chapter to Japanese (#3654) * Update Timestamps (#3657) Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> * 2023 Performance (#3525) * cp 2022->2023 * 2023ify * 2023/perf * lint * lint * fix initiator * null initiators * Bump puppeteer from 22.8.0 to 22.9.0 in /src (#3662) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.8.0 to 22.9.0. 
- [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.8.0...puppeteer-v22.9.0) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Upgrade to web-vitals v4 (#3661) * Upgrade to web-vitals v4 * Update src/static/js/send-web-vitals.js Co-authored-by: Barry Pollard <barrypollard@google.com> --------- Co-authored-by: Barry Pollard <barrypollard@google.com> * Bump pytest from 8.2.0 to 8.2.1 in /src (#3664) Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.2.0 to 8.2.1. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@8.2.0...8.2.1) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * --- (#3665) updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump puppeteer from 22.9.0 to 22.10.0 in /src (#3668) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.9.0 to 22.10.0. 
- [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.9.0...puppeteer-v22.10.0) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump jsdom from 24.0.0 to 24.1.0 in /src (#3669) Bumps [jsdom](https://github.com/jsdom/jsdom) from 24.0.0 to 24.1.0. - [Release notes](https://github.com/jsdom/jsdom/releases) - [Changelog](https://github.com/jsdom/jsdom/blob/main/Changelog.md) - [Commits](jsdom/jsdom@24.0.0...24.1.0) --- updated-dependencies: - dependency-name: jsdom dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Typofix (#3670) Seems like "desktop" is mentioned twice and according to the data, the second mention is related to mobile https://docs.google.com/spreadsheets/d/1JvJMiRsL6T9m_NEBHFh-rrQmU5a-ufdOKriSJbrEN8M/edit#gid=1472139207 * SQL and MD folders the 2024 Web Almanac (#3666) * upload 2024 * change mds * Test update * Revert test update * Fix line endings --------- Co-authored-by: Barry Pollard <barrypollard@google.com> * Bump prettier from 3.2.5 to 3.3.0 in /src (#3672) Bumps [prettier](https://github.com/prettier/prettier) from 3.2.5 to 3.3.0. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.2.5...3.3.0) --- updated-dependencies: - dependency-name: prettier dependency-type: direct:development update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump pytest from 8.2.1 to 8.2.2 in /src (#3673) Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.2.1 to 8.2.2. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@8.2.1...8.2.2) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump prettier from 3.3.0 to 3.3.1 in /src (#3674) Bumps [prettier](https://github.com/prettier/prettier) from 3.3.0 to 3.3.1. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.3.0...3.3.1) --- updated-dependencies: - dependency-name: prettier dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix loaf monitoring bug (#3675) * Fix LoAF monitoring bug * Add semi colon * Update Timestamps (#3677) Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> * Bump web-vitals from 4.0.1 to 4.1.0 in /src (#3678) Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.0.1 to 4.1.0. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.0.1...v4.1.0) --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fixed link * remove unreviewed sql * lint test * lint * ads supply graph * lint * close file * lint * top_direct_sellers * ads_txt_lines_histogram * ads_txt_seller_accounts_by_type * top_ads_variables * format * tcf2 * rename * lint * using custom_metrics * most_common_cname_domains * adguard list * gpc * referrer policy * usp * iab frameworks * lint * bounce trackers * Added privacy sandbox related queries * lint * missed lint * dnt * client hints * whotracksme update * lint * referrer policy * rank filter removed * trackers * util deps * limits * Privacy 2024 queries - CCPA, fingerprinting, cookies (#3720) * CCPA metrics * fingerprinting metrics * cookie metrics * lint * bq to sheets updates * query optimisation * downgrade for python 3.8 * more categories * more categories and columns reordered * forms and formatted logs * Refactoring queries to produce output for queries only * lint * lint * Privacy Sql Tracking Detection Using Easylist Adservers (#3730) * Add GA4 fields to match documentation (#3679) * Add standard GA4 web-vital fields * Add value * Update Timestamps (#3680) Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> * Bump web-vitals from 4.1.0 to 4.1.1 in /src (#3681) Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.1.0 to 4.1.1. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.1.0...v4.1.1) --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump puppeteer from 22.10.0 to 22.10.1 in /src (#3682) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.10.0 to 22.10.1. 
- [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.10.0...puppeteer-v22.10.1) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump prettier from 3.3.1 to 3.3.2 in /src (#3683) Bumps [prettier](https://github.com/prettier/prettier) from 3.3.1 to 3.3.2. - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.3.1...3.3.2) --- updated-dependencies: - dependency-name: prettier dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump puppeteer from 22.10.1 to 22.11.0 in /src (#3684) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.10.1 to 22.11.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.10.1...puppeteer-v22.11.0) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Translation of security chapter to Japanese (#3685) * Bump puppeteer from 22.11.0 to 22.11.2 in /src (#3688) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.11.0 to 22.11.2. 
- [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.11.0...puppeteer-v22.11.2) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump web-vitals from 4.1.1 to 4.2.0 in /src (#3690) Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.1.1 to 4.2.0. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.1.1...v4.2.0) --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump puppeteer from 22.11.2 to 22.12.0 in /src (#3689) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.11.2 to 22.12.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.11.2...puppeteer-v22.12.0) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update Timestamps (#3691) Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> * Remove deploy.zip step of deployment (#3692) * Remove deploy.zip * Remove from ignore files * Bump puppeteer from 22.12.0 to 22.12.1 in /src (#3694) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.12.0 to 22.12.1. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.12.0...puppeteer-v22.12.1) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump treosh/lighthouse-ci-action from 11.4.0 to 12.1.0 (#3693) * Bump treosh/lighthouse-ci-action from 11.4.0 to 12.1.0 Bumps [treosh/lighthouse-ci-action](https://github.com/treosh/lighthouse-ci-action) from 11.4.0 to 12.1.0. - [Release notes](https://github.com/treosh/lighthouse-ci-action/releases) - [Commits](treosh/lighthouse-ci-action@11.4.0...12.1.0) --- updated-dependencies: - dependency-name: treosh/lighthouse-ci-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * Upgrade to Node 20 --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Barry Pollard <barrypollard@google.com> * Bump web-vitals from 4.2.0 to 4.2.1 in /src (#3695) Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.0 to 4.2.1. 
- [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.2.0...v4.2.1) --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/setup-python from 5.1.0 to 5.1.1 (#3699) Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5.1.0 to 5.1.1. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5.1.0...v5.1.1) --- updated-dependencies: - dependency-name: actions/setup-python dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump puppeteer from 22.12.1 to 22.13.0 in /src (#3698) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.12.1 to 22.13.0. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.12.1...puppeteer-v22.13.0) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Translation of mobile-web chapter to Japanese (#3700) * Bump puppeteer from 22.13.0 to 22.15.0 in /src (#3711) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.13.0 to 22.15.0. 
- [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.13.0...puppeteer-v22.15.0) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump jsdom from 24.1.0 to 24.1.1 in /src (#3707) Bumps [jsdom](https://github.com/jsdom/jsdom) from 24.1.0 to 24.1.1. - [Release notes](https://github.com/jsdom/jsdom/releases) - [Changelog](https://github.com/jsdom/jsdom/blob/main/Changelog.md) - [Commits](jsdom/jsdom@24.1.0...24.1.1) --- updated-dependencies: - dependency-name: jsdom dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump web-vitals from 4.2.1 to 4.2.2 in /src (#3706) Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.1 to 4.2.2. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.2.1...v4.2.2) --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump prettier from 3.3.2 to 3.3.3 in /src (#3702) Bumps [prettier](https://github.com/prettier/prettier) from 3.3.2 to 3.3.3. 
- [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.3.2...3.3.3) --- updated-dependencies: - dependency-name: prettier dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump web-vitals from 4.2.2 to 4.2.3 in /src (#3715) Bumps [web-vitals](https://github.com/GoogleChrome/web-vitals) from 4.2.2 to 4.2.3. - [Changelog](https://github.com/GoogleChrome/web-vitals/blob/main/CHANGELOG.md) - [Commits](GoogleChrome/web-vitals@v4.2.2...v4.2.3) --- updated-dependencies: - dependency-name: web-vitals dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update Timestamps (#3716) Co-authored-by: rviscomi <1120896+rviscomi@users.noreply.github.com> * tracking detection using easylist adservers * easylist_adserver tracking detection and query * 2022 cdn portuguese (#3725) * add file to translation * done translation cdn.md Makes progress on #505 * Bump puppeteer from 22.15.0 to 23.0.2 in /src (#3719) Bumps [puppeteer](https://github.com/puppeteer/puppeteer) from 22.15.0 to 23.0.2. - [Release notes](https://github.com/puppeteer/puppeteer/releases) - [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json) - [Commits](puppeteer/puppeteer@puppeteer-v22.15.0...puppeteer-v23.0.2) --- updated-dependencies: - dependency-name: puppeteer dependency-type: direct:development update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update Timestamps (#3726) Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> * Replace `<object>` with `<iframe>` for embedded SVG (#3727) * Replace object with iframe for embedded SVG * Translations * auto upload easylist data to table * Fix the build to ignore 2024 chapters (for now) (#3728) * Fix the build to ignore 2024 chapters (for now) * Remove test line * Update Timestamps (#3729) Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> * liniting * liniting * linting * linting * linting * linting * fixes of Simplified Chinese translation for 2020 Performance (#3734) --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Barry Pollard <barrypollard@google.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Sakae Kotaro <ksakae1216@gmail.com> Co-authored-by: rviscomi <1120896+rviscomi@users.noreply.github.com> Co-authored-by: Hadi Amjad <hadiamjad@Hadis-MacBook-Air.local> Co-authored-by: William Constantinov <33907565+HakaCode@users.noreply.github.com> Co-authored-by: Zuckjet <zuckjet@gmail.com> Co-authored-by: Max Ostapenko <1611259+max-ostapenko@users.noreply.github.com> * log query errors * Fixed privacy sandbox attestation query bug * maximum_bytes_billed parameter * moved to chapter root * postpone dryrun check * fingerprinting_most_common_apis: improve resilience to malformed JSON (#3737) * optional maximum_bytes_billed parameter * formatting * queries and notebook updates * queries to rerun * origin trials function fix * optimised sellers count * apps included in ads.txt lines * another rerun * lint * no origins * optimized perf * more 
optimized perf * graph optimization and OT expiration * earlier grouping for performance * graph fixes * cookies, ccpa, fingerprinting: calculate percent of total pages * query for top third-party cookie names * bq writer module * add grouping * domain suffixes and regexes removed * add comments * review * add PR link * lint * remove mobile filter * lint * lint * disable import-error rule * adguard not used * linting * pages_pct in query * lint --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Sakae Kotaro <ksakae1216@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: tunetheweb <10931297+tunetheweb@users.noreply.github.com> Co-authored-by: Rick Viscomi <rviscomi@users.noreply.github.com> Co-authored-by: Barry Pollard <barrypollard@google.com> Co-authored-by: Boris Schapira <borisschapira@gmail.com> Co-authored-by: ChrisBeeti <32492572+ChrisBeeti@users.noreply.github.com> Co-authored-by: Yash Vekaria <yvekaria.09@gmail.com> Co-authored-by: Ben Standaert <71239179+bstandaert-wustl@users.noreply.github.com> Co-authored-by: Hadi Amjad <46374292+hadiamjad@users.noreply.github.com> Co-authored-by: rviscomi <1120896+rviscomi@users.noreply.github.com> Co-authored-by: Hadi Amjad <hadiamjad@Hadis-MacBook-Air.local> Co-authored-by: William Constantinov <33907565+HakaCode@users.noreply.github.com> Co-authored-by: Zuckjet <zuckjet@gmail.com> Co-authored-by: bstandaert-wustl <b.g.standaert@wustl.edu>
1 parent 6375165 commit 324d22b

45 files changed

Lines changed: 2273 additions & 215 deletions

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
WITH publishers AS (
  SELECT
    page,
    JSON_QUERY(custom_metrics, '$.ads.ads.account_types') AS ads_account_types,
    JSON_QUERY(custom_metrics, '$.ads.app_ads.account_types') AS app_ads_account_types
  FROM `httparchive.all.pages`
  WHERE date = '2024-06-01' AND
    is_root_page = TRUE AND
    (CAST(JSON_VALUE(custom_metrics, '$.ads.ads.account_count') AS INT64) > 0 OR
      CAST(JSON_VALUE(custom_metrics, '$.ads.app_ads.account_count') AS INT64) > 0)
), ads_accounts AS (
  SELECT
    page,
    CEIL(CAST(JSON_VALUE(ads_account_types, '$.direct.account_count') AS INT64) / 100) * 100 AS direct_account_count_bucket,
    CEIL(CAST(JSON_VALUE(ads_account_types, '$.reseller.account_count') AS INT64) / 100) * 100 AS reseller_account_count_bucket,
    COUNT(DISTINCT page) OVER () AS total_pages
  FROM publishers
), app_ads_accounts AS (
  SELECT
    page,
    CEIL(CAST(JSON_VALUE(app_ads_account_types, '$.direct.account_count') AS INT64) / 100) * 100 AS direct_account_count_bucket,
    CEIL(CAST(JSON_VALUE(app_ads_account_types, '$.reseller.account_count') AS INT64) / 100) * 100 AS reseller_account_count_bucket,
    COUNT(DISTINCT page) OVER () AS total_pages
  FROM publishers
)

SELECT
  'ads' AS source,
  'direct' AS account_type,
  direct_account_count_bucket AS account_count_bucket,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages) AS pct_pages,
  COUNT(DISTINCT page) AS number_of_pages
FROM ads_accounts
GROUP BY source, direct_account_count_bucket
UNION ALL
SELECT
  'ads' AS source,
  'reseller' AS account_type,
  reseller_account_count_bucket AS account_count_bucket,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages) AS pct_pages,
  COUNT(DISTINCT page) AS number_of_pages
FROM ads_accounts
GROUP BY source, reseller_account_count_bucket
UNION ALL
SELECT
  'app_ads' AS source,
  'direct' AS account_type,
  direct_account_count_bucket AS account_count_bucket,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages) AS pct_pages,
  COUNT(DISTINCT page) AS number_of_pages
FROM app_ads_accounts
GROUP BY source, direct_account_count_bucket
UNION ALL
SELECT
  'app_ads' AS source,
  'reseller' AS account_type,
  reseller_account_count_bucket AS account_count_bucket,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages) AS pct_pages,
  COUNT(DISTINCT page) AS number_of_pages
FROM app_ads_accounts
GROUP BY source, reseller_account_count_bucket

ORDER BY account_count_bucket ASC
LIMIT 1000
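The bucketing in the query above rounds each account count up to the next multiple of 100 and then reports, per bucket, the number of pages and their share of the total. A minimal Python sketch of that arithmetic follows. The function names and the sample counts are made up for illustration, and the denominator here is simply the number of sample values, which simplifies the query's `total_pages` window count.

```python
import math
from collections import Counter


def bucket(count, width=100):
    """Round a count up to the next multiple of `width`, like CEIL(count / 100) * 100."""
    return math.ceil(count / width) * width


def histogram(counts, width=100):
    """Map each bucket to (number_of_pages, pct_pages) for a list of per-page counts."""
    total = len(counts)
    pages_per_bucket = Counter(bucket(c, width) for c in counts)
    return {b: (n, n / total) for b, n in sorted(pages_per_bucket.items())}


# Hypothetical per-page account counts, not real HTTP Archive data.
sample_counts = [1, 100, 101, 250, 300]
result = histogram(sample_counts)
print(result)
```

Because the rounding is upward, a count of exactly 100 stays in the 100 bucket while 101 moves to 200, and only a count of 0 falls in bucket 0.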
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
WITH RECURSIVE pages AS (
  SELECT
    CASE page -- Publisher websites may redirect to an SSP domain, and need to use redirected domain instead of page domain. CASE needs to be replaced with a more robust solution from HTTPArchive/custom-metrics#136.
      WHEN 'https://www.chunkbase.com/' THEN 'cafemedia.com'
      ELSE NET.REG_DOMAIN(page)
    END AS page_domain,
    JSON_QUERY(ANY_VALUE(custom_metrics), '$.ads') AS ads_metrics
  FROM `httparchive.all.pages`
  WHERE date = '2024-06-01' AND
    is_root_page = TRUE
  GROUP BY page_domain
), ads AS (
  SELECT
    page_domain,
    JSON_QUERY(ads_metrics, '$.ads.account_types') AS ad_accounts
  FROM pages
  WHERE
    CAST(JSON_VALUE(ads_metrics, '$.ads.account_count') AS INT64) > 0
), sellers AS (
  SELECT
    page_domain,
    JSON_QUERY(ads_metrics, '$.sellers.seller_types') AS ad_sellers
  FROM pages
  WHERE
    CAST(JSON_VALUE(ads_metrics, '$.sellers.seller_count') AS INT64) > 0
), relationships_web AS (
  SELECT
    NET.REG_DOMAIN(REGEXP_EXTRACT(NORMALIZE_AND_CASEFOLD(domain), r'\b[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b')) AS demand,
    'Web' AS supply,
    'direct' AS relationship,
    page_domain AS publisher
  FROM ads, UNNEST(JSON_VALUE_ARRAY(ad_accounts, '$.direct.domains')) AS domain
  UNION ALL
  SELECT
    NET.REG_DOMAIN(REGEXP_EXTRACT(NORMALIZE_AND_CASEFOLD(domain), r'\b[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b')) AS demand,
    'Web' AS supply,
    'indirect' AS relationship,
    page_domain AS publisher
  FROM ads, UNNEST(JSON_VALUE_ARRAY(ad_accounts, '$.reseller.domains')) AS domain
  UNION ALL
  SELECT
    page_domain AS demand,
    'Web' AS supply,
    'direct' AS relationship,
    NET.REG_DOMAIN(REGEXP_EXTRACT(NORMALIZE_AND_CASEFOLD(domain), r'\b[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b')) AS publisher
  FROM sellers, UNNEST(JSON_VALUE_ARRAY(ad_sellers, '$.publisher.domains')) AS domain
  UNION ALL
  SELECT
    page_domain AS demand,
    'Web' AS supply,
    'direct' AS relationship,
    NET.REG_DOMAIN(REGEXP_EXTRACT(NORMALIZE_AND_CASEFOLD(domain), r'\b[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b')) AS publisher
  FROM sellers, UNNEST(JSON_VALUE_ARRAY(ad_sellers, '$.both.domains')) AS domain
), relationships_adtech AS (
  SELECT
    page_domain AS demand,
    NET.REG_DOMAIN(REGEXP_EXTRACT(NORMALIZE_AND_CASEFOLD(domain), r'\b[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b')) AS supply,
    'indirect' AS relationship
  FROM sellers, UNNEST(JSON_VALUE_ARRAY(ad_sellers, '$.intermediary.domains')) AS domain
  UNION ALL
  SELECT
    page_domain AS demand,
    NET.REG_DOMAIN(REGEXP_EXTRACT(NORMALIZE_AND_CASEFOLD(domain), r'\b[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b')) AS supply,
    'indirect' AS relationship
  FROM sellers, UNNEST(JSON_VALUE_ARRAY(ad_sellers, '$.both.domains')) AS domain
), nodes AS (
  (
    SELECT
      demand,
      supply,
      CONCAT(demand, '-', supply) AS path,
      relationship,
      HLL_COUNT.INIT(publisher) AS supply_sketch
    FROM relationships_web
    GROUP BY demand, supply, relationship
  )
  UNION ALL
  (
    SELECT
      relationships_grouped.demand AS demand,
      relationships_grouped.supply AS supply,
      CONCAT(relationships_grouped.demand, '-', nodes.path) AS path,
      relationships_grouped.relationship AS relationship,
      nodes.supply_sketch AS supply_sketch
    FROM (
      SELECT
        demand,
        supply,
        relationship
      FROM relationships_adtech
      GROUP BY
        demand,
        supply,
        relationship
    ) AS relationships_grouped
    INNER JOIN nodes
    ON relationships_grouped.supply = nodes.demand AND
      nodes.supply_sketch IS NOT NULL AND
      nodes.relationship = 'indirect' AND
      relationships_grouped.demand IS NOT NULL AND
      STRPOS(nodes.path, relationships_grouped.demand) = 0
  )
)

SELECT
  supply,
  demand,
  HLL_COUNT.MERGE(supply_sketch) AS publishers_count,
  relationship,
  path
FROM nodes
GROUP BY demand, supply, relationship, path
ORDER BY publishers_count DESC
LIMIT 5000
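The recursive `nodes` CTE above starts from demand-to-Web edges and repeatedly prepends ad-tech intermediaries to each indirect path, skipping any domain already present in the path so that cycles terminate. The Python sketch below mirrors that expansion under stated assumptions: the function name, the `.example` domains, and the tuple shapes are hypothetical, exact Python sets stand in for the approximate HLL sketches, and the sample contains no missing domains.

```python
def expand_paths(web_edges, adtech_edges):
    """Expand supply paths.

    web_edges: iterable of (demand, relationship, publisher) tuples.
    adtech_edges: iterable of (demand, supply) tuples, all treated as 'indirect'.
    """
    # Base case: group publishers by (demand, relationship); supply is the literal 'Web'.
    publishers_by_node = {}
    for demand, relationship, publisher in web_edges:
        publishers_by_node.setdefault((demand, relationship), set()).add(publisher)
    nodes = [
        {"demand": demand, "supply": "Web", "path": f"{demand}-Web",
         "relationship": relationship, "publishers": publishers}
        for (demand, relationship), publishers in publishers_by_node.items()
    ]

    edges = set(adtech_edges)  # deduplicate, like the GROUP BY in the subquery
    frontier = list(nodes)
    while frontier:
        discovered = []
        for node in frontier:
            if node["relationship"] != "indirect":
                continue  # only indirect paths are extended
            for demand, supply in edges:
                # Substring check plays the role of STRPOS(path, demand) = 0.
                if (supply == node["demand"] and demand is not None
                        and demand not in node["path"]):
                    discovered.append({
                        "demand": demand, "supply": supply,
                        "path": f"{demand}-{node['path']}",
                        "relationship": "indirect",
                        "publishers": node["publishers"],
                    })
        nodes.extend(discovered)
        frontier = discovered
    return nodes


# Hypothetical edges, not real HTTP Archive data.
web_edges = [
    ("ssp.example", "indirect", "pub1.example"),
    ("ssp.example", "indirect", "pub2.example"),
    ("dsp.example", "direct", "pub1.example"),
]
adtech_edges = [
    ("exchange.example", "ssp.example"),
    ("ssp.example", "exchange.example"),  # would form a cycle without the path check
]
nodes = expand_paths(web_edges, adtech_edges)
paths = sorted(node["path"] for node in nodes)
print(paths)
```

As in the query, the cycle guard is a plain substring test, so a domain whose name happens to be contained in another domain already on the path would also be skipped.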
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
WITH RECURSIVE pages AS (
  SELECT
    CASE page -- publisher websites may redirect to an SSP domain, and need to use redirected domain instead of page domain
      WHEN 'https://www.chunkbase.com/' THEN 'cafemedia.com'
      ELSE NET.REG_DOMAIN(page)
    END AS page,
    CAST(JSON_VALUE(custom_metrics, '$.ads.ads.line_count') AS INT64) AS ads_line_count,
    CAST(JSON_VALUE(custom_metrics, '$.ads.app_ads.line_count') AS INT64) AS app_ads_line_count
  FROM `httparchive.all.pages`
  WHERE date = '2024-06-01' AND
    is_root_page = TRUE
), ads AS (
  SELECT
    page,
    CEIL(ads_line_count / 100) * 100 AS line_count_bucket,
    COUNT(DISTINCT page) OVER () AS total_pages
  FROM pages
  WHERE ads_line_count > 0
), app_ads AS (
  SELECT
    page,
    CEIL(app_ads_line_count / 100) * 100 AS line_count_bucket,
    COUNT(DISTINCT page) OVER () AS total_pages
  FROM pages
  WHERE app_ads_line_count > 0
)

SELECT
  'ads.txt' AS type,
  line_count_bucket,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages) AS pct_pages,
  COUNT(DISTINCT page) AS number_of_pages
FROM ads
GROUP BY line_count_bucket
HAVING line_count_bucket <= 10000
UNION ALL
SELECT
  'app-ads.txt' AS type,
  line_count_bucket,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages) AS pct_pages,
  COUNT(DISTINCT page) AS number_of_pages
FROM app_ads
GROUP BY line_count_bucket
HAVING line_count_bucket <= 10000
ORDER BY type, line_count_bucket ASC
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
WITH pages_with_phrase AS (
  SELECT
    client,
    rank_grouping,
    page,
    COUNT(DISTINCT page) OVER (PARTITION BY client, rank_grouping) AS total_pages_with_phrase_in_rank_group,
    JSON_QUERY_ARRAY(custom_metrics, '$.privacy.ccpa_link.CCPALinkPhrases') AS ccpa_link_phrases
  FROM `httparchive.all.pages`, -- TABLESAMPLE SYSTEM (0.01 PERCENT)
    UNNEST([1000, 10000, 100000, 1000000, 10000000, 100000000]) AS rank_grouping
  WHERE date = '2024-06-01' AND
    is_root_page = TRUE AND
    rank <= rank_grouping AND
    ARRAY_LENGTH(JSON_QUERY_ARRAY(custom_metrics, '$.privacy.ccpa_link.CCPALinkPhrases')) > 0
)

SELECT
  client,
  rank_grouping,
  link_phrase,
  COUNT(DISTINCT page) AS num_pages,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages_with_phrase_in_rank_group) AS pct_pages
FROM pages_with_phrase,
  UNNEST(ccpa_link_phrases) AS link_phrase
GROUP BY
  link_phrase,
  rank_grouping,
  client
ORDER BY
  rank_grouping,
  client,
  num_pages DESC
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
WITH pages AS (
  SELECT
    client,
    rank_grouping,
    page,
    JSON_VALUE(custom_metrics, '$.privacy.ccpa_link.hasCCPALink') AS has_ccpa_link
  FROM `httparchive.all.pages`, -- TABLESAMPLE SYSTEM (0.0025 PERCENT)
    UNNEST([1000, 10000, 100000, 1000000, 10000000, 100000000]) AS rank_grouping
  WHERE date = '2024-06-01' AND
    is_root_page = TRUE AND
    rank <= rank_grouping
)

SELECT
  client,
  rank_grouping,
  has_ccpa_link,
  COUNT(DISTINCT page) AS num_pages
FROM pages
GROUP BY
  has_ccpa_link,
  rank_grouping,
  client
ORDER BY
  rank_grouping,
  client,
  has_ccpa_link
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
WITH pages AS (
  SELECT
    page,
    JSON_QUERY(custom_metrics, '$.ads.ads') AS ads_metrics
  FROM `httparchive.all.pages`
  WHERE
    date = '2024-06-01' AND
    is_root_page = TRUE AND
    CAST(JSON_VALUE(custom_metrics, '$.ads.ads.account_count') AS INT64) > 0
), ads AS (
  SELECT
    page,
    variable,
    COUNT(DISTINCT page) OVER () AS total_publishers
  FROM pages,
    UNNEST(JSON_VALUE_ARRAY(ads_metrics, '$.variables')) AS variable
  WHERE
    CAST(JSON_VALUE(ads_metrics, '$.account_types.reseller.account_count') AS INT64) > 0 OR
    CAST(JSON_VALUE(ads_metrics, '$.account_types.direct.account_count') AS INT64) > 0
)

SELECT
  variable,
  COUNT(DISTINCT page) / ANY_VALUE(total_publishers) AS pct_publishers,
  COUNT(DISTINCT page) AS number_of_publishers
FROM ads
GROUP BY variable
ORDER BY pct_publishers DESC
LIMIT 100
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
-- Most common cookie names, by number of domains on which they appear. Goal is to identify common trackers that use first-party cookies across sites.

WITH pages AS (
  SELECT
    client,
    root_page,
    custom_metrics,
    COUNT(DISTINCT NET.HOST(root_page)) OVER (PARTITION BY client) AS total_domains
  FROM `httparchive.all.pages`
  WHERE date = '2024-06-01'
), cookies AS (
  SELECT
    client,
    cookie,
    NET.HOST(JSON_VALUE(cookie, '$.domain')) AS cookie_host,
    NET.HOST(root_page) AS firstparty_host,
    total_domains
  FROM pages,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.cookies')) AS cookie
)

SELECT
  client,
  COUNT(DISTINCT firstparty_host) AS domain_count,
  COUNT(DISTINCT firstparty_host) / ANY_VALUE(total_domains) AS pct_domains,
  JSON_VALUE(cookie, '$.name') AS cookie_name
FROM cookies
WHERE firstparty_host LIKE '%' || cookie_host
GROUP BY
  client,
  cookie_name
ORDER BY
  domain_count DESC,
  client DESC
LIMIT 500
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
WITH pages AS (
  SELECT
    page,
    client,
    root_page,
    custom_metrics,
    COUNT(DISTINCT page) OVER (PARTITION BY client) AS total_pages
  FROM `httparchive.all.pages`
  WHERE date = '2024-06-01'
), cookies AS (
  SELECT
    client,
    page,
    cookie,
    NET.HOST(JSON_VALUE(cookie, '$.domain')) AS cookie_host,
    NET.HOST(root_page) AS firstparty_host,
    total_pages
  FROM pages,
    UNNEST(JSON_QUERY_ARRAY(custom_metrics, '$.cookies')) AS cookie
)

SELECT
  client,
  cookie_host,
  COUNT(DISTINCT page) AS page_count,
  COUNT(DISTINCT page) / ANY_VALUE(total_pages) AS pct_pages
FROM cookies
WHERE firstparty_host NOT LIKE '%' || cookie_host
GROUP BY
  client,
  cookie_host
ORDER BY
  page_count DESC,
  client
LIMIT 500
