
docs: fix Bing crawler duplicate-title, short-description, and missing-alt warnings #131

Merged
kelsos merged 5 commits into rotki:main from kelsos:seo/noindex-version-previews on Jun 25, 2026


kelsos commented Jun 25, 2026


Why

Bing's crawler flagged the docs site for the following SEO/accessibility issues (see the two FailingUrls reports):

  1. Duplicate / triplicate page titles — the stable (/), latest (/latest/), and patch (/patch/) builds render the same pages with identical <title>s across three URL trees.
  2. /foo vs /foo.html duplicates — GitHub Pages serves every page at both forms, which the crawler indexes as two distinct, duplicate pages.
  3. Short meta descriptions — many pages had descriptions under ~100 chars (Bing's short threshold).
  4. Missing image alt — the VitePress nav logo rendered with an empty alt on every page.

What changed

  • noindex previews: latest/patch builds emit <meta name="robots" content="noindex, follow"> plus a canonical pointing at their stable twin, so the preview trees drop out of the index entirely (consistent with the sitemap already being stable-only).
  • clean URLs + self-canonical — enable cleanUrls so internal links and the sitemap use the extensionless form, and emit a self-referencing canonical on every page (index dirs → /). Both /foo and /foo.html now consolidate to one canonical URL. The .html files are still generated, so existing .html links keep working (return 200).
  • meta descriptions — every content page now has a unique description in the recommended ~120–160 char range (the 22 pages Bing flagged, plus a sweep of all remaining short ones and llms.md which had none).
  • logo alt: logo is now { src: '/logo.png', alt: 'rotki' }, so the nav logo emits alt="rotki".

Verification (local build)

  • 61 content pages: all have unique descriptions, 110–160 chars; none missing/default/duplicate.
  • No duplicate <title> among real pages.
  • Every page self-canonicals to one clean URL; /foo and /foo.html both return 200 with the same canonical.
  • Previews carry noindex + canonical→stable.
  • 260 built <img> tags: 0 with missing/empty alt.
  • Every URL in both FailingUrls reports re-checked → clean.
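
The <img> alt check in the list above can be approximated with a small scan over built HTML. This is a sketch, not the script used for this PR: imagesMissingAlt is a hypothetical name, and a regex only approximates HTML parsing (a `>` inside an attribute value would cut a tag short), so a parser-based check is safer for a real audit.

```typescript
// Returns every <img> tag in `html` whose alt attribute is missing or empty.
// Regex-based approximation: assumes no `>` appears inside attribute values.
function imagesMissingAlt(html: string): string[] {
  const imgTags: string[] = html.match(/<img\b[^>]*>/gi) ?? [];
  return imgTags.filter((tag) => {
    // Require whitespace before `alt` so attributes like data-alt don't count.
    const alt = tag.match(/\salt\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i);
    if (!alt) return true; // no alt attribute at all
    const value = alt[1] ?? alt[2] ?? alt[3] ?? '';
    return value.trim() === ''; // alt="" or whitespace-only
  });
}
```

Running it over each built .html file and summing the results would give a count comparable to the "0 with missing/empty alt" figure above.
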

Notes

  • blog.rotki.com appears in the alt report but is a separate site, not this repo — needs fixing wherever the blog lives.
  • Warnings won't clear instantly; Bing needs a few recrawl cycles to follow the canonicals and drop the duplicate/preview URLs. Can be nudged via Bing Webmaster Tools.

kelsos added 5 commits June 25, 2026 12:47
The latest/patch builds render the same pages as stable across three URL trees, which Bing flags as duplicate/triplicate titles and content. Add a transformPageData hook that, on preview builds only, emits a noindex robots meta plus a per-page canonical pointing at the stable equivalent. Stable stays fully indexed and remains the sole sitemap.

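To make the preview behaviour concrete, here is a small sketch of the head tags a preview page would get. The channel names mirror the stable/latest/patch builds described above, but Channel, stableTwin, and previewHead are hypothetical helpers, the origin is a placeholder, and the sketch assumes stable is served from the site root with previews under /latest/ and /patch/. The [tag, attributes] tuple follows the shape VitePress uses for head entries.

```typescript
// Hypothetical channel names for the three builds.
type Channel = 'stable' | 'latest' | 'patch';

// VitePress-style head entry: [tag, attributes].
type HeadEntry = [string, Record<string, string>];

// Maps a preview URL path (e.g. /latest/usage/foo) to its stable twin (/usage/foo).
// Assumes stable lives at the root; the lookahead avoids touching /latestnews/.
function stableTwin(path: string): string {
  return path.replace(/^\/(latest|patch)(?=\/|$)/, '') || '/';
}

// Preview pages get noindex (links still followed) plus a canonical to stable.
// Stable pages get nothing from this helper and stay indexable.
function previewHead(channel: Channel, origin: string, pagePath: string): HeadEntry[] {
  if (channel === 'stable') return [];
  return [
    ['meta', { name: 'robots', content: 'noindex, follow' }],
    ['link', { rel: 'canonical', href: origin + stableTwin(pagePath) }],
  ];
}
```

"noindex, follow" keeps the preview pages out of the index while still letting the crawler follow their links, which is what lets the canonicals lead it back to stable.
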
Bing's crawler flagged 22 pages for short meta descriptions (all under ~100 chars). Rewrite each page's description frontmatter into the recommended ~120-160 char range, keeping them accurate to page content. Values containing colons are quoted so the YAML frontmatter still parses.

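The quoting rule in the commit above can be sketched as a helper that writes a description frontmatter line. descriptionLine is hypothetical and deliberately conservative: it quotes any value containing a colon (as the commit describes) or a `#`, or starting with a YAML indicator character, and escapes backslashes and double quotes inside the quoted form.

```typescript
// Builds a `description:` frontmatter line, double-quoting values that a
// plain YAML scalar could misparse (e.g. "Premium: ..." reads as a mapping).
// Conservative on purpose: any colon or `#` triggers quoting, as do leading
// YAML indicator characters and leading whitespace.
function descriptionLine(description: string): string {
  const needsQuotes =
    /[:#]/.test(description) || /^[-?[\]{}&*!|>'"%@`,\s]/.test(description);
  if (!needsQuotes) return `description: ${description}`;
  // Inside a double-quoted YAML scalar, backslashes and double quotes must be escaped.
  const escaped = description.replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `description: "${escaped}"`;
}
```
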
GitHub Pages serves each page at both /foo and /foo.html, which Bing indexes as two URLs and flags as duplicates. Enable cleanUrls so internal links and the sitemap use the extensionless form, and emit a self-referencing canonical on every build pointing at that one clean URL (index dirs -> /). Both forms now consolidate to a single canonical. Previews keep their canonical pointing at the stable twin and stay noindex.

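A self-referencing canonical with clean URLs mostly comes down to mapping each source file to its extensionless path. The sketch below is shaped like a VitePress transformPageData hook but uses a minimal local stand-in type instead of importing VitePress; PageDataLike, cleanPath, and addCanonical are hypothetical, and it assumes preview builds differ from stable only in their base path, so the same root-relative URL doubles as the stable twin.

```typescript
// Minimal stand-in for the fields of VitePress's PageData used here.
type HeadEntry = [string, Record<string, string>];
interface PageDataLike {
  relativePath: string; // source path, e.g. "usage/foo.md" or "usage/index.md"
  frontmatter: { head?: HeadEntry[] };
}

// Clean-URL path for a source file: index.md maps to its directory ("/" at the
// root); any other page simply drops ".md" (no ".html" with cleanUrls on).
function cleanPath(relativePath: string): string {
  return '/' + relativePath.replace(/(^|\/)index\.md$/, '$1').replace(/\.md$/, '');
}

// transformPageData-style step: append a canonical link to the page's head.
// relativePath is assumed not to include the build's base, so in a /latest/
// or /patch/ build this still points at the stable page served from the root.
function addCanonical(pageData: PageDataLike, origin: string): void {
  const head = pageData.frontmatter.head ?? [];
  head.push(['link', { rel: 'canonical', href: origin + cleanPath(pageData.relativePath) }]);
  pageData.frontmatter.head = head;
}
```

Because /foo and /foo.html render the same page, both end up carrying this one canonical, which is what lets the crawler consolidate them.
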
Sweep every page still under the safe ~110-char meta-description threshold (plus llms.md, which had none) up into the recommended 120-160 range, keeping each accurate to its page content. All 61 content pages now carry a unique, well-sized description so future Bing crawls stop flagging them.

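The per-page checks behind this sweep (missing, too short, too long, or duplicated descriptions) fit in a short audit function. auditDescriptions and PageMeta are hypothetical; the 110 and 160 bounds come from the verification notes in this PR, and `length` counts UTF-16 code units, which matches the character count for plain ASCII text.

```typescript
interface PageMeta {
  path: string;
  description?: string;
}

type Problem = 'missing' | 'too-short' | 'too-long' | 'duplicate';

// Flags descriptions that are missing, outside [min, max] characters, or
// identical to one already seen on an earlier page.
function auditDescriptions(
  pages: PageMeta[],
  min = 110,
  max = 160,
): { path: string; problem: Problem }[] {
  const issues: { path: string; problem: Problem }[] = [];
  const seen = new Set<string>();
  for (const page of pages) {
    const text = page.description?.trim() ?? '';
    if (text === '') {
      issues.push({ path: page.path, problem: 'missing' });
      continue;
    }
    if (text.length < min) issues.push({ path: page.path, problem: 'too-short' });
    if (text.length > max) issues.push({ path: page.path, problem: 'too-long' });
    if (seen.has(text)) issues.push({ path: page.path, problem: 'duplicate' });
    seen.add(text);
  }
  return issues;
}
```
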
The VitePress nav logo rendered with an empty alt on every page, which crawlers/linters report as a missing-alt image. Pass logo as { src, alt: 'rotki' } so it emits alt="rotki". All built images now have non-empty alt text.

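For reference, the two config-level changes in this PR could look roughly like this in a VitePress config file. Only cleanUrls and the logo value come from the PR text; the file path and the surrounding config are assumptions, and the real config carries many more options.

```typescript
// .vitepress/config.ts (assumed location; the real config has many more options)
import { defineConfig } from 'vitepress';

export default defineConfig({
  // Generate extensionless links (/foo rather than /foo.html).
  cleanUrls: true,
  themeConfig: {
    // Object form of the logo so the nav image gets a non-empty alt.
    logo: { src: '/logo.png', alt: 'rotki' },
  },
});
```
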
kelsos merged commit 35faf2e into rotki:main on Jun 25, 2026
5 checks passed
kelsos deleted the seo/noindex-version-previews branch on June 25, 2026 14:17