docs: fix Bing crawler duplicate-title, short-description, and missing-alt warnings #131
Merged
The latest/patch builds render the same pages as stable across three URL trees, which Bing flags as duplicate/triplicate titles and content. Add a transformPageData hook that, on preview builds only, emits a noindex robots meta plus a per-page canonical pointing at the stable equivalent. Stable stays fully indexed and remains the sole sitemap.
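A minimal sketch of the preview-only tagging described above. The origin `https://docs.example.com` and the helper names `stablePath` and `previewHeadTags` are hypothetical, not taken from this PR. The functions only compute the tags; in a VitePress config the result would be pushed onto `pageData.frontmatter.head` inside `transformPageData`.

```typescript
// Hypothetical names throughout; this is an illustration, not the PR's code.
type HeadTag = [string, Record<string, string>];

const STABLE_ORIGIN = 'https://docs.example.com'; // assumed stable site origin
const PREVIEW_BASES = ['/latest/', '/patch/'];    // base paths of preview builds

// Map a page's source path (e.g. "guide/index.md") to its path on stable.
function stablePath(relativePath: string): string {
  return '/' + relativePath
    .replace(/(^|\/)index\.md$/, '$1') // index pages resolve to their directory
    .replace(/\.md$/, '');             // other pages drop the extension
}

// Extra <head> tags for one page; stable builds get none from this helper.
function previewHeadTags(base: string, relativePath: string): HeadTag[] {
  const isPreview = PREVIEW_BASES.some(function (prefix) {
    return prefix === base;
  });
  if (!isPreview) return [];
  return [
    ['meta', { name: 'robots', content: 'noindex, follow' }],
    ['link', { rel: 'canonical', href: STABLE_ORIGIN + stablePath(relativePath) }],
  ];
}
```

Keeping the logic in a pure function makes it easy to check without running a full site build; the hook itself would then only forward the build's base path and `pageData.relativePath`.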
Bing's crawler flagged 22 pages for short meta descriptions (all under ~100 chars). Rewrite each page's description frontmatter into the recommended ~120-160 char range, keeping them accurate to page content. Values containing colons are quoted so the YAML frontmatter still parses.
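To illustrate the quoting rule, here is a small sketch of a helper that renders a `description` frontmatter line. The name `descriptionLine` is hypothetical, and the PR edited the frontmatter by hand rather than through a script. The sketch relies on JSON double-quoted strings also being valid YAML scalars.

```typescript
// Hypothetical helper: render a `description` frontmatter line, quoting the
// value whenever it contains a colon so YAML does not read it as a mapping.
function descriptionLine(description: string): string {
  const needsQuotes = description.indexOf(':') !== -1;
  // JSON.stringify produces a double-quoted, escaped string.
  const value = needsQuotes ? JSON.stringify(description) : description;
  return 'description: ' + value;
}
```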
GitHub Pages serves each page at both /foo and /foo.html, which Bing indexes as two URLs and flags as duplicates. Enable cleanUrls so internal links and the sitemap use the extensionless form, and emit a self-referencing canonical on every build pointing at that one clean URL (index dirs -> /). Both forms now consolidate to a single canonical. Previews keep their canonical pointing at the stable twin and stay noindex.
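The consolidation can be sketched as a path normalization: every form GitHub Pages serves for a page collapses to one extensionless path, which then becomes the canonical. The helper names `cleanPath` and `canonicalHref` and the origin used below are assumptions for illustration; `cleanUrls` itself is a standard VitePress site option.

```typescript
// Hypothetical helper: collapse the URL forms served for one page into the
// single extensionless path used as its canonical.
function cleanPath(urlPath: string): string {
  return urlPath
    .replace(/(^|\/)index(\.html)?$/, '$1') // index pages -> their directory
    .replace(/\.html$/, '');                // "/foo.html" -> "/foo"
}

// Build the absolute canonical URL for a page (origin is an assumed value).
function canonicalHref(origin: string, urlPath: string): string {
  return origin + cleanPath(urlPath);
}
```

With this mapping, `/foo` and `/foo.html` produce the same canonical, which is the property the crawler needs in order to treat them as one page.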
Sweep every page still under the safe ~110-char meta-description threshold (plus llms.md, which had none) up into the recommended 120-160 range, keeping each accurate to its page content. All 61 content pages now carry a unique, well-sized description so future Bing crawls stop flagging them.
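A check along these lines could catch regressions before the next crawl. This is only a sketch: the `PageMeta` shape and `auditDescriptions` name are hypothetical, and the 120-160 bounds are the recommended range quoted above.

```typescript
// Hypothetical page record: URL path plus its frontmatter description, if any.
interface PageMeta {
  path: string;
  description?: string;
}

const MIN_LEN = 120; // lower bound of the recommended range
const MAX_LEN = 160; // upper bound of the recommended range

// Report pages whose description is missing, out of range, or duplicated.
function auditDescriptions(pages: PageMeta[]): string[] {
  const problems: string[] = [];
  const seen: string[] = [];
  for (const page of pages) {
    const text = (page.description || '').trim();
    if (text.length === 0) {
      problems.push(page.path + ': missing description');
    } else if (text.length < MIN_LEN || text.length > MAX_LEN) {
      problems.push(page.path + ': ' + text.length + ' chars, outside ' + MIN_LEN + '-' + MAX_LEN);
    } else if (seen.indexOf(text) !== -1) {
      problems.push(page.path + ': duplicate description');
    } else {
      seen.push(text);
    }
  }
  return problems;
}
```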
The VitePress nav logo rendered with an empty alt on every page, which
crawlers/linters report as a missing-alt image. Pass logo as
{ src, alt: 'rotki' } so it emits alt="rotki". All built images now have
non-empty alt text.
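For reference, the relevant part of the theme configuration would look roughly like the fragment below. Only the `logo` value comes from this PR; the surrounding `themeConfig` constant is a simplified stand-in for the real `.vitepress/config.ts`, with all other options omitted.

```typescript
// Sketch of the theme options; everything except `logo` is omitted.
const themeConfig = {
  // Object form instead of a bare path string, so the rendered <img>
  // carries alt="rotki" rather than an empty alt.
  logo: { src: '/logo.png', alt: 'rotki' },
};
```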
Why
Bing's crawler flagged the docs site for three classes of SEO/accessibility issues (see the two `FailingUrls` reports):
- Duplicate titles and content: the `stable` (`/`), `latest` (`/latest/`), and `patch` (`/patch/`) builds render the same pages with identical `<title>`s across three URL trees.
- `/foo` vs `/foo.html` duplicates: GitHub Pages serves every page at both forms, which the crawler indexes as two distinct, duplicate pages.
- Short meta descriptions: pages whose `description` falls below the recommended ~120-160 character range.
- Missing alt text: the nav logo rendered with an empty `alt` on every page.

What changed
- `latest`/`patch` builds emit `<meta name="robots" content="noindex, follow">` plus a canonical pointing at their stable twin, so the preview trees drop out of the index entirely (consistent with the sitemap already being stable-only).
- Enable `cleanUrls` so internal links and the sitemap use the extensionless form, and emit a self-referencing canonical on every page (index dirs → `/`). Both `/foo` and `/foo.html` now consolidate to one canonical URL. The `.html` files are still generated, so existing `.html` links keep working (return 200).
- Descriptions rewritten into the recommended 120-160 character range (plus `llms.md`, which had none).
- `logo` is now `{ src: '/logo.png', alt: 'rotki' }`, so the nav logo emits `alt="rotki"`.

Verification (local build)
- No duplicate `<title>` among real pages.
- `/foo` and `/foo.html` both return 200 with the same canonical.
- Preview builds: `noindex` + canonical → stable.
- `<img>` tags: 0 with missing/empty alt.
- `FailingUrls` reports re-checked → clean.

Notes
- `blog.rotki.com` appears in the alt report but is a separate site, not this repo; it needs fixing wherever the blog lives.
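
The `<img>` check from the verification list could be automated roughly as follows. The function name `imagesMissingAlt` is hypothetical, and the regex scan is a deliberate simplification (it ignores unquoted attribute values and `>` inside attributes); a real audit would more likely use an HTML parser.

```typescript
// Hypothetical checker: return the <img> tags in an HTML string whose alt
// attribute is missing or empty. Unquoted alt values count as missing here.
function imagesMissingAlt(html: string): string[] {
  const imgTags: string[] = html.match(/<img\b[^>]*>/gi) || [];
  return imgTags.filter(function (tag) {
    // Require whitespace before "alt" so attributes like data-alt are skipped.
    const alt = /\salt\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(tag);
    if (!alt) return true;
    const value = alt[1] !== undefined ? alt[1] : alt[2];
    return value.trim().length === 0;
  });
}
```

Running it over each built page and failing on a non-empty result would keep the "0 with missing/empty alt" result from regressing.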