# SEO Strategy for npmx.dev

This document outlines the technical SEO strategy adopted for `npmx.dev`, considering its nature as a dynamic SSR application with effectively infinite content (the npm registry) and its current internationalization constraints.

## 1. Indexing & Crawling

### The Challenge

`npmx` acts as a mirror/browser for the npm registry. We do not know all valid URLs (`/package/[name]`) in advance, and there are millions of possible combinations. Additionally, invalid URLs could generate spam content or infinite crawl loops.

### The Solution: Organic Crawling

We do not use a massive `sitemap.xml`. Instead, we rely on natural link discovery by bots (Googlebot, Bingbot, etc.):

1. **Entry Point:** The Home page (`/`) links to popular packages.
2. **Expansion:** Each package page links to its **Dependencies**, **DevDependencies**, and **PeerDependencies**.
3. **Result:** Bots jump from package to package, indexing the npm dependency graph organically and efficiently.
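The crawl path above can be sketched as a pure function: given a package manifest, list the internal links a bot would find on that package's page. `dependencyLinks` is an illustrative helper for this document, not an existing export of the app.

```typescript
// Illustrative sketch: the crawlable internal links a package page exposes,
// derived from its manifest. Bots follow these to walk the dependency graph.
type Manifest = {
  dependencies?: Record<string, string>
  devDependencies?: Record<string, string>
  peerDependencies?: Record<string, string>
}

function dependencyLinks(pkg: Manifest): string[] {
  const names = [
    ...Object.keys(pkg.dependencies ?? {}),
    ...Object.keys(pkg.devDependencies ?? {}),
    ...Object.keys(pkg.peerDependencies ?? {}),
  ]
  // One internal link per unique dependency, e.g. /package/react
  return [...new Set(names)].map(name => `/package/${name}`)
}
```

A package declaring `react` as a dependency and `vite` as a dev dependency would therefore expose `/package/react` and `/package/vite` as crawlable links.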

### Error Handling (404)

To prevent indexing of non-existent URLs (`/package/fake-package`):

- The SSR server returns a real **HTTP 404 Not Found** status code when the npm API indicates the package does not exist.
- This causes search engines to immediately discard the URL and not index it, without needing an explicit `noindex` tag.
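A minimal sketch of that status decision, assuming the page looks the package up against the npm registry. `statusForRegistryResponse` is an illustrative helper; in a Nuxt page the 404 would typically be raised with `throw createError({ statusCode: 404 })` during SSR.

```typescript
// Sketch: mirror the npm registry's 404 in the SSR response so crawlers
// discard the URL. Error handling for other statuses is elided.
function statusForRegistryResponse(registryStatus: number): number {
  // registry.npmjs.org answers 404 for unknown package names;
  // anything else is treated as a renderable package page here.
  return registryStatus === 404 ? 404 : 200
}

// In a Nuxt page this surfaces as:
//   if (registryStatus === 404) throw createError({ statusCode: 404 })
```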

## 2. `robots.txt` File

The goal of `robots.txt` is to optimize the _Crawl Budget_ by blocking low-value or computationally expensive areas.

**Proposed `public/robots.txt`:**

```txt
User-agent: *
Allow: /

# Block internal search results (duplicate/infinite content)
Disallow: /search

# Block user utilities and settings
Disallow: /settings
Disallow: /compare
Disallow: /auth/

# Block code explorer and docs (high crawl cost, low SEO value for general search)
Disallow: /package-code/
Disallow: /package-docs/

# Block internal API endpoints
Disallow: /api/
```

## 3. Internationalization (i18n) & SEO

### Current Status

- The application supports multiple languages (UI).
- **No URL prefixes are used** (e.g., `/es/package/react` does not exist, only `/package/react`).
- Language is determined on the client side (browser) or defaults to English on the server.
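That resolution rule can be sketched as follows; `resolveLocale` is a hypothetical helper, and `'en-US'` is the assumed server default.

```typescript
// Sketch: the server has no browser context, so it falls back to English;
// after hydration the client passes navigator.language instead.
function resolveLocale(navigatorLanguage?: string): string {
  return navigatorLanguage ?? 'en-US'
}
```

On the server, `resolveLocale(undefined)` yields `'en-US'`, which is exactly what Googlebot receives.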

### SEO Implications

- **Canonicalization:** There is only one canonical URL per package (`https://npmx.dev/package/react`).
- **Indexing Language:** Googlebot typically crawls from the US without specific cookies/preferences. The SSR server renders in `en-US` by default.
- **Result:** **Google will index the site exclusively in English.**

### Is this a problem?

**No.** For a global technical tool like `npmx`:

- Search traffic is predominantly in English (package names, technical terms).
- We avoid the complexity of managing `hreflang` and duplicate content across 20+ languages.
- User Experience (UX) remains localized: users land on the page (indexed in English), and the client hydrates the app in their preferred language.

## 4. Summary of Actions

1. ✅ **404 Status:** Ensured in SSR for non-existent packages.
2. ✅ **Internal Linking:** Dependency components (`Dependencies.vue`) generate crawlable links (`<NuxtLink>`).
3. ✅ **Dynamic Titles:** `useSeoMeta` correctly manages titles and descriptions, escaping special characters for security and proper display.
4. 📝 **Pending:** Update `public/robots.txt` with the proposed blocking rules to protect the _Crawl Budget_.

## 5. Implementation Details: Meta Tags & Sitemap

### Pages Requiring `noindex, nofollow`

Based on the `robots.txt` strategy, the following Vue pages should explicitly include the `<meta name="robots" content="noindex, nofollow">` tag via `useSeoMeta`. This acts as a second layer of defense against indexing low-value content.

- **`app/pages/search.vue`**: Internal search results.
- **`app/pages/settings.vue`**: User preferences.
- **`app/pages/compare.vue`**: Dynamic comparison tool.
- **`app/pages/package-code/[...path].vue`**: Source code explorer.
- **`app/pages/package-docs/[...path].vue`**: Generated documentation (consistent with the `robots.txt` block).
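In each of those pages, the tag itself is a one-liner in `<script setup>`: `useSeoMeta({ robots: 'noindex, nofollow' })`. The helper below is a hypothetical sketch (not an existing app export) showing which route paths would receive it, mirroring the list above.

```typescript
// Hypothetical helper mirroring the page list above; not an app export.
// Prefix matching keeps nested routes (e.g. /package-code/react/src) covered.
const NOINDEX_PREFIXES = [
  '/search',
  '/settings',
  '/compare',
  '/package-code/',
  '/package-docs/',
]

function robotsMetaFor(path: string): string | undefined {
  const blocked = NOINDEX_PREFIXES.some(p => path === p || path.startsWith(p))
  // undefined means the page emits no robots meta tag at all,
  // leaving it indexable by default.
  return blocked ? 'noindex, nofollow' : undefined
}
```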

### Canonical URLs & i18n

- **Canonical Rule:** The canonical URL is **always the English (default) URL**, regardless of the user's selected language or browser settings.
  - Example: `https://npmx.dev/package/react`
- **Reasoning:** Since we do not use URL prefixes for languages (e.g., `/es/...`), there is technically only _one_ URL per resource. The language change happens client-side. Therefore, the canonical tag must point to this single, authoritative URL to prevent confusion for search engines.
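A sketch of how that single authoritative URL could be built; `canonicalFor` is illustrative, and in a Nuxt page the tag itself would be emitted via `useHead({ link: [{ rel: 'canonical', href }] })`.

```typescript
// Illustrative: always canonicalize to the single prefix-less URL,
// dropping query strings and hash fragments. Not an existing app export.
const ORIGIN = 'https://npmx.dev'

function canonicalFor(path: string): string {
  // URL resolves the path against the origin and separates out ?query/#hash.
  const url = new URL(path, ORIGIN)
  return `${ORIGIN}${url.pathname}`
}
```

For example, `/package/react?activeTab=readme` canonicalizes to `https://npmx.dev/package/react`.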

### Sitemap Strategy

- **Decision:** **No `sitemap.xml` will be generated.**
- **Why?**
  - Generating a sitemap for 2+ million npm packages is technically infeasible and expensive to maintain.
  - A partial sitemap (e.g., top 50k packages) is redundant because these packages are already well-linked from the Home page and "Popular" lists.
  - **Organic Discovery:** As detailed in Section 1, bots will discover content naturally by following dependency links, which is the most efficient way to index a graph-based dataset like npm.