Commit 7cfca83 (parent f8a66e0) — chore: add SEO stragegy cookbook

SEO-STRATEGY.md: 105 additions, 0 deletions

# SEO Strategy for npmx.dev

This document outlines the technical SEO strategy adopted for `npmx.dev`, considering its nature as a dynamic SSR application with infinite content (the npm registry) and current internationalization constraints.

## 1. Indexing & Crawling

### The Challenge

`npmx` acts as a mirror/browser for the npm registry. We do not know all valid URLs (`/package/[name]`) in advance, and there are millions of possible combinations. Additionally, invalid URLs could generate spam content or infinite loops.

### The Solution: Organic Crawling

We do not use a massive `sitemap.xml`. We rely on natural link discovery by bots (Googlebot, Bingbot, etc.):

1. **Entry Point:** The Home page (`/`) links to popular packages.
2. **Expansion:** Each package page links to its **Dependencies**, **DevDependencies**, and **PeerDependencies**.
3. **Result:** Bots jump from package to package, indexing the npm dependency graph organically and efficiently.
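
The discovery model above can be sketched as a toy graph traversal (the packages and edges below are hypothetical; real crawlers schedule fetches far more elaborately, but the reachability idea is the same):

```typescript
// Toy model of organic discovery: starting from the packages linked on the
// Home page, a crawler reaches everything connected through dependency links.
type DependencyGraph = Record<string, string[]>

function discoverable(graph: DependencyGraph, homeLinks: string[]): Set<string> {
  const seen = new Set<string>(homeLinks)
  const queue = [...homeLinks]
  while (queue.length > 0) {
    const pkg = queue.shift()!
    // Each package page links to its dependencies, devDependencies, etc.
    for (const dep of graph[pkg] ?? []) {
      if (!seen.has(dep)) {
        seen.add(dep)
        queue.push(dep)
      }
    }
  }
  return seen
}
```

Anything transitively linked from the entry points gets indexed; orphaned packages that nothing popular depends on may stay undiscovered, which is an accepted trade-off of this strategy.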

### Error Handling (404)

To prevent indexing of non-existent URLs (`/package/fake-package`):

- The SSR server returns a real **HTTP 404 Not Found** status code when the npm API indicates the package does not exist.
- This causes search engines to immediately discard the URL and not index it, without needing an explicit `noindex` tag.
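
A minimal sketch of the status mapping (the helper name is hypothetical; in Nuxt, the page would typically throw `createError({ statusCode: 404 })` when this resolves to 404):

```typescript
// Hypothetical helper: map the npm registry's reply to the status the SSR
// response should carry. A real 404 makes bots drop the URL outright.
function statusForRegistryResponse(upstreamStatus: number): number {
  if (upstreamStatus === 404) return 404 // unknown package: do not index
  if (upstreamStatus >= 500) return 503  // upstream outage: ask bots to retry later
  return 200                             // package exists: render and index
}
```

Returning 503 (rather than 200 with an error page) during registry outages matters too: it tells crawlers the failure is temporary, so they neither index the error page nor discard valid URLs.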

## 2. `robots.txt` File

The goal of `robots.txt` is to optimize the _Crawl Budget_ by blocking low-value or computationally expensive areas.

**Proposed `public/robots.txt`:**

```txt
User-agent: *
Allow: /

# Block internal search results (duplicate/infinite content)
Disallow: /search

# Block user utilities and settings
Disallow: /settings
Disallow: /compare
Disallow: /auth/

# Block code explorer and docs (high crawl cost, low SEO value for general search)
Disallow: /package-code/
Disallow: /package-docs/

# Block internal API endpoints
Disallow: /api/
```

## 3. Internationalization (i18n) & SEO

### Current Status

- The application supports multiple languages (UI).
- **No URL prefixes are used** (e.g., `/es/package/react` does not exist, only `/package/react`).
- Language is determined client-side (browser) or defaults to English on the server.

### SEO Implications

- **Canonicalization:** There is only one canonical URL per package (`https://npmx.dev/package/react`).
- **Indexing Language:** Googlebot typically crawls from the US without specific cookies/preferences. The SSR server renders in `en-US` by default.
- **Result:** **Google will index the site exclusively in English.**

### Is this a problem?

**No.** For a global technical tool like `npmx`:

- Search traffic is predominantly in English (package names, technical terms).
- We avoid the complexity of managing `hreflang` and duplicate content across 20+ languages.
- User Experience (UX) remains localized: users land on the page (indexed in English), and the client hydrates the app in their preferred language.

## 4. Summary of Actions

1. **404 Status:** Ensured in SSR for non-existent packages.
2. **Internal Linking:** Dependency components (`Dependencies.vue`) generate crawlable links (`<NuxtLink>`).
3. **Dynamic Titles:** `useSeoMeta` correctly manages titles and descriptions, escaping special characters for security and proper display.
4. 📝 **Pending:** Update `public/robots.txt` with the proposed blocking rules to protect the _Crawl Budget_.

## 5. Implementation Details: Meta Tags & Sitemap

### Pages Requiring `noindex, nofollow`

Based on the `robots.txt` strategy, the following Vue pages should explicitly include the `<meta name="robots" content="noindex, nofollow">` tag via `useSeoMeta`. This acts as a second layer of defense against indexing low-value content.

- **`app/pages/search.vue`**: Internal search results.
- **`app/pages/settings.vue`**: User preferences.
- **`app/pages/compare.vue`**: Dynamic comparison tool.
- **`app/pages/package-code/[...path].vue`**: Source code explorer.
- **`app/pages/package-docs/[...path].vue`**: Generated documentation (consistent with the `robots.txt` block).
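
As a sketch of this second layer (the helper and constant names are hypothetical; the assumed usage is `useSeoMeta({ robots: robotsMetaFor(route.path) })` inside each page's `<script setup>`), the blocked prefixes can be centralized so the meta-tag layer cannot drift out of sync with the crawl rules:

```typescript
// Hypothetical central list mirroring the robots.txt Disallow rules.
const NOINDEX_PREFIXES = [
  '/search',
  '/settings',
  '/compare',
  '/auth/',
  '/package-code/',
  '/package-docs/',
] as const

// Return the robots meta value for a given route path.
function robotsMetaFor(path: string): string {
  const blocked = NOINDEX_PREFIXES.some(prefix => path.startsWith(prefix))
  return blocked ? 'noindex, nofollow' : 'index, follow'
}
```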

### Canonical URLs & i18n

- **Canonical Rule:** The canonical URL is **always the English (default) URL**, regardless of the user's selected language or browser settings.
  - Example: `https://npmx.dev/package/react`
- **Reasoning:** Since we do not use URL prefixes for languages (e.g., `/es/...`), there is technically only _one_ URL per resource. The language change happens client-side. Therefore, the canonical tag must point to this single, authoritative URL to prevent confusion for search engines.
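
A sketch of the corresponding canonical-URL helper (the function name is hypothetical; in Nuxt, its output would feed `useHead({ link: [{ rel: 'canonical', href }] })`):

```typescript
// Hypothetical helper: build the single canonical URL for a route path.
// No locale prefixes exist, so no language stripping is needed.
const ORIGIN = 'https://npmx.dev'

function canonicalUrl(path: string): string {
  // Drop query strings and trailing slashes so variants collapse to one URL.
  const clean = path.split('?')[0].replace(/\/+$/, '')
  return ORIGIN + (clean === '' ? '/' : clean)
}
```

Normalizing trailing slashes and query strings here is a deliberate choice: `/package/react/` and `/package/react?tab=readme` would otherwise be treated as distinct pages with duplicate content.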

### Sitemap Strategy

- **Decision:** **No `sitemap.xml` will be generated.**
- **Why?**
  - Generating a sitemap for 2+ million npm packages is technically infeasible and expensive to maintain.
  - A partial sitemap (e.g., top 50k packages) is redundant because these packages are already well-linked from the Home page and "Popular" lists.
- **Organic Discovery:** As detailed in Section 1, bots will discover content naturally by following dependency links, which is the most efficient way to index a graph-based dataset like npm.
