|
85 | 85 | "\n", |
86 | 86 | "### Available Mistral AI models\n", |
87 | 87 | "\n", |
88 | | - "* ### Mistral OCR (25.05)\n", |
89 | | - "Mistral OCR (25.05) is a model specialized in extracting text and images from documents. It is specifically built to preserve the structure of the document pages and automatically formats the extracted text in Markdown.\n", |
90 | | - "\n", |
91 | 88 | "* ### Mistral Small 3.1 (25.03)\n", |
92 | 89 | "Mistral Small 3.1 (25.03) is the enhanced version of Mistral Small 3, featuring multimodal capabilities and an extended context length of up to 128k.\n", |
93 | 90 | "\n", |
|
174 | 171 | }, |
175 | 172 | "outputs": [], |
176 | 173 | "source": [ |
177 | | - "MODEL = \"mistral-small-2503\" # @param [\"mistral-small-2503\", \"codestral-2501\", \"mistral-large-2411\", \"mistral-nemo\", \"mistral-ocr-2505\"]\n", |
| 174 | + "MODEL = \"mistral-small-2503\" # @param [\"mistral-small-2503\", \"codestral-2501\", \"mistral-large-2411\", \"mistral-nemo\"]\n", |
178 | 175 | "if MODEL == \"mistral-small-2503\":\n", |
179 | 176 | " available_regions = [\"europe-west4\", \"us-central1\"]\n", |
180 | 177 | " available_versions = [\"latest\"]\n", |
181 | 178 | "elif MODEL == \"mistral-large-2411\":\n", |
182 | 179 | " available_regions = [\"europe-west4\", \"us-central1\"]\n", |
183 | 180 | " available_versions = [\"latest\"]\n", |
184 | | - "elif MODEL == \"mistral-ocr-2505\":\n", |
185 | | - " available_regions = [\"europe-west4\", \"us-central1\"]\n", |
186 | | - " available_versions = [\"latest\"]\n", |
187 | 181 | "elif MODEL == \"mistral-nemo\":\n", |
188 | 182 | " available_regions = [\"europe-west4\", \"us-central1\"]\n", |
189 | 183 | " available_versions = [\"2407\"]\n", |
|
300 | 294 | "import requests" |
301 | 295 | ] |
302 | 296 | }, |
303 | | - { |
304 | | - "cell_type": "markdown", |
305 | | - "metadata": { |
306 | | - "id": "Y_8xeTfWqToW" |
307 | | - }, |
308 | | - "source": [ |
309 | | - "To try the OCR model, please refer to the section \"Performing OCR\"." |
310 | | - ] |
311 | | - }, |
312 | 297 | { |
313 | 298 | "cell_type": "markdown", |
314 | 299 | "metadata": { |
|
1413 | 1398 | " print(f\"Request failed with status code: {response.status_code}\")" |
1414 | 1399 | ] |
1415 | 1400 | }, |
1416 | | - { |
1417 | | - "cell_type": "markdown", |
1418 | | - "metadata": { |
1419 | | - "id": "znQtFeXUqToe" |
1420 | | - }, |
1421 | | - "source": [ |
1422 | | - "### Performing OCR\n", |
1423 | | - "\n", |
1424 | | - "Mistral OCR (`mistral-ocr-2505`) is an Optical Character Recognition API that sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of documents: images, text, tables, equations, etc. It takes images and PDFs as input and extracts content in an ordered interleaved text and images.\n", |
1425 | | - "\n", |
1426 | | - "The following example showcases the application of the OCR API on a PDF file." |
1427 | | - ] |
1428 | | - }, |
1429 | | - { |
1430 | | - "cell_type": "code", |
1431 | | - "execution_count": null, |
1432 | | - "metadata": { |
1433 | | - "id": "KRaYEGJ1qToe" |
1434 | | - }, |
1435 | | - "outputs": [], |
1436 | | - "source": [ |
1437 | | - "import base64\n", |
1438 | | - "\n", |
1439 | | - "# The URL you provided\n", |
1440 | | - "pdf_url = \"https://arxiv.org/pdf/2501.12948\"\n", |
1441 | | - "\n", |
1442 | | - "# 1. Download the file\n", |
1443 | | - "print(f\"Attempting to download from: {pdf_url}\")\n", |
1444 | | - "response = requests.get(pdf_url)\n", |
1445 | | - "pdf_content_bytes = response.content\n", |
1446 | | - "\n", |
1447 | | - "# 2. Base64 encode the downloaded content\n", |
1448 | | - "encoded_content_bytes = base64.b64encode(pdf_content_bytes)\n", |
1449 | | - "encoded_string = encoded_content_bytes.decode(\"utf-8\")\n", |
1450 | | - "\n", |
1451 | | - "print(\"Download and encoding complete.\")\n", |
1452 | | - "\n", |
1453 | | - "pdf_doc_base64_uri = f\"data:application/pdf;base64,{encoded_string}\"\n", |
1454 | | - "\n", |
1455 | | - "# print(pdf_doc_base64_uri)" |
1456 | | - ] |
1457 | | - }, |
1458 | | - { |
1459 | | - "cell_type": "code", |
1460 | | - "execution_count": null, |
1461 | | - "metadata": { |
1462 | | - "id": "Tn3sECpRqToj" |
1463 | | - }, |
1464 | | - "outputs": [], |
1465 | | - "source": [ |
1466 | | - "# Get the access token\n", |
1467 | | - "process = subprocess.Popen(\n", |
1468 | | - " \"gcloud auth print-access-token\", stdout=subprocess.PIPE, shell=True\n", |
1469 | | - ")\n", |
1470 | | - "(access_token_bytes, err) = process.communicate()\n", |
1471 | | - "access_token = access_token_bytes.decode(\"utf-8\").strip() # Strip newline\n", |
1472 | | - "\n", |
1473 | | - "OCR_MODEL = \"mistral-ocr-2505\"\n", |
1474 | | - "url = f\"{ENDPOINT}/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/mistralai/models/{OCR_MODEL}:rawPredict\"\n", |
1475 | | - "data = {\n", |
1476 | | - " \"model\": OCR_MODEL,\n", |
1477 | | - " \"document\": {\"type\": \"document_url\", \"document_url\": pdf_doc_base64_uri},\n", |
1478 | | - "}\n", |
1479 | | - "headers = {\n", |
1480 | | - " \"Authorization\": f\"Bearer {access_token}\",\n", |
1481 | | - " \"Content-Type\": \"application/json\",\n", |
1482 | | - "}\n", |
1483 | | - "\n", |
1484 | | - "# Make the POST request\n", |
1485 | | - "response = requests.post(url, headers=headers, json=data)\n", |
1486 | | - "\n", |
1487 | | - "# Check status code and try to parse the response as JSON\n", |
1488 | | - "if response.status_code == 200:\n", |
1489 | | - " try:\n", |
1490 | | - " response_dict = response.json()\n", |
1491 | | - " print(response_dict)\n", |
1492 | | - " except json.JSONDecodeError as e:\n", |
1493 | | - " print(\"Error decoding JSON:\", e)\n", |
1494 | | - " print(\"Raw response:\", response.text) # Print raw response if parsing fails\n", |
1495 | | - "else:\n", |
1496 | | - " print(f\"Request failed with status code: {response.status_code}\")" |
1497 | | - ] |
1498 | | - }, |
1499 | | - { |
1500 | | - "cell_type": "markdown", |
1501 | | - "metadata": { |
1502 | | - "id": "CmpLW-IEqToj" |
1503 | | - }, |
1504 | | - "source": [ |
1505 | | - "To get more details on the options available when calling the OCR API, please refer to the [Mistral API documentation](https://docs.mistral.ai/api/#tag/ocr)." |
1506 | | - ] |
1507 | | - }, |
1508 | 1401 | { |
1509 | 1402 | "cell_type": "markdown", |
1510 | 1403 | "metadata": { |
|