Commit ff8a9b9

(WIP) [Mistral] - Add dedicated OCR notebook (#4088)
* Add dedicated OCR notebook
* - Remove OCR mentions in initial notebook
  - Clear outputs and variable names in OCR notebook
* Fix
* Fix
1 parent 5676d07 commit ff8a9b9

2 files changed

Lines changed: 657 additions & 108 deletions

notebooks/official/generative_ai/mistralai_intro.ipynb

Lines changed: 1 addition & 108 deletions
@@ -85,9 +85,6 @@
 "\n",
 "### Available Mistral AI models\n",
 "\n",
-"* ### Mistral OCR (25.05)\n",
-"Mistral OCR (25.05) is a model specialized in extracting text and images from documents. It is specifically built to preserve the structure of the document pages and automatically formats the extracted text in Markdown.\n",
-"\n",
 "* ### Mistral Small 3.1 (25.03)\n",
 "Mistral Small 3.1 (25.03) is the enhanced version of Mistral Small 3, featuring multimodal capabilities and an extended context length of up to 128k.\n",
 "\n",
@@ -174,16 +171,13 @@
 },
 "outputs": [],
 "source": [
-"MODEL = \"mistral-small-2503\" # @param [\"mistral-small-2503\", \"codestral-2501\", \"mistral-large-2411\", \"mistral-nemo\", \"mistral-ocr-2505\"]\n",
+"MODEL = \"mistral-small-2503\" # @param [\"mistral-small-2503\", \"codestral-2501\", \"mistral-large-2411\", \"mistral-nemo\"]\n",
 "if MODEL == \"mistral-small-2503\":\n",
 "    available_regions = [\"europe-west4\", \"us-central1\"]\n",
 "    available_versions = [\"latest\"]\n",
 "elif MODEL == \"mistral-large-2411\":\n",
 "    available_regions = [\"europe-west4\", \"us-central1\"]\n",
 "    available_versions = [\"latest\"]\n",
-"elif MODEL == \"mistral-ocr-2505\":\n",
-"    available_regions = [\"europe-west4\", \"us-central1\"]\n",
-"    available_versions = [\"latest\"]\n",
 "elif MODEL == \"mistral-nemo\":\n",
 "    available_regions = [\"europe-west4\", \"us-central1\"]\n",
 "    available_versions = [\"2407\"]\n",
@@ -300,15 +294,6 @@
 "import requests"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {
-"id": "Y_8xeTfWqToW"
-},
-"source": [
-"To try the OCR model, please refer to the section \"Performing OCR\"."
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -1413,98 +1398,6 @@
 "    print(f\"Request failed with status code: {response.status_code}\")"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {
-"id": "znQtFeXUqToe"
-},
-"source": [
-"### Performing OCR\n",
-"\n",
-"Mistral OCR (`mistral-ocr-2505`) is an Optical Character Recognition API that sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of documents: images, text, tables, equations, etc. It takes images and PDFs as input and extracts content in an ordered interleaved text and images.\n",
-"\n",
-"The following example showcases the application of the OCR API on a PDF file."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"id": "KRaYEGJ1qToe"
-},
-"outputs": [],
-"source": [
-"import base64\n",
-"\n",
-"# The URL you provided\n",
-"pdf_url = \"https://arxiv.org/pdf/2501.12948\"\n",
-"\n",
-"# 1. Download the file\n",
-"print(f\"Attempting to download from: {pdf_url}\")\n",
-"response = requests.get(pdf_url)\n",
-"pdf_content_bytes = response.content\n",
-"\n",
-"# 2. Base64 encode the downloaded content\n",
-"encoded_content_bytes = base64.b64encode(pdf_content_bytes)\n",
-"encoded_string = encoded_content_bytes.decode(\"utf-8\")\n",
-"\n",
-"print(\"Download and encoding complete.\")\n",
-"\n",
-"pdf_doc_base64_uri = f\"data:application/pdf;base64,{encoded_string}\"\n",
-"\n",
-"# print(pdf_doc_base64_uri)"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"id": "Tn3sECpRqToj"
-},
-"outputs": [],
-"source": [
-"# Get the access token\n",
-"process = subprocess.Popen(\n",
-"    \"gcloud auth print-access-token\", stdout=subprocess.PIPE, shell=True\n",
-")\n",
-"(access_token_bytes, err) = process.communicate()\n",
-"access_token = access_token_bytes.decode(\"utf-8\").strip() # Strip newline\n",
-"\n",
-"OCR_MODEL = \"mistral-ocr-2505\"\n",
-"url = f\"{ENDPOINT}/v1/projects/{PROJECT_ID}/locations/{LOCATION}/publishers/mistralai/models/{OCR_MODEL}:rawPredict\"\n",
-"data = {\n",
-"    \"model\": OCR_MODEL,\n",
-"    \"document\": {\"type\": \"document_url\", \"document_url\": pdf_doc_base64_uri},\n",
-"}\n",
-"headers = {\n",
-"    \"Authorization\": f\"Bearer {access_token}\",\n",
-"    \"Content-Type\": \"application/json\",\n",
-"}\n",
-"\n",
-"# Make the POST request\n",
-"response = requests.post(url, headers=headers, json=data)\n",
-"\n",
-"# Check status code and try to parse the response as JSON\n",
-"if response.status_code == 200:\n",
-"    try:\n",
-"        response_dict = response.json()\n",
-"        print(response_dict)\n",
-"    except json.JSONDecodeError as e:\n",
-"        print(\"Error decoding JSON:\", e)\n",
-"        print(\"Raw response:\", response.text) # Print raw response if parsing fails\n",
-"else:\n",
-"    print(f\"Request failed with status code: {response.status_code}\")"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {
-"id": "CmpLW-IEqToj"
-},
-"source": [
-"To get more details on the options available when calling the OCR API, please refer to the [Mistral API documentation](https://docs.mistral.ai/api/#tag/ocr)."
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {
