GoogleCloudPlatform
diff --git a/notebooks/community/model_garden/model_garden_advanced_features.ipynb b/notebooks/community/model_garden/model_garden_advanced_features.ipynb
Lines changed: 3 additions & 3 deletions
diff --git a/notebooks/community/model_garden/model_garden_codegemma_deployment_on_vertex.ipynb b/notebooks/community/model_garden/model_garden_codegemma_deployment_on_vertex.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/notebooks/community/model_garden/model_garden_deployment_tutorial.ipynb b/notebooks/community/model_garden/model_garden_deployment_tutorial.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/notebooks/community/model_garden/model_garden_gemma2_deployment_on_vertex.ipynb b/notebooks/community/model_garden/model_garden_gemma2_deployment_on_vertex.ipynb
Lines changed: 2 additions & 2 deletions
diff --git a/notebooks/community/model_garden/model_garden_gemma_deployment_on_vertex.ipynb b/notebooks/community/model_garden/model_garden_gemma_deployment_on_vertex.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/notebooks/community/model_garden/model_garden_hexllm_deep_dive_tutorial.ipynb b/notebooks/community/model_garden/model_garden_hexllm_deep_dive_tutorial.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/notebooks/community/model_garden/model_garden_huggingface_pytorch_inference_deployment.ipynb b/notebooks/community/model_garden/model_garden_huggingface_pytorch_inference_deployment.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/notebooks/community/model_garden/model_garden_huggingface_tei_deployment.ipynb b/notebooks/community/model_garden/model_garden_huggingface_tei_deployment.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/notebooks/community/model_garden/model_garden_huggingface_tgi_deployment.ipynb b/notebooks/community/model_garden/model_garden_huggingface_tgi_deployment.ipynb
Lines changed: 1 addition & 1 deletion
diff --git a/notebooks/community/model_garden/model_garden_huggingface_vllm_deployment.ipynb b/notebooks/community/model_garden/model_garden_huggingface_vllm_deployment.ipynb
Lines changed: 1 addition & 1 deletion
@@ -109,9 +109,9 @@
         "- Custom model serving TPU v5e cores per region\n",
         "- Custom model serving Nvidia A100 80GB GPUs per region\n",
         "\n",
-        "By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4, which is sufficient for serving the Llama 3.1 8B model. The Llama 3.1 70B model requires 16 TPU v5e cores. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4, which is sufficient for serving the Llama 3.1 8B model. The Llama 3.1 70B model requires 16 TPU v5e cores. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
-        "The quota for A100_80GB deployment `Custom model serving Nvidia A100 80GB GPUs per region` is 0. You need to request at least 4 for 70B model and 1 for 8B model following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota)."
+        "The quota for A100_80GB deployment `Custom model serving Nvidia A100 80GB GPUs per region` is 0. You need to request at least 4 for 70B model and 1 for 8B model following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota)."
       ]
     },
     {
@@ -136,7 +136,7 @@
         "\n",
         "REGION = \"\"  # @param {type:\"string\"}\n",
         "\n",
-        "# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
         "# @markdown | ----------- | ----------- | ----------- |\n",
 
@@ -95,7 +95,7 @@
         "\n",
         "# @markdown 1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
         "\n",
-        "# @markdown 2. By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 2. By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown 3. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
         "\n",
 
@@ -97,7 +97,7 @@
         "\n",
         "REGION = \"\"  # @param {type:\"string\"}\n",
         "\n",
-        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
         "# @markdown | ----------- | ----------- | ----------- |\n",
 
@@ -104,7 +104,7 @@
       "source": [
         "# @title Request for TPU quota\n",
         "\n",
-        "# @markdown By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota)."
+        "# @markdown By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota)."
       ]
     },
     {
@@ -128,7 +128,7 @@
         "\n",
         "REGION = \"\"  # @param {type:\"string\"}\n",
         "\n",
-        "# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
         "# @markdown | ----------- | ----------- | ----------- |\n",
 
@@ -96,7 +96,7 @@
         "\n",
         "# @markdown 1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
         "\n",
-        "# @markdown 2. By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 2. By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown 3. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
         "\n",
 
@@ -106,7 +106,7 @@
         "# @markdown\n",
         "# @markdown By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. This will be lifted to 16 in the future.\n",
         "# @markdown Verify that you have the appropriate TPU quota for your chosen configuration (e.g., 1, 4, 8, or 16 cores) in the selected region.\n",
-        "# @markdown You can request for higher TPU quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota)."
+        "# @markdown You can request for higher TPU quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota)."
       ]
     },
     {
 
@@ -101,7 +101,7 @@
         "\n",
         "REGION = \"\"  # @param {type:\"string\"}\n",
         "\n",
-        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
         "# @markdown | ----------- | ----------- | ----------- |\n",
 
@@ -110,7 +110,7 @@
         "\n",
         "REGION = \"\"  # @param {type:\"string\"}\n",
         "\n",
-        "# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
         "# @markdown | ----------- | ----------- | ----------- |\n",
 
@@ -102,7 +102,7 @@
         "\n",
         "REGION = \"\"  # @param {type:\"string\"}\n",
         "\n",
-        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
         "# @markdown | ----------- | ----------- | ----------- |\n",
 
@@ -102,7 +102,7 @@
         "\n",
         "REGION = \"\"  # @param {type:\"string\"}\n",
         "\n",
-        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+        "# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
         "\n",
         "# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
         "# @markdown | ----------- | ----------- | ----------- |\n",
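Every hunk in this change performs the same substitution: the link label `Request a higher quota` becomes `Request a quota adjustment`, and the URL path `docs/quota/` becomes `docs/quotas/`. A minimal sketch of applying that substitution across the notebooks is shown below. It assumes the notebooks sit under `notebooks/community/model_garden/` and that a plain string replacement on the raw `.ipynb` text is sufficient; the helper names `migrate_quota_link` and `migrate_directory` are hypothetical, and this is not necessarily how the change was actually produced.

```python
from pathlib import Path

# Link text and URL exactly as they appear in the raw .ipynb JSON, where the
# quotes inside the Markdown link label are escaped as \" (raw strings keep
# the backslash).
OLD_LINK = r'[\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota)'
NEW_LINK = r'[\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota)'


def migrate_quota_link(text: str) -> tuple[str, int]:
    """Return (rewritten text, number of links replaced)."""
    count = text.count(OLD_LINK)
    return text.replace(OLD_LINK, NEW_LINK), count


def migrate_directory(root: Path) -> dict[str, int]:
    """Rewrite every notebook directly under `root` in place.

    Returns a mapping from file name to the number of links replaced.
    """
    changed: dict[str, int] = {}
    if not root.is_dir():
        return changed
    for path in sorted(root.glob("*.ipynb")):
        original = path.read_text(encoding="utf-8")
        updated, count = migrate_quota_link(original)
        if count:
            path.write_text(updated, encoding="utf-8")
            changed[path.name] = count
    return changed


if __name__ == "__main__":
    # Hypothetical invocation from the repository root.
    root = Path("notebooks/community/model_garden")
    for name, count in migrate_directory(root).items():
        print(f"{name}: {count} link(s) updated")
```

Because the new link does not contain the old one, running the sketch a second time replaces nothing, so it is safe to re-run.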
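The quota figures in the advanced-features notebook text can be turned into a quick pre-deployment check. The text states a default of 4 TPU v5e cores (sufficient for Llama 3.1 8B), 16 cores needed for Llama 3.1 70B, a default of 0 A100 80GB GPUs, and a request of at least 1 GPU for 8B and 4 for 70B. The sketch below encodes only those numbers. The dictionary keys and the `quota_shortfall` helper are hypothetical labels, it does not query any Google Cloud API, and it does not model the constraint that TPU quota is only available in `us-west1`.

```python
from typing import Optional

# Minimum quota values as stated in the notebook text for Llama 3.1.
# The model and accelerator keys are hypothetical labels.
# The text only says the default of 4 TPU v5e cores is "sufficient" for the
# 8B model, so 4 is used here as a conservative requirement.
REQUIRED_QUOTA = {
    ("llama-3.1-8b", "tpu_v5e_cores"): 4,
    ("llama-3.1-70b", "tpu_v5e_cores"): 16,
    ("llama-3.1-8b", "a100_80gb_gpus"): 1,
    ("llama-3.1-70b", "a100_80gb_gpus"): 4,
}

# Default per-region quotas quoted in the notebook text.
DEFAULT_QUOTA = {"tpu_v5e_cores": 4, "a100_80gb_gpus": 0}


def quota_shortfall(model: str, accelerator: str, current: Optional[int] = None) -> int:
    """Return how much extra quota to request (0 if the current quota suffices).

    If `current` is None, the default quota quoted in the notebook is assumed.
    """
    required = REQUIRED_QUOTA[(model, accelerator)]
    available = DEFAULT_QUOTA[accelerator] if current is None else current
    return max(0, required - available)


if __name__ == "__main__":
    for model in ("llama-3.1-8b", "llama-3.1-70b"):
        for accelerator in ("tpu_v5e_cores", "a100_80gb_gpus"):
            print(model, accelerator, quota_shortfall(model, accelerator))
```

A nonzero result is the amount to ask for through the quota adjustment process linked in the notebooks.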