@@ -3,7 +3,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "DZ1j6RRg-Td6",
"metadata": {
"cellView": "form",
"id": "f705f4be70e9"
@@ -27,7 +26,6 @@
},
{
"cell_type": "markdown",
-"id": "99c1c3fc2ca5",
"metadata": {
"id": "71a642b5575a"
},
@@ -50,7 +48,6 @@
},
{
"cell_type": "markdown",
-"id": "f9-tJ6RfDLIs",
"metadata": {
"id": "0779b48f654e"
},
@@ -88,7 +85,6 @@
},
{
"cell_type": "markdown",
-"id": "47GcOrZjosOx",
"metadata": {
"id": "69453bf7230e"
},
@@ -98,7 +94,6 @@
},
{
"cell_type": "markdown",
-"id": "1D_pWejJPHP3",
"metadata": {
"id": "bf3706e69f61"
},
@@ -109,15 +104,14 @@
"- Custom model serving TPU v5e cores per region\n",
"- Custom model serving Nvidia A100 80GB GPUs per region\n",
"\n",
-"By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4, which is sufficient for serving the Llama 3.1 8B model. The Llama 3.1 70B model requires 16 TPU v5e cores. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+"By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4, which is sufficient for serving the Llama 3.1 8B model. The Llama 3.1 70B model requires 16 TPU v5e cores. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
"\n",
-"The quota for A100_80GB deployment `Custom model serving Nvidia A100 80GB GPUs per region` is 0. You need to request at least 4 for 70B model and 1 for 8B model following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota)."
+"The quota for A100_80GB deployment `Custom model serving Nvidia A100 80GB GPUs per region` is 0. You need to request at least 4 for 70B model and 1 for 8B model following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota)."
]
},
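The quota figures quoted in the hunk above (4 TPU v5e cores for Llama 3.1 8B, 16 for 70B; 1 A100 80GB GPU for 8B, 4 for 70B) can be captured in a small lookup helper. This is an illustrative sketch with the counts hard-coded from the text above — `required_quota` is a hypothetical helper, not part of any Google Cloud SDK:

```python
# Minimum per-region serving quota to request, per (model, accelerator).
# Illustrative constants copied from the quota notes above, not read from
# any Google Cloud API.
ACCELERATOR_REQUIREMENTS = {
    ("llama-3.1-8b", "TPU_V5E"): 4,
    ("llama-3.1-70b", "TPU_V5E"): 16,
    ("llama-3.1-8b", "NVIDIA_A100_80GB"): 1,
    ("llama-3.1-70b", "NVIDIA_A100_80GB"): 4,
}


def required_quota(model: str, accelerator: str) -> int:
    """Return the minimum per-region serving quota to request."""
    try:
        return ACCELERATOR_REQUIREMENTS[(model, accelerator)]
    except KeyError:
        raise ValueError(f"No sizing entry for {model} on {accelerator}")


print(required_quota("llama-3.1-70b", "TPU_V5E"))  # 16
```

Comparing the returned count against your current quota in the selected region tells you whether a quota adjustment request is needed before deploying.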
{
"cell_type": "code",
"execution_count": null,
-"id": "L3dqbxovo5t6",
"metadata": {
"cellView": "form",
"id": "50047cc80bb9"
@@ -136,7 +130,7 @@
"\n",
"REGION = \"\" # @param {type:\"string\"}\n",
"\n",
-"# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+"# @markdown 4. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
"\n",
"# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
"# @markdown | ----------- | ----------- | ----------- |\n",
@@ -223,7 +217,6 @@
},
{
"cell_type": "markdown",
-"id": "SeGqxuMfRBS5",
"metadata": {
"id": "4782dd003acb"
},
@@ -234,7 +227,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "BxlzWU2KQqmw",
"metadata": {
"cellView": "form",
"id": "798068fc0355"
@@ -274,7 +266,6 @@
},
{
"cell_type": "markdown",
-"id": "JpNBJJgjWL7j",
"metadata": {
"id": "10ed490e28e5"
},
@@ -302,7 +293,6 @@
},
{
"cell_type": "markdown",
-"id": "9gZJ8cB27e1m",
"metadata": {
"id": "30ddb93fdd7b"
},
@@ -317,7 +307,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "RpmoA2nXjdCd",
"metadata": {
"cellView": "form",
"id": "b56d82c1aa6f"
@@ -506,7 +495,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "5QoK8c0R9U3B",
"metadata": {
"cellView": "form",
"id": "96c5afed49b4"
@@ -572,7 +560,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "29rn5ATmB2YC",
"metadata": {
"cellView": "form",
"id": "9a95c9f90358"
@@ -646,7 +633,6 @@
},
{
"cell_type": "markdown",
-"id": "KjbM8E9DGuuR",
"metadata": {
"id": "12ad6d1ff725"
},
@@ -657,7 +643,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "JpLU7GRQGuuR",
"metadata": {
"cellView": "form",
"id": "1ab4e3bb74b4"
@@ -684,7 +669,6 @@
},
{
"cell_type": "markdown",
-"id": "XZ33HhYmOxCS",
"metadata": {
"id": "7a8a9a1b2ddf"
},
@@ -706,7 +690,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "E8OiHHNNE_wj",
"metadata": {
"cellView": "form",
"id": "4425cc0bdedc"
@@ -910,7 +893,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "zex1oXl36A70",
"metadata": {
"cellView": "form",
"id": "bcbafec839cd"
@@ -973,7 +955,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "gDOC_nfsJeUR",
"metadata": {
"cellView": "form",
"id": "e984f43422d5"
@@ -1047,7 +1028,6 @@
},
{
"cell_type": "markdown",
-"id": "GdGxaTirJeUR",
"metadata": {
"id": "dff0d10dcc20"
},
@@ -1058,7 +1038,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "OgoqXE-VJeUR",
"metadata": {
"cellView": "form",
"id": "5b8751773e7f"
@@ -1085,7 +1064,6 @@
},
{
"cell_type": "markdown",
-"id": "w4Guijaw_NEs",
"metadata": {
"id": "863775857a46"
},
@@ -1100,7 +1078,6 @@
},
{
"cell_type": "markdown",
-"id": "ml8fgoIQWSbY",
"metadata": {
"id": "565cbdc3a06b"
},
@@ -1142,7 +1119,6 @@
},
{
"cell_type": "markdown",
-"id": "NmWRro8Q-Td6",
"metadata": {
"id": "94eaa9050abb"
},
@@ -1153,7 +1129,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "72d1GlrYifKU",
"metadata": {
"cellView": "form",
"id": "5f358cc230a6"
@@ -1470,7 +1445,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "CNiItf5hdVFU",
"metadata": {
"cellView": "form",
"id": "be3170e0e05a"
@@ -1546,7 +1520,6 @@
},
{
"cell_type": "markdown",
-"id": "WahYGAZyq6Gl",
"metadata": {
"id": "30c5d2535df3"
},
@@ -1556,7 +1529,6 @@
},
{
"cell_type": "markdown",
-"id": "bV5Yjkgav9BZ",
"metadata": {
"id": "63c10917ff95"
},
@@ -1567,7 +1539,6 @@
{
"cell_type": "code",
"execution_count": null,
-"id": "qsks36cOH9rb",
"metadata": {
"cellView": "form",
"id": "92892e1b1730"
@@ -95,7 +95,7 @@
"\n",
"# @markdown 1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
"\n",
-"# @markdown 2. By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+"# @markdown 2. By default, the quota for TPU deployment `Custom model serving TPU v5e cores per region` is 4. TPU quota is only available in `us-west1`. You can request for higher TPU quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
"\n",
"# @markdown 3. **[Optional]** [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
"\n",
@@ -434,7 +434,6 @@
" )\n",
" return model, endpoint\n",
"\n",
-"\n",
"models[\"hexllm_tpu\"], endpoints[\"hexllm_tpu\"] = deploy_model_hexllm(\n",
" model_name=common_util.get_job_name_with_datetime(prefix=MODEL_ID),\n",
" model_id=model_id,\n",
@@ -659,6 +658,7 @@
" vllm_args.append(\"--enable-auto-tool-choice\")\n",
" vllm_args.append(\"--tool-call-parser=vertex-llama-3\")\n",
"\n",
+"\n",
" env_vars = {\n",
" \"MODEL_ID\": base_model_id,\n",
" \"DEPLOY_SOURCE\": \"notebook\",\n",
@@ -704,7 +704,6 @@
"\n",
" return model, endpoint\n",
"\n",
-"\n",
"models[\"vllm_gpu\"], endpoints[\"vllm_gpu\"] = deploy_model_vllm(\n",
" model_name=common_util.get_job_name_with_datetime(prefix=\"codegemma-serve-vllm\"),\n",
" model_id=model_id,\n",
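The hunks above toggle vLLM server flags such as `--enable-auto-tool-choice` and `--tool-call-parser=vertex-llama-3` before passing them to the serving container. A minimal sketch of how such an argument list is typically assembled — the flag strings are copied from the diff, while `build_vllm_args` and its `enable_tool_calling` switch are hypothetical names for illustration:

```python
def build_vllm_args(
    model_id: str,
    tensor_parallel_size: int = 1,
    enable_tool_calling: bool = False,
) -> list[str]:
    """Assemble CLI args for a vLLM serving container (illustrative)."""
    # Base serving arguments; names mirror common vLLM server flags.
    vllm_args = [
        f"--model={model_id}",
        f"--tensor-parallel-size={tensor_parallel_size}",
    ]
    # Tool calling is opt-in, matching the conditional appends in the diff.
    if enable_tool_calling:
        vllm_args.append("--enable-auto-tool-choice")
        vllm_args.append("--tool-call-parser=vertex-llama-3")
    return vllm_args


args = build_vllm_args("meta-llama/Llama-3.1-8B", enable_tool_calling=True)
```

Keeping the conditional flags in one helper like this makes it easy to see which serving features a given deployment call actually enables.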
@@ -97,7 +97,7 @@
"\n",
"REGION = \"\" # @param {type:\"string\"}\n",
"\n",
-"# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a higher quota\"](https://cloud.google.com/docs/quota/view-manage#requesting_higher_quota).\n",
+"# @markdown 3. If you want to run predictions with A100 80GB or H100 GPUs, we recommend using the regions listed below. **NOTE:** Make sure you have associated quota in selected regions. Click the links to see your current quota for each GPU type: [Nvidia A100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_a100_80gb_gpus), [Nvidia H100 80GB](https://console.cloud.google.com/iam-admin/quotas?metric=aiplatform.googleapis.com%2Fcustom_model_serving_nvidia_h100_gpus). You can request for quota following the instructions at [\"Request a quota adjustment\"](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota).\n",
"\n",
"# @markdown | Machine Type | Accelerator Type | Recommended Regions |\n",
"# @markdown | ----------- | ----------- | ----------- |\n",
@@ -176,9 +176,7 @@
"# @markdown You can also filter by model name.\n",
"model_filter = \"gemma\" # @param {type:\"string\"}\n",
"\n",
-"model_garden.list_deployable_models(\n",
-" list_hf_models=list_hf_models, model_filter=model_filter\n",
-")"
+"model_garden.list_deployable_models(list_hf_models=list_hf_models, model_filter=model_filter)"
]
},
{
@@ -228,10 +226,10 @@
"endpoints[LABEL] = model.deploy(\n",
" hugging_face_access_token=HF_TOKEN,\n",
" use_dedicated_endpoint=use_dedicated_endpoint,\n",
-" accept_eula=True, # Accept the End User License Agreement (EULA) on the model card before deploy. Otherwise, the deployment will be forbidden.\n",
+" accept_eula = True, # Accept the End User License Agreement (EULA) on the model card before deploy. Otherwise, the deployment will be forbidden.\n",
")\n",
"\n",
-"endpoint = endpoints[LABEL]"
+"endpoint=endpoints[LABEL]"
]
},
{
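Once `model.deploy(...)` in the hunk above returns, predictions go through the resulting endpoint's `predict(instances=...)` call. A sketch of building one instance payload — the field names (`prompt`, `max_tokens`, `temperature`) follow a common convention for vLLM-served text models and are assumptions, not schemas taken from this diff:

```python
def build_instance(
    prompt: str,
    max_tokens: int = 128,
    temperature: float = 0.7,
) -> dict:
    """Build one prediction instance (illustrative payload shape)."""
    # Typical request fields for a vLLM-served text model; adjust to the
    # schema your deployed serving container actually expects.
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


instances = [build_instance("Write a haiku about GPU quotas.")]
# response = endpoint.predict(instances=instances)  # needs a live endpoint
```

Keeping payload construction separate from the deploy code makes it easy to reuse the same instances against different endpoints (e.g. the TPU and GPU deployments created earlier in the notebook).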