Commit 86674ef

Add and update existing vertex skills (#4467)
* Add and update existing vertex skills
  - Add support for fine tuning for 1p Gemini tuning
  - Add support for deploying fine-tuned models
  - Add support for running inference on MaaS models
  - Add open model support for regions and cost estimating for 3p tuning
* Fix some of the commit errors
* Update scripts to use the existing Gemini 1.5 Pro model
* Swap Gemini 1.5 Pro to Gemini 2.5 Pro
1 parent 8b4708c commit 86674ef

23 files changed

Lines changed: 2185 additions & 214 deletions

skills/vertex-deploy/SKILL.md

Lines changed: 301 additions & 0 deletions
<!--
Copyright 2026 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

---
name: Vertex AI Model Garden Deploy
description: Deploy open models or custom weights to Vertex AI endpoints.
---

# Vertex AI Model Garden Deploy Skill

This skill provides instructions for deploying open models from Vertex AI
Model Garden to endpoints, and subsequently undeploying them to clean up
resources.

## 1. Prerequisites

Before deploying, ensure you have the correct project and region set. The
commands below use the placeholder variables `PROJECT_ID` and `LOCATION_ID`.

Ensure you are authenticated:

```bash
gcloud auth login
gcloud auth application-default login
gcloud config set project $PROJECT_ID
```

## 2. Discovering Deployable Models

You can list models available in Model Garden and check whether they can be
self-deployed:

```bash
gcloud ai model-garden models list
```

To see which machine types and accelerators are supported for a specific model
(e.g., `google/gemma3@gemma-3-27b-it`):

```bash
gcloud ai model-garden models list-deployment-config \
    --model="google/gemma3@gemma-3-27b-it"
```

> [!NOTE]
> Some models, especially Hugging Face models, might require a Hugging Face
> Access Token for deployment.

> [!TIP]
> **Model Recommendation Instructions:** If a user asks to deploy a model but
> **does not specify which one**, you should recommend a model based on their
> use case (e.g., Llama 3.3 70B for general-purpose work or Gemma 3 for
> lightweight tasks).
>
> * You **MUST** ensure you are recommending the **latest version** or
>   **popular version** of the suggested model family.
> * You **MUST** verify the model is currently deployable using
>   `gcloud ai model-garden models list` before suggesting it to the user.

## 3. Deploying a Model

> [!WARNING]
> Deploying models, especially large ones, consumes significant compute
> resources and incurs costs.
>
> 1. You **MUST** refer to
>    [Vertex AI prediction pricing](https://cloud.google.com/vertex-ai/pricing#prediction-and-explanation)
>    to calculate a rough cost estimate based on the requested
>    `--machine-type` and `--accelerator-type` (and count).
> 2. You **MUST** present this cost estimate to the user and warn them that it
>    is the **list price**, which may differ from their actual bill due to
>    potential discounts or reservations.
> 3. You **MUST ALWAYS** request explicit confirmation from the user agreeing
>    to the estimated cost before executing any `deploy` command.

To deploy a model, use the `deploy` command. It is highly recommended to use
the `--asynchronous` flag for long-running deployments, and then poll the
status if necessary.

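As a rough illustration of the cost estimate described above, here is a minimal Python sketch. The hourly rates are **hypothetical placeholders**, not actual Vertex AI prices; always take current rates from the pricing page.

```python
# Rough list-price estimator for an endpoint deployment.
# NOTE: these rates are HYPOTHETICAL placeholders, not real Vertex AI
# prices -- look up the current rates on the pricing page.
HOURLY_RATES_USD = {
    "g2-standard-48": 4.00,   # placeholder machine-type rate
    "NVIDIA_L4": 0.70,        # placeholder per-accelerator rate
}

def estimate_monthly_cost(machine_type: str, accelerator_type: str,
                          accelerator_count: int, hours: float = 730.0) -> float:
    """Return an estimated monthly list price (USD) for an always-on endpoint."""
    hourly = (HOURLY_RATES_USD[machine_type]
              + HOURLY_RATES_USD[accelerator_type] * accelerator_count)
    return round(hourly * hours, 2)

cost = estimate_monthly_cost("g2-standard-48", "NVIDIA_L4", accelerator_count=4)
print(f"Estimated monthly list price: ${cost}")
```

Present a figure like this to the user as a list-price estimate only, and get explicit confirmation before deploying.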
### Example: Deploying Gemma 3

Here is a typical bash script to deploy a model. You can run this block
directly.

```bash
#!/bin/bash
# Example script to deploy a model from Model Garden

PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"  # Recommended default region
MODEL_ID="google/gemma3@gemma-3-27b-it"  # Replace with your chosen model ID

echo "Deploying model $MODEL_ID to project $PROJECT_ID in $LOCATION_ID..."

# If the hardware parameters are omitted, Model Garden can automatically
# select the required hardware based on the list-deployment-config output.
# Below is a comprehensive command with all supported parameters:
gcloud ai model-garden models deploy \
    --project=$PROJECT_ID \
    --region=$LOCATION_ID \
    --model=$MODEL_ID \
    --machine-type="g2-standard-48" \
    --accelerator-type="NVIDIA_L4" \
    --accelerator-count=4 \
    --endpoint-display-name="my-gemma-deployment" \
    --hugging-face-access-token="YOUR_HF_TOKEN" \
    --reservation-affinity="reservation-affinity-type=specific-reservation,key=compute.googleapis.com/reservation-name,values=my-reservation" \
    --asynchronous

echo "Deployment initiated asynchronously."
echo "Check the Google Cloud Console (Vertex AI -> Online Prediction) for status."
```

### Example: Deploying Custom Weights

To deploy a model using custom weights, use the exact same `deploy` command.
Instead of providing the Model Garden model ID, provide the Google Cloud
Storage (GCS) URI of your custom weights folder in the `--model` flag.

```bash
#!/bin/bash
# Example script to deploy a model with custom weights from a GCS bucket

PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
# Replace with the gs:// URI pointing to your custom weights
MODEL_GCS_URI="gs://your-bucket-name/path/to/custom-weights"

echo "Deploying custom model from $MODEL_GCS_URI to project $PROJECT_ID in $LOCATION_ID..."

gcloud ai model-garden models deploy \
    --project=$PROJECT_ID \
    --region=$LOCATION_ID \
    --model=$MODEL_GCS_URI \
    --machine-type="g2-standard-12" \
    --accelerator-type="NVIDIA_L4" \
    --endpoint-display-name="my-custom-model" \
    --asynchronous

echo "Deployment initiated asynchronously."
```

## 4. Checking Deployment Status

When you deploy a model asynchronously using the `--asynchronous` flag, the
`deploy` command returns an operation ID. You can use this ID to check the
ongoing status of the deployment:

```bash
gcloud ai operations describe YOUR_OPERATION_ID \
    --region=$LOCATION_ID
```

> [!NOTE]
> As an agent, you can also offer to check the status of a deployment for the
> user if they provide an operation ID or if they just initiated the
> deployment with you.

Alternatively, you can list your endpoints to see whether the new endpoint
shows up, and check the Cloud Console under the "Online prediction" tab:

```bash
gcloud ai endpoints list \
    --region=$LOCATION_ID
```

Note: Large models (like Llama 3.1 8B or Gemma 27B) may take 15-20 minutes to
fully deploy and start serving.

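As a sketch of how an agent might check the operation status programmatically, the snippet below parses a hypothetical sample of the JSON that `gcloud ai operations describe ... --format=json` could return. Long-running operations generally follow the `google.longrunning.Operation` shape (a `name` field plus a `done` flag once finished), but the exact fields in your output may differ.

```python
import json

# HYPOTHETICAL sample of `gcloud ai operations describe ... --format=json`;
# real output contains additional fields. "done" is absent while the
# operation is still running and true once it completes.
sample = '''
{
  "name": "projects/my-project/locations/us-central1/operations/1234567890",
  "done": true
}
'''

def deployment_finished(operation_json: str) -> bool:
    """Return True once the operation reports done; False while running."""
    return bool(json.loads(operation_json).get("done", False))

print(deployment_finished(sample))
```

If the operation is not yet done, wait and re-run the `describe` command rather than assuming failure; large deployments can take 15-20 minutes.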
### Verifying Deployment

If the model deployed successfully, verify it by making a test prediction
call. Because Model Garden models are often deployed to dedicated endpoints,
you shouldn't use `gcloud ai endpoints predict`. Instead, fetch the endpoint's
dedicated DNS name and send a `curl` request.

> [!TIP]
> Ask the user to try their own prompt to see the results. Otherwise, use the
> default.

Use the following script:

```bash
#!/bin/bash
PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
ENDPOINT_ID="YOUR_ENDPOINT_ID"
PROMPT=${1:-"Explain quantum computing in simple terms."}

echo "Fetching dedicated endpoint DNS..."
ENDPOINT_URL=$(gcloud ai endpoints describe $ENDPOINT_ID --project=$PROJECT_ID --region=$LOCATION_ID --format="value(dedicatedEndpointDns)")

if [ -z "$ENDPOINT_URL" ]; then
    echo "Error: Could not retrieve a dedicated endpoint URL. Verify your ENDPOINT_ID."
    exit 1
fi

echo "Sending prediction request to $ENDPOINT_URL..."
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/endpoints/${ENDPOINT_ID}/chat/completions" \
    -d '{
        "model": "'"$ENDPOINT_ID"'",
        "messages": [
            {
                "role": "user",
                "content": "'"$PROMPT"'"
            }
        ]
    }'
```

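If you prefer to build the request body programmatically rather than inline in `curl`, this minimal Python sketch constructs the same chat-completions payload as the script above (the endpoint ID and prompt are placeholders):

```python
import json

def build_chat_payload(endpoint_id: str, prompt: str) -> str:
    """Build the same chat-completions JSON body the curl example sends."""
    return json.dumps({
        "model": endpoint_id,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_chat_payload("YOUR_ENDPOINT_ID",
                          "Explain quantum computing in simple terms.")
print(body)
```

Using `json.dumps` instead of hand-quoted shell strings also avoids escaping bugs when the user's prompt contains quotes.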
## 5. Undeploying and Cleaning Up

To stop incurring charges, you must undeploy the model from the endpoint. This
is a multi-step process if you don't already have the exact endpoint and
deployed-model IDs.

### Example: Finding and Undeploying a Model

Here is a bash script demonstrating how to find the IDs and undeploy the
model.

```bash
#!/bin/bash
# Example script to undeploy a model

PROJECT_ID=$(gcloud config get-value project)
LOCATION_ID="us-central1"
# The model ID is sometimes listed without the provider prefix, or exactly as
# shown by `describe`; it's usually easiest to find the specific ID via
# `gcloud ai models list`.
# For this example, assume we know the exact Endpoint ID and Deployed Model ID.

# 1. Find the Endpoint ID
echo "Listing endpoints in $LOCATION_ID:"
gcloud ai endpoints list --project=$PROJECT_ID --region=$LOCATION_ID

# (Assuming you extracted ENDPOINT_ID from the above output)
# ENDPOINT_ID="your_endpoint_id"

# 2. Find the Deployed Model ID
echo "Listing models in $LOCATION_ID to find the model description:"
gcloud ai models list --project=$PROJECT_ID --region=$LOCATION_ID

# (Assuming you found the specific MODEL_ID)
# MODEL_ID="your_model_id"
# gcloud ai models describe $MODEL_ID --project=$PROJECT_ID --region=$LOCATION_ID
# (Extract the deployedModelId from the output)
# DEPLOYED_MODEL_ID="your_deployed_model_id"

# 3. Undeploy
# Uncomment and replace the variables below to actually perform the undeployment
# echo "Undeploying model $DEPLOYED_MODEL_ID from endpoint $ENDPOINT_ID..."
# gcloud ai endpoints undeploy-model $ENDPOINT_ID \
#     --project=$PROJECT_ID \
#     --region=$LOCATION_ID \
#     --deployed-model-id=$DEPLOYED_MODEL_ID
#
# echo "Model undeployed."

# 4. Delete Endpoint
# echo "Deleting endpoint $ENDPOINT_ID..."
# gcloud ai endpoints delete $ENDPOINT_ID \
#     --project=$PROJECT_ID \
#     --region=$LOCATION_ID \
#     --quiet
# echo "Endpoint deleted."

# 5. Delete Model
# echo "Deleting model $MODEL_ID..."
# gcloud ai models delete $MODEL_ID \
#     --project=$PROJECT_ID \
#     --region=$LOCATION_ID \
#     --quiet
# echo "Model deleted."
```

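When scripting the lookup instead of reading the list output by eye, something like the following sketch can extract the numeric endpoint ID by display name from `gcloud ai endpoints list --format=json` output. The sample JSON here is a trimmed, hypothetical example; the real output carries many more fields.

```python
import json

# HYPOTHETICAL, trimmed sample of `gcloud ai endpoints list --format=json`.
# The fields we rely on are "name" (full resource path) and "displayName".
sample = '''
[
  {
    "name": "projects/123/locations/us-central1/endpoints/5555555555",
    "displayName": "my-gemma-deployment"
  }
]
'''

def find_endpoint_id(endpoints_json: str, display_name: str):
    """Return the numeric endpoint ID matching display_name, or None."""
    for endpoint in json.loads(endpoints_json):
        if endpoint.get("displayName") == display_name:
            # The ID is the last segment of the resource path.
            return endpoint["name"].rsplit("/", 1)[-1]
    return None

print(find_endpoint_id(sample, "my-gemma-deployment"))
```

The extracted ID can then be substituted for `ENDPOINT_ID` in the undeploy commands above.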
> [!WARNING]
> Failing to undeploy a model will result in continuous charges for the
> allocated compute resources, even if you are not sending prediction
> requests. Always clean up after testing.

## 6. Troubleshooting

### Deployment Failure: Quota or Resource Exhausted

If your deployment fails (or stays in an error state) with `QUOTA_EXCEEDED` or
`RESOURCE_EXHAUSTED` errors, the specific hardware requested (e.g., `NVIDIA_L4`
or `g2-standard-24`) is either not available in your chosen region or exceeds
your project's quota limits.

**Solution:** Look closely at the error message returned. It will often
recommend an alternative region or machine type that currently has
availability. **Ask the user for confirmation** to retry the deployment using
the suggested `--region` or `--machine-type` parameters.

> [!WARNING]
> If the alternative suggestions involve changing the machine type or
> accelerator, you **MUST** recalculate the estimated cost using
> [Vertex AI prediction pricing](https://cloud.google.com/vertex-ai/pricing#prediction-and-explanation),
> warn the user about list prices versus actual billing, and get their
> explicit confirmation for the new cost before retrying the deployment.
