IntegratedML Custom Models allow you to bring your own Python machine learning models directly into IRIS SQL workflows. This enables in-database machine learning without data movement.
```sql
CREATE MODEL YourModelName
PREDICTING (target_column)
FROM YourTable
USING YourCustomModelClass
WITH (parameter1=value1, parameter2=value2)
```

```sql
SELECT id, feature1, feature2,
       PREDICT(YourModelName) AS prediction
FROM NewData
```

```sql
VALIDATE MODEL YourModelName
FROM TestData
```

Your Python model must:
1. Inherit from IntegratedML base classes:
   - `ClassificationModel` for classification tasks
   - `RegressionModel` for regression tasks
   - `EnsembleModel` for ensemble approaches

2. Implement required methods:
   - `fit(X, y)` - Train the model
   - `predict(X)` - Make predictions
   - `_validate_parameters()` - Validate configuration

3. Be scikit-learn compatible for integration with IRIS
```python
from shared.models.classification import ClassificationModel


class CustomCreditRiskClassifier(ClassificationModel):
    def __init__(self, enable_debt_ratio=True, decision_threshold=0.5):
        super().__init__()
        self.enable_debt_ratio = enable_debt_ratio
        self.decision_threshold = decision_threshold
        self.model = None

    def fit(self, X, y):
        # Custom feature engineering
        X_engineered = self._engineer_features(X)

        # Train your model
        from sklearn.ensemble import RandomForestClassifier
        self.model = RandomForestClassifier()
        self.model.fit(X_engineered, y)
        return self

    def predict(self, X):
        X_engineered = self._engineer_features(X)
        probabilities = self.model.predict_proba(X_engineered)[:, 1]
        return (probabilities > self.decision_threshold).astype(int)

    def _engineer_features(self, X):
        # Your custom feature engineering logic
        return X  # Simplified for example
```

Custom models are registered with IRIS through the following process:
1. **Model Placement**: Place your Python model files in the IRIS container at `/usr/irissys/mgr/python/custom_models/`. Models must be organized by type (classifiers, regressors, etc.).

2. **Model Discovery**: IRIS automatically discovers models that:
   - Inherit from `sklearn.base.BaseEstimator`
   - Implement required `fit()` and `predict()` methods
   - Are placed in the correct directory structure

3. **SQL Registration**: Register the model using the JSON `USING` clause:

   ```sql
   CREATE MODEL YourModelName
   PREDICTING (target_column)
   FROM YourTable
   USING {
       "model_name": "YourCustomModelClass",
       "path_to_classifiers": "/path/to/models",
       "isc_models_disabled": 1,
       "user_params": {
           "param1": value1,
           "param2": value2
       }
   }
   ```
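The discovery rules above can be checked locally before any files are copied into the container. A minimal sketch, assuming scikit-learn is installed; the `looks_discoverable` helper is illustrative, not part of IRIS:

```python
from sklearn.base import BaseEstimator


def looks_discoverable(model_cls) -> bool:
    """Heuristic mirroring the discovery rules: the class must inherit
    from sklearn.base.BaseEstimator and define fit()/predict()."""
    return (
        issubclass(model_cls, BaseEstimator)
        and callable(getattr(model_cls, "fit", None))
        and callable(getattr(model_cls, "predict", None))
    )


class GoodModel(BaseEstimator):
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0] * len(X)


class BadModel:  # no BaseEstimator, no predict()
    def fit(self, X, y):
        return self


print(looks_discoverable(GoodModel))  # True
print(looks_discoverable(BadModel))   # False
```

Running a check like this in CI catches models that would silently fail discovery after deployment.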
1. **Development**:
   - Develop your model following scikit-learn conventions
   - Test locally with sample data
   - Ensure all dependencies are available in the IRIS Python environment

2. **Container Deployment**:
   - Copy model files to the IRIS container
   - Install any additional Python dependencies
   - Create required directory symlinks if needed

3. **Model Training**:

   ```sql
   TRAIN MODEL YourModelName
   ```

   - IRIS loads your Python class
   - Executes the `fit()` method with training data
   - Serializes the trained model for persistence

4. **Production Use**:

   ```sql
   SELECT PREDICT(YourModelName) AS prediction FROM ProductionData
   ```
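The Development step above can be verified with a quick local smoke test on synthetic data before anything touches the container. This sketch uses a plain scikit-learn classifier as a stand-in for the real custom class (`StandInClassifier` is a hypothetical name; no IRIS required):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class StandInClassifier:
    """Stand-in with the same fit()/predict() surface as a custom model."""

    def __init__(self, decision_threshold=0.5):
        self.decision_threshold = decision_threshold
        self.model = None

    def fit(self, X, y):
        self.model = RandomForestClassifier(n_estimators=10, random_state=0)
        self.model.fit(X, y)
        return self

    def predict(self, X):
        probabilities = self.model.predict_proba(X)[:, 1]
        return (probabilities > self.decision_threshold).astype(int)


# Smoke test: train on synthetic data, check output shape and label range.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

preds = StandInClassifier().fit(X, y).predict(X)
assert preds.shape == (200,)
assert set(np.unique(preds)) <= {0, 1}
print("local smoke test passed")
```

A test like this confirms the fit/predict surface behaves before the slower container deploy-and-train loop begins.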
1. **Version Control**:
   - Models are versioned through the file system
   - Use semantic versioning in model class names (e.g., `ModelV1`, `ModelV2`)
   - IRIS maintains model state between training and prediction

2. **Model Updates**:
   - To update a model, create a new version with a different name
   - Train the new model version
   - Update SQL queries to use the new model name
   - Old models remain available until explicitly dropped

3. **Model Retirement**:

   ```sql
   DROP MODEL OldModelName
   ```
1. **Code Execution**:
   - Models execute with IRIS process privileges
   - Ensure models don't access unauthorized resources
   - Validate all model inputs to prevent injection attacks

2. **Data Access**:
   - Models only access data provided through SQL
   - No direct file system or network access is recommended
   - Use IRIS security features to control data access

3. **Dependency Management**:
   - Audit all Python dependencies for vulnerabilities
   - Use only trusted packages from official repositories
   - Keep dependencies updated with security patches
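The input-validation advice above can be made concrete in `_validate_parameters()` and at prediction time. A minimal sketch; the `SafeModelMixin` class and the `_validate_inputs` helper are hypothetical names, not an IRIS API:

```python
import numbers

import numpy as np


class SafeModelMixin:
    """Illustrative parameter and input validation for a custom model."""

    def _validate_parameters(self):
        # Reject non-numeric or out-of-range thresholds.
        if not isinstance(self.decision_threshold, numbers.Real):
            raise TypeError("decision_threshold must be numeric")
        if not 0.0 < float(self.decision_threshold) < 1.0:
            raise ValueError("decision_threshold must be in (0, 1)")

    @staticmethod
    def _validate_inputs(X):
        # Only accept finite numeric arrays: no NaN/inf, no object dtype,
        # which keeps arbitrary Python objects out of the prediction path.
        X = np.asarray(X)
        if X.dtype == object:
            raise TypeError("object-dtype input is not allowed")
        if not np.isfinite(X).all():
            raise ValueError("input contains NaN or inf")
        return X


class Demo(SafeModelMixin):
    def __init__(self, decision_threshold=0.5):
        self.decision_threshold = decision_threshold
        self._validate_parameters()


Demo(0.4)  # valid threshold, accepted
try:
    Demo(1.5)  # invalid threshold, rejected
except ValueError as e:
    print("rejected:", e)
```

Failing fast on bad parameters and inputs keeps errors at the SQL boundary instead of deep inside a trained model.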
1. **Model Design**:
   - Keep models lightweight for low-latency predictions
   - Implement efficient feature engineering in `_engineer_features()`
   - Use vectorized operations with NumPy/pandas

2. **Caching Strategies**:
   - Cache computed features when possible
   - Use model warm-up for initial predictions
   - Consider batch predictions for bulk operations

3. **Resource Management**:
   - Monitor memory usage during training
   - Implement proper cleanup in model destructors
   - Use IRIS monitoring tools to track performance

4. **Best Practices**:
   - Test prediction latency before production deployment
   - Profile model performance with realistic data volumes
   - Optimize feature engineering pipelines
   - Consider model complexity vs. accuracy trade-offs
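The vectorization and feature-caching advice above can be sketched together. The `engineer_features_cached` helper and its hash-keyed cache are illustrative assumptions, not an IRIS or IntegratedML facility:

```python
import numpy as np
import pandas as pd

# Simple content-hash cache so repeated predictions over the same batch
# skip recomputation (illustrative only; unbounded in this sketch).
_feature_cache: dict = {}


def engineer_features(X: pd.DataFrame) -> pd.DataFrame:
    """Vectorized feature engineering: whole-column operations,
    no Python-level loops over rows."""
    out = X.copy()
    out["income_to_amount"] = out["income"] / out["credit_amount"]
    out["log_income"] = np.log1p(out["income"])
    return out


def engineer_features_cached(X: pd.DataFrame) -> pd.DataFrame:
    key = int(pd.util.hash_pandas_object(X).sum())
    if key not in _feature_cache:
        _feature_cache[key] = engineer_features(X)
    return _feature_cache[key]


X = pd.DataFrame({"income": [50_000, 82_000], "credit_amount": [10_000, 20_500]})
first = engineer_features_cached(X)
second = engineer_features_cached(X)  # served from cache, no recomputation
assert first is second
```

In production the cache would need a size bound (e.g. LRU eviction), but the pattern shows how repeated scoring of the same batch avoids redundant column math.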
Every custom model can be tailored through:
- Constructor Parameters - Control model behavior
- Feature Engineering - Domain-specific transformations
- Algorithm Selection - Choose ML algorithms
- Ensemble Strategies - Combine multiple models
Let's customize the Credit Risk model as an example:
```python
# Original model in demos/credit_risk/models/credit_risk_classifier.py
class CustomCreditRiskClassifier(ClassificationModel):
    def __init__(self, enable_debt_ratio=True, decision_threshold=0.5):
        # Add new parameters for customization
        super().__init__()
        self.enable_debt_ratio = enable_debt_ratio
        self.decision_threshold = decision_threshold
        self.enable_age_groups = True  # NEW: Age-based risk groups
        self.use_ensemble = False      # NEW: Option for ensemble

    def _engineer_features(self, X):
        X_engineered = X.copy()

        # NEW: Add age group features
        if self.enable_age_groups:
            X_engineered['age_group'] = pd.cut(
                X['age'],
                bins=[0, 25, 35, 50, 100],
                labels=['young', 'adult', 'senior', 'elderly']
            )

        # Existing feature engineering...
        return X_engineered
```

```bash
# Copy updated model to running container
docker cp demos/credit_risk/models/credit_risk_classifier.py \
  iris-community:/opt/iris/mgr/python/custom_models/classifiers/

# Restart IRIS to reload models (optional)
docker exec iris-community iris restart iris quietly
```

```dockerfile
# Update Dockerfile to include new models
# In docker/Dockerfile.iris:
COPY demos/*/models/*.py /opt/iris/mgr/python/custom_models/classifiers/
```

```bash
# Rebuild and restart
make clean
make setup
```

```yaml
# In docker-compose.yml:
services:
  iris:
    volumes:
      - ./demos:/opt/iris/demos:ro
      - ./custom_models:/opt/iris/mgr/python/custom_models:ro
```
```sql
-- Drop existing model
DROP MODEL IF EXISTS CreditRiskModel;

-- Create model with new parameters
-- (enable_age_groups and use_ensemble are the new parameters)
CREATE MODEL CreditRiskModelV2
PREDICTING (default_risk)
FROM CreditApplications
USING {
    "model_name": "CustomCreditRiskClassifier",
    "path_to_classifiers": "/opt/iris/mgr/python/custom_models/classifiers",
    "user_params": {
        "enable_debt_ratio": 1,
        "enable_age_groups": 1,
        "use_ensemble": 0,
        "decision_threshold": 0.45
    }
}

-- Train the updated model
TRAIN MODEL CreditRiskModelV2;

-- Use in production
SELECT customer_id,
       PREDICT(CreditRiskModelV2) AS risk_score
FROM NewApplications;
```

```sql
-- Keep both models active and find cases where they disagree
-- (the alias "difference" is not visible in WHERE, so the expression is repeated)
SELECT
    customer_id,
    PREDICT(CreditRiskModel)   AS model_v1_score,
    PREDICT(CreditRiskModelV2) AS model_v2_score,
    ABS(PREDICT(CreditRiskModel) - PREDICT(CreditRiskModelV2)) AS difference
FROM TestApplications
WHERE ABS(PREDICT(CreditRiskModel) - PREDICT(CreditRiskModelV2)) > 0.1;
```
1. **Feature Engineering Pipeline**:

   ```python
   def _engineer_features(self, X):
       # Add interaction terms
       X['income_to_amount'] = X['income'] / X['credit_amount']
       # Create polynomial features
       X['age_squared'] = X['age'] ** 2
       # Binning continuous variables
       X['income_bracket'] = pd.qcut(X['income'], q=5)
       return X
   ```

2. **Algorithm Swapping**:

   ```python
   def __init__(self, algorithm='random_forest'):
       self.algorithm = algorithm

   def fit(self, X, y):
       if self.algorithm == 'random_forest':
           self.model = RandomForestClassifier()
       elif self.algorithm == 'xgboost':
           self.model = XGBClassifier()
       elif self.algorithm == 'neural':
           self.model = MLPClassifier()
       self.model.fit(X, y)
       return self
   ```

3. **Hyperparameter Tuning**:

   ```python
   def __init__(self, auto_tune=False, **kwargs):
       self.auto_tune = auto_tune
       self.hyperparams = kwargs

   def fit(self, X, y):
       if self.auto_tune:
           # Grid search for best parameters
           param_grid = {
               'n_estimators': [100, 200, 300],
               'max_depth': [5, 10, 15]
           }
           self.model = GridSearchCV(RandomForestClassifier(), param_grid)
       else:
           self.model = RandomForestClassifier(**self.hyperparams)
       self.model.fit(X, y)
       return self
   ```
- Test model locally with sample data
- Verify scikit-learn compatibility
- Copy model to container
- Create/update symlinks if needed
- Update SQL CREATE MODEL statement
- Train model with production data
- Validate model performance
- Monitor prediction latency
This repository provides four complete examples:
- Credit Risk Assessment - Financial risk scoring
- Fraud Detection - Real-time fraud detection
- Sales Forecasting - Time series forecasting
- DNA Similarity - Sequence analysis
- See PRD.md for complete feature documentation
- Check CLAUDE.md for development guidance
- Run `python run_all_demos.py --quick` to see examples in action
Note: This guide needs to be updated with the actual IntegratedML Custom Models syntax and implementation details from the official documentation.