Truss + XGBoost for Rapid Model Deployment

By Jesse Mostipak in technical

September 16, 2022

Note: this post was originally published as a Kaggle notebook.

⬆️ Two of my favorite things: Truss + XGBoost

If you watched season 1 of #SLICED, then you might know that XGBoost is my favorite model framework (mostly because I couldn’t get CatBoost to work in R, but that’s a story for another time!). So of course, when I saw that Truss, an open source Python package for model deployment, handles the serialization and packaging of XGBoost models, I knew I had to make a notebook 😍

If this is the first Truss notebook that you’re seeing, be sure to check out my other Truss-related notebooks on Kaggle:

Use Truss to deploy a model from a Kaggle notebook, which uses a Random Forest Classifier from scikit-learn
Truss + Hugging Face == 🤗 💚
Truss + PyTorch == 🔥

💻 Let’s code!

We’ll go ahead and build a relatively simple model using XGBoost, and focus on the (very few!) lines of code needed to deploy our model using Truss. Truss is a fairly robust package, and at a high level, some of the things you can accomplish with Truss are:

Turn your Python model into a microservice with a production-ready API endpoint, with no need for Flask or Django
Serialize and deserialize models automatically for most popular frameworks
Freeze dependencies via Docker to make your training environment portable

⬇️ Install the truss and baseten packages

!pip install truss

!pip install baseten

🏞️ Set up our environment

import baseten
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from truss import mk_truss

🔍 Set up our XGBoost classification model

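def create_data():
    # Build a small synthetic binary classification dataset
    X, y = make_classification(n_samples=100,
                               n_informative=2,
                               n_classes=2,
                               n_features=6)
    # Hold out 25% of the rows for validation
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    # Wrap features and labels in DMatrix, XGBoost's native data structure
    train = xgb.DMatrix(X_train, label=y_train)
    test = xgb.DMatrix(X_test, label=y_test)
    return train, test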

train, test = create_data()
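params = {
    # Binary classification; without this, XGBoost defaults to squared-error regression
    "objective": "binary:logistic",
    "learning_rate": 0.01,
    "max_depth": 3
}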

📚 Train our model

# Train for up to 100 boosting rounds, stopping early if the validation
# metric (the last entry in evals) fails to improve for 20 consecutive rounds
model = xgb.train(params,
                  train,
                  evals=[(train, "train"), (test, "validation")],
                  num_boost_round=100,
                  early_stopping_rounds=20)
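
If early stopping is new to you, here is a minimal sketch of the idea in plain Python: keep track of the best validation score seen so far, and stop once a set number of rounds pass without beating it. This is only an illustration of the concept, not XGBoost's actual implementation; the function name `rounds_until_stop` and the validation scores are made up for the example.

```python
def rounds_until_stop(validation_scores, patience):
    """Return how many rounds run before early stopping kicks in.

    validation_scores: one validation error per boosting round (lower is better).
    patience: how many rounds without improvement to tolerate.
    """
    best_score = float("inf")
    rounds_without_improvement = 0
    for round_number, score in enumerate(validation_scores, start=1):
        if score < best_score:
            # New best score, so reset the counter
            best_score = score
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
        if rounds_without_improvement >= patience:
            return round_number
    # Never ran out of patience, so every round was used
    return len(validation_scores)


# Hypothetical validation errors: improve for three rounds, then drift upward
scores = [0.50, 0.45, 0.43, 0.44, 0.44, 0.45, 0.46]
print(rounds_until_stop(scores, patience=3))   # stops after round 6
print(rounds_until_stop(scores, patience=20))  # patience never runs out: all 7 rounds
```

With `early_stopping_rounds=20` and only 100 boosting rounds, the same logic means training can stop early only if the best validation score arrives by round 80.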

🌉 Create a Truss to serialize and package our model

xgboost_truss = mk_truss(model)
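
To make "serialize" a little more concrete, here is a toy round trip: a model's learned state is written out as text, then used to rebuild an equivalent model somewhere else. This is a conceptual sketch only and is not meant to show what Truss does under the hood; `ToyModel`, its `threshold`, and the `to_json`/`from_json` helpers are all hypothetical stand-ins, not part of Truss or XGBoost.

```python
import json


class ToyModel:
    """Hypothetical stand-in for a trained model (not an XGBoost Booster)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        # Label a row 1 when the sum of its features exceeds the threshold
        return [1 if sum(row) > self.threshold else 0 for row in rows]

    def to_json(self):
        # Serialize: capture the learned state as text that can be stored or shipped
        return json.dumps({"threshold": self.threshold})

    @classmethod
    def from_json(cls, payload):
        # Deserialize: rebuild an equivalent model from the stored state
        return cls(**json.loads(payload))


original = ToyModel(threshold=0.5)
restored = ToyModel.from_json(original.to_json())

rows = [[0.1, 0.2], [0.4, 0.3]]
# The round trip worked if both models make the same predictions
print(original.predict(rows) == restored.predict(rows))
```

The nice part of using Truss is that you don't have to write any of this yourself for a supported framework like XGBoost: the single `mk_truss(model)` call above takes care of it.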

🚀 Deploy our model to Baseten (or any other service that runs Docker)

To deploy our model we’ll need a couple of things:

A Baseten account, which you can sign up for on the Baseten site
Your API key, which you can generate from your Baseten account

It only takes a minute to sign up on Baseten, and it might take even less time to deploy your model. The following two lines of code are all you need to take your Truss-packaged model off of Kaggle and into a production environment!

In this notebook we’re deploying to Baseten, which lets you then use your model in our drag-and-drop View Builder (it’s pretty awesome, and I encourage you to check it out by getting started with our tutorial!). That said, you can deploy your model to any service that runs Docker.

baseten.login("your-api-key-here")
baseten.deploy(xgboost_truss)

🤝 Get involved!

Truss is an open source package, and we welcome your involvement! If you’d like to keep up to date with package development, the best way is to star the GitHub repo. And if you’d like to contribute to the development of Truss, we’ve created an excellent Contributors’ Guide to get started!

And if there’s anything I haven’t covered regarding Truss that you’d like to see, drop a comment below 💚
