Truss + XGBoost for Rapid Model Deployment
By Jesse Mostipak in technical
September 16, 2022
Note: this post was originally published as a Kaggle notebook
⬆️ Two of my favorite things: Truss + XGBoost
If you watched season 1 of #SLICED, then you might know that XGBoost is my favorite model framework (mostly because I couldn’t get CatBoost to work in R, but that’s a story for another time!). So of course, when I saw that Truss, an open source Python package for model deployment, handles the serialization and packaging of XGBoost models, I knew I had to make a notebook 😍
If this is the first Truss notebook that you’re seeing, be sure to check out my other Truss-related notebooks on Kaggle:
- Use Truss to deploy a model from a Kaggle notebook, which uses a Random Forest Classifier from scikit-learn
- Truss + Hugging Face == 🤗 💚
- Truss + PyTorch == 🔥
💻 Let’s code!
We’ll go ahead and build a relatively simple model using XGBoost, and focus on the (very few!) lines of code needed to deploy our model using Truss. Truss is a fairly robust package, and at a high level, some of the things you can accomplish with Truss are:
- Turn your Python model into a microservice with a production-ready API endpoint, with no need for Flask or Django
- Serialize and deserialize models automatically for most popular frameworks
- Freeze dependencies via Docker to make your training environment portable
⬇️ Install the truss and baseten packages
!pip install truss
!pip install baseten
🏞️ Set up our environment
import baseten
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from truss import mk_truss
🔍 Set up our XGBoost classification model
def create_data():
    # Generate a small synthetic binary classification data set
    X, y = make_classification(n_samples=100,
                               n_informative=2,
                               n_classes=2,
                               n_features=6)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    # Wrap the splits in XGBoost's DMatrix format (features, labels)
    train = xgb.DMatrix(X_train, y_train)
    test = xgb.DMatrix(X_test, y_test)
    return train, test

train, test = create_data()
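params = {
    # Without an explicit objective, xgb.train defaults to squared-error regression
    "objective": "binary:logistic",
    "learning_rate": 0.01,
    "max_depth": 3
}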
📚 Train our model
# Train with early stopping: stop if the validation metric (the last entry in
# evals) has not improved for 20 consecutive rounds
model = xgb.train(params,
                  train,
                  evals=[(train, "train"), (test, "validation")],
                  num_boost_round=100,
                  early_stopping_rounds=20)
🌉 Create a Truss to serialize and package our model
xgboost_truss = mk_truss(model)
🚀 Deploy our model to Baseten (or any other service that runs Docker)
To deploy our model we’ll need a couple of things:
- A Baseten account (sign up here!)
- Your API key, which you can generate here
It only takes a minute to sign up on Baseten, and it might take even less time to deploy your model. The following two lines of code are all you need to take your Truss-packaged model off of Kaggle and into a production environment!
In this notebook we’re deploying to Baseten, which gives you the option of then using your model in our drag-and-drop View Builder (it’s pretty awesome, and I encourage you to check it out! Get started with our tutorial). That said, you can deploy your model to any service that runs Docker.
baseten.login("your-api-key-here")
baseten.deploy(xgboost_truss)
🤝 Get involved!
Truss is an open source package, and we welcome your involvement! If you’d like to keep up to date with package development, the best way is to star the GitHub repo. And if you’d like to contribute to the development of Truss, we’ve created an excellent Contributors’ Guide to help you get started!
And if there’s anything I haven’t covered regarding Truss that you’d like to see, drop a comment below 💚