In this article, you'll learn what cuML is and how it can significantly speed up the training of machine learning models through GPU acceleration.
Topics we'll cover include:
- The aim and distinctive features of cuML.
- How to prepare datasets and train a machine learning model for classification with cuML in a scikit-learn-like fashion.
- How to compare results with an equivalent conventional scikit-learn model, in terms of classification accuracy and training time.
Let's not waste any more time.

A Hands-On Introduction to cuML for GPU-Accelerated Machine Learning Workflows
Image by Editor | ChatGPT
Introduction
This article provides a hands-on Python introduction to cuML, a library from RAPIDS AI (an open-source suite within NVIDIA) for GPU-accelerated machine learning workflows across widely used models. Together with its data-science-oriented sibling, cuDF, cuML has gained popularity among practitioners who need scalable, production-ready machine learning solutions.
The hands-on tutorial below uses cuML together with cuDF for GPU-accelerated dataset management in a DataFrame format. For an introduction to cuDF, see this related article.
About cuML: An "Accelerated Scikit-Learn"
RAPIDS cuML (short for CUDA Machine Learning) is an open-source library that accelerates scikit-learn-style machine learning on NVIDIA GPUs. It provides drop-in replacements for many popular algorithms, often reducing training and inference times on large datasets, without major code changes or a steep learning curve for those familiar with scikit-learn.
Its three most distinctive features are:
- cuML follows a scikit-learn-like API, easing the transition from CPU to GPU for machine learning with minimal code changes
- It covers a broad set of GPU-accelerated methods, including regression, classification, ensemble methods, clustering, and dimensionality reduction
- Through tight integration with the RAPIDS ecosystem, cuML works hand-in-hand with cuDF for data preprocessing, as well as with related libraries to facilitate end-to-end, GPU-native pipelines
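The scikit-learn-like API described in the first point can be sketched as a simple backend switch. This is a minimal illustration, not an official RAPIDS pattern: the helper name `load_logistic_regression` is hypothetical, and the sketch assumes only that the two classes being swapped are `cuml.linear_model.LogisticRegression` and `sklearn.linear_model.LogisticRegression`.

```python
import importlib


def load_logistic_regression():
    """Return (backend_name, LogisticRegression class), preferring cuML.

    Hypothetical helper for illustration. Falls back to scikit-learn when
    cuML cannot be imported, and returns (None, None) if neither is usable.
    """
    for backend in ("cuml", "sklearn"):
        try:
            module = importlib.import_module(f"{backend}.linear_model")
        except Exception:
            # ImportError when the package is missing; other errors can occur
            # when cuML is installed but no usable CUDA driver is present.
            continue
        return backend, module.LogisticRegression
    return None, None


backend, LogReg = load_logistic_regression()
print("Selected backend:", backend)
```

When either library is available, `LogReg(max_iter=1000)` builds the estimator the same way for both backends, and the `fit`, `predict`, and `score` methods share the same names, which is what makes the swap cheap.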
Hands-On Introductory Example
To illustrate the basics of cuML for building GPU-accelerated machine learning models, we'll consider a fairly large, yet easily accessible, dataset available via a public URL in Jason Brownlee's repository: the adult income dataset. It is a large, slightly class-imbalanced dataset intended for binary classification tasks, specifically predicting whether an adult's income level is high (above $50K) or low ($50K or less) based on a set of demographic and socio-economic features. Therefore, we aim to build a binary classification model.
IMPORTANT: To run the code below on Google Colab or a similar notebook environment, make sure you change the runtime type to GPU; otherwise, a warning will be raised indicating that cuDF can't find the specific CUDA driver library it uses.
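Before importing cuDF or cuML, it can help to confirm that a GPU is actually visible to the runtime. The following is a minimal sketch that assumes the NVIDIA driver ships the `nvidia-smi` command-line tool on the machine; the function name `gpu_visible` is hypothetical and is not part of RAPIDS.

```python
import shutil
import subprocess


def gpu_visible(timeout=10):
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools found on this machine
    try:
        # `nvidia-smi -L` prints one line per detected GPU
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` in a notebook, switching the runtime type to GPU and restarting the session is the first thing to try.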
We start by importing the necessary libraries for our scenario:
import cudf
import cuml
from cuml.model_selection import train_test_split as gpu_train_test_split
from cuml.linear_model import LogisticRegression as cuLogReg
from IPython.display import display

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import time
Note that, in addition to the cuML modules and functions used to split the dataset and train a logistic regression classifier, we've also imported their classical scikit-learn counterparts. They are not required for using cuML, which works independently of plain scikit-learn; we import them only for the sake of comparison in the rest of the example.
Next, we load the dataset into a cuDF DataFrame optimized for GPU usage:
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/adult-all.csv"

# Column names (they are not included in the dataset's CSV file we'll read)
cols = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country", "income"
]

df = cudf.read_csv(url, header=None, names=cols)
display(df.head())
Once the data is loaded, we identify the target variable and convert it into binary (1 for high income, 0 for low income):
df["income"] = df["income"].str.strip()
df["income"] = (df["income"] == ">50K").astype("int32")
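To make the label conversion concrete, here is a plain-Python sketch of the same logic on a handful of made-up labels. It assumes the raw strings may carry surrounding whitespace, which is why the code above calls `.str.strip()` first; the helper names and sample values are hypothetical. The second helper computes the accuracy of always predicting the majority class, a useful reference point when reading accuracy on an imbalanced target.

```python
def binarize_income(raw_labels):
    """Map raw label strings to 1 (income above 50K) or 0 (otherwise)."""
    return [int(label.strip() == ">50K") for label in raw_labels]


def majority_baseline(binary_labels):
    """Accuracy obtained by always predicting the most frequent class."""
    positives = sum(binary_labels)
    negatives = len(binary_labels) - positives
    return max(positives, negatives) / len(binary_labels)


# Made-up sample; note the leading spaces that strip() removes
sample = [" <=50K", " >50K", " <=50K", " <=50K"]
encoded = binarize_income(sample)
print(encoded)                     # [0, 1, 0, 0]
print(majority_baseline(encoded))  # 0.75
```

Without the `strip()` call, a label such as `" >50K"` would not equal `">50K"` and every row would be encoded as 0.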
This dataset combines numeric features with a slight predominance of categorical ones. Most scikit-learn models, including decision trees and logistic regression, don't natively handle string-valued categorical features, so they require encoding. A similar pattern applies to cuML; hence, we'll select a small number of features to train our classifier and one-hot encode the categorical ones.
# Feature selection (for example, based on domain expertise!)
features = ["age", "education_num", "hours_per_week", "workclass", "occupation", "sex"]
X = df[features]
y = df["income"]

# One-hot encode categorical features
X_enc = cudf.get_dummies(X, drop_first=True)
print("Encoded feature shape:", X_enc.shape)
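For readers new to one-hot encoding, the following pure-Python sketch shows the idea behind `drop_first=True` on a single made-up column. It is an illustration of the technique only, not how cuDF or pandas implement `get_dummies`; the function name is hypothetical, and real column names and dtypes may differ.

```python
def one_hot_drop_first(name, values):
    """One-hot encode a list of category strings, dropping the first category.

    Categories are sorted; the first one becomes the implicit baseline
    (all indicator columns are 0 for it), so k categories yield k - 1 columns.
    """
    categories = sorted(set(values))
    kept = categories[1:]  # drop the first category to avoid a redundant column
    return {f"{name}_{cat}": [int(v == cat) for v in values] for cat in kept}


sex = ["Male", "Female", "Female", "Male"]
print(one_hot_drop_first("sex", sex))  # {'sex_Male': [1, 0, 0, 1]}
```

Dropping one indicator per feature removes a column that is fully determined by the others, which keeps the design matrix smaller for a linear model such as logistic regression.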
So far, we've used cuML (and also cuDF) much like we would use classical scikit-learn together with Pandas.
Now comes the interesting part. We will split the dataset into training and test sets and train a logistic regression classifier twice: once on the GPU with cuML and once with standalone scikit-learn. We will then compare both the classification accuracy and the time taken to train each model. Here's the complete code for the model training and comparison:
# MODEL 1: GPU (cuML) train-test split and training
t0 = time.time()
X_train, X_test, y_train, y_test = gpu_train_test_split(X_enc, y, test_size=0.2, random_state=42)

model_gpu = cuLogReg(max_iter=1000)
model_gpu.fit(X_train, y_train)
gpu_time = time.time() - t0

acc_gpu = model_gpu.score(X_test, y_test)
print(f"cuML Logistic Regression accuracy: {acc_gpu:.4f}, time: {gpu_time:.3f} sec")

# MODEL 2: scikit-learn and Pandas-driven train-test split and model training
df_pd = pd.read_csv(url, header=None, names=cols)
df_pd["income"] = df_pd["income"].str.strip()
df_pd["income"] = (df_pd["income"] == ">50K").astype("int32")

X_pd = df_pd[features]
y_pd = df_pd["income"]
X_pd = pd.get_dummies(X_pd, drop_first=True)

t0 = time.time()
X_train_pd, X_test_pd, y_train_pd, y_test_pd = train_test_split(X_pd, y_pd, test_size=0.2, random_state=42)

model_cpu = LogisticRegression(max_iter=1000)
model_cpu.fit(X_train_pd, y_train_pd)
cpu_time = time.time() - t0

acc_cpu = model_cpu.score(X_test_pd, y_test_pd)
print(f"scikit-learn Logistic Regression accuracy: {acc_cpu:.4f}, time: {cpu_time:.3f} sec")
The results are quite interesting. They should look something like:
cuML Logistic Regression accuracy: 0.8014, time: 0.428 sec
scikit-learn Logistic Regression accuracy: 0.8097, time: 15.184 sec
As we can observe, the model trained with cuML achieved very similar classification performance to its classical scikit-learn counterpart, but it trained over an order of magnitude faster: about 0.4 seconds compared with roughly 15 seconds for the scikit-learn classifier. Your exact numbers will vary with hardware, drivers, and library versions.
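The comparison can be quantified directly from the figures reported above. The sketch below plugs in those printed values as constants (they are this article's sample output, not numbers you should expect to reproduce) and adds a small hypothetical helper, `time_call`, that times a single call with `time.perf_counter`, in case you prefer to time the `fit` step on its own instead of the split and fit together.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Values copied from the sample output shown above
gpu_time, cpu_time = 0.428, 15.184
acc_gpu, acc_cpu = 0.8014, 0.8097

speedup = cpu_time / gpu_time
acc_gap = acc_cpu - acc_gpu
print(f"speedup: {speedup:.1f}x")      # speedup: 35.5x
print(f"accuracy gap: {acc_gap:.4f}")  # accuracy gap: 0.0083

# Example use of the helper on a cheap stand-in workload
total, elapsed = time_call(sum, range(1_000_000))
print(f"stand-in workload took {elapsed:.4f} sec")
```

In the tutorial's setting, `time_call(model_gpu.fit, X_train, y_train)` would isolate the training time. Keep in mind that a first GPU call may include one-time initialization overhead, so repeating the measurement is a reasonable precaution.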
Wrapping Up
This article provided a gentle, hands-on introduction to the cuML library for GPU-accelerated construction of machine learning models for classification, regression, clustering, and more. Through a simple comparison, we showed how cuML can help build effective models with significantly improved training efficiency.