Release Notes

AI 2.4.0

New and Optimized Features

Image Based Model Catalog

Image Based Model Catalog provides a central place for discovering and deploying ready-to-use models. Users can deploy built-in models as inference services without manually preparing model repositories. Administrators configure the OCI registry used by the catalog and import the built-in model images before users can deploy them.

Kubeflow 1.11 Upgrade

Kubeflow Base, Kubeflow Pipelines, and Kubeflow Trainer v2 are upgraded to the 1.11 release line. This keeps the platform aligned with the upstream Kubeflow community and includes fixes that are already available in the community release.

Workbench Pipeline Orchestration with Elyra

Elyra Pipeline Editor provides a visual editor in JupyterLab for creating pipeline workflows from notebooks and scripts. Users can define pipeline steps in Workbench and submit the workflow to Kubeflow Pipelines without writing the complete pipeline specification manually.
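Conceptually, the editor turns the visual graph of notebooks and scripts into an ordered set of steps for the pipeline engine to execute. A minimal stdlib sketch of that dependency-ordering step (the node names are hypothetical, and this is not the Elyra or Kubeflow Pipelines API):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical pipeline graph: each node is a notebook or script, mapped
# to the steps it depends on (the edges drawn in the visual editor).
steps = {
    "prepare_data.ipynb": set(),
    "train_model.ipynb": {"prepare_data.ipynb"},
    "evaluate.py": {"train_model.ipynb"},
}

# The orchestrator must schedule steps in dependency order, which is a
# topological sort of the graph.
order = list(TopologicalSorter(steps).static_order())
print(order)  # prepare_data runs first, evaluate runs last
```

Elyra generates the full pipeline specification from a graph like this, so users never have to write the step ordering and parameter wiring by hand.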

Llama Stack PGVector Integration

Llama Stack now supports PGVector-backed vector stores for agent and retrieval workflows. Users can use PostgreSQL with the pgvector extension as the vector database backend when building Llama Stack applications that upload files and run vector search.
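To make the retrieval behavior concrete, the sketch below reimplements in pure Python the cosine distance that pgvector evaluates server-side with its `<=>` operator, and ranks two toy documents against a query vector. The document IDs and vectors are illustrative only; in a real deployment the vectors live in PostgreSQL and Llama Stack issues the search:

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), the metric behind
    pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical stored embeddings keyed by document ID.
docs = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0]}
query = [1.0, 0.0]

# Smallest distance first, i.e. the nearest neighbor ranks highest.
ranked = sorted(docs, key=lambda d: cosine_distance(query, docs[d]))
print(ranked)  # doc-a is closest to the query
```

In PostgreSQL the equivalent ordering would be `ORDER BY embedding <=> $query`, with pgvector indexing making the nearest-neighbor search efficient at scale.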

Ascend NPU Fine-tuning with Kubeflow Trainer v2

Kubeflow Trainer v2 supports running model fine-tuning jobs on Huawei Ascend NPUs. Users can submit Trainer v2 jobs for MindSpeed-LLM workflows, including checkpoint conversion, dataset preprocessing, and supervised fine-tuning, and use Kueue scheduling when cluster quotas are configured.
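The three stages named above run in sequence, each consuming the artifact produced by the one before it. A pure-Python sketch of that flow, with entirely hypothetical function and artifact names (the real workflow runs as Trainer v2 jobs, not local functions):

```python
# Illustrative MindSpeed-LLM workflow: convert the checkpoint, then
# preprocess the dataset, then run supervised fine-tuning (SFT).

def convert_checkpoint(hf_checkpoint: str) -> str:
    # Stage 1: convert a Hugging Face checkpoint to the trainer's format.
    return f"{hf_checkpoint}.converted"

def preprocess_dataset(raw_dataset: str) -> str:
    # Stage 2: tokenize and pack the raw dataset for training.
    return f"{raw_dataset}.tokenized"

def supervised_finetune(checkpoint: str, dataset: str) -> dict:
    # Stage 3: fine-tune the converted checkpoint on the prepared data.
    return {"checkpoint": checkpoint, "dataset": dataset, "stage": "sft"}

ckpt = convert_checkpoint("base-model")
data = preprocess_dataset("instruction-data")
job = supervised_finetune(ckpt, data)
print(job["stage"])
```

In the platform, each stage is submitted as a Trainer v2 job, and Kueue admits the jobs against the configured cluster quotas.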

ModelSlim Integration for Ascend NPU Model Compression

ModelSlim integration provides model compression and quantization workflows for Huawei Ascend NPU environments. Users can run ModelSlim-based workflows from a Workbench image prepared with CANN and PyTorch, following the Ascend-native compression path described in the product documentation.

vLLM-ascend Custom Runtime

vLLM-ascend enables model serving on Huawei Ascend NPUs through a custom inference runtime. Administrators can register the vLLM-ascend runtime for Ascend-backed inference, and users can select it when deploying compatible large language models as inference services.

Image Based Model Catalog StorageClass Configuration

Image Based Model Catalog installation supports configurable persistent storage for catalog services. Administrators can reuse an existing PVC or create a new PVC, and specify the StorageClass used when new storage is provisioned.

Deprecated Features

GitLab-based Model Catalog is deprecated. For built-in models and new model delivery workflows, use Image Based Model Catalog with OCI model artifacts instead.

Fixed Issues

  • Fixed an issue where inference services running in Serverless mode could not scale down to zero when VictoriaMetrics was used to collect their monitoring data.

Known Issues

  • When VictoriaMetrics is used to collect monitoring data for inference services running in Serverless mode, the inference services cannot scale down to zero.