Quickstart

This section provides a quickstart example for creating an AI Agent with Llama Stack.

Prerequisites

  • Python 3.12 or higher (if not satisfied, refer to FAQ: How to prepare Python 3.12 in Notebook)
  • Llama Stack Server installed and running via the Operator (see Install Llama Stack), with VLLM_URL pointing at a vLLM-served model endpoint and the POSTGRES_* variables configured for server persistence (see the install notes)
  • Access to a Notebook environment (e.g., Jupyter Notebook, JupyterLab)
  • Python environment with llama-stack-client==0.7.1, fastmcp (for the MCP section), and other notebook dependencies installed

Quickstart Example

A simple example of creating an AI Agent with Llama Stack is provided as a downloadable notebook.

Download the notebook and upload it to a Notebook environment to run it.

The notebook demonstrates:

  • Two tool options: client-side tools (@client_tool) and MCP tools (FastMCP + toolgroups.register)
  • Shared agent flow: connect to Llama Stack Server, select a model, create an Agent with tools=AGENT_TOOLS, then run sessions and streaming turns
  • Optional vector store flows: upload a file, create a pgvector- or milvus-remote-backed vector store, and run a search query
  • Streaming responses and event logging
  • Optional FastAPI deployment of the agent
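The shared agent flow above can be sketched roughly as follows. This is a hedged illustration, not the notebook's exact code: the tool name get_weather, the base_url/model_id parameters, and the exact llama-stack-client 0.7.x import paths are assumptions to verify against the installed client. Imports are deferred into the function so the sketch reads without the package installed.

```python
# Hypothetical sketch of the client-side-tool agent flow.
# All names below (get_weather, the instructions text) are illustrative,
# not values defined by the notebook.

def run_agent_demo(base_url: str, model_id: str, user_message: str):
    """Connect to a Llama Stack Server, create an Agent with one
    client-side tool, and stream a single turn."""
    # Deferred imports: verify these paths against llama-stack-client 0.7.x.
    from llama_stack_client import LlamaStackClient
    from llama_stack_client.lib.agents.agent import Agent
    from llama_stack_client.lib.agents.client_tool import client_tool
    from llama_stack_client.lib.agents.event_logger import EventLogger

    @client_tool
    def get_weather(city: str) -> str:
        """Return a canned weather report for a city (demo tool)."""
        return f"It is sunny in {city}."

    client = LlamaStackClient(base_url=base_url)
    agent = Agent(
        client,
        model=model_id,
        instructions="You are a helpful assistant. Use tools when needed.",
        tools=[get_weather],
    )
    session_id = agent.create_session("quickstart-session")

    # Stream one turn and log the events as they arrive.
    turn = agent.create_turn(
        messages=[{"role": "user", "content": user_message}],
        session_id=session_id,
        stream=True,
    )
    for event in EventLogger().log(turn):
        event.print()
```

Running it requires a reachable server, e.g. run_agent_demo("http://localhost:8321", model_id, "What's the weather in Paris?"), where model_id comes from client.models.list().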

Vector Store Usage

The downloadable notebook includes optional PGVector and Milvus sections.

For PGVector, start the server with ENABLE_PGVECTOR=true and valid PGVECTOR_* connection settings, then execute the PGVector cells in the notebook. ACP-provided PostgreSQL can be used directly because it already includes the pgvector extension.

For Milvus, start the server with MILVUS_ENDPOINT, optional MILVUS_TOKEN, and MILVUS_CONSISTENCY_LEVEL, then execute the Milvus cells in the notebook. Use provider_id="milvus-remote" in the client request.

For both vector-store examples, client.models.list() must include an embedding model, for example sentence-transformers/nomic-ai/nomic-embed-text-v1.5. If it only returns LLM models, restart the LlamaStackDistribution with ENABLE_SENTENCE_TRANSFORMERS=true and configure Hugging Face cache/download access as described in Install Llama Stack.
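The embedding-model check can be done with a small helper like the one below. It is a sketch that assumes each entry returned by client.models.list() exposes model_type and identifier attributes, as llama-stack-client model objects do in recent releases; verify against your installed version.

```python
# Helper sketch: scan a models list for embedding models before running
# the vector-store cells. The attribute names `model_type` and
# `identifier` are assumptions to check against your client version.

def find_embedding_models(models):
    """Return the identifiers of all embedding models in a models list."""
    return [
        m.identifier
        for m in models
        if getattr(m, "model_type", None) == "embedding"
    ]

# Usage against a live server (not executed here):
#   embeddings = find_embedding_models(client.models.list())
#   if not embeddings:
#       raise RuntimeError(
#           "No embedding model registered; restart the "
#           "LlamaStackDistribution with ENABLE_SENTENCE_TRANSFORMERS=true"
#       )
```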

The notebook example covers:

  • Uploading a file through client.files.create(...)
  • Creating a vector store with provider_id="pgvector" or provider_id="milvus-remote"
  • Passing embedding_model and embedding_dimension through client.vector_stores.create(..., extra_body=...)
  • Running a search with client.vector_stores.search(...); PGVector uses search_mode="hybrid" in extra_body
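The four steps above can be sketched end to end as follows. This is a hedged outline rather than the notebook's exact code: the store name, file purpose, embedding model, and dimension (768 for nomic-embed-text-v1.5) are example values, and the extra_body keys mirror what this section describes but should be checked against your client version.

```python
# Hypothetical end-to-end sketch of the vector-store cells.
# Parameter values are illustrative; check extra_body keys against
# the installed llama-stack-client.

def build_and_search_store(client, path, query, provider_id="pgvector"):
    """Upload a file, create a provider-backed vector store, search it."""
    # 1. Upload the source document.
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")

    # 2-3. Create the store; provider and embedding settings go in extra_body.
    store = client.vector_stores.create(
        name="quickstart-store",  # illustrative name
        file_ids=[uploaded.id],
        extra_body={
            "provider_id": provider_id,  # "pgvector" or "milvus-remote"
            "embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
            "embedding_dimension": 768,
        },
    )

    # 4. Search; PGVector additionally takes search_mode="hybrid".
    extra = {"search_mode": "hybrid"} if provider_id == "pgvector" else {}
    return client.vector_stores.search(
        vector_store_id=store.id,
        query=query,
        extra_body=extra,
    )
```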

FAQ

How to prepare Python 3.12 in Notebook

  1. Download the pre-compiled Python installation package:

    wget -O /tmp/python312.tar.gz https://github.com/astral-sh/python-build-standalone/releases/download/20260114/cpython-3.12.12+20260114-x86_64-unknown-linux-gnu-install_only.tar.gz
  2. Extract with:

    mkdir -p ~/python312
    tar -xzf /tmp/python312.tar.gz -C ~/python312 --strip-components=1
  3. Install and register the kernel:

    export PATH="${HOME}/python312/bin:${PATH}"
    
    python3 -m pip install ipykernel
    python3 -m ipykernel install --user --name python312 --display-name "Python 3.12"
  4. Switch kernel in the notebook page:

    • Open your Notebook environment (e.g., Jupyter Notebook or JupyterLab) in the browser, then open an existing notebook or create a new one.
    • In the notebook interface, find the current kernel name (usually shown in the top-right corner of the page, e.g., "Python 3" or "python3").
    • Click that kernel name, or use the menu Kernel → Change Kernel.
    • In the kernel list, select "Python 3.12" (the display name registered in step 3).
    • After switching, new cells will run with Python 3.12.

Note: python and pip commands executed directly in notebook cells or a terminal still use the default interpreter. To run the Python 3.12 installation, invoke its binaries by their full path, for example ${HOME}/python312/bin/python3.
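Once the kernel is switched, a quick cell can confirm which interpreter is actually in use. This is a minimal check with no notebook-specific assumptions:

```python
# Run in a notebook cell to confirm the active interpreter.
import sys

print(sys.executable)        # should point under ~/python312 after the switch
print(sys.version_info[:2])  # (3, 12) once the kernel switch has taken effect

on_target = sys.version_info[:2] >= (3, 12)
```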

Additional Resources

For more resources on developing AI Agents with Llama Stack, see: