AI Agent for SREs: MCP Servers + Gen AI Studio on OpenShift AI

The Problem: Too Many Tools, Too Little Time

Troubleshooting in OpenShift means constantly jumping between tools: Prometheus for metrics, Alertmanager for alerts, kubectl for resources, and the Red Hat Knowledge Base for solutions. Each context switch wastes valuable time and increases the chance of missing something critical.

What if you could query all these sources from a single conversation?

That’s exactly what an AI agent with MCP Server access enables. In this article we’ll build an operations assistant that can:

Query Prometheus metrics
Review and silence Alertmanager alerts
Interact with Kubernetes resources
Search for solutions in the Red Hat KB

All using the new Tech Preview features from Red Hat OpenShift AI 3.2.

What is Red Hat OpenShift AI?

Before diving into details, let me explain what OpenShift AI is for those who don’t know it.

Red Hat OpenShift AI is Red Hat’s artificial intelligence platform built on top of OpenShift. In simple terms: it’s where you can develop, train, serve, and manage AI models in an enterprise-grade way.

Why does it matter? Because having an AI model isn’t enough — you need:

Infrastructure to run it (GPUs, storage, networking)
Tools to experiment (notebooks, pipelines)
Governance to control access
Observability to know what your model is doing
Integration with your enterprise stack

OpenShift AI gives you all of this on top of Kubernetes.

The Big Evolution: From RHOAI 2.x to 3.x

If you’ve been following OpenShift AI, you know that version 3.0 (late 2025) brought a significant philosophical change.

RHOAI 2.x (2023-2025)

Focused on traditional MLOps: notebooks, pipelines, model serving
Main interface: Open Data Hub Dashboard
Model serving with KServe and ModelMesh
Data Science Pipelines based on Kubeflow

RHOAI 3.x (2025-present)

New vision: Agentic AI and Enterprise Generative AI
Two new experiences:
- AI Hub — For Platform Engineers (catalog, registry, deployments)
- Gen AI Studio — For AI Engineers (playground, experimentation, MCP)
Native support for Llama Stack API
Integration with Model Context Protocol (MCP)

The key difference: RHOAI 2.x helped you train and serve models. RHOAI 3.x helps you build AI agents that can reason, plan, and act.

Tech Preview in RHOAI 3.2: What I Used

Red Hat releases new features as Technology Preview before they’re fully supported. In my experiment I used several of them:

Feature	Status	What It Does
Gen AI Studio	Tech Preview	Playground for experimenting with models and MCP
AI Hub	Developer Preview	Dashboard for managing AI assets
Llama Stack	Developer Preview	Unified API for RAG, safety, tool calling
MCP Servers	Developer Preview	Protocol for connecting LLMs to tools

How to Enable Gen AI Studio

In my case, with RHOAI 3.2.0, Gen AI Studio was already enabled. But if you need to enable it manually, you must modify the DataScienceCluster:

apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    dashboard:
      managementState: Managed
      devFlags:
        manifests:
          - uri: https://github.com/opendatahub-io/odh-dashboard/tarball/main
            contextDir: manifests
            sourcePath: overlays/odh

To verify Gen AI Studio is enabled, check the DSCInitialization:

oc get dsci default-dsci -o yaml | grep -A5 genAiStudio

You should see:

genAiStudio:
  managementState: Managed

My Stack: 4 MCP Servers + Llama 3.2 + Gen AI Studio

The Environment

Component	Details
OpenShift	4.20.12
RHOAI	3.2.0
Node	SNO g6.4xlarge (16 vCPU, 64GB RAM, NVIDIA L4)
Model	Llama 3.2 3B Instruct (vLLM, tool calling enabled)

The 4 MCP Servers

Prometheus MCP Server (11 tools) — Query metrics, diagnose nodes, investigate pods
Alertmanager MCP Server (12 tools) — View alerts, create silences, investigate incidents
Kubernetes MCP Server (23 tools) — List pods, view logs, execute commands, manage resources
Red Hat KB MCP Server (5 tools) — Search solutions, troubleshoot errors, query documentation

Total: 51 tools available to the agent.

Step 1: Deploy the Model with Tool Calling

The first thing was to deploy Llama 3.2 3B with KServe, enabling tool calling. This is critical — without tool calling, the model can’t use MCP tools.

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-32-3b-instruct
  namespace: my-first-model
  labels:
    opendatahub.io/genai-asset: "true"  # Required for Gen AI Studio
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM
      runtime: vllm-runtime-cuda
      storageUri: oci://quay.io/modh/llama-3.2-3b-instruct:latest
      args:
        - --enable-auto-tool-choice        # Enables tool calling
        - --tool-call-parser=llama3_json   # Parser for Llama 3
        - --max-model-len=8192
    resources:
      limits:
        nvidia.com/gpu: 1
      requests:
        memory: 16Gi
        cpu: 4

Once deployed, we verify it appears in Gen AI Studio:

Screenshot of Gen AI Studio showing the Llama 3.2 3B model with Active status in the Models tab The Llama 3.2 3B model shows up with “Active” status in Gen AI Studio.

Step 2: Deploy the MCP Servers

I used Helm charts to deploy the 4 servers. Here’s the configuration for each one:

Prometheus MCP Server

helm install mcp-prometheus ./mcp-prometheus/charts/mcp-prometheus \
  -n mcp-servers \
  --set openshift=true \
  --set prometheus.namespace=openshift-monitoring \
  --set prometheus.service=prometheus-operated \
  --set prometheus.servicePort=9090 \
  --set prometheus.serviceScheme=https

Alertmanager MCP Server

helm install mcp-alertmanager ./mcp-alertmanager/charts/mcp-alertmanager \
  -n mcp-servers \
  --set openshift=true \
  --set alertmanager.namespace=openshift-monitoring \
  --set alertmanager.service=alertmanager-operated \
  --set alertmanager.servicePort=9093 \
  --set alertmanager.serviceScheme=https

Kubernetes MCP Server

helm install kubernetes-mcp-server ./kubernetes-mcp-server/charts/kubernetes-mcp-server \
  -n mcp-servers \
  --set openshift=true \
  --set rbac.extraClusterRoleBindings[0].name=cluster-admin \
  --set rbac.extraClusterRoleBindings[0].roleRef.name=cluster-admin \
  --set rbac.extraClusterRoleBindings[0].roleRef.external=true

Red Hat KB MCP Server

# First create the secret with your Red Hat token
oc create secret generic mcp-redhat-kb-secret \
  -n mcp-servers \
  --from-literal=REDHAT_TOKEN=your-token-here

helm install mcp-redhat-kb ./mcp-redhat-kb/charts/mcp-redhat-kb \
  -n mcp-servers \
  --set openshift=true \
  --set redhat.existingSecret=mcp-redhat-kb-secret

Step 3: Register MCP Servers in Gen AI Studio

Here’s the interesting part. Gen AI Studio discovers MCP servers through a ConfigMap in the redhat-ods-applications namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: gen-ai-aa-mcp-servers
  namespace: redhat-ods-applications
data:
  Prometheus-MCP-Server: |
    {
      "url": "https://mcp-prometheus-mcp-servers.apps.ocp.example.com/sse",
      "description": "MCP server for querying Prometheus metrics"
    }
  Alertmanager-MCP-Server: |
    {
      "url": "https://mcp-alertmanager-mcp-servers.apps.ocp.example.com/sse",
      "description": "MCP server for managing Alertmanager alerts"
    }
  Kubernetes-MCP-Server: |
    {
      "url": "https://kubernetes-mcp-server-mcp-servers.apps.ocp.example.com/sse",
      "description": "MCP server for managing Kubernetes/OpenShift resources"
    }
  RedHat-KB-MCP-Server: |
    {
      "url": "https://mcp-redhat-kb-mcp-servers.apps.ocp.example.com/mcp",
      "description": "MCP server for searching the Red Hat Knowledge Base"
    }

Important note about endpoints: Go servers use /sse, while the Red Hat KB server (Java) uses /mcp. This is due to differences in MCP transport (SSE vs Streamable HTTP).

The “Token Required” Problem

When I first registered the servers, the Kubernetes MCP Server showed “Token Required”:

Gen AI Studio showing 4 MCP servers registered, Kubernetes MCP Server displays Token Required error The Kubernetes MCP Server showed “Token Required” — an MCP transport issue.

The problem was that I was using /sse but Gen AI Studio expected /mcp for Streamable HTTP. The solution was to verify which transport each server uses:

# To verify the correct transport
curl -X POST https://your-mcp-server.apps.example.com/sse
curl -X POST https://your-mcp-server.apps.example.com/mcp

After the fix, all servers show as “Active”:

MCP Servers panel in Gen AI Studio showing Prometheus, Alertmanager, Kubernetes and Red Hat KB with Active status All 4 MCP servers registered and active in Gen AI Studio.

Step 4: Testing the Playground

With everything configured, it was time for the moment of truth. I opened the Gen AI Studio Playground and connected all 4 MCP servers:

Gen AI Studio Playground interface with side panel showing 4 MCP servers available to connect The Playground with all 4 MCP servers available to connect.

Gen AI Studio Playground with 51 active tools from 4 connected MCP servers, showing performance warning 51 active tools — Gen AI Studio warns about performance impact.

The warning is real: With 51 tools, the 3B parameter model has to process a lot of information with each request. For production, I’d recommend an 8B+ parameter model.

What Can You Do With This?

This is the exciting part. With this setup, you can ask the agent things like:

Cluster Diagnostics

“What’s the health status of the cluster? Are there any critical alerts?”

The agent:

Queries getClusterHealthOverview from Prometheus
Gets getCriticalAlerts from Alertmanager
Gives you a summary with recommendations

Problem Investigation

“Pod X is in CrashLoopBackOff. What’s happening?”

The agent:

Uses pods_get to see the pod status
Queries pods_log to see the logs
Searches investigatePod from Prometheus for metrics
Looks for solutions in troubleshootError from Red Hat KB

Guided Operations

“I need to silence the KubePodCrashLooping alert for 2 hours while I investigate”

The agent:

Uses createSilence in Alertmanager
Confirms the silence was created correctly

The SRE Dream

Imagine a system that:

Detects an alert in your cluster
Diagnoses the problem by querying metrics
Searches for solutions in the Red Hat KB
Executes corrective actions (with approval)
Reports what it did

This is Agentic AI applied to operations. And OpenShift AI 3.x is building the infrastructure to do it in an enterprise way.

Lessons Learned

1. Model Size Matters for Tool Calling

A 3B parameter model with 51 tools is challenging. Sometimes the model gets confused about which tool to use. For production, use 8B+ parameters.

2. MCP Transport Isn’t Uniform

/sse vs /mcp — always verify the correct transport. Go servers typically use SSE (GET), while others use Streamable HTTP (POST).

3. SNO Has Limited Resources

On a Single Node OpenShift, CPU is the bottleneck. The model competes with MCP servers for resources. Plan your resource requests carefully.

4. Gen AI Studio Is Maturing Fast

It’s Tech Preview, but already very functional. MCP server integration works well and the UI is intuitive.

5. Logs Are Your Friend

When something doesn’t work, MCP server logs are invaluable:

oc logs -f deployment/mcp-prometheus -n mcp-servers

Where OpenShift AI Is Heading

Red Hat has an ambitious roadmap for MCP and Agentic AI:

MCP Catalog in AI Assets — MCP servers will appear in the catalog alongside models
MCP Registry — Version management, security scanning, certification
MCP Gateway — Policies, rate limits, centralized logging
MCPaaS (MCP-as-a-Service) — Centralized hosting of MCP servers
Native MCP Servers — For OpenShift, Ansible, RHEL, Lightspeed

The vision is clear: enterprise infrastructure for AI Agents.

Coming Next: kagent, kmcp and More

This article focused on Gen AI Studio and “traditional” MCP servers. But there’s more in the ecosystem:

kagent — Kubernetes framework for running AI agents as native workloads
kmcp — MCP implementation on Kubernetes with CRDs and operators
Llama Stack on OpenShift — Native deployment of Meta’s API for agents

In an upcoming article I’ll be exploring these tools and how they integrate with what we built here. The idea is to have an agent that can not only query information, but also take automated actions safely.

Stay tuned!

AI Agent for SREs: MCP Servers + Gen AI Studio on OpenShift AI

The Problem: Too Many Tools, Too Little Time

What is Red Hat OpenShift AI?

The Big Evolution: From RHOAI 2.x to 3.x

RHOAI 2.x (2023-2025)

RHOAI 3.x (2025-present)

Tech Preview in RHOAI 3.2: What I Used

How to Enable Gen AI Studio

My Stack: 4 MCP Servers + Llama 3.2 + Gen AI Studio

The Environment

The 4 MCP Servers

Step 1: Deploy the Model with Tool Calling

Step 2: Deploy the MCP Servers

Prometheus MCP Server

Alertmanager MCP Server

Kubernetes MCP Server

Red Hat KB MCP Server

Step 3: Register MCP Servers in Gen AI Studio

The “Token Required” Problem

Step 4: Testing the Playground

What Can You Do With This?

Cluster Diagnostics

Problem Investigation

Guided Operations

The SRE Dream

Lessons Learned

1. Model Size Matters for Tool Calling

2. MCP Transport Isn’t Uniform

3. SNO Has Limited Resources

4. Gen AI Studio Is Maturing Fast

5. Logs Are Your Friend

Where OpenShift AI Is Heading

Coming Next: kagent, kmcp and More

References

Related articles

eBPF + AI + Kubernetes: Real-Time Threat Detection for Cloud Native

Enterprise RAG in Java: From Theory to Production

LangChain4j: Generative AI for Java Developers

Comments (0)