How to Build a Controller to Watch for Changes to a CRD
Kubernetes has revolutionized the way we deploy, manage, and scale applications. At its core, Kubernetes is an extensible platform, designed to be adapted to almost any workload. This extensibility is perhaps one of its most powerful features, allowing users to define new resource types and automate their management through custom logic. Central to this paradigm are Custom Resource Definitions (CRDs) and controllers. While CRDs allow us to introduce new kinds of objects into the Kubernetes API, controllers are the active agents that watch these custom resources for changes and ensure that the actual state of the cluster aligns with the desired state specified in these resources.
This comprehensive guide will delve deep into the mechanics of building a Kubernetes controller specifically designed to watch for changes to a Custom Resource Definition (CRD). We will explore the fundamental concepts, walk through the development process using popular tooling, discuss advanced considerations, and provide practical insights for creating robust, production-ready operators. Whether you're an aspiring Kubernetes operator developer, a DevOps engineer looking to extend Kubernetes' capabilities, or simply curious about the internals of custom resource management, this article aims to provide a detailed and actionable roadmap. By the end, you'll not only understand how to build such a controller but also why each component is essential, empowering you to automate complex operational tasks and streamline your application deployments within the Kubernetes ecosystem.
Understanding the Kubernetes Extensibility Model: CRDs and Controllers
Before we dive into the practical aspects of building a controller, it's crucial to solidify our understanding of Kubernetes' extensibility mechanisms, particularly Custom Resource Definitions (CRDs) and the role of controllers. These two components work hand-in-hand to transform Kubernetes from a generic container orchestrator into a highly specialized platform tailored to specific application domains.
The Kubernetes API Server: The Control Plane's Front Door
At the heart of Kubernetes lies the API Server. It serves as the front end of the Kubernetes control plane, exposing the Kubernetes API. All communication, whether from user commands via kubectl, other control plane components, or custom controllers, flows through this API. When you create a Pod, a Deployment, or any other standard Kubernetes resource, you're interacting with the API Server, which then persists the desired state of that resource in etcd. The API Server is not just for built-in resources; it's also the gateway for any custom resources you wish to introduce. Its flexibility and adherence to REST principles make it an ideal foundation for extending the system.
Custom Resources (CRs) and Custom Resource Definitions (CRDs): Extending the Kubernetes Vocabulary
While Kubernetes comes with a rich set of built-in resources like Pods, Deployments, Services, and Namespaces, it's inevitable that users will encounter use cases that don't perfectly fit these standard abstractions. This is where Custom Resources come into play. A Custom Resource is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. It allows you to add your own API objects to a Kubernetes cluster and use them as if they were native Kubernetes objects.
The mechanism for defining these new object types is the Custom Resource Definition (CRD). When you create a CRD, you're essentially telling the Kubernetes API Server, "Hey, I'm introducing a new kind of object with this specific name, group, and version, and here's its schema." Once the CRD is registered, you can then create instances of this new custom resource (CRs), just like you would create an instance of a Pod or Deployment.
Consider a scenario where you want to manage database instances directly within Kubernetes. Instead of defining a Pod, Deployment, and Service for your database, you could define a Database CRD. Then, to deploy a database, you would simply create a Database CR, specifying its version, size, and other configurations. This approach centralizes the management of domain-specific concepts within Kubernetes, making it more intuitive for developers and operators alike.
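To make this concrete, here is a sketch of what such a `Database` custom resource might look like. The group, version, and field names are illustrative placeholders, not a real API:

```yaml
# Hypothetical Database custom resource; the group/version and spec fields
# are illustrative only.
apiVersion: stable.example.com/v1
kind: Database
metadata:
  name: orders-db
spec:
  version: "14.5"
  size: 50Gi
  replicas: 2
```

A user applies this one manifest, and the controller translates it into the StatefulSet, Service, and storage objects it implies.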
Key aspects of CRDs include:
- `apiVersion`, `kind`, `metadata`: Standard Kubernetes fields that identify the CRD itself.
- `spec`: This is where the actual definition of your custom resource resides. It includes:
  - `group`: A logical grouping for your custom resources (e.g., `stable.example.com`).
  - `version`: The API version of your custom resource (e.g., `v1alpha1`, `v1`).
  - `scope`: Specifies whether the custom resource is `Namespaced` (like Pods) or `Cluster` (like Nodes).
  - `names`: Defines the singular, plural, short names, and `kind` for your resource.
  - `versions` array: Crucially, this defines the schema for each version of your CRD using OpenAPI v3 schema. It allows for strict validation of the custom resource's data, ensuring that users provide valid configurations. This schema can define data types, required fields, minimum/maximum values, patterns for strings, and even more complex structural validation. It also determines which version is `served` and which is used for `storage`.
- Subresources (`status`, `scale`): CRDs can define `status` and `scale` subresources, which are specialized endpoints for updating status information and scaling resources independently, improving performance and enabling standard Kubernetes scaling tools.
The power of CRDs lies in their ability to extend the Kubernetes API in a declarative, schema-validated manner, allowing the platform to speak the language of your applications.
Controllers: The Automation Engine
While CRDs provide the "what" (the new resource type), controllers provide the "how" (the automation logic). A Kubernetes controller is a control loop that continuously monitors the state of your cluster and makes changes to move the actual state towards the desired state. This desired state is typically defined in a Kubernetes object (like a Pod, Deployment, or in our case, a Custom Resource).
For every built-in Kubernetes resource, there's a corresponding controller. For example, the Deployment controller watches Deployment objects and ensures that the specified number of Pod replicas are running. If a Pod crashes, the Deployment controller notices this discrepancy and creates a new one.
When you define a custom resource with a CRD, Kubernetes itself doesn't know how to "do" anything with it. It just knows how to store it and validate it. This is where your custom controller comes in. You write a controller that specifically watches for changes to your custom resource (CR) instances. When it detects a new CR, an update to an existing one, or a deletion, it takes action.
This action could involve:
- Creating other Kubernetes resources: For our `Database` CR, the controller might create a StatefulSet, a Service, PersistentVolumes, and Secrets.
- Updating existing resources: If the `Database` CR's version changes, the controller might perform an in-place upgrade of the StatefulSet.
- Deleting resources: If the `Database` CR is deleted, the controller cleans up all associated Kubernetes resources.
- Interacting with external systems: The controller might provision resources in a cloud provider (e.g., a managed database instance on AWS RDS) or integrate with an external API Gateway to expose the service it manages.
- Updating the CR's `status` field: The controller reports the observed state of the managed resources back to the CR's `status` field, providing users with real-time feedback.
The relationship between CRDs and controllers is symbiotic. CRDs provide the structured data that defines the desired state, and controllers act upon that data to bring the cluster to that desired state. Together, they enable the "Operator pattern," where human operational knowledge is encoded into software, making complex application management repeatable, reliable, and automated within Kubernetes.
The Anatomy of a CRD: Beyond the Basics
Designing a robust and user-friendly Custom Resource Definition (CRD) is the foundational step in building an effective Kubernetes controller. A well-designed CRD simplifies the controller's logic, improves user experience, and makes your custom resources feel like native Kubernetes objects. This section will delve deeper into the critical components and considerations for CRD design, offering a practical perspective on how to define your custom resource's structure and behavior.
Essential CRD Metadata and Scope
Every CRD, like any Kubernetes resource, starts with standard metadata:
- `apiVersion: apiextensions.k8s.io/v1`: Specifies the API version for the CRD itself. For new CRDs, `v1` is the current stable and recommended version.
- `kind: CustomResourceDefinition`: Identifies this manifest as a CRD.
- `metadata`: Contains standard Kubernetes object metadata such as `name` (which must be in the format `<plural>.<group>`), `labels`, and `annotations`.
Within the spec block of the CRD, several fields are paramount:
- `group`: A domain-like string that logically groups your API. For example, `example.com` or `stable.example.com`. This helps avoid naming collisions with other CRDs or built-in Kubernetes APIs.
- `versions`: This is an array that defines the different API versions of your custom resource. Each entry in this array must include:
  - `name`: The actual version string (e.g., `v1alpha1`, `v1`). It's a common practice to start with `v1alpha1` for early development and then move to `v1beta1` and finally `v1` as the API matures.
  - `served`: A boolean indicating whether this version is enabled via the REST API. You can deprecate old versions by setting `served: false`.
  - `storage`: A boolean indicating if this version is used for storing the resource in `etcd`. Only one version can be `storage: true` at a time. Kubernetes automatically converts resources between stored and requested versions if necessary, provided conversion webhooks are configured (more on this later).
- `scope`: Defines whether instances of your custom resource are `Namespaced` (meaning they exist within a specific namespace, like Pods and Deployments) or `Cluster` (meaning they are cluster-wide, like Nodes or PersistentVolumes). The choice here depends entirely on the nature of the resource you're defining. Most application-specific resources are `Namespaced`.
- `names`: This object defines various forms of the resource name used in the API and `kubectl`:
  - `kind`: The CamelCase name for your resource type (e.g., `MyApp`). This is what you'll use in the `kind` field of your custom resource manifests.
  - `plural`: The plural form of your resource name (e.g., `myapps`). Used in `kubectl get myapps`.
  - `singular`: The singular form (e.g., `myapp`).
  - `shortNames`: An optional array of short aliases for `kubectl` (e.g., `ma`).
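Putting these fields together, a minimal CRD skeleton (with the schema elided) might look like the following sketch; the group and names are illustrative:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # name must be <plural>.<group>
  name: myapps.app.example.com
spec:
  group: app.example.com
  scope: Namespaced
  names:
    kind: MyApp
    plural: myapps
    singular: myapp
    shortNames:
      - ma
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          # field definitions elided; see the schema section below
```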
OpenAPI v3 Schema Validation: Ensuring Data Integrity
One of the most critical features of apiextensions.k8s.io/v1 CRDs is the ability to define a structural schema for your custom resources using OpenAPI v3. This schema is specified under spec.versions[].schema.openAPIV3Schema. It allows the Kubernetes API Server to perform server-side validation, ensuring that any custom resource instance submitted adheres to your defined structure before it's even stored in etcd. This significantly improves data integrity and reduces errors in controller logic.
Key aspects of schema definition:
- `type`: Defines the data type of a field (e.g., `object`, `array`, `string`, `integer`, `boolean`).
- `properties`: For `object` types, this defines the fields within the object. Each property can have its own `type`, `description`, and further validation rules.
- `required`: An array of strings listing the names of properties that must be present in the custom resource.
- `description`: A human-readable explanation of the field's purpose. Essential for good documentation.
Beyond basic types and requirements, OpenAPI v3 schema offers powerful validation rules:
- String Validation:
  - `minLength`, `maxLength`: For string length.
  - `pattern`: A regular expression to match against the string value.
  - `format`: Suggests a format (e.g., `email`, `hostname`, `uri`).
- Numeric Validation:
  - `minimum`, `maximum`: For integer or float values.
  - `exclusiveMinimum`, `exclusiveMaximum`: To define strict inequalities.
  - `multipleOf`: Ensures the value is a multiple of a given number.
- Array Validation:
  - `minItems`, `maxItems`: For the number of elements in an array.
  - `uniqueItems`: Ensures all elements in an array are unique.
  - `items`: Defines the schema for each element within the array.
- Object Validation:
  - `maxProperties`, `minProperties`: For the number of properties in an object.
  - `additionalProperties`: Controls whether extra properties beyond those explicitly defined are allowed. Often set to `false` to prevent arbitrary fields.
Example: A MyApp CRD Spec
Let's imagine a CRD for managing a simple application deployment. Its spec might look like this:
```yaml
# ... (standard CRD metadata)
spec:
  group: app.example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            apiVersion: {type: string}
            kind: {type: string}
            metadata: {type: object}
            spec:
              type: object
              properties:
                image:
                  type: string
                  pattern: "^[a-z0-9]+([._-][a-z0-9]+)*(/[a-z0-9]+([._-][a-z0-9]+)*)*:[a-zA-Z0-9._-]+$" # Basic image tag pattern
                  description: "The Docker image to deploy for the application."
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
                  default: 1
                  description: "Number of desired application replicas."
                port:
                  type: integer
                  minimum: 80
                  maximum: 65535
                  description: "The port the application listens on."
                configMapRef:
                  type: string
                  description: "Reference to an existing ConfigMap for application configuration."
              required:
                - image
                - port
            status:
              type: object
              properties:
                availableReplicas:
                  type: integer
                  description: "Number of replicas currently available."
                deploymentName:
                  type: string
                  description: "Name of the Kubernetes Deployment managed by this MyApp."
```
This schema ensures that when a user creates a MyApp custom resource, they provide a valid image string, a port within a sensible range, and a positive number of replicas. If they try to submit an invalid resource, the API Server will reject it immediately, preventing your controller from having to deal with malformed input.
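For instance, a `MyApp` instance that satisfies this schema might look like the following (name and image are placeholders):

```yaml
apiVersion: app.example.com/v1
kind: MyApp
metadata:
  name: my-app-instance-1
  namespace: default
spec:
  image: nginx:1.25.3   # matches the image pattern above
  replicas: 3           # within [1, 10]
  port: 8080            # within [80, 65535]
```

Omitting `image` or `port`, or setting `replicas: 0`, would be rejected by the API Server at admission time.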
Subresources: status and scale
The status subresource (spec.versions[].subresources.status: {}) is crucial. It exposes a dedicated `/status` endpoint, so controllers can update a custom resource's status independently of its spec: writes to the main endpoint ignore changes to `status`, and writes to `/status` ignore changes to `spec`, preventing user edits and controller updates from clobbering each other. This separation matters because the spec represents the desired state (managed by the user), while the status represents the observed state (managed by the controller).
For our MyApp example, the status field might contain availableReplicas and deploymentName. The user defines the desired replicas in spec, and the controller reports the availableReplicas in status.
The scale subresource (spec.versions[].subresources.scale: {}) allows your custom resource to integrate with horizontal pod autoscalers (HPAs) and kubectl scale commands. If your custom resource creates deployable objects (like Deployments or StatefulSets), configuring the scale subresource points to the spec field that controls scaling and the status field that reports the replica count.
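In the CRD manifest, enabling both subresources for our `MyApp` example might look like this sketch (the JSON paths mirror the schema shown earlier):

```yaml
subresources:
  status: {}
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.availableReplicas
```

With this in place, `kubectl scale myapp/my-app-instance-1 --replicas=5` and HPAs operate on `.spec.replicas` just as they would on a Deployment.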
The Importance of Thoughtful CRD Design
Designing a CRD is an API design exercise. It should be:
- Intuitive: Users should easily understand what fields to provide and what they mean.
- Declarative: Focus on what the desired state is, not how to achieve it.
- Stable: Once `v1` is released, changes should be backward compatible. Use `v1alpha1` and `v1beta1` for experimentation.
- Validated: Leverage OpenAPI v3 schema to ensure data integrity.
- Extensible: Allow for future growth without breaking existing clients.
A well-designed CRD simplifies the controller's job immensely, as the controller can trust the input it receives and focus solely on reconciling the observed state with the desired state. It also elevates your custom resource to the same level of usability as built-in Kubernetes objects, fostering a consistent user experience across the entire platform.
The Heart of Automation: Kubernetes Controllers
Having understood how Custom Resource Definitions (CRDs) extend the Kubernetes API, the next logical step is to explore the "brains" of the operation: the Kubernetes controller. A controller is the active component that continuously observes the state of a specific resource (or multiple resources) within the Kubernetes cluster and takes corrective actions to ensure that the actual state matches the desired state, as defined in those resources. This continuous observation and reconciliation loop is the fundamental pattern of all Kubernetes controllers, whether they are built-in or custom.
The Reconciliation Loop: Desired State vs. Actual State
The core concept behind any Kubernetes controller is the reconciliation loop. This loop is an endlessly running process that performs the following steps:
- Observe the Actual State: The controller reads the current state of the cluster, specifically focusing on the resources it is responsible for managing. This includes its primary Custom Resources (CRs) and any secondary resources (like Deployments, Services, ConfigMaps) that it creates or manages on behalf of those CRs.
- Determine the Desired State: The controller extracts the desired configuration from the `spec` of its primary CRs. This `spec` field represents what the user wants the system to look like.
- Compare and Reconcile: The controller compares the observed actual state with the desired state.
  - If there's a discrepancy (e.g., a Deployment specified in the CR's `spec` is missing, or a parameter in a Service has changed), the controller identifies the necessary actions.
  - If the actual state matches the desired state, the controller simply moves on, waiting for the next change or periodic check.
- Take Action: The controller issues commands to the Kubernetes API Server to create, update, or delete resources, thereby moving the actual state towards the desired state.
- Update Status (Optional but Recommended): After taking action, the controller often updates the `status` field of the primary CR. This provides feedback to the user and other controllers about the current observed state and the progress of the reconciliation.
This loop is designed to be idempotent, meaning that applying the same desired state multiple times should always result in the same actual state without side effects. If the controller creates a Deployment, it should first check if the Deployment already exists. If it does, it might update it; if not, it creates it. This makes controllers robust against transient errors or multiple event triggers.
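Idempotency is easiest to see in code. The sketch below imitates a create-or-update step using an in-memory map in place of the API server; `Deployment` here is a toy struct for illustration, not the real `apps/v1` type:

```go
package main

import "fmt"

// Deployment is a toy stand-in for a real Kubernetes Deployment.
type Deployment struct {
	Name     string
	Replicas int
}

// cluster maps deployment name -> deployment, standing in for actual cluster state.
var cluster = map[string]*Deployment{}

// reconcileDeployment moves the actual state toward the desired state.
// Calling it repeatedly with the same input converges to the same result.
func reconcileDeployment(desired Deployment) string {
	existing, ok := cluster[desired.Name]
	if !ok {
		// Not found: create it.
		d := desired
		cluster[desired.Name] = &d
		return "created"
	}
	if existing.Replicas != desired.Replicas {
		// Found but drifted: update it in place.
		existing.Replicas = desired.Replicas
		return "updated"
	}
	// Already matches the desired state: nothing to do.
	return "unchanged"
}

func main() {
	want := Deployment{Name: "my-app", Replicas: 3}
	fmt.Println(reconcileDeployment(want)) // created
	fmt.Println(reconcileDeployment(want)) // unchanged
	want.Replicas = 5
	fmt.Println(reconcileDeployment(want)) // updated
	fmt.Println(reconcileDeployment(want)) // unchanged
}
```

A real controller does the same check-then-act dance, but against the API Server (or the informer cache) instead of a local map.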
Key Components of a Controller
To efficiently implement the reconciliation loop, controllers leverage several key components provided by Kubernetes client libraries (like client-go in Go):
1. Informers: The Watchers
An Informer is a crucial component that watches the Kubernetes API Server for changes to specific resource types (e.g., your custom MyApp resources, Deployments, Services). It provides a mechanism to receive notifications (Add, Update, Delete events) whenever a resource it's watching changes.
- Efficiency: Informers don't constantly poll the API Server. Instead, they establish a long-lived connection and receive push notifications, making them highly efficient.
- Caching: A significant feature of informers is that they maintain an in-memory cache of the resources they are watching. This cache is locally consistent and eventually consistent with the API Server.
- Event Handling: When an event occurs (e.g., a `MyApp` resource is created or updated), the informer adds the key (typically `namespace/name`) of the affected resource to a workqueue.
2. Listers: The Local Data Source
A Lister works hand-in-hand with an informer. It provides a read-only interface to the informer's local, in-memory cache. This cache allows the controller to:
- Reduce API Server Load: Instead of making a direct API call for every reconciliation, the controller can query the local cache via the lister. This significantly reduces the load on the Kubernetes API Server, especially in large clusters or for frequently reconciled resources.
- Improve Performance: Accessing local memory is much faster than making network calls.
- Consistency: The lister ensures that the controller sees a consistent snapshot of the data, even if the API Server is experiencing high load or network partitions.
3. Workqueue: The Event Processor
The Workqueue (or RateLimitingWorkqueue) acts as a buffer and scheduler for events. When an informer detects a change, it adds the key of the affected resource (e.g., default/my-app-instance-1) to the workqueue. The controller's reconciliation loop then pulls items from this queue, one at a time, to process them.
Key features of a workqueue:
- Deduplication: If multiple events occur for the same resource within a short period, the workqueue will often deduplicate them, ensuring that the reconciliation logic is only triggered once for that resource. This prevents unnecessary work.
- Retries: If an item's processing fails (e.g., due to a transient API error), the workqueue can be configured to re-add the item with a back-off delay, facilitating retries. This is crucial for building resilient controllers.
- Ordered Processing: For a given resource, events are typically processed in order, preventing race conditions that could arise from out-of-order updates.
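To see why deduplication matters, here is a toy, single-threaded sketch of a work queue in plain Go; real controllers use `k8s.io/client-go/util/workqueue`, which additionally provides rate limiting and thread safety:

```go
package main

import "fmt"

// workQueue is a minimal FIFO queue that deduplicates keys:
// adding a key that is already pending is a no-op.
type workQueue struct {
	queued map[string]bool // keys currently pending
	items  []string        // FIFO order
}

func newWorkQueue() *workQueue {
	return &workQueue{queued: map[string]bool{}}
}

// Add enqueues a key unless it is already pending (deduplication).
func (q *workQueue) Add(key string) {
	if q.queued[key] {
		return // one pending reconcile will observe the latest state anyway
	}
	q.queued[key] = true
	q.items = append(q.items, key)
}

// Get pops the next key in FIFO order; ok is false when the queue is empty.
func (q *workQueue) Get() (key string, ok bool) {
	if len(q.items) == 0 {
		return "", false
	}
	key = q.items[0]
	q.items = q.items[1:]
	delete(q.queued, key)
	return key, true
}

func main() {
	q := newWorkQueue()
	q.Add("default/my-app") // create event
	q.Add("default/my-app") // rapid update event: deduplicated
	q.Add("default/other")
	for {
		key, ok := q.Get()
		if !ok {
			break
		}
		fmt.Println("reconcile", key)
	}
}
```

Despite three events, `default/my-app` is reconciled only once; because the reconcile logic reads the *current* state rather than the event payload, no information is lost.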
4. The Reconcile Function: The Business Logic
The Reconcile function is where the core business logic of your controller resides. This is the function that is invoked for each item pulled from the workqueue. Its input is typically a `reconcile.Request` containing the `NamespacedName` of the resource to be reconciled.
Inside the Reconcile function, the controller performs the steps outlined in the reconciliation loop:
- Fetch the primary CR: It uses the lister (or a direct API client if the resource is not watched by an informer for this controller) to retrieve the custom resource instance (e.g., `MyApp`).
- Handle deletion: If the custom resource is marked for deletion (i.e., its `metadata.deletionTimestamp` is set), the controller executes cleanup logic (e.g., deleting associated Deployments, Services) and then removes its finalizer.
- Compare and Act: Based on the `spec` of the custom resource, it fetches or constructs the desired state of secondary resources (like Deployments, Services). It then compares these desired states with their actual states in the cluster and applies necessary changes.
- Update status: Finally, it updates the `status` field of the custom resource to reflect the current observed state of the managed resources.
- Return result: The `Reconcile` function returns a result (`ctrl.Result`) and an error, indicating whether the request should be re-queued (e.g., for retries after an error, or after a specific duration).
Event-Driven Nature and Dependency Management
Controllers are inherently event-driven. They react to changes. However, a custom resource like MyApp might depend on other Kubernetes objects like a Deployment or Service. Your controller needs to be aware of changes to these dependent resources as well. If a user accidentally deletes a Deployment that your MyApp controller manages, the controller needs to detect this and recreate it.
This is achieved by configuring the controller to "watch" these dependent resources. When the controller detects an event for a dependent resource, it identifies the owner of that resource (usually your custom MyApp instance, identified by OwnerReference on the dependent resource) and re-queues the owner for reconciliation. This ensures that any change to a managed resource triggers a reconciliation of its parent custom resource, maintaining the desired state.
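Concretely, the link between a managed object and its owner is an `ownerReferences` entry that the controller sets when creating the object. A sketch of what that looks like on a managed Deployment (names and the UID are placeholders the controller would fill in):

```yaml
# Hypothetical Deployment created by the MyApp controller. The ownerReference
# ties it back to the owning MyApp, so deleting the MyApp garbage-collects the
# Deployment, and events on the Deployment can be mapped back to the owner.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-instance-1
  namespace: default
  ownerReferences:
    - apiVersion: app.example.com/v1
      kind: MyApp
      name: my-app-instance-1
      uid: <uid-of-the-owning-MyApp>  # set by the controller at creation time
      controller: true
      blockOwnerDeletion: true
```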
Error Handling and Retries: Building Resilient Controllers
Robust error handling is paramount for production-grade controllers. Network issues, API server unavailability, or transient external service errors can all cause reconciliation to fail. The workqueue's rate-limiting capabilities are essential here:
- If `Reconcile` returns an error, the workqueue can automatically re-add the item after a delay, allowing the controller to retry.
- Exponential back-off can be applied to prevent overwhelming the API server during persistent issues.
- It's crucial to distinguish between transient errors (which should be retried) and permanent errors (which might require human intervention or specific logic to mark the CR as failed).
Integrating with External Services: Beyond Kubernetes Resources
While many controllers focus solely on managing Kubernetes-native resources, a significant number of operators extend their reach to external services. For example, a Database CRD might trigger the creation of a managed database instance on a cloud provider. Similarly, an AIModel CRD could manage the deployment of an AI inference service.
In such scenarios, managing the exposure and governance of these services becomes a critical concern. If your controller is orchestrating the deployment of various API-driven applications or AI Gateway functionalities—like specific machine learning model endpoints or complex data processing pipelines that expose an API—you'll likely face challenges related to authentication, authorization, rate limiting, traffic management, and analytics.
This is precisely where an advanced API Gateway solution becomes indispensable. Imagine your controller deploys several AI models, each accessible via a unique endpoint, possibly defined by a custom resource. Without a centralized management layer, administering these diverse AI services can become an operational nightmare. An advanced platform like APIPark, which serves as an open-source AI Gateway and API management platform, provides a unified solution for managing, integrating, and deploying both AI and REST services. It can seamlessly bridge the gap between your custom Kubernetes resources and external consumers, offering features like unified API formats for AI invocation, prompt encapsulation into REST APIs, and comprehensive lifecycle management. By integrating such a gateway, your controller can focus on orchestrating the underlying infrastructure, while the API Gateway handles the intricate details of service exposure, security, and performance for the APIs your custom resources create. This separation of concerns ensures both operational efficiency and robust service delivery.
Setting Up Your Development Environment
Building a Kubernetes controller, especially one that watches for CRD changes, requires a specific set of tools and a structured approach. The Go programming language is the de facto standard for Kubernetes development due to its performance, concurrency model, and the availability of robust client libraries (client-go). Beyond Go, specialized SDKs like Kubebuilder and Operator SDK significantly streamline the development process.
Go Language Essentials
Go is the language in which Kubernetes itself is written, making it the natural choice for extending the platform. If you're new to Go, here are a few essentials:
- Installation: Download and install Go from the official website (golang.org). Ensure your `GOPATH` and `PATH` are correctly configured.
- Modules: Go modules are used for dependency management. Your controller project will be initialized as a Go module.
- Basic Syntax: Familiarity with Go's basic syntax, structs, interfaces, error handling, and concurrency primitives (goroutines and channels) will be beneficial.
For controller development, you'll extensively use the client-go library, which provides clients for interacting with the Kubernetes API, along with informers, listers, and workqueues.
Kubebuilder vs. Operator SDK: Scaffolding Your Controller
When embarking on controller development, you don't start from scratch. Tools like Kubebuilder and Operator SDK provide scaffolding, code generation, and helpers that drastically accelerate development. While they have different origins and focus areas, they share a common goal: simplifying the creation of Kubernetes operators.
Let's briefly compare them:
| Feature/Aspect | Kubebuilder | Operator SDK |
|---|---|---|
| Origin | Part of the Kubernetes SIG API Machinery project | Red Hat-sponsored, built on upstream projects |
| Core Philosophy | Minimalist, opinionated, focuses on `client-go` | Feature-rich, supports Go, Ansible, Helm operators |
| Scaffolding | Generates Go project structure, CRDs, controller | Generates Go, Ansible, Helm project structure, CRDs |
| CRD Definition | Primarily Go structs with `kubebuilder` markers | Go structs with `kubebuilder` markers |
| Reconciliation | Standard `controller-runtime` reconciliation loop | Standard `controller-runtime` reconciliation loop |
| Testing Helpers | Excellent integration with `envtest` | Also uses `envtest` |
| Webhooks | First-class support for admission webhooks | First-class support for admission webhooks |
| Maturity | Highly mature and widely adopted | Highly mature and widely adopted |
| Use Case Focus | Go-based operators, deep Kubernetes integration | Enterprise operators, multi-language support |
Which one to choose?
For building a controller that watches CRD changes using Go, Kubebuilder is an excellent choice. It’s actively developed by the Kubernetes community, provides a streamlined workflow specifically for Go operators, and integrates seamlessly with controller-runtime (the underlying library used by both SDKs for common controller patterns). Operator SDK, while powerful, might introduce a bit more overhead if you only plan to write Go-based controllers and don't need its Ansible or Helm operator capabilities. For this guide, we will proceed with Kubebuilder.
Installing Kubebuilder
To install Kubebuilder, follow these steps (assuming you have Go installed):
Install the Kubebuilder binary:

```bash
# For Linux:
# Go to https://github.com/kubernetes-sigs/kubebuilder/releases
# Find the latest release (e.g., kubebuilder_3.12.0_linux_amd64.tar.gz)
# wget https://github.com/kubernetes-sigs/kubebuilder/releases/download/v3.12.0/kubebuilder_3.12.0_linux_amd64.tar.gz
# sudo tar -zxvf kubebuilder_3.12.0_linux_amd64.tar.gz -C /usr/local
# sudo mv /usr/local/kubebuilder_3.12.0_linux_amd64 /usr/local/kubebuilder
# export PATH=$PATH:/usr/local/kubebuilder/bin
```

Or follow the instructions on the Kubebuilder website for your OS: https://kubebuilder.io/quick-start.html#installation

Verify the installation:

```bash
kubebuilder version
```

This should output the Kubebuilder version.
Local Kubernetes Cluster for Development
For local development and testing, you'll need a Kubernetes cluster. kind (Kubernetes in Docker) or minikube are excellent choices:
- **`kind`**: Creates local Kubernetes clusters using Docker containers as "nodes." It's lightweight, fast, and ideal for controller development.

  ```bash
  # Install kind (if you don't have it)
  go install sigs.k8s.io/kind@v0.20.0 # or latest version

  # Create a cluster
  kind create cluster --name my-controller-cluster
  ```

- **`minikube`**: Runs a single-node Kubernetes cluster inside a VM on your laptop.

  ```bash
  # Install minikube (refer to minikube.sigs.k8s.io)

  # Start a cluster
  minikube start
  ```
Ensure your kubectl context is pointing to your local development cluster.
With your Go environment, Kubebuilder, and a local Kubernetes cluster set up, you're now ready to embark on building your first custom controller. The scaffolding tools will handle much of the boilerplate, allowing you to focus on the core logic of managing your custom resources.
Building Your First Controller with Kubebuilder
Now that our development environment is prepared, we can dive into the practical steps of building a controller using Kubebuilder. We'll create a controller that manages a custom resource, MyApp, which will in turn deploy a Kubernetes Deployment and Service. Our controller will watch for changes to MyApp resources and reconcile the state of the associated Deployment and Service accordingly.
Step 1: Initialize the Project
First, create a new directory for your project and initialize it with Kubebuilder:
```bash
mkdir myapp-controller
cd myapp-controller
kubebuilder init --domain example.com --repo github.com/yourorg/myapp-controller
```
- `--domain example.com`: Specifies the domain used for your API group (e.g., `app.example.com`).
- `--repo github.com/yourorg/myapp-controller`: Your Go module path.
This command will scaffold a new Go module, go.mod file, a Makefile, Dockerfile, PROJECT file, and config/ directory with base Kubernetes manifests.
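For orientation, the generated layout looks roughly like this (the exact files vary between Kubebuilder versions — newer scaffolds, for instance, put the entry point under `cmd/main.go`):

```text
myapp-controller/
├── Dockerfile
├── Makefile
├── PROJECT          # Kubebuilder project metadata
├── go.mod
├── main.go          # manager entry point
├── config/
│   ├── crd/         # generated CRD manifests (after `make manifests`)
│   ├── default/     # kustomize entry point for deployment
│   ├── manager/     # manager Deployment manifest
│   └── rbac/        # generated RBAC manifests
└── hack/
```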
Step 2: Create the API (CRD and Controller)
Next, we'll create the API for our custom resource, MyApp. This command generates the Go types for the CRD and the basic controller boilerplate.
```bash
kubebuilder create api --group app --version v1 --kind MyApp --resource --controller
```
- `--group app`: Our API group will be `app.example.com`.
- `--version v1`: The version of our custom resource API.
- `--kind MyApp`: The `Kind` name for our custom resource.
- `--resource`: Generates the Go types for the custom resource.
- `--controller`: Generates the controller boilerplate.
This command creates several important files:
- `api/v1/myapp_types.go`: Defines the Go structs for our `MyApp` custom resource, including `MyAppSpec` and `MyAppStatus`.
- `controllers/myapp_controller.go`: Contains the `MyAppReconciler` struct and the `Reconcile` method, which is the heart of our controller's logic.
- `config/crd/bases/app.example.com_myapps.yaml`: The YAML definition for our `MyApp` CRD.
Step 3: Define the CRD (Go Types)
Open api/v1/myapp_types.go. Here, we define the MyAppSpec (desired state) and MyAppStatus (observed state). We'll add fields to manage an application's image, replica count, and port.
package v1
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// MyAppSpec defines the desired state of MyApp
type MyAppSpec struct {
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:Pattern=`^[a-z0-9]+([._-][a-z0-9]+)*(/[a-z0-9]+([._-][a-z0-9]+)*)*:[a-zA-Z0-9._-]+$`
// Image specifies the Docker image to deploy for the application.
Image string `json:"image"`
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=10
// +kubebuilder:default=1
// Replicas defines the number of desired application replicas.
Replicas int32 `json:"replicas"`
// +kubebuilder:validation:Minimum=80
// +kubebuilder:validation:Maximum=65535
// Port defines the port the application listens on.
Port int32 `json:"port"`
// ConfigMapRef is an optional reference to an existing ConfigMap for application configuration.
// +optional
ConfigMapRef string `json:"configMapRef,omitempty"`
}
// MyAppStatus defines the observed state of MyApp
type MyAppStatus struct {
// +optional
// AvailableReplicas is the number of currently available application replicas.
AvailableReplicas int32 `json:"availableReplicas"`
// +optional
// DeploymentName is the name of the Kubernetes Deployment managed by this MyApp.
DeploymentName string `json:"deploymentName"`
// +optional
// ServiceName is the name of the Kubernetes Service managed by this MyApp.
ServiceName string `json:"serviceName"`
// +optional
// Conditions represent the latest available observations of a MyApp's current state.
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Image",type="string",JSONPath=".spec.image",description="Application Image"
// +kubebuilder:printcolumn:name="Replicas",type="integer",JSONPath=".spec.replicas",description="Desired Replicas"
// +kubebuilder:printcolumn:name="Available",type="integer",JSONPath=".status.availableReplicas",description="Available Replicas"
// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
// MyApp is the Schema for the myapps API
type MyApp struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec MyAppSpec `json:"spec,omitempty"`
Status MyAppStatus `json:"status,omitempty"`
}
// +kubebuilder:object:root=true
// MyAppList contains a list of MyApp
type MyAppList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []MyApp `json:"items"`
}
func init() {
SchemeBuilder.Register(&MyApp{}, &MyAppList{})
}
Explanation of the +kubebuilder markers:
- `+kubebuilder:validation:...`: These markers add OpenAPI v3 schema validation rules directly to your Go types. When Kubebuilder generates the CRD YAML, it incorporates these rules into `spec.versions[].schema.openAPIV3Schema`, ensuring that any `MyApp` resource created conforms to them.
- `+kubebuilder:default=1`: Sets a default value if none is specified.
- `+kubebuilder:subresource:status`: Enables the `/status` subresource for `MyApp` resources, allowing separate updates to the `status` field.
- `+kubebuilder:printcolumn`: Defines custom columns for `kubectl get myapps`, making it easier to view key information.
- `+kubebuilder:object:root=true`: Marks `MyApp` and `MyAppList` as root Kubernetes API objects.
After modifying myapp_types.go, run make manifests to regenerate the CRD YAML file (config/crd/bases/app.example.com_myapps.yaml) and update the zz_generated.deepcopy.go file. This command will incorporate your new fields and validation rules into the CRD definition.
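To get a feel for what the `+kubebuilder:validation:Pattern` marker on `Image` enforces, here is a small stand-alone sketch that applies the same regular expression with Go's `regexp` package. This only mirrors the schema check locally for experimentation — in the cluster, the API server performs the real validation against the generated CRD:

```go
package main

import (
	"fmt"
	"regexp"
)

// imagePattern mirrors the +kubebuilder:validation:Pattern marker on MyAppSpec.Image.
var imagePattern = regexp.MustCompile(`^[a-z0-9]+([._-][a-z0-9]+)*(/[a-z0-9]+([._-][a-z0-9]+)*)*:[a-zA-Z0-9._-]+$`)

// validImage reports whether an image reference would pass the CRD schema check.
func validImage(image string) bool {
	return imagePattern.MatchString(image)
}

func main() {
	for _, img := range []string{"nginx:1.23.3", "myorg/myapp:v1.0.0", "nginx"} {
		fmt.Printf("%-22s valid=%v\n", img, validImage(img))
	}
}
```

Note that the pattern requires an explicit tag (`nginx` alone is rejected), which is a deliberate nudge away from relying on `latest`.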
Step 4: Implement the Controller Logic (Reconcile function)
Now, let's open controllers/myapp_controller.go. The Reconcile method is where the core logic resides. We'll implement the reconciliation loop to manage a Kubernetes Deployment and Service based on the MyApp resource.
We'll need to import a few packages: `appsv1` for Deployments, `corev1` for Services and ConfigMaps, `k8s.io/apimachinery/pkg/api/errors` for error handling, and `k8s.io/apimachinery/pkg/util/intstr` for the Service's target port. (We'd add `networkingv1` later if we decided to manage an Ingress as well.)
The Reconcile function:
- Fetch the `MyApp` CR: Retrieve the `MyApp` instance that triggered the reconciliation. If it's not found, it might have been deleted, so we ignore the request.
- Handle Deletion (Finalizers): If the `MyApp` resource is being deleted, perform cleanup (delete the associated Deployment/Service).
- Create/Update Deployment: Based on `MyApp.Spec`, ensure a Deployment exists with the desired `image`, `replicas`, and an owner reference.
- Create/Update Service: Based on `MyApp.Spec`, ensure a Service exists to expose the Deployment, also with an owner reference.
- Update `MyApp.Status`: Report the actual state of the created Deployment and Service back to the `MyApp`'s `status` field.
package controllers
import (
"context"
"fmt"
"reflect" // For deep comparison of specs
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
appv1 "github.com/yourorg/myapp-controller/api/v1" // Update with your actual module path
)
// MyAppReconciler reconciles a MyApp object
type MyAppReconciler struct {
client.Client
Scheme *runtime.Scheme
}
// +kubebuilder:rbac:groups=app.example.com,resources=myapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=app.example.com,resources=myapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=app.example.com,resources=myapps/finalizers,verbs=update
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=core,resources=configmaps,verbs=get;list;watch
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify Reconcile to compare the state specified by the MyApp object
// against the actual cluster state, and then perform operations to make the cluster state reflect the state specified by the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.16.0/pkg/reconcile
func (r *MyAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
_log := log.FromContext(ctx)
// 1. Fetch the MyApp instance
myapp := &appv1.MyApp{}
err := r.Get(ctx, req.NamespacedName, myapp)
if err != nil {
if errors.IsNotFound(err) {
// Request object not found, could have been deleted after reconcile request.
// Return and don't requeue
_log.Info("MyApp resource not found. Ignoring since object must be deleted")
return ctrl.Result{}, nil
}
// Error reading the object - requeue the request.
_log.Error(err, "Failed to get MyApp")
return ctrl.Result{}, err
}
// Define the MyApp Finalizer
myAppFinalizer := "myapps.app.example.com/finalizer"
// Check if the MyApp instance is marked for deletion
if myapp.ObjectMeta.DeletionTimestamp.IsZero() {
// The object is not being deleted, so if it does not have our finalizer,
// then lets add it and update the object. This is equivalent to registering our finalizer.
if !controllerutil.ContainsFinalizer(myapp, myAppFinalizer) {
controllerutil.AddFinalizer(myapp, myAppFinalizer)
if err := r.Update(ctx, myapp); err != nil {
_log.Error(err, "Failed to add finalizer to MyApp")
return ctrl.Result{}, err
}
}
} else {
// The object is being deleted
if controllerutil.ContainsFinalizer(myapp, myAppFinalizer) {
// Our finalizer is present, so we can do any cleanup
_log.Info("Performing finalizer cleanup for MyApp", "name", myapp.Name)
// 2. Perform cleanup (delete associated Deployment and Service)
// Delete Deployment
deployment := &appsv1.Deployment{}
err := r.Get(ctx, types.NamespacedName{Name: myapp.Name, Namespace: myapp.Namespace}, deployment)
if err == nil { // Deployment exists, delete it
_log.Info("Deleting associated Deployment", "Deployment.Name", deployment.Name)
if err := r.Delete(ctx, deployment); err != nil {
if !errors.IsNotFound(err) {
_log.Error(err, "Failed to delete Deployment for MyApp", "Deployment.Name", deployment.Name)
return ctrl.Result{}, err
}
}
} else if !errors.IsNotFound(err) {
_log.Error(err, "Failed to get associated Deployment during cleanup", "Deployment.Name", myapp.Name)
return ctrl.Result{}, err
}
// Delete Service
service := &corev1.Service{}
err = r.Get(ctx, types.NamespacedName{Name: myapp.Name, Namespace: myapp.Namespace}, service)
if err == nil { // Service exists, delete it
_log.Info("Deleting associated Service", "Service.Name", service.Name)
if err := r.Delete(ctx, service); err != nil {
if !errors.IsNotFound(err) {
_log.Error(err, "Failed to delete Service for MyApp", "Service.Name", service.Name)
return ctrl.Result{}, err
}
}
} else if !errors.IsNotFound(err) {
_log.Error(err, "Failed to get associated Service during cleanup", "Service.Name", myapp.Name)
return ctrl.Result{}, err
}
// Remove our finalizer from the list and update it.
controllerutil.RemoveFinalizer(myapp, myAppFinalizer)
if err := r.Update(ctx, myapp); err != nil {
_log.Error(err, "Failed to remove finalizer from MyApp")
return ctrl.Result{}, err
}
}
// Stop reconciliation as the object is being deleted and cleanup is done
return ctrl.Result{}, nil
}
// 3. Define the desired Deployment
deployment := r.deploymentForMyApp(myapp)
// Set MyApp instance as the owner and controller of the Deployment
// This ensures that the Deployment is garbage-collected when the MyApp is deleted
if err := ctrl.SetControllerReference(myapp, deployment, r.Scheme); err != nil {
_log.Error(err, "Failed to set controller reference for Deployment")
return ctrl.Result{}, err
}
// Check if the Deployment already exists
foundDeployment := &appsv1.Deployment{}
err = r.Get(ctx, types.NamespacedName{Name: deployment.Name, Namespace: deployment.Namespace}, foundDeployment)
if err != nil && errors.IsNotFound(err) {
_log.Info("Creating a new Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
err = r.Create(ctx, deployment)
if err != nil {
_log.Error(err, "Failed to create new Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
return ctrl.Result{}, err
}
// Deployment created successfully - requeue so the next reconcile
// observes the new object and can populate status accurately
_log.Info("Successfully created Deployment", "Deployment.Name", deployment.Name)
return ctrl.Result{Requeue: true}, nil
} else if err != nil {
_log.Error(err, "Failed to get Deployment")
return ctrl.Result{}, err
} else {
// Deployment already exists, check if its spec matches the desired state
// We only care about the fields that our MyApp controller manages
if !reflect.DeepEqual(deployment.Spec.Replicas, foundDeployment.Spec.Replicas) ||
!reflect.DeepEqual(deployment.Spec.Template.Spec.Containers[0].Image, foundDeployment.Spec.Template.Spec.Containers[0].Image) ||
!reflect.DeepEqual(deployment.Spec.Template.Spec.Containers[0].Env, foundDeployment.Spec.Template.Spec.Containers[0].Env) {
_log.Info("Updating existing Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
foundDeployment.Spec.Replicas = deployment.Spec.Replicas
foundDeployment.Spec.Template.Spec.Containers[0].Image = deployment.Spec.Template.Spec.Containers[0].Image
foundDeployment.Spec.Template.Spec.Containers[0].Env = deployment.Spec.Template.Spec.Containers[0].Env
if err := r.Update(ctx, foundDeployment); err != nil {
_log.Error(err, "Failed to update Deployment", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
return ctrl.Result{}, err
}
_log.Info("Successfully updated Deployment", "Deployment.Name", deployment.Name)
}
}
// 4. Define the desired Service
service := r.serviceForMyApp(myapp)
// Set MyApp instance as the owner and controller of the Service
if err := ctrl.SetControllerReference(myapp, service, r.Scheme); err != nil {
_log.Error(err, "Failed to set controller reference for Service")
return ctrl.Result{}, err
}
// Check if the Service already exists
foundService := &corev1.Service{}
err = r.Get(ctx, types.NamespacedName{Name: service.Name, Namespace: service.Namespace}, foundService)
if err != nil && errors.IsNotFound(err) {
_log.Info("Creating a new Service", "Service.Namespace", service.Namespace, "Service.Name", service.Name)
err = r.Create(ctx, service)
if err != nil {
_log.Error(err, "Failed to create new Service", "Service.Namespace", service.Namespace, "Service.Name", service.Name)
return ctrl.Result{}, err
}
// Service created successfully - requeue so the next reconcile
// observes the new object and can populate status accurately
_log.Info("Successfully created Service", "Service.Name", service.Name)
return ctrl.Result{Requeue: true}, nil
} else if err != nil {
_log.Error(err, "Failed to get Service")
return ctrl.Result{}, err
} else {
// Service already exists, check if its spec matches the desired state
// We only care about the fields that our MyApp controller manages
// Note: Service.Spec.ClusterIP is immutable, so we don't compare or modify it.
// We compare ports and selector.
if !reflect.DeepEqual(service.Spec.Ports, foundService.Spec.Ports) ||
!reflect.DeepEqual(service.Spec.Selector, foundService.Spec.Selector) {
_log.Info("Updating existing Service", "Service.Namespace", service.Namespace, "Service.Name", service.Name)
foundService.Spec.Ports = service.Spec.Ports
foundService.Spec.Selector = service.Spec.Selector
if err := r.Update(ctx, foundService); err != nil {
_log.Error(err, "Failed to update Service", "Service.Namespace", service.Namespace, "Service.Name", service.Name)
return ctrl.Result{}, err
}
_log.Info("Successfully updated Service", "Service.Name", service.Name)
}
}
// 5. Update MyApp status
myappStatus := appv1.MyAppStatus{
AvailableReplicas: foundDeployment.Status.AvailableReplicas,
DeploymentName: foundDeployment.Name,
ServiceName: foundService.Name,
}
if !reflect.DeepEqual(myapp.Status, myappStatus) {
myapp.Status = myappStatus
_log.Info("Updating MyApp status", "MyApp.Namespace", myapp.Namespace, "MyApp.Name", myapp.Name)
if err := r.Status().Update(ctx, myapp); err != nil {
_log.Error(err, "Failed to update MyApp status")
return ctrl.Result{}, err
}
_log.Info("Successfully updated MyApp status", "MyApp.Name", myapp.Name)
}
return ctrl.Result{}, nil
}
// deploymentForMyApp returns a MyApp Deployment object
func (r *MyAppReconciler) deploymentForMyApp(m *appv1.MyApp) *appsv1.Deployment {
labels := labelsForMyApp(m.Name)
replicas := m.Spec.Replicas
dep := &appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: m.Name,
Namespace: m.Namespace,
Labels: labels,
},
Spec: appsv1.DeploymentSpec{
Replicas: &replicas,
Selector: &metav1.LabelSelector{
MatchLabels: labels,
},
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: labels,
},
Spec: corev1.PodSpec{
Containers: []corev1.Container{{
Name: "app",
Image: m.Spec.Image,
Ports: []corev1.ContainerPort{{
ContainerPort: m.Spec.Port,
Name: "http",
}},
Env: r.getEnvVarsForMyApp(m),
}},
},
},
},
}
return dep
}
// serviceForMyApp returns a MyApp Service object
func (r *MyAppReconciler) serviceForMyApp(m *appv1.MyApp) *corev1.Service {
labels := labelsForMyApp(m.Name)
service := &corev1.Service{
ObjectMeta: metav1.ObjectMeta{
Name: m.Name,
Namespace: m.Namespace,
Labels: labels,
},
Spec: corev1.ServiceSpec{
Selector: labels,
Ports: []corev1.ServicePort{{
Protocol: corev1.ProtocolTCP,
Port: m.Spec.Port,
TargetPort: intstr.FromInt(int(m.Spec.Port)),
Name: "http",
}},
Type: corev1.ServiceTypeClusterIP, // Or NodePort, LoadBalancer
},
}
return service
}
// labelsForMyApp returns the labels for selecting the resources
// belonging to the given MyApp CR name.
func labelsForMyApp(name string) map[string]string {
return map[string]string{"app": "myapp", "myapp_cr": name}
}
// getEnvVarsForMyApp generates environment variables for the application container
func (r *MyAppReconciler) getEnvVarsForMyApp(m *appv1.MyApp) []corev1.EnvVar {
var envVars []corev1.EnvVar
// Add port as an environment variable
envVars = append(envVars, corev1.EnvVar{
Name: "APP_PORT",
Value: fmt.Sprintf("%d", m.Spec.Port),
})
// If configMapRef is provided, try to load its data as environment variables
if m.Spec.ConfigMapRef != "" {
configMap := &corev1.ConfigMap{}
err := r.Get(context.Background(), types.NamespacedName{Name: m.Spec.ConfigMapRef, Namespace: m.Namespace}, configMap)
if err != nil {
// Log error but don't fail reconciliation, maybe ConfigMap will appear later
// A more robust solution might involve watching the ConfigMap and requeueing
log.Log.Error(err, "Failed to get ConfigMap referenced by MyApp", "ConfigMap.Name", m.Spec.ConfigMapRef)
} else {
for k, v := range configMap.Data {
envVars = append(envVars, corev1.EnvVar{
Name: k,
Value: v,
})
}
}
}
return envVars
}
// SetupWithManager sets up the controller with the Manager.
func (r *MyAppReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&appv1.MyApp{}). // Watch MyApp resources
Owns(&appsv1.Deployment{}). // Watch Deployments owned by MyApp
Owns(&corev1.Service{}). // Watch Services owned by MyApp
Watches(&corev1.ConfigMap{}, handler.EnqueueRequestsFromMapFunc(
func(ctx context.Context, o client.Object) []reconcile.Request {
// Map function to find MyApps that reference this ConfigMap
var myapps appv1.MyAppList
if err := r.List(ctx, &myapps, client.InNamespace(o.GetNamespace())); err != nil {
log.Log.Error(err, "failed to list myapps for configmap event")
return nil
}
var reqs []reconcile.Request
for _, myapp := range myapps.Items {
if myapp.Spec.ConfigMapRef == o.GetName() {
reqs = append(reqs, reconcile.Request{
NamespacedName: types.NamespacedName{
Namespace: myapp.Namespace,
Name: myapp.Name,
},
})
}
}
return reqs
},
)).
Complete(r)
}
Key Points in the Reconcile function:
- `r.Get(ctx, req.NamespacedName, myapp)`: Fetches the `MyApp` object. `errors.IsNotFound` is crucial for handling deletions gracefully.
- Finalizers: The `myAppFinalizer` is used to implement custom cleanup logic before the `MyApp` resource is actually removed from `etcd`. Without it, Kubernetes garbage collection would immediately delete dependent resources, but your controller wouldn't get a chance to perform external cleanup (like unregistering an API from an API Gateway, or external database cleanup).
- `ctrl.SetControllerReference(myapp, deployment, r.Scheme)`: This is extremely important. It sets an `OwnerReference` on the `Deployment` (and `Service`) pointing back to the `MyApp` resource. This enables Kubernetes' garbage collector to automatically delete the `Deployment` when the `MyApp` resource is deleted. It also allows our controller to easily identify which `MyApp` instance owns a particular `Deployment` or `Service`.
- Idempotent logic: Notice how we first `Get` the `Deployment` (or `Service`). If it doesn't exist (`errors.IsNotFound`), we `Create` it. If it exists, we `Update` it when its `Spec` doesn't match our desired state. This makes our controller robust to repeated reconciliation calls.
- `reflect.DeepEqual`: Used for comparing the desired `Spec` with the actual `Spec` of Kubernetes resources.
- `r.Status().Update(ctx, myapp)`: Specifically updates the `status` subresource of the `MyApp` object. It's good practice to separate `spec` updates (from users) and `status` updates (from controllers).
- `getEnvVarsForMyApp`: Demonstrates reading an additional resource (`ConfigMap`) and injecting its data. The `SetupWithManager` function adds a `Watches` clause for `ConfigMap` to ensure that if a referenced `ConfigMap` changes, the owning `MyApp` is re-reconciled.
SetupWithManager function:
- `For(&appv1.MyApp{})`: Tells the controller to watch `MyApp` resources and trigger reconciliation whenever they are added, updated, or deleted.
- `Owns(&appsv1.Deployment{})`: Tells the controller to watch `Deployment` resources. If a `Deployment` owned by a `MyApp` resource changes (e.g., gets deleted accidentally), a reconciliation is triggered for the owning `MyApp`.
- `Owns(&corev1.Service{})`: The same as above, but for `Service` resources.
- `Watches(&corev1.ConfigMap{}, handler.EnqueueRequestsFromMapFunc(...))`: An example of explicitly watching a resource that is not directly owned but referenced by your CRD. If a `ConfigMap` changes, the map function iterates through all `MyApp`s in the same namespace to find any that reference the changed `ConfigMap`, and then triggers reconciliation for those `MyApp` instances. This pattern is crucial for complex dependencies.
Step 5: Build and Deploy the Controller
After implementing the logic, build your controller image and deploy it to your Kubernetes cluster.
- Generate RBAC and CRD manifests:

  ```bash
  make manifests
  ```

  This command generates (or updates) the `config/rbac` (Role-Based Access Control) manifests and the `config/crd` (Custom Resource Definition) manifests based on your Go types and `+kubebuilder` markers. The RBAC rules ensure your controller has the necessary permissions to `get`, `list`, `watch`, `create`, `update`, `patch`, and `delete` the `MyApp` resources, Deployments, Services, and ConfigMaps.

- Install the CRDs into your cluster:

  ```bash
  make install
  ```

  This command applies the CRD definition to your Kubernetes cluster. Now the API server knows about `MyApp` resources.

- Build the Docker image:

  ```bash
  make docker-build IMG=yourorg/myapp-controller:v1.0.0
  ```

  Replace `yourorg/myapp-controller:v1.0.0` with your desired image name and tag.

- Push the Docker image:

  ```bash
  docker push yourorg/myapp-controller:v1.0.0
  ```

- Deploy the controller to your cluster:

  ```bash
  make deploy IMG=yourorg/myapp-controller:v1.0.0
  ```

  This command updates the `config/manager/manager.yaml` manifest with your image name and deploys the controller (as a Deployment) to your Kubernetes cluster, typically in the `myapp-controller-system` namespace.
Step 6: Test Your Controller
Now, create an instance of your MyApp custom resource and observe your controller in action!
- Create a sample `MyApp` instance. Create a file `config/samples/app_v1_myapp.yaml`:

  ```yaml
  apiVersion: app.example.com/v1
  kind: MyApp
  metadata:
    name: myapp-sample
    namespace: default  # Adapt the namespace to your needs
  spec:
    image: nginx:1.23.3  # Use a real image
    replicas: 3
    port: 80
    # configMapRef: myapp-config  # Optional: if you want to test ConfigMap integration
  ```

- Apply the `MyApp` resource:

  ```bash
  kubectl apply -f config/samples/app_v1_myapp.yaml
  ```

- Verify the resources:
  - Check your `MyApp` status:

    ```bash
    kubectl get myapp myapp-sample -o yaml
    ```

    You should see the `status` field being populated by your controller, showing `availableReplicas`, `deploymentName`, and `serviceName`.
  - Check the Deployment:

    ```bash
    kubectl get deployment myapp-sample
    ```

  - Check the Service:

    ```bash
    kubectl get service myapp-sample
    ```

  - Check the Pods:

    ```bash
    kubectl get pods -l app=myapp,myapp_cr=myapp-sample
    ```

- Test updates: Edit `config/samples/app_v1_myapp.yaml` to change `replicas` to `5` or the `image` to `nginx:1.24.0`, then run `kubectl apply -f ...` again. Observe the Deployment updating.
- Test deletion:

  ```bash
  kubectl delete myapp myapp-sample
  ```

  Observe the Deployment and Service being deleted by your controller's finalizer logic.
This step-by-step process provides a solid foundation for building a functional Kubernetes controller that effectively watches and reacts to changes in your Custom Resource Definitions.
Advanced Controller Concepts
Building a basic controller is a great start, but real-world operators often require more sophisticated mechanisms to ensure robustness, safety, and a complete lifecycle management experience. This section explores several advanced controller concepts that are crucial for developing production-grade operators.
Finalizers: Ensuring Graceful Cleanup
We briefly touched upon finalizers in our Reconcile function. Finalizers are a powerful mechanism in Kubernetes that allows controllers to perform cleanup logic on dependent external resources before a Kubernetes object is truly deleted from etcd. Without finalizers, when you delete a custom resource, Kubernetes' garbage collector might immediately delete all owned Kubernetes objects (like Deployments, Services), leaving no opportunity for your controller to clean up external resources or perform complex internal cleanup.
How Finalizers Work:
- Registration: When your controller reconciles a custom resource that doesn't yet carry its finalizer, it adds a specific string (e.g., `myapps.app.example.com/finalizer`) to the `metadata.finalizers` array of that custom resource.
- Deletion Request: When a user issues a `kubectl delete myapp myapp-instance` command, Kubernetes does not immediately remove the resource from `etcd`. Instead, it sets the `metadata.deletionTimestamp` field on the object.
- Reconciliation Trigger: Your controller's `Reconcile` function is called because the `deletionTimestamp` has been set.
- Cleanup Logic: Inside `Reconcile`, the controller checks if `deletionTimestamp` is set and if its finalizer is present. If both are true, it executes its cleanup logic. This could involve:
  - Deleting dependent Kubernetes resources (if not handled by `OwnerReference`).
  - Interacting with external systems (e.g., deleting cloud resources, unregistering from an API Gateway like APIPark).
  - Persisting audit logs.
- Finalizer Removal: Once all cleanup is complete, the controller removes its finalizer string from the `metadata.finalizers` array of the custom resource.
- Actual Deletion: Only after the `metadata.finalizers` array is empty will Kubernetes finally delete the resource from `etcd`.
This pattern guarantees that your controller has the opportunity to perform necessary cleanup, ensuring that your custom resource's deletion is complete and consistent across both Kubernetes and any external systems it manages.
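The lifecycle above can be modeled in a few lines of plain Go. This toy model (no Kubernetes libraries involved, all names here are illustrative) only demonstrates the ordering guarantee: an object with a deletion timestamp is actually removed only once its finalizer list is empty:

```go
package main

import "fmt"

// obj is a toy stand-in for a custom resource's metadata.
type obj struct {
	deletionTimestampSet bool
	finalizers           []string
}

// markForDeletion mimics `kubectl delete`: it sets the deletion
// timestamp instead of removing the object outright.
func markForDeletion(o *obj) { o.deletionTimestampSet = true }

// removeFinalizer mimics the controller removing its finalizer after cleanup.
func removeFinalizer(o *obj, name string) {
	kept := o.finalizers[:0]
	for _, f := range o.finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	o.finalizers = kept
}

// canBeDeleted mirrors the API server's rule: actual deletion happens only
// when the deletion timestamp is set AND no finalizers remain.
func canBeDeleted(o *obj) bool {
	return o.deletionTimestampSet && len(o.finalizers) == 0
}

func main() {
	o := &obj{finalizers: []string{"myapps.app.example.com/finalizer"}}
	markForDeletion(o)
	fmt.Println("deletable before cleanup:", canBeDeleted(o)) // false
	removeFinalizer(o, "myapps.app.example.com/finalizer")
	fmt.Println("deletable after cleanup:", canBeDeleted(o)) // true
}
```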
Webhooks (Admission Controllers): Enforcing Policies and Mutating Resources
While CRD schema validation is powerful for basic structural and type checks, it has limitations. It cannot perform:
- Cross-field validation (e.g., "if field A is X, then field B must be Y").
- Dynamic validation (e.g., checking if a referenced `ConfigMap` actually exists).
- Mutating logic: automatically setting default values that are dynamically determined.
This is where Webhooks, specifically Admission Controllers, come into play. Admission controllers are plugins that intercept requests to the Kubernetes API Server before an object is persisted to etcd (but after authentication and authorization). There are two main types:
- Validating Admission Webhooks: These webhooks perform custom validation logic. If the webhook rejects the request, the object is not created or updated.
- Use Cases: Enforcing complex business rules, validating uniqueness across the cluster, ensuring references to other resources are valid.
- Example: For our `MyApp` CRD, a validating webhook could check if the `image` specified is from an approved registry, or if the `port` is not already in use by another `MyApp` instance.
- Mutating Admission Webhooks: These webhooks can modify the request object before it is persisted.
- Use Cases: Injecting sidecar containers, adding labels/annotations, setting dynamic default values, transforming resource specs.
- Example: A mutating webhook could automatically inject an Istio sidecar proxy into our `MyApp`'s Pods, or add default resource limits if not specified by the user.
Implementing Webhooks with Kubebuilder:
Kubebuilder provides excellent support for generating webhook boilerplate. You can create them using:
```bash
kubebuilder create webhook --group app --version v1 --kind MyApp --defaulting --programmatic-validation
```
This command generates a webhook.go file with functions for defaulting and validating your MyApp resources. Webhooks run within the same controller binary, exposing HTTPS endpoints that the Kubernetes API Server calls. You'll also need to configure ValidatingWebhookConfiguration and MutatingWebhookConfiguration resources, which Kubebuilder can generate for you.
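For orientation, a `ValidatingWebhookConfiguration` looks roughly like the sketch below. Kubebuilder generates the real one for you; the service name, namespace, and path here are illustrative and will match whatever your scaffolding produced:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: myapp-validating-webhook    # illustrative name
webhooks:
  - name: vmyapp.kb.io
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail             # reject requests if the webhook is unreachable
    clientConfig:
      service:
        name: webhook-service                  # illustrative; set by your scaffolding
        namespace: myapp-controller-system
        path: /validate-app-example-com-v1-myapp
    rules:
      - apiGroups: ["app.example.com"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["myapps"]
```

Note `failurePolicy: Fail`: with this setting a down webhook blocks writes to `MyApp` resources, which is exactly the reliability trade-off discussed below.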
Considerations for Webhooks:
- Performance: Webhooks are in the critical path of API requests. Ensure they are performant and don't introduce significant latency.
- Reliability: Webhooks must be highly available. If a webhook is down or unresponsive, it can block API operations.
- Idempotency: Mutating webhooks should be idempotent.
- Ordering: The order in which webhooks are called can matter, especially if multiple webhooks mutate the same resource.
Scalability and Performance
As your cluster grows and the number of custom resources increases, your controller's scalability and performance become critical.
- Multiple Replicas: You can run multiple replicas of your controller Deployment. `controller-runtime` (the library used by Kubebuilder) handles leader election using a Lease object, ensuring that only one replica actively reconciles at a time while the others stand by, ready to take over if the leader fails. This prevents race conditions between replicas.
- Resource Limits: Define appropriate CPU and memory limits for your controller Pods to prevent resource exhaustion and ensure stability.
- Efficient Watchers: Ensure your `Reconcile` logic is efficient. Avoid making unnecessary API calls; rely heavily on informers and listers for cached data.
- Rate Limiting: The workqueue naturally handles rate limiting for retries, but be mindful of how frequently your controller updates `status` or external resources, as this can still hit API limits.
- Sharding: For extremely large clusters or specific workloads, you might consider sharding your controller, where different instances manage resources in different namespaces or based on specific labels.
Security: RBAC for the Controller
Your controller is a privileged entity within the Kubernetes cluster. It needs permissions to get, list, watch, create, update, patch, and delete the resources it manages (your CRDs, Deployments, Services, ConfigMaps, etc.).
Kubebuilder automatically generates the necessary Role-Based Access Control (RBAC) manifests in config/rbac. These include:
- `Role`/`ClusterRole`: Defines the permissions (verbs and resources) your controller needs. A `ClusterRole` is used if your controller manages cluster-scoped resources or needs access across namespaces.
- `ServiceAccount`: The identity your controller Pod runs as.
- `RoleBinding`/`ClusterRoleBinding`: Binds the `ServiceAccount` to the `Role` or `ClusterRole`.
Best Practices for RBAC:
- Least Privilege: Grant your controller only the minimum necessary permissions. Avoid giving it wildcard (*) access to resources or verbs unless absolutely required.
- Namespace Scoping: If your controller only manages resources within specific namespaces, use a Role and RoleBinding instead of a ClusterRole and ClusterRoleBinding where possible.
- Audit: Regularly review your controller's RBAC definitions.
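To make least privilege concrete, here is an illustrative namespace-scoped Role for a hypothetical controller managing a MyApp CRD in the apps.example.com group (all names here are examples, not generated output):

```yaml
# Illustrative least-privilege Role: read/write the CRD and its status,
# full control only over the Deployments the controller owns.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myapp-controller-role
  namespace: myapp-system
rules:
- apiGroups: ["apps.example.com"]
  resources: ["myapps", "myapps/status"]
  verbs: ["get", "list", "watch", "update", "patch"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```

With Kubebuilder, the equivalent permissions are declared as //+kubebuilder:rbac markers above the Reconcile method and regenerated with make manifests.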
By thoughtfully applying these advanced concepts, you can transform a basic custom controller into a robust, secure, and production-ready Kubernetes operator that seamlessly manages complex application lifecycles.
Testing and Deployment
After developing your Kubernetes controller, thoroughly testing it and deploying it effectively are the next critical phases. A robust testing strategy ensures reliability, while a well-planned deployment makes your operator easy to consume and manage in various environments.
Testing Your Controller
Testing a Kubernetes controller can be multifaceted, involving different levels of abstraction. Kubebuilder and controller-runtime provide excellent tools to facilitate this.
- Unit Tests:
- Purpose: To test individual functions or components of your controller in isolation, without involving a Kubernetes cluster.
- Focus: Logic within helper functions, data transformations, parsing, and non-Kubernetes specific code.
- Tools: Standard Go testing framework (go test). You'd typically mock or fake Kubernetes client interactions if a function calls the API server.
- Example: Testing labelsForMyApp or deploymentForMyApp functions to ensure they generate correct labels or Deployment specs given a MyApp object.
- Integration Tests (envtest): A typical integration test in Kubebuilder uses a _test.go file alongside your controller, leveraging ginkgo and gomega for BDD-style testing.
  - Purpose: To test the interaction between your controller and a real (but lightweight) Kubernetes API server. This is where you verify the reconciliation logic in a near-real environment.
  - Focus: The Reconcile loop, how the controller reacts to CRD creations/updates/deletions, and how it manages dependent resources.
  - Tools: envtest (part of controller-runtime/pkg/envtest). envtest downloads Kubernetes binaries (API server, etcd) and starts them locally. It provides a real Kubernetes API endpoint for your controller to interact with, but without the full overhead of a minikube or kind cluster (no kubelet, networking, etc.).
  - Methodology:
    - Start envtest.
    - Install your CRDs into the envtest cluster.
    - Start your controller's Manager against the envtest client.
    - Create instances of your custom resource (e.g., MyApp).
    - Assert that your controller creates/updates/deletes the expected dependent resources (Deployments, Services).
    - Stop envtest.
- Benefits: Fast, reliable, and provides high confidence that your reconciliation logic works as intended without needing a full cluster.
- End-to-End (E2E) Tests:
- Purpose: To test the complete system, including your controller, CRDs, and the Kubernetes cluster, in a scenario that closely mimics production.
- Focus: Verifying the actual runtime behavior, including Pod scheduling, networking, external integrations (if any), and full lifecycle.
- Tools: Can use kind, minikube, or a dedicated test cluster. Test frameworks like e2e.test from Kubernetes itself, or custom Go tests that interact with kubectl.
- Methodology:
- Deploy your controller and CRDs to a test cluster.
- Create complex scenarios involving your custom resources.
- Verify that Pods are running, services are accessible, and all expected side effects (e.g., external API calls, database entries) occur.
- Test upgrades, scaling, and fault tolerance.
- Considerations: E2E tests are slower and more resource-intensive but provide the highest confidence in the overall solution.
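As an illustrative command sequence (the image name and cluster name are examples; make deploy and the IMG variable follow standard Kubebuilder Makefile conventions), an E2E run against a throwaway kind cluster might look like:

```shell
# Create a disposable cluster and deploy the controller into it.
kind create cluster --name controller-e2e
make docker-build IMG=example.com/myapp-controller:e2e
kind load docker-image example.com/myapp-controller:e2e --name controller-e2e
make deploy IMG=example.com/myapp-controller:e2e

# Exercise the operator with sample custom resources.
kubectl apply -f config/samples/
# ...run assertions via kubectl or a Go test suite...

# Tear everything down.
kind delete cluster --name controller-e2e
```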
Deployment Strategies
Once your controller is thoroughly tested, the next step is to package and deploy it.
- Kubernetes Manifests (YAML):
- Kubebuilder Output: The config/ directory generated by Kubebuilder contains all the necessary YAML manifests:
  - config/crd/bases/: Your CRD definitions.
  - config/rbac/: The Role, ClusterRole, ServiceAccount, RoleBinding, and ClusterRoleBinding for your controller.
  - config/manager/: The Deployment for your controller and the Service to expose its webhook (if you have one).
  - config/default/: Kustomize base for applying all these manifests.
- Deployment: You can apply these directly using kubectl apply -k config/default/.
- Customization: For different environments (dev, staging, prod), you can use Kustomize overlays to modify resources (e.g., change replica count, image tag, namespace).
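For example, a production overlay that pins the image tag might look like the sketch below (Kubebuilder names the manager image controller in its default Kustomize base; the paths and image name are illustrative):

```yaml
# overlays/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../default
images:
- name: controller
  newName: example.com/myapp-controller
  newTag: v1.0.0
```

You would then deploy that environment with kubectl apply -k overlays/prod/.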
- Helm Charts:
- Purpose: Helm is the de facto package manager for Kubernetes. It allows you to define, install, and upgrade even the most complex Kubernetes applications.
- Benefits:
- Version Management: Easy to manage different versions of your controller.
- Customization: Users can easily configure your controller via values.yaml (e.g., image, replica count, resource limits, webhook configurations).
- Dependency Management: If your controller depends on other components (e.g., Prometheus for monitoring), Helm can manage those dependencies.
- Release Management: Simplified rollback and upgrade procedures.
- Creating a Helm Chart: You can convert your Kubebuilder-generated manifests into a Helm chart. Many operators provide both raw YAML and Helm charts. Kubebuilder doesn't directly generate Helm charts, but there are community tools or manual processes to convert.
- Deployment: Users can install your controller with helm install my-controller ./my-controller-chart.
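A minimal values.yaml for such a chart might expose knobs like the following (field names are common chart conventions, not a standard; the image repository is an example):

```yaml
# values.yaml (illustrative) for a controller Helm chart
image:
  repository: example.com/myapp-controller
  tag: v1.0.0
  pullPolicy: IfNotPresent
replicaCount: 1
resources:
  limits:
    cpu: 500m
    memory: 256Mi
```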
- Container Image Management:
- Your controller runs as a container within your Kubernetes cluster. You need to:
  - Build the Image: Use make docker-build to create a Docker image of your controller.
  - Push to Registry: Push the image to a container registry (e.g., Docker Hub, GCR, Quay.io) that your Kubernetes cluster can access. This is done with docker push.
  - Image Pull Policy: Configure an appropriate imagePullPolicy in your controller's Deployment (e.g., Always for development, IfNotPresent for production).
  - Image Tagging: Use clear and consistent image tags (e.g., v1.0.0) rather than relying on mutable tags like latest.
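Assuming a standard Kubebuilder Makefile, the build and push steps collapse into a single command (the image name is an example):

```shell
make docker-build docker-push IMG=example.com/myapp-controller:v1.0.0
```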
Effective testing and a streamlined deployment process are essential for the adoption and success of your custom Kubernetes controller. They reduce operational burden, instill confidence in the solution, and make it easier for users to leverage your custom resource definitions and the automation they unlock.
Conclusion
The journey of building a Kubernetes controller to watch for changes to Custom Resource Definitions (CRDs) is a profound exploration into the heart of Kubernetes' extensibility. We've traversed from the fundamental concepts of the Kubernetes API and the symbiotic relationship between CRDs and controllers, through the meticulous design of custom resources, and into the practical implementation of the reconciliation loop using powerful tools like Kubebuilder. We've also delved into advanced considerations such as finalizers for graceful cleanup, webhooks for sophisticated policy enforcement, and crucial aspects of testing and deployment.
By extending the Kubernetes API with your own CRDs and automating their lifecycle with custom controllers, you transform Kubernetes from a generic orchestrator into a highly specialized platform tailored to your unique operational needs. This "Operator pattern" empowers you to codify human operational knowledge, making complex application management declarative, repeatable, and resilient. Whether you're managing complex database deployments, integrating proprietary systems, or orchestrating AI Gateway functionalities, controllers provide the automation backbone.
Consider, for instance, the challenges of deploying and managing various API-driven services, especially in the realm of Artificial Intelligence. A custom controller could define AIModelDeployment CRDs, automating the provisioning of inference servers. However, exposing these diverse AI services and traditional RESTful APIs securely and efficiently to consumers requires an additional layer. This is where platforms like APIPark become invaluable. As an open-source AI Gateway and API management platform, APIPark complements your controller by providing unified management, authentication, and traffic control for the APIs your custom resources bring to life. It ensures that the services orchestrated by your custom controller are not only robust internally but also consumable and governed externally, offering a seamless bridge between Kubernetes' powerful extensibility and comprehensive API lifecycle management.
Embracing CRDs and controllers opens up a world of possibilities for automating virtually any operational task within your Kubernetes clusters. The ability to define custom APIs and build intelligent operators around them fundamentally shifts the paradigm of infrastructure and application management, moving towards a truly declarative and self-healing system. As you continue to build and refine your controllers, remember the principles of idempotency, robust error handling, thoughtful API design, and least privilege. These tenets will guide you in creating powerful, stable, and maintainable Kubernetes operators that enhance efficiency, security, and data optimization for your entire organization.
Frequently Asked Questions (FAQs)
1. What is the fundamental difference between a Custom Resource Definition (CRD) and a Custom Resource (CR)?
A Custom Resource Definition (CRD) is a schema definition that extends the Kubernetes API, allowing you to introduce a new type of object into your cluster (e.g., MyApp). It's like defining a new table schema in a database. A Custom Resource (CR) is an instance of that custom type (e.g., myapp-sample). Just as Deployment is a built-in resource type and my-frontend-deployment is a specific instance of it, the CRD defines the blueprint, and a CR is an actual object conforming to that blueprint.
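For illustration, a CR for the hypothetical MyApp type (the group, version, and fields are example values, not a real API) might look like:

```yaml
# An instance (CR) of the custom type defined by the MyApp CRD
apiVersion: apps.example.com/v1
kind: MyApp
metadata:
  name: myapp-sample
  namespace: default
spec:
  replicas: 3
```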
2. Why do I need a controller if I can just define my custom resources with a CRD?
While a CRD allows the Kubernetes API server to store and validate your custom resources, Kubernetes itself doesn't inherently understand how to act upon them. A controller is the active component that provides the operational logic. It continuously watches for creations, updates, or deletions of your custom resources and then takes specific actions (e.g., creating Deployments, Services, or interacting with external systems) to ensure the actual state of the cluster matches the desired state defined in your CR. Without a controller, your custom resources are merely inert data objects within the API.
3. What is the "reconciliation loop" and why is it so central to Kubernetes controllers?
The reconciliation loop is the core operational pattern of all Kubernetes controllers. It's an endless process where the controller constantly observes the cluster's actual state, compares it to the desired state defined in its watched resources (like a CRD instance's spec field), and then takes corrective actions to converge the actual state towards the desired state. This loop makes controllers resilient to failures, self-healing, and ensures the system consistently meets its defined configuration, even in the face of unexpected changes or transient errors. It's designed to be idempotent, meaning performing the same actions multiple times produces the same result without unintended side effects.
4. How do I prevent my controller from making too many API calls and overloading the Kubernetes API server?
Several mechanisms help in optimizing API interactions. Firstly, controllers primarily rely on informers and listers. Informers maintain a local, in-memory cache of watched resources, and listers provide read-only access to this cache. This significantly reduces direct calls to the API server for read operations. Secondly, the workqueue used by controllers deduplicates events, preventing redundant reconciliation attempts for the same resource within a short timeframe. Finally, implementing intelligent reconciliation logic that only performs updates when necessary (by comparing current vs. desired states of secondary resources) further minimizes API writes. For scenarios where your custom resources manage external services, using an API Gateway like APIPark can offload complex traffic management, security, and analytics tasks, further reducing the API load on your Kubernetes cluster by externalizing these concerns.
5. When should I consider using a webhook (Admission Controller) for my CRD, beyond just schema validation?
You should consider using a webhook when your validation or mutation logic becomes too complex for basic OpenAPI v3 schema validation, or when it needs to interact with the cluster's live state.
Use a Validating Webhook for:
- Cross-field validation (e.g., "if replicas > 5, then image must be from a trusted registry").
- Dynamic validation based on other resources (e.g., "ensure the referenced ConfigMap actually exists").
- Enforcing cluster-wide unique constraints that aren't handled by Kubernetes metadata.
Use a Mutating Webhook for:
- Dynamically setting default values that depend on the request context or other cluster state.
- Injecting sidecar containers or specific labels/annotations based on resource characteristics.
- Transforming or sanitizing resource definitions before they are persisted.
Webhooks are powerful but must be designed for performance and reliability, as they are in the critical path of all API requests.
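For reference, registering such a validating webhook uses an admissionregistration.k8s.io manifest like the sketch below (the service name, namespace, and path follow Kubebuilder's conventions but the group and resource names are illustrative):

```yaml
# Illustrative registration of a validating webhook for MyApp resources
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: myapp-validating-webhook
webhooks:
- name: vmyapp.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  rules:
  - apiGroups: ["apps.example.com"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["myapps"]
  clientConfig:
    service:
      name: webhook-service
      namespace: myapp-system
      path: /validate-apps-example-com-v1-myapp
```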
🚀 You can securely and efficiently call the OpenAI API on APIPark in just two steps:
Step 1: Deploy the APIPark AI gateway in 5 minutes.
APIPark is developed based on Golang, offering strong product performance and low development and maintenance costs. You can deploy APIPark with a single command line.
curl -sSO https://download.apipark.com/install/quick-start.sh; bash quick-start.sh

In my experience, you can see the successful deployment interface within 5 to 10 minutes. Then, you can log in to APIPark using your account.

Step 2: Call the OpenAI API.

