Container Checkpointing in Kubernetes With a Custom API

Problem Statement

Challenge

Organizations running containerized applications in Kubernetes often need to capture and preserve the state of running containers for:

  • Disaster recovery
  • Application migration
  • Debug/troubleshooting
  • State preservation
  • Environment reproduction

However, there’s no straightforward, automated way to:

  1. Create container checkpoints on-demand
  2. Store these checkpoints in a standardized format
  3. Make them easily accessible across clusters
  4. Trigger checkpointing through a standard interface

Current Limitations

  • Manual checkpoint creation requires direct cluster access
  • No standardized storage format for checkpoints
  • Limited integration with container registries
  • Lack of programmatic access for automation
  • Complex coordination between containerd and storage systems

Solution

A Kubernetes sidecar service that:

  1. Exposes checkpoint functionality via REST API
  2. Automatically converts checkpoints to OCI-compliant images
  3. Stores images in ECR for easy distribution
  4. Integrates with existing Kubernetes infrastructure
  5. Provides a standardized interface for automation

This solves the core problems by:

  • Automating the checkpoint process
  • Standardizing checkpoint storage
  • Making checkpoints portable
  • Enabling programmatic access
  • Simplifying integration with existing workflows

Target users:

  • DevOps teams
  • Platform engineers
  • Application developers
  • Site Reliability Engineers (SREs)

Forensic container checkpointing is based on Checkpoint/Restore In Userspace (CRIU) and allows the creation of stateful copies of a running container without the container knowing that it is being checkpointed. The copy of the container can be analyzed and restored in a sandbox environment multiple times without the original container being aware of it. Forensic container checkpointing was introduced as an alpha feature in Kubernetes v1.25.

This article walks through deploying Go code that takes a container checkpoint through an API.

The code accepts a pod identifier as input, retrieves the corresponding container ID from containerd, and then uses the ctr command to checkpoint that container in containerd's k8s.io namespace.

Prerequisites

  • Kubernetes cluster
  • The ctr command-line tool: check that you can run ctr commands on the kubelet or worker node; if not, install it or adjust the AMI to include ctr
  • kubectl configured to communicate with your cluster
  • Docker installed locally
  • Access to a container registry (e.g., Docker Hub, ECR)
  • Helm (for installing Nginx Ingress Controller)

Step 0: Code to Create Container Checkpoint Using GO

Create a file named checkpoint_container.go with the following content:

Go
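The original listing is not reproduced here, so the following is a minimal sketch of what checkpoint_container.go could look like based on the description above. For brevity it accepts the container ID directly as a query parameter rather than resolving it from a pod identifier, and it omits converting and pushing the checkpoint to a registry; the route, port, and parameter name are illustrative assumptions.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os/exec"
)

// checkpointArgs builds the argument list for the ctr invocation that
// checkpoints a container in containerd's k8s.io namespace.
func checkpointArgs(containerID string) []string {
	return []string{"-n", "k8s.io", "tasks", "checkpoint", containerID}
}

// checkpointHandler triggers a checkpoint for the container named in the
// container_id query parameter by shelling out to ctr.
func checkpointHandler(w http.ResponseWriter, r *http.Request) {
	containerID := r.URL.Query().Get("container_id")
	if containerID == "" {
		http.Error(w, "container_id query parameter is required", http.StatusBadRequest)
		return
	}
	out, err := exec.Command("ctr", checkpointArgs(containerID)...).CombinedOutput()
	if err != nil {
		http.Error(w, fmt.Sprintf("checkpoint failed: %v: %s", err, out), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(w, "checkpoint created for container %s\n", containerID)
}

func main() {
	http.HandleFunc("/checkpoint", checkpointHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```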

 

Step 1: Initialize the Go Module

Shell
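For example (the module path is a placeholder; use your own repository path):

```shell
go mod init checkpoint_container
```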

 

Modify the go.mod file:

Go
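The original go.mod contents are not reproduced here. If the service uses only the standard library, a minimal go.mod needs just the module path and the Go version (the module name is a placeholder; Go 1.20 matches the build image used in Step 2):

```go
module checkpoint_container

go 1.20
```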

 

Run the following command:

Shell
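This is most likely the usual dependency tidy-up:

```shell
go mod tidy
```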

 

Step 2: Build and Publish Docker Image

Create a Dockerfile in the same directory:

Dockerfile
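A plausible reconstruction based on the description that follows (binary name and paths are assumptions; package availability in Amazon Linux 2 repositories may vary):

```dockerfile
# Build stage: compile the Go application
FROM golang:1.20 AS build
WORKDIR /app
COPY go.mod checkpoint_container.go ./
RUN CGO_ENABLED=0 go build -o /checkpoint_container checkpoint_container.go

# Final image: Amazon Linux 2 with the tools the service shells out to
FROM amazonlinux:2
RUN yum install -y awscli skopeo && \
    amazon-linux-extras install -y docker
COPY --from=build /checkpoint_container /usr/local/bin/checkpoint_container
ENTRYPOINT ["/usr/local/bin/checkpoint_container"]
```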

 

This Dockerfile does the following:

  1. Uses golang:1.20 as the build stage to compile your Go application.
  2. Uses amazonlinux:2 as the final base image.
  3. Installs the AWS CLI, Docker (which includes containerd), and skopeo using yum and amazon-linux-extras.
  4. Copies the compiled Go binary from the build stage.
Shell
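Build and push the image (the image name and tag are placeholders):

```shell
docker build -t <your-docker-repo>/checkpoint-container:latest .
docker push <your-docker-repo>/checkpoint-container:latest
```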

 

Replace <your-docker-repo> with your actual Docker repository.

Step 3: Apply the RBAC resources

Create a file named rbac.yaml:

YAML
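A minimal sketch granting the service read access to pods so it can resolve container IDs (all resource names here are assumptions; adjust the namespace to where you deploy):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: checkpoint-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: checkpoint-role
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: checkpoint-rolebinding
subjects:
  - kind: ServiceAccount
    name: checkpoint-sa
    namespace: default
roleRef:
  kind: ClusterRole
  name: checkpoint-role
  apiGroup: rbac.authorization.k8s.io
```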

 

Apply the RBAC resources:

Shell
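```shell
kubectl apply -f rbac.yaml
```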

 

Step 4: Create a Kubernetes Deployment

Create a file named deployment.yaml:

YAML
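A sketch of the deployment: the container needs privileged access and the host's containerd socket so that ctr can drive containerd. Names, labels, and the service-account reference are placeholders that must match your rbac.yaml and registry:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkpoint-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkpoint-api
  template:
    metadata:
      labels:
        app: checkpoint-api
    spec:
      serviceAccountName: checkpoint-sa
      containers:
        - name: checkpoint-container
          image: <your-docker-repo>/checkpoint-container:latest
          ports:
            - containerPort: 8080
          securityContext:
            privileged: true   # required so ctr can talk to containerd/CRIU
          volumeMounts:
            - name: containerd-sock
              mountPath: /run/containerd/containerd.sock
      volumes:
        - name: containerd-sock
          hostPath:
            path: /run/containerd/containerd.sock
            type: Socket
```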

 

Apply the deployment:

Shell
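```shell
kubectl apply -f deployment.yaml
```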

 

In deployment.yaml, update the following:

YAML
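Which fields need updating depends on your environment; at minimum, point the container image at the image you pushed in Step 2 (names here are placeholders):

```yaml
spec:
  template:
    spec:
      containers:
        - name: checkpoint-container
          image: <your-docker-repo>/checkpoint-container:latest
```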

Step 5: Kubernetes Service

Create a file named service.yaml:

YAML
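A minimal ClusterIP service in front of the deployment (the service name, selector, and target port 8080 are assumptions that must match your deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: checkpoint-service
spec:
  type: ClusterIP
  selector:
    app: checkpoint-api
  ports:
    - port: 80
      targetPort: 8080
```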

 

Apply the service:

Shell
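```shell
kubectl apply -f service.yaml
```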

 

Step 6: Install Nginx Ingress Controller

Shell
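The standard Helm installation:

```shell
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx
```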

 

Step 7: Create Ingress Resource

Create a file named ingress.yaml:

YAML
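A sketch routing requests through the Nginx ingress class (the ingress name, path, and backend service name are assumptions that must match your service manifest):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkpoint-ingress
spec:
  ingressClassName: nginx
  rules:
    - http:
        paths:
          - path: /checkpoint
            pathType: Prefix
            backend:
              service:
                name: checkpoint-service
                port:
                  number: 80
```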

 

Apply the Ingress:

Shell
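```shell
kubectl apply -f ingress.yaml
```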

 

Step 8: Test the API

Shell
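First, find the external IP of the ingress controller (the service name below assumes the default Helm release name from Step 6):

```shell
kubectl get svc ingress-nginx-controller
```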

 

Shell
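Then send a test request. The path and query parameter name are assumptions; adjust them to match your implementation:

```shell
curl "http://<EXTERNAL-IP>/checkpoint?container_id=<container-id>"
```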

 

Replace <EXTERNAL-IP> with the actual external IP.

Additional Considerations

  1. Security.
    • Implement HTTPS by setting up TLS certificates
    • Add authentication to the API
  2. Monitoring. Set up logging and monitoring for the API and checkpoint process.
  3. Resource management. Configure resource requests and limits for the sidecar container.
  4. Error handling. Implement robust error handling in the Go application.
  5. Testing. Thoroughly test the setup in a non-production environment before deploying it to production.
  6. Documentation. Maintain clear documentation on how to use the checkpoint API.

Conclusion

This setup deploys the checkpoint container as a sidecar in Kubernetes and exposes its functionality through an API accessible from outside the cluster. It provides a flexible solution for managing container checkpoints in a Kubernetes environment.

AWS/EKS Specific

Step 7: Install the AWS Load Balancer Controller

Instead of using the Nginx Ingress Controller, we’ll use the AWS Load Balancer Controller. This controller will create and manage ALBs for our Ingress resources.

1. Add the EKS chart repo to Helm:

Shell
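```shell
helm repo add eks https://aws.github.io/eks-charts
helm repo update
```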

 

2. Install the AWS Load Balancer Controller:

Shell
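The standard installation, assuming you have already created the aws-load-balancer-controller service account with an associated IAM role:

```shell
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<your-cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```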

 

Replace <your-cluster-name> with your EKS cluster name.

Note: Ensure that you have the necessary IAM permissions set up for the AWS Load Balancer Controller. You can find the detailed IAM policy in the AWS documentation.

Step 8: Create Ingress Resource

Create a file named ingress.yaml:

YAML
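A sketch using the alb ingress class with an internet-facing load balancer (the ingress name, path, and backend service name are assumptions that must match your service manifest):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkpoint-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /checkpoint
            pathType: Prefix
            backend:
              service:
                name: checkpoint-service
                port:
                  number: 80
```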

 

Apply the Ingress:

Shell
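```shell
kubectl apply -f ingress.yaml
```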

 

Step 9: Test the API

1. Get the ALB DNS name:

Shell
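Assuming the ingress is named checkpoint-ingress:

```shell
kubectl get ingress checkpoint-ingress
```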

 

Look for the ADDRESS field, which will be the ALB’s DNS name.

2. Send a test request:

Shell
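As before, the path and query parameter name are assumptions; adjust them to match your implementation:

```shell
curl "http://<ALB-DNS-NAME>/checkpoint?container_id=<container-id>"
```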

 

Replace <ALB-DNS-NAME> with the actual DNS name of your ALB from step 1.

Additional Considerations for AWS ALB

1. Security groups. The ALB will have a security group automatically created. Ensure it allows inbound traffic on port 80 (and 443 if you set up HTTPS).

2. SSL/TLS. To enable HTTPS, you can add the following annotations to your Ingress:

YAML
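For example (the certificate ARN is a placeholder for a certificate you have provisioned in ACM):

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:<region>:<account-id>:certificate/<certificate-id>
    alb.ingress.kubernetes.io/ssl-redirect: '443'
```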

 

3. Access logs. Enable access logs for your ALB by adding the following:

YAML
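For example (bucket and prefix are placeholders; the bucket policy must allow the ALB to write to it):

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=<your-log-bucket>,access_logs.s3.prefix=<your-prefix>
```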

 

4. WAF integration. If you want to use AWS WAF with your ALB, you can add:

YAML
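For example (the web ACL ARN is a placeholder for an existing WAFv2 regional web ACL):

```yaml
metadata:
  annotations:
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:<region>:<account-id>:regional/webacl/<name>/<id>
```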

 

5. Authentication. You can set up authentication using Amazon Cognito or OIDC by using the appropriate ALB Ingress Controller annotations.

These changes will set up your Ingress using an AWS Application Load Balancer instead of Nginx. The ALB Ingress Controller will automatically provision and configure the ALB based on your Ingress resource.

Conclusion

Remember to ensure that your EKS cluster has the necessary IAM permissions to create and manage ALBs. This typically involves creating an IAM policy and a service account with the appropriate permissions.

This setup will now use AWS’s native load-balancing solution, which integrates well with other AWS services and can be more cost-effective in an AWS environment.

Source:
https://dzone.com/articles/container-checkpointing-kubernetes-api