Introduction

Warning

These docs contain information that relates to my setup. They may or may not work for you.



My Home Operations repository

... managed by Flux, Renovate and GitHub Actions 🤖




👋 Welcome to my Home Operations repository. This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Terraform, Kubernetes, Flux, Renovate and GitHub Actions.


🔎 Support

If you like this project, please consider supporting my work through my GitHub sponsorship page.


🀝 Thanks

Thanks to all the people who donate their time to the Kubernetes @Home Discord community. A lot of inspiration for my cluster comes from the people that have shared their clusters using the k8s-at-home GitHub topic. Be sure to check out the Kubernetes @Home search for ideas on how to deploy applications or get ideas on what you can deploy.


🔏 License

See LICENSE

Hardware

| Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
|---|---|---|---|---|---|---|
| YCSD 6LAN i211 MiniPC i3 7100U | 1 | 128GB mSATA | - | 8GB | VyOS | Router |
| Intel NUC8i3BEH | 1 | 512GB SSD | 1TB NVMe (rook-ceph) | 32GB | Talos | Kubernetes Node |
| Intel NUC8i5BEH | 2 | 512GB SSD | 1TB NVMe (rook-ceph) | 32GB | Talos | Kubernetes Node |
| Synology DS918+ | 1 | - | 2x14TB + 1x10TB + 1x6TB (SHR) | 8GB | Synology DSM7 | NFS + Backup Server |
| Raspberry Pi 4 | 1 | 128GB (SD) | - | 4GB | PiKVM | Network KVM |
| Unifi USW-Lite-16-PoE | 2 | - | - | - | - | Core network switch |
| Unifi USW-Flex-Mini | 1 | - | - | - | - | Secondary network switch |
| Unifi UAP-AC-Pro | 4 | - | - | - | - | Wireless AP |

☁️ Cloud services

While most of my infrastructure and workloads are self-hosted, I do rely on the cloud for certain key parts of my setup. This saves me from having to worry about two things: (1) dealing with chicken/egg scenarios and (2) services I critically need whether my cluster is online or not.

The alternative solution to these two problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault, Vaultwarden, ntfy, and Authentik. However, maintaining another cluster and monitoring another group of workloads is a lot more time and effort than I am willing to put in and only saves me roughly $10/month.

| Service | Use | Cost |
|---|---|---|
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| Auth0 | Identity management and authentication | Free |
| Cloudflare | Domain, DNS and proxy management | Free |
| 1Password | Secrets with External Secrets | ~$65/y |
| Terraform Cloud | Storing Terraform state | Free |
| B2 Storage | Offsite application backups | ~$5/m |
| Pushover | Kubernetes Alerts and application notifications | Free |
| | | Total: ~$10/m |

Kubernetes

My main cluster runs Talos, provisioned on bare metal using the official talosctl CLI tool. I render my Talos configuration using the talhelper CLI tool. This allows me to keep the Talos configuration as DRY as possible.
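As a rough illustration of why this keeps things DRY, a talhelper input file might look something like the sketch below. Cluster-wide settings are declared once and each node only lists what is unique to it. Every value here (cluster name, versions, addresses, hostnames, disks) is a placeholder and not taken from my actual configuration.

```yaml
# talconfig.yaml: hypothetical input file rendered by `talhelper genconfig`.
# All values below are placeholders for illustration only.
clusterName: main
talosVersion: v1.6.0
kubernetesVersion: v1.29.0
endpoint: https://192.168.1.10:6443   # assumed control plane endpoint
nodes:
  # Each node only declares what differs from the others.
  - hostname: node-1
    ipAddress: 192.168.1.11
    controlPlane: true
    installDisk: /dev/sda
  - hostname: node-2
    ipAddress: 192.168.1.12
    controlPlane: true
    installDisk: /dev/sda
  - hostname: node-3
    ipAddress: 192.168.1.13
    controlPlane: true
    installDisk: /dev/sda
```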

This is a semi hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while I have a separate server for (NFS) file storage.

Core Components

  • actions-runner-controller: Self-hosted GitHub runners.
  • cilium: Internal Kubernetes networking plugin.
  • cert-manager: Creates SSL certificates for services in my Kubernetes cluster.
  • external-dns: Automatically manages DNS records from my cluster in a cloud DNS provider.
  • external-secrets: Manages Kubernetes secrets using 1Password Connect.
  • ingress-nginx: Ingress controller to expose HTTP traffic to pods over DNS.
  • multus: Allows multi-homing Kubernetes pods.
  • rook: Distributed block storage for persistent storage.
  • sops: Manages secrets for Kubernetes, Ansible and Terraform that are committed to Git.
  • tf-controller: Additional Flux component used to run Terraform from within a Kubernetes cluster.
  • volsync and snapscheduler: Backup and recovery of persistent volume claims.

GitOps

Flux watches my kubernetes folder (see Directory structure) and makes the changes to my cluster based on the YAML manifests.

The way Flux works for me here is that it recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml in each directory and then applies all the resources listed in it. That kustomization.yaml will generally only contain a namespace resource and one or more Flux kustomizations. Those Flux kustomizations will in turn generally contain a HelmRelease or other resources related to the application, which will then be applied.
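To make the two levels described above a bit more concrete, here is a minimal sketch of what such a pair of files could look like. The directory names, file names (namespace.yaml, ks.yaml), interval and paths are assumptions for illustration; only the API groups and field names are standard Kustomize and Flux.

```yaml
# File 1 (hypothetical path): kubernetes/main/apps/security/kustomization.yaml
# Plain Kustomize file: a namespace plus one Flux Kustomization per app.
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./namespace.yaml
  - ./authelia/ks.yaml
---
# File 2 (hypothetical path): kubernetes/main/apps/security/authelia/ks.yaml
# Flux Kustomization: points Flux at the folder holding the HelmRelease.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps-authelia
  namespace: flux-system
spec:
  interval: 30m                 # placeholder reconcile interval
  path: ./kubernetes/main/apps/security/authelia/app
  prune: true                   # remove resources that disappear from Git
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
```

Note that both files use `kind: Kustomization`, but the first belongs to Kustomize itself and the second to Flux, which is why the apiVersion differs.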

Renovate watches my entire repository looking for dependency updates. When they are found, a PR is automatically created. When PRs are merged, Flux applies the changes to my cluster.

Directory structure

My home-ops repository contains the following directories under kubernetes.

📁 kubernetes      # Kubernetes clusters defined as code
├─📁 main          # My main kubernetes cluster
│ ├─📁 bootstrap   # Flux installation
│ ├─📁 flux        # Main Flux configuration of repository
│ └─📁 apps        # Apps deployed into my cluster grouped by namespace (see below)
└─📁 tools         # Manifests that come in handy every now and then

Flux resource layout

Below is a high-level look at how my directory structure works with Flux. In this brief example you can see that authelia will not be able to run until glauth and cloudnative-pg are running. It also shows that the Cluster custom resource depends on the cloudnative-pg Helm chart. This is needed because the cloudnative-pg Helm chart installs the Cluster custom resource definition.

# Key: <kind> :: <metadata.name>
GitRepository :: flux-system
    Kustomization :: cluster
        Kustomization :: cluster-apps
            Kustomization :: cluster-apps-authelia
                DependsOn:
                    Kustomization :: cluster-apps-glauth
                    Kustomization :: cluster-apps-cloudnative-pg-cluster
                HelmRelease :: authelia
            Kustomization :: cluster-apps-glauth
                HelmRelease :: glauth
            Kustomization :: cluster-apps-cloudnative-pg
                HelmRelease :: cloudnative-pg
            Kustomization :: cluster-apps-cloudnative-pg-cluster
                DependsOn:
                    Kustomization :: cluster-apps-cloudnative-pg
                Cluster :: postgres
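In manifest form, a dependency like the one between the Cluster resource and the cloudnative-pg Helm chart could be expressed roughly as follows. The resource names come from the layout above; the path and interval are placeholders I made up for this sketch.

```yaml
# Flux Kustomization that applies the postgres Cluster resource only after
# the Kustomization installing the cloudnative-pg Helm chart (and its CRDs)
# has been reconciled.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps-cloudnative-pg-cluster
  namespace: flux-system
spec:
  dependsOn:
    - name: cluster-apps-cloudnative-pg
  interval: 30m                 # placeholder reconcile interval
  path: ./kubernetes/main/apps/database/cloudnative-pg/cluster  # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```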

Storage

Storage in my cluster is handled in a number of ways. The in-cluster storage is provided by a rook Ceph cluster that is running on a number of my nodes.

rook-ceph block storage

The bulk of my cluster storage relies on my CephBlockPool. This ensures that my data is replicated across my storage nodes.
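As an illustration of how that replication is typically declared, a replicated CephBlockPool might look like the sketch below. The pool name and the replica count of 3 are assumptions, not values copied from my cluster.

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-blockpool    # placeholder name
  namespace: rook-ceph
spec:
  # Place each replica on a different host so that losing one node
  # does not lose all copies of the data.
  failureDomain: host
  replicated:
    size: 3               # assumed replica count
```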

NFS storage

Finally, I have my NAS that exposes several exports over NFS. Given that NFS is a very bad idea for storing application data (see for example this GitHub issue), I only use it to store data at rest, such as my personal media files, Linux ISOs, backups, etc.
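For reference, mounting such an export into a Pod can be done with the built-in nfs volume type. The sketch below is a generic example; the server hostname, export path, Pod name and image are all placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: media-reader                 # placeholder name
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: media
          mountPath: /media
          readOnly: true
  volumes:
    - name: media
      nfs:
        server: nas.example.internal # hypothetical NAS hostname
        path: /volume1/media         # hypothetical export path
        readOnly: true
```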

Backups

Automation

Terraform

Ansible

How to...

Here you can find information on how to accomplish specific scenarios.

Run a Pod in a VLAN

Sometimes you'll want to give a Kubernetes Pod direct access to a VLAN. This could be for any number of reasons, but the most common reason is for the application to be able to automatically discover devices on that VLAN.

A good example of this would be Home Assistant. This application has several integrations that rely on being able to discover the hardware devices (e.g. Sonos speakers or ESPHome devices).

Prerequisites

For a Kubernetes cluster to be able to add additional network interfaces to Pods (this is also known as "multi-homing") the Multus CNI needs to be installed in your cluster.

NIC configuration

Make sure that the Kubernetes node has a network interface that is connected to the VLAN you wish to connect to.

Note

My nodes only have a single NIC, so I have set them up so that their main interface gets its IP address over DHCP, with a virtual interface on top of it connecting to the VLAN. How to do this will depend on your operating system.
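On Talos, one way to express this kind of setup is a VLAN entry under the main interface in the machine configuration. The following is only a sketch: the interface name eth0 and VLAN ID 40 are placeholders, and your own configuration may need additional settings.

```yaml
# Fragment of a Talos machine configuration (placeholder values).
machine:
  network:
    interfaces:
      - interface: eth0   # assumed name of the single physical NIC
        dhcp: true        # main interface gets its address over DHCP
        vlans:
          - vlanId: 40    # assumed VLAN ID of the target VLAN
            dhcp: false   # the node itself needs no address on the VLAN
```

With values like these, the VLAN interface would be expected to show up on the node as eth0.40.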

Multus Configuration

My Multus Helm configuration can be found here.

It is important to note that the paths of your CNI plugin binaries / config might differ depending on the Kubernetes distribution you are running. For my Talos setup they need to be set to /opt/cni/bin and /etc/cni/net.d respectively.

NetworkAttachmentDefinition

Once the Multus CNI has been installed and configured you can use the NetworkAttachmentDefinition Custom Resource to define the virtual IP addresses that you want to hand out. These need to be free addresses within the VLAN subnet, so it's important to make sure that they do not overlap with your DHCP server range(s).

{{ #include ../../../../kubernetes/apps/home-automation/home-assistant/app/networkattachmentdefinition.yaml }}

Be sure to check out the official documentation for more information on how to configure the spec.config field.
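If you just want a feel for the general shape of such a resource, here is a minimal sketch using the macvlan CNI plugin with static IPAM. The resource name matches the annotation used further down, but the parent interface (eth0.40) and the address are invented for illustration and are not my real values.

```yaml
# Hypothetical NetworkAttachmentDefinition handing out one static address.
# "master" must be the node interface attached to the VLAN, and the address
# must be a free one outside of your DHCP range.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-static-iot-hass
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "macvlan-static-iot-hass",
      "plugins": [
        {
          "type": "macvlan",
          "master": "eth0.40",
          "mode": "bridge",
          "ipam": {
            "type": "static",
            "addresses": [
              { "address": "192.168.40.100/24" }
            ]
          }
        }
      ]
    }
```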

Pod configuration

Once the NetworkAttachmentDefinition has been loaded it is possible to use it within a Pod. This can be done by setting an annotation on the Pod that references it. Staying with the Home Assistant example (full Helm values), this would be:

k8s.v1.cni.cncf.io/networks: macvlan-static-iot-hass
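In a plain manifest, without the Helm chart around it, the annotation sits in the Pod metadata. The sketch below assumes the NetworkAttachmentDefinition lives in the same namespace as the Pod; the Pod name, namespace and image tag are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: home-assistant
  namespace: home-automation   # assumed to match the NetworkAttachmentDefinition
  annotations:
    # Ask Multus to attach the extra interface defined earlier.
    k8s.v1.cni.cncf.io/networks: macvlan-static-iot-hass
spec:
  containers:
    - name: home-assistant
      image: ghcr.io/home-assistant/home-assistant:stable
```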

App-specific configuration: Home Assistant

In order for Home Assistant to actually use the additional network interface you will need to explicitly enable it instead of relying on automatic network detection. To do so, navigate to Settings >> System >> Network (this setting is only available to Home Assistant users that have "Advanced mode" enabled in their user profile) and place a checkmark next to the adapters that you wish to use with Home Assistant integrations.

Run a Service with both TCP and UDP

One example where it is really nice having a single unified Service expose all the ports instead of several "single-purpose" ones is the Unifi Controller: Helm values.

Up until Kubernetes version 1.26 it was (by default) not possible to have a single Service expose both TCP and UDP protocols.

Prerequisites

Since Kubernetes version 1.26 the MixedProtocolLBService feature has graduated to GA status, and no special flags should be required. Up until version 1.26 it was required to enable the MixedProtocolLBService=true feature gate in order to achieve this functionality.
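A unified Service of this kind could look roughly like the sketch below. The Service name, selector label and the exact set of ports are assumptions for illustration; check the ports your controller version actually needs.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: unifi                        # placeholder name
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: unifi    # assumed Pod label
  ports:
    # TCP and UDP ports side by side in a single LoadBalancer Service.
    - name: controller
      port: 8443
      protocol: TCP
    - name: inform
      port: 8080
      protocol: TCP
    - name: stun
      port: 3478
      protocol: UDP
```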