Introduction
My Home Operations repository
... managed by Flux, Renovate and GitHub Actions 🤖
👋 Welcome to my Home Operations repository. This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Terraform, Kubernetes, Flux, Renovate and GitHub Actions.
Support
If you like this project, please consider supporting my work through my GitHub sponsorship page.
🤝 Thanks
Thanks to all the people who donate their time to the Kubernetes @Home Discord community. A lot of inspiration for my cluster comes from the people who have shared their clusters using the k8s-at-home GitHub topic. Be sure to check out the Kubernetes @Home search for ideas on how to deploy applications, or for inspiration on what you can deploy.
License
See LICENSE
Hardware
Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
---|---|---|---|---|---|---|
YCSD 6LAN i211 MiniPC i3 7100U | 1 | 128GB mSATA | - | 8GB | VyOS | Router |
Intel NUC8i3BEH | 1 | 512GB SSD | 1TB NVMe (rook-ceph) | 32GB | Talos | Kubernetes Node |
Intel NUC8i5BEH | 2 | 512GB SSD | 1TB NVMe (rook-ceph) | 32GB | Talos | Kubernetes Node |
Synology DS918+ | 1 | - | 2x14TB + 1x10TB + 1x6TB (SHR) | 8GB | Synology DSM7 | NFS + Backup Server |
Raspberry Pi 4 | 1 | 128GB (SD) | - | 4GB | PiKVM | Network KVM |
Unifi USW-Lite-16-PoE | 2 | - | - | - | - | Core network switch |
Unifi USW-Flex-Mini | 1 | - | - | - | - | Secondary network switch |
Unifi UAP-AC-Pro | 4 | - | - | - | - | Wireless AP |
☁️ Cloud services
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about two things: (1) dealing with chicken-and-egg scenarios, and (2) keeping services I critically need available whether my cluster is online or not.
The alternative solution to these two problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault, Vaultwarden, ntfy, and Authentik. However, maintaining another cluster and monitoring another group of workloads is a lot more time and effort than I am willing to put in and only saves me roughly $10/month.
Service | Use | Cost |
---|---|---|
GitHub | Hosting this repository and continuous integration/deployments | Free |
Auth0 | Identity management and authentication | Free |
Cloudflare | Domain, DNS and proxy management | Free |
1Password | Secrets with External Secrets | ~$65/y |
Terraform Cloud | Storing Terraform state | Free |
B2 Storage | Offsite application backups | ~$5/m |
Pushover | Kubernetes Alerts and application notifications | Free |
| Total | | ~$10/m |
Kubernetes
My main cluster runs Talos, provisioned on bare metal using the official talosctl CLI tool. I render my Talos configuration with the talhelper CLI tool, which allows me to keep the Talos configuration as DRY as possible.
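For context, talhelper takes a single talconfig.yaml and renders per-node machine configs that talosctl can apply. A heavily stripped-down sketch of such a file is shown below; the cluster name, endpoint, addresses and disk path are placeholders, and the exact schema is documented by the talhelper project.

```yaml
# Hypothetical, minimal talconfig.yaml for talhelper (placeholder values).
clusterName: home-kubernetes          # placeholder cluster name
endpoint: https://192.168.1.20:6443   # placeholder control plane endpoint
nodes:
  - hostname: node-1                  # placeholder hostname
    ipAddress: 192.168.1.21           # placeholder node address
    controlPlane: true
    installDisk: /dev/sda             # placeholder install disk
```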
This is a semi-hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate server provides (NFS) file storage.
Core Components
- actions-runner-controller: Self-hosted GitHub runners.
- cilium: Internal Kubernetes networking plugin.
- cert-manager: Creates SSL certificates for services in my Kubernetes cluster.
- external-dns: Automatically manages DNS records from my cluster in a cloud DNS provider.
- external-secrets: Manages Kubernetes secrets using 1Password Connect (see the sketch after this list).
- ingress-nginx: Ingress controller to expose HTTP traffic to pods over DNS.
- multus: Allows multi-homing Kubernetes pods.
- rook: Distributed block storage for persistent storage.
- sops: Manages secrets for Kubernetes, Ansible and Terraform which are committed to Git.
- tf-controller: Additional Flux component used to run Terraform from within a Kubernetes cluster.
- volsync and snapscheduler: Backup and recovery of persistent volume claims.
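To give a rough idea of how external-secrets ties into 1Password Connect, a minimal ExternalSecret could look like the sketch below; the store name, item key and target Secret name are placeholders, not values from this repository.

```yaml
# Hypothetical ExternalSecret: copies fields from a 1Password item
# (via a ClusterSecretStore backed by 1Password Connect) into a Kubernetes Secret.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-app                      # placeholder name
  namespace: default
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword-connect       # assumed store name
  target:
    name: my-app-secret             # Kubernetes Secret that gets created
  dataFrom:
    - extract:
        key: my-app                 # 1Password item to pull fields from
```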
GitOps
Flux watches my kubernetes folder (see Directory structure) and applies changes to my cluster based on the YAML manifests it finds there.
Flux recursively searches the kubernetes/apps folder until it finds the top-most kustomization.yaml in each directory and then applies all the resources listed in it. That kustomization.yaml will generally only contain a namespace resource and one or more Flux kustomizations. Those Flux kustomizations will generally contain a HelmRelease or other resources related to the application underneath it, which will then be applied.
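As a rough sketch (file names and the app path are illustrative, not taken from this repository), such a top-level kustomization.yaml typically looks something like this:

```yaml
# Hypothetical top-level kustomization.yaml for one namespace: it pulls in
# the Namespace itself plus the Flux Kustomization(s) for each application.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./namespace.yaml    # the Namespace resource
  - ./my-app/ks.yaml    # a Flux Kustomization for one application (placeholder path)
```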
Renovate watches my entire repository looking for dependency updates; when they are found, a PR is automatically created. When PRs are merged, Flux applies the changes to my cluster.
Directory structure
My home-ops repository contains the following directories under kubernetes.
📁 kubernetes      # Kubernetes clusters defined as code
├─📁 main          # My main kubernetes cluster
│ ├─📁 bootstrap   # Flux installation
│ ├─📁 flux        # Main Flux configuration of repository
│ └─📁 apps        # Apps deployed into my cluster grouped by namespace (see below)
└─📁 tools         # Manifests that come in handy every now and then
Flux resource layout
Below is a high-level look at how my directory structure works with Flux. In this brief example you can see that authelia will not run until glauth and cloudnative-pg are running. It also shows that the Cluster custom resource depends on the cloudnative-pg Helm chart, which is needed because cloudnative-pg installs the Cluster custom resource definition as part of that chart.
# Key: <kind> :: <metadata.name>
GitRepository :: home-ops-kubernetes
    Kustomization :: cluster
        Kustomization :: cluster-apps
            Kustomization :: cluster-apps-authelia
                DependsOn:
                    Kustomization :: cluster-apps-glauth
                    Kustomization :: cluster-apps-cloudnative-pg-cluster
                HelmRelease :: authelia
            Kustomization :: cluster-apps-glauth
                HelmRelease :: glauth
            Kustomization :: cluster-apps-cloudnative-pg
                HelmRelease :: cloudnative-pg
            Kustomization :: cluster-apps-cloudnative-pg-cluster
                DependsOn:
                    Kustomization :: cluster-apps-cloudnative-pg
                Cluster :: postgres
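To show how one of those dependencies is expressed, a Flux Kustomization with a dependsOn entry looks roughly like the sketch below; the namespace, path and interval are illustrative rather than copied from my manifests.

```yaml
# Hypothetical Flux Kustomization: the CloudNativePG Cluster waits for the
# cloudnative-pg Kustomization (and therefore its CRDs) to be ready first.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-apps-cloudnative-pg-cluster
  namespace: flux-system            # assumed Flux namespace
spec:
  dependsOn:
    - name: cluster-apps-cloudnative-pg
  sourceRef:
    kind: GitRepository
    name: home-ops-kubernetes
  path: ./kubernetes/main/apps/database/cloudnative-pg/cluster   # illustrative path
  prune: true
  interval: 30m                     # illustrative interval
```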
Storage
Storage in my cluster is handled in a number of ways. The in-cluster storage is provided by a rook Ceph cluster that is running on a number of my nodes.
rook-ceph block storage
The bulk of my cluster storage relies on my CephBlockPool. This ensures that my data is replicated across my storage nodes.
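For reference, a replicated CephBlockPool is defined roughly as in the sketch below; the pool name and replica count are assumptions, not values from this repository.

```yaml
# Hypothetical Rook CephBlockPool: keeps three copies of every block,
# each on a different host.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ceph-blockpool              # placeholder name
  namespace: rook-ceph
spec:
  failureDomain: host               # spread replicas across different nodes
  replicated:
    size: 3                         # number of data copies
```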
NFS storage
Finally, I have my NAS that exposes several exports over NFS. Given that NFS is a very bad idea for storing application data (see for example this GitHub issue), I only use it to store data at rest, such as my personal media files, Linux ISOs, backups, etc.
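When a workload does need to mount one of these exports, a statically provisioned NFS PersistentVolume is enough; the sketch below uses placeholder server, path and size values.

```yaml
# Hypothetical static NFS PersistentVolume pointing at a NAS export.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs                   # placeholder name
spec:
  capacity:
    storage: 1Ti                    # illustrative size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.10            # placeholder NAS address
    path: /volume1/media            # placeholder export path
```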
Backups
Automation
Terraform
Ansible
How to...
Here you can find information on how to accomplish specific scenarios.
Run a Pod in a VLAN
Sometimes you'll want to give a Kubernetes Pod direct access to a VLAN. This could be for any number of reasons, but the most common one is to let the application automatically discover devices on that VLAN.
A good example of this would be Home Assistant. This application has several integrations that rely on being able to discover the hardware devices (e.g. Sonos speakers or ESPHome devices).
- Prerequisites
- NIC configuration
- Multus Configuration
- NetworkAttachmentDefinition
- Pod configuration
- App-specific configuration: Home Assistant
Prerequisites
For a Kubernetes cluster to be able to add additional network interfaces to Pods (this is also known as "multi-homing") the Multus CNI needs to be installed in your cluster.
I use the Helm chart provided by @angelnu to install Multus. The reason for using this over the official deployment method is that it has better support for upgrade/update scenarios.
NIC configuration
Make sure that the Kubernetes node has a network interface that is connected to the VLAN you wish to connect to.
My nodes only have a single NIC, so I have set them up so that the main interface gets its IP address over DHCP, with a virtual interface on top of it connecting to the VLAN. How to do this will depend on your operating system.
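On my Talos nodes this boils down to a machine configuration snippet along these lines; the interface name and VLAN ID are placeholders, and the exact schema is described in the Talos network configuration documentation.

```yaml
# Hypothetical Talos machine config fragment: DHCP on the physical NIC,
# plus a virtual interface for an IoT VLAN on top of it.
machine:
  network:
    interfaces:
      - interface: eth0             # placeholder physical interface name
        dhcp: true
        vlans:
          - vlanId: 20              # placeholder VLAN ID
            dhcp: false             # Multus hands out the addresses instead
```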
Multus Configuration
My Multus Helm configuration can be found here.
It is important to note that the paths of your CNI plugin binaries / config might differ depending on the Kubernetes distribution you are running. For my Talos setup they need to be set to /opt/cni/bin
and /etc/cni/net.d
respectively.
NetworkAttachmentDefinition
Once the Multus CNI has been installed and configured you can use the NetworkAttachmentDefinition
Custom Resource to define the virtual IP addresses that you want to hand out. These need to be free addresses within the VLAN subnet, so it's important to make sure that they do not overlap with your DHCP server range(s).
{{ #include ../../../../kubernetes/main/apps/home-automation/home-assistant/app/networkattachmentdefinition.yaml }}
Be sure to check out the official documentation for more information on how to configure the spec.config
field.
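For illustration, a macvlan NetworkAttachmentDefinition with static IPAM generally has the shape sketched below; the master interface, subnet and addresses are placeholders rather than the values from my manifest.

```yaml
# Hypothetical macvlan NetworkAttachmentDefinition handing out a static
# address on the IoT VLAN. spec.config holds a JSON-encoded CNI configuration.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-static-iot-hass
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0.20",
      "mode": "bridge",
      "ipam": {
        "type": "static",
        "addresses": [
          { "address": "192.168.20.50/24", "gateway": "192.168.20.1" }
        ]
      }
    }
```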
Pod configuration
Once the NetworkAttachmentDefinition has been loaded it is possible to use it within a Pod. This can be done by setting an annotation on the Pod that references it. Staying with the Home Assistant example (full Helm values), this would be:
k8s.v1.cni.cncf.io/networks: macvlan-static-iot-hass
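In a raw Pod manifest (outside of Helm values) that annotation sits under metadata.annotations, roughly like this; the Pod name and image are illustrative.

```yaml
# Hypothetical Pod snippet: the annotation tells Multus to attach the extra
# network defined by the NetworkAttachmentDefinition above.
apiVersion: v1
kind: Pod
metadata:
  name: home-assistant              # placeholder name
  annotations:
    k8s.v1.cni.cncf.io/networks: macvlan-static-iot-hass
spec:
  containers:
    - name: home-assistant
      image: ghcr.io/home-assistant/home-assistant:stable   # illustrative image
```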
App-specific configuration: Home Assistant
In order for Home Assistant to actually use the additional network interface you will need to explicitly enable it instead of relying on automatic network detection.
To do so, navigate to Settings >> System >> Network
(this setting is only available to Home Assistant users that have "Advanced mode" enabled in their user profile) and place a checkmark next to the adapters that you wish to use with Home Assistant integrations.
Run a Service with both TCP and UDP
One example where it is really nice to have a single unified Service expose all the ports, instead of several "single-purpose" ones, is the Unifi Controller: Helm values.
Up until Kubernetes version 1.26 it was (by default) not possible to have a single LoadBalancer Service expose both TCP and UDP protocols.
Prerequisites
Since Kubernetes version 1.26 the MixedProtocolLBService feature has graduated to GA status, and no special flags should be required.
Before version 1.26 it was required to enable the MixedProtocolLBService=true feature gate in order to achieve this functionality.
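A mixed-protocol LoadBalancer Service then looks roughly like the sketch below; the name, selector and ports are illustrative, loosely modeled on the Unifi Controller use case.

```yaml
# Hypothetical LoadBalancer Service exposing both TCP and UDP ports
# from the same external address.
apiVersion: v1
kind: Service
metadata:
  name: unifi                       # placeholder name
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: unifi   # placeholder selector
  ports:
    - name: https
      port: 8443
      protocol: TCP
    - name: stun
      port: 3478
      protocol: UDP
```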