Protecting AKS with Azure Firewall

Another scenario we often come across with our customers is the need to protect AKS environments with Azure Firewall, both inbound and outbound. In this article from the official documentation you can find an example of this scenario. However, it has become outdated, as it still uses the "classic" style of rule configuration. In this post I share the Azure CLI deployment for the new version, plus a few additional examples, so you can understand exactly how it works.

Setting up the variables

For this scenario we'll need the following variables:

# Variables
RESOURCE_GROUP="aks-egress-control-by-azfw"
LOCATION="northeurope"
AKS_NAME="aks-cluster"
VNET_NAME="aks-vnet"
AKS_SUBNET_NAME="aks-subnet"
FIREWALL_SUBNET_NAME="AzureFirewallSubnet" # DO NOT CHANGE - the subnet must be named AzureFirewallSubnet. This is currently a requirement for Azure Firewall.
FIREWALL_NAME="firewall-for-aks"
FIREWALL_PUBLIC_IP_NAME="firewall-publicip"
FIREWALL_CONFIG_NAME="firewall-config"
FIREWALL_ROUTE_TABLE_NAME="firewall-route-table"
FIREWALL_ROUTE_NAME="firewall-route"
FIREWALL_ROUTE_INTERNET="firewall-route-internet"
FIREWALL_COLLECTION_GROUP_NAME="aks-egress"
NETWORK_RULES_COLLECTION="aks-network-rules"
APP_RULES_COLLECTION="aks-app-rules"
NAT_RULES_COLLECTION="aks-dnat-rules" # Referenced later as $NAT_RULES_COLLECTION when creating the DNAT rules
HIGHLIGHT="\e[01;34m"
NC='\e[0m'
echo -e "${HIGHLIGHT}Variables set!${NC}"

Creating the resource group and the network

Before deploying the resources themselves, we need a resource group and at least one network where they will live.

echo -e "${HIGHLIGHT}Create the resource group...${NC}"
az group create --name $RESOURCE_GROUP --location $LOCATION
echo -e "${HIGHLIGHT}Create the vnet and aks subnet...${NC}"
az network vnet create \
--resource-group $RESOURCE_GROUP \
--name $VNET_NAME \
--location $LOCATION \
--address-prefixes 10.42.0.0/16 \
--subnet-name $AKS_SUBNET_NAME \
--subnet-prefix 10.42.1.0/24
echo -e "${HIGHLIGHT}Create $FIREWALL_SUBNET_NAME subnet...${NC}"
az network vnet subnet create \
--resource-group $RESOURCE_GROUP \
--vnet-name $VNET_NAME \
--name $FIREWALL_SUBNET_NAME \
--address-prefix 10.42.2.0/24

In a production environment you would normally have a hub-and-spoke topology, but for this test environment a single network is more than enough.
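
If you do want to mimic that layout, the firewall would sit in a hub VNet peered with the spoke VNet hosting AKS. A minimal sketch, assuming hypothetical hub-vnet and spoke-vnet networks that are not part of this walkthrough:

# Peer the hub (firewall) VNet and the spoke (AKS) VNet - both directions are required
az network vnet peering create \
--resource-group $RESOURCE_GROUP \
--name "hub-to-spoke" \
--vnet-name "hub-vnet" \
--remote-vnet "spoke-vnet" \
--allow-vnet-access
az network vnet peering create \
--resource-group $RESOURCE_GROUP \
--name "spoke-to-hub" \
--vnet-name "spoke-vnet" \
--remote-vnet "hub-vnet" \
--allow-vnet-access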

Creating the Azure Firewall Policy and Azure Firewall

To start on the right foot, create the Azure Firewall resource before deploying the AKS cluster:

echo -e "${HIGHLIGHT}Create a public IP for Azure Firewall...${NC}"
az network public-ip create \
--resource-group $RESOURCE_GROUP \
--name $FIREWALL_PUBLIC_IP_NAME \
--sku "Standard" \
--location $LOCATION

echo -e "${HIGHLIGHT}Register the Azure Firewall preview CLI extension...${NC}"
az extension add --name azure-firewall
echo -e "${HIGHLIGHT}Create Azure Firewall Policy...${NC}"
az network firewall policy create \
--name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--location $LOCATION \
--enable-dns-proxy true # To use FQDNs in network rules you need to enable DNS proxy
# Get Azure Firewall Policy ID
FIREWALL_POLICY_ID=$(az network firewall policy show \
--name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--query "id" -o tsv)
echo -e "${HIGHLIGHT}Create the Azure Firewall and assign the policy...${NC}"
az network firewall create \
--name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--location $LOCATION \
--firewall-policy $FIREWALL_POLICY_ID
echo -e "${HIGHLIGHT}Create an Azure Firewall IP configuration...${NC}"
az network firewall ip-config create \
--firewall-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--name $FIREWALL_CONFIG_NAME \
--public-ip-address $FIREWALL_PUBLIC_IP_NAME \
--vnet-name $VNET_NAME
FW_PUBLIC_IP=$(az network public-ip show \
--resource-group $RESOURCE_GROUP \
--name $FIREWALL_PUBLIC_IP_NAME \
--query "ipAddress" -o tsv)
echo -e "${HIGHLIGHT}Azure Firewall Public IP: $FW_PUBLIC_IP${NC}"
FW_PRIVATE_IP=$(az network firewall show \
--resource-group $RESOURCE_GROUP \
--name $FIREWALL_NAME \
--query "ipConfigurations[0].privateIPAddress" -o tsv)
echo -e "${HIGHLIGHT}Azure Firewall Private IP: $FW_PRIVATE_IP${NC}"

Creating rules in the firewall

For our cluster to be minimally functional, we need to create the following rules:

# https://learn.microsoft.com/en-us/azure/aks/outbound-rules-control-egress#azure-global-required-network-rules
echo -e "${HIGHLIGHT}Create a rule collection group${NC}"
az network firewall policy rule-collection-group create \
--name $FIREWALL_COLLECTION_GROUP_NAME \
--policy-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--priority 100
# Create network collection and apiudp rule
echo -e "${HIGHLIGHT}Allow access to port 1194 via UDP. (For tunneled secure communication between the nodes and the control plane)${NC}"
az network firewall policy rule-collection-group collection add-filter-collection  \
--name $NETWORK_RULES_COLLECTION \
--policy-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--rule-collection-group-name $FIREWALL_COLLECTION_GROUP_NAME \
--collection-priority  100 \
--action "Allow" \
--rule-name "apiudp" \
--rule-type "NetworkRule" \
--description "Allows access to port 1194 via UDP. (For tunneled secure communication between the nodes and the control plane)" \
--destination-addresses "AzureCloud.$LOCATION" \
--destination-ports 1194 \
--source-addresses "*" \
--ip-protocols "UDP"
echo -e "${HIGHLIGHT}Allow access to port 123 via UDP (Required for Network Time Protocol (NTP) time synchronization on Linux nodes.)${NC}"
az network firewall policy rule-collection-group collection rule add \
--collection-name $NETWORK_RULES_COLLECTION \
--policy-name $FIREWALL_NAME \
--rule-collection-group-name $FIREWALL_COLLECTION_GROUP_NAME \
--resource-group $RESOURCE_GROUP \
--name "time" \
--rule-type "NetworkRule" \
--description "Allows access to port 123 via UDP (Required for Network Time Protocol (NTP) time synchronization on Linux nodes.)" \
--destination-fqdns "ntp.ubuntu.com" \
--destination-ports 123 \
--source-addresses "*" \
--ip-protocols "UDP"
echo -e "${HIGHLIGHT}Allow access to port 9000 via TCP (For tunneled secure communication between the nodes and the control plane.)${NC}"
az network firewall policy rule-collection-group collection rule add \
--collection-name $NETWORK_RULES_COLLECTION \
--policy-name $FIREWALL_NAME \
--rule-collection-group-name $FIREWALL_COLLECTION_GROUP_NAME \
--resource-group $RESOURCE_GROUP \
--name "apitcp" \
--rule-type "NetworkRule" \
--description "Allows access to port 9000 via TCP (For tunneled secure communication between the nodes and the control plane.)" \
--destination-addresses "AzureCloud.$LOCATION" \
--destination-ports 9000 \
--source-addresses "*" \
--ip-protocols "TCP"

We also need some of the application rule type:

echo -e "${HIGHLIGHT}Allow to talk with related AKS services...${NC}"
az network firewall policy rule-collection-group collection add-filter-collection \
--name $APP_RULES_COLLECTION \
--policy-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--rule-collection-group-name $FIREWALL_COLLECTION_GROUP_NAME \
--collection-priority 101 \
--action Allow \
--rule-name "aksservices" \
--rule-type ApplicationRule \
--description "Allows to talk with AKS services" \
--fqdn-tags AzureKubernetesService \
--protocols Http=80 Https=443 \
--source-addresses "*"

All of them are listed in the article Outbound network and FQDN rules for Azure Kubernetes Service (AKS) clusters.
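If you want to double-check what was actually deployed, you can inspect the rule collection group; a quick query of my own:

# List the rule collections and the rules inside each one
az network firewall policy rule-collection-group show \
--name $FIREWALL_COLLECTION_GROUP_NAME \
--policy-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--query "ruleCollections[].{name:name, priority:priority, rules:rules[].name}" -o json
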

Creating a route table

For all of this to make sense, we need some way to tie these rules to the subnet where our cluster will live. To do that, we create a route table:

echo -e "${HIGHLIGHT}Create an empty route table to be associated with a given subnet...${NC}"
az network route-table create \
--resource-group $RESOURCE_GROUP \
--name $FIREWALL_ROUTE_TABLE_NAME \
--location $LOCATION
echo -e "${HIGHLIGHT}Create routes in the route table...${NC}"
az network route-table route create \
--resource-group $RESOURCE_GROUP \
--name $FIREWALL_ROUTE_NAME \
--route-table-name $FIREWALL_ROUTE_TABLE_NAME \
--address-prefix 0.0.0.0/0 \
--next-hop-type VirtualAppliance \
--next-hop-ip-address $FW_PRIVATE_IP
az network route-table route create \
--resource-group $RESOURCE_GROUP \
--name $FIREWALL_ROUTE_INTERNET \
--route-table-name $FIREWALL_ROUTE_TABLE_NAME \
--address-prefix $FW_PUBLIC_IP/32 \
--next-hop-type Internet

And we associate it with the subnet where we'll deploy our cluster:

echo -e "${HIGHLIGHT}Associate the route table to the AKS subnet....${NC}"
az network vnet subnet update \
--resource-group $RESOURCE_GROUP \
--vnet-name $VNET_NAME \
--name $AKS_SUBNET_NAME \
--route-table $FIREWALL_ROUTE_TABLE_NAME
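
To confirm the association took effect, you can query the subnet (a quick check, not part of the original script):

# The subnet should now reference the route table
az network vnet subnet show \
--resource-group $RESOURCE_GROUP \
--vnet-name $VNET_NAME \
--name $AKS_SUBNET_NAME \
--query "routeTable.id" -o tsv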

Creating the AKS cluster

Now that almost everything is ready, we can create our cluster in its subnet:

# Deploy an AKS cluster with a UDR outbound type to the existing network
# Now, you can deploy an AKS cluster into the existing virtual network.
#  You will use the userDefinedRouting outbound type, which ensures that 
# any outbound traffic is forced through the firewall and no other egress paths will exist
# The loadBalancer outbound type can also be used.
AKS_SUBNET_ID=$(az network vnet subnet show \
--resource-group $RESOURCE_GROUP \
--vnet-name $VNET_NAME \
--name $AKS_SUBNET_NAME \
--query id -o tsv)
# You'll define the outbound type to use the UDR that already exists on 
# the subnet. This configuration will enable AKS to skip the setup and IP provisioning for the load balancer.
# AKS will create a system-assigned kubelet identity in the node resource group if you don't specify your own kubelet managed identity.
#  For user-defined routing, system-assigned identity only supports the CNI network plugin.
time az aks create \
--resource-group $RESOURCE_GROUP \
--location $LOCATION \
--name $AKS_NAME \
--node-vm-size Standard_B4ms \
--network-plugin azure \
--outbound-type userDefinedRouting \
--vnet-subnet-id $AKS_SUBNET_ID 
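
When the command returns, you can verify that the cluster is actually using user-defined routing for its egress; a quick check of my own:

# Confirm the egress path is user-defined routing
az aks show \
--resource-group $RESOURCE_GROUP \
--name $AKS_NAME \
--query "networkProfile.outboundType" -o tsv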

Next, retrieve the credentials to interact with the cluster:

# Get the credentials for the cluster
az aks get-credentials \
--resource-group $RESOURCE_GROUP \
--name $AKS_NAME --overwrite-existing
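
At this point you can already see the firewall doing its job from inside the cluster: an FQDN covered by the AzureKubernetesService tag, such as mcr.microsoft.com, should respond, while anything else should be blocked. A quick test of my own, following the same pattern used later in this post:

# Allowed: mcr.microsoft.com is covered by the AzureKubernetesService FQDN tag
kubectl run -it --rm test-allowed --image=mcr.microsoft.com/powershell --restart=Never -- pwsh -c "Invoke-WebRequest -Uri https://mcr.microsoft.com/v2/ -UseBasicParsing | Select-Object StatusCode"
# Blocked: no allow rule covers this FQDN, so the request should fail
kubectl run -it --rm test-blocked --image=mcr.microsoft.com/powershell --restart=Never -- pwsh -c "Invoke-WebRequest -Uri https://www.example.com -UseBasicParsing -TimeoutSec 10"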

Deploying sample apps to the cluster

To check that all of this works correctly, I'm going to deploy two examples: the voting app from the official documentation and, as always, my Tour of heroes:

# Deploy the sample applications
kubectl apply -f azure-firewall/manifests --recursive
kubectl get all

Once deployed, you'll notice that voting starts up without any problem, but Tour of heroes can't pull its images:

This happens because those images are hosted on GitHub Packages and the related FQDNs aren't allowed in your Azure Firewall. To fix it, you could add the following application rule:

echo -e "${HIGHLIGHT}Create a rule to allow traffic to GitHub packages...${NC}"
az network firewall policy rule-collection-group collection rule add  \
--name "github-packages" \
--policy-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--rule-collection-group-name $FIREWALL_COLLECTION_GROUP_NAME \
--collection-name $APP_RULES_COLLECTION \
--rule-type ApplicationRule \
--description "Allows to pull docker images from GitHub Packages" \
--target-fqdns ghcr.io pkg-containers.githubusercontent.com \
--protocols Https=443 \
--source-addresses "*"
echo -e "${HIGHLIGHT}I cannot wait, delete all pods!${NC}"
kubectl delete pods --all
echo -e "${HIGHLIGHT}Check pods again!${NC}"
kubectl get pods

Testing the applications internally

To check that the applications are working correctly, the first thing I test is whether they respond from inside the cluster:

echo -e "${HIGHLIGHT}Test voting app from inside the cluster${NC}"
kubectl run -it --rm test-voting-app --image=mcr.microsoft.com/powershell --restart=Never -- pwsh -c "Invoke-WebRequest -Uri http://voting-app"
echo -e "${HIGHLIGHT}Test tour of heroes API from inside the cluster${NC}"
kubectl run -it --rm test-tour-of-heroes --image=mcr.microsoft.com/powershell --restart=Never -- pwsh -c "Invoke-WebRequest -Uri http://tour-of-heroes-api/api/hero"

Testing the applications from the Internet

The next step is to check whether we can reach them through the public IPs associated with both applications' services:

echo -e "${HIGHLIGHT}Test voting app from the Internet${NC}"
VOTING_APP_PUBLIC_IP=$(kubectl get svc voting-app -o jsonpath='{.status.loadBalancer.ingress[*].ip}')
curl --connect-timeout 5 http://$VOTING_APP_PUBLIC_IP

# Get API public IP
echo -e "${HIGHLIGHT}Test tour of heroes API from the Internet${NC}"
API_PUBLIC_IP=$(kubectl get svc tour-of-heroes-api -o jsonpath='{.status.loadBalancer.ingress[*].ip}')
# Try to call the API
curl --connect-timeout 5 http://$API_PUBLIC_IP/api/hero | jq
# It doesn't work. Here's the explanation:
# When you use Azure Firewall to restrict egress traffic and create a UDR to force all egress traffic,
# make sure you create an appropriate DNAT rule in Azure Firewall to correctly allow ingress traffic. 
# Using Azure Firewall with a UDR breaks the ingress setup due to asymmetric routing. The issue occurs 
# if the AKS subnet has a default route that goes to the firewall's private IP address, but you're using a 
# public load balancer - ingress or Kubernetes service of type loadBalancer.
#  In this case, the incoming load balancer traffic is received via its public IP address, but the return path 
# goes through the firewall's private IP address. Because the firewall is stateful, it drops the returning packet 
# because the firewall isn't aware of an established session.
echo -e "${HIGHLIGHT}Create a NAT rule to allow traffic from the firewall public IP to the voting app${NC}"
az network firewall policy rule-collection-group collection add-nat-collection \
--name $NAT_RULES_COLLECTION \
--policy-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--rule-collection-group-name $FIREWALL_COLLECTION_GROUP_NAME \
--collection-priority 102 \
--action DNAT \
--rule-name "votingnatrule" \
--description "Allows to talk with Voting App" \
--ip-protocols TCP \
--source-addresses "*" \
--translated-address $VOTING_APP_PUBLIC_IP \
--translated-port 80 \
--destination-addresses $FW_PUBLIC_IP \
--destination-ports 80
echo -e "${HIGHLIGHT}Test again using firewall public IP to access voting app${NC}"
curl --connect-timeout 5 http://$FW_PUBLIC_IP 
echo http://$FW_PUBLIC_IP
echo -e "${HIGHLIGHT}Create a NAT rule to allow traffic from the firewall public IP to the tour of heroes API${NC}"
az network firewall policy rule-collection-group collection rule add \
--name "tour-of-heroes-nat-rule" \
--policy-name $FIREWALL_NAME \
--resource-group $RESOURCE_GROUP \
--collection-name $NAT_RULES_COLLECTION \
--rule-collection-group-name $FIREWALL_COLLECTION_GROUP_NAME \
--rule-type NatRule \
--description "Allows to talk with tour of heroes API" \
--ip-protocols TCP \
--source-addresses "*" \
--translated-address $API_PUBLIC_IP \
--translated-port 80 \
--destination-addresses $FW_PUBLIC_IP \
--destination-ports 8080
echo -e "${HIGHLIGHT}Test again but using the firewall public IP${NC}"
curl -H "Content-Type: application/json" \
-X POST \
-d '{"name": "Arrow", "alterEgo": "Oliver Queen", "description": "Multimillonario playboy Oliver Queen (Stephen Amell), quien, cinco años después de estar varado en una isla hostil, regresa a casa para luchar contra el crimen y la corrupción como un vigilante secreto cuya arma de elección es un arco y flechas." }' \
http://$FW_PUBLIC_IP:8080/api/hero | jq
curl --connect-timeout 5 http://$FW_PUBLIC_IP:8080/api/hero | jq

As you can see, for this scenario to work, you need to create NAT rules so that traffic is routed correctly.
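
When you're done experimenting, don't leave the lab running (Azure Firewall is billed while deployed); deleting the resource group removes everything at once:

# Tear down the whole lab: firewall, AKS, networking and route table
az group delete --name $RESOURCE_GROUP --yes --no-wait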

You can find the complete example in this GitHub repo.

Cheers!