Kubernetes cert-manager is unable to fetch certificate that owns the secret
Today I was sifting through the cert-manager logs in my Kubernetes cluster looking for a possible cause of a certificate issue failure.
Here’s the command to see the logs (remember to use tab)
kubectl logs -f -n cert-manager cert-manager-<tab-tab>
Here I’ve found a good amount of lines like this
cert-manager/secret-mapper
"msg"="unable to fetch certificate that owns the secret"
"error"="Certificate.certmanager.k8s.io \"example-tls4\" not found"
"certificate"={"Namespace":"default","Name":"example-tls4"}
"secret"={"Namespace":"default","Name":"example-tls4"
A quick googling around pointed me to issue 1944
Obviuosly I don’t have the EnableCertificateOwnerRef
enabled, so I have some cleanup to do.
Here’s a bash script to list all the orphan certificates in all namespaces (thanks richstokes):
#!/bin/bash
set -e
function list_orphans_in_namespace(){
if [[ $# -lt 1 ]] ; then
echo "A namespace is needed"
exit 2;
fi
NAMESPACE=$1
SECRETS=($(kubectl --namespace $NAMESPACE get secrets 2>&- | grep tls | awk '{print $1}'))
CERTS=($(kubectl --namespace $NAMESPACE get certs 2>&- | grep True | awk '{print $1}'))
ORPHANS=($(comm -23 <(for x in "${SECRETS[@]}"; do echo "$x"; done | sort) <(for x in "${CERTS[@]}"; do echo "$x"; done | sort)))
if [ ${#ORPHANS[@]} -ne 0 ]; then
echo "Orphaned secrets detected in ${NAMESPACE}"
for i in ${ORPHANS[@]}; do
echo -e "\t$i"
done
fi
}
NAMESPACES=($1)
# If no namespace specified, set it to default
if [[ $# -lt 1 ]] ; then
NAMESPACES=($(kubectl get ns -o json | jq .items[].metadata.name --raw-output)) # Array of all namespaces
fi
for NAMESPACE in ${NAMESPACES[@]}; do
list_orphans_in_namespace $NAMESPACE
done
I prefered to delete every entry by hand, cause I had a couple of secrets on kube-prometheus-stack that are listed but they are not certificates.
Grafana Alerts on cert-manager issue failure
It would be nice to have an alert whenever a cert-manager is not able to issue a certificate.
To do so, I came up with this query:
sum(
count_over_time(
{namespace="cert-manager"}
|~ "\"error\"=\"failed to perform self check GET request.*\""
| regexp `(.*http://(?P<domain>[^/]*)\/.*)`
[1m]
)
) by (domain)
An alert is set when sum()
of such query is above 0