Cloud vulnerability management: how to detect what matters

Most cloud security teams spend their time scanning for vulnerabilities. They run Trivy against their container images, get a list of 3,000 CVEs, and try to figure out which ones matter. The list is always too long. The prioritization is always wrong. And by the time they patch the important ones, attackers have already moved on to the next exploit.

The core problem: scanning tells you what could go wrong. It does not tell you what is going wrong right now. 48,185 new CVEs were published in 2025, up 20.6% from 2024 (Indusface: Vulnerability Statistics 2026). The median time to exploit is under 5 days.

Mandiant’s M-Trends 2026 puts the mean time to exploit at negative seven days, meaning attackers are routinely exploiting vulnerabilities before they are publicly disclosed (Security Boulevard: Mean Time to Exploit Has Gone Negative).

Meanwhile, the average organization takes 88 days to patch a critical vulnerability after a fix ships (Pixee: Time-to-Exploit Dropped to 5 Days).

You cannot patch your way to security when attackers move faster than your patch cycle. Cloud vulnerability management that works in 2026 starts with detection, not scanning.

What cloud vulnerability management actually covers

Cloud vulnerability management goes beyond scanning for CVEs. It is the full cycle of finding, prioritizing, and fixing security weaknesses across your cloud infrastructure. In practice, that means four things:

Area	What you are looking for	How you find it
Known vulnerabilities (CVEs)	Unpatched software, vulnerable container images, outdated dependencies	Image scanners (Trivy, Grype), registry scanning, CI/CD pipeline checks
Misconfigurations	Open S3 buckets, overly permissive IAM roles, unencrypted databases, public-facing services that should not be	CSPM tools, policy engines (OPA/Gatekeeper), CIS benchmark scanners (kube-bench)
Configuration drift	Settings that changed since last audit, Kubernetes manifests that no longer match what is running	GitOps drift detection (ArgoCD, Flux), continuous config scanning
Runtime threats	Anomalous network traffic, unusual API calls, privilege escalation attempts, cryptomining processes	Log-based detection, runtime security tools (Falco), observability stack alerts

Most cloud security management programs only do the first one. They run a scanner, get a list of thousands of CVEs, and drown in noise. The teams that actually reduce risk cover all four.

Why scanning alone does not work

A vulnerability scanner will tell you that your cluster has 3,000 known CVEs across all container images. That is technically accurate and practically useless.

The problems:

Most CVEs are in base image layers that your application never calls. A vulnerable version of libxml2 in your base image is not a risk if your application does not parse XML.
Scanners rank by CVSS score, not by exploitability. A CVSS 9.8 with no known exploit on a service attackers cannot reach is less dangerous than a CVSS 7.5 that is being actively exploited in the wild.
Scan results are a snapshot. They tell you what was vulnerable at scan time. They do not tell you what is being attacked right now.
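
One way to act on this is to sort findings by exploitation evidence first and CVSS score last. The Python sketch below is illustrative only: the Finding fields (actively_exploited, internet_facing) and the placeholder CVE IDs are assumptions, and the real values would come from your scanner output and your threat intelligence source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    cve_id: str
    cvss: float
    actively_exploited: bool  # hypothetical flag, e.g. present in a known-exploited list
    internet_facing: bool     # hypothetical flag, e.g. derived from traffic logs


def rank_findings(findings):
    """Order findings so exploitation evidence outranks raw CVSS score."""
    return sorted(
        findings,
        key=lambda f: (f.actively_exploited, f.internet_facing, f.cvss),
        reverse=True,
    )


if __name__ == "__main__":
    sample = [
        Finding("CVE-EXAMPLE-0001", 9.8, False, False),
        Finding("CVE-EXAMPLE-0002", 7.5, True, True),
        Finding("CVE-EXAMPLE-0003", 8.1, False, True),
    ]
    for finding in rank_findings(sample):
        print(finding.cve_id, finding.cvss)
```

With this ordering the exploited, internet-facing 7.5 lands above the unexploited 9.8, which is the opposite of what a CVSS-only sort would produce.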

The numbers back this up. 85% of container images have high or critical vulnerabilities (Sysdig: 2026 Cloud-Native Security and Usage Report). If everything is critical, nothing is critical. You need a way to filter the noise.

What to actually monitor

Effective cloud vulnerability management combines scanning (what could be exploited) with monitoring (what is being exploited or is exposed to exploitation). Split by detection type:

1. Vulnerability exposure metrics

These tell you how exposed your infrastructure is at any given point.

Unpatched critical CVEs with known exploits.

Not all CVEs. Just the ones in CISA’s Known Exploited Vulnerabilities catalog or with public exploit code. This is your highest-priority list.

Time to patch after disclosure.

How many days between a CVE being published and your team deploying the fix. If this number is above 30 days for critical vulnerabilities, you are behind.
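
A minimal sketch of the metric, assuming you can export a (published, deployed) date pair per patched CVE from your ticketing or deploy system. The function names and the 30-day constant are illustrative, with the constant taken from the threshold above.

```python
from datetime import date
from statistics import median

BEHIND_THRESHOLD_DAYS = 30  # threshold for critical vulnerabilities, from the text


def days_to_patch(published: date, deployed: date) -> int:
    """Days between CVE publication and the fix reaching production."""
    return (deployed - published).days


def median_days_to_patch(records):
    """records: iterable of (published, deployed) date pairs."""
    return median(days_to_patch(p, d) for p, d in records)


def is_behind(records) -> bool:
    return median_days_to_patch(records) > BEHIND_THRESHOLD_DAYS
```

The median is used here instead of the mean so that one long-forgotten patch does not hide an otherwise healthy trend. That is a design choice, not a requirement.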

Container image age.

How old are the images running in production? Images older than 90 days almost certainly contain unpatched vulnerabilities. Track this per namespace.
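
The per-namespace check can be sketched as follows. It assumes you already have an inventory of (namespace, image reference, build timestamp) tuples, for example from your registry metadata. How you collect that inventory is outside this sketch, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

MAX_IMAGE_AGE = timedelta(days=90)  # threshold from the text


def stale_images_by_namespace(images, now=None):
    """Group images older than MAX_IMAGE_AGE by namespace.

    images: iterable of (namespace, image_ref, built_at) tuples,
    where built_at is a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    stale = defaultdict(list)
    for namespace, image_ref, built_at in images:
        if now - built_at > MAX_IMAGE_AGE:
            stale[namespace].append(image_ref)
    return dict(stale)
```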

Percentage of workloads scanned.

Are you scanning everything or just some things? Teams often scan CI/CD pipelines but skip running workloads. If it is not in the scan, it is a blind spot.
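
Coverage is a set difference between what is running and what has been scanned. A rough sketch, assuming both inputs are lists of image references you have already collected (the function name is hypothetical):

```python
def scan_coverage(running_images, scanned_images):
    """Return (percent of running images scanned, set of unscanned images)."""
    running = set(running_images)
    if not running:
        return 100.0, set()
    unscanned = running - set(scanned_images)
    percent = 100.0 * (len(running) - len(unscanned)) / len(running)
    return percent, unscanned
```

The unscanned set is the useful part: it is the list of blind spots to add to the next scan.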

2. Misconfiguration detection

Misconfigurations cause more breaches than unpatched software. 23% of cloud security incidents in 2025 came from misconfigurations like open buckets or unprotected APIs. Gartner predicts 95% of cloud security failures through 2026 will be the customer’s fault, mostly due to misconfiguration (DataStackHub: Cloud Misconfiguration Statistics).

What to check:

Misconfiguration	Why it matters	How to detect it
Public S3/GCS buckets	Data exposure. This is how most large-scale data leaks happen.	CSPM policy checks, bucket access logging
Overly permissive IAM roles	Lateral movement. An attacker who compromises one service can reach everything.	IAM access analysis, least-privilege audits
Unencrypted data stores	Compliance failure and data theft risk.	Config scans against CIS benchmarks
Kubernetes RBAC misconfigs	Privilege escalation inside the cluster.	kube-bench, Kubescape, OPA policies
Exposed management ports (SSH on 22, RDP on 3389, admin panels on 443)	Direct attack surface. These should never face the internet.	Network policy audits, port scanning

82% of these misconfigurations are caused by human error, not provider flaws (Fidelis Security: Cloud Misconfigurations Causing Data Breaches). That means they are preventable with the right checks in place.

3. Runtime detection (where observability fits)

This is the layer most teams are missing. Scanning finds what could go wrong. Runtime detection finds what is going wrong right now.

Your observability stack (Prometheus, Loki, Tempo, Grafana) already collects the data you need. You just need to write the right queries and alerts.

Log-based CVE exposure detection.

If a service with a known critical CVE is receiving external traffic, that is higher priority than the same CVE on an internal-only service. Correlate your vulnerability scan results with your traffic logs:

Service has critical CVE + receives external traffic = page
Service has critical CVE + internal only, no sensitive data = ticket
Service has critical CVE + internal only, handles PII = page
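
The three rules above translate into a small decision function. This is a sketch, not a finished policy: the boolean inputs are assumed to come from your scan results, traffic logs, and data classification, and PII is used as the stand-in for "sensitive data".

```python
def triage(has_critical_cve: bool, external_traffic: bool, handles_pii: bool) -> str:
    """Map CVE presence, exposure, and data sensitivity to an action."""
    if not has_critical_cve:
        return "none"
    if external_traffic or handles_pii:
        return "page"
    return "ticket"
```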

Anomalous traffic patterns.

A sudden spike in outbound connections from a pod that normally only handles inbound requests could indicate compromise. Set up alerts in Grafana for:

Outbound connection count exceeding the 95th percentile for that workload
DNS queries to domains not in your allow list
Traffic to known cryptomining pools or C2 infrastructure
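
The first two checks can be prototyped offline before you turn them into Grafana alert rules. The sketch below assumes you have exported a history of outbound connection counts per workload and a list of queried domains. It uses a nearest-rank percentile, which is one of several common percentile definitions, and all function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty collection of numbers."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def outbound_anomaly(history, current, pct=95):
    """True when the current count exceeds the workload's own percentile."""
    return current > percentile_nearest_rank(history, pct)


def domains_outside_allow_list(queried, allow_list):
    """Return queried domains that match no allowed domain or its subdomains."""
    allowed = {d.lower().rstrip(".") for d in allow_list}
    flagged = []
    for domain in queried:
        name = domain.lower().rstrip(".")
        if not any(name == a or name.endswith("." + a) for a in allowed):
            flagged.append(domain)
    return flagged
```

Comparing each workload against its own history matters: a threshold that is normal for an API gateway would never fire for a batch job.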

Configuration drift alerts.

If a Kubernetes deployment’s running config no longer matches what is in Git, something changed outside the normal deploy process. That is either a misconfiguration or a compromise. ArgoCD and Flux both expose drift metrics that Prometheus can scrape.
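
To show what drift means at the data level, here is a simplified comparison of a desired manifest against a live one, both as nested dictionaries. It is a sketch under a strong assumption: real live objects contain server-populated fields and defaults, which GitOps tools normalize before comparing, and this code does not. The function name is hypothetical.

```python
def find_drift(desired, live, path=""):
    """Return dotted paths whose values differ between Git and the cluster."""
    drift = []
    if isinstance(desired, dict) and isinstance(live, dict):
        for key in sorted(set(desired) | set(live)):
            child = f"{path}.{key}" if path else str(key)
            if key not in desired or key not in live:
                drift.append(child)  # field added or removed outside Git
            else:
                drift.extend(find_drift(desired[key], live[key], child))
    elif desired != live:
        drift.append(path)  # same field, different value
    return drift
```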

Secret access anomalies.

If a service account that normally reads 2 secrets suddenly reads 15, that is worth investigating. Kubernetes audit logs track secret access. Route them through Loki and alert on deviations from the baseline.
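
A baseline comparison for secret reads might look like the sketch below. It assumes you have already extracted (service account, secret name) pairs from the audit logs for a time window, and that you keep a per-account baseline of distinct secrets read. The factor of 3 is an arbitrary starting point, not a recommendation.

```python
from collections import defaultdict


def secret_access_anomalies(events, baseline, factor=3):
    """Flag service accounts reading far more distinct secrets than usual.

    events: iterable of (service_account, secret_name) pairs.
    baseline: dict of service_account -> usual number of distinct secrets.
    Accounts missing from the baseline are flagged on any access.
    """
    seen = defaultdict(set)
    for service_account, secret_name in events:
        seen[service_account].add(secret_name)
    return {
        account: len(secrets)
        for account, secrets in seen.items()
        if len(secrets) > factor * baseline.get(account, 0)
    }
```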

Cloud security management: what to prioritize first

You cannot fix everything at once. This order gives you the most risk reduction per hour of work.

1. Fix what is being actively exploited

Check CISA’s Known Exploited Vulnerabilities (KEV) catalog against your running workloads. If you have a KEV-listed CVE on a service that faces the internet, patch it today. Not next sprint. Today.

2. Close misconfigurations that expose data

Run kube-bench or Kubescape against your clusters. Check for public-facing storage buckets. Audit IAM roles for overly broad permissions. These are the vulnerabilities that do not need a CVE to be dangerous.

3. Set up runtime detection

Get your observability stack alerting on the patterns described above. This does not require new tooling if you already have Prometheus, Loki, and Grafana. It requires writing the queries and setting the alert thresholds.

4. Automate image scanning in CI/CD

Every container image should be scanned before it reaches production. Trivy and Grype are both free and integrate with most CI/CD pipelines. Block deploys that contain critical CVEs with known exploits.
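
A deploy gate along these lines could be sketched as follows. It assumes the scan report follows the Results / Vulnerabilities / VulnerabilityID / Severity layout of Trivy's JSON output (check this against the version you run) and that you maintain a plain-text file of known-exploited CVE IDs, one per line. The file paths and function names are hypothetical.

```python
import json
import sys


def blocking_cves(report, exploited_ids):
    """Return critical CVE IDs in the report that are also known-exploited."""
    blocked = set()
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            cve_id = vuln.get("VulnerabilityID")
            if vuln.get("Severity") == "CRITICAL" and cve_id in exploited_ids:
                blocked.add(cve_id)
    return sorted(blocked)


def main(report_path, exploited_path):
    with open(report_path) as fh:
        report = json.load(fh)
    with open(exploited_path) as fh:
        exploited_ids = {line.strip() for line in fh if line.strip()}
    blocked = blocking_cves(report, exploited_ids)
    if blocked:
        print("Blocking deploy:", ", ".join(blocked))
        return 1  # non-zero exit fails the pipeline step
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Gating only on critical and known-exploited keeps the pipeline from blocking on every high-severity finding, which is what usually gets gates switched off.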

5. Track your patch window

Measure the time between CVE disclosure and your deploy of the fix. Set a target: under 14 days for critical, under 30 days for high. If you are not measuring this, you have no idea whether you are improving.
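
To track those targets, you can check both patched and still-open findings against them. This sketch assumes each finding is a dict with cve_id, severity, published, and patched keys, where patched is None while the fix is not yet deployed. The field names are illustrative.

```python
from datetime import date

PATCH_TARGET_DAYS = {"critical": 14, "high": 30}  # targets from the text


def sla_breaches(findings, today):
    """Return (cve_id, age_in_days, target) for findings at or past target."""
    breaches = []
    for finding in findings:
        target = PATCH_TARGET_DAYS.get(finding["severity"])
        if target is None:
            continue  # no target defined for this severity
        end = finding["patched"] or today  # open findings keep aging
        age = (end - finding["published"]).days
        if age >= target:
            breaches.append((finding["cve_id"], age, target))
    return breaches
```

Counting open findings against today's date is deliberate: a metric built only from completed patches looks better the longer the hard ones stay unpatched.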

The detection gap

The average detection time for a cloud misconfiguration is over 180 days (DataStackHub: Cloud Misconfiguration Statistics). Six months of exposure before anyone notices. Organizations with mature log management programs have compressed that to under 4 hours (Nadcab: Cloud Log Management Transforming IT Monitoring).

That is the difference between cloud vulnerability management as a checkbox (run a scanner quarterly, file the report) and cloud security management as an actual practice (continuous scanning, runtime detection, measured response times).

The teams that close this gap are the ones with an observability stack that ties their vulnerability data to their runtime data.

They know which CVEs are on which services, which services are exposed to the internet, and which ones are behaving abnormally. Without that layer, you are scanning in the dark.

Where Obsium fits

If your team is scanning for vulnerabilities but not detecting what is actually being exploited, book a free 30-minute observability consultation. No sales deck, just an engineer-to-engineer chat about your stack.

FAQ

What is cloud vulnerability management?

It is the process of finding, prioritizing, and fixing security weaknesses across your cloud infrastructure. That includes scanning for known CVEs, detecting misconfigurations (open buckets, overly permissive IAM roles), catching configuration drift, and monitoring for runtime threats like anomalous traffic or privilege escalation attempts.

How is cloud vulnerability management different from cloud security posture management (CSPM)?

CSPM focuses on misconfigurations and compliance: are your S3 buckets public, are your IAM roles too broad, do your clusters pass CIS benchmarks. Cloud vulnerability management is broader. It includes CSPM but also covers CVE scanning, patch tracking, and runtime threat detection. CSPM is one input into the full vulnerability management cycle.

What does cloud security management include?

Cloud security management covers the policies, tools, and processes you use to protect cloud infrastructure. In practice that means vulnerability scanning, misconfiguration detection, access control (IAM and RBAC), runtime monitoring, incident response, and patch management. The teams that do it well tie all of these together through an observability stack so they can correlate alerts across layers.

Why is vulnerability scanning not enough on its own?

85% of container images have high or critical vulnerabilities. Scanners rank by CVSS score, not by whether something is actually being exploited. A CVSS 9.8 with no known exploit on an unreachable service is less dangerous than a CVSS 7.5 with a public exploit and internet-facing exposure. Without runtime context, you are prioritizing from a list that treats everything as equally urgent.

What is cloud patch management, and how fast should we patch?

Cloud patch management is the process of deploying fixes for known vulnerabilities across your cloud workloads. The median time to exploit is under 5 days. The average organization takes 88 days to patch. A reasonable target: under 14 days for critical CVEs with known exploits, under 30 for high severity. If you are not measuring your patch window, you cannot tell whether you are improving.

How does Kubernetes vulnerability scanning work?

Tools like Trivy and Grype scan container images for known CVEs by comparing installed packages against vulnerability databases. You can run them in CI/CD pipelines to block vulnerable images before they reach production, and against running workloads to catch vulnerabilities that were not present at build time. Scanning alone misses misconfigurations and runtime threats, so it should be one layer in a broader approach.

What is Kubernetes runtime security?

Runtime security monitors your cluster for threats as they happen, not after a scan. That includes detecting anomalous network traffic (unexpected outbound connections, DNS queries to unknown domains), privilege escalation attempts, configuration drift from what is in Git, and unusual secret access patterns. Tools like Falco handle some of this. Your observability stack (Prometheus, Loki, Grafana) can handle the rest if you write the right queries and alerts.

What tools are used for cloud vulnerability management?

For CVE scanning: Trivy, Grype, Snyk. For misconfiguration detection: kube-bench, Kubescape, OPA/Gatekeeper. For CSPM: provider-native tools or third-party platforms. For runtime detection: Falco, plus your observability stack (Prometheus for metrics, Loki for logs, Grafana for alerting). For configuration drift: ArgoCD and Flux both expose drift metrics that Prometheus can scrape.
