Sensor v3: Why We Rebuilt Our VA Sensor From Scratch

Published 30 Apr 2026 · By Satyam Maurya · Networking · 16 min read

On 24 April 2026 we flipped Ogma's first production VA sensor — ST-MAIN, a customer in NCR — to Sensor v3. Single Docker container. OpenVAS, ospd-openvas, notus-scanner, mosquitto, redis, all running on the customer's host. No scan traffic over the WireGuard tunnel. 10-minute install via a single shell one-liner. This is the engineering write-up.

⚡ Install time: 10 min (from cold to "ready", single one-liner)

📦 Containers: 1 (was 5+ in v2, compose stack)

🔁 Boot phases: 5 (wg_up → feed_sync → feed_ready → scanner_boot → ready)

📡 Tunnel use: control only (scan traffic stays on customer LAN)

🐛 Real bugs hit: 12 (in ~2 weeks of building this)

Audience: this is a technical post written by a network engineer (NSE7) for other network and security engineers. If you're scanning your environment with OpenVAS, GVM, or any vulnerability scanner that talks OSP, the architectural choices here will look familiar. If you're not, the high-level architecture in the diagrams is enough.

What v1 and v2 Got Wrong

Ogma's original sensor (v1, then v2) used a hosted-over-WG model. The customer ran a tiny WireGuard endpoint on a thin Linux box. Scan packets flowed back over the WG tunnel to a centralised GVM cluster on Ogma's side, scans executed there, results came back. That worked. It also created four problems we got tired of fighting:

v2: Latency on every probe

Every TCP / UDP probe to a customer-side host had to traverse a WireGuard tunnel back to our DC, then return. For aggressive scans against /24 networks, the round-trip latency added 30–60% to scan time. Customers noticed.

v2: Centralised scan capacity = central bottleneck

One GVM cluster ran every customer's scans. Burst capacity was always wrong — either over-provisioned and idle, or under-provisioned and queued. Worse, our shared FOR UPDATE SKIP LOCKED claim queue had a subtle bug where a LEFT JOIN in the subquery silently dropped concurrent claims (Postgres rejects FOR UPDATE on the nullable side of an outer join). Took us a day to find the root cause; fixed in commit c247384.

v2: Customer scan data crossed organisational boundary

Scan results — including target host details, open ports, software versions, vulnerability findings — left the customer network in transit. Some BFSI and government customers had explicit policy against this even with end-to-end encryption. We had to explain this every onboarding call.

v1 + v2: Operationally heavy for the customer

v1 install was a multi-step bash script with explicit dependency installs (Docker, WireGuard, Greenbone tooling, kernel modules). Failure modes were many. We rewrote install.sh as POSIX /bin/sh in v1.2 to handle Ubuntu's dash, but the underlying complexity was still there.

The combination — latency + central bottleneck + data residency + install complexity — meant we could either keep patching v2 or rebuild. We chose rebuild.

Sensor v3 Architecture

The core idea: put OpenVAS on the customer host. Use the WG tunnel only for control plane. Scan traffic never leaves the customer LAN. The Ogma portal pushes scan jobs into a queue; the agent on the sensor claims them and runs them locally; results upload back over the WG tunnel as XML.

Customer Host (Ubuntu 24.04 + Docker)

Single container, network_mode: host, caps: NET_ADMIN, NET_RAW

tini + supervisord: PID 1 + service manager (priority: root)
mosquitto: loopback MQTT broker (priority 5)
redis: unix-socket only (priority 10)
notus-scanner: advisory matcher (priority 15)
ospd-openvas: OSP daemon, unix socket (priority 20)
ogma-agent: claim → run → upload (priority 30)

▼ Control plane: WireGuard tunnel (heartbeat, scan claims, results upload) ▼

Ogma Portal (ogma.in)

/api/sensor/heartbeat: 30 s liveness
/api/sensor/claim-scan: 5 s poll, single scan
/api/sensor/log: NDJSON live log
/api/sensor/scan-report: final XML upload

The container is ubuntu:24.04 + supervisord + tini. NET_ADMIN is needed because the agent brings up the WG interface on the host network namespace; NET_RAW is needed for OpenVAS port-probing. We deliberately do not grant SYS_ADMIN — we have seen too many "container that needs SYS_ADMIN" pitches that turn out to mean "we couldn't be bothered to set the right per-cap permissions."

Why supervisord and not just `docker compose up`?

Two reasons. First, OpenVAS depends on a strict service start order: mosquitto before redis before notus-scanner before ospd-openvas before the agent. Compose's depends_on with condition: service_healthy works but adds 4 health-check sidecars and another layer of startup latency. Second, the agent needs the equivalent of systemctl restart for any of the scanner services if a scan hangs; supervisord provides that natively through supervisorctl and its XML-RPC interface.
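As a rough illustration of the restart path, here is a minimal Python sketch that restarts one supervisord-managed program through supervisord's XML-RPC API (the same API supervisorctl uses). The function name restart_service and the endpoint in the usage note are assumptions for illustration; the source does not show how agent.py actually does this.

```python
import xmlrpc.client

# Fault code supervisord uses for "process is not running"
# (NOT_RUNNING in supervisor.xmlrpc.Faults).
NOT_RUNNING = 70


def restart_service(rpc, name):
    """Stop then start one supervisord program. Returns True if the start succeeded.

    `rpc` is any object exposing the supervisor XML-RPC namespace, for example
    an xmlrpc.client.ServerProxy pointed at supervisord.
    """
    try:
        rpc.supervisor.stopProcess(name)
    except xmlrpc.client.Fault as fault:
        # A service that already crashed is not an error for a restart.
        if fault.faultCode != NOT_RUNNING:
            raise
    return bool(rpc.supervisor.startProcess(name))
```

Usage, assuming supervisord's inet HTTP server is enabled on loopback port 9001 (an assumption, not something the post states): restart_service(xmlrpc.client.ServerProxy("http://127.0.0.1:9001/RPC2"), "ospd-openvas").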

Phased Boot — Why It Matters

The single thing that bit us hardest in v2 was that the sensor would come up and look ready when in fact ospd-openvas was still loading its NVT cache — a process that can take 5–15 minutes on a cold start. Scans dispatched to a not-actually-ready scanner would fail with confusing errors. v3 fixes this with an explicit phased boot, with each phase reporting its progress back to the portal.

wg_up: Fetch WG config from portal using SENSOR_TOKEN. Bring up wg0. Verify handshake.
feed_sync: Skipped if NASL ≥ 2,000 + Notus ≥ 500 + age < 12 h. Else greenbone-feed-sync --type nvt → greenbone-nvt-sync → raw rsync fallback with to-check=X/Y progress.
feed_ready: Verify NVT count + Notus advisory count meet thresholds. Refuse to proceed if either is empty.
scanner_boot: exec supervisord, which starts mosquitto → redis → notus-scanner → ospd-openvas → agent.
ready: OSP sock open + agent heartbeat reaching portal. Portal flips sensor status to ONLINE.

Every phase transition POSTs to /api/sensor/log with the new state. Customers and Ogma engineers can both watch the boot progress live in the portal — no more "is it working yet?" tickets ten minutes after install.
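The phase sequencing and reporting can be sketched in a few lines of Python. This is an illustrative sketch, not the contents of phased-boot.sh: the NDJSON field names (ts, phase, message), the steps mapping, and the report callback are all assumptions. In a real sensor, report would POST each line to /api/sensor/log.

```python
import json
import time

# Phase names as listed in the post, in boot order.
PHASES = ["wg_up", "feed_sync", "feed_ready", "scanner_boot", "ready"]


def phase_event(phase, message, now=None):
    """Build one NDJSON line describing a phase transition (field names are hypothetical)."""
    record = {
        "ts": time.time() if now is None else now,
        "phase": phase,
        "message": message,
    }
    return json.dumps(record, sort_keys=True) + "\n"


def run_boot(steps, report):
    """Run each phase in order and stop at the first one that fails.

    `steps` maps a phase name to a zero-argument callable returning True on success.
    `report` receives one NDJSON line per transition.
    Returns the last phase that completed, or None if the first phase failed.
    """
    reached = None
    for phase in PHASES:
        report(phase_event(phase, "started"))
        if not steps[phase]():
            report(phase_event(phase, "failed"))
            return reached
        reached = phase
    return reached
```

Stopping at the first failed phase is what keeps a sensor with empty feeds from ever advertising itself as ready.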

# Excerpt from sensor/v3/phased-boot.sh — feed_sync phase
if [ "$NASL_COUNT" -ge "$NASL_MIN" ] && [ "$NOTUS_COUNT" -ge "$NOTUS_MIN" ] && [ "$FEED_AGE_HOURS" -lt 12 ]; then
  log_phase "feed_sync" "skipped — feeds fresh ($NASL_COUNT NASL, $NOTUS_COUNT Notus, $FEED_AGE_HOURS h old)"
else
  log_phase "feed_sync" "running greenbone-feed-sync"
  greenbone-feed-sync --type nvt || rsync_fallback "$RSYNC_NVT_URL" "$NVT_DIR"
  greenbone-feed-sync --type notus || rsync_fallback "$RSYNC_NOTUS_URL" "$NOTUS_DIR"
fi
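The excerpt does not show how NASL_COUNT, NOTUS_COUNT and FEED_AGE_HOURS are computed. One plausible approach, sketched here in Python, is to count feed files on disk and read the age from a timestamp file. The .nasl and .notus suffixes and the idea of a stamp file are assumptions for illustration; the thresholds are the ones quoted in the feed_sync description.

```python
import time
from pathlib import Path

NASL_MIN = 2000      # threshold from the feed_sync description
NOTUS_MIN = 500      # threshold from the feed_sync description
MAX_AGE_HOURS = 12


def count_files(root, suffix):
    """Count regular files under `root` (recursively) whose name ends with `suffix`."""
    return sum(1 for p in Path(root).rglob("*" + suffix) if p.is_file())


def feed_age_hours(stamp_file, now=None):
    """Hours since `stamp_file` was last modified (the stamp file is hypothetical)."""
    now = time.time() if now is None else now
    return (now - Path(stamp_file).stat().st_mtime) / 3600.0


def feeds_fresh(nasl_count, notus_count, age_hours):
    """True when both feeds meet their minimum size and the sync is recent enough."""
    return (
        nasl_count >= NASL_MIN
        and notus_count >= NOTUS_MIN
        and age_hours < MAX_AGE_HOURS
    )
```

With the numbers from the first production sensor (3,454 NASL and 512 Notus), feeds_fresh(3454, 512, 1) is True, so a restart one hour after a sync would skip feed_sync.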

The Scan-Claim Pattern

The agent polls POST /api/sensor/claim-scan every 5 seconds. The portal returns either 204 No Content (no work) or a JSON payload describing one scan to run — target hosts, scan-config UUID, target alive-test setting, the works.
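A minimal Python sketch of that polling loop follows. It is not the real agent.py: the function names, the bearer-token header, and the assumption that a claim arrives with HTTP 200 are all illustrative. The HTTP call is injected as a function so the loop can be exercised without a portal.

```python
import json
import time
import urllib.request


def make_post(base_url, token):
    """Return a function that POSTs an empty body to a portal path.

    The Authorization header scheme is an assumption, not taken from the post.
    """
    def post(path):
        req = urllib.request.Request(
            base_url + path,
            data=b"",
            method="POST",
            headers={"Authorization": "Bearer " + token},
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read()
    return post


def claim_once(post):
    """Ask for work once. Returns a scan dict, or None on 204 No Content."""
    status, body = post("/api/sensor/claim-scan")
    if status == 204:
        return None
    if status == 200:  # assumed success code for a claim payload
        return json.loads(body)
    raise RuntimeError(f"unexpected claim status {status}")


def poll_forever(post, run_scan, interval=5.0, sleep=time.sleep):
    """Claim and run one scan at a time, pausing `interval` seconds between polls."""
    while True:
        try:
            scan = claim_once(post)
        except Exception as exc:  # portal unreachable, bad payload, and so on
            print(f"claim failed: {exc}")
            scan = None
        if scan is not None:
            run_scan(scan)  # blocks, so only one scan runs at a time
        sleep(interval)
```

Because run_scan blocks the loop, the sensor never holds more than one claim, which matches the "single scan" note on the endpoint.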

The claim is backed by Postgres FOR UPDATE SKIP LOCKED on the va_scans table:

-- claim_next_pending_sensor_scan(sensor_id) — atomic claim, no double-dispatch
UPDATE va_scans
SET    status = 'running', claimed_at = NOW()
WHERE  id = (
  SELECT id FROM va_scans
  WHERE  sensor_id = %s
    AND  status = 'pending'
    AND  scheduled_at <= NOW()
  ORDER BY scheduled_at ASC
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING *;

Real bug: FOR UPDATE with LEFT JOIN doesn't work the way you'd expect

The first version of this query had a LEFT JOIN va_targets in the subquery so we could filter on target attributes. Postgres returned FeatureNotSupported: SELECT FOR UPDATE/SHARE cannot be applied to the nullable side of an outer join. Two concurrent agents could then both claim the same scan, leading to duplicate runs.

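Fix (c247384): drop the LEFT JOIN inside FOR UPDATE SKIP LOCKED. Filter on the outer query if needed, or keep the join and lock only the base table with FOR UPDATE OF. Lesson: never apply a bare FOR UPDATE to a subquery that contains a LEFT JOIN. Postgres rejects the statement at plan time on every execution, regardless of transaction isolation level, so if the error is not showing up in your logs, the calling code is most likely swallowing it, and the first visible symptom will be a production race condition.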
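If a target-attribute filter is still needed inside the locked subquery, Postgres does accept a LEFT JOIN there as long as the lock is restricted to the non-nullable table with FOR UPDATE OF. The Python sketch below shows that variant with a psycopg-style cursor. It is an alternative to the fix described above, not the code in c247384, and the join column target_id and the attribute enabled are hypothetical.

```python
# Claim query that keeps a LEFT JOIN but locks only the va_scans row.
CLAIM_WITH_TARGET_FILTER = """
UPDATE va_scans
SET    status = 'running', claimed_at = NOW()
WHERE  id = (
  SELECT s.id
  FROM   va_scans s
  LEFT JOIN va_targets t ON t.id = s.target_id   -- hypothetical join column
  WHERE  s.sensor_id = %s
    AND  s.status = 'pending'
    AND  s.scheduled_at <= NOW()
    AND  COALESCE(t.enabled, TRUE)               -- hypothetical target attribute
  ORDER BY s.scheduled_at ASC
  LIMIT 1
  FOR UPDATE OF s SKIP LOCKED                    -- lock va_scans only, never va_targets
)
RETURNING *;
"""


def claim_next(cursor, sensor_id):
    """Atomically claim the oldest due scan for one sensor.

    `cursor` is a DB-API cursor (for example from psycopg2).
    Returns the claimed row, or None when nothing is pending.
    """
    cursor.execute(CLAIM_WITH_TARGET_FILTER, (sensor_id,))
    return cursor.fetchone()
```

Filtering inside the subquery also avoids a side effect of filtering in the outer query: with LIMIT 1 applied first, an ineligible oldest row would block every eligible row behind it.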

Once the agent has a scan claim, it talks OSP over the local /run/ospd/ospd-openvas.sock unix socket — never network sockets. It then polls scan progress and uploads the final report XML to /api/sensor/scan-report. Server-side, _store_sensor_v3_report(scan_id, user_id, xml_bytes) in tasks.py mirrors the hosted code path — the same parsing, the same result/host/CVE schema, so the rest of the portal doesn't care whether a scan came from v3 or v2.
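To make the local OSP hop concrete, here is a small Python sketch that sends one OSP command over a unix socket and parses the XML reply. The socket path is the one named above and get_version is a standard OSP command; the function names and the read-until-parseable strategy are assumptions for illustration, and the real agent may well use a client library instead.

```python
import socket
import xml.etree.ElementTree as ET

OSPD_SOCKET = "/run/ospd/ospd-openvas.sock"  # path named in the text above


def osp_exchange(sock, command_xml, bufsize=4096):
    """Send one OSP command on a connected socket and return the parsed XML reply."""
    sock.sendall(command_xml.encode("utf-8"))
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break  # peer closed the connection
        chunks.append(chunk)
        try:
            # A reply is complete as soon as the buffer parses as one XML document.
            return ET.fromstring(b"".join(chunks))
        except ET.ParseError:
            continue  # reply not complete yet, keep reading
    # Raises ParseError if the peer closed before sending a complete reply.
    return ET.fromstring(b"".join(chunks))


def osp_command(command_xml, path=OSPD_SOCKET, timeout=30.0):
    """Open the ospd-openvas unix socket, run one command, and close the socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        return osp_exchange(sock, command_xml)
```

Re-parsing the whole buffer on every chunk is fine for small replies such as osp_command("<get_version/>"); a large get_vts reply would call for an incremental parser.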

Install Story

Customer-facing install is a single command. We deliberately mirror the FortiGate/CrowdStrike "paste this and you're done" experience.

sudo SENSOR_TOKEN=<tok> SENSOR_NAME="DC-Mumbai-1" sh -c "$(curl -fsSL https://ogma.in/sensor/v3/install)"

The install script (POSIX /bin/sh, Ubuntu-only) does seven things in order:

Verify Ubuntu (dies on anything else — too many edge cases otherwise)
Install Docker + iproute2 + iptables + WireGuard tools if missing
Pull the ten sensor files from /sensor/v3/files/<f> (Dockerfile, docker-compose.yml, entrypoint.sh, phased-boot.sh, log-shim.sh, agent.py, supervisord.conf, mosquitto.conf, install.sh, uninstall.sh)
Pull the sensor container image
Generate the local config (sensor name, token, portal URL)
Start the container — supervisord brings up the stack
POST a one-time registration to /api/sensor/register with hostname + OS info

On a clean Ubuntu 24.04 with Docker pre-installed, total time is about 6 minutes (mostly the container pull). On a host without Docker, ~10 minutes. From "ready" the agent is online and awaiting scans.

For Windows-only customer environments we ship a parallel PowerShell + WSL2 installer (/sensor/v3/install.ps1). Same pattern, different shell.

Migration Path: How We Roll Out v3

We did not flip every sensor to v3 the day v3 was ready. The cutover is per-sensor, controlled by one column in the va_sensors table:

UPDATE va_sensors SET sensor_version = 'v3' WHERE id = 108;

The router function _launch_queued_scan dispatches on sensor_version. v3 sensors get the agent-claim path. Anything else (NULL or v2) gets the existing two-phase hosted path over WG. Both paths coexist indefinitely; we migrate sensor-by-sensor. The global hosted-flow claim function explicitly excludes v3 sensors via:

AND NOT EXISTS (
  SELECT 1 FROM va_sensors
  WHERE id = s.sensor_id AND sensor_version = 'v3'
)
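The routing rule is small enough to sketch directly. The Python below is not the body of _launch_queued_scan, which the post does not show; the function name dispatch_path and the two path labels are hypothetical.

```python
def dispatch_path(sensor_version):
    """Pick the launch path for a queued scan from the sensor's sensor_version column.

    Only the exact value 'v3' selects the agent-claim path. NULL (None), 'v2',
    or any other value falls through to the existing hosted path over WG.
    """
    if sensor_version == "v3":
        return "agent_claim"
    return "hosted_wg"
```

Treating everything except the literal 'v3' as the hosted path means a sensor whose column was never set keeps working, and rolling back is a matter of setting the column back.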

First sensor live: a customer in NCR, on 24 April 2026. WG handshake confirmed, feeds synced (3,454 NASL + 512 Notus), OSP socket accepting connections, agent online. The next sensor flips on the customer's next maintenance window.

Twelve Bugs We Hit Building This

For the engineers reading: a brief catalogue. Useful as a checklist if you're building anything similar.

FOR UPDATE with LEFT JOIN — covered above. Fix c247384.
The greenbone-feed-sync wrapper left a path unquoted, which broke on hostnames with dashes. Fix: de0b835.
Feed-directory ownership wrong post-sync; ospd-openvas couldn't read NVTs. Fix: chown -R in entrypoint, e6c55d6.
iproute2 + iptables not installed by default in ubuntu:24.04. Container couldn't bring up wg0. Fix: explicit apt install, 8294bc4.
Mosquitto broker wasn't whitelisting the agent's loopback subscription. Fix: explicit ACL in mosquitto.conf, daccf4e.
notus-scanner missing entirely from initial supervisord config — ospd-openvas calls it for advisory matching. Fix: add it, 466342d.
Heartbeat thread not actually starting (silent exit during init). Fix: explicit set -u + visible error, 6ab0118.
Agent output going to /dev/null by default — invisible debug. Fix: visible stdout, 143a80f.
Heartbeat thread racing the registration POST — sensor showed offline for the first 60s. Fix: heartbeat thread starts post-register, 6d8c471.
Phased boot's set -u caught a missing SENSOR_TOKEN env var late in install. Fix: validation up front, 7a62d90.
Windows installer clashed with user's existing Ubuntu WSL distro name. Fix: namespaced WSL distro, 1e653e4.
OSP get_vts intermittently returned empty NVT family list. Fix: hardcode known good families + buffer, fad2858 + 93f0f7d.

Twelve bugs in two weeks of focused build. None trivial. None catastrophic. The right test for whether a piece of infrastructure is ready for production is not "does it work the first time" — it's whether you've found and fixed the bugs that would have bitten you.

What's Next

Migration runway. Every existing v2 sensor flips to v3 over the next 60 days, sensor-by-sensor, on customer maintenance windows.
ARM64 image. Ubuntu 24.04 ARM build for sensors deployed on Raspberry Pi-class hardware in branch sites.
Authenticated scans. v3 currently does unauthenticated network scans. Next: SSH credential brokering for in-depth Linux scans, WMI / WinRM for Windows.
CIS benchmark scans. The OpenVAS NVT feed already includes CIS benchmark NASL scripts; we have to surface them in the portal as a separate scan type.
Compliance report templates. One-click DPDPA / RBI / SEBI CSCRF / ISO 27001 mapped reports straight from the v3 scan output. The plumbing exists in our hosted path; v3 reports go through the same code now.

✅ Key Takeaways

Hosted-over-WG is the wrong default for VA. Latency, central capacity bottleneck, scan-data residency and install complexity all bite at scale. Push the scanner to the customer host instead.
Single-container deployments don't have to mean Docker Compose. A well-sequenced supervisord inside one container avoids the cascade-restart problem and reduces the dependency surface area.
Phased boot reporting is operationally invaluable. Five named phases with progress streamed to the portal in real time eliminate "is it working" support tickets entirely.
Postgres FOR UPDATE SKIP LOCKED + LEFT JOIN is a footgun. Filter in the outer query, not the locked subquery.
Migrate by SQL flag, not by all-at-once cutover. One column on the sensor row routes to v3; every other sensor stays on the v2 path until its own row is flipped. Reversible.

🛡️ Managed Vulnerability Assessment

Want this running in your environment?

Ogma's Managed Vulnerability Assessment is the consumer of everything described above. Sensor v3 deploys in 10 minutes on any Ubuntu host you control. Compliance-mapped reports for RBI / SEBI CSCRF / DPDPA / ISO 27001 / PCI DSS / CERT-In come out of the same scan output. NSE7-certified engineers triage and remediate. Talk to us.

📞 +91 80 0979 0979

Tags Vulnerability Assessment OpenVAS GVM Sensor Architecture WireGuard Docker Greenbone Engineering
