Sensor v3: Why We Rebuilt Our VA Sensor From Scratch
On 24 April 2026 we flipped Ogma's first production VA sensor — ST-MAIN, a customer in NCR — to Sensor v3. Single Docker container. OpenVAS, ospd-openvas, notus-scanner, mosquitto, redis, all running on the customer's host. No scan traffic over the WireGuard tunnel. 10-minute install via a single shell one-liner. This is the engineering write-up.
- ⚡ Install time: 10 min, from cold to "ready" with a single one-liner
- 📦 Containers: 1 (was 5+ in the v2 compose stack)
- 🔁 Boot phases: 5 (wg_up → feed_sync → feed_ready → scanner_boot → ready)
- 📡 Tunnel use: control only; scan traffic stays on the customer LAN
- 🐛 Real bugs hit: 12, in ~2 weeks of building this
Audience: this is a technical post written by a network engineer (NSE7) for other network and security engineers. If you're scanning your environment with OpenVAS, GVM, or any vulnerability scanner that talks OSP, the architectural choices here will look familiar. If you're not, the high-level architecture in the diagrams is enough.
What v1 and v2 Got Wrong
Ogma's original sensor (v1, then v2) used a hosted-over-WG model. The customer ran a tiny WireGuard endpoint on a thin Linux box. Scan packets flowed back over the WG tunnel to a centralised GVM cluster on Ogma's side, scans executed there, results came back. That worked. It also created four problems we got tired of fighting:
v2 Latency on every probe
Every TCP / UDP probe to a customer-side host had to traverse a WireGuard tunnel back to our DC, then return. For aggressive scans against /24 networks, the round-trip latency added 30–60% to scan time. Customers noticed.
v2 Centralised scan capacity = central bottleneck
One GVM cluster ran every customer's scans. Burst capacity was always wrong — either over-provisioned and idle, or under-provisioned and queued. Worse, our shared FOR UPDATE SKIP LOCKED claim queue had a subtle bug where a LEFT JOIN in the subquery silently dropped concurrent claims (Postgres rejects FOR UPDATE on the nullable side of an outer join). Took us a day to find the root cause; fixed in commit c247384.
v2 Customer scan data crossed organisational boundary
Scan results — including target host details, open ports, software versions, vulnerability findings — left the customer network in transit. Some BFSI and government customers had explicit policy against this even with end-to-end encryption. We had to explain this every onboarding call.
v1 + v2 Operationally heavy for the customer
v1 install was a multi-step bash script with explicit dependency installs (Docker, WireGuard, Greenbone tooling, kernel modules). Failure modes were many. We rewrote install.sh as POSIX /bin/sh in v1.2 to handle Ubuntu's dash, but the underlying complexity was still there.
The combination — latency + central bottleneck + data residency + install complexity — meant we could either keep patching v2 or rebuild. We chose rebuild.
Sensor v3 Architecture
The core idea: put OpenVAS on the customer host. Use the WG tunnel only for control plane. Scan traffic never leaves the customer LAN. The Ogma portal pushes scan jobs into a queue; the agent on the sensor claims them and runs them locally; results upload back over the WG tunnel as XML.
Customer Host (Ubuntu 24.04 + Docker)
Single container, network_mode: host, caps: NET_ADMIN, NET_RAW
- tini + supervisord: PID 1 + service manager (priority root)
- mosquitto: loopback MQTT broker (priority 5)
- redis: unix-socket only (priority 10)
- notus-scanner: advisory matcher (priority 15)
- ospd-openvas: OSP daemon, unix sock (priority 20)
- ogma-agent: claim → run → upload (priority 30)
Ogma Portal (ogma.in)
- /api/sensor/heartbeat: 30 s liveness
- /api/sensor/claim-scan: 5 s poll, single scan
- /api/sensor/log: NDJSON live log
- /api/sensor/scan-report: final XML upload
The container is ubuntu:24.04 + supervisord + tini. NET_ADMIN is needed because the agent brings up the WG interface on the host network namespace; NET_RAW is needed for OpenVAS port-probing. We deliberately do not grant SYS_ADMIN — we have seen too many "container that needs SYS_ADMIN" pitches that turn out to mean "we couldn't be bothered to set the right per-cap permissions."
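One way to confirm from inside the container that only the intended capabilities are present is to decode the effective capability mask that Linux exposes in /proc/self/status. The sketch below is illustrative and not part of the sensor; `decode_caps` and `read_cap_eff` are hypothetical helper names, and the bit numbers are the standard ones from linux/capability.h.

```python
from typing import Dict, Set

# Capability bit numbers from linux/capability.h.
CAP_BITS: Dict[str, int] = {"NET_ADMIN": 12, "NET_RAW": 13, "SYS_ADMIN": 21}


def decode_caps(cap_eff_hex: str) -> Set[str]:
    """Return the names in CAP_BITS whose bit is set in a CapEff hex mask."""
    mask = int(cap_eff_hex, 16)
    return {name for name, bit in CAP_BITS.items() if mask & (1 << bit)}


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Read the effective capability mask of the current process (Linux only)."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff line not found")
```

Calling `decode_caps(read_cap_eff())` inside the container should report NET_ADMIN and NET_RAW, and should not report SYS_ADMIN.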
Why supervisord and not just docker compose up?
Two reasons. First, OpenVAS depends on a strict service start-order — redis before notus-scanner before ospd-openvas before agent. Compose's depends_on with condition: service_healthy works but adds 4 health-check sidecars and another layer of startup latency. Second, the agent needs to be able to systemctl restart-equivalent any of the scanner services if a scan hangs; supervisord exposes that natively over the loopback MQTT bus.
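supervisord launches programs in ascending order of their priority value, so the start order follows directly from the numbers in the architecture listing. The sketch below derives that order and checks it against the dependency chain named above; `start_order` and `violations` are hypothetical helpers written for illustration, not code from the sensor.

```python
from typing import Dict, List, Tuple

# Priority values as listed for each service; a lower value starts earlier.
PRIORITIES: Dict[str, int] = {
    "ogma-agent": 30,
    "ospd-openvas": 20,
    "mosquitto": 5,
    "notus-scanner": 15,
    "redis": 10,
}

# (a, b) means service a must be started before service b.
MUST_PRECEDE: List[Tuple[str, str]] = [
    ("redis", "notus-scanner"),
    ("notus-scanner", "ospd-openvas"),
    ("ospd-openvas", "ogma-agent"),
]


def start_order(priorities: Dict[str, int]) -> List[str]:
    """Sort service names by ascending priority value."""
    return sorted(priorities, key=lambda name: priorities[name])


def violations(order: List[str], must_precede: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every (a, b) pair where a does not come before b in order."""
    pos = {name: i for i, name in enumerate(order)}
    return [(a, b) for a, b in must_precede if pos[a] > pos[b]]
```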
Phased Boot — Why It Matters
The single thing that bit us hardest in v2 was that the sensor would come up and look ready when in fact ospd-openvas was still loading its NVT cache — a process that can take 5–15 minutes on a cold start. Scans dispatched to a not-actually-ready scanner would fail with confusing errors. v3 fixes this with an explicit phased boot, with each phase reporting its progress back to the portal.
- wg_up: Fetch WG config from portal using SENSOR_TOKEN. Bring up wg0. Verify handshake.
- feed_sync: Skipped if NASL ≥ 2,000 + Notus ≥ 500 + age < 12 h. Else greenbone-feed-sync --type nvt → greenbone-nvt-sync → raw rsync fallback with to-check=X/Y progress.
- feed_ready: Verify NVT count + Notus advisory count meet thresholds. Refuse to proceed if either is empty.
- scanner_boot: exec supervisord, which starts mosquitto → redis → notus-scanner → ospd-openvas → agent.
- ready: OSP sock open + agent heartbeat reaching portal. Portal flips sensor status to ONLINE.
Every phase transition POSTs to /api/sensor/log with the new state. Customers and Ogma engineers can both watch the boot progress live in the portal — no more "is it working yet?" tickets ten minutes after install.
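The phased boot boils down to a linear state machine that reports every transition and refuses to advance past a failed phase. The real implementation is the shell script phased-boot.sh; the Python below is only a sketch of the control flow, with `run_boot`, `steps`, and `report` as hypothetical names. In production `report` would be whatever POSTs to /api/sensor/log.

```python
from typing import Callable, Dict, List

PHASES: List[str] = ["wg_up", "feed_sync", "feed_ready", "scanner_boot", "ready"]


def run_boot(steps: Dict[str, Callable[[], bool]],
             report: Callable[[str, str], None]) -> str:
    """Run phases in order, report each transition, stop at the first failure.

    Returns the name of the last phase that completed, or "init" if none did.
    """
    reached = "init"
    for phase in PHASES:
        report(phase, "started")
        if not steps[phase]():
            report(phase, "failed")
            return reached
        reached = phase
        report(phase, "done")
    return reached
```

Because later phases never start after a failure, a sensor with an empty feed stops at feed_sync and cannot be reported as ready.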
```sh
# Excerpt from sensor/v3/phased-boot.sh: feed_sync phase
if [ "$NASL_COUNT" -ge "$NASL_MIN" ] && [ "$NOTUS_COUNT" -ge "$NOTUS_MIN" ] && [ "$FEED_AGE_HOURS" -lt 12 ]; then
    log_phase "feed_sync" "skipped - feeds fresh ($NASL_COUNT NASL, $NOTUS_COUNT Notus, $FEED_AGE_HOURS h old)"
else
    log_phase "feed_sync" "running greenbone-feed-sync"
    greenbone-feed-sync --type nvt || rsync_fallback "$RSYNC_NVT_URL" "$NVT_DIR"
    greenbone-feed-sync --type notus || rsync_fallback "$RSYNC_NOTUS_URL" "$NOTUS_DIR"
fi
```
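The rsync fallback reports progress through the remaining/total counter that rsync prints with --progress. A minimal sketch of turning one such line into a percentage is shown below. It assumes the counter appears as either to-chk=X/Y or to-check=X/Y (the spelling differs between rsync versions, so both are accepted); `sync_percent` is a hypothetical helper, not the sensor's actual parser.

```python
import re
from typing import Optional

# Matches "to-chk=12/340" as well as "to-check=12/340".
_PROGRESS = re.compile(r"to-ch(?:ec)?k=(\d+)/(\d+)")


def sync_percent(line: str) -> Optional[int]:
    """Return percent complete for an rsync progress line, or None if absent."""
    match = _PROGRESS.search(line)
    if match is None:
        return None
    remaining, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        return None
    return (total - remaining) * 100 // total
```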
The Scan-Claim Pattern
The agent polls POST /api/sensor/claim-scan every 5 seconds. The portal returns either 204 No Content (no work) or a JSON payload describing one scan to run — target hosts, scan-config UUID, target alive-test setting, the works.
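A rough sketch of one poll iteration follows. It is not the code in agent.py. The transport is injected so the decision logic can be exercised without a network; the 200 status for a successful claim, the Bearer authorization header, and the empty JSON body are assumptions made for illustration, and `poll_claim` and `http_fetch` are hypothetical names.

```python
import json
import urllib.request
from typing import Callable, Optional, Tuple

# A transport returns (HTTP status code, response body bytes).
Fetch = Callable[[], Tuple[int, bytes]]


def poll_claim(fetch: Fetch) -> Optional[dict]:
    """Return the claimed scan as a dict, or None when the portal has no work."""
    status, body = fetch()
    if status == 204:
        return None
    if status == 200:  # assumed success status for a returned claim
        return json.loads(body)
    raise RuntimeError(f"unexpected claim-scan status {status}")


def http_fetch(url: str, token: str) -> Tuple[int, bytes]:
    """POST to the claim endpoint; header and body shape are assumptions."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status, resp.read()
```

In an agent this would sit in a loop that sleeps 5 seconds between calls and hands any non-None result to the scan runner.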
The claim is backed by Postgres FOR UPDATE SKIP LOCKED on the va_scans table:
```sql
-- claim_next_pending_sensor_scan(sensor_id): atomic claim, no double-dispatch
UPDATE va_scans
SET status = 'running', claimed_at = NOW()
WHERE id = (
    SELECT id
    FROM va_scans
    WHERE sensor_id = %s
      AND status = 'pending'
      AND scheduled_at <= NOW()
    ORDER BY scheduled_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
```
Real bug FOR UPDATE with LEFT JOIN doesn't work the way you'd expect
The first version of this query had a LEFT JOIN va_targets in the subquery so we could filter on target attributes. Postgres returned FeatureNotSupported: SELECT FOR UPDATE/SHARE cannot be applied to the nullable side of an outer join. Two concurrent agents could then both claim the same scan, leading to duplicate runs.
Fix: keep the locked subquery on va_scans alone, with FOR UPDATE SKIP LOCKED. Filter on the outer query if needed. Lesson: never put a LEFT JOIN inside a FOR UPDATE subquery. Postgres will refuse it, and depending on your transaction isolation level you may not see the error until production race conditions surface.
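To make the SKIP LOCKED behaviour concrete without a database, here is an in-memory analogy. It is not Postgres and not the portal's code: each "row" carries a threading.Lock, a non-blocking acquire stands in for SKIP LOCKED, and the status check under the lock stands in for the atomic UPDATE. `Scan` and `claim_next` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scan:
    id: int
    status: str = "pending"
    lock: threading.Lock = field(default_factory=threading.Lock)


def claim_next(scans: List[Scan]) -> Optional[int]:
    """Claim the first pending scan whose row lock is free; skip locked rows."""
    for scan in scans:  # assumed already ordered by scheduled_at
        if not scan.lock.acquire(blocking=False):
            continue  # another worker holds this row: skip it, do not wait
        try:
            if scan.status == "pending":
                scan.status = "running"
                return scan.id
        finally:
            scan.lock.release()
    return None
```

Workers never block on each other, and because the status flips while the row lock is held, no scan can be handed out twice.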
Once the agent has a scan claim, it talks OSP over the local /run/ospd/ospd-openvas.sock unix socket — never network sockets. It then polls scan progress and uploads the final report XML to /api/sensor/scan-report. Server-side, _store_sensor_v3_report(scan_id, user_id, xml_bytes) in tasks.py mirrors the hosted code path — the same parsing, the same result/host/CVE schema, so the rest of the portal doesn't care whether a scan came from v3 or v2.
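Talking OSP over a unix socket amounts to writing an XML command and reading the XML response. The sketch below sends a single command such as `<get_version/>` and reads until the daemon closes the connection. That close-after-response behaviour is an assumption about ospd, and `osp_request` is a hypothetical helper, not the function used in agent.py.

```python
import socket


def osp_request(sock_path: str, xml: bytes, timeout: float = 30.0) -> bytes:
    """Send one OSP command over a unix socket and return the raw response.

    Assumes the peer closes the connection once the response is written.
    """
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        sock.sendall(xml)
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: response is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

A call would look like `osp_request("/run/ospd/ospd-openvas.sock", b"<get_version/>")`; the response can then be parsed with xml.etree.ElementTree.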
Install Story
Customer-facing install is a single command. We deliberately mirror the FortiGate/CrowdStrike "paste this and you're done" experience.
sudo SENSOR_TOKEN=<tok> SENSOR_NAME="DC-Mumbai-1" sh -c "$(curl -fsSL https://ogma.in/sensor/v3/install)"
The install script (POSIX /bin/sh, Ubuntu-only) does seven things in order:
- Verify Ubuntu (dies on anything else — too many edge cases otherwise)
- Install Docker + iproute2 + iptables + WireGuard tools if missing
- Pull the sensor files from /sensor/v3/files/<f> (Dockerfile, docker-compose.yml, entrypoint.sh, phased-boot.sh, log-shim.sh, agent.py, supervisord.conf, mosquitto.conf, install.sh, uninstall.sh)
- Pull the sensor container image
- Generate the local config (sensor name, token, portal URL)
- Start the container — supervisord brings up the stack
- POST a one-time registration to /api/sensor/register with hostname + OS info
On a clean Ubuntu 24.04 with Docker pre-installed, total time is about 6 minutes (mostly the container pull). On a host without Docker, ~10 minutes. From "ready" the agent is online and awaiting scans.
For Windows-only customer environments we ship a parallel PowerShell + WSL2 installer (/sensor/v3/install.ps1). Same pattern, different shell.
Migration Path: How We Roll Out v3
We did not flip every sensor to v3 the day v3 was ready. The cutover is per-sensor, controlled by one column in the va_sensors table:
UPDATE va_sensors SET sensor_version = 'v3' WHERE id = 108;
The router function _launch_queued_scan dispatches on sensor_version. v3 sensors get the agent-claim path. Anything else (NULL or v2) gets the existing two-phase hosted path over WG. Both paths coexist indefinitely; we migrate sensor-by-sensor. The global hosted-flow claim function explicitly excludes v3 sensors via:
```sql
AND NOT EXISTS (
    SELECT 1
    FROM va_sensors
    WHERE id = s.sensor_id
      AND sensor_version = 'v3'
)
```
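The routing rule is small enough to state as a single function. The sketch below mirrors the behaviour described for _launch_queued_scan but is not its source; `dispatch_path` and the two path labels are hypothetical names chosen for illustration.

```python
from typing import Optional


def dispatch_path(sensor_version: Optional[str]) -> str:
    """Pick the launch path from the va_sensors.sensor_version value."""
    # Only an explicit 'v3' opts in; NULL, 'v2', or anything else stays hosted.
    return "agent_claim" if sensor_version == "v3" else "hosted_over_wg"
```

Defaulting every unknown value to the hosted path is what makes the flag reversible: setting the column back to NULL or 'v2' returns the sensor to the old flow.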
First sensor live: a customer in NCR, on 24 April 2026. WG handshake confirmed, feeds synced (3,454 NASL + 512 Notus), OSP socket accepting connections, agent online. The next sensor flips on the customer's next maintenance window.
Twelve Bugs We Hit Building This
For the engineers reading: a brief catalogue. Useful as a checklist if you're building anything similar.
- FOR UPDATE with LEFT JOIN: covered above. Fix: c247384.
- Greenbone greenbone-feed-sync wrapper unquoted a path, which broke on hostnames with dashes. Fix: de0b835.
- Feed-directory ownership wrong post-sync; ospd-openvas couldn't read NVTs. Fix: chown -R in entrypoint, e6c55d6.
- iproute2 + iptables not installed by default in ubuntu:24.04. Container couldn't bring up wg0. Fix: explicit apt install, 8294bc4.
- Mosquitto broker wasn't whitelisting the agent's loopback subscription. Fix: explicit ACL in mosquitto.conf, daccf4e.
- notus-scanner missing entirely from initial supervisord config; ospd-openvas calls it for advisory matching. Fix: add it, 466342d.
- Heartbeat thread not actually starting (silent exit during init). Fix: explicit set -u + visible error, 6ab0118.
- Agent output going to /dev/null by default, so debugging was invisible. Fix: visible stdout, 143a80f.
- Heartbeat thread racing the registration POST; sensor showed offline for the first 60s. Fix: heartbeat thread starts post-register, 6d8c471.
- Phased boot's set -u caught a missing SENSOR_TOKEN env var late in install. Fix: validation up front, 7a62d90.
- Windows installer clashed with user's existing Ubuntu WSL distro name. Fix: namespaced WSL distro, 1e653e4.
- OSP get_vts intermittently returned empty NVT family list. Fix: hardcode known good families + buffer, fad2858 + 93f0f7d.
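The heartbeat-versus-registration race in the list above comes down to ordering: register first, and only then start the heartbeat loop. A minimal sketch of that ordering is below. It assumes `register` raises on failure and uses the 30-second heartbeat interval from the architecture listing; `start_agent` is a hypothetical name, not the function in agent.py.

```python
import threading
from typing import Callable, Tuple


def start_agent(register: Callable[[], None],
                heartbeat: Callable[[], None],
                interval_s: float = 30.0) -> Tuple[threading.Event, threading.Thread]:
    """Register, then start the heartbeat loop in a daemon thread."""
    register()  # if this raises, no heartbeat thread is ever started
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            heartbeat()
            stop.wait(interval_s)  # sleeps, but wakes immediately on stop.set()

    thread = threading.Thread(target=loop, name="heartbeat", daemon=True)
    thread.start()
    return stop, thread
```

Returning the Event lets the caller shut the loop down cleanly with `stop.set()` instead of waiting out a full interval.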
Twelve bugs in two weeks of focused build. None trivial. None catastrophic. The right test for whether a piece of infrastructure is ready for production is not "does it work the first time" — it's whether you've found and fixed the bugs that would have bitten you.
What's Next
- Migration runway. Every existing v2 sensor flips to v3 over the next 60 days, sensor-by-sensor, on customer maintenance windows.
- ARM64 image. Ubuntu 24.04 ARM build for sensors deployed on Raspberry Pi-class hardware in branch sites.
- Authenticated scans. v3 currently does unauthenticated network scans. Next: SSH credential brokering for in-depth Linux scans, WMI / WinRM for Windows.
- CIS benchmark scans. The OpenVAS NVT feed already includes CIS benchmark NASL scripts; we have to surface them in the portal as a separate scan type.
- Compliance report templates. One-click DPDPA / RBI / SEBI CSCRF / ISO 27001 mapped reports straight from the v3 scan output. The plumbing exists in our hosted path; v3 reports go through the same code now.
✅ Key Takeaways
- Hosted-over-WG is the wrong default for VA. Latency, central capacity bottleneck, scan-data residency and install complexity all bite at scale. Push the scanner to the customer host instead.
- Single-container deployments don't have to mean Docker Compose. A well-sequenced supervisord inside one container avoids the cascade-restart problem and reduces the dependency surface area.
- Phased boot reporting is operationally invaluable. Five named phases with progress streamed to the portal in real time eliminate "is it working" support tickets entirely.
- Postgres FOR UPDATE SKIP LOCKED + LEFT JOIN is a footgun. Filter in the outer query, not the locked subquery.
- Migrate by SQL flag, not by all-at-once cutover. One column on the sensor row routes to v3; everything else stays on the v2 path indefinitely. Reversible.
🛡️ Managed Vulnerability Assessment
Want this running in your environment?
Ogma's Managed Vulnerability Assessment is the consumer of everything described above. Sensor v3 deploys in 10 minutes on any Ubuntu host you control. Compliance-mapped reports for RBI / SEBI CSCRF / DPDPA / ISO 27001 / PCI DSS / CERT-In come out of the same scan output. NSE7-certified engineers triage and remediate. Talk to us.