See it before users open a ticket. Audit it after.
Prometheus + Grafana, Loki, Wazuh, Zabbix — the open observability stack we run for ourselves and for clients. On-call rotation you can audit, status page customers can subscribe to, post-incident reports inside 5 working days.
Talk to a monitoring engineer
Who this is for
Outages reach you via Slack DM from a customer rather than via your monitoring. We fix that.
Learn more →
Workloads across Azure, AWS, on-prem, OpenStack. You need one pane of glass that doesn't cost £30k/yr.
Learn more →
FCA / NHS DSPT / sector-equivalent demands evidenced monitoring. We map the metrics to the assertions.
Learn more →
What's included
Metrics (Prometheus + Grafana)
Hosted in your cloud or ours. Dashboards per service, alerts per SLO, 13-month retention default.
Logs (Loki / Wazuh / Splunk)
Centralised, indexed, with retention by class. Audit-grade for the Regulated tier.
Synthetic + RUM
Black-box checks of every public endpoint; real-user monitoring where it's justified.
On-call rotation
Pager rota integrated with PagerDuty / Opsgenie / Grafana OnCall. We can be the on-call team or share the rota with yours.
Status page
Public statuspage.io / Cachet / our own — whatever your customers expect. Subscribe via RSS / email / SMS.
Postmortems
Every P1 gets a postmortem within 5 working days. Blameless, structured, action-tracked.
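As a rough illustration of the 5-working-day deadline, here is a minimal Python sketch. It assumes counting starts the day after the incident and that only weekends are excluded; public holidays and the exact counting rule are not stated on this page, and the function name `postmortem_due` is hypothetical.

```python
from datetime import date, timedelta


def postmortem_due(incident_day: date, working_days: int = 5) -> date:
    """Return the date `working_days` Mon-Fri days after `incident_day`.

    Assumption: weekends are skipped, public holidays are not.
    """
    due = incident_day
    remaining = working_days
    while remaining > 0:
        due += timedelta(days=1)
        if due.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return due


if __name__ == "__main__":
    # A P1 on Friday 3 May 2024 would be due the following Friday.
    print(postmortem_due(date(2024, 5, 3)))
```

A real rota would likely also need a holiday calendar for the relevant region.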
How we deliver
- Discovery — What do you currently watch? What did you last find out about from a customer rather than from monitoring?
- Design — SLO catalogue with targets, alerting policy, on-call rotation.
- Run — Stack live within 2 weeks. First on-call rotation in week 3.
- Optimise — Monthly SLO review. Postmortem actions tracked to closure.
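A monthly SLO review usually comes down to error-budget arithmetic. The sketch below shows one common way to compute it, assuming a request-based SLO; the function name and the example figures are illustrative and not taken from any client engagement.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent for the review period.

    1.0 means no budget used, 0.0 means fully spent, negative means overspent.
    Assumption: the SLO is expressed as a success ratio, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


if __name__ == "__main__":
    # 99.9% target, 1,000,000 requests, 250 failures in the month.
    print(f"{error_budget_remaining(0.999, 1_000_000, 250):.2%} of budget left")
```

A negative result is the usual trigger for prioritising reliability work over new features in the next period.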
Outcomes you can measure
Tier-specific SLAs at /service-levels/.
Tech stack we run
Standard pieces; we'll work with what you have if you prefer.
Ready to talk monitoring?
30-minute discovery call. No slide deck.
Book a consultation