
Your n8n instance handles critical business automations. Customer orders, lead notifications, data syncs. What happens when something breaks at 2 AM?
Without proper monitoring, you’re flying blind. A well-designed stack watches your system around the clock, sending alerts before small issues become major outages. This guide shows you exactly how to build that safety net.
Monitoring your n8n infrastructure is essential for maintaining uptime, performance, and workflow reliability. The comparison table below highlights VPS hosting providers that support stable environments for logging, monitoring, and alerting systems. These providers make it easier to track resource usage and detect issues before they affect production workflows. Explore our recommended VPS hosting options.
VPS Hosting Providers Built for Reliable n8n Monitoring and Infrastructure Visibility
| Provider | User Rating | Recommended For |
|---|---|---|
| Kamatera | 4.8 | Scalability |
| Hostinger | 4.6 | Affordability |
| IONOS | 4.7 | Developers |
Why You Need a Server Monitoring Stack for n8n Infrastructure
Self-hosting n8n offers incredible data sovereignty. You control your encrypted credentials, your workflow execution data, and your entire platform. But here’s the catch: managing your own infrastructure requires constant oversight.
Without monitoring, you’re essentially hoping nothing breaks. That’s not a strategy. That’s wishful thinking.
A robust monitoring stack delivers three critical benefits. First, it ensures high uptime by catching issues early. Second, it optimizes resource usage across CPU, memory, and storage. Third, it provides immediate alerts before bottlenecks impact your operations.
Yes, setting up a custom monitoring environment adds initial complexity. But consider the alternative. One missed database connection failure could mean hours of lost data. One memory spike could crash your production workloads at the worst possible moment.
The long-term benefits of visibility far outweigh the setup time. When your automation volume grows, you’ll be grateful for the foundation you built.

Integrating external tools like UptimeRobot or Healthchecks.io with your internal stack adds an outside view that keeps working even when the server itself is down. This combination of internal and external monitoring creates a safety net that catches problems from multiple angles. For teams looking at best n8n hosting providers, understanding monitoring requirements helps you choose the right resources.
Built-In Health Checks for Your n8n Instance
The /healthz Endpoint
This endpoint is your first line of defense. It verifies whether your n8n instance is reachable and running.
Access it by navigating to <your-instance-url>/healthz. When things work correctly, it returns an HTTP 200 status code. Simple and effective as a basic heartbeat check.
However, there’s a limitation worth noting. This endpoint doesn’t verify database connection status. Your app could be reachable but completely unable to process any data. Think of it as checking whether the lights are on without confirming anyone’s home.
The /healthz/readiness Endpoint
This endpoint provides deeper assessment. It confirms your instance is fully ready to accept traffic.
Here’s the key difference: it returns HTTP 200 only if the PostgreSQL database is successfully connected and fully migrated. No database connection? No green light.
This makes it the recommended endpoint for load balancers and reverse proxy health checks. You want traffic routed only to nodes that can actually handle work. Access it via <your-instance-url>/healthz/readiness.
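The difference between the two endpoints can be sketched as a small probe. This is a minimal illustration, not part of n8n: the function names, the three state labels, and the 5-second timeout are assumptions chosen for the example. Only the endpoint paths and the HTTP 200 rule come from the text above.

```python
from urllib import error, request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrives."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, and similar


def classify_instance(base_url: str, fetch=http_status) -> str:
    """Combine both endpoints into 'down', 'not_ready', or 'ready'."""
    base = base_url.rstrip("/")
    if fetch(base + "/healthz") != 200:
        return "down"  # the process is not reachable at all
    if fetch(base + "/healthz/readiness") != 200:
        return "not_ready"  # reachable, but the database check failed
    return "ready"
```

The `fetch` parameter exists so the logic can be exercised without a live instance. In practice you would call `classify_instance("https://your-instance-url")` from a cron job or an external checker.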
How to Enable Metrics via Environment Variables
Detailed status information lives at the /metrics endpoint. But it’s disabled by default for security and performance reasons.
To enable metrics, set N8N_METRICS=true in your configuration. Once active, access detailed metrics at <your-instance-url>/metrics.
Note: this feature works only for self-hosted setups. It’s not available on n8n Cloud.
You can further customize health check endpoints using the N8N_ENDPOINT_HEALTH environment variable. This gives you flexibility in how you structure your monitoring architecture.
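To get a feel for what the /metrics endpoint returns, here is a rough sketch of a parser for the Prometheus text format. Prometheus does this for you in a real stack, so treat it purely as a way to inspect the output by hand. The sample payload and its metric names are made up for illustration and may not match what your n8n version exposes.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    The label set stays part of the key; optional timestamps are ignored.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled series: everything up to the closing brace is the key.
            key, _, rest = line.rpartition("}")
            key += "}"
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[key] = float(fields[0])
    return samples


# Illustrative payload only; real output contains many more series.
SAMPLE = """\
# HELP n8n_process_resident_memory_bytes Resident memory size in bytes.
# TYPE n8n_process_resident_memory_bytes gauge
n8n_process_resident_memory_bytes 1.5e+08
n8n_nodejs_eventloop_lag_p99_seconds 0.012
"""

metrics = parse_metrics(SAMPLE)
```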
Building a Production-Grade Stack to Deploy n8n
Prometheus for Metrics Collection

Prometheus is a time-series database that scrapes and stores metrics from your n8n instance and server host. It forms the core of your monitoring stack.
For storage management, set a 15-day data retention policy using --storage.tsdb.retention.time=15d. This balances historical analysis with disk usage.
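To judge what a 15-day retention means for disk space, the Prometheus documentation gives a rule of thumb: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample (roughly 1 to 2 bytes on average). The sketch below applies that formula. The 1,000 samples per second figure is an assumption for illustration; measure your own ingestion rate before sizing a disk.

```python
def estimate_tsdb_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough Prometheus disk estimate: retention * ingest rate * sample size."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed ingest rate of 1,000 samples/s at the pessimistic 2 bytes/sample.
estimate = estimate_tsdb_bytes(retention_days=15, samples_per_second=1000)
print(f"{estimate / 1e9:.2f} GB")
```

Under those assumptions the estimate lands at a few gigabytes, which is why 15 days is a comfortable default on a small VPS.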
You can optionally expose Prometheus to external networks by setting EXPOSE_PROMETHEUS=true. The official n8n documentation provides detailed setup guides for enabling Prometheus metrics natively.
Grafana for Visualization and Alerting
Grafana pairs perfectly with Prometheus to visualize data and manage critical system alerts. It transforms raw metrics into actionable dashboards.
The platform supports automated provisioning of datasources, dashboards, and alert rules directly upon deployment. You can configure files like n8n_grafana_alerts.json to bootstrap your setup.
Security settings include GF_SECURITY_ADMIN_PASSWORD, which sets the Grafana admin password. For production use, set the Strict Transport Security (HSTS) max-age between 31,536,000 and 315,360,000 seconds (1 to 10 years).
Essential Exporters for Self Hosted Deployments
Full-stack observability requires specialized exporters alongside your core applications.
Node Exporter captures host-level hardware metrics like CPU, RAM, and disk usage. It’s essential for understanding your server’s overall health.
cAdvisor runs in privileged mode to collect granular container-level metrics. Each Docker container gets individual monitoring.
Postgres Exporter and Redis Exporter monitor your database backend and queue system health. These become critical as execution volume increases.
Implementing Monitoring with Docker Compose
Single Mode Architecture Setup
A standard single-mode deployment includes Traefik for proxy and TLS, PostgreSQL for your database, and the n8n main application.

The monitoring layer adds Prometheus, Grafana, Node Exporter, cAdvisor, and Postgres Exporter. These services run alongside your core infrastructure.
Activating them is straightforward using Docker Compose profiles. Set COMPOSE_PROFILES=monitoring and the entire stack spins up together. For deeper guidance, explore our content on scaling n8n deployments.
Queue Mode Architecture and Scaling
Queue mode separates the n8n main process from background workers. The main application handles UI, schedules, and webhooks while workers execute heavy workflows.
This architecture introduces Redis as the message broker, n8n worker nodes, and the Redis Exporter for monitoring. Understanding the differences matters for proper setup. Check our guide on queue versus regular mode for detailed comparisons.
To capture comprehensive queue data, set N8N_METRICS_INCLUDE_QUEUE_METRICS=true. This unlocks visibility into job processing performance.
Comparison Table: Monitoring Stack Components
| Component | Purpose | Single Mode | Queue Mode | Enablement |
|---|---|---|---|---|
| Prometheus | Metrics storage/scraping | Yes | Yes | profiles: ["monitoring"] |
| Grafana | Dashboards/alerts | Yes (provisioned JSONs) | Yes (queue-specific dashboards) | GF_SECURITY_ADMIN_PASSWORD |
| Node Exporter | Host CPU/RAM/disk | Yes | Yes | prom/node-exporter:latest |
| cAdvisor | Container metrics | Yes | Yes | gcr.io/cadvisor/cadvisor:latest (privileged) |
| Postgres Exporter | DB metrics | Yes | Yes | quay.io/prometheuscommunity/postgres-exporter |
| Redis Exporter | Queue metrics | No | Yes | oliver006/redis_exporter |
| Traefik Metrics | Proxy health | Yes | Yes | --metrics.prometheus=true |
Essential Grafana Dashboards for n8n
Node Exporter and Host Metrics

Use the pre-built Node Exporter Full Dashboard (ID: 1860) for host monitoring.
Key indicators to track include CPU Busy and Sys Load. Take action if either stays above 75-80%. SWAP Used should stay at 0% for optimal performance, since sustained swapping usually means the host is short on RAM.
Monitor Root FS Used closely. Configure alerts to trigger when disk space exceeds 80% capacity. Running out of storage causes cascading failures.
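The host thresholds above can be expressed as a simple check. This is a sketch for reasoning about the limits, not a replacement for Grafana alert rules. The function names and warning labels are assumptions, and the 80% CPU limit is just the upper end of the 75-80% range mentioned above.

```python
import shutil


def root_fs_used_pct(path: str = "/") -> float:
    """Percentage of the filesystem at path that is currently used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def host_warnings(cpu_busy_pct: float, swap_used_pct: float,
                  fs_used_pct: float, cpu_limit: float = 80.0,
                  fs_limit: float = 80.0) -> list[str]:
    """Return the names of the host indicators that breach their limits."""
    warnings = []
    if cpu_busy_pct > cpu_limit:
        warnings.append("cpu")
    if swap_used_pct > 0:
        warnings.append("swap")  # any swap use is worth a look
    if fs_used_pct > fs_limit:
        warnings.append("disk")
    return warnings
```

For example, `host_warnings(45.0, 0.0, root_fs_used_pct())` checks the live disk figure against assumed CPU and swap readings.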
Container Metrics with cAdvisor
Import the cAdvisor Dashboard (ID: 14282) to track individual Docker containers.
It provides per-container breakdowns of CPU, memory, and network usage. The “Containers Info” table displays critical metadata like container uptime and image versions.
This visibility helps identify which containers consume the most resources and whether you need to scale servers.
Database and Queue Monitoring Dashboards
For database monitoring, use the Postgres Exporter Dashboard (ID: 9628). It tracks query performance and connection pool variables.
For queue monitoring, the Redis Dashboard (ID: 763) tracks message brokering efficiency across your scalable worker setup.
The Traefik 2.x Dashboard (ID: 12250) monitors average response times and status codes. Watch for spikes in 4xx/5xx errors.
Proactive Alerting to Prevent Data Loss
Foundational Health and Memory Alerts
Deploying a pre-configured alerts pack is critical. Set immediate alerts for “n8n Down” status when the Prometheus up metric for the n8n target drops below 1, meaning the last scrape failed.
Configure RSS memory alerts to trigger if usage remains high. For example, alert when memory stays above 850 MiB for 10 consecutive minutes. This prevents Out-Of-Memory crashes that cause data loss.
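The “above 850 MiB for 10 consecutive minutes” rule is a sustained-threshold check: a single spike should not fire, but an unbroken run of high readings should. Grafana and Prometheus handle this with a hold duration on the alert rule. The sketch below shows the same logic in plain Python, with the function name and the sample format as assumptions.

```python
MIB = 1024 * 1024


def sustained_above(samples, threshold, hold_seconds):
    """True if the value has stayed above threshold for hold_seconds.

    samples is a time-ordered list of (unix_timestamp, value) pairs. Any
    reading at or below the threshold resets the clock.
    """
    breach_start = None
    last_ts = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first reading of a new breach
        else:
            breach_start = None  # breach interrupted, start over
        last_ts = ts
    return breach_start is not None and last_ts - breach_start >= hold_seconds


# One reading per minute for ten minutes, all at 900 MiB.
readings = [(minute * 60, 900 * MIB) for minute in range(11)]
should_alert = sustained_above(readings, 850 * MIB, hold_seconds=600)
```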
Building workflows is easy. Keeping them running reliably requires this kind of proactive monitoring.
Managing Event Loop Lag and CPU Usage

Event Loop Lag is critical for Node.js applications like n8n. A p99 lag exceeding 500ms is considered critical and requires immediate attention.
Monitor Queue Throughput by tracking completed and failed executions per second. This helps identify stalled workflows before they become problems.
Set alerts for Queue Backlogs exceeding 100 waiting jobs for more than 5 minutes. These backlogs can create systemic delays affecting all your automations.
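If “p99 lag” is unfamiliar: it is the value that 99% of measurements fall at or below, so it highlights the worst 1% of stalls instead of the average. Your metrics pipeline normally computes this for you. The sketch below uses the nearest-rank method purely to show what the number means; the function names and sample data are assumptions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def lag_is_critical(lag_ms_samples, limit_ms=500.0):
    """Apply the 500 ms p99 rule from the text to raw lag samples."""
    return percentile(lag_ms_samples, 99) > limit_ms


# 98 fast ticks and two 900 ms stalls out of 100 samples.
samples = [10.0] * 98 + [900.0, 900.0]
critical = lag_is_critical(samples)
```

With only one slow tick in a hundred, the p99 stays low and the check does not fire, which is the point of using a percentile over a maximum.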
Queue Mode Specifics and Advanced Scaling
Worker Node Health Checks
In queue mode, health endpoints for worker nodes are disabled by default. You must enable them using QUEUE_HEALTH_CHECK_ACTIVE=true.
Once enabled, worker servers expose /healthz confirming the worker node is up and /healthz/readiness confirming database and Redis readiness.
The default worker concurrency is 10, though a minimum of 5 is recommended for stable development and scaling.
Multi-Main Setup for High Availability
For enterprise resilience, enable a multi-main architecture using N8N_MULTI_MAIN_SETUP_ENABLED=true.
This robust setup utilizes leader election managed via Redis or PostgreSQL. Only one main node handles schedules at a time, preventing duplicate workflow execution.
The leader election mechanism uses a 10-second Time-To-Live with a 3-second check interval. This ensures rapid failover without creating a bottleneck in your system.
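To see how a 10-second TTL and a 3-second check interval interact, here is a toy lease model. It is an in-memory illustration of TTL-based leader election in general, not n8n's actual implementation; the class, method, and node names are all hypothetical.

```python
class LeaderLease:
    """In-memory stand-in for a shared key that expires after a TTL."""

    def __init__(self, ttl_seconds: float = 10.0):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node: str, now: float) -> bool:
        """Claim or renew the lease; True if node is the leader afterwards."""
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


lease = LeaderLease(ttl_seconds=10)
lease.try_acquire("main-1", now=0)  # main-1 becomes leader
lease.try_acquire("main-1", now=3)  # renews, then crashes
# main-2 keeps polling every 3 seconds until the lease lapses.
takeover = [lease.try_acquire("main-2", now=t) for t in (6, 9, 12, 15)]
```

In this simplified model, the follower takes over at the first check after the lease expires, so failover takes at most roughly the TTL plus one check interval.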
VPS Sizing and Deployment Best Practices
Resource Requirements for Single vs. Queue Mode
Single Mode requires a minimum of 1 vCPU, 2GB RAM, and 20GB storage for basic operations. This handles moderate execution volume without issues.
Queue Mode demands higher resources. Plan for 2-4 vCPUs, 4-8GB RAM, and 40GB+ storage to handle worker nodes and Redis overhead.

Enable execution pruning with a maximum age of 336 hours (14 days) and no cap on the execution count. Pruning by age keeps storage from filling up as your instance processes more workflows.
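The 336-hour cutoff is easy to sanity-check. The sketch below shows the age rule only, with a hypothetical helper name. In n8n itself, pruning is driven by configuration (the documented variables are EXECUTIONS_DATA_PRUNE and EXECUTIONS_DATA_MAX_AGE, worth confirming against your version), not by code like this.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_HOURS = 336  # 14 days, as in the text


def is_prunable(finished_at: datetime, now: datetime,
                max_age_hours: int = MAX_AGE_HOURS) -> bool:
    """True if an execution finished longer ago than the maximum age."""
    return now - finished_at > timedelta(hours=max_age_hours)


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
old_run = datetime(2025, 1, 10, tzinfo=timezone.utc)     # 21 days old
recent_run = datetime(2025, 1, 25, tzinfo=timezone.utc)  # 6 days old
```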
Looking for affordable n8n hosting options? The monitoring stack adds resource overhead, so factor that into your calculations.
One-Script Automation Tools
Streamline your deployment using automated toolkits like the n8n-toolkit repository.
Tools like n8n_manager.sh handle implementation, upgrades, and backups automatically. This approach gets you closer to zero maintenance operations.
Configure Docker healthchecks to run wget --spider /healthz or pg_isready commands. Use a 10-second interval, 5-second timeout, 20-second start period, and 5 retries.
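Those healthcheck numbers determine how long a broken container can go unnoticed. As a rough estimate, assuming the start period has already passed and that Docker waits one interval after each check completes, a container is marked unhealthy after the configured number of consecutive failures. The helper below is a back-of-the-envelope calculation with a hypothetical name, not something Docker provides.

```python
def unhealthy_window(interval_s: float, timeout_s: float, retries: int):
    """Approximate (fastest, slowest) seconds until a container is unhealthy.

    Fastest: every check fails immediately. Slowest: every check runs into
    the timeout before failing.
    """
    fastest = retries * interval_s
    slowest = retries * (interval_s + timeout_s)
    return fastest, slowest


# Values from the text: 10 s interval, 5 s timeout, 5 retries.
fastest, slowest = unhealthy_window(interval_s=10, timeout_s=5, retries=5)
```

With the values above, detection takes on the order of a minute, so pair the healthcheck with external uptime checks if you need faster notification.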
Ensure graceful shutdowns with a default 30-second timeout. This prevents interrupted workflows and credential corruption during upgrades.
Choosing the Right VPS for Your Monitoring Stack
Running a complete monitoring stack alongside n8n requires adequate server resources. The combination of Prometheus, Grafana, and multiple exporters adds meaningful overhead to your cloud infrastructure.
When selecting a VPS provider, consider both current needs and future scalability. A server handling moderate traffic today might need to scale as your workflow count grows. Choose providers offering easy upgrade paths and robust network performance.
The monitoring stack itself serves websites, APIs, and dashboards. If you’re building a business around n8n automations, consider deploying a web store or user documentation site on the same infrastructure. Consolidated hosting simplifies management and reduces costs.
Conclusion
Building a server monitoring stack for n8n infrastructure transforms reactive troubleshooting into proactive system management. The combination of health endpoints, Prometheus, Grafana, and specialized exporters gives you complete visibility into your deployment.
Start with basic health checks, add Prometheus and Grafana, then expand to specialized exporters as needed. Each layer builds on the previous, creating comprehensive observability. For production stability guidance, see our checklist.
While self-hosting requires more effort than fully managed alternatives, the control and cost savings make it worthwhile for many teams.
Next Steps: What Now?
- Deploy the /healthz/readiness endpoint check with your load balancer or reverse proxy.
- Set N8N_METRICS=true and connect Prometheus to your instance.
- Import the Node Exporter and cAdvisor dashboards into Grafana.
- Configure memory and event loop lag alerts using the example thresholds.
- Enable queue metrics if running queue mode with worker nodes.
- Test your alerting by simulating a database disconnect.
- Document your monitoring setup for your team.



