What's the difference between /healthz and /healthz/readiness?

The /healthz endpoint confirms the n8n instance is reachable. The /healthz/readiness endpoint additionally verifies database connectivity and migration status. Use readiness for load balancers.

How do I enable metrics on my self-hosted n8n?

Set the environment variable N8N_METRICS=true in your configuration file. Access metrics at <your-instance-url>/metrics.

What resources does the monitoring stack require?

Expect 512MB-1GB additional RAM and minimal CPU overhead. Storage depends on retention settings, typically 5-10GB for 15 days.

Can I use this monitoring setup with Docker Compose?

Yes. Add COMPOSE_PROFILES=monitoring to activate monitoring services alongside your core n8n deployment.

How do I monitor worker nodes in queue mode?

Enable health checks with QUEUE_HEALTH_CHECK_ACTIVE=true. Workers then expose the same health endpoints as the main instance.

What alerts should I configure first?

Start with instance down alerts, memory exceeding 850 MiB, and event loop lag above 500ms. These catch the most critical issues.

Server Monitoring Stack for n8n Infrastructure (2026)

Your n8n instance handles critical business automations. Customer orders, lead notifications, data syncs. What happens when something breaks at 2 AM?

Without proper monitoring, you’re flying blind. A well-designed stack watches your system around the clock, sending alerts before small issues become major outages. This guide shows you exactly how to build that safety net.

Monitoring your n8n infrastructure is essential for maintaining uptime, performance, and workflow reliability. The comparison table below highlights VPS hosting providers that support stable environments for logging, monitoring, and alerting systems. These providers make it easier to track resource usage and detect issues before they affect production workflows. Explore our recommended VPS hosting options.

VPS Hosting Providers Built for Reliable n8n Monitoring and Infrastructure Visibility

Provider	User Rating	Recommended For
Kamatera	4.8	Scalability
Hostinger	4.6	Affordability
IONOS	4.7	Developers

Takeaways

The /healthz/readiness endpoint confirms database connectivity.
Enable metrics by setting N8N_METRICS=true in your environment variables.
Prometheus combined with Grafana provides complete visibility into your n8n performance.
Queue mode requires additional monitoring components like Redis Exporter.
Set alerts for event loop lag exceeding 500ms and memory usage staying above 850 MiB.
Configure execution data pruning to 14 days to manage storage effectively.

Why You Need a Server Monitoring Stack for n8n Infrastructure

Self-hosting n8n offers incredible data sovereignty. You control your encrypted credentials, your workflow execution data, and your entire platform. But here’s the catch: managing your own infrastructure requires constant oversight.

Without monitoring, you’re essentially hoping nothing breaks. That’s not a strategy. That’s wishful thinking.

A robust monitoring stack delivers three critical benefits. First, it ensures high uptime by catching issues early. Second, it optimizes resource usage across CPU, memory, and storage. Third, it provides immediate alerts before bottlenecks impact your operations.

Yes, setting up a custom monitoring environment adds initial complexity. But consider the alternative. One missed database connection failure could mean hours of lost data. One memory spike could crash your production workloads at the worst possible moment.

The long-term benefits of visibility far outweigh the setup time. When your automation volume grows, you’ll be grateful for the foundation you built.

Integrating external tools like UptimeRobot or Healthchecks.io with your internal stack adds an outside view that keeps working even when the server running Prometheus and Grafana is itself down. This combination of internal and external monitoring creates a safety net that catches problems from multiple angles. For teams looking at best n8n hosting providers, understanding monitoring requirements helps you choose the right resources.

Built-In Health Checks for Your n8n Instance

The /healthz Endpoint

This endpoint is your first line of defense. It verifies whether your n8n instance is reachable and running.

Access it by navigating to <your-instance-url>/healthz. When things work correctly, it returns an HTTP 200 status code. Simple and effective as a basic heartbeat check.

However, there’s a limitation worth noting. This endpoint doesn’t verify database connection status. Your app could be reachable but completely unable to process any data. Think of it as checking whether the lights are on without confirming anyone’s home.

The /healthz/readiness Endpoint

This endpoint provides deeper assessment. It confirms your instance is fully ready to accept traffic.

Here’s the key difference: it returns HTTP 200 only if the PostgreSQL database is successfully connected and fully migrated. No database connection? No green light.

This makes it the recommended endpoint for load balancers and reverse proxy health checks. You want traffic routed only to nodes that can actually handle work. Access it via <your-instance-url>/healthz/readiness.
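
To make the difference between the two endpoints concrete, here is a minimal sketch of an external probe. It assumes only the two paths described above and the HTTP 200 convention; the function names and the three state labels are illustrative, not part of n8n.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, and similar


def classify(healthz_status: int, readiness_status: int) -> str:
    """Map the two status codes onto a coarse instance state (labels are ours)."""
    if healthz_status != 200:
        return "down"
    if readiness_status != 200:
        return "up-but-not-ready"  # process is alive, database is not usable
    return "ready"


def check_instance(base_url: str) -> str:
    """Probe both endpoints of one instance and classify the result."""
    base = base_url.rstrip("/")
    return classify(
        fetch_status(f"{base}/healthz"),
        fetch_status(f"{base}/healthz/readiness"),
    )
```

Calling check_instance with your instance URL returns one of the three labels. The middle state, "up-but-not-ready", is exactly the case a plain /healthz check would miss.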

How to Enable Metrics via Environment Variables

Detailed status information lives at the /metrics endpoint. But it’s disabled by default for security and performance reasons.

To enable metrics, set N8N_METRICS=true in your configuration. Once active, access detailed metrics at <your-instance-url>/metrics.

Note: this feature works only for self-hosted setups. It’s not available on n8n Cloud.

You can further customize health check endpoints using the N8N_ENDPOINT_HEALTH environment variable. This gives you flexibility in how you structure your monitoring architecture.
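
Once /metrics is enabled, it returns plain text in the Prometheus exposition format. The sketch below shows roughly how that text maps to values you can inspect by hand. The metric names and numbers in SAMPLE are assumptions for illustration, so check your own /metrics output for the real series names.

```python
# Illustrative payload; names and values are assumed, not captured from n8n.
SAMPLE = """\
# HELP n8n_process_resident_memory_bytes Resident memory size in bytes.
# TYPE n8n_process_resident_memory_bytes gauge
n8n_process_resident_memory_bytes 412316860
n8n_nodejs_eventloop_lag_p99_seconds 0.0123
n8n_nodejs_gc_duration_seconds_count{kind="minor"} 42
"""


def parse_metrics(text: str) -> dict[str, float]:
    """Parse exposition-format lines into {series: value}, ignoring timestamps."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled series: everything up to the closing brace is the name.
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])
    return samples


metrics = parse_metrics(SAMPLE)
rss_mib = metrics["n8n_process_resident_memory_bytes"] / (1024 * 1024)
```

In practice Prometheus does this parsing for you. A small script like this is mainly useful for a quick sanity check that the endpoint is exposing what you expect.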

Building a Production-Grade Stack to Deploy n8n

Prometheus for Metrics Collection

Prometheus is a time-series database that scrapes and stores metrics from your n8n instance and server host. It forms the core of your monitoring stack.

For storage management, set a 15-day data retention policy using --storage.tsdb.retention.time=15d. This balances historical analysis with disk usage.
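
To see what a 15-day retention means for disk, you can apply the rule of thumb from the Prometheus storage documentation: retention time in seconds, times ingested samples per second, times bytes per sample (roughly 1 to 2 bytes). The ingestion rate below is an assumed figure for a small stack, not a measurement.

```python
SECONDS_PER_DAY = 86_400


def estimate_tsdb_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough on-disk size for Prometheus data at a given retention."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Assumption: about 30,000 active series scraped every 15 seconds.
assumed_rate = 30_000 / 15  # 2,000 samples per second
size_bytes = estimate_tsdb_bytes(15, assumed_rate)
size_gb = size_bytes / 1e9  # about 5.2 GB
```

Under these assumptions the estimate lands at the low end of the 5-10GB range quoted for 15 days. Doubling the series count or halving the scrape interval doubles the figure.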

You can optionally expose Prometheus to external networks by setting EXPOSE_PROMETHEUS=true. The official n8n documentation provides detailed setup guides for enabling Prometheus metrics natively.

Grafana for Visualization and Alerting

Grafana pairs perfectly with Prometheus to visualize data and manage critical system alerts. It transforms raw metrics into actionable dashboards.

The platform supports automated provisioning of datasources, dashboards, and alert rules directly upon deployment. You can configure files like n8n_grafana_alerts.json to bootstrap your setup.

Security settings include setting GF_SECURITY_ADMIN_PASSWORD for the Grafana admin login. For production use, enable HTTP Strict Transport Security (HSTS) with a max-age between 31,536,000 and 315,360,000 seconds (roughly one to ten years).

Essential Exporters for Self Hosted Deployments

Full-stack observability requires specialized exporters alongside your core applications.

Node Exporter captures host-level hardware metrics like CPU, RAM, and disk usage. It’s essential for understanding your server’s overall health.

cAdvisor runs in privileged mode to collect granular container-level metrics. Each Docker container gets individual monitoring.

Postgres Exporter and Redis Exporter monitor your database backend and queue system health. These become critical as execution volume increases.

Implementing Monitoring with Docker Compose

Single Mode Architecture Setup

A standard single-mode deployment includes Traefik for proxy and TLS, PostgreSQL for your database, and the n8n main application.

The monitoring layer adds Prometheus, Grafana, Node Exporter, cAdvisor, and Postgres Exporter. These services run alongside your core infrastructure.

Activating them is straightforward using Docker Compose profiles. Set COMPOSE_PROFILES=monitoring and the entire stack spins up together. For deeper guidance, explore our content on scaling n8n deployments.

Queue Mode Architecture and Scaling

Queue mode separates the n8n main process from background workers. The main application handles UI, schedules, and webhooks while workers execute heavy workflows.

This architecture introduces Redis as the message broker, n8n worker nodes, and the Redis Exporter for monitoring. Understanding the differences matters for proper setup. Check our guide on queue versus regular mode for detailed comparisons.

To capture comprehensive queue data, set N8N_METRICS_INCLUDE_QUEUE_METRICS=true. This unlocks visibility into job processing performance.

Comparison Table: Monitoring Stack Components

Component	Purpose	Single Mode	Queue Mode	Enablement
Prometheus	Metrics storage/scraping	Yes	Yes	profiles: ["monitoring"]
Grafana	Dashboards/alerts	Yes (provisioned JSONs)	Yes (queue-specific dashboards)	GF_SECURITY_ADMIN_PASSWORD
Node Exporter	Host CPU/RAM/disk	Yes	Yes	prom/node-exporter:latest
cAdvisor	Container metrics	Yes	Yes	gcr.io/cadvisor/cadvisor:latest (privileged)
Postgres Exporter	DB metrics	Yes	Yes	quay.io/prometheuscommunity/postgres-exporter
Redis Exporter	Queue metrics	No	Yes	oliver006/redis_exporter
Traefik Metrics	Proxy health	Yes	Yes	--metrics.prometheus=true

Essential Grafana Dashboards for n8n

Node Exporter and Host Metrics

Use the pre-built Node Exporter Full Dashboard (ID: 1860) for host monitoring.

Key indicators to track include CPU Busy and Sys Load. Take action if either stays above 75-80%. SWAP Used should stay at 0% for optimal performance.

Monitor Root FS Used closely. Configure alerts to trigger when disk space exceeds 80% capacity. Running out of storage causes cascading failures.
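
The host thresholds above can be written down as a small check. This is a sketch only: the function name and message wording are illustrative, and the 80% CPU limit is one pick from the 75-80% range mentioned earlier.

```python
def evaluate_host(cpu_busy_pct: float,
                  swap_used_pct: float,
                  root_fs_used_pct: float,
                  cpu_limit: float = 80.0,
                  disk_limit: float = 80.0) -> list[str]:
    """Return one finding per host threshold that is currently exceeded."""
    findings = []
    if cpu_busy_pct > cpu_limit:
        findings.append(f"CPU busy at {cpu_busy_pct:.0f}% (limit {cpu_limit:.0f}%)")
    if swap_used_pct > 0:
        # Any swap use at all suggests the host is short on RAM.
        findings.append(f"swap in use: {swap_used_pct:.1f}%")
    if root_fs_used_pct > disk_limit:
        findings.append(f"root filesystem at {root_fs_used_pct:.0f}% (limit {disk_limit:.0f}%)")
    return findings
```

An empty list means the host is within all three limits. In the real stack these comparisons live in Grafana or Prometheus alert rules; the Python version just makes the logic explicit.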

Container Metrics with cAdvisor

Import the cAdvisor Dashboard (ID: 14282) to track individual Docker containers.

It provides per-container breakdowns of CPU, memory, and network usage. The “Containers Info” table displays critical metadata like container uptime and image versions.

This visibility helps identify which containers consume the most resources and whether you need to scale servers.

Database and Queue Monitoring Dashboards

For database monitoring, use the Postgres Exporter Dashboard (ID: 9628). It tracks query performance and connection pool variables.

For queue monitoring, the Redis Dashboard (ID: 763) tracks message brokering efficiency across your scalable worker setup.

The Traefik 2.x Dashboard (ID: 12250) monitors average response times and status codes. Watch for spikes in 4xx/5xx errors.

Proactive Alerting to Prevent Data Loss

Foundational Health and Memory Alerts

Deploying a pre-configured alerts pack is critical. Set an immediate “n8n Down” alert that fires when the Prometheus up metric for the n8n scrape target drops from 1 to 0.

Configure RSS memory alerts to trigger if usage remains high. For example, alert when memory stays above 850 MiB for 10 consecutive minutes. This prevents Out-Of-Memory crashes that cause data loss.
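
The phrase “for 10 consecutive minutes” matters: a single spike should not page anyone. Here is a minimal sketch of that sustained-threshold logic, assuming samples arrive as (timestamp in seconds, value) pairs sorted by time. The function is illustrative and only approximates what an alert rule with a hold duration does.

```python
MIB = 1024 * 1024


def sustained_above(samples: list[tuple[float, float]],
                    threshold: float,
                    hold_seconds: float) -> bool:
    """True if the value has stayed above threshold for at least hold_seconds,
    counting back from the most recent sample."""
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # start of the current breach
        else:
            breach_start = None  # any dip below the line resets the clock
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds


# One sample per minute for ten minutes, all at 900 MiB.
steady = [(minute * 60, 900 * MIB) for minute in range(11)]
fires = sustained_above(steady, 850 * MIB, 600)
```

If memory dips below 850 MiB even once inside the window, the clock restarts and the alert stays quiet until a full ten minutes have passed again.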

Building workflows is easy. Keeping them running reliably requires this kind of proactive monitoring.

Managing Event Loop Lag and CPU Usage

Event Loop Lag is critical for Node.js applications like n8n. A p99 lag exceeding 500ms is considered critical and requires immediate attention.

Monitor Queue Throughput by tracking completed and failed executions per second. This helps identify stalled workflows before they become problems.

Set alerts for Queue Backlogs exceeding 100 waiting jobs for more than 5 minutes. These backlogs can create systemic delays affecting all your automations.
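
Throughput “per second” is derived from counters that only ever go up, so it helps to see how a rate is computed from raw samples. The sketch below is a simplified take on that idea (no extrapolation at the window edges), with a basic guard for counter resets after a restart. The sample data is made up.

```python
def per_second_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a counter over the sampled window.

    samples: (timestamp_seconds, counter_value) pairs sorted by time.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter dropped, so the process restarted; count from zero.
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


completed = [(0, 100), (60, 160), (120, 220)]  # 120 executions in 120 s
failed = [(0, 4), (60, 7), (120, 10)]          # 6 failures in 120 s

completed_rate = per_second_rate(completed)    # 1.0 per second
failed_rate = per_second_rate(failed)          # 0.05 per second
```

A completed rate that falls to zero while the backlog keeps growing is the classic signature of stalled workers.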

Queue Mode Specifics and Advanced Scaling

Worker Node Health Checks

In queue mode, health endpoints for worker nodes are disabled by default. You must enable them using QUEUE_HEALTH_CHECK_ACTIVE=true.

Once enabled, each worker exposes /healthz, which confirms the worker is up, and /healthz/readiness, which confirms that its database and Redis connections are ready.

The default worker concurrency is 10. If you lower it, keep it at 5 or higher for stable operation as you scale.

Multi-Main Setup for High Availability

For enterprise resilience, enable a multi-main architecture using N8N_MULTI_MAIN_SETUP_ENABLED=true.

This robust setup utilizes leader election managed via Redis or PostgreSQL. Only one main node handles schedules at a time, preventing duplicate workflow execution.

The leader election mechanism uses a 10-second Time-To-Live with a 3-second check interval. This ensures rapid failover without creating a bottleneck in your system.
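
To get a feel for what a 10-second TTL and a 3-second check interval mean for failover, here is a toy simulation. It is a simplified model of TTL-based leader election in general, not n8n's actual implementation: the in-memory key store, the instance names, and the tick-aligned timing are all assumptions.

```python
class TtlKeyStore:
    """In-memory stand-in for a store that holds one leader key with expiry."""

    def __init__(self) -> None:
        self._owner = None
        self._expires_at = 0.0

    def get(self, now: float):
        return self._owner if now < self._expires_at else None

    def set_if_absent(self, owner: str, ttl: float, now: float) -> bool:
        if self.get(now) is None:
            self._owner, self._expires_at = owner, now + ttl
            return True
        return False

    def renew(self, owner: str, ttl: float, now: float) -> bool:
        if self.get(now) == owner:
            self._expires_at = now + ttl
            return True
        return False


def simulate_failover(crash_at: float, ttl: float = 10.0,
                      check_interval: float = 3.0, horizon: float = 60.0):
    """Seconds between the leader crashing and the follower taking over."""
    store = TtlKeyStore()
    store.set_if_absent("main-1", ttl, now=0.0)
    now = 0.0
    while now <= horizon:
        if now < crash_at:
            store.renew("main-1", ttl, now)  # healthy leader keeps the key alive
        if store.set_if_absent("main-2", ttl, now):  # follower's periodic check
            return now - crash_at
        now += check_interval
    return None


delay = simulate_failover(crash_at=10.0)  # 11.0 seconds in this model
```

In this model the takeover delay never exceeds the TTL plus one check interval (13 seconds). Shorter values fail over faster but add more traffic to the store and raise the risk of a false takeover during brief stalls.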

VPS Sizing and Deployment Best Practices

Resource Requirements for Single vs. Queue Mode

Single Mode requires a minimum of 1 vCPU, 2GB RAM, and 20GB storage for basic operations. This handles moderate execution volume without issues.

Queue Mode demands higher resources. Plan for 2-4 vCPUs, 4-8GB RAM, and 40GB+ storage to handle worker nodes and Redis overhead.

Configure execution data pruning by age rather than by count: leave the execution count unlimited and set the maximum age to 336 hours (14 days). This prevents storage from filling up as your instance processes more workflows.
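
As a quick check on the arithmetic, 336 hours is exactly 14 days. The sketch below shows which executions would fall outside that window at a given moment. The helper names are illustrative, and n8n's own pruning runs on its own schedule.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_HOURS = 336  # 14 days


def prune_cutoff(now: datetime, max_age_hours: int = MAX_AGE_HOURS) -> datetime:
    """Executions that finished before this moment are older than the max age."""
    return now - timedelta(hours=max_age_hours)


def is_prunable(finished_at: datetime, now: datetime) -> bool:
    return finished_at < prune_cutoff(now)


now = datetime(2026, 1, 15, tzinfo=timezone.utc)
cutoff = prune_cutoff(now)  # 2026-01-01 00:00 UTC
```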

Looking for affordable n8n hosting options? The monitoring stack adds resource overhead, so factor that into your calculations.

One-Script Automation Tools

Streamline your deployment using automated toolkits like the n8n-toolkit repository.

Tools like n8n_manager.sh handle implementation, upgrades, and backups automatically. This approach gets you closer to zero maintenance operations.

Configure Docker healthchecks to run wget --spider against the /healthz endpoint for n8n, or pg_isready for PostgreSQL. Use a 10-second interval, 5-second timeout, 20-second start period, and 5 retries.

Allow for graceful shutdowns with the default 30-second timeout. This keeps in-flight workflows from being interrupted and guards against credential corruption during upgrades.

Choosing the Right VPS for Your Monitoring Stack

Running a complete monitoring stack alongside n8n requires adequate server resources. The combination of Prometheus, Grafana, and multiple exporters adds meaningful overhead to your cloud infrastructure.

When selecting a VPS provider, consider both current needs and future scalability. A server handling moderate traffic today might need to scale as your workflow count grows. Choose providers offering easy upgrade paths and robust network performance.

The same VPS that runs your monitoring stack can also serve websites, APIs, and dashboards. If you’re building a business around n8n automations, consider deploying a web store or user documentation site on the same infrastructure. Consolidated hosting simplifies management and reduces costs.

Conclusion

Building a server monitoring stack for n8n infrastructure transforms reactive troubleshooting into proactive system management. The combination of health endpoints, Prometheus, Grafana, and specialized exporters gives you complete visibility into your deployment.

Start with basic health checks, add Prometheus and Grafana, then expand to specialized exporters as needed. Each layer builds on the previous, creating comprehensive observability. For production stability guidance, see our checklist.

While self-hosting requires more effort than fully managed alternatives, the control and cost savings make it worthwhile for many teams.

Next Steps: What Now?

Deploy the /healthz/readiness endpoint check with your load balancer or reverse proxy.
Set N8N_METRICS=true and connect Prometheus to your instance.
Import the Node Exporter and cAdvisor dashboards into Grafana.
Configure memory and event loop lag alerts using the example thresholds.
Enable queue metrics if running queue mode with worker nodes.
Test your alerting by simulating a database disconnect.
Document your monitoring setup for your team.