Uptime Monitoring for Heroku Apps: A Practical Guide

You've built a fantastic application, and it's running smoothly on Heroku. Heroku handles a significant portion of the infrastructure heavy lifting, providing a robust platform that keeps your dynos running and requests routing. But while Heroku excels at platform stability, it doesn't inherently guarantee your application is healthy and serving users correctly. That's where dedicated uptime monitoring comes in.

This guide will walk you through the essentials of monitoring your Heroku applications, from basic HTTP checks to sophisticated health endpoints, and how a tool like Tickr can ensure you're the first to know when things go awry. We'll cover common pitfalls and best practices, all from an engineering perspective.

Why Heroku Apps Need Dedicated Uptime Monitoring

Heroku's platform is designed for resilience. If a dyno crashes, Heroku will often restart it automatically. The routing mesh is robust, and the underlying infrastructure is generally stable. However, your application can still experience issues that Heroku's platform-level monitoring won't catch directly:

  • Application Code Errors: A bug in your code can lead to 500-level errors, even if the dyno itself is technically "running."
  • Database Connection Issues: Your Heroku Postgres database might hit connection limits, or a transient network issue could prevent your app from connecting.
  • Third-Party API Dependencies: If your application relies on external APIs (e.g., payment gateways, mapping services), their downtime can effectively make your app unusable, even if your code is perfect.
  • Resource Exhaustion: While Heroku provides resource isolation, your dyno can still hit memory or CPU limits, leading to slow responses or application crashes (e.g., an R14 error).
  • Long-Running Requests: If a request takes longer than 30 seconds, Heroku's router times it out (H12 error), preventing users from accessing your service.

In all these scenarios, your Heroku dynos might appear "up" to Heroku's internal systems, but your users are experiencing downtime or degraded service. This is why you need an external, application-aware uptime monitoring solution.

Setting Up Basic HTTP(S) Monitoring

The simplest form of uptime monitoring for a Heroku app is to periodically make an HTTP(S) GET request to your application's main URL and check for a successful response. This catches the most critical failures: your app being completely unreachable, or returning a non-2xx status code for all requests.

With Tickr, you'd set up a monitor targeting your app's public URL, for example:

  • URL: https://your-heroku-app.herokuapp.com/
  • Method: GET
  • Expected Status Code: 200
  • Probe Frequency: 1 minute (or as often as needed)

Tickr will send HTTPS probes from various global locations, ensuring that network issues specific to one region don't give you a false sense of security. If the probe fails to connect, receives a non-200 status code, or times out, you'll be alerted.
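
To make these failure modes concrete, here's a minimal sketch of what such a probe does, written with Ruby's standard-library Net::HTTP. The URL and timeout values are placeholders, not Tickr's actual implementation:

require 'net/http'

# Succeed only if the app answers 200 within tight timeouts; treat
# connection errors, DNS failures, and timeouts all as "down".
def probe(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = 5   # seconds to establish the TCP/TLS connection
  http.read_timeout = 10  # seconds to wait for the response body
  response = http.request(Net::HTTP::Get.new(uri))
  response.code == '200'
rescue StandardError
  false
end

puts probe(URI('https://your-heroku-app.herokuapp.com/')) ? 'UP' : 'DOWN'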

Pitfall: While a 200 OK status code is a good start, it's not a definitive sign of health. Your application might be serving a generic static error page, or an empty page, with a 200 status code, hiding an internal failure behind a healthy-looking response. This leads us to more robust monitoring.

Beyond Basic: Health Checks and Body Substring Matching

To gain true confidence in your application's health, you need a dedicated health check endpoint. This endpoint should go beyond simply serving a page; it should perform actual internal checks of your application's critical dependencies.

A good health check endpoint typically:

  1. Verifies database connectivity: Can the app connect to and query its database?
  2. Checks external services: Are critical backing services and third-party APIs (e.g., Redis, payment gateways, search services) reachable and responding as expected?
  3. Tests internal components: Are any background job queues or internal caches functioning?
  4. Returns a specific status code: 200 OK for healthy, 503 Service Unavailable or 500 Internal Server Error for unhealthy.
  5. Optionally returns a specific body string: This provides an extra layer of verification.

Here's an example of a simple health check endpoint in a Ruby on Rails application. Similar patterns apply in other frameworks such as Django, Flask, or Express.

# config/routes.rb
Rails.application.routes.draw do
  # ... other routes ...
  get '/health', to: 'application#health'
end

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  # ... other controller methods ...

  def health
    # 1. Check database connectivity
    ActiveRecord::Base.connection.execute('SELECT 1')

    # 2. Check a critical external service (e.g., Redis)
    # If you're using Redis, you might do:
    # REDIS.ping
    # Or for an external API:
    # Net::HTTP.get(URI('https://api.example.com/status'))

    # If all checks pass
    render plain: 'OK', status: :ok
  rescue StandardError => e
    # Log the error for debugging
    Rails.logger.error("Health check failed: #{e.message}")
    # Return a 503 status code with a descriptive message
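    # (In production, consider returning a generic message so error details aren't leaked publicly.)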
    render plain: "Unhealthy: #{e.message}", status: :service_unavailable
  end
end

With this endpoint in place, your Tickr monitor would be configured as follows:

  • URL: https://your-heroku-app.herokuapp.com/health
  • Method: GET
  • Expected Status Code: 200
  • Expected Body Substring: OK (case-sensitive or insensitive, depending on your setup)

By checking for both the 200 status code and the specific OK substring, you catch the case where a misconfigured health endpoint returns a 200 without actually passing its checks, a subtle failure that a status-only monitor would report as healthy.
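
If you ever want to sanity-check the endpoint yourself, the combined assertion is short. Here's a minimal Ruby sketch, assuming the /health endpoint above:

require 'net/http'

# Healthy only if the status is 200 AND the body contains the expected substring.
uri = URI('https://your-heroku-app.herokuapp.com/health')
response = Net::HTTP.get_response(uri)
healthy = response.is_a?(Net::HTTPOK) && response.body.include?('OK')
puts healthy ? 'healthy' : 'unhealthy'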

Pitfall: Ensure your health check endpoint is lightweight and fast. It should not perform complex, resource-intensive operations that could degrade your application's performance or cause the health check itself to time out. Also, make sure it's publicly accessible if Tickr needs to hit it – don't put it behind authentication unless your monitoring solution supports authenticated probes.
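
One way to keep the endpoint fast is to give each dependency check a hard time budget. Below is a sketch using Ruby's standard-library Timeout module; check_with_budget is a hypothetical helper, and the 2-second default is an arbitrary budget you'd tune to your app:

require 'timeout'

# Wrap a dependency check so a hung dependency can't stall the whole
# health endpoint; any error or timeout marks that check as failed.
def check_with_budget(name, seconds: 2)
  Timeout.timeout(seconds) { yield }
  true
rescue StandardError => e
  Rails.logger.error("Health check '#{name}' failed: #{e.class}: #{e.message}")
  false
end

# Hypothetical usage inside the health action:
# db_ok = check_with_budget('database') { ActiveRecord::Base.connection.execute('SELECT 1') }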

Common Pitfalls and Heroku-Specific Considerations

When monitoring Heroku apps, there are a few platform-specific nuances to be aware of:

  • Dyno Sleeping (Free Tier): If you're using Heroku's free dynos, they will sleep after 30 minutes of inactivity. The first request after a dyno has slept will experience a significant delay (cold start). While your monitor will eventually wake the dyno, the initial probe might time out or be very slow. This is generally not an issue for paid dynos (Hobby, Standard, Performance), which remain awake. If you're on a free tier, be aware that a one-minute probe frequency will keep your dyno awake around the clock, which also burns through your monthly allotment of free dyno hours.