Alerts

Define alert rules using GQL queries with threshold conditions. GoodLogs evaluates them every 60 seconds and notifies you via webhook, Slack, or Discord when metrics cross thresholds.

Default rules on new projects

Every newly created project is seeded with three baseline alerts so something useful is firing on day one — no configuration needed:

Name                  Condition                      Severity  Cooldown
High error rate       error_count > 50 over 30 min   warning   60 min
Critical error spike  error_count > 200 over 30 min  critical  30 min
Log volume drop       log_volume > 0 over 15 min     warning   60 min

You can disable, tune, or delete them from Dashboard → Alerts.

Creating Alerts

Go to Alerts in the sidebar and click + New Alert. Write a GQL query with an alert condition, or use the AI button to describe what you want in plain English.

GQL Alert Syntax

Alert queries use the over:WINDOW OP THRESHOLD syntax at the end of the pipeline. This defines a rolling time window and a condition that triggers the alert.

gql
# Basic: alert if more than 50 errors in 30 minutes
severity:error | count | over:30m > 50

# Any fatal errors in 5 minutes
severity:fatal | count | over:5m > 0

# Pattern match: payment failures
message:~"payment failed" | count | over:1h > 10

# Service-specific errors
severity:error service:billing | count | over:15m > 20

# Dead service (no events)
from:events | count | over:10m < 1

# 5xx errors
status_code:>=500 | count | over:5m > 50

# Latency degradation
| avg(duration_ms) | over:10m > 2000

# Signup drops below threshold
from:events event_name:signup | count | over:1h < 5

# Database timeout detection
message:~"database timeout" | count | over:5m > 0

Operator  Meaning                    Example
>         Greater than               over:30m > 100
>=        Greater or equal           over:5m >= 1
<         Less than (detect drops)   over:10m < 1
<=        Less or equal              over:1h <= 5
=         Equal to                   over:30m = 0
!=        Not equal to               over:1h != 0
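
To make the shape of the over:WINDOW OP THRESHOLD clause concrete, here is a minimal Python sketch that parses the trailing clause and checks a value against it. It is not GoodLogs' own parser: the regular expression, the function names, and the restriction to the m and h units (the only ones that appear in the examples above) are assumptions made for illustration.

```python
import operator
import re

# Comparison operators from the table above, mapped to Python functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}

# Assumption: only the m (minutes) and h (hours) units seen in the examples.
UNIT_MINUTES = {"m": 1, "h": 60}

# Assumed grammar for the clause at the end of the pipeline.
OVER_CLAUSE = re.compile(
    r"over:(\d+)([mh])\s*(>=|<=|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$"
)


def parse_over_clause(query: str):
    """Return (window_minutes, operator, threshold) for the trailing over: clause."""
    match = OVER_CLAUSE.search(query)
    if match is None:
        raise ValueError(f"no over: clause at the end of {query!r}")
    amount, unit, op, threshold = match.groups()
    return int(amount) * UNIT_MINUTES[unit], op, float(threshold)


def condition_met(query: str, value: float) -> bool:
    """Check whether an already aggregated value satisfies the query's condition."""
    _, op, threshold = parse_over_clause(query)
    return OPERATORS[op](value, threshold)
```

For example, condition_met('from:events | count | over:10m < 1', 0) is true, which corresponds to the dead-service case where no events arrived in the window.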

Supported Aggregations

Any GQL aggregate function can be used in the pipeline stage before the over: clause:

gql
| count | over:30m > 50                 # count of matching rows
| avg(duration_ms) | over:10m > 2000    # average of a field
| sum(amount) | over:1h > 10000         # sum of a field
| max(response_time) | over:5m > 5000   # max value
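
As an illustration of what an aggregate over a rolling window means, the Python sketch below computes count, avg, sum, and max over timestamped samples. It is a conceptual model only, not how GoodLogs evaluates queries; the function name, the (timestamp, value) sample shape, and returning None for an empty window are assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean

# The four aggregates shown above, mapped to Python equivalents.
AGGREGATES = {"count": len, "avg": mean, "sum": sum, "max": max}


def aggregate_window(samples, now: datetime, window_minutes: int, func: str):
    """Aggregate the values whose timestamps fall inside the rolling window.

    samples is a list of (timestamp, value) pairs (a hypothetical shape).
    """
    cutoff = now - timedelta(minutes=window_minutes)
    values = [value for ts, value in samples if cutoff < ts <= now]
    if func == "count":
        return len(values)
    if not values:
        # Assumption: avg, sum, and max of an empty window yield no value.
        return None
    return AGGREGATES[func](values)
```

The result of aggregate_window is the number that the threshold condition after over: would then be compared against.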

AI-Powered Alert Creation

Click the AI button in the alert GQL bar to describe your alert condition in plain English. The AI generates a valid GQL alert query using your project's actual schema.

Example Prompts

text
"alert me if there are more than 100 errors in 30 minutes"
→ severity:error | count | over:30m > 100

"notify when fatal errors happen"
→ severity:fatal | count | over:5m > 0

"alert if payment failures exceed 10 per hour"
→ message:~"payment failed" | count | over:1h > 10

"warn if average response time goes above 2 seconds"
→ | avg(duration_ms) | over:10m > 2000

"alert when no events for 10 minutes"
→ from:events | count | over:10m < 1

"alert if 5xx errors exceed 50 in 5 minutes"
→ status_code:>=500 | count | over:5m > 50

The AI knows your schema — it uses your actual field names, event names, and property types when generating queries.

Tip: While typing in AI mode, autocomplete suggests your project's field names to help you reference them accurately in your description.

Quick-Start Examples

Click any example in the alert creation form to pre-fill the GQL bar:

Template                                             What It Does
severity:error | count | over:30m > 50               Error spike detection
severity:fatal | count | over:5m > 0                 Any fatal error
message:~timeout | count | over:15m > 20             Timeout pattern
message:=~"payment.*failed" | count | over:1h > 10   Regex pattern match
from:events | count | over:10m < 1                   Dead service detection

Severity

Each alert has a severity that affects notification styling and urgency:

Severity  Color      Use Case
info      🔵 Blue    Informational: non-urgent notifications
warning   🟡 Yellow  Warning: needs attention soon (default)
critical  🔴 Red     Critical: immediate action required

Notification Channels

Configure one or more notification channels per alert. Notifications fire on both trigger and resolve events.

Webhook

POST a JSON payload to any HTTP endpoint.

json
{
  "alert": "Error Spike",
  "metric": "error_count",
  "value": 127,
  "threshold": 50,
  "status": "triggered",
  "severity": "critical",
  "project_id": "uuid",
  "message": "Error Spike: error_count is 127 (threshold: 50)",
  "timestamp": "2026-05-26T14:32:00Z"
}
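
On the receiving side, the payload can be parsed into a typed structure before acting on it. The following Python sketch assumes only the fields shown in the sample payload above; the AlertNotification class and the should_page helper are hypothetical names, and the paging rule is just one possible policy.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AlertNotification:
    """Mirror of the fields in the sample webhook payload."""
    alert: str
    metric: str
    value: float
    threshold: float
    status: str
    severity: str
    project_id: str
    message: str
    timestamp: str


def parse_notification(raw_body: bytes) -> AlertNotification:
    """Parse the JSON body, ignoring any keys not listed in the dataclass."""
    data = json.loads(raw_body)
    known = {f.name: data[f.name] for f in fields(AlertNotification)}
    return AlertNotification(**known)


def should_page(note: AlertNotification) -> bool:
    """Example policy (an assumption): page only on new critical triggers."""
    return note.status == "triggered" and note.severity == "critical"
```

A missing field raises KeyError, which makes unexpected payload shapes visible early rather than silently defaulting.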

Slack

Sends a formatted message with a severity emoji and metric details. Provide a Slack incoming webhook URL.

text
🔴 *Alert TRIGGERED*: Error Spike

Error Spike: error_count is 127 (threshold: 50)

*Metric:* `error_count` | *Value:* `127` | *Threshold:* `50`

Discord

Sends a color-coded rich embed (red for critical, yellow for warning, green for resolved).

Webhook Signing

Each notification channel supports an optional signing secret. When configured, GoodLogs computes an HMAC-SHA256 signature of the JSON payload and includes it in the x-goodlogs-signature header.

text
x-goodlogs-signature: sha256=a1b2c3d4e5f6...

Verifying Signatures

On your server, compute the HMAC-SHA256 of the raw request body using your secret and compare:

javascript
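const crypto = require('crypto');
const express = require('express'); // assumes an Express app, as in the handler below

function verifySignature(rawBody, secret, signature) {
  const expected = Buffer.from(
    'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex')
  );
  const received = Buffer.from(signature || '');
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

// In your webhook handler: express.raw() keeps req.body as the raw bytes that
// were signed. Re-serializing parsed JSON may not match them byte for byte.
app.post('/webhook', express.raw({ type: 'application/json' }), (req, res) => {
  const sig = req.headers['x-goodlogs-signature'];
  if (!Buffer.isBuffer(req.body) || !verifySignature(req.body, WEBHOOK_SECRET, sig)) {
    return res.status(401).send('Invalid signature');
  }
  const alert = JSON.parse(req.body.toString('utf8'));
  // Process alert...
  res.sendStatus(200);
});

python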
import hmac, hashlib

def verify_signature(body: bytes, secret: str, signature: str) -> bool:
    expected = 'sha256=' + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

# In your webhook handler (Flask-style; request.data is the raw request body):
sig = request.headers.get('x-goodlogs-signature', '')
if not verify_signature(request.data, WEBHOOK_SECRET, sig):
    abort(401)

Warning: Always use constant-time comparison (timingSafeEqual / compare_digest) to prevent timing attacks.
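
To exercise a verifier locally, it can help to produce a signature the same way the text describes: HMAC-SHA256 over the raw JSON bytes, hex-encoded and prefixed with sha256=. The Python sketch below does that; the sign_payload name and the test secret are hypothetical, and the exact bytes GoodLogs signs are whatever it sends, so this is only for generating test requests.

```python
import hashlib
import hmac


def sign_payload(body: bytes, secret: str) -> str:
    """Build an x-goodlogs-signature style value for the given raw body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_signature(body: bytes, secret: str, signature: str) -> bool:
    """Constant-time check of a received signature against the raw body."""
    return hmac.compare_digest(sign_payload(body, secret), signature)
```

Signing a body and then changing a single byte before verification should fail, which is a quick way to confirm that a handler really checks the raw bytes.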

Cooldown

Set a cooldown period (in minutes) to prevent the same alert from re-triggering too quickly. This avoids notification storms from flapping metrics.

text
Cooldown: 15 minutes
→ After triggering, the alert won't fire again for 15 minutes
   even if the metric stays above the threshold.
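
The cooldown rule can be sketched as a simple time comparison. This Python example is a conceptual model, not GoodLogs' implementation; the can_fire name and the choice to allow firing again exactly when the cooldown has elapsed are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def can_fire(last_fired: Optional[datetime], now: datetime, cooldown_minutes: int) -> bool:
    """Return True if the alert may send a trigger notification at `now`."""
    if last_fired is None:
        # The alert has never fired, so no cooldown applies.
        return True
    # Assumption: the boundary is inclusive once the full cooldown has passed.
    return now - last_fired >= timedelta(minutes=cooldown_minutes)
```

With a 15-minute cooldown, an alert that fired 10 minutes ago stays quiet even if the metric is still above the threshold.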

Muting

Mute an alert to suppress notifications during maintenance windows. The alert still evaluates (status updates) but doesn't send notifications.

bash
POST /api/orgs/:org/projects/:project/alerts/:id/mute
{ "minutes": 60 }
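
As a sketch of calling the mute endpoint from a script, the Python example below builds the POST request with the standard library. The base URL, the build_mute_request name, and the extra_headers parameter are assumptions: this section does not describe the API host or authentication, so supply whatever your setup requires.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_mute_request(
    base_url: str,
    org: str,
    project: str,
    alert_id: str,
    minutes: int,
    extra_headers: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request that mutes an alert for `minutes`."""
    path = "/api/orgs/{}/projects/{}/alerts/{}/mute".format(
        quote(org, safe=""), quote(project, safe=""), quote(alert_id, safe="")
    )
    headers = {"Content-Type": "application/json"}
    # Assumption: authentication headers, if any, are provided by the caller.
    headers.update(extra_headers or {})
    body = json.dumps({"minutes": minutes}).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=body, headers=headers, method="POST"
    )
```

The returned object can be passed to urllib.request.urlopen to send it; keeping construction separate makes the request easy to inspect before it goes out.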

Message Templates

Customize notification messages with variable substitution:

text
🚨 {{name}}: {{metric}} is {{value}} (threshold: {{threshold}}, status: {{status}})

Variable        Description
{{name}}        Alert rule name
{{metric}}      Metric being measured
{{value}}       Current metric value
{{threshold}}   Configured threshold
{{status}}      triggered or resolved
{{severity}}    info, warning, or critical
{{project_id}}  Project UUID
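
To preview how a template will read before saving it, the substitution can be approximated locally. This Python sketch replaces {{name}}-style placeholders from a dictionary; it is not GoodLogs' renderer, and leaving unknown placeholders untouched is an assumption about a reasonable behavior.

```python
import re

# Matches {{variable}} placeholders with no inner whitespace.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_template(template: str, variables: dict) -> str:
    """Substitute known variables; unknown placeholders are left as written."""
    def substitute(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)
```

Rendering the template above with name "Error Spike", metric "error_count", value 127, threshold 50, and status "triggered" gives "🚨 Error Spike: error_count is 127 (threshold: 50, status: triggered)".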

Alert Lifecycle

  • OK → TRIGGERED: metric crosses threshold → trigger notification
  • TRIGGERED → OK: metric returns to normal → resolve notification (includes duration)
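
The two transitions above can be modeled as a small state machine. The Python sketch below is illustrative only: the AlertState class, the step function, and the shape of the returned notification are assumptions, and cooldown and muting are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlertState:
    status: str = "ok"                      # "ok" or "triggered"
    triggered_at: Optional[datetime] = None


def step(state: AlertState, breached: bool, now: datetime) -> Optional[dict]:
    """Advance the state for one evaluation; return a notification or None."""
    if breached and state.status == "ok":
        # OK -> TRIGGERED: the metric crossed the threshold.
        state.status = "triggered"
        state.triggered_at = now
        return {"status": "triggered"}
    if not breached and state.status == "triggered":
        # TRIGGERED -> OK: the metric returned to normal; report the duration.
        duration = (now - state.triggered_at).total_seconds()
        state.status = "ok"
        state.triggered_at = None
        return {"status": "resolved", "duration_seconds": duration}
    # No transition, so nothing is sent.
    return None
```

Evaluations that do not change the status return None, which matches the idea that notifications fire on transitions rather than on every check.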

Querying Alerts with GQL

Use from:alerts to query alert rules and from:alert_events to query the alert timeline.

Alert Rules

gql
# All alert rules
from:alerts

# Currently firing alerts
from:alerts status:triggered

# Critical alerts
from:alerts severity:critical

# Count alerts by status
from:alerts | count by status

Fields: name, metric, status (ok/triggered), severity, threshold, window_minutes, enabled, cooldown_minutes

Alert Timeline

gql
# Recent triggers
from:alert_events event_type:triggered | last:7d

# Most triggered alerts
from:alert_events event_type:triggered | count by alert_name | top 10 | last:30d

# Alert frequency trend
from:alert_events | count | timeseries 1d | last:30d

# Average resolution time
from:alert_events event_type:resolved | avg(duration_seconds) | last:30d

Fields: event_type (triggered/resolved), metric, actual_value, threshold, alert_name, duration_seconds

Status Pages

Alert events power the Public Status Pages feature. When status pages are enabled on a project, the page automatically derives its state from open alert events:

  • No open alerts → Operational (green)
  • Open alert with error/fatal/crash/5xx metric → Outage (red)
  • Any other open alert → Degraded (amber)

The 90-day uptime bar and incident history are built from the alert_events timeline. See Status Pages for setup and configuration.
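
The status rules listed above can be expressed as a short function. This Python sketch is a reading of those three rules, not the actual implementation: matching the keywords as case-insensitive substrings of the metric name is an assumption, as are the function and constant names.

```python
# Keywords from the Outage rule above.
OUTAGE_KEYWORDS = ("error", "fatal", "crash", "5xx")


def derive_status(open_alert_metrics) -> str:
    """Map the metrics of currently open alerts to a status page state."""
    metrics = list(open_alert_metrics)
    if not metrics:
        return "operational"   # no open alerts (green)
    for metric in metrics:
        # Assumption: a keyword anywhere in the metric name counts as a match.
        if any(keyword in metric.lower() for keyword in OUTAGE_KEYWORDS):
            return "outage"    # error/fatal/crash/5xx alert open (red)
    return "degraded"          # any other open alert (amber)
```

Under these assumptions, a single open error_count alert yields "outage", while an open log_volume alert on its own yields "degraded".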