Some punctuation and style updates

revamp of expressions doc draft 1
initial concept definition and use cases
2025-12-20 11:40:21 +08:00 · 2025-12-03 09:23:26 -06:00 · 2025-12-02 14:56:11 -06:00 · 2025-12-02 08:53:45 -06:00
5 changed files with 1353 additions and 263 deletions
--- a/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/_index.md
+++ b/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/_index.md
@@ -0,0 +1,68 @@
+---
+aliases:
+  - ../../../panels-visualizations/query-transform-data/ # /docs/grafana/next/panels-visualizations/query-transform-data/
+  - ../../../panels-visualizations/query-transform-data/expression-queries/ # /docs/grafana/next/panels-visualizations/query-transform-data/expression-queries/
+  - ../../../panels/query-a-data-source/use-expressions-to-manipulate-data/ # /docs/grafana/next/panels/query-a-data-source/use-expressions-to-manipulate-data/
+  - ../../../panels/query-a-data-source/use-expressions-to-manipulate-data/about-expressions/ # /docs/grafana/next/panels/query-a-data-source/use-expressions-to-manipulate-data/about-expressions/
+  - ../../../panels/query-a-data-source/use-expressions-to-manipulate-data/write-an-expression/ # /docs/grafana/next/panels/query-a-data-source/use-expressions-to-manipulate-data/write-an-expression/
+labels:
+  products:
+    - cloud
+    - enterprise
+    - oss
+menuTitle: Expressions
+title: Grafana expressions
+description: Write server-side expressions to manipulate data using math and other operations
+weight: 40
+refs:
+  no-data-and-error-handling:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/alerting-rules/create-grafana-managed-rule/#configure-no-data-and-error-handling
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/alerting-rules/create-grafana-managed-rule/#configure-no-data-and-error-handling
+  multiple-dimensional-data:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/
+  grafana-alerting:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
+  labels:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/#labels
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/#labels
+---
+
+# Grafana expressions
+
+An expression is a server-side operation that takes query results from one or more data sources and transforms them into new data. Expressions perform calculations like math operations, aggregations, or timestamp alignments without modifying the original data source results. This lets you derive metrics, combine data from different sources, and perform transformations your data sources can't do on their own.
+
+Because expressions run on the server, features such as alerting keep working even when no user is viewing a dashboard.
+
+## What problems do expressions solve?
+
+Expressions fill the gap between what your data sources can produce and what your visualizations or alerts need.
+
+They address several common challenges:
+
+- **Cross-data-source calculations:** Combine results from different data sources that can't query each other directly. For example, calculate error rates by dividing HTTP errors from Prometheus by total requests from an SQL database.
+- **Derived metrics:** Compute values your data source doesn't provide, such as percentage changes, moving averages, ratios, or conditional logic based on thresholds.
+- **Alerting on complex conditions:** Apply math, reductions, and comparisons to drive alert rules when your data source lacks the necessary functions or when you need to alert across multiple data sources.
+- **Post-query transformations:** Align timestamps between series, resample data to consistent intervals, filter out non-numeric values, or reduce time series to single summary values.
+- **Multi-dimensional data operations:** Perform calculations across multiple series while preserving their label identities. For example, apply the same formula to dozens of host metrics without writing individual queries for each host.
+- **Label-based series matching:** Automatically join and combine series based on their labels. For example, match CPU metrics and memory metrics for the same hosts by joining on common labels like `host` or `region`.
+- **Data quality handling:** Clean your data by filtering out, replacing, or detecting problematic values such as null, NaN, or infinity values before performing calculations or creating alerts.
+
+Without expressions, you'd need to either modify your data source queries (when possible), use client-side transformations (which don't work for alerting), or export and process data externally.
+
+## Get started
+
+Explore these resources to start using expressions:
+
+- [Create and use expressions](create-use-expressions/) - Learn how to create expressions and use Math, Reduce, and Resample operations.
+- [Expression examples](expression-examples/) - Practical examples from basic to advanced for common monitoring scenarios.
+- [Troubleshoot expressions](troubleshoot-expressions/) - Debug and resolve common expression issues.
--- a/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/create-use-expressions.md
+++ b/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/create-use-expressions.md
@@ -0,0 +1,254 @@
+---
+aliases:
+labels:
+  products:
+    - cloud
+    - enterprise
+    - oss
+menuTitle: Create and use expressions
+title: Create and use expressions
+description: Learn how to create expressions and use Math, Reduce, and Resample operations
+weight: 41
+refs:
+  multiple-dimensional-data:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/
+  grafana-alerting:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
+---
+
+# Create and use expressions
+
+Expressions are most commonly used for [Grafana Alerting](ref:grafana-alerting), where server-side processing ensures alerts continue working even when no user is viewing a dashboard.
+You can also use expressions with backend data sources in visualizations.
+
+## Understand expression data
+
+Before creating expressions, understand the data types and special values you'll work with.
+
+### Data types
+
+Expressions work with two types of data from backend data sources:
+
+- **Time series:** Collections of timestamped values, typically returned by time series databases like Prometheus or InfluxDB.
+- **Numbers:** Individual numeric values, such as aggregated results from SQL queries or reduced time series.
+
+Expressions also operate on [multiple-dimensional data](ref:multiple-dimensional-data), where each series or number is identified by labels or tags.
+For example, a single query can return CPU metrics for multiple hosts, with each series labeled by its hostname.
+
+### Special values
+
+When working with expressions, you'll encounter special values that represent problematic or undefined data:
+
+- **null:** Represents missing or absent data. Common when a data point doesn't exist or wasn't recorded.
+- **NaN (Not a Number):** Represents an undefined or invalid mathematical result, such as dividing zero by zero or taking the logarithm of a negative number. NaN is unique because it doesn't equal itself, which is why expressions include the `is_nan()` function.
+- **Infinity (Inf):** Represents a value too large to store as a finite number. Can be positive (`Inf`) or negative (`-Inf`). Often results from dividing a non-zero number by zero.
+
+Expressions provide functions like `is_null()`, `is_nan()`, `is_inf()`, and `is_number()` to detect and handle these special values in your data.
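The behavior of these special values can be sketched in Python, whose floats follow the same IEEE 754 rules. The helper names mirror the expression functions, but they are illustrative stand-ins (assumptions), not Grafana's implementation, and `None` stands in for null.

```python
import math

def is_null(x):
    """Illustrative stand-in: treat None as a null (missing) value."""
    return 1 if x is None else 0

def is_nan(x):
    """NaN never equals itself, so an equality test can't detect it."""
    return 1 if x is not None and math.isnan(x) else 0

def is_inf(x):
    """Returns 1 for both positive and negative infinity."""
    return 1 if x is not None and math.isinf(x) else 0

def is_number(x):
    """A value counts as a number only if it is not null, NaN, or infinite."""
    return 1 if x is not None and math.isfinite(x) else 0

nan = float("nan")
samples = (1.5, None, nan, float("inf"), float("-inf"))
print(nan == nan)                       # NaN does not equal itself
print([is_number(v) for v in samples])  # only the first sample is a valid number
```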
+
+### Reference queries and expressions
+
+Each query or expression in Grafana has a unique identifier called a RefID (Reference ID).
+RefIDs appear as letters (`A`, `B`, `C`) or custom names in the query editor, and they let you reference the output of one query in another expression.
+
+To use a query or expression in a math operation, prefix its RefID with a dollar sign: `$A`, `$B`, `$C`.
+
+**Example:**
+
+If query `A` returns CPU usage and query `B` returns CPU capacity, you can create an expression `$A / $B * 100` to calculate CPU percentage.
+The expression automatically uses the data from queries A and B based on their RefIDs.
+
+## Create an expression
+
+To add an expression to a panel:
+
+1. Open the panel in edit mode.
+1. Below your existing queries, click **Expression**.
+1. In the **Operation** field, select **Math**, **Reduce**, or **Resample**.
+1. Configure the expression based on the operation type.
+1. Click **Apply** to save your changes.
+
+The expression appears in your query list with its own RefID and can be referenced by other expressions.
+
+## Expression operations
+
+Expressions provide three core operations that you can combine to transform your data: Math, Reduce, and Resample.
+Each operation solves specific data transformation challenges.
+
+### Math
+
+Math operations let you perform calculations on your query results using standard arithmetic, comparison, and logical operators.
+Use math expressions to derive new metrics, calculate percentages, or implement conditional logic.
+
+**Common use cases:**
+
+- Calculate error rates: `$errors / $total_requests * 100`
+- Convert units: `$bytes / 1024 / 1024` (bytes to megabytes)
+- Implement thresholds: `$cpu_usage > 80` (returns 1 for true, 0 for false)
+- Calculate capacity remaining: `$max_capacity - $current_usage`
+
+#### Syntax and operators
+
+Reference queries and expressions using their RefID prefixed with a dollar sign: `$A`, `$B`, `$C`.
+If a RefID contains spaces, use brace syntax: `${my query}`.
+
+**Supported operators:**
+
+- **Arithmetic:** `+`, `-`, `*`, `/`, `%` (modulo), `**` (exponent)
+- **Comparison:** `<`, `>`, `==`, `!=`, `>=`, `<=` (return 1 for true, 0 for false)
+- **Logical:** `&&` (and), `||` (or), `!` (not)
+
+**Numeric constants:**
+
+- Decimal: `2.24`, `-0.8e-2`
+- Octal: `072` (leading zero)
+- Hexadecimal: `0x2A` (leading 0x)
+
+#### How operations work with different data types
+
+Math operations behave differently depending on whether you're working with numbers or time series:
+
+- **Number + Number:** Performs the operation on the two values. Example: `5 + 3 = 8`
+- **Number + Time series:** Applies the operation to every point in the series. Example: `$cpu_series * 100` multiplies each CPU value by 100
+- **Time series + Time series:** Performs the operation on matching timestamps. Example: `$series_A + $series_B` adds values at each timestamp that exists in both series
+
+If time series have different timestamps, use the Resample operation to align them first.
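A minimal Python sketch of the number-with-series and series-with-series cases above, assuming a time series is modeled as a dict that maps a timestamp to a value. The representation and the function names are hypothetical and only illustrate the described behavior.

```python
def scale(series, factor):
    """Number with time series: apply the operation to every point."""
    return {ts: value * factor for ts, value in series.items()}

def add_series(a, b):
    """Time series with time series: combine only timestamps present in both."""
    return {ts: a[ts] + b[ts] for ts in sorted(a.keys() & b.keys())}

cpu = {0: 0.25, 10: 0.5, 20: 0.75}
mem = {0: 1.0, 20: 2.0, 30: 3.0}  # no point at t=10, extra point at t=30

cpu_percent = scale(cpu, 100)
combined = add_series(cpu, mem)
print(cpu_percent)
print(combined)  # points at t=10 and t=30 are dropped because they don't align
```

The dropped points in `combined` are the reason to resample first when timestamps differ.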
+
+#### Label-based series matches
+
+When working with multiple series, expressions automatically match series based on their labels.
+If query `$A` returns CPU usage for multiple hosts (each with a `{host=...}` label) and query `$B` returns memory usage for the same hosts, the expression `$A + $B` automatically matches each host's CPU and memory values.
+
+**Matching rules:**
+
+- Series with identical labels match automatically
+- A series with no labels matches any other series
+- Series with subset labels match (for example, `{host=web01}` matches `{host=web01, region=us-east}`)
+- If both variables contain only one series, they always match
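The matching rules above can be sketched in Python, assuming each series is identified by a dict of labels. `labels_match` and `match_series` are hypothetical helpers written for illustration; they are not part of Grafana.

```python
def labels_match(a, b):
    """Return True if two label sets should be joined under the rules above."""
    if not a or not b:
        return True  # a series with no labels matches any other series
    smaller, larger = (a, b) if len(a) <= len(b) else (b, a)
    # Identical or subset labels: every pair in the smaller set appears in the larger.
    return all(larger.get(key) == value for key, value in smaller.items())

def match_series(left, right):
    """Pair up label sets from two variables."""
    if len(left) == 1 and len(right) == 1:
        return [(left[0], right[0])]  # single series on both sides always match
    return [(a, b) for a in left for b in right if labels_match(a, b)]

cpu = [{"host": "web01"}, {"host": "web02"}]
mem = [
    {"host": "web01", "region": "us-east"},
    {"host": "web02", "region": "us-east"},
]
pairs = match_series(cpu, mem)
for left_labels, right_labels in pairs:
    print(left_labels, "<->", right_labels)
```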
+
+#### Available functions
+
+Math expressions include functions for common operations and data quality checks.
+All functions work with both individual numbers and time series.
+
+**Mathematical functions:**
+
+- `abs(x)` - Returns absolute value. Example: `abs($temperature_diff)`
+- `log(x)` - Returns natural logarithm. Returns NaN for negative values. Example: `log($growth_rate)`
+- `round(x)` - Rounds to nearest integer. Example: `round($average)`
+- `ceil(x)` - Rounds up to nearest integer. Example: `ceil(3.2)` returns `4`
+- `floor(x)` - Rounds down to nearest integer. Example: `floor(3.8)` returns `3`
+
+**Data quality functions:**
+
+These functions help you detect and handle problematic values in your data:
+
+- `is_number(x)` - Returns 1 for valid numbers, 0 for null, NaN, or infinity. Example: `is_number($A)`
+- `is_null(x)` - Returns 1 for null values, 0 otherwise. Example: `is_null($A)`
+- `is_nan(x)` - Returns 1 for NaN values, 0 otherwise. Useful because NaN doesn't equal itself. Example: `is_nan($A)`
+- `is_inf(x)` - Returns 1 for positive or negative infinity, 0 otherwise. Example: `is_inf($A)`
+
+**Test functions:**
+
+- `null()`, `nan()`, `inf()`, `infn()` - Return the named special value. Primarily for testing.
+
+### Reduce
+
+Reduce operations convert time series into single numeric values while preserving their labels.
+Use reduce to create summary statistics, single-value panels, or alert conditions based on time series data.
+
+**Common use cases:**
+
+- Create alert thresholds: Reduce CPU time series to average and alert if it exceeds 80%
+- Display current values: Show the last recorded temperature from a sensor
+- Calculate totals: Sum all errors across a time range
+- Find extremes: Identify maximum memory usage in the last hour
+
+**Available reduction functions:**
+
+- **Last:** Returns the most recent value. Useful for "current state" displays.
+- **Mean:** Returns the average of all values. Use for typical behavior over time.
+- **Min / Max:** Returns the smallest or largest value. Useful for capacity planning or finding anomalies.
+- **Sum:** Returns the total of all values. Useful for counting events or totaling metrics.
+- **Count:** Returns the number of data points. Useful for checking data completeness.
+
+**Example:**
+
+If query `$A` returns CPU usage time series for three hosts over the last hour, applying `Reduce(Mean)` produces three numbers: the average CPU for each host, each labeled with its hostname.
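As a rough Python sketch of the example above, assume each series is a pair of labels and values. The reducer table and `reduce_series` are hypothetical names; the point is that every series is collapsed on its own and keeps its labels.

```python
from statistics import mean

REDUCERS = {
    "last": lambda values: values[-1],
    "mean": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}

def reduce_series(frames, func):
    """Collapse each labeled series to one number, keeping its labels."""
    reducer = REDUCERS[func]
    return [(labels, reducer(values)) for labels, values in frames]

frames = [
    ({"host": "web01"}, [40.0, 60.0, 80.0]),
    ({"host": "web02"}, [10.0, 20.0, 30.0]),
    ({"host": "web03"}, [90.0, 90.0, 90.0]),
]
reduced = reduce_series(frames, "mean")
for labels, value in reduced:
    print(labels, value)
```

Three series go in and three numbers come out; the reduction runs over time within each series, not across series.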
+
+#### Handle non-numeric values
+
+Reduce operations let you control how null, NaN, and infinity values are handled:
+
+- **Strict:** Returns NaN if any non-numeric values exist. Use when data quality is critical.
+- **Drop non-numeric:** Filters out problematic values before calculating. Use when occasional bad data points are acceptable.
+- **Replace non-numeric:** Replaces bad values with a specified number. Use when you want to substitute a default value.
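The three modes can be compared with a small Python sketch. `reduce_mean` is a hypothetical helper, `None` stands in for null, and returning NaN for an empty result is an assumption made for this illustration.

```python
import math
from statistics import mean

def reduce_mean(values, mode="strict", replacement=0.0):
    """Mean of a series under the three non-numeric handling modes."""
    def bad(value):
        return value is None or not math.isfinite(value)

    if mode == "strict":
        if any(bad(v) for v in values):
            return math.nan  # any non-numeric value poisons the result
        cleaned = list(values)
    elif mode == "drop":
        cleaned = [v for v in values if not bad(v)]
    elif mode == "replace":
        cleaned = [replacement if bad(v) else v for v in values]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mean(cleaned) if cleaned else math.nan

values = [10.0, None, 30.0, math.nan]
strict_result = reduce_mean(values, "strict")
drop_result = reduce_mean(values, "drop")
replace_result = reduce_mean(values, "replace", replacement=0.0)
print(strict_result, drop_result, replace_result)
```

Dropping and replacing give different means for the same input, so the mode choice changes what an alert threshold is compared against.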
+
+### Resample
+
+Resample operations align time series to a consistent time interval, enabling you to perform math operations between series with mismatched timestamps.
+
+**Why resample:**
+
+When combining time series from different data sources, their timestamps rarely align perfectly.
+One series might report every 15 seconds while another reports every minute.
+Resampling normalizes both series to the same interval so you can add, subtract, or compare them.
+
+**Example use case:**
+
+You want to calculate `$errors / $requests` but your error logs report every 10 seconds while your request metrics report every 30 seconds.
+Resample both series to 30-second intervals, then perform the division.
+
+**Configuration:**
+
+- **Resample to:** The target interval. Use `s` (seconds), `m` (minutes), `h` (hours), `d` (days), `w` (weeks), or `y` (years). Example: `10s`, `1m`, `1h`
+- **Downsample:** How to handle multiple data points in one interval. Choose a reduction function like Mean, Max, Min, or Sum. Example: If resampling from 10s to 30s intervals and you have 3 values, Mean averages them.
+- **Upsample:** How to fill intervals with no data points:
+  - **Pad:** Uses the last known value (forward fill)
+  - **Backfill:** Uses the next known value (backward fill)
+  - **fillna:** Inserts NaN for missing intervals
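A simplified Python sketch of downsampling with Mean and upsampling with Pad, assuming integer timestamps in seconds and buckets aligned to multiples of the interval. Both functions are hypothetical, and the bucket alignment is an assumption that may differ from how Grafana aligns intervals.

```python
from statistics import mean

def downsample_mean(points, interval):
    """Average all points that fall in the same interval-sized bucket."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % interval, []).append(value)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}

def upsample_pad(points, interval, start, end):
    """Fill each interval with the last known value (forward fill)."""
    ordered = sorted(points)
    out, last, i = {}, None, 0
    for ts in range(start, end + 1, interval):
        while i < len(ordered) and ordered[i][0] <= ts:
            last = ordered[i][1]
            i += 1
        out[ts] = last
    return out

errors = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0), (50, 6.0)]
requests = [(0, 100.0), (60, 160.0)]

errors_30s = downsample_mean(errors, 30)
requests_30s = upsample_pad(requests, 30, 0, 90)
print(errors_30s)
print(requests_30s)
```

After resampling, both series have points at the same 30-second marks, so a Math expression can combine them point by point.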
+
+## Best practices
+
+Follow these guidelines to build efficient and maintainable expressions.
+
+### Process data in the data source when possible
+
+Perform aggregations, filtering, and complex calculations inside your data source rather than in expressions when you can.
+Data sources are optimized for processing their own data, and moving large volumes of data to Grafana for simple operations is inefficient.
+
+**Use expressions for:**
+
+- Operations your data source doesn't support
+- Cross-data-source calculations
+- Lightweight post-processing
+- Alerting logic that needs server-side evaluation
+
+**Avoid expressions for:**
+
+- Simple aggregations your data source can perform
+- Processing millions of data points
+- Operations that could be handled by recording rules or continuous queries
+
+### Understand backend data source requirements
+
+Expressions only work with backend (server-side) data sources. Browser-based data sources can't be used in expressions.
+
+**Supported:** Prometheus, Loki, InfluxDB, MySQL, PostgreSQL, CloudWatch, and other backend data sources.
+
+**Not supported:** Browser-based plugins and other client-side data sources.
+
+### Use alerting-compatible configurations
+
+Expressions work differently in alerting contexts than in panels:
+
+- Alerting requires expressions to evaluate server-side.
+- Most alert conditions need single values (use Reduce operations).
+- Test your expressions with the same time ranges your alerts will use.
+- Legacy dashboard alerts don't support expressions - use [Grafana Alerting](ref:grafana-alerting) instead.
+
--- a/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/expression-examples.md
+++ b/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/expression-examples.md
@@ -0,0 +1,524 @@
+---
+aliases:
+labels:
+  products:
+    - cloud
+    - enterprise
+    - oss
+menuTitle: Expressions examples
+title: Expressions examples
+description: Practical expression examples from basic to advanced for common monitoring scenarios
+weight: 55
+refs:
+  grafana-expressions:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/visualizations/panels-visualizations/query-transform-data/expression-queries/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/visualizations/panels-visualizations/query-transform-data/expression-queries/
+  grafana-alerting:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
+---
+
+# Expressions examples
+
+This document provides practical expression examples for common monitoring and visualization scenarios.
+Examples progress from basic to advanced and show how to solve real-world problems with Grafana expressions.
+
+For foundational concepts, refer to [Grafana expressions](ref:grafana-expressions).
+
+## Basic examples
+
+Start here if you're new to expressions. These examples demonstrate fundamental patterns you'll use frequently.
+
+### Convert units
+
+**Scenario:** Your metrics are in bytes, but you want to display them in gigabytes.
+
+**Setup:**
+
+- Query A (Prometheus): `node_memory_MemTotal_bytes`
+- Expression B (Math): `$A / 1024 / 1024 / 1024`
+
+**Result:** Memory values converted from bytes to gigabytes.
+
+**Variations:**
+
+- Bytes to megabytes: `$A / 1024 / 1024`
+- Bytes to terabytes: `$A / 1024 / 1024 / 1024 / 1024`
+- Milliseconds to seconds: `$A / 1000`
+- Celsius to Fahrenheit: `$A * 9 / 5 + 32`
+
+---
+
+### Calculate a simple percentage
+
+**Scenario:** Show what percentage of total memory is being used.
+
+**Setup:**
+
+- Query A (Prometheus): `node_memory_MemTotal_bytes`
+- Query B (Prometheus): `node_memory_MemAvailable_bytes`
+- Expression C (Math): `($A - $B) / $A * 100`
+
+**Result:** Memory usage as a percentage (0-100).
+
+**Tip:** This pattern works for any "used / total * 100" calculation.
+
+---
+
+### Get the current (latest) value
+
+**Scenario:** Display the most recent temperature reading in a stat panel.
+
+**Setup:**
+
+- Query A (InfluxDB): Temperature sensor time series data
+- Expression B (Reduce): Input `$A`, Function: **Last**
+
+**Result:** Single number showing the most recent value from the time series.
+
+**When to use:** Stat panels, gauges, or any visualization that needs a single current value.
+
+---
+
+### Calculate an average over time
+
+**Scenario:** Show the average CPU usage over the dashboard time range.
+
+**Setup:**
+
+- Query A (Prometheus): `100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`
+- Expression B (Reduce): Input `$A`, Function: **Mean**
+
+**Result:** Average CPU usage percentage across the selected time range.
+
+**Note:** Each series (here, one per `instance`) produces its own average, preserving labels.
+
+---
+
+### Find maximum or minimum values
+
+**Scenario:** Identify the peak memory usage in the last 24 hours.
+
+**Setup:**
+
+- Query A (Prometheus): `node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes` (last 24 hours)
+- Expression B (Reduce): Input `$A`, Function: **Max**
+
+**Result:** Peak memory usage value for each host.
+
+**Variations:**
+
+- Use **Min** to find the lowest value
+- Use **Count** to see how many data points exist
+
+---
+
+### Simple threshold check
+
+**Scenario:** Create a binary indicator showing whether CPU is above 80%.
+
+**Setup:**
+
+- Query A (Prometheus): `100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)`
+- Expression B (Math): `$A > 80`
+
+**Result:** Returns `1` when CPU exceeds 80%, `0` otherwise. Useful for alerting or status indicators.
+
+---
+
+## Intermediate examples
+
+These examples combine multiple operations and handle more complex scenarios.
+
+### Calculate error rate percentage
+
+**Scenario:** Display HTTP error rate as a percentage of total requests.
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(http_requests_total{status=~"5.."}[5m]))`
+- Query B (Prometheus): `sum(rate(http_requests_total[5m]))`
+- Expression C (Math): `$A / $B * 100`
+
+**Result:** Error rate percentage across all endpoints.
+
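+**Handling division by zero:** If there are zero requests, `$A / $B` evaluates to `0 / 0`, which is NaN. The supported Math operators don't include a conditional (`? :`) operator, but comparisons return 1 or 0, so you can guard the denominator:
+
+- Expression C (Math): `$A / ($B + ($B == 0)) * 100`
+
+When `$B` is 0, the comparison adds 1 to the denominator and the expression returns 0 instead of NaN.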
+
+---
+
+### Calculate available disk space
+
+**Scenario:** Show available disk space as a percentage for capacity planning.
+
+**Setup:**
+
+- Query A (Prometheus): `node_filesystem_size_bytes{mountpoint="/"}`
+- Query B (Prometheus): `node_filesystem_avail_bytes{mountpoint="/"}`
+- Expression C (Math): `$B / $A * 100`
+
+**Result:** Percentage of disk space available (not used) for each host's root filesystem.
+
+**For alerting:** Add an alert when available space drops below 10%:
+
+- Expression D (Math): `$C < 10`
+
+---
+
+### Aggregate across multiple servers
+
+**Scenario:** Calculate total requests per second across all web servers.
+
+**Setup:**
+
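+- Query A (Prometheus): `sum(rate(http_requests_total{job="webservers"}[5m]))`
+- Expression B (Reduce): Input `$A`, Function: **Last**
+
+**Result:** Current total requests per second across all servers as a single value.
+
+**Why aggregate in the query:** Reduce collapses each series over time and keeps one number per series. It doesn't combine separate series, so sum across servers in the query itself.
+
+**Alternative:** To get the average per server instead:
+
+- Query A (Prometheus): `avg(rate(http_requests_total{job="webservers"}[5m]))`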
+
+---
+
+### Combine metrics from different data sources
+
+**Scenario:** Calculate efficiency by dividing application throughput (Prometheus) by infrastructure cost metric (CloudWatch).
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(processed_jobs_total[5m]))`
+- Query B (CloudWatch): EC2 instance cost metric
+- Expression C (Resample): Input `$A`, Resample to: `1m`, Downsample: Mean
+- Expression D (Resample): Input `$B`, Resample to: `1m`, Downsample: Mean
+- Expression E (Math): `$C / $D`
+
+**Result:** Jobs processed per dollar (or cost unit), showing application efficiency.
+
+**Why resample:** Different data sources often have different collection intervals. Resampling ensures timestamps align for math operations.
+
+---
+
+### Compare hosts to fleet average
+
+**Scenario:** Identify hosts performing worse than the fleet average.
+
+**Setup:**
+
+- Query A (Prometheus): `node_cpu_usage_percent` (returns one series per host)
+- Query B (Prometheus): `avg(node_cpu_usage_percent)` (fleet average, one series with no labels)
+- Expression C (Math): `$A - $B`
+
+**Result:** Each host shows how far above or below the fleet average it is. Positive values indicate above-average CPU usage. Because the fleet average has no labels, it matches every host series.
+
+---
+
+### Filter invalid data
+
+**Scenario:** Calculate average response time, ignoring any null or NaN values in the data.
+
+**Setup:**
+
+- Query A (Time series): Response time data with occasional gaps
+- Expression B (Reduce): Input `$A`, Function: **Mean**, Mode: **Drop non-numeric**
+
+**Result:** Clean average that ignores invalid data points.
+
+**Alternative modes:**
+
+- **Strict:** Returns NaN if any value is invalid (use when data quality matters)
+- **Replace non-numeric:** Substitutes a specific value for invalid data points
+
+---
+
+### Calculate rate of change
+
+**Scenario:** Show how quickly memory usage is increasing or decreasing.
+
+**Setup:**
+
+- Query A (Prometheus): `node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes`
+- Query B (Prometheus): `node_memory_MemTotal_bytes offset 5m - node_memory_MemAvailable_bytes offset 5m`
+- Expression C (Math): `$A - $B`
+
+**Result:** Bytes of memory change over the last 5 minutes. Positive = increasing, negative = decreasing.
+
+**As a percentage change:**
+
+- Expression C (Math): `($A - $B) / $B * 100`
+
+---
+
+## Advanced examples
+
+These examples demonstrate complex multi-step calculations and sophisticated alerting patterns.
+
+### Compare current value to 24-hour average
+
+**Scenario:** Highlight when current traffic is significantly above or below the daily norm.
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(http_requests_total[24h]))` (historical average)
+- Query B (Prometheus): `sum(rate(http_requests_total[5m]))` (current rate)
+- Expression C (Reduce): Input `$A`, Function: **Mean**
+- Expression D (Math): `($B - $C) / $C * 100`
+
+**Result:** Percentage difference from the 24-hour average. +50 means 50% above normal, -30 means 30% below normal.
+
+**Use cases:**
+
+- Detect traffic anomalies
+- Identify unusual load patterns
+- Trigger alerts for significant deviations
+
+---
+
+### Calculate service level indicator (SLI)
+
+**Scenario:** Calculate the percentage of requests meeting your latency target (under 200ms).
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(http_request_duration_seconds_bucket{le="0.2"}[5m]))`
+- Query B (Prometheus): `sum(rate(http_request_duration_seconds_count[5m]))`
+- Expression C (Math): `$A / $B * 100`
+
+**Result:** Percentage of requests completing in under 200ms (your SLI).
+
+**For SLO alerting:** Alert when SLI drops below 99%:
+
+- Expression D (Reduce): Input `$C`, Function: **Mean**
+- Expression E (Math): `$D < 99`
+
+---
+
+### Multi-host alerts with reduction
+
+**Scenario:** Alert when average CPU across all production servers exceeds 80%.
+
+**Setup:**
+
+- Query A (Prometheus): `avg(100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle",env="production"}[5m])) * 100))` (fleet average)
+- Expression B (Reduce): Input `$A`, Function: **Mean** (average over the query time range)
+- Expression C (Math): `$B > 80`
+
+**Result:** Single alert that fires when the fleet average crosses the threshold, not individual host alerts.
+
+**Alternative - alert on any host:**
+
+- Query A (Prometheus): Replace the outer `avg(...)` with `max(...)`
+
+This alerts when the busiest host exceeds 80%.
+
+---
+
+### Calculate compound metrics
+
+**Scenario:** Calculate Apdex score (Application Performance Index) from response time buckets.
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))` (satisfied: up to 500ms)
+- Query B (Prometheus): `sum(rate(http_request_duration_seconds_bucket{le="2.0"}[5m]))` (satisfied plus tolerating: up to 2s, because buckets are cumulative)
+- Query C (Prometheus): `sum(rate(http_request_duration_seconds_count[5m]))` (total)
+- Expression D (Math): `($A + ($B - $A) / 2) / $C`
+
+**Result:** Apdex score from 0 to 1, where 1 is perfect user satisfaction.
+
+**Formula explained:** Apdex = (Satisfied + Tolerating/2) / Total
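A worked Python sketch of the formula, using made-up request rates. Histogram buckets are cumulative, so the `le="2.0"` bucket already contains the satisfied requests and they are subtracted before halving. The function name and sample numbers are illustrative assumptions.

```python
def apdex(satisfied_cumulative, tolerating_cumulative, total):
    """Apdex from cumulative histogram buckets.

    tolerating_cumulative includes the satisfied requests, so subtract
    them to get the requests that were only tolerable.
    """
    tolerating = tolerating_cumulative - satisfied_cumulative
    return (satisfied_cumulative + tolerating / 2) / total

# Hypothetical rates: 80 req/s within 500ms, 95 req/s within 2s, 100 req/s total.
score = apdex(80.0, 95.0, 100.0)
print(score)
```

With these numbers, 15 requests per second are tolerating, so the score is (80 + 7.5) / 100.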
+
+---
+
+### Detect sustained conditions
+
+**Scenario:** Alert only when CPU has been high for at least 5 minutes, not just a brief spike.
+
+**Setup:**
+
+- Query A (Prometheus): `avg_over_time(node_cpu_usage_percent[5m])`
+- Expression B (Reduce): Input `$A`, Function: **Mean**
+- Expression C (Math): `$B > 80`
+
+**Result:** Alerts only fire when the 5-minute average exceeds the threshold, filtering out brief spikes.
+
+**Alternative approach using count:**
+
+- Query A: `node_cpu_usage_percent`
+- Expression B (Math): `$A > 80`
+- Expression C (Reduce): Input `$B`, Function: **Sum** (counts "1" values where condition is true)
+- Expression D (Math): `$C > 5`
+
+This alerts when more than 5 data points in the range exceed the threshold.
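The count-based approach can be sketched in Python with two made-up series. `breaches` and `sustained` are hypothetical helpers that mirror the comparison, Sum reduction, and final comparison steps.

```python
def breaches(values, threshold):
    """Comparison step: 1 where the value exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def sustained(values, threshold, min_points):
    """Sum the 1s, then check whether more than min_points breached."""
    return 1 if sum(breaches(values, threshold)) > min_points else 0

brief_spike = [50, 95, 60, 55, 50, 45, 50, 52]
long_high = [85, 90, 88, 92, 81, 86, 70, 83]

print(sustained(brief_spike, 80, 5))  # one breach only, condition not met
print(sustained(long_high, 80, 5))    # seven breaches, condition met
```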
+
+---
+
+### Correlate metrics across systems
+
+**Scenario:** Calculate orders processed per database query to measure backend efficiency.
+
+**Setup:**
+
+- Query A (Prometheus - App metrics): `sum(rate(orders_processed_total[5m]))`
+- Query B (MySQL data source): Database queries per second from performance schema
+- Expression C (Resample): Input `$A`, Resample to: `30s`, Downsample: Mean
+- Expression D (Resample): Input `$B`, Resample to: `30s`, Downsample: Mean
+- Expression E (Math): `$C / $D`
+
+**Result:** Orders per database query, showing how efficiently your backend processes orders.
+
+**Higher is better:** More orders per database query means more efficient database usage.
+
+---
+
+### Ratio-based alerts with baseline
+
+**Scenario:** Alert when the error ratio is more than double yesterday's baseline.
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(http_errors_total[5m]))` (current errors)
+- Query B (Prometheus): `sum(rate(http_requests_total[5m]))` (current requests)
+- Query C (Prometheus): `sum(rate(http_errors_total[5m] offset 24h))` (yesterday's errors)
+- Query D (Prometheus): `sum(rate(http_requests_total[5m] offset 24h))` (yesterday's requests)
+- Expression E (Math): `$A / $B` (current error rate)
+- Expression F (Math): `$C / $D` (baseline error rate)
+- Expression G (Reduce): Input `$E`, Function: **Mean**
+- Expression H (Reduce): Input `$F`, Function: **Mean**
+- Expression I (Math): `$G / $H > 2`
+
+**Result:** Alerts when today's error rate is more than double yesterday's rate.
+
+**Why this matters:** Absolute thresholds don't account for normal variation. Ratio-based alerting adapts to your system's baseline behavior.
+
+---
+
+### Calculate percentile-based thresholds
+
+**Scenario:** Alert when response time exceeds the 95th percentile baseline.
+
+**Setup:**
+
+- Query A (Prometheus): `histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`
+- Query B (Prometheus): `histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[1h])) by (le))`
+- Expression C (Reduce): Input `$A`, Function: **Last** (current p95)
+- Expression D (Reduce): Input `$B`, Function: **Mean** (baseline p95)
+- Expression E (Math): `$C > $D * 1.5`
+
+**Result:** Alerts when current p95 latency exceeds 1.5x the hourly baseline.
+
+---
+
+### Weighted scores across metrics
+
+**Scenario:** Create a composite health score from multiple metrics (CPU, memory, disk, network).
+
+**Setup:**
+
+- Query A: CPU usage percentage (0-100)
+- Query B: Memory usage percentage (0-100)
+- Query C: Disk usage percentage (0-100)
+- Query D: Network saturation percentage (0-100)
+- Expression E (Reduce): Input `$A`, Function: **Mean**
+- Expression F (Reduce): Input `$B`, Function: **Mean**
+- Expression G (Reduce): Input `$C`, Function: **Mean**
+- Expression H (Reduce): Input `$D`, Function: **Mean**
+- Expression I (Math): `($E * 0.3) + ($F * 0.25) + ($G * 0.25) + ($H * 0.2)`
+
+**Result:** Weighted health score from 0-100 where lower is healthier. Weights reflect relative importance (CPU 30%, Memory 25%, Disk 25%, Network 20%).
+
+**For alerting:**
+
+- Expression J (Math): `$I > 70`
+
+Alert when composite score indicates degraded health.
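+
+For example, with hypothetical reduced values of 80 (CPU), 60 (memory), 40 (disk), and 50 (network), the calculation works out as follows:
+
+```
+$I = (80 * 0.3) + (60 * 0.25) + (40 * 0.25) + (50 * 0.2)
+   = 24 + 15 + 10 + 10
+   = 59
+$J = 59 > 70 = 0 (condition not met, no alert)
+```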
+
+---
+
+### Conditional logic with fallbacks
+
+**Scenario:** Show error rate, but display 0 instead of infinity when there are no requests.
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(http_errors_total[5m]))`
+- Query B (Prometheus): `sum(rate(http_requests_total[5m]))`
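+- Expression C (Math): `$A / ($B + ($B == 0)) * 100`
+
+**Result:** Error rate percentage that safely handles zero-request periods.
+
+**How it works:** Math expressions don't have an if/else operator, but relational operators return `1` for true and `0` for false. The term `($B == 0)` adds 1 to the denominator only when there are no requests. Because there are also no errors in that case, the result is 0 instead of NaN or infinity.
+
+**More examples:**
+
+- Cap values at 100: `($A > 100) * 100 + ($A <= 100) * $A`
+- Convert negative to zero: `($A >= 0) * $A`
+- Binary classification: `$A > 80` (returns 1 or 0; `80` is an example threshold)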
+
+---
+
+### Time-window comparison for trend detection
+
+**Scenario:** Detect if metrics are trending up or down by comparing recent data to slightly older data.
+
+**Setup:**
+
+- Query A (Prometheus): `sum(rate(http_requests_total[5m]))`
+- Query B (Prometheus): `sum(rate(http_requests_total[5m] offset 10m))`
+- Expression C (Reduce): Input `$A`, Function: **Mean**
+- Expression D (Reduce): Input `$B`, Function: **Mean**
+- Expression E (Math): `($C - $D) / $D * 100`
+
+**Result:** Percentage change in requests between the last 5 minutes and the 5-minute window that ended 10 minutes ago.
+
+**Interpretation:**
+
+- Positive values: Traffic increasing
+- Negative values: Traffic decreasing
+- Values near 0: Traffic stable
+
+**Use case:** Detect rapid traffic changes that might indicate problems or attacks.
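+
+As a sketch with hypothetical reduced values, followed by an optional threshold expression (the 50% threshold is an assumption; tune it for your own traffic):
+
+```
+$C = 120, $D = 100
+$E = (120 - 100) / 100 * 100 = 20 (traffic up 20%)
+
+Expression F (Math): abs($E) > 50
+$F = abs(20) > 50 = 0 (condition not met, no alert)
+```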
+
+---
+
+## Tips for expression development
+
+Follow these best practices to build reliable, maintainable expressions in your visualizations and alerts.
+
+### Start simple and iterate
+
+Begin with basic operations and verify that each step works before adding complexity. Use the Query inspector to see intermediate results.
+
+### Name your queries clearly
+
+While RefIDs default to letters, you can use descriptive names. Referencing `${errors}` and `${total_requests}` is clearer than `$A` and `$B`.
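+
+For example, assuming you rename the two queries to `errors` and `total_requests` (hypothetical names), the setup might look like this:
+
+```
+Query errors (Prometheus):          sum(rate(http_errors_total[5m]))
+Query total_requests (Prometheus):  sum(rate(http_requests_total[5m]))
+Expression error_pct (Math):        ${errors} / ${total_requests} * 100
+```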
+
+### Test with realistic time ranges
+
+Expressions may behave differently with various time ranges. Test with the same ranges you'll use in production dashboards or alerts.
+
+### Handle edge cases
+
+Consider what happens when:
+
+- Data is missing (NoData)
+- Values are zero (division by zero)
+- Metrics haven't been collected yet
+- Time series have different numbers of points
+
+### Document complex expressions
+
+Add panel descriptions or annotation text explaining what complex expressions calculate and why.
+
+### Monitor expression performance
+
+If dashboards become slow, check if expressions are processing too much data. Consider moving heavy aggregations to recording rules or data source queries.
+
--- a/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/index.md
+++ b/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/index.md
@@ -1,263 +0,0 @@
---
-aliases:
-  - ../../../panels-visualizations/query-transform-data/ # /docs/grafana/next/panels-visualizations/query-transform-data/
-  - ../../../panels-visualizations/query-transform-data/expression-queries/ # /docs/grafana/next/panels-visualizations/query-transform-data/expression-queries/
-  - ../../../panels/query-a-data-source/use-expressions-to-manipulate-data/ # /docs/grafana/next/panels/query-a-data-source/use-expressions-to-manipulate-data/
-  - ../../../panels/query-a-data-source/use-expressions-to-manipulate-data/about-expressions/ # /docs/grafana/next/panels/query-a-data-source/use-expressions-to-manipulate-data/about-expressions/
-  - ../../../panels/query-a-data-source/use-expressions-to-manipulate-data/write-an-expression/ # /docs/grafana/next/panels/query-a-data-source/use-expressions-to-manipulate-data/write-an-expression/
-labels:
-  products:
-    - cloud
-    - enterprise
-    - oss
-menuTitle: Write expression queries
-title: Write expression queries
-description: Write server-side expressions to manipulate data using math and other operations
-weight: 40
-refs:
-  no-data-and-error-handling:
-    - pattern: /docs/grafana/
-      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/alerting-rules/create-grafana-managed-rule/#configure-no-data-and-error-handling
-    - pattern: /docs/grafana-cloud/
-      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/alerting-rules/create-grafana-managed-rule/#configure-no-data-and-error-handling
-  multiple-dimensional-data:
-    - pattern: /docs/grafana/
-      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/
-    - pattern: /docs/grafana-cloud/
-      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/
-  grafana-alerting:
-    - pattern: /docs/grafana/
-      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
-    - pattern: /docs/grafana-cloud/
-      destination: /docs/grafana/<GRAFANA_VERSION>/alerting/
-  labels:
-    - pattern: /docs/grafana/
-      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/#labels
-    - pattern: /docs/grafana-cloud/
-      destination: /docs/grafana/<GRAFANA_VERSION>/fundamentals/timeseries-dimensions/#labels
---
-
-# Write expression queries
-
-Server-side expressions enable you to manipulate data returned from queries with math and other operations. Expressions create new data and do not manipulate the data returned by data sources.
-
-## About expressions
-
-Server-side expressions allow you to manipulate data returned from queries with math and other operations. Expressions create new data and do not manipulate the data returned by data sources, aside from some minor data restructuring to make the data acceptable input for expressions.
-
-### Using expressions
-
-Expressions are most commonly used for [Grafana Alerting](ref:grafana-alerting). The processing is done server-side, so expressions can operate without a browser session. However, expressions can also be used with backend data sources and visualization.
-
-{{< admonition type="note" >}}
-Expressions do not work with legacy dashboard alerts.
-{{< /admonition >}}
-
-Expressions are meant to augment data sources by enabling queries from different data sources to be combined or by providing operations unavailable in a data source.
-
-{{< admonition type="note" >}}
-When possible, you should do data processing inside the data source. Copying data from storage to the Grafana server for processing is inefficient, so expressions are targeted at lightweight data processing.
-{{< /admonition >}}
-
-Expressions work with data source queries that return time series or number data. They also operate on [multiple-dimensional data](ref:multiple-dimensional-data). For example, a query that returns multiple series, where each series is identified by labels or tags.
-
-An individual expression takes one or more queries or other expressions as input and adds data to the result. Each individual expression or query is represented by a variable that is a named identifier known as its RefID (e.g., the default letter `A` or `B`).
-
-To reference the output of an individual expression or a data source query in another expression, this identifier is used as a variable.
-
-### Types of expressions
-
-Expressions work with two types of data.
-
- A collection of time series.
- A collection of numbers, where each number is an item.
-
-Each collection is returned from a single data source query or expression and represented by the RefID. Each collection is a set, where each item in the set is uniquely identified by its dimensions which are stored as [labels](ref:labels) or key-value pairs.
-
-### Data source queries
-
-Server-side expressions only support data source queries for backend data sources. The data is generally assumed to be labeled time series data. In the future we intend to add an assertion of the query return type (number or time series) data so expressions can handle errors better.
-
-Data source queries, when used with expressions, are executed by the expression engine. When it does this, it restructures data to be either one time series or one number per data frame. So for example if using a data source that returns multiple series on one frame in the table view, you might notice it looks different when executed with expressions.
-
-Currently, the only non-time series format (number) is supported when you're using data frames and you have a table response that returns a data frame with no time, string columns, and one number column:
-
-| Loc | Host | Avg_CPU |
-| --- | ---- | ------- |
-| MIA | A    | 1       |
-| NYC | B    | 2       |
-
-The example above will produce a number that works with expressions. The string columns become labels and the number column the corresponding value. For example `{"Loc": "MIA", "Host": "A"}` with a value of 1.
-
-### Operations
-
-You can use the following operations in expressions: math, reduce, and resample.
-
-#### Math
-
-Math is for free-form math formulas on time series or number data. Math operations take numbers and time series as input and change them to different numbers and time series.
-
-Data from other queries or expressions are referenced with the RefID prefixed with a dollar sign, for example `$A`. If the variable has spaces in the name, then you can use a brace syntax like `${my variable}`.
-
-Numeric constants may be in decimal (`2.24`), octal (with a leading zero like `072`), or hex (with a leading 0x like `0x2A`). Exponentials and signs are also supported (e.g., `-0.8e-2`).
-
-##### Operators
-
-The arithmetic (`+`, binary and unary `-`, `*`, `/`, `%`, exponent `**`), relational (`<`, `>`, `==`, `!=`, `>=`, `<=`), and logical (`&&`, `||`, and unary `!`) operators are supported.
-
-How the operation behaves with data depends on if it is a number or time series data.
-
-With binary operations, such as `$A + $B` or `$A || $B`, the operator is applied in the following ways depending on the type of data:
-
- If both `$A` and `$B` are a number, then the operation is performed between the two numbers.
- If one variable is a number, and the other variable is a time series, then the operation between the value of each point in the time series and the number is performed.
- If both `$A` and `$B` are time series data, then the operation between each value in the two series is performed for each time stamp that exists in both `$A` and `$B`. The Resample operation can be used to line up time stamps. (**Note:** in the future, we plan to add options to the Math operation for different behaviors).
-
-Summary:
-
- Number OP number = number
- Number OP series = series
- Series OP series = series
-
-Because expressions work with multiple series or numbers represented by a single variable, binary operations also perform a union (join) between the two variables. This is done based on the identifying labels associated with each individual series or number.
-
-So if you have numbers with labels like `{host=web01}` in `$A` and another number in `$B` with the same labels then the operation is performed between those two items within each variable, and the result will share the same labels. The rules for the behavior of this union are as follows:
-
- An item with no labels will join to anything.
- If both `$A` and `$B` each contain only one item (one series, or one number), they will join.
- If labels are exact match they will join.
- If labels are a subset of the other, for example and item in `$A` is labeled `{host=A,dc=MIA}` and item in `$B` is labeled `{host=A}` they will join.
- Currently, if within a variable such as `$A` there are different tag _keys_ for each item, the join behavior is undefined.
-
-The relational and logical operators return 0 for false 1 for true.
-
-##### Math Functions
-
-While most functions exist in the own expression operations, the math operation does have some functions similar to math operators or symbols. When functions can take either numbers or series, than the same type as the argument will be returned. When it is a series, the operation of performed for the value of each point in the series.
-
-###### abs
-
-abs returns the absolute value of its argument which can be a number or a series. For example `abs(-1)` or `abs($A)`.
-
-###### is_inf
-
-is_inf takes a number or a series and returns `1` for `Inf` values (negative or positive) and `0` for other values. For example `is_inf($A)`.
-
-{{< admonition type="note" >}}
-If you need to specifically check for negative infinity for example, you can do a comparison like `$A == infn()`.
-{{< /admonition >}}
-
-###### is_nan
-
-is_nan takes a number or a series and returns `1` for `NaN` values and `0` for other values. For example `is_nan($A)`. This function exists because `NaN` is not equal to `NaN`.
-
-###### is_null
-
-is_null takes a number or a series and returns `1` for `null` values and `0` for other values. For example `is_null($A)`.
-
-###### is_number
-
-is_number takes a number or a series and returns `1` for all real number values and `0` for other values (which are `null`, `Inf+`, `Inf-`, and `NaN`). For example `is_number($A)`.
-
-###### log
-
-Log returns the natural logarithm of of its argument which can be a number or a series. If the value is less than 0, NaN is returned. For example `log(-1)` or `log($A)`.
-
-###### inf, infn, nan, and null
-
-The inf, infn, nan, and null functions all return a single value of the name. They primarily exist for testing. Example: `null()`.
-
-###### round
-
-Round returns a rounded integer value. For example, `round(3.123)` or `round($A)`. (This function should probably take an argument so it can add precision to the rounded value).
-
-###### ceil
-
-Ceil rounds the number up to the nearest integer value. For example, `ceil(3.123)` returns 4.
-
-###### floor
-
-Floor rounds the number down to the nearest integer value. For example, `floor(3.123)` returns 3.
-
-#### Reduce
-
-Reduce takes one or more time series returned from a query or an expression and turns each series into a single number. The labels of the time series are kept as labels on each outputted reduced number.
-
-**Fields:**
-
- **Function -** The reduction function to use
- **Input -** The variable (refID (such as `A`)) to resample
- **Mode -** Allows control behavior of reduction function when a series contains non-numerical values (null, NaN, +\-Inf)
-
-##### Reduction Functions
-
-###### Count
-
-Count returns the number of points in each series.
-
-###### Mean
-
-Mean returns the total of all values in each series divided by the number of points in that series. In `strict` mode if any values in the series are null or nan, or if the series is empty, NaN is returned.
-
-###### Min and Max
-
-Min and Max return the smallest or largest value in the series respectively. In `strict` mode if any values in the series are null or nan, or if the series is empty, NaN is returned.
-
-###### Sum
-
-Sum returns the total of all values in the series. If series is of zero length, the sum will be 0. In `strict` mode if there are any NaN or Null values in the series, NaN is returned.
-
-##### Last
-
-Last returns the last number in the series. If the series has no values then returns NaN.
-
-##### Reduction Modes
-
-###### Strict
-
-In Strict mode the input series is processed as is. If any values in the series are non-numeric (null, NaN or +\-Inf), NaN is returned.
-
-###### Drop Non-Numeric
-
-In this mode all non-numeric values (null, NaN or +\-Inf) in the input series are filtered out before executing the reduction function.
-
-###### Replace Non-Numeric
-
-In this mode all non-numeric values are replaced by a pre-defined value.
-
-#### Resample
-
-Resample changes the time stamps in each time series to have a consistent time interval. The main use case is so you can resample time series that do not share the same timestamps so math can be performed between them. This can be done by resample each of the two series, and then in a Math operation referencing the resampled variables.
-
-**Fields:**
-
- **Input -** The variable of time series data (refID (such as `A`)) to resample
- **Resample to -** The duration of time to resample to, for example `10s`. Units may be `s` seconds, `m` for minutes, `h` for hours, `d` for days, `w` for weeks, and `y` of years.
- **Downsample -** The reduction function to use when there are more than one data point per window sample. See the reduction operation for behavior details.
- **Upsample -** The method to use to fill a window sample that has no data points.
-  - **pad** fills with the last know value
-  - **backfill** with next known value
-  - **fillna** to fill empty sample windows with NaNs
-
-## Write an expression
-
-If your data source supports them, then Grafana displays the **Expression** button and shows any existing expressions in the query editor list.
-
-For more information about expressions, refer to [About expressions](#about-expressions).
-
-1. Open the panel.
-1. Below the query, click **Expression**.
-1. In the **Operation** field, select the type of expression you want to write.
-
-   For more information about expression operations, refer to [About expressions](#about-expressions).
-
-1. Write the expression.
-1. Click **Apply**.
-
-## Special cases
-
-When any queried data source returns no series or numbers, the expression engine returns `NoData`. For example, if a request contains two data source queries that are merged by an expression, if `NoData` is returned by at least one of the data source queries, then the returned result for the entire query is `NoData`.
-
-For more information about how [Grafana Alerting](ref:grafana-alerting) processes `NoData` results, refer to [No data and error handling](ref:no-data-and-error-handling).
-
-In the case of using an expression on multiple queries, the expression engine requires that all of the queries return an identical timestamp. For example, if using math to combine the results of multiple SQL queries which each use `SELECT NOW() AS "time"`, the expression will only work if all queries evaluate `NOW()` to an identical timestamp; which does not always happen. To resolve this, you can replace `NOW()` with an arbitrary time, such as `SELECT 1 AS "time"`, or any other valid UNIX timestamp.
--- a/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/troubleshoot-expressions.md
+++ b/docs/sources/visualizations/panels-visualizations/query-transform-data/expression-queries/troubleshoot-expressions.md
@@ -0,0 +1,507 @@
+---
+aliases:
+labels:
+  products:
+    - cloud
+    - enterprise
+    - oss
+menuTitle: Troubleshoot expressions
+title: Troubleshoot Grafana expressions
+description: Debug and resolve common issues when working with Grafana expressions
+weight: 50
+refs:
+  grafana-expressions:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/visualizations/panels-visualizations/query-transform-data/expression-queries/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/visualizations/panels-visualizations/query-transform-data/expression-queries/
+  transformations:
+    - pattern: /docs/grafana/
+      destination: /docs/grafana/<GRAFANA_VERSION>/panels-visualizations/query-transform-data/transform-data/
+    - pattern: /docs/grafana-cloud/
+      destination: /docs/grafana/<GRAFANA_VERSION>/panels-visualizations/query-transform-data/transform-data/
+---
+
+# Troubleshoot Grafana expressions
+
+This guide helps you diagnose and resolve common issues when working with expressions.
+
+## Debug expressions
+
+When an expression doesn't produce the expected results, use these strategies to identify the problem.
+
+### Test expressions step by step
+
+Break complex expressions into smaller pieces and verify each step:
+
+1. **Test individual queries first:** Ensure each data source query returns the expected data before adding expressions.
+1. **Add expressions incrementally:** Start with a simple expression and gradually add complexity.
+1. **Use separate panels for testing:** Create a temporary panel to test expressions in isolation.
+1. **Check intermediate results:** Add expressions at each step of your calculation to see intermediate values.
+
+**Example:**
+
+Instead of creating `($A - $B) / $C * 100` immediately, build it incrementally:
+
+- Expression D: `$A - $B` (verify the subtraction works)
+- Expression E: `$D / $C` (verify the division works)
+- Expression F: `$E * 100` (final percentage)
+
+Once working, you can collapse them into a single expression if desired.
+
+### Verify RefID references
+
+Ensure you're referencing the correct queries and expressions:
+
+- RefIDs are case-sensitive: `$A` is different from `$a`
+- Check that RefIDs haven't changed after reordering queries
+- Use `${RefID}` syntax for RefIDs with spaces or special characters
+
+### Check data types
+
+Expressions expect specific data types. Verify your queries return time series or numbers, not tables or other formats.
+
+**Common issues:**
+
+- SQL queries returning multiple columns (expressions need one value column)
+- Queries returning string data instead of numbers
+- Empty result sets that appear as NoData
+
+### Inspect labels
+
+Use the Table view in panels to see the labels on your series and verify they match as expected.
+
+**What to check:**
+
+- Do series from different queries have compatible labels for joining?
+- Are label names spelled consistently across queries?
+- Are there unexpected extra labels preventing matches?
+
+## Common errors and solutions
+
+The following sections describe common errors and how to resolve them.
+
+### "NoData" result
+
+**Problem:** Your expression returns NoData even though some queries have data.
+
+**Causes and solutions:**
+
+- **One query returns no data:** If any query in an expression returns NoData, the entire expression returns NoData. Check that all queries have data for the selected time range.
+- **Mismatched time ranges:** Ensure all queries use compatible time ranges. A query with "Last 5 minutes" can't combine with a query using "Last 24 hours" without adjustments.
+- **Backend data source required:** Expressions only work with backend data sources. Check that you're not using browser-based data sources.
+
+**Solution:**
+
+Test each query independently to identify which one returns NoData, then investigate why that query has no data.
+
+### No series match in math operations
+
+**Problem:** Math expression like `$A + $B` returns no data, but both queries return data.
+
+**Causes and solutions:**
+
+- **Label mismatch:** Series from `$A` and `$B` have different labels that prevent automatic matching.
+  
+  **Example:** `$A` has `{host="web01", region="us-east"}` but `$B` has `{server="web01", region="us-east"}`. The different label names (`host` vs `server`) prevent matching.
+  
+  **Solution:** Modify your queries to use consistent label names, or ensure one set of series has no labels (which matches anything).
+
+- **No overlapping timestamps:** Time series need matching timestamps for math operations.
+  
+  **Solution:** Use the Resample operation to align timestamps to a common interval.
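+
+In Prometheus, one way to make label names consistent is the `label_replace` function. The following sketch assumes that `$B` comes from a hypothetical `http_requests_total` metric labeled with `server`. It copies that value into a `host` label and then aggregates so that only `host` and `region` remain:
+
+```promql
+sum by (host, region) (
+  label_replace(rate(http_requests_total[5m]), "host", "$1", "server", "(.*)")
+)
+```
+
+With this change, both queries return series labeled `{host, region}`, so the Math operation can join them.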
+
+### Timestamp mismatch errors
+
+**Problem:** Combining results from multiple SQL queries fails because timestamps don't align.
+
+**Example:**
+
+```sql
+-- Query A
+SELECT NOW() AS "time", COUNT(*) as "errors" FROM error_log;
+
+-- Query B  
+SELECT NOW() AS "time", COUNT(*) as "requests" FROM request_log;
+```
+
+These queries may execute at slightly different times, producing different timestamps.
+
+**Solution 1 - Use fixed timestamps:**
+
+```sql
+-- Query A
+SELECT 1 AS "time", COUNT(*) as "errors" FROM error_log;
+
+-- Query B
+SELECT 1 AS "time", COUNT(*) as "requests" FROM request_log;
+```
+
+**Solution 2 - Use consistent time references:**
+
+Ensure all queries evaluate time identically by using the same timestamp variable or function.
+
+**Solution 3 - Use Resample:**
+
+Add Resample operations to align both series to a common interval before performing math.
+
+### Math operations produce unexpected nulls or NaN
+
+**Problem:** Expression results contain null or NaN values unexpectedly.
+
+**Causes and solutions:**
+
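+- **Division by zero:** Dividing a non-zero value by zero produces infinity, and `0 / 0` produces NaN. Guard the denominator with a comparison, which returns 1 or 0: `($A != 0) * ($B / ($A + ($A == 0)))` returns 0 when `$A` is 0.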
+- **Logarithm of negative numbers:** `log()` of negative values returns NaN.
+- **Operations on null values:** Math operations involving null typically produce null.
+
+**Solution:**
+
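+Use data quality functions to flag problematic values. For example, the following returns `1` for valid numbers and `0` for null, NaN, or infinite values:
+
+```
+is_number($A)
+```
+
+To remove or replace those values before calculations, use Reduce with "Drop non-numeric" or "Replace non-numeric" mode.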
+
+### Reduce returns NaN in strict mode
+
+**Problem:** Reduce operation returns NaN even though most data points are valid.
+
+**Cause:** Strict mode returns NaN if _any_ value in the series is null, NaN, or infinity.
+
+**Solution:**
+
+Change the reduction mode:
+
+- **Drop non-numeric:** Ignores invalid values and calculates from valid ones
+- **Replace non-numeric:** Substitutes a specific value for invalid data points
+
+Use Strict mode only when data quality is critical and you want to know if any values are invalid.
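+
+For illustration, consider a hypothetical series with the values `10, 20, null, 30` that is reduced with **Mean**:
+
+```
+Strict:                     NaN (the series contains a null)
+Drop non-numeric:           (10 + 20 + 30) / 3 = 20
+Replace non-numeric with 0: (10 + 20 + 0 + 30) / 4 = 15
+```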
+
+### Expression works in panel but fails in alerting
+
+**Problem:** Expression displays correctly in a panel but produces errors or unexpected results in alert rules.
+
+**Causes and solutions:**
+
+- **Time range differences:** Alerts use specific time ranges that may differ from your panel's time range. Verify the alert's time range settings.
+- **Data availability:** Data may be available when viewing the panel but missing when the alert evaluates.
+- **Reduce required for alerting:** Most alert conditions need single values. Ensure you're using Reduce to convert time series to numbers for threshold comparisons.
+
+**Solution:**
+
+Test your expression in a panel using the same time range as your alert rule.
+
+## Work with timestamps
+
+Timestamps are a common source of issues when working with expressions. This section describes how to handle them.
+
+### Understand timestamp alignment
+
+Math operations between time series require matching timestamps. If series `$A` has points at `10:00:00`, `10:00:30`, `10:01:00` and series `$B` has points at `10:00:15`, `10:00:45`, `10:01:15`, the operation `$A + $B` produces no results because no timestamps match exactly.
+
+### When to resample
+
+Use Resample when:
+
+- Combining data from sources with different collection intervals
+- One data source reports irregularly while another reports at fixed intervals
+- You need to ensure timestamps align for math operations
+- You want to normalize data to a consistent interval for visualization
+
+### Resample strategies
+
+**Downsample (reducing frequency):**
+
+When going from higher to lower frequency (for example, 10s intervals to 1m intervals), choose an appropriate reduction function:
+
+- **Mean:** For averaging values (CPU percentage, temperature)
+- **Max:** For peak values (maximum memory usage)
+- **Min:** For minimum values (lowest throughput)
+- **Sum:** For accumulating values (request counts, error totals)
+
+**Upsample (increasing frequency):**
+
+When going from lower to higher frequency (for example, 1m intervals to 10s intervals), choose a fill strategy:
+
+- **Pad (forward fill):** Assumes value stays constant until next measurement (good for state data)
+- **Backfill:** Uses next known value (less common, use when future values inform past state)
+- **fillna:** Inserts NaN for unknown intervals (explicit about missing data)
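+
+A possible setup that aligns two series before dividing them is sketched below. The `1m` interval, the RefIDs, and the chosen downsample and upsample options are assumptions for illustration:
+
+```
+Expression C (Resample): Input $A, Resample to 1m, Downsample Mean, Upsample pad
+Expression D (Resample): Input $B, Resample to 1m, Downsample Mean, Upsample pad
+Expression E (Math):     $C / $D
+```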
+
+### SQL timestamp best practices
+
+When writing SQL queries for use with expressions:
+
+**Do:**
+
+- Use consistent timestamp columns across queries
+- Round or truncate timestamps to a common interval if needed
+- Use fixed timestamps for non-time-based aggregations
+
+```sql
+-- Good: Consistent time bucket
+SELECT 
+  DATE_TRUNC('minute', timestamp) AS "time",
+  COUNT(*) as "value"
+FROM events
+GROUP BY 1
+ORDER BY 1;
+```
+
+**Don't:**
+
+- Use `NOW()` or `CURRENT_TIMESTAMP`, which vary between query executions
+- Mix different timestamp columns in related queries
+- Return data without a time column for time series expressions
+
+## Handle missing data
+
+Understanding how expressions handle missing data helps you build robust dashboards and alerts.
+
+### NoData propagation
+
+When any query in an expression returns NoData, the entire expression result is NoData. This is by design to prevent calculations on incomplete data.
+
+**Example:**
+
+```
+Expression: $A / $B
+- Query A returns: 100
+- Query B returns: NoData
+- Expression result: NoData (not 100, not error)
+```
+
+### Strategies for missing data
+
+**1. Use default values:**
+
+Modify your data source queries to return zero or a default value instead of no data.
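+
+In Prometheus, one common way to do this is to append `or vector(0)` to an aggregated query. This sketch assumes the aggregation removes all labels, so the fallback series has the same (empty) label set:
+
+```promql
+sum(rate(http_errors_total[5m])) or vector(0)
+```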
+
+**2. Build conditional logic:**
+
+Use multiple expressions to check for data availability before performing calculations.
+
+**3. Adjust time ranges:**
+
+Ensure queries use time ranges likely to have data. If a service only reports every 5 minutes, don't query the last 1 minute.
+
+**4. Configure alert NoData handling:**
+
+In alerting, you can configure which state a rule enters when a query returns NoData: **No Data**, **Alerting**, **Normal**, or **Keep Last State**.
+
+### Missing data points vs NoData
+
+**Missing data points:** Some points in a time series are null or absent, but the series exists.
+
+- Handle with Reduce modes (Drop non-numeric, Replace non-numeric)
+- Use data quality functions: `is_null($A)`, `is_number($A)`
+
+**NoData:** No series returned at all from a query.
+
+- Check query syntax and time range
+- Verify data exists in the data source
+- Ensure data source is reachable
+
+## Performance considerations
+
+Expressions run on the Grafana server, so understanding performance implications helps you build efficient dashboards and alerts.
+
+### When expressions are inefficient
+
+**Large data volumes:**
+
+- Pulling millions of data points to Grafana for simple aggregations
+- Better: Perform aggregation in the data source query
+
+**Repeated operations:**
+
+- Running the same calculation across many panels
+- Better: Consider recording rules (Prometheus) or continuous queries (InfluxDB)
+
+**Complex nested expressions:**
+
+- Long chains of expressions that could be simplified
+- Better: Simplify the expression or move logic to data source
+
+### Optimization strategies
+
+**1. Push processing to data sources:**
+
+Instead of:
+
+```
+Query A: SELECT value FROM metrics
+Expression B: Reduce(Mean, $A)
+Expression C: $B > 100
+```
+
+Do this in the data source:
+
+```
+Query A: SELECT AVG(value) FROM metrics
+Expression B: $A > 100
+```
+
+**2. Use appropriate time ranges:**
+
+- Don't query years of data when hours suffice
+- Match time ranges to your actual analysis needs
+- Use relative time ranges for consistent performance
+
+**3. Reduce data points before math:**
+
+If you only need a single value for alerting, reduce first then perform math rather than calculating across every point:
+
+**Less efficient:**
+
+```
+Expression A: $QueryA * 100 (multiplies every point)
+Expression B: Reduce(Mean, $A)
+```
+
+**More efficient:**
+```
+Expression A: Reduce(Mean, $QueryA)
+Expression B: $A * 100 (multiplies one value)
+```
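
The following sketch checks when the two orderings agree. It uses plain Python lists in place of a time series and `statistics.fmean` in place of a Mean reduction; the sample points are arbitrary. Scaling by a constant commutes with the mean, while a non-linear operation such as squaring does not.

```python
from statistics import fmean

points = [1.0, 2.0, 3.0, 4.0]

# Linear operation: both orderings agree, but the second does one multiplication.
math_then_reduce = fmean([p * 100 for p in points])   # multiplies every point
reduce_then_math = fmean(points) * 100                # multiplies one value

# Non-linear operation: the orderings disagree, so they are not interchangeable.
square_then_reduce = fmean([p ** 2 for p in points])  # mean of squares
reduce_then_square = fmean(points) ** 2               # square of the mean

print(math_then_reduce, reduce_then_math)
print(square_then_reduce, reduce_then_square)
```

Before reordering a chain of expressions for performance, confirm that the math step is linear, or that the reduced-first result is what you intend to alert on.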
+
+**4. Limit label cardinality:**
+
+High-cardinality labels (many unique values) multiply the number of series. If querying metrics with thousands of unique host labels, consider aggregating in the data source.
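
As a rough way to reason about this, the number of series is bounded by the product of the unique values of each label. The helper below is a hypothetical back-of-the-envelope calculation, and the label names and counts are made-up examples. Real series counts are often lower because not every combination exists.

```python
from math import prod
from typing import Mapping


def max_series_count(unique_values_per_label: Mapping[str, int]) -> int:
    """Upper bound on series count: product of unique values per label."""
    return prod(unique_values_per_label.values())


# Hypothetical metric with three labels.
before = max_series_count({"host": 2000, "region": 5, "status": 4})

# Aggregating away the host label in the data source leaves far fewer series.
after = max_series_count({"region": 5, "status": 4})

print(before, after)
```

Dropping a single high-cardinality label in the data source query shrinks the upper bound by that label's full cardinality.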
+
+### Monitor expression performance
+
+Watch for these warning signs:
+
+- Panels take more than 2 to 3 seconds to load
+- Query inspector shows expressions processing thousands of series
+- Grafana server CPU spikes when loading dashboards
+- Alert rule evaluation regularly approaches or exceeds the rule's evaluation interval
+
+If you see these issues, review your expressions for optimization opportunities.
+
+## Expressions vs transformations
+
+Both expressions and transformations manipulate query data, but they serve different purposes and have different capabilities.
+
+### When to use expressions
+
+Use expressions when:
+
+- **Server-side processing required:** Alerting requires server-side evaluation
+- **Cross-data-source operations:** Combining data from different data sources
+- **Label-based matching:** Automatic series matching based on labels
+- **Simple math and aggregations:** Basic calculations and reductions
+- **Backend data sources:** Working with backend/server-side data sources
+
+**Advantages:**
+
+- Work in alerting rules
+- Operate on data before visualization
+- Support cross-data-source calculations
+- Preserve label-based series relationships
+
+**Limitations:**
+
+- Only work with backend data sources
+- Few operation types (for example, Math, Reduce, and Resample)
+- Less flexible than transformations for complex data reshaping
+- Can't modify table structures significantly
+
+### When to use transformations
+
+Use transformations when:
+
+- **Complex data reshaping:** Pivoting, merging, or restructuring data
+- **Table operations:** Working with tabular data formats
+- **Field manipulation:** Renaming, organizing, or filtering fields
+- **Client-side processing is enough:** Visualization changes that don't affect alerting
+- **Advanced processing:** Operations not available in expressions
+
+**Advantages:**
+
+- More operation types available
+- Better for complex table manipulations
+- Work with any data source (including browser-based)
+- More flexible field and column operations
+- Can dramatically reshape data structures
+
+**Limitations:**
+
+- Don't work in alerting (client-side only)
+- Can't combine different data sources
+- Process data only after query results reach the browser, so they can't reduce the amount of data transferred
+- Don't preserve complex label relationships
+
+### Comparison table
+
+| Feature | Expressions | Transformations |
+|---------|------------|-----------------|
+| Works in alerts | Yes | No |
+| Combines data sources | Yes | No |
+| Available operations | Few (for example, Math, Reduce, Resample) | 20+ types |
+| Execution | Server-side | Client-side (browser) |
+| Data source support | Backend only | All data sources |
+| Label matching | Automatic | Manual |
+| Table operations | Limited | Extensive |
+| Performance | Uses server resources | Uses browser resources |
+
+### Use both together
+
+You can use expressions and transformations in the same panel:
+
+1. Expressions run first (server-side)
+1. Transformations run after (client-side)
+
+**Example workflow:**
+
+- Query A: Prometheus metric
+- Query B: SQL query
+- Expression C: Combine `$A` and `$B` (server-side)
+- Transformation: Rename fields, organize columns (client-side)
+
+This approach keeps cross-data-source math on the server, where alerting can also use it, and leaves presentation changes to the browser.
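
The ordering in the example workflow can be sketched as two plain Python functions. Everything here is a simplified stand-in: `combine_by_labels` pairs series only when their label sets are identical, which is stricter than Grafana's actual matching rules, and `rename_fields` imitates a rename-style transformation. The host names and values are invented for the illustration.

```python
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]


def labels(**kv: str) -> Labels:
    """Build a hashable label set, for example labels(host='web-1')."""
    return frozenset(kv.items())


def combine_by_labels(a: Dict[Labels, float], b: Dict[Labels, float]) -> Dict[Labels, float]:
    """Stand-in for a server-side `$A / $B`: divide values with identical labels."""
    return {k: a[k] / b[k] for k in a if k in b and b[k] != 0}


def rename_fields(result: Dict[Labels, float], label_name: str) -> Dict[str, float]:
    """Stand-in for a client-side transformation: name each field by one label."""
    return {dict(k)[label_name]: v for k, v in result.items()}


# Step 1 (server-side): reduced values from two queries, keyed by labels.
query_a = {labels(host="web-1"): 50.0, labels(host="web-2"): 30.0}
query_b = {labels(host="web-1"): 200.0, labels(host="web-2"): 120.0}
expression_c = combine_by_labels(query_a, query_b)

# Step 2 (client-side): tidy the output for display.
display = rename_fields(expression_c, "host")
print(display)
```

Only the first step is available to alert rules. The renaming in the second step affects what the panel shows and nothing else.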
+
+### Migration considerations
+
+**From transformations to expressions:**
+
+Consider this when:
+
+- You need the same logic in alerting
+- You're combining data sources
+- Server-side processing would improve performance
+
+**Limitations:**
+
+- May need to redesign complex transformations
+- Some transformation operations have no expression equivalent
+- Need backend data sources
+
+**From expressions to transformations:**
+
+Consider this when:
+
+- You need more complex data manipulation
+- You're working with browser-based data sources
+- You need advanced table operations
+
+**Limitations:**
+
+- Can't use in alerting
+- Can't combine different data sources
+- May need to change query structure
+
+## Get help
+
+If you're still experiencing issues after trying these troubleshooting steps:
+
+1. **Check the Query inspector:** Click **Query inspector** to see raw query results and expression outputs
+1. **Review Grafana logs:** Server-side expression errors appear in Grafana server logs
+1. **Simplify and isolate:** Create a minimal example that reproduces the issue
+1. **Community resources:** Search or post in the Grafana community forums
+1. **Documentation:** Refer to [Grafana Expressions](ref:grafana-expressions) for detailed operation documentation
+
+When asking for help, include:
+
+- Grafana version
+- Data source type and version
+- Simplified example of your queries and expressions
+- Expected vs actual results
+- Any error messages from Query inspector or logs
+
Author	SHA1	Message	Date
Larissa Wandzura	a18a59c7be	Some punctuation and style updates	2025-12-03 09:23:26 -06:00
Larissa Wandzura	5d98a41b3b	revamp of expressions doc draft 1	2025-12-02 14:56:11 -06:00
Larissa Wandzura	7644072c74	initial concept definition and use cases	2025-12-02 08:53:45 -06:00