fixed linter gripes

created the troubleshooting guide
2025-12-20 19:44:55 +08:00 · 2025-12-12 14:47:11 -06:00 · 2025-12-12 14:23:52 -06:00
2 changed files with 309 additions and 0 deletions
--- a/docs/sources/datasources/prometheus/_index.md
+++ b/docs/sources/datasources/prometheus/_index.md
@@ -113,6 +113,7 @@ The following documentation will help you get started working with Prometheus an
 - [Configure the Prometheus data source](ref:configure-prometheus-data-source)
 - [Prometheus query editor](query-editor/)
 - [Template variables](template-variables/)
+- [Troubleshooting](troubleshooting/)

 ## Exemplars

--- a/docs/sources/datasources/prometheus/troubleshooting/index.md
+++ b/docs/sources/datasources/prometheus/troubleshooting/index.md
@@ -0,0 +1,308 @@
+---
+aliases:
+  - ../../data-sources/prometheus/troubleshooting/
+description: Troubleshooting the Prometheus data source in Grafana
+keywords:
+  - grafana
+  - prometheus
+  - troubleshooting
+  - errors
+  - promql
+labels:
+  products:
+    - cloud
+    - enterprise
+    - oss
+menuTitle: Troubleshooting
+title: Troubleshoot issues with the Prometheus data source
+weight: 600
+---
+
+# Troubleshoot issues with the Prometheus data source
+
+This document provides troubleshooting information for common errors you may encounter when using the Prometheus data source in Grafana.
+
+## Connection errors
+
+The following errors occur when Grafana cannot establish or maintain a connection to Prometheus.
+
+### Failed to connect to Prometheus
+
+**Error message:** "There was an error returned querying the Prometheus API"
+
+**Cause:** Grafana cannot establish a network connection to the Prometheus server.
+
+**Solution:**
+
+1. Verify that the Prometheus server URL is correct in the data source configuration.
+1. Check that Prometheus is running and accessible from the Grafana server.
+1. Ensure the URL includes the protocol (`http://` or `https://`).
+1. Verify the port is correct (the Prometheus default port is `9090`).
+1. Ensure there are no firewall rules blocking the connection.
+1. If Grafana and Prometheus are running in separate containers, use the container IP address or hostname instead of `localhost`.
+1. For Grafana Cloud, ensure you have configured [Private data source connect](https://grafana.com/docs/grafana-cloud/connect-externally-hosted/private-data-source-connect/) if your Prometheus instance is not publicly accessible.
+
+### Request timed out
+
+**Error message:** "context deadline exceeded" or "request timeout"
+
+**Cause:** The connection to Prometheus timed out before receiving a response.
+
+**Solution:**
+
+1. Check the network latency between Grafana and Prometheus.
+1. Verify that Prometheus is not overloaded or experiencing performance issues.
+1. Increase the **Query timeout** setting in the data source configuration under **Interval behavior**.
+1. Reduce the time range or complexity of your query.
+1. Check if any network devices (load balancers, proxies) are timing out the connection.
+
+### Failed to parse data source URL
+
+**Error message:** "Failed to parse data source URL"
+
+**Cause:** The URL entered in the data source configuration is not valid.
+
+**Solution:**
+
+1. Verify the URL format is correct (for example, `http://localhost:9090` or `https://prometheus.example.com:9090`).
+1. Ensure the URL includes the protocol (`http://` or `https://`).
+1. Remove any trailing slashes or invalid characters from the URL.
+
+## Authentication errors
+
+The following errors occur when there are issues with authentication credentials or permissions.
+
+### Unauthorized (401)
+
+**Error message:** "401 Unauthorized" or "Authorization failed"
+
+**Cause:** The authentication credentials are invalid or missing.
+
+**Solution:**
+
+1. Verify that the username and password are correct if using basic authentication.
+1. Check that the authentication method selected matches your Prometheus configuration.
+1. If using a reverse proxy with authentication, verify the credentials are correct.
+1. For AWS SigV4 authentication, verify the IAM credentials and permissions.
+
+### Forbidden (403)
+
+**Error message:** "403 Forbidden" or "Access denied"
+
+**Cause:** The authenticated user does not have permission to access the requested resource.
+
+**Solution:**
+
+1. Verify the user has read access to the Prometheus API.
+1. Check Prometheus security settings and access control configuration.
+1. If using a reverse proxy, verify the proxy is not blocking the request.
+1. For AWS Managed Prometheus, verify the IAM policy grants the required permissions.
+
+## Query errors
+
+The following errors occur when there are issues with PromQL syntax or query execution.
+
+### Query syntax error
+
+**Error message:** "parse error: unexpected character" or "bad_data: 1:X: parse error"
+
+**Cause:** The PromQL query contains invalid syntax.
+
+**Solution:**
+
+1. Check your query syntax for typos or invalid characters.
+1. Verify that metric names and label names are valid identifiers.
+1. Ensure string values in label matchers are enclosed in quotes.
+1. Use the Prometheus expression browser to test your query directly.
+1. Refer to the [Prometheus querying documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/) for syntax guidance.
+
+### Unknown metric name
+
+**Error message:** "unknown metric name" or query returns no data
+
+**Cause:** The specified metric does not exist in Prometheus.
+
+**Solution:**
+
+1. Verify the metric name is spelled correctly.
+1. Check that the metric is being scraped by Prometheus.
+1. Use the Prometheus UI to browse available metrics at `/graph` or `/api/v1/label/__name__/values`.
+1. Verify the time range includes data for the metric.
+
+### Query timeout limit exceeded
+
+**Error message:** "query timed out in expression evaluation" or "query processing would load too many samples"
+
+**Cause:** The query took longer than the configured timeout limit or would return too many samples.
+
+**Solution:**
+
+1. Reduce the time range of your query.
+1. Add more specific label filters to limit the data scanned.
+1. Increase the **Query timeout** setting in the data source configuration.
+1. Use aggregation functions like `sum()`, `avg()`, or `rate()` to reduce the number of time series.
+1. Increase the `query.timeout` or `query.max-samples` settings in Prometheus if you have admin access.
+
+### Too many time series
+
+**Error message:** "exceeded maximum resolution of 11,000 points per timeseries" or "maximum number of series limit exceeded"
+
+**Cause:** The query is returning more time series or data points than the configured limits allow.
+
+**Solution:**
+
+1. Reduce the time range of your query.
+1. Add label filters to limit the number of time series returned.
+1. Increase the **Min interval** or **Resolution** in the query options to reduce the number of data points.
+1. Use aggregation functions to combine time series.
+1. Adjust the **Series limit** setting in the data source configuration under **Other settings**.
+
+### Invalid function or aggregation
+
+**Error message:** "unknown function" or "parse error: unexpected aggregation"
+
+**Cause:** The query uses an invalid or unsupported PromQL function.
+
+**Solution:**
+
+1. Verify the function name is spelled correctly and is a valid PromQL function.
+1. Check that you are using the correct syntax for the function.
+1. Ensure your Prometheus version supports the function you are using.
+1. Refer to the [PromQL functions documentation](https://prometheus.io/docs/prometheus/latest/querying/functions/) for available functions.
+
+## Configuration errors
+
+The following errors occur when the data source is not configured correctly.
+
+### Invalid Prometheus type
+
+**Error message:** Unexpected behavior when querying metrics or labels
+
+**Cause:** The **Prometheus type** setting does not match your actual Prometheus-compatible database.
+
+**Solution:**
+
+1. Open the data source configuration in Grafana.
+1. Under **Performance**, select the correct **Prometheus type** (Prometheus, Cortex, Mimir, or Thanos).
+1. Different database types support different APIs, so setting this incorrectly may cause unexpected behavior.
+
+### Scrape interval mismatch
+
+**Error message:** Data appears sparse or aggregated incorrectly
+
+**Cause:** The **Scrape interval** setting in Grafana does not match the actual scrape interval in Prometheus.
+
+**Solution:**
+
+1. Check your Prometheus configuration file for the `scrape_interval` setting.
+1. Update the **Scrape interval** in the Grafana data source configuration under **Interval behavior** to match.
+1. If the Grafana interval is higher than the Prometheus interval, you may see less data points than expected.
+
+## TLS and certificate errors
+
+The following errors occur when there are issues with TLS configuration.
+
+### Certificate verification failed
+
+**Error message:** "x509: certificate signed by unknown authority" or "certificate verify failed"
+
+**Cause:** Grafana cannot verify the TLS certificate presented by Prometheus.
+
+**Solution:**
+
+1. If using a self-signed certificate, enable **Add self-signed certificate** in the TLS settings and add your CA certificate.
+1. Verify the certificate chain is complete and valid.
+1. Ensure the certificate has not expired.
+1. As a temporary workaround for testing, enable **Skip TLS verify** (not recommended for production).
+
+### TLS handshake error
+
+**Error message:** "TLS: handshake failure" or "connection reset"
+
+**Cause:** The TLS handshake between Grafana and Prometheus failed.
+
+**Solution:**
+
+1. Verify that Prometheus is configured to use TLS.
+1. Check that the TLS version and cipher suites are compatible.
+1. If using client certificates, ensure they are correctly configured in the **TLS client authentication** section.
+1. Verify the server name matches the certificate's Common Name or Subject Alternative Name.
+
+## Other common issues
+
+The following issues don't produce specific error messages but are commonly encountered.
+
+### Empty query results
+
+**Cause:** The query returns no data.
+
+**Solution:**
+
+1. Verify the time range includes data in Prometheus.
+1. Check that the metric and label names are correct.
+1. Test the query directly in the Prometheus expression browser.
+1. Ensure label filters are not excluding all data.
+1. For rate or increase functions, ensure the time range is at least twice the scrape interval.
+
+### Slow query performance
+
+**Cause:** Queries take a long time to execute.
+
+**Solution:**
+
+1. Reduce the time range of your query.
+1. Add more specific label filters to limit the data scanned.
+1. Increase the **Min interval** in the query options.
+1. Check Prometheus server performance and resource utilization.
+1. Enable **Disable metrics lookup** in the data source configuration for large Prometheus instances.
+1. Enable **Incremental querying (beta)** to cache query results.
+1. Consider using recording rules to pre-aggregate frequently queried data.
+
+### Data appears delayed or missing recent points
+
+**Cause:** The visualization doesn't show the most recent data.
+
+**Solution:**
+
+1. Check the dashboard time range and refresh settings.
+1. Verify the **Scrape interval** is configured correctly.
+1. Ensure Prometheus has finished scraping the target.
+1. Check for clock synchronization issues between Grafana and Prometheus.
+1. For `rate()` and similar functions, remember that they need at least two data points to calculate.
+
+### Exemplars not showing
+
+**Cause:** Exemplar data is not appearing in visualizations.
+
+**Solution:**
+
+1. Verify that exemplars are enabled in the data source configuration under **Exemplars**.
+1. Check that your Prometheus version supports exemplars (2.26+).
+1. Ensure your instrumented application is sending exemplar data.
+1. Verify the tracing data source is correctly configured for the exemplar link.
+1. Enable the **Exemplars** toggle in the query editor.
+
+### Alerting rules not visible
+
+**Cause:** Prometheus alerting rules are not appearing in the Grafana Alerting UI.
+
+**Solution:**
+
+1. Verify that **Manage alerts via Alerting UI** is enabled in the data source configuration.
+1. Check that Prometheus has alerting rules configured.
+1. Ensure Grafana can access the Prometheus rules API endpoint.
+1. Note that for Prometheus (unlike Mimir), the Alerting UI only supports viewing existing rules, not creating new ones.
+
+## Get additional help
+
+If you continue to experience issues after following this troubleshooting guide:
+
+1. Check the [Prometheus documentation](https://prometheus.io/docs/) for API and PromQL guidance.
+1. Review the [Grafana community forums](https://community.grafana.com/) for similar issues.
+1. Contact Grafana Support if you are a Cloud Pro, Cloud Contracted, or Enterprise user.
+1. When reporting issues, include:
+   - Grafana version
+   - Prometheus version and type (Prometheus, Mimir, Cortex, Thanos)
+   - Error messages (redact sensitive information)
+   - Steps to reproduce
+   - Relevant configuration such as data source settings, query timeout, and TLS settings (redact tokens, passwords, and other credentials)
Author	SHA1	Message	Date
Larissa Wandzura	e81769147f	fixed linter gripes	2025-12-12 14:47:11 -06:00
Larissa Wandzura	70e49947a1	created the troubleshooting guide	2025-12-12 14:23:52 -06:00