InfluxDB queries, a guide

I've been playing with InfluxDB lately. One of the problems I'm facing is getting what I need out of it. Which means exploring the query language. The documentation needs some polishing in spots, so I may submit a PR to it once I get something worked up. But until then, enjoy some googlebait about how the SELECT syntax works, and what you can do with it.

Rule 1: Never, ever put a WHERE condition that involves 'value'. Value is not indexed. Doing so will cause table-scans, and for a database that can legitimately contain over a billion rows, that's bad. Don't do it.
Rule 2: No joins.

With that out of the way, have some progressively more complex queries to explain how the heck this all works!


Return a list of values.

Dump everything in a measurement, going back as far as you have data. You almost never want to do this

SELECT value FROM site_hits

The one exception to this rule, is if you're pulling out something like an event stream, where events are encoded as tags-values.

SELECT event_text, value FROM eventstream

Return a list of values from a measurement, with given tags.

One of the features of InfluxDB, is that you can tag values in a measurement. These function like extra fields in a database row, but you still can't join on them. The syntax for this should not be surprising.

SELECT value FROM site_hits WHERE webapp = 'api' AND environment = 'prod'

Return a list of values from a measurement, with given tags that match a regex.

Yes, you can use regexes in your WHERE clauses.

SELECT value FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod'


That's cool and all, but the real power of InfluxDB comes with the aggregation functions and grouping. This is what allows you to learn what the max value was for a given measurement over the past 30 minutes, and other useful things. These yield time-series that can be turned into nice charts.

Return a list of values, grouped by application

This is the first example of GROUP BY, and isn't one you'll probably ever need to use. This will emit multiple time-series.

SELECT value FROM site_hits where webapp =~ /^api_[a-z]*/ AND environment = 'prod' GROUP BY webapp

Return a list of values, grouped by time into 10 minute buckets

When using time for a GROUP BY value, you must provide an aggregation function! This will add together all of the hits in the 10 minute bucket into a single value, returning a time-stream of 10 minute buckets of hits.

SELECT sum(value) FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod' GROUP BY time(10m)

Return a list of values, grouped by both web-server and time into 10 minute buckets

This does the same thing as the previous, but will yield multiple time-series. Some graphing packages will helpfully chart multiple lines based on this single query. Handy, especially if servername changes on a daily basis as new nodes are added and removed.

SELECT sum(value) FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod' GROUP BY time(10m), servername

Return a list of values, grouped by time into 10 minute buckets, for data receive in the last 24 hours.

This adds a time-based condition to the WHERE clause. To keep the line shorter, we're not going to group on servername.

SELECT sum(value) FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod' AND time > now() - 24h GROUP BY time(10m)


There is one more trick InfluxDB can do, and this isn't documented very well. InfluxDB can partition data in a database into retention policies. There is a default retention policy on each database, and if you don't specify a retention-policy to query from, you are querying the default. All of the above examples are querying the default retention-policy.

By using continuous queries you can populate other retention policies with data from the default policy. Perhaps your default policy keeps data for 6 weeks at 10 second granularity, but you want to keep another policy for 1 minute granularity for six months, and another policy for 10 minute granularity for two years. These queries allow you to do that.

Querying data from a non-default retention policy is done like this:

Return 14 weeks of hits to API-type webapps, in 1 hour buckets

SELECT sum(value) FROM "6month".site_hits WHERE webapp =~ /api_[a-z]*/ AND environment = 'prod' AND time > now() - 14w GROUP BY time(1h)

The same could be done for "18month", if that policy was on the server.