I've been playing with InfluxDB lately. One of the problems I'm facing is getting what I need out of it. Which means exploring the query language. The documentation needs some polishing in spots, so I may submit a PR to it once I get something worked up. But until then, enjoy some googlebait about how the SELECT syntax works, and what you can do with it.
Rule 1: Never, ever put a WHERE condition that involves 'value'. Value is not indexed. Doing so will cause table-scans, and for a database that can legitimately contain over a billion rows, that's bad. Don't do it.
Rule 2: No joins.
With that out of the way, have some progressively more complex queries to explain how the heck this all works!
Return a list of values.
Dump everything in a measurement, going back as far as you have data. You almost never want to do this
SELECT value FROM site_hits
The one exception to this rule, is if you're pulling out something like an event stream, where events are encoded as tags-values.
SELECT event_text, value FROM eventstream
Return a list of values from a measurement, with given tags.
One of the features of InfluxDB, is that you can tag values in a measurement. These function like extra fields in a database row, but you still can't join on them. The syntax for this should not be surprising.
SELECT value FROM site_hits WHERE webapp = 'api' AND environment = 'prod'
Return a list of values from a measurement, with given tags that match a regex.
Yes, you can use regexes in your WHERE clauses.
SELECT value FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod'
That's cool and all, but the real power of InfluxDB comes with the aggregation functions and grouping. This is what allows you to learn what the max value was for a given measurement over the past 30 minutes, and other useful things. These yield time-series that can be turned into nice charts.
Return a list of values, grouped by application
This is the first example of GROUP BY, and isn't one you'll probably ever need to use. This will emit multiple time-series.
SELECT value FROM site_hits where webapp =~ /^api_[a-z]*/ AND environment = 'prod' GROUP BY webapp
Return a list of values, grouped by time into 10 minute buckets
When using time for a GROUP BY value, you must provide an aggregation function! This will add together all of the hits in the 10 minute bucket into a single value, returning a time-stream of 10 minute buckets of hits.
SELECT sum(value) FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod' GROUP BY time(10m)
Return a list of values, grouped by both web-server and time into 10 minute buckets
This does the same thing as the previous, but will yield multiple time-series. Some graphing packages will helpfully chart multiple lines based on this single query. Handy, especially if servername changes on a daily basis as new nodes are added and removed.
SELECT sum(value) FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod' GROUP BY time(10m), servername
Return a list of values, grouped by time into 10 minute buckets, for data receive in the last 24 hours.
This adds a time-based condition to the WHERE clause. To keep the line shorter, we're not going to group on servername.
SELECT sum(value) FROM site_hits WHERE webapp =~ /^api_[a-z]*/ AND environment = 'prod' AND time > now() - 24h GROUP BY time(10m)
There is one more trick InfluxDB can do, and this isn't documented very well. InfluxDB can partition data in a database into retention policies. There is a default retention policy on each database, and if you don't specify a retention-policy to query from, you are querying the default. All of the above examples are querying the default retention-policy.
By using continuous queries you can populate other retention policies with data from the default policy. Perhaps your default policy keeps data for 6 weeks at 10 second granularity, but you want to keep another policy for 1 minute granularity for six months, and another policy for 10 minute granularity for two years. These queries allow you to do that.
Querying data from a non-default retention policy is done like this:
Return 14 weeks of hits to API-type webapps, in 1 hour buckets
SELECT sum(value) FROM "6month".site_hits WHERE webapp =~ /api_[a-z]*/ AND environment = 'prod' AND time > now() - 14w GROUP BY time(1h)
The same could be done for "18month", if that policy was on the server.