I wanted a way to view time-series data — There’s always gnuplot but I wanted to learn grafana since it is becoming an in-demand skill. But how to view my logs in grafana? Grafana doesn’t natively read csv files (I have managed to convert the relevant info in my logs to csv). After a little surveying, I found a way to get some csv files into it — via influxdb. Grafana has a built-in “data source” for influxdb. If I could just get my csv data into influxdb, I’d be set.

Here are my notes.

Influxdb is a time-series database. It is not an SQL database and it does not correspond one-to-one with SQL database concepts. However … I do know a little about SQL databases. So …

Influxdb has a database-managing process (influxdb), and a separate client (influx). The server also incorporates a HTTP-responding engine into the database for handling queries — by default at http://localhost:8086.

Being a time-series database, timestamps are an important part of the data. The database stores timestamps in a canonical format — to nanosecond precision. When feeding the data into the database, if your timestamps have less precision, you can tell the client what precision the timestamps are (milliseconds? seconds?) and it will do the conversion for you.

In addition to indexing the data by timestamp, there is also a “data” field — that is a label which pulls together a bunch of like data. So for instance, there could be a collection of “stock price” data. But that data can consist of a set of related data at each timestamp — such as “type” (could be a string indicating which stock), and multiple price-like fields like open, close, high, low. These could be the open, close, high and low prices for the interval from the last timestamp to the current timestamp.

In addition, each of these collections of data can have tags assigned to it. The query language allows you to select by time, time range, tags, fields, and data name. Among many other things : -) I’m still at a very early stage of learning so I’m sure there are lots of things the query language can do that I haven’t named. I’m only saying what I’ve learned so far.

In this early stage of my learning, trying to relate what I have learned to SQL databases with which I am already a little familiar, I’m thinking of the data name (like “price”) as the table name, and the individual key-values (like open, close, high, low) as the columns or attributes of the table. The tags might be more attributes in the table — or they might have extra functionality beyond that — I just don’t know yet. And at the moment this suffices for my purposes.

Anyway on to the examples. Based on influxdata’s blog post.

My data was in csv format, here is a script to convert to “line protocol” which is what influxdb reads.

The csv:

localhost$ cat data/BTC_sm_ns.csv                       
time,close,high,low,open,volume
1483228740,968.29,968.29,968.29,968.29,0
1483228800,968.7,968.76,968.49,968.29,12993
1483228860,968.43,968.7,967.2,968.7,73800
1483228920,967.21,968,967.21,968.43,3500
1483228980,966.97,967.21,966.74,967.21,15969
1483229040,966.97,966.87,966.97,966.97,300
1483229100,967,967,967,966.97,13231
1483229160,966.89,966.89,966.89,967,500
1483229220,966.89,966.89,966.89,966.89,0
1483229280,966.89,966.89,966.89,966.89,0
1483229340,966.89,966.89,966.89,966.89,0
1483229400,966.89,966.89,966.89,966.89,0

The script. This is not a generic any-csv-to-line-protocol — it makes assumptions about the data — note the hard-coded data name “price”, the hard-coded tag “BTC” which came from the CSV file name — also the conversion to nano-second timestamps (the original example did this — I learned about the time-stamp-converting influxdb client later):

#!/usr/bin/env python

import pandas as pd
#convert csv's to line protocol

#convert sample data to line protocol (with nanosecond precision)
df = pd.read_csv('data/BTC_sm_ns.csv')
lines = ['price'
         + ',type=BTC'
         + ' '
         + 'close=' + str(df['close'][d]) + ','
         + 'high=' + str(df['high'][d]) + ','
         + 'low=' + str(df['low'][d]) + ','
         + 'open=' + str(df['open'][d]) + ','
         + 'volume=' + str(df['volume'][d])
         + ' ' + str(df['time'][d])+'000000000' for d in range(len(df))]
thefile = open('data/chronograf.txt', 'w')
for item in lines:
    thefile.write('%s\n' % item)

The resulting “line syntax”, with a little query language preamble tacked on after the conversion was done:

localhost$ cat data/chronograf.txt                       

# DDL
CREATE DATABASE import

# DML
# CONTEXT-DATABASE: import

price,type=BTC close=968.29,high=968.29,low=968.29,open=968.29,volume=0 483228740000000000
price,type=BTC close=968.7,high=968.76,low=968.49,open=968.29,volume=12993 1483228800000000000
price,type=BTC close=968.43,high=968.7,low=967.2,open=968.7,volume=73800 1483228860000000000
price,type=BTC close=967.21,high=968.0,low=967.21,open=968.43,volume=3500 1483228920000000000
price,type=BTC close=966.97,high=967.21,low=966.74,open=967.21,volume=15969 1483228980000000000
price,type=BTC close=966.97,high=966.87,low=966.97,open=966.97,volume=300 1483229040000000000
price,type=BTC close=967.0,high=967.0,low=967.0,open=966.97,volume=13231 1483229100000000000
price,type=BTC close=966.89,high=966.89,low=966.89,open=967.0,volume=500 1483229160000000000
price,type=BTC close=966.89,high=966.89,low=966.89,open=966.89,volume=0 1483229220000000000
price,type=BTC close=966.89,high=966.89,low=966.89,open=966.89,volume=0 1483229280000000000
price,type=BTC close=966.89,high=966.89,low=966.89,open=966.89,volume=0 1483229340000000000
price,type=BTC close=966.89,high=966.89,low=966.89,open=966.89,volume=0 1483229400000000000

Import it using influxdb client CLI:

influx -import -path=data/chronograf.txt
influx -import -path=NOAA_data.txt -precision=s -database=NOAA_water_database

View using the REST GET — can use a browser or curl — will return json

http://localhost:8086/query?db=import&q=select%20*%20from%20price

I need to do a little more playing with influxdb before moving on to viewing the data with grafana :

For future exploration — play with a bigger database:

https://docs.influxdata.com/influxdb/v1.7/query_language/data_download/

NOAA water database:

https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt