Writing Slurpers

A slurper generally operates in three phases:

  1. waiting
  2. collecting
  3. posting data


Often phase one is performed by cron, so the slurper is simply a collector and a data poster,
launched periodically by cron. Other scripts want to, say, keep a database connection open, and
so keep running with a sleep() call between passes.
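
For a slurper that stays resident instead of relying on cron, the waiting phase
can be sketched as a loop around the other two phases. The names collect,
post_data and run_passes are placeholders made up for this illustration, and the
pass-count limit exists only so the sketch terminates; a real resident slurper
would loop until killed.

```shell
#!/bin/sh
# Sketch of a resident slurper loop. collect and post_data are hypothetical
# stand-ins for the real collecting and posting phases.

collect() {
    # A real slurper would run commands and set variables here.
    passes_done=$((passes_done + 1))
}

post_data() {
    # A real slurper would call curl here.
    echo "pass $passes_done posted"
}

# run_passes INTERVAL COUNT: do COUNT passes, sleeping INTERVAL seconds
# between them.
run_passes() {
    interval=$1
    count=$2
    passes_done=0
    while [ "$passes_done" -lt "$count" ]; do
        collect
        post_data
        # Phase one: wait before the next pass (skipped after the last one).
        if [ "$passes_done" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Calling run_passes 300 12, for example, would do twelve passes five minutes apart.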


Collecting generally involves reading the output of one or more commands, putting
values into variables so they're ready to post, and sometimes summing values
across users/groups/systems so you can post aggregate data.
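
As a rough sketch of the summing step, assuming the command output has already
been reduced to lines of the form "key value" (one per job, user, or system),
awk can total the values per key. The helper names sum_by_key and value_for are
invented for this example.

```shell
# sum_by_key: read "key value" lines on stdin and print "key total"
# for each distinct key, sorted by key.
sum_by_key() {
    awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' | sort
}

# value_for KEY: pick one key's total out of sum_by_key output on stdin,
# so it can be stored in a shell variable ready to post.
value_for() {
    awk -v key="$1" '$1 == key { print $2 }'
}
```

A slurper might then use something like
alice_jobs=$(some_command | sum_by_key | value_for alice), where some_command
stands for whatever produces the "key value" lines.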

The usual issue is making sure you collect data from a command in a format that
doesn't vary and confuse your script -- for example:

  • df(1) sometimes wraps lines to make the output "pretty", or runs columns of numbers together unless
    you give a specific option
  • condor_q omits columns specified by -format if it considers them empty
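
For df(1), one option that helps is the POSIX -P flag, which asks for the
portable output format with one line per filesystem; adding -k fixes the units
at 1024-byte blocks. The sketch below parses that format. The function name
parse_df is made up here, and the parsing assumes that neither filesystem names
nor mount points contain spaces.

```shell
# parse_df: read `df -Pk` output on stdin and print
# "mountpoint used_kb avail_kb" for each filesystem, skipping the header.
# Columns in the portable format are:
#   Filesystem 1024-blocks Used Available Capacity Mounted-on
parse_df() {
    awk 'NR > 1 { print $6, $3, $4 }'
}

# Typical use (not run here):
#   df -Pk | parse_df
```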


The webservice now requires HTTP Basic Authentication for posts to the data cache
area. So before you can put any data into a new area, you'll need to give that
area a password that allows writes to it.

To do that, you'll log in as ifmon on fifemondata, go into the "www" directory
and use htpasswd to add an entry for the area you want your rrd files to
go into, something like:

   ssh ifmon@fifemondata
   cd www
   htpasswd  maaws_passwd new_dir/
   Enter Password:
   Confirm Password:

Then you can use -u new_dir/:new_password on curl command lines when posting data
into this area, as described below.

Posting text

You can post textual data into files with curl(1) or other web utilities,
passing the fields as --data arguments. You specify

  • a username (which is a path prefix) and password
  • the file to update
  • the data to go in the file

   curl -u test:testpass \
        --data file=test/test.txt \
        --data data="this is some text" \

Posting data

This is overall pretty straightforward. The recommended approach is to use
curl(1) as a command, and provide it with --data arguments. You specify

  • a username (which is a path prefix) and password
  • the rrd filename to update
  • the timestamp for this data (which can be 'N' for now)
  • name=value data for each item you want to update in the rrd file

There are two ways to do this with curl:

  1. use multiple --data arguments. This looks like
        curl -u plugstrip:plugpass \
             --data "rrdfile=plugstrip/s31" \
             --data "time=N" \
             --data "temp=$temp31" \
             --data "load=$amps31" \
             --data "end=x" \
             --data "rrdfile=plugstrip/s32" \
             --data "time=N" \
             --data "temp=$temp32" \
             --data "load=$amps32" \
             --data "end=x" 
  2. use one --data "@-" argument, and pipe the data into curl
        (
             printf "rrdfile=plugstrip/s31&"
             printf "time=N&"
             printf "temp=$temp31&"
             printf "load=$amps31&"
             printf "end=x&"
             printf "rrdfile=plugstrip/s32&"
             printf "time=N&"
             printf "temp=$temp32&"
             printf "load=$amps32&"
             printf "end=x&"
        ) | curl -u plugstrip:plugpass --data "@-" \



  • in either case, you can update multiple rrd files in one POST of data.
  • in the printf case above, you print an '&' at the end of every item, including the
    last one, and don't print newlines.
  • you can't have a variable named "end", "time", or "rrdfile" in your rrd files
    when using file_in_rrd.cgi
  • if you want to specify multiple time entries for the same rrd file, you
    have to say rrdfile=f&time=t1&...&end=x&rrdfile=f&time=t2&...&end=x
    which is to say, you list the same rrdfile with different timestamps.

The second approach, piping into curl --data "@-", is recommended when the list
of items being posted is long, or varies significantly. It is how most of the
prototype slurpers are implemented.
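
When the list of items varies, the piped form can be kept tidy with a small
helper that emits one record at a time. This is only a sketch: rrd_record and
build_body are names invented here, the temperature and load values are made-up
sample data, and POST_URL is a placeholder for the data cache's posting URL,
which is not given in this document. Values are assumed to be plain numbers that
need no URL encoding.

```shell
# rrd_record FILE TIME NAME=VALUE...
# Print one record for the POST body: every field is followed by '&',
# the record is closed by end=x, and no newline is printed.
rrd_record() {
    printf 'rrdfile=%s&time=%s&' "$1" "$2"
    shift 2
    for pair in "$@"; do
        printf '%s&' "$pair"
    done
    printf 'end=x&'
}

# Sample values standing in for what the collecting phase produced.
temp31=71 amps31=4
temp32=68 amps32=6

# build_body: emit the records for both rrd files as one POST body.
build_body() {
    rrd_record plugstrip/s31 N "temp=$temp31" "load=$amps31"
    rrd_record plugstrip/s32 N "temp=$temp32" "load=$amps32"
}

# Posting (not run here; POST_URL is a placeholder):
#   build_body | curl -u plugstrip:plugpass --data "@-" "$POST_URL"
```

Several time entries for the same rrd file are just repeated calls to rrd_record
with the same file name and different timestamps.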

Gauge vs Counter Data

One of the important distinctions in data put into an RRD file is the distinction
between Gauge and Counter data:

Counter values are ones where you are reporting a running total, say, total packets transmitted since
the last reboot, or total jobs submitted or completed. RRDtool infers a rate from the difference
between successive counter values and the time between reports.
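
As a worked illustration of that inference: if a counter reads 1000 at one
report and 4000 at a report 300 seconds later, the implied rate is
(4000 - 1000) / 300 = 10 per second. The helper below, whose name counter_rate
is made up for this example, does the same arithmetic. It is a simplification
that ignores counter wrap-around and the way RRDtool fits readings onto its
step boundaries.

```shell
# counter_rate PREV_VALUE PREV_TIME CUR_VALUE CUR_TIME
# Print the per-second rate implied by two counter readings
# (times are in seconds).
counter_rate() {
    awk -v v1="$1" -v t1="$2" -v v2="$3" -v t2="$4" \
        'BEGIN { printf "%.2f\n", (v2 - v1) / (t2 - t1) }'
}
```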

Gauge values list a current measurement, like a temperature, or current number of running jobs, etc.

Currently, the prototype file_in_rrd uses string matching to guess whether the value being presented
is a gauge or a counter; it thinks things with:

  • Processes
  • Min
  • -g-
  • Avail
  • Slots
  • Used
  • Size
  • Jobs
  • Avg
  • Perc

in them are Gauge values, and anything else is a counter. The plan over time is to prune that list
down to just "-g-", so any new Gauge values should have "-g-" in their names, and we'll
try to fix the existing slurpers to report gauge values with "-g-" names, and get rid of the other
string match heuristics, which end up being confusing.
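
The string matching described above can be approximated in shell as follows.
This is a sketch of the described behavior, not the actual file_in_rrd code; the
function name guess_type is invented, and case-sensitive substring matching is
an assumption.

```shell
# guess_type NAME: print GAUGE if NAME contains one of the listed
# substrings, otherwise COUNTER. Matching is case-sensitive (an assumption).
guess_type() {
    case $1 in
        *Processes*|*Min*|*-g-*|*Avail*|*Slots*|*Used*|*Size*|*Jobs*|*Avg*|*Perc*)
            echo GAUGE ;;
        *)
            echo COUNTER ;;
    esac
}
```

Under this sketch a name like TotalMinutes would be taken for a gauge because it
contains "Min", which shows how the heuristics can confuse, while a name such as
temp-g-rack31 is unambiguous.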