Striped Python API

Important: This is the official description of the Striped API. It includes all supported features of the API. Using features or making assumptions not documented here is dangerous and may lead to errors in the future as development of the API and the underlying storage continues.

StripedClient class

Constructor

StripedClient(url_head) - creates a new Striped client to communicate with the server identified by the URL. Currently, only the Web server client is implemented.

Attributes

  • URLHead - string

Methods

  • datasets() - list of strings - returns names of all datasets available on the server
  • dataset(name) - StripedDataset - returns StripedDataset object representing the dataset given by the name
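A minimal sketch of how the two methods above might be combined, assuming only the interface documented here. The names open_dataset and FakeClient, and the dataset names, are hypothetical and introduced for illustration; FakeClient is a stand-in so the example runs without a server, and a real StripedClient(url_head) would take its place.

```python
def open_dataset(client, name):
    """Return client.dataset(name) after checking that the name is listed.

    `client` is any object exposing the documented datasets() and
    dataset(name) methods, for example a StripedClient.
    """
    available = client.datasets()
    if name not in available:
        raise KeyError(f"dataset {name!r} not found; available: {sorted(available)}")
    return client.dataset(name)


class FakeClient:
    """Hypothetical stand-in for StripedClient, used only to run the sketch."""

    def __init__(self, names):
        self._names = list(names)

    def datasets(self):
        return list(self._names)

    def dataset(self, name):
        # A real client returns a StripedDataset; a string is enough here.
        return f"<dataset {name}>"


# Made-up dataset names.
client = FakeClient(["Summer16.ZZ", "Summer16.TT"])
ds = open_dataset(client, "Summer16.TT")
```

Checking the name against datasets() first turns a misspelled dataset name into an immediate, readable error instead of relying on whatever the server returns for an unknown name.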

StripedDataset class

Constructor

This class is not meant to be instantiated directly. Use the StripedClient.dataset() method instead.

Attributes

  • Name - string - name of the dataset
  • schema (property) - dictionary - dictionary representing the dataset schema
  • rgids (property) - list of integers - list of row group IDs.
    Note that row group IDs are not necessarily consecutive and not necessarily ordered in any particular way
  • columnNames - list of strings - list of column names (can be derived from the schema)
  • allColumns - dictionary - returns a dictionary { column_name: StripedColumn object } for all the columns in the dataset

Methods

  • rginfo(rgids) (to be deprecated, replaced by the rginfos() method) - list of dictionaries - one entry for each rgid listed in the argument. Each dictionary looks like this:
{
  "RGID": <integer>,
  "NEvents": <number of events in the RG>,
  "BeginEventID": <ID of the first event of the RG within the dataset>,
  "Segments": [ # provenance information
      {
       "FileName": <original file name without path>,
       "FileIndex": <file index within the dataset, counting from 0>,
       "BeginEvent": <starting event index in the file, counting from 0>,
       "NEvents": <number of events in the segment>
      }, ...    # repeated for each file that contributed to the RG
    ]
}
  • rginfos(rgids) - list of RGInfo objects - one object for each element of the rgids list.
  • column(name) - StripedColumn object - returns the object representing the column by its name.
  • columns(names) - dictionary - returns a dictionary { column_name: StripedColumn object } for the given columns. This is equivalent to:
    { name: dataset.column(name) for name in names }

    but this method is faster because it sends requests for individual column information in batches.
  • stripes(columns, rgid, compress=False) - dictionary
    • retrieves data for all the listed columns (given either by name or by StripedColumn object)
    • converts the data to a 1-dimensional numpy representation, according to the column descriptor
    • returns a dictionary { column_name: stripe }
  • stripeSizes(columns, rgids) - dictionary
    • retrieves unconverted byte sizes for all the stripes given by the combination of columns and rgids
    • columns is a list of StripedColumn objects. Only data columns are accepted here
    • rgids is a list of integer rgids, not assumed to be consecutive or ordered
    • returns a nested dictionary: { column_name: { rgid: byte size, ... } ... }
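The nested dictionary returned by stripeSizes() can be summarized per column or per row group, for example to estimate how much data a job would transfer. The sketch below assumes only the documented shape { column_name: { rgid: byte size } }; the helper names, column names, and byte sizes are made up for illustration.

```python
def bytes_per_column(sizes):
    """Total stripe bytes for each column across all row groups."""
    return {column: sum(by_rgid.values()) for column, by_rgid in sizes.items()}


def bytes_per_rgid(sizes):
    """Total stripe bytes for each row group across all columns."""
    totals = {}
    for by_rgid in sizes.values():
        for rgid, nbytes in by_rgid.items():
            totals[rgid] = totals.get(rgid, 0) + nbytes
    return totals


# Made-up sample in the documented shape { column_name: { rgid: byte size } }.
# Row group IDs are deliberately non-consecutive and unordered.
sizes = {
    "Muon.pt": {17: 4000, 3: 3600},
    "Muon.eta": {17: 4000, 3: 3600},
    "nJets": {17: 800, 3: 720},
}

per_column = bytes_per_column(sizes)
per_rgid = bytes_per_rgid(sizes)
```

Both helpers key their results by the IDs found in the dictionary rather than by position, so they do not depend on the row group IDs being consecutive or ordered.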

RGInfo class

Attributes

  • RGID - integer - row group ID
  • NEvents - integer - number of events in the row group
  • BeginEventID - integer - starting event ID in the dataset
  • Segments - list - ordered list of one or more RGInfoSegment objects

RGInfoSegment class

Attributes

  • NEvents - integer - number of events in the segment
  • FileName - string - origin file name
  • FileIndex - integer - file index in the dataset
  • BeginEvent - integer - index of the first segment event in the file
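The RGInfo and RGInfoSegment attributes carry enough provenance to trace an event back to its original file. The sketch below is one way this might be done, under two assumptions not stated in this document: event IDs in a row group run consecutively from BeginEventID, and segments are listed in the same order as their events. The function locate_event, the SimpleNamespace stand-ins, and the file names are hypothetical.

```python
from types import SimpleNamespace


def locate_event(rginfo, event_id):
    """Map a dataset-wide event ID to (FileName, index of the event in that file).

    Assumes event IDs in a row group run consecutively from BeginEventID and
    that segments appear in the same order as their events.
    """
    offset = event_id - rginfo.BeginEventID
    if not 0 <= offset < rginfo.NEvents:
        raise ValueError(f"event {event_id} is not in row group {rginfo.RGID}")
    for segment in rginfo.Segments:
        if offset < segment.NEvents:
            return segment.FileName, segment.BeginEvent + offset
        offset -= segment.NEvents
    raise ValueError("segment event counts do not add up to NEvents")


# Hypothetical stand-ins carrying the documented attributes.
info = SimpleNamespace(
    RGID=5, NEvents=100, BeginEventID=1000,
    Segments=[
        SimpleNamespace(FileName="a.root", FileIndex=0, BeginEvent=940, NEvents=60),
        SimpleNamespace(FileName="b.root", FileIndex=1, BeginEvent=0, NEvents=40),
    ],
)

first = locate_event(info, 1000)
last_in_a = locate_event(info, 1059)
first_in_b = locate_event(info, 1060)
```

With real data, the objects returned by rginfos() would replace the stand-ins, provided the two assumptions above hold for the dataset.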

StripedColumn class

Constructor

This class is not meant to be instantiated directly. Use the StripedDataset column() or columns() methods instead.

Attributes

  • Name - string - column name
  • Dataset - StripedDataset object - reference to the parent dataset
  • descriptor - StripedColumnDescriptor object
  • issize - boolean, property - whether this is a size column (as opposed to a real data column)
  • sizeColumn - None or StripedColumn object - reference to the size column, if this is a data column and if it is inside a list. Otherwise it is None

Methods

  • assembleList(data, depth, size) - returns an assembled list of (lists of ...) numpy arrays. On the top level, there is one element per event in the RG
    • data - the stripe
    • depth - depth of the column in the schema; 0 means top level, scalar
    • size - size array for the stripe
  • stripe(rgid, compress=False, assembled=False) - returns the stripe for the column and RG
    • rgid - integer RGID
    • compress - boolean, whether the data should be compressed for the network transfer
    • assembled - boolean, whether the stripe should be returned assembled or unassembled
  • stripeSizeArray(rgid, compress=False) - returns stripe size array for the RG
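To illustrate what assembling means for a column of depth 1, the sketch below splits a flat stripe into one chunk per event using a size array. It is not the library's implementation: it assumes that each entry of the size array is the number of values belonging to the corresponding event, and it uses plain Python lists where the API returns numpy arrays. The function name split_by_sizes and the sample values are hypothetical.

```python
def split_by_sizes(data, sizes):
    """Split a flat sequence into consecutive chunks, one per entry in sizes."""
    if sum(sizes) != len(data):
        raise ValueError("sizes do not add up to the length of data")
    chunks, start = [], 0
    for n in sizes:
        chunks.append(data[start:start + n])
        start += n
    return chunks


# Made-up depth-1 stripe: 4 events holding 2, 0, 3 and 1 values respectively.
stripe = [10.5, 11.0, 20.0, 21.5, 22.0, 30.0]
size = [2, 0, 3, 1]

per_event = split_by_sizes(stripe, size)
```

An event with size 0 yields an empty chunk, so the result always has one element per event, matching the description of assembleList() above.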

StripedColumnDescriptor class

Constructor

This class is not meant to be instantiated by the user. It is available as the descriptor attribute of a StripedColumn.

Attributes

  • Type - string - type as in ("int", "float", etc.)
  • NPType - string - internal representation in the storage expressed in the numpy style (e.g. "<f8", ">i4", etc.)
  • ConvertToNPType - string - data type to convert to when the stripe is returned to the API user
  • Depth - integer - depth of the column. If the column is attached to the top of the event tree, depth = 0. If it is inside an array - 1, and so on.
  • SizeColumn - size column name or None
  • ParentArray - None or the name of the parent array. Parent arrays themselves are not stored in the database, but all column members of the same parent array have the same "ParentArray" attribute, so this attribute can be used to reconstruct the schema
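As a sketch of how the ParentArray attribute might be used to recover the grouping of columns, the example below collects column names under their parent array, with top-level columns under None. The function group_by_parent, the SimpleNamespace stand-ins, and the column names are hypothetical; only the Depth and ParentArray attributes documented above are used.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_parent(descriptors):
    """Group column names by their ParentArray; top-level columns go under None."""
    groups = defaultdict(list)
    for name, desc in descriptors.items():
        groups[desc.ParentArray].append(name)
    return dict(groups)


# Hypothetical descriptors; only the attributes used here are filled in.
descriptors = {
    "nJets": SimpleNamespace(Depth=0, ParentArray=None),
    "Muon.pt": SimpleNamespace(Depth=1, ParentArray="Muon"),
    "Muon.eta": SimpleNamespace(Depth=1, ParentArray="Muon"),
}

layout = group_by_parent(descriptors)
```

With a real dataset, the input could presumably be built from the documented attributes as { name: column.descriptor for name, column in dataset.allColumns.items() }.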