SAM Introduction:

Concepts

  • Files
    • bag of bits
    • unique name
  • Metadata
    • info about file
    • basic
      • size/checksum
      • type/format
    • provenance
      • parentage
      • application/version
    • experiment/custom
      • run/subrun
      • datastream
      • beam type
      • filters
      • etc.
  • Locations
    • on a "data disk"
    • directory path
    • tape info (if tape-backed)
  • Disks/Nodes
    • locations are constrained to these
  • Dimensions
    • Query language
    • (over)simplified SQL
    • hides complex table structure
  • Definitions
    • name for dimensions query
    • can take:
  • Snapshots
    • list of files from a query at some time
    • numeric snapshot-id
  • Projects
    • coordinates delivery of files from a snapshot
    • clients are "Consumer Processes"
    • Deals with tape staging efficiently
    • hands out files to next ready consumer
    • script in python
    • script in bash
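The Files, Metadata, Dimensions, Definitions and Snapshots concepts above can be illustrated with a toy in-memory catalog. This is a minimal sketch, not the samweb API: the `FileRecord` fields, the `Catalog` class, the keyword-match query (standing in for the real dimensions language) and the use of adler32 as the checksum are all hypothetical choices made for illustration.

```python
import itertools
import zlib
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # Hypothetical metadata record; field names are illustrative only.
    name: str                                     # unique file name
    size: int                                     # basic: size in bytes
    checksum: str                                 # basic: checksum (adler32 assumed here)
    file_type: str                                # basic: type/format
    parents: list = field(default_factory=list)   # provenance: parentage
    application: str = ""                         # provenance: application/version
    custom: dict = field(default_factory=dict)    # experiment-specific: run, beam type, ...


def make_record(name, data, file_type, **custom):
    """Build a record from a file's contents (its 'bag of bits')."""
    return FileRecord(name=name, size=len(data),
                      checksum="adler32:%08x" % zlib.adler32(data),
                      file_type=file_type, custom=custom)


class Catalog:
    def __init__(self):
        self.files = {}              # unique name -> FileRecord
        self.definitions = {}        # definition name -> query criteria
        self.snapshots = {}          # numeric snapshot id -> frozen file list
        self._ids = itertools.count(1)

    def declare(self, record):
        if record.name in self.files:
            raise ValueError("file names must be unique: " + record.name)
        self.files[record.name] = record

    def query(self, **criteria):
        # Stand-in for a dimensions query: every key must match custom metadata.
        return sorted(r.name for r in self.files.values()
                      if all(r.custom.get(k) == v for k, v in criteria.items()))

    def define(self, defname, **criteria):
        # A definition is only a named query; it is re-evaluated on each use.
        self.definitions[defname] = criteria

    def take_snapshot(self, defname):
        # A snapshot freezes the list of files the definition matches right now.
        snap_id = next(self._ids)
        self.snapshots[snap_id] = self.query(**self.definitions[defname])
        return snap_id
```

For example, after `define("run1", run=1)` and `take_snapshot("run1")`, declaring another run-1 file changes what `query(run=1)` returns but leaves the stored snapshot untouched, which is the practical difference between a definition and a snapshot.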

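A project handing out snapshot files to the next ready consumer behaves much like a shared work queue. The sketch below assumes that model; the `Project` class and its `next_file`/`release` methods are hypothetical names, not the SAM station protocol, and it simply delivers files in list order instead of optimizing tape staging.

```python
import threading
from collections import deque


class Project:
    """Toy project: hands each snapshot file to exactly one consumer process."""

    def __init__(self, name, snapshot_files):
        self.name = name
        self._pending = deque(snapshot_files)  # files not yet delivered
        self._in_use = {}                      # file name -> consumer id
        self.status = {}                       # file name -> (outcome, consumer id)
        self._lock = threading.Lock()          # consumers may ask concurrently

    def next_file(self, consumer_id):
        """Deliver the next pending file to whichever consumer is ready first."""
        with self._lock:
            if not self._pending:
                return None                    # project exhausted; consumer should stop
            fname = self._pending.popleft()
            self._in_use[fname] = consumer_id
            return fname

    def release(self, fname, ok=True):
        """Record that the consumer holding fname has finished with it."""
        with self._lock:
            consumer_id = self._in_use.pop(fname)
            self.status[fname] = ("consumed" if ok else "failed", consumer_id)


def consumer(project, consumer_id, process_file):
    """A consumer process: keep asking for files until none are left."""
    while True:
        fname = project.next_file(consumer_id)
        if fname is None:
            break
        project.release(fname, ok=process_file(fname))
```

Because delivery is pull-based, a fast consumer simply asks more often and ends up processing more files; no file is assigned to a consumer in advance.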
Implementation

  • Samweb instance
    • handles most SAM requests
    • front-end for:
  • Database
    • complex schema
    • currently Postgres
    • formerly Oracle (D0, CDF...)
  • Stations
    • the part that runs Projects
    • can have multiple stations
  • FTS
    • File Transfer Service
    • files left in a "dropbox" directory
    • filed away/declared to SAM by rules
    • can have metadata extractor plugins
  • Enstore Log Scraper (cron job)
    • gets info on files moved to tape
    • reduces load on Enstore
    • sometimes misses things...
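The FTS bullets above (files dropped in a dropbox, filed away by rules, metadata extractor plugins) can be sketched as a rule table plus a plugin registry. Everything here is hypothetical: the rule format, the extractor functions, the `_r<run>` naming convention and the destination paths are invented for illustration and do not reflect how FTS is actually configured.

```python
import fnmatch
import re


# Hypothetical extractor plugins: each takes a file name and returns metadata.
def extract_run(fname):
    m = re.search(r"_r(\d+)", fname)
    return {"run": int(m.group(1))} if m else {}


def extract_format(fname):
    return {"file_format": fname.rsplit(".", 1)[-1]}


EXTRACTORS = {"run": extract_run, "format": extract_format}

# Hypothetical rules: the first matching pattern decides destination and plugins.
RULES = [
    ("*.root", "/data/tape/root/{run}", ["run", "format"]),
    ("*.log", "/data/tape/logs", ["format"]),
]


def file_away(dropbox_files, rules=RULES, extractors=EXTRACTORS):
    """Return (declared, skipped); declared holds (name, destination, metadata)."""
    declared, skipped = [], []
    for fname in dropbox_files:
        for pattern, dest, plugins in rules:
            if fnmatch.fnmatch(fname, pattern):
                meta = {"file_name": fname}
                for p in plugins:
                    meta.update(extractors[p](fname))
                try:
                    declared.append((fname, dest.format(**meta), meta))
                except KeyError:
                    skipped.append(fname)  # metadata the rule needs is missing
                break
        else:
            skipped.append(fname)          # no rule matched: leave it in the dropbox
    return declared, skipped
```

Files that match no rule, or whose extractors cannot supply what the rule needs, stay behind rather than being declared with incomplete metadata.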

fife_utils / SAM for users

  • scripts to deal with datasets of files
    • declare batches of files
    • move them around
    • check SAM info versus real world
  • fife_launch/fife_wrap
    • convert executables into jobs
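Checking SAM info against the real world boils down to comparing declared locations with what is actually on disk. This is a minimal sketch under the assumption that the declared locations are already available as a plain dict of file name to directory; `check_locations` is a hypothetical helper, not a fife_utils function.

```python
import os


def check_locations(declared):
    """declared: file name -> directory the catalog says holds it.

    Returns (missing, undeclared):
      missing    -- declared files not found at their recorded location
      undeclared -- files present in those directories that nothing declares there
    """
    missing = sorted(name for name, d in declared.items()
                     if not os.path.isfile(os.path.join(d, name)))
    undeclared = sorted(
        os.path.join(d, f)
        for d in set(declared.values()) if os.path.isdir(d)
        for f in os.listdir(d)
        if os.path.isfile(os.path.join(d, f)) and declared.get(f) != d)
    return missing, undeclared
```

Reporting both directions matters: missing files mean the catalog is stale, while undeclared files are data a query will never find.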