The dimension syntax is composed of simple predicates grouped together to make more
complicated queries. The dimension parser takes care of doing the appropriate
database table joins, etc. to make the query into a full SQL query.
Simple dimension predicates¶
These are simple predicates of the form dimension_name operator value, such as:
file_name = "myfile"
end_time >= 2010/10/20T13:30:00
...
The operator can be left out, which implies "=".
The list of dimension names for these predicates can be obtained by looking at
with EXPERIMENT replaced by your experiment name (e.g. "nova", "mu2e", "minerva").
|>=|greater than or equal|
|<=|less than or equal|
|like| wildcard string match (can also use
|not like| inverse wildcard string match (can also use
The wildcard characters are ? for any single character and % for one or more of any character.
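The wildcard rules can be modelled locally to check a pattern before sending a query. The following is a minimal sketch that assumes exactly the semantics stated here (? is one character, % is one or more characters); the helper names like_to_regex and like are hypothetical, and the server's actual matching may differ in edge cases.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a like-style pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")            # exactly one character
        elif ch == "%":
            parts.append(".+")           # one or more characters (assumed from the text)
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("foo123bar.root", "foo%bar.root"))
print(like("run_1.dat", "run_?.dat"))
```

Note that the dot in "foo%bar.root" is escaped, so it only matches a literal dot.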
There are also list-based predicates of the form dimension_name op list, for the operators:
|in|is one of|
|not in|is not one of|
Predicates can be combined with and, or, and not, and grouped with parentheses to make compound queries.
(file_name like "foo%bar.root" and file_size > 1024) or (file_name like "foo%baz.root" and file_size > 2048)
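When queries are generated by a script, it can help to build compound dimension strings from small pieces so that the parentheses always balance. The sketch below does this with two hypothetical helpers, all_of and any_of; they only concatenate strings and do not validate dimension names or values.

```python
def all_of(*predicates: str) -> str:
    """Join predicates with 'and' and wrap the group in parentheses."""
    return "(" + " and ".join(predicates) + ")"


def any_of(*predicates: str) -> str:
    """Join predicates with 'or' and wrap the group in parentheses."""
    return "(" + " or ".join(predicates) + ")"


query = any_of(
    all_of('file_name like "foo%bar.root"', "file_size > 1024"),
    all_of('file_name like "foo%baz.root"', "file_size > 2048"),
)
print(query)
```

The result is the compound query shown above, wrapped in one extra pair of parentheses.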
predicate minus predicate refers to a set-minus operation on the
two specified datasets: files matching the second predicate are removed from the files matching the first.
Referring to existing definitions¶
Predicates of the form
defname: "foo" refer to an existing defined dataset named foo. This effectively inserts the definition of
foo at this point.
Modifiers that alter the behaviour of all or part of the query are specified using the
with operator, which must come at the end of a clause. A single with clause can specify multiple modifier terms.
The available modifiers are:
limit n which limits the maximum number of files to return
offset n which skips the first n files in the result set
stride n which returns every n'th file in the result set
availability x[,y[,z]] which limits the results to files that match the given availability status
For the availability modifier, the possible availability: flags fall into three categories. Multiple flags can be given as a comma-separated list.
- active flags:
  - active - only list active files (default)
  - retired - only list files that have been retired
- status flags:
  - good - only list files with a good content status (default)
  - bad - only list files with a bad content status
  - anystatus - list files regardless of content status
- location flags:
  - virtual - only list files with no locations
  - physical - only list files with at least one location (default at the top level)
  - anylocation - list files regardless of locations (default within ischildof: clauses, see below)
A predicate can be prefixed with
isparentof: or ischildof: to state that the predicate refers to
the immediate child or parent file of the one being selected, rather than to that file itself. So if you want to look for
a large file derived from a small raw file, you could say:
file_size > 2048 and ischildof: (data_tier = "raw" and file_size < 1024 )
The default behaviour inside the ancestry clause is to include virtual files (those with no location). To override this, include an availability term within the clause, such as
availability: physical to only consider files with a location.
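The meaning of the ischildof: example can be illustrated with a small in-memory model. This is only a sketch: the file names, the "reco" tier, and the parents field are invented for illustration and say nothing about how the database stores lineage.

```python
# Hypothetical catalogue: each file records its tier, size and immediate parents.
files = {
    "raw_small.dat": {"data_tier": "raw", "file_size": 512, "parents": []},
    "raw_big.dat": {"data_tier": "raw", "file_size": 4096, "parents": []},
    "reco_a.root": {"data_tier": "reco", "file_size": 8192, "parents": ["raw_small.dat"]},
    "reco_b.root": {"data_tier": "reco", "file_size": 8192, "parents": ["raw_big.dat"]},
    "reco_c.root": {"data_tier": "reco", "file_size": 1000, "parents": ["raw_small.dat"]},
}


def is_small_raw(meta: dict) -> bool:
    """Inner predicate: data_tier = "raw" and file_size < 1024."""
    return meta["data_tier"] == "raw" and meta["file_size"] < 1024


def is_child_of(name: str, parent_predicate) -> bool:
    """True if at least one immediate parent of the file satisfies the predicate."""
    return any(parent_predicate(files[p]) for p in files[name]["parents"])


# file_size > 2048 and ischildof: (data_tier = "raw" and file_size < 1024)
selected = sorted(
    name
    for name, meta in files.items()
    if meta["file_size"] > 2048 and is_child_of(name, is_small_raw)
)
print(selected)  # ['reco_a.root']
```

Only reco_a.root qualifies: reco_b.root has a large parent, and reco_c.root is itself too small.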
isancestorof: can be used to look at all ancestors or descendants of the selected file set. These operators should be used with care as the file lineage tree can potentially get very big. They cannot be mixed with other lineage operators in the same query.
Note on negation¶
A subtle point is that referring to a named dimension forces the evaluator to require the existence of the relevant value for a file to match. In most cases this doesn't matter, but it can have some surprising effects when constraints are negated. For example
data_tier A and not some.parameter B
will only return files that have
some.parameter defined in their metadata with a value other than
B. To return files that don't have
some.parameter at all as well as those with
some.parameter != B, use:
data_tier A minus some.parameter B
The right hand side of the minus clause is evaluated separately from the left hand side so all files that match are removed from the results without imposing any further constraints on the results.
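The difference between the two forms can be reproduced with Python sets. The sketch below models the behaviour described in this note, assuming a negated predicate still requires the parameter to exist; the file names and metadata are invented for illustration.

```python
# Hypothetical metadata; c.root has no some.parameter at all.
files = {
    "a.root": {"data_tier": "A", "some.parameter": "B"},
    "b.root": {"data_tier": "A", "some.parameter": "C"},
    "c.root": {"data_tier": "A"},
    "d.root": {"data_tier": "X", "some.parameter": "C"},
}


def match(key: str, value: str) -> set:
    """Files where the key exists and equals the value."""
    return {name for name, meta in files.items() if meta.get(key) == value}


def not_match(key: str, value: str) -> set:
    """Negated predicate: the key must still exist, but with another value."""
    return {name for name, meta in files.items() if key in meta and meta[key] != value}


# data_tier A and not some.parameter B
and_not = match("data_tier", "A") & not_match("some.parameter", "B")

# data_tier A minus some.parameter B
minus = match("data_tier", "A") - match("some.parameter", "B")

print(sorted(and_not))  # ['b.root']
print(sorted(minus))    # ['b.root', 'c.root']
```

The minus form keeps c.root because the right hand side only removes files; it never requires some.parameter to be present.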
Recovery dataset for one or more projects¶
(project_name in ('mengelTest1355428720','mengelTest1355427845') and not (consumed_status like 'consumed'))
Files analyzed already¶
( data_tier in ('raw', 'binary-raw') and run_number >= 3515 and run_number < 3522 and isparentof: (data_tier = 'rawdigits') and physical_datastream_name in ('numib', 'numil', 'numip') and not quality.minerva = 'bad' )
Every 10th file from a previously defined dataset definition, up to a maximum of 100¶
defname: a_existing_definition_name with limit 100 stride 10
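The effect of the modifiers in this example can be approximated with list slicing. This is a sketch under stated assumptions: that offset is applied first, then stride, then limit, and that stride keeps the first file of each group. The description above ("every 10th file ... up to a maximum of 100") suggests stride is applied before limit, but the exact ordering and starting point are not specified here. The function apply_modifiers and the file names are hypothetical.

```python
def apply_modifiers(files, limit=None, offset=0, stride=1):
    """Model 'with limit/offset/stride' on an ordered result list.

    Assumed order: skip `offset` files, keep every `stride`-th file,
    then truncate to at most `limit` files.
    """
    selected = files[offset::stride]
    if limit is not None:
        selected = selected[:limit]
    return selected


# Hypothetical result set of 2500 files.
files = [f"file_{i:04d}.root" for i in range(2500)]

# defname: a_existing_definition_name with limit 100 stride 10
result = apply_modifiers(files, limit=100, stride=10)
print(len(result), result[:3])
```

With 2500 input files the stride leaves 250 candidates, which the limit then cuts to 100.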
Use Latest Snapshot On A Definition Name¶
More On Dimensions Using The Client¶
samweb list-files --help-dimensions
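A dimension query can also be passed to the same client command from a script. The sketch below assumes that samweb list-files accepts the dimension string as a single argument; the helper name build_command is hypothetical. The command is only executed if samweb is found on the PATH, and experiment selection and authentication are left out.

```python
import shutil
import subprocess


def build_command(dimensions: str) -> list:
    """Build the argument list; the whole query is passed as one argument."""
    return ["samweb", "list-files", dimensions]


dims = "defname: a_existing_definition_name with limit 100 stride 10"
command = build_command(dims)
print(command)

# Only run the client if it is actually installed.
if shutil.which("samweb"):
    completed = subprocess.run(command, capture_output=True, text=True)
    print(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with the parentheses and quotes that dimension strings often contain.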