WMArchive queries
The queries in WMArchive should be placed via POST request in a form of Query
Language (QL). The QL should specify condition keys and document fields to be
retrieved. We end-up using JSON format as QL data-format similar to MongoDB QL.
For example:
{"spec": {"lfn":"bla*123.root", "task": "bla", "run":[1,2,3]}, "fields":["logs"]}
here we define that our conditions spec is a dictionary where we define actual
conditions such as lfn pattern, task name and run list. The output fields would
be logs key in FWJR documents stored in WMArchive.
The WMArchive will support three attributes which user may provide:
spec defines condition dictionary each spec must have a timerange key-value
pair which points to specific timerange user want to look-up the data, e.g.
"timerange":[20160101, 20160102]
, the values in timerange list should be in
YYYYMMDD format fields defines list of attributes to be retrieved from FWJR and
returned back to user aggregate define list of attributes we need to aggregate
Query Language
The QL should be flexible enough to accommodate user queries, see next section.
For that we can use JSON document to define our queries, see example in
previous section. Since we're going to use nested structure in our FWJR
documents we need to refer them via dot notations, e.g. output.inputDataset,
meta_data.agent_ver, etc. Usage of JSON for queries can be easily translated
into MongoDB QL. Here is simplest QL rule definitions (mostly aligned with
MongoDB QL):
- input query should be in form of JSON
- the query JSON should contain only key-value pairs, no nested structures
- the supported data types for values would be int, float, string and list of those
- user can send multiple set of conditions which translates into AND set
- user can send alternative conditions via
{"$or": [JSON1, JSON2]}
structure.
- user can specify $gt, $lt operators for greater than, less than use cases
The WMArchive code will translate user query either into Mongo DB one or the
one suitable for querying documents on HDFS. Here are few examples of QL syntax
# use multiple keys
{"task":"abc", "run":1}
# use patterns
{"lfn":"/a/v/b*/123.root"}
# use or conditions
{"$or": [{"task":"abc", "lfn":"/a/v/b*/123.root"}, {"run":[1,2,3]}]}
# use array values, i.e. find docs with all given run numbers
{"run":[1,2,3]}
# usage of $gt, $lt operators
{"run":{"$gt":1})
Example of user queries
# User query
{"spec":{"task": "/AbcCde_Task_Data_test_6735909/RECO"}, "fields":["wmaid"]}
# client call and output
wma_client.py --spec=query.json
{"result": [{"status": "ok",
"input": {"fields": ["wmaid"],
"spec": {"task": "/AbcCde_Task_Data_test_6735909/RECO"}},
"storage": "mongodb",
"results": [{"wmaid": "6b0bace20fc732563316198d1ed2b94e"}]}]}