WMArchive queries

Queries in WMArchive should be sent via a POST request in the form of a Query Language (QL) document. The QL specifies the condition keys and the document fields to be retrieved. We ended up using JSON as the QL data format, similar to MongoDB QL. For example:
{"spec": {"lfn":"bla*123.root", "task": "bla", "run":[1,2,3]}, "fields":["logs"]}
Here the conditions spec is a dictionary holding the actual conditions: an lfn pattern, a task name and a run list. The output field is the logs key of the FWJR documents stored in WMArchive. WMArchive supports three attributes which a user may provide:
- spec defines the condition dictionary. Each spec must have a timerange key-value pair which points to the time range the user wants to look up, e.g.
  "timerange":[20160101, 20160102]
  The values in the timerange list should be in YYYYMMDD format.
- fields defines the list of attributes to be retrieved from FWJR documents and returned to the user.
- aggregate defines the list of attributes to aggregate.
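The three query attributes can be checked before a query is sent. The sketch below is illustrative only: `validate_query` and `ALLOWED_KEYS` are hypothetical names, not part of the WMArchive code, and it assumes the timerange values are given as eight-digit integers (or strings), as in the timerange example.

```python
import datetime

# Hypothetical: the top-level attributes a query may carry.
ALLOWED_KEYS = {"spec", "fields", "aggregate"}


def validate_query(query):
    """Check the layout of a WMArchive-style query dict (illustrative sketch).

    Raises ValueError on the first problem found, otherwise returns the query.
    """
    unknown = set(query) - ALLOWED_KEYS
    if unknown:
        raise ValueError("unknown query attributes: %s" % sorted(unknown))
    spec = query.get("spec")
    if not isinstance(spec, dict):
        raise ValueError("spec must be a dictionary")
    trange = spec.get("timerange")
    if not isinstance(trange, list) or len(trange) != 2:
        raise ValueError("spec needs a two-element timerange list")
    days = []
    for value in trange:
        text = str(value)
        if len(text) != 8 or not text.isdigit():
            raise ValueError("timerange values must be YYYYMMDD, got %r" % (value,))
        # strptime also rejects impossible dates such as 20160231
        days.append(datetime.datetime.strptime(text, "%Y%m%d").date())
    if days[0] > days[1]:
        raise ValueError("timerange start is after its end")
    for key in ("fields", "aggregate"):
        if key in query and not isinstance(query[key], list):
            raise ValueError("%s must be a list of attribute names" % key)
    return query
```

Checking the date with strptime, rather than only the digit count, catches values that look like YYYYMMDD but are not real calendar days.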

Query Language

The QL should be flexible enough to accommodate the user queries shown in the next section. For that we use a JSON document to define our queries, as in the example in the previous section. Since our FWJR documents have a nested structure, nested fields are referred to via dot notation, e.g. output.inputDataset, meta_data.agent_ver, etc. JSON queries can easily be translated into MongoDB QL: the WMArchive code translates a user query either into a MongoDB query or into one suitable for querying documents on HDFS. The QL rules are simple and mostly aligned with MongoDB QL. Here are a few examples of QL syntax:
# use multiple keys
{"task":"abc", "run":1}

# use patterns
{"lfn":"/a/v/b*/123.root"}

# use $or conditions
{"$or": [{"task":"abc", "lfn":"/a/v/b*/123.root"}, {"run":[1,2,3]}]}

# use array values, i.e. find docs with all given run numbers
{"run":[1,2,3]}

# use $gt, $lt operators
{"run":{"$gt":1}}
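To illustrate how these rules could be evaluated against a single FWJR document, for instance when scanning documents on HDFS, here is a minimal in-memory matcher. It is a sketch under stated assumptions rather than the WMArchive translation code: `match` and its helpers are hypothetical, dot-notation keys are followed only through nested dictionaries (not lists), a `*` in a string value is treated as a wildcard, an array condition requires every listed value to be present, and only `$or`, `$gt` and `$lt` are supported. The wildcard-to-regex step also shows one way a pattern could be translated for MongoDB, whose `$regex` operator accepts such an expression once it is anchored with `^` and `$`.

```python
import re

_MISSING = object()  # sentinel for keys absent from a document

# Comparison operators supported by this sketch (assumption: only $gt/$lt).
_OPS = {"$gt": lambda a, b: a > b, "$lt": lambda a, b: a < b}


def get_value(doc, dotted_key):
    """Follow a dot-notation key such as "meta_data.agent_ver" into nested dicts."""
    value = doc
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def pattern_to_regex(pattern):
    """Turn a '*' wildcard pattern into a regex; all other characters are literal."""
    return ".*".join(re.escape(chunk) for chunk in pattern.split("*"))


def match_condition(value, cond):
    """Check one document value against one condition."""
    if value is _MISSING:
        return False
    if isinstance(cond, dict):
        # operator form, e.g. {"$gt": 1, "$lt": 10}
        for op in cond:
            if op not in _OPS:
                raise ValueError("unsupported operator: %s" % op)
        # for list values, one element has to satisfy all operators
        items = value if isinstance(value, list) else [value]
        return any(all(_OPS[op](item, bound) for op, bound in cond.items())
                   for item in items)
    if isinstance(cond, list):
        # array form: the document must contain every listed value
        have = value if isinstance(value, list) else [value]
        return all(item in have for item in cond)
    if isinstance(cond, str) and "*" in cond:
        # pattern form: the whole string has to match the wildcard pattern
        return isinstance(value, str) and re.fullmatch(pattern_to_regex(cond), value) is not None
    return value == cond


def match(doc, spec):
    """Return True if doc satisfies every key of spec (keys are ANDed)."""
    for key, cond in spec.items():
        if key == "$or":
            # $or holds if at least one sub-specification matches
            if not any(match(doc, sub) for sub in cond):
                return False
        elif not match_condition(get_value(doc, key), cond):
            return False
    return True
```

For example, `match({"task": "abc", "run": [1, 2, 3]}, {"task": "abc", "run": [1, 2]})` returns True, because the task matches exactly and both requested runs are present.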

Example of user queries

# User query
{"spec":{"task": "/AbcCde_Task_Data_test_6735909/RECO"}, "fields":["wmaid"]}

# client call and output
wma_client.py --spec=query.json
{"result": [{"status": "ok",
             "input": {"fields": ["wmaid"],
                       "spec": {"task": "/AbcCde_Task_Data_test_6735909/RECO"}},
             "storage": "mongodb",
             "results": [{"wmaid": "6b0bace20fc732563316198d1ed2b94e"}]}]}
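The client call above sends the query as a JSON POST request. A minimal sketch of the same round trip with the Python standard library follows; the endpoint URL and all function names are hypothetical placeholders, and authentication, which a production service may require, is left out. `extract_results` simply walks the reply layout shown in the output above.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the real WMArchive service URL.
WMARCHIVE_URL = "https://wmarchive.example.org/wmarchive/data"


def build_query_request(query, url=WMARCHIVE_URL):
    """Wrap a query dict into a JSON POST request (not sent yet)."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"})


def run_query(query, url=WMARCHIVE_URL, timeout=60):
    """Send the query and return the decoded JSON reply."""
    request = build_query_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_results(reply):
    """Collect the records of every result block whose status is "ok"."""
    records = []
    for block in reply.get("result", []):
        if block.get("status") == "ok":
            records.extend(block.get("results", []))
    return records


# Usage (needs a reachable server):
# query = {"spec": {"task": "/AbcCde_Task_Data_test_6735909/RECO"}, "fields": ["wmaid"]}
# print(extract_results(run_query(query)))
```

Keeping request construction separate from sending makes it possible to inspect the request body and headers without a running server.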