Queries¶
Two syntaxes are available for constructing queries: an “operator” API using python’s comparators, and a “fluent” API where terms are chained together. Which to use is a matter of preference, and both construct the same query object.
Operator syntax¶
Searches are built up from a series of Terminal
nodes, which compare structural
attributes to some search value. In the operator syntax, python’s comparator
operators are used to construct the comparison. The operators are overloaded to
return Terminal
objects for the comparisons.
from rcsbsearch import TextQuery
from rcsbsearch import rcsb_attributes as attrs
# Create terminals for each query
q1 = TextQuery('"heat-shock transcription factor"')
q2 = attrs.rcsb_struct_symmetry.symbol == "C2"
q3 = attrs.rcsb_struct_symmetry.kind == "Global Symmetry"
q4 = attrs.rcsb_entry_info.polymer_entity_count_DNA >= 1
Attributes are available from the rcsb_attributes object and can be tab-completed.
They can additionally be constructed from strings using the Attr(attribute)
constructor. For a full list of attributes, please refer to the RCSB
schema.
Terminal
s are combined into Group
s using python’s bitwise operators. This is
analogous to how bitwise operators act on python set
objects. The operators are
lazy and won’t perform the search until the query is executed.
query = q1 & q2 & q3 & q4 # AND of all queries
AND (&
), OR (|
), and terminal negation (~
) are implemented directly by the API,
but the python package also implements set difference (-
), symmetric difference (^
),
and general negation by transforming the query.
Queries are executed by calling them as functions. They return an iterator of result identifiers.
results = set(query())
By default, the query will return “entry” results (PDB IDs). It is also possible to query other types of results (see return-types for options):
assemblies = set(query("assembly"))
Fluent syntax¶
The operator syntax is great for simple queries, but requires parentheses or temporary variables for complex nested queries. In these cases the fluent syntax may be clearer. Queries are built up by appending operations sequentially.
from rcsbsearch import TextQuery
# Start with a Attr or TextQuery, then add terms
results = TextQuery('"heat-shock transcription factor"') \
.and_("rcsb_struct_symmetry.symbol").exact_match("C2") \
.and_("rcsb_struct_symmetry.kind").exact_match("Global Symmetry") \
.and_("rcsb_entry_info.polymer_entity_count_DNA").greater_or_equal(1) \
.exec("assembly")
Sessions¶
The result of executing a query (either by calling it or using exec()
) is a
Session
object. It implements __iter__
, so it is usually treated just as an
iterator of IDs.
Paging is handled transparently by the session, with additional API requests made
lazily as needed. The page size can be controlled with the rows
parameter.
first = next(iter(query(rows=1)))
Progress Bar¶
The Session.iquery()
method provides a progress bar indicating the number of API
requests being made. It requires the tqdm
package be installed to track the
progress of the query interactively.
results = query().iquery()