API Documentation
RCSB Search API
-
class
rcsbsearch.
Attr
(attribute: str) A search attribute, e.g. "rcsb_entry_container_identifiers.entry_id"
Terminals can be constructed from Attr objects using either a functional syntax, which mirrors the API operators, or Python operators.
Rather than returning bool as usual, these operators return Terminals.
Pre-instantiated attributes are available from the rcsbsearch.rcsb_attributes object. These are generally easier to use than constructing Attr objects by hand. A complete list of valid attributes is available in the schema.
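The idea that comparison operators build predicates instead of returning bool can be sketched in a few lines. This is a minimal illustration, not the rcsbsearch implementation: MiniAttr and MiniTerminal are hypothetical stand-ins, the dispatch of == to exact_match for strings and equals for other values is an assumption, and "example.numeric_attribute" is a made-up attribute name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MiniTerminal:
    """Hypothetical stand-in for a terminal predicate (not rcsbsearch.Terminal)."""
    attribute: str
    operator: str
    value: Any


class MiniAttr:
    """Hypothetical stand-in for Attr: comparisons build predicates, not bools."""

    def __init__(self, attribute: str) -> None:
        self.attribute = attribute

    # Functional syntax, using operator names listed in this reference.
    def exact_match(self, value: str) -> MiniTerminal:
        return MiniTerminal(self.attribute, "exact_match", value)

    def equals(self, value: Any) -> MiniTerminal:
        return MiniTerminal(self.attribute, "equals", value)

    def greater(self, value: Any) -> MiniTerminal:
        return MiniTerminal(self.attribute, "greater", value)

    # Operator syntax delegates to the functional form.
    # Assumption: strings use exact_match, everything else uses equals.
    def __eq__(self, value: Any) -> MiniTerminal:  # type: ignore[override]
        if isinstance(value, str):
            return self.exact_match(value)
        return self.equals(value)

    def __gt__(self, value: Any) -> MiniTerminal:
        return self.greater(value)


method = MiniAttr("exptl.method")
score = MiniAttr("example.numeric_attribute")  # made-up attribute name

by_operator = method == "X-RAY DIFFRACTION"
by_function = method.exact_match("X-RAY DIFFRACTION")
numeric = score > 2
```

Both spellings produce the same predicate object, which is why the two syntaxes can be mixed freely in this sketch.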
__contains__
(value: Union[str, List[str], rcsbsearch.search.Value[str], rcsbsearch.search.Value[List[str]]]) → rcsbsearch.search.Terminal Maps to contains_words or contains_phrase depending on the value passed.
"value" in attr maps to attr.contains_phrase("value") for simple values.
["value"] in attr maps to attr.contains_words(["value"]) for lists and tuples.
-
__eq__
(value: Attr) → bool
-
__eq__
(value: Union[str, int, float, datetime.date, Value[str], Value[int], Value[float], Value[date]]) → rcsbsearch.search.Terminal Return self==value.
-
__ge__
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Return self>=value.
-
__gt__
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Return self>value.
-
__le__
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Return self<=value.
-
__lt__
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Return self<value.
-
__ne__
(value: Attr) → bool
-
__ne__
(value: Union[str, int, float, datetime.date, Value[str], Value[int], Value[float], Value[date]]) → rcsbsearch.search.Terminal Return self!=value.
-
__weakref__
¶ list of weak references to the object (if defined)
-
contains_phrase
(value: Union[str, rcsbsearch.search.Value[str]]) → rcsbsearch.search.Terminal¶ Match an exact phrase
-
contains_words
(value: Union[str, rcsbsearch.search.Value[str], List[str], rcsbsearch.search.Value[List[str]]]) → rcsbsearch.search.Terminal¶ Match any word within the string.
Words are split at whitespace. All results which match any word are returned, with results matching more words sorted first.
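The matching rule described for contains_words can be approximated locally. The sketch below is only an illustration of "split at whitespace, keep anything matching at least one word, rank by number of matching words"; the real ranking is done by the search service, and the case-insensitive comparison and the sample strings are assumptions.

```python
from typing import List


def rank_by_matching_words(query: str, documents: List[str]) -> List[str]:
    """Return documents matching any query word, most matching words first."""
    words = set(query.lower().split())  # split the query at whitespace
    scored = []
    for doc in documents:
        hits = len(words & set(doc.lower().split()))
        if hits:  # keep results that match at least one word
            scored.append((hits, doc))
    # list.sort is stable, so documents with equal hit counts keep input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


sample = [
    "alpha tubulin",
    "hemoglobin",
    "actin filament",
    "tubulin actin complex",
]
ranked = rank_by_matching_words("tubulin actin", sample)
```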
-
equals
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Attribute == value
-
exact_match
(value: Union[str, rcsbsearch.search.Value[str]]) → rcsbsearch.search.Terminal¶ Exact match with the value
-
exists
() → rcsbsearch.search.Terminal¶ Attribute is defined for the structure
-
greater
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Attribute > value
-
greater_or_equal
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Attribute >= value
-
in_
(value: Union[List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …], rcsbsearch.search.Value[List[str]], rcsbsearch.search.Value[List[int]], rcsbsearch.search.Value[List[float]], rcsbsearch.search.Value[List[datetime.date]], rcsbsearch.search.Value[Tuple[str, …]], rcsbsearch.search.Value[Tuple[int, …]], rcsbsearch.search.Value[Tuple[float, …]], rcsbsearch.search.Value[Tuple[datetime.date, …]]]) → rcsbsearch.search.Terminal¶ Attribute is contained in the list of values
-
less
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Attribute < value
-
less_or_equal
(value: Union[int, float, datetime.date, rcsbsearch.search.Value[int], rcsbsearch.search.Value[float], rcsbsearch.search.Value[datetime.date]]) → rcsbsearch.search.Terminal¶ Attribute <= value
-
range
(value: Union[List[int], Tuple[int, int]]) → rcsbsearch.search.Terminal¶ Attribute is within the specified half-open range
- Parameters
value – lower and upper bounds [a, b)
-
range_closed
(value: Union[List[int], Tuple[int, int], rcsbsearch.search.Value[List[int]], rcsbsearch.search.Value[Tuple[int, int]]]) → rcsbsearch.search.Terminal¶ Attribute is within the specified closed range
- Parameters
value – lower and upper bounds [a, b]
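The difference between range and range_closed is only whether the upper bound is included. A small sketch of the two interval tests, using hypothetical helper names that are not part of rcsbsearch:

```python
def in_half_open(x: int, lower: int, upper: int) -> bool:
    """True if x lies in [lower, upper): lower bound included, upper excluded."""
    return lower <= x < upper


def in_closed(x: int, lower: int, upper: int) -> bool:
    """True if x lies in [lower, upper]: both bounds included."""
    return lower <= x <= upper


# The two tests disagree only at the upper bound.
upper_in_half_open = in_half_open(10, 1, 10)
upper_in_closed = in_closed(10, 1, 10)
```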
-
-
class
rcsbsearch.
Group
(operator: typing_extensions.Literal[and, or], nodes: Iterable[rcsbsearch.search.Query] = ())¶ AND and OR combinations of queries
-
__and__
(other: rcsbsearch.search.Query) → rcsbsearch.search.Query¶ Intersection: a & b
-
__invert__
()¶ Negation: ~a
-
__or__
(other: rcsbsearch.search.Query) → rcsbsearch.search.Query¶ Union: a | b
-
_assign_ids
(node_id=0) → Tuple[rcsbsearch.search.Query, int]¶ Assign node_ids sequentially for all terminal nodes
This is a helper for the Query.assign_ids() method.
- Parameters
node_id – Id to assign to the first leaf of this query
- Returns
query: The modified query, with node_ids assigned. node_id: The next available node_id.
- Return type
Tuple[rcsbsearch.search.Query, int]
-
to_dict
()¶ Get dictionary representing this query
-
-
class
rcsbsearch.
Query
¶ Base class for all types of queries.
Queries can be combined using set operators:
q1 & q2: Intersection (AND)
q1 | q2: Union (OR)
~q1: Negation (NOT)
q1 - q2: Difference (implemented as q1 & ~q2)
q1 ^ q2: Symmetric difference (XOR, implemented as (q1 & ~q2) | (~q1 & q2))
Note that only AND, OR, and negation of terminals are directly supported by the API, so other operations may be slower.
Queries can be executed by calling them as functions (list(query())) or using the exec function.
Queries are immutable, and all modifying functions return new instances.
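The way difference and symmetric difference reduce to AND, OR and NOT can be checked with a small local model. This is a sketch under stated assumptions: Q, Pred, Not and Combo are hypothetical classes that evaluate against in-memory dicts rather than calling the API, and the sample records are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class Q:
    """Hypothetical query base class; every operator returns a new object."""

    def matches(self, record: Dict) -> bool:
        raise NotImplementedError

    def __and__(self, other: "Q") -> "Q":
        return Combo("and", (self, other))

    def __or__(self, other: "Q") -> "Q":
        return Combo("or", (self, other))

    def __invert__(self) -> "Q":
        return Not(self)

    def __sub__(self, other: "Q") -> "Q":
        return self & ~other  # difference

    def __xor__(self, other: "Q") -> "Q":
        return (self & ~other) | (~self & other)  # symmetric difference


@dataclass(frozen=True)
class Pred(Q):
    test: Callable[[Dict], bool]

    def matches(self, record: Dict) -> bool:
        return self.test(record)


@dataclass(frozen=True)
class Not(Q):
    inner: Q

    def matches(self, record: Dict) -> bool:
        return not self.inner.matches(record)


@dataclass(frozen=True)
class Combo(Q):
    operator: str
    nodes: Tuple[Q, ...]

    def matches(self, record: Dict) -> bool:
        results = [node.matches(record) for node in self.nodes]
        return all(results) if self.operator == "and" else any(results)


xray = Pred(lambda r: r["method"] == "X-RAY DIFFRACTION")
recent = Pred(lambda r: r["year"] >= 2020)

records = [  # invented sample data
    {"id": "A", "method": "X-RAY DIFFRACTION", "year": 2021},
    {"id": "B", "method": "X-RAY DIFFRACTION", "year": 2001},
    {"id": "C", "method": "SOLUTION NMR", "year": 2022},
    {"id": "D", "method": "SOLUTION NMR", "year": 1999},
]


def select(query: Q) -> list:
    return [r["id"] for r in records if query.matches(r)]


difference_ids = select(xray - recent)
xor_ids = select(xray ^ recent)
```

Because the derived operators expand into larger trees of the three basic ones, a query built with - or ^ contains more nodes than it appears to.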
-
__and__
(other: rcsbsearch.search.Query) → rcsbsearch.search.Query¶ Intersection: a & b
-
__call__
(return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance] = 'entry', rows: int = 100) → rcsbsearch.search.Session¶ Evaluate this query and return an iterator of all result IDs
-
abstract
__invert__
() → rcsbsearch.search.Query¶ Negation: ~a
-
__or__
(other: rcsbsearch.search.Query) → rcsbsearch.search.Query¶ Union: a | b
-
__sub__
(other: rcsbsearch.search.Query) → rcsbsearch.search.Query¶ Difference: a - b
-
__weakref__
¶ list of weak references to the object (if defined)
-
__xor__
(other: rcsbsearch.search.Query) → rcsbsearch.search.Query¶ Symmetric difference: a ^ b
-
abstract
_assign_ids
(node_id=0) → Tuple[rcsbsearch.search.Query, int]¶ Assign node_ids sequentially for all terminal nodes
This is a helper for the Query.assign_ids() method.
- Parameters
node_id – Id to assign to the first leaf of this query
- Returns
query: The modified query, with node_ids assigned. node_id: The next available node_id.
- Return type
Tuple[rcsbsearch.search.Query, int]
-
and_
(other: Query) → Query
-
and_
(other: Union[str, Attr]) → PartialQuery Extend this query with an additional attribute via an AND
-
assign_ids
() → rcsbsearch.search.Query¶ Assign node_ids sequentially for all terminal nodes
- Returns
the modified query, with node_ids assigned sequentially from 0
-
exec
(return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance] = 'entry', rows: int = 100) → rcsbsearch.search.Session¶ Evaluate this query and return an iterator of all result IDs
-
or_
(other: Query) → Query
-
or_
(other: Union[str, Attr]) → PartialQuery Extend this query with an additional attribute via an OR
-
abstract
to_dict
() → Dict¶ Get dictionary representing this query
-
to_json
() → str¶ Get JSON string of this query
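The relationship between to_dict and to_json can be illustrated with the standard json module: build a nested dict for the query tree, then serialize it. The key names below are an assumption loosely modeled on a terminal/group layout and may not match what rcsbsearch actually emits; the helper functions are hypothetical.

```python
import json
from typing import Any, Dict, List


def terminal_dict(attribute: str, operator: str, value: Any, node_id: int = 0) -> Dict[str, Any]:
    """Build a dict for one terminal node (key names are illustrative)."""
    return {
        "type": "terminal",
        "node_id": node_id,
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group_dict(operator: str, nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a dict for an AND/OR group of child nodes (key names are illustrative)."""
    return {"type": "group", "logical_operator": operator, "nodes": nodes}


def to_json(query_dict: Dict[str, Any]) -> str:
    """Serialize the nested dict to a compact JSON string."""
    return json.dumps(query_dict, separators=(",", ":"))


query = group_dict("and", [
    terminal_dict("exptl.method", "exact_match", "X-RAY DIFFRACTION", node_id=0),
    terminal_dict("rcsb_id", "in", ["5T89", "1TIM"], node_id=1),
])
payload = to_json(query)
```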
-
class
rcsbsearch.
Session
(query: rcsbsearch.search.Query, return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance] = 'entry', rows: int = 100)¶ A single query session.
Handles paging the query and parsing results
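Paging can be sketched as a generator that requests fixed-size pages until one comes back short. This is a minimal sketch assuming a hypothetical fetch_page(start, rows) callable; the real Session talks to the RCSB service, and the identifiers below are invented placeholders.

```python
from typing import Callable, Iterator, List


def iter_all_results(
    fetch_page: Callable[[int, int], List[str]], rows: int = 100
) -> Iterator[str]:
    """Yield identifiers page by page until a short or empty page is returned."""
    start = 0
    while True:
        page = fetch_page(start, rows)
        yield from page
        if len(page) < rows:  # last page reached
            break
        start += rows


# Fake backend: seven invented identifiers served in slices.
all_ids = [f"ID{i:02d}" for i in range(7)]
requested_starts: List[int] = []


def fake_fetch(start: int, rows: int) -> List[str]:
    requested_starts.append(start)
    return all_ids[start:start + rows]


collected = list(iter_all_results(fake_fetch, rows=3))
```

The rows parameter controls only the page size, not the total number of results; the generator keeps requesting pages until the backend runs out.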
-
__init__
(query: rcsbsearch.search.Query, return_type: typing_extensions.Literal[entry, assembly, polymer_entity, non_polymer_entity, polymer_instance] = 'entry', rows: int = 100)¶ Initialize self. See help(type(self)) for accurate signature.
-
__iter__
() → Iterator[str] Generator that yields the identifier of each result
-
__weakref__
¶ list of weak references to the object (if defined)
-
static
_extract_identifiers
(query_json: Optional[Dict]) → List[str]¶ Extract identifiers from a JSON response
-
_make_params
(start=0)¶ Generate GET parameters as a dict
-
_single_query
(start=0) → Optional[Dict]¶ Fires a single query
-
iquery
(limit: Optional[int] = None) → List[str]¶ Evaluate the query and display an interactive progress bar.
Requires tqdm.
-
static
make_uuid
() → str¶ Create a new UUID to identify a query
-
rcsb_query_builder_url
() → str¶ URL to view this query on the RCSB website query builder
-
rcsb_query_editor_url
() → str¶ URL to edit this query in the RCSB query editor
-
-
class
rcsbsearch.
Terminal
(attribute: Optional[str] = None, operator: Optional[str] = None, value: Optional[Union[str, int, float, datetime.date, List[str], List[int], List[float], List[datetime.date], Tuple[str, …], Tuple[int, …], Tuple[float, …], Tuple[datetime.date, …]]] = None, service: str = 'text', negation: bool = False, node_id: int = 0)¶ A terminal query node.
Terminals are simple predicates comparing some attribute of a structure to a value.
Examples
>>> Terminal("exptl.method", "exact_match", "X-RAY DIFFRACTION")
>>> Terminal("rcsb_id", "in", ["5T89", "1TIM"])
>>> Terminal(value="tubulin")
A full list of attributes is available in the schema. Operators are documented here.
The Attr class provides a more pythonic way of constructing Terminals.
__invert__
()¶ Negation: ~a
-
__str__
()¶ Return a simplified string representation
Examples
>>> Terminal("attr", "op", "val")
>>> ~Terminal(value="val")
-
_assign_ids
(node_id=0) → Tuple[rcsbsearch.search.Query, int]¶ Assign node_ids sequentially for all terminal nodes
This is a helper for the Query.assign_ids() method.
- Parameters
node_id – Id to assign to the first leaf of this query
- Returns
query: The modified query, with node_ids assigned. node_id: The next available node_id.
- Return type
Tuple[rcsbsearch.search.Query, int]
-
to_dict
()¶ Get dictionary representing this query
-
-
class
rcsbsearch.
TextQuery
(value: str, negation: bool = False)¶ Special case of a Terminal for free-text queries
-
__init__
(value: str, negation: bool = False)¶ Search for the string value anywhere in the text
- Parameters
value – free-text query
negation – find structures without the pattern
-
-
class
rcsbsearch.
Value
(value: T)¶ Represents a value in a query.
In most cases values are unnecessary and can be replaced directly by the python value.
Values can also be used if the Attr object appears on the right:
Value("4HHB") == Attr("rcsb_entry_container_identifiers.entry_id")
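A wrapper that sits on the left-hand side only needs to hand the comparison back to the attribute, flipping the direction for inequalities. The sketch below uses hypothetical MiniValue and MiniAttr classes and plain tuples in place of Terminals; it illustrates the delegation pattern and is not the rcsbsearch implementation.

```python
from typing import Any, Tuple

Predicate = Tuple[str, str, Any]  # (attribute, operator, value)


class MiniAttr:
    """Hypothetical attribute whose comparisons build predicate tuples."""

    def __init__(self, attribute: str) -> None:
        self.attribute = attribute

    def __eq__(self, value: Any) -> Predicate:  # type: ignore[override]
        return (self.attribute, "exact_match", value)

    def __lt__(self, value: Any) -> Predicate:
        return (self.attribute, "less", value)

    def __gt__(self, value: Any) -> Predicate:
        return (self.attribute, "greater", value)


class MiniValue:
    """Hypothetical wrapper allowing the value to appear on the left."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def __eq__(self, attr: Any) -> Predicate:  # type: ignore[override]
        return attr == self.value

    def __lt__(self, attr: MiniAttr) -> Predicate:
        return attr > self.value  # value < attr is the same as attr > value

    def __gt__(self, attr: MiniAttr) -> Predicate:
        return attr < self.value  # value > attr is the same as attr < value


entry_id = MiniAttr("rcsb_entry_container_identifiers.entry_id")
left_hand = MiniValue("4HHB") == entry_id
right_hand = entry_id == "4HHB"
flipped = MiniValue(3) < MiniAttr("example.numeric_attribute")  # made-up name
```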
-
__eq__
(attr: Value) → bool
-
__eq__
(attr: rcsbsearch.search.Attr) → rcsbsearch.search.Terminal Return self==value.
-
__ge__
(attr: rcsbsearch.search.Attr) → rcsbsearch.search.Terminal¶ Return self>=value.
-
__gt__
(attr: rcsbsearch.search.Attr) → rcsbsearch.search.Terminal¶ Return self>value.
-
__le__
(attr: rcsbsearch.search.Attr) → rcsbsearch.search.Terminal¶ Return self<=value.
-
__lt__
(attr: rcsbsearch.search.Attr) → rcsbsearch.search.Terminal¶ Return self<value.
-
__ne__
(attr: Value) → bool
-
__ne__
(attr: rcsbsearch.search.Attr) → rcsbsearch.search.Terminal Return self!=value.
-
__weakref__
¶ list of weak references to the object (if defined)
-
-
rcsbsearch.
rcsb_attributes
: SchemaGroup = <rcsbsearch.schema.SchemaGroup object> Object with all known RCSB attributes.
This is provided to ease autocompletion as compared to creating Attr objects from strings. For example,
rcsb_attributes.rcsb_nonpolymer_instance_feature_summary.chem_id
is equivalent to
Attr('rcsb_nonpolymer_instance_feature_summary.chem_id')
All attributes in rcsb_attributes can be iterated over.
>>> [a for a in rcsb_attributes if "stoichiometry" in a.attribute]
[Attr(attribute='rcsb_struct_symmetry.stoichiometry')]
Attributes matching a regular expression can also be filtered:
>>> list(rcsb_attributes.search('rcsb.*stoichiometry'))
[Attr(attribute='rcsb_struct_symmetry.stoichiometry')]
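Filtering attribute names by regular expression can be reproduced with the standard re module. This sketch assumes a plain list of names and a hypothetical search_names helper, and it assumes the pattern may match anywhere in the name (re.search semantics); the library's own matching rule may differ.

```python
import re
from typing import Iterable, Iterator


def search_names(names: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield every name in which the pattern matches somewhere."""
    compiled = re.compile(pattern)
    return (name for name in names if compiled.search(name))


# Attribute names taken from the examples in this reference.
known_names = [
    "rcsb_entry_container_identifiers.entry_id",
    "exptl.method",
    "rcsb_struct_symmetry.stoichiometry",
    "rcsb_nonpolymer_instance_feature_summary.chem_id",
]
stoichiometry_names = list(search_names(known_names, "rcsb.*stoichiometry"))
```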