Utilities
Utility methods for use with ElasticWrap or outside of it.
Helpers
A collection of useful functions and data structures.
- class elasticwrap.utils.helpers.PropagatingThread(*args, **kwargs)
Bases: Thread
A thread variant that stores any exception raised by its activity and re-raises it on join().
- join(timeout=None)
Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call is_alive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.
When the timeout argument is not present or None, the operation will block until the thread terminates.
A thread can be join()ed many times.
join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.
- run()
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with positional and keyword arguments taken from the args and kwargs arguments, respectively.
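The store-and-rethrow behaviour can be sketched as a small threading.Thread subclass. This is a minimal illustration that assumes the exception is captured inside run() and re-raised from join(); the class name PropagatingThreadSketch and the exc attribute are hypothetical and not taken from ElasticWrap's implementation.

```python
import threading
from typing import Any, Optional


class PropagatingThreadSketch(threading.Thread):
    """Hypothetical sketch: remember an exception from run() and re-raise it in join()."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        # Initialised up front so join() never hits a missing attribute.
        self.exc: Optional[BaseException] = None

    def run(self) -> None:
        try:
            # threading.Thread.run calls target(*args, **kwargs) if a target was given.
            super().run()
        except BaseException as error:
            # Store the exception instead of letting threading's excepthook report it.
            self.exc = error

    def join(self, timeout: Optional[float] = None) -> None:
        super().join(timeout)
        # Re-raise the stored exception (if any) in the joining thread.
        if self.exc is not None:
            raise self.exc
```

If join() times out while run() is still executing, exc is normally still None and nothing is raised yet; as with a plain Thread, check is_alive() afterwards.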
- class elasticwrap.utils.helpers.RecursiveDict(*args, parent: Optional[Self] = None, parent_key: Optional[Self] = None, **kwargs)
Bases: defaultdict
A defaultdict that automatically sets the value of an unknown key to a new RecursiveDict the first time the key is accessed.
Additionally, if deleting a key leaves a RecursiveDict empty, that dict's key is automatically deleted from its parent. This cascades up the tree until either the root is reached or a parent still has other children.
Example
>>> a = RecursiveDict()
>>> a['b']['c']['d']['e'] = 4
>>> a['b']['f'] = 3
>>> a
RecursiveDict(..., {'b': RecursiveDict(..., {'f': 3, 'c': RecursiveDict(..., {'d': RecursiveDict(..., {'e': 4})})})})
>>> del a['b']['c']['d']['e']
>>> a
RecursiveDict(..., {'b': RecursiveDict(..., {'f': 3})})
See also
defaultdict
- args
Positional arguments that were passed to this dict; they are also passed to any automatically generated children.
- kwargs
Keyword arguments that were passed to this dict; they are also passed to any automatically generated children.
- parent
The parent of this dict. If set to None, that means this is the root of the tree.
- Type
Optional[Self]
- parent_key
The key with which this dict was stored in the parent.
- Type
TYPE
Notes
On a (shallow) copy operation, the tree partially falls apart. If you copy the root of a layered RecursiveDict tree, the children are shared with the original and stay linked to each other, but the first-level children remain linked to the original root rather than to the copy. This results in the following behavior:
>>> a[1][2][3] = 4
>>> b = a.copy()
>>> del a[1][2][3]
>>> a  # -> {}
>>> b  # -> {1: {}}
If you make a deep copy, the structure stays intuitive and the copied children are linked to their copied parents as expected.
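A simplified sketch of the auto-creation and upward pruning described above, assuming a defaultdict subclass that overrides __missing__ and __delitem__. RecursiveDictSketch is hypothetical: it does not forward args and kwargs to children and does not reproduce ElasticWrap's repr or copy behaviour.

```python
from collections import defaultdict
from typing import Any, Hashable, Optional


class RecursiveDictSketch(defaultdict):
    """Hypothetical, simplified stand-in for RecursiveDict."""

    def __init__(self, *args: Any, parent: Optional["RecursiveDictSketch"] = None,
                 parent_key: Optional[Hashable] = None, **kwargs: Any) -> None:
        # default_factory is None because missing keys are handled in __missing__.
        super().__init__(None, *args, **kwargs)
        self.parent = parent
        self.parent_key = parent_key

    def __missing__(self, key: Hashable) -> "RecursiveDictSketch":
        # Create a child that knows its parent and the key it is stored under.
        child = RecursiveDictSketch(parent=self, parent_key=key)
        self[key] = child
        return child

    def __delitem__(self, key: Hashable) -> None:
        super().__delitem__(key)
        # If this dict is now empty, remove it from its parent. The parent's
        # __delitem__ repeats this check, so the pruning walks up the tree.
        if not self and self.parent is not None and self.parent.get(self.parent_key) is self:
            del self.parent[self.parent_key]
```

The identity check on the parent uses .get(), which does not trigger __missing__, so pruning never creates new children as a side effect.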
- elasticwrap.utils.helpers.get_from_dict_dot(d: Mapping[str, Any], key: str) → Any
Get a value from a nested dictionary using a dot-notated key.
Example
>>> key = "a.b.c"
>>> d = {"a": {"b": {"c": 1}}}
>>> get_from_dict_dot(d, key) == 1
True
- Parameters
d (Mapping[str, Any]) – The dictionary to get from
key (str) – The dot notated string to get with
- Returns
The value found at the dot-notated key.
- Return type
Any
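One way to implement dot-notation lookup is to split the key on "." and index one level per segment. In this hypothetical sketch (get_from_dict_dot_sketch), a missing segment raises KeyError; how the real function handles missing keys is not documented here.

```python
from functools import reduce
from typing import Any, Mapping


def get_from_dict_dot_sketch(d: Mapping[str, Any], key: str) -> Any:
    # Descend one level per dot-separated segment; a missing segment raises KeyError.
    return reduce(lambda current, part: current[part], key.split("."), d)
```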
- elasticwrap.utils.helpers.merge_dicts(key: str, source: Mapping[str, Any], dest: MutableMapping[str, Any])
Merges the dictionary stored under key in source into the one stored under key in dest. The merge happens in place (dest is modified).
- Parameters
key (str) – The key to merge on.
source (Mapping[str, Any]) – The “source” of the data.
dest (MutableMapping[str, Any]) – The dictionary to put the data in.
Notes
If the key is not found in the source, nothing happens.
If the key is found in the source but not in the dest, a shallow copy of the source value is put directly in the dest.
If the key is found in both the source and the dest, the two dictionaries are merged together. In case of a conflict, the dest is assumed to be canonical and its value is kept.
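The three cases in the notes can be sketched as follows. merge_dicts_sketch is hypothetical, and it assumes a one-level merge (nested dictionaries under key are not merged recursively), which the notes do not specify.

```python
import copy
from typing import Any, Mapping, MutableMapping


def merge_dicts_sketch(key: str, source: Mapping[str, Any], dest: MutableMapping[str, Any]) -> None:
    # Case 1: key missing from source, so there is nothing to merge.
    if key not in source:
        return
    # Case 2: key only in source, so store a shallow copy in dest.
    if key not in dest:
        dest[key] = copy.copy(source[key])
        return
    # Case 3: key in both; add source entries but never overwrite dest's (dest is canonical).
    for sub_key, value in source[key].items():
        dest[key].setdefault(sub_key, value)
```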
- elasticwrap.utils.helpers.normalize_from_keyword(kw: MutableMapping[str, Any]) → None
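Setting query={“from”: …} would pass ‘from’ as a keyword argument instead of ‘from_’, the name the Elasticsearch Python client expects because from is a reserved word in Python. This function normalizes kw so that ‘from_’ is used.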
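A minimal sketch of the renaming, assuming kw is the mapping that will later be unpacked as keyword arguments into a client search call. normalize_from_keyword_sketch is hypothetical, and the documentation does not say what happens when both 'from' and 'from_' are present; this sketch lets 'from' win.

```python
from typing import Any, MutableMapping


def normalize_from_keyword_sketch(kw: MutableMapping[str, Any]) -> None:
    # "from" is a Python keyword, so the client-side argument is named "from_".
    if "from" in kw:
        # Move the value in place; an existing "from_" entry is overwritten.
        kw["from_"] = kw.pop("from")
```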
- elasticwrap.utils.helpers.pop_transport_kwargs(kw: MutableMapping[str, Any]) → Dict[str, Any]
Pop the kwargs commonly used to specify parameters for network operations out of kw.
- Parameters
kw (MutableMapping[str, Any]) – The kwargs to extract from
- Returns
The extracted kwargs
- Return type
Dict[str, Any]
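A sketch of the pop-and-collect pattern the function name suggests. The key list TRANSPORT_KEYS and the name pop_transport_kwargs_sketch are hypothetical; the actual set of keys extracted by pop_transport_kwargs is not listed in this documentation.

```python
from typing import Any, Dict, MutableMapping

# Hypothetical key list; the real function's set of transport keys may differ.
TRANSPORT_KEYS = ("request_timeout", "headers", "opaque_id")


def pop_transport_kwargs_sketch(kw: MutableMapping[str, Any]) -> Dict[str, Any]:
    # Remove each known transport key from kw (if present) and collect it in a new dict.
    return {k: kw.pop(k) for k in TRANSPORT_KEYS if k in kw}
```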
scan_pit
- class elasticwrap.utils.scan_pit.PitError(pit_id: str, *args: Any, **kwargs: Any)
Bases: Exception
An error raised when something goes wrong during a point in time (PIT) search.
- pit_id
The pit_id that the search was performed with.
- Type
str
- scan_pit.scan_pit(index: str, initial_keep_alive: str = '5m', extend_keep_alive: str = '1m', query: Dict[str, Any] = {}, raise_on_error: bool = True, size: int = 1000, request_timeout: Optional[float] = None, clear_pit: bool = True, page_max=0, **kwargs: Any) → Iterator[List[Document]]
Use the Point in time API to paginate search results.
Arguments:
- client : Elasticsearch
The Elasticsearch client object used to connect to the cluster.
- index : str
The Elasticsearch index to gather data from.
- initial_keep_alive : str
How long a consistent view of the index should initially be maintained. Defaults to '5m'.
- extend_keep_alive : str
How much the time to live of the point in time is extended after each search. Defaults to '1m'.
- query : Dict[str, Any]
The request body for the Elasticsearch search API. Defaults to an empty dict.
- raise_on_error : bool
If True, raise a PitError when an error is encountered (for example, when some shards fail to execute). Defaults to True.
- size : int
Size (per shard) of the batch sent at each iteration. Defaults to 1000.
- request_timeout : Optional[float]
Explicit timeout for each call to search_pit_helper. Defaults to None.
- clear_pit : bool
If True, explicitly delete the PIT via the clear point in time API when the method finishes, on completion or on error. Defaults to True.
- page_max : int
The maximum number of pages to retrieve. If set to 0 (the default), all pages are retrieved.
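These arguments fit the usual PIT pagination loop: open a point in time, search against it sorted by a unique tiebreaker, resume each page with search_after, extend the keep-alive on every request, and close the PIT at the end. The sketch assumes an elasticsearch-py 8.x style client (open_point_in_time, search with pit/sort/search_after keyword arguments, close_point_in_time(id=...)). scan_pit_sketch and PitErrorSketch are hypothetical; the sketch omits request_timeout and **kwargs, yields raw hit dicts instead of Document objects, and defaults query to None to avoid a shared mutable default.

```python
from typing import Any, Dict, Iterator, List, Optional


class PitErrorSketch(Exception):
    """Hypothetical stand-in for PitError that remembers the PIT id."""

    def __init__(self, pit_id: str, *args: Any) -> None:
        super().__init__(*args)
        self.pit_id = pit_id


def scan_pit_sketch(
    client: Any,
    index: str,
    initial_keep_alive: str = "5m",
    extend_keep_alive: str = "1m",
    query: Optional[Dict[str, Any]] = None,
    raise_on_error: bool = True,
    size: int = 1000,
    clear_pit: bool = True,
    page_max: int = 0,
) -> Iterator[List[Dict[str, Any]]]:
    # Open a PIT: a consistent view of the index kept alive for initial_keep_alive.
    pit_id = client.open_point_in_time(index=index, keep_alive=initial_keep_alive)["id"]
    search_after: Optional[List[Any]] = None
    pages = 0
    try:
        while page_max == 0 or pages < page_max:
            # Start from the caller's request body, e.g. {"query": {...}}.
            request: Dict[str, Any] = dict(query or {})
            # No index argument: the PIT already determines what is searched.
            # Sending keep_alive with every request extends the PIT's lifetime.
            request["pit"] = {"id": pit_id, "keep_alive": extend_keep_alive}
            request["size"] = size
            # _shard_doc is a cheap tiebreaker that makes the sort order unique.
            request.setdefault("sort", [{"_shard_doc": "asc"}])
            if search_after is not None:
                request["search_after"] = search_after
            resp = client.search(**request)
            # elasticsearch-py 8.x wraps the JSON in a response object exposing .body.
            body = getattr(resp, "body", resp)
            # Searches may return an updated PIT id; always keep the latest one.
            pit_id = body.get("pit_id", pit_id)
            if raise_on_error and body["_shards"].get("failed", 0) > 0:
                raise PitErrorSketch(pit_id, "some shards failed to execute")
            hits = body["hits"]["hits"]
            if not hits:
                break
            yield hits
            pages += 1
            # The next page starts after the sort values of this page's last hit.
            search_after = hits[-1]["sort"]
    finally:
        if clear_pit:
            client.close_point_in_time(id=pit_id)
```

Because the cleanup sits in a finally block, the PIT is also closed when the caller stops iterating early or an error is raised mid-scan.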
See also
elasticsearch.helpers.scan
PIT is currently the recommended way to paginate search results in Elasticsearch [1]. Because searches often return many documents, ElasticWrap always defaults to this kind of search. The function is mostly intended for internal use, but it can also help when you want more manual control over a search.
[1] Elastic Co., “Paginate search results.” https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html