ElasticWrap
Main ElasticWrap module. Holds the actual object as well as a helpful Serializer.
- class elasticwrap.ElasticWrap(*args, hosts: Union[str, List[str]] = 'http://localhost:9200', **kwargs)
Bases:
Elasticsearch
The main object of the ElasticWrap library. All ElasticSearch related activities can be accomplished through this object.
The vanilla elasticsearch API can be used through this object. Wherever a guide uses the Elasticsearch object, you can use this object in its place.
This object also exposes a number of attributes, each leading to another object that you can use to interface with the datastore.
- commits
The object for searching through commit data.
- Type
- indices
The object for dealing with ElasticSearch indices. Can be used to easily search through, create and delete indices.
- Type
- logger
A logger to log with.
See also
elasticsearch.Elasticsearch
The vanilla ElasticSearch object.
elasticwrap.indices.IndicesWrapper, elasticwrap.commits.CommitClient
- save(data_source: DataSource, proj: str, index_name: str, refresh: Literal['true', 'wait_for', 'false'] = 'wait_for')
Create documents from data_source and add them to an index. The index follows this naming format: {proj}_{index_name}. Existing documents with an identical uuid value will be overwritten.
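The naming format and the overwrite rule can be illustrated with a minimal sketch. It does not show how ElasticWrap is implemented: the helpers `index_name_for` and `to_bulk_actions` are hypothetical, and using the uuid as the document `_id` is an assumption made here to reproduce the documented overwrite behaviour. The `_index` / `_id` / `_source` layout is the action format accepted by `elasticsearch.helpers.bulk`.

```python
from typing import Dict, Iterable, Iterator, Tuple


def index_name_for(proj: str, index_name: str) -> str:
    """Combine the project and the user-supplied name as {proj}_{index_name}."""
    return f"{proj}_{index_name}"


def to_bulk_actions(docs: Iterable[dict], proj: str, index_name: str) -> Iterator[dict]:
    """Yield one bulk-style action per document (hypothetical helper)."""
    target = index_name_for(proj, index_name)
    for doc in docs:
        # Assumption: the uuid doubles as the document _id, so saving a
        # document with a known uuid replaces the stored one.
        yield {"_index": target, "_id": doc["uuid"], "_source": doc}


# In-memory stand-in for an index, used only to show the overwrite effect.
store: Dict[Tuple[str, str], dict] = {}
docs = [
    {"uuid": "a1", "message_analyzed": "first version"},
    {"uuid": "a1", "message_analyzed": "second version"},
]
for action in to_bulk_actions(docs, proj="project", index_name="analysis"):
    store[(action["_index"], action["_id"])] = action["_source"]

print(index_name_for("project", "analysis"))
print(len(store))
```

Both documents share a uuid, so the stand-in store ends up holding a single entry containing the second version.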
- Parameters
data_source (DataSource) – An instance of datasource.DataSource that contains the list of documents.
proj (str) – Name of the PAW project.
index_name (str) – The index name provided by the user. Usually a specific data source.
refresh (Literal["true", "wait_for", "false"], optional) –
Whether to force-refresh the index after saving.
If “false”, the method returns instantly after saving and the results may not yet be visible in the data store, but will be at some point in the future. Useful if you just want to submit and do not immediately require the data for anything else.
If “true”, the method will force-refresh the index. This is discouraged by the ES devs [1].
If “wait_for”, the method will wait for a refresh to happen. This is the default value.
[1] Elastic co. “?refresh” https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
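Note that refresh takes one of three strings, not a Python boolean. A caller who wants to catch a wrong value before it reaches the client could validate it up front. The following is only a sketch: `check_refresh` and the `Refresh` alias are hypothetical names and not part of ElasticWrap.

```python
from typing import Literal, get_args

# Mirrors the annotation in the save() signature.
Refresh = Literal["true", "wait_for", "false"]


def check_refresh(value: str) -> str:
    """Return value unchanged if it is one of the accepted refresh settings."""
    allowed = get_args(Refresh)
    if value not in allowed:
        raise ValueError(f"refresh must be one of {allowed}, got {value!r}")
    return value


print(check_refresh("wait_for"))
```

Passing the boolean `True` instead of the string "true" raises `ValueError` in this sketch.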
Examples
Add a document from a dictionary to an index named project_analysis:
>>> source = DictionaryDataSource([{"uuid": "07341f846c0d68026997314a542251ca54151041",
...                                 "message_analyzed": "a commit message"}])
>>> client = ElasticWrap()
>>> client.save(source, proj='project', index_name='analysis')
Add a document from pandas.DataFrame to an index named project_analysis:
>>> import pandas as pd
>>> df = DataFrameDataSource(pd.DataFrame({"uuid": ["07341f846c0d68026997314a542251ca54151041"],
...                                        "message_analyzed": ["a commit message"]},
...                                       index=["07341f846c0d68026997314a542251ca54151041"]))
>>> client = ElasticWrap()
>>> client.save(df, proj='project', index_name='analysis')
- Raises
NotFoundError – If the index you are submitting to does not exist.