ElasticWrap

Main ElasticWrap module. It holds the ElasticWrap client class as well as a helper serializer.

class elasticwrap.ElasticWrap(*args, hosts: Union[str, List[str]] = 'http://localhost:9200', **kwargs)

Bases: Elasticsearch

The main object of the ElasticWrap library. All Elasticsearch-related activities can be accomplished through this object.

Because this class subclasses Elasticsearch, the vanilla Elasticsearch API is available through it. Wherever a guide uses the Elasticsearch object, you can use this object instead.

The attributes listed below expose helper objects that you use to interact with the data store.

commits

The object for searching through commit data.

Type

CommitClient

indices

The object for working with Elasticsearch indices. Use it to search through, create, and delete indices.

Type

IndicesWrapper

logger

A logger for emitting log messages.

See also

elasticsearch.Elasticsearch

The vanilla ElasticSearch object.

elasticwrap.indices.IndicesWrapper, elasticwrap.commits.CommitClient

save(data_source: DataSource, proj: str, index_name: str, refresh: Literal['true', 'wait_for', 'false'] = 'wait_for')

Create documents from data_source and add them to an index. The index name follows the format {proj}_{index_name}.

Existing documents with the same uuid value are overwritten.
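The naming rule and the overwrite behaviour described above can be sketched in plain Python. This is only an illustration, not ElasticWrap's implementation: full_index_name and upsert_by_uuid are hypothetical helpers, and a dict stands in for the index.

```python
from typing import Dict, List


def full_index_name(proj: str, index_name: str) -> str:
    """Hypothetical helper: build the name using the {proj}_{index_name} format."""
    return f"{proj}_{index_name}"


def upsert_by_uuid(store: Dict[str, dict], docs: List[dict]) -> Dict[str, dict]:
    """Hypothetical stand-in for an index: documents are keyed by their uuid,
    so saving a document with an existing uuid replaces the old one."""
    for doc in docs:
        store[doc["uuid"]] = doc
    return store


store: Dict[str, dict] = {}
upsert_by_uuid(store, [{"uuid": "a1", "message_analyzed": "first version"}])
upsert_by_uuid(store, [{"uuid": "a1", "message_analyzed": "second version"}])

print(full_index_name("project", "analysis"))  # project_analysis
print(len(store))                              # 1
print(store["a1"]["message_analyzed"])         # second version
```

The second call does not add a new entry; it replaces the document that shares the same uuid, which mirrors what save is documented to do.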

Parameters
  • data_source (DataSource) – An instance of datasource.DataSource that contains the list of documents.

  • proj (str) – Name of the PAW project.

  • index_name (str) – The index name provided by the user, usually the name of a specific data source.

  • refresh (Literal["true", "wait_for", "false"], optional) –

    Whether to force-refresh the index after saving.

    If “false”, the method returns immediately after saving. The results may not be visible in the data store yet, but will become visible after a later refresh. Useful if you only want to submit the data and do not need it right away.

    If “true”, the method force-refreshes the index. This is discouraged by the ES devs [1].

    If “wait_for”, the method waits for a refresh to happen. This is the default value.

    [1] Elastic co. “?refresh” https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html

Examples

Add a document from a dictionary to an index named project_analysis:

>>> source = DictionaryDataSource([{"uuid": "07341f846c0d68026997314a542251ca54151041",
...                                 "message_analyzed": "a commit message"}])
>>> client = ElasticWrap()
>>> client.save(source, proj='project', index_name='analysis')

Add a document from pandas.DataFrame to an index named project_analysis:

>>> import pandas as pd
>>> df = DataFrameDataSource(pd.DataFrame({"uuid": ["07341f846c0d68026997314a542251ca54151041"],
...                                        "message_analyzed": ["a commit message"]},
...                                        index=["07341f846c0d68026997314a542251ca54151041"]))
>>> client = ElasticWrap()
>>> client.save(df, proj='project', index_name='analysis')

Raises

NotFoundError – If the index you are submitting to does not exist.

class elasticwrap.WrapSerializer

Bases: JsonSerializer

A helper serializer for ElasticWrap. Its only job is to ensure that Python sets can be passed along to the elasticsearch-py library without issue.

default(data: Any) → Any
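
To illustrate the idea behind a set-aware default hook, here is a minimal sketch using only the standard library's json module. It is not WrapSerializer's actual code: SetFriendlyEncoder is a hypothetical name, and the real class extends elasticsearch-py's JsonSerializer rather than json.JSONEncoder. Converting sets to lists is an assumption about how such a hook could work.

```python
import json
from typing import Any


class SetFriendlyEncoder(json.JSONEncoder):
    """Hypothetical encoder: JSON has no set type, so sets become lists."""

    def default(self, o: Any) -> Any:
        # default() is only called for objects json cannot serialize natively.
        if isinstance(o, (set, frozenset)):
            return list(o)  # element order of a set is not guaranteed
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


encoded = json.dumps({"tags": {"bugfix"}}, cls=SetFriendlyEncoder)
print(encoded)  # {"tags": ["bugfix"]}
```

Without such a hook, json.dumps raises TypeError when it meets a set, which is the kind of failure a set-aware serializer is meant to avoid.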