DataSources

In order to convert to and from various data structures, you will want to use DataSource and its implementations. For more info, see Saving Data.

DataSource

Interface for dealing with various data sources

class elasticwrap.data.datasource.DataSource(data: T)

Bases: Generic[T]

Abstract class that represents a data source.

data

The data for this datasource

Type

T

Notes

DataSource is a generic class, meaning that it can wrap data of any type. For proper type checking, it is recommended that every implementation of DataSource specify the type of data it supports.
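As a rough illustration of this recommendation, the sketch below uses a minimal stand-in for DataSource (an assumption, not the real elasticwrap source) and a hypothetical StringListDataSource that pins the type parameter. A type checker such as mypy can then infer what `data` holds inside the subclass.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class DataSource(Generic[T]):
    """Minimal stand-in for elasticwrap's DataSource (assumed shape)."""

    def __init__(self, data: T) -> None:
        # The wrapped data; its type is fixed by the subclass's type argument.
        self.data = data


class StringListDataSource(DataSource[List[str]]):
    """Hypothetical implementation that declares it manages a list of strings."""

    def total_length(self) -> int:
        # Because T is bound to List[str], a type checker knows each item is a str.
        return sum(len(item) for item in self.data)


source = StringListDataSource(["alpha", "beta"])
print(source.total_length())  # 9
```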

abstract static convert_documents(data: Iterable[Document]) → Iterator[T]

Convert the provided iterable of ES documents into the format that this DataSource manages.

This method expects complete ES documents, i.e. each document should still include the _index and _source keys.

Parameters

data (Iterable[Document]) – The documents holding the data that should be converted

Notes

This method is accessed statically. It does not strictly need to be part of this class, but it is convenient to keep methods related to the same idea in the same place. In general, a user can rely on {Type}DataSource.convert_documents always converting ES documents into the associated type.

This method is intended to work in both directions: if you were to re-upload the converted data, you should recreate exactly the documents you originally retrieved. Because of this, the converted data may contain some superfluous fields.
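The two-way property can be sketched as follows. This is a hypothetical converter, not elasticwrap's implementation, and it assumes each ES document is a dict with _index, _id and _source keys. The _index and _id values kept in every converted record are the "superfluous" data that make the round trip possible.

```python
from typing import Any, Dict, Iterable, Iterator

# Assumed shape of an ES document: metadata keys plus the stored _source.
Document = Dict[str, Any]


def convert_documents(data: Iterable[Document]) -> Iterator[Dict[str, Any]]:
    """Hypothetical converter: flatten each document but keep its metadata."""
    for doc in data:
        # Keeping _index and _id is what makes the conversion reversible.
        yield {"_index": doc["_index"], "_id": doc["_id"], **doc["_source"]}


def to_document(record: Dict[str, Any]) -> Document:
    """Hypothetical inverse: rebuild the ES document from a converted record."""
    source = {k: v for k, v in record.items() if k not in ("_index", "_id")}
    return {"_index": record["_index"], "_id": record["_id"], "_source": source}


original = [{"_index": "books", "_id": "1", "_source": {"title": "Dune", "year": 1965}}]
records = list(convert_documents(original))
rebuilt = [to_document(r) for r in records]
print(rebuilt == original)  # True
```

Flattening like this assumes no _source field is itself named _index or _id; a real implementation would need a strategy for such collisions.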

Yields

Iterator[T] – An iterator over the same data, held in the container type that this DataSource manages.

Raises

NotImplementedError – Raised if the implementing class does not override this method.

abstract read_documents(index_name: str) → Iterator[PostDocument[Mapping[str, Any]]]

Construct Elasticsearch documents from self.data, which will be added to the index index_name.

The implementation depends on the concrete class that extends this class.

Parameters

index_name (str) – The name of the Elasticsearch index.

Yields

Iterator[PostDocument[Mapping[str, Any]]] – Iterator of documents that will be used by elasticwrap.ElasticWrap.save().

Raises

NotImplementedError – Raised if the implementing class does not override this method.
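A minimal sketch of how the two abstract methods and their NotImplementedError behaviour fit together. The class body is an assumption rather than the real elasticwrap source, and Document and PostDocument are simplified here to plain mappings. IncompleteDataSource is a hypothetical subclass that forgets to override read_documents.

```python
from typing import Any, Generic, Iterable, Iterator, Mapping, TypeVar

T = TypeVar("T")
Document = Mapping[str, Any]      # assumed: a raw ES hit
PostDocument = Mapping[str, Any]  # assumed: a document ready to be indexed


class DataSource(Generic[T]):
    """Sketch of the abstract interface described above (not the real source)."""

    def __init__(self, data: T) -> None:
        self.data = data

    @staticmethod
    def convert_documents(data: Iterable[Document]) -> Iterator[T]:
        # Implementations must override this; the base only signals the omission.
        raise NotImplementedError

    def read_documents(self, index_name: str) -> Iterator[PostDocument]:
        raise NotImplementedError


class IncompleteDataSource(DataSource[list]):
    """Hypothetical subclass that forgot to override read_documents."""


source = IncompleteDataSource([{"a": 1}])
try:
    source.read_documents("my-index")
    outcome = "implemented"
except NotImplementedError:
    outcome = "not implemented"
print(outcome)  # not implemented
```

Since the only base in this sketch is Generic[T], nothing prevents the incomplete subclass from being instantiated; the error surfaces only when the missing method is called.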

DataFrameDataSource

Module that holds the DataSource class for pandas DataFrames.

See also

elasticwrap.data.datasource, pandas.DataFrame

class elasticwrap.data.dataframe.DataFrameDataSource(data: T)

Bases: DataSource[DataFrame]

Concrete DataSource implementation for pandas DataFrames.

static convert_documents(data) → Iterator[DataFrame]

Notes

To keep with the convention, this version of the method still yields an iterator, but the iterator always contains exactly one element. A DataFrame has to be built from all of the documents at once, so lazy evaluation offers no benefit here.

Parameters

data (Iterable[Document]) – The iterable of elasticsearch documents to convert.

Yields

Iterator[pd.DataFrame] – A 1-length iterator holding the resulting pandas DataFrame.
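The one-element iterator convention can be sketched like this. It is a hypothetical standalone function, not the library's method, and keeping _index and _id as columns is an assumption about how the round-trip metadata might be stored.

```python
from typing import Any, Dict, Iterable, Iterator

import pandas as pd

Document = Dict[str, Any]  # assumed: raw ES hit with _index/_id/_source keys


def convert_documents(data: Iterable[Document]) -> Iterator[pd.DataFrame]:
    """Hypothetical sketch: build one DataFrame from all hits, then yield it once."""
    # Every hit must be consumed before the DataFrame exists, so there is
    # nothing to gain from yielding row by row.
    rows = [{"_index": doc["_index"], "_id": doc["_id"], **doc["_source"]} for doc in data]
    yield pd.DataFrame(rows)


hits = [
    {"_index": "books", "_id": "1", "_source": {"title": "Dune", "year": 1965}},
    {"_index": "books", "_id": "2", "_source": {"title": "Emma", "year": 1815}},
]
frames = list(convert_documents(hits))
print(len(frames))      # 1
print(frames[0].shape)  # (2, 4)
```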

read_documents(index_name: str) → Iterator[PostDocument[Mapping[str, Any]]]

Construct Elasticsearch documents from self.data, which will be added to the index index_name.

This method overrides elasticwrap.data.datasource.DataSource.read_documents().

Parameters

index_name (str) – The name of the Elasticsearch index.

Yields

Iterator[PostDocument[Mapping[str, Any]]] – Iterator of documents that will be used by documents.DocumentsWrapper.create_documents().
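A sketch of the opposite direction, turning each DataFrame row into one document for index_name. This is a hypothetical standalone function that takes the DataFrame explicitly instead of reading self.data. It also assumes a document shape of {"_index": ..., "_source": ...}, the action format accepted by elasticsearch.helpers.bulk; elasticwrap's own PostDocument may differ.

```python
from typing import Any, Dict, Iterator

import pandas as pd

# Assumed PostDocument shape; elasticwrap's own type may differ.
PostDocument = Dict[str, Any]


def read_documents(df: pd.DataFrame, index_name: str) -> Iterator[PostDocument]:
    """Hypothetical sketch: turn each DataFrame row into one indexable document."""
    # orient="records" gives one {column: value} dict per row.
    for record in df.to_dict(orient="records"):
        yield {"_index": index_name, "_source": record}


df = pd.DataFrame({"title": ["Dune", "Emma"], "year": [1965, 1815]})
docs = list(read_documents(df, "books"))
print(docs[0]["_source"]["title"])  # Dune
```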

DictionaryDataSource

Module that holds the DataSource class for arbitrary Python dictionaries.

class elasticwrap.data.dictionary.DictionaryDataSource(data: T)

Bases: DataSource[Iterable[Dict]]

Concrete DataSource implementation for iterables of Python dictionaries.

static convert_documents(data) → Iterator[Dict]

Convert ES documents into plain Python dictionaries.

Parameters

data (Iterable[Document]) – The documents to convert to regular dictionaries.

Yields

Iterator[Dict] – The converted documents. Each dictionary is essentially the data found in the document's _source field.
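Because plain dictionaries do not need to be assembled into a single container, this conversion can stay lazy. The sketch below is a hypothetical standalone function that yields only _source; the description above says the real output is "essentially" _source, so the actual method may keep additional fields.

```python
import itertools
from typing import Any, Dict, Iterable, Iterator

Document = Dict[str, Any]  # assumed: raw ES hit with _index/_id/_source keys


def convert_documents(data: Iterable[Document]) -> Iterator[Dict[str, Any]]:
    """Hypothetical sketch: yield the _source of each hit, one at a time."""
    for doc in data:
        yield doc["_source"]


hits = [{"_index": "books", "_id": "1", "_source": {"title": "Dune"}}]
print(list(convert_documents(hits)))  # [{'title': 'Dune'}]

# Laziness: only the first hit of an endless stream is ever read.
first = next(convert_documents(itertools.repeat(hits[0])))
```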

read_documents(index_name: str)

Construct Elasticsearch documents from self.data, which will be added to the index index_name.

This method overrides elasticwrap.data.datasource.DataSource.read_documents().

Parameters

index_name (str) – The name of the Elasticsearch index.

Yields

Iterator[PostDocument[Dict[str, Any]]] – Iterator of documents that will be used by documents.DocumentsWrapper.create_documents().
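A hypothetical sketch of this method for an iterable of dictionaries, written as a standalone function that takes the data explicitly. The {"_index": ..., "_source": ...} shape is an assumption for PostDocument, chosen because it is the action format elasticsearch.helpers.bulk accepts. Whether elasticwrap uses that helper internally is not stated here.

```python
from typing import Any, Dict, Iterable, Iterator

PostDocument = Dict[str, Any]  # assumed bulk-action shape; elasticwrap's may differ


def read_documents(data: Iterable[Dict[str, Any]], index_name: str) -> Iterator[PostDocument]:
    """Hypothetical sketch: wrap each dictionary as a document for index_name."""
    for record in data:
        # Copy the record so later mutations of the input do not leak into the document.
        yield {"_index": index_name, "_source": dict(record)}


records = [{"title": "Dune"}, {"title": "Emma"}]
docs = list(read_documents(records, "books"))
print(docs[1])  # {'_index': 'books', '_source': {'title': 'Emma'}}
```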