DataSources

To convert to and from various data structures, use DataSource and its implementations. For more information, see Saving Data.

DataSource

Interface for dealing with various data sources

class elasticwrap.data.datasource.DataSource(data: T)

Bases: Generic[T]

Abstract class that represents a data source.

data

The data for this datasource

Type: T

Notes

DataSource is a generic class, meaning that it can work on any datatype. For proper typechecking, it is recommended that any implementations of DataSource specify the type of the data they support.
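As a rough illustration of the note above, the sketch below binds T to a concrete type in a subclass. It uses a local stand-in for DataSource that models only the documented data attribute, so it runs without elasticwrap installed. RecordListDataSource and its total method are hypothetical names made up for this example.

```python
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class DataSource(Generic[T]):
    """Local stand-in for the documented class; only `data` is modelled."""

    def __init__(self, data: T) -> None:
        self.data = data


# Hypothetical implementation that fixes T to a list of str -> int dicts.
class RecordListDataSource(DataSource[List[Dict[str, int]]]):
    def total(self, key: str) -> int:
        # A type checker can infer that self.data is List[Dict[str, int]] here.
        return sum(record[key] for record in self.data)


source = RecordListDataSource([{"count": 2}, {"count": 3}])
result = source.total("count")
```

With the type parameter spelled out, a checker such as mypy can flag a call like RecordListDataSource("not a list") before the code runs.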

abstract static convert_documents(data: Iterable[Document]) → Iterator[T]

Convert the provided iterator of ES documents into the format that this DataSource manages.

This method expects whole ES documents, meaning that each document should still include the _index and _source keys.

Parameters: data (Iterable[Document]) – The documents holding the data that should be converted

Notes

This method is accessed statically. Strictly speaking, it does not have to be part of this class, but it is convenient to keep methods related to the same idea in the same place. In general, a user can rely on {Type}DataSource.convert_documents to convert ES documents into the associated type.

This method is intended to work both ways: if you were to re-upload the converted documents, you should create exactly the same documents as you originally retrieved. Because of this, the converted output may contain some superfluous data.

Yields: Iterator[T] – An iterator over the same data, but in the format that this DataSource manages.
Raises: NotImplementedError – Raised if the implementing class does not implement this method.
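The round-trip idea can be sketched as follows. This is not the library's implementation: it assumes that a document is a plain dict with _index and _source keys, that the target format is a flat dict per document, and that no field inside _source starts with an underscore. rebuild_document is a hypothetical helper added only to show the reverse direction.

```python
from typing import Any, Dict, Iterable, Iterator

# Assumption: a document is a plain dict with "_index" and "_source" keys.
Document = Dict[str, Any]


def convert_documents(data: Iterable[Document]) -> Iterator[Dict[str, Any]]:
    """Flatten each document into one record, keeping its metadata."""
    for doc in data:
        # Metadata such as "_index" is carried along so the document can be
        # rebuilt later; this is the "superfluous" data mentioned above.
        meta = {key: value for key, value in doc.items() if key != "_source"}
        yield {**meta, **doc["_source"]}


def rebuild_document(record: Dict[str, Any]) -> Document:
    """Hypothetical inverse: keys with a leading underscore are metadata."""
    meta = {k: v for k, v in record.items() if k.startswith("_")}
    fields = {k: v for k, v in record.items() if not k.startswith("_")}
    return {**meta, "_source": fields}


hits = [{"_index": "people", "_id": "1", "_source": {"name": "Ada", "age": 36}}]
records = list(convert_documents(hits))
round_tripped = [rebuild_document(record) for record in records]
```

Here round_tripped compares equal to hits, which is the property the note describes. Dropping _index during conversion would make that impossible.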

abstract read_documents(index_name: str) → Iterator[PostDocument[Mapping[str, Any]]]

Construct Elasticsearch documents from self.data. The resulting documents will be added to the index index_name.

The implementation depends on the concrete class that extends this class.

Parameters: index_name (str) – The name of the Elasticsearch index.
Yields: Iterator[PostDocument[Mapping[str, Any]]] – Iterator of documents that will be used by elasticwrap.ElasticWrap.save().
Raises: NotImplementedError – Raised if the implementing class does not implement this method.
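A minimal sketch of what a concrete read_documents might look like, assuming that a postable document can be represented as a dict with _index and _source keys. The real PostDocument type may differ. RecordListSource is a hypothetical class, not part of elasticwrap, and it does not inherit from the real DataSource so that the example stays self-contained.

```python
from typing import Any, Dict, Iterator, List, Mapping


class RecordListSource:
    """Hypothetical source wrapping a list of mappings."""

    def __init__(self, data: List[Mapping[str, Any]]) -> None:
        self.data = data

    def read_documents(self, index_name: str) -> Iterator[Dict[str, Any]]:
        # Yield lazily so that large inputs are never fully materialised.
        for record in self.data:
            # Assumption: a document is a dict with "_index" and "_source".
            yield {"_index": index_name, "_source": dict(record)}


source = RecordListSource([{"name": "Ada"}, {"name": "Grace"}])
docs = source.read_documents("people")
first = next(docs)
```

Because read_documents is a generator, nothing is built until the consumer starts iterating over it.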

DataFrameDataSource

Module that holds the DataSource class for Pandas DataFrames

DictionaryDataSource

Module that holds the DataSource class for arbitrary Python dictionaries