DataSources
To convert to and from various data structures, use DataSource and its implementations. For more information, see Saving Data.
DataSource
Interface for dealing with various data sources
- class elasticwrap.data.datasource.DataSource(data: T)
Bases: Generic[T]
Abstract class that represents a data source.
- data
The data for this datasource
- Type
T
Notes
DataSource is a generic class, meaning that it can work on any datatype. For proper typechecking, it is recommended that any implementations of DataSource specify the type of the data they support.
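As a minimal sketch of what specifying the type could look like, the example below uses a local stand-in for DataSource that only mirrors the signatures documented on this page; it is not the library's class. The `Document` alias, the plain-dict stand-in for `PostDocument`, and the `TitleListDataSource` class are all hypothetical names introduced for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, Iterable, Iterator, List, Mapping, TypeVar

T = TypeVar("T")

# Assumption: an ES document is a mapping with at least "_index" and "_source".
Document = Mapping[str, Any]


class DataSource(Generic[T], ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    def __init__(self, data: T) -> None:
        self.data = data

    @staticmethod
    @abstractmethod
    def convert_documents(data: Iterable[Document]) -> Iterator[T]:
        raise NotImplementedError

    @abstractmethod
    def read_documents(self, index_name: str) -> Iterator[Dict[str, Any]]:
        raise NotImplementedError


class TitleListDataSource(DataSource[List[str]]):
    """Hypothetical implementation that pins T to List[str]."""

    @staticmethod
    def convert_documents(data: Iterable[Document]) -> Iterator[List[str]]:
        # Yield a single list holding the "title" field of every document.
        yield [doc["_source"]["title"] for doc in data]

    def read_documents(self, index_name: str) -> Iterator[Dict[str, Any]]:
        # Plain dicts stand in for PostDocument, whose exact shape is assumed.
        for title in self.data:
            yield {"_index": index_name, "_source": {"title": title}}


source = TitleListDataSource(["Dune", "Emma"])
docs = list(source.read_documents("books"))
```

Because the subclass declares `DataSource[List[str]]`, a type checker can flag a call such as `TitleListDataSource(42)` or code that treats `source.data` as anything other than a list of strings.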
- abstract static convert_documents(data: Iterable[Document]) → Iterator[T]
Convert the provided iterator of ES documents into the format that this DataSource manages.
This method expects whole ES documents, meaning that each one should still include the _index and _source keys.
- Parameters
data (Iterable[Document]) – The documents holding the data that should be converted
Notes
This method is accessed statically. Strictly speaking, it does not have to be part of this class, but it is convenient to keep methods related to the same idea in the same place. In general, a user can rely on {Type}DataSource.convert_documents always converting ES documents into the associated type.
This method is intended to work both ways: if you re-upload the converted document, you should create exactly the same document that you originally retrieved. Because of this, the converted result may contain some superfluous data.
- Yields
Iterator – An iterator for the same data, but in a different container.
- Raises
NotImplementedError – Thrown if the implementing class forgets to implement this method.
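The round-trip idea described in the notes above can be sketched with two plain functions. This is an illustration under assumptions, not the library's implementation: the document shape (a dict with `_index` and `_source`), the flat record format, and the helper name `to_documents` are all hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping

# Assumption: an ES document looks like {"_index": ..., "_source": {...}}.
Document = Mapping[str, Any]


def convert_documents(data: Iterable[Document]) -> Iterator[Dict[str, Any]]:
    """Flatten each document into one record, keeping _index alongside the fields."""
    for doc in data:
        record = dict(doc["_source"])
        # Superfluous for analysis, but required to rebuild the document later.
        record["_index"] = doc["_index"]
        yield record


def to_documents(records: Iterable[Mapping[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Hypothetical inverse: rebuild the original document from a flat record."""
    for record in records:
        source = {key: value for key, value in record.items() if key != "_index"}
        yield {"_index": record["_index"], "_source": source}


original = [
    {"_index": "books", "_source": {"title": "Dune", "year": 1965}},
    {"_index": "books", "_source": {"title": "Emma", "year": 1815}},
]
records = list(convert_documents(original))
round_tripped = list(to_documents(records))
```

Here `round_tripped` equals `original`, which is the property the notes ask for. The sketch would lose information if a `_source` field were itself named `_index`, so a real implementation needs a safer way to carry the metadata.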
- abstract read_documents(index_name: str) → Iterator[PostDocument[Mapping[str, Any]]]
Construct Elasticsearch documents from self.data, which will be added to the index index_name.
The implementation depends on the concrete class that extends this class.
- Parameters
index_name (str) – The name of the Elasticsearch index.
- Yields
Iterator[PostDocument[Mapping[str, Any]]] – Iterator of documents that will be used by elasticwrap.ElasticWrap.save().
- Raises
NotImplementedError – Thrown if the implementing class forgets to implement this method.
DataFrameDataSource
Module that holds the DataSource class for Pandas DataFrames
See also
elasticwrap.data.datasource, pandas.DataFrame
- class elasticwrap.data.dataframe.DataFrameDataSource(data: T)
Bases: DataSource[DataFrame]
Concrete class of DataSource for Pandas DataFrame
- static convert_documents(data) → Iterator[DataFrame]
Notes
To keep with the convention, this version of the method yields an iterator, but the iterator always holds exactly one element, because lazy evaluation does not make sense in the context of pandas DataFrames.
- Parameters
data (Iterable[Document]) – The iterable of elasticsearch documents to convert.
- Yields
Iterator[pd.DataFrame] – A 1-length iterator holding the resulting pandas DataFrame.
- read_documents(index_name: str) → Iterator[PostDocument[Mapping[str, Any]]]
Construct Elasticsearch documents from self.data, which will be added to the index index_name.
This method overrides elasticwrap.data.datasource.DataSource.read_documents().
- Parameters
index_name (str) – The name of the Elasticsearch index.
- Yields
Iterator[PostDocument[Mapping[str, Any]]] – Iterator of documents that will be used by documents.DocumentsWrapper.create_documents().
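The two DataFrame methods described above might behave roughly like the sketch below. It is written as free functions with assumed shapes: documents are plain dicts with `_index` and `_source`, and plain dicts stand in for `PostDocument`. For brevity it keeps only the `_source` fields, so it does not show how the real class preserves metadata.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping

import pandas as pd

# Assumption: an ES document looks like {"_index": ..., "_source": {...}}.
Document = Mapping[str, Any]


def convert_documents(data: Iterable[Document]) -> Iterator[pd.DataFrame]:
    # Every document must be consumed before the frame exists,
    # so exactly one DataFrame is yielded.
    yield pd.DataFrame.from_records([dict(doc["_source"]) for doc in data])


def read_documents(frame: pd.DataFrame, index_name: str) -> Iterator[Dict[str, Any]]:
    # One document per row; the output shape is an assumption.
    for row in frame.to_dict(orient="records"):
        yield {"_index": index_name, "_source": row}


hits = [
    {"_index": "books", "_source": {"title": "Dune", "year": 1965}},
    {"_index": "books", "_source": {"title": "Emma", "year": 1815}},
]
frames = list(convert_documents(hits))
frame = frames[0]
documents = list(read_documents(frame, "books"))
```

A caller that wants the DataFrame directly can use `next(convert_documents(hits))`, since the iterator never holds more than one frame.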
DictionaryDataSource
Module that holds the DataSource class for arbitrary python dictionaries
- class elasticwrap.data.dictionary.DictionaryDataSource(data: T)
Bases: DataSource[Iterable[Dict]]
Concrete class of DataSource for Python dictionary
- static convert_documents(data) → Iterator[Dict]
Convert the provided ES documents into regular Python dictionaries.
- Parameters
data (Iterable[Document]) – The documents to convert to a regular Dict
- Yields
Iterator[Dict] – The converted documents. This is essentially just the data found in _source.
- read_documents(index_name: str)
Construct Elasticsearch documents from self.data, which will be added to the index index_name.
This method overrides elasticwrap.data.datasource.DataSource.read_documents().
- Parameters
index_name (str) – The name of the Elasticsearch index.
- Yields
Iterator[PostDocument[Dict[str, Any]]] – Iterator of documents that will be used by documents.DocumentsWrapper.create_documents().