Adding retrievers for custom datasources

A likely extension is the addition of further Perceval retrievers, which allows the PAW to read data from more sources.

Defining the retriever

To define a retriever, create a new implementation of the Perceval Backend interface, that is, a class that derives from Perceval's Backend base class and provides the methods it requires.
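The following is a minimal sketch of what such a retriever can look like. So that it runs without Perceval installed, it derives from a small stand-in base class (`BackendStandIn`, hypothetical) rather than from Perceval's own Backend class. The method names (`fetch`, `fetch_items`, `metadata_id`, `metadata_updated_on`, `metadata_category`) and the `CATEGORIES` attribute follow the pattern used by Perceval's bundled backends, but they should be checked against the Perceval version in use. The `TicketTracker` backend and its sample data are invented for illustration.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class BackendStandIn(ABC):
    """Stand-in for Perceval's Backend base class (assumption, not the real one)."""

    CATEGORIES = []

    def __init__(self, origin, tag=None):
        self.origin = origin
        self.tag = tag if tag else origin

    def fetch(self, category):
        """Wrap every raw item in a small metadata envelope."""
        if category not in self.CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        for item in self.fetch_items(category):
            yield {
                "backend_name": type(self).__name__,
                "origin": self.origin,
                "tag": self.tag,
                "id": self.metadata_id(item),
                "category": self.metadata_category(item),
                "updated_on": self.metadata_updated_on(item),
                "data": item,
            }

    @abstractmethod
    def fetch_items(self, category):
        """Yield raw items from the data source."""

    @staticmethod
    @abstractmethod
    def metadata_id(item):
        """Return a unique identifier for the item."""

    @staticmethod
    @abstractmethod
    def metadata_updated_on(item):
        """Return the last update time as a UNIX timestamp."""

    @staticmethod
    @abstractmethod
    def metadata_category(item):
        """Return the category the item belongs to."""


CATEGORY_TICKET = "ticket"


class TicketTracker(BackendStandIn):
    """Hypothetical retriever that reads tickets from an in-memory list."""

    CATEGORIES = [CATEGORY_TICKET]

    def __init__(self, origin, tickets, tag=None):
        super().__init__(origin, tag=tag)
        # A real retriever would talk to an API client here instead.
        self._tickets = tickets

    def fetch_items(self, category):
        yield from self._tickets

    @staticmethod
    def metadata_id(item):
        return str(item["id"])

    @staticmethod
    def metadata_updated_on(item):
        return datetime.fromisoformat(item["updated_at"]).timestamp()

    @staticmethod
    def metadata_category(item):
        return CATEGORY_TICKET


SAMPLE_TICKETS = [
    {"id": 1, "title": "Crash on start", "updated_at": "2024-01-02T03:04:05+00:00"},
    {"id": 2, "title": "Typo in docs", "updated_at": "2024-02-03T04:05:06+00:00"},
]
```

In a real retriever, the stand-in base class would be replaced by the Backend class imported from Perceval, and `fetch_items` would query the actual data source.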

Adding the retriever

For Perceval to recognize your retriever, place it in a sub-folder of perceval/backends/, relative to your pyproject.toml. This is the folder that Perceval scans for retrievers at launch.
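To illustrate why the location matters, here is a sketch of how a package scan of this kind can work, using only the standard library (`pkgutil`, `importlib`, `inspect`). It is not Perceval's actual discovery code. The `demo_backends` package, its modules and the `scan_for_backends` helper are hypothetical names created only for this example.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path

# Throwaway package tree that mimics a backends folder with two sub-folders.
DEMO_FILES = {
    "demo_backends/__init__.py": "",
    "demo_backends/base.py": "class Backend:\n    pass\n",
    "demo_backends/core/__init__.py": "",
    "demo_backends/core/git.py": (
        "from demo_backends.base import Backend\n\n"
        "class Git(Backend):\n    pass\n"
    ),
    "demo_backends/custom/__init__.py": "",
    "demo_backends/custom/tickets.py": (
        "from demo_backends.base import Backend\n\n"
        "class Tickets(Backend):\n    pass\n"
    ),
}


def build_demo_tree(root):
    """Write the demo package files below the given root directory."""
    for rel_path, source in DEMO_FILES.items():
        path = Path(root) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(source)


def scan_for_backends(top_package, base_class):
    """Import every module under top_package and collect subclasses of base_class."""
    found = {}
    prefix = top_package.__name__ + "."
    for _, name, is_pkg in pkgutil.walk_packages(top_package.__path__, prefix=prefix):
        if is_pkg:
            continue  # only plain modules can define backends in this sketch
        module = importlib.import_module(name)
        for _, cls in inspect.getmembers(module, inspect.isclass):
            defined_here = cls.__module__ == name
            if defined_here and issubclass(cls, base_class) and cls is not base_class:
                found[name.rsplit(".", 1)[-1]] = cls
    return found


with tempfile.TemporaryDirectory() as root:
    build_demo_tree(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        import demo_backends
        from demo_backends.base import Backend

        BACKENDS = scan_for_backends(demo_backends, Backend)
    finally:
        sys.path.remove(root)

print(sorted(BACKENDS))
```

A module placed outside the scanned package tree is never imported by such a scan, which is why a retriever stored elsewhere would not be picked up.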

For example, see the Mozilla suite retrievers.

Re-building with the retriever

The PAW then has to be rebuilt with a customized version of Perceval that includes the new backend. The simplest way to do this is to point the Perceval install in deployment/docker/requirements.txt to the repository that contains the new backend.
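As an illustration, such an entry could use pip's support for installing a package directly from a Git repository. The organization, repository name and branch below are placeholders, and the existing Perceval entry in requirements.txt may be written differently.

```text
perceval @ git+https://github.com/your-org/grimoirelab-perceval.git@your-branch
```

Pinning the entry to a branch, tag or commit keeps the rebuilt image reproducible.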

Using the retriever

Once the PAW has been rebuilt with the custom backend, the retriever can be addressed via the mordred config file, and its data sources can be annotated like those of any other backend.