Getting Started
For ease of use, we have supplied a simple startup script to get a development instance of the PAW running. Simply run start_paw_docker.sh to start. To retrieve Multi-Mordred and ElasticWrap, you will be asked to supply a GitHub username and authentication token. Make sure the token has at least read access to the SIGmoirelabs/multi-mordred-wrapper and SIGmoirelabs/elasticwrap repositories.
Additionally, you will need to generate a classic authentication token to retrieve the graphql_api container. Once you have the token, run:
echo 'TOKEN' | docker login ghcr.io -u USERNAME --password-stdin
NOTE: The script will modify the following system values to make sure Redis and ElasticSearch are running properly:
vm.max_map_count=262144
vm.overcommit_memory=1
The script will automatically start the Docker containers. If you want, you can set these values manually and simply go into the deployment folder and run docker-compose up.
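If you prefer to set them yourself, a minimal sketch (assuming a Linux host with sudo; these settings do not persist across reboots unless also added to /etc/sysctl.conf):

sudo sysctl -w vm.max_map_count=262144
sudo sysctl -w vm.overcommit_memory=1
cd deployment
docker-compose up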
It may take some time for all the containers to start. This is because we must ensure that redis and mariadb are running before we can properly start sortinghat and multi-mordred.
Setting up Multi-mordred
For a quickstart, you can create a deployment/mordred_data directory and move the quickstart/setup.cfg and quickstart/projects.json files into it.
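A minimal sketch of that setup, assuming you run it from the repository root:

mkdir -p deployment/mordred_data
mv quickstart/setup.cfg quickstart/projects.json deployment/mordred_data/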
Then, once all the containers are up, connect with telnet localhost 15555 and run the following commands:
new demo /data/setup.cfg
start demo
This will start a mordred instance which gets data from the projects defined in projects.json and enriches it. The raw data can be found in the demo_git index, whereas enriched data can be found in demo_git_enriched, demo_git_enriched_aoc and demo_git_enriched_onion.
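To verify that the indices exist, you can ask ElasticSearch directly (assuming the development docker-compose publishes it on host port 9200):

curl 'http://localhost:9200/_cat/indices?v'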
Assuming you are using our example development docker-compose, you will now have two new directories in the deployment folder: ES and mordred_data. The ES folder simply stores all the ElasticSearch data; deleting it is an easy way to wipe data for testing purposes. The mordred_data folder is mounted in the multi-mordred container at /data/. Any files you want to provide to multi-mordred can be placed in this folder.
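For example, a custom script placed in the host folder becomes visible inside the container (helloWorld.py refers to the custom-script example later in this document):

cp helloWorld.py deployment/mordred_data/
# inside the multi-mordred container the file is now available as /data/helloWorld.py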
Step-by-step: your first analysis project
Now that the multi-mordred container is up and running, we can provide it with configuration files to spawn a new mordred instance and perform an analysis project.
projects.json: defining data sources
The projects.json file defines all the data that can be queried by this analysis project, divided by type. The type string indicates which Perceval retriever (or "backend") will be used to retrieve the data. While most retrievers simply fetch data from a specific source (e.g. git vs. slack vs. jira), some retrievers already perform some degree of analysis, or are purpose-built for specific later analysis scripts (the retrievers introduced by Arthur and Graal, for instance).
{
  "SIGmoirelabs": {
    "meta": {
      "title": "ElasticWrap"
    },
    "git": [
      "https://github.com/SIGmoirelabs/elasticwrap.git"
    ]
  }
}
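A project is not limited to one backend. As a hypothetical extension of the example above (the github section and its URL are illustrative; the github backend additionally needs an API token configured in the .cfg file):

{
  "SIGmoirelabs": {
    "meta": {
      "title": "ElasticWrap"
    },
    "git": [
      "https://github.com/SIGmoirelabs/elasticwrap.git"
    ],
    "github": [
      "https://github.com/SIGmoirelabs/elasticwrap"
    ]
  }
}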
conf.cfg: defining project parameters
The second file any mordred instance will need is a general configuration. This file describes the locations of and credentials for external resources, as well as which tasks should be executed.
[general]
short_name = GrimoireLab
update = true
min_update_delay = 60
debug = false
logs_dir = /home/sigmoire/logs
bulk_size = 1000
scroll_size = 1000
aliases_file = /home/sigmoire/aliases.json
[projects]
projects_file = /data/projects.json
[es_collection]
url = http://elasticsearch:9200
[es_enrichment]
url = http://elasticsearch:9200
autorefresh = true
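Note that this excerpt omits the task and backend sections a complete configuration also needs. A minimal sketch, following upstream Mordred conventions (the index names here are assumptions chosen to match the quickstart):

[phases]
collection = true
identities = true
enrichment = true
panels = false

[git]
raw_index = demo_git
enriched_index = demo_git_enriched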
for more on configuration options see: Using GrimoireLab-native Features and the official Mordred docs.
helloWorld.py: adding custom scripting
One of the features added to multi-mordred over regular mordred is the ability to run arbitrary code as part of the data collection and processing pipeline. If the ‘custom’ phase is enabled in the .cfg file, a script can be specified which will be run after the built-in enrichment methods but before data visualization is configured.
[custom_script]
scriptsource = enrich.py
args = [arg1, arg2]
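The exact calling convention is described in Using Custom Scripts; as an illustrative sketch only, assuming the script is executed as a plain Python program with the configured args on its command line and runs inside the multi-mordred container (where the elasticsearch hostname from the config resolves), a helloWorld.py could look like:

# helloWorld.py -- illustrative sketch only; the real entry-point contract
# is defined by multi-mordred (see: Using Custom Scripts).
import json
import sys
import urllib.request

def main(args):
    # ASSUMPTION: the first configured arg names the enriched index to inspect.
    index = args[0] if args else "demo_git_enriched"
    url = "http://elasticsearch:9200/" + index + "/_count"
    # Ask ElasticSearch how many documents the enriched index holds.
    with urllib.request.urlopen(url) as resp:
        count = json.load(resp)["count"]
    print("hello world: index %s holds %d documents" % (index, count))

if __name__ == "__main__":
    main(sys.argv[1:])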
for more on this see: Using Custom scripts.
Starting a new mordred instance
Now that all of the required files have been provided, we can call multi-mordred to start a new mordred instance and run the specified analysis.
$ telnet localhost 15555
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
multimordred wrapper, type commands for a list of available commands
new demo /data/conf.cfg
Created demo
start demo
Started
Accessing results via GraphQL
Kibiter may not play nice with customized enrichment, however, so the main intended method for retrieving processed data from the PAW is the GraphQL API: a querying system intended to be usable without any programming. Go to http://localhost:8081/graphql to see the interface.
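If you prefer the command line, the endpoint can also be explored with a standard GraphQL introspection query, which lists the available top-level query fields (assuming the same address):

curl -X POST http://localhost:8081/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ __schema { queryType { fields { name } } } }"}'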
for more on this see: Using the GraphQL-API.