Using → Ingestor Guide

This guide will attempt to outline the use of the Open XDMoD ingestor command line utility. The ingestor is responsible for preparing data that has already been loaded by the shredder into the Open XDMoD databases so that is can be queried by the Open XDMoD portal. This process also includes aggregating the data in the Open XDMoD database to increase the performance of the queries performed by the Open XDMoD portal.

General Usage

By default, the ingestor with process new job data entered into the Open XDMoD database whose end times are within the past 7 days.

$ xdmod-ingestor

The ingestor should be run after you have shredded your data. If you have multiple clusters, you may run the shredder multiple times followed by a single use of the ingestor.

Start and End Date

If you have changed any data in the Open XDMoD database it is necessary to re-ingest that data. This can be accomplished by specifying a start and end date, formatted as YYYY-MM-DD, that include the dates associated with the modified data.

$ xdmod-ingestor --start-date *start-date* --end-date *end-date*

Last Modified Start Date

When aggregating data use this date as the basis of what jobs to include. Only jobs ingested on or after this date will be aggregated. This defaults to the start of the ingest and aggregation process.

$ xdmod-ingestor --last-modified-start-date *date*

The value specified for the date must be an ISO 8601 date or date and time (e.g. “2019-01-01” or “2019-01-01 12:00:00”).

Advanced Usage

The ingestor may be set to only ingest specific realms or time frames. You must also set the last modified start date for aggregation to work properly.

Jobs:

The following is an example of only aggregating the jobs realm.

Set timestamp:

$ last_modified_start_date=$(date +'%F %T')

Ingest shredded jobs to staging table:

$ xdmod-ingestor --ingest-shredded

Ingest staging table jobs to HPcDB:

$ xdmod-ingestor --ingest-staging

Ingest all HPcDB jobs to the data warehouse:

$ xdmod-ingestor --ingest-hpcdb

Aggregate:

$ xdmod-ingestor --aggregate=job --last-modified-start-date "$last_modified_start_date"

Cloud:

If you do not have jobs data and/or wish to break down your ingestion process to exclusively ingest cloud data, you may do so as such.

You will need to specify the type of cloud data (genericcloud, openstack):

Set timestamp:

$ last_modified_start_date=$(date +'%F %T')

Ingest Generic logs:

$ xdmod-ingestor --datatype=genericcloud

Ingest OpenStack logs:

$ xdmod-ingestor --datatype=openstack

Aggregate:

$ xdmod-ingestor --aggregate=cloud --last-modified-start-date "$last_modified_start_date"

Storage:

If you do not have jobs data and/or wish to break down your ingestion process to exclusively ingest storage data, you may do so as such.

Set timestamp:

$ last_modified_start_date=$(date +'%F %T')

Ingest storage logs:

$ xdmod-ingestor --datatype=storage

Aggregate:

$ xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date"

Resource Specifications:

The source of data for the Resource Specifications realm is the resource_specs.json file. This file is ingested any time xdmod-ingestor is run and the --aggregate flag is not specified. The only step needed for this realm is to aggregate the data. If you recently ingested Jobs, Storage, or Cloud data, you may have already set the $last_modified_start_date shell variable. Otherwise, you should set the last modified start date to a time before the last time xdmod-ingestor was run after you edited the resource_specs.json file.

Aggregate:

$ xdmod-ingestor --aggregate=resourcespecs --last-modified-start-date "$last_modified_start_date"

Help

To display the ingestor help text from the command line:

$ xdmod-ingestor -h

Verbose Output

By default the Open XDMoD ingestor only outputs what it considers to be warnings, errors or notices. If you would like to see informational output about what is being performed, use the verbose option:

$ xdmod-ingestor -v

Debugging output is also available:

$ xdmod-ingestor --debug