Storage Metrics

The Storage Realm provides metrics relating to storage subsystems, including local disk and network-attached storage. Each individual storage subsystem (e.g., GPFS, Isilon, NFS, or Lustre) is treated as a separate storage resource. This makes it possible to track utilization for a single storage resource or in aggregate across resources, and to view this data by mount point, department, PI/project, and user. In addition, storage metrics can be plotted alongside data from other realms such as Job Accounting and Job Performance.

The data currently required to provide these metrics is described below in Input Format and is typically collected from the quota systems on these storage resources. Detailed information, such as the access and modification times of individual files, is not currently supported because collecting it is metadata-intensive and can adversely affect the performance of the filesystem.

These instructions use the file paths from the RPM installation. If you’ve installed from source, the paths will need to be adjusted accordingly.

Input Format

Storage data must be formatted in JSON files and these files must use the .json file extension (e.g. 2019-01-01.json). These files will be validated against the JSON Schema /etc/xdmod/etl/etl_schemas.d/storage/usage.schema.json.

NOTE: The thresholds and usage numbers are all measured in bytes. Mountpoint names are currently limited to 255 characters.
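Quota tools often report usage in units other than bytes, so a conversion step is usually needed before writing the input files. The following is a minimal sketch, assuming a hypothetical quota report that counts 1 KiB blocks. The helper names are illustrative and are not part of Open XDMoD.

```python
MAX_MOUNTPOINT_LENGTH = 255  # limit stated in the note above


def kib_to_bytes(kib_blocks: int) -> int:
    """Convert a count of 1 KiB blocks (an assumed quota unit) to bytes."""
    return kib_blocks * 1024


def check_mountpoint(mountpoint: str) -> str:
    """Return the mountpoint unchanged, or raise if it exceeds the length limit."""
    if len(mountpoint) > MAX_MOUNTPOINT_LENGTH:
        raise ValueError(
            f"mountpoint is {len(mountpoint)} characters; "
            f"the limit is {MAX_MOUNTPOINT_LENGTH}"
        )
    return mountpoint
```

For example, kib_to_bytes(1000) gives 1024000, the value to store in a field such as logical_usage.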

Input Fields

  • resource - Storage resource name.
  • mountpoint - File system mountpoint.
  • user - User system username.
  • pi - PI system username.
  • dt - Date and time data was collected. Must be in RFC 3339 format (e.g. 2017-01-01T00:00:00Z). Must be UTC.
  • soft_threshold - Quota soft threshold measured in bytes.
  • hard_threshold - Quota hard threshold measured in bytes.
  • file_count - Number of files.
  • logical_usage - Logical usage measured in bytes.
  • physical_usage - Physical usage measured in bytes.
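Before handing files to Open XDMoD, it can help to run a quick local pre-check of the fields listed above. The sketch below is not a substitute for the JSON Schema validation that Open XDMoD performs. It assumes that every listed field is required and that the numeric fields are non-negative integers, and it accepts only timestamps ending in "Z", all of which may be stricter than the actual schema. The function names are hypothetical.

```python
import json
from datetime import datetime

STRING_FIELDS = ("resource", "mountpoint", "user", "pi", "dt")
INTEGER_FIELDS = (
    "soft_threshold", "hard_threshold", "file_count",
    "logical_usage", "physical_usage",
)


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected a string")
    for name in INTEGER_FIELDS:
        value = record.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{name}: expected a non-negative integer")
    dt = record.get("dt")
    if isinstance(dt, str):
        try:
            # Accepts only the UTC "Z" form shown in the field description.
            datetime.strptime(dt, "%Y-%m-%dT%H:%M:%SZ")
        except ValueError:
            problems.append("dt: expected a UTC timestamp like 2017-01-01T00:00:00Z")
    return problems


def check_file_text(text: str) -> list:
    """Check the text of one input file: a JSON array of records."""
    records = json.loads(text)
    if not isinstance(records, list):
        return ["top level: expected a JSON array of records"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index}: expected a JSON object")
            continue
        problems.extend(f"record {index}: {p}" for p in check_record(record))
    return problems
```

An empty list means the pre-check found nothing to report. The schema file remains the authoritative definition of a valid record.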

Example

[
    {
        "resource": "nfs",
        "mountpoint": "/home",
        "user": "jdoe",
        "pi": "pi_username",
        "dt": "2017-01-01T00:00:00Z",
        "soft_threshold": 1000000,
        "hard_threshold": 1200000,
        "file_count": 10000,
        "logical_usage": 100000,
        "physical_usage": 100000
    },

    ...
]
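A file like the one above could be produced by a small collection script. The following sketch shows one possible approach. The function names are hypothetical, and the usage numbers are assumed to come from your own quota tooling. It formats the collection time as an RFC 3339 UTC timestamp and writes one file per day with the required .json extension.

```python
import json
from datetime import date, datetime, timezone
from pathlib import Path


def make_record(resource, mountpoint, user, pi, collected_at,
                soft_threshold, hard_threshold, file_count,
                logical_usage, physical_usage):
    """Build one storage record; collected_at must be timezone-aware."""
    if collected_at.tzinfo is None:
        raise ValueError("collected_at must carry a time zone")
    # Convert to UTC and render in the form shown in the example above.
    dt = collected_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "resource": resource,
        "mountpoint": mountpoint,
        "user": user,
        "pi": pi,
        "dt": dt,
        "soft_threshold": soft_threshold,
        "hard_threshold": hard_threshold,
        "file_count": file_count,
        "logical_usage": logical_usage,
        "physical_usage": physical_usage,
    }


def write_daily_file(records, directory, day: date) -> Path:
    """Write records as a JSON array to <directory>/<YYYY-MM-DD>.json."""
    path = Path(directory) / f"{day.isoformat()}.json"
    path.write_text(json.dumps(records, indent=4) + "\n")
    return path
```

Naming files by date is only a convention borrowed from the 2019-01-01.json example. Any name ending in .json satisfies the stated requirement.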

Setup

Add Storage Resource

Add a storage resource using the xdmod-setup script or by manually modifying /etc/xdmod/resources.json.

The resource name (also referred to as the resource code; not the formal name) must then be used in the JSON storage input files described above.
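A mismatch between the resource code and the names used in the input files is easy to overlook. The sketch below cross-checks the two. It assumes that resources.json is a JSON array of objects, each with a "resource" key holding the resource code. Verify this against your own installation before relying on it. The function names are hypothetical.

```python
import json


def known_resource_codes(resources_json_text: str) -> set:
    """Collect resource codes from the text of resources.json.

    Assumption: the file is a JSON array of objects with a "resource" key.
    """
    return {entry["resource"] for entry in json.loads(resources_json_text)}


def unknown_resources(records: list, known: set) -> list:
    """Return the sorted resource names in records that are not in known."""
    return sorted({record["resource"] for record in records} - known)
```

An empty result from unknown_resources means every record refers to a configured resource code.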

Data Ingestion

Storage data is shredded and ingested using the xdmod-shredder and xdmod-ingestor commands. Please see their respective guides for further information.

All of the following commands must be executed in the order specified below to fully ingest storage data into the data warehouse.

Shred all files in the /path/to/storage/logs directory:

$ xdmod-shredder -f storage -r resource-name -d /path/to/storage/logs

NOTE: The above command will process every file in the /path/to/storage/logs directory, even files that have already been processed.
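One way to avoid reprocessing old files is to keep only unprocessed files in the directory passed to -d and to move them elsewhere once the shredder succeeds. The following is a sketch of that idea, not an Open XDMoD feature. The pending and archive directories and the function names are hypothetical, and only the flags shown in the command above are used.

```python
import shutil
import subprocess
from pathlib import Path


def shred_command(resource: str, directory) -> list:
    """Build the shredder command shown above for one directory."""
    return ["xdmod-shredder", "-f", "storage", "-r", resource, "-d", str(directory)]


def list_pending(pending_dir) -> list:
    """Return the .json files currently waiting in pending_dir, sorted by name."""
    return sorted(Path(pending_dir).glob("*.json"))


def archive_files(paths, archive_dir) -> list:
    """Move the given files into archive_dir and return their names."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in paths:
        shutil.move(str(path), str(archive / path.name))
        moved.append(path.name)
    return moved


def shred_then_archive(resource: str, pending_dir, archive_dir) -> list:
    """Shred pending_dir, then archive the files that were present beforehand."""
    # Snapshot first: a file that arrives while the shredder is running stays
    # in pending_dir and is picked up by the next run instead of being archived.
    snapshot = list_pending(pending_dir)
    subprocess.run(shred_command(resource, pending_dir), check=True)
    return archive_files(snapshot, archive_dir)
```

Because check=True raises an exception when the shredder exits with an error, files are archived only after a successful run.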

Ingest and aggregate data:

$ last_modified_start_date=$(date +'%F %T')
$ xdmod-ingestor --datatype storage
$ xdmod-ingestor --aggregate=storage --last-modified-start-date "$last_modified_start_date"
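When these steps run from a scheduled job, the ordering and the timestamp capture can be wrapped in a short script. The sketch below mirrors the commands above. The function names are hypothetical, and it assumes xdmod-ingestor is on the PATH of the user running it.

```python
import subprocess
from datetime import datetime


def ingest_commands(last_modified_start_date: str) -> list:
    """Return the ingest and aggregate commands in the required order."""
    return [
        ["xdmod-ingestor", "--datatype", "storage"],
        [
            "xdmod-ingestor",
            "--aggregate=storage",
            "--last-modified-start-date",
            last_modified_start_date,
        ],
    ]


def run_storage_ingest() -> None:
    """Run ingestion, then aggregation, stopping at the first failure."""
    # Capture the local time BEFORE ingesting, in the same format as
    # date +'%F %T', so aggregation covers everything ingested in this run.
    started = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    for command in ingest_commands(started):
        subprocess.run(command, check=True)
```

Passing each command as a list avoids shell quoting issues with the space in the timestamp argument.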