Using → Data Warehouse Export
The Data Warehouse Export feature of Open XDMoD is designed to give users
access to the raw (non-aggregate) data contained in the Open XDMoD data
warehouse. This is achieved through an interface where users submit requests
using the web portal that are fulfilled by a batch process that runs at
a scheduled time each day. When the data export is complete an email is
generated to notify the user that their data is ready. If any errors occur an
email is sent to the technical support email address configured in
portal_settings.ini
. The data export can then be downloaded using the link
contained in the email or from the web portal. After a configured time period
has elapsed the data export will be deleted from the server and the download
will no longer be available.
Configuration
There are several configuration options for the Data Warehouse Export feature.
These are set in the portal_settings.ini
file. They can be changed manually
or using the xdmod-setup
script.
; Configuration for data warehouse export functionality.
[data_warehouse_export]
; Exported data files will be stored in this directory.
export_directory = "/var/spool/xdmod/export"
; Length of time in days that files will be retained before automatic deletion.
retention_duration_days = 30
; Salt used during deidentification.
hash_salt = "..."
The directory where data files are stored is set by the export_directory
option. This must be an absolute path for a directory on a file system with
sufficient storage. The exact storage necessary will depend on how many
exports are created and the amount of data contained in the data files. If a
large quantity of data is exported it is advised to create a separate partition
to store the data files.
The time period that data files will be retained is set by the
retention_duration_days
option. This specifies the number of days that a
data file will be kept in the export directory before it is removed.
A salt may be specified by the hash_salt
option that is used when hashing
data that is configured to be anonymized. This will be set to a random value
the first time the Data Warehouse Export is configured using the xdmod-setup
script.
Exported Data Configuration
Specifying which fields are exported is configured in the
rawstatistics.d/20_jobs.json
file. For each field listed in the fields
section there is a batchExport
option that may be set to true
, false
, or
"anonymize"
. This file contains other sections and options that are used by
other features of Open XDMoD. The other sections and options in this file
should not be changed without a thorough understanding of all the different
ways this file is used.
NOTE: The format of this file will likely change in the future. If this file is modified and the format changes in a future version the file must be manually updated to use the new format and any changes must be re-applied. The Job Performance module uses a different format and that is also expected to change in the future.
This is an example of the general structure of this file and the location of
the batchExport
option:
{
...
"Jobs": {
...
"fields": [
{
...
"batchExport": true
},
...
]
}
}
Data Export Batch Process
Data export requests are fulfilled by a batch process that is run nightly via
cron. The cron job is scheduled in the file /etc/cron.d/xdmod
. This file
may be modified to alter the schedule for the job. The command used to
generate the data export files is batch_export_manager.php
. If you suspect
that there is a problem with the export process the following command may be
run by the xdmod
user to produce debugging output:
/usr/lib64/xdmod/batch_export_manager.php --debug