21.48. validation - Validation

The following documentation is for Validation (validation) content package at version v4.8.0-alpha00.25+g8e224a5bec085c80968ec27e0332c339a65d0659.

The Validation System allows an operator to flexibly define a set of tasks that perform validation operations. Validation allows an operator to execute tests to verify that certain conditions have been met, either in the environment or as tasks completed by other workflow steps. This allows a composable set of rules to be run to verify either that the system has been provisioned successfully or that an error has occurred that needs to be addressed.

The validation system is designed to be flexibly extended in the field. The current set of Stages, Library functions, and Tasks are curated capabilities by RackN. Custom Stages and Tasks can be flexibly added to allow for unique use cases.

If you have developed a set of Tasks and Library functions that may be useful to other operators, please consider contributing back to the content pack to enhance it for other users.

21.48.1. Prebuilt Stages

The validation system provides three prebuilt base sets of validation stages to run the validation engine in:

  1. validation-post-discover
  2. validation-post-hardware
  3. validation-post-install

The intent is to have a set of tasks that validate after the initial discover (gohai-inventory, inventory, or other “discover” related tasks), post hardware configuration (typically after BIOS settings, Firmware Flash, and RAID volume creation), and in the final installed operating system.

The validation system runs in a Workflow as a standard Stage, allowing the operator to place the stage anywhere in the Workflow that makes sense. The three prebuilt stages above mark the points in the provisioning lifecycle of a _Machine_ that RackN defines as useful.

The validation-start and validation-stop tasks are the only tasks that should be listed in the Stage; do not add other Tasks to the Stage. Tasks will be added dynamically based on the control Parameters.

21.48.2. Defining Validation Tasks

Control over which specific validation tasks are executed at any given validation stage is defined by the Param validation/list-parameter. This parameter is a reference to the Param name that the validation-start task uses to dynamically update the task list with a list of validation tasks.

The Tasks that are defined in the parameter (specified by the validation/list-parameter) will be composed and added to the task list. These will be dynamically inserted between the validation-start and validation-stop tasks.

Composed means the value of that parameter is the aggregation of all occurrences of that parameter at all levels of the system (e.g. from the parameters on the machine, then profiles on machines, then parameters on the stage, …). This allows multiple resources to provide the validation tests that will eventually be run on the system. Typically, Parameter order of precedence would cause one occurrence of a Parameter value to override the others. Validation does not follow that standard path.

The Param that is referenced by validation/list-parameter should be a simple array type Param. It lists each Task that should be executed in the listed sequence by the validation process at that Stage in the Workflow.
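
As a rough illustration of the difference between composed and override semantics, the following shell sketch concatenates the lists found at each level instead of stopping at the first one. The sample task lists and both helper functions (compose_list, override_list) are hypothetical and exist only for this illustration; the exact ordering and any de-duplication that Digital Rebar applies are not covered here.

```shell
# Hypothetical task lists found at each level (space-separated task names).
machine_tasks="validate-nic-counts"
profile_tasks="validation-machine-name-dns-to-ip validate-jumbo-frames"
stage_tasks=""

# Override semantics: the first level that sets the Param wins.
override_list() {
  for level in "$@"; do
    if [ -n "$level" ]; then
      printf '%s\n' "$level"
      return 0
    fi
  done
}

# Composed semantics: every level that sets the Param contributes, in order.
compose_list() {
  out=""
  for level in "$@"; do
    [ -n "$level" ] && out="${out:+$out }$level"
  done
  printf '%s\n' "$out"
}

echo "override: $(override_list "$machine_tasks" "$profile_tasks" "$stage_tasks")"
echo "composed: $(compose_list "$machine_tasks" "$profile_tasks" "$stage_tasks")"
```

With override semantics only the machine-level task would run; with composed semantics the machine-level and profile-level tasks all end up in the list.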

21.48.3. Task Exit Conditions

Validation actions can exit one of several ways:

  1. validation-success = validation succeeded, exit 0
  2. validation-fail-immediately = fail and exit 1, stopping workflow immediately at task
  3. validation-fail-at-stage-end = exit 0, collect logs in validation/errors param, stop at end of Stage
  4. validation-fail-and-ignore = log validation error to validation/errors-ignore, but exit 0 - no workflow stop will occur at all
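
The list above can be summarized as a small lookup, sketched here in shell. Both functions are hypothetical helpers written for this illustration and are not part of the content pack; they only restate which exit code each condition produces for the task and for the Stage as a whole.

```shell
# task_exit_code CONDITION: exit code returned by the validation task itself.
task_exit_code() {
  case "$1" in
    validation-success)           echo 0 ;;
    validation-fail-immediately)  echo 1 ;;  # Workflow stops at this task
    validation-fail-at-stage-end) echo 0 ;;  # error is logged, Stage continues
    validation-fail-and-ignore)   echo 0 ;;  # error is logged, never stops
    *) return 1 ;;                           # unknown condition
  esac
}

# stage_exit_code ERROR_COUNT: exit code at the end of the Stage, given the
# number of entries recorded in validation/errors. Entries recorded in
# validation/errors-ignore are not counted.
stage_exit_code() {
  if [ "$1" -gt 0 ]; then echo 1; else echo 0; fi
}

echo "fail-at-stage-end task exit:  $(task_exit_code validation-fail-at-stage-end)"
echo "stage exit with 2 errors:     $(stage_exit_code 2)"
```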

Future return conditions (not yet implemented): validation-fail-and-remediate = exit 1, and if the task specifies an additional remediation task, run that task and then the original task again, in the hope that the remediation task solved the issue. This may need retry counting to break the cycle after N failures.

21.48.4. Remediating and Continuing Failed Tasks

The failure exit conditions allow the developer of the validation task to determine if the failure is a “soft” failure which can be remediated, and then the Workflow can be resumed, or if the failure implies a “hard” fail that can not be corrected during workflow operation. Additionally, some failures may not impact final workflow completion, but the operator would like to have them recorded for future review or remediation.

If a failure is external to the machine and can be remediated (for example, a DNS record has the wrong information and DNS can be updated quickly), the test should use validation-fail-immediately. This allows the failure to be remediated and the Workflow to be continued without failing validation.

If the failure is not able to be remediated immediately, a log message should be added to the validation/errors logging Param, and success returned so that additional issues can be found and aggregated.

21.48.5. Example Usage

Here is a brief outline of example usage of the validation system utilizing the RackN defined stages. In this example, we will define tasks to run in the validation-post-discover and validation-post-hardware stages. Most of the Tasks referenced are valid tasks that exist across several content and plugin components.

21.48.5.1. Workflow

In this example we will create a “my-complete-discovery” workflow that utilizes several components of the Digital Rebar content and plugins capability.

Note

Note the addition of the ‘validation-post-discover’ and ‘validation-post-hardware’ Stages. These define the points in the workflow where the validation process will run.

Meta:
  color: grey
  icon: money
Name: my-complete-discovery
Description: My complete discovery workflow
Stages:
  - discover
  - setup-repos
  - ipmi-inventory
  - raid-inventory
  - network-lldp
  - inventory
  - rack-discover
  - classify
  - validation-post-discover
  - ipmi-configure
  - flash
  - raid-enable-encryption
  - raid-configure
  - bios-configure
  - ilo-config
  - burnin
  - burnin-reboot
  - validation-post-hardware
  - notify-hardware-complete
  - sledgehammer-wait

21.48.5.2. Stages

In this example, we are not creating any new stages, simply using the RackN defined stages in the above workflow example. Please review the “Adding New Stages” section in the below “Developing New Validation Capabilities” section if you’d like to create custom stages.

21.48.5.3. Tasks

There are several tasks that are defined to be used by the Validation system in this example. You can find them in the following profile section. Note that several of these tasks do exist in various content and plugins. Each of these tasks would need to exist and perform the required functions.

21.48.5.4. Define the Tasks for Validation

In this example, we will utilize a profile which contains the defined Params for the validation-post-discover and validation-post-hardware stages. The operator can apply the Profile to the appropriate machines. Alternatively, the classify stage could run a classification task that decides programmatically whether to apply the Profile to a given machine.

The profile defines the tasks to run. The profile YAML configuration would look like the following:

---
Meta:
  color: grey
  icon: money
Name: validations
Params:
  validation/post-discover:
    - "validation-machine-name-dns-to-ip"
    - "in-subnet-check-render"
    - "in-subnet-check-validate"
    - "ipmi-network-validation"
    - "validate-nic-counts"
    - "validate-jumbo-frames"
  validation/post-hardware:
    - "validate-ipmi-hostname-ip"

The type definitions for the Params validation/post-discover and validation/post-hardware are defined in the Validation content pack, and the appropriate stage contains the control mapping for each.

For the validation-post-discover Stage, the stage carries the mapping within the stage, as follows:

Params:
  "validation/list-parameter": "validation/post-discover"

Similarly, the validation-post-hardware stage defines the mapping to point to the Param validation/post-hardware.

These are the values we use in our profile above to add the list of Tasks to execute at the appropriate Stage run.

21.48.5.5. Operating The Validation

Once you have completed the above tasks, you can choose to make the my-complete-discovery Workflow your default preference setting for all new machines, or manually choose to place machines in to the Workflow.

To make this the default workflow, change your preferences (from Info & Preferences in the Portal), or use the following CLI command:

  • drpcli prefs set defaultWorkflow my-complete-discovery

21.48.5.6. Reviewing Any Failures

As with all jobs executed in Workflows, there will be a job log available to review the status output of any tasks that have run. If you receive a Validation failure, refer to the appropriate job log.

If the failure includes the use of the add_validation_error function, you can review the Param value that was added to the Machine, with the name validation/errors. The Param will be added to the machine that has failed the validation process.

The next run of any validation stage will empty the contents of this Param before it starts executing tasks. Please ensure that you review the Param error values after a failure, but before you re-run any more validation stages.

It is required to include the {{ template "setup.tmpl" .}} in each of the validation tasks. With the inclusion of this template, you can set the rs-debug-param on a Machine, and the validation tasks will contain a lot more debug output in the job log. If you need to verify/debug the actions in the validation task, this is a good way to review more detailed output logging information.

21.48.5.7. Skipping Validation Without Changing Your Workflow

The validation system was designed to allow you to add it to the Workflow, and if no validation tasks are defined (by use of the reference validation/list-parameter Param), then the system will skip any validation attempts for the given stage.

This allows you to leave the Stage defined control parameter blank and not require changing the Workflow. This effectively becomes a “noop” of the validation system.
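
The effect can be pictured with a small shell sketch. inject_tasks is a hypothetical helper, not part of the content pack; it only prints the order in which tasks would run for a validation Stage, given whatever task list the control parameter resolves to.

```shell
# inject_tasks [TASK...]: print the effective task order for a validation
# Stage, one task per line. Listed tasks land between validation-start and
# validation-stop; with no tasks, only the two framing tasks remain.
inject_tasks() {
  echo "validation-start"
  for task in "$@"; do
    echo "$task"
  done
  echo "validation-stop"
}

echo "--- list defined ---"
inject_tasks validate-nic-counts validate-jumbo-frames
echo "--- list empty (noop) ---"
inject_tasks
```

In the empty case the Stage still runs, but nothing is validated, so the Workflow definition itself never has to change.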

21.48.6. Developing New Validation Capabilities

This section provides information on how to build custom validation capabilities utilizing the existing Validation system.

21.48.6.1. Namespace Definitions

Validation system stages, control params, and other content parts will utilize the prefix namespace of validation/. Individual Tasks, Params, etc. that implement functional use of the Validation system components will utilize the prefix namespace of validate/.

For example, the start Task for the Validation system is validation-start, while the NIC count and UP link state checking uses a Task named validate-nic-counts.

Standard Stage/Task and Param semantics still apply (eg validate/nic-required-up is the Param namespace name).

Note that traditionally, RackN uses the “slash” naming standard for Params, while Stages, Tasks, and Templates generally utilize a “dash” to separate the parent namespace. Don’t ask us why.

21.48.6.2. Library of Functions

The Validation System provides a template which contains a library of curated tools for the system; the template is called validation-lib.tmpl. Only Bash functions which implement broadly useful capabilities are placed in this template, which is managed by RackN.

If your validation tools require a common library, they can be built as a standard template, and included in any appropriate custom tasks.

Current functions that can be incorporated in to your Validation custom tasks are as follows:

  • validation_add_error() = Adds a validation error to the validation/errors Param
  • validation_add_error_ignore() = Adds a validation error to the validation/errors-ignore Param
  • validation_clear_errors_ignore() = Removes the validation/errors-ignore Param from the Machine
  • validation_msg_prefix() = builds a prefix of Stage/Task/CurrentJob for output messages
  • validation_success() = marks a validation task successful and exits 0
  • validation_fail_at_stage_end() = marks a validation task failed, but exits 0, and ultimately exits the Stage with exit code 1
  • validation_fail_immediately() = marks validation task failed and exits 1 immediately
  • validation_fail_and_ignore() = marks validation as failed, but ignores the failure without exiting the workflow
  • #validation_fail_and_remediate() = Not implemented yet
  • validate_machine_name_dns_by_ip() = verifies the machine has a DNS record based on the Machine's IP address
  • validate_ipv4_ip_syntax() = simple helper to verify string is in IPv4 dotted quad notation
  • validate_check_same_subnet() = verifies that, given two IP addresses and a subnet mask, both addresses are in the same subnet network
  • validate_ping_dest_from_src() = verifies that Machine can ping an IP address given a source interface to use
  • validate_nic_counts() = verifies the number of NICs and number of NICs that can be brought to an “UP” state
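
To give a feel for the kind of check the IPv4 syntax and same-subnet helpers describe, here is a minimal shell sketch. The functions is_ipv4, ip_to_int, and same_subnet are hypothetical stand-ins written for this illustration; they are not the code in validation-lib.tmpl, and their names, arguments, and return codes are assumptions.

```shell
# is_ipv4 ADDR: succeed if ADDR is a dotted quad with each octet in 0..255.
is_ipv4() {
  case "$1" in
    ""|*[!0-9.]*|*..*|.*|*.) return 1 ;;
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its octets
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    case "$octet" in
      0?*) return 1 ;; # reject leading zeros such as 010
    esac
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# ip_to_int ADDR: print the 32-bit integer value of a validated dotted quad.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet ADDR_A ADDR_B MASK: succeed if both addresses share a network.
# Returns 2 when any argument is not valid IPv4 syntax. The mask is not
# checked for contiguous bits.
same_subnet() {
  is_ipv4 "$1" && is_ipv4 "$2" && is_ipv4 "$3" || return 2
  addr_a=$(ip_to_int "$1")
  addr_b=$(ip_to_int "$2")
  mask=$(ip_to_int "$3")
  [ $(( $addr_a & $mask )) -eq $(( $addr_b & $mask )) ]
}

if same_subnet 10.0.1.5 10.0.1.200 255.255.255.0; then
  echo "same subnet"
else
  echo "different subnet"
fi
```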

21.48.6.3. Adding New Stages

Stages define the primary grouping of Tasks that an operator runs at a given point in the Workflow sequence. These stages can be placed anywhere that makes operational sense to a given Workflow.

The Stage should only contain the validation-start and validation-stop tasks. No other tasks should be added to your stage, as the validation-start task will dynamically inject the desired tasks in to the Workflow for the operator.

Example Stage in YAML:

---
Name: "my-validation"
Description: "Perform tasks defined by the 'my-validation' Param."
Documentation: |
  The Param 'my-validation' is an array that will contain
  the list of validation tasks to run.
Params:
  "validation/list-parameter": "my-validation"
Tasks:
  - "validation-start"
  - "validation-stop"
Meta:
  color: "orange"
  icon: "search"
  title: "RackN Content"

Note that, as the Documentation field says, the validation/list-parameter for this Stage is defined as my-validation. This Parameter must be defined as an array, which is a list of the Tasks to execute during this Stage. The Param can be defined on the Machine in all of the normal ways (directly as a Param, as part of a Profile, or as a Param in the Global Profile).

21.48.6.4. Adding New Tasks

Tasks are the heart of the validation system, and perform the actual validation implementation for the system to execute. Validation tasks are standard RackN Workflow Tasks, and can carry embedded templates, or refer to external templates to perform the actual Task(s) defined.

A validation task should include both the setup.tmpl template and the validation-lib.tmpl template. Both templates are required for a successful validation task.

Validation tasks are only limited by the requirements (and perhaps creativity) of the author. They should be individual and discrete items that the system should check for and return results on. Please review the “Task Exit Conditions” above to ensure you handle exit codes correctly for the system.

Here is an example Task that implements a hypothetical validation function.

---
Name: "validation-test-fail"
Description: "Example failing validation task"
Meta:
  color: "blue"
  icon: "bug"
  title: "RackN Content"
  feature-flags: "sane-exit-codes"
Templates:
- Name: "validation_fail.sh"
  Contents: |
    #!/usr/bin/env bash
    {{ template "setup.tmpl" . }}
    {{ template "validation-lib.tmpl" . }}

    {{ if eq (.Param "margarita-time" ) true }}
      echo "This is a failure.  Calling add_validation_error"
      add_validation_error "Margarita time! We'll finish provisioning later."
      exit 0
    {{ else -}}
      echo "Sadly, it's not margarita time.  Provision on!"
      exit 0
    {{ end -}}

In this example, some other component or process would be responsible for setting the check Param margarita-time to either true or false.

21.48.6.5. Utilizing the Error Logging Param

The library also contains a helper routine to add errors to the validation/errors logging Param. The Bash function name is add_validation_error, which should be called with a single string containing the text of the error message. This function can safely be called multiple times, and each subsequent error message will be appended to the validation/errors array.

21.48.6.6. Separating Validate Tasks from the Validation System

Validation tasks or extensions can be built and used and added to other Content Packs or Plugins. The validation system itself is a framework for executing and providing the method to run and manage the validation process.

Examples of validation in use can be found throughout some of the other RackN managed content and plugin systems. For example, review the VMware Plugin content for a validation task that is defined and used outside of the Validation content pack.

21.48.7. Object Specific Documentation

21.48.7.1. params

The content package provides the following params.

21.48.7.1.1. validate/nic-required-up

Validates the number of NICs that can support an UP link state. This process does not try to configure any network stack on the interface, just verifies that the port link state can be brought UP successfully (eg there is cable and link connection to upstream switch device).

21.48.7.1.2. validate/parameters

This param defines the map of parameters to tests to run against them.

The elements of the map convert a parameter to a test.

The test structure is an operation and a list of values.

Example:

inventory/CPUs:
  Op: equal
  Values: [ 1 ]

If inventory/CPUs exists and equals the value 1, this is successful. Otherwise it is a failure.

Operations:

  • equal - if any one value equals, then it returns true.
  • between - if the value is between the two values. Values is a list of two values.
  • match - if the value matches the regex value.
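
The three operations can be sketched as a single shell function. check_param is a hypothetical helper written for this illustration, not the implementation used by the validate-parameters task. It assumes that between compares integers with inclusive bounds and that match treats the first value as an extended regular expression; neither detail is stated above.

```shell
# check_param OP ACTUAL VALUE...
# Succeeds when ACTUAL passes the test described by OP and the VALUE list.
check_param() {
  op=$1
  actual=$2
  shift 2
  case "$op" in
    equal)
      # true if any one of the listed values equals the actual value
      for v in "$@"; do
        [ "$actual" = "$v" ] && return 0
      done
      return 1 ;;
    between)
      # assumption: integer comparison, bounds inclusive
      [ "$#" -eq 2 ] && [ "$actual" -ge "$1" ] && [ "$actual" -le "$2" ] ;;
    match)
      # assumption: extended regular expression, first value only
      printf '%s\n' "$actual" | grep -Eq -- "$1" ;;
    *)
      return 2 ;;  # unknown operation
  esac
}

# Mirrors the inventory/CPUs example: Op equal, Values [ 1 ].
if check_param equal 1 1; then
  echo "inventory/CPUs: success"
else
  echo "inventory/CPUs: failure"
fi
```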

21.48.7.1.3. validation/errors

This param will be dynamically added to the Machine object if the Validation system encounters any errors that are recorded with the use of the helper function named add_validation_error.

It will be wiped clean prior to any validation-start task being executed.

21.48.7.1.4. validation/list-parameter

This defines the parameter that the Validation stage currently executing will use to reference the list of Tasks to execute for validation. It essentially operates as a “control pointer” to the real list, allowing for flexible usage of a single defined Param to point to a variable list of tasks.

21.48.7.1.5. validation/post-hardware

This param defines the list of Tasks to execute for validation during the validation-post-hardware stage. If left empty, all validation for the stage will be skipped.

21.48.7.1.6. validate/failure-mode

Set this param to the name of a validation failure function to specify how certain validation tasks are recorded, which will impact the final exit and continued task running.

Examples:

  • validation_add_error (default value)
  • validation_add_error_ignore

This Param is a free-form String type, as the Validation functions can be dynamically expanded, so an Enum list can not be safely provided. Ensure you specify a validation failure mode correctly in this param.

This Param should only be set to a value of a Function in the validation system that records the errors, but does not directly exit (eg no exit 0 or exit 1 type exit codes in the function).

The default is validation_add_error.

For instance, if this Param is set to validation_add_error_ignore, then errors will be recorded on the validation/errors-ignore Param. In that case the validation-stop processing does not exit with an error; the errors are ignored and workflow/task processing continues.
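
One way to picture how a task might honor this Param is to dispatch to whichever recording function it names. The sketch below is an assumption about usage, not code from the content pack: record_failure is a hypothetical helper, and the two recording functions are local stand-ins that only append to shell variables instead of updating Params on the Machine.

```shell
recorded_errors=""
recorded_ignored=""

# Stand-ins for the two recording functions named above. The real functions
# live in validation-lib.tmpl; these only collect messages locally.
validation_add_error()        { recorded_errors="${recorded_errors}$1;"; }
validation_add_error_ignore() { recorded_ignored="${recorded_ignored}$1;"; }

# record_failure MODE MESSAGE: call the recording function named by MODE,
# falling back to the documented default when MODE is empty.
record_failure() {
  mode=${1:-validation_add_error}
  if ! command -v "$mode" >/dev/null 2>&1; then
    echo "unknown failure mode: $mode" >&2
    return 1
  fi
  "$mode" "$2"
}

record_failure "" "NIC count below minimum"
record_failure validation_add_error_ignore "jumbo frames not enabled"
echo "errors:  $recorded_errors"
echo "ignored: $recorded_ignored"
```

Because the Param is a free-form String, a misspelled function name cannot be rejected by a schema; a guard like the one above is one possible way to surface the mistake early.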

21.48.7.1.7. validate/nic-counts

Defines the minimum number of NICs that the Validation system should check for.

21.48.7.1.8. validate/record-parameters

This param defines a list of parameters whose values are recorded into the validate/parameters Param on the machine. This can be used to generate a validation structure.

This assumes that inventory stage has been run on the machine.

21.48.7.1.9. validation/errors-ignore

If validation occurred, and an error is going to be ignored, this Param captures that information for further review.

This Param will not be automatically cleaned up on validation-start task run the way that validation/errors is. Because entries accumulate across runs, each logged <message> is automatically prefixed with:

Stage Name :: Task Name :: Job UUID :: <message>

If the operator wishes to wipe this value, add the validation-clear-errors-ignore task to the list of tasks to run, and the Param will be reset to an empty value at that point in the Workflow/Stage/Task.
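
The different lifetimes of the two error Params can be sketched in shell. Everything here is a hypothetical simulation: the sim_ functions and the two variables stand in for the real tasks and Params, and the stage, task, job, and message strings are illustrative only.

```shell
errors=""          # stands in for validation/errors
errors_ignore=""   # stands in for validation/errors-ignore

sim_validation_start()    { errors=""; }          # errors-ignore is left alone
sim_clear_errors_ignore() { errors_ignore=""; }   # the explicit clearing task
sim_add_error()           { errors="${errors}${1}|"; }

# sim_add_error_ignore STAGE TASK JOB_UUID MESSAGE
# Stores the message with the "Stage :: Task :: Job UUID" prefix.
sim_add_error_ignore() {
  errors_ignore="${errors_ignore}${1} :: ${2} :: ${3} :: ${4}|"
}

sim_validation_start
sim_add_error "NIC count below minimum"
sim_add_error_ignore "validation-post-discover" "validate-jumbo-frames" \
  "job-uuid-1" "jumbo frames not enabled"

sim_validation_start   # a later validation Stage begins
echo "errors after next start:        [$errors]"
echo "errors-ignore after next start: [$errors_ignore]"

sim_clear_errors_ignore
echo "errors-ignore after clear task: [$errors_ignore]"
```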

21.48.7.1.10. validation/post-discover

This param defines the list of Tasks to execute for validation during the validation-post-discover stage. If left empty, all validation for the stage will be skipped.

21.48.7.1.11. validation/post-install

This param defines the list of Tasks to execute for validation during the validation-post-install stage. If left empty, all validation for the stage will be skipped.

21.48.7.2. stages

The content package provides the following stages.

21.48.7.2.1. validation-post-discover

This is a RackN defined validation Stage designed to run relatively soon after the Discover process has started. Typically this should run after the gohai-inventory, inventory, raid-inventory, network-lldp, or other similar “inventory” like stages, but prior to any hardware configuration or changes (BIOS, Flash, etc.).

21.48.7.2.2. validation-post-hardware

This stage is a RackN predefined stage designed to run immediately after the Machine hardware has been configured (typically things like BIOS configuration, Flash Firmware updates, RAID configuration, etc).

21.48.7.2.3. validation-post-install

This is a RackN predefined stage designed to be executed in the final installed operating system. This stage requires that the installed OS is utilizing the drp-agent, runner-service or ESXi agent-install stages to execute Workflow in the installed OS.

21.48.7.2.4. validation-record-parameters

This stage allows for a system to record the values of an inventoried system for use in validation later. The stage will build a validate/parameters that validates the parameters defined in validate/record-parameters.

This assumes that the inventory stage has already been run.

21.48.7.3. tasks

The content package provides the following tasks.

21.48.7.3.1. validate-nic-counts

This task implements the Validation library function to verify that a given system has a minimum number of NICs in the system, and that a minimum number of those NICs can be brought to a “link UP” state.

The link up state refers only to the system's ability to bring the link UP, but does not attempt to configure any IP or network stack details on the link. This verifies cable connectivity and that the remote switch end is able to establish an “UP” state.

21.48.7.3.2. validate-parameters

This task implements the Validation library function to verify that the parameters specified in validate/parameters match their schema and value.

This assumes that all the parameters have already been set.

21.48.7.3.3. validate-record-parameters

This task records the values from the inventory/collect parameter and stores them in the validate/parameters object format for validation. The default operation is equals.

This assumes that inventory has already been run.

21.48.7.3.4. validation-clear-errors-ignore

This task resets the validation/errors-ignore contents to an empty value.

The validation/errors-ignore Param is not reset at the beginning of each Stage run with the validation-start task. It is assumed that the operator would like to harvest the validation failure log messages, but not have them as regularly reset.

The ignored error messages will be prefixed with the Stage, Task, and Current Job UUID so they may be correctly correlated at a later time.

To use this Task, simply add it to the control Param task list wherever the operator would like the Machine's Param cleared. It might make sense to add it as the very first Task in the first Validation Stage in a given Workflow.

21.48.7.3.5. validation-start

This task begins a given Validation stage. No other tasks should precede it in the stage, and the only other task that should follow is the validation-stop task.

Tasks will be dynamically injected in to the workflow after this task, if they have been specified by the appropriate control Params.

21.48.7.3.6. validation-stop

This task ends a given Validation stage. No other tasks should follow it in the stage, and the only other task that should precede it is the validation-start task.

Tasks will be dynamically injected in to the workflow if they have been specified by the appropriate control Params prior to this task.