
Safely Updating the Configuration of Ingest Pipelines


Ingest pipelines are used to modify data before it is stored in Elasticsearch. These modifications include, for example, changing printer names to lower case, anonymizing data, and others.

This chapter describes a concept for safely updating ingest pipelines. The aim is to combine SEAL-specific and customer-specific pipelines in a way that allows the SEAL pipelines to be updated without overwriting the customer-specific parts.


Implementation

Components

The concept uses the following components to assign an update-safe interface, which embeds customer-specific pipelines, to an index:

  1. A component template to define the pipelines with.

  2. An index template to which the component template is assigned.

    If a new index is created according to this index template, the pipeline from the component template is linked to this index.

  3. A pipeline consisting of three sub-pipelines that are passed through successively.

    • The middle pipeline contains the SEAL-specific part and is named with the work postfix.

    • The other two pipelines contain before and after interfaces for customer-specific parts and are named accordingly with either before or after postfixes.

      In addition, these pipelines carry the custom keyword in their names. Pipelines with this keyword in their name are only uploaded to Elasticsearch by the load-config script if they do not yet exist. This way, only an initial SEAL version is uploaded, which can be customized and will not be overwritten by later updates.

      You can configure customer-specific pipelines manually or via script, as is done for anonymizing pipelines.

All necessary components are provided as JSON snippets and uploaded to Elasticsearch by means of the load-config script.

If you execute the script for the very first time in a new installation, the components that need to be set up do not yet exist. Therefore, all component templates and predefined pipelines are uploaded. Afterwards, especially in the case of an update, the script uploads only components that are either allowed to be updated or forced to be updated using the overwrite mode:

  1. Component templates are always overwritten, i.e. you must not change the templates in a customer's installation. Otherwise, the customization is lost with the next update, because it is overwritten with the current SEAL version.

  2. Customer-specific pipelines with custom in their name are only created; afterwards, the load-config script never overwrites them. Only if you delete the pipelines manually in the Kibana user interface will they be recreated from the present JSON snippets.

  3. Other pipelines are always overwritten. These are the SEAL-specific pipelines that are to be replaced by improved versions during an update.

  4. Index templates are always overwritten. The overwrite mode is no longer needed for them.
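The four rules above can be summarized in a short sketch. The function name and signature are hypothetical and only illustrate the behavior of the load-config script, not its actual implementation:

```python
# Hypothetical sketch of the upload decision made by the load-config
# script for each JSON snippet; not the actual implementation.

def should_upload(name: str, already_exists: bool) -> bool:
    """Return True if the snippet is to be uploaded to Elasticsearch."""
    if "custom" in name:
        # Customer-specific pipelines are only created once and are
        # never overwritten by later updates.
        return not already_exists
    # Component templates, index templates and SEAL-specific pipelines
    # are always overwritten so that updates take effect.
    return True
```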


In Case of Errors

For all pipelines, you can use a processor for error handling. The current procedure is to move the affected record into another index, commonly called a dead-letter index.

Filebeat uses the seal-plossys-5-unmapped index for this purpose. Use this index here as well.
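For illustration, the on_failure handler that routes failed records to this dead-letter index looks as follows; the same pattern is used by all pipeline snippets in this chapter:

```json
"on_failure": [
  {
    "set": {
      "field": "_index",
      "value": "seal-plossys-5-unmapped"
    }
  }
]
```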


Troubleshooting: List of Passed Through Pipelines

All new pipelines for PLOSSYS 5 have been enhanced with a mechanism by which each one adds its identifier to the pipelines_passed_through field:

  • The previous SEAL-specific seal-plossys-5-lowercase-printers pipeline adds its lowercase-printers identifier.

  • The new SEAL-specific seal-plossys-5-avoid-type-mismatches pipeline adds its avoid-type-mismatches identifier.

This allows a quick check of whether the affected pipeline has been passed through if an error occurs.

The order of the entries corresponds to the order of passing through the pipelines.

Index Content of pipelines_passed_through
seal-plossys-5-log log-toplevel, log-before, log-work, lowercase-printers, avoid-type-mismatches, log-after
seal-plossys-5-accounting accounting-toplevel, accounting-before, accounting-work, accounting-after
seal-plossys-5-audit audit-toplevel
seal-plossys-5-statistics statistics-toplevel
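The identifier entries are written by an append processor; a minimal sketch matching the pattern used by the pipeline snippets in this chapter:

```json
{
  "append": {
    "field": "pipelines_passed_through",
    "value": [
      "lowercase-printers"
    ],
    "description": "add pipeline identifier for troubleshooting"
  }
}
```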

Below you will find screenshots matching the values of the above table:

  • standard logging index for PLOSSYS 5, seal-plossys-5-log:

    Log Pipelines

  • accounting index for PLOSSYS 5, seal-plossys-5-accounting:

    Accounting Pipelines

  • audit index for PLOSSYS 5, seal-plossys-5-audit:

    Audit Pipelines

  • statistics index for PLOSSYS 5, seal-plossys-5-statistics:

    Statistics Pipelines


Deeper Insights

The new pipeline concept has been implemented only where it was needed, i.e. in

  • the standard logging index for PLOSSYS 5, seal-plossys-5-log

  • the accounting index for PLOSSYS 5, seal-plossys-5-accounting

    For this index, there is not yet a SEAL-specific pipeline, but there are customer-specific adjustments concerning an anonymizing pipeline. An update must not remove the passing through of this pipeline.

The concept has been rudimentarily prepared for

  • the audit index

  • the statistics index

If you need to set up an ingest pipeline for another index, e.g. DPF, Operator, ..., we recommend using the described pipeline configuration as a template. You should structure the nesting of the base pipelines and the adding of the pipeline identifier to the pipelines_passed_through field in the same way as described above.
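As a sketch of this recommendation, a top pipeline for a hypothetical DPF index could look as follows. All dpf names are assumptions that only illustrate the naming scheme:

```json
{
  "description": "Default toplevel pipeline for corrections to data used as input for a hypothetical seal-plossys-5-dpf index",
  "processors": [
    {
      "append": {
        "field": "pipelines_passed_through",
        "value": ["dpf-toplevel"],
        "if": "true",
        "description": "add pipeline dpf-toplevel"
      }
    },
    { "pipeline": { "name": "seal-plossys-5-dpf-custom-before", "if": "true", "tag": "before" } },
    { "pipeline": { "name": "seal-plossys-5-dpf-work", "if": "true", "tag": "work" } },
    { "pipeline": { "name": "seal-plossys-5-dpf-custom-after", "if": "true", "tag": "after" } }
  ],
  "on_failure": [
    { "set": { "field": "_index", "value": "seal-plossys-5-unmapped" } }
  ]
}
```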

The following examples are provided in the configuration directory as JSON snippets for load-config.


The Pipelines

  • All pipelines are set up so that, in case of an error, the data is moved to the seal-plossys-5-unmapped index.

  • All processors are usually set up so that an error that occurs is not ignored.


The Top Pipeline

The description uses the following pipeline as an example: seal-plossys-5-accounting

The top pipeline

  • looks the same for every index, except for the names.

  • skips the corresponding processor if "false" is specified for the "if" item.

  • contains the pipelines below that are passed through in the given order.

  • is defined in configuration/seal-plossys-5/pipeline/accounting.json:

    {
      "description": "Default toplevel pipeline for corrections to data used as input for seal-plossys-5-accounting index",
      "processors": [
        {
          "append": {
            "field": "pipelines_passed_through",
            "value": [
              "accounting-toplevel"
            ],
            "if": "true",
            "description": "add pipeline accounting-toplevel"
          }
        },
        {
          "pipeline": {
            "name": "seal-plossys-5-accounting-custom-before",
            "if": "true",
            "tag": "before"
          }
        },
        {
          "pipeline": {
            "name": "seal-plossys-5-accounting-work",
            "if": "true",
            "tag": "work"
          }
        },
        {
          "pipeline": {
            "name": "seal-plossys-5-accounting-custom-after",
            "if": "true",
            "tag": "after"
          }
        }
      ],
      "on_failure": [
        {
          "set": {
            "field": "_index",
            "value": "seal-plossys-5-unmapped"
          }
        }
      ]
    }
    

Customer-Specific "before" Pipeline

The description uses the following pipeline as an example: seal-plossys-5-accounting-custom-before

The customer-specific before pipeline

  • is passed through by the SEAL-specific pipeline.

  • is not overwritten by an update, because of the custom keyword in its name.

  • is able to handle things like flags that control the SEAL-specific part, as this pipeline is the first one that is passed through.

  • contains the customer-specific, installable anonymization pipeline in the example.

    • As the "if" item is set to "false", a pipeline of this name does not have to exist. This does not cause an error, because the processor is skipped.

    • As soon as you have configured your customer-specific anonymization pipeline, you can activate the flag by setting the "if" item to "true".

  • is defined in configuration/seal-plossys-5/pipeline/accounting-custom-before.json:

    {
      "description": "Custom pipeline for corrections to data before indispensable pipeline seal-plossys-5-accounting-work",
      "processors": [
        {
          "append": {
            "field": "pipelines_passed_through",
            "value": [
              "accounting-before"
            ],
            "if": "true",
            "description": "add pipeline accounting-before"
          }
        },
        {
          "pipeline": {
            "name": "anonymization",
            "if": "false"
          }
        }
      ],
      "on_failure": [
        {
          "set": {
            "field": "_index",
            "value": "seal-plossys-5-unmapped"
          }
        }
      ]
    }
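After you have configured a customer-specific anonymization pipeline and activated it by setting the "if" item to "true", the pipeline processor in the snippet above looks like this:

```json
{
  "pipeline": {
    "name": "anonymization",
    "if": "true"
  }
}
```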
    

SEAL-Specific Pipeline

The description uses the following pipeline as an example: seal-plossys-5-accounting-work

The SEAL-specific pipeline

  • contains the processing steps that SEAL requires for data consistency and other purposes.

  • is replaced by the current version during an update.

  • contains a list of processors that are generally passed through.

    In most cases, the type of the incoming data decides whether a processor is passed through or not. It is also possible to trigger other things, e.g. retrieving flags defined in the before pipeline.

  • is defined in configuration/seal-plossys-5/pipeline/accounting-work.json:

    {
      "description": "Pipeline for corrections to data used as input for seal-plossys-5-accounting index",
      "processors": [
        {
          "append": {
            "field": "pipelines_passed_through",
            "value": [
              "accounting-work"
            ],
            "if": "true",
            "description": "add pipeline accounting-work"
          }
        },
        {
          "rename": {
            "field": "accounting.principals.deviceID",
            "target_field": "accounting.principals.endpoint",
            "ignore_missing": true,
            "tag": "rename_deviceID_to_endpoint",
            "description": "rename deprecated deviceID to endpoint"
          }
        },
        {
          "pipeline": {
            "name": "not-existing-pipeline-example",
            "if": "false"
          }
        }
      ],
      "on_failure": [
        {
          "set": {
            "field": "_index",
            "value": "seal-plossys-5-unmapped"
          }
        }
      ]
    }
    

Customer-Specific "after" Pipeline

The description uses the following pipeline as an example: seal-plossys-5-accounting-custom-after

The customer-specific after pipeline

  • is passed through after the SEAL-specific pipeline.

  • is not overwritten by an update, because of the custom keyword in its name.

  • can be controlled by corresponding settings, i.e. the processors are either always passed through or only depending on the type of the incoming data or on flags set in the previous pipelines.

  • is defined in configuration/seal-plossys-5/pipeline/accounting-custom-after.json:

    {
      "description": "Custom pipeline for corrections to data after indispensable pipeline seal-plossys-5-accounting-work",
      "processors": [
        {
          "append": {
            "field": "pipelines_passed_through",
            "value": [
              "accounting-after"
            ],
            "if": "true",
            "description": "add pipeline accounting-after"
          }
        },
        {
          "pipeline": {
            "name": "not-existing-pipeline-example",
            "if": "false"
          }
        }
      ],
      "on_failure": [
        {
          "set": {
            "field": "_index",
            "value": "seal-plossys-5-unmapped"
          }
        }
      ]
    }
    

The Templates

Component templates are parts of the configuration that are used by index templates.


Component Template for an Accounting Pipeline

The description uses the following component template as an example: seal-plossys-5-accounting-pipeline

The seal-plossys-5-accounting top pipeline

  • is integrated as the default pipeline under the default_pipeline item. The default pipeline is passed through early enough to allow data sets to be moved into another index.

  • is defined in configuration/seal-plossys-5/index-template/components/accounting-pipeline.json:

    {
      "template": {
        "settings": {
          "index": {
            "default_pipeline": "seal-plossys-5-accounting"
          }
        }
      }
    }
    

Integrating the Component Templates into the Index Template

The description uses the following index template as an example: seal-plossys-5-accounting

  • This is just an excerpt of the actual JSON snippet.

  • The component template is specified under the composed_of item in configuration/seal-plossys-5/index-template/accounting.json:

    {
      "index_patterns": [
        "seal-plossys-5-accounting-*"
      ],
      "composed_of": ["seal-plossys-5-accounting-pipeline"],
      "priority": 160,
    ...
    }
    

List of Passed Through Pipelines

Index Content of pipelines_passed_through Hint
seal-plossys-5-accounting accounting-toplevel, accounting-before, accounting-work, accounting-after
seal-plossys-5-accounting accounting-toplevel, accounting-before, anonymization, accounting-work, accounting-after Idea for the future: an anonymizing pipeline installed and configured so that it adds its identifier here.
