Skip to content

Cleansing Rules

The Cleansing Rules feature allows you to standardize and clean your data by defining rules that transform and validate columns. You can view existing rules, create new ones, and configure how data is processed before being stored or used.


Cleansing Rules List

On the list page, you can:

  • View all existing cleansing rules associated with your assigned projects.
  • See details such as rule name, associated project, source and target tables.
  • Edit or delete existing rules using the corresponding action links.

Screenshot: Cleansing Rules List

Cleansing Rules List


Create / Edit Cleansing Rule

When creating or editing a cleansing rule, you can define how data in specific columns should be processed and stored.

Screenshot: New Cleansing Rule

New Cleansing Rule


Fields

  • Name – Enter a descriptive name for the cleansing rule.
  • Project – Select the project where this rule will be applied.
  • Datasource – Select the source connection where the data resides.
  • Table – Choose the source table for which the rule applies.
  • Target Table – Choose the destination table where the cleansed data will be stored.
  • Primary Keys – Select one or more primary key columns used to identify records.

Columns for Cleansing

Once you select the table, all its columns are listed below. For each column, you can apply different cleansing rules.

Screenshot: Columns for Cleansing

Columns for Cleansing

Column Attributes

  • Column Name – The name of the column available for cleansing.
  • Type – The data type or category assigned to the column.
  • Target Column Name – The column where the updated or transformed data will be stored.
  • Enabled – Toggle whether the rule is active for this column.
  • Rules – View or apply rules like remove spaces, case transformation, split values, and more.

Rule Categories

Each column can have multiple cleansing rules applied to it, grouped into the following categories:

✅ Remove

Remove unwanted characters from the data.

Options include:

  • Trailing Spaces
  • All Spaces
  • Leading Spaces
  • Reduce Multiple Spaces
  • Commas
  • Dots
  • Hyphens
  • Apostrophes
  • Double Apostrophes
  • Letters
  • Numbers
  • Symbols
  • Non-Printables
  • New Line
  • Tab Character
  • Reduce Non-Alpha Numerics
  • Other Characters (custom input)

✅ Standardize

Standardize the case of data entries.

Options include:

  • None
  • Upper Case
  • Lower Case
  • Proper Case

✅ Split

Split data into multiple parts based on defined delimiters.

Options include:

  • Split (toggle on/off)

✅ Convert

Replace specific values based on conditions.

Fields include:

  • Condition – Select from predefined conditions.
  • Value to replace – Enter the current value that needs to be replaced.
  • New Value – Enter the new value to apply.

You can add multiple conditions for a column.

✅ Validate

Validate data entries to ensure data integrity and correctness.

Fields include:

  • Metric Name – Select the validation metric such as Completeness, Validity, etc.
  • Condition – Define the condition that the data should meet (e.g., equal to, greater than, less than).
  • Value to Validate – Enter the reference value that the condition will be checked against.

This allows you to enforce rules that data entries must comply with, ensuring that only valid data is processed or stored.


How it Works

  1. After selecting the source table, its columns are listed.
  2. Click on a column to open the rule configuration panel.
  3. Expand each rule category and configure options as needed.
  4. Apply transformations and validations based on data requirements.
  5. Select the target column where the cleansed data will be stored.
  6. Save the rule for use in workflows.

By using Cleansing Rules, you can enhance data consistency and reliability across your projects without writing custom scripts, ensuring clean and standardized datasets for downstream processing.