Versatile Methods for Validating and Transforming Data within SMART

Praveena Natarajan, Senior Consultant at Synthesis Systems Inc

SMART is a proprietary accelerator from Synthesis that migrates data between different subscription systems, whether they run on-premises or in the cloud. Its user-friendly interface lets both technical and non-technical teams work with it efficiently.

Here are some versatile methods for validating and transforming data within SMART.

tMap component:

It processes data efficiently from single or multiple sources and transfers it seamlessly to single or multiple destinations. It handles complex data, reading from XML files, delimited files, or databases, working with expressions, creating temporary variables, and ensuring smooth data transfer operations. Here are some of its features:

Joining the data sources – It supports a diverse range of match models, including unique match, first match, all matches, and all rows. It also facilitates various join models, such as inner join and left outer join, and provides the option to choose whether temporary data is stored.
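To make the match-model semantics concrete, here is a minimal Java sketch (not SMART's internal implementation; the class and method names are hypothetical) of a left outer join with first-match semantics: each main row keeps the first matching lookup row, or null when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinSketch {
    // Left outer join with "first match" semantics: each main row keeps
    // the first lookup row whose key matches, or null when none matches.
    public static List<String[]> leftOuterFirstMatch(
            List<String[]> main, List<String[]> lookup) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : main) {
            String matched = null;
            for (String[] lk : lookup) {
                if (lk[0].equals(row[0])) { matched = lk[1]; break; } // first match only
            }
            out.add(new String[] { row[0], row[1], matched }); // unmatched rows survive
        }
        return out;
    }
}
```

With an inner join model, the unmatched row would be dropped instead of emitted with a null; with "all matches", the inner loop would collect every matching lookup row rather than breaking after the first.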

When the lookup table contains a limited number of rows, the Load Once lookup model can be selected. This loads the lookup data into memory once, so the main flow runs faster than it would if the lookup source were queried for every row. However, if the lookup table is large, holding it in memory can cause a Java heap space exception; in such cases, setting the “store temp data” property to True fixes the problem by keeping the lookup data on disk instead. As a guideline, choose the ‘Load Once’ lookup model for static data and ‘Reload at Each Row’ for data that changes while the job runs.
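The trade-off between the two lookup models can be sketched in a few lines of Java (a simplified illustration, not SMART's actual caching code; the names are hypothetical): Load Once fetches the lookup source a single time, while Reload at Each Row re-queries it per main-flow row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LookupSketch {
    // "Load Once": materialize the lookup into a map a single time;
    // every main-flow row then probes memory instead of the source.
    public static Map<String, String> loadOnce(Supplier<Map<String, String>> source) {
        return new HashMap<>(source.get()); // one fetch, cached for the whole run
    }

    // "Reload at Each Row": re-query the source for every row -- slower,
    // but it sees data that changes while the job is running.
    public static String reloadAtEachRow(Supplier<Map<String, String>> source, String key) {
        return source.get().get(key); // fresh fetch on every call
    }
}
```

The per-row fetch is exactly why Reload at Each Row suits dynamic data: the cost of re-reading buys correctness against a moving source.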

Output flows – It enables the creation of multiple output flows with different transformations, even from a single input file, and a different filter condition can be applied to each output. Additionally, the “Catch Output Reject” and “Catch Lookup Inner Join Reject” properties allow rejected or bad data to be captured in dedicated outputs.
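The routing behaviour can be illustrated with a small Java sketch (hypothetical names; the thresholds are made up for the example): each row goes to the first output whose filter it satisfies, and rows that fail every filter land in a reject flow, analogous to catching output rejects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputSplitSketch {
    // Route each input value to one of several outputs based on filter
    // conditions; values satisfying no filter are collected as rejects.
    public static Map<String, List<Integer>> split(List<Integer> input) {
        Map<String, List<Integer>> flows = new LinkedHashMap<>();
        flows.put("small", new ArrayList<>());
        flows.put("large", new ArrayList<>());
        flows.put("reject", new ArrayList<>());
        for (int v : input) {
            if (v >= 0 && v < 100) flows.get("small").add(v);
            else if (v >= 100)     flows.get("large").add(v);
            else                   flows.get("reject").add(v); // negative: bad data
        }
        return flows;
    }
}
```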

Variables and Expressions – If the same logic is needed in several places in the output, it can be defined once in the variable section. This section supports a variety of string-handling operations, such as concatenation, length determination, uppercasing, replacing, and trimming, as well as mathematical calculations involving type conversion. Data can also be filtered on conditions using expressions, and sequences can be generated in the variable section and mapped to the output flow to uniquely identify rows.
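Since such expressions are ordinary Java under the hood, the idea can be sketched as follows (a simplified stand-in with hypothetical names, not SMART's expression runtime): a shared variable normalizes a string once for reuse across outputs, and a counter generates a unique sequence per row.

```java
public class ExpressionSketch {
    private static int seq = 0;

    // Shared "variable": trim and uppercase a raw value once,
    // so every output column that needs it reuses the same logic.
    public static String normalizedName(String raw) {
        return raw == null ? "" : raw.trim().toUpperCase();
    }

    // Sequence generator producing a unique id for each processed row.
    public static int nextSequence() {
        return ++seq;
    }
}
```

Defining the normalization once in a variable keeps the per-output expressions short and guarantees every output applies identical logic.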

Automatic type conversion – tMap offers an automatic type conversion feature to avoid compilation errors during job execution when input and output columns have different data types. The conversion behaviour can also be defined globally so that future jobs inherit it.
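Conceptually, the feature inserts conversions like the ones below between mismatched columns (a hand-written sketch with hypothetical names and null-handling choices, not the generated code itself):

```java
public class ConvertSketch {
    // String column feeding an int column: trim and parse,
    // mapping null or empty input to a default of 0.
    public static int toInt(String s) {
        return (s == null || s.isEmpty()) ? 0 : Integer.parseInt(s.trim());
    }

    // Numeric column feeding a String column: format the value.
    public static String toText(double d) {
        return String.valueOf(d);
    }
}
```

The global setting mentioned above would correspond to picking these defaults (e.g. what null converts to) once, rather than per mapping.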

Auto mapping – The ETL tool features an “Auto Map” button that automatically maps input columns to output columns when their names match exactly. Its “Auto Map” property settings additionally allow mappings to be created based on the similarity between input and output column names.

The match mode corresponds to a blend of Levenshtein and Jaccard similarity settings; adjusting the Levenshtein and Jaccard slider controls configures how the two measures are combined.
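To give a feel for what those two measures compute, here is a compact Java sketch (standard textbook definitions, with a made-up weighting; SMART's exact blend and tuning are not public): Levenshtein counts character edits, Jaccard compares the character sets, and a weight mixes the two into one similarity score.

```java
import java.util.HashSet;
import java.util.Set;

public class AutoMapSketch {
    // Levenshtein edit distance via the classic dynamic-programming table.
    public static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        return d[a.length()][b.length()];
    }

    // Jaccard similarity over character sets: |A intersect B| / |A union B|.
    public static double jaccard(String a, String b) {
        Set<Character> sa = new HashSet<>(), sb = new HashSet<>();
        for (char c : a.toCharArray()) sa.add(c);
        for (char c : b.toCharArray()) sb.add(c);
        Set<Character> inter = new HashSet<>(sa); inter.retainAll(sb);
        Set<Character> union = new HashSet<>(sa); union.addAll(sb);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    // One way the sliders could combine them: w in [0,1] weights
    // normalized Levenshtein similarity against Jaccard similarity.
    public static double blended(String a, String b, double w) {
        int max = Math.max(a.length(), b.length());
        double lev = max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
        return w * lev + (1 - w) * jaccard(a, b);
    }
}
```

Identical column names score 1.0 regardless of the weight, while near-miss names like "cust_id" and "customer_id" score high enough for similarity-based auto mapping to pair them.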

Happy migrating your data across subscription systems, faster and more securely.