In this project, an approach for de-duplicating large volumes of address data is designed and implemented using the Teradata Aster Big Analytics Appliance.
Application of the Teradata Aster analysis platform using a Big Data challenge as an example
Starting from a set of non-standardized, partially incomplete address data, a matching method is developed to de-duplicate addresses. In the literature, this type of problem is known as duplicate detection, object identification, or record linkage. The aim is to test the performance of the Teradata Aster analysis platform on a big data problem. The project verifies that address matching can be implemented with the functions Teradata Aster provides. The quality of the matching is then further improved by implementing new functions.
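To make the idea of address matching concrete, the following is a minimal Python sketch of one common duplicate-detection scheme: normalize each address, then compare pairs with a string-similarity score. It is not the project's method and does not use any Teradata Aster function; the abbreviation table, the function names, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviation table; a real one would be far larger
# and specific to the country and language of the addresses.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(address: str) -> str:
    """Lowercase, replace punctuation with spaces, expand known abbreviations."""
    text = re.sub(r"[^\w\s]", " ", address.lower())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two addresses after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(addresses, threshold=0.9):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold.

    The threshold is an assumed value; in practice it would be tuned
    against labelled data.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(addresses), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs
```

Comparing every pair is quadratic in the number of records, so at the data volumes described here a real implementation would typically restrict comparisons to candidate groups first (for example, records sharing a postal code) and distribute the work across the platform.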