Comparing CHAID, CART, and QUEST Algorithms for Building Decision Trees

Decision trees are a popular machine learning technique for solving classification and regression problems. The three most popular algorithms for building a decision tree are QUEST, CHAID, and CART.

When you are working on a classification problem (a categorical dependent variable), any of the three algorithms can be used. Although the QUEST algorithm is generally faster than the other two, its performance degrades on large datasets, where memory requirements grow. For very large datasets it may therefore be impractical to use QUEST for classification.
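Whichever algorithm is chosen, the core step in growing a classification tree is finding the split that makes the child nodes purest. The sketch below illustrates the CART-style version of this step: an exhaustive search for the binary threshold on a numeric feature that minimises weighted Gini impurity. The data and function names are illustrative, not from any particular library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(xs, ys):
    """Find the threshold on one numeric feature that minimises the
    weighted Gini impurity of the two child nodes (CART-style)."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # skip degenerate splits with an empty child
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_binary_split(xs, ys))  # → (3, 0.0): splitting at x <= 3 separates the classes perfectly
```

A full CART implementation applies this search recursively to each node over all features; the example shows only the single-split building block.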

On the other hand, the QUEST algorithm is not applicable to regression-type problems (a continuous dependent variable); only CART and CHAID can be used there. The CHAID algorithm generates non-binary trees, which tend to be wider, and this is the primary reason the CHAID method is popular in market research applications. In addition, the CHAID algorithm yields many terminal nodes connected to a single branch, which can easily be presented as a simple two-way table with multiple categories for each variable or dimension of the table. This is particularly useful in market research problems such as market segmentation: for example, CHAID may split on a Household Income variable, dividing it into four categories whose members differ with respect to some important behavioral variable.
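The two-way-table reading of a CHAID split can be sketched with hypothetical survey data. A multiway split keeps one child per (merged) income category, so the node counts can be printed directly as a category-by-outcome table; the records and category labels below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical market-research records: (household income band, bought product?)
records = [
    ("<25k", "no"), ("<25k", "no"), ("25-50k", "no"), ("25-50k", "yes"),
    ("50-100k", "yes"), ("50-100k", "yes"), (">100k", "yes"), (">100k", "yes"),
]

# A CHAID-style multiway split keeps one child node per income category,
# so the split reads directly as a two-way table: category x outcome.
table = defaultdict(lambda: defaultdict(int))
for income, bought in records:
    table[income][bought] += 1

for income in ("<25k", "25-50k", "50-100k", ">100k"):
    counts = table[income]
    print(f"{income:>8}: yes={counts['yes']}, no={counts['no']}")
```

A real CHAID run would also merge categories whose outcome distributions are not significantly different (via chi-square tests); the sketch shows only the presentation step.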

Lastly, the CART algorithm always yields binary trees, which can be less convenient to interpret and/or present.

It is difficult to comment on the predictive accuracy of these algorithms or to make general recommendations. For practical purposes, it is best to apply the different algorithms, compare their performance, and choose the best-performing one based on the prediction errors.
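That comparison step can be sketched generically: score each fitted model on held-out data by misclassification rate and keep the one with the lowest error. The two "models" below are stand-ins (a single-split stump and a majority-class baseline), not real QUEST/CHAID/CART fits.

```python
def error_rate(predict, data):
    """Fraction of held-out (x, y) examples a model misclassifies."""
    return sum(predict(x) != y for x, y in data) / len(data)

# Two hypothetical fitted models standing in for competing tree algorithms.
stump = lambda x: "b" if x > 5 else "a"   # a single-split decision stump
baseline = lambda x: "a"                  # always predicts the majority class

holdout = [(1, "a"), (2, "a"), (8, "b"), (9, "b"), (4, "a"), (7, "b")]

scores = {name: error_rate(m, holdout)
          for name, m in [("stump", stump), ("baseline", baseline)]}
best = min(scores, key=scores.get)
print(scores, "->", best)  # stump has 0.0 error, baseline 0.5 -> "stump" wins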

Comments are closed.