We present an analytic evaluation of the runtime behavior of the C4.5 algorithm which highlights some efficiency improvements. Based on the analytic evaluation, we have implemented a more efficient version of the algorithm, called EC4.5. It improves on C4.5 by adopting the best among three strategies for computing the information gain of continuous attributes. All the strategies adopt a binary search of the threshold in the whole training set starting from the local threshold computed at a node. The first strategy computes the local threshold using the algorithm of C4.5, which, in particular, sorts cases by means of the quicksoft method. The second strategy also uses the algorithm of C4.5, but adopts a counting sort method. The third strategy calculates the local threshold using a main-memory version of the RainForest algorithm, which does not need sorting. Our implementation computes the same decision trees as C4.5 with a performance gain of up to five times.

Efficient C4.5

RUGGIERI, SALVATORE
2002-01-01

Abstract

We present an analytic evaluation of the runtime behavior of the C4.5 algorithm which highlights some efficiency improvements. Based on the analytic evaluation, we have implemented a more efficient version of the algorithm, called EC4.5. It improves on C4.5 by adopting the best among three strategies for computing the information gain of continuous attributes. All the strategies adopt a binary search of the threshold in the whole training set starting from the local threshold computed at a node. The first strategy computes the local threshold using the algorithm of C4.5, which, in particular, sorts cases by means of the quicksoft method. The second strategy also uses the algorithm of C4.5, but adopts a counting sort method. The third strategy calculates the local threshold using a main-memory version of the RainForest algorithm, which does not need sorting. Our implementation computes the same decision trees as C4.5 with a performance gain of up to five times.
2002
Ruggieri, Salvatore
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11568/70358
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 322
  • ???jsp.display-item.citation.isi??? 233
social impact