ROOT 9 is a supervised system for the classification of hypernyms, co - hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016) . It relies on a Random Forest algorithm and nine unsupervised corpus - based features. We evaluate it with a 10 - fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts - Of - Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT 9 achieves an F1 score of 90 . 7 %, against a baseline of 57. 2 % ( vector cosine ). When the classification is binary, ROOT 9 achieves the following results against the baseline: hypernyms - co - hyponyms 9 5 . 7 % vs. 69.8 %, hypernyms - random 9 1 . 8 % vs. 64.1 % and co - hypon yms - random 97. 8 % vs. 79.4 %. In order to compare the performance with the state - of - the - art, we have also evaluated ROOT 9 in subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system lear ns the semantic relation or it simply learn s the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely , even though ROOT9 can be trained on negative examples (i.e. , switched hypernyms) to drastically r educe th is bias .
Nine Features in a Random Forest to Learn Taxonomical Semantic Relations
LENCI, ALESSANDROCo-primo
;
2016-01-01
Abstract
ROOT 9 is a supervised system for the classification of hypernyms, co - hyponyms and random words that is derived from the already introduced ROOT13 (Santus et al., 2016) . It relies on a Random Forest algorithm and nine unsupervised corpus - based features. We evaluate it with a 10 - fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts - Of - Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT 9 achieves an F1 score of 90 . 7 %, against a baseline of 57. 2 % ( vector cosine ). When the classification is binary, ROOT 9 achieves the following results against the baseline: hypernyms - co - hyponyms 9 5 . 7 % vs. 69.8 %, hypernyms - random 9 1 . 8 % vs. 64.1 % and co - hypon yms - random 97. 8 % vs. 79.4 %. In order to compare the performance with the state - of - the - art, we have also evaluated ROOT 9 in subsets of the Weeds et al. (2014) datasets, proving that it is in fact competitive. Finally, we investigated whether the system lear ns the semantic relation or it simply learn s the prototypical hypernyms, as claimed by Levy et al. (2015). The second possibility seems to be the most likely , even though ROOT9 can be trained on negative examples (i.e. , switched hypernyms) to drastically r educe th is bias .I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.