Comparing labeled trees has applications in various domains, particularly in the study of cancer phylogenies. In this paper, we address the problem of comparing fully labeled unordered trees, focusing on their structural and label similarities. We define a novel class of distance measures by exploiting the eXtended Burrows-Wheeler Transform (XBWT), an extension to labeled trees of the well-known Burrows-Wheeler Transform. The XBWT, introduced in [Ferragina et al., FOCS 2005], produces a linearization of the tree that is both compressible and efficiently searchable. We extend the definition of this linearization to pairs of trees, and we produce a partition based on the prefixes of a given length k ≥ 1 of the parent-to-root paths of the nodes. We define, for any k ≥ 1, the distance measure dk by applying the Jaccard distance to each element of this partition. We prove that all the measures dk are pseudometrics, i.e. they assume non-negative values, are zero when applied to identical trees, are symmetric, and satisfy the triangle inequality. These measures become metrics when all the labels within each tree are distinct. Furthermore, we show that these distances are sensitive to some operations on trees, such as the removal and insertion of subtrees, swapping of subtrees, and label swapping. Finally, to show the effectiveness of our approach, we have analyzed experimentally the behavior of the distances when operations on trees are applied to a randomly generated fully labeled tree. Here, we present the results obtained in the case k = 1.
Novel XBWT-based Distance Measures for Labeled Trees
Rosone G.;
2024-01-01
Abstract
Comparing labeled trees has applications in various domains, particularly in the study of cancer phylogenies. In this paper, we address the problem of comparing fully labeled unordered trees, focusing on their structural and label similarities. We define a novel class of distance measures by exploiting the eXtended Burrows-Wheeler Transform (XBWT), an extension to labeled trees of the well-known Burrows-Wheeler Transform. The XBWT, introduced in [Ferragina et al., FOCS 2005], produces a linearization of the tree that is both compressible and efficiently searchable. We extend the definition of this linearization to pairs of trees, and we produce a partition based on the prefixes of a given length k ≥ 1 of the parent-to-root paths of the nodes. We define, for any k ≥ 1, the distance measure dk by applying the Jaccard distance to each element of this partition. We prove that all the measures dk are pseudometrics, i.e. they assume non-negative values, are zero when applied to identical trees, are symmetric, and satisfy the triangle inequality. These measures become metrics when all the labels within each tree are distinct. Furthermore, we show that these distances are sensitive to some operations on trees, such as the removal and insertion of subtrees, swapping of subtrees, and label swapping. Finally, to show the effectiveness of our approach, we have analyzed experimentally the behavior of the distances when operations on trees are applied to a randomly generated fully labeled tree. Here, we present the results obtained in the case k = 1.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.