Abstract. The behavior of inlink and outlink distributions appears to be one of the most studied properties of the web structure. The literature agrees that the inlink distribution follows a power law, but no such agreement exists for the outlink distribution. Accurate observations show that in the low-degree region the link distribution fails to fit a power law with a discrepancy larger for outlinks than for inlinks. Moreover, a power law, as well as any continuous function, does not fit the scattered behavior shared by both the link distributions for large-degree values. The linking model we consider here is a mixed one, based on both the preferential attachment strategy and the uniform attachment strategy. A new approximation technique is devised to detect the parameters of the steady state solution that describe a real data set. A stochastic technique is suggested to describe the scattering of the data. With these techniques the model appears to be well suited for describing both inlink and outlink distributions. The experimentation on subsets of the World Wide Web and of Wikipedia shows that our approach produces an approximation more adequate than the power law. This approximation suggests that the two attachment strategies play a different role in the inlink and the outlink cases.
A stochastic model for the link analysis of the web
MENCHI, ORNELLA;ROMANI, FRANCESCO;
2007-01-01
Abstract
Abstract. The behavior of inlink and outlink distributions appears to be one of the most studied properties of the web structure. The literature agrees that the inlink distribution follows a power law, but no such agreement exists for the outlink distribution. Accurate observations show that in the low-degree region the link distribution fails to fit a power law with a discrepancy larger for outlinks than for inlinks. Moreover, a power law, as well as any continuous function, does not fit the scattered behavior shared by both the link distributions for large-degree values. The linking model we consider here is a mixed one, based on both the preferential attachment strategy and the uniform attachment strategy. A new approximation technique is devised to detect the parameters of the steady state solution that describe a real data set. A stochastic technique is suggested to describe the scattering of the data. With these techniques the model appears to be well suited for describing both inlink and outlink distributions. The experimentation on subsets of the World Wide Web and of Wikipedia shows that our approach produces an approximation more adequate than the power law. This approximation suggests that the two attachment strategies play a different role in the inlink and the outlink cases.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.