The SWH-Analytics Framework
Torquati M.
2023-01-01
Abstract
The Software Heritage (SWH) dataset is a vast repository of open-source code, with the ambitious goal of preserving all publicly available open-source projects. Although it is designed to archive project files effectively, its size of nearly 1 petabyte makes it difficult to support Big Data MapReduce or AI systems efficiently. To close this gap and enable seamless custom analytics on the SWH dataset, we present the SWH-Analytics (SWHA) architecture. This development environment quickly and transparently runs custom analytic applications on the open-source software data that SWH has preserved over time.