indxr: A Python Library for Indexing File Lines

Nicola Tonellotto
2024-01-01

Abstract

indxr is a Python utility that indexes the lines of a file so that users can access specific lines on demand without loading the entire file into main memory. It addresses two main issues in working with textual data. First, users with limited RAM may struggle to work with large datasets. Because indxr retrieves specific lines without loading entire files, users can work with datasets that do not fit into main memory. For example, it lets users carry out demanding tasks, such as pre-processing texts and training neural models for Information Retrieval or other tasks, with limited RAM and without noticeable slowdowns. Second, indxr reduces the burden of working with datasets split across multiple files: users can load specific data by providing either the relevant line numbers or the identifiers of the items those lines describe. This paper gives an overview of indxr's main features. indxr is available at https://github.com/AmenRa/indxr.
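
The general idea of line indexing described in the abstract can be illustrated with a minimal sketch. This is not indxr's implementation or API: the `LineIndex` class and the `demo` function are hypothetical names introduced here, and the approach shown (recording the byte offset of each line in one pass, then seeking to it) is an assumption about one common way to get this behavior.

```python
import os
import tempfile


class LineIndex:
    """Hypothetical helper: random access to file lines via byte offsets."""

    def __init__(self, path):
        self.path = path
        self.offsets = []
        # One sequential pass: remember the byte position where each line starts.
        # Only the offsets are kept in memory, not the line contents.
        with open(path, "rb") as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, i):
        # Seek directly to the requested line and read only that line.
        with open(self.path, "rb") as f:
            f.seek(self.offsets[i])
            return f.readline().decode("utf-8").rstrip("\r\n")


def demo():
    # Build a small throwaway file so the sketch is self-contained.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("first\nsecond\nthird\n")
    try:
        index = LineIndex(tmp.name)
        return len(index), index[2], index[0]
    finally:
        os.remove(tmp.name)
```

Under these assumptions, memory use grows with the number of lines (one integer per line) rather than with the size of the file, which is what makes datasets larger than RAM workable.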
Year: 2024
ISBN: 9783031560682
ISBN: 9783031560699
Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1264848