Detecting Deception at Single Item Level in Questionnaires: the TF-IDF-Based Approach

Giulia Melis;Graziella Orrù;Giuseppe Sartori
2024-01-01

Abstract

Deception poses a significant challenge in forensic contexts because a high proportion of behaviours (from 25% to 45%) are deceptive. Consequently, various efforts have been made to unmask dishonest responding (e.g., the L, F, and K scales of the Minnesota Multiphasic Personality Inventory-3 or the X, Y, and Z scales of the Millon Clinical Multiaxial Inventory-III). Abnormal scores on these scales indicate an overall tendency of subjects to shift their responses towards social desirability (faking good) or exaggerated psychopathology (faking bad). Despite these measures, existing methods fall short in evaluating the intensity of specific symptoms and in identifying simulation focused on a particular item in a questionnaire. In this work, we explored the potential of the Term Frequency – Inverse Document Frequency (TF-IDF) model as a tool for detecting dissimulation in self-report questionnaires. This approach captures how infrequent a particular response is and how far it deviates from other subjects' responses, highlighting deception at the single-item level. We validated the proposed model using the 10-item Big Five Test, a widely used short version of the Big Five personality test. We administered this questionnaire to a group of volunteers (n = 694), who were instructed to answer it twice on a 5-point Likert scale: the first time dishonestly, according to specific instructions, and the second time honestly. We provided three faking contexts: (a) a child custody case (n = 221), (b) a job interview for a sales manager position (n = 243), and (c) a job interview for a position in a humanitarian organization (n = 230). The proposed TF-IDF model proved very effective in distinguishing authentic responses from artificially manipulated ones, enhancing the reliability and validity of test outcomes.
One of the main contributions of this approach is that two identical responses (from two participants) to the same item may yield different evaluations. This variation depends on the other participants' responses to the same item and on the subject's own responses to the other items of the questionnaire. In other words, TF-IDF yields a single measure that combines the distribution of group responses to a target item with the subject's response style. The TF-IDF index, as presented here, can be regarded as a novelty detector: a high TF-IDF score for a participant’s response to an item indicates that the response deviates significantly from those of honest respondents.
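The abstract does not give the exact formula, so the following is only a minimal illustrative sketch of how an item-level TF-IDF score of this kind might be computed. It assumes, hypothetically, that the term frequency is the share of the subject's own answers equal to the chosen Likert value (a proxy for response style) and that the inverse document frequency is a smoothed log ratio over a reference group of honest respondents. The function name `item_tfidf` and both definitions are assumptions made for illustration, not the authors' published method.

```python
import math
from collections import Counter


def item_tfidf(subject, honest_reference):
    """Return one TF-IDF-style score per item for a single subject.

    subject          : list of Likert answers (ints), one per item.
    honest_reference : list of answer lists from honest respondents.
    Both the TF and the IDF definitions below are illustrative
    assumptions, not the formula used in the paper.
    """
    n_ref = len(honest_reference)
    style = Counter(subject)  # how often the subject uses each Likert value
    scores = []
    for item, answer in enumerate(subject):
        # TF (assumed): share of the subject's answers equal to this value.
        tf = style[answer] / len(subject)
        # DF: honest respondents who gave the same answer to this item.
        df = sum(1 for ref in honest_reference if ref[item] == answer)
        # IDF (assumed): smoothed so that an answer shared by every
        # honest respondent scores exactly zero.
        idf = math.log((n_ref + 1) / (df + 1))
        scores.append(tf * idf)
    return scores


# Toy data: three honest respondents, two items.
honest = [[3, 3], [3, 3], [3, 4]]
scores_a = item_tfidf([5, 5], honest)  # subject A answers 5 everywhere
scores_b = item_tfidf([3, 5], honest)  # subject B answers 5 on item 2 only
```

In this toy example, subjects A and B give the same answer (5) to the second item but receive different scores for it, because their answers to the other item differ. This mirrors the property described above, under the stated assumptions.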

Use this identifier to cite or link to this document: https://hdl.handle.net/11568/1263947