A robust computational pipeline for model-based and data-driven phenotype clustering