The radar plots below give a high-level overview of performance across multiple models and test suite tags.