Individual results


View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Reflexive Number Agreement (masculine; with subject relative clause)
Item  Condition        intro  np_subject  that  embed_vp  the  embed_np  matrix_v  reflexive
1     match_sing       The    author      that  liked     the  senators  hurt      himself
1     mismatch_sing    The    author      that  liked     the  senators  hurt      themselves
1     match_plural     The    authors     that  liked     the  senator   hurt      themselves
1     mismatch_plural  The    authors     that  liked     the  senator   hurt      himself
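The sketch below shows one way a region-level surprisal could be computed from token probabilities. It rests on assumptions the page does not state: surprisal is taken as the negative base-2 log probability of a token, and a region's surprisal is the sum over its tokens. The function names and the input layout are hypothetical, and the probabilities are invented for illustration; they are not Transformer XL output.

```python
import math


def surprisal_bits(prob):
    """Surprisal of an event with probability `prob`, in bits (assumed unit)."""
    return -math.log2(prob)


def region_surprisals(region_token_probs):
    """Sum token surprisals within each region.

    `region_token_probs` maps a region name (e.g. "reflexive") to the list of
    conditional probabilities a model assigned to that region's tokens.
    This input layout is hypothetical.
    """
    return {
        region: sum(surprisal_bits(p) for p in probs)
        for region, probs in region_token_probs.items()
    }


# Invented probabilities, for illustration only.
example = {"matrix_v": [0.25], "reflexive": [0.5, 0.125]}
print(region_surprisals(example))  # {'matrix_v': 2.0, 'reflexive': 4.0}
```

A region split into several subword tokens, like the two-token "reflexive" entry here, would then get a single summed value under this assumption.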
Prediction performance for Transformer XL on Reflexive Number Agreement (masculine; with subject relative clause)
Accuracy  Prediction                                          Description
78.95%    match_plural.reflexive < mismatch_plural.reflexive  No description provided.
94.74%    match_sing.reflexive < mismatch_sing.reflexive      No description provided.
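Each prediction is an inequality between the surprisal at the reflexive region in two conditions. The accuracy is presumably the share of items for which that inequality holds. A minimal sketch of that calculation follows. The function name and the data layout are hypothetical, and the surprisal values are invented for illustration; they are not the results reported here.

```python
def prediction_accuracy(items, lower, higher, region="reflexive"):
    """Fraction of items whose surprisal at `region` is strictly lower
    in condition `lower` than in condition `higher`."""
    hits = sum(1 for item in items if item[lower][region] < item[higher][region])
    return hits / len(items)


# Invented surprisal values (one dict per item), for illustration only.
items = [
    {"match_sing": {"reflexive": 3.0}, "mismatch_sing": {"reflexive": 9.0}},
    {"match_sing": {"reflexive": 5.0}, "mismatch_sing": {"reflexive": 4.0}},
    {"match_sing": {"reflexive": 2.5}, "mismatch_sing": {"reflexive": 7.5}},
    {"match_sing": {"reflexive": 6.0}, "mismatch_sing": {"reflexive": 8.0}},
]

acc = prediction_accuracy(items, "match_sing", "mismatch_sing")
print(f"{acc:.2%}")  # 75.00%
```

The comparison is strict, so an item with equal surprisal in both conditions counts as a miss in this sketch.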