Individual results


View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Reflexive Number Agreement (masculine; with subject relative clause)
Item  Condition        intro  np_subject  that  embed_vp  the  embed_np  matrix_v  reflexive
1     match_sing       The    author      that  liked     the  senators  hurt      himself
1     mismatch_sing    The    author      that  liked     the  senators  hurt      themselves
1     match_plural     The    authors     that  liked     the  senator   hurt      themselves
1     mismatch_plural  The    authors     that  liked     the  senator   hurt      himself
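The per-region surprisal behind this table can be sketched as follows. This assumes, as a simplification, that a region's surprisal is the sum of the surprisals (in bits) of the tokens it contains, where each token's surprisal is -log2 p(token | preceding context). The helper names `surprisal` and `region_surprisals` are hypothetical, and the token probabilities are made-up placeholders, not GPT-2 outputs.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input: one (region_name, token_probability) pair per token,
# where token_probability is p(token | all preceding tokens).
TokenProbs = List[Tuple[str, float]]


def surprisal(p: float) -> float:
    """Surprisal in bits: -log2 p."""
    return -math.log2(p)


def region_surprisals(tokens: TokenProbs) -> Dict[str, float]:
    """Sum token surprisals within each region, keeping regions in order.

    If a word is split into several subword tokens, each token carries the
    same region name and their surprisals are added together.
    """
    totals: Dict[str, float] = {}
    for region, p in tokens:
        totals[region] = totals.get(region, 0.0) + surprisal(p)
    return totals


# Illustrative probabilities only (NOT model outputs) for the match_sing
# sentence "The author that liked the senators hurt himself".
match_sing: TokenProbs = [
    ("intro", 0.05), ("np_subject", 0.001), ("that", 0.1),
    ("embed_vp", 0.01), ("the", 0.4), ("embed_np", 0.002),
    ("matrix_v", 0.005), ("reflexive", 0.25),
]

print(region_surprisals(match_sing)["reflexive"])  # 2.0
```

Summing is what makes regions comparable across conditions even when the model tokenizes their words differently.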
Prediction performance for GPT-2 on Reflexive Number Agreement (masculine; with subject relative clause)
Accuracy  Prediction                                                          Description
68.42%    (598,match_sing/8,reflexive) < (600,mismatch_sing/8,reflexive)      No description provided.
78.95%    (599,match_plural/8,reflexive) < (597,mismatch_plural/8,reflexive)  No description provided.
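Each prediction compares the surprisal at region 8 (`reflexive`) between the grammatical (match) and ungrammatical (mismatch) conditions of the same item. A minimal sketch of the scoring, assuming accuracy is the percentage of items for which the inequality holds, is shown below. The functions `prediction_holds` and `accuracy` are hypothetical names, and the per-item surprisal values are invented for illustration; they are not GPT-2 results.

```python
from typing import Dict, List

# Hypothetical per-item data: condition name -> surprisal (bits) at the
# reflexive region for that item.
Item = Dict[str, float]


def prediction_holds(item: Item, lower: str, higher: str) -> bool:
    """True if surprisal in condition `lower` is strictly below that in `higher`."""
    return item[lower] < item[higher]


def accuracy(items: List[Item], lower: str, higher: str) -> float:
    """Percentage of items for which the prediction holds."""
    hits = sum(prediction_holds(it, lower, higher) for it in items)
    return 100.0 * hits / len(items)


# Invented values for three items; these are NOT GPT-2 results.
items: List[Item] = [
    {"match_sing": 2.1, "mismatch_sing": 5.3, "match_plural": 3.0, "mismatch_plural": 6.2},
    {"match_sing": 4.8, "mismatch_sing": 4.1, "match_plural": 2.7, "mismatch_plural": 5.9},
    {"match_sing": 1.9, "mismatch_sing": 3.5, "match_plural": 3.1, "mismatch_plural": 3.8},
]

print(f"{accuracy(items, 'match_sing', 'mismatch_sing'):.2f}%")      # 66.67%
print(f"{accuracy(items, 'match_plural', 'mismatch_plural'):.2f}%")  # 100.00%
```

Because each comparison is made within a single item, lexical differences between items (for example, *author* versus some other noun) do not affect whether the prediction passes.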