Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Filler-Gap Dependencies (4 sentential embeddings)
Item | Condition | prefix | comp | embedding | subj | verb | obj | continuation
1 | what_gap | I know | what | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | (empty) | into the trash can
1 | that_gap | I know | that | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | (empty) | into the trash can
1 | what_no-gap | I know | what | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | the plastic | into the trash can
1 | that_no-gap | I know | that | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | the plastic | into the trash can
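
As a rough illustration of what a region-by-region surprisal is, the sketch below sums per-token surprisals (in bits) within each region of one condition. The token probabilities are invented for the example, the helper name `region_surprisals` is hypothetical, and summing token surprisals per region is an assumption about the aggregation rather than something this page states.

```python
import math

# Region names in order; a formula such as "6,obj" refers to the 1-based position.
REGIONS = ["prefix", "comp", "embedding", "subj", "verb", "obj", "continuation"]

def region_surprisals(token_probs_by_region):
    """Sum -log2(p) over the tokens of each region (assumed aggregation)."""
    return {
        region: sum(-math.log2(p) for p in token_probs_by_region.get(region, []))
        for region in REGIONS
    }

# Hypothetical per-token probabilities for the what_gap condition (made up).
example = {
    "prefix": [0.5, 0.25],
    "comp": [0.125],
    "obj": [],  # the object region is empty in the gap conditions
    "continuation": [0.5, 0.5, 0.25, 0.25],
}

surprisals = region_surprisals(example)
print(surprisals["prefix"], surprisals["obj"], surprisals["continuation"])
```

An empty region contributes zero surprisal under this scheme, which is why the gap conditions are compared at the continuation rather than at the object.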
Prediction performance for Transformer XL on Filler-Gap Dependencies (4 sentential embeddings)
Accuracy | Prediction | Description
33.33% | (627,what_no-gap/6,obj)>(625,that_no-gap/6,obj) | We expect the object to be less surprising in the that_no-gap condition than in the what_no-gap condition, because an upstream wh-word should set up an expectation for a gap.
66.67% | (628,what_gap/7,continuation)<(626,that_gap/7,continuation) | We expect the surprisal of the continuation to be lower in the what_gap condition than in the that_gap condition, because gaps must be licensed by upstream wh-words (such as “what”).
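
The accuracy column can be read as the share of items for which a prediction's inequality holds. A minimal sketch of that calculation follows, assuming one surprisal value per item, condition, and region. The numbers and the helper name `prediction_accuracy` are hypothetical and are not taken from the Transformer XL results.

```python
import operator

def prediction_accuracy(items, left, right, region, compare):
    """Fraction of items where compare(left surprisal, right surprisal) holds."""
    hits = sum(
        compare(item[left][region], item[right][region]) for item in items
    )
    return hits / len(items)

# Made-up region surprisals (bits) for three items; not real model output.
items = [
    {"what_no-gap": {"obj": 9.1}, "that_no-gap": {"obj": 7.4},
     "what_gap": {"continuation": 5.0}, "that_gap": {"continuation": 8.2}},
    {"what_no-gap": {"obj": 6.0}, "that_no-gap": {"obj": 6.5},
     "what_gap": {"continuation": 7.0}, "that_gap": {"continuation": 6.1}},
    {"what_no-gap": {"obj": 8.0}, "that_no-gap": {"obj": 7.9},
     "what_gap": {"continuation": 4.0}, "that_gap": {"continuation": 9.0}},
]

# Prediction 1: the object is more surprising after "what" than after "that".
obj_acc = prediction_accuracy(items, "what_no-gap", "that_no-gap", "obj", operator.gt)

# Prediction 2: the continuation is less surprising after "what" than after "that".
cont_acc = prediction_accuracy(items, "what_gap", "that_gap", "continuation", operator.lt)

print(f"{obj_acc:.2%} {cont_acc:.2%}")
```

Passing the comparison as an operator keeps the helper independent of the direction of the inequality, so the same function covers both predictions.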