Individual results


View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Cleft Structure (with modifier)
Item Condition intro subj verb modifier passive verb.1 matrix_v
1 np_mismatch What he did after the ingredients had been bought from the store was the meal
1 np_match What he ate after the ingredients had been bought from the store was the meal
1 vp_match What he did after the ingredients had been bought from the store was prepare the meal
1 vp_mismatch What he ate after the ingredients had been bought from the store was prepare the meal
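Surprisal is reported per region rather than per token. Below is a minimal sketch of that aggregation. It assumes that a region's surprisal is the sum of its tokens' surprisals, -log2 p(token | preceding context), measured in bits, and it uses made-up probabilities. The function name `region_surprisal` is hypothetical, and this is not the site's own code.

```python
import math
from typing import Sequence


def region_surprisal(token_probs: Sequence[float]) -> float:
    """Surprisal of one region in bits (assumed aggregation).

    Each entry is assumed to be p(token | all preceding tokens) for one token
    in the region; region surprisal is taken to be the sum of token surprisals.
    """
    return sum(-math.log2(p) for p in token_probs)


# Hypothetical probabilities for the two tokens of a region such as "the meal".
print(region_surprisal([0.5, 0.25]))  # 3.0
```

The prediction formula for this suite only adds and subtracts region surprisals, with no constant term. The choice of log base therefore rescales its value but cannot change its sign.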
Prediction performance for GPT-2 XL on Cleft Structure (with modifier)
Accuracy: 100.00%
Formula: (S(np_mismatch, matrix_v) - S(np_match, matrix_v)) + ((S(vp_mismatch, verb.1) + S(vp_mismatch, matrix_v)) - (S(vp_match, verb.1) + S(vp_match, matrix_v))) > 0, where S(c, r) is the surprisal of region r in condition c (verb.1 is region 6, matrix_v is region 7).
Description: We expect the Matrix Verb region to have lower surprisal in the NP Match condition, where the cleft contains a lexicalized verb (“ate” instead of “did”). In addition, we expect the summed surprisal of Verb 1 and the Matrix Verb to be lower in the VP Match condition, where the continuation cannot be the object of a lexicalized verb such as “ate”. Together, the sum of these two mismatch-minus-match differences should be greater than zero. This test suite adds a VP modifier.
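The formula adds two mismatch-minus-match differences together. Below is a minimal Python sketch of it. It assumes that per-item region surprisals are already available in a dictionary keyed by (condition, region). It also assumes, as the percentage suggests, that accuracy is the share of items whose value is positive. `cleft_prediction`, `accuracy`, and the numbers are hypothetical illustrations, not the site's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (condition, region) -> region surprisal for one item.
Surprisals = Dict[Tuple[str, str], float]


def cleft_prediction(s: Surprisals) -> float:
    """Left-hand side of the Cleft Structure prediction; it holds if > 0."""
    # NP effect: matrix_v should be less surprising when "ate" licenses an NP.
    np_effect = s[("np_mismatch", "matrix_v")] - s[("np_match", "matrix_v")]
    # VP effect: verb.1 + matrix_v should be less surprising after "did".
    vp_effect = (s[("vp_mismatch", "verb.1")] + s[("vp_mismatch", "matrix_v")]) - (
        s[("vp_match", "verb.1")] + s[("vp_match", "matrix_v")]
    )
    return np_effect + vp_effect


def accuracy(items: List[Surprisals]) -> float:
    """Share of items for which the prediction (value > 0) holds."""
    return sum(cleft_prediction(s) > 0 for s in items) / len(items)


# Made-up surprisal values for a single item (not model output).
item = {
    ("np_mismatch", "matrix_v"): 10.0,
    ("np_match", "matrix_v"): 7.0,
    ("vp_mismatch", "verb.1"): 12.0,
    ("vp_mismatch", "matrix_v"): 5.0,
    ("vp_match", "verb.1"): 9.0,
    ("vp_match", "matrix_v"): 4.0,
}
print(cleft_prediction(item))  # 7.0
print(accuracy([item]))  # 1.0
```

Because the two effects are summed, a large NP effect can offset a small negative VP effect, or vice versa. The suite tests the combined inequality, not each difference separately.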