Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal
Sample item for Filler-Gap Dependencies (extraction from prepositional phrase)
| Item | Condition | prefix | comp | np1 | verb | np2 | prep | np3 | end |
|---|---|---|---|---|---|---|---|---|---|
| 1 | what_nogap | I know | who | our uncle | grabbed | the food | in front of | the guests | at the holiday party |
| 1 | that_nogap | I know | that | our uncle | grabbed | the food | in front of | the guests | at the holiday party |
| 1 | what_gap | I know | who | our uncle | grabbed | the food | in front of | | at the holiday party |
| 1 | that_gap | I know | that | our uncle | grabbed | the food | in front of | | at the holiday party |
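
The region layout of this sample item can be sketched as a small data structure. This is only an illustration of how the four conditions share one region template; the names `REGIONS`, `ITEM`, `sentence`, and `region_number` are hypothetical and not part of any documented API. The region numbering it produces (np3 = 7, end = 8) agrees with the region numbers that appear in the prediction formulas on this page.

```python
from typing import Dict, List

# Ordered region names, as listed in the sample item header.
REGIONS: List[str] = ["prefix", "comp", "np1", "verb", "np2", "prep", "np3", "end"]

# One item, four conditions. An empty string marks the gap (no np3 content).
ITEM: Dict[str, List[str]] = {
    "what_nogap": ["I know", "who", "our uncle", "grabbed", "the food",
                   "in front of", "the guests", "at the holiday party"],
    "that_nogap": ["I know", "that", "our uncle", "grabbed", "the food",
                   "in front of", "the guests", "at the holiday party"],
    "what_gap": ["I know", "who", "our uncle", "grabbed", "the food",
                 "in front of", "", "at the holiday party"],
    "that_gap": ["I know", "that", "our uncle", "grabbed", "the food",
                 "in front of", "", "at the holiday party"],
}


def sentence(condition: str) -> str:
    """Join the non-empty regions of a condition into the full sentence."""
    return " ".join(region for region in ITEM[condition] if region)


def region_number(name: str) -> int:
    """Return the 1-based position of a region in the template."""
    return REGIONS.index(name) + 1


print(sentence("what_gap"))
print(region_number("np3"), region_number("end"))
```

Keeping the gap as an empty region, rather than dropping it, means every condition has the same eight regions, so surprisal can be compared at the same region position across conditions.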
Prediction performance for GPT-2 on Filler-Gap Dependencies (extraction from prepositional phrase)
| Accuracy | Prediction | Description |
|---|---|---|
| 91.67% | (622,what_nogap/7,np3) > (623,that_nogap/7,np3) | We expect NP3 to be less surprising in the that_nogap condition than in the what_nogap condition, because an upstream wh-word should set up an expectation for a gap. |
| 95.83% | (624,what_gap/8,end) < (621,that_gap/8,end) | We expect surprisal at the end region to be lower in the what_gap condition than in the that_gap condition, because gaps must be licensed by upstream wh-words (such as “what”). |
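
As a rough sketch of how an accuracy figure like the ones above could be computed, the snippet below scores one prediction as the fraction of items for which the stated surprisal inequality holds. It assumes that accuracy is the per-item success rate of the inequality; the function name `prediction_accuracy` is hypothetical, and the surprisal values are invented for illustration, not GPT-2 outputs.

```python
from typing import Dict, List, Tuple

# condition -> region -> surprisal for a single item
ItemSurprisals = Dict[str, Dict[str, float]]


def prediction_accuracy(
    items: List[ItemSurprisals],
    higher: Tuple[str, str],
    lower: Tuple[str, str],
) -> float:
    """Fraction of items where surprisal at `higher` exceeds surprisal at `lower`.

    Each argument is a (condition, region) pair.
    """
    hits = sum(
        1
        for item in items
        if item[higher[0]][higher[1]] > item[lower[0]][lower[1]]
    )
    return hits / len(items)


# Invented surprisals at region np3 for four items (illustration only).
items: List[ItemSurprisals] = [
    {"what_nogap": {"np3": 9.1}, "that_nogap": {"np3": 6.4}},
    {"what_nogap": {"np3": 7.8}, "that_nogap": {"np3": 5.0}},
    {"what_nogap": {"np3": 5.2}, "that_nogap": {"np3": 5.9}},  # prediction fails
    {"what_nogap": {"np3": 8.3}, "that_nogap": {"np3": 6.1}},
]

acc = prediction_accuracy(items, ("what_nogap", "np3"), ("that_nogap", "np3"))
print(f"{acc:.2%}")  # 75.00%
```

Under this reading, the reported 91.67% and 95.83% would correspond to 22 and 23 successes out of 24 items, although the item count is not stated on this page.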