Individual results
View in-depth performance of a single language model on a single test suite.
Region-by-region surprisal
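For reference: the surprisal of a token $w_i$ is its negative log probability given the preceding context, and a region's surprisal is the sum of the surprisals of its tokens:

$$
s(w_i) = -\log_2 p(w_i \mid w_1, \dots, w_{i-1}), \qquad S(r) = \sum_{w_i \in r} s(w_i)
$$

The predictions below compare these region sums across conditions of the same item.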
Sample item for Filler-Gap Dependencies (extraction from prepositional phrase)
The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (extraction from prepositional phrase) to see the full list of items.
Item | Condition | prefix | comp | np1 | verb | np2 | prep | np3 | end |
---|---|---|---|---|---|---|---|---|---|
1 | what_nogap | I know | who | our uncle | grabbed | the food | in front of | the guests | at the holiday party |
1 | that_nogap | I know | that | our uncle | grabbed | the food | in front of | the guests | at the holiday party |
1 | what_gap | I know | who | our uncle | grabbed | the food | in front of | | at the holiday party |
1 | that_gap | I know | that | our uncle | grabbed | the food | in front of | | at the holiday party |
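Programmatically, the 2×2 design above crosses the complementizer (who/that) with the presence of a gap at np3. A minimal sketch of this structure, assuming a plain-dict representation rather than SyntaxGym's actual suite format:

```python
# Hypothetical representation of item 1: each condition is a list of region
# strings, in the region order prefix/comp/np1/verb/np2/prep/np3/end.
REGIONS = ["prefix", "comp", "np1", "verb", "np2", "prep", "np3", "end"]

ITEM_1 = {
    "what_nogap": ["I know", "who", "our uncle", "grabbed", "the food",
                   "in front of", "the guests", "at the holiday party"],
    "that_nogap": ["I know", "that", "our uncle", "grabbed", "the food",
                   "in front of", "the guests", "at the holiday party"],
    # In the gap conditions, np3 is empty: it marks the extraction site.
    "what_gap":   ["I know", "who", "our uncle", "grabbed", "the food",
                   "in front of", "", "at the holiday party"],
    "that_gap":   ["I know", "that", "our uncle", "grabbed", "the food",
                   "in front of", "", "at the holiday party"],
}

def sentence(condition: str) -> str:
    """Join the non-empty regions of a condition into a plain sentence."""
    return " ".join(region for region in ITEM_1[condition] if region)
```

Because the conditions differ only in the comp and np3 regions, surprisal differences in the remaining regions can be attributed to those two manipulations.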
Prediction performance for GPT-2 on Filler-Gap Dependencies (extraction from prepositional phrase)
Accuracy | Prediction | Description |
---|---|---|
91.67% | (622,what_nogap/7,np3) > (623,that_nogap/7,np3) | We expect NP3 to be less surprising in the that_nogap condition than in the what_nogap condition, because an upstream wh-word should set up an expectation for a gap. |
95.83% | (624,what_gap/8,end) < (621,that_gap/8,end) | We expect the end region to have lower surprisal in the what_gap condition than in the that_gap condition, because gaps must be licensed by an upstream wh-word (such as “what”). |
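To reproduce a comparison like these outside the SyntaxGym harness, one can sum per-token surprisals over a region under the Hugging Face transformers GPT-2 checkpoint. A minimal sketch, where `region_surprisal` is a hypothetical helper and region boundaries are handled naively (SyntaxGym's own pipeline aligns subword tokens to regions more carefully):

```python
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def region_surprisal(context: str, region: str) -> float:
    """Total surprisal (bits) of `region`'s tokens given `context`.

    Hypothetical helper: sums -log2 p(token | preceding tokens) under GPT-2.
    """
    context_ids = tokenizer.encode(context)
    region_ids = tokenizer.encode(" " + region)
    input_ids = torch.tensor([context_ids + region_ids])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    total = 0.0
    # The logits at position i score the token at position i + 1, so the
    # region's tokens are predicted from positions len(context_ids)-1 onward.
    for pos in range(len(context_ids) - 1, input_ids.size(1) - 1):
        total -= log_probs[0, pos, input_ids[0, pos + 1]].item() / math.log(2)
    return total

# First prediction for item 1: NP3 ("the guests") should be more surprising
# after the filler "who" than after "that".
s_what = region_surprisal("I know who our uncle grabbed the food in front of", "the guests")
s_that = region_surprisal("I know that our uncle grabbed the food in front of", "the guests")
print("what_nogap > that_nogap:", s_what > s_that)
```

The reported accuracies are the fraction of items on which the predicted inequality holds; 91.67% and 95.83% are consistent with 22 and 23 of 24 items, respectively.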