Individual results
View in-depth performance of a single language model on a single test suite.
Region-by-region surprisal (plot omitted)
Sample item for Filler-Gap Dependencies (4 sentential embeddings)
The first item of the test suite is shown below for quick reference. Please visit the page for Filler-Gap Dependencies (4 sentential embeddings) to see the full list of items.
Item | Condition | prefix | comp | embedding | subj | verb | obj | continuation
---|---|---|---|---|---|---|---|---
1 | what_gap | I know | what | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | | into the trash can
1 | that_gap | I know | that | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | | into the trash can
1 | what_no-gap | I know | what | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | the plastic | into the trash can
1 | that_no-gap | I know | that | our mother said her friend remarked that the park attendant reported the cop thinks | your friend | threw | the plastic | into the trash can
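To make the region-by-region surprisal numbers concrete, here is a minimal sketch of how per-region surprisal could be computed for one condition of this item, assuming the Hugging Face transformers library. The small `gpt2` checkpoint stands in for GPT-2 XL, and aligning tokens to regions via character offsets is one choice among several; this is an illustration, not the evaluation harness's actual code.

```python
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small checkpoint for brevity; the results above use GPT-2 XL ("gpt2-xl").
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Regions of the what_no-gap condition of item 1, in suite order.
region_names = ["prefix", "comp", "embedding", "subj", "verb", "obj", "continuation"]
regions = [
    "I know",
    "what",
    "our mother said her friend remarked that the park attendant reported the cop thinks",
    "your friend",
    "threw",
    "the plastic",
    "into the trash can",
]
sentence = " ".join(regions)

enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
ids = enc["input_ids"][0]
with torch.no_grad():
    log_probs = torch.log_softmax(model(enc["input_ids"]).logits[0], dim=-1)

# Surprisal of token i in bits: -log2 p(token_i | tokens_<i). The prediction
# for token i is read off position i-1; the first token gets 0 for simplicity.
surprisals = [0.0] + [
    -log_probs[i - 1, ids[i]].item() / math.log(2) for i in range(1, len(ids))
]

# Map each token to a region via character offsets, then sum within regions.
region_ends, pos = [], 0
for r in regions:
    pos = sentence.index(r, pos) + len(r)
    region_ends.append(pos)

region_surprisal = [0.0] * len(regions)
for (start, end), s in zip(enc["offset_mapping"][0].tolist(), surprisals):
    region_idx = next(i for i, rend in enumerate(region_ends) if end <= rend)
    region_surprisal[region_idx] += s

for name, s in zip(region_names, region_surprisal):
    print(f"{name:13s} {s:7.2f} bits")
```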
Prediction performance for GPT-2 XL on Filler-Gap Dependencies (4 sentential embeddings)
Accuracy | Formula | Description
---|---|---
61.90% | (627,what_no-gap/6,obj)>(625,that_no-gap/6,obj) | We expect the object to be less surprising in the that_no-gap condition than in the what_no-gap condition, because an upstream wh-word should set up an expectation for a gap. |
80.95% | (628,what_gap/7,continuation)<(626,that_gap/7,continuation) | We expect the surprisal at the continuation to be lower in the what_gap condition than in the that_gap condition, because gaps must be licensed by upstream wh-words (such as “what”). |
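Each formula is an inequality over the surprisal of a single region, evaluated item by item; the accuracy column is the fraction of items for which the inequality holds. Below is a hedged sketch of that scoring, assuming per-item region surprisals are already available in a hypothetical nested dictionary; it mirrors the formulas above but is not the evaluation harness's actual code.

```python
# Hypothetical data layout: surprisal[item][condition][region] -> bits, e.g.
# surprisal = {1: {"what_gap": {...}, "that_gap": {...},
#                  "what_no-gap": {...}, "that_no-gap": {...}}, ...}

def what_no_gap_penalty(item):
    # (what_no-gap, obj) > (that_no-gap, obj): an overt object is more
    # surprising when an upstream "what" has set up an expectation for a gap.
    return item["what_no-gap"]["obj"] > item["that_no-gap"]["obj"]

def gap_licensing(item):
    # (what_gap, continuation) < (that_gap, continuation): material after a
    # gap is less surprising when the gap is licensed by "what".
    return item["what_gap"]["continuation"] < item["that_gap"]["continuation"]

def accuracy(surprisal, prediction):
    # Fraction of items for which the predicted inequality holds.
    items = list(surprisal.values())
    return sum(prediction(item) for item in items) / len(items)

# accuracy(surprisal, what_no_gap_penalty)  # -> 0.6190 for GPT-2 XL here
# accuracy(surprisal, gap_licensing)        # -> 0.8095
```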