Viewing test suite Center Embedding
Reference
Wilcox, E., Levy, R., & Futrell, R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations.
Number of items
28
Models evaluated
88% (8/9)
Description
Center embedding, the ability to embed a phrase in the middle of another phrase of the same type, is a hallmark feature of natural language syntax. Center embedding creates nested dependencies, which could pose a challenge for some language models. To generate appropriate expectations about how a sentence will continue after multiple center embeddings, a model must maintain a representation not only of which words appeared in the preceding context but also of their order, and must predict that upcoming words occur in the appropriate order. This test suite uses verb transitivity and subject/verb plausibility to test model capabilities in this respect.
Items for center_embed
Item | Condition | intro | np_1 | that | det_2 | np_2 | verb1 | verb2 |
---|---|---|---|---|---|---|---|---|
1 | plaus | The | painting | that | the | artist | painted | deteriorated |
1 | implaus | The | painting | that | the | artist | deteriorated | painted |
2 | implaus | The | storm | that | the | captain | subsided | feared |
2 | plaus | The | storm | that | the | captain | feared | subsided |
3 | plaus | The | girl | that | the | bug | frightened | shouted |
3 | implaus | The | girl | that | the | bug | shouted | frightened |
4 | plaus | The | baby | that | the | woman | held | yelled |
4 | implaus | The | baby | that | the | woman | yelled | held |
5 | plaus | The | soldier | that | the | bullet | wounded | died |
5 | implaus | The | soldier | that | the | bullet | died | wounded |
6 | plaus | The | storm | that | the | scientist | predicted | intensified |
6 | implaus | The | storm | that | the | scientist | intensified | predicted |
7 | plaus | The | convict | that | the | cop | arrested | escaped |
7 | implaus | The | convict | that | the | cop | escaped | arrested |
8 | plaus | The | computer | that | the | secretary | bought | crashed |
8 | implaus | The | computer | that | the | secretary | crashed | bought |
9 | plaus | The | floor | that | the | maid | swept | cracked |
9 | implaus | The | floor | that | the | maid | cracked | swept |
10 | plaus | The | yacht | that | the | millionaires | bought | sank |
10 | implaus | The | yacht | that | the | millionaires | sank | bought |
11 | plaus | The | shirt | that | the | man | bought | ripped |
11 | implaus | The | shirt | that | the | man | ripped | bought |
12 | plaus | The | water | that | the | maid | poured | evaporated |
12 | implaus | The | water | that | the | maid | evaporated | poured |
13 | plaus | The | building | that | the | workers | built | collapsed |
13 | implaus | The | building | that | the | workers | collapsed | built |
14 | plaus | The | bones | that | the | doctor | examined | broke |
14 | implaus | The | bones | that | the | doctor | broke | examined |
15 | plaus | The | building | that | the | workers | repaired | deteriorated |
15 | implaus | The | building | that | the | workers | deteriorated | repaired |
16 | plaus | The | ship | that | the | workers | built | sank |
16 | implaus | The | ship | that | the | workers | sank | built |
17 | plaus | The | horse | that | the | boy | rode | bucked |
17 | implaus | The | horse | that | the | boy | bucked | rode |
18 | plaus | The | water | that | the | chef | needed | evaporated |
18 | implaus | The | water | that | the | chef | evaporated | needed |
19 | plaus | The | tree | that | the | old man | cut | fell |
19 | implaus | The | tree | that | the | old man | fell | cut |
20 | plaus | The | letter | that | the | author | wrote | arrived |
20 | implaus | The | letter | that | the | author | arrived | wrote |
21 | plaus | The | glass | that | the | athlete | hit | cracked |
21 | implaus | The | glass | that | the | athlete | cracked | hit |
22 | plaus | The | bomb | that | the | terrorist | built | exploded |
22 | implaus | The | bomb | that | the | terrorist | exploded | built |
23 | plaus | The | meat | that | the | man | cooked | burned |
23 | implaus | The | meat | that | the | man | burned | cooked |
24 | plaus | The | sugar | that | the | visitor | bought | dissolved |
24 | implaus | The | sugar | that | the | visitor | dissolved | bought |
25 | plaus | The | pants | that | the | woman | bought | ripped |
25 | implaus | The | pants | that | the | woman | ripped | bought |
26 | plaus | The | toilet | that | the | worker | fixed | clogged |
26 | implaus | The | toilet | that | the | worker | clogged | fixed |
27 | plaus | The | window | that | the | boy | wiped | shattered |
27 | implaus | The | window | that | the | boy | shattered | wiped |
28 | plaus | The | child | that | the | shadow | frightened | yelled |
28 | implaus | The | child | that | the | shadow | yelled | frightened |
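
Each row of the items table is a sequence of region strings. As a minimal sketch of how a row might be turned into the string a model reads, the snippet below joins the regions with single spaces. The helper name `build_sentence` and the space-joining convention are assumptions for illustration, not part of the suite's own tooling.

```python
# Region names in the order given by the table header.
REGIONS = ["intro", "np_1", "that", "det_2", "np_2", "verb1", "verb2"]


def build_sentence(row):
    """Join the region strings of one item/condition row into a sentence.

    Hypothetical helper: assumes regions are separated by single spaces.
    """
    return " ".join(row[name] for name in REGIONS)


# Item 1 from the table, in both conditions.
item_1 = {
    "plaus": dict(zip(REGIONS, ["The", "painting", "that", "the", "artist",
                                "painted", "deteriorated"])),
    "implaus": dict(zip(REGIONS, ["The", "painting", "that", "the", "artist",
                                  "deteriorated", "painted"])),
}

sentences = {cond: build_sentence(row) for cond, row in item_1.items()}
print(sentences["plaus"])
print(sentences["implaus"])
```

The two conditions of an item differ only in the order of `verb1` and `verb2`; every other region is identical.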
Predictions for center_embed
Formula | Description |
---|---|
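( (634,plaus/6,verb1) + (634,plaus/7,verb2) ) < ( (633,implaus/6,verb1) + (633,implaus/7,verb2) ) | The summed surprisal of verb1 and verb2 should be lower in the plausible (match) condition than in the implausible (mismatch) condition. In the match condition the verbs follow the nested, last-in-first-out order: verb1 is plausible with the most recent noun (np_2) and verb2 with the first noun (np_1). |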
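
The prediction compares the summed values at regions 6 (`verb1`) and 7 (`verb2`) across the two conditions of an item. The sketch below shows one way the comparison could be checked for a single item, assuming per-region surprisals are already available. The function name `prediction_holds`, the dictionary layout, and the numbers are hypothetical; the numbers are not output from any model.

```python
def prediction_holds(surprisal):
    """Check the suite's prediction for one item.

    `surprisal` maps (condition, region name) to a surprisal value.
    Hypothetical layout chosen for this sketch.
    """
    plaus = surprisal[("plaus", "verb1")] + surprisal[("plaus", "verb2")]
    implaus = surprisal[("implaus", "verb1")] + surprisal[("implaus", "verb2")]
    # The prediction is met when the plausible ordering is less surprising.
    return plaus < implaus


# Made-up surprisal values for illustration only.
example = {
    ("plaus", "verb1"): 9.0,
    ("plaus", "verb2"): 11.0,
    ("implaus", "verb1"): 13.5,
    ("implaus", "verb2"): 12.0,
}

result = prediction_holds(example)
print(result)
```

Summing over both verb regions means the check does not depend on which of the two verbs carries the extra surprisal in the implausible ordering.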
Results for center_embed
Model | Prediction 1 accuracy |
---|---|
TinyLSTM | 82.14% |
GPT-2 | 92.86% |
GPT-2 XL | 100.00% |
JRNN | 85.71% |
Ordered Neurons | 71.43% |
RNNG | 78.57% |
Transformer XL | 89.29% |
Vanilla LSTM | 82.14% |
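
With 28 items in the suite, each reported accuracy corresponds to a whole number of items. The sketch below recovers those counts, assuming accuracy is the fraction of the 28 items for which Prediction 1 holds (an assumption about how the percentage is computed, though the reported values are consistent with it). The helper name `items_passed` is hypothetical.

```python
N_ITEMS = 28  # "Number of items" reported for this suite

# Accuracies copied from the results table, in percent.
reported = {
    "TinyLSTM": 82.14,
    "GPT-2": 92.86,
    "GPT-2 XL": 100.00,
    "JRNN": 85.71,
    "Ordered Neurons": 71.43,
    "RNNG": 78.57,
    "Transformer XL": 89.29,
    "Vanilla LSTM": 82.14,
}


def items_passed(accuracy_percent, n_items=N_ITEMS):
    """Convert a percentage into the nearest whole number of items."""
    return round(accuracy_percent / 100 * n_items)


counts = {model: items_passed(acc) for model, acc in reported.items()}

for model, count in counts.items():
    print(f"{model}: {count}/{N_ITEMS}")
```

Under that assumption, the gap between the lowest and highest scoring models is 8 items out of 28.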