Test suites
Test suites lie at the heart of psycholinguistic evaluation.
The items in a test suite are given as input to a language model, and the resulting
surprisal values are used to assess the model's performance.
Typically, each test suite is designed to probe a particular
grammatical phenomenon.
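As a rough illustration of how surprisal values can feed into a performance score, the sketch below computes surprisal as the negative base-2 log probability and counts an item as passed when the ungrammatical condition is more surprising than the grammatical one. The condition names, the probabilities, and the helper functions are hypothetical and chosen only for this example; actual test suites define their own conditions and success criteria.

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits of an event the model assigns probability `prob`."""
    return -math.log2(prob)

def region_surprisal(token_probs):
    """Total surprisal of a region, summed over its token probabilities."""
    return sum(surprisal(p) for p in token_probs)

def item_passes(item) -> bool:
    """Assumed criterion: the ungrammatical region is more surprising."""
    return (region_surprisal(item["ungrammatical"])
            > region_surprisal(item["grammatical"]))

def suite_accuracy(items) -> float:
    """Percentage of items whose prediction holds."""
    return 100.0 * sum(item_passes(item) for item in items) / len(items)

# Hypothetical token probabilities at the critical region of two items.
ITEMS = [
    {"grammatical": [0.25, 0.5], "ungrammatical": [0.125, 0.25]},  # passes
    {"grammatical": [0.125, 0.25], "ungrammatical": [0.5, 0.5]},   # fails
]

if __name__ == "__main__":
    print(f"{suite_accuracy(ITEMS):.2f}%")
```

Under these assumptions, a percentage computed this way is the kind of figure a per-suite performance column would report, although the exact scoring used on this page may differ.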
Browse the available test suites in the table below, or add a new test suite
by creating one interactively or uploading one as a .json file.
Available test suites
| Name | Owner | Language | Reference | Models evaluated | Average performance | Tags |
|---|---|---|---|---|---|---|
| | Dave W Kush | English | | | 16.67% | |
| | | | | | nan% | |
| | Dave W Kush | | | | 25.00% | |
| | Jon G | English | | | 89.38% | Long-Distance Dependencies |
| | Céline Renaud-Richard | | | | 50.00% | |
| | Abby Bertics | | | | nan% | |
| | Jon G | English | "Wilcox E. Levy R. & Futrell R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations." | | 70.54% | Center Embedding |
| | Céline Renaud-Richard | | | | 0.00% | |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 54.61% | Agreement |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 49.34% | Licensing |
| | Jon G | English | "Wilcox E. Levy R. & Futrell R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?" | | 53.12% | Long-Distance Dependencies |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 75.00% | Gross Syntactic State |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 91.07% | Garden-Path Effects |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 15.13% | Licensing |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 79.17% | Garden-Path Effects |
| | Jon G | English | | | 41.12% | Licensing |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 31.91% | Licensing |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 29.61% | Licensing |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 38.16% | Licensing |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 53.95% | Agreement |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 46.71% | Licensing |
| | Jon G | English | No published reference | | 65.00% | Long-Distance Dependencies |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 79.35% | Gross Syntactic State |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 17.76% | Licensing |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 65.82% | Garden-Path Effects |
| | Jon G | English | "Wilcox E. Levy R. Morita T. & Futrell R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?" | | 51.56% | Long-Distance Dependencies |
| | Jon G | English | "Wilcox E. Levy R. & Futrell R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?" Wilcox et al. 2018 | | 50.34% | Long-Distance Dependencies |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 86.96% | Gross Syntactic State |
| | Jon G | English | "Wilcox E. Levy R. & Futrell R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations." | | 85.27% | Center Embedding |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 66.67% | Garden-Path Effects |
| | Jon G | English | "Wilcox E. Levy R. Morita T. & Futrell R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?" | | 78.65% | Long-Distance Dependencies |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 72.83% | Gross Syntactic State |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 67.35% | Garden-Path Effects |
| | Jon G | English | "Futrell R. Wilcox E. Morita T. Qian P. Ballesteros M. & Levy R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state." | | 95.24% | Garden-Path Effects |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 34.21% | Agreement |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 32.24% | Licensing |
| | Jon G | English | "Marvin R. & Linzen T. (2018). Targeted syntactic evaluation of language models." | | 13.82% | Licensing |
| | Jon G | English | "Wilcox E. Levy R. & Futrell R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?" Wilcox et al. 2018 | | 61.22% | Long-Distance Dependencies |
| | Jon G | English | "Wilcox E. Levy R. Morita T. & Futrell R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?" | | 72.40% | Long-Distance Dependencies |