Test suites

Test suites lie at the heart of psycholinguistic evaluation. The items in a test suite are given as input to a language model, and the resulting surprisal values are used to assess the model's performance. Typically, each test suite is designed to probe a particular grammatical phenomenon.
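
As a rough illustration of the evaluation described above, the sketch below converts probabilities into surprisal values (in bits) and checks, item by item, whether an ungrammatical continuation is more surprising than its grammatical counterpart. The probabilities, the condition names, and the single "ungrammatical is more surprising" prediction are assumptions made up for this example; actual test suites may define different conditions and predictions.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of an event with probability `prob`."""
    return -math.log2(prob)


# Hypothetical probabilities a model assigns to the critical word of each
# item under two conditions. These numbers are invented for illustration.
items = [
    {"grammatical": 0.25, "ungrammatical": 0.0625},
    {"grammatical": 0.5, "ungrammatical": 0.125},
    {"grammatical": 0.125, "ungrammatical": 0.25},
]


def prediction_holds(item: dict) -> bool:
    """True if the ungrammatical condition is the more surprising one."""
    return surprisal(item["ungrammatical"]) > surprisal(item["grammatical"])


# Share of items on which the (assumed) prediction is satisfied.
accuracy = sum(prediction_holds(item) for item in items) / len(items)
print(f"{accuracy:.2%}")
```

In this toy data the prediction holds for two of the three items, so the printed score is a per-suite percentage of the same general kind as the "Average performance" figures listed for each suite.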

Browse the available test suites in the table below, or add a new test suite by creating one interactively or uploading one as a .json file.
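
For readers preparing a suite for upload, the following sketch shows one way to build a small suite in Python and serialize it to a .json file. The schema the site actually expects is not described on this page, so every field name here (`meta`, `items`, `conditions`, `regions`, and so on) is an illustrative assumption; check the documentation for the real format before uploading.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical suite layout: one item with two conditions, each split into
# regions. Field names are assumptions, not the site's confirmed schema.
suite = {
    "meta": {"name": "toy_agreement", "language": "English"},
    "items": [
        {
            "item_number": 1,
            "conditions": [
                {
                    "condition_name": "grammatical",
                    "regions": ["The keys", "to the cabinet", "are", "here."],
                },
                {
                    "condition_name": "ungrammatical",
                    "regions": ["The keys", "to the cabinet", "is", "here."],
                },
            ],
        }
    ],
}

# Write the suite to a .json file and read it back to confirm it round-trips.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "toy_suite.json"
    path.write_text(json.dumps(suite, indent=2), encoding="utf-8")
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(loaded["meta"]["name"], len(loaded["items"]))
```

Keeping the two conditions identical except for the critical region ("are" versus "is") is what lets a surprisal difference be attributed to the grammatical contrast rather than to other wording.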

Available test suites
| Name | Owner | Language | Reference | Models evaluated | Average performance | Tags |
|---|---|---|---|---|---|---|
|  | Dave W Kush | English |  | 3 / 9 | 16.67% |  |
|  | Jon G | English |  | 8 / 9 | 89.38% | Long-Distance Dependencies |
|  | Jon G | English | Wilcox, E., Levy, R., & Futrell, R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations. | 8 / 9 | 70.54% | Center Embedding |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 54.61% | Agreement |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 49.34% | Licensing |
|  | Jon G | English | Wilcox, E., Levy, R., & Futrell, R. (2019). What Syntactic Structures block Dependencies in RNN Language Models? | 8 / 9 | 53.12% | Long-Distance Dependencies |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 8 / 9 | 75.00% | Gross Syntactic State |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 7 / 9 | 91.07% | Garden-Path Effects |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 15.13% | Licensing |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 7 / 9 | 79.17% | Garden-Path Effects |
|  | Jon G | English |  | 8 / 9 | 41.12% | Licensing |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 31.91% | Licensing |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 29.61% | Licensing |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 38.16% | Licensing |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 53.95% | Agreement |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 46.71% | Licensing |
|  | Jon G | English | No published reference | 8 / 9 | 65.00% | Long-Distance Dependencies |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 8 / 9 | 79.35% | Gross Syntactic State |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 17.76% | Licensing |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 7 / 9 | 65.82% | Garden-Path Effects |
|  | Jon G | English | Wilcox, E., Levy, R., Morita, T., & Futrell, R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies? | 8 / 9 | 51.56% | Long-Distance Dependencies |
|  | Jon G | English | Wilcox, E., Levy, R., & Futrell, R. (2019). What Syntactic Structures block Dependencies in RNN Language Models? Wilcox et al. 2018 | 7 / 9 | 50.34% | Long-Distance Dependencies |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 7 / 9 | 86.96% | Gross Syntactic State |
|  | Jon G | English | Wilcox, E., Levy, R., & Futrell, R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations. | 8 / 9 | 85.27% | Center Embedding |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 7 / 9 | 66.67% | Garden-Path Effects |
|  | Jon G | English | Wilcox, E., Levy, R., Morita, T., & Futrell, R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies? | 8 / 9 | 78.65% | Long-Distance Dependencies |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 8 / 9 | 72.83% | Gross Syntactic State |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 7 / 9 | 67.35% | Garden-Path Effects |
|  | Jon G | English | Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state. | 7 / 9 | 95.24% | Garden-Path Effects |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 34.21% | Agreement |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 32.24% | Licensing |
|  | Jon G | English | Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models. | 8 / 9 | 13.82% | Licensing |
|  | Jon G | English | Wilcox, E., Levy, R., & Futrell, R. (2019). What Syntactic Structures block Dependencies in RNN Language Models? Wilcox et al. 2018 | 7 / 9 | 61.22% | Long-Distance Dependencies |
|  | Jon G | English | Wilcox, E., Levy, R., Morita, T., & Futrell, R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies? | 8 / 9 | 72.40% | Long-Distance Dependencies |