This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Test suites

Test suites lie at the heart of psycholinguistic evaluation. Each item in a test suite is given as input to a language model, and the resulting surprisal values are used to assess the model's performance. Typically, a test suite is designed to probe a particular grammatical phenomenon.
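Here is a minimal Python sketch of the idea. The surprisal of a word is its negative log probability given the preceding context, -log2 P(w | context), measured in bits. A test suite prediction can then compare the surprisal of a critical region across the conditions of one item. The item, the region name `verb`, the probabilities, and the helper functions below are hypothetical illustrations, not SyntaxGym's file format or API. A real evaluation would obtain the probabilities from a language model.

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits of a conditional probability P(w | context) in (0, 1]."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)

def region_surprisal(token_probs: list[float]) -> float:
    """Total surprisal of a region: the sum of its token surprisals."""
    return sum(surprisal(p) for p in token_probs)

# Hypothetical item with two conditions that differ only at the verb,
# e.g. "The keys to the cabinet are / *is on the table."
# The probabilities are made up for illustration, not taken from any model.
item = {
    "grammatical":   {"verb": [0.20]},  # P("are" | "The keys to the cabinet")
    "ungrammatical": {"verb": [0.05]},  # P("is"  | "The keys to the cabinet")
}

def prediction_holds(item: dict, region: str) -> bool:
    """Hypothetical prediction: the region is more surprising when ungrammatical."""
    return (region_surprisal(item["ungrammatical"][region])
            > region_surprisal(item["grammatical"][region]))

print(prediction_holds(item, "verb"))
```

The example item follows the subject-verb agreement pattern with an intervening prepositional phrase. The prediction holds because the ungrammatical verb gets a lower probability, and therefore a higher surprisal, than the grammatical one.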

Browse the available test suites in the table below, or add a new test suite by creating one interactively or uploading one as a .json file.

Available test suites

Name	Owner	Language	Reference	Models evaluated	Average performance	Tags
Cataphor Prediction	Dave W Kush	English		3 / 9	16.67%
ORC-CMP-comparaison				0 / 9	N/A
Across The Board Wh-Movement and CSC	Dave W Kush			6 / 9	25.00%
Cleft Structure	Jon G	English		8 / 9	89.38%	Long-Distance Dependencies
ORC with gap and no gap	Céline Renaud-Richard			6 / 9	50.00%
Negative inversion	Abby Bertics			0 / 9	N/A
Center Embedding (with modifier)	Jon G	English	Wilcox, E., Levy, R., & Futrell, R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations.	8 / 9	70.54%	Center Embedding
Comparaison ORC and CMP short	Céline Renaud-Richard			1 / 9	0.00%
Subject-Verb Number Agreement (with subject relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	54.61%	Agreement
Reflexive Number Agreement (masculine; with prepositional phrase)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	49.34%	Licensing
Filler-Gap Dependencies (hierarchy)	Jon G	English	Wilcox, E., Levy, R., & Futrell, R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?	8 / 9	53.12%	Long-Distance Dependencies
Subordination (with object relative clause)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	8 / 9	75.00%	Gross Syntactic State
NP/Z Garden-path Ambiguity with Modifier (Overt Object)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	7 / 9	91.07%	Garden-Path Effects
Reflexive Number Agreement (feminine; with prepositional phrase)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	15.13%	Licensing
NP/Z Garden-path Ambiguity (Verb Transitivity)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	7 / 9	79.17%	Garden-Path Effects
Negative Polarity Licensing (any; with object relative clause)	Jon G	English		8 / 9	41.12%	Licensing
Negative Polarity Licensing (any; with subject relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	31.91%	Licensing
Negative Polarity Licensing (ever; with subject relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	29.61%	Licensing
Negative Polarity Licensing (ever; with object relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	38.16%	Licensing
Subject-Verb Number Agreement (with prepositional phrase)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	53.95%	Agreement
Reflexive Number Agreement (masculine; with subject relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	46.71%	Licensing
Cleft Structure (with modifier)	Jon G	English	No published reference	8 / 9	65.00%	Long-Distance Dependencies
Subordination	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	8 / 9	79.35%	Gross Syntactic State
Reflexive Number Agreement (feminine; with object relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	17.76%	Licensing
Main-verb/Reduced-relative Garden-path Disambiguation	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	7 / 9	65.82%	Garden-Path Effects
Filler-Gap Dependencies (extraction from prepositional phrase)	Jon G	English	Wilcox, E., Levy, R., Morita, T., & Futrell, R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?	8 / 9	51.56%	Long-Distance Dependencies
Filler-Gap Dependencies (4 sentential embeddings)	Jon G	English	Wilcox, E., Levy, R., & Futrell, R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?; Wilcox et al. (2018)	7 / 9	50.34%	Long-Distance Dependencies
Subordination (with prepositional phrase)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	7 / 9	86.96%	Gross Syntactic State
Center Embedding	Jon G	English	Wilcox, E., Levy, R., & Futrell, R. (2019). Hierarchical representation in neural language models: Suppression and recovery of expectations.	8 / 9	85.27%	Center Embedding
NP/Z Garden-path Ambiguity with Modifier (Verb Transitivity)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	7 / 9	66.67%	Garden-Path Effects
Filler-Gap Dependencies (object extraction)	Jon G	English	Wilcox, E., Levy, R., Morita, T., & Futrell, R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?	8 / 9	78.65%	Long-Distance Dependencies
Subordination (with subject relative clause)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	8 / 9	72.83%	Gross Syntactic State
Main-verb/Reduced-relative Garden-path Disambiguation (with modifier)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	7 / 9	67.35%	Garden-Path Effects
NP/Z Garden-path Ambiguity (Overt Object)	Jon G	English	Futrell, R., Wilcox, E., Morita, T., Qian, P., Ballesteros, M., & Levy, R. (2019). Neural language models as psycholinguistic subjects: Representations of syntactic state.	7 / 9	95.24%	Garden-Path Effects
Subject-Verb Number Agreement (with object relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	34.21%	Agreement
Reflexive Number Agreement (masculine; with object relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	32.24%	Licensing
Reflexive Number Agreement (feminine; with subject relative clause)	Jon G	English	Marvin, R., & Linzen, T. (2018). Targeted syntactic evaluation of language models.	8 / 9	13.82%	Licensing
Filler-Gap Dependencies (3 sentential embeddings)	Jon G	English	Wilcox, E., Levy, R., & Futrell, R. (2019). What Syntactic Structures block Dependencies in RNN Language Models?; Wilcox et al. (2018)	7 / 9	61.22%	Long-Distance Dependencies
Filler-Gap Dependencies (subject extraction)	Jon G	English	Wilcox, E., Levy, R., Morita, T., & Futrell, R. (2018). What do RNN Language Models Learn about Filler-Gap Dependencies?	8 / 9	72.40%	Long-Distance Dependencies
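The table does not state how Average performance is computed. One plausible reading, which the sketch below assumes, is the mean over evaluated models of the fraction of items on which a suite's predictions hold. Under that assumption, a suite with 0 / 9 models evaluated has no defined average, because a mean over zero models is undefined. The model names and per-item outcomes are hypothetical.

```python
from statistics import mean
from typing import Optional

def accuracy(outcomes: list[bool]) -> float:
    """Fraction of items on which the suite's prediction held for one model."""
    if not outcomes:
        raise ValueError("a model must be evaluated on at least one item")
    return sum(outcomes) / len(outcomes)

def average_performance(results: dict[str, list[bool]]) -> Optional[float]:
    """Assumed definition: mean per-model accuracy as a percentage.

    Returns None when no model has been evaluated (the 0 / 9 case).
    """
    if not results:
        return None  # the mean over zero models is undefined
    return 100 * mean(accuracy(o) for o in results.values())

def format_cell(value: Optional[float]) -> str:
    """Render a table cell, using N/A instead of a NaN percentage."""
    return "N/A" if value is None else f"{value:.2f}%"

# Hypothetical outcomes for two models on a four-item suite.
example = {
    "model_a": [True, True, False, True],    # 3 of 4 items
    "model_b": [True, False, False, False],  # 1 of 4 items
}
print(format_cell(average_performance(example)))  # 50.00%
print(format_cell(average_performance({})))        # N/A
```

Averaging per-model accuracies gives every model equal weight regardless of how many items it was run on. A pooled count over all model-item pairs would give the same result here only because both models see the same four items.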