This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
More info: view full details for model GPT-2 XL or test suite Negative Polarity Licensing (any; with object relative clause).
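
As a rough illustration of what "averaged over items" with a 95% confidence interval can mean, the sketch below computes a mean and a normal-approximation interval for one region. The surprisal values are invented for illustration, the function name `mean_with_ci` is hypothetical, and the page does not say which interval method SyntaxGym itself uses, so the 1.96 standard-error rule here is an assumption.

```python
import math
from statistics import mean, stdev


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) for a normal-approximation 95% interval.

    Assumption: z * standard error of the mean. The method used by the
    site is not stated on this page.
    """
    m = mean(values)
    if len(values) < 2:
        # A single observation has no spread to estimate an interval from.
        return m, 0.0
    sem = stdev(values) / math.sqrt(len(values))
    return m, z * sem


# Hypothetical per-item surprisals (in bits) at the "npi" region.
# These numbers are made up; they are not GPT-2 XL results.
npi_surprisals = {
    "neg_pos": [4.0, 5.0, 6.0],
    "pos_pos": [9.0, 10.0, 11.0],
}

for condition, values in npi_surprisals.items():
    m, half_width = mean_with_ci(values)
    print(f"{condition}: {m:.2f} +/- {half_width:.2f}")
```

Each bar in the chart would then correspond to one (condition, region) pair, with the half-width giving the error bar.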

Sample item for Negative Polarity Licensing (any; with object relative clause)

The first item of the test suite is shown below for quick reference. Please visit the page for Negative Polarity Licensing (any; with object relative clause) to see the full list of items.

Item	Condition	Licensor	np	compl	rc_dp	rc_subj	rc_verb	matrix_v	npi	continuation
1	neg_pos	No	author	that	the	senators	liked	has had	any	success
1	neg_neg	No	author	that	no	senators	liked	has had	any	success
1	pos_pos	The	author	that	the	senators	liked	has had	any	success
1	pos_neg	The	author	that	no	senators	liked	has had	any	success


Prediction performance for GPT-2 XL on Negative Polarity Licensing (any; with object relative clause)

Accuracy	Prediction formula	Description
100.00%	(579,neg_pos/8,npi) < (577,pos_pos/8,npi)	No description provided.
100.00%	(578,neg_neg/8,npi) < (580,pos_neg/8,npi)	No description provided.
100.00%	(579,neg_pos/8,npi) < (580,pos_neg/8,npi)	No description provided.

Predictions are evaluated on region-level surprisal values.
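
Each prediction in the table compares the surprisal at the `npi` region between two conditions. The notation appears to be (condition ID, condition name / region number, region name); region 8 matches `npi` in the sample item, while the reading of the leading number as an internal ID is an assumption. A minimal sketch of how such a prediction could be scored, using invented surprisal values and a hypothetical helper `prediction_accuracy`:

```python
def prediction_accuracy(items, lower, upper, region="npi"):
    """Fraction of items where surprisal(lower) < surprisal(upper) at `region`."""
    hits = sum(1 for item in items if item[lower][region] < item[upper][region])
    return hits / len(items)


# Hypothetical region-level surprisals (in bits) for two items.
# These numbers are made up; they are not GPT-2 XL results.
items = [
    {
        "neg_pos": {"npi": 3.1},
        "neg_neg": {"npi": 2.8},
        "pos_pos": {"npi": 9.4},
        "pos_neg": {"npi": 7.9},
    },
    {
        "neg_pos": {"npi": 4.0},
        "neg_neg": {"npi": 3.5},
        "pos_pos": {"npi": 10.2},
        "pos_neg": {"npi": 8.1},
    },
]

# The three comparisons listed in the table, as (lower, upper) condition pairs.
PREDICTIONS = [
    ("neg_pos", "pos_pos"),
    ("neg_neg", "pos_neg"),
    ("neg_pos", "pos_neg"),
]

for lower, upper in PREDICTIONS:
    acc = prediction_accuracy(items, lower, upper)
    print(f"{lower} < {upper} at npi: {acc:.2%}")
```

Under this reading, an accuracy of 100.00% means the inequality held for every item in the suite: "any" was less surprising when the matrix subject carried the licensor "No" than when a negative element appeared only inside the relative clause or not at all.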