This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
More info: view full details for model GPT-2 or test suite Negative Polarity Licensing (any; with subject relative clause).
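
As a rough illustration of the averaging described above, the sketch below computes a per-region mean surprisal over items together with a 95% interval. The page does not say how its intervals are computed, so the normal approximation (mean ± 1.96 standard errors) is an assumption, as are the function name `mean_with_ci95` and the surprisal values, which are made up for the example.

```python
import math
import statistics


def mean_with_ci95(values):
    """Return (mean, lower, upper) for a list of per-item surprisals.

    Assumption: a normal-approximation interval, mean +/- 1.96 * SEM.
    The method actually used for the chart is not stated on the page.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two items to estimate an interval")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical surprisals (in bits) for one region in one condition,
# one value per item. These numbers are invented for illustration.
toy_surprisals = [4.0, 6.0, 5.0, 5.0]
mean, lower, upper = mean_with_ci95(toy_surprisals)
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

With only a handful of items a t-based or bootstrap interval would usually be a better choice; the fixed 1.96 multiplier is kept here only to keep the sketch short.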

Sample item for Negative Polarity Licensing (any; with subject relative clause)

The first item of the test suite is shown below for quick reference. Please visit the page for Negative Polarity Licensing (any; with subject relative clause) to see the full list of items.

Item	Condition	Licensor	np	compl	rc_verb	rc_dp	rc_obj	matrix_v	npi	continuation
1	neg_pos	No	author	that	liked	the	senators	has had	any	success
1	neg_neg	No	author	that	liked	no	senators	has had	any	success
1	pos_pos	The	author	that	liked	the	senators	has had	any	success
1	pos_neg	The	author	that	liked	no	senators	has had	any	success
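
To make the table easier to read, the sketch below joins the region contents of the sample item into full sentences and lists which regions differ between two conditions. The region names and words are copied from the table; the names `REGIONS`, `ITEM_1`, `sentence`, and `differing_regions` are hypothetical and not part of SyntaxGym.

```python
# Region names, in order, as they appear in the table header.
REGIONS = ["Licensor", "np", "compl", "rc_verb", "rc_dp",
           "rc_obj", "matrix_v", "npi", "continuation"]

# Region contents for item 1, keyed by condition (copied from the table).
ITEM_1 = {
    "neg_pos": ["No", "author", "that", "liked", "the", "senators", "has had", "any", "success"],
    "neg_neg": ["No", "author", "that", "liked", "no", "senators", "has had", "any", "success"],
    "pos_pos": ["The", "author", "that", "liked", "the", "senators", "has had", "any", "success"],
    "pos_neg": ["The", "author", "that", "liked", "no", "senators", "has had", "any", "success"],
}


def sentence(condition):
    """Join the regions of one condition into a plain sentence."""
    return " ".join(ITEM_1[condition])


def differing_regions(cond_a, cond_b):
    """Names of the regions whose content differs between two conditions."""
    return [name for name, a, b in zip(REGIONS, ITEM_1[cond_a], ITEM_1[cond_b]) if a != b]


for condition in ITEM_1:
    print(f"{condition}: {sentence(condition)}")

print(differing_regions("neg_pos", "pos_neg"))
```

The four conditions differ only in the Licensor and rc_dp regions, so the npi region ("any") is identical across conditions and only its preceding context changes.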


Prediction performance for GPT-2 on Negative Polarity Licensing (any; with subject relative clause)

Accuracy	Formula	Description
97.37%	(583,neg_pos/8,npi) < (581,pos_pos/8,npi)	No description provided.
92.11%	(582,neg_neg/8,npi) < (584,pos_neg/8,npi)	No description provided.
60.53%	(583,neg_pos/8,npi) < (584,pos_neg/8,npi)	No description provided.

Predictions are evaluated on region-level surprisal values. Each formula compares the surprisal at the npi region between two conditions, and accuracy is the percentage of items for which the inequality holds.
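
A minimal sketch of how such a prediction could be scored, assuming each item supplies one surprisal value per condition and region. The function name `prediction_accuracy`, the nested-dictionary layout, and the surprisal values are all hypothetical; they do not reproduce SyntaxGym's own code or GPT-2's actual outputs.

```python
def prediction_accuracy(items, region, lower_cond, higher_cond):
    """Fraction of items where surprisal in `lower_cond` is strictly
    below surprisal in `higher_cond` at the given region."""
    hits = [item[lower_cond][region] < item[higher_cond][region] for item in items]
    return sum(hits) / len(hits)


# Invented surprisals (in bits) at the npi region for four toy items.
toy_items = [
    {"neg_pos": {"npi": 4.1}, "pos_pos": {"npi": 9.3}},
    {"neg_pos": {"npi": 5.0}, "pos_pos": {"npi": 8.2}},
    {"neg_pos": {"npi": 7.7}, "pos_pos": {"npi": 6.9}},  # prediction fails here
    {"neg_pos": {"npi": 3.2}, "pos_pos": {"npi": 10.4}},
]

# Mirrors the shape of the first formula in the table: neg_pos < pos_pos at npi.
accuracy = prediction_accuracy(toy_items, "npi", "neg_pos", "pos_pos")
print(f"{accuracy:.2%}")  # 75.00%
```

The comparison is made item by item rather than on the averaged surprisals, which is why a prediction can score well below 100% even when the mean surprisals differ in the expected direction.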