This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
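As a rough illustration of how a per-region mean and its 95% confidence interval could be computed from per-item surprisals, here is a minimal Python sketch. It assumes a normal-approximation interval (mean ± 1.96 standard errors); the page does not say which interval method the chart uses, and the function name `mean_with_ci95` and the input values are hypothetical.

```python
import math
import statistics


def mean_with_ci95(values):
    """Return (mean, lower, upper) for a list of per-item surprisals.

    Assumption: a normal-approximation 95% interval. The method actually
    used for the chart is not stated on the page.
    """
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical surprisals (in bits) for one region in one condition.
mean, lower, upper = mean_with_ci95([4.0, 5.0, 6.0])
print(f"mean={mean:.2f}, 95% CI=[{lower:.2f}, {upper:.2f}]")
```

With only a handful of items, a t-based interval would be wider than this approximation.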

Sample item for Cataphor Prediction

The first item of the test suite is shown below for quick reference. Please visit the page for Cataphor Prediction to see the full list of items.

Item	Condition	adjunct_comp	adjunct_subject	adjunct_verb	adjunct_rest	main_subj	main_adverb	main_verb	main_object	main_rest
1	cata_match	When	he	was	at the party,	the boy	cruelly	teased	the girl	about something.
1	cata_mismatch	When	she	was	at the party,	the boy	cruelly	teased	the girl	about something.
1	new_referent	While	I	was	at the party,	the boy	cruelly	teased	the girl	about something.


Prediction performance for GPT-2 on Cataphor Prediction

Accuracy	Formula	Description
50.00%	((679,cata_match/5,main_subj) < (680,cata_mismatch/5,main_subj))	No description provided.
100.00%	((679,cata_match/5,main_subj) < (681,new_referent/5,main_subj))	No description provided.
0.00%	((680,cata_mismatch/5,main_subj) > (681,new_referent/5,main_subj))	No description provided.
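
Each prediction in the table above compares the surprisal of one region across two conditions, and the accuracy is read here as the share of items for which the inequality holds. The Python sketch below illustrates that reading. It assumes the notation means (condition ID, condition name / region number, region name); the surprisal values and the `prediction_accuracy` helper are invented for illustration and are neither GPT-2 output nor SyntaxGym's own code.

```python
import operator

# Invented per-item surprisals (in bits) at region 5 (main_subj).
# These are NOT real GPT-2 values; they only illustrate the computation.
surprisals = [
    {"cata_match": 4.1, "cata_mismatch": 5.3, "new_referent": 6.0},
    {"cata_match": 5.0, "cata_mismatch": 4.2, "new_referent": 5.5},
]


def prediction_accuracy(items, left, right, compare):
    """Fraction of items where compare(item[left], item[right]) is true."""
    hits = sum(1 for item in items if compare(item[left], item[right]))
    return hits / len(items)


# (cata_match/5,main_subj) < (cata_mismatch/5,main_subj)
print(prediction_accuracy(surprisals, "cata_match", "cata_mismatch", operator.lt))
# (cata_match/5,main_subj) < (new_referent/5,main_subj)
print(prediction_accuracy(surprisals, "cata_match", "new_referent", operator.lt))
# (cata_mismatch/5,main_subj) > (new_referent/5,main_subj)
print(prediction_accuracy(surprisals, "cata_mismatch", "new_referent", operator.gt))
```

Under this reading, an accuracy of 0.00% on the third prediction would mean that no item showed higher surprisal at main_subj in the cata_mismatch condition than in the new_referent condition.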

Predictions are evaluated on region-level surprisal values.
