
Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

[Figure: bar chart of surprisal by region. Surprisals are averaged over items; error bars show 95% confidence intervals.]
More info: view full details for model GPT-2 XL or test suite Reflexive Number Agreement (feminine; with object relative clause).
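
As a rough illustration of how a bar and its error bar in such a chart could be derived, the sketch below averages per-item surprisals for one region and attaches a 95% confidence interval. The page does not state how its intervals are computed, so the normal approximation (1.96 standard errors) is an assumption, and the function name `mean_ci95` and the sample values are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def mean_ci95(values: List[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for a 95% confidence interval.

    Assumption: normal approximation, i.e. mean +/- 1.96 * standard error.
    The interval method actually used by the page is not documented here.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = statistics.fmean(values)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Made-up surprisals (one per item) for a single region and condition.
surprisals = [2.0, 4.0, 6.0]
mean, lower, upper = mean_ci95(surprisals)
print(f"mean={mean:.2f}, 95% CI=[{lower:.2f}, {upper:.2f}]")
```

With only a handful of items, a t-based interval would be wider than this approximation; which one is appropriate depends on how the chart was actually produced.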

Sample item for Reflexive Number Agreement (feminine; with object relative clause)

The first item of the test suite is shown below for quick reference. Please visit the page for Reflexive Number Agreement (feminine; with object relative clause) to see the full list of items.

Item	Condition	intro	np_subject	that	the	embed_np	embed_vp	matrix_v	reflexive
1	match_sing	The	author	that	the	senators	liked	hurt	herself
1	mismatch_sing	The	author	that	the	senators	liked	hurt	themselves
1	match_plural	The	authors	that	the	senator	liked	hurt	themselves
1	mismatch_plural	The	authors	that	the	senator	liked	hurt	herself


Prediction performance for GPT-2 XL on Reflexive Number Agreement (feminine; with object relative clause)

Accuracy	Formula	Description
47.37%	(614,match_sing/8,reflexive) < (616,mismatch_sing/8,reflexive)	No description provided.
100.00%	(615,match_plural/8,reflexive) < (613,mismatch_plural/8,reflexive)	No description provided.

Predictions are evaluated on region-level surprisal values.
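
One plausible reading of a prediction such as (614,match_sing/8,reflexive) < (616,mismatch_sing/8,reflexive) is: for each item, the surprisal at region 8 (reflexive) in the match_sing condition should be lower than in the mismatch_sing condition, and accuracy is the share of items for which that holds. The sketch below implements that reading. The function name `prediction_accuracy`, the data layout, and the surprisal values are hypothetical, and the leading numbers (614, 616) are treated as internal identifiers and ignored.

```python
from typing import Dict, List

# Each item maps a condition name to the surprisal measured at the
# `reflexive` region. The values are invented for illustration only.
Item = Dict[str, float]


def prediction_accuracy(items: List[Item], lower: str, higher: str) -> float:
    """Fraction of items whose surprisal in `lower` is strictly below `higher`."""
    if not items:
        raise ValueError("no items to evaluate")
    hits = sum(1 for item in items if item[lower] < item[higher])
    return hits / len(items)


items: List[Item] = [
    {"match_sing": 3.0, "mismatch_sing": 5.0},  # prediction holds
    {"match_sing": 6.0, "mismatch_sing": 4.0},  # prediction fails
    {"match_sing": 2.5, "mismatch_sing": 2.5},  # tie counts as a failure
    {"match_sing": 1.0, "mismatch_sing": 7.0},  # prediction holds
]

acc = prediction_accuracy(items, "match_sing", "mismatch_sing")
print(f"{acc:.2%}")  # 50.00%
```

Under this reading, an accuracy of 47.37% would correspond, for example, to 9 of 19 items satisfying the inequality; the item count is an inference from the percentage, not something stated on the page.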