This is a beta release of SyntaxGym. Please send questions and comments to contact@syntaxgym.org.

Individual results

View in-depth performance of a single language model on a single test suite.

Region-by-region surprisal

Surprisals are averaged over items. Error bars show 95% confidence intervals.
More info: view full details for model Transformer XL or test suite Cleft Structure.
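
The averaging described above can be sketched as follows. This is a minimal illustration, not the site's actual implementation: the page does not say how its 95% confidence intervals are computed, so the normal approximation (1.96 standard errors) is an assumption, and the function name and sample values are hypothetical.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) for a normal-approximation 95% CI.

    ASSUMPTION: the interval is mean +/- z * standard error of the mean.
    The page does not state which CI method it uses.
    """
    mean = statistics.fmean(values)
    if len(values) < 2:
        # A single value has no sample variance, so report a zero-width interval.
        return mean, 0.0
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


# Hypothetical per-item surprisals for one region in one condition.
surprisals = [10.0, 12.0, 14.0]
mean, half_width = mean_with_ci(surprisals)
print(f"{mean:.2f} +/- {half_width:.2f}")
```

Each bar would then be drawn at the mean, with the error bar spanning the mean minus and plus the half-width.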

Sample item for Cleft Structure

The first item of the test suite is shown below for quick reference. Please visit the page for Cleft Structure to see the full list of items.

Item	Condition	intro	subj	verb	passive	verb.1	matrix_v
1	np_mismatch	What	he	did	was		the meal
1	np_match	What	he	ate	was		the meal
1	vp_match	What	he	did	was	prepare	the meal
1	vp_mismatch	What	he	ate	was	prepare	the meal

More info: view full details for test suite Cleft Structure.

Prediction performance for Transformer XL on Cleft Structure

Accuracy	Formula	Description
95.00%	((547,np_mismatch/6,matrix_v)-(545,np_match/6,matrix_v))+(((546,vp_mismatch/5,verb.1)+(546,vp_mismatch/6,matrix_v))-((548,vp_match/5,verb.1)+(548,vp_match/6,matrix_v)))>0	We expect the Matrix Verb region to have lower surprisal in the NP Match condition, which has a lexicalized verb (“ate” instead of “did”), than in the NP Mismatch condition. In addition, we expect the summed surprisal of the Verb 1 and Matrix Verb regions to be lower in the VP Match condition than in the VP Mismatch condition, since a verb phrase (“prepare the meal”) cannot be the object of a lexicalized verb such as “ate.” The sum of these two mismatch-minus-match differences should be greater than zero.
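
The prediction formula above can be read as a per-item check. The sketch below is one hedged way to evaluate it, assuming accuracy is the fraction of items for which the inequality holds (the page does not define accuracy explicitly). The function names, the dictionary layout, and all surprisal values are hypothetical and are not taken from the suite.

```python
def prediction_holds(s):
    """Check the Cleft Structure prediction for a single item.

    `s` maps (condition, region) to a region-level surprisal.
    This layout is a hypothetical convenience, not SyntaxGym's data format.
    """
    # NP part: matrix_v surprisal, mismatch minus match.
    np_diff = s[("np_mismatch", "matrix_v")] - s[("np_match", "matrix_v")]
    # VP part: summed verb.1 + matrix_v surprisal, mismatch minus match.
    vp_mismatch = s[("vp_mismatch", "verb.1")] + s[("vp_mismatch", "matrix_v")]
    vp_match = s[("vp_match", "verb.1")] + s[("vp_match", "matrix_v")]
    return np_diff + (vp_mismatch - vp_match) > 0


def accuracy(items):
    """ASSUMPTION: accuracy is the share of items where the prediction holds."""
    return sum(prediction_holds(s) for s in items) / len(items)


# Two made-up items: the first satisfies the prediction, the second does not.
items = [
    {("np_mismatch", "matrix_v"): 9.0, ("np_match", "matrix_v"): 6.0,
     ("vp_mismatch", "verb.1"): 8.0, ("vp_mismatch", "matrix_v"): 5.0,
     ("vp_match", "verb.1"): 4.0, ("vp_match", "matrix_v"): 5.0},
    {("np_mismatch", "matrix_v"): 5.0, ("np_match", "matrix_v"): 7.0,
     ("vp_mismatch", "verb.1"): 3.0, ("vp_mismatch", "matrix_v"): 4.0,
     ("vp_match", "verb.1"): 4.0, ("vp_match", "matrix_v"): 4.0},
]
print(f"{accuracy(items):.2%}")
```

Because the two differences are summed before the comparison with zero, a large effect in one half of the formula can offset a small reversal in the other.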

Predictions are evaluated on region-level surprisal values.
