A statistical analysis of satirical Amazon.com product reviews

Stephen Skalicky, Scott Crossley


A corpus of 750 product reviews extracted from Amazon.com was analyzed statistically for specific lexical, grammatical, and semantic features in order to identify differences between satirical and non-satirical reviews. The corpus contained 375 reviews identified as satirical and 375 identified as non-satirical. Fourteen linguistic indices were used to measure features related to lexical sophistication, grammatical functions, and the semantic properties of words. A one-way multivariate analysis of variance (MANOVA) found a significant difference between review types. The MANOVA was followed by a discriminant function analysis (DFA), which used seven variables to correctly classify 71.7 per cent of the reviews as satirical or non-satirical. Those seven variables suggest that, linguistically, satirical texts are more specific and less lexically sophisticated, and contain more words associated with negative emotions and certainty, than non-satirical texts. These results demonstrate that satire shares some, but not all, of the previously identified semantic features of sarcasm (Campbell & Katz 2012), supporting Simpson’s (2003) claim that satire should be considered separately from other forms of irony. Ultimately, this study argues that a statistical analysis of the lexical, semantic, and grammatical properties of satirical texts can shed descriptive light on this relatively understudied linguistic phenomenon, while also providing suggestions for future analysis.
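
For readers unfamiliar with discriminant function analysis, the classification step summarized above can be illustrated with a minimal sketch. It is not the authors' procedure or software: the data are synthetic, the three feature columns are hypothetical stand-ins for the seven indices retained in the study, and the two groups are assumed to have equal prior probability (as in a balanced corpus). The MANOVA step is not shown.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(group_a, group_b):
    """Pooled within-group covariance matrix of two samples."""
    dim = len(group_a[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for group in (group_a, group_b):
        mu = mean_vector(group)
        for row in group:
            d = [x - m for x, m in zip(row, mu)]
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    return [[v / dof for v in r] for r in cov]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_discriminant(group_a, group_b):
    """Return (weights, threshold) of a two-group linear discriminant function."""
    mu_a, mu_b = mean_vector(group_a), mean_vector(group_b)
    diff = [a - b for a, b in zip(mu_a, mu_b)]
    weights = solve(pooled_covariance(group_a, group_b), diff)
    # With equal priors, the cut-off is the score of the midpoint between means.
    midpoint = [(a + b) / 2 for a, b in zip(mu_a, mu_b)]
    threshold = sum(w * m for w, m in zip(weights, midpoint))
    return weights, threshold

def classify(row, weights, threshold):
    """Assign a row to group A ("satirical") if its score exceeds the cut-off."""
    score = sum(w * x for w, x in zip(weights, row))
    return "satirical" if score > threshold else "non-satirical"

def synthetic_reviews(n, means, rng):
    """Hypothetical feature rows drawn from independent unit-variance Gaussians."""
    return [[rng.gauss(m, 1.0) for m in means] for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
satirical = synthetic_reviews(200, [1.0, -0.8, 0.9], rng)  # assumed group means
non_satirical = synthetic_reviews(200, [0.0, 0.0, 0.0], rng)

# Fit on the first 150 rows of each group and score the remaining 50.
train_s, test_s = satirical[:150], satirical[150:]
train_n, test_n = non_satirical[:150], non_satirical[150:]
weights, threshold = fit_discriminant(train_s, train_n)

hits = sum(classify(r, weights, threshold) == "satirical" for r in test_s)
hits += sum(classify(r, weights, threshold) == "non-satirical" for r in test_n)
accuracy = hits / (len(test_s) + len(test_n))
print(f"held-out classification accuracy: {accuracy:.3f}")
```

The sign and size of each weight indicate how strongly a feature pushes a text toward one group or the other, which is the kind of information used to interpret a discriminant function. The accuracy printed here reflects only the synthetic data and says nothing about the 71.7 per cent reported for the review corpus.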


Keywords: satire; statistical analysis; online product reviews



Amazon. (2013). Funny Reviews: Dynamic List. http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1001250201 (accessed 18 September 2013).

Attardo, S. (2000). ‘Irony as relevant inappropriateness’. Journal of Pragmatics 32, pp. 793-826.

Brysbaert, M. & New, B. (2009). ‘Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English’. Behavior Research Methods 41 (4), pp. 977-990.

Brysbaert, M., Warriner, A. & Kuperman, V. (2013). ‘Concreteness ratings for 40 thousand generally known English word lemmas’. Behavior Research Methods 46 (3), pp. 904-911.

Burfoot, C. & Baldwin, T. (2009). ‘Automatic satire detection: Are you having a laugh?’, in Proceedings of the Association for Computational Linguistics International Joint Conference on Natural Language Processing 2009 Conference: Short Papers (Singapore, 2-7 August 2009), pp. 161-164.

Campbell, J. & Katz, A. (2012). ‘Are there necessary conditions for inducing a sense of sarcastic irony?’. Discourse Processes 49 (6), pp. 459-480.

Carvalho, P., Sarmento, L., Silva, M. & de Oliveira, E. (2009). ‘Clues for detecting irony in user-generated contents: Oh ...!! It’s “so easy” ; - )’, in TSA ’09: 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion (Hong Kong, 6 November 2009), New York: Association for Computing Machinery, pp. 53-56.

Caucci, G. & Kreuz, R. (2012). ‘Social and paralinguistic cues to sarcasm’. Humor: International Journal of Humor Research 25 (1), pp. 1-22.

Colston, H. & Gibbs, R. (2007). ‘A brief history of irony’, in Gibbs, R. & Colston, H. (eds.), Irony in Language and Thought: A Cognitive Science Reader, New York: Lawrence Erlbaum Associates, pp. 3-21.

Coltheart, M. (1981). ‘The MRC psycholinguistic database’. Quarterly Journal of Experimental Psychology 33 (4), pp. 497-505.

Condren, C. (2012). ‘Satire and definition’. Humor: International Journal of Humor Research 25 (4), pp. 375-399.

Crossley, S. A., Salsbury, T., McNamara, D. S. & Jarvis, S. (2010). ‘Predicting lexical proficiency in language learner texts using computational indices’. Language Testing 28 (4), pp. 561-580.

Gibbs, R. (2000). ‘Irony in talk among friends’. Metaphor and Symbol 15 (1-2), pp. 5-27.

González-Ibáñez, R., Muresan, S. & Wacholder, N. (2011). ‘Identifying sarcasm in Twitter: A closer look’, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (Portland, Oregon, 19-24 June 2011): Short Papers, Stroudsburg, PA: Association for Computational Linguistics (ACL), pp. 581-586.

Hancock, J. (2004). ‘Verbal irony use in face-to-face and computer-mediated conversations’. Journal of Language and Social Psychology 23 (4), pp. 447-463.

Jorgensen, J. (1996). ‘The functions of sarcastic irony in speech’. Journal of Pragmatics 26 (5), pp. 613-634.

Kreuz, R., Long, D. & Church, M. (1991). ‘On being ironic: Pragmatic and mnemonic implications’. Metaphor and Symbolic Activity 6 (3), pp. 149-162.

Kreuz, R. & Caucci, G. (2007). ‘Lexical influences on the perception of sarcasm’, in FigLanguages ’07: Proceedings of the Workshop on Computational Approaches to Figurative Language, Stroudsburg, PA: Association for Computational Linguistics (ACL), pp. 1-4.

Kreuz, R. & Caucci, G. (2008). ‘Do lexical factors affect the perception of sarcasm?’. Paper presented at the 18th Annual Meeting of the Society for Text and Discourse, University of Memphis, Memphis, TN, 12-15 July.

Kuperman, V., Stadthagen-Gonzalez, H. & Brysbaert, M. (2012). ‘Age-of-acquisition ratings for 30 thousand English words’. Behavior Research Methods 44 (4), pp. 978-990.

Kyle, K. & Crossley, S. A. (2014). ‘Automatically assessing lexical sophistication: Indices, tools, findings, and application’. TESOL Quarterly.

LIWC, Inc. (n.d.). Linguistic inquiry and word count: Table 1: LIWC2007 output variable information. http://www.liwc.net/descriptiontable1.php (accessed 1 November 2013).

Mihalcea, R. & Strapparava, C. (2006). ‘Learning to laugh (automatically): Computational models for humor recognition’. Computational Intelligence 22 (2), pp. 126-142.

Newman, M., Groom, C., Handelman, L. & Pennebaker, J. (2008). ‘Gender differences in language use: An analysis of 14,000 text samples’. Discourse Processes 45, pp. 211-236.

Nilsen, A. & Nilsen, D. (2008). ‘Literature and humor’, in Raskin, V. (ed.), The Primer of Humor Research, New York: Mouton de Gruyter, pp. 243-280.

Pennebaker, J., Booth, R. & Francis, M. (2007). Operator’s Manual: Linguistic Inquiry and Word Count: LIWC2007. Austin, Texas: LIWC.net. http://homepage.psy.utexas.edu/HomePage/Faculty/Pennebaker/Reprints/LIWC2007_OperatorManual.pdf (accessed 1 October 2013).

Popova, M. (n.d.). Modern Masterpieces of Comedic Genius: The Art of the Humorous Amazon Review. http://www.brainpickings.org/index.php/2013/07/08/humorous-amazon-reviews/ (accessed 1 September 2013).

Reyes, A. & Rosso, P. (2011). ‘Mining subjective knowledge from customer reviews: A specific case of irony detection’, in Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (Portland, Oregon, 24 June 2011), Stroudsburg, PA: Association for Computational Linguistics (ACL), pp. 118-124.

Simpson, P. (2003). On the Discourse of Satire: Towards a Stylistic Model of Satirical Humor. Amsterdam & Philadelphia: John Benjamins Publishing Company.

Skalicky, S. (2013). ‘Was this analysis helpful?: A genre analysis of the Amazon.com discourse community and its “most helpful” product reviews’. Discourse, Context & Media 2 (2), pp. 84-93.

Tausczik, Y. & Pennebaker, J. (2009). ‘The psychological meaning of words: LIWC and computerized text analysis methods’. Journal of Language and Social Psychology 29 (1), pp. 24-54.

Whalen, J., Pexman, P. & Gill, A. (2009). ‘“Should be fun – Not!”: Incidence and marking of nonliteral language in e-mail’. Journal of Language and Social Psychology 28 (3), pp. 263-280.

DOI: http://dx.doi.org/10.7592/EJHR2014.2.3.skalicki

