Processing frequently misspelled words (A study based on an English learner corpus)

Authors

  • Margarita A. Klimova, HSE University, 25/12, ul. Bolshaya Pecherskaya, Nizhniy Novgorod, 603155, Russia
  • Anna V. Viklova, Russian Presidential Academy of National Economy and Public Administration, 82/1, pr. Vernadskogo, Moscow, 119571, Russia
  • Daria A. Overnikova, HSE University, 20, ul. Myasnitskaya, Moscow, 101000, Russia

DOI:

https://doi.org/10.21638/spbu09.2023.409

Abstract

The article presents an experimental study of how the frequency of spelling errors in a word affects its representation in the mental lexicon. The hypothesis that frequently misspelled words cause reading difficulties even when they are spelled correctly has been confirmed for native speakers of Russian and English. This paper tests the hypothesis on the learner corpus REALEC (Russian Error-Annotated Learner English Corpus), which comprises texts by Russian L1 learners of English. The most frequently misspelled words in the corpus were used in an experiment on recognising correct and incorrect spellings. We analysed the influence of the following factors: the frequency of spelling errors in a word, the word's frequency in the corpus, entropy (a measure reflecting the amount of effort needed to choose between spelling variants), and error type. The results demonstrate the significance of entropy and corpus frequency, which is consistent with the findings of previous studies. One error type, substitution, also proved significant. Its special role is consistent both with the fact that this error type caused the greatest difficulties during the experiment and with previous research into the written production of L2 English speakers, in which substitution was found to be the most frequent error type. The frequency of errors was less significant than in the corresponding studies of L1 English, which can be explained by differences in language environment: learners of English are less exposed to incorrect spellings.
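
As an illustration of the entropy factor mentioned in the abstract, the sketch below computes Shannon entropy over the relative frequencies of a word's spelling variants. This is an assumption about how such a measure could be defined, not necessarily the exact formula used in the study; the function name `spelling_entropy` and the example counts are hypothetical and are not taken from REALEC.

```python
import math


def spelling_entropy(variant_counts):
    """Shannon entropy (in bits) over the spelling variants of one word.

    variant_counts maps each observed spelling (correct or not) to the
    number of times it occurs. Assumed definition, for illustration only.
    """
    total = sum(variant_counts.values())
    if total <= 0:
        raise ValueError("at least one occurrence is required")

    entropy = 0.0
    for count in variant_counts.values():
        if count > 0:  # variants with zero occurrences contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts: one dominant correct spelling and one rarer misspelling.
mostly_correct = {"which": 90, "wich": 10}
# Hypothetical counts: two spellings used equally often.
evenly_split = {"accommodate": 50, "accomodate": 50}

print(spelling_entropy(mostly_correct))
print(spelling_entropy(evenly_split))
```

Under this definition, a word that is almost always spelled one way has entropy close to zero, while a word whose variants compete on equal terms has the highest entropy, which matches the intuition that choosing between the variants takes more effort.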

Keywords:

word processing, spelling errors, mental lexicon, learner corpus

Published

2024-04-25

How to Cite

Klimova, M. A., Viklova, A. V., & Overnikova, D. A. (2024). Processing frequently misspelled words (A study based on an English learner corpus). Vestnik of Saint Petersburg University. Language and Literature, 20(4), 824–837. https://doi.org/10.21638/spbu09.2023.409

Section

Linguistics