Generate spelling errors

This program generates spelling errors in a text using four character-level change operators: insert, delete, transpose and replace. It may or may not generate realistic output. There are some parameters to test and tweak; see below. More about this program is in the Swedish blog post Skapa stavfel ("Making spelling errors"). If you have questions or other comments about the program, please mail me.

One of the things to study is how readable the text remains with the spelling errors. Also see my program Reading scrambled words for a different way of scrambling words.

Test this program with an English text.

Type in a text in the text area below, and change the parameters. Then click on "OK" to proceed. The scrambled text will be shown below. How readable is it?

Probability of errors in words: (default 0.5)
Note: the probabilities for transpose, delete, insert and replace should add up to 1.
Probability for transpose: (default 0.25)
Probability for delete: (default 0.25)
Probability for insert: (default 0.25)
Probability for replace: (default 0.25)
Maximum number of errors in a word: (default 1, max 10)
Language (for insertion characters): Swedish / English / Just letters in the word



Type the text here:

Scrambled text

Itnressant!
Enx veetnskaplig undersöknnng gxjord vi eltt unviersitet i fngland ha vist tat utfiall ed avå fvörsta oh ed tvåö sist bokstävena i alla oröen i e thxt r riktgit placeraide, sjelar de ilten noll i vilket urdningsföljd dg övrigla bostäverna i okrden komer. Texqen äsr flult lälsbar t.o.m. mo dz andtra bokstväerna komer hullermobuller! Dtta eftesom iv nte lser avrje anskild bokstgv, uctan se bilcen fv ordt såom helhetå.

Parameters

Probability of spelling errors: 0.5
Probability of transpose: 0.25 (real: 0.23)
Probability of delete: 0.25 (real: 0.28)
Probability of insert: 0.25 (real: 0.22)
Probability of replace: 0.25 (real: 0.27)
Sum of probabilities of transpose, delete, insert and replace: 1
Max number of changes per word: 1

Explanations

Probability of spelling error

The probability that a specific word in the text will contain a spelling error. 0 means that no errors will be generated at all; 1 means that every word may have some errors. This probability is checked for each word in the text. This means that for the value 0.5 (which is the default), about every other word will be changed on average.
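The per-word check described above can be sketched roughly as follows. This is a minimal illustration and not the program's actual code: the function name `pick_words_to_change` and its parameters are made up for this example, and it assumes each word is tested with its own random draw.

```python
import random


def pick_words_to_change(words, p_error=0.5, rng=None):
    """Return one boolean per word: True if the word should get errors.

    Assumption: every word gets its own random draw against p_error.
    """
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so p_error=0 never selects a word
    # and p_error=1 always selects it.
    return [rng.random() < p_error for _ in words]


if __name__ == "__main__":
    words = "this is a short text to scramble".split()
    for word, flag in zip(words, pick_words_to_change(words, 0.5)):
        print(word, "-> change" if flag else "-> keep")
```

With 0.5 the number of selected words varies from run to run; "every other word" only holds on average over a longer text.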

Error operators

There are no constraints on where in a word the change will occur: the positions are simply randomized. For each of these operators it is possible to set a probability (from 0 to 1) that this type of error will occur. These probabilities should add up to (about) 1, or else something unforeseen may happen.

If you just want to study (say) transposes, set the transpose probability to 1 (one) and the others to 0 (zero).
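As a rough sketch of the four operators and the weighted choice between them, something like the following could be used. All names here (`transpose`, `delete`, `insert`, `replace`, `apply_random_error`) are hypothetical and the program itself may work differently. Two assumptions to be aware of: `random.choices` normalizes the weights, so this sketch is more forgiving about sums other than 1 than the program may be, and the sketch leaves empty and one-letter words alone where an operator would not make sense.

```python
import random

OPERATORS = ["transpose", "delete", "insert", "replace"]


def transpose(word, i):
    """Swap the characters at positions i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def delete(word, i):
    """Remove the character at position i."""
    return word[:i] + word[i + 1:]


def insert(word, i, ch):
    """Insert ch before position i (i == len(word) appends)."""
    return word[:i] + ch + word[i:]


def replace(word, i, ch):
    """Replace the character at position i with ch."""
    return word[:i] + ch + word[i + 1:]


def apply_random_error(word, weights=(0.25, 0.25, 0.25, 0.25),
                       alphabet="abcdefghijklmnopqrstuvwxyz", rng=None):
    """Apply one randomly chosen operator at a random position."""
    rng = rng or random.Random()
    if not word:
        return word  # assumption: nothing to do for an empty word
    op = rng.choices(OPERATORS, weights=weights, k=1)[0]
    if op == "transpose":
        if len(word) < 2:
            return word  # assumption: cannot transpose a single letter
        return transpose(word, rng.randrange(len(word) - 1))
    if op == "delete":
        if len(word) < 2:
            return word  # assumption: never delete a word completely
        return delete(word, rng.randrange(len(word)))
    if op == "insert":
        return insert(word, rng.randrange(len(word) + 1), rng.choice(alphabet))
    return replace(word, rng.randrange(len(word)), rng.choice(alphabet))
```

To study only transposes with this sketch, pass `weights=(1, 0, 0, 0)`. Note that a replace may pick the same letter that is already there, so a "changed" word can come out unchanged.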

Language

For the insertion operator, either the English or the Swedish character set may be used. The only characters that may be inserted are the lowercase letters "a" to "z" (for both languages) and "å", "ä" and "ö" (for Swedish). The option "Just letters in the word" will use only letters from the word itself for insertion.
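The three language options could be modelled as a choice of insertion alphabet, as in the sketch below. The function name `insertion_alphabet` and the option keys `"english"`, `"swedish"` and `"word"` are invented for this example. For the "Just letters in the word" option the sketch assumes that the letters are taken as they appear in the word, repeats included, so a letter that occurs twice is twice as likely to be picked; the program may handle this differently.

```python
import string

ENGLISH = string.ascii_lowercase  # "a" to "z"
SWEDISH = ENGLISH + "åäö"         # "a" to "z" plus å, ä, ö


def insertion_alphabet(language, word=""):
    """Return the characters that may be inserted for the given option."""
    if language == "english":
        return ENGLISH
    if language == "swedish":
        return SWEDISH
    if language == "word":
        # Assumption: keep only the alphabetic characters of the word,
        # in their original order and case, with repeats.
        return "".join(ch for ch in word if ch.isalpha())
    raise ValueError(f"unknown language option: {language!r}")
```

A character for the insert operator can then be drawn with `random.choice(insertion_alphabet("swedish"))`.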

Maximum number of errors

A word that should be changed may get more than one change. Set this parameter to the maximum number of changes to make. The actual number of changes is a random value between 1 and the value stated. Note that the result may be unrealistic, e.g. transposing letters in a word a couple of times is not very likely in real life. At most 10 errors per word can be generated.
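Drawing the number of changes for one word might look like the sketch below. The name `number_of_changes` is hypothetical, and the sketch assumes a uniform draw between 1 and the maximum; the text above only says the value is random, not how it is distributed.

```python
import random

MAX_ERRORS_LIMIT = 10  # upper limit stated on this page


def number_of_changes(max_errors, rng=None):
    """Return how many changes to make in one word (1..max_errors)."""
    rng = rng or random.Random()
    # Clamp the parameter to the allowed range 1..10.
    max_errors = max(1, min(max_errors, MAX_ERRORS_LIMIT))
    # Assumption: every count from 1 to max_errors is equally likely.
    return rng.randint(1, max_errors)
```

With the default maximum of 1, every changed word gets exactly one change.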

Also see

Detection of spelling errors in Swedish not using a word list en clair by Rickard Domeij, Joachim Hollman and Viggo Kann.
There may be some comments in my blog post announcing this program.
Also see the related programs Reading scrambled words and Nearest words.
Created by Hakan Kjellerstrand hakank@gmail.com