Today we're gonna talk about spelling correction. Lots of applications make use of spelling correction. For example, word processing, almost any modern word processor will take a misspelled word like component with an A and give you suggestions like component with an E and automatically replace it for you. Modern search engines will not only flag an error. So, language spelled without a u, here. But, give you, the results, as if you had spelled the word correctly. And, modern phones additionally will automatically find misspelled words. Here, I typed l-a-y-r, and it replaced it automatically, or suggests a replacement, with late. We can distinguish a number of separate tasks and spelling correction. One is the detection of the error itself. And then the correction of the error once you've found it. And we can think about different kinds of correction. We might automatically correct an error if we're positive that the error that we know the right answer for the error. So H-T-E is a very common misspelling for the, and so many word processors automatically correct H-T-E. We might suggest a single correction if we're, there's only one very likely correction, or we might suggest a whole list of corrections and let the user pick from among them. We distinguish two different classes of spelling errors. Non word errors are errors in which the, what the user types is not a word of English. So g-r-a-f-f-e a misspelling let's say for giraffe is not a word of English. By contrast, real word errors. Are errors in which then the resulting. [sound] Misspelling is actually a word of English and that makes them somewhat harder to detect. And we can break up real word errors into ones produced by really typographical processes. These were meant to type three. And typed [inaudible] let's say. Or cognitive errors, where the user, meant to type a word like [inaudible] and instead typed a homophone of a, of the word, or \u201ct-o-o\u201d instead of [inaudible] And in both cases what, what's produced is a real word of English, but by modeling the differences between these kind of errors, we might come up with better ways of fixing them both. How common are spelling errors? Depends a lot on the task. So in web queries, spelling errors are extremely common. So practically one in four words in a web query are likely to be misspelled. But in web processing tasks on phones it's much harder to get an accurate number. So there's been a number of studies and most of these studies are done by retyping. You give the user a passage to type and then you measure how well they, they type it. And, of course, that's not quite the same user's naturally writing messages or typing. Nonetheless If you ask users to retype and you don't let them use the backspace key, they make about thirteen percent of the words, thirteen percent of the words are in error. So indicating that if, that a lot of words. They correct themselves with the backspace. If you let them correct, now we're trying to experiment on a, on a p d a style phone site, organizer, they'll correct about seven percent of the words themselves. They'll still leave about two percent of the words uncorrected, on the organizer. And, similar numbers on people doing retyping on a regular keyboard. So, the numbers are about two percent where people typing. And probably a much higher number for web queries and probably a much higher number for people texting. Are the kind of spelling, spelling error [inaudible] that we see. How do we detect non word spelling errors. The traditional way is just to use a large dictionary. Any word not in the dictionary is an error. And, the larger the dictionary, it turns out the better this works. For correcting these non-word spelling errors, we generate a set of candidates that's real words that are similar to the error. And then we pick whichever one is best. And we'll talk about the noisy-channel probability model of how to do that. And it's also related to another method called the shortest weighted [inaudible] distance myth. So we find the words that are not in the dictionary. For each one, we generate a set of candidates. Those are going to be real words that are similar, we'll talk about what similar means, to that error and then we'll pick the best one. For real word spelling errors, the algorithm is quite similar. Again, for each word we generate a candidate set. But now we do this for every word in a sentence, not just the words that are not in some dictionary. So real word spelling error correction, we don't use a dictionary because of course the errors are in a dictionary. So that wouldn't help. So, for every word, we generate a candidate set. So we might find candidate words with similar pronunciations, we might find candidate words with similar spellings, and depending on the algorithm, exactly. And it's very important that we're gonna include the word itself, in the candidate set, because the every word might be a misspelling of some other real word, or it might be the correct word. In fact, most words are probably correct. So, for each candidate set of each possible error, we're gonna include the word itself. And most of the time, in fact, we're gonna pick that. And again, how we pick the words we might use the noisy channel model. We might use a classifier, we'll talk about that so we'll discuss the different methods of detecting these errors and correcting them in the next