Using Python to check words

Megan Source

I'm stuck on a simple problem. I've got a dictionary of words in the English language, and a sample text that is to be checked. I've got to check every word in the sample against the dictionary, and the code I'm using is wrong.

for word in checkList:      # iterates through every word in the sample
    if word not in refDict: # checks if word is not in the dictionary
        print(word)         # just to see if it's recognizing misspelled words

The only problem is, as it goes through the loop it prints out every word, not just the misspelled ones. Can someone explain this and offer a solution possibly? Thank you so much!

python

Answers

answered 8 years ago Grumdrig #1

Clearly "word not in refDict" always evaluates to True. This is probably because the contents of refDict or checkList are not what you think they are. Are they both tuples or lists of strings?

answered 8 years ago dsimard #2

Consider stripping your words of any whitespace that might be there, and changing all the words of both sets to the same case. Like this:

word.strip().lower()

That way you can make sure you're comparing apples to apples.
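As a minimal sketch of that idea, the snippet below normalizes both the reference words and the words being checked before comparing them. The sample data and the names `normalize`, `ref_words`, and `check_list` are assumptions made for illustration; the asker's real refDict and checkList are not shown in the question.

```python
def normalize(word):
    """Trim surrounding whitespace and lowercase the word."""
    return word.strip().lower()

# Hypothetical sample data: reference words read from a file often keep
# their trailing newline, which makes every lookup fail.
ref_words = set(normalize(w) for w in ["Apple\n", "Banana\n", "cherry\n"])
check_list = ["apple", "BANANA ", "cherri"]

# Only words that are still unknown after normalization are reported.
misspelled = [w for w in check_list if normalize(w) not in ref_words]
print(misspelled)
```

With this sample data only "cherri" is reported, because the other two words differ from the reference entries only in case and whitespace.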

answered 8 years ago mjv #3

The snippet you have is functional. See, for example:

>>> refDict = {'alpha':1, 'bravo':2, 'charlie':3, 'delta':4}
>>> s = 'he said bravo to charlie O\'Brian and jack Alpha'
>>> for word in s.split():
...   if word not in refDict:
...       print(repr(word))  # by temporarily using repr() we can see exactly
...                          #  what the words are like
...
'he'
'said'
'to'
"O'Brian"
'and'
'jack'
'Alpha'     # note how Alpha was not found in refDict (u/l case difference)

Therefore, the dictionary contents must differ from what you think, or the words out of checkList are not exactly as they appear (e.g. with extra whitespace or different capitalization; see the use of repr() (*) in the print statement to help identify cases of the former).

Debugging suggestion: focus on the first word from checkList (or the first one that you expect to find in the dictionary). Then, for this word only, print it in detail, with its length and with a bracket on either side, for both the word out of checkList and the corresponding key in the dictionary.

(*) repr() was a suggestion from John Machin. Instead I often use brackets or other characters as in print('[' + word + ']'), but repr() is more exacting in its output.
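The debugging suggestion above could look roughly like this. The helper `describe` and the sample data are hypothetical; the trailing newline in the keys is only one assumed cause of the mismatch, chosen because it is invisible in a plain print.

```python
def describe(word):
    """Return a string that exposes hidden characters and the length."""
    return "%r (length %d)" % (word, len(word))

# Hypothetical data: the dictionary keys carry a trailing newline.
ref_dict = {"alpha\n": 1, "bravo\n": 2}
check_list = ["alpha", "bravo"]

# Look at the first word only, from both sides of the comparison.
first = check_list[0]
print("from checkList: " + describe(first))
for key in ref_dict:
    # Find the key that should have matched, ignoring whitespace and case.
    if key.strip().lower() == first.strip().lower():
        print("from refDict:   " + describe(key))
```

If the two printed lines differ in length or in their repr() output, the difference is what makes `word not in refDict` evaluate to True.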

answered 8 years ago Tendayi Mawushe #4

The code you have would work if the keys in refDict are the correctly spelt words. If the correctly spelt words are the values in your dict then you need something like this:

for word in checkList:
    if word not in refDict.values():
        print(word)

Is there a reason your dictionary is stored as a mapping as opposed to a list or a set? A Python dict contains key-value pairs. For example, I could use the mapping {"dog":23, "cat":45, "pony":67} to store an index of words and the page number each is found on in some book. In your case, your dict is a mapping of what to what?

answered 8 years ago Gareth Simpson #5

Are the words in the refDict the keys or the values?

Your code will only see keys, e.g.:

refDict = { 'w':'x', 'y':'z' }
for word in [ 'w','x','y','z' ]:
  if word not in refDict:
    print(word)

prints:

x
z

Otherwise you want:

if word not in refDict.values():

Of course, this rather assumes that your dictionary is an actual Python dictionary, which seems an odd way to store a list of words.
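To make the keys-versus-values distinction concrete, here is a small sketch. The mapping of numeric ids to words is an assumption for illustration, as is the name `known_words`; the actual layout of the asker's refDict is unknown.

```python
# Hypothetical mapping: numeric ids as keys, words as values.
ref_dict = {1: "dog", 2: "cat", 3: "pony"}

print("dog" in ref_dict)           # membership on a dict looks at keys only
print("dog" in ref_dict.values())  # this scans the values instead

# Build a set of the values once, so that each later lookup
# does not have to rescan every value.
known_words = set(ref_dict.values())
unknown = [w for w in ["dog", "hamster"] if w not in known_words]
print(unknown)
```

If the words turn out to be the values, collecting them into a set once is usually preferable to calling `.values()` inside the loop, since a set lookup does not walk the whole collection.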

answered 8 years ago Georg Schölly #6

Your refDict is probably wrong. The in keyword checks if the value is in the keys of the dictionary. I believe you've put your words in as values.

I'd propose using a set instead of a dictionary.

knownwords = set(["dog", "cat"])
knownwords.add("apple")

text = "The dog eats an apple."
for word in text.split(" "):
    # to ignore case word is converted to lowercase
    if word.lower() not in knownwords:
        print(word)
# The
# eats
# an
# apple.       <- doesn't work because of the dot
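One possible way to deal with the trailing dot is to strip punctuation from both ends of each token before the lookup. This is a sketch under assumptions, not part of the original answer: it uses `string.punctuation` from the standard library, and it only removes leading and trailing punctuation, so an apostrophe inside a word such as O'Brian is left alone.

```python
import string

known_words = {"dog", "cat", "apple"}
text = "The dog eats an apple."

unknown = []
for token in text.split():
    # Remove punctuation from both ends of the token, then lowercase it.
    word = token.strip(string.punctuation).lower()
    # Skip tokens that were nothing but punctuation.
    if word and word not in known_words:
        unknown.append(word)
print(unknown)
```

Here "apple." is reduced to "apple" and is therefore found, while "the", "eats", and "an" are still reported as unknown.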
