How do I filter a string for alphabetical characters?

Joey Cartella

My function only gives me the first word of a string that I need returned without numbers:

def stripNonLetters(aString):
    words=[]
    aString=aString.lower()
    for word in aString:
        if word.isalpha() or word.isspace():
            words.append(word)
    print(words)
    return ''.join(words)

def main():
    myString='''Planes and 12 cars.'''
    stripNonLetters(myString)

main()

I need this to return "['planes','and','cars']", but I'm getting "['\n', ' ', ' ', ' ', ' ', 'p', 'l', 'a', 'n', 'e', 's'", etc. What am I doing wrong?

python-3.4

Answers

answered 4 years ago inspectorG4dget #1

After you do aString=aString.split(), aString is a list of words, none of which contain spaces. If you delete that line, you should be fine:

def stripNonLetters(aString):
    answer = ''
    for char in aString:
        if char.isalpha() or char.isspace():
            answer += char
    return answer

Of course, this requires a lot of string concatenation, which can be inefficient because each += may build a new string. Therefore, you might be more inclined to use this:

def stripNonLetters(aString):
    answer = []
    for char in aString:
        if char.isalpha() or char.isspace():
            answer.append(char)
    return ''.join(answer)
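
To get the list of words the question asks for, the filtered string can then be lowercased and split on whitespace. A minimal sketch, assuming a helper named letter_words (a made-up name, not from the question) and using a generator expression in place of the explicit loop:

```python
def letter_words(text):
    # Keep only letters and whitespace (same test as the loop above).
    kept = ''.join(ch for ch in text if ch.isalpha() or ch.isspace())
    # split() with no arguments splits on runs of whitespace
    # and never produces empty strings.
    return kept.lower().split()

print(letter_words('Planes and 12 cars.'))  # ['planes', 'and', 'cars']
```

The lowercasing is an assumption based on the lowercase output shown in the question; drop the .lower() call if case should be preserved.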

answered 4 years ago Jonathan Eunice #2

It will be more direct (and often more efficient) to use regular expressions to process the strings in bulk, rather than character-by-character. For example:

import re

def stripNonLetters(s):
    """
    Strip all non-letter, non-space characters from a string.
    Runs of whitespace are normalized to single space characters,
    except at the start and end, where they are stripped.
    """
    s = re.sub(r'[^A-Za-z\s]', '', s.strip())
    return re.sub(r'\s+', ' ', s)

s = '''Planes and 12 cars.'''
print(stripNonLetters(s).split())

I have kept the .split() call that divides the result into words outside the function, because that was a late-stated requirement and because it goes beyond the apparent remit of a function called stripNonLetters. If you want the function to handle the splitting as well, move the split operation to the last line of the function rather than post-processing in the caller.
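
If the split does move into the function, it might look like the sketch below. The name letterWords is made up for illustration, and the .lower() call is an assumption based on the lowercase output requested in the question.

```python
import re

def letterWords(s):
    """Return the lowercase words of s with non-letter characters removed."""
    # Drop everything except ASCII letters and whitespace.
    s = re.sub(r'[^A-Za-z\s]', '', s)
    # split() with no argument collapses runs of whitespace and ignores
    # leading/trailing whitespace, so no separate normalization is needed.
    return s.lower().split()

print(letterWords('Planes and 12 cars.'))  # ['planes', 'and', 'cars']
```

Note that [A-Za-z] matches only ASCII letters, whereas str.isalpha() also accepts non-ASCII letters such as 'é', so the two approaches can differ on accented text.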
