3 Part 3 Python Workbook
3.4 Builtin Data Types
3.4.3 Strings
def test():
    names1 = ['alice', 'bertrand', 'charlene', 'daniel']
    names2 = ['bertrand', 'charlene']
    names3 = [name for name in names1 if name not in names2]
    print 'names3:', names3

if __name__ == '__main__':
    test()
When run, this script prints out the following:
names3: ['alice', 'daniel']
>>> str2
'say "hello" to jerry\'s mom'
Triple quotes enable you to create a string that spans multiple lines. Use three single quotes or three double quotes to delimit a triple-quoted string.
Exercises:
1. Create a triple quoted string that contains single and double quotes.
Solutions:
1. Use triple single quotes or triple double quotes to create multiline strings:
String1 = '''This string extends
across several lines.  And, so it has
end-of-line characters in it.
'''
String2 = """
This string begins and ends with an end-of-line character. It can have both 'single'
quotes and "double" quotes in it.
"""
def test():
    print String1
    print String2

if __name__ == '__main__':
    test()
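To make the embedded end-of-line characters visible, it can help to count them or to split the string into lines. The following is a small sketch rather than part of the original solution; the variable name text and its contents are illustrative, and the code is written to run under both Python 2.7 and Python 3.

```python
# A triple-quoted string keeps the line breaks typed inside it.
text = '''first line
second line
'''

# Each line break is stored as a newline character.
newline_count = text.count('\n')

# splitlines() breaks the string apart at those newlines.
lines = text.splitlines()

print(newline_count)   # 2
print(lines)           # ['first line', 'second line']
```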
3.4.3.1 Characters
Python does not have a distinct character type. In Python, a character is a string of length 1. You can use the ord() and chr() built-in functions to convert from character to integer and back.
Exercises:
1. Create a character "a".
2. Create a character, then obtain its integer representation.
Solutions:
1. The character "a" is a plain string of length 1:
>>> x = 'a'
2. The integer equivalent of the letter "A":
>>> x = "A"
>>> ord(x)
65
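The conversion also works in the other direction. As a brief sketch (the variable names are illustrative), chr() turns an integer code back into a one-character string, so the two functions can be combined to step from one character to the next:

```python
# ord() maps a one-character string to its integer code.
code = ord('A')

# chr() maps an integer code back to a one-character string.
letter = chr(code)

# Adding an offset to the code gives a neighbouring character.
next_letter = chr(ord('A') + 1)

print(code)          # 65
print(letter)        # A
print(next_letter)   # B
```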
3.4.3.2 Operators on strings
You can concatenate strings with the "+" operator.
You can create multiple concatenated copies of a string with the "*" operator.
Augmented assignment (+= and *=) also works.
Examples:
>>> 'cat' + ' and ' + 'dog'
'cat and dog'
>>> '#' * 40
'########################################'
>>>
>>> s1 = 'flower'
>>> s1 += 's'
>>> s1
'flowers'
Exercises:
1. Given these strings:
>>> s1 = 'abcd'
>>> s2 = 'efgh'
create a new string composed of the first string followed by (concatenated with) the second.
2. Create a single string containing 5 copies of the string 'abc'.
3. Use the multiplication operator to create a "line" of 50 dashes.
4. Here are the components of a path to a file on the file system: "home", "myusername", "Workdir", "notes.txt". Concatenate these together, separating them with the path separator, to form a complete path to that file. (Note that if you use the backslash to separate components of the path, you will need to use a double backslash, because the backslash is the escape character in strings.)
Solutions:
1. The plus (+) operator applied to a string can be used to concatenate strings:
>>> s3 = s1 + s2
>>> s3
'abcdefgh'
2. The multiplication operator (*) applied to a string creates a new string that concatenates a string with itself some number of times:
>>> s1 = 'abc' * 5
>>> s1
'abcabcabcabcabc'
3. The multiplication operator (*) applied to a string can be used to create a "horizontal divider line":
>>> s1 = '-' * 50
>>> print s1
--------------------------------------------------
4. The sep member of the os module gives us a platform-independent way to construct paths:
>>> import os
>>>
>>> a = ["home", "myusername", "Workdir", "notes.txt"]
>>> path = a[0] + os.sep + a[1] + os.sep + a[2] + os.sep + a[3]
>>> path
'home/myusername/Workdir/notes.txt'
A more concise solution uses the join() method of the separator string:
>>> import os
>>> a = ["home", "myusername", "Workdir", "notes.txt"]
>>> os.sep.join(a)
'home/myusername/Workdir/notes.txt'
Notes:
○ Importing the os module and using os.sep from that module gives us a platform-independent solution.
○ If you decide to code the path separator character explicitly and you are on MS Windows, where the path separator is the backslash, then you will need to use a double backslash, because the backslash is the escape character in strings.
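The standard library also provides os.path.join(), which inserts the platform's separator between components. The following is a short sketch that reuses the component list from the solution above; it is offered as an alternative, not as the workbook's own answer. The comparison is made against os.sep.join() so that it holds on any platform.

```python
import os

parts = ["home", "myusername", "Workdir", "notes.txt"]

# os.path.join() takes the components as separate arguments,
# so the list is unpacked with *.
path = os.path.join(*parts)

# For plain relative components like these, the result matches
# joining the same components with os.sep.
same = (path == os.sep.join(parts))

print(path)
print(same)   # True
```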
3.4.3.3 Methods on strings
Strings support a variety of operations through their methods. You can obtain a list of these methods by using the dir() built-in function on any string:
>>> dir("")
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip',
'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
You can get help on any specific method by using the help() built-in function.
Here is an example:
>>> help("".strip)
Help on built-in function strip:

strip(...)
    S.strip([chars]) -> string or unicode

    Return a copy of the string S with leading and trailing whitespace removed.
    If chars is given and not None, remove characters in chars instead.
    If chars is unicode, S will be converted to unicode before stripping
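The help text mentions an optional chars argument. As a quick sketch of what it does (the sample strings are made up for illustration), chars is treated as a set of characters to remove from both ends, not as a prefix or suffix to match:

```python
# With no argument, strip() removes leading and trailing whitespace.
plain = '  some text \n'.strip()

# With an argument, the characters listed are removed from both ends.
starred = '***title***'.strip('*')

# The argument is a set of characters: any mix of them is removed,
# and stripping stops at the first character not in the set.
mixed = 'xyxhelloyx'.strip('xy')

print(plain)     # some text
print(starred)   # title
print(mixed)     # hello
```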
Exercises:
1. Strip all the whitespace characters off the right end of a string.
2. Center a short string within a longer string, that is, pad a short string with blank characters on both right and left to center it.
3. Convert a string to all upper case.
4. Split a string into a list of "words".
5. (a) Join the strings in a list of strings to form a single string. (b) Ditto, but put a newline character between each original string.
Solutions:
1. The rstrip() method strips whitespace off the right side of a string:
>>> s1 = 'some text \n'
>>> s1
'some text \n'
>>> s2 = s1.rstrip()
>>> s2
'some text'
2. The center(n) method centers a string within a padded string of width n:
>>> s1 = 'Dave'
>>> s2 = s1.center(20)
>>> s2
'        Dave        '
3. The upper() method produces a new string that converts all alpha characters in the original to upper case:
>>> s1 = 'Banana'
>>> s1
'Banana'
>>> s2 = s1.upper()
>>> s2
'BANANA'
4. The split(sep) method produces a list of strings that are separated by sep in the original string. If sep is omitted, whitespace is treated as the separator:
>>> s1 = """how does it feel
... to be on your own
... no directions known
... like a rolling stone
... """
>>> words = s1.split()
>>> words
['how', 'does', 'it', 'feel', 'to', 'be', 'on', 'your', 'own', 'no',
'directions', 'known', 'like', 'a', 'rolling', 'stone']
Note that the split() function in the re (regular expression) module is useful when the separator is more complex than whitespace or a single character.
5. The join() method concatenates strings from a list of strings to form a single string:
>>> lines = []
>>> lines.append('how does it feel')
>>> lines.append('to be on your own')
>>> lines.append('no directions known')
>>> lines.append('like a rolling stone')
>>> lines
['how does it feel', 'to be on your own', 'no directions known',
'like a rolling stone']
>>> s1 = ''.join(lines)
>>> s2 = ' '.join(lines)
>>> s3 = '\n'.join(lines)
>>> s1
'how does it feelto be on your ownno directions knownlike a rolling stone'
>>> s2
'how does it feel to be on your own no directions known like a rolling stone'
>>> s3
'how does it feel\nto be on your own\nno directions known\nlike a rolling stone'
>>> print s3
how does it feel
to be on your own
no directions known
like a rolling stone
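Solution 4 mentions the split() function in the re module for separators that are more complex than whitespace or a single character. A minimal sketch of that idea follows, assuming input in which fields are separated by commas, semicolons, or runs of whitespace; the sample data and the pattern are invented for illustration.

```python
import re

data = 'alice, bertrand;charlene   daniel'

# Split on one or more characters that are a comma, a semicolon,
# or whitespace.
names = re.split(r'[,;\s]+', data)

print(names)   # ['alice', 'bertrand', 'charlene', 'daniel']
```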
3.4.3.4 Raw strings
Raw strings give us a convenient way to include the backslash character in a string without escaping it (with an additional backslash). Raw strings look like plain literal strings, but are prefixed with an "r" or "R". See String literals: http://docs.python.org/reference/lexical_analysis.html#stringliterals
Exercises:
1. Create a string that contains a backslash character using both a plain literal string and a raw string.
Solutions:
1. We use an "r" prefix to define a raw string:
>>> print 'abc \\ def'
abc \ def
>>> print r'abc \ def'
abc \ def
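To see that the two spellings produce the same string, and why raw strings are popular for regular expressions, consider the following sketch. The sample text and the pattern are illustrative and not taken from the workbook.

```python
import re

# The same string written as a plain literal and as a raw literal.
plain = 'abc \\ def'
raw = r'abc \ def'
same = (plain == raw)

# In a plain literal, \n is one newline character;
# in a raw literal it is a backslash followed by the letter n.
plain_len = len('\n')
raw_len = len(r'\n')

# Raw strings are common for regular expressions, where the
# backslash also has meaning to the re module.
digits = re.findall(r'\d+', 'room 12, floor 3')

print(same)        # True
print(plain_len)   # 1
print(raw_len)     # 2
print(digits)      # ['12', '3']
```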
3.4.3.5 Unicode strings
Unicode strings give us a consistent way to process character data from a variety of character encodings.
Exercises:
1. Create several unicode strings. Use both the unicode prefix character ("u") and the unicode type (unicode(some_string)).
2. Convert a string (possibly from another non-ASCII encoding) to unicode.
3. Convert a unicode string to another encoding, for example, UTF-8.
4. Test a string to determine if it is unicode.
5. Create a string that contains a unicode character, that is, a character outside the ASCII character set.
Solutions:
1. We can create a unicode string with either the "u" prefix or a call to the unicode type:
def exercise1():
    a = u'abcd'
    print a
    b = unicode('efgh')
    print b
2. We convert a string from another character encoding into unicode with the decode() string method:
import sys

def exercise2():
    a = 'abcd'.decode('utf8')
    print a
    b = 'abcd'.decode(sys.getdefaultencoding())
    print b
3. We can convert a unicode string to another character encoding with the encode() string method:
import sys

def exercise3():
    a = u'abcd'
    print a.encode('utf8')
    print a.encode(sys.getdefaultencoding())
4. Here are two ways to check the type of a string:
import types

def exercise4():
    a = u'abcd'
    print type(a) is types.UnicodeType
    print type(a) is type(u'')
5. We can put unicode characters in a string in several ways, for example, (1) by defining a UTF-8 encoded string and converting it to unicode, (2) by defining a unicode string with an embedded unicode escape, or (3) by concatenating a unicode character into a string:
def exercise5():
    utf8_string = 'Ivan Krsti\xc4\x87'
    unicode_string = utf8_string.decode('utf8')
    print unicode_string.encode('utf8')
    print len(utf8_string)
    print len(unicode_string)
    unicode_string = u'aa\u0107bb'
    print unicode_string.encode('utf8')
    unicode_string = 'aa' + unichr(263) + 'bb'
    print unicode_string.encode('utf8')
Guidance for use of encodings and unicode:
1. Convert/decode from an external encoding to unicode early:
my_source_string.decode(encoding)
2. Do your work (Python processing) in unicode.
3. Convert/encode to an external encoding late (for example, just before saving to an external file):
my_unicode_string.encode(encoding)
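The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the input bytes are assumed to be UTF-8 encoded, the variable names are hypothetical, and the code is written so that it runs on both Python 2.7 and Python 3 (on Python 3, the roles of str and unicode are played by bytes and str).

```python
# Bytes as they might arrive from a file or socket (assumed UTF-8).
raw_bytes = b'Ivan Krsti\xc4\x87'

# 1. Decode early: external bytes become unicode text.
text = raw_bytes.decode('utf-8')

# 2. Work in unicode: length and case operations see characters,
#    not the individual bytes of the encoding.
char_count = len(text)
shouted = text.upper()

# 3. Encode late: unicode text becomes bytes just before output.
out_bytes = shouted.encode('utf-8')

print(len(raw_bytes))   # 12 (bytes)
print(char_count)       # 11 (characters)
```

Note that the byte string is one unit longer than the decoded text, because the final character occupies two bytes in UTF-8.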
For more information, see:
● Unicode In Python, Completely Demystified http://farmdev.com/talks/unicode/
● Unicode Howto http://www.amk.ca/python/howto/unicode.
● PEP 100: Python Unicode Integration http://www.python.org/dev/peps/pep0100/
● 4.8 codecs Codec registry and base classes http://docs.python.org/lib/modulecodecs.html
● 4.8.2 Encodings and Unicode http://docs.python.org/lib/encodingsoverview.html
● 4.8.3 Standard Encodings http://docs.python.org/lib/standardencodings.html
● Converting Unicode Strings to 8bit Strings http://effbot.org/zone/unicodeconvert.htm