Chapter 8: Strings

DNA String

8.1 A string is a sequence

A string is a sequence of characters.

Since it is a sequence, we can access the characters one at a time with the bracket operator [].

The expression in brackets is called an index. The index indicates which character in the sequence you want (hence the name).

String / Sequence / Index

It is important to note that “the index is an offset from the beginning of the string, and the offset of the first letter is zero”.

Inside the bracket operator, we can use any expression whose value is an integer: an integer literal, a variable, or an expression built with operators (such as i + 1).
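A short sketch of the three kinds of index expressions mentioned above (the variable names are only for illustration; print(...) is used with a single argument so it runs under Python 2 or 3). An index must be an integer, so a float raises a TypeError.

```python
fruit = 'banana'

first = fruit[0]        # integer literal: offset 0 is the first letter, 'b'
i = 1
second = fruit[i]       # variable holding an integer: fruit[1] is 'a'
third = fruit[i + 1]    # expression built with an operator: fruit[2] is 'n'

print(first + second + third)   # prints ban

try:
    fruit[1.5]          # an index must be an integer, not a float
except TypeError as err:
    print('TypeError: ' + str(err))
```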

8.2 len

The function len is a built-in function that counts and returns the number of characters in a string.

Here is an example of how we use the len built-in function:

len function / Sequence / Indices
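Since the example image is not reproduced here, this is a minimal sketch of len in use, assuming the same ‘banana’ string used later in the chapter. Because the first index is 0, the last character sits at len(fruit) - 1; asking for len(fruit) itself is one position past the end.

```python
fruit = 'banana'
length = len(fruit)          # number of characters: 6

last = fruit[length - 1]     # the last character is at offset length - 1
print(last)                  # prints a

try:
    fruit[length]            # offset 6 is one past the end of the string
except IndexError:
    print('fruit[length] is out of range')
```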

There are two ways to locate a character in a string:

  • Start from the beginning of the string and move forward:
    • Remember that the first character is at position (or index) 0, the next character at position (or index) 1, and so on.
    • When we move forward (from the beginning to the end of the string), we use non-negative integers.
  • Start at the end of the string and move backward:
    • When we move backward (from the end to the beginning of the string), we use negative integers.
    • The index -1 indicates the last character, -2 the second-to-last character, and so on.
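A small sketch of both directions on the string ‘banana’. The loop at the end checks the rule of thumb that a negative index -k names the same character as len(fruit) - k.

```python
fruit = 'banana'

# Moving forward: offsets counted from the beginning.
print(fruit[0])    # prints b, the first character

# Moving backward: negative offsets counted from the end.
print(fruit[-1])   # prints a, the last character
print(fruit[-2])   # prints n, the second-to-last character

# A negative index -k refers to the same character as len(fruit) - k.
for k in range(1, len(fruit) + 1):
    assert fruit[-k] == fruit[len(fruit) - k]
```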

8.3 Traversal with a for loop

Many computations involve “scanning” a string one character at a time. Usually the iteration (the scanning process) starts at the beginning. The objective is to select each character in turn, do something to it, and continue until the end of the sequence (or string).

This pattern of processing is called a traversal […]

Traverse: To iterate through the items in a sequence, performing a similar operation on each.

traversal with while and for loops

In the right column of the figure (the while loop), the loop traverses the string and displays each letter on a line by itself.

  • the variable “fruit” gets the string ‘banana’
    • the sequence of the characters in the string ‘banana’ is:
    • b   a   n   a   n   a
    • 0   1   2   3   4   5
  • the variable “index” is initialized with the number 0.
  • loop condition: index < len(fruit)
    • The built-in function len calculates the number of characters in the string ‘banana’, which is 6.
  • The while statement can be read as follows:
    • while index < len(fruit):
      • while it is true that variable “index” is less than 6 (the length of the string ‘banana’),
      • that is, index < 6
    • letter = fruit[index]
      • the variable “letter” gets the character at the position given by variable “index” in the string ‘banana’ (variable “fruit”).
    • print letter
      • print variable “letter”
    • index = index + 1
      • variable “index” gets its own value incremented by one (1).
      • the last character accessed is the one at index len(fruit) - 1, which is the last character in the string.
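The figure with the loop code is not reproduced here, so the following is a sketch rebuilt from the steps listed above (print(...) with one argument, so it runs under Python 2 or 3):

```python
fruit = 'banana'
index = 0                      # start at the first character
while index < len(fruit):      # stop once index reaches 6
    letter = fruit[index]      # character at the current position
    print(letter)              # one letter per line
    index = index + 1          # move to the next position
```

When the loop exits, index is 6 (so the condition is false) and letter still holds ‘a’, the character at len(fruit) - 1.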

In the left column of the figure (the for loop), the loop traverses the string and displays each letter on a line by itself.

  • the variable “fruit” gets the string ‘banana’
    • the sequence of the characters in the string ‘banana’ is:
    • fruit:  b   a   n   a   n   a
    • index:  0   1   2   3   4   5
  • The for statement can be read as follows:
    • for char in fruit:
    • iterate over every item (“char”) in the sequence stored in variable “fruit”
    • print char
  • “Each time through the loop, the next character in the string is assigned to the variable char. The loop continues until no characters are left.”
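A sketch of the for version, rebuilt from the description above. There is no index variable to initialize or increment; the for statement hands each character to char in turn.

```python
fruit = 'banana'
for char in fruit:   # char takes each character of fruit, in order
    print(char)      # one letter per line
```

After the loop, char still refers to the last character assigned, ‘a’.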

Exercise 8.1) Write a function that takes a string as an argument and displays the letters backward, one per line.

>>> def upside_down(x):    # define upside_down, which takes a string parameter x
...     index = len(x) - 1
...     while index >= 0:
...         letter = x[index]
...         print letter
...         index = index - 1

>>> upside_down('gateman')
n
a
m
e
t
a
g
>>>

The following example shows how to use concatenation (string addition) and a for loop to generate an abecedarian series (that is, in alphabetical order).

>>> prefixes = 'JKLMNOPQ'
>>> suffix = 'ack'
>>> for letter in prefixes:
...     print letter + suffix

Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack
>>>

Exercise 8.2) Modify the program to write “Ouack” and “Quack” instead of “Oack” and “Qack”.

>>> for letter in prefixes:
...     if letter == 'O' or letter == 'Q':
...         print letter + 'u' + suffix
...     else:
...         print letter + suffix

Jack
Kack
Lack
Mack
Nack
Ouack
Pack
Quack
>>>

8.4 String slices

A segment of a string is called a slice.

We can select a segment (or slice) of a string as follows:

>>> s = 'Monty Python'
>>> print s[0:5]
Monty
>>> print s[6:12]
Python
>>>

Slicing is similar to selecting a single character, except that we must specify where the slice starts and where it ends in the sequence.

The operator [n:m] returns the part of the string from the “n-eth” character to the “m-eth” character, including the first but excluding the last.

String slice

Also, it is helpful to know that:

  • to slice from the beginning of the sequence up to position ‘m’, we can omit the first index:
    • >>> fruit = 'banana'
      >>> fruit[:3]           # here m = 3
      'ban'
  • to slice from position ‘n’ to the end of the sequence, we can omit the second index:
    • >>> fruit = 'banana'
      >>> fruit[3:]           # here n = 3
      'ana'
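A quick sketch of both omitted-index forms. Because [n:m] includes position n but excludes position m, cutting a string at any position i and gluing fruit[:i] back onto fruit[i:] returns the original string, which the loop checks for every i.

```python
fruit = 'banana'

head = fruit[:3]     # first index omitted: start at the beginning -> 'ban'
tail = fruit[3:]     # second index omitted: run to the end -> 'ana'
print(head)
print(tail)

# The two halves of any split fit back together exactly.
for i in range(len(fruit) + 1):
    assert fruit[:i] + fruit[i:] == fruit
```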

NOTE: For a slice to contain any characters, the first index (for instance ‘n’) must be less than the second index (for instance ‘m’). Otherwise, the result is an empty string, represented by two quotation marks:

>>> fruit[3:3]
''
>>> fruit[4:3]
''
>>>

An empty string contains no characters and has length 0, but other than that, it is the same as any other string.
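A small sketch showing that the empty slices from the note above behave like any other string: they have length 0, compare equal to '', and can be concatenated.

```python
fruit = 'banana'

same = fruit[3:3]       # start equals end: nothing is selected
backwards = fruit[4:3]  # start after end: nothing is selected either

print(repr(same))       # prints ''
print(len(backwards))   # prints 0
print(same == '')       # prints True
print(same + fruit)     # prints banana
```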

Exercise 8.3) Given that “fruit” is a string, what does “fruit[:]” mean?

>>> fruit[:]
'banana'
>>>

With both indexes omitted, the slice runs from the first character to the last, so fruit[:] gives the entire string.

8.5 Strings are immutable

[…] strings are immutable, which means you can’t change an existing string. The best you can do is create a new string that is a variation on the original:

>>> greeting = 'Hello, world!'
>>> new_greeting = 'J' + greeting[1:]

# This assignment concatenates a new first letter onto a slice of the variable “greeting”.
# It has no effect on the original string 'Hello, world!'.
>>> print new_greeting
Jello, world!
>>>
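To see the immutability rule in action, here is a sketch that first tries to change a character in place (which fails with a TypeError) and then builds a new string the way the example above does:

```python
greeting = 'Hello, world!'

try:
    greeting[0] = 'J'              # strings cannot be changed in place
except TypeError as err:
    print('TypeError: ' + str(err))

new_greeting = 'J' + greeting[1:]  # build a new string instead
print(new_greeting)                # prints Jello, world!
print(greeting)                    # the original is unchanged: Hello, world!
```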

For now, an object is the same thing as a value, and we use the terms “object” and “value” interchangeably. This definition is refined later in the book.

8.6 Searching

Searching in Python

Professor Downey points out that this is the first time we see a return statement inside a loop.

Here is the logic broken down:

  • variable “index” is initialized at 0,
  • while it is true that the position determined by variable “index” is less than the length of the sequence in variable “word”,
    • if the character from the string in variable “word” (at position “index”) is equal to variable “letter” (the character we are looking for with the function find),
      • then, return variable “index”,
    • increment variable “index” by one,
  • once it is false that the position determined by variable “index” is less than the length of the sequence in variable “word” (the whole word was searched without a match),
    • the loop ends and
    • the function returns -1

This pattern of computation (traversing a sequence and returning when we find what we are looking for) is called a search.
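The figure with the function is not reproduced here, so this is a sketch of find rebuilt from the steps listed above. The return inside the loop ends both the loop and the function as soon as a match is found.

```python
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index        # match found: leave the loop and the function
        index = index + 1
    return -1                   # loop finished without a match

print(find('banana', 'n'))  # prints 2
print(find('banana', 'z'))  # prints -1
```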

Exercise 8.4) Modify the function find so that it has a third parameter, the index in word where it should start looking.

>>> def find(word, letter, index):
...     while index < len(word):
...         if word[index] == letter:
...             return index
...         index = index + 1
...     return -1

>>> find('banana', 'n', 0)
2
>>> find('banana', 'n', 3)
4
>>>

8.7 Looping and counting

The following program counts the number of times [a given character] appears in a string:

>>> word = 'banana'
>>> count = 0
>>> for letter in word:
...     if letter == 'a':
...         count = count + 1

>>> print count
3
>>>

The program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then incremented each time an a is found.

When the loop exits, count contains the result: the total number of a's.

Exercise 8.5) Encapsulate this code in a function named count, and generalize it so that it accepts the string and the letter as arguments.
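One possible answer, a sketch rather than the book's official solution. It returns the total instead of printing it so the result can be reused, and it names the counter total so it does not share a name with the function count.

```python
def count(word, letter):
    """Return the number of times letter appears in word."""
    total = 0                  # the counter
    for char in word:
        if char == letter:
            total = total + 1
    return total

print(count('banana', 'a'))   # prints 3
print(count('banana', 'n'))   # prints 2
```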

Exercise 8.6) Rewrite this function so that instead of traversing the string, it uses the three parameter version of find from exercise 8.4.
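A possible sketch for this exercise, reusing the three-parameter find from Exercise 8.4 (repeated here so the block runs on its own). Each call resumes the search one position past the previous match, and the loop stops when find reports -1.

```python
def find(word, letter, index):
    """Three-parameter find from Exercise 8.4."""
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1


def count(word, letter):
    """Count occurrences of letter in word using repeated calls to find."""
    total = 0
    index = find(word, letter, 0)              # first occurrence, if any
    while index != -1:
        total = total + 1
        index = find(word, letter, index + 1)  # resume just past the last match
    return total

print(count('banana', 'a'))   # prints 3
```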

8.8 String methods

A method is a function that is associated with an object and called using dot notation.

Professor Downey points out that it is “similar to a function in the sense that it takes arguments and returns a value”.

  • upper method:

If we take the example of the method upper, the syntax works as follows:

>>> word = 'banana'
>>> new_word = word.upper()

# The method upper takes the string in variable “word” and returns a new string with all uppercase letters.
# The empty parentheses indicate that this method takes no arguments.

>>> print new_word
BANANA
>>>

Invocation: a method call is called an invocation. In the example above, we would say “we are invoking upper on the [variable] word”.

  • find method:

The method find returns the index of the first occurrence of a character in a string:

>>> word = 'banana'
>>> index = word.find('a')   # We invoke find on [variable] word.
>>> print index
1
>>>

The method find also works with substrings:

>>> word.find('na')
2
>>>

The first substring ‘na’ starts at index (or position) 2, counting from the beginning of the string:

 b   a   n   a   n   a
 0   1   2   3   4   5

To find a substring starting from a specific index (or position) in the string, we pass that index as a second argument:

>>> word.find('na', 3)
4
>>>

Starting the search at index 3, the first substring ‘na’ is found at index (or position) 4:

 b   a   n   a   n   a
 0   1   2   3   4   5

****

Finally, the method find can take up to three arguments:

  1. the character or substring we are looking for
  2. the index (or position) where the search starts
  3. the index (or position) where the search ends (the search stops just before this index)

For instance:

>>> name = 'bob'
>>> name.find('b', 1, 2)
-1
>>>

The search starts at index 1 and stops before index 2, so it only looks at ‘o’ and does not find ‘b’. That is why find returns -1:

 b    o    b
 0    1    2
-3   -2   -1
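A small sketch that widens the search range step by step to show that the end index is excluded:

```python
name = 'bob'

narrow = name.find('b', 1, 2)    # searches only name[1:2], which is 'o'
wider = name.find('b', 1, 3)     # searches name[1:3], which is 'ob'
open_end = name.find('b', 1)     # no end given: searches to the end of the string

print(narrow)     # prints -1
print(wider)      # prints 2
print(open_end)   # prints 2
```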

8.9 The in operator

The word in is a Boolean operator that takes two strings and returns True if the first appears as a substring in the second:

>>> 'a' in 'banana'
True
>>> 'ok' in 'banana'
False
>>>

The following function combines in with a for loop to print all the letters from word1 that also appear in word2:

>>> def in_both(word1, word2):
...     for letter in word1:
...         if letter in word2:
...             print letter

>>> in_both('apples', 'oranges')
a
e
s
>>>

Think Python: in operator

8.10 String comparison

The relational operators work on strings.

String equality:

>>> word = 'banana'
>>> if word == 'banana':
...     print 'All right, bananas.'

All right, bananas.
>>>

String ordering (useful for putting words in alphabetical order):

>>> word = 'banana'
>>> if word < 'banana':
...     print 'Your word, ' + word + ', comes before banana.'
... elif word > 'banana':
...     print 'Your word, ' + word + ', comes after banana.'
... else:
...     print 'All right, bananas.'

All right, bananas.
>>>

Python does not handle uppercase and lowercase letters the same way that people do.

This is another difference between natural language and formal language. Remember Chapter 1?

In Python, all the uppercase letters come before all the lowercase letters, so ‘Pineapple’ comes before ‘banana’:

>>> word = 'Pineapple'
>>> if word < 'banana':
...     print 'Your word, ' + word + ', comes before banana.'
... elif word > 'banana':
...     print 'Your word, ' + word + ', comes after banana.'
... else:
...     print 'All right, bananas.'

Your word, Pineapple, comes before banana.
>>>

A way to circumvent this problem is to “convert strings to a standard format, such as all lowercase, before performing the comparison”.
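A minimal sketch of that idea, using the string method lower to put both sides in lowercase before comparing. The function name compare_to_banana is hypothetical, chosen only for this illustration; it returns the message instead of printing it.

```python
def compare_to_banana(word):
    """Report where word falls relative to 'banana', ignoring case."""
    lowered = word.lower()          # standard format: all lowercase
    if lowered < 'banana':
        return 'Your word, ' + word + ', comes before banana.'
    elif lowered > 'banana':
        return 'Your word, ' + word + ', comes after banana.'
    else:
        return 'All right, bananas.'

print(compare_to_banana('Pineapple'))   # Your word, Pineapple, comes after banana.
print(compare_to_banana('Apple'))       # Your word, Apple, comes before banana.
```

With the case difference removed, ‘Pineapple’ now lands after ‘banana’, as a person would expect.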

8.11 Debugging

 

 

***

Acknowledgments:

These notes represent my understanding from the book Think Python: How to Think Like a Computer Scientist written by Allen B. Downey.

Parts of the chapter are transcribed, and all quotes, unless specified otherwise, come directly from his book.

Thank you Professor Downey for making this knowledge available.

 

Also, I would like to thank the open source community for their valuable contribution in making resources on programming available.

Thank you

Comment on “Chapter 8: Strings”

    Hi,

    Here is an improvement to the code block from Exercise (8.4)

    def find(word, letter):
        """The function find will provide all the indexes (positions)
        for the letter we are looking for in variable word."""
        index = -1
        match = 0
        while index < len(word) - 1:
            index += 1
            if word[index] == letter:
                print index
                match += 1
        if match == 0:
            print letter + ' not in ' + word

    >>> find('banana', 'a')
    1
    3
    5

    Courtesy of Tobi from NYC PyLadies!

    Thank you 🙂

    P.S. Keep sending me your improvements.
