Python Regular Expressions

In Python, regular expressions are a powerful tool for matching and manipulating text patterns using a specialized syntax.

In Python, a regular expression search is typically written as:

match = re.search(pat, str)

The example below uses the re module to search for a pattern in a string. In it:

  • The code defines a string str that contains the text to search.
  • It then calls re.search() with a pattern that matches word: followed by three word characters, which in this string is cat.
  • The pattern is defined using a regular expression string r'word:\w\w\w', which matches the characters word: followed by any three word characters (letters, digits, or underscores).
  • The re.search() function returns a Match object if it finds a match for the pattern in the string, or None if no match is found.
  • The code then uses an if statement to check whether a match was found, and if so, it prints the matched text using the match.group() method.

import re

str = 'an example word:cat!!'
match = re.search(r'word:\w\w\w', str)
# The if statement after search() tests whether it succeeded
if match:
    print('found', match.group())  # prints: found word:cat
else:
    print('did not find')

When we run this code, the output will be:

found word:cat
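
The other branch, where no match is found, can be sketched the same way. This is a minimal illustration: the helper name find_word and the second sample string are hypothetical and are not part of the original example.

```python
import re

def find_word(text):
    # Look for 'word:' followed by exactly three word characters.
    match = re.search(r'word:\w\w\w', text)
    # re.search() returns None when nothing matches, so test before calling group().
    return match.group() if match else None

print(find_word('an example word:cat!!'))      # word:cat
print(find_word('an example with no prefix'))  # None
```

Testing the Match object before calling group() avoids an AttributeError on None.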

Basic Patterns

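\d - Matches any decimal digit (0-9). For str patterns this also includes other Unicode decimal digits unless the re.ASCII flag is set.

\w - Matches any word character: a letter, a digit, or the underscore (a-z, A-Z, 0-9, and _). For str patterns this also includes other Unicode letters and digits unless the re.ASCII flag is set.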

\s - Matches any whitespace character (space, tab, newline, etc.).

. - Matches any character except newline.

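^ - Matches the start of the string. With the re.MULTILINE flag, it also matches at the start of each line.

$ - Matches the end of the string, or just before a newline at the end of the string. With the re.MULTILINE flag, it also matches at the end of each line.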

[] - Matches any single character in the set of characters enclosed in square brackets. For example, [abc] matches either ‘a’, ‘b’, or ‘c’.

| - Matches either the expression on the left or the expression on the right. For example, cat|dog matches either ‘cat’ or ‘dog’.

* - Matches zero or more occurrences of the preceding expression. For example, a* matches zero or more ‘a’ characters.

+ - Matches one or more occurrences of the preceding expression. For example, a+ matches one or more ‘a’ characters.

? - Matches zero or one occurrence of the preceding expression. For example, colou?r matches either ‘color’ or ‘colour’.
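
Several of the patterns above can be tried out directly. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Character classes: digits, word characters, whitespace
digits = re.findall(r'\d', 'room 42')
words = re.findall(r'\w+', 'a_b c-d')
parts = re.split(r'\s+', 'one  two\tthree')

# The dot matches any character except a newline
dots = re.findall(r'c.t', 'cat cut c\nt')

# Anchors: by default ^ applies to the whole string;
# re.MULTILINE makes it apply to each line
lines = 'first line\nsecond line'
first_only = re.findall(r'^\w+', lines)
each_line = re.findall(r'^\w+', lines, re.MULTILINE)

# Sets, alternation, and quantifiers
vowels = re.findall(r'[aeiou]', 'regex')
pets = re.findall(r'cat|dog', 'cat, dog, cow')
spellings = re.findall(r'colou?r', 'color colour colr')
runs = re.findall(r'a+', 'caaandy banana')
# * allows zero occurrences, so 'ab*' matches a lone 'a'
zero_ok = re.fullmatch(r'ab*', 'a') is not None

print(digits)      # ['4', '2']
print(words)       # ['a_b', 'c', 'd']
print(parts)       # ['one', 'two', 'three']
print(dots)        # ['cat', 'cut']
print(first_only)  # ['first']
print(each_line)   # ['first', 'second']
print(vowels)      # ['e', 'e']
print(pets)        # ['cat', 'dog']
print(spellings)   # ['color', 'colour']
print(runs)        # ['aaa', 'a', 'a', 'a']
print(zero_ok)     # True
```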

Example A

Example code

import re

# Define the pattern to search for
pattern = r'\b[A-Z][a-z]*\b'

# Define the string to search in
text = 'John is a man of the world, but he still calls San Francisco home.'

# Use the re.findall() function to find all matches of the pattern in the string
matches = re.findall(pattern, text)

# Print the matched words
for word in matches:
    print(word)

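In this code, we import the re module and define a regular expression pattern r'\b[A-Z][a-z]*\b' that matches words consisting of an uppercase letter followed by zero or more lowercase letters. Because * allows zero repetitions, a single capital letter such as 'A' standing alone as a word also matches.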

The \b is a word boundary anchor that matches the position between a word character (as defined by \w) and a non-word character. It does not match any character itself, but rather it matches a zero-width boundary between characters, indicating the start or end of a word.
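
The effect of \b can be seen by running the same search with and without it. This is a small sketch; the sample text and variable names are chosen only for illustration.

```python
import re

text = 'cat concat category cat.'

# Without \b, 'cat' is also found inside longer words
anywhere = re.findall(r'cat', text)
# With \b on both sides, only the standalone word matches
whole_words = re.findall(r'\bcat\b', text)

print(len(anywhere))     # 4
print(len(whole_words))  # 2

# \b consumes no characters: the match is an empty span at the boundary
print(re.search(r'\b', 'hi').span())  # (0, 0)
```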

We define a string text that contains some sample text to search for matching words, and then we use the re.findall() function to find all matches of the pattern in the text. The re.findall() function returns a list of all matched substrings in the order they appear in the text.

Finally, we loop through the list of matched words and print each word to the console using a for loop.

When we run this code, the output will be:

John
San
Francisco
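
As a further sketch of how re.findall() behaves with this pattern, the snippet below runs it on a few other inputs. The sample sentences and variable names are made up for illustration.

```python
import re

pattern = r'\b[A-Z][a-z]*\b'

# Matches come back as plain strings, in left-to-right order
names = re.findall(pattern, 'Alice met Bob in Paris')
# With no matches, the result is an empty list rather than None
nothing = re.findall(pattern, 'all lowercase here')
# A lone capital letter matches too, since * permits zero lowercase letters
single = re.findall(pattern, 'plan A works')

print(names)    # ['Alice', 'Bob', 'Paris']
print(nothing)  # []
print(single)   # ['A']
```

Because an empty list is falsy, the result can be tested directly in an if statement, much like the Match object returned by re.search().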
Last modified July 21, 2024: update (e2ae86c)