Python Regular Expressions
In Python, a regular expression search is typically written as:
match = re.search(pat, str)
The example below demonstrates how to use the re module in Python to search for a pattern in a string using regular expressions.
- The code defines a string str that contains the text to search.
- It then uses the re.search() function to search for a pattern that matches the word cat after the text word: in the string.
- The pattern is defined using a regular expression string r'word:\w\w\w', which matches the characters word: followed by any three word characters (letters, digits, or underscores).
- The re.search() function returns a Match object if it finds a match for the pattern in the string, or None if no match is found.
- The code then uses an if statement to check whether a match was found, and if so, it prints the matched text using the match.group() method.
import re
str = 'an example word:cat!!'
match = re.search(r'word:\w\w\w', str)
# If-statement after search() tests if it succeeded
if match:
    print('found', match.group())  # 'found word:cat'
else:
    print('did not find')
When we run this code, the output will be:
found word:cat
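To make the None case concrete, here is a minimal sketch that wraps the same search in a small function. The name find_word and the second sample string are assumptions made for illustration; they are not part of the re module or of the example above.

```python
import re

def find_word(text):
    """Return the text matched by r'word:\\w\\w\\w', or None if there is no match.

    find_word is a hypothetical helper used only for illustration.
    """
    match = re.search(r'word:\w\w\w', text)
    # re.search() returns None when nothing matches, so test before calling group()
    return match.group() if match else None

print(find_word('an example word:cat!!'))  # word:cat
print(find_word('an example word:a!!'))    # None (only one word character follows 'word:')
```

Checking the result before calling group() avoids an AttributeError, because None has no group() method.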
Basic Patterns
\d
- Matches any decimal digit (0-9).
\w
- Matches any word character: a letter, a digit, or the underscore _ (for ASCII text, a-z, A-Z, 0-9, and _).
\s
- Matches any whitespace character (space, tab, newline, etc.).
.
- Matches any character except newline.
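^
- Matches the start of the string. With the re.MULTILINE flag, it also matches at the start of each line.
$
- Matches the end of the string (or just before a newline at the very end). With the re.MULTILINE flag, it also matches at the end of each line.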
[]
- Matches any single character in the set of characters enclosed in square brackets. For example, [abc] matches either ‘a’, ‘b’, or ‘c’.
|
- Matches either the expression on the left or the expression on the right. For example, cat|dog matches either ‘cat’ or ‘dog’.
*
- Matches zero or more occurrences of the preceding expression. For example, a* matches zero or more ‘a’ characters.
+
- Matches one or more occurrences of the preceding expression. For example, a+ matches one or more ‘a’ characters.
?
- Matches zero or one occurrence of the preceding expression. For example, colou?r matches either ‘color’ or ‘colour’.
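The patterns above can be tried out with re.findall(). The following is a rough sketch rather than an exhaustive test; the sample strings and variable names are chosen here purely for illustration.

```python
import re

# Each pair is (pattern from the list above, sample text chosen for illustration).
samples = [
    (r'\d', 'a1b22'),               # digits
    (r'\w', 'a_1!'),                # word characters
    (r'\s', 'a b\tc'),              # whitespace
    (r'.', 'ab\n'),                 # any character except newline
    (r'[abc]', 'cab-d'),            # character set
    (r'cat|dog', 'cat, dog, cow'),  # alternation
    (r'a+', 'caaandy a'),           # one or more
    (r'colou?r', 'color colour'),   # zero or one
]
results = {pattern: re.findall(pattern, text) for pattern, text in samples}
for pattern, found in results.items():
    print(pattern, found)

# * allows zero occurrences, so a* matches even the empty string.
star_matches_empty = re.fullmatch(r'a*', '') is not None
print(star_matches_empty)

# ^ and $ anchor to the whole string by default, and to each line with re.MULTILINE.
lines = 'cat\ndog'
starts_default = re.findall(r'^\w+', lines)
starts_multiline = re.findall(r'^\w+', lines, re.MULTILINE)
ends_default = re.findall(r'\w+$', lines)
ends_multiline = re.findall(r'\w+$', lines, re.MULTILINE)
print(starts_default, starts_multiline)
print(ends_default, ends_multiline)
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting backslashes before the re module sees them.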
Example A
Example code
import re
# Define the pattern to search for
pattern = r'\b[A-Z][a-z]*\b'
# Define the string to search in
text = 'John is a man of the world, but he still calls San Francisco home.'
# Use the re.findall() function to find all matches of the pattern in the string
matches = re.findall(pattern, text)
# Print the matched words
for word in matches:
    print(word)
In this code, we import the re module and define a regular expression pattern r'\b[A-Z][a-z]*\b' that matches words consisting of an uppercase letter followed by zero or more lowercase letters, so a single capital letter such as I also counts as a match.
The \b is a word boundary anchor that matches the position between a word character (as defined by \w) and a non-word character, or between a word character and the start or end of the string. It does not match any character itself, but rather it matches a zero-width boundary between characters, indicating the start or end of a word.
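A short sketch may help show what the boundary anchor changes. The sample text below is an assumption made for illustration; it compares a plain search for cat with one wrapped in \b.

```python
import re

text = 'cat concat cats cat.'

# Without \b, 'cat' is found inside longer words as well.
plain = re.findall(r'cat', text)
print(plain)     # ['cat', 'cat', 'cat', 'cat']

# With \b on both sides, only the standalone word matches.
bounded = re.findall(r'\bcat\b', text)
print(bounded)   # ['cat', 'cat']

# \b is zero-width: the match span covers only the three letters.
span = re.search(r'\bcat\b', 'a cat.').span()
print(span)      # (2, 5)
```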
We define a string text that contains some sample text to search for matching words, and then we use the re.findall() function to find all matches of the pattern in the text. The re.findall() function returns a list of all matched substrings in the order they appear in the text.
Finally, we loop through the list of matched words and print each word to the console using a for loop.
When we run this code, the output will be:
John
San
Francisco
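Because [a-z]* allows zero lowercase letters, the pattern also matches one-letter capitalized words. The sketch below uses a made-up sentence to show this, and compares it with a hypothetical variant of the pattern that uses + to require at least one lowercase letter.

```python
import re

sentence = 'I met A. Smith in New York'  # sample text, chosen for illustration

# Original pattern: an uppercase letter followed by zero or more lowercase letters.
zero_or_more = re.findall(r'\b[A-Z][a-z]*\b', sentence)
print(zero_or_more)   # ['I', 'A', 'Smith', 'New', 'York']

# Variant with +: at least one lowercase letter must follow the capital.
one_or_more = re.findall(r'\b[A-Z][a-z]+\b', sentence)
print(one_or_more)    # ['Smith', 'New', 'York']

# A word with a capital in the middle does not match at all, because there is
# no word boundary between 'c' and 'D'.
mixed_case = re.findall(r'\b[A-Z][a-z]*\b', 'McDonald')
print(mixed_case)     # []
```

Which quantifier is appropriate depends on whether single capital letters should count as words in the text being searched.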