Regular expressions, often abbreviated as regex or regexp, are a powerful tool for pattern matching and text manipulation. They let you search, replace, and split strings based on specific patterns. In Python, the re module provides the functions for working with regular expressions. This tutorial covers the core functions of the re module, including match(), search(), findall(), sub(), and split(), along with pattern syntax such as character classes, quantifiers, groups, and anchors.
Regular expressions are essential for tasks like data validation, parsing structured text, and searching through large documents. They provide a concise and flexible way to define patterns that can match complex text structures. Whether you're working with user input, log files, or any form of textual data, understanding regular expressions will greatly enhance your Python programming skills.
re Module Functions

The re module in Python offers several functions for pattern matching and manipulation:

- match(): Checks if the pattern matches at the beginning of the string.
- search(): Searches for the first occurrence of the pattern anywhere in the string.
- findall(): Returns all non-overlapping matches of the pattern as a list of strings (or a list of tuples when the pattern contains more than one group).
- sub(): Replaces occurrences of the pattern with a specified replacement string.
- split(): Splits the string at each match of the pattern.

Regular expressions use a special syntax to define patterns:
Character classes allow you to specify a set of characters that can match a single character in the input string.
- `[abc]`: Matches any one of the characters 'a', 'b', or 'c'.
- `[a-z]`: Matches any lowercase letter from 'a' to 'z'.
- `[^0-9]`: Matches any character that is not a digit (negation).

```python
import re

pattern = r'[abc]'
text = "abracadabra"
# Only 'a', 'b', and 'c' are collected; 'r' and 'd' are skipped
matches = re.findall(pattern, text)
print(matches)  # Output: ['a', 'b', 'a', 'c', 'a', 'a', 'b', 'a']
```

Output:

```
['a', 'b', 'a', 'c', 'a', 'a', 'b', 'a']
```
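The range and negation forms from the list can be tried the same way. The snippet below is a small sketch; the sample string and variable names are made up for illustration.

```python
import re

text = "Room 42, floor 7"  # illustrative sample string

# [a-z] matches one lowercase letter at a time.
lowercase = re.findall(r'[a-z]', text)

# [^0-9] matches any single character that is not a digit,
# including spaces and punctuation.
non_digits = re.findall(r'[^0-9]', text)

print(lowercase)            # ['o', 'o', 'm', 'f', 'l', 'o', 'o', 'r']
print(''.join(non_digits))  # Room , floor  (with a trailing space)
```

Note that the uppercase 'R' is not matched by `[a-z]`, because the range only covers lowercase letters.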
Quantifiers specify how many times a character or group should be repeated.
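- `*`: Matches zero or more occurrences.
- `+`: Matches one or more occurrences.
- `?`: Matches zero or one occurrence.
- `{n}`: Matches exactly n occurrences.
- `{n,}`: Matches at least n occurrences.
- `{n,m}`: Matches between n and m occurrences.

```python
import re

pattern = r'a*b'
text = "ab abbb aaaaabb"
# Each match is zero or more 'a' characters followed by exactly one 'b'
matches = re.findall(pattern, text)
print(matches)  # Output: ['ab', 'ab', 'b', 'b', 'aaaaab', 'b']
```

Output:

```
['ab', 'ab', 'b', 'b', 'aaaaab', 'b']
```

Because `a*` can match zero 'a' characters, a lone 'b' is a match on its own, and every match ends at the first 'b' it reaches.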
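The remaining quantifiers behave in a similar way. As a rough sketch, with sample strings and variable names chosen only for illustration:

```python
import re

# ? makes the preceding 'u' optional (zero or one occurrence).
optional_u = re.findall(r'colou?r', "color colour colouur")

# {n} requires exactly n repetitions: three digits, a hyphen, four digits.
exact = re.findall(r'[0-9]{3}-[0-9]{4}', "555-1234 55-123")

# {n,} requires at least n repetitions: two or more 'o' characters.
at_least_two = re.findall(r'o{2,}', "go goo gooo")

print(optional_u)    # ['color', 'colour']
print(exact)         # ['555-1234']
print(at_least_two)  # ['oo', 'ooo']
```

"colouur" is rejected because `u?` allows at most one 'u', and "55-123" is rejected because neither digit run is long enough.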
Groups allow you to capture parts of a match for further processing.
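- `(expression)`: Captures the matched expression as a group.

```python
import re

# \d+ matches one or more digits; \w+ matches one or more word characters
pattern = r'(\d+)-(\w+)'
text = "123-abc 456-def"
# With two groups, findall() returns one tuple per match
matches = re.findall(pattern, text)
print(matches)  # Output: [('123', 'abc'), ('456', 'def')]
```

Output:

```
[('123', 'abc'), ('456', 'def')]
```

Anchors specify the position in the string where a match should occur.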
- `^`: Matches the start of the string.
- `$`: Matches the end of the string.
- `\b`: Matches a word boundary.

```python
import re

pattern = r'^hello'
text = "hello world"
match = re.match(pattern, text)
print(bool(match))  # Output: True

pattern = r'world$'
match = re.search(pattern, text)
print(bool(match))  # Output: True
```

Output:

```
True
True
```
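The `\b` anchor from the list is not exercised in the example above. A minimal sketch of how it restricts a match to whole words might look like this (the sample text and variable names are illustrative):

```python
import re

text = "cat concatenate cat. bobcat"  # illustrative sample string

# Without \b, 'cat' is also found inside longer words.
anywhere = re.findall(r'cat', text)

# With \b on both sides, only the standalone word matches.
whole_word = re.findall(r'\bcat\b', text)

print(len(anywhere))  # 4
print(whole_word)     # ['cat', 'cat']
```

A word boundary sits between a word character (letter, digit, or underscore) and a non-word character or the edge of the string, which is why "cat." still counts as a whole word.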
Let's create a practical example that uses regular expressions to validate email addresses. We'll use the re.match() function to check if an input string matches the pattern of a valid email address. The pattern is deliberately simplified and does not cover every address that the email standards allow.

```python
import re

def is_valid_email(email):
    # Local part, '@', domain, a literal dot (\.), then a top-level domain of 2+ letters
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    match = re.match(pattern, email)
    return bool(match)

emails = [
    "example@example.com",
    "invalid-email@.com",
    "another.valid_email123@domain.co.uk"
]

for email in emails:
    print(f"{email}: {is_valid_email(email)}")
```

Output:

```
example@example.com: True
invalid-email@.com: False
another.valid_email123@domain.co.uk: True
```
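One possible variation, offered as a sketch rather than as the tutorial's method, is to compile the pattern once and use re.fullmatch(), which requires the entire string to match. The names `EMAIL_RE` and `is_valid_email_strict` are hypothetical, and the pattern is the same simplified one with the dot before the top-level domain escaped.

```python
import re

# Hypothetical name. No ^ or $ is needed because fullmatch()
# already requires the whole string to match the pattern.
EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def is_valid_email_strict(email):
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL_RE.fullmatch(email) is not None

print(is_valid_email_strict("example@example.com"))    # True
print(is_valid_email_strict("invalid-email@.com"))     # False
print(is_valid_email_strict("example@example.com\n"))  # False
```

The last case shows the practical difference: `$` also matches just before a trailing newline, so a check built on re.match() with `$` would accept that string, while fullmatch() rejects it.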
| Function | Description |
|---|---|
| match() | Matches at the beginning of the string |
| search() | Searches for the first occurrence anywhere in the string |
| findall() | Returns all non-overlapping matches as a list |
| sub() | Replaces occurrences with a specified replacement string |
| split() | Splits the string at each match |
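To round out the table, here is a short sketch of match() versus search() on the same input, followed by sub() and split(). The sample strings and variable names are made up for illustration.

```python
import re

text = "hello world"  # illustrative sample string

# match() only looks at the start of the string, so this is None.
at_start = re.match(r'world', text)

# search() scans the whole string and finds 'world' at index 6.
found = re.search(r'world', text)

# sub(): replace every run of digits with a placeholder.
masked = re.sub(r'[0-9]+', '#', "order 66 shipped 2 items")

# sub() with groups: \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r'(\d+)-(\w+)', r'\2-\1', "123-abc 456-def")

# split(): break on a comma or semicolon; \s* absorbs surrounding whitespace.
parts = re.split(r'\s*[,;]\s*', "red, green;blue ,  yellow")

print(at_start)       # None
print(found.start())  # 6
print(masked)         # order # shipped # items
print(swapped)        # abc-123 def-456
print(parts)          # ['red', 'green', 'blue', 'yellow']
```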
Now that you have a solid understanding of regular expressions, let's move on to handling JSON data in Python. The next tutorial will cover how to parse and manipulate JSON using the json module. This knowledge is essential for working with APIs and structured data in web development and beyond. Stay tuned!