Tip: Use ^ and $ for exact string patterns and . to represent any character. Two dots = exactly two characters.
Which pattern will match a string that starts with a digit, followed by exactly two lowercase letters, and ends with either “x” or “y”?
This exercise combines digits, character classes, quantifiers, and alternation for a slightly tricky pattern.
Option 1: Almost correct. Uses \d (digit) and [a-z]{2}, ending with [xy]. In Python 3, however, \d also matches non-ASCII Unicode digits in str patterns, so option 4, which spells out the ASCII range [0-9], is the more precise choice.
Option 2: Missing ^ and $ anchors, so it may match substrings instead of the whole string.
Option 3: Uses (x|y), which is correct but adds unnecessary parentheses; simpler with character class [xy].
Option 4: Correct. ^[0-9][a-z]{2}[xy]$ ensures:
Start of string: ^
First character is a digit: [0-9]
Next two characters are lowercase letters: [a-z]{2}
Last character is either 'x' or 'y': [xy]
End of string: $
Example usage:
import re
pattern = r'^[0-9][a-z]{2}[xy]$'
print(bool(re.match(pattern, '1abx'))) # True
print(bool(re.match(pattern, '2cdz'))) # False (last char not x or y)
print(bool(re.match(pattern, '12ax'))) # False (second char is not letter)
Tip: Use [xy] for simple alternation instead of parentheses when matching single characters. Always anchor your pattern with ^ and $ for exact matches.
Consider the following code:
import re
pattern = r'^[A-Z][a-z]{2,4}\d$'
test_strings = ['Cat1', 'Dog12', 'Apple5', 'Bat9', 'cat1']
matches = [s for s in test_strings if re.match(pattern, s)]
print(matches)
Which of the following is the correct output?
This exercise combines anchors, character classes, quantifiers, and string filtering with regex.
Pattern Analysis: ^[A-Z][a-z]{2,4}\d$
^ → start of string
[A-Z] → first character is uppercase
[a-z]{2,4} → next 2 to 4 lowercase letters
\d → ends with a digit
$ → end of string
Check each string:
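'Cat1' → C (uppercase), a t (2 lowercase letters), 1 (digit) → matches
'Dog12' → D (uppercase), o g (2 lowercase letters), 1 satisfies \d, but the extra 2 comes before the end of the string → does not match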
'Apple5' → A (uppercase), p p l e (4 lowercase letters), 5 → matches
'Bat9' → B (uppercase), a t (2 lowercase), 9 → matches
'cat1' → starts with lowercase → does not match
Thus, matches = ['Cat1', 'Apple5', 'Bat9']
Tip: Pay attention to anchors and exact counts with curly braces. re.match matches from the start of the string, so ^ is optional here but keeps intent clear.
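To see why ^ is optional with re.match while $ is not, the short sketch below runs the same pattern with each anchor removed in turn. The variable names no_caret and no_dollar are illustrative and not part of the exercise.

```python
import re

test_strings = ['Cat1', 'Dog12', 'Apple5', 'Bat9', 'cat1']

# Without the leading ^: re.match already starts at index 0, so nothing changes.
no_caret = [s for s in test_strings if re.match(r'[A-Z][a-z]{2,4}\d$', s)]

# Without the trailing $: extra characters after the digit are now allowed.
no_dollar = [s for s in test_strings if re.match(r'^[A-Z][a-z]{2,4}\d', s)]

print(no_caret)   # ['Cat1', 'Apple5', 'Bat9']
print(no_dollar)  # ['Cat1', 'Dog12', 'Apple5', 'Bat9']
```

Dropping $ lets 'Dog12' through, because the pattern is satisfied by 'Dog1' and the trailing 2 is simply ignored.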
Fill in the blank in the following statement:
The Python re function _____________ returns a match object only if the pattern matches at the beginning of the string, otherwise it returns None.
This exercise tests your conceptual understanding of the key difference between re.match and re.search in Python.
Option 1: re.search() searches for the pattern anywhere in the string, not just at the beginning.
Option 2: re.match() is correct. It attempts to match the pattern starting from the beginning of the string.
Option 3: re.findall() returns all non-overlapping matches as a list, not a match object.
Option 4: re.fullmatch() matches the entire string; the pattern must cover the whole string exactly.
Key Takeaway: Use re.match() when you want to ensure the pattern matches at the start of the string. For general search anywhere in the string, use re.search().
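The difference between these functions is easiest to see side by side. The following is a minimal sketch; the sample sentence and variable names are made up for illustration.

```python
import re

text = "Learn Python regex"

at_start = re.match(r'Python', text)     # None: 'Python' is not at index 0
anywhere = re.search(r'Python', text)    # match object: found further along
whole = re.fullmatch(r'Learn.*', text)   # match object: pattern covers everything
partial = re.fullmatch(r'Learn', text)   # None: text continues after 'Learn'

print(at_start)                          # None
print(anywhere.span())                   # (6, 12)
print(re.match(r'Learn', text).group())  # Learn
print(whole is not None, partial)        # True None
```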
You have the following text:
Order ID: 12345, Customer: Alice
Order ID: 67890, Customer: Bob
You want to extract all Order IDs using Python regex. Which code snippet will correctly do this?
This exercise focuses on using groups and the appropriate re function to extract multiple matches from text.
Option 1: re.match only checks at the start of the string, so it will only match the first Order ID if it is at the very beginning.
Option 2: re.search finds the first occurrence of the pattern, but will not extract all Order IDs.
Option 3: Correct. re.findall returns a list of all matches for the pattern, using the captured group (\d+).
Option 4: re.match(r'(\d+)', text) tries to match digits at the start of the string. It will fail because the string starts with “Order ID:”.
Example usage:
import re
text = """Order ID: 12345, Customer: Alice
Order ID: 67890, Customer: Bob"""
order_ids = re.findall(r'Order ID: (\d+)', text)
print(order_ids) # ['12345', '67890']
Tip: Use re.findall when you need all occurrences of a pattern. Parentheses (...) define the group you want to capture.
You want to match a string that contains either “cat” or “dog” anywhere in it. Which regex pattern will achieve this correctly?
This exercise focuses on alternation in regex, which allows matching one of multiple patterns.
Option 1: Correct. cat|dog matches either “cat” or “dog” anywhere in the string.
Option 2: Incorrect. The ^ at the end is misplaced: ^ asserts the start of the string, so it belongs at the beginning of a pattern, while $ marks the end.
Option 3: Incorrect. ^(cat|dog)$ matches the entire string exactly equal to “cat” or “dog”, not when they appear inside other text.
Option 4: Incorrect. Square brackets denote a character class, so [cat|dog] matches any single character that is 'c', 'a', 't', '|', 'd', 'o', or 'g'.
Example usage:
import re
pattern = r'cat|dog'
texts = ['I have a cat', 'My dog is cute', 'Birds are flying']
matches = [s for s in texts if re.search(pattern, s)]
print(matches) # ['I have a cat', 'My dog is cute']
Tip: Use the | operator for alternation. Remember: parentheses group alternatives, while square brackets match individual characters.
Consider the following text:
<p>Title 1</p><p>Title 2</p>
You want to extract each title separately using regex. Which pattern will work correctly?
This exercise tests understanding of greedy vs non-greedy quantifiers.
Option 1: .* is greedy, so it matches from the first <p> to the last </p>, resulting in one big match: <p>Title 1</p><p>Title 2</p>.
Option 2: Similar to option 1 but requires at least one character; still greedy, so same problem.
Option 3: Correct. .*? is non-greedy, so it matches the shortest possible string between <p> and </p>, giving two separate matches, one for Title 1 and one for Title 2.
Option 4: Incorrect. [^p]* matches any character except 'p', which breaks the intended pattern.
Example usage:
import re
text = '<p>Title 1</p><p>Title 2</p>'
pattern = r'<p>.*?</p>'
matches = re.findall(pattern, text)
print(matches) # ['<p>Title 1</p>', '<p>Title 2</p>']
Tip: Use ? after a quantifier like * or + to make it non-greedy when you want the shortest match possible, especially in nested or repeated patterns.
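For comparison, here is a small sketch that runs the greedy and non-greedy versions against the same text. It adds a capturing group (an assumption, not necessarily how the quiz options were written) so that findall returns only the text between the tags.

```python
import re

text = '<p>Title 1</p><p>Title 2</p>'

# Greedy: .* runs to the last </p>, so there is only one match.
greedy = re.findall(r'<p>(.*)</p>', text)

# Non-greedy: .*? stops at the first </p> it can reach.
lazy = re.findall(r'<p>(.*?)</p>', text)

print(greedy)  # ['Title 1</p><p>Title 2']
print(lazy)    # ['Title 1', 'Title 2']
```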
You want to match a string that literally contains a+b. Which regex pattern will work correctly in Python?
This exercise tests understanding of escaping special characters in regex.
Option 1: a+b is incorrect because + is a regex quantifier meaning “one or more of the preceding character”. It will match 'ab', 'aab', 'aaab', etc.
Option 2: Correct. a\+b escapes the +, so it matches the string 'a+b' literally.
Option 3: a\\b contains no + at all. Written as a regular (non-raw) Python string it becomes the regex a\b, where \b is a word boundary; written as a raw string it matches a literal backslash. Either way it does not match a literal '+'.
Option 4: [a+b] is a character class that matches a single character: 'a', '+', or 'b'. It does not match the whole string 'a+b'.
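The effect of escaping can be checked quickly. The sketch below is illustrative: the sample list is made up, and re.escape is shown as one convenient way to escape literal text rather than as part of the quiz answer.

```python
import re

samples = ['a+b', 'ab', 'aab']

# Unescaped: + is a quantifier, so the pattern means "one or more a, then b".
unescaped = [s for s in samples if re.fullmatch(r'a+b', s)]

# Escaped: \+ is a literal plus sign.
escaped = [s for s in samples if re.fullmatch(r'a\+b', s)]

# re.escape builds the escaped pattern from a literal string.
auto = re.escape('a+b')

print(unescaped)  # ['ab', 'aab']
print(escaped)    # ['a+b']
print(auto)       # a\+b
```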
Tip: Use (?:...) to create a non-capturing group: it groups alternatives without storing the matched text. This is useful for performance or when you do not need the captured value.
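One place where the difference shows up clearly is re.findall, which returns the group contents when the pattern has a capturing group and the full match when it does not. A minimal sketch with made-up sample text:

```python
import re

text = "2 cats, 1 dog, 3 cats"

# Capturing group: findall returns only what the group captured.
capturing = re.findall(r'\d+ (cat|dog)s?', text)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r'\d+ (?:cat|dog)s?', text)

print(capturing)      # ['cat', 'dog', 'cat']
print(non_capturing)  # ['2 cats', '1 dog', '3 cats']
```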
You want to validate a password that must:
Be at least 6 characters long
Contain at least one digit
Which regex pattern will correctly enforce this using a positive lookahead?
This exercise demonstrates a real-world practical scenario using positive lookahead to validate passwords.
Option 1: \d{1,6} only matches 1 to 6 digits, not the full password requirement.
Option 2: ^\d.*$ requires the password to start with a digit; it does not allow letters at the start.
Option 3: Correct. ^(?=.*\d).{6,}$ means:
(?=.*\d) → positive lookahead ensures at least one digit exists somewhere in the string
.{6,} → total length is at least 6 characters
^ ... $ → anchors match the whole string
Option 4: .{6,}\d requires the password to end with a digit; digits elsewhere are not counted.
Example usage:
import re
pattern = r'^(?=.*\d).{6,}$'
print(bool(re.match(pattern, 'abc123'))) # True
print(bool(re.match(pattern, 'abcdef'))) # False (no digit)
print(bool(re.match(pattern, '12ab'))) # False (less than 6 chars)
Tip: Positive lookahead (?=...) is a powerful way to enforce rules without consuming characters. It’s especially useful in password validation or complex multi-rule patterns.
You have the following text:
Prices: $100, $250, 300, $400
You want to extract only the numbers preceded by $ using regex. Which pattern will do this correctly?
This exercise demonstrates positive lookbehind to extract numbers that are immediately preceded by a specific character.
Option 1: \$\d+ works, but it includes the $ in the match, whereas the task asks for the numbers only.
Option 2: (?=\$\d+)\d+ uses a positive lookahead, which checks what comes next without consuming it. The lookahead needs a $ at the current position while \d+ needs a digit at that same position, so the pattern can never match.
Option 3: Correct. (?<=\$)\d+ means:
(?<=\$) → positive lookbehind ensures the number is preceded by a $
\d+ → matches the digits only
This will extract 100, 250, 400 only.
Option 4: \d+\$ matches numbers that are followed by $, which is not what we want.
Tip: Use positive lookbehind (?<=...) to match content only if it is preceded by a specific character or pattern, without including the preceding part in the match.
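Example usage (a short sketch comparing the plain pattern with the lookbehind version on the text from the question; the variable names are illustrative):

```python
import re

text = "Prices: $100, $250, 300, $400"

# Without lookbehind the $ is part of each match.
with_symbol = re.findall(r'\$\d+', text)

# With lookbehind the $ is required but not included.
numbers_only = re.findall(r'(?<=\$)\d+', text)

print(with_symbol)   # ['$100', '$250', '$400']
print(numbers_only)  # ['100', '250', '400']
```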
You want to validate an email address that must:
Start with letters or digits
Contain exactly one @
Domain contains only letters and at least one dot
Which regex pattern is correct?
This exercise demonstrates a complex real-world validation using regex for emails.
Option 1: Matches emails partially, but it allows digits and underscores in the domain, which may not be desired.
Option 2: Only allows lowercase letters in the username and domain; digits are rejected.
Option 3: Similar to option 1, but domain part may include digits and underscores.
Example usage:
import re
pattern = r'^[a-zA-Z0-9]+@[a-zA-Z]+\.[a-zA-Z]+$'
emails = ['test123@domain.com', 'user@domain', 'user_name@domain.com', 'user@domain1.com']
valid_emails = [e for e in emails if re.match(pattern, e)]
print(valid_emails) # ['test123@domain.com']
Tip: For strict email validation, carefully define allowed characters in username and domain. Use anchors to match the entire string and avoid partial matches.
Which of the following statements about Python regex backreferences is TRUE?
This is a conceptual exercise about backreferences in regex.
Option 1: Incorrect. \1 does not refer to the first character in the string; it refers to the content captured by the first capturing group.
Option 2: Incorrect. Backreferences cannot refer to non-capturing groups; they only refer to capturing groups (defined by parentheses (...)).
Option 3: Correct. Backreferences allow the regex to match the same text that was previously captured by a capturing group. This is useful for finding repeated words, patterns, or mirrored sequences.
Option 4: Incorrect. Quantifiers like {n} are used to repeat a pattern a fixed number of times; this is different from backreferences.
Key Takeaway: Use backreferences (\1, \2, ...) when you want to ensure that a later part of the string matches exactly what was captured earlier. They are particularly useful for repeated words or mirrored patterns.
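As an illustration of the repeated-word case, the sketch below uses \1 both inside the pattern and in a replacement string. The sample sentence is made up for this example.

```python
import re

text = "This is is a test of of backreferences"

# (\w+) captures a word; \1 must then match exactly the same text again.
repeated = re.findall(r'\b(\w+)\s+\1\b', text)
print(repeated)  # ['is', 'of']

# In re.sub, \1 in the replacement refers to the captured word.
cleaned = re.sub(r'\b(\w+)\s+\1\b', r'\1', text)
print(cleaned)   # This is a test of backreferences
```

The \b anchors keep the pattern from matching the "is" inside "This".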
You have the following text:
"Love #Python3! #Regex is #fun, but #coding123 is awesome."
You want to extract all hashtags (words starting with # and containing letters and digits only, no punctuation). Which regex pattern will work correctly?
This exercise focuses on multi-step extraction using regex for a practical scenario: extracting hashtags.
Option 1: #\w+! only matches hashtags that end with an exclamation mark, missing normal hashtags.
Option 2: Correct. #\w+ matches # followed by letters, digits, or underscores, capturing all hashtags like #Python3, #Regex, #fun, #coding123.
Option 3: #[a-zA-Z]+ only matches letters, missing digits in hashtags like #Python3 or #coding123.
Option 4: #\w+\b[^\W_] is syntactically valid but can never match: \b requires a word boundary right after \w+, while [^\W_] requires a letter or digit at that same position.
Example usage:
import re
text = "Love #Python3! #Regex is #fun, but #coding123 is awesome."
pattern = r'#\w+'
hashtags = re.findall(pattern, text)
print(hashtags) # ['#Python3', '#Regex', '#fun', '#coding123']
Tip: Use \w+ to match letters, digits, and underscores. For hashtags, this captures typical alphanumeric hashtags without punctuation.
You have the following text:
"Total: $100 USD, Discount: $20 USD, Tax: 5 USD"
You want to extract numbers that are preceded by $ and followed by USD. Which regex pattern will work correctly?
This exercise demonstrates combining positive lookbehind and positive lookahead for precise matching.
Option 1: \$\d+USD matches the dollar sign and USD text along with the number, but we may want only the number.
Option 2: (?<=\$)\d+ matches numbers after $, but does not ensure they are followed by USD.
Option 3: Correct. (?<=\$)\d+(?=\sUSD) means:
(?<=\$) → number must be preceded by $
\d+ → match one or more digits
(?=\sUSD) → number must be followed by a space and "USD" without including it in the match
Option 4: \d+(?=USD) matches digits before USD but ignores the requirement for a preceding $.
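Example usage (a minimal sketch that applies the combined lookbehind and lookahead to the text from the question; the variable name amounts is illustrative):

```python
import re

text = "Total: $100 USD, Discount: $20 USD, Tax: 5 USD"

# Digits must have $ immediately before and " USD" immediately after.
amounts = re.findall(r'(?<=\$)\d+(?=\sUSD)', text)

print(amounts)  # ['100', '20']
```

The 5 in "Tax: 5 USD" is skipped because it is not preceded by $.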
[a-zA-Z0-9-]+\.[a-zA-Z]+ → domain allows letters, digits, hyphens, and a valid TLD
(?:/\S*)? → optional path or query parameters, no spaces allowed
$ → ensures full string match
Example usage:
import re
pattern = r'^https?://[a-zA-Z0-9-]+\.[a-zA-Z]+(?:/\S*)?$'
urls = [
'http://example.com',
'https://my-site.com/path/to/page?query=1',
'ftp://invalid.com',
'https://space inurl.com'
]
valid_urls = [u for u in urls if re.match(pattern, u)]
print(valid_urls)
# ['http://example.com', 'https://my-site.com/path/to/page?query=1']
Tip: For URL validation, carefully handle optional paths and query parameters, use anchors to match the full string, and disallow spaces. Non-capturing groups (?: ... ) are useful for grouping without capturing.
Quick Recap of Python Regular Expressions Concepts
If you are not clear on the concepts of Regular Expressions, you can quickly review them here before practicing the exercises. This recap highlights the essential points and logic to help you solve problems confidently.
What Are Regular Expressions?
A regular expression is a sequence of characters that defines a search pattern. Instead of checking text manually with multiple conditions or string functions, regex allows you to describe the structure of the text you want to find.
Regular expressions in Python are extremely useful for tasks such as:
Extracting all numbers from a string
Validating email addresses
Finding specific word patterns
Splitting text using multiple delimiters
Replacing multiple text patterns efficiently
Python provides regex support through the built-in re module:
import re
Common Regex Functions in Python
Python's re module provides several useful functions to search, match, replace, and split text using regular expressions.
Function        Description
re.search()     Finds the first occurrence of the pattern anywhere in the string.
re.match()      Checks whether the pattern matches from the beginning of the string.
re.findall()    Returns a list of all non-overlapping matches.
re.finditer()   Returns an iterator containing match objects for all matches.
re.sub()        Replaces matched patterns with the specified replacement text.
re.split()      Splits a string based on the given regex pattern.
re.compile()    Precompiles a pattern for better performance when used multiple times.
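The less familiar functions in this list can be tried together. This is a small sketch with made-up sample text; the names number, found, masked, and parts are illustrative.

```python
import re

# Compile once, then reuse the pattern object's methods.
number = re.compile(r'\d+')
text = "Order 12 shipped, order 345 pending"

# finditer yields match objects, so positions are available as well as text.
found = [(m.group(), m.start()) for m in number.finditer(text)]
print(found)   # [('12', 6), ('345', 24)]

# sub replaces every match with the replacement text.
masked = number.sub('#', text)
print(masked)  # Order # shipped, order # pending

# split accepts a pattern, so several delimiters can be handled at once.
parts = re.split(r'[,\s]+', text)
print(parts)   # ['Order', '12', 'shipped', 'order', '345', 'pending']
```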
Basic Regex Syntax and Meaning
Regular expressions use special symbols and sequences to represent different kinds of text patterns. Below are some of the most commonly used regex components in Python.
Pattern   Meaning
.         Matches any single character except newline.
\d        Matches any digit (0–9).
\w        Matches any word character (letters, digits, underscore).
\s        Matches any whitespace character.
^         Matches the start of a string.
$         Matches the end of a string.
*         Matches 0 or more repetitions.
+         Matches 1 or more repetitions.
?         Matches 0 or 1 repetition.
{m,n}     Matches the previous pattern between m and n times.
[]        Defines a character set.
|         Acts as an OR operator.
()        Groups patterns together.
Why Use Regular Expressions?
Regular expressions are extremely powerful tools for working with text. They allow you to search, extract, validate, and manipulate patterns in strings with precision. Instead of manually checking each character or writing lengthy conditional logic, regex provides a compact and expressive way to describe complex text rules.
Find patterns quickly: Search for words, digits, symbols, or custom text patterns.
Validate inputs: Email formats, phone numbers, IDs, passwords, etc.
Extract information: Pull specific details from logs, forms, or documents.
Replace or clean text: Remove unwanted characters, format data, or sanitize input.
Efficient text parsing: Handle large datasets, files, and structured/unstructured text.
In Python, the re module makes working with regular expressions simple and efficient. With just a few expressions, you can accomplish tasks that would otherwise require many lines of code.
Basic Syntax and Metacharacters
Regular expressions work through a collection of special characters known as metacharacters. These characters help define search patterns and give regex its power. Understanding these basics is essential before working on more advanced patterns.
Metacharacter   Meaning                                  Example
.               Matches any character except newline     a.c matches abc, a1c, a-c
^               Matches the start of a string            ^Hello matches any string starting with “Hello”
$               Matches the end of a string              world$ matches any string ending with “world”
*               Matches 0 or more repetitions            go* matches g, go, goo...
+               Matches 1 or more repetitions            go+ matches go, goo...
?               Matches 0 or 1 repetition (optional)     colou?r matches color and colour
[]              Character set                            [abc] matches a, b, or c
()              Grouping or capturing                    (abc)+ matches repeated “abc”
|               Alternation (OR)                         cat|dog matches cat or dog
\               Escape special characters                \. matches a literal dot (.)
These metacharacters form the foundation of almost all regular expression patterns. In the following sections, you will see how they work together inside Python code.
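The examples in the table can be verified with a few lines of Python. This is only a sketch: the checks dictionary is a made-up helper that pairs each pattern with strings it is expected to match in full.

```python
import re

# Each pattern from the table, paired with strings it should match completely.
checks = {
    r'a.c': ['abc', 'a1c', 'a-c'],
    r'go*': ['g', 'go', 'goo'],
    r'go+': ['go', 'goo'],
    r'colou?r': ['color', 'colour'],
    r'(abc)+': ['abc', 'abcabc'],
    r'cat|dog': ['cat', 'dog'],
    r'\.': ['.'],
}

results = {
    pattern: all(re.fullmatch(pattern, s) for s in samples)
    for pattern, samples in checks.items()
}
print(all(results.values()))      # True

# + needs at least one 'o', so a bare 'g' does not match go+.
print(re.fullmatch(r'go+', 'g'))  # None
```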
Using Regular Expressions in Python (re Module)
Python provides a powerful built-in module named re that allows you to work with regular expressions. This module includes functions to search, match, split, substitute, and find all occurrences of patterns within strings. Below are the most commonly used functions in the re module.
re.search() → Searches for the first occurrence of a pattern
import re
text = "Welcome to Python"
result = re.search("Python", text)
print(result) # a match object, or None if nothing is found
re.findall() → Returns all matches as a list
import re
text = "cat, dog, cat, tiger"
matches = re.findall("cat", text)
print(matches) # ['cat', 'cat']
re.match() → Checks only the beginning of a string
import re
text = "Python is fun"
result = re.match("Python", text)
print(result) # a match object, because the text starts with "Python"
re.split() → Splits a string by the matched pattern
import re
text = "one-two-three"
parts = re.split("-", text)
print(parts) # ['one', 'two', 'three']
re.sub() → Replaces pattern occurrences with another string
import re
text = "blue sky, blue ocean"
result = re.sub("blue", "green", text)
print(result) # green sky, green ocean
These functions cover almost all regular expression use cases in Python. In the upcoming sections, we will explore more advanced usage, pattern typing, special sequences, and practical examples.
Character Classes in Regular Expressions
Character classes allow you to match a specific set or range of characters. They are written inside square brackets []. Instead of writing long OR conditions, character classes provide a shorter and more flexible way to define which characters are acceptable in a match.
1. Basic Character Class
Matches any one character from the specified set.
import re
pattern = r"[abc]"
text = "apple"
result = re.findall(pattern, text)
print(result) # ['a']
2. Character Range
Use a hyphen - to define a range of characters.
import re
pattern = r"[a-z]"
text = "Python3"
result = re.findall(pattern, text)
print(result) # ['y', 't', 'h', 'o', 'n']
3. Negated Character Class
Use ^ inside the brackets to match anything except the defined characters.
import re
pattern = r"[^0-9]"
text = "A1B2C3"
result = re.findall(pattern, text)
print(result) # ['A', 'B', 'C']
4. Multiple Character Groups
You can combine ranges and characters in the same class.
import re
pattern = r"[A-Za-z0-9]"
text = "Hello#2024!"
result = re.findall(pattern, text)
print(result) # ['H', 'e', 'l', 'l', 'o', '2', '0', '2', '4']
Common Uses
Matching alphabet characters
Detecting digits
Filtering out unwanted characters
Validating usernames, filenames, and identifiers
Character classes are one of the most fundamental features of regex. They make matching flexible, readable, and powerful for building text filters, validators, and pattern recognizers.
Predefined Character Classes in Regular Expressions
Regular expressions provide several built-in shorthand character classes that make pattern matching cleaner and easier. These shortcuts match common types of characters such as digits, letters, or whitespace, saving you from manually writing long ranges.
Pattern   Description                                                 Example Match
\d        Matches any digit (0–9)                                     “7” in "Grade7"
\D        Matches any non-digit                                       “G” in "G7"
\w        Matches letters, digits, and underscore                     “A” or “_” in "A_10"
\W        Matches any character that is not a letter, digit, or underscore   “#” in "A#1"
\s        Matches whitespace (space, tab, newline)                    space in "Hello World"
\S        Matches any non-whitespace                                  “H” in "Hello"
\b        Word boundary                                               Position before “Python” in "Python code"
\B        Not a word boundary                                         Middle of a word
Example: Extracting Digits from a String
import re
pattern = r"\d+"
text = "Order ID: 45829, Amount: $1200"
result = re.findall(pattern, text)
print(result) # ['45829', '1200']
Example: Matching Words Only
import re
pattern = r"\w+"
text = "Hello World_2024!"
result = re.findall(pattern, text)
print(result) # ['Hello', 'World_2024']
Example: Finding Whitespace
import re
pattern = r"\s"
text = "New York City"
result = re.findall(pattern, text)
print(result) # [' ', ' ']
Predefined character classes make regex easier, cleaner, and more readable. They cover the majority of common text-matching needs and work in almost all data-processing tasks.
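Example: Word Boundaries

The table lists \b and \B without a code sample, so here is a small sketch. The sample sentence is made up to include "Python" as a whole word, as a suffix, and as a prefix.

```python
import re

text = "Python code in CPython and Pythonic style"

# \b on both sides: only the standalone word.
whole_word = re.findall(r'\bPython\b', text)

# \b at the start only: words that begin with "Python".
prefix = re.findall(r'\bPython\w*', text)

# \B at the start: "Python" preceded by another word character.
inside = re.findall(r'\BPython', text)

print(whole_word)  # ['Python']
print(prefix)      # ['Python', 'Pythonic']
print(inside)      # ['Python']
```

The single result for \BPython comes from "CPython", where the P sits in the middle of a word.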
Anchors: Matching Start and End of Strings
Anchors match positions, not characters. They help validate formats such as IDs, emails, and fixed-pattern inputs.
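A short sketch of the difference anchors make when validating a fixed format. The ID format used here (two uppercase letters followed by three digits) is hypothetical and chosen only for illustration.

```python
import re

ids = ['AB123', 'xAB123', 'AB123x', 'AB12']

# Unanchored: the pattern may match anywhere inside the string.
unanchored = [s for s in ids if re.search(r'[A-Z]{2}\d{3}', s)]

# Anchored: ^ and $ pin the pattern to the start and end of the string.
anchored = [s for s in ids if re.search(r'^[A-Z]{2}\d{3}$', s)]

print(unanchored)  # ['AB123', 'xAB123', 'AB123x']
print(anchored)    # ['AB123']
```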
About This Exercise: Python – Regular Expressions
Welcome to Solviyo’s Python – Regular Expressions exercises, a carefully designed collection that helps you understand how pattern matching works in Python. Regular expressions (often called regex) are powerful tools that allow you to search, validate, and manipulate text with precision. These exercises are meant to guide you through the core ideas behind regex so you can use them confidently in real-world Python projects.
If you ever need a short refresher before solving the exercises, feel free to expand the Quick Recap section. It gives you a fast summary of the essential concepts so you can jump into the questions with clarity.
What You Will Learn
In this exercise set, you’ll work through the most important elements of Python regular expressions, including:
How patterns work and how to write them effectively using special symbols and character classes.
Using functions like match(), search(), findall(), and sub() from Python’s re module to process and transform text.
Understanding anchors, quantifiers, groups, and ranges to create flexible and powerful regex patterns.
Applying regular expressions to validate input, extract information, and clean up messy data.
Each exercise includes a clear answer and explanation, helping you understand not just the “what,” but also the “why” behind every solution. This makes the learning process smoother and helps you develop a strong intuition for pattern-based problem solving.
Why Regular Expressions Matter in Python
Regex is one of those skills that becomes more valuable the deeper you go into programming. Whether you’re analyzing text, cleaning data, building automation scripts, processing logs, or validating user input, regular expressions save time and make your code more efficient. Having a good grasp of regex strengthens your ability to solve problems quickly, especially in data-heavy work, backend development, or tasks that involve parsing large text files.
We created these exercises to make regex feel less intimidating and more practical. By practicing real use-cases instead of memorizing patterns, you’ll develop a hands-on understanding that sticks with you.
Start Practicing Python Regular Expressions
These Python regex exercises will help you build confidence step by step. Take your time, explore the patterns, and use the Quick Recap whenever you need a quick reminder. By the time you work through all the questions, you’ll be ready to apply regular expressions to real Python problems with accuracy and ease.