Extracting URLs from a Given String Using Regular Expressions
In this section, we will use a regular expression to extract URLs from a string.
Example:
import re
def Find(string):
    # findall() returns every non-overlapping substring that matches the pattern
    url = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', string)
    return url
# Given string containing URLs
string = 'The URL of RMeve is: https://www.pmeve.com, and the URL of Google is: https://www.google.com'
print("URLs:", Find(string))
Explanation:
https?://: Matches 'http://' or 'https://'; the ? makes the s optional.
(?:[-\w.]|(?:%[\da-fA-F]{2})): Matches a single hyphen, word character (letter, digit, or underscore), or period, or a percent-encoded byte such as %20.
+: Repeats that group one or more times, so the match ends at the first character outside the set, such as the comma after the first URL or a '/' that begins a path.
Output:
URLs: ['https://www.pmeve.com', 'https://www.google.com']
This code successfully extracts both URLs from the string.
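Because the character class contains no '/', '?', or ':', the pattern captures only the scheme and host of each URL, and because it does contain '.', a sentence-ending period directly after a URL is swept into the match. The following is a minimal sketch of that behavior, not a complete URL parser; the names URL_PATTERN and find_urls and the example.com and test-site.org addresses are assumptions made for illustration.

```python
import re

# Same pattern as in Find(), compiled once so it can be reused.
URL_PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+')

def find_urls(text):
    """Return every substring of text matched by URL_PATTERN (hypothetical helper)."""
    return URL_PATTERN.findall(text)

# Hypothetical input: one URL with a path and query, one followed by a period.
sample = 'Docs: https://example.com/a/b?x=1 and http://test-site.org.'
matches = find_urls(sample)
print(matches)   # ['https://example.com', 'http://test-site.org.']

# One possible cleanup: strip the trailing period the character class let in.
trimmed = [m.rstrip('.') for m in matches]
print(trimmed)   # ['https://example.com', 'http://test-site.org']
```

If paths and query strings need to be kept, the character class would have to be widened, which also makes the match more likely to absorb surrounding punctuation.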
Explanation of Non-capturing Group (?:x)
(?:x) is a non-capturing group. It matches the expression x but does not store the match for later use. This is useful for grouping parts of a regular expression without affecting back-references or capturing sub-patterns.
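The distinction matters for re.findall() in particular: when a pattern contains a capturing group, findall() returns the text of the group rather than the whole match. The sketch below compares the two forms on a simplified host-only pattern; the variable names and the sample text are assumptions chosen for illustration.

```python
import re

text = 'Visit https://www.pmeve.com and https://www.google.com'

# Capturing group: findall() returns only what the group matched (the host).
hosts_only = re.findall(r'https?://([-\w.]+)', text)
print(hosts_only)   # ['www.pmeve.com', 'www.google.com']

# Non-capturing group: findall() returns the whole match (the full URL).
full_urls = re.findall(r'https?://(?:[-\w.]+)', text)
print(full_urls)    # ['https://www.pmeve.com', 'https://www.google.com']
```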
For example, in /foo{1,2}/, the {1,2} applies only to the last character o. However, in /(?:foo){1,2}/, the {1,2} applies to the entire word "foo".
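The same comparison can be checked in Python, which writes patterns as strings rather than between slashes. This is a small sketch using re.fullmatch() so that each pattern must account for the entire input; the helper name matches is an assumption introduced here.

```python
import re

def matches(pattern, text):
    """Return True if pattern matches the whole of text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# The quantifier binds to the final "o" only: "foo" or "fooo".
print(matches(r'foo{1,2}', 'fooo'))        # True
print(matches(r'foo{1,2}', 'foofoo'))      # False

# The quantifier binds to the whole group: "foo" or "foofoo".
print(matches(r'(?:foo){1,2}', 'foofoo'))  # True
print(matches(r'(?:foo){1,2}', 'fooo'))    # False
```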