Extracting URLs from a Given String Using Regular Expressions
In this section, we will use a regular expression to extract URLs from a string.
Example:
import re

def Find(string):
    # findall() returns every non-overlapping substring that matches the pattern
    url = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', string)
    return url

# Given string containing URLs
string = 'The URL of RMeve is: https://www.pmeve.com, and the URL of Google is: https://www.google.com'
print("URLs:", Find(string))
Explanation:
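https?:// : Matches the scheme 'http://' or 'https://'; the ? makes the 's' optional.
(?:[-\w.]|(?:%[\da-fA-F]{2}))+ : Matches one or more URL-safe characters (letters, digits, underscores, hyphens, periods) or percent-encoded bytes such as %20. Any other character, such as ',' or '/', ends the match, so the pattern captures the scheme and host but not a path or query string.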
Output:
URLs: ['https://www.pmeve.com', 'https://www.google.com']
This code successfully extracts both URLs from the string.
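To see what the character class does and does not accept, the sketch below runs the same pattern on a few hypothetical inputs. The sample strings and the names PATTERN, samples, and results are assumptions made for illustration. It suggests that the pattern accepts percent-encoded bytes in the host part but stops at the first character outside the class, such as '/'.

```python
import re

# Same pattern as in the example above, written as a raw string.
PATTERN = re.compile(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+')

# Hypothetical sample inputs, chosen only to probe the pattern's behaviour.
samples = [
    'plain: http://example.com',
    'encoded: https://my%20host.example.com',
    'with path: https://example.com/docs?page=1',
]

# One list of matches per sample string.
results = [PATTERN.findall(s) for s in samples]
for sample, found in zip(samples, results):
    print(sample, '->', found)
```

In the third sample only https://example.com is returned, because '/' is not in the character class. Capturing paths or query strings would require widening the class, which goes beyond what the pattern in this section covers.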
Explanation of Non-capturing Group (?:x)
(?:x) is a non-capturing group. It matches the expression x but does not store the match for later use. This is useful for grouping parts of a regular expression without affecting back-references or capturing sub-patterns.
For example, in /foo{1,2}/, the {1,2} applies only to the last character o, so the pattern matches "foo" or "fooo". However, in /(?:foo){1,2}/, the {1,2} applies to the entire word "foo", so the pattern matches "foo" or "foofoo".
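The difference can be checked in Python with a short sketch. The /.../ delimiters above are regex-literal notation from languages such as JavaScript; in Python the pattern is passed as a string. The test strings and variable names here are chosen only for illustration. The last part also shows one reason the URL pattern uses (?:...): when a pattern contains a capturing group, re.findall returns the group's text instead of the whole match.

```python
import re

# Without a group, {1,2} applies only to the final "o": "fo" plus one or two "o".
last_char = re.compile(r'foo{1,2}')
# With a non-capturing group, {1,2} applies to the whole word "foo".
whole_word = re.compile(r'(?:foo){1,2}')

print(bool(last_char.fullmatch('fooo')))      # True
print(bool(last_char.fullmatch('foofoo')))    # False
print(bool(whole_word.fullmatch('foofoo')))   # True
print(bool(whole_word.fullmatch('fooo')))     # False

text = 'foofoo bar foo'
# With no capturing group, findall returns each full match.
non_capturing = re.findall(r'(?:foo){1,2}', text)
# With a capturing group, findall returns the text last captured by that group.
capturing = re.findall(r'(foo){1,2}', text)

print(non_capturing)   # ['foofoo', 'foo']
print(capturing)       # ['foo', 'foo']
```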