Regular Expressions, Examples, Greedy and Non-Greedy expressions


In the previous article, we saw the different functions available for working with regular expressions in Python. Now let us look at their implementation and applications in more detail. To understand them better, consider the following task.

Task: Print all the lines in a file that start with "From:", first using normal string functions and then using regular expressions.

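Normal method:

with open('file_name.txt') as hand:
    for line in hand:
        line = line.rstrip()  # remove trailing whitespace, including the newline
        if line.startswith('From:'):  # True only if the line begins with From:
            print(line)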

Using regular expressions:

import re
with open('file_name.txt') as hand:
    for line in hand:
        line = line.rstrip()  # remove trailing whitespace, including the newline
        if re.search('^From:', line):  # ^ anchors the match to the start of the line
            print(line)
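To make the comparison easy to try without a file on disk, here is a minimal sketch that runs both approaches on a small in-memory list. The sample lines and the two helper function names are hypothetical and only serve the illustration.

```python
import re

# Hypothetical sample lines standing in for the contents of file_name.txt.
sample_lines = [
    "From: alice@example.com\n",
    "Subject: From: is not at the start here\n",
    "From: bob@example.com\n",
    "To: carol@example.com\n",
]

def lines_with_startswith(lines):
    """Keep lines that begin with 'From:' using a plain string method."""
    return [line.rstrip() for line in lines if line.startswith("From:")]

def lines_with_regex(lines):
    """Keep lines that begin with 'From:' using an anchored regular expression."""
    pattern = re.compile(r"^From:")
    return [line.rstrip() for line in lines if pattern.search(line)]

by_string = lines_with_startswith(sample_lines)
by_regex = lines_with_regex(sample_lines)
print(by_string)
print(by_regex)
```

Both lists contain the same two lines. The "Subject:" line is skipped in both cases because "From:" does not appear at the start of it.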


Greedy and non-greedy matching in regular expressions:
A greedy quantifier (such as + or *) matches as many characters as possible while still allowing the whole pattern to match, whereas a non-greedy quantifier (such as +? or *?) matches as few characters as possible.
This is best explained by the following example:

X = 'From : Using the : FACEPrep'  # a string
y = re.findall('^F.+:', X)  # greedy: start at F, take as many characters as possible, end at the last colon (:)
print(y)

Output: ['From : Using the :']

X = 'From : Using the : FACEPrep'  # a string
y = re.findall('^F.+?:', X)  # non-greedy: start at F, take one or more characters but as few as possible, end at the first colon (:)
# observe the use of ? after + in the expression
print(y)

Output: ['From :']
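As a side-by-side check, the sketch below runs both patterns on the same string and also reports where each match ends. The variable names are chosen only for illustration.

```python
import re

text = 'From : Using the : FACEPrep'

# Greedy: .+ grabs as much as it can, so the match runs to the LAST colon.
greedy = re.findall(r'^F.+:', text)

# Non-greedy: .+? grabs as little as it can, so the match stops at the FIRST colon.
lazy = re.findall(r'^F.+?:', text)

# re.search returns a match object, which exposes the end position of the match.
greedy_end = re.search(r'^F.+:', text).end()
lazy_end = re.search(r'^F.+?:', text).end()

print(greedy, greedy_end)  # ['From : Using the :'] 18
print(lazy, lazy_end)      # ['From :'] 6
```

The only difference between the two patterns is the ? after the +, yet the greedy match is three times as long.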


Now consider a case where a character that has a special meaning in regular expressions must be matched as a normal character. For example,
say we need to extract the amount in dollars from the string x = 'we found $10.00 on the road'. In a pattern, $ is a metacharacter that matches the end of the string, so on its own it will not match a literal dollar sign.
To avoid this, we place the escape symbol "\" immediately before the character, which makes it a normal character rather than a special one.
Hence, to find the amount in the given string, we write the code as follows:

x = 'we found $10.00 on the road'
y = re.findall(r'\$[0-9.]+', x)  # find a literal $ followed by one or more digits or dots; returns y = ['$10.00']. The "\" removes the special meaning of $, and the r prefix (raw string) keeps Python from treating the backslash as a string escape.
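The effect of the escape is easier to see when the escaped and unescaped patterns are run on the same string. The following is a minimal sketch; the variable names are illustrative, and the character-class form and re.escape are shown only as alternatives to writing the backslash by hand.

```python
import re

x = 'we found $10.00 on the road'

# Unescaped: $ means "end of string", so nothing can follow it and no match is found.
unescaped = re.findall('$[0-9.]+', x)

# Escaped: \$ matches a literal dollar sign.
escaped = re.findall(r'\$[0-9.]+', x)

# Inside a character class, $ loses its special meaning and needs no backslash.
in_class = re.findall(r'[$][0-9.]+', x)

# re.escape builds the escaped form of a literal string for us.
built = re.findall(re.escape('$') + r'[0-9.]+', x)

print(unescaped)  # []
print(escaped)    # ['$10.00']
print(in_class)   # ['$10.00']
print(built)      # ['$10.00']
```

The same approach applies to other metacharacters such as . * + ? ( ) [ ] when they need to be matched literally.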
