Regular Expressions: Extracting Necessary Information from Random Data

05 min read

Task: Extract important information from random data using Regular Expressions. 

  1. Extract Email ID from data = ' From FacePrep@focusacademy.in on Saturday morning Jan 26, important mail'.

Solution:

import re

data = ' From FacePrep@focusacademy.in on Saturday morning Jan 26, important mail'

y = re.findall(r'\S+@\S+', data)         # \S+ matches one or more non-space characters and '@' matches itself, so the pattern extracts the whole run of non-space characters that contains '@'. The pattern must not contain spaces of its own.

Output:

y = ['FacePrep@focusacademy.in'] 

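In the above code, note that the + quantifier is greedy: it matches as many characters as possible. If we add '?' after each + to make the quantifiers non-greedy (the pattern '\S+?@\S+?'), the output becomes y = ['FacePrep@f']. The match still begins at 'F', because the regex engine searches from left to right and takes the first position where a match can start, but the part after '@' stops after a single character, since a non-greedy quantifier matches as few characters as possible.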
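The difference between greedy and non-greedy quantifiers can be checked directly. The sketch below runs both patterns on the same data; the variable names greedy and lazy are only illustrative.

```python
import re

data = ' From FacePrep@focusacademy.in on Saturday morning Jan 26, important mail'

# Greedy: each + takes as many non-space characters as it can.
greedy = re.findall(r'\S+@\S+', data)

# Non-greedy: each +? takes as few characters as it can while still matching.
lazy = re.findall(r'\S+?@\S+?', data)

print(greedy)  # ['FacePrep@focusacademy.in']
print(lazy)    # ['FacePrep@f']
```

The non-greedy result keeps the full text before '@' because the search has to start at the first character that can begin a match; only the part after '@' is cut short.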

 

2. Extract email id from a file with multiple lines.

import re

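handle = open('filename.txt')

for line in handle:
    y = re.findall(r'^From (\S+@\S+)', line)        # '^' anchors the match at the start of the line: the line must begin with "From ", followed by a run of non-space characters with '@' in it.
    if y:
        print(y)                                    # Print only the lines that contained a match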
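To try the line-by-line approach without a real file, here is a minimal sketch that uses io.StringIO as a stand-in for filename.txt. The sample lines and the second address are made up for illustration, and the sketch assumes every sender line starts with "From " followed by the address.

```python
import io
import re

# Hypothetical sample content standing in for 'filename.txt'.
sample = io.StringIO(
    "From FacePrep@focusacademy.in Sat Jan 26\n"
    "Subject: important mail\n"
    "From someone@example.com Sun Jan 27\n"
)

emails = []
for line in sample:
    # Only lines that begin with "From " contribute a match;
    # findall returns just the captured group (the address).
    emails.extend(re.findall(r'^From (\S+@\S+)', line))

print(emails)  # ['FacePrep@focusacademy.in', 'someone@example.com']
```

Collecting the matches in one list makes it easy to count or de-duplicate the addresses after the loop.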

 

3. Extract only the domain of the email.

import re

data = ' From FacePrep@focusacademy.in on Saturday morning Jan 26, important mail'

y = re.findall(r'@([^ ]*)', data)         # Start at the '@' symbol and capture everything up to the next space. [^ ] means "any character except a space" (the space inside the brackets is intentional), and * repeats it zero or more times. The parentheses mark the part that findall returns.

Output:

y = ['focusacademy.in']
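One thing to watch with [^ ]* is that it captures everything up to the next space, including punctuation. The sketch below shows this on a made-up sentence and compares it with a narrower character class, [\w.-]+, which is an assumption about which characters a domain may contain and not a full email validator.

```python
import re

# Hypothetical text with two addresses; the second is followed by a comma.
text = 'Contact FacePrep@focusacademy.in or someone@example.com, thanks'

# Everything up to the next space, so the trailing comma is included.
loose = re.findall(r'@([^ ]*)', text)

# Only letters, digits, underscore, dot and hyphen are accepted.
tight = re.findall(r'@([\w.-]+)', text)

print(loose)  # ['focusacademy.in', 'example.com,']
print(tight)  # ['focusacademy.in', 'example.com']
```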

 


4. Extract only the username in the email id.

import re

data = ' From FacePrep@focusacademy.in on Saturday morning Jan 26, important mail'

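y = re.findall(r'^ From (\S+)@', data)         # Capture the non-space characters between "From " and the '@' symbol. The pattern starts with a space because data begins with one.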

Output:

y = ['FacePrep'] 
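The username and the domain can also be pulled out in a single call by using two capture groups. This is a minimal sketch on the same data; the name pairs is only illustrative.

```python
import re

data = ' From FacePrep@focusacademy.in on Saturday morning Jan 26, important mail'

# With two groups, findall returns a list of (username, domain) tuples.
pairs = re.findall(r'(\S+)@(\S+)', data)

print(pairs)  # [('FacePrep', 'focusacademy.in')]
```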