Program to Extract Needed Info From Random Data

05 min read

Task 1: Write a program that filters out the unwanted data in a given line and extracts only the domain of the email service provider mentioned in that line.

Solution: Say the given line is "From hello@focusacademy.in I received a placement related mail on Saturday early morning" 

The string that needs to be extracted is focusacademy.in

Code to solve the above problem using string manipulations:

data = 'From hello@focusacademy.in I received a placement related mail on Saturday early morning'
position1 = data.find('@')  # find the index of the @ character and store it in position1
position2 = data.find(' ', position1)  # starting the search from position1, find the next ' ' (space)
# The domain name lies between position1 and position2.
answer = data[position1 + 1 : position2]  # from one position after @ up to the position of the space
# The space itself is not included because a Python slice such as [2:5] stops before index 5.
print(answer)
Output:
focusacademy.in
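
The same steps can be wrapped in a small function so that lines without an address, or with the address as the last word, do not produce a wrong slice. This is a minimal sketch: the name extract_domain is a hypothetical helper chosen for illustration and is not part of the original solution.

```python
def extract_domain(line):
    """Return the text between the first '@' and the next space, or None."""
    at_index = line.find('@')  # find() returns -1 when '@' is absent
    if at_index == -1:
        return None
    space_index = line.find(' ', at_index)
    if space_index == -1:
        space_index = len(line)  # the address is the last word on the line
    return line[at_index + 1 : space_index]


data = 'From hello@focusacademy.in I received a placement related mail on Saturday early morning'
print(extract_domain(data))  # focusacademy.in
```

Checking for -1 matters because find() does not raise an error when nothing is found, so an unchecked slice would silently return the wrong text.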


Task 2: Extract the domains when the data is present in a file along with other unwanted data.

For example, say faceprep.txt contains the following data:

"This is to inform......blah blah blah
From hello@focusacademy.in I received a placement related mail on Saturday early morning
Subject of the mail is ....
It is not hard to get placed From hello@faceprep.in wanted to connect with you on tuesday
hello world "

Solution:


fhand = open('faceprep.txt')  # open the file
for line in fhand:  # for every line in the file
    line = line.strip()  # strip() removes any whitespace at the start or end of the line
    # The sample lines do not all start with 'From', so look for '@' instead of using startswith().
    position1 = line.find('@')  # index of the @ character, or -1 if the line has none
    if position1 == -1:
        continue  # skip lines that do not contain an email address
    position2 = line.find(' ', position1)  # starting the search from position1, find the next ' ' (space)
    if position2 == -1:
        position2 = len(line)  # the address is the last word on the line
    answer = line[position1 + 1 : position2]  # from one position after @ up to the position of the space
    print(answer)
fhand.close()  # close the file once every line has been read
Output:
focusacademy.in
faceprep.in
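
If a line could contain more than one address, a regular expression is one alternative to find(). The sketch below is an assumption-laden illustration rather than the original solution: the helper name extract_domains and the pattern are hypothetical, and the pattern only accepts letters, digits, dots and hyphens after the '@'. The sample list stands in for the lines of faceprep.txt.

```python
import re

# Illustrative pattern (an assumption): capture the run of letters, digits,
# dots and hyphens that follows an '@'.
DOMAIN_PATTERN = re.compile(r'@([A-Za-z0-9.-]+)')


def extract_domains(lines):
    """Return every domain that follows an '@' in the given lines, in order."""
    domains = []
    for line in lines:
        for match in DOMAIN_PATTERN.findall(line):
            domains.append(match.rstrip('.'))  # drop a sentence-ending period
    return domains


sample = [
    'This is to inform......blah blah blah',
    'From hello@focusacademy.in I received a placement related mail on Saturday early morning',
    'Subject of the mail is ....',
    'It is not hard to get placed From hello@faceprep.in wanted to connect with you on tuesday',
    'hello world',
]
for domain in extract_domains(sample):
    print(domain)
```

Because the function only iterates over its argument, an open file object can be passed in place of the list.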