in Python Programming by Goeduhub's Expert (3.1k points)

A regular expression, or RegEx, is a sequence of characters that defines a specialized pattern for finding a string or a set of strings. The re module provides support for Perl-like regular expressions in Python. It offers a set of functions such as search(), findall(), split() and sub(), which let us search a string for matches, split it at matches, or replace them.

In this tutorial we are going to cover the following questions:

1)  What is Python regular expression?

2) How do you use regular expressions in Python?

3) What is re in Python?

4) Can you use RegEx in Python?

5) What are metacharacters in Python?


2 Answers

by Goeduhub's Expert (3.1k points)
Best answer

What is Regular Expression?

Regular Expression

A regular expression is a sequence of characters that defines a specialized pattern, enabling you to find a string or a set of strings.

Regular expressions, also called REs, regexes, or regex patterns, are made available in Python through the re module.

Now let's try to understand it.

Note: In Python we often want to find out whether a particular character (or set of characters) is in a string and, if it is, where it is (its index).

For this we can write the simple code given below.

# A simple example without regular expressions
Str = 'abcdefg12345'
print('23' in Str)       # is '23' present in the string?
print(Str.index('23'))   # index at which '23' starts

Output:
True
8

Note: In the above example Python simply compares the string, character by character, with the fixed text we asked for. That works for simple problems, but what if we are asked to find three consecutive decimal digits in a given string?

In such cases we cannot use the simple method (as we used in the above example).

Here comes the concept of the regular expression, or RegEx.

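Regular expression support is built into Python through the re module of the standard library, so nothing needs to be installed with pip.

Note: The command pip install regex installs a separate third-party package named regex. It is not required for the examples in this tutorial.

To use regex (regular expressions) in Python, import the re package.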

Now let's see some examples to understand it.

# A simple example
import re
Str = 'abcdefg12345'
print(re.search("123", Str))

Output (Python 3.7 or later):
<re.Match object; span=(7, 10), match='123'>

Note: As you can see, a single line of code returns useful information about the string: whether the searched text is present, and the indexes at which it starts and ends.

The re.search() function returns a match object if the pattern matches anywhere in the given string.
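Returning to the earlier question about three consecutive decimal digits, here is a minimal sketch of how re.search() could answer it. The variable names are illustrative only, and the pattern \d{3} is just one way to express "three digits in a row".

```python
import re

text = 'abcdefg12345'

# \d matches one decimal digit; {3} requires exactly three in a row.
match = re.search(r'\d{3}', text)

if match:
    # group() is the matched text, span() the (start, end) indexes.
    print(match.group(), match.span())  # 123 (7, 10)
else:
    print("no three consecutive digits found")
```

The plain `in` operator cannot express this, because there is no single fixed substring to look for.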

The re module offers several functions, including:

  1. search()
  2. findall()
  3. split()
  4. sub()

search(): Returns a match object if the pattern matches anywhere in the given string.

*** Note that when there is no match, search() returns None and raises no error. In an interactive session nothing is displayed for it. ***

# search function explanation
import re
Str = 'Python is a useful language Python'
match = re.search('Python', Str)
if match:
    print("match found")
else:
    print("None")

Output:
match found

Note: Here we simply searched for some text (Python) to find out whether it is in the string or not.
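As a small sketch of what the match object offers beyond a yes/no answer, the following uses the standard group(), start() and end() methods, and also shows the None result for a missing word. The names hit and miss, and the word 'Java', are only illustrative.

```python
import re

text = 'Python is a useful language Python'

hit = re.search('Python', text)   # the pattern occurs, so we get a match object
miss = re.search('Java', text)    # the pattern does not occur, so we get None

print(hit.group())             # Python
print(hit.start(), hit.end())  # 0 6  (search() reports the first occurrence only)
print(miss)                    # None

# Always test for None before calling methods on the result.
if miss is None:
    print("'Java' was not found")
```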

findall(): Returns a list of all non-overlapping matches in the string.

# findall function explanation
import re
Str = 'Python is a useful language Python'
print(re.findall('Python', Str))

Output:
['Python', 'Python']

Note: As "Python" occurs twice in our string, the findall() function returns both occurrences.
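findall() gives back only the matched text. If the positions are needed as well, one option (shown here as a sketch, not as part of the original answer) is re.finditer(), which yields a match object for every occurrence.

```python
import re

text = 'Python is a useful language Python'

# finditer() yields one match object per occurrence, so span() is available.
positions = [m.span() for m in re.finditer('Python', text)]

print(positions)  # [(0, 6), (28, 34)]
```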

split(): Splits the string at every match of the pattern and returns the pieces as a list.

# split function explanation
import re
Str = 'Python is a useful language Python'
match = re.split('use', Str)
print(match)
match = re.split(' ', Str)
print(match)

Output:
['Python is a ', 'ful language Python']
['Python', 'is', 'a', 'useful', 'language', 'Python']

sub(): Replaces every match with a character (or string) of your choice and returns the new string.

# sub function explanation
import re
Str = 'Python is a useful language Python'
match = re.sub('use', " 123", Str)
print(match)
match = re.sub(' ', "5", Str)
print(match)

Output:
Python is a  123ful language Python
Python5is5a5useful5language5Python
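Both split() and sub() act on every match by default. As a hedged sketch, their optional maxsplit and count keyword arguments limit how many matches are used; the replacement word 'Regex' is only an illustration.

```python
import re

text = 'Python is a useful language Python'

# maxsplit=2: split at the first two spaces only, leave the rest intact.
parts = re.split(' ', text, maxsplit=2)
print(parts)  # ['Python', 'is', 'a useful language Python']

# count=1: replace only the first occurrence of the pattern.
replaced = re.sub('Python', 'Regex', text, count=1)
print(replaced)  # Regex is a useful language Python
```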


What are metacharacters in Python?

Metacharacters 

What are metacharacters in a regular expression (re)? To understand them, let's first see an example.

# Example of metacharacters
import re
Str = 'Rama is 22 and Sita is 23 and Ravan is 44 '
match = re.findall(r'[A-Z][a-z]*', Str)
print(match)
match = re.findall(r'\d{1,3}', Str)
print(match)

Output:
['Rama', 'Sita', 'Ravan']
['22', '23', '44']

Note: Seeing '[A-Z][a-z]*' and '\d{1,3}', you might be wondering what they mean. They are built from metacharacters, which are characters with a special meaning in a regular expression (re).

Here, [A-Z][a-z]* means: one capital letter (any of A to Z) followed by zero or more small letters (any of a to z). In the above example all names start with a capital letter followed by small letters, and because we used the findall() function we got all the names present in the string.

What if we used search() instead of findall()? You can see it by replacing findall() with search() in the above example.

With search() we get only the first match (i.e. Rama).

'\d{1,3}': \d matches a single digit character and {1,3} means one to three of them in a row, so every run of one to three digits is a match (here the two-digit numbers 22, 23 and 44).
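The two points above can be checked with a short sketch: search() stops at the first name, and {1,3} accepts runs of one, two or three digits. The second test string is made up for illustration.

```python
import re

text = 'Rama is 22 and Sita is 23 and Ravan is 44 '

# search() returns only the first match of the pattern.
first = re.search(r'[A-Z][a-z]*', text)
print(first.group())  # Rama

# {1,3} takes up to three digits at a time, so a five-digit run is split.
digits = re.findall(r'\d{1,3}', 'a 7 b 42 c 123 d 12345')
print(digits)  # ['7', '42', '123', '123', '45']
```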

Here are some important metacharacters:

Character   Example     Description
[]          [a-z]       A set of characters
\           \d          Signals a special sequence
.           he....o     Any character (except newline character)
^           ^hello      Starts with
$           End$        Ends with
*           an*         Zero or more occurrences
+           any+        One or more occurrences
{}          al{2}       Exactly the specified number of occurrences
|           python|C    Either or
()                      Capture and group
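To make the table concrete, here is a minimal sketch that pairs each metacharacter with a tiny input. The test strings are invented for illustration and some patterns differ slightly from the table's examples.

```python
import re

set_hits = re.findall(r'[aeiou]', 'hello')             # [] : a set of characters
dot_hit = re.search(r'he..o', 'hello')                 # .  : any character
start_hit = re.search(r'^hello', 'hello world')        # ^  : starts with
end_hit = re.search(r'world$', 'hello world')          # $  : ends with
star_hits = re.findall(r'an*', 'a an ann')             # *  : zero or more 'n'
plus_hits = re.findall(r'an+', 'a an ann')             # +  : one or more 'n'
exact_hits = re.findall(r'al{2}', 'al all')            # {} : exactly two 'l'
either_hits = re.findall(r'python|C', 'python and C')  # |  : either or
groups = re.search(r'(\d+)-(\d+)', '10-20').groups()   # () : captured groups

print(set_hits)      # ['e', 'o']
print(bool(dot_hit), bool(start_hit), bool(end_hit))  # True True True
print(star_hits)     # ['a', 'an', 'ann']
print(plus_hits)     # ['an', 'ann']
print(exact_hits)    # ['all']
print(either_hits)   # ['python', 'C']
print(groups)        # ('10', '20')
```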

Special Sequences 

Sequence   Description
\A         Matches if the specified characters are at the beginning of the string
\b         Matches where the specified characters are at the beginning or at the end of a word
\B         Matches where the specified characters are present, but NOT at the beginning (or at the end) of a word
\d         Matches a digit character (numbers from 0-9)
\D         Matches any character that is NOT a digit
\s         Matches a white space character
\S         Matches any character that is NOT a white space character
\w         Matches a word character (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)
\W         Matches any character that is NOT a word character
\Z         Matches if the specified characters are at the end of the string
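A short sketch of several special sequences in action follows. The sample sentence is an assumption chosen so that each sequence has something to match.

```python
import re

text = 'The rain in Spain 2024'

at_start = re.search(r'\AThe', text)     # \A: anchored at the start of the string
at_end = re.search(r'2024\Z', text)      # \Z: anchored at the end of the string
word_start = re.findall(r'\bain', text)  # 'ain' at the beginning of a word
word_end = re.findall(r'ain\b', text)    # 'ain' at the end of a word
digits = re.findall(r'\d', text)         # every digit character
spaces = re.findall(r'\s', text)         # every white space character
words = re.findall(r'\w+', text)         # runs of word characters

print(bool(at_start), bool(at_end))  # True True
print(word_start)                    # []
print(word_end)                      # ['ain', 'ain']
print(digits)                        # ['2', '0', '2', '4']
print(len(spaces))                   # 4
print(words)                         # ['The', 'rain', 'in', 'Spain', '2024']
```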

by Goeduhub's Expert (3.1k points)

Uses of Regular Expressions

  1. Finding strings in large amounts of data.
  2. Checking email addresses (whether they are correctly formatted or not).
  3. Extracting useful data (in the above example, names and ages).
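As a sketch of the third use, capture groups can pull each name together with its age from the sample sentence used in the first answer. The pattern assumes every record has the form "Name is NN", and the dictionary name ages is illustrative.

```python
import re

text = 'Rama is 22 and Sita is 23 and Ravan is 44 '

# With two groups in the pattern, findall() returns a list of (name, age) tuples.
pairs = re.findall(r'([A-Z][a-z]*) is (\d{1,3})', text)
print(pairs)  # [('Rama', '22'), ('Sita', '23'), ('Ravan', '44')]

# Convert the captured age strings to integers for further use.
ages = {name: int(age) for name, age in pairs}
print(ages)   # {'Rama': 22, 'Sita': 23, 'Ravan': 44}
```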

Some Basic Examples 

Checking Name is correct or not

Validity: Let's assume that a name containing white space is a valid name, and a name without white space is not valid.

Now let's check a valid and an invalid name.

A name with white space

# full name validation
import re
pn = "Goeduhub technologies"
# \w matches any word character [a-zA-Z0-9_]
# {1,21} means 1 to 21 repetitions of the preceding character class
# \s matches any whitespace character
x = re.search(r"\w{1,21}\s\w{1,21}", pn)
if x:
    print("valid Name")
else:
    print("Not")

Output: valid Name

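A name without white space

# full name validation
import re
pn = "Goeduhubtechnologies"   # no white space between the two words
# the same pattern as before: \s requires a whitespace character
x = re.search(r"\w{1,21}\s\w{1,21}", pn)
if x:
    print("valid Name")
else:
    print("Not")

Output: Not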

Checking if phone number is correct or not

Validity: ten digits in total, in the format 3-4-3.

# phone number validation
import re
pn = "123-3456-234"
# \d{3} means exactly three digits, \d{4} exactly four digits
x = re.search(r"\d{3}-\d{4}-\d{3}", pn)
if x:
    print("valid number")
else:
    print("Not")

Output: valid number

Note: In both examples we first decided on a condition of validity and then, based on that condition, searched the string to check it.

Why did we use \d instead of \w in the second example?

\d accepts digits only, so a valid number looks like 123-2345-789.

\w also accepts letters and the underscore, so with \w both 213-74fg-87d and 112-6473-977 would be valid.
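One caveat worth sketching: re.search() accepts a string as soon as the pattern occurs anywhere inside it, so extra characters around the number go unnoticed. A stricter check, offered here as an assumption about what "valid" should mean rather than as the answer's own method, uses fullmatch(), which requires the whole string to fit the pattern. The helper name is_valid_phone is hypothetical.

```python
import re

PHONE = re.compile(r'\d{3}-\d{4}-\d{3}')

def is_valid_phone(number):
    """Return True only if the entire string has the 3-4-3 digit format."""
    return PHONE.fullmatch(number) is not None

print(is_valid_phone('123-3456-234'))    # True
print(is_valid_phone('213-74fg-87d'))    # False: letters are not digits
print(is_valid_phone('x123-3456-2349'))  # False: extra characters around it

# search() alone would accept the last string, because the pattern occurs inside it.
loose = re.search(r'\d{3}-\d{4}-\d{3}', 'x123-3456-2349')
print(bool(loose))                       # True

# With \w in place of \d, the letter-containing number is accepted as well.
word_ok = re.fullmatch(r'\w{3}-\w{4}-\w{3}', '213-74fg-87d')
print(bool(word_ok))                     # True
```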


...