in Python Programming by Goeduhub's Expert (3.1k points)
Unicodedata (Database) in Python.

1 Answer

by Goeduhub's Expert (3.1k points)

Unicode Character Database Official Documentation 

Unicode Character Database: 

The Unicode Character Database (UCD) consists of a number of data files listing Unicode character properties and related data. 

In simple terms, it defines the character properties of every Unicode character.

What are Unicode Characters:

A machine cannot understand human languages directly; it only stores numbers. The text of any language, however, is made up of letters, digits, symbols and other marks.

So, in simple terms, Unicode assigns every such character a unique number (its code point), which gives symbols and characters a form that a machine can store and process.
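As a small illustration of the idea above, the sketch below uses only Python built-ins (ord, chr, str.encode) to show the number behind a character and the bytes it becomes. The sample character is an arbitrary choice, not something taken from the original answer.

```python
# Every character is identified by an integer code point.
ch = '\u00e9'                      # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(ch)               # character -> number
print(code_point, hex(code_point))

# chr() goes the other way: number -> character.
round_trip = chr(code_point)
print(round_trip == ch)

# Encoding turns the code point into bytes a machine can store or send.
encoded = ch.encode('utf-8')
print(encoded)
print(encoded.decode('utf-8') == ch)
```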


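In Python we use the unicodedata library to access the Unicode Character Database.

unicodedata is part of the Python standard library, so no installation is needed. The separate unicodedata2 package on PyPI is an optional drop-in replacement that ships newer Unicode data:

pip install unicodedata2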
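To check which version of the Unicode database your interpreter bundles, the standard-library module exposes a unidata_version string. A minimal sketch (the variable name ucd_version is only illustrative):

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter,
# given as a dotted string; the exact value depends on the Python release.
ucd_version = unicodedata.unidata_version
print(ucd_version)
```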


#lookup function unicodedata

import unicodedata

# lookup() returns the character that has the given Unicode name
print(unicodedata.lookup('LEFT CURLY BRACKET'))   # {
print(unicodedata.lookup('RIGHT CURLY BRACKET'))  # }
print(unicodedata.lookup('ASTERISK'))             # *
print(unicodedata.lookup('HYPHEN'))               # ‐ (U+2010, not the ASCII '-')
print(unicodedata.lookup('HIGH VOLTAGE SIGN'))    # ⚡



Note: Here we used the lookup function, which returns the character for the Unicode name passed to it.

If we pass a name that does not exist, for example hyph instead of HYPHEN, lookup raises a KeyError.
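One way to handle an unknown name is to catch the KeyError and fall back to a default. The helper below, safe_lookup, is a hypothetical name introduced for this sketch and is not part of unicodedata.

```python
import unicodedata

def safe_lookup(name, default=None):
    """Return the character called `name`, or `default` if no such name exists."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # Raised by unicodedata.lookup() when the name is not in the database.
        return default

print(safe_lookup('HYPHEN-MINUS'))   # the ASCII '-' character
print(safe_lookup('hyph'))           # falls back to None
print(safe_lookup('hyph', '?'))      # falls back to the given default
```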

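#name function unicodedata

import unicodedata

# name() returns the Unicode name assigned to a character
print(unicodedata.name('^'))  # CIRCUMFLEX ACCENT
print(unicodedata.name('|'))  # VERTICAL LINE
print(unicodedata.name(':'))  # COLON
print(unicodedata.name('&'))  # AMPERSAND
print(unicodedata.name('@'))  # COMMERCIAL AT
print(unicodedata.name('`'))  # GRAVE ACCENT
print(unicodedata.name(' '))  # SPACE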



Note: The name function in unicodedata returns the name assigned to the character passed to it, as the code above shows. It is the inverse of lookup.
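Some characters, such as control characters, have no name; name() then raises a ValueError unless a default is given as the second argument. A short sketch, where the '<no name>' placeholder is an arbitrary choice:

```python
import unicodedata

# A second argument to name() is returned instead of raising ValueError
# when the character has no assigned name (e.g. the newline control character).
names = {ch: unicodedata.name(ch, '<no name>') for ch in ['@', '\n']}
for ch, nm in names.items():
    print(repr(ch), nm)

# For a named character, lookup(name(ch)) gives the character back.
same = unicodedata.lookup(unicodedata.name('@')) == '@'
print(same)
```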

#Category function unicodedata

import unicodedata

# category() returns the two-letter general category of a character
print(unicodedata.category('&'))  # Po
print(unicodedata.category('9'))  # Nd
print(unicodedata.category('a'))  # Ll
print(unicodedata.category('A'))  # Lu



Note: The category function returns the general category assigned to the character. In the above example, "&" belongs to the punctuation category, "9" belongs to the number category, and "A" and "a" belong to the letter category. In the output, the first letter is the major class and the second is the subclass: Ll = L (Letter) and l (lowercase), Lu = L (Letter) and u (uppercase), Po = P (Punctuation) and o (other), Nd = N (Number) and d (decimal digit).
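Because category() works one character at a time, a common use is to tally the categories in a string. The sketch below does this with collections.Counter; the sample text is made up for illustration.

```python
import unicodedata
from collections import Counter

text = 'Hello, World 42!'

# Count how many characters fall into each general category.
counts = Counter(unicodedata.category(ch) for ch in text)
for cat, n in sorted(counts.items()):
    print(cat, n)

# The first letter of the category is the major class, so this keeps letters only.
letters = ''.join(ch for ch in text if unicodedata.category(ch).startswith('L'))
print(letters)
```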

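#normalize function unicodedata

from unicodedata import normalize

# ascii() shows the escaped code points, so the two forms can be told apart
print(ascii(normalize('NFD', '\u00C7')))   # 'C\u0327'  (decomposed)
print(ascii(normalize('NFC', 'C\u0327')))  # '\xc7'     (composed)
print(ascii(normalize('NFKD', '\u2460')))  # '1'        (compatibility form of CIRCLED DIGIT ONE)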



Note: normalize(form, unistr) returns the normal form of the Unicode string unistr. Valid values for form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.

The Unicode standard defines various normalization forms of a Unicode string, based on the definitions of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in more than one way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).

In the above example, the two forms therefore denote the same character, Ç (C-cedilla): NFD splits it into the two-code-point sequence, and NFC combines the sequence back into the single code point.
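A practical consequence is that two strings that look identical can compare unequal until both are normalized to the same form. The sketch below shows this, along with a commonly used way to drop accents by decomposing with NFD and discarding combining marks; strip_accents is a hypothetical helper name, not a unicodedata function.

```python
import unicodedata

composed = '\u00C7'       # single code point: C WITH CEDILLA
decomposed = 'C\u0327'    # C followed by COMBINING CEDILLA

# The raw strings differ, but they are equal once both are in NFC.
raw_equal = composed == decomposed
nfc_equal = (unicodedata.normalize('NFC', composed)
             == unicodedata.normalize('NFC', decomposed))
print(raw_equal, nfc_equal)

def strip_accents(s):
    """Decompose with NFD, then drop combining marks (combining class != 0)."""
    decomposed_s = unicodedata.normalize('NFD', s)
    return ''.join(ch for ch in decomposed_s if not unicodedata.combining(ch))

plain = strip_accents('\u00C7a va')
print(plain)
```

Normalizing both sides before comparing is usually what you want when matching user input against stored text.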

For a detailed study of the normalization forms and of the other functions in unicodedata, see the official Python documentation.
