Coronavirus
Coronaviruses are a group of related RNA viruses that cause diseases in mammals and birds. In humans, these viruses cause respiratory tract infections that can range from mild to lethal. Mild illnesses include some cases of the common cold (which is also caused by certain other viruses, predominantly rhinoviruses), while more lethal varieties can cause SARS, MERS, and COVID-19. SARS-CoV-2, the coronavirus that causes COVID-19, was first identified in the city of Wuhan, China.
Note
- Here we will build a simple machine learning model to predict whether you have a coronavirus infection (or the probability that you have one).
- The data used here is not official data; it was generated randomly.
- Because the data is not real, the model's predictions should not be expected to be correct.
- The goal is only to understand how machine learning can help with a problem like this.
- With official, accurate data, the same approach could be used to build a more reliable model.
Practical Implementation
Required Libraries
```python
# Importing required libraries
import numpy as np
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split  # also loads sklearn.model_selection
```
Note: Here we import the basic libraries needed to solve the problem: NumPy, pandas, and scikit-learn.
The data used here is neither accurate nor official, but if you want to practice, you can download it from here (Click here for data).
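If the download is unavailable, a random stand-in with the same shape can be generated locally. This is a minimal sketch under stated assumptions: apart from the `Probability` label column, the column names (`fever`, `age`, `symptom_1` to `symptom_5`) and their value ranges are hypothetical, guessed from the seven-value inputs used in the prediction step. The label is drawn independently of the features, so there is no real pattern to learn.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names: only the "Probability" label column is known
# from the tutorial; the other names and ranges are assumptions.
FEATURES = ["fever", "age"] + [f"symptom_{i}" for i in range(1, 6)]


def make_random_data(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Build a random stand-in for randomdata.csv: 7 features plus a 0/1 label."""
    rng = np.random.default_rng(seed)
    data = {
        "fever": rng.integers(98, 104, size=n_rows),  # temperature in °F, 98 to 103
        "age": rng.integers(1, 100, size=n_rows),     # age in years, 1 to 99
    }
    # The remaining features are yes/no symptoms encoded as 1/0
    for name in FEATURES[2:]:
        data[name] = rng.integers(0, 2, size=n_rows)
    df = pd.DataFrame(data, columns=FEATURES)
    # Label drawn at random, independent of the features:
    # 1 = infected, 0 = not infected
    df["Probability"] = rng.integers(0, 2, size=n_rows)
    return df


if __name__ == "__main__":
    df = make_random_data()
    print(df.head())
```

Saving the result with `df.to_csv("randomdata.csv", index=False)` lets the rest of the code below run unchanged.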
```python
# Reading the CSV file
Data = pd.read_csv("randomdata.csv")
Data.head()
```
Output
Note: As the output above shows, the data contains basic features related to coronavirus infection (e.g. fever, cold, age), and the last column, Probability, is the label (1 or 0), where 1 means infected and 0 means not infected.
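Because the label is 0 or 1, it is worth checking how many rows fall into each class: if one class dominates, a model that always predicts that class can still score a high accuracy. A minimal sketch on a hypothetical label column; with the tutorial's data, `Data["Probability"]` would take the place of `labels`.

```python
import pandas as pd

# Tiny hypothetical label column; in the tutorial this would be Data["Probability"]
labels = pd.Series([1, 0, 0, 1, 1, 0, 1, 1], name="Probability")

counts = labels.value_counts().sort_index()  # number of rows per class (0, then 1)
share_infected = labels.mean()               # fraction of rows labelled 1

print(counts)
print(f"Share labelled infected: {share_infected:.3f}")
```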
```python
# Information about the data
Data.info()
```
Note: We check the information about the data so that we can make any corrections it needs (null values, column types, etc.) and avoid problems when processing it further.
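The corrections mentioned above usually come down to two steps: fixing column types and handling missing values. This sketch uses a small hypothetical frame with one missing value and a numeric column stored as text; dropping incomplete rows is only one option (filling them, for example with the column median, is another).

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with problems that Data.info() can reveal:
# a missing value and a numeric column read in as strings.
df = pd.DataFrame({
    "fever": [98, 101, np.nan, 99],
    "age": ["20", "45", "33", "61"],
    "Probability": [0, 1, 1, 0],
})

print(df.isnull().sum())  # missing values per column
print(df.dtypes)          # column types

# Fix 1: convert text columns to numbers (invalid entries become NaN)
df["age"] = pd.to_numeric(df["age"], errors="coerce")
# Fix 2: drop rows that still contain missing values
df = df.dropna().reset_index(drop=True)

df.info()  # prints the cleaned summary
```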
```python
# Defining our target (Y) and features (X)
X = Data.drop('Probability', axis=1)
print(X.head())
print("data in Y")
Y = Data['Probability']
Y.head()
```
Output
Note: In this section we define our target (Y) and features (X). Our goal is to predict the infection probability from the features, so we separate the Probability column (Y) from the remaining feature columns (X).
```python
# Splitting train and test data
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=5)
```
Note: In this section we use the train_test_split function to split the data into a training set and a test set: 33% of the rows are held out for testing, and random_state=5 makes the split reproducible.
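On a small dataset, a random split can leave the test set with a noticeably different share of infected rows than the training set. `train_test_split` accepts a `stratify` argument that keeps the class proportions equal in both splits; the original code does not use it, so treat this as an optional refinement. The data below is hypothetical (30 rows, 7 binary features, balanced labels).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 30 rows, 7 binary features, 15 zeros and 15 ones
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(30, 7))
Y = np.array([0, 1] * 15)

# Same settings as the tutorial, plus stratify=Y so both splits keep
# the same share of infected (1) and non-infected (0) rows
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=5, stratify=Y
)

print(len(X_train), len(X_test))      # 33% of 30 rows, rounded up, go to the test set
print(Y_train.mean(), Y_test.mean())  # share of 1s in each split
```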
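```python
# Converting into NumPy arrays (to_numpy() returns a new array, so we keep the result)
X_train = X_train.to_numpy()
Y_train = Y_train.to_numpy()
X_test = X_test.to_numpy()
Y_test = Y_test.to_numpy()
print(X_train)
```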
Output
```python
# Importing the logistic regression model
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
# Training the model (fit returns the fitted model itself, not predictions)
clf.fit(X_train, Y_train)
```
Note: In this section we import the logistic regression model and train it with the fit function.
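Logistic regression models the probability of infection as the sigmoid of a weighted sum of the features, p = 1 / (1 + e^-(w·x + b)); `fit` learns the weights w (`coef_`) and the intercept b (`intercept_`), and `predict` returns 1 when p > 0.5. This minimal sketch, on a hypothetical dataset of three yes/no symptoms, recomputes `predict_proba` by hand from the learned values to show what the fitted model computes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: three yes/no symptoms per row, label 1 = infected
X = np.array([
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
])
y = np.array([0, 0, 0, 1, 1, 1, 1, 0])

clf = LogisticRegression()
clf.fit(X, y)  # learns coef_ (weights) and intercept_ (bias)

w, b = clf.coef_[0], clf.intercept_[0]


def infection_probability(x: np.ndarray) -> np.ndarray:
    """Sigmoid of the linear score w·x + b, i.e. P(label = 1)."""
    z = x @ w + b
    return 1.0 / (1.0 + np.exp(-z))


manual = infection_probability(X)
from_sklearn = clf.predict_proba(X)[:, 1]  # column 1 = probability of class 1

print(np.round(manual, 3))
print(np.allclose(manual, from_sklearn))
```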
```python
# Predicting with the model (each inner list is one example patient)
# Infection prediction (0 or 1)
infection = clf.predict([[98, 20, 0, 1, 0, 0, 0]])
# Infection probability prediction: [P(no infection), P(infection)]
infection_probability = clf.predict_proba([[98, 20, 0, 0, 0, 0, 1]])
print(infection)
print(infection_probability)
```
Output
Note
- In this part, we use the trained model to predict both the infection status and the infection probability.
- As the output shows, there are two kinds of result: the first is a direct prediction (1 or 0), while the second gives the probability of each class, including the probability of infection.
- We used logistic regression here because the target is categorical (0 or 1), but you can use other models if you like. Which model to choose depends on how accurately each one performs on the data.
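One common way to compare candidate models by accuracy is k-fold cross-validation, which trains and scores each model on several different splits instead of a single one. This is a minimal sketch on hypothetical data (7 binary features, with a label that depends on the first two); the decision tree and k-nearest-neighbours models are swapped in only as examples of the "other models" mentioned above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 7 binary features; label is 1 if feature 0 or feature 1 is set
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(120, 7))
y = (X[:, 0] + X[:, 1] >= 1).astype(int)

candidates = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
}

# 5-fold cross-validated accuracy: each model is trained and scored five times
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")

best = max(scores, key=scores.get)
print("Best model:", best)
```

Averaging over folds makes the comparison less sensitive to one lucky or unlucky split, which matters most on small datasets like this one.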