Stock Prediction using Machine Learning and Python

Machine learning has significant applications in stock price prediction. In this machine learning project, we will talk about predicting the returns on stocks. This is a very complex task with a lot of uncertainty. We will learn how to predict a stock price using an LSTM neural network.

STOCK MARKET PREDICTION

Stock market prediction is the act of trying to determine the future value of a company stock or other financial instrument traded on an exchange. The successful prediction of a stock's future price could yield significant profit. A stock is an equity investment that represents part ownership in a corporation or company; it entitles you to part of that company's earnings and assets.

DATASET

The historical stock data is Google's stock price history, and this historical data is used to predict future stock prices. To build the stock market prediction model, we will use the Google Stock Price Train dataset.

 # importing the libraries
 import numpy as np
 import matplotlib.pyplot as plt
 import pandas as pd
 import datetime

Next, I'm going to read the dataset.
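
A minimal sketch of that read step, assuming the same `pd.read_csv` arguments the post uses later for the test data (`index_col="Date"`, `parse_dates=True`). The three rows are made-up stand-ins for `Google_Stock_Price_Train.csv`, so the snippet runs without the real file:

```python
import io

import pandas as pd

# Made-up sample rows standing in for Google_Stock_Price_Train.csv (assumption).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    '1/3/2012,100.0,105.0,99.0,104.0,"1,000,000"\n'
    '1/4/2012,104.0,106.0,102.0,103.0,"1,200,000"\n'
    '1/5/2012,103.0,104.0,100.0,101.0,"900,000"\n'
)

# With the real file, pass its path instead of sample_csv.
dataset = pd.read_csv(sample_csv, index_col="Date", parse_dates=True)

print(dataset.head())  # first rows (up to five)
print(dataset.tail())  # last rows (up to five)
```

Using the date column as the index makes the later plots show dates on the x-axis.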

Since I've used dataset.head() here, you can see the top five rows. If I had used dataset.tail() instead, you would see the bottom five.

Next, I'll check whether any of the data is missing. isna() detects missing (NA) values and returns a boolean object of the same size indicating which values are missing; any() then reports, for each column, whether it contains at least one missing value.

 dataset.isna().any()

OUTPUT

 Open      False
 High      False
 Low       False
 Close     False
 Volume    False
 dtype: bool
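
To see what the two calls do separately, here is a small sketch on a hypothetical frame with one deliberately missing value (the frame and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in 'Close'.
frame = pd.DataFrame({"Open": [1.0, 2.0, 3.0], "Close": [1.5, np.nan, 3.5]})

mask = frame.isna()       # same shape as frame, True where a value is missing
has_missing = mask.any()  # one boolean per column

print(mask)
print(has_missing)
```

A column that reports True here would need cleaning (dropping or filling rows) before training.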

Next we plot how the stock price moved from 2012 to 2017. If I'm not wrong, Google's stock price rose by almost 85% between 2014 and 2017, going from about $820 to $1519 in three years.

 dataset['Open'].plot(figsize=(16,6))

OUTPUT

The next thing we're interested in is the 7-day rolling mean of the stock price: for every date, we look 7 days back, collect all the values that fall in that range, and take the average of the column. Luckily, this is extremely easy to achieve with pandas.

Now compare the previous graph with the rolling mean. The code below overlays the 30-day moving average of the closing price on the opening price.

 dataset['Open'].plot(figsize=(16,6))
 dataset.rolling(window=30).mean()['Close'].plot()
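
The rolling mean is easy to verify by hand on a short series. A minimal sketch with made-up prices and a 7-day window:

```python
import pandas as pd

# Made-up prices for eight consecutive days.
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0])

# Each value is the mean of the current day and the six days before it.
rolling_7 = prices.rolling(window=7).mean()

print(rolling_7)
```

The first six entries are NaN because a full 7-day window is not yet available; the same happens for the first 29 entries with `window=30`.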

Let's plot the Close column against its 30-day moving average.

 dataset['Close: 30 Day Mean'] = dataset['Close'].rolling(window=30).mean()
 dataset[['Close','Close: 30 Day Mean']].plot(figsize=(16,6))

There is also the option of specifying a minimum number of periods. The example below uses an expanding window, which averages all values up to each date.

 # Optional: specify a minimum number of periods
 dataset['Close'].expanding(min_periods=1).mean().plot(figsize=(16,6))
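
A small sketch of what `min_periods` changes, using made-up prices. It compares the expanding mean with a rolling mean that is allowed to start on partial windows:

```python
import pandas as pd

# Made-up prices.
prices = pd.Series([10.0, 20.0, 30.0, 40.0])

# Expanding window: mean of everything seen so far.
expanding_mean = prices.expanding(min_periods=1).mean()

# Rolling window of 3 that produces a value as soon as 1 observation exists.
rolling_partial = prices.rolling(window=3, min_periods=1).mean()

print(expanding_mean.tolist())
print(rolling_partial.tolist())
```

With `min_periods=1` neither series starts with NaN; they differ only once the rolling window begins to drop old values.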

With that, we create the training set: a DataFrame holding the Open column of the dataset.

 training_set = dataset['Open']
 training_set = pd.DataFrame(training_set)

DATA PREPROCESSING: The pre-processing stage involves data discretization, data transformation, data cleaning and data integration. After the dataset is transformed into a clean dataset, it is divided into training and testing sets for evaluation.

We start by cleaning the data, doing the same thing as before: checking whether there are any missing values.

 # Data cleaning
 dataset.isna().any()

Then we move on to feature scaling, for which we import MinMaxScaler from sklearn (scikit-learn, a machine learning library for Python). MinMaxScaler transforms features by scaling each of them to a given range, here 0 to 1.

 # Feature scaling
 from sklearn.preprocessing import MinMaxScaler
 sc = MinMaxScaler(feature_range = (0,1))
 training_set_scaled = sc.fit_transform(training_set)
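
To make the arithmetic visible, here is a NumPy sketch of the min-max formula that this kind of scaling applies per column, together with its inverse. The helper names `min_max_scale` and `min_max_inverse` are hypothetical, the prices are made up, and the sketch assumes no column is constant (which would divide by zero):

```python
import numpy as np


def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Scale each column of `values` linearly into feature_range."""
    lo, hi = feature_range
    col_min = values.min(axis=0)
    col_max = values.max(axis=0)
    scaled = (values - col_min) / (col_max - col_min) * (hi - lo) + lo
    return scaled, col_min, col_max


def min_max_inverse(scaled, col_min, col_max, feature_range=(0.0, 1.0)):
    """Undo min_max_scale, mapping scaled values back to the original units."""
    lo, hi = feature_range
    return (scaled - lo) / (hi - lo) * (col_max - col_min) + col_min


opens = np.array([[100.0], [150.0], [200.0]])  # made-up opening prices
scaled, col_min, col_max = min_max_scale(opens)
restored = min_max_inverse(scaled, col_min, col_max)

print(scaled.ravel())
print(restored.ravel())
```

The inverse step matters later: predictions come out in the scaled range and have to be mapped back to prices before they can be compared with real values.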

Finally, we create a data structure with 60 timesteps and 1 output. We take the data from day 1 to day 60 and predict day 61, then take the data from day 2 to day 61 and predict day 62, and so on.

 # Creating a data structure with 60 timesteps and 1 output
 x_train = []
 y_train = []
 for i in range(60, 1258):
     x_train.append(training_set_scaled[i-60:i, 0])
     y_train.append(training_set_scaled[i, 0])
 x_train, y_train = np.array(x_train), np.array(y_train)
 # Reshaping to (samples, timesteps, features), the 3D input an LSTM layer expects
 x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
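
The same windowing can be checked on a toy series where the result is easy to read. A minimal sketch, with a hypothetical helper `make_windows` and 3 timesteps instead of 60:

```python
import numpy as np


def make_windows(series, timesteps):
    """Return (x, y): each x row holds `timesteps` past values, y is the next value."""
    x, y = [], []
    for i in range(timesteps, len(series)):
        x.append(series[i - timesteps:i])
        y.append(series[i])
    x = np.array(x)
    y = np.array(y)
    # Add a trailing feature axis: (samples, timesteps, 1).
    return x.reshape(x.shape[0], x.shape[1], 1), y


series = np.arange(10, dtype=float)  # stand-in for the scaled prices
x_toy, y_toy = make_windows(series, timesteps=3)

print(x_toy.shape, y_toy.shape)
print(x_toy[0].ravel(), y_toy[0])
```

A series of length n yields n minus `timesteps` samples, which is why the loop in the post starts at 60.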

FEATURE EXTRACTION: In this stage, only the features that are to be fed to the neural network are chosen.

 # Part 2 - Building the RNN
 # Importing the Keras libraries and packages
 from keras.models import Sequential
 from keras.layers import Dense
 from keras.layers import LSTM
 from keras.layers import Dropout

Next we initialise the RNN. Predicting a price from a time series is a regression problem on sequential data, so we create a Keras Sequential model and assign it to a variable called regressor.

 # Initialising the RNN
 regressor = Sequential()

TRAINING NEURAL NETWORK: In this stage, the data is fed to the neural network, whose weights and biases start out random and are then trained for prediction. The LSTM model is composed of a sequential input layer followed by three LSTM layers, a dense layer with activation, and finally a dense output layer with a linear activation function.
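
The post does not show the layers being added to the model. A minimal sketch of the architecture described above, assuming the Keras `Sequential` API. The unit count (50), the dropout rate (0.2), and the size and activation of the intermediate dense layer (25, ReLU) are assumptions rather than values from the post, and `build_regressor` is a hypothetical helper name:

```python
from keras.layers import LSTM, Dense, Dropout, Input
from keras.models import Sequential


def build_regressor(timesteps=60, n_features=1, units=50, dropout=0.2):
    """Three stacked LSTM layers, a dense layer, and a linear output (sizes assumed)."""
    model = Sequential()
    model.add(Input(shape=(timesteps, n_features)))
    # The first two LSTM layers return full sequences so the next LSTM can consume them.
    model.add(LSTM(units, return_sequences=True))
    model.add(Dropout(dropout))
    model.add(LSTM(units, return_sequences=True))
    model.add(Dropout(dropout))
    # The last LSTM layer returns only its final state.
    model.add(LSTM(units))
    model.add(Dropout(dropout))
    model.add(Dense(25, activation="relu"))
    # Dense uses a linear activation by default, which suits a regression output.
    model.add(Dense(1))
    return model


regressor_sketch = build_regressor()
regressor_sketch.summary()
```

The input shape `(60, 1)` matches the reshaped training data: 60 timesteps with one feature each.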

Next, we compile the RNN and fit it to the training set.

 # Compiling the RNN
 regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
 # Fitting the RNN to the training set
 regressor.fit(x_train, y_train, epochs = 100, batch_size = 32)

VISUALIZATION: A rolling analysis of a time series model is often used to assess the model's stability over time. When analyzing financial time series data using a statistical model, a key assumption is that the parameters of the model are constant over time.

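 # Part 3 - Making the prediction and visualising the results
 # Getting the real stock price of 2017
 # Assumed file name for the 20-row 2017 test data: Google_Stock_Price_Test.csv
 dataset_test = pd.read_csv("Google_Stock_Price_Test.csv", index_col="Date", parse_dates=True)
 # 'Open' is column 0 once Date is used as the index
 real_stock_price = dataset_test.iloc[:, 0:1].values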

Here again we read the test set and put the Open column in a DataFrame.

 dataset_test["Volume"] = dataset_test["Volume"].str.replace(',', '').astype(float)
 test_set = dataset_test['Open']
 test_set = pd.DataFrame(test_set)

Finally, to get the predicted stock price for 2017, we concatenate the training set and the test set along axis 0, so that each test day has its 60 preceding prices available.

 # Getting the predicted stock price of 2017
 dataset_total = pd.concat((dataset['Open'], dataset_test['Open']), axis = 0)
 inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
 inputs = inputs.reshape(-1,1)
 inputs = sc.transform(inputs)
 x_test = []
 for i in range(60,80):
     x_test.append(inputs[i-60:i, 0])
 x_test = np.array(x_test)
 x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
 predicted_stock_price = regressor.predict(x_test)
 predicted_stock_price = sc.inverse_transform(predicted_stock_price)
 predicted_stock_price = pd.DataFrame(predicted_stock_price)
 predicted_stock_price.info()

OUTPUT

 RangeIndex: 20 entries, 0 to 19
 Data columns (total 1 columns):
  #   Column  Non-Null Count  Dtype
 ---  ------  --------------  -----
  0   0       20 non-null     float32
 dtypes: float32(1)
 memory usage: 208.0 bytes
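
The slice that builds `inputs` keeps the last 60 training prices plus every test price, which is what yields exactly one window per test day. A sketch of the same index arithmetic on toy numbers, with 3 timesteps, 10 stand-in training values and 4 stand-in test values (all made up):

```python
import numpy as np

timesteps = 3
train = np.arange(0, 10, dtype=float)  # stand-in for the training prices
test = np.arange(10, 14, dtype=float)  # stand-in for the test prices

total = np.concatenate([train, test])
# Keep the last `timesteps` training values plus all test values.
inputs_toy = total[len(total) - len(test) - timesteps:]

# One window of `timesteps` past values for each test value.
x_toy_test = np.array(
    [inputs_toy[i - timesteps:i] for i in range(timesteps, len(inputs_toy))]
)
x_toy_test = x_toy_test.reshape(x_toy_test.shape[0], x_toy_test.shape[1], 1)

print(inputs_toy)
print(x_toy_test.shape)
```

In the post the loop runs over `range(60, 80)` because the test set has 20 rows, giving the 20 predictions listed in the output above.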

Finally, we use matplotlib to plot the predicted stock price against the real stock price.
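
The plotting code is not included in the post. A minimal sketch of such a plot, assuming two equal-length arrays of real and predicted prices; the helper name `plot_prices`, the colours, the labels, and the data below are placeholders for illustration, not results from the model:

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_prices(real, predicted):
    """Draw real and predicted prices on one set of axes and return the axes."""
    fig, ax = plt.subplots(figsize=(16, 6))
    ax.plot(real, color="red", label="Real Google Stock Price")
    ax.plot(predicted, color="blue", label="Predicted Google Stock Price")
    ax.set_title("Google Stock Price Prediction")
    ax.set_xlabel("Time")
    ax.set_ylabel("Google Stock Price")
    ax.legend()
    return ax


# Placeholder data: 20 made-up values, not model output.
real_toy = np.linspace(780.0, 800.0, 20)
predicted_toy = real_toy + 2.0

ax = plot_prices(real_toy, predicted_toy)
```

With the variables from the post, the call would be `plot_prices(real_stock_price, predicted_stock_price)` followed by `plt.show()`.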