24x5 AI Stock Trading agent to predict stock prices | Live trading

Shyam BV
Towards Data Science
9 min read · Apr 18, 2020


Photo by Jordan Whitfield on Unsplash

If you have followed the stock market recently, you will have noticed the wild swings caused by COVID-19. The market goes up one day and down the next, swings an AI might be able to predict. Wouldn't it be wonderful to have a stock trading agent with AI powers that buys and sells stocks without you having to monitor it hour by hour?

So I decided to create a bot to trade. You have probably seen many models that read from CSVs and build a neural network, an LSTM, or a deep reinforcement learning (DRL) model. However, those models usually end up stuck in a sandbox environment and are tricky to use live. So I created an AI pipeline that trades in real time. After all, who does not want to make money in the stock market? Let's get started. Below is the process we are going to follow to implement it.

  1. Alpaca Broker account
  2. Alpaca python package for API trading
  3. Collect data, EDA, feature engineering
  4. AI model and Training
  5. AWS Cloud to host code and get predictions
  6. Create a lambda function and API
  7. Trade stocks automatically
Workflow

Alpaca Broker account:

Currently, most brokerage firms offer zero trading fees. However, not all of them offer an API for trading. Alpaca provides commission-free trading with a Python API. Once you create an account, you have both paper trading and live trading options. We can test strategies in paper trading and then move them to live trading; switching is just a matter of changing the API keys.

Alpaca python package for API trading:

If you have a local environment, you can install the pip package (alpaca-trade-api). Once installed, you can select paper trading or live trading.

Based on your selection, you get an API key and a secret key.

Now, these keys will be used in our code.

import alpaca_trade_api as tradeapi

api = tradeapi.REST('xxxxxxxx', 'xxxxxxxxxx',
                    base_url='https://paper-api.alpaca.markets',
                    api_version='v2')

Collect data and transform

Getting Data

One advantage of using Alpaca is that you can pull historical data through the Polygon API. The timeframe can be minute, hour, day, and so on. Once you build the data frame, the chart should look something like this.
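For example, daily bars for a symbol can be pulled into a DataFrame with the package's get_barset helper (the Polygon endpoints exposed through the same package work similarly; the symbol and bar count below are just examples):

# Fetch up to 1,000 daily bars for a symbol into a pandas DataFrame.
barset = api.get_barset('AAPL', 'day', limit=1000)
df = barset.df['AAPL']   # columns: open, high, low, close, volume
print(df.tail())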

Feature Engineering

Like any data science project, we need to create features from the dataset. Part of the implementation is adapted from this article. I built around 430+ technical indicators from the above dataset, including momentum, trend, volatility, and RSI features.
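As an illustration of the idea (using the open-source ta package, which is an assumption on my part and produces far fewer than the 430+ features mentioned above), indicators can be appended to the OHLCV frame in one call:

import ta   # pip install ta -- one of several technical-analysis libraries

# Append momentum, trend, volatility, and volume indicators to the bars.
features = ta.add_all_ta_features(
    df.copy(),
    open="open", high="high", low="low", close="close", volume="volume",
    fillna=True,
)
print(features.shape)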

Features are created for each day; they can just as easily be created on an hourly or any other timeframe. For some of the models we are going to build, such as LSTM or DRL, we may also need the original, unengineered dataset.

Creating labels and features is where we encode the logic we want the model to learn. For now, I have used the logic from this paper.

However, the labeling logic can be altered to suit your needs, and if you are doing unsupervised learning, you don't need labels at all.
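The paper's exact rule is not reproduced here; purely as an illustration, a simple forward-return scheme with a hypothetical threshold could look like this:

import numpy as np

def create_labels(df, horizon=1, threshold=0.005):
    # Illustrative labeling only, not the paper's rule:
    # 0 = hold, 1 = buy, 2 = sell, based on the forward return over
    # `horizon` bars against a +/- `threshold` band.
    future_return = df['close'].shift(-horizon) / df['close'] - 1
    labels = np.where(future_return > threshold, 1,
             np.where(future_return < -threshold, 2, 0))
    out = df.copy()
    out['labels'] = labels
    return out.iloc[:-horizon]   # drop rows with no forward price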

Finally, the data needs to be scaled, since neural networks train better on scaled data. The first function below fits the scaler object on the training data, and the second one uses it to scale any dataset.

from sklearn.preprocessing import MinMaxScaler

# scale train and test data to [-1, 1]
def transform_scale(train):
    # fit the scaler on the training data only
    print(len(train.columns))
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train)
    return scaler

# scale a dataset with a previously fitted scaler
def scale(dataset, scaler):
    dataset = scaler.transform(dataset)
    print(dataset.shape)
    return dataset
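For example (train_df and test_df are placeholder names for the split feature frames), the scaler is fitted once on the training slice and then reused, so the test data never leaks into the fit:

# Fit on the training slice only, then reuse the same scaler everywhere.
scaler = transform_scale(train_df)
train_scaled = scale(train_df, scaler)
test_scaled = scale(test_df, scaler)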

Once we create the model, we have to prepare our data as a data loader. The function below does this.

import torch

def _get_train_data_loader(batch_size, train_data):
    print("Get train data loader.")
    train_X = torch.from_numpy(train_data.drop(['labels'], axis=1).values).float()
    train_Y = torch.from_numpy(train_data['labels'].values).float()
    train_ds = torch.utils.data.TensorDataset(train_X, train_Y)
    return torch.utils.data.DataLoader(train_ds, shuffle=False, batch_size=batch_size)

AI model

In this section, we are going to create different types of models. However, these models might not be perfect for a time series dataset. I wanted to show how to use a deep learning model with a complete pipeline.

Fully connected Deep NN

Here we will create a fully connected deep neural network. The model itself is not fancy, and I do not expect it to perform particularly well; it is also not an appropriate architecture for time series data. I am using it simply because it consumes all our features and keeps things simple.

However, we are starting with a basic model to complete our pipeline. In the next section, I will show how to create other types of models. Our model.py looks like this:

import torch.nn as nn
import torch.nn.functional as F

# define the fully connected network architecture
class Net(nn.Module):
    def __init__(self, hidden_dim, dropout=0.3):
        super(Net, self).__init__()
        # 427 input features
        self.fc1 = nn.Linear(427, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim * 2)
        self.fc3 = nn.Linear(hidden_dim * 2, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, 32)
        self.fc5 = nn.Linear(32, 3)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = self.dropout(F.relu(self.fc1(x)))
        out = self.dropout(F.relu(self.fc2(out)))
        out = self.dropout(F.relu(self.fc3(out)))
        out = self.dropout(F.relu(self.fc4(out)))
        out = self.fc5(out)
        return out

After creating the model and required transformations, we will create our training loop.
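The training loop itself lives in train.py. A minimal sketch of such a loop, using cross-entropy loss over the three classes and the Adam optimizer (not necessarily the exact hyperparameters used here), looks like this:

import torch
import torch.nn as nn
import torch.optim as optim

def train(model, train_loader, epochs, device):
    # Minimal sketch: cross-entropy over the three classes (hold/buy/sell),
    # Adam optimizer, average loss printed per epoch.
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for epoch in range(1, epochs + 1):
        total_loss = 0.0
        for batch_x, batch_y in train_loader:
            batch_x = batch_x.to(device)
            batch_y = batch_y.to(device).long()   # labels are stored as floats in the loader
            optimizer.zero_grad()
            loss = criterion(model(batch_x), batch_y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch}: loss {total_loss / len(train_loader):.4f}")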

We are going to train our model on AWS SageMaker. This step is completely optional: the model can be trained locally, and the resulting model file used for predictions. If you train in the cloud, the code below can be used for training.

You also need an AWS account with the Sagemaker setup. If you need more info or help, please check my previous article, Train a GAN and generate faces using AWS Sagemaker | PyTorch setup section.

Once you have all the required access, you can start fitting the model, as shown below. The command packages all the necessary code and data, spins up an EC2 instance with the required containers, and trains the model.

from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point="train.py",
                    source_dir="train",
                    role=role,
                    framework_version='1.0.0',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    hyperparameters={
                        'epochs': 2,
                        'hidden_dim': 32,
                    })
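The training job itself is launched with a fit call that points the estimator at the training data in S3 (the bucket path and channel name below are placeholders):

# Placeholder S3 path -- point this at the bucket holding the prepared training data.
estimator.fit({'training': 's3://your-bucket/stock-data/'})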

Once you train the model, all the corresponding files will be in your S3 bucket. If you train your model locally, make sure you have the files in the corresponding S3 bucket location.

AWS Cloud to host code and get predictions

As our next step, we will deploy the model in AWS SageMaker. When deploying a PyTorch model in SageMaker, you are expected to provide four functions that the SageMaker inference container will use.

  • model_fn: This function is the same function that we used in the training script, and it tells SageMaker how to load our model from the S3 bucket.
  • input_fn: This function receives the raw serialized input sent to the model's endpoint, and its job is to de-serialize and make the input available for the inference code. Here we are going to create new data on a daily or hourly basis from Alpaca API.
  • output_fn: This function takes the output of the inference code, and its job is to serialize this output and return it to the caller of the model's endpoint. This is where we will have our logic to trade.
  • predict_fn: The heart of the inference script, where the actual prediction is made on the underlying data. It has three possible outcomes: Buy, Sell, and Hold.

Below is the code to load the model and prepare the input data.
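As a rough sketch only (the file names, the joblib-saved scaler, and the build_features helper are placeholders rather than the exact script), model_fn and input_fn could look like this:

import os
import joblib
import torch
import alpaca_trade_api as tradeapi

from model import Net   # the network defined in model.py during training

def model_fn(model_dir):
    # Load the trained network and the fitted scaler from the model artifacts.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = Net(hidden_dim=32)
    model.load_state_dict(torch.load(os.path.join(model_dir, "model.pth"),
                                     map_location=device))
    model.to(device).eval()
    scaler = joblib.load(os.path.join(model_dir, "scaler.pkl"))   # placeholder file name
    return {"model": model, "scaler": scaler}

def input_fn(request_body, content_type="text/plain"):
    # The request body is the ticker symbol; pull recent bars from Alpaca
    # and rebuild the same features used at training time.
    symbol = request_body.decode() if isinstance(request_body, bytes) else request_body
    api = tradeapi.REST('xxxxxxxx', 'xxxxxxxxxx',
                        base_url='https://paper-api.alpaca.markets', api_version='v2')
    bars = api.get_barset(symbol.strip(), 'day', limit=100).df[symbol.strip()]
    return build_features(bars)   # hypothetical helper mirroring the training features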

Some points to be noted in the above code:

  1. The model and scaler object need to be in an S3 bucket.
  2. We fetch data for many days or hours; this is required for LSTM-type networks.
  3. Input content is the ticker symbol. We can tune the code for multiple symbols.

In the code section below, we will create the output and predict functions.
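Again as a rough sketch (the symbol, quantity, and API keys are placeholders; real order logic should also check buying power and open positions), predict_fn and output_fn could look like this:

import torch
import alpaca_trade_api as tradeapi

CLASSES = {0: "hold", 1: "buy", 2: "sell"}   # must match the training labels

def predict_fn(input_data, model_objects):
    # Scale the latest feature row and run it through the network.
    model, scaler = model_objects["model"], model_objects["scaler"]
    scaled = scaler.transform(input_data)
    x = torch.from_numpy(scaled[-1:]).float()   # most recent bar only
    with torch.no_grad():
        logits = model(x)
    return CLASSES[int(torch.argmax(logits, dim=1))]

def output_fn(prediction, accept="text/plain"):
    # Place the order through Alpaca and echo the decision back to the caller.
    api = tradeapi.REST('xxxxxxxx', 'xxxxxxxxxx',
                        base_url='https://paper-api.alpaca.markets', api_version='v2')
    if prediction in ("buy", "sell"):
        api.submit_order(symbol="AAPL", qty=1, side=prediction,
                         type="market", time_in_force="day")
    return prediction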

Some points to be noted in the above code:

  1. We have three classes: buy, sell, and hold. The prediction needs to be one of these three.
  2. We need to focus on what is predicted and returned.
  3. Trade only if there are enough funds (or limited funds) and in limited quantity.

Deployment is similar to training the model.

from sagemaker.predictor import RealTimePredictor
from sagemaker.pytorch import PyTorchModel

class StringPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(StringPredictor, self).__init__(endpoint_name, sagemaker_session,
                                              content_type='text/plain')

model = PyTorchModel(model_data=estimator.model_data,
                     role=role,
                     framework_version='1.0.0',
                     entry_point='predict.py',
                     source_dir='../serve',
                     predictor_cls=StringPredictor)

# Deploy the model to a cloud server
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.large')

If you want to test the model, you can execute the code below. If it runs, the workflow is working, and you can output predictions and trade stocks on the issued ticker.
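A quick smoke test might look like the sketch below (the ticker is just an example; predictor is the object returned by model.deploy above):

# Send a ticker to the deployed endpoint and print the decision.
print(predictor.predict('AAPL'))
# The endpoint name is needed later for the Lambda function.
print(predictor.endpoint)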

Prediction endpoint name

You can also get the endpoint name from the output shown in the screenshot above.

Create a Lambda function and API

Here we will complete the pipeline by creating the Lambda function and the API.

Create a Lambda function

Create a Lambda function in the AWS Lambda service. Remember to update the endpoint name from the screenshot above.
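A minimal sketch of such a handler (the endpoint name below is a placeholder) simply forwards the POST body, i.e. the ticker symbol, to the SageMaker endpoint and returns the decision:

import boto3

ENDPOINT_NAME = 'pytorch-inference-xxxx'   # placeholder: use your deployed endpoint name

runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    # Forward the ticker from the request body to the SageMaker endpoint.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='text/plain',
        Body=event['body'],
    )
    # Return the model's decision (buy / sell / hold) to the caller.
    return {
        'statusCode': 200,
        'body': response['Body'].read().decode('utf-8'),
    }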

API Gateway

From the AWS API Gateway service, create a REST API, give it a name, and create it.

Create a POST method and deploy the API from the Actions dropdown. Once deployed, the API is ready and can be used from any UI if required.

Finally, we have our REST endpoint to which we can send POST requests. The endpoint can be tested with Postman or any other tool. If you don't need an endpoint, you can schedule the Lambda function instead by following this link.
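For example, a POST request with the ticker as the body (the invoke URL below is a placeholder) should come back with buy, sell, or hold:

import requests

# Placeholder URL -- replace with the invoke URL shown in API Gateway.
url = 'https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/prod'
response = requests.post(url, data='AAPL')
print(response.text)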

Cheers! You can see the stock being bought and sold in the Alpaca portal. Live data is fed into the model, and its predictions drive the trades.

Photo by Colin Watts on Unsplash

Conclusion

We have trained a deep learning model and traded stocks using the output of the model in real-time. I still think there is room for improvement.

  1. I have used a deep neural network model here, but you don't strictly need one; you can plug your own logic into the same pipeline.
  2. Better feature engineering and selection of those features can be performed.
  3. Different model architectures (such as LSTM or DRL) should be tested for time-series datasets.
  4. Backtesting needs to be performed on the training data. I have not covered backtesting in this article.
  5. The model can be retrained at frequent intervals. AWS Sagemaker provides an option without much hassle.

If there is enough interest in this article, I will write a follow-up on adding real-time sentiment analysis for the particular stock, along with backtesting and other model architectures.

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Disclaimer

This article is purely informational. None of the content presented here constitutes a recommendation of any particular security. All trading strategies are used at your own risk.

Questions? Comments? Feel free to leave your feedback in the comments section.

Check out other articles

  1. 24x7 Live crypto trading | Buy Doge or any crypto using sentiment
  2. Make money using NFT + AI | GAN image generation

Get Code

Please subscribe to my newsletter to get the free working code for my articles and other updates.
