WE MADE A TOOL TO PREDICT THE NEXT GAMESTOP (AND YOU CAN TOO)
With a simple script and a free website, we created a custom sentiment analysis model to tell us which meme stock will head moonwards next. Here’s how we did it.
You’re going to need three things for this recipe:
Lexikat.com’s custom sentiment tool. I have a subscription to the site because I make a lot of these, but there’s also a free tier that you can sign up for without giving your details.
Some data. I just copied and pasted a few /r/wallstreetbets threads about GameStop into a Word document. If you want to use mine, it is here, but you can use data of your own and get results that are customised to that data.
A script for scraping Reddit and applying the sentiment model you’re going to make. The one I used is here, but you can capture more data if you have a Reddit API key.
The first step is to upload your data — in a Word or Excel file — to Lexikat.com.
The first thing you’ll see are some word clouds based on the key topics that /r/wallstreetbets has been talking about. These can also be used to make predictive models, but we’re not going to be using them today. Instead, you’ll want to click on the sentiment button:
This will make a custom sentiment analysis model based on your data, and tell you which words triggered it. In this example, positive sentiment is in the red circle and negative is green:
This basic model is ok, but it could be better. We’re only interested in positive sentiment in this case, because most of the negative sentiment is directed at those shorting the stock, rather than at the stock itself. Basically, it’s easier to simply ignore the negative sentiment than to try to sort signal from noise, but if you want to make a more complex model you can try including it.
The positive sentiment cloud is missing a few key items. The computer has failed to recognise some of the internet slang used in the thread (“tendies”, “yolo”, and “moon”, for example). We can improve the model by adding them in. Click on the red circle to add the words:
There are also some words that are there and shouldn’t be. In this context, “like” is being used to express a comparison rather than approval, so we can delete it from the model by clicking on the X button.
It doesn’t take much editing to get something that looks a lot more realistic:
Now it’s time to grab the info. Click the stats button to get a downloadable table:
You’ll get an Excel sheet that looks like this:
We don’t need the frequencies or percentages for this task, so we can get rid of them and just paste the positive words list into a .txt file.
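If you want to check that your word list reads back cleanly before running anything, here is a minimal sketch. It assumes the .txt file has one word or phrase per line; the function name `load_positive_words` and the file name in the usage note are my own placeholders, not something taken from the linked script.

```python
from pathlib import Path


def load_positive_words(path):
    """Return the positive words in a text file as a lowercase set.

    Assumes one word or phrase per line. Blank lines and surrounding
    whitespace (easy to pick up when pasting from Excel) are ignored.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}
```

Calling `load_positive_words("positive_words.txt")` should give you a set containing entries such as "tendies" and "moon" if you added them to the model earlier.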
Save this into the same folder you used when saving the script from the third item on the list above. If you’re using the script I linked to, you might want to change the list of default stocks included in the script. I’ve just listed a few common examples as placeholders, but it would be worth using a more complete list of company names and ticker symbols. You can find one here.
It might be worth explaining a bit more about how the script works at this point. A good meme stock isn’t identified by sentiment alone. It will also generate a high volume of threads. A stock that has good potential but is not a Reddit darling will have a high sentiment score, but only be the subject of one or two threads. For this reason, the system filters out all but the top five most frequently mentioned stocks of the past 24 hours. If a stock is popular but not viral, it’s not a meme stock.
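To make the filtering idea concrete, here is a rough sketch of how it could be done. This is not the linked script itself. It assumes you already have the last 24 hours of posts as a list of strings, and the `STOCKS` watchlist and all the function names are hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical placeholder watchlist: ticker -> lowercase names to look for.
STOCKS = {
    "GME": ["gme", "gamestop"],
    "AMC": ["amc"],
    "BB": ["bb", "blackberry"],
}


def tokenize(text):
    """Split text into lowercase word tokens, so "$GME" becomes "gme"."""
    return re.findall(r"[a-z0-9']+", text.lower())


def mentioned_tickers(post, stocks):
    """Return the set of tickers whose ticker or name appears in the post."""
    tokens = set(tokenize(post))
    return {ticker for ticker, names in stocks.items() if tokens & set(names)}


def top_mentioned(posts, stocks, n=5):
    """Count posts per ticker and keep the n most frequently mentioned."""
    counts = Counter()
    for post in posts:
        counts.update(mentioned_tickers(post, stocks))
    return [ticker for ticker, _ in counts.most_common(n)]


def score_stocks(posts, stocks, positive_words, n=5):
    """Total the positive-word hits per ticker, for the top n tickers only."""
    keep = set(top_mentioned(posts, stocks, n))
    scores = Counter()
    for post in posts:
        hits = sum(1 for token in tokenize(post) if token in positive_words)
        for ticker in mentioned_tickers(post, stocks) & keep:
            scores[ticker] += hits
    return dict(scores)
```

The volume filter runs before the scoring, so a stock with one glowing thread never makes it into the results, however positive that thread is.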
Now it’s time to run the script. For this you will need to have Python installed. If you don’t have it, you can get it here. You will also need to install Requests. If you’ve never run a script before, you can find instructions here (it’s easy, even I can do it). It will take a couple of minutes to process the data.
Once the processing is done, you’ll have a results file:
I made this early on the morning of the 28th (Singapore time). Unsurprisingly, $GME is far out-scoring every other stock, but BlackBerry and AMC are also popping, an indicator of other ongoing short squeezes. This output just shows the total level of positive sentiment for each stock, but the script can very easily be changed to, for example, compare positive and negative sentiment in each post about a stock, or give the average score per post. You can try out different options to work out which gives better predictions.
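As a starting point for those variations, here is a small sketch of two alternative scores: the average positive score per post, and positive minus negative. It assumes you have the posts about one stock as a list of strings, plus a negative word list if you decided to keep one. The function names are my own and are not taken from the linked script.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_hits(post, words):
    """Count how many tokens in the post appear in the given word set."""
    return sum(1 for token in tokenize(post) if token in words)


def average_positive(posts, positive_words):
    """Average number of positive-word hits per post (0.0 if no posts)."""
    if not posts:
        return 0.0
    return sum(count_hits(post, positive_words) for post in posts) / len(posts)


def net_sentiment(posts, positive_words, negative_words):
    """Total positive hits minus total negative hits across all posts."""
    positive = sum(count_hits(post, positive_words) for post in posts)
    negative = sum(count_hits(post, negative_words) for post in posts)
    return positive - negative
```

The average is less sensitive to sheer thread volume than the total, so it is worth comparing the two on the same day’s data before deciding which one to track.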
Once you’ve got the script working, you can do various things with it. I’ll probably set it to run daily and send the results to a Google Sheet. Then I’ll tell Google to email me whenever a stock achieves a score of more than 250 — that way I’ll get advance warnings of forthcoming spikes without needing to spend all day scrolling Reddit. What do you think? Can you improve on this? See you on the moon :)
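If you would rather keep everything local to begin with, the same idea can be sketched in a few lines of Python. This version appends each day’s scores to a CSV file as a stand-in for the Google Sheet and returns the stocks that pass the 250 mark. The file layout, function names and the `scores` dictionary (ticker to score) are assumptions of mine.

```python
import csv
from datetime import date

ALERT_THRESHOLD = 250  # the score mentioned above; tune it to your own data


def append_daily_scores(path, scores, day=None):
    """Append one row per stock (date, ticker, score) to a CSV log."""
    day = day or date.today().isoformat()
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for ticker, score in sorted(scores.items()):
            writer.writerow([day, ticker, score])


def stocks_to_flag(scores, threshold=ALERT_THRESHOLD):
    """Return tickers scoring strictly above the threshold, highest first."""
    flagged = [ticker for ticker, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda ticker: scores[ticker], reverse=True)
```

Anything returned by `stocks_to_flag` is what you would put in the email or notification; the CSV gives you a history to check whether the threshold is actually predicting anything.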