ChatGPT helped a simulated trading strategy return up to 512% by trading stocks based on news. Here's the prompt the researchers used, plus the positive and negative takeaways from their findings.

ChatGPT excelled at predicting a stock’s price direction based on news sentiment.
Researchers at the University of Florida used a single prompt to ask it to determine sentiment.
Its slightly better-than-even accuracy allowed it to generate gains over time.

Investor enthusiasm for artificial intelligence has boosted the S&P 500 by 18% this year, with mega-cap stocks exposed to the technology accounting for most of the index's gains.

Along with the excitement, there is apprehension about how AI can be further integrated into investing. For example, as the large language models (LLMs) that underpin services like ChatGPT improve, will they be able to mimic human reasoning well enough to replace stock pickers?

The Department of Finance at the University of Florida thought it would be interesting to see whether these models could understand financial markets without having been trained specifically on them. The researchers asked ChatGPT whether a piece of news was good or bad for a stock's price, then ran a simulation that bought or shorted the stock depending on whether the answer was positive or negative.

They found that, while ChatGPT excelled at predicting stock direction based on news sentiment, it had limitations. Generative AI services such as ChatGPT and Bard themselves explicitly advise users not to rely on them for financial advice and to do their own research.

The study, led by Alejandro Lopez-Lira, assistant professor of finance, and Yuehua Tang, associate professor at Emerson-Merrill Lynch, sought to determine whether ChatGPT could understand the impact of news on stock-market movements sufficiently to generate returns, and whether it was as competent as, or even better than, a human.

They fed it news about everything from dividend payouts to CEO announcements. The companies were chosen from the database of the Center for Research in Security Prices. To ensure that they only used relevant news, news headlines were scraped from the web and compared to those from data provider RavenPack.

They instructed ChatGPT to score each headline "1" for good news, "0" for unknown, and "-1" for bad news. Python code running on Linux then automatically bought stocks with a score of "1" and shorted those with a score of "-1"; no action was taken on a "0." ChatGPT correctly predicted the outcome 51% of the time. While that edge is slim, the returns accumulated over time and with trading frequency, according to Lopez-Lira.
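The buy, short, or skip rule can be sketched in Python. This is a minimal, hypothetical illustration, not the researchers' code: the function names, the equal weighting across positions, and the use of a next-day return are all assumptions, and the article does not say exactly how the study defined a "correct" prediction.

```python
from statistics import mean

# Score-to-action mapping described in the article:
# 1 = good news (buy), 0 = unknown (no trade), -1 = bad news (short).
ACTIONS = {1: "buy", 0: "no action", -1: "short"}


def action_for(score: int) -> str:
    """Translate a sentiment score into the trade the strategy takes."""
    if score not in ACTIONS:
        raise ValueError(f"unexpected score: {score!r}")
    return ACTIONS[score]


def long_short_return(signals: list[tuple[int, float]]) -> float:
    """Equal-weighted return across one day's trades (hypothetical weighting).

    Each tuple is (score, next-day stock return). A long position earns the
    stock's return, a short earns the negative of it, and score-0 stocks are
    skipped. Returns 0.0 when there is nothing to trade.
    """
    pnl = [score * ret for score, ret in signals if score != 0]
    return mean(pnl) if pnl else 0.0


def hit_rate(signals: list[tuple[int, float]]) -> float:
    """Share of traded signals whose direction matched the next-day move."""
    traded = [(score, ret) for score, ret in signals if score != 0]
    if not traded:
        return 0.0
    hits = sum(1 for score, ret in traded if score * ret > 0)
    return hits / len(traded)


# Hypothetical day: (score, next-day return) for four stocks.
day = [(1, 0.02), (-1, -0.01), (0, 0.05), (-1, 0.03)]
print([action_for(score) for score, _ in day])  # ['buy', 'short', 'no action', 'short']
print(f"hit rate: {hit_rate(day):.0%}")  # hit rate: 67%
```

Hit rate alone does not determine profit: the size of each winning and losing move matters too, which is why the sketch tracks both.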

The research was carried out in April as a walk-forward test covering trading days from October 2021 to December 2022. Because GPT-3.5 was trained on data through September 2021, the model had no knowledge of what happened in the stock market after that date, which allowed Lopez-Lira and Tang to test its predictive abilities on news it had never seen.

Their strategy was designed to trade any stock on the NYSE and Nasdaq. Small-cap stocks, however, are more expensive to trade, so fewer investors trade them, which leaves a larger window of opportunity to profit from the news, according to Lopez-Lira.

Over that period, the long-short strategy informed by GPT-3.5's sentiment analysis generated a 512% return on $1. The GPT-4 strategy turned the same $1 into $3.76, a 276% return. A second GPT-3.5 simulation that included transaction costs of 5 and 25 basis points produced gains of 380% and 50%, respectively. By comparison, the Russell 2000 fell 20% over the same period.
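These return figures follow from simple arithmetic: a cumulative return is the ending value divided by the starting value, minus one. The hypothetical helpers below check the article's numbers; they say nothing about how the study compounded its daily returns.

```python
def total_return(start_value: float, end_value: float) -> float:
    """Cumulative return as a fraction (2.76 means +276%)."""
    if start_value <= 0:
        raise ValueError("start_value must be positive")
    return end_value / start_value - 1.0


def ending_value(start_value: float, cumulative_return: float) -> float:
    """What an investment grows to, given a cumulative return fraction."""
    return start_value * (1.0 + cumulative_return)


print(f"GPT-4: $1.00 -> $3.76 is a {total_return(1.00, 3.76):.0%} return")
print(f"GPT-3.5: a 512% return turns $1.00 into ${ending_value(1.00, 5.12):.2f}")
print(f"Russell 2000: a 20% drop leaves ${ending_value(1.00, -0.20):.2f}")
```

Running it prints 276%, $6.12, and $0.80: a 512% return means $1 grew to about $6.12, well ahead of GPT-4's $3.76.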

According to Lopez-Lira, the older model, GPT-3.5, outperformed the newer GPT-4 on total returns but not on risk-adjusted returns.
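The article doesn't say which risk-adjusted measure the study used. One common choice is the Sharpe ratio: average excess return divided by the volatility of returns. The sketch below assumes daily returns, a zero risk-free rate, and 252 trading days a year, and it uses made-up return series (not the study's data) to show how a strategy can win on total return yet lose on a risk-adjusted basis.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # common annualization convention (an assumption here)


def cumulative_return(daily_returns: list[float]) -> float:
    """Compounded return over the whole series, as a fraction."""
    return math.prod(1.0 + r for r in daily_returns) - 1.0


def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0) -> float:
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - risk_free_daily for r in daily_returns]
    vol = stdev(excess)  # sample standard deviation; needs at least 2 values
    if vol == 0:
        raise ValueError("returns have zero volatility")
    return mean(excess) / vol * math.sqrt(TRADING_DAYS)


# Hypothetical daily returns, not figures from the study.
bolder = [0.040, -0.030, 0.035, -0.020, 0.000]
steadier = [0.004, 0.002, 0.003, 0.001, 0.005]

for name, series in [("bolder", bolder), ("steadier", steadier)]:
    print(f"{name}: total {cumulative_return(series):.2%}, "
          f"Sharpe {sharpe_ratio(series):.1f}")
```

The bolder series ends with the higher total return, but its much larger daily swings give it a far lower Sharpe ratio.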

"It's mostly because we give GPT-3.5 the option to say, 'I'm not sure if this news is good or bad,'" Lopez-Lira explained. "So GPT-3.5 likes to answer more of these questions."

The study also found that earlier language models, such as GPT-1, GPT-2, and BERT, failed to interpret news well enough to make profitable trades. This suggests that accuracy may improve as language models improve.

The main prompt was:

"Ignore all previous instructions. Assume you are a financial expert who has made stock recommendations in the past. In the first line, respond 'YES' if the news is good, 'NO' if the news is bad, or 'UNKNOWN' if the news is uncertain. Then, on the next line, elaborate with one short and concise sentence. Is this headline good or bad for (company name's) stock price in the short or long term?"

(Insert headline here)
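This prompt and answer format lend themselves to a small helper that fills in the template and converts the model's first word into the 1, 0, and -1 scores used by the trading rule. This is a hypothetical sketch: `build_prompt`, `parse_score`, and the mapping of YES to 1, NO to -1, and UNKNOWN to 0 are assumptions consistent with the scoring described earlier, and the call to ChatGPT itself is deliberately left out.

```python
# Prompt wording as quoted in the article; {company} and {headline} are
# hypothetical placeholders filled in per headline.
PROMPT_TEMPLATE = (
    "Ignore all previous instructions. Assume you are a financial expert who "
    "has made stock recommendations in the past. In the first line, respond "
    "'YES' if the news is good, 'NO' if the news is bad, or 'UNKNOWN' if the "
    "news is uncertain. Then, on the next line, elaborate with one short and "
    "concise sentence. Is this headline good or bad for {company}'s stock "
    "price in the short or long term?\n\n{headline}"
)

# Assumed mapping from the model's verdict to the article's scores.
SCORES = {"YES": 1, "NO": -1, "UNKNOWN": 0}


def build_prompt(company: str, headline: str) -> str:
    """Fill the template for one company/headline pair."""
    return PROMPT_TEMPLATE.format(company=company, headline=headline)


def parse_score(reply: str) -> int:
    """Read the leading YES/NO/UNKNOWN verdict and return 1, -1, or 0."""
    words = reply.strip().split(None, 1)  # splits on spaces or newlines
    if not words:
        raise ValueError("empty reply")
    verdict = words[0].strip("\"'“”.,:;").upper()  # drop quotes/punctuation
    if verdict not in SCORES:
        raise ValueError(f"unrecognized verdict: {words[0]!r}")
    return SCORES[verdict]


print(build_prompt("Philip Morris", "Philip Morris Beats Quarterly Estimates, But Cuts Outlook"))
print(parse_score("NO, despite beating quarterly estimates, ..."))  # -1
```

Rejecting anything other than the three expected verdicts, rather than guessing, keeps a malformed reply from silently turning into a trade.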

The following are some examples of headlines and ChatGPT responses:

"Cadence Design Stock Rises Following Earnings. Analysts Are Optimistic."

ChatGPT: "YES, the positive earnings report and upbeat analyst forecast are likely to boost demand for Cadence Design stock in the short term."

“Philip Morris Beats Quarterly Estimates, But Cuts Outlook”

ChatGPT: “NO, despite beating quarterly estimates, Philip Morris International’s stock price is likely to fall in the short term due to the negative outlook.”

"Plexus Worldwide Promotes Gene Tipps to President of Global Operations"

ChatGPT: "UNKNOWN. It is unclear how Gene Tipps' promotion to President of Global Operations will affect Plexus' stock price in the short term."

"We asked it to consider the price direction and also a short explanation, and we had it systematically output them," Lopez-Lira explained. "All of them, let's say, I'd give an 'A,'" he added. "All of the explanations are reasonable and financially sound."

Wall Street ChatGPT

On the plus side, because they can quickly digest large amounts of information, these language models could make an analyst’s job much easier and more productive, according to Lopez-Lira.

As newer AI models emerge, they may make markets more efficient by letting traders react to news faster. That shrinks the window of opportunity: the amount of time a trader has to capitalize on news before others do.

The preliminary research concentrated on intraday trades. However, Lopez-Lira believes that as more firms use these tools, the window of opportunity to exploit information will shrink from a day to minutes or even seconds, making it impossible for a human to manually exploit information for high-frequency trades. Retail traders are already finding it difficult to compete with large institutional algorithms. AI will only make it more difficult, widening the gap between institutions and retail traders, he added.

Lopez-Lira believes that institutional traders' advanced abilities may backfire: as more firms integrate AI tools into their trading and analyze data with similar models, they will be competing in the same space, and predictability will decline. He expects their competitive advantage to dwindle over time.

Seasoned retail traders typically avoid betting against institutional algorithms. David Capablanca, a short seller whose trading records, reviewed by Insider, show a win ratio of up to 90% between February 2021 and April 2023, said he won't trade small caps if he detects algorithmic trades being executed. He also won't bet against stocks with more than 40% institutional ownership.

Real-world pitfalls

If you want to use ChatGPT to make actual trades, Lopez-Lira says you’ll need to give it a lot more context.

Alpesh Patel, CEO of the private equity firm Praefinium, did exactly that when he tested GPT-4's ability to pick stocks in real time. He fed the model terminal data for the 30 Dow stocks, including working capital, free-cash conversion, and debt, among many other variables. The LLM then narrowed the list to the five stocks it believed would perform best over the next year. Three of them were stocks Patel already owned; the other two had similarly strong fundamentals.

Capablanca regularly analyzes how headlines affect stock movements, but that is only one of nine items on his checklist before shorting any stock. Short sellers must weigh many other factors to avoid disaster. A major cause of such disasters is market friction, anything that interferes with executing a trade quickly, which the simulation did not account for.

Sometimes the broker does not immediately execute your order, or you cannot find shares to borrow. Because short selling requires borrowing shares, selling them, and later buying them back, a human trader like Capablanca faces more friction points that can slow down a trade. ChatGPT, by contrast, never had to wait for a broker to lend it shares or execute its order, so in the simulation it could act on a short sale faster than the average trader.

According to Lopez-Lira, this friction means negative news has a larger, longer-lasting effect on real-world prices, because traders can't act on it right away. The simulation faced no such friction and could exploit that lag, giving ChatGPT an advantage. As a result, negative news showed higher return predictability in the simulation.

"Stocks that are the most obvious shorts, that go to $100 and collapse to zero, you couldn't even short," Capablanca said, contrasting GPT's ability to short any stock, regardless of float size, with real-world conditions. "On top of that, it doesn't take into account the squeezes. So, how many of these stocks did you blow up if you had shorted them with a certain amount of money?"

Capablanca also mentioned the danger of real-world trading halts, which can trap a trader in a position. Then there's the added risk of holding short positions overnight, as the simulation did. News that breaks outside regular hours can make a stock gap up, opening well above its previous close, which can lead to short squeezes and margin calls, he says.

Large institutional investors, on the other hand, must consider price impact, the way a large volume of trades can move a share price. According to Lopez-Lira, that is more likely to happen in smaller-cap stocks.

Another source of friction that could slow down a real-time trade is a lack of liquidity. Stocks with smaller floats are harder to enter and exit because of supply or demand constraints, which can result in wider bid-ask spreads.

According to Cory Mitchell, an analyst at the trading education website Trading.biz, the model is robust, provided the stocks it shorted had ample volume and could actually be shorted, because its performance was massively outsized. It outperformed the S&P 500 by 500% over the same period. He also noted that its drawdowns were smaller than the index's, which is very encouraging.

"An equal-weight market portfolio would have experienced a 36% drawdown at one point during this study," Mitchell said. The strategy's portfolio, by contrast, fell 22.79% at its worst, roughly one-third less than an index-type portfolio.
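A drawdown measures the fall from a portfolio's running peak to a later low. The sketch below, with hypothetical function names and made-up portfolio values, shows how a maximum drawdown is computed and checks Mitchell's comparison: a 22.79% drawdown is about 37% smaller than a 36% one, in line with "one-third less."

```python
def max_drawdown(values: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not values:
        raise ValueError("need at least one portfolio value")
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)  # highest value seen so far
        worst = max(worst, (peak - value) / peak)
    return worst


def relative_reduction(strategy_dd: float, benchmark_dd: float) -> float:
    """How much smaller the strategy's drawdown is than the benchmark's."""
    return 1.0 - strategy_dd / benchmark_dd


# Made-up portfolio values: the peak of 1.2 falls to 0.9, a 25% drawdown.
print(f"{max_drawdown([1.0, 1.2, 0.9, 1.3, 1.0]):.0%}")  # 25%
print(f"{relative_reduction(0.2279, 0.36):.1%}")  # 36.7%
```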

Despite the study’s shortcomings, Capablanca believes it is critical for traders to stay up to date on these developments for informational purposes.

"It's good to be aware of how far they've gotten because in the future it will be good," Capablanca says of AI tools. "I don't like being in the dark. It's like ignoring computers until 2005; if you ignore computers until now, you'll be behind because they'll figure it out eventually."