
Researchers from China conducted a large-scale comparison of AI capabilities for stock trading using real market data. AI agents managed a portfolio of 20 Dow Jones Index stocks over 4 months with an initial capital of $100,000. The test covered both proprietary models (GPT-5, Claude-4, OpenAI O3) and open-source neural networks (Qwen3, Kimi-K2, GLM-4.5, DeepSeek). The results showed significant differences in trading strategies between models. Code for reproducing the experiments is available on GitHub.
To compare neural network capabilities for stock trading, researchers created the StockBench benchmark.

[Figure: stock portfolio distribution]
Performance evaluation of neural networks for trading was conducted using three key metrics:
- Final return (percentage change in portfolio value);
- Maximum drawdown during trading (largest decline from peak to trough);
- Sortino ratio (excess return divided by the standard deviation of negative returns). The higher the Sortino value, the more efficiently the AI for trading generates profit relative to downside risk.
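The three metrics listed above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own code: the function names are hypothetical, the risk-free rate defaults to zero, and the Sortino denominator uses the common "downside deviation" convention (root mean square of negative excess returns over all observations), which may differ from the exact definition used in StockBench.

```python
import math

def final_return(values):
    """Percentage change from the first to the last portfolio value."""
    return (values[-1] / values[0] - 1.0) * 100.0

def max_drawdown(values):
    """Largest peak-to-trough decline, reported as a negative percentage."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                      # highest value seen so far
        worst = min(worst, (v / peak - 1.0) * 100.0)
    return worst

def sortino_ratio(period_returns, risk_free=0.0):
    """Mean excess return divided by downside deviation (assumed convention)."""
    excess = [r - risk_free for r in period_returns]
    mean_excess = sum(excess) / len(excess)
    # Only negative excess returns contribute to the downside deviation.
    downside_sq = [min(e, 0.0) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(excess))
    if downside_dev == 0.0:
        return float("inf") if mean_excess > 0 else 0.0
    return mean_excess / downside_dev

# Illustrative portfolio values, not data from the benchmark.
values = [100_000, 102_000, 96_900, 101_000]
print(final_return(values), max_drawdown(values))
print(sortino_ratio([0.01, -0.02, 0.03, -0.01]))
```

Because only losses enter the denominator, a strategy with large upside swings is not penalized the way it would be under a Sharpe-style ratio.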
The final composite ranking of neural networks was calculated as the average of the standardized scores (z-scores) across the three metrics. The drawdown z-score enters with a minus sign because a larger drawdown is worse.
Formula:
Composite Rank = [z(return) - z(drawdown) + z(Sortino)] / 3
Calculation example for Kimi-K2:
Suppose after transformation we got the following z-scores:
- z(return) = +0.5 (return above average);
- z(drawdown) = -1.2 (drawdown smaller than average, which is good);
- z(Sortino) = +1.8 (Sortino above average).
Composite rank = [0.5 - (-1.2) + 1.8] / 3 = [0.5 + 1.2 + 1.8] / 3 = 3.5 / 3 ≈ 1.17
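The composite formula can be reproduced with a short Python sketch. It rests on two assumptions not stated in the text: drawdowns are passed as positive magnitudes (so a larger value is worse, which is what the minus sign implies), and z-scores use the population standard deviation. The function names are illustrative.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of values: (x - mean) / population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def composite_from_z(z_return, z_drawdown, z_sortino):
    """Composite = [z(return) - z(drawdown) + z(Sortino)] / 3."""
    return (z_return - z_drawdown + z_sortino) / 3

def composite_scores(returns, drawdown_magnitudes, sortinos):
    """Composite score per model from raw metric lists of equal length."""
    z_ret = z_scores(returns)
    z_dd = z_scores(drawdown_magnitudes)
    z_sor = z_scores(sortinos)
    return [composite_from_z(r, d, s) for r, d, s in zip(z_ret, z_dd, z_sor)]

# The worked example: z-scores of +0.5, -1.2 and +1.8.
print(round(composite_from_z(0.5, -1.2, 1.8), 2))
```

Since z-scores depend on the whole set of participants, scores computed on a subset of models will not match the published ranking.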
Best Neural Networks for Stock Trading: Kimi-K2 and Qwen3-235B-Ins
Kimi-K2 ranked first in the AI trading rating, demonstrating a balanced combination of profitability and risk management in stock trading. The neural network achieved a 1.9% final return with a maximum drawdown of -11.8% and the best Sortino coefficient of 0.0420 among all participants. The high Sortino value indicates Kimi-K2’s ability to generate stable profits on the exchange with minimal drawdowns — a key advantage for long-term stock trading where loss limitation during volatile periods is critical.

Qwen3-235B-Ins ranked second among neural networks for trading with impressive results: 2.4% return and the lowest maximum drawdown of -11.2% among all models when working with stocks on the exchange. The Sortino coefficient of 0.0299 confirms effective downside risk management, although it falls short of the leader. The Instruct version of Qwen3 demonstrated superior discipline in limiting losses during trading — 3.7 percentage points better than its reasoning version Qwen3-235B-Think, which showed a drawdown of -14.9% with a return of 2.5% and Sortino 0.0309.
GLM-4.5 completed the top three AI models for stock trading with a return of 2.3%, drawdown of -13.7%, and Sortino ratio of 0.0295. This Sortino value, close to the 0.0299 of Qwen3-235B-Ins, indicates comparable efficiency in managing negative volatility when working on the exchange, which allowed the neural network to compete with top solutions.
Reasoning Models in Trading: High Stock Returns, But More Risk
Neural networks with extended reasoning capabilities showed mixed results in stock trading. Qwen3-235B-Think achieved the highest return among the top 4 trading models (2.5%), but ranked only fourth because of its high drawdown of -14.9%; its Sortino of 0.0309 was only marginally above the instruction version's 0.0299. OpenAI O3, specializing in complex reasoning, showed 1.9% return when trading on the exchange with a drawdown of -13.2% and Sortino of 0.0267, taking fifth place.
Qwen3-30B-Think with fewer parameters achieved 2.1% return and -13.5% drawdown with a Sortino coefficient of 0.0255, ranking sixth among AI for trading. Comparison of reasoning versions with Instruct counterparts revealed a pattern: neural networks with reasoning capabilities generate more aggressive stock trading strategies, leading to deeper drawdowns without a matching improvement in risk-adjusted performance on the exchange as measured by Sortino.
Error type analysis confirmed this trend. Qwen3-235B-Think made 5.6% arithmetic errors versus 14.5% in the instruction version, demonstrating superiority in quantitative calculations during trading. However, the reasoning model violated the required output format in 8% of cases versus 2% for Qwen3-235B-Ins, indicating a tendency toward excessive complication of trading decisions.

Proprietary Models Claude-4 and GPT-5 in Trading: Mid-Ranking
Claude-4-Sonnet from Anthropic ranked seventh among AI for stock trading with a return of 2.2%, drawdown of -14.2%, and Sortino coefficient of 0.0245. Despite decent profitability on the exchange, the relatively low Sortino value indicates insufficient efficiency in controlling drawdowns compared to generated profit during trading, which prevented the neural network from competing with leaders.
GPT-5 from OpenAI unexpectedly ranked only ninth in the neural networks for trading rating with a return of just 0.3%, a drawdown of -13.1%, and an extremely low Sortino of 0.0132. A coefficient roughly three times lower than the leader's indicates inefficient profit generation in stock trading relative to risks taken. Despite advanced capabilities in other domains, GPT-5 demonstrated an overly conservative trading strategy on the exchange and fell slightly short of the passive baseline (0.4% return and Sortino 0.0155).
DeepSeek-V3.1 ranked eighth with a return of 1.1%, drawdown of -14.1%, and Sortino of 0.0210, outperforming GPT-5 in return and Sortino, although with a deeper drawdown (-14.1% versus -13.1%). The earlier version DeepSeek-V3 showed 0.2% return, Sortino 0.0144, and ranked 11th among AI for trading, falling behind even the passive strategy in risk-adjusted performance on the exchange.
Stock Trading Outsiders: Specialized and Small Neural Networks
Qwen3-Coder, optimized for programming, ranked tenth in the AI for trading rating with a return of 0.2%, drawdown of -13.9%, and Sortino of 0.0137. Specialization in code generation provided no advantages in making financial decisions during stock trading, and the low Sortino coefficient indicates an inefficient profit-to-downside risk ratio on the exchange.
GPT-OSS-120B showed negative return of -0.9% during trading with a drawdown of -14.0% and Sortino of 0.0156, ranking 13th out of 14 among neural networks for trading. The positive Sortino value with negative return is explained by the calculation methodology relative to the risk-free rate. The model demonstrated extreme result volatility in stock trading with a return variance of 10.19×10⁻⁴ — the highest among all participants.
GPT-OSS-20B with smaller size became the outsider of the AI trading rating with a return of -2.8%, drawdown of -14.4%, and negative Sortino ratio of -0.0069. The only neural network with a negative Sortino coefficient demonstrated fundamental inability to generate profit above the risk-free rate in stock trading on the exchange at any level of downside risks taken — a critical failure for a trading agent.
Stock Portfolio Scalability: Advantage of Large Neural Networks for Trading
Experiments with different stock portfolio sizes (5, 10, 20, 30 positions) revealed a critical dependence of AI for trading performance on model scale. Kimi-K2 maintained a positive average return of 3.2% on a portfolio of 10 stocks and 1.9% on 20 stocks, demonstrating scalability of trading strategies on the exchange without significant degradation of risk-adjusted metrics.
GPT-OSS-120B became far less stable as the stock portfolio grew: its average return moved from -5.7% on 5 positions to -0.9% on 30 positions, while the coefficient of variation rose from 0.1 to 4.4. The smaller neural network size appears to limit the ability to simultaneously analyze multiple assets on the exchange and coordinate trading decisions, negatively affecting risk management efficiency during trading.
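For readers unfamiliar with the coefficient of variation, the sketch below shows one common way to compute it: the sample standard deviation of returns across repeated runs, divided by the absolute value of their mean. The exact definition used in the benchmark is an assumption here, and the per-run returns are invented for illustration only; they are not benchmark data.

```python
from statistics import mean, stdev

def coefficient_of_variation(run_returns):
    """Sample std dev of per-run returns divided by |mean| (assumed definition)."""
    mu = mean(run_returns)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(run_returns) / abs(mu)

# Hypothetical per-run returns in percent, chosen only to show the effect.
tight_runs = [-5.2, -5.7, -6.2]   # consistent losses around -5.7%
loose_runs = [3.0, -0.9, -4.8]    # wide spread around -0.9%

print(round(coefficient_of_variation(tight_runs), 2))
print(round(coefficient_of_variation(loose_runs), 2))
```

A mean close to zero inflates the ratio, so a high coefficient of variation signals that the outcome of any single run says little about the average.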
Qwen3-235B-Ins demonstrated optimal performance when trading a portfolio of 10-20 stocks, balancing between diversification and manageability. Expansion to 30 stocks led to a decrease in average return, indicating limits of effective attention distribution even for large neural networks when working on the exchange.
Practical Recommendations for Choosing AI for Stock Trading
For conservative investors prioritizing capital protection when trading on the exchange, the optimal neural network choice is Qwen3-235B-Ins with the minimum drawdown of -11.2% and stable return of 2.4%. The Sortino coefficient of 0.0299 confirms high efficiency of AI for trading in managing downside stock risks without sacrificing return.
Investors focused on maximizing returns in stock trading and willing to accept increased volatility on the exchange may consider Qwen3-235B-Think with the highest return of 2.5% among top trading models. Sortino of 0.0309 remains competitive, although the increased drawdown of -14.9% requires higher risk tolerance.
For universal application in stock trading with optimal balance of all metrics, Kimi-K2 leads with first place in the neural networks for trading ranking. Its Sortino coefficient of 0.0420 is about 36% above the next-best value (0.0309 for Qwen3-235B-Think) and about 40% above second-ranked Qwen3-235B-Ins (0.0299), which confirms the superior ability of AI to generate stable profits on the exchange while controlling negative volatility. The model combines acceptable return of 1.9%, low drawdown of -11.8%, and stability in various market regimes during trading.
Investors should avoid small neural networks like GPT-OSS-20B, which demonstrated negative return of -2.8% and the only negative Sortino ratio of -0.0069 among all participants in AI for stock trading testing. Model size is critical for effective analysis of multiple market signals on the exchange and coordination of trading decisions in a multi-stock portfolio.
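The recommendations above amount to picking a model by a different metric for each investor profile. The sketch below encodes the figures quoted in this article (models without a reported drawdown are omitted) and applies those three selection rules. The type and function names are illustrative, and the profile-to-metric mapping is a simplification of the text.

```python
from typing import NamedTuple

class Result(NamedTuple):
    name: str
    ret: float       # final return, %
    drawdown: float  # maximum drawdown, % (negative number)
    sortino: float

# Figures as quoted in the article.
RESULTS = [
    Result("Kimi-K2", 1.9, -11.8, 0.0420),
    Result("Qwen3-235B-Ins", 2.4, -11.2, 0.0299),
    Result("Qwen3-235B-Think", 2.5, -14.9, 0.0309),
    Result("GLM-4.5", 2.3, -13.7, 0.0295),
    Result("OpenAI O3", 1.9, -13.2, 0.0267),
    Result("Qwen3-30B-Think", 2.1, -13.5, 0.0255),
    Result("Claude-4-Sonnet", 2.2, -14.2, 0.0245),
    Result("DeepSeek-V3.1", 1.1, -14.1, 0.0210),
    Result("GPT-5", 0.3, -13.1, 0.0132),
    Result("Qwen3-Coder", 0.2, -13.9, 0.0137),
    Result("GPT-OSS-120B", -0.9, -14.0, 0.0156),
    Result("GPT-OSS-20B", -2.8, -14.4, -0.0069),
]
BASELINE_SORTINO = 0.0155  # passive baseline quoted in the article

def pick_conservative(results):
    """Smallest drawdown (the least negative value)."""
    return max(results, key=lambda r: r.drawdown)

def pick_aggressive(results):
    """Highest final return."""
    return max(results, key=lambda r: r.ret)

def pick_balanced(results):
    """Highest Sortino ratio."""
    return max(results, key=lambda r: r.sortino)

def beats_baseline(results, baseline=BASELINE_SORTINO):
    """Names of models whose Sortino exceeds the passive baseline."""
    return [r.name for r in results if r.sortino > baseline]

print(pick_conservative(RESULTS).name)
print(pick_aggressive(RESULTS).name)
print(pick_balanced(RESULTS).name)
print(beats_baseline(RESULTS))
```

On these figures the three rules select Qwen3-235B-Ins, Qwen3-235B-Think, and Kimi-K2 respectively, matching the recommendations in the text.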
StockBench testing results show that modern neural networks for stock trading can outperform passive strategies in risk management on the exchange, reflected in higher Sortino coefficient values of top models compared to baseline (0.0155). Kimi-K2, Qwen3-235B-Ins, and GLM-4.5 represent the current state-of-the-art among AI for trading, demonstrating potential for practical application in stock trading with careful risk management.