
Whitepaper: OB Trader (Technical)

Extract from the One Button Trader’s whitepaper describing AI, ML, and Neural Networks behind the platform

One Button Trading
Sep 9
AI Foundations

Refresher on Neural Networks

Neural networks are (currently) a form of narrow AI, which means they learn specific tasks designed by a human. In theory, any system of non-linear (differentiable) functions could be a neural network; in practice, they generally adjust scalars, vectors, and tensors as parameters in order to fulfill the task.

The upsides of neural networks are that they achieve high performance on a variety of tasks with minimal expert knowledge required. The downside is that they are hard for humans to interpret because of the complex non-linear relations they compute. Additionally, training a neural network can require a large amount of up-front compute.
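To make the refresher concrete, here is a minimal hypothetical example of such a system: a two-layer network composed of differentiable non-linear functions, whose weight matrices are the parameters that training adjusts. All names and sizes are illustrative, not anything from the platform.

```python
import numpy as np

# A two-layer network: a composition of differentiable non-linear
# functions. The weight matrices W1 and W2 are the parameters that
# training adjusts. All sizes here are illustrative.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))    # input -> hidden weights
W2 = rng.normal(size=(8, 2))    # hidden -> output weights

def forward(x):
    hidden = np.tanh(x @ W1)    # non-linear, differentiable activation
    return hidden @ W2          # linear read-out

x = rng.normal(size=(1, 4))     # one 4-feature input
print(forward(x).shape)         # (1, 2)
```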

Evolutionary Strategies

For our trading bots, we use evolutionary strategies for training. The concept is relatively simple if you understand Darwinian evolution, although because we are working with numbers (rather than physical animals) we can make some additional changes that make it more reliable for solving difficult problems.

Evolutionary strategies go through 5 (relatively) simple steps:

  1. Create a randomly initialized model as the Master model
  2. Create N mutated models from the Master model, by applying random noise
  3. Evaluate the N models in the environment
  4. Adjust the Master model by the weighted sum of the rewards multiplied by their respective random noise.
  5. Go to step 2 until satisfied.
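The five steps above can be sketched on a toy problem. The reward function, hyperparameters, and dimensions below are illustrative only, not the platform's actual trading environment:

```python
import numpy as np

# The five ES steps on a toy problem: maximize -||m - target||^2.
# Reward, hyperparameters, and dimensions are illustrative only.
rng = np.random.default_rng(0)
target = np.array([3.0, -2.0])

def reward(model):
    return -np.sum((model - target) ** 2)

master = rng.normal(size=2)            # step 1: random Master model
N, sigma, lr = 50, 0.1, 0.01
for _ in range(300):                   # step 5: repeat until satisfied
    noise = rng.normal(size=(N, 2))    # step 2: N mutations via random noise
    rewards = np.array([reward(master + sigma * eps) for eps in noise])  # step 3
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize
    # step 4: adjust Master by the weighted sum of rewards times their noise
    master = master + lr / (N * sigma) * noise.T @ rewards

print(master)  # approaches target
```

Note that no gradient of the reward is ever computed: the weighted noise alone supplies the update direction.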

In math terms:

  • E is the evaluation function (returns normalized reward)
  • J is a deterministic noise
  • m is the model at a certain time-step n
  • L is the learning rate.
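With these symbols, the usual evolutionary-strategies update can be sketched as follows (a reconstruction; the whitepaper's exact formula may differ in detail):

```latex
m_{n+1} = m_n + \frac{L}{N} \sum_{i=1}^{N} E\!\left(m_n + J_i\right) J_i
```

where J_i is the noise applied to the i-th mutated model.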

Demonstration of the evolutionary strategies algorithm.

We have written our own implementation for scalability and extensibility. It is available under MPL-2.0 at

Why evolutionary strategies?

There are many reinforcement learning algorithms (PPO/TRPO/DQN/Dueling-DQN/etc.) that allow for training against an environment (in our case, a simulated market). The main issue with these is that they either:

  1. Expect the agent to have a meaningful effect on the environment
  2. Estimate the expected future value based on the current state/action
  3. Expect rewards to be given for a certain action within a relatively short period of time

While in theory it’s possible to make this work for trading, these algorithms are likely to mismatch our goal: we do not expect a strategy to have a major effect on the market when it is released, and the future value cannot be accurately estimated without knowledge of the future.

Evolutionary strategies bypass all of these issues: the reward is a single scalar for each episode, and it can be calculated at the end of the episode, so they do not run into issues #1 and #3.

They compare perturbed models on the same simulation to get the update direction rather than having to estimate it, which fixes issue #2.

It is possible that one of the regular RL algorithms could address some of these issues as well; however, because of their modus operandi it would not be easy to do.

Different AI models’ V2 metric during training
Backtesting results of the trained model

AI Models

All of our current models are neural networks, and are based on battle-tested architectures.

Astral (Filter)

Astral was one of our first models, based on concepts from the WaveNet paper. It uses 2 parallel linear layers with sigmoid and tanh activations respectively to create a filter. In all of our deployed strategies, this model has 3 FilterBlocks and 2 projection layers for the input/output.

FilterBlocks have 3 feed-forward layers: a shared linear layer with PReLU activation, whose output goes into the tanh and sigmoid linear layers respectively; these are then recombined by a Hadamard product.
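A minimal NumPy sketch of one FilterBlock as just described. The layer width, initialization, and exact wiring are assumptions for illustration, not the deployed configuration:

```python
import numpy as np

# One FilterBlock: a shared linear layer with PReLU feeds parallel
# tanh- and sigmoid-activated linear layers, recombined by a Hadamard
# (element-wise) product. Sizes and wiring are assumptions.
rng = np.random.default_rng(0)
d = 16                                   # hypothetical channel width
W_shared = rng.normal(size=(d, d)) * 0.1
W_tanh = rng.normal(size=(d, d)) * 0.1
W_sig = rng.normal(size=(d, d)) * 0.1
alpha = 0.25                             # PReLU slope for negative inputs

def prelu(x, a):
    return np.where(x > 0, x, a * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def filter_block(x):
    shared = prelu(x @ W_shared, alpha)  # shared PReLU layer
    gate = sigmoid(shared @ W_sig)       # sigmoid "gate" branch
    signal = np.tanh(shared @ W_tanh)    # tanh "signal" branch
    return gate * signal                 # Hadamard product

x = rng.normal(size=(1, d))
out = filter_block(x)
print(out.shape)  # (1, 16)
```

Because the tanh branch is bounded in (-1, 1) and the sigmoid gate in (0, 1), the block's output is always bounded, which is one appeal of this gating pattern.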

These models have mostly been replaced by the more recent Performer models, because the Performer can in theory make more complex computations. Filter models do have the benefit of a better compute-to-memory usage ratio than the Performer, but for raw trading performance this is a moot point.

Horizon/Ascendant (GRU)

The Horizon and Ascendant models use recurrent neural networks to take actions, specifically a multi-layer GRU. They keep their internal state between actions, allowing them to re-use some computation from previous steps.

The strategies that we have deployed typically have 3 GRU layers, with hidden state and channel dimensions that depend on the input data. On each market scan the model receives a full window of price data (e.g. the 64 most recent OHLC candles), which is used as a single step in the GRU. This allows for more efficient computation, as well as creating an inductive bias for applying historical data to itself with an offset (as is done in many technical indicators).
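A minimal NumPy sketch of the window-as-one-step idea, using a single hand-written GRU cell. The dimensions, initialization, and single-layer simplification are illustrative assumptions:

```python
import numpy as np

# Each market scan, a window of recent OHLC data is flattened into a
# single input vector for one GRU step; the hidden state carries over
# between scans. One GRU cell only; dimensions are illustrative.
rng = np.random.default_rng(0)
window, features, hidden = 64, 4, 32     # e.g. 64 most recent OHLC rows
in_dim = window * features

def init(shape):
    return rng.normal(size=shape) * np.sqrt(2.0 / sum(shape))

Wz, Uz = init((in_dim, hidden)), init((hidden, hidden))
Wr, Ur = init((in_dim, hidden)), init((hidden, hidden))
Wh, Uh = init((in_dim, hidden)), init((hidden, hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h):
    z = sigmoid(x @ Wz + h @ Uz)         # update gate
    r = sigmoid(x @ Wr + h @ Ur)         # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)
    return (1 - z) * h + z * h_tilde     # new hidden state

h = np.zeros(hidden)
for scan in range(3):                    # three consecutive market scans
    x = rng.normal(size=(window, features)).ravel()  # flattened window
    h = gru_step(x, h)                   # state persists between scans
print(h.shape)  # (32,)
```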


Performer (Transformer)

The Performer model is a direct adaptation of the Performer paper, which uses FAVOR+ to approximate the attention used in Transformers. It is otherwise equivalent to the Transformer architecture.

Transformers are currently the most popular and promising type of neural network in most fields of study (NLP, time series, and even some vision). This is due to their extremely general nature: they can learn many different types of tasks using approximately the same architecture.

The strategies we have deployed typically have 2 layers and a learned positional embedding, plus 2 projection layers for the input/output. The models are otherwise the same as described in the paper.
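A minimal NumPy sketch of the FAVOR+ idea behind the Performer: approximating softmax attention with positive random features so the cost scales linearly with sequence length. Dimensions, scalings, and feature count are illustrative assumptions:

```python
import numpy as np

# FAVOR+ sketch: replace softmax attention's seq x seq matrix with
# positive random features phi, so attention costs scale linearly in
# sequence length. All dimensions and scalings are illustrative.
rng = np.random.default_rng(0)
seq, d, m = 8, 16, 256            # tokens, head dim, random features
Q = rng.normal(size=(seq, d)) / np.sqrt(d)
K = rng.normal(size=(seq, d)) / np.sqrt(d)
V = rng.normal(size=(seq, d))
W = rng.normal(size=(m, d))       # random projection defining the kernel

def phi(X):
    # Positive random features: exp(w.x - ||x||^2 / 2) / sqrt(m), whose
    # inner products approximate the softmax kernel exp(q.k) in expectation.
    return np.exp(X @ W.T - 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)) / np.sqrt(m)

qp, kp = phi(Q), phi(K)           # (seq, m) feature maps
num = qp @ (kp.T @ V)             # (seq, d) -- no seq x seq matrix formed
den = qp @ kp.sum(axis=0)         # (seq,) normalizer
approx = num / den[:, None]

# Exact softmax attention, for comparison
weights = np.exp(Q @ K.T)
exact = (weights / weights.sum(axis=1, keepdims=True)) @ V
print(np.abs(approx - exact).max())  # error shrinks as m grows
```

The key point is that `num` and `den` are computed without ever materializing the full attention matrix, which is what makes the approximation linear in sequence length.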

Complex Adaptive Systems (Discussion)

The first thing that rationalizes our choices is the fact that markets are complex adaptive systems: any change in the system can alter its long-term dynamics and may influence other parts of the system in a feedback loop. The agents in the system collectively influence the system, which in turn makes the agents adapt, which changes the system again.

Any agent acting in the system in a static way will eventually fail to perform; if that agent’s reasoning is publicly available, other agents will be able to take it into account, accelerating the process.

This effectively means that purely technical indicators are extremely unlikely to maintain their performance in the future (if they ever worked at all), as any consistent causal pattern will be adapted to, especially if the pattern is publicly available.

Humans can adapt: research into a company’s fundamentals and environment can provide expected prices with certain probabilities (for example, Morgan Stanley’s analysis on page 6).

However, such research requires vast human resources and can cost a customer up to $350,000 per year; and it doesn’t even provide you with a strategy, just the probabilities/expectations of certain events happening.

So for everyone who isn’t a stock/crypto-expert billionaire, there will need to be a different solution.

For us this meant turning to neural networks that directly interface with the market. In its current state the system only handles the direct environment (the market), but it is able to adapt to changes in the market and provides you with a fully automated way of using the strategy. We can also regularly update our models (or even do so with online learning) to adapt to changes that the model itself doesn’t account for. In the future, the inclusion of external features (such as social media, news articles, and performance reports) will allow for even more internal adaptability.

