Stochastic Optimal Control using Signatures

Master Thesis

1 Introduction

In this thesis, we consider a stochastic control problem of the form $$ dY_t = \mu_t b(Y_t) dt + \sigma(Y_t) dB_t, $$ where $\mu_t$ is an $\mathcal F_t = \sigma(B_s | s \leq t)$-measurable, continuous process that we can control. An SDE of this form arises when one considers a noisy process for which only the drift, i.e. the average direction, can be controlled. This control is expressed by the function $\mu : [0, T] \to \mathbb R$, $t \mapsto \mu_t$.

A toy example for a problem of this kind is navigation at sea or in space. At sea, the random part is the combined influence of winds and currents on a boat, and $\mu_t$ represents the direction of the rudder. In space, the randomness represents course-altering events like solar winds, and $\mu_t$ is the direction or strength of thrust. A similar optimal control problem with control in the drift was considered in (Diehl, Fritz, and Gassiat, 2017), which investigates the value function to find a dual problem. This was the first paper on stochastic optimal control using rough path analysis.

We now use the ansatz \begin{align*} &\mu_t = \Theta(\hat B|_{[0, t]}) &\Theta \in C( \Lambda_T, \mathbb R) =: \mathcal T, \end{align*} with $\hat B_t = (B_t, t)$ being the time-augmented Brownian motion and $\Lambda_T$ being the space of stopped rough paths up to time $T$ (see Definition 5.2).

This gives the SDE $$ dY_t^\mu = \underbrace{\Theta(\hat B|_{[0, t]}) b(Y_t^\mu)}_{= \mu_t} dt + \sigma(Y_t^\mu) dB_t. $$

We can now define a loss function such as \begin{align*} L(Y^\mu) := \mathbb E(Y_T^\mu)^2 + \mathbb E(\left|Y_T^\mu\right|^2), \end{align*} but in general any Lipschitz or Hölder continuous loss $L : C([0, T], \mathbb R^m) \to \mathbb R^+$ is possible.

The question we want to answer is: What is $\inf_\mu \mathbb E[L(Y^\mu)]$ and what does the corresponding $\mu$ (and $\Theta$) look like? It is the question of an optimal way to act, while counteracting random noise.

First, we need to understand our main problem SDE. It is shorthand notation for \begin{align}\label{eq:main_problem_integral_form} Y_t - Y_0 = \int_0^t \mu_\tau b(Y_\tau) d\tau + \int_0^t \sigma(Y_\tau) dB_\tau. \end{align} The second integral here can be read as an Itô integral. However, we will view it as an integral against a rough path, a so-called rough integral. This is a generalization of the Itô map, which can also incorporate other types of stochastic integrals, like the Stratonovich integral. This change of perspective is useful since we want to look at the so-called signatures of some processes, which are defined naturally in the context of rough paths.

The theory of rough paths was first introduced in the 1990s by Terry Lyons. It is an elegant framework for path-wise integration with rough driving signals and is therefore suited to a general class of stochastic processes, like Brownian motion or fractional Brownian motion. In particular, rough integrals are a generalization of Young’s theory of integration. An important aspect of the theory is the continuity of the solution map of rough differential equations. This continuity fails in the classical setting of Itô SDEs, where the solution map is measurable, but not continuous.

In addition to theoretical advances in SDEs, further tools were developed for rough paths, most notably the signature. The signature $\mathbb X^{< \infty}$ of a path $x: [0, T] \to \mathbb R^n$ is the collection of iterated integrals of all components of the path against each other: \begin{align*} \int_{0 \leq t_1 \leq \dots \leq t_k \leq t} dx_{t_1}^{i_1} \dots dx_{t_k}^{i_k} \end{align*} for $k \in \mathbb N$ and $i_1, \dots, i_k \in \lbrace 1, \dots, n\rbrace$. For a rough path, the values of the signature are not determined by the path alone and have to be defined up to a certain level $k$, which depends on the roughness of the underlying path. To see why this is true, one can consider the difference between the Itô and Stratonovich integrals, which both are fair definitions of integrals with respect to Brownian motion. We have \begin{align*} \int_{0 \leq t_1 \leq t_2 \leq T} dB_{t_1} dB_{t_2} = \int_0^T B_t dB_t = \frac{B_T^2}{2} - \frac{T}{2}, \end{align*} but also \begin{align*} \int_{0 \leq t_1 \leq t_2 \leq T} \circ dB_{t_1} \circ dB_{t_2} = \int_0^T B_t \circ dB_t = \frac{B_T^2}{2}, \end{align*} which makes it clear that there is no single way of defining the signature of a process. This is why, when working with iterated integrals, one has to fix one notion of integration. The theory of rough paths gives a framework for doing exactly that. The signature of a path is important because the signature at time $t$ determines the whole path up to time $t$ up to so-called tree-like extensions. In particular, the signature of an augmented path, i.e. a path $x_t = (x_t^{(1)}, \dots, x_t^{(n)})$ with an additional component that represents time, \begin{align*} \hat x_t = (x_t^{(1)}, \dots, x_t^{(n)}, t) \in \mathbb R^{n + 1}, \end{align*} determines the path uniquely. This makes the signature an important tool in machine learning as a model-free way to extract features from time-series data, such as audio, speech, or handwritten characters.
As such, it has been used successfully in several machine learning applications, including Chinese character recognition and medical tasks like the recognition of mental disorders.
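The gap between the two second-level iterated integrals can be checked numerically. The following is a minimal sketch, not taken from the thesis: it samples one discretized Brownian path and compares the left-point Riemann sum (which approximates the Itô integral) with the trapezoidal sum (which approximates the Stratonovich integral). The function names, the step count, and the seed are illustrative assumptions.

```python
import math
import random

def brownian_path(n_steps, T, rng):
    """Sample B at n_steps + 1 equidistant times on [0, T], with B_0 = 0."""
    dt = T / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def ito_sum(path):
    """Left-point sum of B_{t_i} * (B_{t_{i+1}} - B_{t_i})."""
    return sum(a * (b - a) for a, b in zip(path, path[1:]))

def stratonovich_sum(path):
    """Trapezoidal sum of (B_{t_i} + B_{t_{i+1}}) / 2 * (B_{t_{i+1}} - B_{t_i})."""
    return sum(0.5 * (a + b) * (b - a) for a, b in zip(path, path[1:]))

T = 1.0
rng = random.Random(0)          # fixed seed, chosen only for reproducibility
B = brownian_path(100_000, T, rng)

ito = ito_sum(B)
strat = stratonovich_sum(B)

print("Ito sum          :", ito, " vs B_T^2/2 - T/2 =", B[-1] ** 2 / 2 - T / 2)
print("Stratonovich sum :", strat, " vs B_T^2/2       =", B[-1] ** 2 / 2)
print("difference       :", strat - ito, " vs T/2 =", T / 2)
```

The trapezoidal sum telescopes to $B_T^2 / 2$ exactly, while the two sums differ by half the discrete quadratic variation $\frac12 \sum_i (B_{t_{i+1}} - B_{t_i})^2$, which approaches $T/2$ as the grid is refined.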

The injectivity of the signature map also makes it important to us, and it is the reason why we take the following ansatz for answering the question from above: $$ \Theta(\hat B|_{[0, t]}) = \langle \ell, \hat{B}_{0, t}^{< \infty} \rangle. $$ Here, $\hat{B}_{0, t}^{< \infty}$ is the signature of the augmented path of Brownian motion. In this, we follow the reasoning of (Kalsi, Lyons, and Arribas, 2020) and (Bayer et al., 2022), where it was shown that similar control problems of optimal trading speed and optimal stopping can be solved by just using linear maps of the path signature.
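To make the pairing $\langle \ell, \cdot \rangle$ concrete, here is a minimal sketch that computes the first two signature levels of a time-augmented, piecewise-linear path via Chen's identity and then applies a linear functional. It is an illustration under assumptions, not the thesis implementation: the function names, the sample path, and the coefficients in `ell0`, `ell1`, `ell2` are hypothetical, and piecewise-linear interpolation yields the geometric (Stratonovich-type) iterated integrals.

```python
def signature_level2(path):
    """Levels 1 and 2 of the signature of the piecewise-linear path through `path`.

    `path` is a list of points in R^d. Returns (s1, s2), where s1[i] is the
    total increment of component i and s2[i][j] is the iterated integral of
    component i against component j.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [qi - pi for pi, qi in zip(p, q)]
        for i in range(d):
            for j in range(d):
                # Chen's identity: (old level 1) tensor (new increment), plus
                # the level-2 term inc (x) inc / 2 of the straight segment.
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2

def linear_functional(ell0, ell1, ell2, s1, s2):
    """Evaluate <ell, signature> truncated at level 2 (level 0 equals 1)."""
    d = len(s1)
    value = ell0 + sum(ell1[i] * s1[i] for i in range(d))
    value += sum(ell2[i][j] * s2[i][j] for i in range(d) for j in range(d))
    return value

# Hypothetical samples of a scalar path x at times 0, 1, 2, augmented with time.
xs = [0.0, 1.0, 3.0]
ts = [0.0, 1.0, 2.0]
augmented = [(x, t) for x, t in zip(xs, ts)]

s1, s2 = signature_level2(augmented)

# Illustrative coefficients: constant term, level-1 weights, level-2 weights.
control = linear_functional(1.0, [2.0, 0.0], [[0.0, 1.0], [0.0, 0.0]], s1, s2)
print(s1, s2, control)
```

For geometric signatures the symmetric part of the second level is fixed by the first level, $s_2^{ij} + s_2^{ji} = s_1^i s_1^j$, which is the simplest instance of the shuffle identity and gives a quick sanity check on the output.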

The main result of this thesis will be

Theorem 5.6:

Let $2 \leq p < 3$ and let $\mathbb P$ be a probability measure on $\left( \hat \Omega^p_T, \mathcal B(\hat \Omega^p_T) \right)$. Let $Y^\mu$ be the unique solution to \begin{align*} dY_t = \mu_t b(Y_t) dt + \sigma(Y_t) d\mathbf x_t \end{align*} started at $\xi \in \mathbb R^m$, with $\mu \in \mathcal T$, $b$ Lipschitz, and $\sigma \in C^3_b(\mathbb R^m, \mathbb R^{m \times n})$. Here, $\mathbf x$ is a random geometric $p$-rough path with distribution determined by $\mathbb P$. Then \begin{align*} \inf_{\mu \in \mathcal T} \mathbb E [L(Y^\mu)] = \inf_{\mu \in \mathcal{T}_{sig}} \mathbb E [L(Y^\mu)] \end{align*} for any loss function $L : C([0, T], \mathbb R^m) \to \mathbb R$ that is bounded and $\alpha$-Hölder continuous for some $\alpha > 0$.

Here, $\mathcal T = C(\Lambda_T, \mathbb R)$ is the set of all continuous functions of the path up to some time $t \in [0, T]$, while $\mathcal{T}_{sig}$ is the set of all functions of the form $\langle \ell, \hat{\mathbb{X}}_{0, t}^{< \infty} \rangle$. The theorem therefore says that the optimal control problem can be solved by considering just linear maps of the signature of the augmented path. The statement will then also be extended to Itô integrals, as considered in the problem SDE, in Theorem 5.6.

Using these theorems, we can tackle our question numerically by modeling $\mu_t = \langle \ell, \hat{B}_{0, t}^{\leq k} \rangle$ as a linear map of the signature. Here, we pass from the infinite-dimensional full signature $\hat B_{0, t}^{< \infty}$ to the finite-dimensional truncated signature $\hat B_{0, t}^{\leq k}$ for numerical reasons. This is a good approximation, as \begin{align*} \left|\left|{\hat B_{s, t}^k}\right|\right| \leq C \frac{\omega(s, t)^{\frac k p}}{\left( \frac{k}{p} \right) ! } \end{align*} (see Theorem 3.7 in Lyons, Caruana, and Lévy, 2007), i.e. the norms of higher signature levels decay factorially, like $1 / \left( \frac{k}{p} \right)!$. We can approximate the RDE’s solution by using a Milstein scheme (Algorithm 3) on a discrete time grid \begin{align*} 0 = t_0 < t_1 < \dots < t_N = T \end{align*} and estimate the expected loss $\mathbb E[L(Y^\mu)]$ by averaging over many such simulations. Using the backpropagation algorithm can then lead us arbitrarily close to the optimal solution $\mu_t$.
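The simulation step can be sketched as follows. This is a rough illustration under stated assumptions, not the thesis's Algorithm 3: it uses the textbook scalar Milstein scheme for an Itô SDE, truncates the signature of $(B_t, t)$ at level 1 (so the control is $\ell_0 + \ell_1 B_t + \ell_2 t$), and picks hypothetical coefficients `b`, `sigma`, a terminal loss $Y_T^2$, and a fixed seed purely for demonstration.

```python
import math
import random

def b(y):
    """Hypothetical drift direction (constant, hence Lipschitz)."""
    return 1.0

def sigma(y):
    """Hypothetical smooth, bounded volatility."""
    return 0.3 * (1.0 + 0.5 * math.sin(y))

def dsigma(y):
    """Derivative of sigma, needed for the Milstein correction term."""
    return 0.15 * math.cos(y)

def simulate_terminal(ell, y0, T, n_steps, rng):
    """One Milstein path of dY = mu_t b(Y) dt + sigma(Y) dB (Ito); returns Y_T.

    The control is mu_t = ell[0] + ell[1] * B_t + ell[2] * t, a linear
    functional of the level-1 truncated signature of the augmented driver.
    """
    dt = T / n_steps
    y, B, t = y0, 0.0, 0.0
    for _ in range(n_steps):
        mu = ell[0] + ell[1] * B + ell[2] * t
        dB = rng.gauss(0.0, math.sqrt(dt))
        s = sigma(y)
        # Euler terms plus the Milstein correction 0.5 * sigma * sigma' * (dB^2 - dt).
        y += mu * b(y) * dt + s * dB + 0.5 * s * dsigma(y) * (dB * dB - dt)
        B += dB
        t += dt
    return y

def expected_loss(ell, n_paths=2000, y0=1.0, T=1.0, n_steps=50, seed=0):
    """Monte Carlo estimate of E[Y_T^2] for the control given by ell."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_terminal(ell, y0, T, n_steps, rng) ** 2
    return total / n_paths

loss_zero = expected_loss((0.0, 0.0, 0.0))    # no control
loss_drift = expected_loss((-1.0, 0.0, 0.0))  # constant push towards 0
print(loss_zero, loss_drift)
```

Starting from $Y_0 = 1$, the constant control $\mu \equiv -1$ steers the mean of $Y_T$ to roughly $0$, so its estimated loss should be clearly smaller than that of the uncontrolled process. In a gradient-based setup, one would differentiate such an estimate with respect to $\ell$ using an automatic differentiation framework; that part is omitted here.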

First, in Section 2, we introduce the theory of rough paths with its basic facts and definitions and derive rough integrals as a limit of Riemann-like sums. Throughout the thesis, we work with general rough paths with finite $p$-variation for $p \in [2, 3)$, where Young integration breaks down. For ease of notation, we introduce a tensor calculus. In this section, we also establish a general setting of controlled rough paths that deals with all kinds of rough paths, as opposed to the theory of (Fritz and Hairer, 2020), which only considers $\alpha$-Hölder paths. After that, in Section 3, we deal with rough differential equations (RDEs). We prove the existence and uniqueness of solutions in the usual way via Picard iteration, and then extend the theory to RDEs with a drift term, where we require only very mild assumptions on the drift, such that we can incorporate all RDEs of the form seen in the problem SDE for $b$ Lipschitz and $\mu$ continuous. We also investigate the stability of RDEs in the drift term. After having introduced RDEs, we move on to signatures in Section 4, where we give the basic definitions along with a proof of the shuffle identity for geometric rough paths. This is directly followed by the proof of our main theorem, Theorem 5.6, in Section 5. Here, we exploit the notion of stopped rough paths, as well as Lemma 5.5, which has also been used in (Kalsi, Lyons, and Arribas, 2020) and (Bayer et al., 2022), to show the density of signature controls on compact sets of arbitrarily high probability ($< 1$). We then extend the main theorem to work with Itô integrals. After proving the theoretical results, in Section 6 we state numerical algorithms for the approximation, which are implemented and can be viewed on GitHub, together with some convergence results for these algorithms.
Then, in Section 7, we test our implementation against a Julia reference implementation on two SDE problems. We also use our framework to solve an optimal asset allocation problem in the Black-Scholes model. The SDE of this problem has a different structure than the one considered before, and we argue why the same approach (approximating $\mu \in \mathcal T$ by $\mu \in \mathcal T_{sig}$) also works when one has combined control over the drift and volatility terms. Here, we use the Markov property of Brownian motion and neural networks to choose the control term as an element of $C(\mathbb R^{m + 1}, \mathbb R)$ instead of a linear function of the signature of the process. Finally, in Section 8, we discuss some extensions of the problem, as well as different possibilities of defining the Gubinelli derivative of RDE solutions when dealing with a drift term.

For more information, see the full thesis.


(Diehl, Fritz, and Gassiat, 2017) Joscha Diehl, Peter K. Friz, and Paul Gassiat. “Stochastic control with rough paths”. In: Applied Mathematics & Optimization 75, pp. 285-315, 2017.

(Kalsi, Lyons, and Arribas, 2020) Jasdeep Kalsi, Terry Lyons, and Imanol Perez Arribas. “Optimal execution with rough path signatures”. In: SIAM J. Financial Math 11, pp. 470-493, 2020.

(Bayer et al., 2022) Christian Bayer et al. “Optimal stopping with signatures”. In: The Annals of Applied Probability, 2022.

(Lyons, Caruana, and Lévy, 2007) Terry J. Lyons, Michael J. Caruana, and Thierry Lévy. “Differential equations driven by rough paths”. In: Lecture Notes in Mathematics, Springer Berlin Heidelberg, 2007.

Tobias Christian Nauen