Trajectory Tactics: When Transformers Learn Exploration to Generate Online Signatures
Abstract
The increasing need for robust digital signature verification systems has amplified interest in realistic online signature generation as a means of countering digital forgeries. In this work, we propose a novel Decision Transformer-based framework that learns from reinforcement learning outputs to generate diverse online signatures. Departing from traditional RL approaches that rely on policy gradients or value-function estimation, we formulate signature generation as a sequence-modeling problem. Our framework handles varied free-form signature styles, demonstrating adaptability across linguistic and stylistic variations. An RL model first generates signature trajectories, which are then fed to a Decision Transformer that employs an autoregressive sequence-modeling approach. To further personalize the generated signatures, we introduce a Q-learning-based module that produces user-specific variations while mitigating noise. By operating in an offline reinforcement learning setting, the proposed method reduces the dependency on extensive online interaction and improves scalability. Experimental results on a publicly available online signature dataset spanning multiple linguistic scripts show that our approach significantly outperforms traditional generative methods in terms of realism, variability, and mimicry accuracy. These results highlight the potential of Decision Transformers for structured sequence-generation tasks beyond their conventional domains.
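To make the return-conditioned, autoregressive formulation concrete, the sketch below illustrates how an offline signature trajectory could be fed to a Decision Transformer-style model: each timestep contributes a (return-to-go, state, action) triple, with the state taken as a hypothetical pen sample (x, y, pressure) and the action as the pen displacement to the next sample. This is not the authors' implementation; all module names, dimensions, and the state/action layout are illustrative assumptions, and positional/timestep embeddings are omitted for brevity.

import torch
import torch.nn as nn

class SignatureDecisionTransformer(nn.Module):
    """Minimal Decision Transformer-style sketch for signature trajectories (illustrative only)."""
    def __init__(self, state_dim=3, act_dim=2, d_model=128, n_heads=4, n_layers=3):
        super().__init__()
        self.embed_return = nn.Linear(1, d_model)        # scalar return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model) # pen sample (x, y, pressure)
        self.embed_action = nn.Linear(act_dim, d_model)  # pen displacement (dx, dy)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, returns_to_go, states, actions):
        # returns_to_go: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T, _ = states.shape
        # Interleave tokens per timestep as (R_1, s_1, a_1, R_2, s_2, a_2, ...)
        tokens = torch.stack(
            (self.embed_return(returns_to_go),
             self.embed_state(states),
             self.embed_action(actions)), dim=2).reshape(B, 3 * T, -1)
        # Causal mask so each token attends only to earlier tokens (autoregressive)
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        hidden = self.backbone(tokens, mask=mask)
        # Predict the next pen action from the hidden state at each state-token position
        return self.predict_action(hidden[:, 1::3, :])

# Toy usage: one trajectory of 16 pen samples with random placeholder data
rtg = torch.rand(1, 16, 1)
states = torch.rand(1, 16, 3)   # hypothetical (x, y, pressure) samples
actions = torch.rand(1, 16, 2)  # hypothetical (dx, dy) displacements
pred_actions = SignatureDecisionTransformer()(rtg, states, actions)
print(pred_actions.shape)       # torch.Size([1, 16, 2])

Conditioning on the return-to-go token is what lets such a model trade off fidelity against variability at generation time, which is the property the abstract attributes to the sequence-modeling formulation.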