Paper Title
Novel Modelling Strategies for High-frequency Stock Trading Data
Authors
Abstract
Full electronic automation in stock exchanges has recently become popular, generating high-frequency intraday data and motivating the development of near real-time price forecasting methods. Machine learning algorithms are widely applied to stock mid-price prediction. How raw data are processed into inputs for prediction models (e.g., data thinning and feature engineering) can strongly affect the performance of the prediction methods. However, researchers rarely discuss this topic. This motivated us to propose three novel modelling strategies for processing raw data. We illustrate how these strategies improve forecasting performance by analyzing high-frequency data for the Dow Jones 30 component stocks. In these experiments, our strategies often lead to statistically significant improvements in prediction performance. The three strategies improve the F1 scores of the SVM models by 0.056, 0.087, and 0.016, respectively.