Paper Title

Facilitating SQL Query Composition and Analysis

Authors

Zainab Zolaktaf, Mostafa Milani, Rachel Pottinger

Abstract

Formulating efficient SQL queries requires several cycles of tuning and execution, particularly for inexperienced users. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to execution. We achieve this by predicting properties such as the query answer size, its run-time, and error class. Unlike existing approaches, our approach does not rely on any statistics from the database instance or query execution plans. This is particularly important in settings with limited access to the database instance. Our approach is based on using data-driven machine learning techniques that rely on large query workloads to model SQL queries and their properties. We evaluate the utility of neural network models and traditional machine learning models. We use two real-world query workloads: the Sloan Digital Sky Survey (SDSS) and the SQLShare query workload. Empirical results show that the neural network models are more accurate in predicting the query error class, achieving a higher F-measure on classes with fewer samples as well as performing better on other problems such as run-time and answer size prediction. These results are encouraging and confirm that SQL query workloads and data-driven machine learning methods can be leveraged to facilitate query composition and analysis.
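The core idea, predicting a property of a query from its text alone with a model trained on a labelled workload, can be illustrated with a minimal sketch. This is not the authors' model. It swaps in a multinomial Naive Bayes classifier (a simple traditional baseline) over bag-of-tokens features, and the tokenizer, the hypothetical `ok`/`error` labels, the class name `NaiveBayesQueryClassifier`, and the invented toy queries are all assumptions for illustration. Note that only the SQL string is used; no database statistics or execution plans are consulted.

```python
import math
import re
from collections import Counter, defaultdict

# Coarse SQL tokens: identifiers/keywords, numbers, quoted strings, single punctuation.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|'[^']*'|[^\sA-Za-z0-9_]")


def tokenize(query: str) -> list[str]:
    """Lower-case the query text and split it into tokens (no DB access needed)."""
    tokens = []
    for tok in TOKEN_RE.findall(query.lower()):
        # Collapse literals so the model learns query structure, not constant values.
        if tok[0].isdigit():
            tokens.append("<num>")
        elif tok.startswith("'"):
            tokens.append("<str>")
        else:
            tokens.append(tok)
    return tokens


class NaiveBayesQueryClassifier:
    """Hypothetical multinomial Naive Bayes over query tokens, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of queries per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def fit(self, queries, labels):
        for query, label in zip(queries, labels):
            toks = tokenize(query)
            self.class_counts[label] += 1
            self.token_counts[label].update(toks)
            self.vocab.update(toks)
        return self

    def predict(self, query: str):
        toks = tokenize(query)
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_label / total)
            for tok in toks:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy workload (hypothetical labels, not from SDSS or SQLShare logs).
queries = [
    "SELECT ra, dec FROM PhotoObj WHERE objID = 123",
    "SELECT TOP 10 * FROM SpecObj",
    "SELECT COUNT(*) FROM PhotoObj WHERE r < 17.5",
    "SELEC ra FROM PhotoObj",
    "SELECT FROM WHERE",
    "SELECT ra dec FROM FROM PhotoObj",
]
labels = ["ok", "ok", "ok", "error", "error", "error"]

clf = NaiveBayesQueryClassifier().fit(queries, labels)
print(clf.predict("SELECT ra, dec FROM SpecObj WHERE objID = 7"))  # ok
print(clf.predict("SELEC ra FROM FROM PhotoObj"))                  # error
```

The same token features could, under the same assumptions, feed a regressor for run-time or answer size; the abstract reports that neural network models outperformed traditional ones on these tasks, so a baseline like this is only a starting point.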