Title
A Comprehensive Empirical Evaluation of Generating Test Suites for Mobile Applications with Diversity
Authors
Abstract
Context: In search-based software engineering, we often use popular heuristics with default configurations, which typically lead to suboptimal results, or we perform experiments to identify configurations on a trial-and-error basis, which may lead to better results for a specific problem. We consider the problem of generating test suites for mobile applications (apps) and rely on Sapienz, a state-of-the-art approach to this problem that uses a popular heuristic (NSGA-II) with a default configuration. Objective: We want to achieve better results in generating test suites with Sapienz while avoiding trial-and-error experiments to identify a more suitable configuration of Sapienz. Method: We conducted a fitness landscape analysis of Sapienz to analytically understand the search problem, which allowed us to make informed decisions about the heuristic and configuration of Sapienz when developing SapienzDiv. We comprehensively evaluated SapienzDiv in a head-to-head comparison with Sapienz on 34 apps. Results: Analyzing the fitness landscape of Sapienz, we observed a lack of diversity in the evolved test suites and a stagnation of the search after 25 generations. SapienzDiv realizes mechanisms that preserve the diversity of the test suites being evolved. The evaluation showed that SapienzDiv achieves better or at least similar test results compared to Sapienz concerning coverage and the number of revealed faults. However, SapienzDiv typically produces longer test sequences and requires more execution time than Sapienz. Conclusions: The understanding of the search problem obtained by the fitness landscape analysis helped us find a more suitable configuration of Sapienz without trial-and-error experiments. By promoting diversity of test suites during the search, improved or at least similar test results in terms of faults and coverage can be achieved.
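The abstract does not state how test-suite diversity is measured or preserved in SapienzDiv. Purely as an illustration of what "lack of diversity of the evolved test suites" can mean quantitatively, the following Python sketch computes one possible population-diversity indicator: the mean pairwise Jaccard distance between the sets of GUI events used by each test suite. The representation (a suite as a list of event-name sequences) and all function names are hypothetical assumptions, not the Sapienz or SapienzDiv implementation.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

# Hypothetical representation (assumption): a test suite is a list of test
# sequences, and each test sequence is a list of GUI event names.
TestSuite = List[List[str]]


def event_set(suite: TestSuite) -> FrozenSet[str]:
    """Collect the distinct events exercised anywhere in the suite."""
    return frozenset(event for sequence in suite for event in sequence)


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 minus (intersection size / union size); 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def population_diversity(population: Sequence[TestSuite]) -> float:
    """Mean pairwise Jaccard distance over all suites in the population.

    0.0 means every suite uses exactly the same events (no diversity);
    1.0 means no two suites share any event.
    """
    sets = [event_set(suite) for suite in population]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    converged = [[["tap:login", "type:user"]], [["tap:login", "type:user"]]]
    diverse = [[["tap:login"]], [["swipe:feed"]]]
    print(population_diversity(converged))  # 0.0
    print(population_diversity(diverse))    # 1.0
```

Tracked once per generation, a value that drifts toward 0.0 would be one symptom of the diversity loss reported by the fitness landscape analysis. Any real metric would depend on how the tool encodes test sequences, so this event-set view is only one simple choice.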