论文标题

流k匹配问题:空间与总时间之间的权衡

The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time

论文作者

Golan, Shay, Kociumaka, Tomasz, Kopelowitz, Tsvi, Porat, Ely

论文摘要

我们以长度$ m $的模式和长度$ n $的流媒体文本在流型模型中重新审视$ k $ -mismatch的问题,均在尺寸为$σ$字母的情况下。 Clifford等人的流媒体$ k $ -mismatch问题的当前最新算法。 [SODA 2019],使用$ \ tilde o(k)$ space和$ \ tilde o \ big(\ sqrt k \ big)$糟糕的时间每字符。已知空间复杂性是(无条件)最佳的,并且每个字符的最差时间与条件下限匹配。但是,算法的总时间成本之间存在差距,即$ \ tilde o(n \ sqrt k)$和最快已知的离线算法,其成本为$ \ tilde o \ big big(n + \ min \ min \ big big(\ frac {\ frac {nk {nk} {\ sqrt m} $ big)此外,当使用超过$ O(k)$空间时,尚不清楚对$ \ tilde o(n \ sqrt k)$的改进是否可以进行。 我们通过设计一个随机流算法的$ k $ - miSmatch问题来解决这些空白,给定整数参数$ k \ le s \ le m $,使用$ \ tilde o(s)$ space and App $ \ tilde o \ tilde o \ big big big(n+\ min \ min \ min \ min \ min \ big big big big big big big big big big(\ frac { s},\ frac {σnm} s \ big)\ big)$总时间。对于$ s = m $,总运行时变为$ \ tilde o \ big(n + \ min \ big(\ frac {nk} {\ sqrt m},σn\ big)\ big)$,这与最快的离线algorithm的时间成本相匹配。此外,每个字符的最差时间成本仍然是$ \ tilde o \ big(\ sqrt k \ big)$。

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$σ$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac {nk^2}m,\frac{nk}{\sqrt s},\frac{σnm}s\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源