Repeated DNA Sequences

https://leetcode.com/problems/repeated-dna-sequences/description/

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Thoughts

首先这个有歧义,开始一直以为是找No overlapping的重复子数组,看了别人的答案才知道不一定要no overlapping, 比如

If you enter your "AAAAAAAAAAA" as custom test and click "Run Code", you'll see that expected answer is

["AAAAAAAAAA"].

知道了这点很容易想出naive 解法。即每次移动一位,把遇到的都记录下来。但这么做浪费空间。

还可以用整数存储方法。

这个思路非常简单,这里一共有四个字母:A,C,G,T。我们转换整数的思路如下:

  1. A = 00,C = 01,G = 10,T = 11。

  2. int key = 0, key = key << 2 | code(A|C|G|T)。

Last updated