on

I consider the probability distribution for the number of occurrences of a given word inside a random string whose letters have been generated by a stationary stochastic process. The problem is non-trivial due to the possibility of overlapping occurrences and has applications in communication theory and analysis of biological sequences. I will show that this problem can be cast into one of determining the partition function of a 1d lattice gas with interacting particles. Using this analogy, the probability distribution can be obtained from a virial expansion.