Heuristics for Text Compression


In my experience, students are pretty good at developing a compression that works well with a specific text or song, but they really struggle to find a method that is applicable to every situation.

Things I like to get them to focus on:

  1. Can you find the longest pattern that repeats the most times?

  2. Would it be more valuable to find a set letter number to focus on?
    2a) Find all patterns of 4 letters, then 3 letters.
    2b) Replace the pattern that is most frequent first.

  3. Once they have developed a plan, go back to the first text that they compressed, how does this new plan compare?

There are a lot of little things that students can add in to make their code work better. It is also important to see how many times does a 2-letter combo need to occur for it to be a net positive in adding a new character to the dictionary.


Thanks for sharing. Great questions!