'16-'17 General Discussion for Lesson 2.2


Use this thread to discuss your questions and comments about how to run the lesson.



Yes! Questions for the responses are now in the downloadable CSV file! I hope this continues throughout the course. I download the questions to review the responses, sometimes for several stages at a time, so this is a big help! Thanks so much!


For my heuristic, I:
1. compressed phrases that repeated
2. compressed words that repeated
3. compressed parts of words that appear repeatedly

I watched how much each of these contributed to the compression. My compression was 34.17%.
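If you want students to see where candidates for each of those three steps come from, here is a small Python sketch that lists repeated phrases, repeated words, and repeated word fragments. The function names are hypothetical, and it lowercases the text, which the lesson's tool may not do.

```python
from collections import Counter

def repeated_phrases(text, n):
    """Word sequences of length n (n=1 gives single words) seen more than once."""
    words = text.lower().split()
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {g: c for g, c in grams.items() if c > 1}

def repeated_fragments(text, size):
    """Parts of words (no spaces) of a given length seen more than once."""
    text = text.lower()
    frags = Counter(text[i:i + size] for i in range(len(text) - size + 1))
    return {f: c for f, c in frags.items() if c > 1 and " " not in f}

poem = "she sells sea shells by the sea shore"
print(repeated_phrases(poem, 2))    # step 1: repeated phrases
print(repeated_phrases(poem, 1))    # step 2: repeated words
print(repeated_fragments(poem, 3))  # step 3: repeated parts of words
```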

I realized that some words actually reduced the compression rather than increasing it.

Is the data getting lost? (Redirect: why or why not would this process result in data being lost?)
How can the data be reconstructed? Using the dictionary.
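To show students that nothing is lost, here is a minimal Python sketch of the round trip. It assumes each dictionary entry is swapped for a symbol that never appears in the original text (circled digits here); that symbol choice and the function names are illustrative, not how the lesson's tool necessarily works.

```python
# Assumption: symbols are circled digits, which never appear in the poem.
SYMBOLS = [chr(0x2460 + i) for i in range(20)]  # ① ② ③ ... (20 entries max)

def compress(text, entries):
    """Replace each dictionary entry, in order, with its symbol."""
    for symbol, entry in zip(SYMBOLS, entries):
        text = text.replace(entry, symbol)
    return text

def decompress(packed, entries):
    """Undo the substitutions in reverse order using the same dictionary."""
    for symbol, entry in reversed(list(zip(SYMBOLS, entries))):
        packed = packed.replace(symbol, entry)
    return packed

poem = "she sells sea shells by the sea shore"
dictionary = ["sea ", "she", "ells"]
packed = compress(poem, dictionary)
print(packed)                                  # ② s③ ①②lls by the ①shore
print(decompress(packed, dictionary) == poem)  # True
```

Because every symbol maps back to exactly one entry, the dictionary is all you need to rebuild the original text exactly.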


I compressed the poem “She Sells Sea Shells” and was able to get 25.84%. However, as I added more repeated patterns to the dictionary, the compression percentage went down.

My heuristic boiled down to the following steps:

  1. I looked for repeated phrases first and added them to the dictionary
  2. As I added phrases and/or words to the dictionary, I monitored the Compression %.
  3. If something I put in reduced the compression rate, I took it out of the dictionary.
  4. I continued compressing with additional repeated words.
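The add-it, check-it, take-it-out-if-worse loop above can be sketched in Python as a greedy search. The cost model here is an assumption and may not match the lesson's Compression % exactly: the size is the compressed text plus, for each entry, its characters and one character for its symbol.

```python
# Assumed cost model (may differ from the lesson's tool): compressed text
# plus, for each entry, its characters and one character for its symbol.
SYMBOLS = [chr(0x2460 + i) for i in range(20)]

def compressed_size(text, entries):
    for symbol, entry in zip(SYMBOLS, entries):
        text = text.replace(entry, symbol)
    return len(text) + sum(len(entry) + 1 for entry in entries)

def compression_percent(text, entries):
    """Percentage of the original size saved (0 with an empty dictionary)."""
    return 100 * (1 - compressed_size(text, entries) / len(text))

def build_dictionary(text, candidates):
    """Try candidates in order (phrases first, then words); keep an entry
    only if it raises the compression %, otherwise leave it out."""
    entries = []
    best = compression_percent(text, entries)
    for candidate in candidates:
        trial = entries + [candidate]
        score = compression_percent(text, trial)
        if score > best:
            entries, best = trial, score
    return entries, best

text = "she sells sea shells she sells sea shells"
entries, best = build_dictionary(text, ["she sells sea shells", "sells", "s"])
print(entries, round(best, 2))  # ['she sells sea shells'] 41.46
```

Here "sells" and "s" are rejected because, once the whole phrase is replaced, they no longer match anything and would only add dictionary cost.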

Remind students that trial and error is okay and is part of the challenge of these activities.


So this is the thing about this kind of text compression… there is always a tipping point where the amount of stuff in the dictionary overwhelms the benefits of the compression. (Extreme example: if you added each individual character of the alphabet to the dictionary, you’d at least double the size of the “compressed” version, since you now have to account for two characters where there was just one.)

So what you want is to substitute a single character for as many large groups of characters as possible.
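The tipping point can be put into numbers. Assuming each occurrence shrinks to a one-character symbol and each dictionary entry costs its own length plus one character for its symbol (an assumed cost model, not necessarily the tool's), an entry of length L that appears k times saves k(L - 1) characters and costs L + 1, so it only helps when k(L - 1) > L + 1. A single character (L = 1) never helps. A quick Python sketch, with a hypothetical function name:

```python
# Assumption: every occurrence becomes a one-character symbol and each
# dictionary entry costs its own length plus one character for the symbol.
def net_savings(text, pattern):
    """Characters saved by one dictionary entry (negative means it hurts)."""
    count = text.count(pattern)         # non-overlapping, like str.replace
    saved = count * (len(pattern) - 1)  # each occurrence shrinks to 1 char
    dictionary_cost = len(pattern) + 1  # the entry itself plus its symbol
    return saved - dictionary_cost

poem = "she sells sea shells"
print(net_savings(poem, "s"))     # -2: a single character never pays off
print(net_savings(poem, "ells"))  # 1
print(net_savings(poem, poem))    # -2: a long phrase that appears only once
```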

What’s nuts is that there is no easy way to know what’s best, and what you choose to substitute first makes a difference. But you always reach the tipping point. For example, I’ve attached a screenshot of a way to do “she sells sea shells” that gets ~40% compression. And I followed basically the same heuristic you outlined!
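To see why the order of substitutions matters, here is a small Python sketch that applies the same two dictionary entries in two different orders. It uses an assumed cost model (compressed text plus each entry's length plus one symbol character), and the example text is made up for illustration.

```python
# Assumed cost model: compressed text plus, for each entry, its characters
# and one character for its symbol. Entries are applied in list order.
SYMBOLS = [chr(0x2460 + i) for i in range(20)]

def total_size(text, entries):
    for symbol, entry in zip(SYMBOLS, entries):
        text = text.replace(entry, symbol)
    return len(text) + sum(len(entry) + 1 for entry in entries)

text = "sea shells sea shells sea shore"              # 31 characters
print(total_size(text, ["sea ", "sea shells "]))  # 39: "sea " breaks up the longer phrase
print(total_size(text, ["sea shells ", "sea "]))  # 25: longer phrase first
```

Substituting the short pattern first destroys every match for the longer phrase, so the same dictionary ends up bigger than the original text.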


My heuristic was to:

  1. Look for repeating word combinations
  2. Look for basic words