'15-'16 Text Compression

Me too… I used the “Pease porridge…” poem and achieved 45.75%. I just looked for repeating patterns and reused my earlier substitutions inside new patterns as well.

One of my favorite lessons (for any course) ever…challenging at a variety of levels, practical, easy entry point…

I try to find the largest pattern of text that is repeated most often and work my way down through smaller repeated patterns until the percentage actually starts to increase. I think my students will enjoy trying to find the greatest amount of compression possible.

My heuristic

  1. Look for patterns of size 3 that are repeated at least 3 times throughout the poem.
  2. Create a dictionary entry for each of the above.
  3. Look for new patterns that arise at least 2 times throughout the poem.
  4. Create a dictionary entry for each of the above.
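Steps 1 and 3 above can be sketched in Python. This is a minimal sketch, not the widget's behavior: `repeated_patterns` is a hypothetical helper, and it counts non-overlapping occurrences the way Python's `str.count` does, which is an assumption about how repeats should be counted.

```python
def repeated_patterns(text, size=3, min_count=3):
    """Return {pattern: count} for every `size`-character pattern that
    occurs at least `min_count` times (non-overlapping, via str.count)."""
    candidates = {text[i:i + size] for i in range(len(text) - size + 1)}
    found = {p: text.count(p) for p in candidates}
    # Keep only patterns that repeat often enough, most frequent first.
    return dict(sorted(((p, c) for p, c in found.items() if c >= min_count),
                       key=lambda item: (-item[1], item[0])))


# Step 1: 3-character patterns repeated at least 3 times.
print(repeated_patterns("abcabcabc"))               # {'abc': 3}
# A looser threshold, as in step 3: repeated at least 2 times.
print(repeated_patterns("abcabcabc", min_count=2))  # {'abc': 3, 'bca': 2, 'cab': 2}
```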

This is going to be a fun and challenging lesson. Thanks for posting the heuristic rules. Greatly appreciated!

My rules:

  1. Look for words/phrases that repeat.
  2. Create a compression of the word/phrase, and review to ensure that the rate of compression is decreasing.
  3. Look for portions of words that are repeated and create a compression for them.
  4. Continue looking for any other common letters, etc., to compress.
  5. Always check that the compression rate is getting lower.

My heuristic:

  1. Look for the longest string of text that repeats at least once. Substitute that.
  2. Repeat for the remaining text.
  3. If in doubt, observe the effect of the substitution on the compression ratio. Substitute only if it results in a higher compression ratio.

Pease porridge compression ratio: 42.48%
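The longest-repeat-first heuristic above can be sketched in Python. It is a sketch under assumptions, not how the widget scores text: the `cost` function assumes each dictionary entry costs its own length plus one character for its symbol, symbols are taken from the circled digits (U+2460 onward) on the assumption that they never appear in the poem, and all function names are hypothetical. The sample is the first verse of the poem, lowercased.

```python
def cost(text, dictionary):
    # Assumed cost model: compressed text plus each entry's pattern
    # and the one-character symbol that names it.
    return len(text) + sum(len(p) + 1 for p in dictionary.values())


def longest_repeat(text, min_len=2):
    """Longest substring that occurs at least twice without overlapping."""
    for length in range(len(text) // 2, min_len - 1, -1):
        for i in range(len(text) - length + 1):
            candidate = text[i:i + length]
            if text.count(candidate) >= 2:
                return candidate
    return None


def compress(text, min_len=2):
    dictionary = {}
    code = 0x2460  # circled digits, assumed absent from the input
    while True:
        pattern = longest_repeat(text, min_len)  # step 1
        if pattern is None:
            break
        symbol = chr(code)
        new_text = text.replace(pattern, symbol)
        new_dict = {**dictionary, symbol: pattern}
        # Step 3: keep the substitution only if the total shrinks.
        if cost(new_text, new_dict) >= cost(text, dictionary):
            break
        text, dictionary = new_text, new_dict  # step 2: repeat on the rest
        code += 1
    return text, dictionary


def decompress(text, dictionary):
    # Undo substitutions newest-first, since later entries may contain
    # symbols defined by earlier ones.
    for symbol in reversed(list(dictionary)):
        text = text.replace(symbol, dictionary[symbol])
    return text


poem = ("pease porridge hot, pease porridge cold, "
        "pease porridge in the pot, nine days old.")
packed, table = compress(poem)
print(cost(packed, table), "vs", len(poem))
```

Stopping at the first substitution that does not help keeps the sketch short; a fuller version could go on to try shorter repeats instead of giving up.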

I used the pitter patter poem. My best rate was 38.71%. My basic plan was first to find common groups of letters, such as “tter”, and make each one a new word in the dictionary. Then I combined that abbreviation with “pi” or “pa” to make 2 more new words. I also included “the”, because it showed up twice.
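The “tter” trick works because a dictionary entry may itself contain a symbol. Here is a minimal Python sketch of how such nested entries decode. It assumes one-character symbols (☀, ①, ② are arbitrary choices) and uses a made-up line of text rather than the actual poem; `expand` is a hypothetical helper name.

```python
def expand(text, dictionary):
    """Replace each symbol with its entry, expanding nested symbols too.
    Assumes no entry refers (directly or indirectly) to itself."""
    return "".join(
        expand(dictionary[ch], dictionary) if ch in dictionary else ch
        for ch in text
    )


dictionary = {
    "☀": "tter ",  # the common letter group (with its trailing space)
    "①": "pi☀",    # "pitter " built on top of the ☀ entry
    "②": "pa☀",    # "patter "
}
print(repr(expand("①②①②", dictionary)))  # 'pitter patter pitter patter '
```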

The first time I tried the text compression widget, I thought it was great. I’m looking forward to having my students create their own poems to compress. I’ll post how the lesson went in a few weeks.

Here is my basic heuristic:

  1. Examine the text, looking for patterns in words that repeat throughout it.
  2. Copy and paste them into the dictionary, and check that you are getting positive (efficient) compression.
  3. Re-examine the text with its symbol patterns, and check whether you can add new combinations to the dictionary.
  4. Compare the initial compression to your new compression scheme, and check for overall improvement.

Students should note that not every substitution results in increased compression. One heuristic is to apply multiple passes. The first pass searches for repeated letter sequences and substitutes symbols for them. The second pass looks for letter-symbol or symbol-symbol repetitions and substitutes those. Repeat passes until the compression % drops, then undo the last pass.
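The multi-pass idea, including undoing the pass that makes things worse, can be sketched in Python. In this sketch a “pass” substitutes repeated two-character sequences, most frequent first. The first pass only sees letters, and later passes pick up letter-symbol and symbol-symbol pairs on their own. The pair-based pass, the `cost` model (entry length plus one for its symbol), the circled-digit symbols, and all names are assumptions for illustration.

```python
def cost(text, dictionary):
    # Assumed cost model: compressed text plus each entry and its symbol.
    return len(text) + sum(len(p) + 1 for p in dictionary.values())


def one_pass(text, dictionary, code):
    """Substitute every pair that still repeats, most frequent first."""
    pairs = {text[i:i + 2] for i in range(len(text) - 1)}
    for pair in sorted(pairs, key=lambda p: (-text.count(p), p)):
        if text.count(pair) >= 2:  # earlier substitutions may have used it up
            symbol = chr(code)
            text = text.replace(pair, symbol)
            dictionary = {**dictionary, symbol: pair}
            code += 1
    return text, dictionary, code


def multi_pass(text):
    dictionary, code = {}, 0x2460  # circled digits, assumed absent from text
    while True:
        new_text, new_dict, new_code = one_pass(text, dictionary, code)
        if cost(new_text, new_dict) >= cost(text, dictionary):
            return text, dictionary  # undo the last pass
        text, dictionary, code = new_text, new_dict, new_code


print(multi_pass("ab" * 6))  # ('①①①①①①', {'①': 'ab'})
```

In the example, the second pass (① ① becoming a new symbol) would leave the total unchanged, so it is undone.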

My heuristic is to look for chunks that repeat. The longer the chunk the better, as long as you’re not excluding several shorter chunks by lengthening your chunk. Repeat this over and over.

She sells sea shells: 46.63%

I did better at this than I did in training. I tend not to look for longer patterns, which could compress better. It is a good lesson, though, and I think the kids will have fun with the challenge.

Working on this lesson on my own caused me to reflect on the experience at the PD session last month. I clearly remember feeling confident that I had discovered a fairly high compression rate until I turned to my partner and discovered that she had achieved a much higher rate of compression. This experience indicates the importance of working as a team to share ideas and “fine tune” the results.

The learning objective that speaks to the impossibility of writing a perfect algorithm for compression is demonstrated by the need to keep trying different approaches to obtain better results. This should allow my students to understand that “brute force” is the only way to try all of the possible combinations for optimal results. I think I will introduce the “good enough” rule.

I used the same poem, the alliterative “sea shells,” and improved on my compression from class. I started by using Find to count repetitions of phrases, beginning with the nine occurrences of “ells,” and went from there, selecting words by eye until I reached “e_s” for my final result.
I used the Snipping Tool to paste a screenshot of it, but the reply showed the image code instead, and it did not save. Cool.

I looked for common groups of letters…for example on the pitter patter poem, “tter_” is the most common group of symbols, and then that got compressed into a sun, and I used the sun for compressing words even more (pi(sun) and pa(sun) were pitter and patter). I got up to about 37 percent on that poem with that method. It’s interesting that there is a “break even” point on this text compression activity where if you introduce too many symbols you stop gaining compression efficiency and actually start losing it.
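The break-even point can be made concrete with a simple cost model. Assume (this is an assumption, not necessarily how the widget counts) that each symbol costs one character in the text and each dictionary entry costs its pattern length plus one for its symbol. Then a pattern of length L that appears k times saves k*(L - 1) - (L + 1) characters. A small Python sketch with hypothetical names:

```python
def net_savings(length, occurrences):
    """Characters saved by one dictionary entry under the assumed model."""
    saved_in_text = occurrences * (length - 1)  # each copy shrinks to 1 symbol
    dictionary_cost = length + 1                # the pattern plus its symbol
    return saved_in_text - dictionary_cost


def break_even(length):
    """Fewest occurrences for which an entry of this length pays off."""
    if length < 2:
        return None  # a 1-character pattern can never save space
    occurrences = 1
    while net_savings(length, occurrences) <= 0:
        occurrences += 1
    return occurrences


print([break_even(n) for n in range(2, 7)])  # [4, 3, 2, 2, 2]
```

Under this model a 2-character pattern needs 4 occurrences and a 3-character pattern needs 3 before it saves anything. Each new entry also uses up occurrences that other patterns could have shared, which fits the diminishing returns described above.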

In my opinion this is one of the most enjoyable lessons, as it allows the students to experiment freely with text compression. I plan on using these concepts as they relate to image compression and further on to encryption in later lessons.

My heuristic changed depending on the poem. I looked for longer repetitive strings of characters, including spaces. I repeated the process until there were no repeats or the compression did not improve. The rate I got was 44.44% with Pease Porridge.

I might have students bring in a poem or a favorite song to compress. Although there are patterns that students may look for, I like that there are entry points and that this lesson is accessible to all students.

I just played around and looked for chunks that repeated.