'15-'16 Text Compression

Once someone realizes that the library can reference itself, I think the “battle” will really take off.

This will be an extremely relevant lesson and I look forward to using the tools with my students.

Looked for the most common string. Looked to see the result. Tried another string. Kept the one that gave me the best compression.
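A minimal sketch of that try-and-keep-the-best loop, under assumptions: the sample text is made up, and the size model (one unit per character of compressed text plus one per character of the dictionary entry, with a symbol counting as one character) is a guess at how the tool counts, not its documented behavior.

```python
def compressed_size(text, entry, symbol="☀"):
    """Assumed cost model: length of the text after replacement plus the length of the stored entry."""
    return len(text.replace(entry, symbol)) + len(entry)


def best_single_entry(text, min_len=2):
    """Try every substring up to half the text length and keep the one that gives the smallest size."""
    max_len = len(text) // 2
    candidates = {
        text[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    }
    # Ties are broken alphabetically so the result is deterministic.
    return min(candidates, key=lambda c: (compressed_size(text, c), c))


sample = "pitter_patter_pitter_patter_"  # hypothetical sample, not the tool's poem
entry = best_single_entry(sample)
print(entry, compressed_size(sample, entry), "vs original", len(sample))
```

This only picks one dictionary entry; repeating it on the compressed result would be the next step.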

I REALLY like this lesson. Too often I have tried to explain compression to students and always fallen short of defining how it works. This tool (I think) accurately defines how it works. It provides a tangible way for students to learn how compression works. I am VERY excited to try this lesson out.

I did the pitter patter poem and achieved a 30.11% rate. I looked for smaller patterns and then combined those when possible.

I’m curious what you mean by loss of integrity. Is it possible for this tool to be lossy? I’m thinking it would always be able to reverse the process, if that were an option. I’d like to see the tool go the other way: i.e. students could provide a dictionary, encoded text, and see what comes out when decoded. The best compression should end up with something pretty much unreadable by a human. Here’s what I did for pease porridge:

Pease_porridge_
in_the_pot_Nine_days_old.
Some_like_it_
hot_
cold_

And got a 47.71% rate ending up with this: :sunny::comet::sunny:★:sunny::open_umbrella:_:snowman_with_snow::comet::snowman_with_snow:★:snowman_with_snow::open_umbrella:
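A rough sketch of the "go the other way" idea using the dictionary and encoded text above. The post does not say which symbol stands for which entry, so the assignment below is an assumption chosen to make the rhyme come out, and the shortcodes are written as single symbols (:sunny: as ☀, :comet: as ☄, :open_umbrella: as ☂, :snowman_with_snow: as ☃).

```python
# Assumed symbol-to-entry mapping; the original post lists the entries but not their symbols.
dictionary = {
    "☀": "Pease_porridge_",
    "☄": "hot_",
    "★": "cold_",
    "☂": "in_the_pot_Nine_days_old.",
    "☃": "Some_like_it_",
}
encoded = "☀☄☀★☀☂_☃☄☃★☃☂"


def decode(encoded_text, entries):
    """Swap each symbol for its dictionary entry; other characters pass through unchanged."""
    return "".join(entries.get(ch, ch) for ch in encoded_text)


def encode(text, entries):
    """Replace each entry with its symbol, in dictionary order."""
    for symbol, entry in entries.items():
        text = text.replace(entry, symbol)
    return text


decoded = decode(encoded, dictionary)
print(decoded)
print(encode(decoded, dictionary) == encoded)  # round trip check
```

Because decoding only substitutes entries back in, nothing is thrown away, which fits the hunch that this kind of compression is not lossy.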

I wanted to see if there was a way to get dictionary elements that reference each other for further compression, but couldn’t see any repeated symbols in there.

I like your heuristic. I would add that I start with LARGE chunks of text, not just words, and then work down to smaller and smaller bits until I can’t see anything else that can be replaced.

I looked for repeating patterns and, using symbols, looked to see if those patterns repeated also.

I think that I will have students type in lyrics of a song that they like, and see how much it can be compressed. Many Beatles songs repeat, so they compress nicely.

I think that this will be an interesting lesson. I can see my students trying to compete with each other, even though it may be a bit challenging.

This is going to be a great lesson! A lot of this lesson lends itself to students comparing all the different compressions and seeing what worked and what didn’t. I really want them to explore this lesson as opposed to me spitting information at them!

I’ve been teaching the beginning lessons from unit 1 and find that looking at them here is a lot different than actually presenting them and having 10th graders work on them. Therefore, I’m reluctant to comment on this one. However, I think it’ll be pretty cool.

I very much enjoyed doing this lesson and I think my students will like this lesson also.

I looked for a word or phrase that occurred most often as the first compression. I then looked for parts of words that may occur often. Then I looked for patterns that were occurring and compressed those into a new symbol.

I was able to get 80.47% with 16 a’s in a row. Not all of the phrases can get compression that high. I had fun with the Pease Porridge example and got 43.79%.

  1. Look for repeated words or phrases, including the dash after each
  2. Look for repeated symbols that have been made by step 1
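The two steps above can be sketched roughly as follows. The sample text, the chosen entries, and the symbols are all made up for illustration; the point is only that a step 2 entry is written in terms of step 1 symbols.

```python
def apply_entries(text, entries):
    """Replace each (symbol, pattern) pair in order; later patterns may contain earlier symbols."""
    for symbol, pattern in entries:
        text = text.replace(pattern, symbol)
    return text


text = "row_row_row_your_boat_row_row_row_your_boat_"  # hypothetical sample

# Step 1: repeated words or phrases, including the trailing separator.
step1 = [("☀", "row_"), ("☂", "your_boat_")]
after1 = apply_entries(text, step1)

# Step 2: a repeated run of symbols produced by step 1.
step2 = [("★", "☀☀☀☂")]
after2 = apply_entries(after1, step2)

print(after1)
print(after2)
```

Whether step 2 pays off depends on how the tool charges for dictionary entries, so it is worth checking the reported rate after each addition.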

I used the “Pease porridge…” poem. I got a 43.14% comp rate.

I LOVE this activity - lots of fun…but it drives me CRAZY that I can’t find the BEST way to do it. Must be the math teacher in me - I want it to have a right answer and that’s it. But I guess life is not like that all the time.

During the 5-day PD, we struggled coming up with a plan that worked well for all the given poems. We imported larger poems, and found a more consistent level of compression. Very interesting!

I did the text - Aaaaaa… and used :sunny: to replace Aaaa and then :sunny::sunny::sunny::sunny: to replace the entire sequence of a’s. I got 18 bytes and 85.94% compression.
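A quick arithmetic check of those numbers, assuming the rate is computed as 1 minus compressed size over original size. The 128-byte original is inferred from the reported figures, not stated in the post; the 25-byte value is likewise inferred from the 80.47% result reported earlier in the thread.

```python
def compression_rate(original_bytes, compressed_bytes):
    """Percentage saved, assuming rate = 1 - compressed / original."""
    return round((1 - compressed_bytes / original_bytes) * 100, 2)


ORIGINAL_BYTES = 128  # assumption: inferred from 18 bytes at 85.94%

print(compression_rate(ORIGINAL_BYTES, 18))  # 85.94
print(compression_rate(ORIGINAL_BYTES, 25))  # 80.47
```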

This is a good way to close the lesson…it’s a simple, yet practical example. As for developing my own rule or heuristic (per our Phase 3 prompt), rather than come up with what I think works best, I’ll be excited to document what my students discover works best.

As an extension to the extended learning, I would add a mini lesson on how to use 7-zip (7-zip.org), which is probably the most commonly used open source utility for compressing multiple files into a single container. This utility will allow students to take a closer look at LZMA compression (the default method of its 7z format), and they’ll be able to compare it to other algorithms such as Deflate and BZip2.

I used the following dictionary to get 37% on the pitter_patter set. To develop this I looked for the most common repeated sets of letters and then combined that with some other letters to reduce the phrase.

tter_
pi☀
pa☀
_ the _
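A small sketch of how entries like these, where one dictionary entry contains the symbol of another, could be expanded back into text. Only ☀ appears in the post; the symbols ☂ and ☃ for the second and third entries are placeholders chosen here, and the fourth entry is left out. It also assumes no entry refers back to itself, directly or indirectly.

```python
def expand(text, dictionary):
    """Recursively expand symbols; entries may contain other symbols (assumes no cycles)."""
    parts = []
    for ch in text:
        if ch in dictionary:
            parts.append(expand(dictionary[ch], dictionary))
        else:
            parts.append(ch)
    return "".join(parts)


dictionary = {
    "☀": "tter_",
    "☂": "pi☀",  # placeholder symbol, not from the post
    "☃": "pa☀",  # placeholder symbol, not from the post
}

print(expand("☂☃☂☃", dictionary))
```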

I looked for patterns and tried different combinations to get about 29% compression. Tried to think like a student on how he/she would work the problem.

I used the sea shells and got 26.97% compression. I started with “she”, and one of the cool things was noticing that it didn’t just compress the entire word “she” but also the beginning of the word “shells”. I then added just “lls” since it repeated several times, then the two symbols for “she” and “lls” since that repeated several times, then did any more repeating sets. I think this would be a good one to have students try and then discuss why you don’t always need to have complete words in the dictionary.
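To illustrate the point about partial words, here is a minimal sketch on a shortened, made-up line (not the tool's full text), with symbols picked arbitrarily. Plain substring replacement catches "she" inside "shells" as well as the standalone word.

```python
text = "she_sells_sea_shells_"  # hypothetical shortened sample

whole_word_hits = text.split("_").count("she")  # "she" as a complete word
substring_hits = text.count("she")              # "she" anywhere, including inside "shells"

step1 = text.replace("she", "☀")    # also compresses the start of "shells"
step2 = step1.replace("lls", "☂")   # a fragment that is not a word at all

print(whole_word_hits, substring_hits)
print(step1)
print(step2)
```

On the longer original text, the pair ☀☂ would then repeat and could become its own entry, as described above.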