'15-'16 Text Compression

plohara · August 18, 2015, 2:12pm

Once someone realizes that the library can reference itself, I think the “battle” will really take off.

terryy · August 18, 2015, 7:21pm

This will be an extremely relevant lesson and I look forward to using the tools with my students.

sligator · August 19, 2015, 4:40pm

Looked for the most common string and checked the result. Tried another string. Kept the one that gave me the best compression.

stephen_p_sell · August 20, 2015, 1:03am

I REALLY like this lesson. Too often I have tried to explain compression to students and have always fallen short of defining how it works. This tool (I think) accurately defines how it works. It provides a tangible way for students to learn how compression works. I am VERY excited to try this lesson out.

mstahl · August 20, 2015, 2:46pm

I did the pitter patter poem and achieved a 30.11% rate. I looked for smaller patterns and then combined those when possible.

acrowe · August 21, 2015, 12:07pm

I’m curious what you mean by loss of integrity. Is it possible for this tool to be lossy? I’m thinking it would always be able to reverse the process, if that were an option. I’d like to see the tool go the other way: i.e. students could provide a dictionary, encoded text, and see what comes out when decoded. The best compression should end up with something pretty much unreadable by a human. Here’s what I did for pease porridge:

Pease_porridge_
in_the_pot_Nine_days_old.
Some_like_it_
hot_
cold_

And got a 47.71% rate ending up with this: ★☀_☃★☃

I wanted to see if there was a way to get dictionary elements that reference each other for further compression, but couldn’t see any repeated symbols in there.

I like your heuristic. I would add that I start with LARGE chunks of text, not just words, and then work down to smaller and smaller bits until I can’t see anything else that can be replaced.
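
For anyone who wants to try "going the other way" outside the widget, here is a minimal sketch of a decoder in Python. It is not the widget's code. The function name `decode` and the symbols ★ and ☃ are made up for illustration, and it assumes every symbol is a single character and that dictionary entries may contain other symbols.

```python
def decode(text, dictionary):
    """Expand symbols until none remain; entries may contain other symbols."""
    # An acyclic dictionary with n entries needs at most n expansion passes,
    # plus one more pass to confirm that nothing changed.
    for _ in range(len(dictionary) + 1):
        expanded = "".join(dictionary.get(ch, ch) for ch in text)
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("dictionary entries reference each other in a cycle")


# Hypothetical dictionary: two entries reuse the ☀ entry.
dictionary = {"☀": "tter_", "★": "pi☀", "☃": "pa☀"}
print(decode("★☃", dictionary))  # pitter_patter_
```

Because the expansion repeats until the text stops changing, the process is always reversible as long as the dictionary comes along with the encoded text.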

hoganw · August 22, 2015, 10:45pm

I looked for repeating patterns and, using symbols, looked to see if those patterns repeated also.

I think that I will have students type in lyrics of a song that they like, and see how much it can be compressed. Many Beatles songs repeat, so they compress nicely.

awade · August 23, 2015, 2:47am

I think that this will be an interesting lesson. For my students I see them trying to compete with each other even though it may be a bit challenging.

david_j_baker · August 23, 2015, 6:53pm

This is going to be a great lesson! A lot of this lesson lends itself to students comparing all the different compressions and seeing what worked and what didn’t. I really want them to explore this lesson as opposed to me spitting information at them!

jmoreton · August 23, 2015, 10:47pm

I’ve been teaching the beginning lessons from unit 1 and find that looking at them here is a lot different than actually presenting them and having 10th graders work on them. Therefore, I’m reluctant to comment on this one. However, I think it’ll be pretty cool.

mmcneil · August 24, 2015, 1:31am

I very much enjoyed doing this lesson and I think my students will like this lesson also.

jparsons · August 24, 2015, 10:51pm

I looked for a word or phrase that occurred most often as the first compression. I then looked for parts of words that may occur often. Then I looked for patterns that were occurring and compressed those into a new symbol.

I was able to get 80.47% with 16 a’s in a row. Not all of the phrases can get compression that high. I had fun with the Pease Porridge example, where I got 43.79%.
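
The "most frequent phrase first" idea can be sketched in Python roughly like this. It is only one possible reading of the heuristic, not how the widget scores anything. The function name `best_candidate` and the cost model (each use shrinks to 1 byte, and a dictionary entry costs its length plus 1 byte for the symbol) are assumptions.

```python
def best_candidate(text, min_len=2, max_len=20):
    """Return (substring, estimated bytes saved) for the most profitable repeat."""
    best, best_saving = None, 0
    seen = set()
    for length in range(min_len, max_len + 1):
        for start in range(len(text) - length + 1):
            chunk = text[start:start + length]
            if chunk in seen:
                continue
            seen.add(chunk)
            count = text.count(chunk)  # non-overlapping occurrences
            # Assumed cost model: every use becomes a 1-byte symbol, and the
            # dictionary stores the chunk itself plus 1 byte for the symbol.
            saving = count * (length - 1) - (length + 1)
            if saving > best_saving:
                best, best_saving = chunk, saving
    return best, best_saving


print(best_candidate("a" * 16))  # ('aaaa', 7)
```

Under this cost model the longest repeat is not automatically the best first pick, which may be part of why no single rule works for every poem.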

jreyn · August 25, 2015, 9:23pm

1. Look for repeated words or phrases, incl the dash after each
2. Look for repeated symbols that have been made by step 1

I used “Pease porridge…” poem. I got a 43.14% comp rate.

I LOVE this activity - lots of fun…but it drives me CRAZY that I can’t find the BEST way to do it. Must be the math teacher in me - I want it to have a right answer and that’s it. But I guess life is not like that all the time.

timothy_ellis · August 26, 2015, 1:06pm

During the 5-day PD, we struggled to come up with a plan that worked well for all the given poems. We imported larger poems and found a more consistent level of compression. Very interesting!

bhatnagars · August 27, 2015, 1:27am

I did the text - Aaaaaa… and used to replace Aaaa and then to replace the entire sequence of a’s. I got 18 bytes and 85.94% compression.
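
A quick way to sanity-check a rate like this is sketched below in Python. It assumes the rate is the percentage of bytes saved, with the dictionary counted as part of the compressed size. The 128-byte original length is also an assumption, chosen because it reproduces the 85.94% figure with 18 bytes.

```python
def compression_rate(original_bytes, compressed_bytes):
    """Percentage saved; compressed_bytes is assumed to include the dictionary."""
    return round(100 * (1 - compressed_bytes / original_bytes), 2)


# Assumed original size of 128 bytes; 18 bytes is the total reported above.
print(compression_rate(128, 18))  # 85.94
```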

jpauley · August 27, 2015, 4:42pm

This is a good way to close the lesson…it’s a simple, yet practical example. As for developing my own rule or heuristic (per our Phase 3 prompt), rather than come up with what I think works best, I’ll be excited to document what my students discover works best.

jpauley · August 27, 2015, 4:51pm

As an extension to the extended learning, I would add a mini lesson on how to use 7-zip (7-zip.org), which is probably the most commonly used open source utility for compressing multiple files into a single container. This utility will allow students to take a closer look at LZW compression, and they’ll be able to compare it to other algorithms.

colleen_m_adams · August 28, 2015, 1:02am

I used the following dictionary to get 37% on the pitter_patter set. To develop this I looked for the most common repeated sets of letters and then combined that with some other letters to reduce the phrase.

tter_
pi☀
pa☀
_ the _

willieworld86 · August 28, 2015, 2:32am

I looked for patterns and tried different combinations to get about 29% compression. Tried to think like a student on how he/she would work the problem.

sshultz · August 28, 2015, 3:43pm

I used the sea shells and got 26.97% compression. I started with “she”, and one of the cool things was noticing that it didn’t just compress the entire word “she” but also the beginning of the word “shells”. I then added just “lls” since it repeated several times, then the two symbols for “she” and “lls” since that repeated several times, then did any more repeating sets. I think this would be a good one to have students try and then discuss why you don’t always need to have complete words in the dictionary.
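
The point about partial words can be shown with a few lines of Python. This is just an illustrative sketch: the sample text, the symbols, and the `substitute` helper are made up, and plain string replacement stands in for whatever the widget does internally.

```python
def substitute(text, entries):
    """Apply dictionary entries in order, replacing each chunk with its symbol."""
    for symbol, chunk in entries:
        text = text.replace(chunk, symbol)
    return text


sample = "she_sells_sea_shells"  # hypothetical sample, not the widget's text

# "she" matches the whole word and the start of "shells".
print(substitute(sample, [("★", "she")]))                # ★_sells_sea_★lls
# Adding "lls" then catches the ends of "sells" and "shells".
print(substitute(sample, [("★", "she"), ("☀", "lls")]))  # ★_se☀_sea_★☀
```

Since the replacement works on characters rather than words, a dictionary entry pays off anywhere its letters appear, whole word or not.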
