'15-'16 Text Compression

Heuristic: Try different combinations of letters that appear in the text (at least 2 symbols, and no more than 1/2 the letters) and keep the one that makes the most compression.
Repeat until you cannot find any more combinations that reduce it further.

I developed this heuristic after trying to figure it out, and then realizing that a simple strategy could arrive at a really good solution. There may be better ones, but it has the next best compression at every stage.

I was able to get over 50% compression on a couple of the non-aaaaaaa ones.

Love the text compression widget. I will also use my digital cameras for this lesson to teach about compression of images.

I think ending the lesson with the rules for the heuristic is a great way to reinforce algorithms and the need for computational thinking in computer science, which further opens students’ eyes to what it takes to be successful in CS.

I think the song compression is a great idea for a homework or extension topic.

I looked for patterns. I used pitter_patter and the. This gave me a text compression of 34.41%. For the Some like it poem, I used the below compression resulting in 40.52 but then through trial and error, I was able to get a 42.48% compression.
pease_porridge
hot
cold
some_like_it
in_the_pot
nine_days_old

I remember this lesson from our training. It was interesting to see the different patterns everyone came up with and to compare the compression rates. It really spurred the competitiveness in some of us.

The thing I am most looking forward to is the discussion of when it goes from being a positive to a negative (when your dictionary ends up holding more than the original message). Some of us like to get really creative and it is helpful where other times it ends up being more trouble than it is worth, so to speak.

I will also wait until the students are partway through the activity to show them they can copy and paste symbols that have already been used if they want to make them part of another compression term (per se).

I plan to follow this lesson pretty closely.

I first looked at words or phrases that repeat more than once in the poem.
Then I copied them and put them in to the dictionary to determine if I got a postitive compression.
I then checked to see if there are phrases that repeat for compression as well.
Compare the results with the result prior to making another dictionary entry to see if there is any improvement,
I got 24.73% on pitter patter

The best I did in terms of compression was about 27%. I know that with more time I probably could be able to compress further. My rule of thumb for compression in these activities is: Effective compression is achieved when the compression results in a savings of time + space and no errors.

For me, this was a really fun lesson. The fact that I am not stellar in this area might really be a good thing; as I could attempt this in front of the students and they would gang up to figure out how to improve!

Love that compression tutorial as well as the tool. It makes it clear what compression is.

I used aaaaaaaaaaaaaaaa for the Aaaaa… pattern. I realized that the tool is not case sensitive. I got 25 bytes total and and 90.47 % compression.

Having done this in the PD, I liked the constructivist aspect to this lesson. Setting it up as a competition seems like a good way to run the simulation ad then having students report out to one another.

My heuristic for the first poem:

Find the largest possible set of consecutive characters that repeat the greatest number of times.
Encode that into a single character in the dictionary.
Repeat prior two steps until compression cost of adding characters outweighs gains in attempting to further compress text.

Starting with “tter_” as the first character in the dictonary, I found myself able to get pretty darn far.

I looked for a pattern of text that was repeated the most often (tter). I then found (PA) and realized that if I include icons as well I could continue to compare. I then found (PI) and also added the icon for (tter) and I was able to reach 29.03%.

Hi everyone!

Just wanted to let you know that we’ve got a really neat new tutorial video available with the text compression widget featuring singer/songwriter Aloe Blacc (you may not know him by name, but may know his very popular song, “Wake me up”.) In addition to teaching how the tool works, as the original video did, we’ve added in some great support content about what digital compression is, including definitions and examples of lossy vs. lossless compression.

Level in Code Studio where your students will see the video: https://studio.code.org/s/cspunit1/stage/13/puzzle/21
Standalone tutorial video on YouTube: http://youtu.be/LCGkcn1f-ms
A version of the video that just talks about digital compression, without a tutorial on the tool: http://youtu.be/By30SCp-Tsw1

Also, I wanted to note that while the Pitter Patter and other rhymes are still available in the tool, we’ve now defaulted to a section of “Wake me up” to match the video and added a couple other Aloe Blacc songs to the tool for your students to play with if they’d like.

Check it out and let us know what you think!
Sarah

2 Likes

This was one of my favorite lessons during the PD. I agree that the compression tutorial is a great tool to use. I plan on giving the students more time on this to allow them time to really explore this concept.

I like this lesson since it really starts dealing with pattern recognition and replacing them (i.e. subroutines, functions, and methods). The trick is to determine at what point do you create a separate pattern or one that encompasses another pattern. This is something that students just need to develop thru experience and time. What I did it work inside out - in other words, first see a small pattern that repeats and then see if that pattern repeats within other bigger patterns and build my rules that way. Eventually, what’s left is pretty much individual words. This way I as able to get 38% compression on “Wake me up” song.

This is a fun activity that lets students try and often discover that what they thought would work in fact might not. I basically scanned the text for the most repeated patterns of some length and considered whether I could build patterns comprised of other patterns. Using “So wake me…” I got 39.58% compression. One was a pattern containing another pattern.

This lesson was a hit with the students. They loved using the text compression widget to compete with their friends to see who could come up with the best compression.

My students loved the activity. However, they were disappointed that they couldn’t store their work to show to others later. Is there a way to do that?

Hello! Unfortunately there’s no built in or automatic way in the tool right now to save work to show others later. It’s a bit clunky, but they can always copy/paste the content of the message and the dictionary into a saved text file and copy/paste it back later.

Definitely something we’ll consider as we revise the tools, and thank you for the suggestion!

-Sarah