'15-'16 Cleaning Data


Use this thread to discuss your questions and comments about how to run the lesson.


Ms. Sundar and I met to discuss and prepare the best way to present this lesson to students. We found that MS Excel and Google Sheets provide the same functionality, so we decided to use the native file in Google Sheets.

The first step was to review the raw data and understand the file organization, column descriptions, and contents. After this analysis we created a standardized column to categorize the data from Twitter as LEARN, SUPPORT, TEACH, or RETWEET. These categories were later used to create a pie chart that shows each category as a percentage of the total tweets received.
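
For anyone who wants to see the same calculation outside a spreadsheet, here is a minimal Python sketch of the percent-per-category step behind the pie chart. The list of labels is made up for illustration and is not the challenge data; only the four category names come from the lesson.

```python
from collections import Counter

# Hypothetical sample: one category label per tweet (not the real challenge data).
categories = ["LEARN", "SUPPORT", "TEACH", "RETWEET",
              "LEARN", "RETWEET", "LEARN", "TEACH"]

def category_percentages(labels):
    """Return each category's share of the total, as a percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {name: 100 * count / total for name, count in counts.items()}

percentages = category_percentages(categories)
for name, pct in sorted(percentages.items()):
    print(f"{name}: {pct:.1f}%")
```

These percentages are the values a spreadsheet pie chart computes from the standardized column.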

In applying similar concepts while teaching our students, we think each of them can create a data set based on their own interests, such as sports, colleges, or the automotive industry. Each student will select this raw data and then categorize it based on simple definitions that allow them to analyze the distribution of their data. For instance, if a student decides to review the statistics on their favorite football team, they can collect data including the number of games won, final score, best player, location, etc. After they select their own categories, they can assign a given set of values to use as their data categories.



Mr. JJ and I worked together to prepare for this lesson. The lesson requires the use of spreadsheets and an understanding of how data can be analyzed or sorted based on criteria decided by the user. The spreadsheets can be in either Google Sheets or MS Excel format. The data in the spreadsheet can be categorized by filtering, sorting, and deleting records that match the criteria described. Once the data is categorized, charts can be drawn to enhance the presentation of the data.
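
The filter, sort, and delete steps described above can also be sketched in a few lines of Python, for teachers who want a non-spreadsheet comparison. This is only an illustration: the rows, the field names, and the allowed color list are all assumptions, not part of the lesson materials.

```python
# Hypothetical survey rows; field names and values are made up for illustration.
rows = [
    {"name": "A", "height_cm": "165", "favorite_color": "Blue"},
    {"name": "B", "height_cm": "172", "favorite_color": " blue "},
    {"name": "C", "height_cm": "tall", "favorite_color": "Green"},
    {"name": "D", "height_cm": "158", "favorite_color": "purple-ish"},
    {"name": "E", "height_cm": "180", "favorite_color": "RED"},
]

ALLOWED_COLORS = {"blue", "green", "red"}  # assumed category list

def clean(rows):
    """Normalize text, then drop records that do not fit the categories."""
    cleaned = []
    for row in rows:
        color = row["favorite_color"].strip().lower()
        height = row["height_cm"].strip()
        # Keep the record only if the color is a known category
        # and the height is a whole number.
        if color in ALLOWED_COLORS and height.isdigit():
            cleaned.append({
                "name": row["name"],
                "height_cm": int(height),
                "favorite_color": color,
            })
    return cleaned

cleaned = clean(rows)

# Filter: keep only the "blue" answers.
blue_rows = [r for r in cleaned if r["favorite_color"] == "blue"]

# Sort: order the cleaned rows by height, shortest first.
by_height = sorted(cleaned, key=lambda r: r["height_cm"])
```

Note that `clean` builds a new list and leaves `rows` untouched, which mirrors working on a copy of the raw data rather than the original.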

We will create a form that the students can use to collect the data for this lesson, and they will organize it using either Google Sheets or MS Excel. The form could collect information such as the student's height, favorite color, favorite subject, sports team, phone text message usage, etc. The students will learn to input the data in the spreadsheet and to sort and filter the raw data. The students can also clean the raw data collected if an input does not match the category.
The following link displays how the challenge data was categorized based on the tweet text and a pie chart was drawn.


I’m trying to wrap my head around Lessons 7, 8 and 9.

It seems like in Lesson 7 Pairs create their own copy of the Class Data Tracker data then clean it.
Then in Lesson 9 they make their own copy of the original uncleaned data for the Practice PT?

This doesn’t make sense to me.

I’m also a bit confused about the fact that they work with a partner in 7 and 8, then do their own independent PT. Should I expect them to tell similar stories?



Hi Caroline, I see what’s happening. We’ll try to clear up the language in the lessons.

But here’s the deal.

The general ethos is: Have a spreadsheet buddy. Work together to clean the data and learn how to use other tools. Then use that dataset you made with a partner to do your own individual work.

Lesson 7: Pairs create a copy of class tracker data to clean it. (at this point we’ve got one copy per pair of students).

Lesson 8: Pivot table stuff, working on cleaned copy with partner.

Lesson 9: Each person in a pair peels off their own personal copy for working on the PT (at this point each student has their own copy) – and yes, they work as individuals for the Practice PT (because that’s how the Explore PT is), but they can and should sanity check things with their partner.

If you’re using Google Docs, of course, a “copy” is an ephemeral thing. There is no reason why partners couldn’t continue to share a spreadsheet for their personal work, as long as they play nice and are careful not to destroy their cleaned-up original data.

Does this make sense?


Yes that makes sense thanks.