ECS 6.0 ver, Unit 5, Instructional Days 22-24... I need help! :(

I do not understand what we are being asked to do when the curriculum says… (ECS Version 6.0, pages 246-247)….

“Now it is your turn, Text mining—analyzing word counts.

Demonstrate how to do the following:

This is a good opportunity to explain that the tweets are stored in an array or vector, where the numbers in front indicate the place the tweet is in the vector.

Arrays are an important concept in computer science. Storing items in an array allows us to access particular elements, search and sort. Demo how to view the vector and point out that each of the array elements of the corpus matches the corresponding tweet in the data file.

Create a frequency table that separates out each word and counts how many times it appears in all the tweets.

Ask questions such as: What is the word that appears least frequently? What is the word that appears most frequently?

Demo how to produce frequency tables that show only the most frequently appearing words and the different sorting options.

Demo how to produce a bar chart of frequently occurring words.”…

I need help with understanding where to start with this… I’m lost! Help! :frowning:

@madohuf

To clarify your question… Have you looked at the activities that follow the lesson outline? I just want to know if you need to know what to do or how to do it.

Andrea

How to do it… specifically…

Demonstrate how to do the following:

“This is a good opportunity to explain that the tweets are stored in an array or vector, where the numbers in front indicate the place the tweet is in the vector.”

Not sure what is meant by the terms “array” and “vector”.

“Demo how to view the vector and point out that each of the array elements of the corpus matches the corresponding tweet in the data file.”

Thanks for your inquiry… Matt

I am working on an answer to your specific question.

Andrea

Thank you very much! :smile:

Hi Matthew

I’m not exactly sure what is happening here either, but my guess is that this is left over from the days when Unit 5 used the language R and something called Deducer to do a bunch of the data processing. Most people will not use those to do this unit and will instead stick to spreadsheet products to look at data and create charts. The basic point of this section seems to be looking at word frequency in the tweets, so an easy way to do this with students might be to copy the text of the tweets into a word cloud creator.
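If anyone does want to demo the array and frequency-table ideas in code, here is a rough sketch in Python. To be clear, this is not the R/Deducer workflow the curriculum was written around. The sample tweets and the `word_counts` helper are made up for illustration, and the word-splitting rule is a simplifying assumption. It shows the three ideas from the lesson: tweets sitting in numbered places in a list (the "array or vector"), a table counting every word, and a quick text bar chart of the most frequent words.

```python
import re
from collections import Counter

# Hypothetical sample tweets; in class these would come from the data file.
tweets = [
    "I love computer science",
    "Computer science is fun",
    "Data is fun and data is everywhere",
]

# A list plays the role of the array/vector: each tweet has a numbered place.
# Python itself counts from 0, so we ask enumerate to start the labels at 1.
for position, tweet in enumerate(tweets, start=1):
    print(f"[{position}] {tweet}")


def word_counts(texts):
    """Split every text into lowercase words and count each word.

    Assumption: a "word" is any run of letters or apostrophes.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


counts = word_counts(tweets)

# Frequency table limited to the three most frequent words, highest first.
top_words = counts.most_common(3)

# A simple text bar chart: one '#' per occurrence.
for word, n in top_words:
    print(f"{word:<10} {'#' * n} ({n})")
```

Because the tweets are in a list, `tweets[0]` pulls out the first tweet directly, which is the "access particular elements" point from the lesson. Changing the `3` in `most_common(3)` shows more or fewer words, and `min(counts.values())` gives the count shared by the least frequent words.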

Hope that helps!

-Dani

Thank you for sharing this with me. I think I understand that now! :blush: Matt