Issues with data sets changing

A few of my students have had major issues with the information in the datasets changing or not being formatted correctly. For example, in the NCAA Division I Teams table, the enrollment column contains quotes and commas, which prevents finding the min and max. Similarly, in the Top 200 songs dataset, the song titles change to reflect the most recent top songs. How often are these datasets changed or updated?

It has caused some very real frustration amongst my students.

Hi @ktpanderson,

Sorry to hear your students are having a frustrating experience.

I took a quick look and it seems the quality of the dataset is dependent on the source it’s pulled from. In the case of the NCAA dataset, CFB DataWarehouse (whoever they are) for some reason decided to format their enrollment as text. Weirdly, the code.org documentation states it’s formatted as a number (it’s not).

I haven’t worked with these datasets much, but I did some cursory research, and it’s possible to create another column with the text values converted to numbers. That said, it’s a bit of a stretch considering the scope of the course (doable, but it takes multiple steps, including the use of functions not in the course), at least the way I’m thinking of doing it.
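For anyone curious what that conversion involves, here is a minimal sketch in plain JavaScript. It assumes the enrollment values look like "12,345" with literal quotes and commas, as described above. The function name parseEnrollment and the sample values are made up for illustration; in App Lab the list would presumably come from getColumn rather than a hard-coded array.

```javascript
// Hypothetical helper: turn text such as "\"31,240\"" into the number 31240.
function parseEnrollment(text) {
  // Remove everything that is not a digit (quotes, commas, spaces).
  var digits = String(text).replace(/[^0-9]/g, "");
  // Return NaN when nothing is left, so bad rows are easy to spot.
  return digits.length > 0 ? Number(digits) : NaN;
}

// Made-up sample values standing in for the enrollment column.
var enrollmentText = ['"31,240"', '"8,914"', '"52,003"'];

// Build a parallel list of real numbers.
var enrollmentNumbers = [];
for (var i = 0; i < enrollmentText.length; i++) {
  enrollmentNumbers.push(parseEnrollment(enrollmentText[i]));
}

// With numbers in hand, min and max are an ordinary list traversal.
var smallest = enrollmentNumbers[0];
var largest = enrollmentNumbers[0];
for (var j = 1; j < enrollmentNumbers.length; j++) {
  if (enrollmentNumbers[j] < smallest) {
    smallest = enrollmentNumbers[j];
  }
  if (enrollmentNumbers[j] > largest) {
    largest = enrollmentNumbers[j];
  }
}

console.log("Smallest: " + smallest + ", largest: " + largest);
```

Stripping every non-digit character only makes sense because enrollment is a whole, non-negative count; a column with decimals or negative values would need a more careful pattern.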

Not sure if this is for the Create PT, but if so, unfortunately you can’t help students troubleshoot the code, but you can remind them that they’re allowed to modify their program purpose/features.

The Top 200 songs dataset appears to update daily.

Can you clarify why the dataset being updated poses a problem for the students?

I have also noticed a couple of problems with datasets. The good news is if you send a message to support@code.org they will fix it.

Meanwhile, if your class can use libraries, here is one that fixes the Enrollment column: C6OYfZDdnF_5vJ5YddKsKwOLbbUt8oQ3Dj_1WahVI2A
Here is a small example of using it: Code.org - App Lab.

If the students are matching to specific song titles, for example, they can export the dataset to CSV then import it back in as their own dataset that won’t change.


This is helpful, thanks. For the Top 200 songs, the student created a dropdown with song names, but when the dataset was updated their chosen songs were no longer found in it. That caused their output to say undefined.

I can see that happening. We don’t really show them how to populate a dropdown box of choices from a list like top 200 songs. I believe you can show the entire class how to do it in a generic way and it doesn’t count as help on the Create Task.
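A generic version of that idea might look like the sketch below, in plain JavaScript. It builds the dropdown choices from whatever titles are currently in the list and checks that a chosen title still exists before using it, which avoids the undefined output described above. The function names, the sample titles and artists, and the "songDropdown" ID are all hypothetical. In App Lab the lists would presumably come from getColumn, and I believe the choices can be applied with setProperty("songDropdown", "options", choices), but treat that call as an assumption to verify.

```javascript
// Made-up sample data standing in for the current contents of the dataset.
var titles = ["Song A", "Song B", "Song C"];
var artists = ["Artist 1", "Artist 2", "Artist 3"];

// Build the dropdown choices from the current data instead of typing them in,
// so the choices always match what is actually in the table today.
function buildChoices(currentTitles) {
  var choices = [];
  for (var i = 0; i < currentTitles.length; i++) {
    // Skip duplicates so each title appears once in the dropdown.
    if (choices.indexOf(currentTitles[i]) === -1) {
      choices.push(currentTitles[i]);
    }
  }
  return choices;
}

// Look up the chosen title, and say so plainly if it has dropped off the list.
function describeSong(chosenTitle) {
  var index = titles.indexOf(chosenTitle);
  if (index === -1) {
    return chosenTitle + " is no longer in the dataset.";
  }
  return chosenTitle + " by " + artists[index];
}

var choices = buildChoices(titles);
console.log(choices.length + " choices available");
console.log(describeSong("Song B"));
console.log(describeSong("Last Month's Hit"));
```

Because the choices are rebuilt each time the app runs, a dataset update changes what the dropdown offers rather than breaking a hard-coded selection.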

On one hand, it’s a great teaching moment about the fluidity of external data sources. On the other hand, it’s frustrating for the students in the crunch of trying to get their Create PT to work. The only way the AP Reader would know that the dropdown list is incorrect is if it shows up in the video. I would suggest that the student make sure the video shows a couple of valid selections.

BTW, I ran into this issue with the Cereal App that is the high score example for the Hackathon project. I was making a video showing the students how to record their video for their Create PT. I grabbed the Cereal App and remixed it, and when I used it I got "ERROR: Line: 19: You tried to get a column called “Protein” from a table called “Cereal Nutrition”, but that column doesn’t exist." Oops.

One workaround could be for students to download the dataset on the date they make the app using “Export to CSV” and then create a new table using the “Add” button and then “Import CSV” the dataset they just downloaded. This will lead to a static dataset they can reference in their code. Below I captured the top 200 songs from today (March 3) in a table I created.

Screen Shot 2021-03-03 at 3.10.40 PM

Yes, the same thing happened with me re: Cereal App!

Good idea, but this is beyond what I expect them to do. We are right in the middle of Create, so I stopped the class and showed everyone the Cereal App error and how I needed to change it. Only 2 students needed that hint, but it forced everyone to look at it and double check.

I have a student who wants to use the enrollment column in the NCAA Division I dataset as a number for his Create PT. Would it be acceptable to let him add this library to his code to convert the strings to numbers, as long as he cites in his code that he used a library created by someone else? I’m not sure how to advise him on this. He understands that the problem is that enrollment is a text column, but he doesn’t know how to convert it to a number.

Yes. Just add a comment to his code in front of the function call. Also, once converted, it stays converted until code.org updates that database, so students can run it and then remove it. I won’t tell anyone if you don’t.

Thank you for your help!