AI and Machine Learning - Lesson 22

Has anyone found any websites that offer different datasets for students to look at?

I did this unit with my 8th grade students last year and we created a survey for the entire school to take. While I was happy with it, I wanted to see if there is anything out there that they could use as a guide.

We decided on a topic as a class, I sent the survey out to the school, and then the students created the app from that information.

Thanks for your help.

Kaggle.com is a good site with a lot of relevant datasets. I found a fun one on Harry Potter characters that makes a good lesson on data cleansing. It’s got lots of relatable data (with some errors!).

Mike

Thank you for your help.

Krista

It looks like the AI Bot is only testing a random sample of the data rows to determine the accuracy… is this how it should work?

Hi @dmaletta - great question! There’s a teaching tip related to this in Lesson 16:

In realistic situations: the “testing” dataset should be a random sample from the original dataset. This means the model is created using 90% of the data, and then validated and tested against the other randomly-collected 10% of the data.

However, as the teaching tip notes, this can lead to slight changes in accuracy, since different testing data is selected each time the model is trained. In classroom settings, this can lead to curious situations where two students use the same dataset with the same features but get different accuracy.

To avoid this confusion and keep the teaching experience consistent, most of the lessons in the unit use the last 10% of the dataset as the testing data, so it’s the same every single time. This is really more of a pedagogical consideration: by the time we get to these projects, we switch to the more realistic approach of choosing a random 10% of the data instead.
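
If it helps to see the two approaches side by side, here is a rough Python sketch. It is only an illustration of the idea, not how the AI Bot is actually implemented; the function names split_last and split_random and the test_percent parameter are made up for this example.

```python
import random

def split_last(rows, test_percent=10):
    """Hold out the LAST test_percent of rows, so the split is identical every run."""
    n_test = max(1, len(rows) * test_percent // 100)
    return rows[:-n_test], rows[-n_test:]

def split_random(rows, test_percent=10, seed=None):
    """Hold out a RANDOM test_percent of rows; differs between runs unless seeded."""
    n_test = max(1, len(rows) * test_percent // 100)
    rng = random.Random(seed)
    # Pick which row positions go into the testing set.
    test_idx = set(rng.sample(range(len(rows)), n_test))
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test

# Stand-in dataset: 100 numbered rows (hypothetical data for illustration).
rows = list(range(100))

train_fixed, test_fixed = split_last(rows)    # always rows 90..99 for testing
train_rand, test_rand = split_random(rows)    # a different 10 rows each run
```

With the random version, two students training on the same rows will usually be tested on different rows, which is why their accuracy numbers can differ slightly. Passing the same seed to split_random gives the same split every time.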

Hope that helps?

mwood,

I’ve explored Kaggle in the past, but I struggled to make those sets work inside the AI lab. Maybe I’m getting bad data.

Could you point me to this Harry Potter dataset that worked for you?

This is the one I used …

Mike
