Short answer: Yes - according to the format given.
Because we don't know the image dimensions ahead of time, we have agreed to reserve the first 8 bits to communicate the width, then another 8 bits to represent the height. (And the rest is pixel data.) That's all we know ahead of time.
So let's say given this agreement, I try to read your 3-bit file. What would happen? (We wouldn't be able to interpret it following our rules.)
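To make that concrete, here is a minimal sketch of a reader that follows the agreement. The function name `decode_image` is made up for illustration, and I'm assuming 1 bit per pixel and that width and height are ordinary binary numbers; neither detail is spelled out above.

```python
HEADER_BITS = 8  # bits reserved for each dimension, per the agreed format


def decode_image(bits: str):
    """Read a string of '0'/'1' characters as: width, height, then pixel data.

    Assumptions (for illustration only): 1 bit per pixel, and width/height
    are plain unsigned binary numbers.
    """
    header_len = 2 * HEADER_BITS
    if len(bits) < header_len:
        # Not even enough bits for the width and height we agreed on.
        raise ValueError(f"need at least {header_len} header bits, got {len(bits)}")

    width = int(bits[:HEADER_BITS], 2)
    height = int(bits[HEADER_BITS:header_len], 2)
    pixels = bits[header_len:]

    if len(pixels) != width * height:
        # The header promises one amount of pixel data; the file has another.
        raise ValueError(f"expected {width * height} pixel bits, got {len(pixels)}")

    return width, height, pixels


# A 1x1 image under the agreement: 8 bits + 8 bits + 1 pixel bit = 17 bits.
one_pixel = "00000001" + "00000001" + "1"
print(len(one_pixel), decode_image(one_pixel))

# A 3-bit file can't be read under the same rules.
try:
    decode_image("101")
except ValueError as err:
    print("cannot interpret:", err)
```

The reader has no way to guess that "101" was meant as 1 bit of width, 1 bit of height and 1 pixel; it only knows the rules we agreed on.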
Does that also answer your question "Why do I need 16 bits for the metadata"? If we agree on 16 bits of metadata beforehand, I won't be able to interpret anything you send me that doesn't follow that agreement.
Now, it IS possible to represent a 1-pixel image the way you describe, but we would have to change the image format - in other words, change our agreement on how we represent the data.
The question may have been phrased so that it's unclear whether we get to change the formatting "protocol". But if we do get to change it, the format you described (1 bit for width, 1 bit for height) could only ever describe a 1x1 image. In that case there would be no point in setting aside any bits for width and height at all: you would need just 1 bit total, for the pixel data.
I hope that's helpful to you and your students.