After training and testing on sampled movie segments from "Rent" and "West Side Story", we decided to see how the model performs on a new movie.
In the beginning, we only tested it on the first 10 seconds of the movie. The result was amazing: 0.0% ... in accuracy, not in error rate. How can that be?
Because the result was so unusual, we compared the features (spectra) extracted from the test data and the training data one by one, and we found one thing:
We forgot to normalize the sound magnitude!
Therefore, we retrained the model with sound normalization (in fact, we normalize the data in the frequency domain) to see how it performs. The error rate rose. We have no idea why the performance deteriorated so much just because we normalized; in fact, we had expected it to improve.
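For reference, here is a minimal sketch of one way to normalize magnitude in the frequency domain. It is not our actual code: the function name `normalized_spectrum`, the Hann window, the 1024-sample frame, the 16 kHz sample rate, and the choice of unit L2 norm are all assumptions made for illustration. The idea is that the same sound at two different volumes should produce the same feature vector.

```python
import numpy as np

def normalized_spectrum(frame, eps=1e-12):
    """Magnitude spectrum of one audio frame, scaled to unit L2 norm.

    Hypothetical helper: the window and the norm are illustrative choices.
    """
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    # eps guards against division by zero on silent frames
    return magnitude / (np.linalg.norm(magnitude) + eps)

# Same 440 Hz tone at two volumes (assumed 16 kHz sample rate, 1024 samples)
t = np.arange(1024) / 16000.0
quiet = 0.1 * np.sin(2 * np.pi * 440.0 * t)
loud = 5.0 * np.sin(2 * np.pi * 440.0 * t)

# After normalization the two feature vectors should match
same = np.allclose(normalized_spectrum(quiet), normalized_spectrum(loud))
```

Whatever scheme is used, it has to be applied identically to the training data and the test data; otherwise the features still live on different scales.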
Now we don't know whether we should give up on this approach and look for a new one, or spend more time on it (though there does not seem to be much time left).