After training and testing on sampled movie segments from "Rent" and "West Side Story", we decided to see how the model performs on a new movie.
In the beginning, we tested it only on the first 10 seconds of the movie. The result was amazing: 0.0% ... in accuracy, not in error rate. How can that be?
Because the result was so unusual, we compared the features (spectra) extracted from the test data and the training data one by one, and we found one thing:
We forgot to normalize the sound magnitude!
Therefore, we retrained the model with sound normalization (in fact, we normalize the data in the frequency domain) to see how it performs. The error rate rose. We have no idea why the performance deteriorated so much just because we normalized the data; in fact, we had expected the performance to improve.
Now we don't know whether we should give up on this approach and look for a new one, or spend more time on it (though there does not seem to be much time left).
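To make the normalization step concrete, here is a minimal sketch of what normalizing in the frequency domain could look like. It is not our actual code: the unit L2-norm scaling, the naive DFT, and the names `magnitude_spectrum` and `normalize_spectrum` are all assumptions made for illustration. It only shows why the same sound at two volumes gives very different raw spectra but nearly identical normalized ones.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for the non-negative frequency bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def normalize_spectrum(mags, eps=1e-12):
    """Scale a magnitude spectrum to unit L2 norm (assumed scheme) so loudness drops out."""
    norm = math.sqrt(sum(m * m for m in mags))
    return [m / (norm + eps) for m in mags]


# The same tone recorded at two volumes (hypothetical data).
quiet = [0.1 * math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
loud = [10.0 * s for s in quiet]

quiet_mags = magnitude_spectrum(quiet)
loud_mags = magnitude_spectrum(loud)

# Largest per-bin difference before and after normalization.
raw_gap = max(abs(a - b) for a, b in zip(quiet_mags, loud_mags))
norm_gap = max(
    abs(a - b)
    for a, b in zip(normalize_spectrum(quiet_mags), normalize_spectrum(loud_mags))
)
```

Here `raw_gap` is large while `norm_gap` is close to zero, which is the effect the normalization is meant to have. Whatever scheme is used, it has to be applied identically to the training and test features.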
2 comments:
There are some possible solutions. One is to use GMM models to separate the music and non-music parts. Why do I propose this? Because in singer identification, the first step uses GMM models to separate vocal from non-vocal music. I think it might be used here because it is quick to implement.
The other possible solution is to find other ways to improve it. Maybe the algorithm we implemented is not suitable here. Maybe we should try training with OpenCV as a trial. Or maybe we just need to find other ways, such as possibility features or others.
The last choice is to skip it and assume that we have already found the music part of the film. We can then start on the next step.
Hello. I like this post, and your blog is very interesting, congratulations :-). I will add it to my blogroll =). If possible, stop by my blog; it is about smartphones, and I hope you enjoy it. The address is http://smartphone-brasil.blogspot.com. A hug.