Saturday, June 2, 2007

What A Bad Result

After training and testing on the sampled movie segments from "Rent" and "West Side Story", we decided to see how the model performs on a new movie.

At first, we only tested it on the first 10 seconds of the movie. The result was astonishing: 0.0%... in accuracy, not in error rate. How could that be?

Because the result was so unusual, we compared the features (spectra) extracted from the test data and the training data one by one, and we found the problem:
We forgot to normalize the sound magnitude!

Therefore, we re-trained the model with sound normalization (in fact, we normalize the data in the frequency domain) and checked its performance again. The error rate rose, and we have no idea why the performance deteriorated so much just because we added normalization. In fact, we had even expected the performance to improve.
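The post doesn't say exactly how the frequency-domain normalization is done. As one possible reading, here is a minimal sketch, assuming each frame's magnitude spectrum is scaled to unit L2 norm; the Hann window, the frame length and the helper names `magnitude_spectrum` / `normalize_spectrum` are illustrative assumptions, not our actual extractor. It checks that the same tone at two loudness levels gives identical features after normalization.

```python
import numpy as np

def magnitude_spectrum(frame: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of one Hann-windowed audio frame (assumed feature)."""
    window = np.hanning(len(frame))
    return np.abs(np.fft.rfft(frame * window))

def normalize_spectrum(spec: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a magnitude spectrum to unit L2 norm, removing overall loudness."""
    return spec / (np.linalg.norm(spec) + eps)

# Toy check: the same 440 Hz tone "recorded" at two loudness levels.
sr = 16000
t = np.arange(1024) / sr
quiet = 0.1 * np.sin(2 * np.pi * 440 * t)
loud = 10.0 * quiet

# Without normalization the spectra differ a lot; with it they coincide.
raw_gap = np.abs(magnitude_spectrum(loud) - magnitude_spectrum(quiet)).max()
norm_gap = np.abs(
    normalize_spectrum(magnitude_spectrum(loud))
    - normalize_spectrum(magnitude_spectrum(quiet))
).max()
print(raw_gap, norm_gap)
```

One design consequence worth checking: per-frame scaling also lifts near-silent frames (mostly background noise) to the same norm as loud ones, so normalizing per recording rather than per frame may be worth comparing.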

Now we don't know whether we should give up on this approach and find a new one, or spend more time on it (though there doesn't seem to be much time left).

Tuesday, May 22, 2007

some problems in the multiboost training

We modified the feature extractor, because it previously did not choose samples from the training segments randomly. After that, the performance dropped. We think it might be overfitting on short segments, since samples from short segments could be chosen with higher probability.
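Why short segments get over-represented can be shown with a little probability. This is a minimal sketch, assuming the sampler first picks a segment uniformly and then a frame uniformly inside it; the segment lengths and function names are hypothetical. Under that scheme a frame in a segment of length L has probability 1/(S·L) for S segments, whereas sampling uniformly over all frames gives every frame 1/total.

```python
import random
from collections import Counter

def frame_probs_segment_first(lengths):
    """P(a given frame) per segment: pick a segment uniformly, then a frame in it."""
    n = len(lengths)
    return [1.0 / (n * length) for length in lengths]

def frame_probs_uniform(lengths):
    """P(a given frame) per segment when every frame in the movie is equally likely."""
    total = sum(lengths)
    return [1.0 / total for _ in lengths]

def sample_uniform_frames(lengths, k, seed=0):
    """Draw k distinct (segment, frame) pairs uniformly over all frames."""
    all_frames = [(s, f) for s, length in enumerate(lengths) for f in range(length)]
    return random.Random(seed).sample(all_frames, k)

# Hypothetical lengths in frames: one short segment and one long one.
lengths = [10, 990]
print(frame_probs_segment_first(lengths))  # short-segment frames ~100x more likely
print(frame_probs_uniform(lengths))        # every frame equally likely
picks = sample_uniform_frames(lengths, 100)
print(Counter(s for s, _ in picks))        # how many picks landed in each segment
```

Sampling uniformly over frames keeps each segment's share proportional to its length, which avoids repeatedly drawing the same few frames from short segments.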

Now I am trying to figure out which theory this toolkit is based on. Because multi-class boosting has many variants, maybe we cannot define the classes the way we have.

Here are some papers related to the MultiBoost toolkit:
1. Aggregate Features and AdaBoost for Music Classification
http://www.iro.umontreal.ca/~casagran/docs/2006_ml_draft.pdf
"Classification with AdaBoost" part
2. a brief introduction to boosting
http://0rz.tw/b62Dn

Friday, May 11, 2007

"west side story" is labeled and some outcomes

"West Side Story" has been labeled by me, and some of the features have been extracted. We will train some models with them. Also, sutony has trained 2 models on the "Rent" features with 1000 iterations using the MultiBoost toolkit; the outcomes are below:

Error rate   Model 1   Model 2
class 1      43.5%     32.82%
class 2      57%       46.8%
class 3      41.38%    33.64%
class 4      58.48%    47.08%
overall      50.09%    40.09%

The result seems strange: the latter model is better than the first one in every class. We will try to figure out what happened.
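A quick arithmetic check on the numbers: for both models, the overall figure equals the unweighted mean of the four class errors (200.36 / 4 = 50.09 and 160.34 / 4 = 40.085, which rounds to 40.09). That suggests, though we haven't confirmed it in the toolkit, that "overall" is a per-class average rather than an error over all frames. The sketch below computes both; the function names are hypothetical, and since we don't have per-class frame counts here, the weighted version is only exercised with equal counts, where the two coincide.

```python
def macro_error(class_errors):
    """Unweighted mean of per-class error rates: every class counts equally."""
    return sum(class_errors) / len(class_errors)

def weighted_error(class_errors, class_counts):
    """Error over all frames: each class weighted by its number of test frames."""
    total = sum(class_counts)
    return sum(e * n for e, n in zip(class_errors, class_counts)) / total

# Per-class error rates (%) copied from the two runs above.
model_1 = [43.5, 57.0, 41.38, 58.48]
model_2 = [32.82, 46.8, 33.64, 47.08]
print(macro_error(model_1), macro_error(model_2))
print(weighted_error(model_1, [1, 1, 1, 1]))  # equal counts: same as the macro average
```

If the classes have very different frame counts, the frame-weighted error can differ noticeably from the reported overall number.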


multiboost toolkit:
http://www.iro.umontreal.ca/~casagran/multiboost.html#

Thursday, May 10, 2007

"Rent" is labeled!!!

We have labeled the musical film "Rent", and some frames were randomly sampled for feature extraction. We also fed them into the MultiBoost toolkit we found. We hope they will perform well.....God bless us!!

Tuesday, May 8, 2007

Annotation Difficulty

We ran into some problems while doing the manual annotation. Should we label:
  1. music/non-music?
  2. singing/non-singing?
  3. music with singing/music without singing/no music
  4. singing with music/singing without music/no singing
  5. ...... something else
One reason we're unsure is that we don't know whether "two AdaBoost" classifiers or "one MultiBoost" classifier would be better at identifying A, B, and C. Note: A here may include B, and vice versa.
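One observation that may help with the choice: if each frame is annotated with two independent yes/no flags (is there music? is there singing?), every scheme in the list above except the open-ended fifth can be derived afterwards, so the decision need not be made before labeling. This also lines up with the "two AdaBoost" option, one binary classifier per flag. A minimal sketch; the flag names and the function are hypothetical.

```python
def label_schemes(music: bool, singing: bool) -> dict:
    """Map two per-frame yes/no annotations onto label schemes 1-4 from the list."""
    scheme3 = ("music with singing" if singing else "music without singing") if music else "no music"
    scheme4 = ("singing with music" if music else "singing without music") if singing else "no singing"
    return {
        1: "music" if music else "non-music",
        2: "singing" if singing else "non-singing",
        3: scheme3,
        4: scheme4,
    }

# Show every combination of the two flags under each scheme.
for music in (True, False):
    for singing in (True, False):
        print(music, singing, label_schemes(music, singing))
```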

Another question: how should we handle the non-vocal gaps between two sung sentences? Are they music with singing, or pure music? Would that choice influence our training results?

Monday, May 7, 2007

How to Start: Let's Extract Musical Part

We've read a paper about speech/music discrimination, "Frame-Level Speech/Music Discrimination using AdaBoost." Because of AdaBoost's powerful feature selection ability, we think the algorithm introduced in this paper can be exploited for other classifications too, for example, pure music vs. music with vocals.
Therefore, our first step is to implement audio classification with AdaBoost. Some difficulties we might face:
  • Manual annotation - a very boring process
  • AdaBoost needs a long training time - which means we won't have many chances to test
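To make the feature-selection idea concrete, here is a minimal sketch of discrete AdaBoost for one binary frame-level task (labels +1/-1, e.g. music with vocals vs. pure music). Using single-feature threshold "stumps" as weak learners is our assumption, not necessarily what the paper does; the point is that each round picks exactly one feature, so the trained model doubles as a ranked feature selection. All names are illustrative.

```python
import numpy as np

def fit_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(X, j, thr, pol):
    """Apply one stump: +1 on one side of the threshold, -1 on the other."""
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, n_rounds=20):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform weights over frames
    model = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) / division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        pred = stump_predict(X, j, thr, pol)
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified frames
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def adaboost_predict(model, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, j, thr, pol) for alpha, j, thr, pol in model)
    return np.where(score >= 0, 1, -1)

# Toy data: 5 random "features", but only feature 2 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 2] > 0.3, 1, -1)
model = adaboost(X, y, n_rounds=10)
acc = (adaboost_predict(model, X) == y).mean()
print(acc, [j for _, j, _, _ in model])  # training accuracy, features chosen per round
```

The triple loop in `fit_stump` also illustrates the training-time concern: each round scans every threshold of every feature, so cost grows with frames × features × rounds.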