Tuesday, May 22, 2007

Some problems in multiboost training

We modified the feature extractor, because we had not been choosing samples from the training segments randomly before. After the change, the performance dropped. We think the models might be overfitting to short segments, because samples from short segments could end up being chosen with higher probability.
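As a sanity check on the sampling, something like the sketch below would draw the same number of frames from every labeled segment, so short segments cannot dominate the training set. This is only an illustration: the (label, n_frames) segment format is my assumption, not the extractor's real data structure.

  import random

  def sample_frames(segments, n_per_segment=50, seed=0):
      # Draw the same number of frame indices from every labeled segment,
      # so short segments are not over-represented in the training set.
      # `segments` is assumed to be a list of (label, n_frames) pairs.
      rng = random.Random(seed)
      samples = []
      for label, n_frames in segments:
          k = min(n_per_segment, n_frames)
          for idx in rng.sample(range(n_frames), k):
              samples.append((label, idx))
      return samples

  # e.g. two long segments and one very short one
  print(sample_frames([("music", 2000), ("speech", 1800), ("silence", 60)]))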

Now I am trying to figure out what kind of theory this toolkit is implemented with. Multi-class boosting has many variants, so maybe we cannot define the classes the way we have been doing. A rough sketch of the plain AdaBoost loop follows the paper list below.

Here are some papers related to the multiboost toolkit:
1. Aggregate Features and AdaBoost for Music Classification
http://www.iro.umontreal.ca/~casagran/docs/2006_ml_draft.pdf
"Classification with AdaBoost" part
2. A Brief Introduction to Boosting
http://0rz.tw/b62Dn
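Following up on the note above, here is a minimal sketch of the plain two-class (discrete) AdaBoost loop described in the second reference. Decision stumps, NumPy, and the data layout are my own choices for illustration; the multiboost toolkit's actual multi-class scheme may well differ.

  import numpy as np

  def adaboost_train(X, y, n_rounds=100):
      # Discrete AdaBoost with decision stumps; labels y must be in {-1, +1}.
      n, d = X.shape
      w = np.full(n, 1.0 / n)          # example weights
      stumps = []                      # (feature, threshold, sign, alpha)
      for _ in range(n_rounds):
          best = None
          # exhaustively pick the stump with the lowest weighted error
          for j in range(d):
              for thr in np.unique(X[:, j]):
                  for sign in (1, -1):
                      pred = sign * np.where(X[:, j] > thr, 1, -1)
                      err = w[pred != y].sum()
                      if best is None or err < best[0]:
                          best = (err, j, thr, sign, pred)
          err, j, thr, sign, pred = best
          alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))   # weak-learner weight
          w *= np.exp(-alpha * y * pred)                      # re-weight the examples
          w /= w.sum()
          stumps.append((j, thr, sign, alpha))
      return stumps

  def adaboost_predict(stumps, X):
      score = sum(a * s * np.where(X[:, j] > t, 1, -1) for j, t, s, a in stumps)
      return np.sign(score)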

Friday, May 11, 2007

"west side story" is labeled and some outcomes

"west side story" is labeled by me, and some the features are extracted. we will train some models built by them. Also, sutony has trained 2 models by "rent" features with 1000 iterations using multiboost toolkit, the outcome is as below:

Error (first model)
class 1: 43.5%
class 2: 57%
class 3: 41.38%
class 4: 58.48%
overall: 50.09%

Error (second model)
class 1: 32.82%
class 2: 46.8%
class 3: 33.64%
class 4: 47.08%
overall: 40.09%

The result seems strange: the second model is better than the first in all classes. We will try to figure out what happened.
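One thing I noticed while looking at the numbers: both "overall" values match the mean of the four per-class errors (up to rounding), so "overall" seems to be a macro-average rather than a frame-weighted error. To re-check the toolkit's output ourselves, a small helper like the one below could recompute both figures; the 0-based class indices are just an assumption about how we would store the labels.

  def error_rates(true_labels, predicted_labels, n_classes=4):
      # Per-class error plus macro-averaged and frame-weighted overall error.
      wrong = [0] * n_classes
      total = [0] * n_classes
      for t, p in zip(true_labels, predicted_labels):
          total[t] += 1
          wrong[t] += (t != p)
      per_class = [w / max(n, 1) for w, n in zip(wrong, total)]
      macro_overall = sum(per_class) / n_classes
      weighted_overall = sum(wrong) / len(true_labels)
      return per_class, macro_overall, weighted_overall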


multiboost toolkit:
http://www.iro.umontreal.ca/~casagran/multiboost.html#

Thursday, May 10, 2007

"Rent" is labeled!!!

We have labeled the musical film "Rent", and some frames were taken out at random for feature extraction. We also fed them to the multiboost toolkit we found. Hope they give a good performance..... God bless us!!

Tuesday, May 8, 2007

Annotation Difficulty

We face some problems when doing manual annotation. Should we label
  1. music/non-music?
  2. singing/non-singing?
  3. music with singing/music without singing/no music
  4. singing with music/singing without music/no singing
  5. ...... something else
One reason we are unsure is that we don't know whether "two AdaBoost classifiers" or "one MultiBoost classifier" would be better at identifying A, B, and C. Note: class A here may overlap with class B, and vice versa.
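To make the difference concrete, the two setups could map a per-frame annotation to training labels roughly like this; the boolean annotation fields and function names are made up for illustration only.

  # Raw annotation per frame: whether music is present and whether singing is present.

  # Option "two AdaBoost": two separate binary problems (schemes 1 and 2 above).
  def labels_two_adaboost(has_music, has_singing):
      music_label = 1 if has_music else -1       # classifier 1: music / non-music
      singing_label = 1 if has_singing else -1   # classifier 2: singing / non-singing
      return music_label, singing_label

  # Option "one MultiBoost": a single 3-class problem (scheme 3 above).
  def label_one_multiboost(has_music, has_singing):
      if not has_music:
          return 0                               # no music
      return 1 if has_singing else 2             # music with singing / music without singing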

Also, another question: how should we handle the non-vocal parts between two sung phrases? Are they music with singing or pure music? Would it influence our training results?

Monday, May 7, 2007

How to Start: Let's Extract the Musical Parts

We've read a paper about speech/music separation: "Frame-Level Speech/Music Discrimination using AdaBoost." Because of AdaBoost's powerful feature selection ability, we think the algorithm introduced in this paper can be exploited for other classifications too, for example pure music versus music with vocals.
Therefore, the first step we're going to take is implementing audio classification with AdaBoost (a rough sketch follows the list below). Some difficulties we might face:
  • Manual annotation - a very boring process
  • AdaBoost needs a lot of training time - which means we don't have many chances to test
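The rough sketch mentioned above, using scikit-learn's AdaBoost with its default decision-stump weak learner rather than the paper's exact implementation. The feature matrix and labels here are random placeholders standing in for real frame-level features (0 = pure music, 1 = music with vocal).

  import numpy as np
  from sklearn.ensemble import AdaBoostClassifier

  # Placeholder frame-level data: one row of features per audio frame.
  # Real spectral/temporal features would replace this random matrix.
  rng = np.random.default_rng(0)
  X = rng.random((2000, 20))
  y = rng.integers(0, 2, size=2000)

  # The default weak learner is a depth-1 decision tree (a stump), so each
  # boosting round effectively selects one feature and one threshold.
  clf = AdaBoostClassifier(n_estimators=200)
  clf.fit(X[:1500], y[:1500])
  print("held-out frame accuracy:", clf.score(X[1500:], y[1500:]))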

Sunday, May 6, 2007

Our Goals

What we're interested in is the analysis of music video (or opera) structure. The primary steps we might adopt are as follows:
  • Audio
  1. Separate the non-music (speech, silence) and music (pure-music, vocal) parts.
  2. Extract the vocals from the music parts obtained in step 1.
  3. Identify roles by voice.
  • Video
  1. Face (role) detection
  2. Identifying roles by face (costume) recognition

After getting the clues from audio and video, we would try to analyze the social relationships between the roles

  • Social Relationship
  1. Who is the leading role
  2. How much relevance there is between each pair of roles

If the goals above can be achieved successfully, some applications can be built:

  • Application
  1. Give users only the fragments featuring the actors/actresses they are interested in
  2. A graph of the relationships between the roles
  3. A simple script: the program could automatically add marks for the different roles, making it easier for the audience to follow the story.

Brand New Start

  • Yes, this article means our blog has just been opened.
  • This blog offers a space for the members of the class "Multimedia Analysis and Indexing" to exchange ideas. Of course, whatever we're working on and our new ideas will be posted here immediately. Any advice or suggestions are welcome.
  • We're a group of two people, that is, Yu-Ching Lin and me (Ya-Fan Su). We're both EE seniors, and neither of us is single at the moment. Therefore, I have to apologize to those girls who are interested in us.