How to make good audio
I took songs sung by my favorite singer and removed the accompaniment so that only the pure vocals remained. The total length was more than two hours, and noise reduction was also enabled. I don't understand why the training results are so poor.
-
Remember that any time you train on audio containing a mix of sounds, the results won't be as good; in fact, they may end up horrible. With sound, there are several phenomena to consider. Sound is simply the transmission of vibrations through a medium (usually air, but it can also be ground, water, or any other substance), and it cannot travel through a vacuum or empty space. There are also many kinds of vibration: most are rhythmic, corresponding to a full wave-like motion, while some only comprise half of that when they are initially formed and generate a less-than-perfect waveform of their own (the compression doesn't exactly match the rarefaction, or trough). Both kinds are inherent in human speech. Most consonant sounds have a non-rhythmic onset, even when followed by a rhythmic sustain.
Now consider that every discernible TONAL sound, like that sustain, is made up of FORMANTS, a FUNDAMENTAL, and PARTIALS. The formants shape the transition from the onset into the sustain; the fundamental really is the "note" or "key" being played (as in music); and the partials are less audible but extremely distinctive tones, above and below the fundamental in frequency, with both rhythmic and non-rhythmic characteristics, that are unique to the physical makeup of whatever is producing the sound. Two violins might play the same note yet sound slightly different, mostly because they are not built from exactly the same materials in exactly the same way. The wood will differ, as will the strings, the bow, and the player pressed against the instrument. Tuning also has an effect, since tuning is rarely absolutely perfect, so the two will be slightly off from each other. Their timing when playing together will be slightly off as well, depending on the skill of the players, which shifts their sounds to different positions in the waveform they create together...
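To make the fundamental/partial idea concrete, here is a minimal sketch (plain NumPy, with made-up partial amplitudes, not measurements of real instruments) that builds two "instruments" playing the same 220 Hz note; the fundamentals match, but the different partial balance is what makes them sound distinct:

```python
import numpy as np

SR = 44100                      # sample rate in Hz
t = np.arange(SR) / SR          # one second of time samples
f0 = 220.0                      # shared fundamental ("note" being played)

def tone(f0, partial_amps):
    """Sum a fundamental and its partials (integer multiples of f0).

    partial_amps[k] is the relative amplitude of the (k+1)-th harmonic;
    these numbers are illustrative only.
    """
    y = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
            for k, a in enumerate(partial_amps))
    return y / np.max(np.abs(y))  # normalize to [-1, 1]

# Two "violins": same fundamental, different partial balance.
violin_a = tone(f0, [1.0, 0.50, 0.30, 0.15, 0.05])
violin_b = tone(f0, [1.0, 0.20, 0.45, 0.10, 0.25])

# The note is identical, but the waveforms differ,
# which is exactly the detail a voice model needs to capture.
print(np.allclose(violin_a, violin_b))  # False
```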
The more sounds you have in a waveform capture, the less faithful your AI training will be, because the partials, which carry the subtle but most discriminating frequencies of a vocal, are heavily affected. If some are made louder and others are removed by your vocal remover, you are altering the very vocal characteristics the AI uses to model the sound.
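As a rough way to see that damage, you could compare the average spectrum of the separated vocal against a known-clean recording of the same voice. This is only a sketch (the file paths are placeholders, and the band boundaries are arbitrary), but large deviations in the upper bands hint that the separator has altered or stripped partials:

```python
import numpy as np
import librosa

# Placeholder paths: a clean solo recording and the output of a vocal remover.
clean, sr = librosa.load("clean_speech.wav", sr=None, mono=True)
separated, _ = librosa.load("separated_vocal.wav", sr=sr, mono=True)

def avg_spectrum(y):
    """Average magnitude spectrum across time (one value per frequency bin)."""
    return np.abs(librosa.stft(y, n_fft=2048)).mean(axis=1)

spec_clean = avg_spectrum(clean)
spec_sep = avg_spectrum(separated)
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)

# Ratio in dB per band: the higher bands are where many partials live.
ratio_db = 20 * np.log10((spec_sep + 1e-9) / (spec_clean + 1e-9))
for lo, hi in [(0, 1000), (1000, 4000), (4000, 12000)]:
    band = (freqs >= lo) & (freqs < hi)
    print(f"{lo:>5}-{hi:<5} Hz: mean deviation {ratio_db[band].mean():+.1f} dB")
```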
Also, remember that singing and speaking are two different styles of speech. The AI program is designed for speaking, not necessarily for singing, because singing is more like playing a musical instrument than talking. If you played a tune on a piano very staccato, with very short notes and short holds, you'd be doing something closer to speech. Singing changes the wave shaping that the AI would use to mold the model.
Bottom line: find interviews and similar recordings where the artist you want to model is speaking, and use those to build your model.
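If it helps, here is a minimal preprocessing sketch for turning a speech-only recording into training clips, assuming librosa and soundfile are installed. The 44.1 kHz target rate, the silence threshold, and the one-second minimum are assumptions; adjust them to whatever your training tool expects:

```python
import os
import librosa
import soundfile as sf

SRC = "artist_interview.wav"    # placeholder: a speech-only recording
TARGET_SR = 44100               # assumed target rate; match your trainer's docs
TOP_DB = 30                     # silence threshold in dB below peak (assumption)

os.makedirs("dataset", exist_ok=True)

# Load as mono and resample to the target rate.
y, sr = librosa.load(SRC, sr=TARGET_SR, mono=True)

# Split on silence so each clip is a continuous stretch of speech.
intervals = librosa.effects.split(y, top_db=TOP_DB)

clip_count = 0
for start, end in intervals:
    clip = y[start:end]
    # Skip fragments shorter than one second; they carry little voice detail.
    if len(clip) < TARGET_SR:
        continue
    sf.write(f"dataset/clip_{clip_count:04d}.wav", clip, TARGET_SR)
    clip_count += 1

print(f"wrote {clip_count} clips")
```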