iClone 6
Issue 457
Enable Lip Sync Options & Reduce Lip Keys for ALL iClone iAvatars
Known multi-year issues:
Even with crystal-clear, well-normalized 44 kHz 16-bit mono vocal WAV files:
1. The iClone lip sync engine produces 50 to 70% REDUNDANT viseme keys
2. The iClone lip sync engine fails to line up at least 50% of the usable keys

iClone's lip sync inaccuracies are probably acceptable (or indiscernible) to hobby movie makers.
To discerning users, experienced animation artists,
or those who've used other prosumer lip sync tools (Mimic, Unity plugins, etc.),
iClone's auto-generated lip keys are at best 25% accurate,
which means tedious manual adjustment and fine-tuning.
Or you may as well start from scratch: ERASE the whole auto-generated Viseme>Lips track
and then MANUALLY match the little viseme keys, patiently, one by one, to the audio file.

RL is aware of the issue and has attempted to fix it.
iClone 6 introduced fixes:
- Reduce Lip Keys (activated only for CC)
- Lip Sync Options (activated only for CC)

RL also finally added audio scrubbing in a recent update.
All of these helped. However,
deep into the iClone 6 dev cycle,
Reduce Lip Keys and Lip Sync Options have yet to be enabled GLOBALLY for ALL iAVATARS!

G5, G6, Genesis 1, Genesis 2, Genesis 3, Maya import, Max import, Blender import, Poser import, and Mixamo import can all benefit from iClone's progress in lip sync.
After all, iClone is marketed as a MULTI-CHARACTER-PLATFORM-SUPPORTING character animation tool.

Thank you for listening
and thank you for making iClone a more EFFICIENT animation tool for discerning animators.
OS: Windows 10
Submitted by Bellatrix
I just noticed that the links in my comments have end-of-sentence periods added, so they will not work. Here are the corrected links:

Mimic manual: 

Articles and video about JALI:
I have written about this issue MANY TIMES IN THE FORUM.

a) The audio scrubbing in iClone 6 is a godsend.  Thank you, Reallusion.

b) In iC6, I have noticed the "Ah" viseme tends to be at 100%, maybe every time, resulting in a sudden and jarring "jaw drop" that is way out of proportion with the rest of the speech.  I noticed that when using good-quality recorded speech from a human female.
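As a rough illustration of the kind of correction this calls for, here is a minimal Python sketch that caps the strength of one viseme while leaving the others alone. It is not iClone's API: the (time, viseme, strength) tuples, the soften_peaks function and the 0.7 cap are all assumptions made up for the example.

```python
def soften_peaks(keys, viseme="Ah", cap=0.7):
    """Return a new list of (time, viseme, strength) keys with one viseme capped.

    `keys` is a hypothetical stand-in for a lip track; strengths run 0.0 to 1.0.
    """
    return [
        (time, name, min(strength, cap) if name == viseme else strength)
        for (time, name, strength) in keys
    ]


# Example: only the full-strength "Ah" is pulled down to the cap.
track = [(0.0, "Ah", 1.0), (0.2, "Oh", 0.9), (0.4, "Ah", 0.5)]
softened = soften_peaks(track)
```

In practice the cap would probably need to be tuned per voice and per character.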

c) I agree about the excess number of visemes.

d) Sort of off-topic, and a bit minor, but I've noticed the waveforms lag the actual audio by about a quarter-second or so.  Nothing major, but I should write it up someday.

Closing - This is a very vote-worthy topic.  I'll have to review my list of 10 votes and see if I should move one of my votes over here.
I found some recent research that might inspire the development team: 

There is video that goes with it:

And some more details here:
I have done quite a bit of lip sync with iClone 6. It is tedious and needs quite a bit of refinement. Most of the time I find that I am reducing the number of visemes and reusing the same ones just to get a somewhat believable lip movement.
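The manual clean-up described here could be sketched roughly as below. This is only an illustration of the idea, not how iClone's Reduce Lip Keys works internally: the VisemeKey class, the reduce_keys function, the viseme names and the 0.08 second minimum gap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisemeKey:
    time: float      # seconds from the start of the clip
    viseme: str      # illustrative names such as "Ah", "Oh", "B_M_P"
    strength: float  # 0.0 to 1.0


def reduce_keys(keys, min_gap=0.08):
    """Drop keys that repeat the previous kept viseme or follow it too closely."""
    reduced = []
    for key in sorted(keys, key=lambda k: k.time):
        if reduced:
            prev = reduced[-1]
            if key.viseme == prev.viseme:
                continue  # same mouth shape again: the repeat adds nothing
            if key.time - prev.time < min_gap:
                continue  # too close to the previous key to read as a separate shape
        reduced.append(key)
    return reduced


# Example: six auto-generated keys collapse to three.
raw = [
    VisemeKey(0.00, "Ah", 0.8),
    VisemeKey(0.05, "Ah", 0.6),
    VisemeKey(0.06, "Oh", 0.4),
    VisemeKey(0.20, "EE", 0.7),
    VisemeKey(0.30, "EE", 0.7),
    VisemeKey(0.45, "B_M_P", 1.0),
]
cleaned = reduce_keys(raw)
```

A real reducer would also have to decide which of two crowded keys matters more to the audio; this sketch simply keeps the earlier one.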

I think the key is to add more bone and morph control of the face based on the sound files. This is something that other software developers accomplished years ago.

I don't know why this doesn't get more votes, because the technology is there to do things better. Mimic, mentioned above, is a good example. I have used Mimic from the time that it was a Poser plugin and produced by a company called LipSync. I have written this many times in forum posts, but might as well repeat it here: one of the reasons Mimic works so well is that it uses a script of the dialog text to support the viseme assignment. To quote from the Mimic 3 manual:

"The Text area of the Session Manager is where you load plain ASCII text (.txt on a Windows PC) files. Alternatively, you may click in the Text Display field and then type in your desired text. Your text must quote the sound file verbatim. This step is optional, but typically improves results. Mimic analyzes this text file (if present in conjunction with the sound file) to ensure maximum phoneme accuracy. It does this by reading each syllable, applying standard English pronunciation rules, and selecting the best available phoneme for the job."
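To make the idea of text-assisted viseme selection concrete, here is a toy Python sketch. It is emphatically not Mimic's algorithm or rule set: the RULES table, the viseme names and the text_to_visemes function are assumptions invented for illustration, and real English pronunciation needs far more than a few spelling rules.

```python
# Toy spelling-to-viseme rules. Digraphs come first so they win over single letters.
# Hypothetical table for illustration only.
RULES = [
    ("th", "Th"), ("ch", "Ch_J"), ("sh", "Ch_J"), ("ee", "EE"), ("oo", "W_OO"),
    ("a", "Ah"), ("e", "EE"), ("i", "Ih"), ("o", "Oh"), ("u", "W_OO"),
    ("b", "B_M_P"), ("m", "B_M_P"), ("p", "B_M_P"),
    ("f", "F_V"), ("v", "F_V"),
    ("s", "S_Z"), ("z", "S_Z"),
    ("t", "T_L_D_N"), ("l", "T_L_D_N"), ("d", "T_L_D_N"), ("n", "T_L_D_N"),
    ("k", "K_G_H_NG"), ("g", "K_G_H_NG"), ("h", "K_G_H_NG"),
    ("r", "R"), ("w", "W_OO"),
]


def text_to_visemes(text):
    """Turn a dialog script into an expected viseme sequence, merging repeats."""
    visemes = []
    lowered = text.lower()
    i = 0
    while i < len(lowered):
        for spelling, viseme in RULES:
            if lowered.startswith(spelling, i):
                if not visemes or visemes[-1] != viseme:
                    visemes.append(viseme)
                i += len(spelling)
                break
        else:
            i += 1  # skip spaces, punctuation and letters with no rule
    return visemes


expected = text_to_visemes("See the map")
```

The point of such a sequence is that the audio analysis no longer has to guess which mouth shapes occur, only when they occur, which is presumably why supplying the script improves results.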

The manual is actually quite thorough and may serve as a source of inspiration:

There is no shame in borrowing good ideas. 