Control of intonation and timing is difficult in diphone concatenation synthesis, especially when one aim of the synthesis is to capture and present the voice characteristics of a specific talker. One method for controlling F0 and duration in diphone synthesis, the Pitch Synchronous Overlap Add (PSOLA) technique (e.g., Hamon et al., 1989), has been shown to produce high quality copy synthesis. PSOLA is implemented by manipulating windowed epochs of the time signal. In voiced speech, each epoch is derived by applying a Hanning window that is centered on the onset of a pitch period and whose length is twice that of the pitch period.
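The epoch windowing described above can be illustrated with a minimal sketch. It assumes a constant pitch period and pitch marks that are already known; the function names (`hann_window`, `extract_epoch`, `overlap_add`), the test signal, and the period of 80 samples are hypothetical and chosen only for illustration, not taken from the cited work.

```python
import math

def hann_window(length):
    """Periodic Hanning window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / length) for k in range(length)]

def extract_epoch(signal, mark, period):
    """Window two pitch periods of the signal, centered on the pitch mark."""
    window = hann_window(2 * period)
    start = mark - period
    epoch = []
    for k, w in enumerate(window):
        n = start + k
        sample = signal[n] if 0 <= n < len(signal) else 0.0  # zero-pad at the edges
        epoch.append(w * sample)
    return epoch

def overlap_add(epochs, marks, length):
    """Sum the epochs into an output buffer, each centered on its (new) mark."""
    out = [0.0] * length
    for epoch, mark in zip(epochs, marks):
        start = mark - len(epoch) // 2
        for k, value in enumerate(epoch):
            n = start + k
            if 0 <= n < length:
                out[n] += value
    return out

# Assumed test signal: 800 samples with a pitch period of 80 samples.
period = 80
signal = [math.sin(2 * math.pi * n / period) + 0.3 * math.sin(4 * math.pi * n / period)
          for n in range(800)]
marks = list(range(0, 800, period))
epochs = [extract_epoch(signal, m, period) for m in marks]

# Unmodified spacing: adjacent Hanning windows sum to one, so the interior
# of the signal is reconstructed.
copy = overlap_add(epochs, marks, len(signal))
max_err = max(abs(copy[n] - signal[n]) for n in range(period, marks[-1]))

# Raising F0: place the same epochs at a shorter spacing (64 samples).
new_period = 64
new_marks = [i * new_period for i in range(len(marks))]
raised = overlap_add(epochs, new_marks, new_marks[-1] + period)
```

With the original mark spacing, the overlap-add returns the input signal in the interior, which is the copy synthesis case. Placing the epochs at a shorter spacing raises F0; in this simplified sketch it also shortens the signal, so a fuller implementation would repeat or drop epochs to control duration independently.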
However, the PSOLA method is sensitive to pitch tracking errors and can introduce distortion when F0 is changed by too great a degree. Moreover, PSOLA has been found to produce greater distortion than residual excited LPC methods of F0 control, at least under some conditions (e.g., Macchi et al., 1993). Here, we derive a hybrid LPC and time-domain pitch control method that uses residual excited LPC reconstruction for voiced segments only. In diphone synthesis applications, voiceless segments are processed as per the PSOLA method.
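As background for the residual excited LPC reconstruction mentioned above, the following is a generic sketch of LPC analysis and resynthesis, not the hybrid method itself. It assumes the autocorrelation method with a Levinson-Durbin recursion, a predictor order of 4, and a synthetic test frame; all function names and parameter values are hypothetical. How the residual is manipulated to change F0 is not specified in this passage and is not shown.

```python
import math
import random

def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order], where x[n] is predicted by sum a[i]*x[n-i]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(x, coeffs):
    """Residual e[n] = x[n] - sum a[i]*x[n-i], with zero initial state."""
    return [x[n] - sum(c * x[n - i] for i, c in enumerate(coeffs, start=1) if n - i >= 0)
            for n in range(len(x))]

def synthesis_filter(e, coeffs):
    """All-pole reconstruction y[n] = e[n] + sum a[i]*y[n-i], with zero initial state."""
    y = []
    for n in range(len(e)):
        past = sum(c * y[n - i] for i, c in enumerate(coeffs, start=1) if n - i >= 0)
        y.append(e[n] + past)
    return y

# Assumed test frame: two harmonics plus a little seeded noise.
rng = random.Random(0)
frame = [math.sin(2 * math.pi * n / 80) + 0.3 * math.sin(4 * math.pi * n / 80)
         + rng.uniform(-0.1, 0.1) for n in range(400)]

order = 4
coeffs = levinson_durbin(autocorrelation(frame, order), order)
residual = inverse_filter(frame, coeffs)
rebuilt = synthesis_filter(residual, coeffs)

signal_energy = sum(v * v for v in frame)
residual_energy = sum(v * v for v in residual)
rebuild_err = max(abs(a - b) for a, b in zip(frame, rebuilt))
```

Exciting the all-pole filter with the unmodified residual returns the original frame, and the residual carries less energy than the signal because the predictor has removed the spectral envelope.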