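(1996), 1, 104-125.
This paper reviews devices for measuring articulation, such as EGG, EMA, and EPG, and methods for estimating the vocal tract shape from speech. It then describes a formant-based method for displaying Animated Vocal Tract Profiles.
I. Introduction
Digital media such as Video-CD and TV carry coded speech, whose quality is judged by its intelligibility and naturalness.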
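Perceptual coders exploit auditory masking, and MPEG coding is used in Video-CD and HDTV (Gersho, 1994). For speech, the 64 kbps of PCM (8 kHz sampling, 8 bits per sample) can be reduced to 8 kbps or 4.8 kbps with coders such as VSELP, although naturalness degrades at 4.8 kbps. Estimating articulatory movements from speech (articulatory movement estimation) is also central to articulatory speech synthesizers (Schroeter and Sondhi, 1994). Estimated articulatory information can also be shown to the speaker as visual feedback.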
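The bit rates quoted above follow from simple arithmetic. The Python sketch below uses hypothetical function names. It reproduces the 64 kbps PCM rate from 8 kHz sampling at 8 bits per sample, and the compression ratios that 8 kbps and 4.8 kbps coders imply relative to it.

```python
def pcm_bit_rate_kbps(sample_rate_hz: float, bits_per_sample: int) -> float:
    """Bit rate of uncompressed PCM speech, in kilobits per second."""
    return sample_rate_hz * bits_per_sample / 1000.0


def compression_ratio(source_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded bit stream is than the source."""
    return source_kbps / coded_kbps


pcm_kbps = pcm_bit_rate_kbps(8000, 8)  # 8 kHz x 8 bit = 64 kbps
for coded_kbps in (8.0, 4.8):
    ratio = compression_ratio(pcm_kbps, coded_kbps)
    print(f"{pcm_kbps:.0f} kbps -> {coded_kbps} kbps: {ratio:.1f}:1")
```

The loop reports ratios of 8.0:1 and 13.3:1. These numbers are only bit-rate bookkeeping and say nothing about the perceived quality of either coder.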
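II.
1.
Schroeder (1967) determined the geometry of the human vocal tract by acoustic measurements, and Mermelstein (1967) determined the vocal tract shape from measured formant frequencies. Paige and Zue (1970) computed vocal tract area functions, and Atal and Hanauer (1971) analyzed and synthesized speech by linear prediction. Wakita (1973) estimated the vocal tract shape directly by inverse filtering of the acoustic speech waveform. <Figure 1> is reproduced from Schroeter and Sondhi (1994).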
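<Figure 1> (Source: Schroeter and Sondhi, 1994)
This approach assumes an all-pole model of the vocal tract excited by a pulse (Wakita, 1973). <Figure 2> shows the system of Crichton and Fallside (1974), who applied a linear prediction model of speech production to deaf speech training. Other work learns the acoustic-to-articulatory mapping (Acoustic-Articulatory Mapping) with neural networks (Rahim et al., 1993; Guenther, 1994). <Figure 3> shows the approach of Rahim et al. (1993) (18 ...).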
<Figure 2> (Source: Crichton and Fallside, 1974)
<Figure 3> (Source: Rahim et al., 1993)
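Such estimates are less reliable in transition states and for sounds with anti-resonances, which an all-pole model does not represent. Atal et al. (1978) studied the inversion of the articulatory-to-acoustic transformation with a computer sorting technique.
2.
Devices that measure articulation directly include the EPG (electropalatograph) and the EMA (electromagnetic articulograph). X-ray techniques include conventional X-ray, the X-ray microbeam, and X-ray video. <Figure 4> shows the Wisconsin X-ray microbeam system. MRI has also been used.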
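<Figure 4> X-ray microbeam (Source: http://www.biostat.wisc.edu/ubeam/images/shematic.gif)
EPG (Electropalatograph)
EPG records the pattern of contact between the tongue and an artificial palate fitted with electrodes (... tongue dorsum ...). <Figure 5> shows the Reading University EPG Model 3, which has 62 electrodes. <Figure 6> shows an EPG contact pattern for /s/ (Javkin et al., 1993).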
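An EPG frame is essentially a binary contact map over the palate electrodes. The Python sketch below assumes the 62 electrodes are arranged in 8 rows: 6 electrodes in the front row and 8 in each of the others. This is the usual Reading layout, but here it is an assumption. The /s/-like frame is invented for illustration and is not taken from Javkin et al. (1993). The function names are hypothetical.

```python
from typing import List

# Assumed layout: 8 rows from front to back, 6 electrodes in row 1, 8 in rows 2-8 (62 total).
ROW_SIZES = [6] + [8] * 7

Frame = List[List[int]]  # 1 = tongue contact at that electrode, 0 = no contact


def validate(frame: Frame) -> None:
    """Reject frames whose row lengths do not match the assumed 62-electrode layout."""
    if [len(row) for row in frame] != ROW_SIZES:
        raise ValueError("frame does not match the 62-electrode layout")


def row_contacts(frame: Frame) -> List[int]:
    """Number of contacted electrodes in each row, front to back."""
    validate(frame)
    return [sum(row) for row in frame]


def contact_percentage(frame: Frame) -> float:
    """Percentage of all electrodes that are contacted in this frame."""
    validate(frame)
    return 100.0 * sum(row_contacts(frame)) / sum(ROW_SIZES)


# Hypothetical /s/-like frame: contact along the sides of the palate with a
# narrow central groove in the front rows (illustrative, not measured data).
s_frame: Frame = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
]

print(row_contacts(s_frame), f"{contact_percentage(s_frame):.1f}%")
```

Per-row counts like these are one simple way to summarize where the tongue touches the palate over time. Clinical EPG software uses its own indices, which this sketch does not reproduce.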
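<Figure 5> EPG Model 3 (Source: http://midwitch.reading.ac.uk/research/speechlab/epg)
<Figure 6> EPG pattern for /s/ (Source: Javkin et al., 1993)
EMA (Electromagnetic Articulograph)
Whereas EPG records only tongue-palate contact, EMA tracks points on the articulators. <Figure 7> shows a three-transmitter EMA system (3 ... 5 ...). The position of each receiver coil attached to an articulator is computed from the voltage induced in it by the transmitters' magnetic fields.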
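The EMA principle can be sketched in two steps under simplifying assumptions:

1. Convert each induced voltage into a transmitter-receiver distance.
2. Locate the receiver in the midsagittal plane from the three distances.

The sketch assumes a calibration of the form V = k / r^3 (a dipole-field approximation) and ignores receiver tilt. The transmitter positions, the constant k and the function names are hypothetical. This is not the calibration procedure of Perkell et al. (1992).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def distance_from_voltage(voltage: float, k: float) -> float:
    """Invert the assumed calibration V = k / r**3 to get the distance r."""
    return (k / voltage) ** (1.0 / 3.0)


def trilaterate(transmitters: Sequence[Point], distances: Sequence[float]) -> Point:
    """Receiver position in the plane from its distances to three transmitters.

    Subtracting the circle equation of transmitter 1 from those of transmitters
    2 and 3 leaves two linear equations in (x, y), solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = transmitters
    r1, r2, r3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("transmitters must not be collinear")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)


# Hypothetical setup: transmitters at three corners (cm), receiver at (3, 4).
tx = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
k = 1000.0  # hypothetical calibration constant
volts = [k / math.dist(t, true_pos) ** 3 for t in tx]
est = trilaterate(tx, [distance_from_voltage(v, k) for v in volts])
print(est)
```

Because the distance grows only as the cube root of 1/V, small voltage errors translate into relatively small distance errors. The linearized solve does not average out noise, however. A real system would use more careful calibration.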
<Figure 7> Three-transmitter EMA (Source: Perkell et al., 1992)
Other instruments used to measure speech production include:
- EGG (electroglottograph)
- Airflow transducer, pneumotachograph
- EPG (electropalatograph)
(... 1982).
<Figure 8> Physiologia (Source: http://www.lpl.univ-aix.fr/valorisation/physiologia)
III. LPC-Formant Based Vocal Tract Profile Graphics
The LPC (Linear Prediction Coefficient)-Formant Based Vocal Tract Profile Estimation method (... 1992) estimates the vocal tract profile, including the opening and the rounding degree, and displays it as an animated trajectory.
1. LPC
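LPC is based on the speech production model of <Figure 9>. An excitation u(n), scaled by the gain G, drives a time-varying digital filter whose output is the speech signal s(n). For voiced sounds u(n) is a periodic pulse train, and for unvoiced sounds it is random noise (white noise). The filter is modeled as an all-pole system:

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{i=1}^{p} a_i z^{-i}} \qquad (1)$$

Here G is the gain, $a_i$ are the predictor coefficients, and p is the order of the model. The coefficients $a_i$ and the gain G are determined by minimizing the mean squared prediction error (minimum mean squared error).
2. LPC
The linear prediction method (LPC) treats the speech signal as the output of an auto-regressive (AR) model. By Eq. (1), each sample is predicted from the previous p samples: $s(n) = \sum_{i=1}^{p} a_i s(n-i) + G\,u(n)$.
3. Formant
Formants are the resonance frequencies of the vocal tract. For a vocal tract about 17 cm long, 3 to 4 formants lie below 3 kHz and 4 to 5 lie below 5 kHz. Formants can be picked from the peaks of the LPC spectrum, but two problems arise. Some LPC peaks are not formants (non-formant peaks; McCandless, 1974). Two close formants, such as the first and the second, can also merge into a single LPC peak.
4.
The vocal tract profile is the outline of the vocal tract in the midsagittal plane, here represented with a resolution of about 1 mm.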
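Subsections 1 to 3 can be illustrated with a minimal Python sketch that assumes the autocorrelation method. The Levinson-Durbin recursion solves for the $a_i$ and G of Eq. (1), yielding the reflection coefficients along the way. Formant candidates are then taken as local maxima of the LPC spectrum. This plain peak picking is not McCandless's (1974) algorithm, so it inherits the non-formant-peak and merging problems noted above. The function names and the test signal, a single resonance at 1000 Hz, are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation R(k), k = 0..max_lag, of a finite segment (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], p: int) -> Tuple[List[float], List[float], float]:
    """Solve the normal equations for a_1..a_p in H(z) = G / (1 - sum a_i z^-i).

    Returns (a, k, gain): predictor coefficients, reflection coefficients and
    G = sqrt(minimum mean squared prediction error).
    """
    a = [0.0] * (p + 1)  # a[0] unused so that a[i] matches a_i
    k = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        ki = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = ki
        for j in range(1, i):
            a[j] = prev[j] - ki * prev[i - j]
        k[i] = ki
        err *= 1.0 - ki * ki
    return a[1:], k[1:], math.sqrt(err)


def lpc_peaks_hz(a: Sequence[float], gain: float, fs: float, n_points: int = 1001) -> List[float]:
    """Frequencies (0..fs/2) of local maxima of the LPC log-magnitude spectrum."""
    freqs = [0.5 * fs * m / (n_points - 1) for m in range(n_points)]
    mags = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        denom = 1.0 - sum(ai * cmath.exp(-1j * w * (i + 1)) for i, ai in enumerate(a))
        mags.append(20.0 * math.log10(gain / abs(denom)))
    return [freqs[m] for m in range(1, n_points - 1)
            if mags[m] > mags[m - 1] and mags[m] >= mags[m + 1]]


# Hypothetical test signal: impulse response of one resonance (1000 Hz, 100 Hz bandwidth).
fs = 10000.0
radius = math.exp(-math.pi * 100.0 / fs)
theta = 2.0 * math.pi * 1000.0 / fs
true_a = [2.0 * radius * math.cos(theta), -radius ** 2]
s: List[float] = []
for n in range(1000):
    val = 1.0 if n == 0 else 0.0
    for i, ai in enumerate(true_a):
        if n - 1 - i >= 0:
            val += ai * s[n - 1 - i]
    s.append(val)

a, k, g = levinson_durbin(autocorrelation(s, 2), 2)
print([round(v, 4) for v in a], round(g, 4), lpc_peaks_hz(a, g, fs))
```

For this all-pole test signal the recursion recovers the true coefficients and a gain of 1. The single spectral peak falls at about 1000 Hz. With real speech a higher order is used (order 15 in the procedure above), and the reflection coefficients k feed the area computation.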
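The estimation proceeds in four steps, (1) to (4), which involve a 15-section model (step 2) and X-ray data (step 3). A 15th-order LPC analysis gives 15 sections, so the vocal tract is modeled as a pipe of 15 sections with a total length of 17 cm. For the X-ray data, the PARAFAC analysis of Ladefoged et al. (1978) is used. Its 17 sections (... 5 ... 10 ...) are measured along grid lines. <Figure 10> shows the 17-section grid of Ladefoged et al. (1978).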
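Two numbers in this step can be checked with an idealized lossless uniform tube, closed at the glottis and open at the lips. This is an assumption, since a real vocal tract is not uniform. The tube's resonances are F_n = (2n - 1)c / 4L. A lossless tube of N sections and length L matches an order-N LPC model when each section's one-way delay is half a sampling period, which gives L = Nc / (2 f_s). The speed of sound c = 35000 cm/s is an assumed value, and the sampling rate computed below is implied by the model rather than reported in the text. The function names are hypothetical.

```python
from typing import List

SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, moist air (cm/s)


def uniform_tube_formants(length_cm: float, max_hz: float,
                          c: float = SPEED_OF_SOUND_CM_S) -> List[float]:
    """Resonances F_n = (2n - 1) c / (4 L) of a uniform closed-open tube, up to max_hz."""
    formants = []
    n = 1
    while True:
        f = (2 * n - 1) * c / (4.0 * length_cm)
        if f > max_hz:
            return formants
        formants.append(f)
        n += 1


def section_length_cm(length_cm: float, n_sections: int) -> float:
    """Length of each section when the tube is split into equal sections."""
    return length_cm / n_sections


def lossless_tube_sampling_rate(length_cm: float, n_sections: int,
                                c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Sampling rate at which each section's one-way delay is half a sample: fs = N c / (2 L)."""
    return n_sections * c / (2.0 * length_cm)


print([round(f) for f in uniform_tube_formants(17.0, 5000.0)])
print(round(section_length_cm(17.0, 15), 3), "cm per section")
print(round(lossless_tube_sampling_rate(17.0, 15)), "Hz")
```

For L = 17 cm the uniform tube gives resonances near 515, 1544, 2574, 3603 and 4632 Hz. That is three below 3 kHz and five below 5 kHz, at the edges of the ranges given in subsection 3. Each of the 15 sections is then about 1.13 cm long.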
<Figure 10> Harshman X-ray data
The 17 sections are reduced to 15 (... 2 sections ... 11 ... 2 ...) to match the LPC model, and the aspect ratio (2.1:1, 2:1) is adjusted (<Figure 11>).
<Figure 11>
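The 15 sections are then computed from the LPC analysis, with the transfer function $A_n(z)$ of section n defined as

$$A_n(z) = \frac{\text{forward volume velocity}}{\text{propagated volume velocity}} \qquad (2)$$

where n is the section index. The area of section n follows from that of section n-1:

$$S_n = S_{n-1}\,\frac{1 + C_n}{1 - C_n} \qquad (3)$$

In Eq. (3), $C_n$ is the n-th reflection coefficient of the LPC inverse filter, and $S_n$ and $S_{n-1}$ are the cross-sectional areas of sections n and n-1. Eq. (4) comes from the PARAFAC analysis:

$$X_1 = c_1 F_2 + c_2 F_2 F_3 + c_3 \frac{F_1}{F_2} + c_4 \qquad (4)$$

Here $c_1 = 0.300 \times 10^{-3}$, $c_2 = -0.343 \times 10^{-6}$, $c_3 = 4.143$, $c_4 = -0.174$, and $F_1$, $F_2$, $F_3$ are the formant frequencies. $X_1$ and $X_3$ are obtained from the formants, and $X_2$ is their average:

$$X_2 = \frac{X_1 + X_3}{2} \qquad (5)$$

The rounding degree and the height are likewise mapped from the formants.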
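Equations (3) to (5) translate directly into code, sketched below in Python with hypothetical function names. The sketch accumulates areas with Eq. (3) from an arbitrary reference area, so the result is a relative area function. Which end of the tract the reference section lies at, and the sign convention of $C_n$, vary between formulations and are assumptions here. Eq. (4) is implemented as $X_1 = c_1 F_2 + c_2 F_2 F_3 + c_3 F_1/F_2 + c_4$, with formant frequencies assumed to be in Hz. $X_3$ has to be supplied by the caller because its coefficients are not listed.

```python
from typing import List, Sequence

# Coefficients of Eq. (4) as given in the text.
C1, C2, C3, C4 = 0.300e-3, -0.343e-6, 4.143, -0.174


def areas_from_reflection(c: Sequence[float], s0: float = 1.0) -> List[float]:
    """Apply Eq. (3), S_n = S_{n-1} (1 + C_n) / (1 - C_n), from a reference area s0.

    Returns [S_0, S_1, ..., S_N] relative to s0; requires |C_n| < 1.
    """
    areas = [s0]
    for cn in c:
        if not -1.0 < cn < 1.0:
            raise ValueError("reflection coefficients must satisfy |C_n| < 1")
        areas.append(areas[-1] * (1.0 + cn) / (1.0 - cn))
    return areas


def parafac_x1(f1: float, f2: float, f3: float) -> float:
    """Eq. (4): X1 = c1 F2 + c2 F2 F3 + c3 (F1 / F2) + c4, formants assumed in Hz."""
    return C1 * f2 + C2 * f2 * f3 + C3 * (f1 / f2) + C4


def parafac_x2(x1: float, x3: float) -> float:
    """Eq. (5): X2 is the mean of X1 and X3."""
    return 0.5 * (x1 + x3)


print(areas_from_reflection([0.5, -0.5]))
x1 = parafac_x1(500.0, 1500.0, 2500.0)  # hypothetical formants
print(round(x1, 5), parafac_x2(x1, 0.1))  # 0.1 is a hypothetical X3
```

A positive $C_n$ widens the tube from one section to the next and a negative one narrows it. The example therefore returns to the reference area after one step each way.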
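The point $(X_p, Y_p)$ is computed from $(X_a, Y_a)$ and $(X_d, Y_d)$ with Eq. (6):

$$X_p = X_a + \frac{M X_a + N X_d}{M + N}, \qquad Y_p = Y_a + \frac{M Y_a + N Y_d}{M + N} \qquad (6)$$

where

$$M = \left[(X_a - X_d)^2 + (Y_a - Y_d)^2\right]^{1/2}, \qquad N = 1 - M.$$

The processing steps are shown in <Figure 12> and <Figure 13>.
<Figure 12>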
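<Figure 13> Block diagram of the system (blocks: Intensity, Fundamental Frequency, Nasality, Log Spectrum, Formant Frequency, Vocal Tract Area, Articulatory Distance Computation, Vocal Tract Profile Representation)
IV.
Vocal tract profiles were estimated for five vowels (<Figure 14>), and Harshman's X-ray data for the same vowels are shown in <Figure 15>. <Figure 16> shows the display, which also presents the intensity, pitch, nasality, and spectrum.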
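<Figure 16> shows examples for several vowels. Displays that combine several speech parameters in this way are used in speech training systems for the hearing impaired (Park et al., 1994).
The LPC-Formant Based Vocal Tract Profile Graphics method uses LPC to estimate the vocal tract profile together with the opening and rounding degree. However, the LPC model does not separate the glottal source from the vocal tract. A pseudo-glottal waveform, or a pitch-synchronous two-channel analysis that separates the glottal source from the vocal tract filter, could address this.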
<Figure 14> Vocal tract profiles for five vowels
<Figure 15> X-ray data
<Figure 16> Example display for two vowels
V. Conclusion
This paper reviewed articulatory measurement devices such as EGG, EMA, and EPG, and methods for estimating the vocal tract shape from the speech signal.
References
... (1992). ..., 41, 209-216.
... (1982). ..., 5, 38-44.
Atal, B. S. and Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. Journal of the Acoustical Society of America, 50, 637-655.
Atal, B. S., Chang, J. J., Mathews, M. V., and Tukey, J. W. (1978). Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique. Journal of the Acoustical Society of America, 63, 1535-1555.
Crichton, R. G. and Fallside, F. (1974). Linear prediction model of speech production with applications to deaf speech training. Proceedings of the Institution of Electrical Engineers, Control & Science, 121, 865-873.
Gersho, A. (1994). Advances in speech and audio compression. Proceedings of the IEEE, 82, 900-918.
Guenther, F. (1994). A neural network model of speech acquisition and motor equivalent speech production. Biological Cybernetics, 72, 43-53.
Javkin, H., Baroso, N. A., Das, A., Zerkle, D., Yamada, Y., Murata, N., Levitt, H., and Youdelman, K. (1993). A motivation-sustaining articulatory/acoustic speech training system for profoundly deaf children. Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, 145-148.
Ladefoged, P., Harshman, R., Goldstein, L., and Rice, L. (1978). Generating vocal tract shapes from formant frequencies. Journal of the Acoustical Society of America, 64, 1027-1035.
McCandless, S. S. (1974). An algorithm for automatic formant extraction using linear prediction spectra. IEEE Transactions on Acoustics, Speech, and Signal Processing, 22, 135-141.
Mermelstein, P. (1967). Determination of the vocal-tract shape from measured formant frequencies. Journal of the Acoustical Society of America, 41, 1283-1294.
Paige, A. and Zue, V. W. (1970). Computation of vocal tract area functions. IEEE Transactions on Audio and Electroacoustics, 18, 7-18.
Park, S. H., Kim, D. J., Lee, J. H., and Yoon, T. S. (1994). Integrated speech training system for the hearing impaired. IEEE Transactions on Rehabilitation Engineering, 2, 189-196.
Perkell, J. S., Cohen, M. H., Svirsky, M. A., Matthies, M. L., Garabieta, I., and Jackson, M. T. (1992). Electromagnetic midsagittal articulometer systems for transducing speech articulatory movements. Journal of the Acoustical Society of America, 92, 3078-3096.
Rahim, M. G., Goodyear, C. C., Kleijn, W. B., Schroeter, J., and Sondhi, M. M. (1993). On the use of neural networks in articulatory speech synthesis. Journal of the Acoustical Society of America, 93, 1109-1121.
Schroeder, M. R. (1967). Determination of the geometry of the human vocal tract by acoustic measurements. Journal of the Acoustical Society of America, 41, 1002-1010.
Schroeter, J. and Sondhi, M. M. (1994). Techniques for estimating vocal-tract shapes from the speech signal. IEEE Transactions on Speech and Audio Processing, 2, 133-150.
Wakita, H. (1973). Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. IEEE Transactions on Audio and Electroacoustics, 21, 417-427.