Phonetics and Speech Sciences (말소리와 음성과학), Vol. 1, No. 3 (2009), pp. 55-63

Automated Speech Analysis Applied to Sasang Constitution Classification
(음성을 이용한 사상체질 분류 알고리즘)

ABSTRACT
This paper introduces an automatic voice classification system for diagnosing individual constitution based on Sasang Constitutional Medicine (SCM) in Traditional Korean Medicine (TKM). To develop the algorithm, we used the voices of 473 speakers and extracted a total of 144 speech features from speech data consisting of five sustained vowels and one sentence. The classification system, a rule-based algorithm derived from a non-parametric statistical method, issues binary negative decisions: it rules a constitution out rather than assigning one. The system reached a decision for 55.7% of the speech data, and 72.8% of those negative decisions were correct.

Keywords: non-parametric, quantitative, Sasang, constitution, SCM, TKM.

1) doskian@kiom.re.kr 2) jhyoo@kiom.re.kr 3) haejung064@kiom.re.kr 4) ssmed@kiom.re.kr
([...]: 0028438)
Dates: 2009 7 5; 2009 8 6; 2009 9 7

[...] [1]. [...] [2], [3]. [...]

Chiu et al. [...] four [...] zero-crossing, peak, and valley [...] 40-60 [...] 70% [4]. [...] FD (Fractal Dimension) [...] DTW (Dynamic Time Warping) [...] Qi-vacuity [...] Yin-vacuity [...] 85% [5]. [...] [6]. [...]
[...] [7], [8]. [...]

2. [...]
2.1 [...]
[...] 2007 [...] 2008 [...]. [...] (48.52±5.50 yr) [...] 164 male speakers (32, 44, and 88 by constitution) and 309 female speakers (82, 116, and 111), 473 in total. [...] 30 dB [...]. [...] Sennheiser e-835s microphone [...] 4-5 cm [...]. [...] five vowels [...] and one sentence [...]. Recordings were stored as signed 16-bit PCM, mono, at a sampling rate of 44,100 Hz.

2.2 [...]
[...] Figure 1 [...] 144 [...] five vowels (a, e, i, o, u) [...].

Figure 1. 144 Speech Features for Constitution Classification

2.2.1 F0, Intensity, Formant, MDVP
Fundamental frequency (F0), intensity, formants, and MDVP (Multi-Dimensional Voice Program) parameters [...]. A Praat script [9]-[10] [...] every 20 msec: F0, intensity, the first two formants (F1, F2) with their -3 dB bandwidths, and the amplitude [...]. [...] Praat [...] Matlab [...].

    # Object selection and the F0 query are not shown in the source listing.
    To Pitch (ac)... 0.00 75 3 yes 0.03 0.45 0.01 0.35 0.14 500
    To Intensity... 100 0.005
    ! To Formant (burg)... 0 5 5000 0.025 50
    ! To Formant (burg)... 0 5 5500 0.025 50
    # Step through the segment in 20 msec frames.
    deltat = 0.020
    dur = ('offsettime'-'onsettime') / deltat
    iter = 'dur:0'+1
    i = 'onsettime'
    while i < 'offsettime'
        # Intensity, then formant 1 and 2 frequencies and bandwidths at time i.
        db = Get value at time... i Cubic
        f1 = Get value at time... 1 i Hertz Linear
        w1 = Get bandwidth at time... 1 i Hertz Linear
        f2 = Get value at time... 2 i Hertz Linear
        w2 = Get bandwidth at time... 2 i Hertz Linear
        # Minimum and maximum amplitude within the current frame.
        mimamp = Get minimum... i i+0.02 Parabolic
        maxamp = Get maximum... i i+0.02 Parabolic
        fileappend 'wfilename$' 'i:3''tab$''f0:1''tab$''db:1''tab$''f1:1''tab$''w1:1''tab$''f2:1''tab$''w2:1''tab$''mimamp:5''tab$''maxamp:5''newline$'
        i = 'i'+deltat
    endwhile

[...] F0 [...] the five vowels were segmented. [...] 500 msec [...] 750 msec [...] onset [...] offset [...] candidates [...] 20%~80% [...]. [...] /u/ [...] F0 [...] Praat [...] F0 [...]. [...] the Pearson correlation coefficient between F0 and intensity, and the 10th, 50th, and 90th percentiles of F0 and of intensity [...]. [...] MDVP [11]. [...] F0 [...] jitter (Jita, Jitt, RAP, PPQ) and shimmer (ShdB, Shim, APQ) [...].

Table 1. Formulas for jitter and shimmer related parameters used in MDVP

    STD    $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(F_i-\bar{F}\right)^2}$
    Jita   $\frac{1}{N-1}\sum_{i=2}^{N}\left|T_{0i}-T_{0,i-1}\right|$
    Jitt   [...]
    RAP    $\dfrac{\frac{1}{N-2}\sum_{i=2}^{N-1}\left|\frac{T_{0,i+1}+T_{0i}+T_{0,i-1}}{3}-T_{0i}\right|}{\frac{1}{N}\sum_{i=1}^{N}T_{0i}}\times 100$
    PPQ    [...]
    ShdB   [...]
    Shim   [...]
    APQ    [...]

2.2.2 MFCC
MFCCs (Mel-Frequency Cepstral Coefficients) [...] Mel scale [...] cepstrum [...] [12]. [...] MFCCs were extracted with HTK (Hidden Markov Toolkit) [...] 12 MFCCs [...] 65 MFCC features [...]. The HTK settings used for MFCC extraction are listed in Table 2.

Table 2. HTK Coding Parameters for the Extraction of MFCC (HCopy command)

    OPTION           VALUE
    SOURCEKIND       WAVEFORM
    SOURCEFORMAT     WAV
    SOURCERATE       227
    TARGETKIND       MFCC_0
    TARGETRATE       100000.0
    SAVECOMPRESSED   F
    SAVEWITHCRC      F
    WINDOWSIZE       400000.0
    USEHAMMING       T
    PREEMCOEF        0.97
    NUMCHANS         30
    CEPLIFTER        22
    NUMCEPS          12
    ENORMALISE       F

2.2.3 [...]
Table 3 defines the 144 speech features.

Table 3. The Definition of Speech Features

    xf0         average fundamental frequency
    xt0         average glottal period
    xstd        standard deviation of F0
    xjita       absolute jitter
    xjitt       jitter percent
    xrap        relative average perturbation
    xppq        pitch period perturbation quotient
    xshdb       shimmer in dB
    xshim       shimmer percent
    xapq        amplitude perturbation quotient
    xf1         1st formant
    xbw1        1st 3 dB bandwidth
    xf2         2nd formant
    xbw2        2nd 3 dB bandwidth
    xmfcc1~12   1st~12th MFCC
    xc0         energy
    CORR        correlation between F0 and intensity
    P10         10th percentile of F0
    P50         50th percentile of F0
    P90         90th percentile of F0
    PHL         (P90-P50)/(P50-P10)
    I10         10th percentile of intensity
    I50         50th percentile of intensity
    I90         90th percentile of intensity
    IHL         (I90-I50)/(I50-I10)

    x: one of the five vowels (a, e, i, o, u)
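The perturbation measures named in Table 1 can be illustrated with a short Python sketch. It assumes the commonly documented MDVP-style definitions of Jita, Jitt, RAP, ShdB, and Shim, and it takes already-extracted pitch periods and peak amplitudes as input. The function name `jitter_shimmer` and its dictionary output are hypothetical, and PPQ and APQ are left out; this is not the authors' Matlab code.

```python
import math


def jitter_shimmer(periods, amplitudes):
    """Return MDVP-style perturbation measures (assumed standard definitions).

    periods:    consecutive pitch periods T0_i in seconds
    amplitudes: one peak amplitude A_i (> 0) per period
    """
    n = len(periods)
    if n < 3 or len(amplitudes) != n:
        raise ValueError("need at least 3 periods and one amplitude per period")
    mean_t0 = sum(periods) / n
    mean_amp = sum(amplitudes) / n

    # Jita: mean absolute difference between consecutive periods (seconds).
    jita = sum(abs(periods[i] - periods[i + 1]) for i in range(n - 1)) / (n - 1)
    # Jitt: Jita relative to the mean period, in percent.
    jitt = 100.0 * jita / mean_t0
    # RAP: mean deviation of each period from its 3-point average, in percent.
    rap_abs = sum(
        abs((periods[i - 1] + periods[i] + periods[i + 1]) / 3.0 - periods[i])
        for i in range(1, n - 1)
    ) / (n - 2)
    rap = 100.0 * rap_abs / mean_t0
    # ShdB: mean absolute amplitude ratio of consecutive periods, in dB.
    shdb = sum(
        abs(20.0 * math.log10(amplitudes[i + 1] / amplitudes[i]))
        for i in range(n - 1)
    ) / (n - 1)
    # Shim: mean absolute amplitude difference relative to the mean, in percent.
    shim_abs = sum(
        abs(amplitudes[i] - amplitudes[i + 1]) for i in range(n - 1)
    ) / (n - 1)
    shim = 100.0 * shim_abs / mean_amp
    return {"jita": jita, "jitt": jitt, "rap": rap, "shdb": shdb, "shim": shim}
```

For a perfectly periodic input every measure is zero; any cycle-to-cycle variation in period raises Jita, Jitt, and RAP, and any variation in amplitude raises ShdB and Shim.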
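3. [...]
3.1 [...]
[...] 144 [...] train set [...]. [...] umin, lmax [...]. [...] umin [...] ui, [...] lmax [...] li. The four conditional variables umin, ui, lmax, and li and the two logical rules are shown in Figure 2.

Figure 2. 4 Conditional Variables and 2 Logical Rules

3.2 [...]
[...] BMI [...] 144 [...]. [...] four variables [...] BMI (Body Mass Index). [...] 0.3 [...] 9 (eSHDB, eshim, uf0, if2, P10, P50, emfcc8, imfcc8, omfcc8, umfcc8) [...] 2 (uF0, P50) [...] BMI [...] 0.3. [...] 2 (aF0, at0, ef0, et0, if0, of0, ot0, uf0, P10, P50, P90, amfcc9, amfcc2, emfcc6, emfcc, emfcc2, imfcc6, imfcc9, omfcc9, umfcc7, umfcc9, umfcc2) [...] BMI [...] 5 (eF0, P10, P50, P90, amfcc2) [...] 0.3. [...] 0.3 [...] 6 [...] 5 [...] BMI [...] 4 [...] Figure 3 [...] sub-grule. [...] BMI [...] sub-grule [...] grule. [...] grule [...] li [...] ui [...] grule. [...] Figure 2 [...] 4 [...].

    If (X > umin) then NOT ui    (1)
    If (X < lmax) then NOT li    (2)

Of the four conditional variables, umin is the upper threshold and lmax is the lower threshold. For a feature value X, rule (1) rules out constitution ui when X exceeds umin, and rule (2) rules out constitution li when X falls below lmax. [...] 144 × 2 = 288 [...] 4 [...] 144 [...] 4 [...] (grule) [...]. [...] 4 [...] outlier [...].

Figure 3. sub-grule Matrix

3.3 [...]
[...] 144 [...] 0 [...]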
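Rules (1) and (2) of Section 3.2 can be illustrated with a minimal Python sketch. How the four conditional variables are obtained is an assumption here, not something the surviving text states: for one feature, umin is taken as the smallest of the per-constitution 90th percentiles in the train set (with ui the constitution that attains it), and lmax as the largest of the per-constitution 10th percentiles (with li the constitution that attains it). The helper names `percentile`, `fit_rule`, and `apply_rule` and the labels "SE", "SY", "TE" are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty sample")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_rule(train, lower_q=10, upper_q=90):
    """Derive (umin, ui, lmax, li) for ONE feature.

    train maps a constitution label to that feature's train-set values.
    Assumption: thresholds come from per-constitution percentiles.
    """
    uppers = {c: percentile(v, upper_q) for c, v in train.items()}
    lowers = {c: percentile(v, lower_q) for c, v in train.items()}
    ui = min(uppers, key=uppers.get)  # constitution with the lowest upper bound
    li = max(lowers, key=lowers.get)  # constitution with the highest lower bound
    return {"umin": uppers[ui], "ui": ui, "lmax": lowers[li], "li": li}


def apply_rule(rule, x):
    """Return the set of constitutions ruled out for feature value x."""
    excluded = set()
    if x > rule["umin"]:   # rule (1): If (X > umin) then NOT ui
        excluded.add(rule["ui"])
    if x < rule["lmax"]:   # rule (2): If (X < lmax) then NOT li
        excluded.add(rule["li"])
    return excluded
```

A value that lies between lmax and umin fires neither rule, so the feature simply abstains, which matches the negative-decision character of the system.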
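[...] 12 [...] 144 [...] grule. [...] rules (1) and (2) [...] ui [...] li [...]. [...] grule [...] a normalized vector (W) [...]. [...] train set [...] grule [...] three-element vector (S) [...]. W is obtained from S by Equations (3) and (4). [...] train set [...] sub-grule [...] grule [...] (W).

    $S = [\, s_{se} \;\; s_{sy} \;\; s_{te} \,]$    (3)

    s_se: Soeum [...], s_sy: Soyang [...], s_te: Taeeum [...]

    $a = \dfrac{s_{se}+s_{sy}+s_{te}}{s_{se}}, \quad b = \dfrac{s_{se}+s_{sy}+s_{te}}{s_{sy}}, \quad c = \dfrac{s_{se}+s_{sy}+s_{te}}{s_{te}}, \quad W = \dfrac{[\, a \;\; b \;\; c \,]}{a+b+c}$    (4)

3.4 [...]
[...] grule [...] Figure 4.

Figure 4. Constitutional Score and Decision Rule

[...] grule [...]. [...] ui [...] li [...] (W) [...]. [...] Figure 4 [...] (N_SF) [...] 10% of the 144 features, that is 14 [...].

4. [...]
4.1 [...]
The algorithm was evaluated with five repetitions of 10-fold cross validation. The data were divided into 10 parts; 9 parts formed the train set from which grule was derived, and the remaining part formed the test set. [...] each of the 10 parts served as the test set once [...] 10-fold [...]. In the train set, the four conditional variables [...] 10% [...] 10th and 90th percentiles. [...] 5 × 10-fold [...]. Over the 5 × 10-fold runs, the system reached a decision for 54.7% of the male test data, with 74.0% of those decisions correct; for the female test data the figures were 56.3% and 72.1%. [...]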
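The weighting step of Section 3.3 can be sketched in Python under one explicit assumption: that Equations (3) and (4) define a, b, and c as the sum of S divided by each component, and W as the vector [a b c] divided by a+b+c. The scoring function below is only one plausible way to accumulate weighted negative decisions; the actual decision rule of Figure 4, including the N_SF threshold, is not recoverable from the text and is not implemented. The names `normalized_weights` and `negative_score` and the labels "SE", "SY", "TE" are hypothetical.

```python
def normalized_weights(s_se, s_sy, s_te):
    """Return W as a dict; smaller components of S get larger weights.

    Assumed form: a = sum(S)/s_se, b = sum(S)/s_sy, c = sum(S)/s_te,
    W = [a, b, c] / (a + b + c).
    """
    if min(s_se, s_sy, s_te) <= 0:
        raise ValueError("all components of S must be positive")
    total = s_se + s_sy + s_te
    a, b, c = total / s_se, total / s_sy, total / s_te
    norm = a + b + c
    return {"SE": a / norm, "SY": b / norm, "TE": c / norm}


def negative_score(excluded_sets, weights):
    """Accumulate weighted NOT-decisions per constitution.

    excluded_sets: one set of ruled-out constitutions per feature rule.
    """
    score = {label: 0.0 for label in weights}
    for excluded in excluded_sets:
        for label in excluded:
            score[label] += weights[label]
    return score
```

With S = [1, 2, 2] the weights come out as 0.5, 0.25, and 0.25, so a constitution that is ruled out less often in the train set counts more heavily each time it is ruled out.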
Table 4. Algorithm Result of the Male Patients

Table 5. Algorithm Result of the Female Patients

4.2 [...]
[...] 144 [...]. [...] 5 × 10-fold [...] test set [...] grule [...] top 20 [...] Table 6. [...] F0 [...].

Table 6. Rank of Speech Features for Classification

    Rank   [...]           [...]
    1      P50[Hz] *       af1[Hz] **
    2      amfcc2          amfcc8
    3      of0[Hz] *       omfcc2
    4      ot0[ms] *       uf1[Hz] **
    5      P90[Hz] *       of2[Hz] **
    6      af1[Hz] **      umfcc[...]
    7      if0[Hz] *       emfcc4
    8      ut0[ms] *       amfcc6
    9      it0[ms] *       emfcc6
    10     at0[ms] *       imfcc10
    11     omfcc2          omfcc5
    12     IHL             et0[ms] *
    13     irap[%]         I50[dB]
    14     uf2[Hz] **      emfcc9
    15     ef0[Hz] *       if0[Hz] *
    16     af0[Hz] *       ef0[Hz] *
    17     et0[ms] *       uc0
    18     ubw1[Hz] **     amfcc9
    19     uf0[Hz] *       I10[dB]
    20     obw1[Hz] **     emfcc10

    * : F0 related, ** : Formant related

5. [...]
[...] 50% [...] 70% [...]. [...] F0 [...] testosterone [13]-[14]
[...]

References

[1] WHO. (2007). WHO International Standard Terminologies on Traditional Medicine in the Western Pacific Region, http://www.who.int
[2] Park, S. H., Kim, M. G., Lee, S. J., Kim, J. Y. & Chae, H. (2009). Temperament and character profiles of Sasang typology in an adult clinical sample, Evid. Based Complement. Altern. Med., Advance Access published on April 20; doi:10.1093/ecam/nep034
[3] Park, S. C. & Kim, D. J. (2004). Implementation of the automatic pulse-power diagnostic system and the discrimination algorithm of four constitutions, Journal of IEEK SC, Vol. 41, No. 2, pp. 53-60, March. (In Korean.)
[4] Chiu, C. C., Chang, H. H. & Yang, C. H. (2000). Objective auscultation for traditional Chinese medical diagnosis using novel acoustic parameters, Comput. Methods Programs Biomed., Vol. 62, pp. 99-107.
[5] Chiu, C. C., Yang, M. T. & Lin, C. S. (2002). Using fractal dimension analysis on objective auscultation of traditional Chinese medical diagnosis, J. Med. Biol. Eng., Vol. 22, pp. 219-225.
[6] Kim, D. R. (1999). The Principle of Life Preservation in Oriental Medicine, Chungdam. (In Korean.)
[7] Cho, D. U. (2006). Sasang Constitution Classification by Speech Signal Processing, Journal of KICS, Vol. 31, No. 5C, pp. 548-555, May. (In Korean.)
[8] Moon, S. J., Tak, J. H. & Hwang, H. J. (2005). A phonetic study of Sasang Constitution, Malsori, Vol. 55, pp. 1-14. (In Korean.)
[9] Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, Proceedings of the Institute of Phonetic Sciences, Vol. 17, pp. 97-110.
[10] Yang, B. G. (2003). Speech Analysis Using Praat Script, Mansu. (In Korean.)
[11] Ko, D. H. & Jeong, O. R. (2001). The Use of Speech and Language Analyzer, Hankukmunhwasa. (In Korean.)
[12] Deller, J. R., Hansen, J. H. L. & Proakis, J. G. (1999). Discrete-Time Processing of Speech Signals, Wiley-IEEE Press, pp. 380-385.
[13] Dabbs, J. M. & Mallinger, A. (1999). High testosterone levels predict low voice pitch among men, Personality and Individual Differences, Vol. 27, pp. 801-804.
[14] King, A., Ashby, J. & Nelson, C. (2001). Effects of testosterone replacement on a male professional singer, Journal of Voice, Vol. 15, No. 4, pp. 553-557.

Authors

Kang, Jaehwan. [...] 483 [...] Tel: 042-868-930 Fax: 042-868-9480 Email: doskian@kiom.re.kr
Yoo, Jonghyang. [...] 483 [...] Tel: 042-868-959 Fax: 042-868-9480 Email: jhyoo@kiom.re.kr
Lee, Haejung. [...] 483 [...] Tel: 042-868-9320 Fax: 042-868-9480 Email: haejung064@kiom.re.kr
Kim, Jongyeol. [...] 483 [...] Tel: 042-868-9489 Fax: 042-868-9480 Email: ssmed@kiom.re.kr