博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
【语音信号处理】之 Kaldi ToolKit 和VoiceBox
阅读量:867 次
发布时间:2019-03-25

本文共 17864 字,大约阅读时间需要 59 分钟。

1. Ubuntu上安装Kaldi ToolKit

安装git

i) git --version

ii) sudo apt install git
iii) git –version : 2.7.4
ix) git config --global user.name “git”
x) git config --global user.email zhaodpx@163.com
xi) git config --list
xii) git init
xiii) git init newrepo

安装Kaldi Toolkit

主要参考: http://kaldi-asr.org/doc/install.html

Git主页:
git clone https://github.com/kaldi-asr/kaldi.git kaldi --origin upstream

下载后,所需操作为:

cd kaldi/tools/; make; cd ../src; ./configure; make

第一步:cd kaldi/tools/

第二步:make,显示:

zhaodeng@ubuntu:~/kaldi/tools$ makeextras/check_dependencies.shextras/check_dependencies.sh: zlib is not installed.extras/check_dependencies.sh: automake is not installed.extras/check_dependencies.sh: autoconf is not installed.extras/check_dependencies.sh: sox is not installed.extras/check_dependencies.sh: gfortran is not installed.extras/check_dependencies.sh: neither libtoolize nor glibtoolize is installedextras/check_dependencies.sh: subversion is not installedextras/check_dependencies.sh: Intel MKL is not installed. Run extras/install_mkl.sh to install it. ... You can also use other matrix algebra libraries. For information, see: ...   http://kaldi-asr.org/doc/matrixwrap.htmlextras/check_dependencies.sh: Some prerequisites are missing; install them using the command:  sudo apt-get install zlib1g-dev automake autoconf sox gfortran libtool subversionMakefile:38: recipe for target 'check_required_programs' failedmake: *** [check_required_programs] Error 1

根据显示的command,在此输入:

sudo apt-get install zlib1g-dev automake autoconf sox gfortran libtool subversion

2. 下载VoiceBox

HomePage:

Git:
下载后,添加到matlab路径中即可。

Installation

1)Pull the GitHub repository or unzip the zip archive into any suitable folder (assumed below to be C:\sap-voicebox)

2)Start MATLAB, click “Set Path”, click “Add Folder …”, navigate to C:\sap-voicebox\voicebox, click “Select Folder” then click “Save”.
3)[Optional] The routine v_voicebox.m contains various installation-dependent parameters which may need to be altered before using the toolbox. In particular it contains a number of default directory paths indicating where temporary files should be created, where speech data normally resides, etc. You can override the defaults by editing v_voicebox.m directly or, more conveniently, by setting an environment variable VOICEBOX to the path of an initializing m-file. See the comments in v_ voicebox.m for a fuller description.
4)[Optional] You may find it convenient to install the non-unicode IPA phonetic symbol fonts developed by SIL which are in the C:\sap-voicebox\external\silipa93 folder.
5)[Optional] The folder C:\sap-voicebox\external\shorten contains the source code and compiled executable for the SHORTEN program written by Tony Robinson and SoftSound Limited www.softsound.com. This is needed for reading compressed SPHERE format files. You may wish to move it elsewhere but, if so, you will need to edit v_voicebox.m to give its location.

在matlab中输入:

what voicebox

即可出现voicebox内所有文件

输入:

help voicebox

即会出现所有文件的说明:

>> help voicebox  Voicebox: Speech Processing Toolbox for MATLAB   Function names have been prefixed "v_" to avoid name conflicts; the  unprefixed aliases will be removed in a future version. Use the function  v_voicebox_update to update old code which, by default, updates all .m files  in the current folder.   Audio File Input/Output    v_readwav       - Read a WAV file    v_writewav      - Write a WAV file    v_readhtk       - Read HTK waveform files    v_writehtk      - Write HTK waveform files    v_readsfs       - Read SFS files    v_readsph       - Read SPHERE/TIMIT waveform files    v_readaif       - Read AIFF Audio Interchange file format file    v_readcnx       - Raed BT Connex database files    v_readau        - Read AU files (from SUN)    v_readflac      - Read FLAC files    wavread       - Emulation of legacy MATLAB function to read a WAV file    wavwrite      - Emulation of legacy MATLAB function to write a WAV file   Frequency Scales    v_frq2bark      - Convert Hz to the Bark frequency scale    v_frq2cent      - Convert Hertz to cents scale    v_frq2erb       - Convert Hertz to erb rate scale    v_frq2mel       - Convert Hertz to mel scale    v_frq2midi      - Convert Hertz to midi scale of semitones    v_bark2frq      - Convert the Bark frequency scale to Hz    v_cent2frq      - Convert cents scale to Hertz    v_erb2frq       - Convert erb rate scale to Hertz    v_mel2frq       - Convert mel scale to Hertz    v_midi2frq      - Convert midi scale of semitones to Hertz   Fourier/DCT/Hartley Transforms    v_rfft          - FFT of real data    v_irfft         - Inverse of FFT of real data    v_rsfft         - FFT of real symmetric data    v_rdct          - DCT of real data    v_irdct         - Inverse of DCT of real data    v_rhartley      - Hartley transform of real data    v_zoomfft       - calculate the fft over a portion of the spectrum with any resolution    v_sphrharm      - calculate forward and inverse shperical harmonic transformations   Probability Distributions    v_berk2prob     - Convert Berksons to probability    v_gaussmix      - Fit a gaussian mixture model to data values    v_gaussmixd     - Calculate marginal and conditional density distributions and perform inference    v_gaussmixk     - Estimate Kuleck-Leibler divergence between two GMMs    v_gaussmixg     - Calculate global mean, covariance and mode of a Gaussian mixture    v_gaussmixm     - Estimate mean and variance of GMM vector magnitude    v_gaussmixp     - Calculates and plots full and marginal probability density from a GMM    v_gaussmixt     - multiplies two GMMs together    v_gausprod      - Calculate the product of multiple gaussians    v_gmmlpdf       - OBSOLETE - use v_gaussmixp instead    v_histndim      - N-dimensional histogram (+ plot 2-D histogram)    v_lognmpdf      - Prob density function of a lognormal distribution    v_maxgauss      - Calculate the mean and variance of max(x) where x is a gaussian vector    v_normcdflog    - Calculate the log of the Normal cdf without underflow    v_pdfmoments    - Convert between central moments, raw moments and cumulants    v_prob2berk     - Convert probability to Berksons    v_randvec       - Generate random vectors    v_randiscr      - Generate discrete random values with prescribed probabilities    v_rnsubset      - Select a random subset    v_randfilt      - Generate filtered random noise without transients    v_stdspectrum   - Generate standard audio and speech spectra    v_usasi         - Generate USASI noise (obsolete: use v_stdspectrum instead)    v_chimv         - Approximate mean and variance of non-central chi distribution    v_vonmisespdf   - Calculate the pdf of the Von Mises (circular normal) distribution   Vector Distances    v_disteusq      - Calculate euclidean/mahanalobis distances between two sets of vectors    v_distchar      - COSH spectral distance between AR coefficient sets     v_distitar      - Itakura spectral distance between AR coefficient sets     v_distisar      - Itakura-Saito spectral distance between AR coefficient sets    v_distchpf      - COSH spectral distance between power spectra     v_distitpf      - Itakura spectral distance between power spectra     v_distispf      - Itakura-Saito spectral distance between power spectra    Speech Analysis    v_activlev      - Calculate the active level of speech (ITU-T P.56)    v_activlevg     - Calculate the active level of speech robustly to added noise    v_dypsa         - Estimate glottal closure instants from a speech waveform    v_enframe       - Divide a speech signal into frames for frame-based processing    v_correlogram   - calculate a 3-D v_correlogram    v_ewgrpdel      - Energy-weighted group delay waveform    v_fram2wav      - Interpolate frame-based values to a waveform    v_filtbankm     - Transformation matrix for a linear/mel/erb/bark-spaced v_filterbank from dft output     v_fxpefac       - PEFAC pitch tracker    v_fxrapt        - RAPT pitch tracker    v_gammabank     - Calculate a bank of IIR gammatone filters    v_importsii     - Calculate the SII importance function (ANSI S3.5-1997)    v_modspect      - Caluclate the modulation specrogram    v_mos2pesq      - Convert MOS values to equivalent PESQ scores    v_overlapadd    - Reconstitute an output waveform after frame-based processing    v_pesq2mos      - Convert PESQ scores to equivalent MOS values    v_phon2sone     - Convert signal levels from phons to sones    v_psycdigit     - Experimental estimation of monotonic/unimodal psychometric function using TIDIGITS    v_psycest       - Experimental estimation of monotonic psychometric function    v_psycestu      - Experimental estimation of unimodal psychometric function     v_psychofunc    - Psychometric functions    v_sigma         - Identify glottal closure and opening intstants from Lx or EGG waveform    v_snrseg        - Segmental SNR and Global SNR calculation    v_sone2phon     - Convert signal levels from sones to phons    v_soundspeed    - Returns the speed of sound in air as a function of temperature    v_spgrambw      - Spectrogram with many options    v_stoi2prob     - Convert STOI intelligibility measure to probability of correct recognition    v_txalign       - Align two sets of time markers    v_vadsohn       - Voice activity detector    v_ppmvu         - Calculate the PPM, VU or EBU levels of a signal   LPC Analysis of Speech    v_ccwarpf       - warp complex cepstrum coefficients    v_lpcauto       - LPC analysis: autocorrelation method    v_lpcbwexp      - Bandwidth expansion of LPC filter    v_lpccovar      - LPC analysis: covariance method    v_lpcconv       - Arbitrary conversion between LPC representations    v_lpcifilt      - inverse filter a speech signal    v_lpcrand       - create random stable filters    v_lpcrr2am      - Matrix with all LPC filters up to order p    v_lpcstable     - check for stability and force stable filters    v_lpc--2--      - Convert between alternative LPC representation   Speech Synthesis    v_sapisynth     - Text-to-speech synthesis of a string or matrix     v_glotros       - Rosenberg model of glottal waveform    v_glotlf        - Liljencrants-Fant model of glottal waveform   Speech Enhancement    v_estnoiseg     - Estimate the noise spectrum from noisy speech using MMSE method    v_estnoisem     - Estimate the noise spectrum from noisy speech using minimum statistics    v_specsub       - Speech enhancement using spectral subtraction    v_ssubmmse      - Speech enhancement using MMSE estimate of spectral amplitude or log amplitude    v_ssubmmsev     - Speech enhancement using MMSE estimate and VAD-based noise estimation    v_specsubm      - (obsolete algorithm) Spectral subtraction     v_spendred      - Speech Enhancement and Dereverberation (Doire's algorithm)   Speech Coding    v_lin2pcmu      - Convert linear PCM to mu-law PCM    v_pcma2lin      - Convert A-law PCM to linear PCM    v_pcmu2lin      - Convert mu-law PCM to linear PCM    v_lin2pcma      - Convert linear PCM to A-law PCM    v_kmeanlbg      - Vector quantisation: LBG algorithm    v_kmeanhar      - Vector quantization: K-harmonic means    v_potsband      - Create telephone bandwidth filter    v_kmeans        - Vector quantisation: k-means algorithm   Speech Recognition    v_ldatrace      - constrained Linear Discriminant Analysis to maximize trace(W\B)    v_melbankm      - Mel v_filterbank transformation matrix    v_melcepst      - Mel cepstrum frontend for recogniser    v_cep2pow       - Convert mel cepstram means & variances to power domain    v_pow2cep       - Convert power domain means & variances to mel cepstrum   Signal Processing    v_addnoise      - Add noise to a signal at a chosen SNR    v_convfft       - 1-dimensional convolution/corrolation using FFT    v_ditherq       - Add dither and quantize a signal    v_filterbank    - Apply a bank of IIR filters to a signal    v_findpeaks     - Find peaks in a signal or spectrum    v_maxfilt       - Running maximum filter    v_meansqtf      - Output power of a filter with white noise input    v_momfilt       - Generate running moments    v_resample      - Resamples a signal: identical to MATLAB resample but removes filter transients    v_schmitt       - Pass a signal through a v_schmitt trigger    v_sigalign      - Align a clean refeence with a noisy signal    v_teager        - Calculate the Teager energy waveform    v_windinfo      - Calculate window properties and figures of merit    v_windows       - Window function generation    v_zerocros      - Find interpolated zero crossings   Information Theory    v_huffman       - Generate Huffman code    v_entropy       - Calculate v_entropy and conditional v_entropy   Computer Vision    v_imagehomog    - Apply a homography transformation to an image with bilinear interpolation    v_polygonarea   - Calculate the area of a polygon    v_polygonwind   - Test if points are inside or outside a polygon    v_polygonxline  - Find where a line crosses a polygon    v_qrabs         - Absolute value of a real quaternion    v_qrdivide      - divide two real quaternions (or invert one)    v_qrdotdiv      - elmentwise division of two real quaternion arrays    v_qrdotmult     - elmentwise multiplication of two real quaternion arrays    v_qrmult        - multiply two real quaternion arrays    v_qrpermute     - permute the indices of a quaternion array    v_rectifyhomog  - Apply rectifing homographies to a set of cameras to make their optical axes parallel    v_rot--2--      - Convert between different representations of rotations    v_rotqrmean     - Find the average of several v_rotation quaternions    v_rotqrvec      - Apply a quaternion rotation to an array of 3D vectors    v_sphrharm      - forward and inverse spherical harmonic transform using uniform, Gaussian                      or arbitrary inclination (elevation) grids and a uniform azimuth grid.    v_upolyhedron   - Calculate the vertex coordinates and other characteristics of a uniform polyhedron   Printing and Display functions    v_axisenlarge   - Selectively enlarge figure axis for clarity    v_cblabel       - Add a label onto the colorbar    v_figbolden     - Make a figure bold and adjust colours for printing clearly    v_fig2emf       - Make a figure bold and save as a windows metafile    v_fig2pdf       - Make a figure bold and save as pdf, eps or ps    v_frac2bin      - Convert numbers to fixed-point binary strings    v_lambda2rgb    - convert wavelength to XYZ or RGB colour triplets    v_sprintsi      - Print a value with an SI multiplier    v_sprintcpx     - Print a complex number with real and imaginary parts    v_texthvc       - write text on a plot with specified alignment and colour    v_tilefigs      - Arrange all figures on the screen    v_colormap    - Set and plot colormap information    v_xticksi       - Label x-axis tick marks using SI multipliers    v_yticksi       - Label y-axis tick marks using SI multipliers    v_xyzticksi     - Helper function for v_xticksi and v_yticksi   Voicebox Parameters and System Interface    v_hostipinfo    - Get information about the computer name and internet connections    v_regexfiles    - Recursively find files that match a regular expression pattern    v_unixwhich     - Search the WINDOWS system path for an executable program (like UNIX which)    v_voicebox      - Global installation-dependent parameters    v_winenvar      - Obtain WINDOWS environment variables    v_voicebox_update - Update matlab files in the current folder to include the v_ prefix where needed   Utility Functions    v_atan2sc       - arctangent function that returns the sin and cos of the angle    v_besselratio   - calculate the Bessel function ratio: besseli(v+1,x)./besseli(v,x)    v_besselratioi  - calculate the inverse of v_besselratio [only for v=0]    v_bitsprec      - Rounds values to a precision of n bits    v_choosenk      - All choices of k elements out of 1:n without replacement    v_choosrnk      - All choices of k elements out of 1:n with replacement    v_dlyapsq       - Solve the discrete lyapunov equation    v_dualdiag      - Simultaneously diagonalise two hermitian matrices    v_finishat      - Estimate the finishing time of a long loop    v_fopenmkd      - Like FOPEN() but creates any missing directories/folders    v_gammalns      - Calculates log(gamma(x)) for signed real-valued x    v_horizdiff     - Estimate the horizontal difference between two functions of x    v_hypergeom1f1  - Confluent Hypergeometric function or Kummer's M function    v_logsum        - Calculates log(sum(exp(x))) without overflow/underflow    v_minspane      - calculate the minimum (or shortest) spanning tree    v_mintrace      - find a row permutation to minimize the trace of a matrix    v_m2htmlpwd     - Create HTML documentation of matlab routines in the current directory    v_nearnonz      - Replace each zero element with the nearest non-zero element    v_paramsetch    - Set a parameter structure and do valididty checks    v_permutes      - All n! permutations of 1:n    v_quadpeak      - Find quadratically-interpolated peak in a 2D array    v_rotation      - Generate v_rotation matrices    v_skew3d        - Generate 3x3 skew symmetric matrices    v_zerotrim      - Remove empty trailing rows and columns  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%voicebox 既是目录也是函数。  For calling details please see v_voicebox.m    This dummy routine is included for backward compatibility only  and will be removed in a future release of voicebox. Please use  v_voicebox.m in future and/or update with v_voicebox_update.m        Copyright (C) Mike Brookes 2018       Version: $Id: voicebox.m 10863 2018-09-21 15:39:23Z dmb $

3. 连接共享文件夹

ubuntu里输入

sudo apt install nfs-kernel-serversudo mount -t nfs -o nolock 192.168.1.152:/home/zhuguili/mnt ~/tmp

即可实现文件共享(本地文件夹为tmp)

转载地址:http://htiyk.baihongyu.com/

你可能感兴趣的文章