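I've recently been learning the basics of speech recognition and looking into Python libraries for it, so here are my notes.
Several ready-made speech recognition packages are available for Python, including: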
- apiai
- google-cloud-speech
- pocketsphinx
- SpeechRecognition
- watson-developer-cloud
- wit
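SpeechRecognition focuses on converting speech to text. It is an open-source, third-party library, not a Google product, although it can call Google's speech APIs.
wit and apiai offer built-in features that go beyond basic speech recognition, such as natural language processing to identify a speaker's intent.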
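SpeechRecognition's main advantages:
- It supports several mainstream speech APIs, which makes it flexible.
- The Google Web Speech API works with a default API key hard-coded into the library, so you can use it without registering.
- It is easy to use. You don't have to write scripts from scratch to access the microphone or process audio files, and you can have audio input and recognition working within minutes.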
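The core of SpeechRecognition is the Recognizer class. A Recognizer instance has seven methods for recognizing speech from an audio source. Each method uses a different API and has its own settings: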
- recognize_bing(): Microsoft Bing Speech
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech (requires the google-cloud-speech package)
- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx (requires PocketSphinx)
- recognize_wit(): Wit.ai
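Of these seven, only recognize_sphinx() works offline, using the CMU Sphinx engine. The other six all require an internet connection.
SpeechRecognition also ships with a default API key for the Google Web Speech API, so you can use that API right away. The other online APIs require authentication with an API key or a username/password combination, which is why this article uses the Web Speech API.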
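Only recognize_sphinx() runs offline, so one common pattern is to try an online engine first and fall back to Sphinx when the request fails. The sketch below uses a hypothetical helper, `first_successful`, which is not part of SpeechRecognition. It takes an ordered list of (name, callable) pairs plus the exception types that should trigger a fallback. The commented usage assumes SpeechRecognition and PocketSphinx are installed. It also assumes that the library's `RequestError` (API unreachable) and `UnknownValueError` (speech not understood) are the exceptions worth catching.

```python
def first_successful(engines, audio, errors):
    """Try each (name, recognize) pair in order.

    Returns (name, transcript) from the first engine that does not raise one
    of the exception types in `errors`. Hypothetical helper, not a library API.
    """
    if not engines:
        raise ValueError("at least one engine is required")
    last_error = None
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except errors as exc:
            # Remember the failure and move on to the next engine.
            last_error = exc
    # Every engine failed: re-raise the last error for the caller to handle.
    raise last_error


# Example usage (assumes SpeechRecognition and PocketSphinx are installed,
# and `audio` is an AudioData object, e.g. from Recognizer.record()):
#
# import speech_recognition as sr
# r = sr.Recognizer()
# name, text = first_successful(
#     [
#         ("google", lambda a: r.recognize_google(a, language="zh-CN")),
#         ("sphinx", r.recognize_sphinx),  # offline; defaults to US English
#     ],
#     audio,
#     errors=(sr.RequestError, sr.UnknownValueError),
# )
```

Passing the exception types explicitly means genuine bugs, such as a TypeError, are not silently swallowed by the fallback.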
To use all of the functionality of the library, you should have:
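- Python 2.6, 2.7, or 3.3+
- PyAudio 0.2.11+ (only needed for microphone input)
- PocketSphinx (only needed for the offline Sphinx recognizer)
- Google API Client Library for Python (only needed for the Google Cloud Speech API)
- A FLAC encoder (only needed if the system is not x86-based Windows, Linux, or macOS)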
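Before going further, you can check which of these optional pieces are present with a small standard-library script. This is a sketch with a few assumptions:
- The import names (`speech_recognition`, `pyaudio`, `pocketsphinx`, `googleapiclient`) are taken to be the usual module names of those packages.
- The FLAC check only looks for a `flac` executable on PATH.
- SpeechRecognition bundles FLAC binaries for x86 systems, so a missing `flac` there is not necessarily a problem.

```python
import importlib.util
import shutil
import sys

# Requirement label -> module name it is normally imported as (assumed names).
OPTIONAL_MODULES = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
    "PocketSphinx": "pocketsphinx",
    "Google API Client Library": "googleapiclient",
}


def check_environment():
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {"Python >= 3.3": sys.version_info >= (3, 3)}
    for label, module in OPTIONAL_MODULES.items():
        # find_spec returns None for a top-level module that is not installed
        report[label] = importlib.util.find_spec(module) is not None
    # FLAC is an external command-line encoder, not a Python package
    report["FLAC encoder"] = shutil.which("flac") is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```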
Supported audio file types:
- WAV: must be in PCM/LPCM format
- AIFF
- AIFF-C
- FLAC: must be native FLAC; OGG-FLAC is not supported
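WAV input must be PCM, so it can help to check a file before handing it to sr.AudioFile. One way to do this is Python's standard `wave` module, which only reads uncompressed PCM WAV files and raises `wave.Error` for other encodings such as 32-bit float WAV. The sketch below wraps it in a hypothetical helper, `describe_pcm_wav`. It does not validate AIFF or FLAC files.

```python
import wave


def describe_pcm_wav(path):
    """Return basic parameters of a PCM WAV file, or None if `wave` cannot parse it."""
    try:
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_width_bytes": wav.getsampwidth(),
                "frame_rate": rate,
                "duration_s": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, truncated, or a non-PCM encoding.
        return None


# Example (path is illustrative):
# print(describe_pcm_wav("test1.wav"))
```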
The previous article introduced SpeechRecognition's basic concepts and advantages. This one covers how to install it and try out a demo.
1. Install SpeechRecognition (using Python 3.7)
Install SpeechRecognition from the terminal with the command pip3 install SpeechRecognition:
```
alicedembp:~ alice$ pip3 install SpeechRecognition
Requirement already satisfied: SpeechRecognition in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (3.8.1)
alicedembp:~ alice$ python -m speech_recognition
```
2. Verify the installation
After installation, open a Python interpreter and enter the following to verify the installation:
```
>>> import speech_recognition as sr
>>> sr.__version__
'3.8.1'
```
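3. Install PortAudio and PyAudio
Next, install the two packages needed for microphone input. The order matters: PyAudio is compiled against PortAudio, so PortAudio must be installed first.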
```
brew install portaudio
pip3 install pyaudio
```
Output (Homebrew's auto-update list of new, updated, and deleted formulae trimmed):
```
alicedembp:~ alice$ brew install portaudio
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Downloading https://homebrew.bintray.com/bottles/portaudio-19.6.0.high_sierra.bottle.tar.gz
######################################################################## 100.0%
==> Pouring portaudio-19.6.0.high_sierra.bottle.tar.gz
🍺  /usr/local/Cellar/portaudio/19.6.0: 33 files, 452KB
alicedembp:~ alice$ pip3 install pyaudio
Collecting pyaudio
Using cached https://files.pythonhosted.org/packages/ab/42/b4f04721c5c5bfc196ce156b3c768998ef8c0ae3654ed29ea5020c749a6b/PyAudio-0.2.11.tar.gz
Building wheels for collected packages: pyaudio
Building wheel for pyaudio (setup.py) ... done
Stored in directory: /Users/alice/Library/Caches/pip/wheels/f4/a8/a4/292214166c2917890f85b2f72a8e5f13e1ffa527c4200dcede
Successfully built pyaudio
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.11
alicedembp:~ alice$
```
If PortAudio has not been installed first, building PyAudio fails with the error src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found:
```
gcc -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch i386 -arch x86_64 -g -DMACOSX=1 -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/_portaudiomodule.c -o build/temp.macosx-10.6-intel-3.7/src/_portaudiomodule.o
src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found
#include "portaudio.h"
         ^~~~~~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1
```
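Once PyAudio builds, you can confirm that SpeechRecognition can actually reach the audio hardware by listing the devices it sees with `sr.Microphone.list_microphone_names()`. The helper below (`list_microphones` is a hypothetical name) returns an empty list instead of crashing when the library or PyAudio is unavailable. In the version used here, SpeechRecognition reports a missing or too-old PyAudio by raising `AttributeError`; treat that as an assumption for other versions.

```python
def list_microphones():
    """Return the names of audio devices PyAudio can see, or [] if unavailable."""
    try:
        import speech_recognition as sr
    except ImportError:
        return []  # SpeechRecognition itself is not installed
    try:
        return sr.Microphone.list_microphone_names()
    except (AttributeError, ImportError, OSError):
        # AttributeError: SpeechRecognition could not find a usable PyAudio;
        # ImportError/OSError: PyAudio or PortAudio failed to load or initialize.
        return []


if __name__ == "__main__":
    for index, name in enumerate(list_microphones()):
        # The list index is the value sr.Microphone(device_index=...) expects.
        print(index, name)
```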
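4. Try the demo: recognize speech in a WAV file
The script below loads a WAV file and sends it to the Google Web Speech API for Mandarin Chinese (zh-CN) recognition:
```python
import speech_recognition as sr

# The Recognizer does the actual speech-to-text work
r = sr.Recognizer()

# Open a WAV file as an audio source (path on the author's machine)
test = sr.AudioFile('/Users/alice/Documents/Work/Blog/AI/语音识别/speechrecognition/audiofiles/test1.wav')

with test as source:
    # Read the whole file into an AudioData object
    audio = r.record(source)

print(type(audio))

# Send the audio to the Google Web Speech API as Mandarin Chinese.
# show_all=True returns the raw response instead of just the best transcript.
print(r.recognize_google(audio, language='zh-CN', show_all=True))
```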
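With show_all=True, recognize_google() returns the raw response rather than a single string. In SpeechRecognition 3.8.1 this response is one of two things:
- A dict with an "alternative" list of hypotheses. Each hypothesis has a "transcript", and some also have a "confidence".
- An empty list, when nothing was recognized.

Treat that shape as an assumption for other versions. The hypothetical helper below picks a transcript from such a response. It prefers the highest-confidence hypothesis when scores are present, otherwise it takes the first one. The sample response is made up to show the assumed shape.

```python
def best_transcript(result):
    """Pick a transcript from recognize_google(..., show_all=True) output.

    Assumes a dict with an "alternative" list; returns None when nothing
    was recognized.
    """
    if not isinstance(result, dict) or not result.get("alternative"):
        return None  # e.g. [] when the API heard no speech
    alternatives = result["alternative"]
    # Prefer the highest-confidence hypothesis if any carry a score,
    # otherwise fall back to the first one.
    scored = [alt for alt in alternatives if "confidence" in alt]
    best = max(scored, key=lambda alt: alt["confidence"]) if scored else alternatives[0]
    return best.get("transcript")


# Made-up response of the assumed shape, for illustration only:
sample = {
    "alternative": [
        {"transcript": "你好", "confidence": 0.92},
        {"transcript": "你号"},
    ],
    "final": True,
}
print(best_transcript(sample))
print(best_transcript([]))
```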