
Speech Recognition in Python: SpeechRecognition

Date: 08-09

I have recently been learning the basics of speech recognition and looking into the Python libraries that support it. Here are my notes.

Common Python speech recognition libraries

Several ready-made speech recognition packages are available for Python, including:

  • apiai
  • google-cloud-speech
  • pocketsphinx
  • SpeechRecognition
  • watson-developer-cloud
  • wit

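Of these, SpeechRecognition focuses on converting speech to text. It is an open-source library rather than a Google product, although it can call Google's speech APIs.

wit and apiai offer built-in features beyond basic speech recognition, such as natural language processing to identify the speaker's intent.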

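Advantages of the SpeechRecognition library

It wraps several mainstream speech APIs, which makes it flexible.

The Google Web Speech API works with a default API key that is hard-coded into the SpeechRecognition library, so it can be used without signing up.

With SpeechRecognition there is no need to write scripts from scratch to access the microphone or process audio files. Getting audio in, recognized, and running takes only a few minutes, which makes the library very easy to use.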

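Recognizers in SpeechRecognition

The core of SpeechRecognition is the Recognizer class. It provides seven recognize_*() methods, each of which uses a different API to recognize speech from an audio source and offers its own settings and features: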

  • recognize_bing(): Microsoft Bing Speech
  • recognize_google(): Google Web Speech API
  • recognize_google_cloud(): Google Cloud Speech (requires installing the google-cloud-speech package)
  • recognize_houndify(): Houndify by SoundHound
  • recognize_ibm(): IBM Speech to Text
  • recognize_sphinx(): CMU Sphinx (requires installing PocketSphinx)
  • recognize_wit(): Wit.ai

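Of these seven, only recognize_sphinx() works offline, using the CMU Sphinx engine. The other six require an internet connection.

In addition, SpeechRecognition ships with a default API key for the Google Web Speech API, so that API can be used directly. The other online APIs require authentication with an API key or a username/password combination, which is why this article uses the Web Speech API.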
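Because every engine is exposed as a recognize_*() method on the same Recognizer object, switching engines mostly means calling a different method. The sketch below illustrates that idea under some assumptions: the ENGINES table and the recognize_with helper are hypothetical names introduced here, not part of the library, and only the two engines that need no credentials are listed.

```python
# Hypothetical helper: map a short engine name to a Recognizer method name.
ENGINES = {
    "google": "recognize_google",  # online, Google Web Speech API
    "sphinx": "recognize_sphinx",  # offline, requires PocketSphinx
}


def recognize_with(recognizer, audio, engine="google", **kwargs):
    """Call the recognize_*() method that corresponds to `engine`."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    # Look the method up by name on whatever recognizer object was passed in.
    method = getattr(recognizer, ENGINES[engine])
    return method(audio, **kwargs)
```

With the real library this would be called as recognize_with(sr.Recognizer(), audio, engine="sphinx"), where audio is an AudioData object. Which recognize_*() methods exist may depend on the installed version.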

Requirements for using SpeechRecognition

To use all of the functionality of the library, you should have:

  • Python 2.6, 2.7, or 3.3+ (required)
  • PyAudio 0.2.11+ (required only if you need microphone input, Microphone)
  • PocketSphinx (required only if you need the Sphinx recognizer, recognizer_instance.recognize_sphinx)
  • Google API Client Library for Python (required only if you need the Google Cloud Speech API, recognizer_instance.recognize_google_cloud)
  • FLAC encoder (required only if the system is not x86-based Windows/Linux/OS X)
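A quick way to see which of these optional pieces are present is to ask Python whether each module can be found. The following is a minimal sketch using only the standard library. The check_dependencies helper and the OPTIONAL_MODULES table are hypothetical names, and the import names (speech_recognition, pyaudio, pocketsphinx) are assumed to match the installed packages.

```python
import importlib.util

# Import names assumed for the packages in the requirements list.
OPTIONAL_MODULES = {
    "speech_recognition": "core library",
    "pyaudio": "microphone input",
    "pocketsphinx": "offline Sphinx recognizer",
}


def check_dependencies(modules=OPTIONAL_MODULES):
    """Return {module_name: bool} telling whether each module is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        status = "installed" if found else "missing"
        print(f"{name:20s} {status:10s} ({OPTIONAL_MODULES[name]})")
```

This only checks that the modules can be located. It does not verify version numbers or that the FLAC encoder is available.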

File types supported by SpeechRecognition

The supported file types are:

  • WAV: must be in PCM/LPCM format
  • AIFF
  • AIFF-C
  • FLAC: must be native FLAC format; OGG-FLAC is not supported
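When it is unclear whether a WAV file meets the PCM requirement, Python's built-in wave module can help, since it reads uncompressed PCM WAV files and raises an error for other encodings. The sketch below writes a short 16-bit PCM test file and reads its parameters back. The function names write_test_tone and describe_wav are hypothetical, and the generated tone is only useful for checking the file format, not for recognition.

```python
import math
import struct
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono 16-bit PCM WAV file containing a sine tone."""
    n_frames = int(seconds * rate)
    # Pack each sample as a little-endian signed 16-bit integer at half amplitude.
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(frames)


def describe_wav(path):
    """Return (channels, sample_width_bytes, frame_rate, n_frames) of a PCM WAV."""
    with wave.open(path, "rb") as wav:
        return (wav.getnchannels(), wav.getsampwidth(),
                wav.getframerate(), wav.getnframes())
```

Calling describe_wav on a file that is not PCM WAV should raise wave.Error, which is a useful early signal before handing the file to a recognizer.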

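Installing SpeechRecognition

The sections above covered the basic concepts and advantages of SpeechRecognition. This part covers how to install it and try a small demo.

1. Install SpeechRecognition (using Python 3.7)

Install SpeechRecognition from a terminal with the command pip3 install SpeechRecognition: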

  • alicedembp:~ alice$ pip3 install SpeechRecognition
  • Requirement already satisfied: SpeechRecognition in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (3.8.1)
  • alicedembp:~ alice$ python -m speech_recognition

2. Verify the installation

After installation, open an interpreter session and enter the following to verify it:

  • >>> import speech_recognition as sr
  • >>> sr.__version__
  • '3.8.1'

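3. Install portaudio and pyaudio

Next, install the two packages needed for microphone input. The order matters: pyaudio depends on portaudio, so portaudio must be installed first.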

  • brew install portaudio
  • pip3 install pyaudio

The output looks like this:

  • alicedembp:~ alice$ brew install portaudio
  • ... (Homebrew auto-update output omitted: lists of new, updated, and deleted formulae) ...
  • ==> Downloading https://homebrew.bintray.com/bottles/portaudio-19.6.0.high_sierra.bottle.tar.gz
  • ######################################################################## 100.0%
  • ==> Pouring portaudio-19.6.0.high_sierra.bottle.tar.gz
  • 🍺  /usr/local/Cellar/portaudio/19.6.0: 33 files, 452KB
  • alicedembp:~ alice$ pip3 install pyaudio
  • Collecting pyaudio
  • Using cached https://files.pythonhosted.org/packages/ab/42/b4f04721c5c5bfc196ce156b3c768998ef8c0ae3654ed29ea5020c749a6b/PyAudio-0.2.11.tar.gz
  • Building wheels for collected packages: pyaudio
  • Building wheel for pyaudio (setup.py) ... done
  • Stored in directory: /Users/alice/Library/Caches/pip/wheels/f4/a8/a4/292214166c2917890f85b2f72a8e5f13e1ffa527c4200dcede
  • Successfully built pyaudio
  • Installing collected packages: pyaudio
  • Successfully installed pyaudio-0.2.11
  • alicedembp:~ alice$

If portaudio is not installed first, building pyaudio fails with the error src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found:

  • gcc -fno-strict-aliasing -Wsign-compare -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch i386 -arch x86_64 -g -DMACOSX=1 -I/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/_portaudiomodule.c -o build/temp.macosx-10.6-intel-3.7/src/_portaudiomodule.o
  •     src/_portaudiomodule.c:29:10: fatal error: 'portaudio.h' file not found
  •     #include "portaudio.h"
  •              ^~~~~~~~~~~~~
  •     1 error generated.
  •     error: command 'gcc' failed with exit status 1

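A SpeechRecognition demo

  • import speech_recognition as sr
  •
  • r = sr.Recognizer()
  • # Load a PCM WAV file as the audio source
  • test = sr.AudioFile('/Users/alice/Documents/Work/Blog/AI/语音识别/speechrecognition/audiofiles/test1.wav')
  • with test as source:
  •     audio = r.record(source)  # read the whole file into an AudioData object
  • print(type(audio))
  • # show_all=True returns the raw API response with all candidate transcripts
  • print(r.recognize_google(audio, language='zh-CN', show_all=True))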
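With show_all=True, recognize_google() returns the raw response instead of a single string, so the caller has to pick a transcript. The sketch below shows one way to do that. It assumes the response is either an empty list (nothing recognized) or a dict with an "alternative" list of transcript entries, which is how the library appears to behave. The best_transcript helper and the sample data are hypothetical.

```python
def best_transcript(response):
    """Pick the most confident transcript from a show_all=True response.

    Assumes `response` is either an empty list (nothing recognized) or a dict
    with an "alternative" list of {"transcript": ..., "confidence": ...} items.
    """
    if not isinstance(response, dict):
        return None
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    # Entries without a confidence score are treated as 0.0.
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


# Hypothetical response, made up for illustration only.
sample = {
    "alternative": [
        {"transcript": "hello world", "confidence": 0.92},
        {"transcript": "hello word"},
    ],
    "final": True,
}
print(best_transcript(sample))
```

Returning None for an empty result lets the caller decide how to handle unrecognized audio without catching an exception.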