Trimming Silence from Audio with FFmpeg

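The ffmpeg command-line tool is arguably the core of the FFmpeg toolset. It supports a wide range of encoders, decoders, container formats, and filters. The FFmpeg framework itself is built from module libraries including AVFormat, AVCodec, AVFilter, AVDevice, and AVUtil.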

This article shows how to trim silence from audio with the silenceremove filter in FFmpeg commands.

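Parameter	Description	Range	Default
start_periods	Whether audio should be trimmed at the beginning. `0` means no silence is trimmed from the start. With a non-`0` value, audio is trimmed until non-silence is found. Normally start_periods is 1 when trimming silence from the beginning, but it can be raised to trim all audio up to a specific count of non-silent periods.	0 ~ 9000	0
start_duration	How long non-silence must be detected before trimming stops. Increasing it lets short bursts of noise be treated as silence and trimmed. (Note: this is generally best left at `0`; other values can have side effects.)	0 ~ INT32_MAX (accepts several duration formats)	0
start_threshold	Which sample values count as silence. For digital audio, `0` may be fine, but for audio recorded from an analog source you may want a higher value to allow for background noise. It can be given in `dB` (with "dB" appended to the value) or as an amplitude ratio (for example, 0.001 is equivalent to -60dB).	0 ~ DBL_MAX	0
start_silence	Maximum duration of silence to keep after trimming. The default of 0 trims every sample detected as silence.	0 ~ INT32_MAX (accepts several duration formats)	0
start_mode	How the end of silence is detected at the start of multi-channel audio. Either `any` or `all`. With the default `any`, a non-silent sample on any channel stops the trimming. With `all`, trimming stops only when all channels are detected as non-silent.	any/all	any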
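
The start_threshold row above notes that an amplitude ratio of 0.001 corresponds to -60dB. As a minimal sketch of that conversion (ratio = 10^(dB/20)), the two shell functions below use awk; the names `db_to_amp` and `amp_to_db` are hypothetical helpers and not part of FFmpeg.

```shell
#!/bin/sh
# Hypothetical helpers (not part of FFmpeg): convert between the two
# notations that the *_threshold options accept.

# dB -> amplitude ratio: ratio = 10^(dB/20)
db_to_amp() {
    awk -v db="$1" 'BEGIN { printf "%.6f\n", exp(db / 20 * log(10)) }'
}

# amplitude ratio -> dB: dB = 20 * log10(ratio)
amp_to_db() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", 20 * log(r) / log(10) }'
}
```

For instance, `db_to_amp -50` gives the ratio form of the -50dB threshold used in the commands in this article.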

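Parameter	Description	Range	Default
stop_periods	The count for trimming silence from the end of the audio. To remove silence from the middle of a file, give a negative value. A negative value is treated as its positive counterpart and makes processing restart as it does for `start_periods`, which makes it suitable for removing silent periods in the middle of the audio.	-9000 ~ 9000	0
stop_duration	How long silence must last before audio is no longer copied (internally, FFmpeg processes the audio in either trim mode or copy mode). Increasing it lets wanted silence stay in the audio. (Note: this is generally best left at `0`; other values can have side effects.)	0 ~ INT32_MAX (accepts several duration formats)	0
stop_threshold	Like `start_threshold`, but for trimming silence from the end of the audio.	0 ~ DBL_MAX	0
stop_silence	Like `start_silence`.	0 ~ INT32_MAX (accepts several duration formats)	0
stop_mode	How the end of silence is detected at the end of multi-channel audio. Either `any` or `all`. With the default `any`, a non-silent sample on any channel stops the trimming. With `all`, trimming stops only when all channels are detected as non-silent.	any/all	any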

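Parameter	Description	Range	Default
detection	How the level is computed. `rms` uses the root mean square (RMS); `peak` uses the amplitude.	rms/peak	rms
window	Size of the sliding window used to compute the level, in seconds. The computation itself is chosen by `detection`.	0 ~ 10	0.02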

To prepare the test file, record a clip with software such as Audition, then replace the background noise with silence.

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=1:start_duration=0.2:start_threshold=-50dB:detection=peak" -ar 44100 -ac 2 -b:a 96k "Test-RemoveSilence01_01.ogg" -y

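Parameter notes: start_periods=1 trims silence from the beginning up to the first non-silent period; start_duration=0.2 requires 0.2 seconds of non-silence before trimming stops, so shorter bursts are trimmed as silence; start_threshold=-50dB sets the level below which samples count as silence; detection=peak computes the level from the amplitude.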

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:start_silence=0.5:detection=peak" -ar 44100 -ac 2 -b:a 96k "Test-RemoveSilence02_01.ogg" -y

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=3:start_duration=0.38:start_threshold=-50dB:detection=peak" -ar 44100 -ac 2 -b:a 96k "Test-RemoveSilence03_01.ogg" -y

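Using stop_periods directly did not work in testing, so this falls back on a less efficient double-reverse approach: reverse the audio, remove the leading silence (which was the trailing silence before the reversal), then reverse it back.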

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:detection=peak,areverse" -ar 44100 -ac 2 "Test-RemoveSilence04_01.wav" -y
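
The reverse, trim, reverse chain above can be wrapped in a small function so the threshold is written only once. This is a sketch only: `tail_trim_filter` is a hypothetical helper name, and it simply prints the same filtergraph string used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print a filtergraph that trims trailing silence by
# reversing the audio, trimming the (now leading) silence, and reversing back.
# $1 is the silence threshold; it defaults to -50dB as in this article.
tail_trim_filter() {
    threshold=${1:--50dB}
    trim="silenceremove=start_periods=1:start_duration=0:start_threshold=${threshold}:detection=peak"
    printf '%s\n' "areverse,${trim},areverse"
}
```

It could then be used as `ffmpeg -i "Test.wav" -filter_complex "$(tail_trim_filter -50dB)" "Test-RemoveSilence04_01.wav"`. Part of the cost of this approach is that areverse has to buffer the entire clip in memory before it can output anything.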

In the command below, setting stop_periods to 1 did not work, and with -1 the silence in the middle of the file is removed as well.

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=stop_periods=-1:stop_duration=0:stop_threshold=-50dB:detection=peak" -ar 44100 -ac 2 -b:a 96k "Test-RemoveSilence04_02.ogg" -y

stop_periods is not used here, for the same reason as in example 4 (the Test-RemoveSilence04 commands).

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:detection=peak,areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:detection=peak,areverse" -ar 44100 -ac 2 "Test-RemoveSilence05_01.wav" -y

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=stop_periods=-1:stop_duration=0:stop_threshold=-50dB:detection=peak" -ar 44100 -ac 2 -b:a 96k "Test-RemoveSilence06_01.ogg" -y

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:stop_periods=-1:stop_duration=0:stop_threshold=-50dB:detection=peak" -ar 44100 -ac 2 -b:a 96k "Test-RemoveSilence06_02.ogg" -y

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:stop_periods=-1:stop_duration=0:stop_threshold=-50dB:detection=peak,areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:detection=peak,areverse" -ar 44100 -ac 2 "Test-RemoveSilence06_03.wav" -y

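Problem: with areverse in the filtergraph, writing an ogg file produces the warning `[libvorbis @ 0x14f025400] Queue input is backward in time`, and Audition also reports an error when opening the file. The cause is not yet known. Writing wav instead works without problems.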
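
Since the wav output is fine, one possible workaround when an ogg file is still needed is to split the job into two passes: filter to an intermediate wav first, then encode that wav to ogg with no filter. The sketch below rests on the untested assumption that the second pass avoids the warning because areverse is no longer in the encoding path. The function name `trim_then_encode` and the `FFMPEG` override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical two-pass wrapper. Set FFMPEG to override the binary
# (for example, for a dry run).
# Usage: trim_then_encode INPUT FILTERGRAPH OUTPUT.ogg
trim_then_encode() {
    src=$1
    graph=$2
    dst=$3
    tmp="${dst%.*}.tmp.wav"   # intermediate WAV next to the output file

    # Pass 1: apply the areverse/silenceremove graph and write WAV.
    "${FFMPEG:-ffmpeg}" -hide_banner -loglevel warning -i "$src" \
        -filter_complex "$graph" -ar 44100 -ac 2 "$tmp" -y || return 1

    # Pass 2: encode the already trimmed WAV to Ogg without any filter.
    "${FFMPEG:-ffmpeg}" -hide_banner -loglevel warning -i "$tmp" \
        -b:a 96k "$dst" -y
    status=$?

    rm -f "$tmp"
    return $status
}
```

The intermediate file costs extra disk I/O, which seems an acceptable trade for a clean ogg file if the assumption holds.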

To keep 0.5 seconds of silence at both the start and the end:

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:start_silence=0.5:stop_periods=-1:stop_duration=0:stop_threshold=-50dB:detection=peak,areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:start_silence=0.5:detection=peak,areverse" -ar 44100 -ac 2 "Test-RemoveSilence07_01.wav" -y

Problem: as with the previous areverse command, writing ogg here triggers the libvorbis `Queue input is backward in time` warning and an error in Audition, so the output is wav.

Besides the start, the middle also gains 0.5 seconds of silence, and the ending is not quite clean.

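Goal: any silence in the middle longer than 1.5 seconds is cut down to 1.5 seconds; 0.5 seconds of silence is kept at the start; no silence is kept at the end. The first silenceremove leaves 1.5 seconds at the start, and the second silenceremove reduces that leading silence to 0.5 seconds.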

ffmpeg -hide_banner -loglevel warning -i "Test.wav" -filter_complex "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:start_silence=1.5:stop_periods=-1:stop_duration=0:stop_threshold=-50dB:stop_silence=1.5:detection=peak,silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:start_silence=0.5:detection=peak"  -ar 44100 -ac 2 "Test-RemoveSilence08_01.wav" -y

For this to work, start_silence and stop_silence in the first silenceremove must both be set to 1.5. If start_duration and stop_duration are given, they must be set to 0.
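
The two rules above can be encoded in a small builder so that the two silence values cannot drift apart. This is a sketch: `mid_silence_filter` and its argument order are hypothetical, and it prints the same two-stage filtergraph as the command above when called with 1.5 and 0.5.

```shell
#!/bin/sh
# Hypothetical helper: build the two-stage silenceremove graph.
# $1 = longest silence to keep in the middle (seconds)
# $2 = silence to keep at the start (seconds)
# $3 = threshold, defaults to -50dB
mid_silence_filter() {
    keep_mid=$1
    keep_head=$2
    th=${3:--50dB}

    # Stage 1: shorten every silent stretch to at most $keep_mid seconds.
    # start_silence and stop_silence share one value; both durations stay 0.
    s1="silenceremove=start_periods=1:start_duration=0:start_threshold=${th}:start_silence=${keep_mid}"
    s1="${s1}:stop_periods=-1:stop_duration=0:stop_threshold=${th}:stop_silence=${keep_mid}:detection=peak"

    # Stage 2: cut the leading silence down to $keep_head seconds.
    s2="silenceremove=start_periods=1:start_duration=0:start_threshold=${th}:start_silence=${keep_head}:detection=peak"

    printf '%s\n' "${s1},${s2}"
}
```

It could be used as `ffmpeg -i "Test.wav" -filter_complex "$(mid_silence_filter 1.5 0.5)" -ar 44100 -ac 2 "Test-RemoveSilence08_01.wav"`.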