Build Your Own Voice Assistant in 80 Lines of JavaScript

lgmyxbjfu

Published 2020-9-7 13:47


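In this tutorial, we will build a virtual assistant (like Siri or Google Assistant) in the browser with 80 lines of JavaScript. You can try the application here; it listens to the user's voice commands and replies with a synthesized voice.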

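All you need is:
Google Chrome (version 25 or later)
A text editor
Because the Web Speech API is still experimental, the application only runs in supported browsers: Chrome (version 25 or later) and Edge (version 79 or later).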

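Which components do we need to build?
To build this web application, we need to implement four components:

A simple user interface that displays what the user said and the assistant's reply.
Speech-to-text conversion.
Text processing and action handling.
Text-to-speech conversion.

User interface
The first step is to create a simple user interface: a button that triggers the assistant, a div that displays the user's commands and the assistant's responses, and a p element that displays processing status.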

const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);

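Speech to text
We need a component that captures voice commands and converts them to text for further processing. In this tutorial we use SpeechRecognition from the Web Speech API. Because this API is only available in supported browsers, we display a warning and remove the Start button in unsupported browsers.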

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
if (typeof SpeechRecognition === "undefined") {
  startBtn.remove();
  result.innerHTML = "<b>Browser does not support Speech API. Please download the latest Chrome.</b>";
}
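
The support check above can also be written as a small function that is easy to exercise outside the browser. This is only a sketch: findSpeechRecognition is a hypothetical helper name, not part of the Web Speech API, and it simply looks for the standard constructor name before the webkit-prefixed one.

```javascript
// Hypothetical helper (not part of the Web Speech API): return the
// SpeechRecognition constructor from a global-like object, or null.
function findSpeechRecognition(globalObject) {
  if (!globalObject) return null;
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// In a browser we pass `window`; in other runtimes there is no such object.
const Recognition = findSpeechRecognition(
  typeof window !== "undefined" ? window : null
);
console.log(
  Recognition
    ? "Speech recognition is available."
    : "Browser does not support the Speech API."
);
```

Returning null instead of undefined keeps the later check a plain truthiness test.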

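We then create a SpeechRecognition instance, which exposes a set of properties for customizing recognition. In this application we set continuous and interimResults to true so that the recognized text is displayed in real time.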

const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;
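
Beyond these two flags, a recognition object also has lang and maxAlternatives properties and an onerror event. The sketch below groups those settings in one place; configureRecognition and its options object are hypothetical names introduced here for illustration, and "en-US" is an assumed default, not something the tutorial specifies.

```javascript
// Hypothetical helper: apply the tutorial's settings plus a few optional
// ones to a recognition-like object and return it.
function configureRecognition(recognition, options = {}) {
  const { lang = "en-US", onError = () => {} } = options; // assumed defaults
  recognition.continuous = true;     // keep listening after each phrase
  recognition.interimResults = true; // report partial transcripts while speaking
  recognition.lang = lang;           // BCP 47 language tag
  recognition.maxAlternatives = 1;   // only the top hypothesis per result
  // The error event carries a short code such as "no-speech" or "not-allowed".
  recognition.onerror = event => onError(event.error);
  return recognition;
}

// Browser usage (commented out so the sketch also loads outside a browser):
// const recognition = configureRecognition(new SpeechRecognition(), {
//   onError: code => { processing.textContent = `error: ${code}`; },
// });
```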

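We add a handler for the speech API's onresult event. In this handler we display the user's voice command as text and call the process function to perform the action. The process function will be implemented in the next step.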

function process(speech_text) {
    return "....";
}
recognition.onresult = event => {
   const last = event.results.length - 1;
   const res = event.results[last];
   const text = res[0].transcript;
   if (res.isFinal) {
      processing.innerHTML = "processing ....";
      const response = process(text);
      const p = document.createElement("p");
      p.innerHTML = `You said: ${text} <br>Siri said: ${response}`;
      processing.innerHTML = "";
      result.appendChild(p);
      // add text to speech later
   } else {
      processing.innerHTML = `listening: ${text}`;
   }
}
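
To make the shape of event.results clearer, here is a minimal sketch that pulls the latest transcript out of a results list as a plain function. readLatestResult and escapeHtml are hypothetical helpers, not part of the Web Speech API; the second one is included because the handler above interpolates recognized text into innerHTML, and escaping it first is a cautious habit.

```javascript
// Hypothetical helper: read the newest entry of a results list.
// Each result is array-like (alternatives) and carries an isFinal flag.
function readLatestResult(results) {
  if (!results || results.length === 0) return null;
  const latest = results[results.length - 1];
  const best = latest[0]; // first alternative is the top hypothesis
  return { text: best.transcript.trim(), isFinal: Boolean(latest.isFinal) };
}

// Hypothetical helper: escape text before placing it inside innerHTML.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, ch => entities[ch]);
}

// Browser usage (commented out so the sketch also loads outside a browser):
// recognition.onresult = event => {
//   const latest = readLatestResult(event.results);
//   if (latest && !latest.isFinal) {
//     processing.textContent = `listening: ${latest.text}`;
//   }
// };
```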

We also need to wire the button in the user interface to the recognition object to start and stop speech recognition.

let listening = false;
const toggleBtn = () => {
   if (listening) {
      recognition.stop();
      startBtn.textContent = "Start listening";
   } else {
      recognition.start();
      startBtn.textContent = "Stop listening";
   }
   listening = !listening;
};
startBtn.addEventListener("click", toggleBtn);
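
One caveat with a plain boolean flag is that recognition can end on its own (for example after an error or a long silence), which leaves the flag and the button label out of date. The following is a minimal sketch of one way to keep them in sync using the onend event; createListeningToggle and its onChange callback are hypothetical names, not part of the original code.

```javascript
// Hypothetical helper: build a toggle function whose internal state also
// follows the recognition object's own end event.
function createListeningToggle(recognition, onChange) {
  let listening = false;
  const setListening = value => {
    if (listening === value) return; // report real changes only
    listening = value;
    onChange(listening);
  };
  // Fired whenever recognition stops, whether we asked for it or not.
  recognition.onend = () => setListening(false);
  return function toggle() {
    if (listening) {
      recognition.stop();
      setListening(false);
    } else {
      recognition.start();
      setListening(true);
    }
    return listening;
  };
}

// Browser usage (commented out so the sketch also loads outside a browser):
// const toggle = createListeningToggle(recognition, on => {
//   startBtn.textContent = on ? "Stop listening" : "Start listening";
// });
// startBtn.addEventListener("click", toggle);
```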

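Process the text and perform actions
In this step we build simple conversation logic and handle a few basic actions. The assistant can reply to "hello", "what's your name?", and "how are you?", tell the current time, "stop" listening, or open a new tab to search for questions it cannot answer. You can extend this process function with AI libraries to make the assistant smarter.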

function process(rawText) {
   // remove space and lowercase text
   let text = rawText.replace(/\s/g, "");
   text = text.toLowerCase();
   let response = null;
   switch(text) {
      case "hello":
         response = "hi, how are you doing?"; break;
      case "what'syourname":
         response = "My name's Siri.";  break;
      case "howareyou":
         response = "I'm good."; break;
      case "whattimeisit":
         response = new Date().toLocaleTimeString(); break;
      case "stop":
         response = "Bye!!";
         toggleBtn(); // stop listening
   }
   if (!response) {
      window.open(`http://google.com/search?q=${encodeURIComponent(rawText.replace("search", "").trim())}`, "_blank");
      return "I found some information for " + rawText;
   }
   return response;
}
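
As the number of commands grows, the switch statement can be swapped for a lookup table. The sketch below is one possible variation, not the author's implementation: normalize, commands, buildSearchUrl, and respond are hypothetical names, the normalization also strips punctuation (an assumption, in case the recognizer returns text such as "What's your name?"), and the "stop" command is left out because it depends on the toggle function.

```javascript
// Hypothetical helper: lowercase and keep only letters and digits.
function normalize(rawText) {
  return rawText.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Command table: normalized phrase -> function producing the reply.
const commands = new Map([
  ["hello", () => "hi, how are you doing?"],
  ["whatsyourname", () => "My name's Siri."],
  ["howareyou", () => "I'm good."],
  ["whattimeisit", () => new Date().toLocaleTimeString()],
]);

// Build a search URL, dropping a leading "search" keyword and
// percent-encoding the rest of the query.
function buildSearchUrl(rawText) {
  const query = rawText.replace(/^\s*search\s+/i, "").trim();
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}

// Return the reply and, for unknown commands, a URL the caller may open.
function respond(rawText) {
  const handler = commands.get(normalize(rawText));
  if (handler) return { reply: handler(), searchUrl: null };
  return {
    reply: `I found some information for ${rawText}`,
    searchUrl: buildSearchUrl(rawText),
  };
}

console.log(respond("How are you?").reply);
console.log(respond("search web speech api").searchUrl);
```

Keeping window.open out of respond means the function has no side effects; the caller decides whether to open searchUrl in a new tab.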

Text to speech
In the last step we give our assistant a voice using the speechSynthesis controller from the Web Speech API. The API is simple and straightforward.

speechSynthesis.speak(new SpeechSynthesisUtterance(response));
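
An utterance can also be tuned before it is spoken through its lang, rate, and pitch properties. Here is a small sketch under stated assumptions: createUtterance and say are hypothetical helpers, the synthesizer and the utterance constructor are passed in as arguments so the code can be tried with stand-ins, and the default values are illustrative only.

```javascript
// Hypothetical helper: create an utterance and set a few optional properties.
function createUtterance(Utterance, text, options = {}) {
  const utterance = new Utterance(text);
  utterance.lang = options.lang || "en-US"; // assumed default language
  utterance.rate = options.rate ?? 1;       // 0.1 to 10, where 1 is normal speed
  utterance.pitch = options.pitch ?? 1;     // 0 to 2, where 1 is the default pitch
  return utterance;
}

// Hypothetical helper: speak a reply, discarding anything still queued so
// that replies do not pile up.
function say(synth, Utterance, text, options) {
  synth.cancel();
  const utterance = createUtterance(Utterance, text, options);
  synth.speak(utterance);
  return utterance;
}

// Browser usage (commented out so the sketch also loads outside a browser):
// say(window.speechSynthesis, window.SpeechSynthesisUtterance, response, { rate: 1.1 });
```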

That's it! With only 80 lines of code we have a cool assistant. A demo of the program can be found here.

 // UI comp
const startBtn = document.createElement("button");
startBtn.innerHTML = "Start listening";
const result = document.createElement("div");
const processing = document.createElement("p");
document.write("<body><h1>My Siri</h1><p>Give it a try with 'hello', 'how are you', 'what's your name', 'what time is it', 'stop', ... </p></body>");
document.body.append(startBtn);
document.body.append(result);
document.body.append(processing);
// speech to text
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
let toggleBtn = null;
if (typeof SpeechRecognition === "undefined") {
	startBtn.remove();
	result.innerHTML = "<b>Browser does not support Speech API. Please download the latest Chrome.</b>";
} else {
	const recognition = new SpeechRecognition();
	recognition.continuous = true;
	recognition.interimResults = true;
	recognition.onresult = event => {
		const last = event.results.length - 1;
		const res = event.results[last];
		const text = res[0].transcript;
		if (res.isFinal) {
			processing.innerHTML = "processing ....";
			const response = process(text);
			const p = document.createElement("p");
			p.innerHTML = `You said: ${text} <br>Siri said: ${response}`;
			processing.innerHTML = "";
			result.appendChild(p);
			// text to speech
			speechSynthesis.speak(new SpeechSynthesisUtterance(response));
		} else {
			processing.innerHTML = `listening: ${text}`;
		}
	}
	let listening = false;
	toggleBtn = () => {
		if (listening) {
			recognition.stop();
			startBtn.textContent = "Start listening";
		} else {
			recognition.start();
			startBtn.textContent = "Stop listening";
		}
		listening = !listening;
	};
	startBtn.addEventListener("click", toggleBtn);
}
// processor
function process(rawText) {
	let text = rawText.replace(/\s/g, "");
	text = text.toLowerCase();
	let response = null;
	switch(text) {
		case "hello":
			response = "hi, how are you doing?"; break;
		case "what'syourname":
			response = "My name's Siri.";  break;
		case "howareyou":
			response = "I'm good."; break;
		case "whattimeisit":
			response = new Date().toLocaleTimeString(); break;
		case "stop":
			response = "Bye!!";
			toggleBtn();
	}
	if (!response) {
		window.open(`http://google.com/search?q=${encodeURIComponent(rawText.replace("search", "").trim())}`, "_blank");
		return `I found some information for ${rawText}`;
	}
	return response;
}

About the author:

Tuan Nhu Dinh, software engineer at Facebook.

Source: InfoQ

Category: JavaScript

Last modified 2020-9-7 14:09:29
