Voice interaction apparatus, its processing method, and program

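Foreign patent code F200010123
Reference number 5788
Posted date May 18, 2020
Country of filing European Patent Office (EPO)
Application number 18155702
Publication number 3370230
Filing date February 8, 2018
Publication date September 5, 2018
Priority data
  • Japanese Patent Application No. 2017-040580 (2017.3.3) JP
Title of invention (English) Voice interaction apparatus, its processing method, and program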
Summary of invention (English) A voice interaction apparatus includes voice recognition means for recognizing a voice of a user, response-sentence generation means for generating a response sentence to the voice of the user based on the recognized voice, filler generation means for generating a filler word to be inserted in a conversation, output means for outputting the generated response sentence and the generated filler word, and classification means for classifying the generated response sentence into one of predetermined speech patterns indicating predefined speech types. When, after the user utters a voice subsequent to a first response sentence, the output means outputs the filler word and then a second response sentence, the classification means classifies the first response sentence into one of the speech patterns, and the filler generation means generates the filler word based on the speech pattern into which the first response sentence has been classified.
Overview of prior art and competing technologies (English) BACKGROUND
The present disclosure relates to a voice interaction apparatus that performs a voice interaction with a user, and to a processing method and a program therefor.
A voice interaction apparatus that inserts filler words (i.e., words for filling silences in conversations) to prevent silences in conversations from being unnaturally prolonged has been known (see Japanese Unexamined Patent Application Publication No. 2014-191030).
However, the present inventors have found the following problem. That is, the aforementioned voice interaction apparatus outputs a formal (i.e., perfunctory) filler word as a word for filling a silence when a waiting time occurs in a conversation. Therefore, there is a possibility that the inserted filler word may not fit well with the content (e.g., meaning) of the conversation and hence make the conversation unnatural.
Claims (English) [claim1]
1. A voice interaction apparatus (1) comprising:
voice recognition means (2) for recognizing a voice of a user;
response-sentence generation means (4) for generating a response sentence to the voice of the user based on the voice recognized by the voice recognition means (2);
filler generation means (5) for generating a filler word to be inserted in a conversation with the user; and
output means (6) for outputting the response sentence generated by the response-sentence generation means (4) and the filler word generated by the filler generation means (5), wherein
the voice interaction apparatus (1) further comprises classification means (7) for classifying the response sentence generated by the response-sentence generation means (4) into one of predetermined speech patterns indicating predefined speech types, and
when the output means (6) outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence,
the classification means (7) classifies the first response sentence into one of the speech patterns, and
the filler generation means (5) generates the filler word based on the speech pattern into which the first response sentence has been classified by the classification means (7).
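
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the pattern names, the punctuation-based classifier, and the filler words are all hypothetical, chosen only to show how a filler word can depend on the speech pattern of the first response sentence.

```python
from enum import Enum
from typing import List


class SpeechPattern(Enum):
    # Hypothetical pattern names; the claim only says "predetermined speech patterns".
    QUESTION = "question"
    STATEMENT = "statement"


def classify_response(sentence: str) -> SpeechPattern:
    """Toy stand-in for the classification means (7): looks only at the final character."""
    if sentence.rstrip().endswith("?"):
        return SpeechPattern.QUESTION
    return SpeechPattern.STATEMENT


# Hypothetical filler word per speech pattern.
FILLERS = {
    SpeechPattern.QUESTION: "I see",
    SpeechPattern.STATEMENT: "well",
}


def generate_filler(first_response: str) -> str:
    """Toy stand-in for the filler generation means (5)."""
    return FILLERS[classify_response(first_response)]


def second_turn(first_response: str, second_response: str) -> List[str]:
    """What the output means (6) would emit after the user's reply:
    the filler word first, then the second response sentence."""
    return [generate_filler(first_response), second_response]


print(second_turn("Do you like tea?", "Green tea is nice."))
```

A real classifier would presumably use richer linguistic cues than a trailing question mark; the sketch only fixes the order of operations (classify the first response, then pick the filler, then output the second response).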

[claim2]
2. The voice interaction apparatus (1) according to Claim 1, wherein the voice interaction apparatus (1) further comprises:
storage means (9) for storing table information including the speech patterns and information about types of feature values associated with the speech patterns; and
feature-value calculation means (8) for calculating a feature value of a preceding or subsequent speech based on information about the type of the feature value associated with the speech pattern into which the first response sentence has been classified by the classification means (7), wherein
the filler generation means (5) generates the filler word based on the feature value calculated by the feature-value calculation means (8).

[claim3]
3. The voice interaction apparatus (1) according to Claim 2, wherein the information about the type of the feature value includes at least one of prosodic information of the preceding speech, linguistic information of the preceding speech, linguistic information of the subsequent speech, and prosodic information of the subsequent speech.
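
As a rough illustration of claims 2 and 3, the table information can be pictured as a mapping from each speech pattern to the feature types to compute. In the sketch below, the four `FeatureType` members mirror the list in claim 3. Everything else is an assumption made for illustration: the pattern names, the table contents, the `Context` fields, the reading of "preceding" and "subsequent" speech as the speech before and after the filler, and the toy extractors (mean pitch for prosodic information, word count for linguistic information).

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Dict, List


class FeatureType(Enum):
    # The four kinds of information listed in claim 3.
    PROSODIC_PRECEDING = "prosodic_preceding"
    LINGUISTIC_PRECEDING = "linguistic_preceding"
    LINGUISTIC_SUBSEQUENT = "linguistic_subsequent"
    PROSODIC_SUBSEQUENT = "prosodic_subsequent"


@dataclass
class Context:
    # Hypothetical container; "preceding" and "subsequent" are assumed to be
    # the speech before and after the filler word.
    preceding_text: str
    preceding_pitch_hz: List[float]
    subsequent_text: str
    subsequent_pitch_hz: List[float]


# Hypothetical table information: speech pattern -> feature types to compute.
TABLE: Dict[str, List[FeatureType]] = {
    "question": [FeatureType.LINGUISTIC_PRECEDING],
    "statement": [FeatureType.PROSODIC_PRECEDING, FeatureType.LINGUISTIC_SUBSEQUENT],
}

# Toy extractors: mean pitch for prosody, word count for linguistic content.
EXTRACTORS = {
    FeatureType.PROSODIC_PRECEDING: lambda c: mean(c.preceding_pitch_hz),
    FeatureType.LINGUISTIC_PRECEDING: lambda c: float(len(c.preceding_text.split())),
    FeatureType.LINGUISTIC_SUBSEQUENT: lambda c: float(len(c.subsequent_text.split())),
    FeatureType.PROSODIC_SUBSEQUENT: lambda c: mean(c.subsequent_pitch_hz),
}


def calculate_features(pattern: str, ctx: Context) -> Dict[FeatureType, float]:
    """Toy stand-in for the feature-value calculation means (8):
    compute only the feature types the table associates with the pattern."""
    return {ft: EXTRACTORS[ft](ctx) for ft in TABLE[pattern]}


ctx = Context("I like tea", [100.0, 120.0], "Green tea is nice today", [90.0])
print(calculate_features("statement", ctx))
```

The point of the lookup is that only the feature types associated with the classified pattern are computed, so different speech patterns can rely on different evidence.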

[claim4]
4. The voice interaction apparatus (1) according to Claim 2 or 3, wherein
the storage means (9) stores filler form information associated with respective feature values of filler types each of which includes at least one filler word and indicates a type of the filler word, and
the filler generation means (5) narrows down the number of filler types based on the speech pattern into which the first response sentence has been classified by the classification means (7), selects one filler type associated with the feature value calculated by the feature-value calculation means (8) from among the narrowed-down number of filler types, and generates the filler word by selecting the filler word included in the selected filler type.
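
The two-stage selection in claim 4 (narrow the filler types by speech pattern, then pick one by feature value, then pick a word from it) might look like the following sketch. The filler type names, their words, the feature-value ranges, and the pattern-to-type mapping are all hypothetical. The claim does not say how a feature value is matched to a filler type, so a simple range check is assumed here.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FillerType:
    name: str
    words: List[str]   # each filler type includes at least one filler word
    min_value: float   # assumed association: feature values in [min_value, max_value)
    max_value: float


# Hypothetical filler form information.
FILLER_TYPES = [
    FillerType("backchannel", ["uh-huh"], 0.0, 5.0),
    FillerType("hesitation", ["um", "well"], 5.0, float("inf")),
    FillerType("acknowledgement", ["I see"], 0.0, float("inf")),
]

# Hypothetical mapping used for the narrowing step.
ALLOWED_BY_PATTERN: Dict[str, List[str]] = {
    "question": ["backchannel", "hesitation"],
    "statement": ["acknowledgement"],
}


def select_filler(pattern: str, feature_value: float,
                  rng: Optional[random.Random] = None) -> Optional[str]:
    """Toy stand-in for the filler generation means (5) of claim 4."""
    rng = rng or random.Random()
    # Step 1: narrow down the filler types by speech pattern.
    candidates = [t for t in FILLER_TYPES if t.name in ALLOWED_BY_PATTERN[pattern]]
    # Step 2: select the one type whose range contains the feature value.
    for filler_type in candidates:
        if filler_type.min_value <= feature_value < filler_type.max_value:
            # Step 3: select a filler word included in that type.
            return rng.choice(filler_type.words)
    return None  # no candidate type matches the feature value


print(select_filler("question", 3.0))
```

Narrowing first means the feature-value comparison runs over fewer candidate types.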

[claim5]
5. A processing method for a voice interaction apparatus (1), the voice interaction apparatus (1) comprising:
voice recognition means (2) for recognizing a voice of a user;
response-sentence generation means (4) for generating a response sentence to the voice of the user based on the voice recognized by the voice recognition means (2);
filler generation means (5) for generating a filler word to be inserted in a conversation with the user; and
output means (6) for outputting the response sentence generated by the response-sentence generation means (4) and the filler word generated by the filler generation means (5),
the processing method comprising:
when the output means (6) outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence,
classifying the first response sentence into one of predetermined speech patterns indicating predefined speech types, and
generating the filler word based on the speech pattern into which the first response sentence has been classified.

[claim6]
6. A program for a voice interaction apparatus (1), the voice interaction apparatus (1) comprising:
voice recognition means (2) for recognizing a voice of a user;
response-sentence generation means (4) for generating a response sentence to the voice of the user based on the voice recognized by the voice recognition means (2);
filler generation means (5) for generating a filler word to be inserted in a conversation with the user; and
output means (6) for outputting the response sentence generated by the response-sentence generation means (4) and the filler word generated by the filler generation means (5),
the program being adapted to cause a computer to perform:
when the output means (6) outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence,
classifying the first response sentence into one of predetermined speech patterns indicating predefined speech types, and
generating the filler word based on the speech pattern into which the first response sentence has been classified.
  • Applicant (English)
  • KYOTO UNIVERSITY
  • TOYOTA MOTOR
  • Inventor (English)
  • KAWAHARA, Tatsuya
  • TAKANASHI, Katsuya
  • NAKANISHI, Ryosuke
  • WATANABE, Narimasa
International Patent Classification (IPC)
Designated states Contracting States: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
Extension States: BA ME
If you wish to obtain a license or are interested in the details of this patent, please contact us using the details below.
