
Voice interaction apparatus, its processing method, and program

Foreign patent code: F200010122
Reference number: 5788
Date posted: May 18, 2020
Country of application: United States
Application number: 201815883240
Publication number: 10416957
Filing date: January 30, 2018
Publication date: September 17, 2019
Priority data
  • Japanese Patent Application No. 2017-040580 (2017.3.3) JP
Title of invention (English): Voice interaction apparatus, its processing method, and program
Abstract (English): A voice interaction apparatus includes voice recognition means for recognizing a voice of a user, response-sentence generation means for generating a response sentence to the voice of the user based on the recognized voice, filler generation means for generating a filler word to be inserted in a conversation, output means for outputting the generated response sentence and the generated filler word, and classification means for classifying the generated response sentence into one of predetermined speech patterns indicating predefined speech types. When the output means outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence, the classification means classifies the first response sentence into one of the speech patterns, and the filler generation means generates the filler word based on the speech pattern into which the first response sentence has been classified.
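The flow described in the abstract — classify the first response sentence into a speech pattern, then generate a filler word from that pattern — can be sketched as follows. This is a minimal illustrative sketch only: the pattern names, the rule-based classifier, and the filler table are all invented for the example and are not taken from the patent.

```python
# Hypothetical sketch of the claimed voice-interaction flow.
# Pattern names, classification rules, and fillers are illustrative.
from enum import Enum


class SpeechPattern(Enum):
    QUESTION = "question"
    STATEMENT = "statement"
    BACKCHANNEL = "backchannel"


# Hypothetical mapping from each speech pattern to a filler word.
FILLERS = {
    SpeechPattern.QUESTION: "well,",
    SpeechPattern.STATEMENT: "I see,",
    SpeechPattern.BACKCHANNEL: "uh-huh,",
}


def classify_response(sentence: str) -> SpeechPattern:
    """Classify a response sentence into one of the predetermined
    speech patterns (a trivial rule-based stand-in)."""
    if sentence.rstrip().endswith("?"):
        return SpeechPattern.QUESTION
    if len(sentence.split()) <= 2:
        return SpeechPattern.BACKCHANNEL
    return SpeechPattern.STATEMENT


def generate_filler(first_response: str) -> str:
    """Generate the filler word based on the speech pattern into
    which the first response sentence has been classified."""
    pattern = classify_response(first_response)
    return FILLERS[pattern]


# The apparatus would output this filler while preparing the second
# response sentence, so the filler fits the content of the exchange.
print(generate_filler("What time would you like to leave?"))
```

A real implementation would replace `classify_response` with a trained classifier over prosodic and linguistic features, as the later claims suggest.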
Overview of prior and competing art (English): BACKGROUND
The present disclosure relates to a voice interaction apparatus that performs a voice interaction with a user, and its processing method and a program therefor.
A voice interaction apparatus that inserts filler words (i.e., words for filling silences in conversations) to prevent silences in conversations from being unnaturally prolonged has been known (see Japanese Unexamined Patent Application Publication No. 2014-191030).
However, the present inventors have found the following problem. That is, the aforementioned voice interaction apparatus outputs a formal (i.e., perfunctory) filler word as a word for filling a silence when a waiting time occurs in a conversation. Therefore, there is a possibility that the inserted filler word may not fit well with the content (e.g., meaning) of the conversation and hence make the conversation unnatural.
Claims (English)
[claim1]
1. A voice interaction apparatus comprising:
circuitry configured to:
recognize a voice of a user;
generate a first response sentence to the voice of the user based on the recognized voice;
generate a filler word to be inserted in a conversation with the user;
output the first response sentence and the filler word;
classify the first response sentence into one of predetermined speech patterns indicating predefined speech types; and
when outputting, after the user utters a voice subsequent to the outputting of the first response sentence, the filler word and outputting a second response sentence:
classify the first response sentence into one of the predetermined speech patterns, and
generate the filler word based on the predetermined speech pattern into which the first response sentence has been classified.

[claim2]
2. The voice interaction apparatus according to claim 1, wherein the circuitry is configured to:
store table information including the predetermined speech patterns and information about types of feature values associated with the predetermined speech patterns;
calculate a feature value of a preceding or subsequent speech based on information about a type of a feature value associated with the predetermined speech pattern into which the first response sentence has been classified; and
generate the filler word based on the calculated feature value.

[claim3]
3. The voice interaction apparatus according to claim 2, wherein the information about the type of the feature value includes at least one of prosodic information of the preceding speech, linguistic information of the preceding speech, linguistic information of the subsequent speech, and prosodic information of the subsequent speech.

[claim4]
4. The voice interaction apparatus according to claim 2, wherein the circuitry is configured to:
store filler form information associated with respective feature values of filler types each of which includes at least one filler word and indicates a type of the at least one filler word;
narrow down a number of filler types based on the predetermined speech pattern into which the first response sentence has been classified;
select one filler type associated with the calculated feature value from among the narrowed-down number of filler types; and
generate the filler word by selecting the filler word included in the selected filler type.
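Claims 2 through 4 describe a table-driven selection: each speech pattern is associated with the type of feature value to compute (claim 2), the pattern narrows the candidate filler types, and the calculated feature value selects one type, from which a filler word is drawn (claim 4). A minimal sketch of that lookup chain — where every table entry, feature bucket, and threshold is an invented placeholder, not content from the patent:

```python
# Hypothetical table information (claim 2): each speech pattern is
# associated with the type of feature value to calculate.
FEATURE_TYPE_TABLE = {
    "question":  "prosody_preceding",    # e.g. pitch of the preceding speech
    "statement": "language_subsequent",  # e.g. wording of the subsequent speech
}

# Hypothetical filler form information (claim 4): filler types, each
# holding at least one filler word, keyed by a feature-value bucket.
FILLER_FORM_TABLE = {
    "question":  {"high": ["well"], "low": ["hmm"]},
    "statement": {"high": ["so"],   "low": ["right"]},
}


def compute_feature(feature_type: str, speech: str) -> str:
    """Stand-in feature extractor: bucket the feature value as 'high'
    or 'low' (real prosodic/linguistic analysis is elided)."""
    return "high" if len(speech) > 20 else "low"


def generate_filler(pattern: str, speech: str) -> str:
    # 1. Look up which feature type this speech pattern requires (claim 2).
    feature_type = FEATURE_TYPE_TABLE[pattern]
    # 2. Calculate the feature value of the preceding/subsequent speech.
    value = compute_feature(feature_type, speech)
    # 3. Narrow the filler types by pattern, then select the one filler
    #    type associated with the calculated feature value (claim 4).
    candidates = FILLER_FORM_TABLE[pattern][value]
    # 4. Generate the filler word by selecting from the selected type.
    return candidates[0]


print(generate_filler("question", "Could you tell me the weather tomorrow?"))
```

The two-stage lookup (pattern first, feature value second) is what lets the filler reflect the content of the conversation rather than being a fixed perfunctory word.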

[claim5]
5. A processing method for voice interaction, comprising:
recognizing a voice of a user;
generating a first response sentence to the voice of the user based on the recognized voice;
generating a filler word to be inserted in a conversation with the user;
outputting the first response sentence and the filler word; and
when outputting, after the user utters a voice subsequent to the outputting of the first response sentence, the filler word and outputting a second response sentence:
classifying the first response sentence into one of predetermined speech patterns indicating predefined speech types, and
generating the filler word based on the predetermined speech pattern into which the first response sentence has been classified.

[claim6]
6. A non-transitory computer readable medium that stores a program for voice interaction which when executed causes a computer to perform a method comprising:
recognizing a voice of a user;
generating a first response sentence to the voice of the user based on the recognized voice;
generating a filler word to be inserted in a conversation with the user;
outputting the first response sentence and the filler word,
when outputting, after the user utters a voice subsequent to the outputting of the first response sentence, the filler word and outputting a second response sentence:
classifying the first response sentence into one of predetermined speech patterns indicating predefined speech types, and
generating the filler word based on the predetermined speech pattern into which the first response sentence has been classified.
Inventors / Applicants (English)
  • KAWAHARA Tatsuya
  • TAKANASHI Katsuya
  • NAKANISHI Ryosuke
  • WATANABE Narimasa
  • KYOTO UNIVERSITY
  • TOYOTA MOTOR
International Patent Classification (IPC)
If you wish to license this patent or are interested in its content, please use the contact information below.
