
Voice interaction apparatus, its processing method, and program

Foreign code F200010123
File No. 5788
Posted date May 18, 2020
Country EPO
Application number 18155702
Gazette No. 3370230
Date of filing Feb 8, 2018
Gazette Date Sep 5, 2018
Priority data
  • P2017-040580 (Mar 3, 2017) JP
Title Voice interaction apparatus, its processing method, and program
Abstract A voice interaction apparatus includes voice recognition means for recognizing a voice of a user, response-sentence generation means for generating a response sentence to the voice of the user based on the recognized voice, filler generation means for generating a filler word to be inserted in a conversation, output means for outputting the generated response sentence and the generated filler word, and classification means for classifying the generated response sentence into one of predetermined speech patterns indicating predefined speech types. When the output means outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence, the classification means classifies the first response sentence into one of the speech patterns, and the filler generation means generates the filler word based on the speech pattern into which the first response sentence has been classified.
Outline of related art and contending technology BACKGROUND
The present disclosure relates to a voice interaction apparatus that performs a voice interaction with a user, and its processing method and a program therefor.
A voice interaction apparatus that inserts filler words (i.e., words for filling silences in conversations) to prevent silences in conversations from being unnaturally prolonged has been known (see Japanese Unexamined Patent Application Publication No. 2014-191030).
However, the present inventors have found the following problem. That is, the aforementioned voice interaction apparatus outputs a formal (i.e., perfunctory) filler word as a word for filling a silence when a waiting time occurs in a conversation. Therefore, there is a possibility that the inserted filler word may not fit well with the content (e.g., meaning) of the conversation and hence make the conversation unnatural.
Scope of claims [claim1]
1. A voice interaction apparatus (1) comprising:
voice recognition means (2) for recognizing a voice of a user;
response-sentence generation means (4) for generating a response sentence to the voice of the user based on the voice recognized by the voice recognition means (2);
filler generation means (5) for generating a filler word to be inserted in a conversation with the user; and
output means (6) for outputting the response sentence generated by the response-sentence generation means (4) and the filler word generated by the filler generation means (5), wherein
the voice interaction apparatus (1) further comprises classification means (7) for classifying the response sentence generated by the response-sentence generation means (4) into one of predetermined speech patterns indicating predefined speech types, and
when the output means (6) outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence,
the classification means (7) classifies the first response sentence into one of the speech patterns, and
the filler generation means (5) generates the filler word based on the speech pattern into which the first response sentence has been classified by the classification means (7).
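
For illustration only, the flow recited in claim 1 might be sketched as follows. This is a minimal sketch and not the applicants' implementation: the pattern names (QUESTION, STATEMENT), the punctuation-based classifier, and the filler words in the table are all hypothetical, since the claim only speaks of "predetermined speech patterns".

```python
from enum import Enum


class SpeechPattern(Enum):
    # Hypothetical pattern names; the claim does not name the patterns.
    QUESTION = "question"
    STATEMENT = "statement"


# Hypothetical filler word per speech pattern, for illustration only.
FILLER_BY_PATTERN = {
    SpeechPattern.QUESTION: "I see,",
    SpeechPattern.STATEMENT: "Well,",
}


def classify(response_sentence: str) -> SpeechPattern:
    """Toy stand-in for the classification means (7)."""
    if response_sentence.rstrip().endswith("?"):
        return SpeechPattern.QUESTION
    return SpeechPattern.STATEMENT


def generate_filler(first_response: str) -> str:
    """Toy stand-in for the filler generation means (5).

    The filler depends on the pattern of the FIRST response sentence,
    not on the second response sentence that follows the filler.
    """
    return FILLER_BY_PATTERN[classify(first_response)]


def second_turn_output(first_response: str, second_response: str) -> str:
    """Toy stand-in for the output means (6): filler, then second response."""
    return f"{generate_filler(first_response)} {second_response}"
```

In this sketch, the user's reply to the first response sentence triggers `second_turn_output`, so the filler fills the gap while the second response sentence is being prepared.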

[claim2]
2. The voice interaction apparatus (1) according to Claim 1, wherein the voice interaction apparatus (1) further comprises:
storage means (9) for storing table information including the speech patterns and information about types of feature values associated with the speech patterns; and
feature-value calculation means (8) for calculating a feature value of a preceding or subsequent speech based on information about the type of the feature value associated with the speech pattern into which the first response sentence has been classified by the classification means (7), wherein
the filler generation means (5) generates the filler word based on the feature value calculated by the feature-value calculation means (8).

[claim3]
3. The voice interaction apparatus (1) according to Claim 2, wherein the information about the type of the feature value includes at least one of prosodic information of the preceding speech, linguistic information of the preceding speech, linguistic information of the subsequent speech, and prosodic information of the subsequent speech.
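
The table information of claims 2 and 3 could be pictured roughly as below. The four feature-value types come from claim 3; everything else is an assumption made for this sketch: the pattern names, the table contents, the reading of "preceding" and "subsequent" as relative to the filler word, and the placeholder extractors (a word count for linguistic information, a supplied mean pitch for prosodic information).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class FeatureType(Enum):
    # The four types listed in claim 3.
    PROSODIC_PRECEDING = "prosodic information of the preceding speech"
    LINGUISTIC_PRECEDING = "linguistic information of the preceding speech"
    LINGUISTIC_SUBSEQUENT = "linguistic information of the subsequent speech"
    PROSODIC_SUBSEQUENT = "prosodic information of the subsequent speech"


# Hypothetical table information held by the storage means (9):
# speech pattern -> feature-value types to calculate for that pattern.
TABLE: Dict[str, List[FeatureType]] = {
    "question": [FeatureType.PROSODIC_PRECEDING, FeatureType.LINGUISTIC_PRECEDING],
    "statement": [FeatureType.LINGUISTIC_SUBSEQUENT],
}


@dataclass
class Speech:
    text: str
    mean_pitch_hz: float  # placeholder prosodic measurement (assumption)


def feature_values(pattern: str, preceding: Speech, subsequent: Speech) -> Dict[FeatureType, float]:
    """Toy stand-in for the feature-value calculation means (8).

    Only the feature types that the table associates with the given
    speech pattern are calculated.
    """
    extractors = {
        FeatureType.PROSODIC_PRECEDING: lambda: preceding.mean_pitch_hz,
        FeatureType.LINGUISTIC_PRECEDING: lambda: float(len(preceding.text.split())),
        FeatureType.LINGUISTIC_SUBSEQUENT: lambda: float(len(subsequent.text.split())),
        FeatureType.PROSODIC_SUBSEQUENT: lambda: subsequent.mean_pitch_hz,
    }
    return {ftype: extractors[ftype]() for ftype in TABLE[pattern]}
```

The point of the table lookup is that the speech pattern decides which features are worth computing at all, so the apparatus need not extract every feature on every turn.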

[claim4]
4. The voice interaction apparatus (1) according to Claim 2 or 3, wherein
the storage means (9) stores filler form information associated with respective feature values of filler types each of which includes at least one filler word and indicates a type of the filler word, and
the filler generation means (5) narrows down the number of filler types based on the speech pattern into which the first response sentence has been classified by the classification means (7), selects one filler type associated with the feature value calculated by the feature-value calculation means (8) from among the narrowed-down number of filler types, and generates the filler word by selecting the filler word included in the selected filler type.
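
The two-stage selection in claim 4 (narrow the filler types by speech pattern, then pick one type by feature value, then pick a word from that type) might look like the following sketch. The filler type names, their words, and the use of a numeric range to associate a feature value with a filler type are hypothetical; the claim does not say how the association is represented.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class FillerType:
    name: str
    words: Tuple[str, ...]    # each filler type includes at least one filler word
    patterns: FrozenSet[str]  # speech patterns for which this type is a candidate
    min_feature: float        # the type matches feature values in [min, max)
    max_feature: float


# Hypothetical filler form information held by the storage means (9).
FILLER_FORMS = [
    FillerType("short", ("Um,", "Uh,"), frozenset({"question", "statement"}), 0.0, 5.0),
    FillerType("long", ("Well, let me see,",), frozenset({"question"}), 5.0, float("inf")),
    FillerType("agreeing", ("Right,",), frozenset({"statement"}), 5.0, float("inf")),
]


def generate_filler(pattern: str, feature_value: float,
                    rng: Optional[random.Random] = None) -> str:
    """Toy stand-in for the filler generation means (5) of claim 4."""
    rng = rng or random.Random()
    # Step 1: narrow down the filler types using the speech pattern.
    candidates = [ft for ft in FILLER_FORMS if pattern in ft.patterns]
    # Step 2: select the one type associated with the calculated feature value.
    for ftype in candidates:
        if ftype.min_feature <= feature_value < ftype.max_feature:
            # Step 3: select a filler word included in the selected type.
            return rng.choice(ftype.words)
    raise LookupError(f"no filler type for pattern={pattern!r}, feature={feature_value}")
```

Narrowing by pattern first keeps the second stage small: with the table above, a feature value of 7.0 leads to different filler types for the "question" and "statement" patterns even though the value is the same.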

[claim5]
5. A processing method for a voice interaction apparatus (1), the voice interaction apparatus (1) comprising:
voice recognition means (2) for recognizing a voice of a user;
response-sentence generation means (4) for generating a response sentence to the voice of the user based on the voice recognized by the voice recognition means (2);
filler generation means (5) for generating a filler word to be inserted in a conversation with the user; and
output means (6) for outputting the response sentence generated by the response-sentence generation means (4) and the filler word generated by the filler generation means (5),
the processing method comprising:
when the output means (6) outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence,
classifying the first response sentence into one of predetermined speech patterns indicating predefined speech types, and
generating the filler word based on the speech pattern into which the first response sentence has been classified.

[claim6]
6. A program for a voice interaction apparatus (1), the voice interaction apparatus (1) comprising:
voice recognition means (2) for recognizing a voice of a user;
response-sentence generation means (4) for generating a response sentence to the voice of the user based on the voice recognized by the voice recognition means (2);
filler generation means (5) for generating a filler word to be inserted in a conversation with the user; and
output means (6) for outputting the response sentence generated by the response-sentence generation means (4) and the filler word generated by the filler generation means (5),
the program being adapted to cause a computer to perform:
when the output means (6) outputs, after the user utters a voice subsequent to the first response sentence, the filler word and outputs a second response sentence,
classifying the first response sentence into one of predetermined speech patterns indicating predefined speech types, and
generating the filler word based on the speech pattern into which the first response sentence has been classified.
  • Applicant
  • KYOTO UNIVERSITY
  • TOYOTA MOTOR
  • Inventor
  • KAWAHARA, Tatsuya
  • TAKANASHI, Katsuya
  • NAKANISHI, Ryosuke
  • WATANABE, Narimasa
IPC(International Patent Classification)
Specified countries Contracting States: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
Extension States: BA ME
Please contact us by e-mail or facsimile if you have any interest in this patent.
