
Voice interaction system, voice interaction method, program, learning model generation apparatus, and learning model generation method

Foreign code F200010113
File No. 6206
Posted date May 18, 2020
Country EPO
Application number 19191406
Gazette No. 3618063
Date of filing Aug 13, 2019
Gazette Date Mar 4, 2020
Priority data
  • P2018-162774 (Aug 31, 2018) JP
Title Voice interaction system, voice interaction method, program, learning model generation apparatus, and learning model generation method
Abstract A voice interaction system capable of appropriately handling a situation so as to effectively prevent response errors from occurring is provided. A speech acquisition unit 102 acquires user speech. A feature extraction unit 104 extracts a feature vector of the user speech. A response determination unit 120 determines a response corresponding to the extracted feature vector using any one of a plurality of learning models. A response execution unit 130 executes the determined response. A user state detection unit 140 detects a user state. A learning model selection unit 150 selects a learning model from the plurality of learning models in accordance with the detected user state. The response determination unit 120 then determines the response using the selected learning model.
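
To make the claimed architecture concrete, here is a minimal Python sketch of the response pipeline. Every class, method, and key name (e.g. VoiceInteractionSystem, respond, the "active"/"passive" state keys) is an illustrative assumption, not taken from the patent; the unit numbers in the comments refer to the reference signs in the abstract.

```python
# Minimal sketch of the claimed pipeline; all names are illustrative
# assumptions, not taken from the patent. Unit numbers in comments
# refer to the reference signs used in the abstract.

from dataclasses import dataclass
from typing import Callable, Dict, List

FeatureVector = List[float]

@dataclass
class LearningModel:
    """Stand-in for a pre-trained model mapping features to a response."""
    predict: Callable[[FeatureVector], str]

class VoiceInteractionSystem:
    def __init__(self, models: Dict[str, LearningModel]) -> None:
        self.models = models  # plurality of learning models, keyed by user state

    def detect_user_state(self, speech: str) -> str:
        # User state detection unit 140: placeholder heuristic.
        return "active" if len(speech.split()) > 5 else "passive"

    def extract_features(self, speech: str) -> FeatureVector:
        # Feature extraction unit 104: placeholder single feature.
        return [float(len(speech))]

    def respond(self, speech: str) -> str:
        state = self.detect_user_state(speech)     # unit 140
        model = self.models[state]                 # learning model selection unit 150
        features = self.extract_features(speech)   # unit 104
        return model.predict(features)             # response determination unit 120
```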
Outline of related art and competing technology
BACKGROUND
The present disclosure relates to a voice interaction system, a voice interaction method, a program, a learning model generation apparatus, and a learning model generation method, and in particular, to a voice interaction system, a voice interaction method, a program, a learning model generation apparatus, and a learning model generation method for having a conversation with a user by using a voice.
A technique for enabling a user to enjoy a daily conversation with a voice interaction robot (voice interaction system) is becoming widespread. A voice interaction robot according to this technique analyzes phonological information of a voice uttered by a user and makes a response according to a result of the analysis. Here, the voice interaction robot determines a response using a learning model.
Regarding the above technique, Japanese Unexamined Patent Application Publication No. 2005-352154 discloses an emotional state reaction operation apparatus which evaluates an emotional state of a user from a voice uttered by the user and executes an appropriate corresponding operation. The apparatus includes phoneme feature quantity extraction means for extracting a feature quantity related to a phoneme spectrum of voice information, state determination means for receiving the phoneme feature quantity and determining an emotional state of the voice information based on a state determination table prepared in advance, and corresponding action selection means for receiving the emotional state and determining a corresponding action process based on a corresponding action selection table prepared in advance. The apparatus further includes an emotional state learning table and emotional state learning means. The emotional state learning means learns a relation between the phoneme feature quantity and the emotional state using a predetermined machine learning model based on the emotional state learning table and stores a result of the learning in the state determination table. The state determination means then determines the emotional state according to the machine learning model based on the state determination table.
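
As a rough illustration of this table-driven prior art, the sketch below is a hedged reading of the description above, not the publication's actual implementation; every table key and value is an invented placeholder.

```python
# Hedged sketch of the table-driven approach described above:
# a state determination table maps phoneme feature quantities to
# emotional states, and a corresponding action selection table maps
# states to actions. All keys and values are invented placeholders.

state_determination_table = {"high_pitch": "excited", "low_energy": "sad"}
corresponding_action_selection_table = {
    "excited": "calm_response",
    "sad": "encouraging_response",
}

def determine_corresponding_action(phoneme_feature: str) -> str:
    emotional_state = state_determination_table.get(phoneme_feature, "neutral")
    return corresponding_action_selection_table.get(emotional_state, "default_response")
```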
Scope of claims
[claim1]
1. A voice interaction system (1) that has a conversation with a user by using a voice, comprising:
a speech acquisition unit (102) configured to acquire user speech given by the user;
a feature extraction unit (104) configured to extract at least a feature of the acquired user speech;
a response determination unit (120) configured to determine a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
a response execution unit (130) configured to perform control in order to execute the determined response;
a user state detection unit (140) configured to detect a user state, which is a state of the user; and
a learning model selection unit (150) configured to select the learning model from the plurality of learning models in accordance with the detected user state,
wherein the response determination unit (120) determines the response using the learning model selected by the learning model selection unit (150).

[claim2]
2. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects a degree of activeness of the user in the conversation as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the degree of the activeness of the user.

[claim3]
3. The voice interaction system (1) according to Claim 2, wherein
the user state detection unit (140) detects the amount of speech given by the user in a predetermined period or the percentage of the time during which the user has made a speech with respect to the sum of the time during which the voice interaction system has output a voice as a response and the time during which the user has made a speech in the predetermined period, and
the learning model selection unit (150) selects the learning model that corresponds to the amount of speech given by the user or the percentage of the time during which the user has made a speech.
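
Claim 3's second measure reduces to a simple ratio: user speech time divided by the sum of system output time and user speech time within the predetermined period. A hedged sketch follows; the 0.5 threshold and the model keys are assumptions for illustration.

```python
# Sketch of claim 3's activeness metric: the fraction of conversation
# time in a predetermined period occupied by the user's speech. The
# 0.5 threshold and the model keys are illustrative assumptions.

def speech_time_ratio(user_speech_sec: float, system_speech_sec: float) -> float:
    total = user_speech_sec + system_speech_sec
    return user_speech_sec / total if total > 0 else 0.0

def select_model_key(user_speech_sec: float, system_speech_sec: float) -> str:
    ratio = speech_time_ratio(user_speech_sec, system_speech_sec)
    return "active" if ratio >= 0.5 else "passive"
```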

[claim4]
4. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects identification information on the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the identification information on the user.

[claim5]
5. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects emotion of the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the emotion of the user.

[claim6]
6. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects a health condition of the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the health condition of the user.

[claim7]
7. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects a degree of an awakening state of the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the degree of the awakening state of the user.
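
Claims 4 through 7 differ only in which aspect of the detected user state drives model selection. A hedged sketch of such a dispatch follows; the field names, example values, and criterion labels are all assumptions. In practice a deployment would fix one criterion per claim variant rather than switching at runtime.

```python
# Sketch covering claims 4-7: model selection keyed on different
# aspects of the detected user state. Field names, example values,
# and criterion labels are all illustrative assumptions.

from typing import NamedTuple

class UserState(NamedTuple):
    user_id: str    # claim 4: identification information on the user
    emotion: str    # claim 5: e.g. "happy", "sad"
    health: str     # claim 6: e.g. "well", "unwell"
    awakeness: str  # claim 7: e.g. "alert", "drowsy"

def select_model_key(state: UserState, criterion: str) -> str:
    # Each dependent claim fixes a different selection criterion.
    return {
        "identity": state.user_id,
        "emotion": state.emotion,
        "health": state.health,
        "awakeness": state.awakeness,
    }[criterion]
```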

[claim8]
8. A voice interaction method performed by a voice interaction system (1) that has a conversation with a user by using a voice, the voice interaction method comprising:
acquiring user speech given by the user;
extracting at least a feature of the acquired user speech;
determining a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
performing control in order to execute the determined response;
detecting a user state, which is a state of the user; and
selecting the learning model from the plurality of learning models in accordance with the detected user state,
wherein the response is determined using the selected learning model.

[claim9]
9. A program for executing a voice interaction method performed by a voice interaction system (1) that has a conversation with a user by using a voice, the program causing a computer to execute the steps of:
acquiring user speech given by the user;
extracting at least a feature of the acquired user speech;
determining a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
performing control in order to execute the determined response;
detecting a user state, which is a state of the user;
selecting the learning model from the plurality of learning models in accordance with the detected user state; and
determining the response using the selected learning model.

[claim10]
10. A learning model generation apparatus (200) configured to generate a learning model used in a voice interaction system (1) that has a conversation with a user by using a voice, the apparatus comprising:
a speech acquisition unit (212) configured to acquire user speech, which is speech given by one or more desired users, by having a conversation with the desired users;
a feature extraction unit (214) configured to extract a feature vector indicating at least a feature of the acquired user speech;
a sample data generation unit (216) configured to generate sample data in which a correct label indicating a response to the user speech and the feature vector are associated with each other;
a user state acquisition unit (218) configured to acquire a user state, which is a state of the desired user at the time the user speech was given, and to associate the acquired user state with the sample data that corresponds to the user speech;
a sample data classification unit (220) configured to classify the sample data for each of the user states; and
a learning model generation unit (222) configured to generate a plurality of learning models by machine learning for each of the pieces of classified sample data.
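
Claim 10 amounts to grouping labeled samples by the user state recorded at utterance time and then training one model per group. A minimal sketch, assuming a scikit-learn classifier as a stand-in estimator; the estimator choice and all names are illustrative, not from the patent.

```python
# Sketch of claim 10: classify sample data by user state, then train
# one model per group (units 216-222). The scikit-learn estimator is
# an assumption for illustration; each group is assumed to contain
# at least two distinct response labels.

from collections import defaultdict
from typing import Dict, List, Tuple

from sklearn.linear_model import LogisticRegression

FeatureVector = List[float]
Sample = Tuple[FeatureVector, str]  # (feature vector, correct response label)

def generate_models(
    samples: List[Sample], user_states: List[str]
) -> Dict[str, LogisticRegression]:
    # Sample data classification unit 220: group samples by the
    # user state recorded when each utterance was made.
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for sample, state in zip(samples, user_states):
        groups[state].append(sample)

    # Learning model generation unit 222: one model per user state.
    models: Dict[str, LogisticRegression] = {}
    for state, group in groups.items():
        X = [features for features, _ in group]
        y = [label for _, label in group]
        models[state] = LogisticRegression().fit(X, y)
    return models
```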

[claim11]
11. A learning model generation method for generating a learning model used in a voice interaction system (1) that has a conversation with a user by using a voice, the method comprising:
acquiring user speech, which is speech given by one or more desired users, by having a conversation with the desired users;
extracting a feature vector indicating at least a feature of the acquired user speech;
generating sample data in which a correct label indicating a response to the user speech and the feature vector are associated with each other;
acquiring a user state, which is a state of the desired user at the time the user speech was given, and associating the acquired user state with the sample data that corresponds to the user speech;
classifying the sample data for each of the user states; and
generating a plurality of learning models by machine learning for each of the pieces of classified sample data.
Applicant
  • KYOTO UNIVERSITY
  • TOYOTA MOTOR
Inventor
  • KAWAHARA, Tatsuya
  • HORI, Tatsuro
  • WATANABE, Narimasa
IPC (International Patent Classification)
Specified countries Contracting States: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
Extension States: BA ME
Please contact us by e-mail or facsimile if you have any interest in this patent.
