
Voice interaction system, voice interaction method, program, learning model generation apparatus, and learning model generation method

Foreign patent code F200010112
Reference number 6206
Date posted May 18, 2020
Country of filing United States of America
Application number 201916555603
Publication number 20200075007
Filing date August 29, 2019 (2019.8.29)
Publication date March 5, 2020 (2020.3.5)
Priority data
  • Japanese Patent Application No. 2018-162774 (2018.8.31) JP
Title of the invention (English) Voice interaction system, voice interaction method, program, learning model generation apparatus, and learning model generation method
Abstract (English) A voice interaction system capable of appropriately handling a situation so as to effectively prevent a response error from occurring is provided. A speech acquisition unit acquires user speech. A feature extraction unit extracts a feature of the user speech. A response determination unit determines a response corresponding to the extracted feature using any one of a plurality of learning models. A response execution unit executes the determined response. A user state detection unit detects a user state. A learning model selection unit selects a learning model from the plurality of learning models in accordance with the detected user state. The response determination unit determines a response using the selected learning model.
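The pipeline in the abstract can be sketched as a small program: a user state is detected, a learning model matching that state is selected, and the selected model determines the response. This is a minimal illustration only; all class, method, and state names below are hypothetical, and the stand-in feature extractor, state detector, and models do not reflect any implementation described in the patent.

```python
# Hypothetical sketch of the claimed pipeline. The patent does not specify an
# implementation; every name and heuristic here is illustrative.

class VoiceInteractionSystem:
    def __init__(self, models):
        # models: mapping from a user-state label to a trained response model
        self.models = models

    def extract_feature(self, user_speech):
        # Stand-in feature extraction unit: utterance length as a 1-D "vector".
        return [len(user_speech)]

    def detect_user_state(self, user_speech):
        # Stand-in user state detection unit: long utterances => "active" user.
        return "active" if len(user_speech) > 20 else "passive"

    def respond(self, user_speech):
        feature = self.extract_feature(user_speech)
        state = self.detect_user_state(user_speech)
        model = self.models[state]   # learning model selection unit
        return model(feature)        # response determination unit

# Toy "learning models": one per user state, selected at response time.
system = VoiceInteractionSystem({
    "active": lambda f: "backchannel",   # brief acknowledgement
    "passive": lambda f: "question",     # prompt the user to speak
})
print(system.respond("hi"))  # short utterance -> passive -> "question"
```

The point of the structure is that the per-state models can differ in behavior (e.g. fewer interjections for a talkative user) while the rest of the pipeline is unchanged.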
Overview of prior and competing art (English)
BACKGROUND
The present disclosure relates to a voice interaction system, a voice interaction method, a program, a learning model generation apparatus, and a learning model generation method, and in particular, to a voice interaction system, a voice interaction method, a program, a learning model generation apparatus, and a learning model generation method for having a conversation with a user by using a voice.
A technique for enabling a user to enjoy a daily conversation with a voice interaction robot (voice interaction system) is becoming widespread. A voice interaction robot according to this technique analyzes phonological information of a voice uttered by a user and makes a response according to a result of the analysis. Here, the voice interaction robot determines a response using a learning model.
Regarding the above technique, Japanese Unexamined Patent Application Publication No. 2005-352154 discloses an emotional state reaction operation apparatus which evaluates an emotional state of a user from a voice uttered by the user and executes an appropriate corresponding operation. The emotional state reaction operation apparatus according to Japanese Unexamined Patent Application Publication No. 2005-352154 includes a phoneme feature quantity extraction function for extracting a feature quantity related to a phoneme spectrum of voice information, a state determination function for receiving the phoneme feature quantity as input and determining an emotional state of the voice information based on a state determination table prepared in advance, and a corresponding action selection function for receiving the emotional state as input and determining a corresponding action process based on a corresponding action selection table prepared in advance. The emotional state reaction operation apparatus according to Japanese Unexamined Patent Application Publication No. 2005-352154 further includes an emotional state learning table and an emotional state learning function. The emotional state learning function acquires a relation between the phoneme feature quantity and the emotional state using a predetermined machine learning model based on the emotional state learning table and stores a result of the learning in the state determination table. The state determination function determines an emotional state according to the machine learning model based on the state determination table.
Claims (English)
[claim1]
1. A voice interaction system that has a conversation with a user by using a voice, comprising:
hardware, including at least one memory configured to store a computer program and at least one processor configured to execute the computer program;
a speech acquisition unit, implemented by the hardware, configured to acquire user speech given by the user;
a feature extraction unit, implemented by the hardware, configured to extract at least a feature of the acquired user speech;
a response determination unit, implemented by the hardware, configured to determine a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
a response execution unit, implemented by the hardware, configured to perform control in order to execute the determined response;
a user state detection unit, implemented by the hardware, configured to detect a user state, which is a state of the user; and
a learning model selection unit, implemented by the hardware, configured to select a learning model from the plurality of learning models in accordance with the detected user state,
wherein the response determination unit, implemented by the hardware, determines the response using the learning model selected by the learning model selection unit.

[claim2]
2. The voice interaction system according to claim 1, wherein
the user state detection unit detects a degree of activeness of the user in the conversation as the user state, and
the learning model selection unit selects the learning model that corresponds to the degree of the activeness of the user.

[claim3]
3. The voice interaction system according to claim 2, wherein
the user state detection unit detects an amount of speech given by the user in a predetermined period or a percentage of time during which the user has made a speech with respect to a sum of time during which the voice interaction system has output a voice as a response and the time during which the user has made a speech in the predetermined period, and
the learning model selection unit selects the learning model that corresponds to the amount of speech given by the user or the percentage of the time during which the user has made a speech.
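The percentage in claim 3 is simply the user's speaking time divided by the combined speaking time of the user and the system in the predetermined period. A short worked example, with hypothetical durations not taken from the patent:

```python
# Activeness measure of claim 3: share of speaking time contributed by the
# user within a window. Durations below are illustrative values only.

def user_speech_ratio(user_speech_sec, system_speech_sec):
    """Fraction of total speaking time (user + system) that was the user's."""
    total = user_speech_sec + system_speech_sec
    return user_speech_sec / total if total else 0.0

# In a 60-second window the user spoke 18 s and the system spoke 42 s:
ratio = user_speech_ratio(user_speech_sec=18.0, system_speech_sec=42.0)
print(round(ratio, 2))  # 0.3 -> a relatively passive user in this window
```

A threshold on this ratio (or on the raw amount of user speech) could then index into the set of learning models; the claim leaves the mapping unspecified.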

[claim4]
4. The voice interaction system according to claim 1, wherein
the user state detection unit detects identification information on the user as the user state, and
the learning model selection unit selects the learning model that corresponds to the identification information on the user.

[claim5]
5. The voice interaction system according to claim 1, wherein
the user state detection unit detects emotion of the user as the user state, and
the learning model selection unit selects the learning model that corresponds to the emotion of the user.

[claim6]
6. The voice interaction system according to claim 1, wherein
the user state detection unit detects a health condition of the user as the user state, and
the learning model selection unit selects the learning model that corresponds to the health condition of the user.

[claim7]
7. The voice interaction system according to claim 1, wherein
the user state detection unit detects a degree of an awakening state of the user as the user state, and
the learning model selection unit selects the learning model that corresponds to the degree of the awakening state of the user.

[claim8]
8. A voice interaction method performed by a voice interaction system that has a conversation with a user by using a voice, the voice interaction method comprising:
acquiring user speech given by the user;
extracting at least a feature of the acquired user speech;
determining a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
performing control in order to execute the determined response;
detecting a user state, which is a state of the user; and
selecting a learning model from the plurality of learning models in accordance with the detected user state,
wherein the response is determined using the selected learning model.

[claim9]
9. A non-transitory computer readable medium storing a program for executing a voice interaction method performed by a voice interaction system that has a conversation with a user by using a voice, the program causing a computer to execute the steps of:
acquiring user speech given by the user;
extracting at least a feature of the acquired user speech;
determining a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
performing control in order to execute the determined response;
detecting a user state, which is a state of the user;
selecting a learning model from the plurality of learning models in accordance with the detected user state; and
determining the response using the selected learning model.

[claim10]
10. A learning model generation apparatus configured to generate a learning model used in a voice interaction system that has a conversation with a user by using a voice, the apparatus comprising:
hardware, including at least one memory configured to store a computer program and at least one processor configured to execute the computer program;
a speech acquisition unit, implemented by the hardware, configured to acquire user speech, which is speech given by at least one desired user, by having a conversation with the desired user;
a feature extraction unit, implemented by the hardware, configured to extract a feature vector indicating at least a feature of the acquired user speech;
a sample data generation unit configured to generate sample data in which a correct label indicating a response to the user speech and the feature vector are associated with each other;
a user state acquisition unit, implemented by the hardware, configured to acquire a user state, which is a state of the desired user when the user has made a speech, to associate the acquired user state with the sample data that corresponds to the user speech;
a sample data classification unit, implemented by the hardware, configured to classify the sample data for each of the user states; and
a learning model generation unit, implemented by the hardware, configured to generate a plurality of learning models by machine learning for each of the pieces of the classified sample data.

[claim11]
11. A learning model generation method for generating a learning model used in a voice interaction system that has a conversation with a user by using a voice, the method comprising:
acquiring user speech, which is speech given by at least one desired user, by having a conversation with the desired user;
extracting a feature vector indicating at least a feature of the acquired user speech;
generating sample data in which a correct label indicating a response to the user speech and the feature vector are associated with each other;
acquiring a user state, which is a state of the desired user when the user has made a speech, to associate the acquired user state with the sample data that corresponds to the user speech;
classifying the sample data for each of the user states; and
generating a plurality of learning models by machine learning for each of the pieces of the classified sample data.
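The generation method of claim 11 can be sketched as: group the (feature vector, correct label) sample pairs by the user state recorded when each utterance was collected, then train one model per group. In this minimal illustration the sample values are invented and the "training" step is a stand-in majority vote rather than any learning procedure named by the patent.

```python
# Hypothetical sketch of claim 11: classify sample data by user state, then
# generate one model per class. All data and the toy trainer are illustrative.
from collections import Counter, defaultdict

samples = [
    # (user_state, feature_vector, correct_label)
    ("active",  [0.9], "backchannel"),
    ("active",  [0.8], "backchannel"),
    ("active",  [0.7], "question"),
    ("passive", [0.2], "question"),
    ("passive", [0.1], "question"),
]

# Sample data classification unit: one bucket per user state.
buckets = defaultdict(list)
for state, feature, label in samples:
    buckets[state].append((feature, label))

# Learning model generation unit: train one (toy) model per bucket.
def train(bucket):
    majority = Counter(label for _, label in bucket).most_common(1)[0][0]
    return lambda feature: majority  # always predicts the bucket's majority

models = {state: train(bucket) for state, bucket in buckets.items()}
print(models["active"]([0.5]))   # backchannel
print(models["passive"]([0.5]))  # question
```

The resulting `models` mapping is exactly the structure the voice interaction system of claim 1 would select from at conversation time.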
Inventors / Applicants (English)
  • Kawahara Tatsuya
  • Hori Tatsuro
  • Watanabe Narimasa
  • KYOTO UNIVERSITY
  • TOYOTA MOTOR
International Patent Classification (IPC)