Voice interaction system, voice interaction method, program, learning model generation device, and learning model generation method

Foreign patent code: F200010114
Reference number: 6206
Date posted: May 18, 2020
Country of filing: People's Republic of China
Application number: 201910783430
Publication number: 110875032
Filing date: August 23, 2019 (2019.8.23)
Publication date: March 10, 2020 (2020.3.10)
Priority data
  • Japanese Patent Application No. 2018-162774 (2018.8.31) JP
Title of invention (English): Voice interaction system, voice interaction method, program, learning model generation device, and learning model generation method
Summary of invention (English): A voice interaction system capable of handling a situation appropriately so as to effectively prevent response errors is provided. A speech acquisition unit 102 acquires user speech. A feature extraction unit 104 extracts a feature vector from the user speech. A response determination unit 120 determines a response corresponding to the extracted feature vector using any one of a plurality of learning models. A response execution unit 130 executes the determined response. A user state detection unit 140 detects a user state. A learning model selection unit 150 selects a learning model from the plurality of learning models in accordance with the detected user state. The response determination unit 120 determines the response using the selected learning model.
(From EP3618063 A1)
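As a rough illustration of the architecture described in the abstract (not taken from the publication; every name below, and the scikit-learn-style predict() interface, is an assumption for this sketch), the state-dependent model selection can be pictured in Python as follows:

    from typing import Dict, Protocol, Sequence

    class Model(Protocol):
        # Any model exposing predict() over feature vectors fits this sketch.
        def predict(self, feature_vectors: Sequence[Sequence[float]]) -> Sequence[str]: ...

    def respond(utterance_audio: bytes,
                models: Dict[str, Model],
                extract_features,           # feature extraction unit 104
                detect_user_state) -> str:  # user state detection unit 140
        features = extract_features(utterance_audio)  # user speech -> feature vector
        state = detect_user_state()                   # e.g. "active" or "passive"
        model = models[state]                         # learning model selection unit 150
        return model.predict([features])[0]           # response determination unit 120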
Overview of prior art and competing technology (English)
BACKGROUND ART
Technology that enables a user to enjoy daily conversation with a voice interaction robot (voice interaction system) is becoming widespread. A voice interaction robot according to this technology analyzes the voice information of speech uttered by the user and responds according to the analysis result. In doing so, the robot determines a response using a learning model.
With regard to this technology, Japanese Unexamined Patent Application Publication No. 2005-352154 discloses an emotional state reaction operation device that evaluates the emotional state of a user based on the voice the user utters and performs an appropriate corresponding operation. The device includes: phoneme feature quantity extraction means for extracting feature quantities related to the phoneme spectrum of the speech information; state determination means for receiving the phoneme feature quantities and determining the emotional state of the speech information based on a state determination table prepared in advance; and corresponding action selection means for receiving the emotional state and determining a corresponding course of action based on a corresponding action selection table prepared in advance. The device further includes an emotional state learning table and an emotional state learning device. The learning device acquires the relationship between the phoneme feature quantities and the emotional state using a predetermined machine learning model based on the emotional state learning table, and stores the learning result in the state determination table. The state determination means then determines the emotional state from the machine learning model based on the state determination table.
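As a toy illustration of the table-driven flow described above (the table entries, feature keys, and response strings below are invented for this sketch and do not appear in the publication):

    # Invented example tables; in the actual device the state determination
    # table is learned from an emotional state learning table by machine learning.
    STATE_DETERMINATION = {"high_pitch_fast": "excited", "low_pitch_slow": "calm"}
    ACTION_SELECTION = {"excited": "respond cheerfully", "calm": "respond quietly"}

    def corresponding_action(phoneme_feature: str) -> str:
        emotional_state = STATE_DETERMINATION.get(phoneme_feature, "neutral")
        return ACTION_SELECTION.get(emotional_state, "respond neutrally")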
Claims (English)
[claim1]
1. A voice interaction system that conducts a conversation with a user by using voice, the voice interaction system comprising: an utterance acquisition unit configured to acquire a user utterance given by the user; a feature extraction unit configured to extract at least features of the acquired user utterance; a response determination unit configured to determine a response from the extracted features using any one of a plurality of learning models generated in advance by machine learning; a response execution unit configured to perform control so as to execute the determined response; a user state detection unit configured to detect a user state, the user state being a state of the user; and a learning model selection unit configured to select a learning model from the plurality of learning models according to the detected user state, wherein the response determination unit determines the response using the learning model selected by the learning model selection unit.

[claim2]
2. The voice interaction system of claim 1, wherein the user state detection unit detects a degree of aggressiveness of the user in the conversation as the user state, and the learning model selection unit selects the learning model corresponding to the degree of the user's aggressiveness.

[claim3]
3. The voice interaction system of claim 2, wherein the user state detection unit detects an amount of utterance given by the user within a predetermined period of time, or a percentage of the time during which the user has uttered an utterance with respect to the sum of the time during which the voice interaction system has output voice as a response and the time during which the user has uttered the utterance within the predetermined period of time, and the learning model selection unit selects the learning model corresponding to the amount of utterance given by the user or the percentage of the time during which the user has uttered.
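One plausible reading of the time-based measure in claim 3 (the helper below and its 0.5 threshold are illustrative assumptions, not taken from the publication):

    def utterance_ratio(user_speech_sec: float, system_speech_sec: float) -> float:
        # Claim 3's percentage: user talk time over total talk time in the window.
        total = user_speech_sec + system_speech_sec
        return user_speech_sec / total if total > 0 else 0.0

    def select_model_key(ratio: float) -> str:
        # Hypothetical mapping from the ratio to a stored learning model.
        return "active_user" if ratio >= 0.5 else "passive_user"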

[claim4]
4. The voice interaction system of claim 1, wherein the user state detection unit detects identification information on the user as the user state, and the learning model selection unit selects the learning model corresponding to the identification information on the user.

[claim5]
5. The voice interaction system of claim 1, wherein the user state detection unit detects an emotion of the user as the user state, and the learning model selection unit selects the learning model corresponding to the emotion of the user.

[claim6]
6. The voice interaction system of claim 1, wherein the user state detection unit detects a health condition of the user as the user state, and the learning model selection unit selects the learning model corresponding to the health condition of the user.

[claim7]
7. The voice interaction system of claim 1, wherein the user state detection unit detects a degree of the user's awake state as the user state, and the learning model selection unit selects the learning model corresponding to the degree of the awake state of the user.

[claim8]
8. A voice interaction method performed by a voice interaction system that conducts a conversation with a user by using voice, the voice interaction method comprising: acquiring a user utterance given by the user; extracting at least features of the acquired user utterance; determining a response from the extracted features using any one of a plurality of learning models generated in advance by machine learning; performing control so as to execute the determined response; detecting a user state, the user state being a state of the user; and selecting a learning model from the plurality of learning models according to the detected user state, wherein the response is determined using the selected learning model.

[claim9]
9. A computer-readable medium storing a program for executing a voice interaction method performed by a voice interaction system that conducts a conversation with a user by using voice, the program causing a computer to execute the steps of: acquiring a user utterance given by the user; extracting at least features of the acquired user utterance; determining a response from the extracted features using any one of a plurality of learning models generated in advance by machine learning; performing control so as to execute the determined response; detecting a user state, the user state being a state of the user; selecting a learning model from the plurality of learning models according to the detected user state; and determining the response using the selected learning model.

[claim10]
10. A learning model generation apparatus configured to generate a learning model for use in a voice interaction system that conducts a conversation with a user by using voice, the apparatus comprising: a voice acquisition unit configured to acquire a user utterance through a conversation with a desired user, the user utterance being an utterance given by one or more desired users; a feature extraction unit configured to extract a feature vector indicating at least a feature of the acquired user utterance; a sample data generation unit configured to generate sample data in which a correct label indicating a response to the user utterance and the feature vector are associated with each other; a user state acquisition unit configured to acquire a user state and associate the acquired user state with the sample data corresponding to the user utterance, the user state being the state of the user when the desired user gave the utterance; a sample data classification unit configured to classify the sample data for each of the user states; and a learning model generation unit configured to generate a plurality of learning models by machine learning on the sample data of each classification.

[claim11]
11. A learning model generation method for generating a learning model used in a voice interaction system that conducts a conversation with a user by using voice, the method comprising: acquiring user utterances through a conversation with a desired user, the user utterances being utterances given by one or more desired users; extracting a feature vector indicating at least a feature of the acquired user utterance; generating sample data in which a correct label indicating a response to the user utterance and the feature vector are associated with each other; acquiring a user state and associating the acquired user state with the sample data corresponding to the user utterance, the user state being the state of the user when the desired user gave the utterance; classifying the sample data for each of the user states; and generating a plurality of learning models by machine learning on the sample data of each classification.
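A minimal sketch of the generation method of claims 10 and 11 (the use of scikit-learn, the choice of classifier, and the data layout are assumptions for illustration): each sample pairs a feature vector with a correct response label, carries the user state at utterance time, and one model is trained per state.

    from collections import defaultdict
    from sklearn.linear_model import LogisticRegression

    def generate_learning_models(samples):
        # samples: iterable of (feature_vector, correct_label, user_state) triples.
        by_state = defaultdict(list)
        for features, label, state in samples:   # classify sample data by user state
            by_state[state].append((features, label))
        models = {}
        for state, pairs in by_state.items():    # one learning model per classification
            X, y = zip(*pairs)
            models[state] = LogisticRegression(max_iter=1000).fit(list(X), list(y))
        return models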
  • Applicants (English)
  • KYOTO UNIVERSITY
  • TOYOTA MOTOR
  • Inventors (English)
  • KAWAHARA TATSUYA
  • HORI TATSURO
  • WATANABE NARIMASA
International Patent Classification (IPC)
If you wish to license this patent or are interested in its content, please contact us at the address below.
