
Voice interaction system, voice interaction method, program, learning model generation apparatus, and learning model generation method

Foreign patent code F200010113
Reference number 6206
Date posted May 18, 2020
Country of filing European Patent Office (EPO)
Application number 19191406
Publication number 3618063
Filing date August 13, 2019 (2019.8.13)
Publication date March 4, 2020 (2020.3.4)
Priority data
  • Japanese Patent Application No. 2018-162774 (2018.8.31) JP
Title of invention (English) Voice interaction system, voice interaction method, program, learning model generation apparatus, and learning model generation method
Abstract (English) A voice interaction system capable of appropriately handling a situation so as to effectively prevent a response error from occurring is provided. A speech acquisition unit 102 acquires user speech. A feature extraction unit 104 extracts a feature of the user speech. A response determination unit 120 determines a response corresponding to the extracted feature vector using any one of a plurality of learning models. A response execution unit 130 executes the determined response. A user state detection unit 140 detects a user state. A learning model selection unit 150 selects a learning model from the plurality of learning models in accordance with the detected user state. The response determination unit 120 determines a response using the selected learning model.
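The pipeline in the abstract (acquire speech, extract features, pick a model by user state, determine a response) can be sketched as follows. All class and function names, the "active"/"passive" state labels, and the 0.5 threshold are illustrative assumptions, not taken from the patent.

```python
# Illustrative sketch of the voice interaction pipeline described in the abstract.

class VoiceInteractionSystem:
    def __init__(self, learning_models):
        # learning_models: dict mapping a user-state key to a trained model,
        # where each model maps a feature vector to a response.
        self.learning_models = learning_models

    def detect_user_state(self, history):
        # User state detection unit (140): classify the user as "active"
        # or "passive" from their share of recent speaking time.
        user_time = sum(t for speaker, t in history if speaker == "user")
        total_time = sum(t for _, t in history) or 1.0
        return "active" if user_time / total_time >= 0.5 else "passive"

    def respond(self, feature_vector, history):
        # Learning model selection unit (150): pick the model for the
        # detected state; response determination unit (120): apply it.
        model = self.learning_models[self.detect_user_state(history)]
        return model(feature_vector)

# Toy models: an "active" model that mostly backchannels, and a
# "passive" model that asks questions to draw the user out.
models = {
    "active": lambda features: "backchannel",
    "passive": lambda features: "question",
}
system = VoiceInteractionSystem(models)
history = [("user", 2.0), ("system", 6.0)]  # user spoke 25% of the time
print(system.respond(feature_vector=[0.1, 0.2], history=history))  # -> question
```

Keeping several models and switching among them by user state, rather than retraining one model, is what lets the system adapt its response style within a single conversation.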
Overview of prior art and competing technology (English)
BACKGROUND
The present disclosure relates to a voice interaction system, a voice interaction method, a program, a learning model generation apparatus, and a learning model generation method, and in particular, to a voice interaction system, a voice interaction method, a program, a learning model generation apparatus, and a learning model generation method for having a conversation with a user by using a voice.
A technique for enabling a user to enjoy a daily conversation with a voice interaction robot (voice interaction system) is becoming widespread. A voice interaction robot according to this technique analyzes phonological information of a voice uttered by a user and makes a response according to a result of the analysis. Here, the voice interaction robot determines a response using a learning model.
Regarding the above technique, Japanese Unexamined Patent Application Publication No. 2005-352154 discloses an emotional state reaction operation apparatus which evaluates an emotional state of a user from a voice uttered by the user and executes an appropriate corresponding operation. The emotional state reaction operation apparatus according to Japanese Unexamined Patent Application Publication No. 2005-352154 includes phoneme feature quantity extraction means for extracting a feature quantity related to a phoneme spectrum of voice information, state determination means for inputting the phoneme feature quantity and determining an emotional state of the voice information based on a state determination table prepared in advance, and corresponding action selection means for inputting the emotional state and determining a corresponding action process based on a corresponding action selection table prepared in advance. The emotional state reaction operation apparatus according to Japanese Unexamined Patent Application Publication No. 2005-352154 further includes an emotional state learning table and emotional state learning means. The emotional state learning means acquires a relation between the phoneme feature quantity and the emotional state using a predetermined machine learning model based on the emotional state learning table and stores a result of the learning in the state determination table. The state determination means determines an emotional state according to the machine learning model based on the state determination table.
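The prior-art apparatus chains two lookups: phoneme features are mapped to an emotional state via the state determination table, and the state is mapped to an action via the corresponding action selection table. A minimal sketch, with table contents and key names invented purely for illustration:

```python
# Illustrative two-stage table lookup, as in JP 2005-352154:
# phoneme features -> emotional state -> corresponding action.
# The table entries below are assumptions, not from the publication.

state_determination_table = {
    "high_pitch_fast": "excited",
    "low_pitch_slow": "sad",
}
corresponding_action_table = {
    "excited": "respond cheerfully",
    "sad": "respond gently",
}

def react(phoneme_feature_key):
    # Fall back to a neutral state/action when a key is unseen.
    state = state_determination_table.get(phoneme_feature_key, "neutral")
    return corresponding_action_table.get(state, "respond neutrally")

print(react("low_pitch_slow"))  # -> respond gently
```

In the actual apparatus the first table is not hand-written but filled in by the emotional state learning means from the emotional state learning table.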
Claims (English)
[claim1]
1. A voice interaction system (1) that has a conversation with a user by using a voice, comprising:
a speech acquisition unit (102) configured to acquire user speech given by the user;
a feature extraction unit (104) configured to extract at least a feature of the acquired user speech;
a response determination unit (120) configured to determine a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
a response execution unit (130) configured to perform control in order to execute the determined response;
a user state detection unit (140) configured to detect a user state, which is a state of the user; and
a learning model selection unit (150) configured to select the learning model from the plurality of learning models in accordance with the detected user state,
wherein the response determination unit (120) determines the response using the learning model selected by the learning model selection unit (150).

[claim2]
2. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects a degree of activeness of the user in the conversation as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the degree of the activeness of the user.

[claim3]
3. The voice interaction system (1) according to Claim 2, wherein
the user state detection unit (140) detects the amount of speech given by the user in a predetermined period or the percentage of the time during which the user has made a speech with respect to the sum of the time during which the voice interaction system has output a voice as a response and the time during which the user has made a speech in the predetermined period, and
the learning model selection unit (150) selects the learning model that corresponds to the amount of speech given by the user or the percentage of the time during which the user has made a speech.
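The two activeness metrics in Claim 3 (amount of user speech in a window, and the user's share of total speaking time) can be computed from a list of timed turns. This is a hedged sketch: the window length, the 0.4 threshold, and the model key names are assumptions for illustration.

```python
# Sketch of the Claim 3 user-state metrics and the resulting model selection.

def speech_metrics(turns, window=60.0):
    """turns: list of (speaker, duration_seconds), most recent last.
    Returns (user speech amount, user's share of speaking time) within the window."""
    recent, elapsed = [], 0.0
    for speaker, dur in reversed(turns):  # walk back until the window is full
        if elapsed + dur > window:
            break
        recent.append((speaker, dur))
        elapsed += dur
    user_amount = sum(d for s, d in recent if s == "user")
    system_amount = sum(d for s, d in recent if s == "system")
    total = user_amount + system_amount
    ratio = user_amount / total if total else 0.0
    return user_amount, ratio

def select_model_key(ratio, threshold=0.4):
    # Learning model selection unit (150): pick the model matching activeness.
    return "active_user_model" if ratio >= threshold else "passive_user_model"

turns = [("system", 20.0), ("user", 10.0), ("system", 15.0), ("user", 5.0)]
amount, ratio = speech_metrics(turns)
print(amount, ratio)            # -> 15.0 0.3
print(select_model_key(ratio))  # -> passive_user_model
```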

[claim4]
4. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects identification information on the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the identification information on the user.

[claim5]
5. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects emotion of the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the emotion of the user.

[claim6]
6. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects a health condition of the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the health condition of the user.

[claim7]
7. The voice interaction system (1) according to Claim 1, wherein
the user state detection unit (140) detects a degree of an awakening state of the user as the user state, and
the learning model selection unit (150) selects the learning model that corresponds to the degree of the awakening state of the user.

[claim8]
8. A voice interaction method performed by a voice interaction system (1) that has a conversation with a user by using a voice, the voice interaction method comprising:
acquiring user speech given by the user;
extracting at least a feature of the acquired user speech;
determining a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
performing control in order to execute the determined response;
detecting a user state, which is a state of the user; and
selecting the learning model from the plurality of learning models in accordance with the detected user state,
wherein the response is determined using the selected learning model.

[claim9]
9. A program for executing a voice interaction method performed by a voice interaction system (1) that has a conversation with a user by using a voice, the program causing a computer to execute the steps of:
acquiring user speech given by the user;
extracting at least a feature of the acquired user speech;
determining a response in accordance with the extracted feature using any one of a plurality of learning models generated in advance by machine learning;
performing control in order to execute the determined response;
detecting a user state, which is a state of the user;
selecting the learning model from the plurality of learning models in accordance with the detected user state; and
determining the response using the selected learning model.

[claim10]
10. A learning model generation apparatus (200) configured to generate a learning model used in a voice interaction system (1) that has a conversation with a user by using a voice, the apparatus comprising:
a speech acquisition unit (212) configured to acquire user speech, which is speech given by one or more desired users, by having a conversation with the desired user;
a feature extraction unit (214) configured to extract a feature vector indicating at least a feature of the acquired user speech;
a sample data generation unit (216) configured to generate sample data in which a correct label indicating a response to the user speech and the feature vector are associated with each other;
a user state acquisition unit (218) configured to acquire a user state, which is a state of the desired user when the user has made a speech, to associate the acquired user state with the sample data that corresponds to the user speech;
a sample data classification unit (220) configured to classify the sample data for each of the user states; and
a learning model generation unit (222) configured to generate a plurality of learning models by machine learning for each of the pieces of classified sample data.

[claim11]
11. A learning model generation method for generating a learning model used in a voice interaction system (1) that has a conversation with a user by using a voice, the method comprising:
acquiring user speech, which is speech given by one or more desired users, by having a conversation with the desired user;
extracting a feature vector indicating at least a feature of the acquired user speech;
generating sample data in which a correct label indicating a response to the user speech and the feature vector are associated with each other;
acquiring a user state, which is a state of the desired user when the user has made a speech, to associate the acquired user state with the sample data that corresponds to the user speech;
classifying the sample data for each of the user states; and
generating a plurality of learning models by machine learning for each of the pieces of classified sample data.
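The generation method of Claims 10 and 11 can be sketched as: tag each sample (feature vector, correct response label) with the user state observed at speech time, partition the samples per state, and train one model per partition. The `train` placeholder below (a majority-label memorizer) stands in for whatever machine-learning step is actually used; all names are illustrative assumptions.

```python
# Minimal sketch of the learning model generation method in Claim 11.
from collections import defaultdict

def classify_samples(samples):
    # Sample data classification unit (220): partition by user state.
    # samples: list of (user_state, feature_vector, correct_label)
    by_state = defaultdict(list)
    for state, features, label in samples:
        by_state[state].append((features, label))
    return by_state

def train(pairs):
    # Placeholder learner: memorize the majority label of the partition.
    labels = [label for _, label in pairs]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def generate_learning_models(samples):
    # Learning model generation unit (222): one model per user state.
    return {state: train(pairs)
            for state, pairs in classify_samples(samples).items()}

samples = [
    ("active", [0.9], "backchannel"),
    ("active", [0.8], "backchannel"),
    ("passive", [0.1], "question"),
]
models = generate_learning_models(samples)
print(models["passive"]([0.2]))  # -> question
```

The per-state partitioning is the key step: it is what yields the plurality of models that the interaction system of Claim 1 later selects among at run time.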
  • Applicants (English)
  • KYOTO UNIVERSITY
  • TOYOTA MOTOR
  • Inventors (English)
  • KAWAHARA, Tatsuya
  • HORI, Tatsuro
  • WATANABE, Narimasa
International Patent Classification (IPC)
Designated states — Contracting States: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
Extension States: BA ME
If you are interested in licensing this patent or in its contents, please contact us at the address below.
