BEHAVIOR ESTIMATION DEVICE, BEHAVIOR ESTIMATION METHOD, AND BEHAVIOR ESTIMATION PROGRAM [New Technology Presentation Meeting]

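Foreign patent code F200010006
Reference number S2018-0471-C0
Date posted January 29, 2020
Filing country World Intellectual Property Organization (WIPO)
International application number 2019JP011693
International publication number WO 2019202900
International filing date March 20, 2019
International publication date October 24, 2019
Priority data
  • Japanese Patent Application No. 2018-078057 (2018.4.15) JP
Title of invention (English) BEHAVIOR ESTIMATION DEVICE, BEHAVIOR ESTIMATION METHOD, AND BEHAVIOR ESTIMATION PROGRAM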
Summary of invention (English) [Problem] To realize object manipulation skill learning that is robust against changes in conditions.
[Solution] An action estimation device 100 includes a collection unit 200 that collects skill data obtained when a slave robot is operated under a plurality of different conditions using a bilateral system, in which bidirectional communication between a master robot and the slave robot allows the slave robot to be operated via the master robot. The action estimation device 100 also includes an action estimator 300 that estimates a command value for causing the slave robot 520 to act automatically, on the basis of the skill data collected by the collection unit 200 and a response output from the slave robot 520.
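The division of labor between the collection unit and the estimator can be illustrated with a minimal sketch. The abstract does not state what model the estimator uses, so a one-dimensional least-squares line stands in here purely for illustration. The class names, the scalar `response` and `command` fields, the condition labels, and the synthetic data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSample:
    condition: str    # label of the operating condition (hypothetical)
    response: float   # response measured at the slave robot
    command: float    # command value the operator produced via the master robot


@dataclass
class CollectionUnit:
    """Stores skill data gathered under several different conditions."""
    samples: list = field(default_factory=list)

    def record(self, condition, response, command):
        self.samples.append(SkillSample(condition, response, command))


class LinearEstimator:
    """Stand-in estimator (an assumption): command = a * response + b."""

    def fit(self, samples):
        n = len(samples)
        mean_x = sum(s.response for s in samples) / n
        mean_y = sum(s.command for s in samples) / n
        sxx = sum((s.response - mean_x) ** 2 for s in samples)
        sxy = sum((s.response - mean_x) * (s.command - mean_y) for s in samples)
        self.a = sxy / sxx
        self.b = mean_y - self.a * mean_x
        return self

    def estimate(self, response):
        # Map the slave's current response to the next command value.
        return self.a * response + self.b


# Synthetic demonstrations under two conditions (made-up numbers).
unit = CollectionUnit()
for condition, offset in [("low", 0.0), ("high", 1.0)]:
    for k in range(5):
        response = offset + 0.1 * k
        unit.record(condition, response, 2.0 * response + 0.5)

estimator = LinearEstimator().fit(unit.samples)
```

Pooling samples from both conditions before fitting is what lets the stand-in model cover responses from either range; a real estimator would presumably be far more expressive.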
Overview of prior art and competing technologies (English) BACKGROUND ART
In recent years, there has been demand for robots to take over the various object manipulation tasks performed by humans. Object manipulation tasks include, for example, harvesting or gathering in agriculture, construction work, warehouse picking, cooking, surgery, and laundry.
One known way to have a robot take over object manipulation tasks is to use reinforcement learning to teach the robot object manipulation skills. Reinforcement learning is a kind of machine learning in which an agent in an environment observes the current state and determines the action to take. By selecting an action, the agent obtains a reward from the environment. Reinforcement learning learns a policy that obtains the most reward through a series of actions.
However, learning object manipulation skills by reinforcement learning requires an enormous number of object manipulation attempts. Unlike the game of Go, object manipulation cannot be reproduced in software, so a single attempt cannot be sped up. It is therefore desirable to reduce the number of attempts.
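The state, action, reward, and policy loop described above can be made concrete with a toy sketch. It applies a tabular Q-learning update in deterministic sweeps over every state-action pair (effectively value iteration) rather than through sampled trial and error, so that the result is reproducible. The five-state line task, the reward of 1.0 at the goal, and the discount factor are assumptions chosen for illustration and do not come from the source.

```python
GOAL = 4             # terminal state on a five-state line (hypothetical task)
ACTIONS = (-1, +1)   # step left or step right
GAMMA = 0.9          # discount factor (assumed)


def step(state, action):
    """Environment: move along the line; reward 1.0 only on reaching GOAL."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)


# Action-value table for every non-terminal state and action.
q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
attempts = 0
for _ in range(50):
    for s in range(GOAL):
        for a in ACTIONS:
            nxt, reward = step(s, a)
            future = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
            # Q-learning update with learning rate 1.
            q[(s, a)] = reward + GAMMA * future
            attempts += 1

# Greedy policy: in each state, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
```

Even this toy performs 400 simulated transitions. On a physical robot each transition would be a real manipulation attempt, which is the cost the passage above points to.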
On the other hand, a method of teaching a robot object manipulation skills by imitation learning is also known. In imitation learning, data such as the position command values produced while an operator operates a robot are collected, and the robot learns the object manipulation skill on the basis of the collected data. Imitation learning can be expected to significantly reduce the number of attempts.
However, imitation learning has not taken into account the bidirectionality between the operator and the robot during data collection, so the human's object manipulation skill could not be fully exhibited. As a result, the success rate of object manipulation is not sufficiently high, and there is room for improvement.
In this regard, bilateral systems that take into account the bidirectionality between the operator and the robot are known. In a bilateral system, two-way control is carried out between a master robot operated by the operator and a slave robot that operates in conjunction with the master robot. By storing the operation data of the master robot, the operator's operation can be reproduced on the slave robot.
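The two-way coupling of a bilateral system can be sketched as a small simulation. The source does not give a control law, so this sketch assumes the simplest symmetric scheme: each robot is a unit mass pulled toward the other by a spring-damper term. The gains `k` and `d`, the time step, and the initial displacement are illustrative values only.

```python
def simulate_bilateral(steps=5000, dt=0.001, k=50.0, d=10.0):
    """Simulate two unit masses coupled by symmetric position control."""
    x_m, v_m = 1.0, 0.0   # master starts displaced, as if moved by the operator
    x_s, v_s = 0.0, 0.0   # slave starts at rest
    master_log = []       # stored master positions, usable for later replay
    for _ in range(steps):
        # Each side is pulled toward the other, so motion passes both ways.
        f_m = k * (x_s - x_m) + d * (v_s - v_m)
        f_s = -f_m
        # Semi-implicit Euler integration: update velocity, then position.
        v_m += f_m * dt
        v_s += f_s * dt
        x_m += v_m * dt
        x_s += v_s * dt
        master_log.append(x_m)
    return x_m, x_s, master_log


x_m, x_s, master_log = simulate_bilateral()
```

Because the coupling is symmetric, the slave follows the master while the master also feels the slave, and the two positions settle at their common midpoint. The `master_log` list corresponds to the stored operation data mentioned above.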
  • Applicant (English)
  • * For entries posted before July 2012: all designated states except the United States
  • UNIVERSITY OF TSUKUBA
  • Inventor (English)
  • SAKAINO Sho
International Patent Classification (IPC)
If you have any questions about this patent, please contact us by e-mail.
