Top > Search of International Patents > Solution search system, solution search method, and solution search program

Solution search system, solution search method, and solution search program UPDATE_EN

Foreign code F190009916
File No. 08135-US
Posted date Aug 26, 2019
Country United States of America
Application number 201414780166
Gazette No. 20160042291
Date of filing Mar 17, 2014
Gazette Date Feb 11, 2016
International application number JP2014001506
International publication number WO2014156044
Date of international filing Mar 17, 2014
Date of international publication Oct 2, 2014
Priority data
  • P2013-066768 (Mar 27, 2013) JP
  • 2014JP01506 (Mar 17, 2014) WO
Title Solution search system, solution search method, and solution search program UPDATE_EN
Abstract A solution search system, which can search for a test object, expected to output an optimal result, from at least two test objects each outputting a result based on a probability distribution, includes: a record superiority comparing unit which can obtain past records of the test objects based on an accumulation of the output results and can compare all records for superiority/inferiority; a controlling unit which can control a measurement variable of the test objects, based on the compared superiority/inferiority records, and a latest result output from the test object; and an output instructing unit which can instruct the test object, the measurement variable of which has exceeded a threshold value, to output a result, wherein the output instructing unit determines, as a desired solution, at least one test object to which the largest number of the output instructions have finally been given after repetition of the output instructions.
Outline of related art and contending technology BACKGROUND ART
Conventionally, a bandit problem has been known as a representative example of a problem of searching for a solution capable of maximizing an expected value. A purpose of the bandit problem is to maximize an expected value of the total of rewards one can receive. In the bandit problem, a player repeats choosing, by a certain action, one option from n-types of different options of action. After each selection, a result selected from a probability distribution which depends on the selected action is given to the player as a reward.
Take the following case as an example: There are several slot machines, and a player can get a coin (a reward) by pulling a lever of the machine, under a certain probability distribution. The probability distribution (a winning rate) of getting a coin differs among each of the slot machines, and the player has no knowledge about the winning rate. In such case, a most common method for evaluating the winning rate of each of the slot machines is simply to play each of the slot machines multiple times one after another. The slot machine which actually provides the highest reward is determined to be the machine of the highest winning rate.
In the above method, however, the player has to play the slot machines for a considerable number of times in order to determine a machine with the highest winning rate in fact. This, as a result, requires a large investment. It is necessary, therefore, to create an algorithm capable of finding a solution efficiently, as well as reducing an investment as much as possible, in searching for the winning rate of each of the slot machines.
To find a solution, the above case can be applied to the bandit problem described above, which is a problem of searching for a solution capable of maximizing an expected value of the total of rewards (for example, refer to Non Patent Literature 1). In particular, a combinatorial bandit problem has recently been drawing attention. In the combinatorial bandit problem, a combination of options which is expected to output an optimal result is selected from n-types of different options of action. There is a need for the combinatorial bandit problem in various fields, other than selecting a combination of slot machines which is likely to provide a high dividend from among a plurality of slot machines. For example, the combinatorial bandit problem can be used in selecting an optimal channel combination which is able to maximize the amount of data transmission in cognitive radio communication, an optimal advertisement combination to maximize the click count in Internet advertising, and a portfolio of financial instruments with the highest return on investment. In these applications, the bandit problem is a commoner type of a combinatorial reward maximization problem. In other words, there are multiple players, and the amount of reward for each player is determined depending on the choice by the player (for example, by a payoff matrix). In the present description, however, a combinatorial reward maximization problem in which each slot machine is independent of each other (particularly, a combination of two slot machines) is described with examples for simplification.
Scope of claims [claim1]
1. A solution search system configured to search for a test object expected to output an optimal result, among at least two test objects each of which outputs a result based on a probability distribution, the solution search system comprising:
a record superiority comparing unit configured to obtain a past record for each of the test objects, based on an accumulation of the results having been output, and to compare the records of the test objects in terms of superiority and inferiority in relation to the records of all the test objects;
a controlling unit configured to perform a control to increase or decrease a measurement variable for each of the test objects, based on the superiority/inferiority of the record having been compared by the record superiority comparing unit, and a latest result having been output from the test object; and
an output instructing unit configured to instruct the test object, the measurement variable of which has exceeded a threshold value, to output a result, wherein
the output instructing unit determines, as a desired solution, at least one test object to which the largest number of the output instructions have finally been given after repetition of the output instructions.

[claim2]
2. The solution search system according to claim 1, the solution search system being further configured to search for a combination of test objects which is expected to output an optimal result, among at least three test objects each of which outputs a result based on a predetermined probability distribution, wherein
the output instructing unit determines, as a desired solution, a combination of test objects to which the largest number of the output instructions have been given after repetition of the output instructions.

[claim3]
3. The solution search system according to claim 1, the solution search system being further configured to search for a combination of test objects, each of the test objects having a probability distribution which varies chronologically in response to an output instruction from the output instructing unit, and
when searching for the solution, at least two solution-search systems are used.

[claim4]
4. The solution search system according to claim 1, wherein:
the record superiority comparing unit raises the past record of one test object to a superior level, when the test object has output a superior result, while the other test objects have output inferior results, and
the record superiority comparing unit lowers the past record of one test object to an inferior level, when the test object has output an inferior result, while the other test objects have output superior results.

[claim5]
5. The solution search system according to claim 1, wherein:
in the record superiority comparing unit, an internal resource value is a difference between the record of one test object and the mean of the records of all the test objects, and
the controlling part:
increases the measurement variable, when the internal resource value is positive, and the latest result having been output from the test object is superior;
maintains the measurement variable, when the internal resource value is positive, and the latest result having been output from the test object is inferior;
increases the measurement variable, when the internal resource value is zero, and the latest result having been output from the test object is superior;
decreases the measurement variable, when the internal resource value is zero, and the latest result having been output from the test object is inferior;
maintains the measurement variable, when the internal resource value is negative, and the latest result having been output from the test object is superior; and
decreases the measurement variable, when the internal resource value is negative, and the latest result having been output from the test object is inferior.

[claim6]
6. The solution search system according to claim 1, wherein the controlling unit performs a control so that the sum of the measurement variables which are assigned to the respective test objects can be kept constant.

[claim7]
7. A solution search program configured to search for a test object expected to output an optimal result, among at least two test objects each of which outputs a result based on a probability distribution, the solution search program comprising the steps of:
record superiority comparing, in which a past record is obtained for each of the test objects, based on an accumulation of the results having been output, and the records of the test objects are compared in terms of superiority and inferiority in relation to the records of all the test objects;
controlling, in which a control is performed to increase or decrease a measurement variable for each of the test objects, based on the superiority/inferiority of the record having been compared by the record superiority comparing unit, and a latest result having been output from the test object; and
output instructing, in which the test object, the measurement variable of which has exceeded a threshold value, is instructed to output a result, wherein
in the output instructing step, a computer is caused to determine, as a desired solution, at least one test object to which the largest number of the output instructions have finally been given after repetition of the output instructions.

[claim8]
8. The solution search program according to claim 7, wherein
in the record superiority comparing step,
the past record of one test object is raised to a superior level, when the test object has output a superior result, while the other test objects have output inferior results, and
the past record of one test object is lowered to an inferior level, when the test object has output an inferior result, while the other test objects have output superior results.

[claim9]
9. The solution search program according to claim 7, wherein:
in the record superiority comparing step, an internal resource value is a difference between the record of one test object and the mean of the records of all the test objects, and
in the step of controlling:
the measurement variable is increased, when the internal resource value is positive, and the latest result having been output from the test object is superior;
the measurement variable is maintained, when the internal resource value is positive, and the latest result having been output from the test object is inferior;
the measurement variable is increased, when the internal resource value is zero, and the latest result having been output from the test object is superior;
the measurement variable is decreased, when the internal resource value is zero, and the latest result having been output from the test object is inferior;
the measurement variable is maintained, when the internal resource value is negative, and the latest result having been output from the test object is superior; and
the measurement variable is decreased, when the internal resource value is negative, and the latest result having been output from the test object is inferior.

[claim10]
10. The solution search program according to claim 7, wherein
in the step of controlling, a control is performed so that the sum of the measurement variables which are assigned to the respective test objects can be kept constant.

[claim11]
11. A solution search method for searching for a test object expected to output an optimal result, among at least two test objects each of which outputs a result based on a probability distribution, the solution search method comprising the steps of:
record superiority comparing, in which a past record is obtained for each of the test objects, based on an accumulation of the results having been output, and the records of the test objects are compared in terms of superiority and inferiority in relation to the records of all the test objects;
controlling, in which a control is performed to increase or decrease a measurement variable for each of the test objects, based on the superiority/inferiority of the record having been compared by the record superiority comparing unit, and a latest result having been output from the test object; and
output instructing, in which the test object, the measurement variable of which has exceeded a threshold value, is instructed to output a result, wherein
in the step of output instructing, at least one test object to which the largest number of the output instructions have finally been given after repetition of the output instructions of the result is determined as a desired solution.

[claim12]
12. The solution search method according to claim 11, wherein
in the step of record superiority comparing,
the past record of one test object is raised to a superior level, when the test object has output a superior result, while the other test objects have output inferior results, and
the past record of one test object is lowered to an inferior level, when the test object has output an inferior result, while the other test objects have output superior results.

[claim13]
13. The solution search method according to claim 11, wherein:
in the step of record comparing, an internal resource value is a difference between the record of one test object and the mean of the records of all the test objects, and
in the step of controlling:
the measurement variable is increased, when the internal resource value is positive, and the latest result having been output from the test object is superior;
the measurement variable is maintained, when the internal resource value is positive, and the latest result having been output from the test object is inferior;
the measurement variable is increased, when the internal resource value is zero, and the latest result having been output from the test object is superior;
the measurement variable is decreased, when the internal resource value is zero, and the latest result having been output from the test object is inferior;
the measurement variable is maintained, when the internal resource value is negative, and the latest result having been output from the test object is superior; and
the measurement variable is decreased, when the internal resource value is negative, and the latest result having been output from the test object is inferior.

[claim14]
14. The solution search method according to claim 11, wherein in the step of controlling, a control is performed so that the sum of the measurement variables which are assigned to the respective test objects can be kept constant.
  • Inventor, and Inventor/Applicant
  • KIM Song-Ju
  • AONO Masashi
  • NAMEDA Etsushi
  • HARA Masahiko
  • RIKEN
IPC(International Patent Classification)

PAGE TOP

close
close
close
close
close
close