Method of compiling three-dimensional object identifying image database, processing apparatus and processing program

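Foreign patent code: F120006111
Reference number: S2008-0586
Posting date: January 6, 2012
Country of application: United States of America
Application number: 99040109
Publication number: 20110058733
Publication number: 8306315
Filing date: April 27, 2009
Publication date: March 10, 2011
Publication date: November 6, 2012
International application number: JP2009058284
International publication number: WO2009133855
International filing date: April 27, 2009
International publication date: November 5, 2009
Priority data:
  • Japanese Patent Application No. 2008-118646 (2008.4.30) JP
  • 2009JP058284 (2009.4.27) WO
Title of the invention (English): Method of compiling three-dimensional object identifying image database, processing apparatus and processing program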
Summary of the invention (English): Provided are a method of generating a low-capacity model capable of identifying an object with high accuracy, and creating an image database using the model, a processing program for executing the method, and a processing apparatus that executes the process.
The method for compiling an image database that is used for a three-dimensional object recognition includes a step of extracting vectors as local descriptors from a plurality of images each image showing a three-dimensional object as seen from different viewpoints, a model creating step of evaluating the degree of contribution of each local descriptor to identification of the three-dimensional object, and creating a three-dimensional object model systematized to ensure approximate nearest neighbor search using the individual vectors which satisfy criteria, and a registration step of adding an object identifier to the created object model and registering the object model into an image database.
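The model creating step can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes exact nearest neighbor search over tiny two-dimensional vectors in place of approximate search over real local descriptors, and the names `nearest`, `build_models`, and the keys `vec`, `obj`, `view` are hypothetical. A descriptor is kept when its nearest neighbor, taken from other views of the same object or from other objects, belongs to the same object.

```python
import math
from collections import defaultdict

def nearest(query, candidates):
    """Exact nearest neighbor by Euclidean distance (stand-in for an approximate search)."""
    return min(candidates, key=lambda c: math.dist(query, c["vec"]))

def build_models(descriptors):
    """Keep each descriptor whose nearest neighbor comes from the same object.

    descriptors: list of dicts with hypothetical keys 'vec', 'obj', 'view'.
    Returns a dict mapping object ID to the list of kept vectors.
    """
    models = defaultdict(list)
    for d in descriptors:
        # Candidates: other views of the same object, plus all other objects.
        others = [o for o in descriptors
                  if o is not d and (o["obj"] != d["obj"] or o["view"] != d["view"])]
        if not others:
            continue
        if nearest(d["vec"], others)["obj"] == d["obj"]:
            models[d["obj"]].append(d["vec"])  # positive contribution
    return dict(models)

# Toy data: two objects, two viewpoints each (illustrative values only).
descriptors = [
    {"vec": (0.0, 0.0), "obj": "cup", "view": 0},
    {"vec": (0.1, 0.0), "obj": "cup", "view": 1},
    {"vec": (5.0, 5.0), "obj": "cup", "view": 0},
    {"vec": (5.1, 5.0), "obj": "box", "view": 0},
    {"vec": (9.0, 9.0), "obj": "box", "view": 1},
]
models = build_models(descriptors)
print(models)
```

In this toy data the descriptors at (5.0, 5.0) and (5.1, 5.0) are each closest to a descriptor of the other object, so both are dropped, which is how the model shrinks while the discriminative descriptors remain.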
Overview of prior art and competing technologies (English) BACKGROUND ART
In recent years, as digital cameras have become more widespread and sophisticated, cameras and the devices that use them have attracted growing attention as new information devices.
In addition, the increase in the memory capacity of hard disks allows individual people to possess a large amount of image data.
Accordingly, research dealing with large numbers of digital images or moving images is being actively conducted.
One field of such research is the recognition of three-dimensional objects included in images.
The techniques of recognizing three-dimensional objects included in images can be classified into a technique that generally recognizes the class of objects and a technique that recognizes the instance.
The former returns the class of objects, such as a chair and an automobile, as the result, whereas the latter identifies the instance such as a specific model of an automobile.
The present invention will focus on the latter, i.e., the identification of the instance, and description will be made in relation thereto.
In particular, the present invention will focus on three-dimensional object recognition that uses local descriptors, for example those based on SIFT (Scale-Invariant Feature Transform) (e.g., see Non-Patent Literature 1).
In the conventional techniques, there is a technique which constructs a three-dimensional surface model of an object through matching of local descriptors, based on images of an object shot from various angles, so as to be used for recognition (e.g., see Non-Patent Literatures 2 and 3).
In addition, there is a technique that uses local descriptors extracted from an image for construction of a model to be matched with unknown images, without using a three-dimensional model (e.g., see Non-Patent Literatures 4 and 5).
The present invention relates to the latter approach.
The simplest technique using such an approach extracts a large number of local descriptors from images of an object shot under various conditions and stores them to construct a model.
Advantageously, this simple approach can easily realize highly accurate recognition.
However, since a huge number of local descriptors will be obtained, there are two problems: local descriptor matching takes an immense amount of time, and large-scale object recognition is difficult because a large amount of memory is required for recognition.
As to the former problem, it is indispensable to improve the efficiency in the nearest neighbor searching of local descriptors.
Thus, in order to solve this problem, there is a technique using approximate nearest neighbor searching of local descriptors.
Noguchi et al. report that introducing this technique into object recognition makes it possible to realize high-speed, highly accurate object recognition (e.g., see Non-Patent Literature 6 and Patent Literature 1).
On the other hand, as to the latter problem, since the memory size of models (memory required for models) constitutes a large proportion of the memory required for recognition, reduction in the memory size of models is a main problem.
Meanwhile, of the three-dimensional object recognition techniques using local descriptors, such techniques that do not construct three-dimensional models of objects are advantageous, since with shot images of an object, it is possible to simply construct its model by extracting local descriptors therefrom.
In order to achieve accuracy in the three-dimensional object recognition using such simple techniques, a large number of images shot under various conditions are required for constructing a model.
Generally, since several dozen to several thousand local descriptors are extracted from one image, an extremely large number of local descriptors are involved in modeling an object, and how to deal with them is the central issue.
Most conventional techniques vector-quantize the local descriptors and replace them with representative vectors, which are called visual words.
When an unknown image is recognized, the local descriptors obtained from the image are likewise replaced by visual words before matching.
For identification of the instance of an object in particular, it is known that the recognition rate improves as the number of visual words increases, although the degree of improvement depends on the recognition target.
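The replacement of a local descriptor by a visual word can be sketched as a nearest-centroid lookup. The sketch below is illustrative only: the function name `quantize`, the two-dimensional vectors, and the three hand-picked visual words are assumptions, whereas real systems learn the visual words by clustering high-dimensional descriptors.

```python
import math

def quantize(descriptor, visual_words):
    """Return the index of the visual word nearest to the descriptor."""
    return min(range(len(visual_words)),
               key=lambda i: math.dist(descriptor, visual_words[i]))

# Hypothetical codebook of three visual words (representative vectors).
visual_words = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

# Descriptors extracted from an unknown image (illustrative values).
query_descriptors = [(0.5, 0.2), (9.0, 9.5), (1.0, 8.0)]

# Each descriptor is replaced by the index of its visual word before matching.
word_ids = [quantize(d, visual_words) for d in query_descriptors]
print(word_ids)
```

The linear scan over the codebook is what becomes costly when the number of visual words is large, which is why tree structures and other indexes are used in practice.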
For example, Nister et al. reported an example using 16 million visual words (see Non-Patent Literature 4).
When a large number of visual words are used, the calculation time required for matching the local descriptors against the visual words cannot be ignored, and thus speeding-up by using data structures such as a tree structure is necessary (see Non-Patent Literatures 4 and 5).
Among the techniques using such a large number of visual words, a technique of using all "cases" of the local descriptors without using vector quantization is the most extreme one.
With this approach, although high recognition rate can be expected, a problem will occur in that a huge memory will be required for model recording.
Perhaps the simplest recognition technique attaches a label indicating the object to each of a large number of local descriptors (the cases above), matches them against the local descriptors obtained from an unknown image, and casts votes for the labels of the matched descriptors.
Normally, the matching is performed using the nearest neighbor searching.
In such a process, since it is only necessary to assign a correct label to each local descriptor obtained from unknown images, it is not necessary to record all the local descriptors.
Here, "voting" is processing used in the field of information processing to tally partial evidence: based on each piece of evidence obtained, a score is given to one of the choices, and the choice with the top total score after counting up the scores from all the evidence is selected.
Generally, the score for voting varies depending on the evidence.
As a method of eliminating unnecessary local descriptors while guaranteeing the same effect as in the case of recording all the local descriptors, a method called condensing has been proposed.
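The voting scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: every match casts a vote of weight 1, nearest neighbor search is an exact linear scan, and the name `recognize` and the toy data are hypothetical.

```python
import math
from collections import Counter

def recognize(query_descriptors, labeled_descriptors):
    """Vote for the object label of each query descriptor's nearest stored descriptor.

    labeled_descriptors: list of (vector, label) pairs.
    Returns the label with the most votes and the full vote tally.
    """
    votes = Counter()
    for q in query_descriptors:
        _, label = min(labeled_descriptors, key=lambda item: math.dist(q, item[0]))
        votes[label] += 1  # assumption: unit score per piece of evidence
    return votes.most_common(1)[0][0], votes

# Stored descriptors with object labels (illustrative values).
labeled = [((0.0, 0.0), "cup"), ((1.0, 0.0), "cup"),
           ((5.0, 5.0), "box"), ((6.0, 5.0), "box")]

# Descriptors from an unknown image.
query = [(0.2, 0.1), (0.9, 0.2), (5.5, 5.2)]

best, votes = recognize(query, labeled)
print(best, dict(votes))
```

Two of the three query descriptors match "cup" descriptors, so "cup" wins even though one descriptor votes for "box"; a weighted variant would replace the unit increment with a score derived from the match.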
For example, Wada et al. proposed a technique that is also efficiently applicable to a higher-dimensional space (e.g., see Non-Patent Literature 7).
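To give a sense of what condensing does, the sketch below uses Hart's classic condensed nearest neighbor rule, which is a different and simpler algorithm than the proximity-graph method of Non-Patent Literature 7. It only guarantees that every stored sample is still classified correctly by nearest neighbor search over the reduced set. The name `condense` and the one-dimensional toy samples are assumptions.

```python
import math

def condense(samples):
    """Hart's condensed nearest neighbor rule.

    samples: list of (vector, label) pairs.
    Returns a subset that classifies every sample with a unique label correctly
    under 1-nearest-neighbor search.
    """
    kept = [samples[0]]
    changed = True
    while changed:
        changed = False
        for vec, label in samples:
            if (vec, label) in kept:
                continue  # already stored; also prevents endless re-adding
            _, nn_label = min(kept, key=lambda s: math.dist(vec, s[0]))
            if nn_label != label:
                kept.append((vec, label))  # misclassified, so it must be stored
                changed = True
    return kept

# Two well-separated classes on a line (illustrative values).
samples = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"),
           ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b")]

kept = condense(samples)
print(kept)
```

Here six samples are reduced to two, one per class, because the remaining samples are already labeled correctly by their nearest stored neighbor.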

CITATION LIST

Patent Literature
Patent Literature 1: International Publication No. 2008/026414

Non-Patent Literature
Non-Patent Literature 1: D. Lowe: "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110 (2004)
Non-Patent Literature 2: F. Rothganger, S. Lazebnik, C. Schmid and J. Ponce: "3D Object Modeling and Recognition from Photographs and Image Sequences", Ponce et al., Eds., Toward Category-Level Object Recognition, LNCS4170, Springer, pp. 105-126 (2006)
Non-Patent Literature 3: D. Lowe: "Local Feature View Clustering for 3D Object Recognition", Proc. CVPR2001, Springer, pp. 682-688 (2001)
Non-Patent Literature 4: D. Nister and H. Stewenius: "Scalable Recognition with a Vocabulary Tree", Proc. CVPR2006, pp. 775-781 (2006)
Non-Patent Literature 5: S. Obdrzalek and J. Matas: "Sub-linear Indexing for Large Scale Object Recognition", British Machine Vision Conference (BMVC), pp. 1-10 (2005)
Non-Patent Literature 6: Kazuto Noguchi, Kouichi Kise, Masakazu Iwamura: "Efficient Recognition of Objects by Cascading Approximate Nearest Neighbor Searchers" Meeting on Image Recognition and Understanding (MIRU2007) Collection of papers, OS-B2-02, pp. 111-118 (2007)
Non-Patent Literature 7: Takekazu Kato, Toshikazu Wada: "Algorithms and Evaluations for Efficient Condensing based on Proximity Graphs" Shingaku Giho PRMU, Vol. 103, No. 96, pp. 19-24 (2003)

Claims (English) [claim1]
1. A method for compiling an image database that is used for a three-dimensional object recognition comprising the steps of: extracting, from a plurality of images each image showing a three-dimensional object from different viewpoint, a plurality of local descriptors each of which is a vector representing respective local features of each image;
constructing an object model of the three-dimensional object, the object model being obtained by estimating contribution of each vector to the three-dimensional object recognition, by choosing the vectors making positive contribution and by organizing the chosen vectors in such a manner that each vector is adapted to be used for approximate nearest neighbor searching; and
storing into the image database the images showing the three-dimensional object and the constructed object model with an object ID for identifying the three-dimensional object being attached, wherein: each of the steps is executed by a computer;
the storing step stores the object model and the corresponding object ID so that, when an image showing a three-dimensional object in question is given as a query while a plurality of object models are stored in the image database, the computer extracts a plurality of query local descriptors from the query through a similar step to the extracting step, retrieves vectors as neighbor vectors of each query local descriptor, each neighbor vector being retrieved from the stored object models in the image database by using an algorithm of the approximate nearest neighbor searching, obtains object IDs attached to the neighbor vectors, determines at least one three-dimensional object which is identified by the object IDs as a candidate and determines at least one three-dimensional object based on points of similarities and/or of differences between each query local descriptor and corresponding neighbor vector; and
the object model construction step estimates the contribution of each vector in such a manner that when a vector extracted from an image of a three-dimensional object is approximately nearest to another vector according to the same three-dimensional object from a different viewpoint, the vector is regarded to make a positive contribution, and when the vector is approximately nearest to another vector according to a different three-dimensional object, the vector is regarded to make a negative contribution.
[claim2]
2. The method according to claim 1, wherein the model construction step constructs the object model through the steps of:
specifying an approximate nearest vector to a target vector to be estimated, the approximate nearest vector being retrieved from vectors extracted from images showing from the different viewpoints the same three-dimensional object according to the target vector and from images of different three-dimensional objects;
counting up a score in the case where the approximate nearest vector is derived from the same three-dimensional object according to the target vector; and
choosing the vectors that construct the object model based on scores counted in the counting steps for each vector.
[claim3]
3. The method according to claim 2, wherein the model construction step scores each vector extracted from each image showing the three-dimensional object to be stored from the different viewpoints.
[claim4]
4. The method according to claim 3, wherein the model construction step chooses the vectors so that the vectors extracted from the images of the same three-dimensional object from different viewpoints are shared almost evenly in the object model.
[claim5]
5. The method according to claim 3, wherein the model construction step estimates the contribution of each vector to the recognition of a three-dimensional instance.
[claim6]
6. The method according to claim 2, wherein the model construction step chooses the vectors so that the vectors extracted from the images of the same three-dimensional object from different viewpoints are shared almost evenly in the object model.
[claim7]
7. The method according to claim 6, wherein the model construction step estimates the contribution of each vector to the recognition of a three-dimensional instance.
[claim8]
8. The method according to claim 2, wherein the model construction step estimates the contribution of each vector to the recognition of a three-dimensional instance.
[claim9]
9. The method according to claim 1, wherein the model construction step estimates the contribution of each vector to the recognition of a three-dimensional instance.
[claim10]
10. An apparatus for processing an image database that is used for a three-dimensional object recognition comprising: an extraction section which extracts, from a plurality of images each image showing a three-dimensional object from different viewpoint, a plurality of local descriptors each of which is a vector representing respective local features of each image;
a model construction section which constructs an object model of the three-dimensional object, the object model being obtained by estimating contribution of each vector to the three-dimensional object recognition, by choosing the vectors making positive contribution and by organizing the chosen vectors in such a manner that each vector is adapted to be used for approximate nearest neighbor searching;
a storing section which stores into the image database the images showing the three-dimensional object and the constructed object model with an object ID for identifying the three-dimensional object being attached; and
a retrieval section which, when an image showing a three-dimensional object in question is given as a query while a plurality of object models are stored in the image database extracts a plurality of query local descriptors from the query in a similar manner as in the extraction section; retrieves vectors as neighbor vectors of each query local descriptor, each neighbor vector being retrieved from the stored object models in the image database by using an algorithm of the approximate nearest neighbor searching, obtains object IDs attached to the neighbor vectors, determines at least one three-dimensional object which is identified by the object IDs as a candidate, and determines at least one three-dimensional object based on points of similarities and/or of differences between each query local descriptor and corresponding neighbor vector, wherein the object model construction section estimates the contribution of each vector in such a manner that when a vector extracted from an image of a three-dimensional object is approximately nearest to another vector according to the same three-dimensional object from a different viewpoint, the vector is regarded to make a positive contribution, and when the vector is approximately nearest to another vector according to a different three-dimensional object, the vector is regarded to make a negative contribution.
[claim11]
11. A non-transitory computer readable medium having a program for processing an image database that is used for a three-dimensional object recognition, the program causing a computer to function as: an extraction section which extracts, from a plurality of images each image showing a three-dimensional object from different viewpoint, a plurality of local descriptors each of which is a vector representing respective local features of each image;
a model construction section which constructs an object model of the three-dimensional object, the object model being obtained by estimating contribution of each vector to the three-dimensional object recognition, by choosing the vectors making positive contribution and by organizing the chosen vectors in such a manner that each vector is adapted to be used for approximate nearest neighbor searching;
a storing section which stores into the image database the images showing the three-dimensional object and the constructed object model with an object ID for identifying the three-dimensional object being attached; and
a retrieval section which, when an image showing a three-dimensional object in question is given as a query while a plurality of object models are stored in the image database: extracts a plurality of query local descriptors from the query in a similar manner as in the extraction section, retrieves vectors as neighbor vectors of each query local descriptor, each neighbor vector being retrieved from the stored object models in the image database by using an algorithm of the approximate nearest neighbor searching; obtains object IDs attached to the neighbor vectors; determines at least one three-dimensional object which is identified by the object IDs as a candidate; and determines at least one three-dimensional object based on points of similarities and/or of differences between each query local descriptor and corresponding neighbor vector,
wherein the object model construction step estimates the contribution of each vector in such a manner that when a vector extracted from an image of a three-dimensional object is approximately nearest to another vector according to the same three-dimensional object from a different viewpoint, the vector is regarded to make a positive contribution, and when the vector is approximately nearest to another vector according to a different three-dimensional object, the vector is regarded to make a negative contribution.
  • Inventors/Applicants (English)
  • INOUE KATSUFUMI
  • MIYAKE HIROSHI
  • KISE KOICHI
  • OSAKA PREFECTURE UNIVERSITY
International Patent Classification (IPC)
US Patent Classification (primary/secondary)
  • 382/154
  • 715/852
