INFORMATION EXTRACTION APPARATUS, INFORMATION EXTRACTION METHOD, AND INFORMATION EXTRACTION PROGRAM

Foreign code F160008836
File No. (S2015-0247-N0)
Posted date Aug 18, 2016
Country WIPO
International application number 2015JP084974
International publication number WO 2016098739
Date of international filing Dec 14, 2015
Date of international publication Jun 23, 2016
Priority data
  • P2014-253058 (Dec 15, 2014) JP
Title INFORMATION EXTRACTION APPARATUS, INFORMATION EXTRACTION METHOD, AND INFORMATION EXTRACTION PROGRAM
Abstract Provided is an information extraction apparatus that, even when the specification of a structured document has been altered, can easily and reliably extract after the alteration the same predetermined information that was extracted before it. The information extraction apparatus (100) comprises a control unit (120) that extracts portions that differ between a plurality of structured documents as variable elements and extracts elements within a predetermined range from each variable element as peripheral information, and a memory unit (140) that sets at least one of the variable elements as an extraction target and stores the variable elements and the peripheral information for at least the extraction target. The control unit re-extracts the variable elements and the peripheral information from the plurality of structured documents, calculates a degree of similarity between the variable elements and peripheral information before and after re-extraction on the basis of the re-extracted variable elements and peripheral information and those stored in the memory unit, and identifies, from among the re-extracted variable elements, the variable element corresponding to the extraction target on the basis of the calculated degree of similarity.
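The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the applicant's implementation: it assumes each structured document has already been split into a list of tokens, uses Python's standard difflib to find the differing portions, and treats a fixed number of neighbouring tokens as the "predetermined range". The function name extract_variable_elements, the window parameter, and the sample pages are hypothetical.

```python
from difflib import SequenceMatcher


def extract_variable_elements(doc_a, doc_b, window=3):
    """Return (variable_element, peripheral_info) pairs found in doc_b.

    doc_a and doc_b are token lists from two structured documents.
    Tokens of doc_b that differ from doc_a are treated as variable
    elements; up to `window` tokens on each side are kept as peripheral
    information (an assumed reading of "predetermined range").
    """
    matcher = SequenceMatcher(a=doc_a, b=doc_b, autojunk=False)
    results = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "delete"):
            continue  # nothing differing on the doc_b side
        element = " ".join(doc_b[j1:j2])
        before = doc_b[max(0, j1 - window):j1]
        after = doc_b[j2:j2 + window]
        results.append((element, before + after))
    return results


# Two hypothetical pages that share a layout but show different prices.
page_a = ["<tr>", "<td>", "Price", "</td>", "<td>", "120", "</td>", "</tr>"]
page_b = ["<tr>", "<td>", "Price", "</td>", "<td>", "135", "</td>", "</tr>"]
variable_elements = extract_variable_elements(page_a, page_b)
```

Here the only differing token is the price, and the label "Price" falls inside its peripheral information, which is what later allows the element to be found again if the markup around it changes.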
Scope of claims [claim1]
1. An information extraction apparatus comprising:
a control unit that acquires a plurality of structured documents, extracts portions that differ between the acquired documents as variable elements, and extracts elements within a predetermined range from each variable element as peripheral information; and
a memory unit that sets at least one of the variable elements as an extraction target and stores the variable element and the peripheral information for at least the extraction target,
wherein the control unit acquires the plurality of structured documents again, re-extracts portions that differ between the re-acquired documents as variable elements, re-extracts elements within the predetermined range from each re-extracted variable element as peripheral information, calculates a similarity of the variable elements and the peripheral information before and after re-extraction on the basis of the re-extracted variable elements and peripheral information and the variable elements and peripheral information stored in the memory unit, and identifies, from among the re-extracted variable elements, the variable element corresponding to the extraction target on the basis of the calculated similarity.
[claim2]
2. The information extraction apparatus according to claim 1, wherein the control unit identifies, from among the re-extracted variable elements, the variable element having the highest similarity to the variable element of the extraction target.
[claim3]
3. The information extraction apparatus according to claim 1, wherein the control unit calculates a similarity between the re-extracted variable elements and the variable elements stored in the memory unit, also calculates a similarity between the re-extracted peripheral information and the peripheral information stored in the memory unit, and identifies, from among the re-extracted variable elements, the variable element corresponding to the extraction target on the basis of the similarity of the variable elements and the similarity of the peripheral information.
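The identification described in claims 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the similarity measure (difflib's ratio), the equal weighting of the two similarities, and the names combined_similarity and identify_target are all hypothetical, since the claims do not fix how the two similarities are combined.

```python
from difflib import SequenceMatcher


def combined_similarity(stored, candidate, element_weight=0.5):
    """Weighted mean of element similarity and peripheral similarity.

    `stored` and `candidate` are (element, peripheral_tokens) pairs.
    The 0.5 weight is an arbitrary illustrative choice.
    """
    element_sim = SequenceMatcher(
        a=stored[0], b=candidate[0], autojunk=False
    ).ratio()
    peripheral_sim = SequenceMatcher(
        a=stored[1], b=candidate[1], autojunk=False
    ).ratio()
    return element_weight * element_sim + (1 - element_weight) * peripheral_sim


def identify_target(stored, candidates):
    """Return the re-extracted candidate most similar to the stored target."""
    return max(candidates, key=lambda c: combined_similarity(stored, c))


# Stored before the page layout changed (table cells).
stored_target = ("120", ["Price", "</td>", "<td>", "</td>"])
# Re-extracted after the layout changed (div elements).
re_extracted = [
    ("2016-08-18", ["Updated", "</span>", "<span>", "</span>"]),
    ("135", ["Price", "</div>", "<div>", "</div>"]),
]
best = identify_target(stored_target, re_extracted)
```

In this example the price is still selected after the table markup has been replaced, because both its value and the nearby "Price" label resemble what was stored.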
[claim4]
4. The information extraction apparatus according to claim 1, wherein the control unit divides each of the re-extracted variable element and the variable element stored in the memory unit into a numeric part and a character part, and determines the similarity of the variable elements on the basis of a similarity of the numeric parts and a similarity of the character parts.
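One possible reading of the numeric/character split in claim 4 is sketched below. The claim does not say how each part is compared, so the measures here are assumptions: digits are compared by relative numeric distance, the remaining characters by difflib's ratio, and the two scores are averaged. All function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def split_parts(element):
    """Split an element into its numeric part and its character part."""
    numeric = "".join(re.findall(r"\d", element))
    characters = "".join(re.findall(r"\D", element))
    return numeric, characters


def numeric_similarity(a, b):
    """Closeness of two digit strings read as integers, in [0, 1]."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    x, y = int(a), int(b)
    if x == y:
        return 1.0
    return 1.0 - abs(x - y) / max(x, y)


def character_similarity(a, b):
    """String similarity of the non-digit parts, in [0, 1]."""
    return SequenceMatcher(a=a, b=b, autojunk=False).ratio()


def element_similarity(a, b):
    """Average of numeric-part and character-part similarity (assumed weighting)."""
    num_a, chr_a = split_parts(a)
    num_b, chr_b = split_parts(b)
    return 0.5 * numeric_similarity(num_a, num_b) + 0.5 * character_similarity(chr_a, chr_b)


same_field = element_similarity("120 yen", "135 yen")
other_field = element_similarity("120 yen", "Posted Aug 18")
```

Comparing the numbers by value rather than as strings means a price that moves from 120 to 135 still scores as close, while an unrelated field such as a date scores low on both parts.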
[claim5]
5. The information extraction apparatus according to claim 1, wherein the variable elements are extracted by calculating a difference between the plurality of structured documents.
[claim6]
6. The information extraction apparatus according to claim 1, further comprising:
a display unit that displays the extracted variable elements; and
an input unit that receives the extraction target selected by a user from among the displayed variable elements.
[claim7]
7. The information extraction apparatus according to claim 1, wherein the control unit acquires a target document a plurality of times and excludes from the variable elements, as exclusion elements, portions that differ at a predetermined frequency between the documents acquired the plurality of times.
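The exclusion in claim 7 might look like the sketch below. It rests on several assumptions that are not in the source: the same document is fetched several times in quick succession, the snapshots are token lists aligned by position, and "a predetermined frequency" is read as "at least min_changes changes between consecutive snapshots". The function names and sample data are hypothetical.

```python
def find_exclusion_positions(snapshots, min_changes):
    """Positions whose token changes in at least `min_changes` consecutive snapshot pairs."""
    change_counts = {}
    for previous, current in zip(snapshots, snapshots[1:]):
        for position, (old, new) in enumerate(zip(previous, current)):
            if old != new:
                change_counts[position] = change_counts.get(position, 0) + 1
    return {p for p, n in change_counts.items() if n >= min_changes}


def drop_excluded(variable_positions, exclusion_positions):
    """Remove exclusion elements from the list of variable-element positions."""
    return [p for p in variable_positions if p not in exclusion_positions]


# Three fetches of the same hypothetical page: the ad slot and the clock
# change on every fetch, the price does not.
snapshots = [
    ["Price", "120", "ad-17", "10:00:01"],
    ["Price", "120", "ad-42", "10:00:02"],
    ["Price", "120", "ad-08", "10:00:03"],
]
excluded = find_exclusion_positions(snapshots, min_changes=2)
# Positions 1, 2 and 3 were found as variable elements by comparing different pages.
kept = drop_excluded([1, 2, 3], excluded)
```

Elements such as rotating advertisements or timestamps change on every fetch and are dropped, so only content that varies between documents, rather than between fetches, remains as a candidate.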
[claim8]
8. An information extraction method comprising:
a step of acquiring a plurality of structured documents;
a step of extracting portions that differ between the acquired documents as variable elements;
a step of extracting elements within a predetermined range from each variable element as peripheral information;
a step of setting at least one of the variable elements as an extraction target and storing, in a memory unit, the variable element and the peripheral information for at least the extraction target;
a step of acquiring the plurality of structured documents again;
a step of re-extracting portions that differ between the re-acquired documents as variable elements;
a step of re-extracting elements within the predetermined range from each re-extracted variable element as peripheral information;
a step of calculating a similarity of the variable elements and the peripheral information before and after re-extraction on the basis of the re-extracted variable elements and peripheral information and the variable elements and peripheral information stored in the memory unit; and
a step of identifying, from among the re-extracted variable elements, the variable element corresponding to the extraction target on the basis of the calculated similarity.
[claim9]
9. The information extraction method according to claim 8, wherein the variable element having the highest similarity to the variable element of the extraction target is identified from among the re-extracted variable elements.
[claim10]
10. The information extraction method according to claim 8, wherein a similarity between the re-extracted variable elements and the variable elements stored in the memory unit is calculated, a similarity between the re-extracted peripheral information and the peripheral information stored in the memory unit is also calculated, and the variable element corresponding to the extraction target is identified from among the re-extracted variable elements on the basis of the similarity of the variable elements and the similarity of the peripheral information.
[claim11]
11. The information extraction method according to claim 8, wherein each of the re-extracted variable element and the variable element stored in the memory unit is divided into a numeric part and a character part, and the similarity of the variable elements is determined on the basis of a similarity of the numeric parts and a similarity of the character parts.
[claim12]
12. The information extraction method according to claim 8, wherein the variable elements are extracted by calculating a difference between the plurality of structured documents.
[claim13]
13. The information extraction method according to claim 8, further comprising:
a step of displaying the extracted variable elements; and
a step of receiving the extraction target selected by a user from among the displayed variable elements.
[claim14]
14. The information extraction method according to claim 8, wherein a target document is acquired a plurality of times, and portions that differ at a predetermined frequency between the documents acquired the plurality of times are excluded from the variable elements as exclusion elements.
[claim15]
15. An information extraction program for causing a computer to execute:
a step of acquiring a plurality of structured documents;
a step of extracting portions that differ between the acquired documents as variable elements;
a step of extracting elements within a predetermined range from each variable element as peripheral information;
a step of setting at least one of the variable elements as an extraction target and storing, in a memory unit, the variable element and the peripheral information for at least the extraction target;
a step of acquiring the plurality of structured documents again;
a step of re-extracting portions that differ between the re-acquired documents as variable elements;
a step of re-extracting elements within the predetermined range from each re-extracted variable element as peripheral information;
a step of calculating a similarity of the variable elements and the peripheral information before and after re-extraction on the basis of the re-extracted variable elements and peripheral information and the variable elements and peripheral information stored in the memory unit; and
a step of identifying, from among the re-extracted variable elements, the variable element corresponding to the extraction target on the basis of the calculated similarity.
  • Applicant
  • ※All designated countries except for US in the data before July 2012
  • INTER-UNIVERSITY RESEARCH INSTITUTE CORPORATION RESEARCH ORGANIZATION OF INFORMATION AND SYSTEMS
  • Inventor
  • SAKAMOTO KAZUNORI
  • HONIDEN SHINICHI
IPC(International Patent Classification)
Specified countries National States: AE AG AL AM AO AT AU AZ BA BB BG BH BN BR BW BY BZ CA CH CL CN CO CR CU CZ DE DK DM DO DZ EC EE EG ES FI GB GD GE GH GM GT HN HR HU ID IL IN IR IS JP KE KG KN KP KR KZ LA LC LK LR LS LU LY MA MD ME MG MK MN MW MX MY MZ NA NG NI NO NZ OM PA PE PG PH PL PT QA RO RS RU RW SA SC SD SE SG SK SL SM ST SV SY TH TJ TM TN TR TT TZ UA UG US UZ VC VN ZA ZM ZW
ARIPO: BW GH GM KE LR LS MW MZ NA RW SD SL SZ TZ UG ZM ZW
EAPO: AM AZ BY KG KZ RU TJ TM
EPO: AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
OAPI: BF BJ CF CG CI CM GA GN GQ GW KM ML MR NE SN ST TD TG
Please contact us by e-mail or facsimile if you are interested in this patent.