Method and a system for predicting protein functional site, a method for improving protein function, and a function-modified protein

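Foreign patent code F110005274
Reference number E04401US2
Date posted August 29, 2011
Country of application United States of America
Application number 34520503
Publication number 20030105615
Publication number 7231301
Filing date January 16, 2003
Publication date June 5, 2003
Publication date June 12, 2007
International application number JP1998000430
International publication number WO1998033900
International filing date February 2, 1998
International publication date August 6, 1998
Priority data
  • 1998JP000430 (1998.2.2) WO
  • Japanese Patent Application No. 1997-019248 (1997.1.31) JP
  • Japanese Patent Application No. 1997-019249 (1997.1.31) JP
  • Japanese Patent Application No. 1997-332100 (1997.12.2) JP
  • Japanese Patent Application No. 1998-018699 (1998.1.30) JP
  • 09/697,138 (2000.10.27) US
  • 09/355,486 (1999.9.20) US
Title of invention (English) Method and a system for predicting protein functional site, a method for improving protein function, and a function-modified protein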
Summary of the invention (English) The present application provides a method for predicting the functional site of a protein using data on all the proteins of an organism whose genome data or cDNA data are known.
More specifically, the present application provides a method for predicting a protein functional site, comprising the steps of calculating the frequency of occurrence of an oligopeptide across all of those proteins, calculating the contribution of each amino-acid residue to that frequency of occurrence as the representative value of the function, and predicting the protein functional site by using the representative value of the function as an indicator.
The present application also provides a system for predicting a functional site that performs said method automatically.
Additionally, the present application provides a method for preparing a function-modified protein, comprising subjecting the amino-acid residues composing the functional site identified by the method described above to artificial mutation, and a novel thermophilic DNA polymerase prepared by the method.
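As a rough illustration of the first step summarized above, the following Python sketch counts oligopeptide occurrences across a set of protein sequences and searches for the smallest length n at which oligopeptides occurring once are fewer than those occurring twice, while the opposite holds at length n+1. The function names, the `max_k` search limit, and the toy sequences are assumptions made for this example; they do not come from the patent text.

```python
from collections import Counter


def oligopeptide_counts(proteins, k):
    """Count every length-k window (oligopeptide) across all protein sequences."""
    counts = Counter()
    for seq in proteins:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def singles_and_doubles(counts):
    """Return (number of oligopeptides seen once, number seen twice)."""
    tally = Counter(counts.values())
    return tally[1], tally[2]


def smallest_n(proteins, max_k=10):
    """Smallest n with singles < doubles at length n and singles > doubles at n+1.

    max_k is a hypothetical search limit; None is returned if no n below it qualifies.
    """
    for n in range(1, max_k):
        once_n, twice_n = singles_and_doubles(oligopeptide_counts(proteins, n))
        once_n1, twice_n1 = singles_and_doubles(oligopeptide_counts(proteins, n + 1))
        if once_n < twice_n and once_n1 > twice_n1:
            return n
    return None


# Toy "proteome" (made-up sequence, not real data).
print(smallest_n(["ABCABC"]))
```

For a real proteome the counts would be taken over every predicted protein of the organism; the toy input is only there to make the criterion easy to check by hand.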
Overview of prior art and competing technology (English) BACKGROUND ART
With the progress of genome analysis and cDNA analysis of various organisms, including pathogenic microorganisms, the number of novel genes of unknown function is increasing rapidly, together with the number of proteins encoded by those genes.
So far, the nucleotide sequence of the whole genome has been determined for several microorganisms, for example Mycoplasma genitalium (Fraser et al., Science 270, 397-403, 1995), Haemophilus influenzae (Fleischmann et al., Science 269, 496-512, 1995), and Methanococcus jannaschii (Bult et al., Science 273, 1058-1073, 1996), and numerous novel proteins predicted from these genome sequences have been discovered.
For humans and mice, cDNA analysis is under way in combination with genome analysis, leading to the discovery of a great number of novel proteins.
Under these circumstances, predicting the function of a functionally unknown protein, or a functional site thereof, has become a significant issue.
If a novel function or a novel functional site is discovered, not only in a novel protein but also in a protein with a known function, it may become possible to determine whether the protein is worth industrial or clinical application.
Furthermore, such prediction of function may make it possible to prepare a modified protein with a further improved function.
Whether a protein encoded by a gene elucidated by genome analysis or cDNA analysis is novel or has a known function has conventionally been determined by homology searches against protein databases such as Swiss-Prot.
To predict a functional site, functionally identical proteins derived from various organisms are additionally extracted from a protein database and aligned, a region conserved among them is identified, and that conserved region is predicted to be a functional site.
However, this alignment method cannot be used if a protein obtained by genome analysis or cDNA analysis is entirely novel.
Even if the protein has homology with known proteins in a protein database, when it is homologous only to proteins derived from closely related organisms the conserved region occupies most of its amino acid sequence, so that the functional site cannot be predicted.
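The conventional alignment approach and its limitation can be pictured with a small Python sketch. It assumes the sequences are already aligned to equal length with "-" as the gap character, and it treats a column as conserved only when every sequence carries the same residue; both the helper name and the toy sequences are assumptions for illustration, not part of the original text.

```python
def conserved_columns(aligned):
    """Return 0-based column indices where all aligned sequences share one residue.

    Assumes pre-aligned, equal-length sequences with '-' marking gaps.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    first = aligned[0]
    return [
        col
        for col in range(length)
        if first[col] != "-" and all(seq[col] == first[col] for seq in aligned)
    ]


# Divergent homologues: only some columns are conserved.
print(conserved_columns(["MK-LV", "MKALV", "MRALV"]))
# Closely related homologues: every column is conserved, so nothing stands out.
print(conserved_columns(["MKALV", "MKALV"]))
```

The second call shows the problem described above: when the homologues are nearly identical, the conserved region covers the whole sequence and no longer singles out a functional site.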
As to modification of a protein, once the conserved region is modified the function of the protein is generally liable to deteriorate, regardless of whether that function is known or unknown, even if the functional site has been predicted by alignment.
Accordingly, amino-acid residues outside the conserved region should be modified to improve the function.
In other words, a novel functional site must be found in the protein to be modified.
With the conventional alignment method, however, a novel functional site cannot be discovered, nor can it be predicted which amino-acid residue should be modified.
The present invention has been made in view of these circumstances.
It is an object of the present invention to provide a novel method for predicting a functional site of a functionally unknown protein obtained by genome analysis or cDNA analysis.
It is a further object of the present invention to provide a system for predicting the function.
It is a still further object of the present invention to provide a method for predicting a novel functional site of a protein with an unknown function or with a known function and subjecting that functional site to mutation to prepare a modified protein.
It is a still further object of the present invention to provide a protein whose function has been modified by the method described above.

Claims (English) [claim1]
1. A method for predicting a functional site of a functionally unknown protein obtained from an organism, in which amino acid sequences for all proteins expressed by the organism are estimated from known cDNA, said method comprising:
(1) determining, in the amino acid sequences of all proteins of the organism, the frequency of occurrence of each amino acid and the frequency of occurrence of individual oligopeptides produced by permutations of twenty amino acids, and determining the smallest length (n) of oligopeptides satisfying the criteria that
among oligopeptides of length (n), the number of oligopeptides which occur once in all of the proteins is smaller than the number of oligopeptides which occur twice in all of the proteins, and
among oligopeptides of length (n+1), the number of oligopeptides which occur once in all of the proteins is larger than the number of oligopeptides which occur twice in all of the proteins;
(2) determining from all of the proteins of the organism, the frequency of occurrence of an Aji-oligopeptide of length (n+1), which is a fragment of the protein for predicting the amino-acid residues responsible for functional activity, and contains the j-th amino-acid residue Aj (n+1 <= j <= L-n) from the N-terminus of the amino acid sequence (length of L) of the protein, wherein
the j-th amino-acid residue Aj is the i-th residue Aji from the N-terminus of the Aji-oligopeptide,
the Aji-oligopeptide is aj1 aj2 . . . Aji . . . ajn aj(n+1),
1 <= i <= n+1,
Aj is Aji and Aj is the i-th residue of the oligopeptide, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1)), and
determining from all of the proteins of the organism, the frequency of occurrence of an Xji-oligopeptide of length (n+1), wherein the Xji-oligopeptide is aj1 aj2 . . . Xji . . . ajn aj(n+1), and further wherein
1 <= i <= n+1,
n+1 <= j <= L-n;
and
the i-th residue Xji is any amino acid, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1));
(3) calculating ratio value Yji of the frequency of occurrence of the Aji-oligopeptide to that of the Xji-oligopeptide;
(4) determining mean value Yj of the value Yji, wherein
(Equation image 10 not included in text)
(5) determining Zj, wherein Zj value is defined as the representative value of the function of the j-th amino-acid residue Aj of the amino acid sequence (length of L), and wherein Zj=f(Yj), and function f is a monotonically decreasing function or a monotonically increasing function;
and
(6) repeating steps (2) to (5) sequentially and determining the Zj value of each Aj of all the amino-acid residues at positions between n+1 <= j <= L-n in the amino acid sequence (length of L), thereby predicting the degree of involvement of each amino-acid residue of said sequence in the function of the protein by using Zj value as an indicator.
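Steps (2) to (6) of claim 1 can be sketched in Python as follows. This is a minimal reading of the claim under stated assumptions, not the patented implementation: the equation image for Yj is missing from this record, so Yj is taken here to be the arithmetic mean of Yji over i = 1 .. n+1; f(y) = 1 - y is an arbitrary monotonically decreasing choice; a ratio with zero denominator is set to 0; and all function names are hypothetical.

```python
from collections import Counter


def window_counts(proteins, k):
    """Frequency of every length-k oligopeptide over all proteins."""
    counts = Counter()
    for seq in proteins:
        for s in range(len(seq) - k + 1):
            counts[seq[s:s + k]] += 1
    return counts


def wildcard_counts(counts, k):
    """Frequency of each oligopeptide with one position replaced by 'any amino acid'.

    Keys are (prefix, suffix) pairs around the wildcard position.
    """
    wild = Counter()
    for pep, c in counts.items():
        for i in range(k):
            wild[(pep[:i], pep[i + 1:])] += c
    return wild


def z_values(target, proteins, n, f=lambda y: 1.0 - y):
    """Return {j: Zj} for 1-based positions n+1 <= j <= L-n of target.

    Assumptions: Yj is the plain mean of Yji; f defaults to 1 - y.
    """
    k = n + 1
    counts = window_counts(proteins, k)
    wild = wildcard_counts(counts, k)
    L = len(target)
    z = {}
    for j in range(n + 1, L - n + 1):
        ratios = []
        for i in range(1, k + 1):          # Aj is the i-th residue of the window
            start = j - i                  # 0-based start, i.e. residue j-i+1
            pep = target[start:start + k]  # the Aji-oligopeptide
            denom = wild[(pep[:i - 1], pep[i:])]  # Xji-oligopeptide frequency
            ratios.append(counts[pep] / denom if denom else 0.0)
        z[j] = f(sum(ratios) / k)          # Yj as mean of Yji, then Zj = f(Yj)
    return z


# Toy example (made-up sequence): the proteome is the single protein "AABB".
print(z_values("AABB", ["AABB"], n=1))
```

Because Yji is a ratio of two frequencies taken over the same set of proteins, it does not matter whether raw counts or normalized frequencies are used. With a decreasing f, residues whose surrounding oligopeptides tolerate few alternatives receive low Zj, and residues that are readily substituted receive high Zj.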
[claim2]
2. The method according to claim 1, wherein the Zj value (n+1 <= j <= L-n) of each amino-acid residue in the amino acid sequence (length of L) is expressed in a distribution chart.
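One simple way to picture the distribution chart of claim 2 is a text bar chart with one row per residue position. The sketch below is only an illustration: the helper name, the scaling to the largest absolute Zj, and the sample values are assumptions, not data from the patent.

```python
def text_chart(z_by_position, width=40):
    """Render one bar per residue position, scaled to the largest |Zj|."""
    if not z_by_position:
        return ""
    peak = max(abs(z) for z in z_by_position.values()) or 1.0
    lines = []
    for j in sorted(z_by_position):
        z = z_by_position[j]
        bar = "#" * round(abs(z) / peak * width)
        lines.append(f"{j:>5} {z:8.3f} {bar}")
    return "\n".join(lines)


# Hypothetical Zj values keyed by 1-based residue position.
print(text_chart({2: 0.5, 3: 1.0, 4: 0.25}, width=20))
```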
[claim3]
3. A system for automatically predicting a functional site of a functionally unknown protein obtained from an organism, in which amino acid sequences for all proteins expressed by the organism are estimated from known cDNA, which comprises:
(a) an external memory unit for storing the amino acid sequences of all proteins of the organism and an existing protein database;
(b) a first calculation/memory unit for calculating the frequency of occurrence of each amino acid and the frequency of occurrence of individual oligopeptides produced by permutations of twenty amino acids, in the amino acid sequences of all of the proteins of the organism, and a memory unit for storing the calculation results therein;
(c) a second calculation/memory unit for calculating, among the individual oligopeptides whose frequencies of occurrence are stored in the unit (b), the smallest length (n) of oligopeptides satisfying the criteria that
among oligopeptides of length (n), the number of oligopeptides which occur once in all of the proteins is smaller than the number of oligopeptides which occur twice in all of the proteins, and
among oligopeptides of length (n+1), the number of oligopeptides which occur once in all of the proteins is larger than the number of oligopeptides which occur twice in all of the proteins, and
a memory unit for storing the calculation results therein;
(d) a third calculation/memory unit for calculating from all of the proteins of the organism, the frequency of occurrence of an Aji-oligopeptide of length (n+1), which is a fragment of the protein for predicting the amino-acid residues responsible for functional activity, and contains the j-th amino-acid residue Aj (n+1 <= j <= L-n) from the N-terminus of the amino acid sequence (length of L) of the protein, wherein
the j-th amino-acid residue Aj is the i-th residue Aji from the N-terminus of the Aji-oligopeptide,
the Aji-oligopeptide is aj1 aj2 . . . Aji . . . ajn aj(n+1),
1 <= i <= n+1,
Aj is Aji and Aj is the i-th residue of the oligopeptide, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1)), and
calculating from all of the proteins of the organism, the frequency of occurrence of an Xji-oligopeptide of length (n+1), wherein the Xji-oligopeptide is aj1 aj2 . . . Xji . . . ajn aj(n+1), and further wherein
1 <= i <= n+1,
n+1 <= j <= L-n;
and
the i-th residue Xji is any amino acid, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1)), and
a memory unit for storing the calculation results therein;
(e) a fourth calculation/memory unit for calculating ratio value Yji of the frequency of occurrence of the Aji-oligopeptide to that of the Xji-oligopeptide, and a memory unit for storing the calculation results therein;
(f) a fifth calculation/memory unit for calculating mean value Yj of the value Yji, wherein
(Equation image 11 not included in text)
and a memory unit for storing the calculation results therein;
and
(g) a sixth calculation/memory unit for determining Zj, wherein Zj value is defined as the representative value of the function of the j-th amino-acid residue Aj of the amino acid sequence (length of L), and wherein Zj=f(Yj), and function f is a monotonically decreasing function or a monotonically increasing function, and a memory unit for storing the calculation results therein;
wherein said system causes said first through sixth units to perform the respective calculations sequentially, in order from said first to said sixth unit, so as to determine the Zj value of each Aj of all the amino-acid residues at positions between n+1 <= j <= L-n in the amino acid sequence (length of L), thereby predicting the degree of involvement of each amino-acid residue of said sequence in the function of the protein by using Zj value as an indicator.
[claim4]
4. The system according to claim 3, the system being equipped with a display unit displaying the Zj value (n+1 <= j <= L-n) of each amino-acid residue in the amino acid sequence (length of L) in a distribution chart.
[claim5]
5. A computer-readable medium on which a program is stored, said program causing a computer to execute a method for predicting a functional site of a functionally unknown protein obtained from an organism, in which amino acid sequences for all proteins expressed by the organism are estimated from known cDNA, said method comprising:
(1) determining, in the amino acid sequences of all proteins of the organism, the frequency of occurrence of each amino acid and the frequency of occurrence of individual oligopeptides produced by permutations of twenty amino acids, and determining the smallest length (n) of oligopeptides satisfying the criteria that
among oligopeptides of length (n), the number of oligopeptides which occur once in all of the proteins is smaller than the number of oligopeptides which occur twice in all of the proteins, and
among oligopeptides of length (n+1), the number of oligopeptides which occur once in all of the proteins is larger than the number of oligopeptides which occur twice in all of the proteins;
(2) determining from all of the proteins of the organism, the frequency of occurrence of an Aji-oligopeptide of length (n+1), which is a fragment of the protein for predicting the amino-acid residues responsible for functional activity, and contains the j-th amino-acid residue Aj (n+1 <= j <= L-n) from the N-terminus of the amino acid sequence (length of L) of the protein, wherein
the j-th amino-acid residue Aj is the i-th residue Aji from the N-terminus of the Aji-oligopeptide,
the Aji-oligopeptide is aj1 aj2 . . . Aji . . . ajn aj(n+1),
1 <= i <= n+1,
Aj is Aji and Aj is the i-th residue of the oligopeptide, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1)), and
determining from all of the proteins of the organism, the frequency of occurrence of an Xji-oligopeptide of length (n+1), wherein the Xji-oligopeptide is aj1 aj2 . . . Xji . . . ajn aj(n+1), and further wherein
1 <= i <= n+1,
n+1 <= j <= L-n;
and
the i-th residue Xji is any amino acid, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1));
(3) calculating ratio value Yji of the frequency of occurrence of the Aji-oligopeptide to that of the Xji-oligopeptide;
(4) determining mean value Yj of the value Yji, wherein
(Equation image 12 not included in text)
(5) determining Zj, wherein Zj value is defined as the representative value of the function of the j-th amino-acid residue Aj of the amino acid sequence (length of L), and wherein Zj=f(Yj), and function f is a monotonically decreasing function or a monotonically increasing function;
and
(6) repeating steps (2) to (5) sequentially and determining the Zj value of each Aj of all the amino-acid residues at positions between n+1 <= j <= L-n in the amino acid sequence (length of L), thereby predicting the degree of involvement of each amino-acid residue of said sequence in the function of the protein by using Zj value as an indicator.
[claim6]
6. A program recorded on a computer-readable medium for causing a computer to execute a method for predicting a functional site of a functionally unknown protein obtained from an organism, in which amino acid sequences for all proteins expressed by the organism are estimated from known cDNA, said method comprising:
(1) determining, in the amino acid sequences of all proteins of the organism, the frequency of occurrence of each amino acid and the frequency of occurrence of individual oligopeptides produced by permutations of twenty amino acids, and determining the smallest length (n) of oligopeptides satisfying the criteria that
among oligopeptides of length (n), the number of oligopeptides which occur once in all of the proteins is smaller than the number of oligopeptides which occur twice in all of the proteins, and
among oligopeptides of length (n+1), the number of oligopeptides which occur once in all of the proteins is larger than the number of oligopeptides which occur twice in all of the proteins;
(2) determining from all of the proteins of the organism, the frequency of occurrence of an Aji-oligopeptide of length (n+1), which is a fragment of the protein for predicting the amino-acid residues responsible for functional activity, and contains the j-th amino-acid residue Aj (n+1 <= j <= L-n) from the N-terminus of the amino acid sequence (length of L) of the protein, wherein
the j-th amino-acid residue Aj is the i-th residue Aji from the N-terminus of the Aji-oligopeptide,
the Aji-oligopeptide is aj1 aj2 . . . Aji . . . ajn aj(n+1),
1 <= i <= n+1,
Aj is Aji and Aj is the i-th residue of the oligopeptide, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1)), and
determining from all of the proteins of the organism, the frequency of occurrence of an Xji-oligopeptide of length (n+1), wherein the Xji-oligopeptide is aj1 aj2 . . . Xji . . . ajn aj(n+1), and further wherein
1 <= i <= n+1,
n+1 <= j <= L-n;
and
the i-th residue Xji is any amino acid, and
aj1 is A(j-i+1), . . . , aj(n+1) = A(j-i+(n+1));
(3) calculating ratio value Yji of the frequency of occurrence of the Aji-oligopeptide to that of the Xji-oligopeptide;
(4) determining mean value Yj of the value Yji, wherein
(Equation image 13 not included in text)
(5) determining Zj, wherein Zj value is defined as the representative value of the function of the j-th amino-acid residue Aj of the amino acid sequence (length of L), and wherein Zj=f(Yj), and function f is a monotonically decreasing function or a monotonically increasing function;
and
(6) repeating steps (2) to (5) sequentially and determining the Zj value of each Aj of all the amino-acid residues at positions between n+1 <= j <= L-n in the amino acid sequence (length of L), thereby predicting the degree of involvement of each amino-acid residue of said sequence in the function of the protein by using Zj value as an indicator.
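  • Inventors/Applicant (English)
  • DOI HIROFUMI
  • HIRAKI HIDEAKI
  • KANAI AKIO
  • JAPAN SCIENCE AND TECHNOLOGY AGENCY
International Patent Classification (IPC)
U.S. patent classification (primary/secondary)
  • 702/19
  • 702/23
  • 702/27
Reference information (research project, etc.) ERATO DOI Bioasymmetry AREA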