Abstract

The cytochrome c oxidase (COX) region of mitochondrial DNA is the most studied region of the fish mitochondrial genome, and COX is one of the largest protein-coding genes of the metazoan mitochondrial genome. COX (E.C. 1.9.3.1) is the terminal member of the respiratory chain and catalyzes the reduction of dioxygen to water by ferrocytochrome c. A 19.927 kDa COX from Channa punctata has been characterized. Its three-dimensional structure was generated by homology modeling using Deep View/Swiss Pdb Viewer 3.7 (SPS), the predicted model was validated with the RAMPAGE server, and the secondary structure was predicted with the PSIPRED, PHYRE and TMHMM servers. Protein statistics were computed with SAPS and CLC Sequence Viewer. In the predicted 3-D model, about 95% of residues have φ and ψ angles in the core and allowed regions (α-helix region 89.33%, β-sheet region 9.33%; fully allowed regions 81.52%, additionally allowed regions 14.13%, generally allowed regions 3.26%). The isoelectric point is 4.95 and the aliphatic index is 121.72. These results help to explain the structure and properties of the COX protein, and homology modeling may give further insight for DNA barcoding.

Introduction

Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. The number of protein sequences that can be modeled, as well as the accuracy of the prediction, is increasing steadily because of the growth in the number of known protein sequences and structures as well as improvements in the modeling software. It is currently possible to model, with useful accuracy, significant parts of approximately one half of all known protein sequences²². Despite progress in ab initio protein structure prediction³⁹, comparative modeling remains the only method that can reliably predict the 3-D structure of a protein with an accuracy comparable to a low-resolution experimentally determined structure. Even models with errors may be useful, because some aspects of function can be predicted from only coarse structural features⁴¹⁸.

The spotted snakehead, Channa punctata, locally known as the spotted murrel, is one of the most highly priced freshwater food fish species in India. It is distributed throughout the countries of South-East Asia. The fish is well known for its good taste, high protein content, few intramuscular spines, high nutritive value, and recuperative and medicinal qualities, and it is recommended as a diet during convalescence. Snakeheads breed naturally during the South-West Monsoon (June – September) and North-East Monsoon (October – December) in flooded rivers, paddy fields, ponds and ditches in southern parts of India¹⁷. Over the last ten years its wild population has declined steadily because of fishing, habitat loss, introduction of alien species, disease, pollution, siltation, poisoning, and dynamite and other destructive fishing. These factors have not only destroyed breeding and feeding grounds but also damaged the biodiversity of this important fishery. As a result, according to its IUCN status, it is listed among the 66 low-risk, near-threatened fish species in India. Owing to its hardiness and air-breathing nature, it has been identified as one of the cultivable species for aquaculture in derelict, swampy and oxygen-depleted water bodies¹⁷.

Fish mitochondrial DNAs (mtDNA) all have a similar genomic organization¹⁴¹⁶²¹ and are similar to those of other vertebrates, including humans. Many parts of mtDNA, such as the protein-coding genes or regulatory parts such as the control region, are used as genetic markers for measuring intraspecific and interspecific diversity. This is possible because mtDNA has a higher mutation rate than nuclear DNA, which results in the accumulation of many base substitutions over long periods of time and provides tools for taxonomic, evolutionary and phylogenetic research²¹³²⁰²⁷.

The cytochrome c oxidase I region (COX/CO-I) of mitochondrial DNA is one of the most studied regions in DNA barcoding¹⁵. COX/CO (E.C. 1.9.3.1) is the terminal member of the respiratory chain, catalyzing the reduction of dioxygen to water by ferrocytochrome c. Biological identification of species through DNA barcodes has become popular in recent years. The mitochondrial genome of animals is a better target for analysis than the nuclear genome because of its lack of introns, its limited exposure to recombination and its haploid mode of inheritance. COX is likely to possess a greater range of phylogenetic signal than any other mitochondrial gene¹².

Materials and Methods

The Channa punctata COX/CO-I sequence was obtained from the National Center for Biotechnology Information (NCBI) protein database (http://www.ncbi.nih.gov) (accession number ABY59029.1). The experimental structure used as the template for comparative modeling was bovine heart cytochrome c oxidase in the fully oxidized state (PDB 1v54A), which had 95.699% identity with the target protein. The structural alignment was generated using Deep View – Swiss Pdb Viewer software¹¹ (http://www.expasy.org/spdbv/), and manual corrections were carried out. The model was validated with the RAMPAGE server (http://mordred.bioc.cam.ac.uk/-rapper/rampage.php). The secondary structure of COX was predicted using the PSIPRED server (http://www.bioinf.cs.ucl.ac.uk/psipred/) and the Phyre server (http://www.sbg.bio.ic.ac.uk/phyre). The transmembrane helix probability curve was analyzed using TMHMM server v.2.0 (http://www.cbs.dtu.dk/services/TMHMM.2.0). MITOPROT (http://ihg2.helmholtzmuenchen.de/ihg/mitoprot.html) was used to analyze the N-terminal region of the protein, and the hydrophobicity value was calculated. Energy minimization was performed using the ANOLEA server. SAPS (Statistical Analysis of Protein Sequences) (http://www.isrec.isp.ch/cgi-bin/SAPS) and CLC Sequence Viewer 6.0.2 were used for protein statistics.
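
The percent identity between target and template quoted above is a simple ratio over the aligned positions. The following is a minimal sketch of that calculation, assuming identity is counted only over columns where neither sequence has a gap (other conventions exist, and the source does not say which one was used). The function name and the toy sequences are hypothetical and are not the real COX or 1v54 sequences.

```python
def percent_identity(target_aln: str, template_aln: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs are rows of the same pairwise alignment, with '-' for gaps.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(target_aln.upper(), template_aln.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: hypothetical fragments used only to exercise the function.
toy_target = "MAHQ-LGFQDA"
toy_template = "MAYQALGFQDA"
toy_identity = percent_identity(toy_target, toy_template)
print(f"toy identity: {toy_identity:.1f}%")

# Arithmetic illustration only: 89 matches in 93 compared columns
# would round to the identity value quoted in the text.
example_ratio = round(100.0 * 89 / 93, 3)
print(example_ratio)
```

The last two lines are only an arithmetic observation: a ratio of 89/93 rounds to 95.699%. The source does not state the number of aligned residues, so this should not be read as the actual alignment length.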

Results

We developed a three-dimensional model of cytochrome c oxidase from Channa punctata (accession no. ABY59029.1). The template was selected from the Swiss-Model template library: bovine cytochrome c oxidase in the fully oxidized state (PDB 1v54A; X-ray resolution 1.80 Å), with a sequence identity of 95.699% and an E-value of 8.61e-77. The model of the target protein (COX/CO-I) was constructed using Deep View/Swiss Pdb Viewer 3.7 (SPS) and the Swiss-Model server. Based on the structural alignment, a 3-D model of C. punctata COX/CO-I was obtained (Figures 1 and 2). The Ramachandran plot indicated that most (95%) residues have φ and ψ angles in the core and allowed regions (Figures 3 and 4); the bond angles, bond lengths and torsion angles were within the ranges expected for a naturally folded protein. The α-helix region accounted for 89.33% and the β-sheet region for 9.33%, with fully allowed regions 81.52%, additionally allowed regions 14.13% and generally allowed regions 3.26%. Secondary structure prediction identified eight coil regions: C-I (Leu 1 – Asp 5), C-II (Ile 29 – Gly 33), C-III (Ile 41 – Ser 55), C-IV (Val 72 – Try 80), C-V (Tyr 82 – Ser 96), C-VI (Met 125 – Leu 137), C-VII (Ser 152 – Leu 157) and C-VIII (Met 162 – Gly 186). Among the eight coil regions, C-VIII, C-V and C-III are the longest, each spanning 14 or more residues. Six helix regions were identified: H-I (Glu 6 – Met 28), H-II (Asp 34 – Met 40), H-III (Phe 56 – Ala 71), H-IV (Val 97 – Asp 124), H-V (Phe 137 – Leu 151) and H-VI (Ala 156 – Thr 161), together with one strand (Thr 81 – Val 82). Overall, the COX domain was divided into coils I–VIII (56.45%), helices I–VI (47.84%) and strand (1.071%). The transmembrane helix probability curve is shown in Figure 5. The hydrophobic probability value was -0.0487, with positive charge 1.6%, negative charge 4.3%, total charge 5.9%, net charge -2.7% and major hydrophobic residues 42.5%.
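
The segment boundaries and Ramachandran percentages above can be tallied with a few lines of code. The sketch below only re-uses the numbers reported in the text; the dictionary names are hypothetical, and the sixth helix (Ala 156 to Thr 161) is labelled H-VI by its order of listing. Some reported ranges share boundary residues (for example residue 137 appears in both C-VI and H-V), so the lengths are indicative rather than an exact partition of the sequence.

```python
# Residue ranges (start, end), inclusive, copied from the text.
COILS = {
    "C-I": (1, 5), "C-II": (29, 33), "C-III": (41, 55), "C-IV": (72, 80),
    "C-V": (82, 96), "C-VI": (125, 137), "C-VII": (152, 157),
    "C-VIII": (162, 186),
}
HELICES = {
    "H-I": (6, 28), "H-II": (34, 40), "H-III": (56, 71),
    "H-IV": (97, 124), "H-V": (137, 151),
    "H-VI": (156, 161),  # sixth helix listed in the text
}


def span_length(span):
    """Number of residues in an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


coil_lengths = {name: span_length(s) for name, s in COILS.items()}
helix_lengths = {name: span_length(s) for name, s in HELICES.items()}

# The three longest coil regions, by residue count.
longest_coils = sorted(coil_lengths, key=coil_lengths.get, reverse=True)[:3]

# Ramachandran region percentages as reported in the text.
fully_allowed = 81.52
additionally_allowed = 14.13
generally_allowed = 3.26
core_plus_allowed = fully_allowed + additionally_allowed
all_allowed = core_plus_allowed + generally_allowed

print("coil lengths:", coil_lengths)
print("helix lengths:", helix_lengths)
print("longest coils:", longest_coils)
print(f"core + allowed: {core_plus_allowed:.2f}%")
print(f"core + allowed + generally allowed: {all_allowed:.2f}%")
```

With these inputs the fully and additionally allowed regions sum to about 95.65%, which is consistent with the statement that most (95%) residues lie in the core and allowed regions.
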
Amino acid frequencies and distribution of COX were calculated using CLC Sequence Viewer 6.0.2. The frequencies were: leucine 0.151, alanine 0.108, isoleucine 0.081, glycine 0.081, proline 0.075, threonine 0.065, valine 0.065, serine 0.059, asparagine 0.048, aspartic acid 0.038, tryptophan 0.022, tyrosine 0.022, arginine 0.011, lysine 0.005 and glutamic acid 0.005; the amino acid composition is presented in Figure 6. From these results one can understand the structure and function of cytochrome c oxidase at the proteomic level; obtaining comparable information in the wet lab would be time-consuming and laborious.
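
The reported frequencies can be converted back to approximate residue counts, which also gives a quick cross-check of the charge percentages. This is a minimal sketch under two assumptions that are not stated in the source: the sequence length is taken as 186 (the last residue named in the text is Gly 186), and only Arg and Lys are counted as positive and Asp and Glu as negative. The variable names are hypothetical.

```python
SEQ_LENGTH = 186  # assumption: the last residue named in the text is Gly 186

# Frequencies as reported in the text (three-letter residue codes).
reported_freq = {
    "Leu": 0.151, "Ala": 0.108, "Ile": 0.081, "Gly": 0.081, "Pro": 0.075,
    "Thr": 0.065, "Val": 0.065, "Ser": 0.059, "Asn": 0.048, "Asp": 0.038,
    "Trp": 0.022, "Tyr": 0.022, "Arg": 0.011, "Lys": 0.005, "Glu": 0.005,
}

# Approximate residue counts implied by the frequencies.
counts = {aa: round(f * SEQ_LENGTH) for aa, f in reported_freq.items()}

# Charged residues (assumed convention: His is not counted as positive).
positive = counts["Arg"] + counts["Lys"]
negative = counts["Asp"] + counts["Glu"]


def pct(n):
    """Percentage of the sequence, rounded to one decimal place."""
    return round(100.0 * n / SEQ_LENGTH, 1)


print("counts:", counts)
print("positive %:", pct(positive))
print("negative %:", pct(negative))
print("total charged %:", pct(positive + negative))
print("net charge %:", pct(positive - negative))
```

Under these assumptions the implied counts give 1.6% positive, 4.3% negative, 5.9% total and -2.7% net charge, matching the values reported in the Results. The listed frequencies do not cover all twenty amino acids, so the counts describe only the residues named in the text.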

Fig. 1: Three-dimensional structure of Cox of Channa punctata

Fig. 2: Three-dimensional structure of Cox of Channa punctata

Fig. 3: Ramachandran Plot of Cox ABY59029.1.pdb model (Glycine and Proline allowed regions)

Fig. 4: Ramachandran Plot of Cox ABY59029.1.pdb model
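
The φ and ψ values shown in a Ramachandran plot are backbone torsion angles: φ is defined by the atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The following is a minimal sketch of how one such torsion angle can be computed from four atomic positions. It is not the procedure used by the RAMPAGE server, and the coordinates are hypothetical points chosen so that the result is easy to check by hand.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, k):
    return (a[0] * k, a[1] * k, a[2] * k)


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)          # points from p1 back to p0
    b1 = _sub(p2, p1)          # central bond
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    b1n = _scale(b1, 1.0 / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, _scale(b1n, _dot(b0, b1n)))
    w = _sub(b2, _scale(b1n, _dot(b2, b1n)))
    x = _dot(v, w)
    y = _dot(_cross(b1n, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates, not taken from the COX model.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(quarter_turn, cis, trans)
```

For a real model, the four points would be the coordinates of the backbone atoms listed above, read from the PDB file of the model for each residue in turn.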

Discussion

The complete genomes of a number of organisms have been sequenced, and many more genome-sequencing projects are underway. Structural biology now faces the arduous task of characterizing the shapes and dynamics of the encoded proteins to facilitate the understanding of their functions and mechanisms of action. Recent developments in the techniques of structure determination at atomic resolution, X-ray diffraction and nuclear magnetic resonance spectroscopy, have enhanced the quality and the speed of structural studies²⁸. Nevertheless, current statistics still show that the known protein sequences (~1,000,00⁷) vastly outnumber the available protein structures (~20,000⁵). Fortunately, domains in protein sequences are gradually evolving entities that can be clustered into a relatively small number of families of domains with similar sequences and structures (i.e., folds)²⁶. These evolutionary relationships make it possible to use computational methods, such as threading¹⁰ and comparative protein structure modeling¹⁶¹⁸, to predict the structures of protein sequences based on their similarity to known protein structures.

Many structural genomics efforts, in fact, combine the experimental structure determination methods and the computational modeling techniques to determine a sufficient number of appropriately selected structures, so that most other sequences can be placed within modeling distance of at least one known structure. To maximize the number of proteins that can be modeled reliably, a concerted effort toward structure determination of new folds by X-ray crystallography and nuclear magnetic resonance spectroscopy is in order, as envisioned by structural genomics¹⁹²⁴²⁵²⁶. It has been estimated that 90% of all globular and membrane proteins can be organized into approximately 16,000 families containing protein domains with more than 30% sequence identity to each other²⁶. Of these families, 4000 are already structurally defined; the others present suitable targets for structural genomics. The full potential of the genome-sequencing projects will only be realized once all protein functions are assigned and understood. Comparative modeling will play an important bridging role in these efforts.

Hebert¹² proposed that a single gene sequence would be sufficient to differentiate all, or at least the vast majority of, animal species, and proposed the use of the mitochondrial gene cytochrome c oxidase subunit 1 (COI) as a global bioidentification system for animals, popularly known as DNA barcoding. The protein-coding COI gene is well conserved and has proven to be a robust evolutionary marker for determining interspecific relationships. Rupasinghe and Schuler²³ worked on homology models for plant cytochrome P450s. They suggested that homology modeling represents a reliable and relatively rapid alternative method for analyzing structure-function relationships and predicting substrates for many P450s (cytochrome P450 monooxygenases).

Most published studies⁸¹²¹⁵ have focused on DNA barcoding, and the secondary and 3-D structure of cytochrome c oxidase has been largely ignored. The three-dimensional structure of cytochrome c oxidase (COX) in Indian Channa species has hitherto been unreported. This prompted us to investigate the mitochondrial COX gene in Channa sp. Homology modeling may give more insight for DNA barcoding. The present study therefore deals with homology modeling of COX using tools such as BLASTP, Swiss Pdb Viewer 3.7 (SPS), the Swiss-Model server, the RAMPAGE server (Ramachandran plot), PSIPRED, PHYRE, TMHMM, MITOPROT, SAPS, RasMol and CLC Sequence Viewer 6.0.2. The in silico approach gives researchers a preliminary picture of the protein, so that they can work more easily on structure prediction of the target protein COX/CO-I. This type of study helps to infer the 2-D and 3-D structure and functions of COX/CO-I and is also useful for DNA barcoding studies.

Fig. 5: Transmembrane protein helix probability curve of Cox

Fig. 6: Amino acid frequency and distribution of Cox protein

Fig. 7: Secondary structure of the COX protein

References

Abhilash M, Nandhini M. 2010. Homology modeling and in silico analysis of hemagglutinin protein from H1N1 influenza A virus. Int. J. Pharm. Sci. Res., 1(1):40-50.
Avise JC. 2001. Phylogeography: The History and Formation of Species. Harvard University Press, Cambridge.
Baker D. 2000. A surprising simplicity to protein folding. Nature, 405:39-42.
Baker D and Sali A. 2001. Protein structure prediction and structural genomics. Science, 294:93-96.
Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, Fagan P, Marvin J, Padilla D, Ravichandran V, Schneider B, Thanki N, Weissig H, Westbrook JD and Zardecki C. 2002. The Protein Data Bank. Acta Crystallogr. D Biol. Crystallogr., 58:899-907.
Blundell TL, Sibanda BL, Sternberg MJ and Thornton JM. 1987. Knowledge-based prediction of protein structures and the design of novel molecules. Nature, 326:347-352.
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S and Schneider M. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31:365-370.
Boore JL. 1999. Animal mitochondrial genomes. Nucleic Acids Res., 27:1767-1780.
Bonneau R, Baker D. 2001. Ab initio protein structure prediction: progress and prospects. Annu. Rev. Biophys. Biomol. Struct., 30:173-189.
Domingues FS, Koppensteiner WA and Sippl MJ. 2000. The role of protein structure in genomics. FEBS Lett., 476:98-102.
Guex N and Peitsch MC. 1997. SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Electrophoresis, 18:2714-2723.
Hebert PDN, Cywinska A, Ball SL and deWaard JR. 2003. Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B Biol. Sci., 270:313-321.
Kartavtsev KP, Park TJ, Vinnikov KA, Ivankov Sharina SN, Lee JS. 2007. Cytochrome b (cyt-b) gene sequence analysis in six flatfish species (Teleostei, Pleuronectidae), with phylogenetic and taxonomic insights. Mar. Sci., 152:757-773.
Kim IC, Jung SO, Lee YM, Lee CJ, Park JK and Lee JS. 2005. The complete mitochondrial genome of the rayfish Raja porosa (Chondrichthyes, Rajidae). DNA Seq., 16:187-194.
Kranthi S, Kranthi KR, Bharose AA, Syed SN, Dhawad CS and Patil EK. 2006. Cytochrome oxidase I sequence of Helicoverpa (Noctuidae: Lepidoptera) species in India: Its utility as a molecular tool. J. Biotech., 5:195-199.
Lee JS, Miya M, Lee YS, Kim CG, Park EH, Aoki Y, Nishida M. 2001. The complete DNA sequence of the mitochondrial genome of the self-fertilizing fish Rivulus marmoratus (Cyprinodontiformes, Rivulidae) and the first description of duplication of a control region in fish. Gene, 280:1-7.
Marimuthu and Haniffa 2010. Asi Fish Sci.
Marti-Renom MA, Stuart A, Fiser A, Sanchez R, Melo F and Sali A. 2000. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct., 29:291-325.
Montelione GT and Anderson S. 1999. Structural genomics: keystone for a human proteome project. Nat. Struct. Biol., 6:11-12.
Nei M, Kumar S. 2000. Molecular Evolution and Phylogenetics. Oxford University Press, New York. 333.
Nohara M, Nishida M, Miya M, Nishikawa T. 2005. Evolution of the mitochondrial genome in Cephalochordata as inferred from complete nucleotide sequences from two Epigonichthys species. J. Mol. Evol., 60:526-537.
Pieper U, Eswar N, Ilyin VA, Stuart A and Sali A. 2002. ModBase, a database of annotated comparative protein structure models. Nucleic Acids Res., 30:255-259.
Rupasinghe I and Schuler MA. 2006. Homology modeling of plant cytochrome P450s. Phytochem. Rev., 5:473-505.
Sali A. 1998. 100,000 protein structures for the biologist. Nat. Struct. Biol., 5:1029-1032.
Sanchez R, Pieper U, Melo F, Eswar N, Marti-Renom MA, Madhusudhan MS, Mirkovic N and Sali A. 2000. Protein structure modeling for structural genomics. Nat. Struct. Biol., 7:986-990.
Vitkup D, Melamud E, Moult J and Sander C. 2001. Completeness in structural genomics. Nat. Struct. Biol., 8:559-566.
Wallace DC. 1992. Diseases of the mitochondrial DNA. Annu. Rev. Biochem., 61:1175-1212.
Zhang C and Kim SH. 2003. Overview of structural genomics: from structure to function. Curr. Opin. Chem. Biol., 7:28-32.

Homology modeling and In silico analysis of COX from Channa punctata (Bloch)
