Characterization and expression analysis of four member genes of the flavanone 3-hydroxylase family from Chamaemelum nobile

Chamaemelum nobile is a traditional Chinese herbal medicine whose secondary metabolites are used in the pharmacology of Chinese medicine. Among them, the flavonoids have great research value. Flavanone 3-hydroxylase (F3H) is one of the core enzymes in the early steps of flavonoid biosynthesis. This study aimed to elucidate the structures, functions, and expression levels of the F3H family from C. nobile. Four members of the F3H family were screened from C. nobile transcriptome data and subjected to bioinformatics analysis. The results showed that CnF3H1~4 were highly similar to F3Hs from other plants, and all four contained two conserved domains, an isopenicillin N synthase-like domain and an oxoglutarate/iron-dependent dioxygenase domain. Further analysis revealed some differences among the four CnF3H proteins in their binding sites. The composition and proportions of the secondary-structure elements were basically the same in the four CnF3Hs, and their 3D structures were consistent with the secondary structures. The phylogenetic tree showed that CnF3H2, CnF3H3, and CnF3H4 grouped with Asteraceae. The expression patterns of CnF3Hs in the roots, stems, leaves, and flowers of C. nobile were evaluated using RPKM values. The results indicated that CnF3H expression differed markedly among tissues: CnF3H1~3 had their highest expression levels in the flowers, whereas CnF3H4 had its highest expression level in the roots. Hence, CnF3Hs appear to play a significant role in flavonoid metabolism.


Introduction
Chamaemelum nobile is a perennial herb of the Asteraceae family that is native to southwestern Europe, has spread all over Europe, and is also distributed in southwestern Asia (Ma et al., 2007). C. nobile is a traditional Chinese herbal medicine whose secondary metabolites used in the pharmacology of Chinese medicine mainly include ester volatile oils with nerve-calming effects (Melegari et al., 1988), flavonoids with anti-inflammatory activity (Achterrath-Tuckermann et al., 1980), and sesquiterpene lactones with antibacterial activity (Farhoudi, 2013). In addition to the medicinal value of C. nobile, its extracts are widely used in formulations of cosmetics, shampoos, hair dyes, and other supplies (Jaziri et al., 1999). Volatile oils have been distilled from the flowers of C. nobile, but the yield is only 0.4%-1.5% of dry weight (Jaziri et al., 1999; Carnat et al., 2004). A few sesquiterpenoids have been isolated and identified from C. nobile, such as α-bisabolol, (E)-β-farnesene, terpene alcohol, and chamazulene (Irmisch et al., 2012; Newall et al., 1996; Schilcher et al., 2005). Only a few genes have been cloned and analyzed, such as those encoding 3-hydroxy-3-methyl-glutaryl-CoA reductase, farnesyl diphosphate synthase (Su et al., 2015), phosphomevalonate kinase, and mevalonate kinase. However, the genes of the flavonoid metabolic pathway have rarely been reported in C. nobile.
Flavonoids are important plant secondary metabolites that are commonly found in vegetables, fruits, beans, nuts, and other plants (Cheng et al., 2018). They can be divided by structure into flavonols, dihydroflavonols, flavones, flavanones, flavanols, anthocyanins, isoflavones, and chalcones (Zhao et al., 2010). Flavonoids are natural antioxidants with anti-aging effects because of their ability to scavenge free radicals (Dooner et al., 1991; Taylor and Grotewold, 2005). They also help protect the heart (Middleton et al., 2000; Peer et al., 2004), protect plants from ultraviolet radiation (Li et al., 1993), and affect plant fertility (Meer et al., 1992).
Flavanone 3-hydroxylase (F3H) is a non-heme iron enzyme, a 2-oxoglutarate-dependent dioxygenase that requires Fe2+, oxygen, and 2-oxoglutarate (2-ketoglutaric acid) for activity (Charrier et al., 1995). F3H catalyzes the conversion of flavanones to dihydroflavonols and is one of the important enzymes in the early stage of flavonoid biosynthesis (Deboo et al., 1995) (Figure 1). Therefore, the expression of the F3H gene can affect flavonoid content in plants, which may change the depth and types of colors (Elomaa et al., 1993). To date, F3H genes have been cloned from several plants, and their functions have been described in detail (Fan et al., 2014; Hu et al., 2014; Shen et al., 2016). The F3H gene is expressed in plant roots, stems, leaves, and flowers (Huang et al., 2013; Kumar et al., 2015). It can affect the coloring of all plant parts because it regulates flavonoid synthesis, while flavonoid structure is sensitive to the vacuolar environment, ambient light, and temperature (Elomaa et al., 1993). For example, it can affect the color of strawberry and raspberry fruits (Lee et al., 2009), Lycoris radiata flowers (Huang et al., 2013), and Reaumuria trigyna leaves (Zhang et al., 2014). The F3H gene can also affect plant growth and development, because most insect-pollinated plants rely on attractive, colorful petals (Zhao et al., 2015). Plants accumulate protective pigments, such as anthocyanins, in the epidermal cells, thereby reducing the damaging effects of solar radiation on inner cells. Thus, because the F3H gene occupies a central position in the flavonoid synthesis pathway, the level of F3H affects the ability of plants to resist ultraviolet radiation, although the specific relationship remains unknown (Braun and Tevini, 1993; Xu et al., 2008). In this study, the F3H family of C. nobile was analyzed, and the expression patterns of F3Hs in various tissues of C. nobile were explored.

Plant material
The seedlings of C. nobile were grown in the greenhouse of Yangtze University, Jingzhou, China. The roots, stems, leaves, and flowers of C. nobile were collected on April 15, 2017. All samples were immediately frozen in liquid nitrogen and then stored at -80 °C.

RNA-Seq data analysis
Transcriptome sequencing of C. nobile was performed in previous studies. The resultant unigenes were aligned to protein databases using BlastX, and the unigenes whose best matches in the nr database shared the highest sequence similarity with F3H were retrieved.
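The retrieval step described above can be illustrated with a minimal sketch that filters BlastX hits for F3H annotations and keeps the best-scoring hit per unigene. The column layout, the keyword, and the e-value cutoff are assumptions made for illustration; the paper does not state the output format or thresholds that were used, and the example rows are invented.

```python
import csv
import io

# Assumed column layout of a tab-separated BlastX result table; the actual
# format used in the study is not stated in the paper.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "stitle"]


def best_f3h_hits(tabular_text, keyword="flavanone 3-hydroxylase", max_evalue=1e-5):
    """Return {unigene_id: best hit row} for hits whose title mentions the keyword."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(tabular_text),
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    for row in reader:
        # Keep only hits annotated with the keyword and passing the e-value cutoff.
        if keyword not in row["stitle"].lower():
            continue
        if float(row["evalue"]) > max_evalue:
            continue
        current = best.get(row["qseqid"])
        # Retain the hit with the highest bit score for each unigene.
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Hypothetical example rows (identifiers and scores are invented).
example = (
    "unigene_A\tsubj_1\t85.2\t300\t1e-150\t520\tflavanone 3-hydroxylase [Species one]\n"
    "unigene_A\tsubj_2\t70.1\t290\t1e-90\t330\tflavonol synthase [Species two]\n"
    "unigene_B\tsubj_3\t40.0\t120\t0.5\t35\tflavanone 3-hydroxylase [Species three]\n"
)
hits = best_f3h_hits(example)
print(sorted(hits))
```

In this toy input only unigene_A is retained, because the F3H hit for unigene_B fails the assumed e-value cutoff.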

Bioinformatics analysis
The physicochemical parameters of the four obtained candidate unigenes, which had been described in detail in previous studies, were evaluated using various bioinformatics tools. Vector NTI 11.5.1 was used to identify open reading frames (ORFs). The nucleotide sequences and deduced amino acid sequences were compared by database searches using the online tools at http://www.ncbi.nlm.nih.gov/BLAST/ and http://web.expasy.org. Amino acid composition and hydrophobicity analyses were conducted using Bioedit 7.0. Multiple sequence alignments were performed using DNAMAN 6.0, and conserved protein domains were predicted using InterProScan (http://www.ebi.ac.uk/interpro/). Protein secondary structure was predicted with an online tool (https://npsa-prabi.ibcp.fr/cgi-bin/secpred_sopma.pl). Comparative models of the 3D structures of the CnF3Hs were generated using SWISS-MODEL, and 3D structural analyses were performed using Weblab Viewerlite. The phylogenetic tree was constructed by the neighbor-joining (NJ) method using Clustal X 2.0 and MEGA 5.0. The C. nobile transcript sequences were obtained from the previously generated transcriptome data, and the expression of the CnF3H genes was estimated as RPKM (Mortazavi et al., 2008).

Results

Transcriptome-guided unigenes retrieval
Transcriptome sequencing of C. nobile was performed in previous studies (data have been published). The resultant unigenes were aligned to protein databases using BlastX, and those whose best matches in the nr database shared the highest sequence similarity with F3Hs were retrieved. Eleven unigenes from the RNA-Seq data showed similarity with F3H; their transcript IDs are listed in Table 1. These unigenes were further analyzed with NCBI tools to identify their ORFs, and the predicted cDNA sequence lengths are given in Table 1. After sequence comparison, four highly similar sequences were selected for further analysis (Nos. 1, 2, 6, and 7).

Identification and characterization of F3H proteins of C. nobile
ExPASy online prediction tools were used to predict the proteins encoded by the CnF3H genes; the predicted lengths of the four CnF3H proteins (CnF3H1~4) were 300, 350, 341, and 239 amino acids. The amino acid composition of the CnF3Hs was analyzed using the Bioedit 7.0 software (Figure 2). Leucine (Leu) was the most abundant residue in CnF3H1~3, at 10.83%, 9.48%, and 11.27%, respectively. Glutamic acid (Glu) was the most abundant in CnF3H4, at 8.44%, whereas cysteine (Cys) was the least abundant in CnF3H1, at only 0.78%, and tryptophan (Trp) was the least abundant in CnF3H2~4, at 1.2%, 0.52%, and 1.19%, respectively. For the basic amino acids, the contents in CnF3H1~4 were 4.66%, 5.9%, 2.7%, and 3.8% for arginine (Arg); 3%, 2.6%, 2%, and 3.85% for histidine (His); and 7%, 5.5%, 9.9%, and 6.3% for lysine (Lys). For the acidic amino acids, the contents were 5.2%, 7.7%, 5.8%, and 5% for aspartic acid (Asp) and 5.6%, 4.8%, 6.88%, and 8.4% for glutamic acid (Glu), respectively.
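The percentage composition reported above is a simple per-residue count divided by the protein length. A minimal sketch of that calculation follows; it is not the Bioedit implementation, and the toy peptide is hypothetical rather than a CnF3H sequence.

```python
from collections import Counter


def aa_composition(protein_seq):
    """Return {residue: percentage of sequence length} for a one-letter protein sequence."""
    seq = protein_seq.upper().replace("*", "")  # drop a trailing stop symbol if present
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical toy peptide used only to show the calculation.
toy = "MLLKEDLLAC"
comp = aa_composition(toy)
most_common = max(comp, key=comp.get)
print(most_common, round(comp[most_common], 2))
```

Applied to each deduced CnF3H sequence, the same function would give the most and least abundant residues compared in Figure 2.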
As shown in Figure 3, the hydrophilic region was larger than the hydrophobic region (positive values, hydrophobic; negative values, hydrophilic), indicating that the proteins encoded by the CnF3Hs are hydrophilic. In addition, both the N- and C-termini of the CnF3H proteins were hydrophilic, and some regions showed strong hydrophilicity or hydrophobicity. The F3H protein sequences of other plants were retrieved from the NCBI website (Table 2) and compared with the CnF3H protein sequences using DNAMAN 6.0. The results in Figure 4 indicated that the protein sequences of CnF3H1~4 were homologous with those of other plant F3H proteins, with a similarity of 62.70%. Conserved domains of the CnF3H proteins were predicted using the online analysis tool InterPro. The four CnF3Hs belonged to the isopenicillin N synthase-like superfamily, and all had two conserved domains, the non-haem dioxygenase N-terminal domain (IPR027443) and the oxoglutarate/iron-dependent dioxygenase domain (IPR005123) (Figure 4). Following Shen et al. (2006) and Huang et al. (2013), the analysis revealed that the four CnF3H proteins contained some differences in binding sites; the conserved His and Asp residues bind Fe2+, and Arg and Ser are involved in binding 2-oxoglutarate (Figure 4). Differences were observed at specific residues within the two highly conserved dioxygenase regions, whereas the residues binding ferrous iron and 2-oxoglutarate (α-ketoglutarate) were the same in the four CnF3Hs.
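The hydropathy profile discussed at the start of this paragraph can be sketched as a sliding-window average over a per-residue scale. The sketch below assumes the Kyte-Doolittle scale, whose sign convention (positive, hydrophobic; negative, hydrophilic) matches the one described for Figure 3; the paper does not state which scale or window size was used in Bioedit, and the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def mean_hydropathy(seq):
    """Average hydropathy over the whole sequence; negative suggests a hydrophilic protein."""
    scores = [KD[aa] for aa in seq.upper()]
    return sum(scores) / len(scores)


# Hypothetical toy sequence: a hydrophobic stretch followed by a hydrophilic one.
toy = "LLLLLKKKKK"
profile = hydropathy_profile(toy, window=5)
print(len(profile), round(mean_hydropathy(toy), 2))
```

A negative whole-sequence mean together with mostly negative window values would correspond to the hydrophilic character reported for the CnF3H proteins.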
To better understand the CnF3H proteins, we predicted their secondary structures using an online tool (https://npsa-prabi.ibcp.fr/cgi-bin/secpred_sopma.pl). As shown in Figure 5A, the CnF3H1~4 peptides were composed of α-helices (blue), random coils (purple), β-turns (green), and extended strands (red). The composition and proportions of the secondary-structure elements were basically the same in the four CnF3Hs. α-Helices and random coils were the most abundant structural elements, while extended strands and β-turns were intermittently distributed in the proteins (Figure 5B). Comparative modeling of the 3D structures of the CnF3Hs was performed using SWISS-MODEL with the Papaver somniferum template 5o7y.1, which gave the highest query coverage (Kluza et al., 2018). 3D structural analyses were performed using Weblab Viewerlite to further characterize the CnF3H proteins. As seen in Figure 6, the four CnF3H proteins were approximately spherical, similar to other F3Hs. The 2-ODD core of the enzyme was formed by the jellyroll motif, and the active site, with the conserved His and Asp residues that coordinate Fe2+, was buried in the center of the enzyme. The surface of each of the four proteins had a conserved α-helix with a leucine zipper-like structure composed of Leu, Ile, Val, and Met, in which Leu faced the center of the molecule and could not interact with other proteins outside (Shen et al., 2006).

Phylogenetic analysis of F3H proteins from different plant species
To better understand the evolutionary relationships of the F3H proteins, we used Clustal X 2.0 and MEGA 5.0 to compare the deduced amino acid sequences of CnF3H1~4 with F3H sequences of other plants retrieved by BLAST (Table 2) and constructed a phylogenetic tree by the neighbor-joining method. The results in Figure 7 indicated that the F3H phylogenetic tree was divided into five branches: Brassicaceae, Solanaceae, Asteraceae, Rosaceae, and Gramineae. The evolution of the F3Hs basically accorded with plant taxonomy and showed clear species characteristics, with plants of the same family on the same branch of the tree. Brassica rapa, Brassica oleracea, Raphanus sativus, and Camelina sativa belonged to Brassicaceae. Solanum tuberosum and Solanum pennellii were classified under Solanaceae. Brachypodium distachyon, Oryza brachyantha, and Oryza sativa belonged to Gramineae. Prunus mume, Prunus avium, and Prunus persica were clustered under Rosaceae. CnF3H2, CnF3H3, and CnF3H4 clustered with Chrysanthemum morifolium and Dahlia pinnata, which belong to Asteraceae. CnF3H2 was most closely related to CnF3H4, CnF3H3 to Chrysanthemum morifolium, and CnF3H1 to Solanaceae. This placement of CnF3H1 possibly reflects functional changes during evolution. In terms of species, Asteraceae plants appeared earlier than the other plants. Although CnF3H2~4 and CnF3H1 differed in their phylogenetic placement, the amino acid sequence alignment showed that the CnF3Hs shared the same conserved regions. Based on sequence characteristics and conserved structures (conserved motifs), our results indicated that the CnF3Hs share common evolutionary ancestors with the F3Hs from other plants.
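The neighbor-joining method builds a tree from a matrix of pairwise distances between aligned sequences. The sketch below shows only that input step, using the simple p-distance (proportion of differing sites); this distance model is an assumption, since the paper does not state which model was selected in MEGA, and the aligned toy sequences and their names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every pair of aligned sequences."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned fragments, not taken from the F3H alignment.
toy_alignment = {"seqA": "MKV-LA", "seqB": "MKVTLA", "seqC": "MRVTIA"}
dm = distance_matrix(toy_alignment)
for pair, dist in dm.items():
    print(pair, round(dist, 3))
```

A tree-building step such as neighbor joining would then repeatedly join the pair of sequences that minimizes its joining criterion over this matrix; that step is left to dedicated software here.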

Expression patterns of CnF3Hs in different tissues of C. nobile
As shown in Figure 8, the RPKM values in the roots, stems, leaves, and flowers were 3.323948, 2.796359, 1.86812, and 20.99907 for CnF3H1; 0.453682846, 0.83263989, 0.83263989, and 3.949693814 for CnF3H2; 16.17092858, 15.23330874, 2.854766389, and 291.2184552 for CnF3H3; and 18.78227185, 10.49824245, 6.061711013, and 4.742910978 for CnF3H4, respectively. Hence, the expression of CnF3H1~4 differed considerably among tissues. CnF3H1~3 had their highest expression levels in the flowers, whereas CnF3H4 had its highest expression level in the roots. The tissues with the lowest expression also differed among the four CnF3Hs: expression was lowest in the leaves for CnF3H1 and CnF3H3, in the roots for CnF3H2, and in the flowers for CnF3H4.

Most F3H genes cloned from plants are involved in the regulation of flavonoid biosynthesis (Huang et al., 2013; Ma and Guo, 2014; Zhou et al., 2015), but those in C. nobile have rarely been reported. In this study, the F3H family was identified from C. nobile transcriptome data, and four highly similar sequences were selected for further analysis. CnF3H1~4 contained two conserved domains, the isopenicillin N synthase-like domain and the oxoglutarate/iron-dependent dioxygenase domain, which are typical of the F3H family and are involved in flavonoid biosynthesis. The four CnF3H proteins contained some differences in binding sites (Figure 4) because of differences at specific residues in the two highly conserved dioxygenase regions. Furthermore, the CnF3Hs encoded proteins belonging to the 2OG-FeII_Oxy dioxygenase family. The conservation of these motifs plays key roles in flavonoid biosynthesis (Huang et al., 2013). Thus, CnF3Hs may have the same function in flavonoid biosynthesis as the F3Hs of other plants.
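The tissue comparison reported for Figure 8 can be re-derived directly from the RPKM values listed in the Results. The short sketch below does this; the dictionary layout and function name are illustrative choices, while the numbers are the values given in the text.

```python
TISSUES = ("roots", "stems", "leaves", "flowers")

# RPKM values as reported in the text for Figure 8 (roots, stems, leaves, flowers).
RPKM = {
    "CnF3H1": (3.323948, 2.796359, 1.86812, 20.99907),
    "CnF3H2": (0.453682846, 0.83263989, 0.83263989, 3.949693814),
    "CnF3H3": (16.17092858, 15.23330874, 2.854766389, 291.2184552),
    "CnF3H4": (18.78227185, 10.49824245, 6.061711013, 4.742910978),
}


def extremes(values):
    """Return (tissue with highest RPKM, tissue with lowest RPKM)."""
    by_tissue = dict(zip(TISSUES, values))
    return max(by_tissue, key=by_tissue.get), min(by_tissue, key=by_tissue.get)


summary = {gene: extremes(values) for gene, values in RPKM.items()}
for gene, (highest, lowest) in summary.items():
    print(gene, "highest:", highest, "lowest:", lowest)
```

The output agrees with the pattern described in the text: flowers are the peak tissue for CnF3H1~3 and roots for CnF3H4.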
Our results were similar to those of previous studies of F3Hs in tea (Hu et al., 2010), Saussurea medusa (Jin et al., 2005), and Ginkgo biloba (Shen et al., 2006). The amino acid compositions of the four CnF3Hs were similar (Figure 2). In addition, the important domains of the four F3H proteins were highly homologous, presumably because some plants need to accumulate large amounts of flavonoids. The production of flavonoids and anthocyanins is mainly determined by the flux through the metabolic pathway via intermediate metabolites, and these highly homologous regions may perform certain functions in these processes. Therefore, F3H must have maintained strong genetic stability and evolutionary conservation. The composition and proportions of the secondary-structure elements were basically the same in the four CnF3Hs, and the 3D structures were consistent with the secondary structures (Figures 5 and 6). Similar to other F3Hs, the binding sites of the CnF3H proteins mainly participated in and regulated protein metabolism, post-translational modification, and processing, which are important for protein activity (Peng, 2010; Zhang et al., 2010). Figure 7 showed a clear genetic relationship between the CnF3Hs and the F3Hs of other plants, and the phylogenetic tree showed that CnF3H2, CnF3H3, and CnF3H4 clustered in Asteraceae. CnF3H1 was most closely related to Solanaceae, possibly because changes in the gene led to changes in certain functions during evolution, which needs to be confirmed experimentally in the future. Although the CnF3Hs differed in their phylogenetic placement, the amino acid sequence alignment showed that they shared the same conserved regions. Thus, based on sequence characteristics and conserved structures (conserved motifs), the CnF3Hs share common evolutionary ancestors with the F3Hs from other plants.
Our findings provide a theoretical basis for elucidating the functions of F3H in C. nobile, and the results are valuable for exploring the mechanism of flavonoid synthesis at the molecular level.
In addition, some of the main flavonoid components, such as apigenin, rutin, luteolin, and quercetin, have been determined in other plants (Stobiecki and Kachlicki, 2006). However, the contents of flavonoids in the various tissues of C. nobile remain unclear. The expression of the F3H gene can affect flavonoid metabolites, and it plays a key role in flavonoid synthesis (Kim et al., 2008). In general, F3H expression is relatively high in the flowers of plants whose flowers are the main part used (Zhou et al., 2015). In this paper, the RPKM values indicated that CnF3H1~4 expression differed markedly among the tissues of C. nobile; CnF3H1~3 had their highest expression levels in the flowers, and CnF3H4 had its highest expression level in the roots (Figure 8). A similar result was obtained in Camellia nitidissima, in which F3H expression differed among organs and was highest in the flowers (Zhou et al., 2015), and in Lycoris radiata (Huang et al., 2013). These results show that F3H expression levels differ among plant tissues, which may reflect differences among organs in flavonoid accumulation; F3H is likely to supply flavonoids in the flowers. In tea, F3H expression is highest in the mature leaves and is regulated by light (Hu et al., 2014), and the F3H gene may be related to the accumulation of flavonoids in tea leaves. In general, F3H expression may differ among plant tissues and tends to be higher in the tissues with the greatest application value. This research lays a theoretical foundation for improving the yield of flavonoids in C. nobile. At present, research on the F3H gene is not yet deep enough, and its function and its relationships with related genes in the flavonoid biosynthesis pathway remain unclear.
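For reference, the RPKM measure behind the expression comparisons discussed above normalizes a gene's read count by transcript length and by the total number of mapped reads (Mortazavi et al., 2008). A minimal sketch of the formula follows; the read count, gene length, and library size in the example are hypothetical and are not taken from the C. nobile data.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 1,000 bp transcript in a 20-million-read library.
value = rpkm(read_count=500, gene_length_bp=1000, total_mapped_reads=20_000_000)
print(value)
```

Because both length and sequencing depth are divided out, RPKM values can be compared across genes and tissues, which is what the comparison among roots, stems, leaves, and flowers relies on.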
The F3H gene plays a key role in the flavonoid biosynthesis pathway. To understand the biosynthesis of flavonoids and other active substances in C. nobile, we will verify the screened sequences by gene cloning and confirm the expression patterns of these genes by real-time PCR. We will also verify the function of the F3H gene in C. nobile by transgenic technology. These results should further reveal how the F3H gene regulates flavonoid biosynthesis, clarify its correlation with related genes in the synthetic pathway, and clarify the mechanism of action of the F3H gene in the formation of flavonoids.

Conclusions
In this study, we screened four CnF3H genes and successfully characterized them. Bioinformatics analysis showed that CnF3H1~4 had the typical protein structure of the F3H family and shared the same conserved amino acid regions. According to the RPKM values, the expression of CnF3Hs differed substantially among the roots, stems, leaves, and flowers. This study lays a theoretical foundation for further understanding the specific functions of the F3H family in the flavonoid metabolism pathway.