Characterization and Tissue-specific Expression of bHLH Genes in Dimocarpus longan

In plants, the basic helix-loop-helix (bHLH) transcription factors (TFs) play pivotal roles in many biological processes including growth, stress response, and secondary metabolite synthesis. To date, many bHLH genes have been identified and characterized in diverse plant species. However, little is known regarding the bHLH genes in Dimocarpus longan Lour. (D. longan). Based on RNA-seq data, we identified 42 putative bHLH genes from D. longan and determined their putative functions using the NCBI Conserved Domain Search Tool and Pfam databases. The physicochemical properties, phylogenetic relationships, conserved motifs, gene ontology (GO) annotations, protein-protein interactions, and tissue-specific expression patterns of these bHLH genes were systematically explored. In total, ten motifs were found in DlbHLH proteins using MEME, among which two were highly conserved. Phylogenetic tree analysis found that DlbHLH proteins can be divided into nine groups, with group 2 being the largest. GO annotation results showed that the DlbHLH genes were involved in various molecular functions. RNA-seq and qRT-PCR results revealed important differences in the expression patterns of 17 of the DlbHLH genes. In particular, DlbHLH-9, DlbHLH-19, DlbHLH-25, DlbHLH-26, and DlbHLH-35 were found to show significantly different expression patterns in root and leaf tissues. The results of this study will further enrich our knowledge regarding bHLH transcription factor genes and lay a foundation for enhancing the production of active secondary metabolites by genetic engineering in D. longan.


Introduction
Transcription factors (TFs) are an important group of DNA-binding proteins that recognize and bind to specific DNA sequences to control transcription from DNA to mRNA at specific times and places. TFs are usually characterized by four functional regions: a nuclear localization signal, a DNA binding domain, a transcription regulation domain, and an oligomerization site (Yang et al., 2012; Yamasaki et al., 2013; Guo and Wang, 2017). In plants, bHLH TFs are the second largest family after MYB TFs (Sun et al., 2018; Yu et al., 2019). These TFs contain the highly conserved bHLH domain, which includes both a basic region and an HLH region. The basic region is usually located at the N-terminus of the bHLH domain, and permits binding to E-box sequences (5'-CANNTG-3') in target gene promoters (Heim et al., 2003). The HLH region is usually located at the C-terminus of the bHLH domain, and is approximately 50 amino acids long. It contains two alpha helices separated by a loop, and forms homodimeric or heterodimeric complexes with other bHLH proteins, thereby regulating their activity (Massari and Murre, 2000).
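As a minimal illustration of the E-box consensus mentioned above, the following Python sketch scans a DNA sequence for 5'-CANNTG-3' sites. The function name find_e_boxes and the example sequence are hypothetical and are not taken from this study. Because the reverse complement of CANNTG again matches CANNTG, scanning a single strand is sufficient to locate sites on both strands.

```python
import re

# E-box consensus 5'-CANNTG-3', where N is any nucleotide.
# The lookahead allows overlapping sites to be reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(sequence):
    """Return (0-based position, site) for every E-box in a DNA sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq)]


# Hypothetical promoter fragment containing one E-box (CACGTG).
print(find_e_boxes("ttCACGTGaa"))  # [(2, 'CACGTG')]
```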
To date, the diversity of the bHLH family has been explored in many species including Arabidopsis thaliana, Brachypodium distachyon, peanut, Chinese cabbage, rice, Salvia miltiorrhiza, and tomato (Toledo-Ortiz et al., 2003; Li et al., 2006; Wang et al., 2015a; Zhang et al., 2015; Wu et al., 2016b; Chao et al., 2017; Niu et al., 2017). Moreover, it has also been shown that bHLH TFs play an important role in active secondary metabolism. The Lc protein, the first reported bHLH TF, is encoded by the maize R gene, and has been shown to be involved in regulating the expression of two structural genes related to the maize anthocyanin metabolic pathway (Ludwig et al., 1989). In snapdragon, the expression of the Delila gene, which contains a helix-loop-helix domain, was also found to be closely related to anthocyanin accumulation (Goodrich et al., 1992). Moreover, bHLH TFs participate in the regulation of terpenoid biosynthesis. AtMYC2, an Arabidopsis thaliana bHLH TF, can interact with the target promoter regions of sesquiterpene synthase genes, thereby activating their transcription and increasing sesquiterpene accumulation (Hong et al., 2012). A diterpenoid phytoalexin factor (DPF) belonging to the bHLH family has been shown to positively regulate the transcript level of rice diterpenoid phytoalexin (DP) genes, and is thereby linked to DP accumulation (Yamamura et al., 2015).
The functions of members of the bHLH TF family involved in secondary metabolite synthesis have been widely studied in many plant species. However, to date no studies have examined the bHLH TF family in D. longan. D. longan is a common fruit tree in China that is valuable for human consumption and medicine. However, the active secondary metabolites that are the medically active components of D. longan accumulate in only small amounts in root and leaf tissues. Thus, identification of bHLH TF genes related to the accumulation of secondary metabolites may facilitate the development of D. longan plant resources in southern China. Recently, the genome of D. longan was sequenced, and the results showed its genetic diversity. The genome sequence data analysis not only revealed the unique characteristics of D. longan, but also highlighted the genes that are possibly involved in the accumulation of secondary metabolites in D. longan (Lin et al., 2017). Genetic engineering and targeted breeding programs could then be used to enhance the production of these active secondary metabolites in D. longan.
In the present study, 42 bHLHs were identified and their physicochemical properties, motif compositions, phylogenetic relationships, gene ontology (GO) annotations, and protein-protein interactions were examined. In addition, we used RNA-seq and quantitative real-time PCR (qRT-PCR) to study the expression patterns of these bHLHs in root and leaf tissues. The results of this study will lay the foundation for further studies of the biosynthesis of secondary metabolites in D. longan and further highlight the importance of bHLH TFs in plants.

Materials and Methods
Plant material
D. longan plant tissue was obtained from plants cultivated in a greenhouse at 50% humidity and 25 °C. Leaves from the upper peripheral branches and roots from 10 plants were collected after 2 months of growth. All samples were immediately placed in liquid nitrogen for later RNA isolation.

Identification of bHLH genes in D. longan
We obtained sequences from RNA-seq data deposited at NCBI (accession number: SRP155595) and identified 42 putative DlbHLH genes based on non-redundant (NR) database annotation. Moreover, we examined all sequences using the NCBI Conserved Domain Search Tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and the Pfam database (http://pfam.xfam.org/) to further confirm the identity of all putative DlbHLH TFs. All 42 confirmed DlbHLH TFs were retained for further analyses.

Bioinformatics analysis of the DlbHLH genes in D. longan
ExPASy software was used to investigate the molecular weight, protein sequence length, instability index, aliphatic index, isoelectric point, and grand average of hydropathicity (GRAVY) for all DlbHLH proteins. The Self-Optimized Prediction method With Alignment (SOPMA) tool was used to predict the proportions of extended strands, alpha helices, beta turns, and random coils in all proteins. The conserved motifs present in the DlbHLH proteins were identified using Multiple Expectation Maximization for Motif Elicitation (MEME). A phylogenetic tree was generated by MEGA 7.0 using the neighbor-joining method with 1,000 bootstrap iterations (Tamura et al., 2011). The functional regulatory network of D. longan bHLH proteins was studied using the Protein-Protein Interaction Networks (STRING) online tool. Finally, Blast2GO PRO was applied to analyze the functional classification of bHLH proteins and to acquire detailed annotations (Conesa and Gotz, 2008). The online software tools used in this study are listed in Table 1.

RNA isolation and quantitative real time PCR analysis
Total RNA was extracted from root and leaf of D. longan using the cetyltrimethylammonium bromide (CTAB) method (Jaakola et al., 2001). First strand cDNAs were synthesized using the TransScript® One-Step gDNA Removal and cDNA Synthesis SuperMix kit (TransGen Biotech, Beijing, China). The TransStart® Top Green qPCR SuperMix (TransGen Biotech, Beijing, China) was used for all qRT-PCR reactions according to the manufacturer's instructions. Each reaction had a total volume of 20 μL, containing 10 μL 2×TransStart® Top Green qPCR SuperMix, 1 μL cDNA template, 7 μL ddH2O, 1 μL forward primer, and 1 μL reverse primer. Primers used for qRT-PCR are shown in Table 2. The D. longan tubulin gene was used as the reference. The qRT-PCR reaction conditions were as follows: 95 °C for 1 min, followed by 95 °C for 5 s, 60 °C for 30 s, and 72 °C for 30 s. All experiments were performed in triplicate. Relative gene expression was computed using the relative quantification (2^(−ΔΔCT)) method (Schmittgen and Livak, 2008).
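The 2^(−ΔΔCT) calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function name relative_expression, the choice of calibrator tissue, and the Ct values are assumptions made for the example. The method also assumes that the target and reference genes amplify with similar, near-100% efficiency.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal):
    """Relative expression by the 2^(-ddCt) method.

    ct_target_* : Ct of the gene of interest
    ct_ref_*    : Ct of the reference gene (tubulin in this study)
    *_test      : tissue of interest (assumed here, e.g. leaf)
    *_cal       : calibrator tissue (assumed here, e.g. root)
    """
    delta_ct_test = ct_target_test - ct_ref_test   # normalize test tissue
    delta_ct_cal = ct_target_cal - ct_ref_cal      # normalize calibrator
    delta_delta_ct = delta_ct_test - delta_ct_cal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only (not data from this study).
fold = relative_expression(20.0, 18.0, 25.0, 18.0)
print(fold)  # 32.0
```

With these made-up values, ΔCT is 2 in the test tissue and 7 in the calibrator, so ΔΔCT is −5 and the target is 2^5 = 32-fold more abundant in the test tissue.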

Results
Identification of D. longan bHLH genes and their physicochemical properties
Using NR annotations, the NCBI Conserved Domain Search Tool, and the Pfam database to analyze D. longan RNA-seq data (NCBI accession number: SRP155595), we identified 42 genes as putative DlbHLH TF genes. We designated these genes DlbHLH-1 to DlbHLH-42 according to their order in the original RNA-seq experiment. We then assessed the physicochemical properties of these TFs, including open reading frame (ORF) length, theoretical isoelectric point, aliphatic index, molecular weight, instability index (II), and grand average of hydropathicity (GRAVY), as well as the number of alpha helices, extended strands, beta turns, and random coils. The detailed results are shown in Table 3.
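Two of the properties listed above have simple closed-form definitions, which the following Python sketch illustrates. It uses the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula; it is not the ExPASy implementation, and the function names and example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(protein):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue."""
    seq = protein.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical peptide (not a DlbHLH sequence), rich in charged residues.
peptide = "MKRRESEKLARE"
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```

A negative GRAVY value, as obtained for this charged peptide, indicates an overall hydrophilic sequence.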

Analysis of conserved motifs in DlbHLH proteins
MEME was used to investigate the motif compositions of the DlbHLH proteins identified here. In total, we identified 10 conserved motifs, and the positions where these motifs were found in DlbHLH proteins are shown in Fig. 1. All 42 DlbHLH proteins were found to contain Motif 1, and Motif 2 was present in 40 of the 42 DlbHLHs.

Phylogenetic analysis of DlbHLH proteins
A phylogenetic tree was used to investigate the evolutionary relationships among the 42 bHLH proteins from D. longan. Based on the classifications of rice and Arabidopsis thaliana (Toledo-Ortiz et al., 2003), the DlbHLH proteins could be divided into nine groups, while DlbHLH-3, DlbHLH-14, DlbHLH-25, and DlbHLH-40 could not be classified (Fig. 2). Group 2, which contains 13 DlbHLH proteins, was the largest of these groups, whereas Groups 3, 4, 5, and 8 were the smallest, each with 2 DlbHLH proteins. The other 4 groups each contained 4-5 DlbHLH proteins.

Gene Ontology annotation of DlbHLH genes
GO annotation of all 42 DlbHLH genes in the biological process, molecular function, and cellular component categories was performed using Blast2GO v5.2.5 at graph level 2 (Fig. 3). In the biological process category, 22 DlbHLH genes were annotated with metabolic process, accounting for 37% of the annotations in this category. Another 22 DlbHLH genes were annotated with cellular process (37%), followed by biological regulation (13%) and regulation of biological process (13%). Within the molecular function category, 26 and 9 DlbHLH genes were predicted to be associated with binding (74%) and transcription regulator activity (26%), respectively. In the cellular component category, annotations fell into five terms: cell parts (27%), organelles (27%), cells (27%), protein-containing complexes (10%), and organelle parts (10%). The DlbHLH genes that were predicted to have multiple classifications are listed in Table 4.
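The reported percentages appear to be shares of the annotations within each GO category rather than shares of the 42 genes; this reading is an inference from the numbers, not something the text states explicitly. The short Python sketch below reproduces the molecular function percentages from the reported gene counts. The function name annotation_shares is hypothetical.

```python
def annotation_shares(counts):
    """Percentage of annotations assigned to each GO term in one category."""
    total = sum(counts.values())
    return {term: round(100.0 * n / total) for term, n in counts.items()}


# Gene counts reported in the text for the molecular function category.
molecular_function = {"binding": 26, "transcription regulator activity": 9}
print(annotation_shares(molecular_function))
# {'binding': 74, 'transcription regulator activity': 26}
```

The same reading is consistent with the biological process figures: 22 annotations corresponding to 37% implies about 60 annotations in that category, more than the 42 genes, because a single gene can carry several terms.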

DlbHLH gene expression patterns in root and leaf tissues
The expression of D. longan bHLH genes was compared between root and leaf tissues. Using previously published RNA-seq data, the expression patterns of the 42 identified DlbHLH genes are shown as a heat map in Fig. 5. Only 17 DlbHLH genes (40.48% of the total), including DlbHLH-3, showed clearly different expression between root and leaf (Fig. 5).
Next, to further validate the RNA-seq-derived patterns of gene expression, we performed qRT-PCR analysis on the 17 DlbHLH genes expressed in both root and leaf. These results, shown in Fig. 6, revealed that all 17 tested DlbHLH genes were expressed, with different levels of expression in root and leaf. The expression levels of 12 genes were higher in roots than in leaves, whereas the other five genes showed the opposite pattern. Significant differences in gene expression between root and leaf tissues were found for DlbHLH-9, DlbHLH-19, DlbHLH-25, DlbHLH-26, and DlbHLH-35. The expression levels of DlbHLH-9 and DlbHLH-35 in leaf were 43- and 80-fold higher than in root, respectively. In contrast, the expression levels of the remaining three genes in root were 29-, 33-, and 27-fold higher than in leaf. Moreover, the expression patterns of the 17 DlbHLH genes revealed by qRT-PCR were consistent with the patterns previously found in the RNA-seq data.

Analysis of protein-protein interactions
In this study, the Protein-Protein Interaction Networks (STRING) tool was used to predict DlbHLH protein interactions. The results are shown in Fig. 4. As genomic data for D. longan were not available in STRING, bHLH proteins from Arabidopsis thaliana with high homology to those from D. longan were selected as representatives for protein interaction studies, since, to some extent, these are likely to reflect the relationships among D. longan bHLH proteins.
As shown in Fig. 4, most DlbHLH proteins were predicted to interact with more than one bHLH protein.
Among the DlbHLH proteins that were predicted to interact with others, DlbHLH-3 (homologous to AT5G57150) and DlbHLH-25 (homologous to BHLH92) were predicted to be co-expressed (connected by a black line). Purple lines indicate interactions that have been experimentally validated. In Arabidopsis thaliana, it has been experimentally determined that AT1G68810 and AT5G51780 can interact with AT2G31220 and AT3G61950, respectively. Therefore, since homologous proteins often have similar biological functions, we speculate that DlbHLH-11/DlbHLH-36 and DlbHLH-19/DlbHLH-24 could interact with DlbHLH-14 and DlbHLH-35, respectively.

Discussion
Transcription factors are important regulatory genes involved in diverse biological processes, including plant growth, development, stress response, and secondary metabolite synthesis. To date, only a few transcription factor families, such as the WRKY TF family, have been systematically studied in D. longan (Jue et al., 2018). No studies of the bHLH TFs have yet been performed in D. longan, although bHLH TFs have been identified and studied in many other plant species, including Arabidopsis, Brachypodium distachyon, and rice (Toledo-Ortiz et al., 2003; Li et al., 2006; Niu et al., 2017). Studies of bHLH TFs have demonstrated that they are closely related to diverse biological functions, especially those involved in secondary metabolite synthesis (Heim et al., 2003). D. longan, which is consumed both for food and medicine, has important commercial and medicinal value. Moreover, the root and leaf tissues of D. longan have been shown to possess [...] (Xue et al., 2015). Moreover, enhancing the production of active secondary metabolites present in D. longan roots and leaves by genetic engineering can significantly expand the scope of its application and increase its value as a crop.
In this study, D. longan RNA-seq data were used to identify and characterize the 42 putative DlbHLH genes. The number of bHLH genes varies among higher plants, lower plants, and fungi. Higher plants such as Brassica napus, Glycine max, and Panicum virgatum contain numerous bHLH genes, while only one bHLH gene was identified in lower plants and fungi such as Bathycoccus, Ostreococcus tauri, Ostreococcus lucimarinus, and Helicosporidium. Based on these facts, we speculate that the bHLH gene family has undergone expansion during evolution, and that this expansion has likely resulted in the emergence of novel biological functions.
The length of bHLH genes in D. longan varied from 228 bp to 1,860 bp. In Panax ginseng, the length of bHLH genes ranged from 283 bp to 2,857 bp (Chu et al., 2018). The longest bHLH gene in D. longan was thus about 1,000 bp shorter than that in Panax ginseng, suggesting that bHLH gene lengths vary significantly among different species. The span of theoretical isoelectric points in D. longan bHLH proteins was large, ranging from 4.84 to 9.52, suggesting that different DlbHLH proteins might be functional in diverse microenvironments. The theoretical isoelectric points of Panax ginseng bHLH proteins were close to those found in D. longan, and varied from 4.81 to 10.16 (Chu et al., 2018).
We also evaluated the stability of the DlbHLH proteins. A protein whose instability index is larger than 40 is likely unstable, while one with a value under 40 is likely stable (Guruprasad et al., 1990). The instability indices of the DlbHLH-5 and DlbHLH-17 proteins were both predicted to be under 40 (39.77 and 32.78, respectively), while the instability index values of all other DlbHLH proteins were above 40. Thus D. longan contains both stable and unstable bHLH proteins, but the unstable bHLH proteins predominate. In addition, we found that all 42 DlbHLH proteins identified in D. longan had negative GRAVY scores. Since proteins with negative GRAVY scores are predicted to be soluble (Kyte and Doolittle, 1982), this means that all 42 DlbHLH proteins are likely soluble. This conclusion is consistent with the general requirement that transcription factors should be soluble.
MEME was used to predict the conserved motifs in the 42 DlbHLH proteins identified in D. longan. In total, 10 conserved motifs were found, with motifs 1 and 2 present in many proteins. Because of their ubiquity, we speculate that motifs 1 and 2 are likely related to the core functions of bHLH proteins. A neighbor-joining phylogenetic tree was created, in which DlbHLH proteins with bootstrap values above 50 clustered together (Toledo-Ortiz et al., 2003). In general, the bHLH TFs of plants clustering in the same group participate in similar biological processes (Pires and Dolan, 2010). In D. longan, 42 DlbHLH proteins were divided into 9 groups, and were likely to be involved in 9 biological processes. This result suggests the possible biological processes in which the 42 DlbHLH proteins are involved, and each of these putative functions requires further examination in future work.
Predicting protein-protein interactions is useful for investigating the physiological functions of proteins, and can be especially valuable for those that, like bHLH family proteins, interact with each other. In this study, 37 DlbHLH proteins were predicted to interact with each other, which suggests that they may not function alone but require the presence of other DlbHLH proteins.
Next, we systematically explored the expression profiles of DlbHLH genes in D. longan root and leaf tissues. According to our RNA-seq dataset, 17 of 42 DlbHLH genes had different expression levels in root and leaf. To further confirm these results, qRT-PCR was performed to investigate the expression profiles of the DlbHLH genes that showed differential expression in the RNA-seq data. The qRT-PCR results were in accordance with the RNA-seq data.
The genetic engineering of transcription factors has proven to be an effective strategy to enhance the accumulation of secondary metabolites and to increase the yield of crops and medicinal plants (Gantet and Memelink, 2002). In the hairy roots of Salvia miltiorrhiza, overexpression of the SmbHLH10 gene has been shown to enhance the accumulation of tanshinones (Xing et al., 2018a), and overexpression of the SmbHLH148 gene induced tanshinone and phenolic acid production (Xing et al., 2018b). Therefore, we speculate that the five DlbHLH genes that showed significantly different expression patterns in root and leaf (i.e., DlbHLH-9, DlbHLH-19, DlbHLH-25, DlbHLH-26, and DlbHLH-35) deserve further study for their potential to enhance the production of valuable secondary metabolites in D. longan.

Conclusions
In this study, 42 DlbHLH genes were identified in D. longan using transcriptomic data, the NCBI Conserved Domain Search Tool, and the Pfam database. The physicochemical properties, phylogenetic relationships, conserved motifs, GO annotations, and protein-protein interactions of these genes were then examined using bioinformatics tools. Moreover, RNA-seq data and qRT-PCR results indicated that 17 of 42 DlbHLH genes were expressed differently in root and leaf. Among these DlbHLH genes, DlbHLH-9, DlbHLH-19, DlbHLH-25, DlbHLH-26, and DlbHLH-35 exhibited significant tissue-specific expression, which deserves further investigation. The results of this study will enrich our knowledge of the bHLH TF family in D. longan and lay a foundation for enhancing the production of active secondary metabolites by genetic engineering in D. longan.