Genome-wide investigation of Hydroxycinnamoyl CoA: Shikimate Hydroxycinnamoyl Transferase (HCT) gene family in Carthamus tinctorius L.

Hydroxycinnamoyl-CoA: shikimate hydroxycinnamoyl transferase (HCT) is mainly associated with the biosynthesis of monolignols, the central precursors of guaiacyl and syringyl lignins in plants. However, the explicit regulatory mechanism of HCT-mediated monolignol biosynthesis in plants remains unclear. Here, we carried out a genome-wide analysis of the HCT gene family in Carthamus tinctorius as a target for understanding growth, development, and stress-responsive mechanisms. A total of 82 CtHCT genes were identified and characterized. Most of the CtHCT proteins contained two common conserved domains, HXXXD and DFGWG. In addition, analyses of conserved protein motifs, the PPI network, cis-regulatory units, and gene structure revealed several genetic determinants reflecting the wide functional diversity of CtHCT-encoding genes. Expression analysis of CtHCT genes at different flowering stages under normal conditions partially highlighted their putative roles in plant growth and development pathways. Moreover, CtHCT genes appeared to be associated with abiotic stress responses, as supported by expression profiling at various flowering phases under light irradiation and MeJA treatment. Altogether, these findings provide new insights into identifying crucial molecular targets associated with plant growth and development and present practical information for understanding abiotic stress-responsive mechanisms in plants.


Introduction
Carthamus tinctorius, commonly known as safflower or 'bastard saffron', belongs to the Asteraceae family. The increasing demand for its oilseed, which is extremely rich in conjugated linoleic acid, has attracted the attention of plant biologists worldwide. Safflower oilseed consists of 80% octadecadienoic acid, which helps regulate cholesterol levels and prevent cardiovascular diseases (Roh et al., 2004). C. tinctorius is widely valued for its medicinal and economic importance in mainland China and West Asia. Multiple phytochemical and pharmacological investigations of safflower petals have shown that flavonoids are the vital components of pharmacological importance. More than 5000 types of phenolic compounds and lignin derivatives exist across the plant kingdom, and safflower holds a remarkable reservoir of flavonoids. The widely distributed classes of flavonoids in C. tinctorius mainly include carthamin chalcone glycoside, kaempferol glucosides, hydroxysafflor yellow A and B, and quercetin glucosides (Ye and Gao, 2008; Zhang et al., 2011).
The Hydroxycinnamoyl CoA: Shikimate Hydroxycinnamoyl Transferase (HCT) gene family belongs to the acyl-CoA-dependent transferases, which include various enzymes that use hydroxycinnamoyl-CoAs as common donor molecules and catalyse reactions on a range of substrates (Chiang et al., 2018).
HCT synthesizes p-coumaroyl shikimate by transferring the p-coumaroyl group from the acyl donor p-coumaroyl-CoA to the acyl acceptor shikimate. It is an essential enzyme in phenylpropanoid metabolism and is conserved across all land plants (Chao et al., 2021; Weng and Chapple, 2010). This pathway converts phenylalanine into a variety of hydroxycinnamic acids, which are the key precursor molecules of flavonoids, hydroxycinnamic acid conjugates, and lignins (Wang et al., 2015). The metabolic pathways towards lignin and chlorogenic acid (CGA) presumably share common intermediates and enzymes.
In vascular plants, the phenylpropanoid pathway is required to synthesize many metabolites, including lignin, which provides mechanical strength to vascular tissues and defense against various stresses (Vanholme et al., 2019; Wang et al., 2015). For instance, low temperature, high salinity, drought, mechanical injury, abscisic acid (ABA), salicylic acid (SA), and hydrogen peroxide induce HcHCT expression in Hibiscus cannabinus, and HcHCT increases abiotic stress tolerance in plants (Chowdhury et al., 2012). In Cucumis sativus, HCT expression was reduced by pectinase treatment, and redirecting the phenylpropanoid pathway towards H-lignin caused p-coumaraldehyde accumulation (Varbanova et al., 2011). HCT is generally a conserved gene family among higher plants (Tohge et al., 2013; Xu et al., 2009). A comprehensive genome-wide characterization of the HCT gene family, with a focus on its structural components and functionally active sites, would enable us to gain a deeper understanding of HCT utilization during specialized metabolism in plants. In the current study, the structural and functional dynamics of the HCT gene family in C. tinctorius were examined by genome-wide identification and expression analysis under normal conditions. This work will also provide insights into the underlying regulatory mechanism of plant growth and development under abiotic stress conditions.

Materials and Methods
Plant materials and treatment conditions
The seeds of the 'JiHong' No. 1 variety of C. tinctorius were purchased from the Tacheng seed company, Xinjiang province of China, and grown in the experimental station of Jilin Agricultural University under controlled conditions at 23 ± 2 °C. The flowering period of C. tinctorius was recorded from approximately 100 days after the date of cultivation. Flower samples from the bud, initial, full, and fade flowering stages were collected on the 99th, 120th, 140th, and 160th day, respectively. For the light treatment, healthy plants of C. tinctorius were grown after the initiation of flowering under normal light irradiation (16.8 MJ/m²) or weak light irradiation (4.6 MJ/m²), maintained in the experimental station of the laboratory. For the MeJA treatment, healthy flowering plants of C. tinctorius were treated with MeJA (100 μM solution) once daily for 7 days. The petals from each flowering stage were collected, immediately placed in liquid nitrogen, and preserved at -80 °C until use.

Genome-wide identification and sequence retrieval of CtHCTs
The hidden Markov model (HMM) profile of the HCT domain (PF02458) from the Pfam database, accessible at http://pfam.xfam.org/ (Finn et al., 2015), was used with HMMsearch to investigate the distribution of CtHCTs in the C. tinctorius genome. Moreover, we screened the entire set of CtHCT protein sequences for the existence of the HXXXD and transferase domains using the online MARCOIL server (http://toolkit.tuebingen.mpg.de/marcoil). Protein sequences lacking the two HCT domains were removed from the analysis. After the assembly of CtHCT sequences, the genomic and protein sequences of HCTs were collected from Arabidopsis thaliana, Cynara cardunculus, Helianthus annuus, Lactuca sativa, and Artemisia annua. The HCT sequences from A. thaliana and the other plants were extracted from the Arabidopsis Information Resource (TAIR) (http://www.Arabidopsis.org/), NCBI (https://www.ncbi.nlm.nih.gov/), Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html), and Plantgrn noble (http://plantgrn.noble.org/). Lastly, a dataset of 82 clean CtHCT sequences and 259 HCT members from the other five plants was assembled for further bioinformatics analyses. The physicochemical properties of the 82 putative CtHCT proteins were investigated using different online tools. The theoretical isoelectric point (pI), protein length, and molecular weight (MW) of the proteins were analyzed using the ExPASy ProtParam online tool (https://web.expasy.org/protparam/). The subcellular localization of each protein was predicted using the CELLO web server (http://cello.life.nctu.edu.tw/) and WoLF PSORT (https://wolfpsort.hgc.jp/).
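The domain-based filtering step described above can be illustrated with a minimal sketch. It is not the MARCOIL-based procedure used in the study: it assumes a simple regular-expression screen for the two signature motifs (HXXXD, where X is any residue, and DFGWG), and the sequence names and toy sequences are hypothetical.

```python
import re

# Signature motifs named in the text; "X" in HXXXD stands for any residue.
HXXXD = re.compile(r"H...D")
DFGWG = re.compile(r"DFGWG")


def has_hct_motifs(seq):
    """Return True if the protein sequence contains both HXXXD and DFGWG."""
    seq = seq.upper()
    return bool(HXXXD.search(seq)) and bool(DFGWG.search(seq))


def filter_candidates(proteins):
    """Keep only the candidates (name -> sequence) that carry both motifs."""
    return {name: seq for name, seq in proteins.items() if has_hct_motifs(seq)}


# Hypothetical toy sequences, for illustration only.
toy = {
    "cand_A": "MKIHAMLDGSSLLPDFGWGKP",  # both motifs present
    "cand_B": "MKIHAMLDGSSLLPAAAAAKP",  # HXXXD only
    "cand_C": "MKIAAAAAGSSLLPDFGWGKP",  # DFGWG only
}
kept = filter_candidates(toy)
print(sorted(kept))
```

A regular expression only checks for the presence of the motif strings; a profile-based search such as HMMsearch against PF02458 remains the primary identification step.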

Phylogenetic reconstruction of HCT proteins
The full-length amino acid sequences of the 82 CtHCT proteins obtained from the C. tinctorius genome were subjected to multiple sequence alignment using Clustal W (2.0). The CtHCT sequences were numbered CtHCT001 to CtHCT082 following their identification order. To analyse the evolutionary relationship and divergence of CtHCTs, an unrooted neighbour-joining phylogenetic tree with 1000 bootstrap replicates was generated together with 259 HCT sequences from A. thaliana, C. cardunculus, H. annuus, L. sativa, and A. annua using MEGA X software version 4.1 (Tamura et al., 2011). The subfamily classifications were further analysed for genome-wide comparison.

Analyses of conserved protein motifs and PPI network
The clean sequences of CtHCTs were aligned in Clustal W (2.0) to investigate the conserved amino acid composition and the presence of conserved protein motifs. The distribution and composition of the conserved protein motifs in CtHCTs were investigated by submitting each CtHCT protein sequence to the MEME web server (version 4.8.1; available at http://meme.nbcr.net/meme/cgi-bin/meme.cgi) using the default settings. The logos of the identified motifs were extracted from the MEME server. The graphical representation of protein motifs was edited in EvolView v.2 (http://www.evolgenius.info/). Furthermore, the protein interaction network of the putative CtHCT proteins was predicted by uploading the CtHCT sequences to the STRING database version 10 (https://string-db.org/). The hierarchical network of interactor proteins associated with the upstream and downstream regulation of CtHCTs was created and exported from the STRING database.

Analysis of gene structure, cis-acting units, and GO term enrichment of CtHCTs
The gene structure organization, including exons, introns, and UTR regions, along with the length of CtHCT genes, was examined from the CDS and genomic sequences of CtHCT genes using GSDS (Gene Structure Display Server; http://gsds.cbi.pku.edu.cn/index.php) according to the instructions given by Hu et al. (2014). Furthermore, to investigate the cis-regulatory units of the CtHCT promoters, the 2 kb upstream flanking sequence of each gene was analyzed using the online tool PLACE (https://sogo.dna.affrc.go.jp/). In addition, the GO term analysis of C. tinctorius HCTs was performed with Blast2GO software (https://www.blast2go.com/) following the instructions given by Conesa and Götz (2008). For this purpose, the full-length amino acid sequences of CtHCT proteins were loaded into Blast2GO, and functional annotations in the different categories were then identified.
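Extracting the 2 kb upstream flanking sequence for promoter analysis can be sketched as follows. This is an illustrative outline only: it assumes 1-based inclusive gene coordinates (as in GFF annotation), takes the annotated gene start as the reference point, and uses hypothetical function names and a toy chromosome string.

```python
def reverse_complement(dna):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(dna.upper()))


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=2000):
    """Return up to `length` bp upstream of a gene, in 5'->3' gene orientation.

    gene_start and gene_end are 1-based inclusive coordinates on the
    forward strand; the region is truncated at the chromosome ends.
    """
    if strand == "+":
        lo = max(0, gene_start - 1 - length)
        return chrom_seq[lo:gene_start - 1].upper()
    if strand == "-":
        hi = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[gene_end:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome (hypothetical), with a 4 bp "promoter" for readability.
chrom = "AAAACCCCGGGGTTTT"
plus_promoter = upstream_region(chrom, 9, 12, "+", length=4)
minus_promoter = upstream_region(chrom, 5, 8, "-", length=4)
print(plus_promoter, minus_promoter)
```

For genes on the minus strand the upstream region lies at higher forward-strand coordinates, which is why the slice is taken after the gene end and then reverse-complemented.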

Expression analysis of the putative CtHCTs
Flower petals from the different stages (bud, initial, full, and fade) were ground thoroughly in liquid nitrogen and then collected into centrifuge tubes. Total RNA was extracted using RNA ISOplus reagent (Takara Bio Co., Beijing, China), following the manufacturer's protocol. RNA quality was confirmed by the OD260/280 ratio measured with a NanoDrop 2000 (ThermoFisher Scientific, Beijing, China) and by 1% agarose gel electrophoresis. First-strand cDNA was synthesized from the RNA isolated at each flowering stage using the PrimeScript RT reagent kit with gDNA Eraser (Takara, Japan), following the manufacturer's protocol. Quantitative real-time PCR was carried out to determine the transcription level of CtHCT genes using SYBR® Premix Ex Taq™ (TaKaRa) on a Stratagene Mx3000P system (Stratagene, CA, USA). The relative expression level of CtHCTs at the bud stage was normalized to the expression of the housekeeping gene 18S ribosomal RNA. The fold change was calculated according to the 2^(-ΔΔCT) method (Livak and Schmittgen, 2001). Each experiment was repeated with three independent biological replicates. The gene-specific primers synthesized for each CtHCT are listed in Table S1.
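The 2^(-ΔΔCT) calculation of Livak and Schmittgen (2001) can be written out as a short sketch. The CT values below are hypothetical, and treating the bud stage as the calibrator with 18S rRNA as the reference gene is an assumption made for illustration.

```python
from statistics import mean


def fold_change(ct_target_sample, ct_ref_sample, ct_target_calib, ct_ref_calib):
    """Relative expression by the 2^(-ddCT) method.

    dCT  = CT(target) - CT(reference gene), computed per condition
    ddCT = dCT(sample) - dCT(calibrator)
    """
    d_sample = ct_target_sample - ct_ref_sample
    d_calib = ct_target_calib - ct_ref_calib
    return 2.0 ** -(d_sample - d_calib)


# Hypothetical CT values: three replicates of one gene at the full stage,
# compared against a single calibrator measurement (assumed: bud stage).
full_stage = [(22.0, 10.0), (22.0, 10.0), (21.0, 10.0)]  # (target CT, 18S CT)
calibrator = (24.0, 10.0)

folds = [fold_change(t, r, calibrator[0], calibrator[1]) for t, r in full_stage]
print(folds, mean(folds))
```

A target that crosses threshold two cycles earlier than in the calibrator, with an unchanged reference gene, corresponds to a four-fold increase.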

Statistical analysis
The results are presented as mean ± SD of three replicates. Differences between group means were assessed by one-way analysis of variance (ANOVA) using Statistix 8.1. A P-value of 0.05 was used as the threshold for statistical significance.
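As an illustration of the statistics described above, the following sketch computes mean ± SD per group and the one-way ANOVA F statistic from first principles. The study itself used Statistix 8.1; the group values here are hypothetical, and the P-value lookup is left out because it requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
from statistics import mean, stdev


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative expression values, three replicates per stage.
stages = {
    "bud": [1.0, 2.0, 3.0],
    "initial": [2.0, 3.0, 4.0],
    "full": [5.0, 6.0, 7.0],
}
for name, values in stages.items():
    print(f"{name}: {mean(values):.2f} ± {stdev(values):.2f}")

f_stat, df_b, df_w = one_way_anova_f(list(stages.values()))
print(f_stat, df_b, df_w)
```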

Results
Identification and characterization of CtHCTs in C. tinctorius
To identify all candidate HCT genes in C. tinctorius, we conducted comprehensive searches using the hidden Markov model (HMMsearch) against the C. tinctorius genome. Furthermore, the set of CtHCT sequences was re-examined for the presence of the HXXXD and transferase functional domains using the MARCOIL web server. Based on this information, a total of 105 HCTs were identified in the C. tinctorius genome. Among these, 23 CtHCT sequences were eliminated from the analysis because of insufficient data and the absence of the active HCT functional domains. The remaining 82 CtHCT genes were distributed unevenly on different chromosomes and formed different clusters. The assembly and organization of CtHCTs in C. tinctorius suggested that evolutionary events such as tandem duplication and genome repetition might have contributed to the origin of the HCT gene family. The 82 CtHCT-encoded proteins were named CtHCT001 to CtHCT082. Their lengths ranged from 305 aa to 903 aa, and their molecular weights ranged from 33.88 kDa to 99.69 kDa, with an average of 50.40 kDa (Table 1). The isoelectric point (pI) values ranged from 5.00 to 9.03, with an average of 6.51. In addition, subcellular localization predictions indicated that most of the candidate CtHCT proteins were localized to the plasma membrane, cytoplasm, nucleus, and mitochondria. All CtHCT proteins were predicted to be thermally stable, with aliphatic indices comparable to those of other globular proteins.

Phylogenetic analysis of CtHCTs
To further elucidate the evolutionary relationship of the HCT family in C. tinctorius, the CtHCT protein sequences were aligned together with 259 HCT members from A. thaliana, C. cardunculus, H. annuus, L. sativa, and A. annua. These species were selected for genome-wide comparison because the HCT gene family is relatively well represented in them (Figure S1). A neighbour-joining phylogenetic tree was constructed using the MEGA X software package with 1000 bootstrap replicates. The CtHCT family clustered into six subfamilies, designated groups I, II, III, IV, V, and VI, indicating strong conservation with other species (Figure 1). Comparative phylogenetic analysis suggested that most CtHCTs were assembled into group IV with AtHCTs and AuHCTs, which are mainly associated with the transfer of hydroxycinnamates to shikimate during monolignol formation. The second largest CtHCT-containing group was group VI, which also contained abundant HaHCTs and LsHCTs, corresponding to the strong conservation of the active sites for the carbonyl group and the shikimate binding sites. The smallest group was group II, which contained only one CtHCT protein clustered together with one CcHCT and two AnHCT members, suggesting a different catalytic activity that is most likely related to functions other than lignin biosynthesis in plants.

Motifs distribution and protein alignment of CtHCTs
The conserved domains of CtHCT-encoding proteins were identified by aligning the 82 CtHCT amino acid sequences in a multiple sequence alignment. Two prevalent HCT domains, HXXXD and DFGWG, were observed in C. tinctorius (Figure S2). The high conservation of specific amino acids within the HCT domains suggested that they are crucial hallmarks for the catalytic activity of CtHCT in C. tinctorius. Furthermore, the distribution of conserved motifs was screened with the MEME web server using the following settings: classic mode, the standard protein alphabet, site distribution of zero or one occurrence per sequence, and 10 motifs. The results confirmed the presence of 10 conserved motifs unevenly organized across CtHCT proteins (Figure 2). For instance, motifs 1, 3, 4, 5, and 7 were found in all subgroups, suggesting that these motifs were the most frequently conserved in CtHCT proteins.
Moreover, from these findings, we deduced that closely related CtHCT proteins that clustered together might share a common composition of these conserved motifs and have similar activity. However, motifs 2, 6, 8, 9, and 10 were diversely distributed across the CtHCT proteins. For example, motif 2 was absent from the five members of subgroup II; motif 6 was found in all subfamilies except some members of subfamily VI; motif 8 was found in nearly all members of subfamily VI, was absent from subfamilies II and IV, and appeared in only one member of subfamily III; and motif 9 was abundant in subfamily VI, absent from group II, and appeared in only one member each of subfamilies I and III. Similarly, motif 10 was unique to group VI and absent from all other subgroups. The full-length logos of these motifs are listed in Figure S3.

The interrelation of CtHCT with other proteins was investigated to link their interaction network with different biological pathways. Identifying the protein-protein association network could play a fundamental role in predicting the possible function of the putative proteins. A total of 22 interactor proteins were predicted for the CtHCTs, as shown in Figure 3. Some of them have already been determined experimentally, such as cinnamate 4-hydroxylase (C4H), p-coumarate 3-hydroxylase (C3H), 4-coumarate: coenzyme A ligase (4CL), and caffeoyl shikimate esterase (CSE). The roles of these interactor proteins have been most widely characterized in multiple plant pathways of lignin and secondary metabolite biosynthesis. Furthermore, to understand the detailed topology of the interactor proteins with CtHCT proteins, the three-dimensional structures of these proteins were predicted from their amino acid sequences and compared (Figure S4). Understanding the relationship between the 3D arrangement of the amino acid sequence and protein structure provides a significant amount of information for the functional prediction of novel proteins from genome sequence data and for the rational engineering of protein functions. Taken together, the protein-protein interaction network of putative CtHCTs could help link crucial biosynthetic pathways and routes leading to specialized metabolism in plants.

Gene structure organization of CtHCT genes
The intron/exon organization of CtHCT genes was investigated with the online GSDS tool to predict gene structure conservation and examine evolutionary relationships. The length of CtHCT genes ranged from 1000 bp (CtHCT017) to 2675 bp (CtHCT042). The structural organization of each CtHCT gene, comprising exons (red), CDS (yellow), and 5' and 3' UTR regions (blue), is shown in Figure 4. These results showed that the numbers of exons and introns, the CDS, and the UTR regions varied even among the most closely related members of the same subgroup. For example, CtHCT018, CtHCT056, CtHCT057, CtHCT058, and CtHCT070 possess similar gene structures, whereas other members of the same group VI, including CtHCT002, CtHCT005, CtHCT049, CtHCT064, CtHCT066, and CtHCT078, contain different numbers of exons and introns. The diversity in the gene structures of CtHCT genes indicated multiple evolutionary mechanisms, such as gene recombination, gene duplication, alternative splicing, and transposon activity, resulting in new gene structures and transcripts that form unique polypeptides with different biological functions.

Cis-regulatory units of CtHCT genes
To explore the functional diversity and regulatory system of the CtHCT gene family, we investigated cis-regulatory elements within the promoter region of each gene. A total of 20 frequent cis-regulatory units were identified in the 2000 bp genomic sequences located upstream of the initiation codon or 5' untranslated region (5' UTR) of CtHCT genes. The cis-elements found most abundantly in the promoter regions of CtHCT genes were hormone-responsive elements, particularly gibberellin-, jasmonic acid-, salicylic acid-, abscisic acid-, and auxin-responsive elements. Furthermore, a diverse group of regulatory units related to tissue-specific expression was also detected, such as meristem-, endosperm-, root-, and seed-specific regulatory units. Apart from that, various defense and abiotic stress-associated responsive elements, such as light- and low temperature-responsive elements, together with cell cycle regulatory units and metabolism-related responsive units, were also detected in the promoter regions of CtHCT genes of C. tinctorius (Figure S5). These findings revealed the flexibility and functional diversity of CtHCT genes and their potential roles in specialized metabolism and diverse biological activities in plants.

Functional annotation of CtHCT genes
GO term analysis was performed to assign functional annotations to the putative CtHCT-encoding genes. All CtHCTs were divided into three functional categories: biological process (BP), molecular function (MF), and cellular component (CC). The bulk of CtHCT genes were enriched in biological process terms, followed by molecular function and cellular component terms. The most enriched GO terms of the biological process category were biosynthetic, metabolic, and cellular processes, which include responses to biotic and abiotic stimuli, defense responses, cell wall organization and biogenesis, and cellulose metabolic and biosynthetic processes (Figure S6). In the molecular function category, the top-ranked GO terms of binding and catalytic activities were enriched, which include enzyme inhibitor and regulator activity, molecular function regulator, copper ion binding, protein histidine kinase activity, carboxylic ester hydrolase activity, and pectinesterase activity. The most enriched GO terms assigned to the cellular component category were the cell wall, external encapsulating structure, cell periphery, and respiratory chain (Figure S6). These results emphasized the potential roles of CtHCTs in a variety of biosynthetic and crucial metabolic processes, which may directly or indirectly participate in regulating plant responses to external stimuli.

HCT expression profiles at different flowering stages of C. tinctorius
The digital expression level of CtHCT genes was first calculated as FPKM (fragments per kilobase per million mapped reads) for each CtHCT gene using the featureCounts software package (v1.5.0-p3). The data were obtained from whole transcriptome shotgun sequencing of four flowering stages of C. tinctorius: bud, initial, full, and fade. The data have been deposited in the NCBI public database under accession number PRJNA399628. As shown in Figure S7, the CtHCT genes were divided into different cluster groups that demonstrated differential expression patterns of these transcripts across the four flowering stages in C. tinctorius. Furthermore, to validate the expression level of CtHCT transcripts, we performed qRT-PCR analysis of 20 genes at the four flowering stages of C. tinctorius (bud, initial, full, and fade). As expected, the expression patterns of these CtHCT genes across all flowering stages were consistent with the RNA-seq results. For instance, the expression of CtHCT005, CtHCT019, CtHCT030, CtHCT055, CtHCT077, and CtHCT081 increased at the initial and full phases of flower development (Figure 5). Similarly, transcripts of the candidate genes CtHCT009, CtHCT028, and CtHCT031 were abundant at the full and fade flowering phases. However, the relative transcript abundance of CtHCT001, CtHCT028, CtHCT033, and CtHCT048 was detected only at the full flowering phase, indicating their transcriptional regulation during the full-bloom flowering period of C. tinctorius. In contrast, the relative expression level of CtHCT017, CtHCT027, CtHCT031, CtHCT039, CtHCT040, and CtHCT059 peaked at the fading stage of flower development (Figure 5).
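The FPKM normalization mentioned above follows a simple formula, sketched here for a single gene. The fragment count, gene length, and library size are hypothetical numbers chosen for illustration; in practice the counts would come from a read-summarization tool such as featureCounts.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = count * 10^9 / (gene length in bp * total mapped fragments)
    """
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2000 bp gene in a library
# of 10 million mapped fragments.
value = fpkm(500, 2000, 10_000_000)
print(value)
```

Dividing by gene length and library size makes expression values comparable between genes of different lengths and between samples sequenced to different depths.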

For most CtHCT genes, the expression level at the fading stage of flower development was four-fold higher than at the other three stages. The expression preference of CtHCT genes at different flowering phases suggested a significant correlation with flower development and the regulation of secondary metabolism in plants.

Figure caption: The 18S ribosomal RNA gene was used as the housekeeping gene in our analysis. The data were calculated using the 2^(-ΔΔCT) method.

Expression profiling of CtHCT genes under light irradiation
Here we analyzed the expression of CtHCT genes at four phases of flower development in C. tinctorius in response to normal and low light irradiation using the qRT-PCR assay. Following weak light irradiation (4.6 MJ/m²), the transcription level of the selected CtHCT genes was significantly induced at all flowering stages under investigation. However, the expression trend of each transcript differed from that under normal light irradiation (16.8 MJ/m²). Following weak light irradiation at the bud flowering stage, the expression level of most CtHCT genes, including CtHCT017, CtHCT019, CtHCT030, CtHCT039, CtHCT048, CtHCT055, CtHCT065, and CtHCT081, was significantly up-regulated, with different folds of transcript abundance. However, the expression levels of CtHCT001, CtHCT033, and CtHCT059 were down-regulated at the bud flowering stage (Figure 6). Moreover, after weak light irradiation at the initial flowering phase of C. tinctorius, the transcription level of CtHCT001, CtHCT033, CtHCT039, CtHCT048, and CtHCT059 increased by 2-3 fold, whereas the expression level of CtHCT017, CtHCT019, CtHCT030, CtHCT055, CtHCT065, and CtHCT081 declined. Under the same light intensity at the full flowering stage, the expression level of most CtHCT genes showed a downward trend, including CtHCT001, CtHCT033, CtHCT048, CtHCT055, CtHCT059, and CtHCT065; however, expression was induced in CtHCT017, CtHCT019, CtHCT027, and CtHCT030. During the fading stage, the expression of CtHCT001, CtHCT019, CtHCT033, CtHCT039, CtHCT065, and CtHCT081 was up-regulated under weak light, whereas the expression of CtHCT017, CtHCT027, CtHCT030, CtHCT048, CtHCT055, and CtHCT059 declined (Figure 6). These expression patterns of CtHCT genes under weak light irradiation offer insights into the regulation of stress responses, in which these genes may act in combination with other possible factors that interconnect early stress-responsive mechanisms.

Figure caption: The red bars represent the control treatment group under normal light irradiation (16.8 MJ/m²), whereas the blue bars denote the treatment group under weak light irradiation (4.6 MJ/m²). The 18S ribosomal RNA gene was used as the housekeeping gene. The data were calculated using the 2^(-ΔΔCT) method.

Figure caption: The red bars represent the control treatment group (no treatment), whereas the blue bars denote the treatment group induced with MeJA (100 μM). The 18S ribosomal RNA gene was used as the housekeeping gene. The data were calculated using the 2^(-ΔΔCT) method.

Discussion
The HCT gene family is mainly involved in the regulation of monolignol biosynthesis in different plant species (Shadle et al., 2007; Sun et al., 2018). Here, we extensively characterized the HCT gene family in C. tinctorius and provided a genome-wide overview of these genes alongside their structural and functional active sites linked to the regulation of abiotic stress responses in plants. In general, the size of the HCT gene family in C. tinctorius differed from that in other plants, including Arabidopsis (Arabidopsis Genome Initiative, 2000), strawberry (Shulaev et al., 2011), pear (Wu et al., 2013), apple (Velasco et al., 2010), and peach (Verde et al., 2013). Further comparative analysis revealed that the occurrence of conserved amino acids at specific positions in CtHCT was consistent with Arabidopsis (D'Auria, 2006) and pear HCTs (Ma et al., 2017). Moreover, the presence of other conserved amino acid residues with more than 60% similarity was consistent with Populus nigra (Vanholme et al., 2013) and Coffea canephora (Lepelley et al., 2007). These findings suggested that the highly conserved amino acid residues could be crucial for identifying the putative function of CtHCT genes.
The investigation of cis-elements in the promoter regions of CtHCT genes revealed various critical regulatory units involved in plant responses to different abiotic stressors. The most frequent elements in CtHCT promoters were responsive to abscisic acid, low temperature, gibberellins, jasmonic acid, salicylic acid, and auxin, together with defense and stress-responsive elements and elements for endosperm-specific expression. Dang et al. (2011) identified these types of cis-regulatory elements in pea plants in relation to abiotic stress responses. In agreement with these findings, the cis-elements of CtHCT genes also suggested important hallmarks of plant adaptation to various abiotic stresses and of hormone signal transduction during plant growth and development. In addition to cis-regulatory elements, the enrichment of HCT-encoding proteins in the core pathway of phenylpropanoid biosynthesis and other essential classes of secondary metabolites has been described in several plants, such as Linum usitatissimum (Tripathi and Agrawal, 2013), tobacco (Tamagnone et al., 1998), and Eucalyptus globulus (Shinya et al., 2014). In this study, we also found that most of the CtHCT-encoding proteins were enriched in biosynthetic, metabolic, and cellular processes, including responses to biotic and abiotic stimuli, defense responses, cell wall organization and biogenesis, cellulose metabolic and biosynthetic processes, and external encapsulating structure organization. These findings provide essential clues to the putative role of CtHCT genes in secondary metabolism and to the abiotic stress resistance mechanism in C. tinctorius.
HCT genes play a critical role in plant growth and development. For example, HCT1 and HCT2 of red clover were expressed in all tissues, including stems, leaves, and flowers, but expression was higher in flowers than in unexpanded leaves, mature leaves, and stems (Sullivan, 2009).
Similarly, Populus trichocarpa possesses seven PtrHCTs that can be expressed in the tissues of various plant parts and exhibit differences concerning their relative performance. In particular, PtrHCT1 and PtrHCT6 are primarily expressed in stem tissues, whereas PtrHCT3 has a higher expression level in leaf tissues (Shi et al., 2009). Another study revealed that the expression pattern of the HcHCT transcript was ubiquitous in all parts of a 4-week-old plant but was relatively high in roots and mature flowers. The highest HcHCT transcript was detected in young flowers and young leaves during flowering and leaf development (Chowdhury et al., 2012).
HcHCT showed high expression levels in flowers and roots, suggesting that HcHCT participates in the biosynthesis of secondary metabolites in floral and root tissues. Consistent with these previous studies, we detected diverse expression profiles of CtHCT genes at four different flowering stages of C. tinctorius. The transcriptional regulation and expression preference of CtHCT genes at the various flowering stages indicated a significant correlation of these transcripts with plant growth and development and with the regulation of secondary metabolism.
Furthermore, the natural exposure of plants to various abiotic and biotic stresses triggers several mechanisms of cell wall modification that protect them against these stresses. Lignin protects the cell wall by shielding its polysaccharides from pathogenic microbes and stress-induced degradation. Lignin also acts as an antioxidant in plants encountering heat stress and eventually increases plant tolerance to various stress conditions (Bhardwaj et al., 2014). Notably, the optimum light intensity and temperature required for germination and average growth of C. tinctorius are 16.8 MJ/m² and 35 °C, respectively (Torabi et al., 2016). Fluctuations in light intensity and temperature can affect the physiological and developmental activities of C. tinctorius (Torabi et al., 2016). In addition, MeJA treatment induced the expression of HCT genes in Norway spruce (Chowdhury et al., 2012). We therefore also examined the regulatory mechanism of abiotic stress tolerance in C. tinctorius by profiling the expression of CtHCT genes under low light irradiation (4.6 MJ/m²) and MeJA stress using qRT-PCR analysis. The results demonstrated that CtHCT transcripts showed diverse expression patterns across the flower development stages of C. tinctorius following induction with weak light irradiation and MeJA treatment. These data suggest that the up-regulation and down-regulation of CtHCT genes at certain stages of flower development could be involved in defense-related pathways in coordination with JA signalling pathways (Chowdhury et al., 2012) that define early stress responses and specific and broad-spectrum stress tolerance mechanisms. However, more efforts are still needed to elucidate the explicit functional role of CtHCT genes in C. tinctorius.
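To make the notion of up-regulation and down-regulation concrete, the following minimal sketch shows how a qRT-PCR fold change could be computed. It assumes the commonly used 2^-ΔΔCt method, which this section does not state was the method applied; the function names, the Ct values, and the two-fold threshold are all hypothetical and serve only as an illustration.

```python
def fold_change(ct_gene_treated, ct_ref_treated, ct_gene_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumed; not stated in the text)."""
    # Normalise the target gene to a reference gene within each condition.
    d_ct_treated = ct_gene_treated - ct_ref_treated
    d_ct_control = ct_gene_control - ct_ref_control
    # Compare treated against control and convert cycles to a fold change.
    return 2.0 ** -(d_ct_treated - d_ct_control)


def direction(fc, threshold=2.0):
    """Label a fold change; the two-fold threshold is an illustrative assumption."""
    if fc >= threshold:
        return "up-regulated"
    if fc <= 1.0 / threshold:
        return "down-regulated"
    return "unchanged"


# Hypothetical Ct values for one CtHCT gene at one flowering stage.
fc = fold_change(24.0, 18.0, 26.0, 18.0)
print(fc, direction(fc))  # 4.0 up-regulated
```

With these made-up values the target gene reaches threshold two cycles earlier under treatment than in the control after normalisation, which corresponds to a four-fold induction.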

Conclusions
This study provides the first comprehensive genome-wide analysis describing various structural and functional components of CtHCT genes. These findings revealed that various conserved features, such as protein motifs, cis-acting elements, gene structure, and functional enrichments, could be crucial in predicting the function of CtHCT genes during plant growth and development. In addition, a group of CtHCT genes showed preferential expression in developing flowers of C. tinctorius under normal and abiotic stress conditions, suggesting the regulation of stress-responsive mechanisms in flower tissues. Together, these findings could pave the way for the discovery of key genes involved in lignin biosynthesis and lay the foundation for engineering C. tinctorius with enhanced lignin content.

Not Bot Horti Agrobo 49(3):12489

Figure S1. Distribution of HCT gene family shared by different plant species
Figure S2.
Figure S3. Logos of the 10 conserved protein motifs within CtHCT proteins obtained from the MEME online web-server
Figure S4. The three-dimensional structures of the ten putative interactor-proteins with a member of the putative CtHCT proteins
Figure S5. The graphical representation of the twenty conserved cis-regulatory elements identified in the 5' untranslated region (5' UTR) of the promoter sequence of CtHCT genes. Each colour represents a different type of cis-acting element within the promoter site of CtHCT genes.
Figure S6.
Figure S7. The heatmap was generated from the FPKM data obtained from RNA-seq data.