Analysis of codon usage pattern in Lonicera × heckrottii ‘Gold Flame’ based on chloroplast genome

Codon usage bias (CUB) is a characteristic feature of genomes, and characterizing the codon usage bias of the chloroplast genome can provide useful information on the evolution of plant species. Lonicera × heckrottii ‘Gold Flame’ is a member of the genus Lonicera with important ornamental value. However, the codon usage bias of its chloroplast genome has not been investigated. In this study, the base composition and various codon usage indices of 51 coding sequences (CDS) from the Lonicera × heckrottii ‘Gold Flame’ chloroplast genome were calculated using Codon W, DnaSP, CUSP of EMBOSS and SPSS. The results showed that the average GC content of the 51 CDS was 39.27% and the average ENC value was 48.75. The codon usage bias of the chloroplast genes was weak, with a preference for A/T-ending codons. The GC content followed the order GC1 (47.72%) > GC2 (39.89%) > GC3 (30.19%). Correlation analysis showed a significant positive correlation between the overall GC content and the GC1, GC2 and GC3 contents. Combined neutrality plot, ENC-plot and PR2-plot analyses indicated that codon usage in the chloroplast genes was affected by both mutation pressure and natural selection. In addition, eight optimal codons were identified in the chloroplast genome, most of which end with A/T. This study of codon usage bias in Lonicera × heckrottii ‘Gold Flame’ provides a basis for exploring its genetic structure and molecular evolution, and a reference for molecular breeding.


Introduction
The genetic code is an important link between DNA and protein, and it is a fundamental attribute of genes and genomes. The unequal use of synonymous codons is known as codon usage bias (CUB). CUB exists widely across organisms, such as diatoms (Krasovec and Filatov, 2019), bacteria (Dilucca Pavlopoulou and Georgakilas, 2020), animals (Galtier et al., 2020), humans (Dhindsa et al., 2020) and plants (Zhang et al., 2012; Nie et al., 2014; Wang et al., 2020b; Chakraborty et al., 2017). Quantitative analysis of codon usage bias can provide an important reference for species classification and for the study of evolutionary mechanisms.
Chloroplasts are subcellular organelles essential for photosynthesis and metabolism in green plants. Compared with the nuclear genome, the chloroplast genome is simple, small and highly conserved. It is widely used in research such as species identification, phylogenetics and adaptive analysis (Dobrogojski, Adamiec and Luciński, 2020). In recent years, with the development of high-throughput sequencing technology, a large number of chloroplast genomes have been sequenced, and analysis of their codon usage bias has become possible. The chloroplast genome is therefore a useful tool for research on plant systematics and evolution (Iriarte et al., 2021). Many plant chloroplast genomes have been sequenced and analysed for codon usage characteristics, including Helianthus annuus (Chen et al., 2021), Oryza (Chakraborty et al., 2020), Delphinium grandiflorum L. (Duan et al., 2021), Paeonia suffruticosa (Guo et al., 2020), Euphorbiaceae (Wang et al., 2020b), Hemiptalea davidii, Asteraceae (Nie et al., 2014), Solanum (Zhang et al., 2018), and Lonicera macranthoides (Hu et al., 2018).
Codon usage bias is regarded as an adaptation of species to environmental selection pressure, and it is affected by many factors. The possible evolutionary forces shaping codon usage patterns have been studied in the genomes of many organisms (Zhang et al., 2007; Iriarte et al., 2021). Codon usage bias differs among species and may be affected by mutation pressure, natural selection and genetic drift in the population. Previous research found that the codon usage bias of the chloroplast genome of some plants was mainly driven by natural selection, as exemplified in Delphinium grandiflorum L. (Duan et al., 2021), Hemiptalea davidii, Euphorbiaceae (Wang et al., 2020b), Helianthus annuus (Chen et al., 2021) and Mesona chinensis Benth (Tang et al., 2020). The codon usage bias of Asteraceae plastomes was affected by gene length (Nie et al., 2014), and that of the chloroplast genes of Triticum aestivum L. was influenced by translation level (Zhang et al., 2007). Additionally, the codon usage bias of the Porphyra umbilicalis and Nitrariaceae chloroplast genomes was affected by both natural selection and mutation pressure (Li et al., 2019; Chi et al., 2020). Codon usage bias is also associated with GC content, gene length, gene expression level and other factors. It reflects a selection-mutation balance, which is affected by natural selection, mutation pressure and genetic drift in a population (Bulmer, 1991; Eyre-Walker, 1991; Duan et al., 2021). Analysing the forces acting on codon usage bias provides a strategy for identifying the main driving force, and determining which factor is dominant remains a focus of research.
Caprifoliaceae contains more than 800 species, many of which are grown as ornamental plants around the world. Among them, Lonicera japonica Thunb. is the most famous, mainly as an anti-inflammatory herb (Wang et al., 2019; Pu et al., 2020). Lonicera × heckrottii 'Gold Flame' (also known as Goldflame honeysuckle) belongs to Lonicera and is a semi-evergreen vine. It is a cross between Lonicera sempervirens and Lonicera americana. It is characterized by a winding and climbing habit, and it blooms continuously during the growing season in most areas (Bruner et al., 2001). Its oval blue-green leaves and red stems make it very ornamental, and it is known as the most handsome climbing honeysuckle. It is a popular garden plant. As a vine, it can be trimmed into various shapes, which makes it an excellent hedge plant. It is also used as a raw material for producing jelly (Bruner et al., 2002). There have been a few studies on its cultivation and cutting propagation (Bruner et al., 2001, 2002). We had already sequenced the complete chloroplast genome of Lonicera × heckrottii 'Gold Flame' (GenBank accession: MZ522723) before this study. The chloroplast genome of Lonicera × heckrottii 'Gold Flame' is 155,437 bp long and includes 125 genes, with a typical quadripartite structure. However, the codon usage bias of the chloroplast genome of Lonicera × heckrottii 'Gold Flame' has not been reported.

Sequence data
The complete chloroplast genome sequence of Lonicera × heckrottii 'Gold Flame' (accession number: MZ522723) was downloaded from GenBank (https://www.ncbi.nlm.nih.gov/) in FASTA format. A total of 125 genes were obtained. Genes shorter than 300 bp, repeated sequences and genes from non-coding regions were removed. Finally, a total of 51 qualified CDS (coding DNA sequences) were obtained for subsequent codon usage bias analysis (Table S1).
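As an illustration only, the filtering step described above could be sketched in Python as follows. The function name `filter_cds`, the (name, sequence) input layout and the rule of treating a repeated gene name or an identical sequence as a duplicate are assumptions made for this sketch; they are not necessarily the procedure used in this study.

```python
def filter_cds(records, min_len=300):
    """Keep unique coding sequences of at least min_len bp.

    records: iterable of (gene_name, sequence) pairs (hypothetical layout).
    Returns a dict mapping gene name to upper-case sequence.
    """
    kept = {}
    seen = set()
    for name, seq in records:
        seq = seq.upper()
        if len(seq) < min_len:
            continue  # shorter than the 300 bp threshold
        if name in kept or seq in seen:
            continue  # assumed duplicate, e.g. a second copy in a repeat region
        kept[name] = seq
        seen.add(seq)
    return kept
```

A length threshold of this kind is commonly applied because codon usage indices computed from very short genes are dominated by sampling noise.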

Analysis of codon composition
Codon usage indices were calculated using Codon W 1.4.2, DnaSP 6.0 and the online CUSP program of EMBOSS (http://emboss.toulouse.inra.fr/cgi-bin/emboss/cusp). The indices included the effective number of codons (ENC), relative synonymous codon usage (RSCU), codon bias index (CBI), codon adaptation index (CAI), frequency of optimal codons (Fop), frequency of codon usage (FCU), the frequency of each of the four bases at the third codon position (A3, T3, C3 and G3), the GC content at the first, second and third codon positions (GC1, GC2 and GC3, where GC3 is the proportion of GC nucleotides at the third position of synonymous codons) and the overall GC content of genes (GC).
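The position-specific GC contents can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name `gc_by_position` is hypothetical, the input is assumed to be an in-frame coding sequence, and every codon is counted at the third position, whereas dedicated tools may restrict GC3 to synonymous codons.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3, GC) as fractions for one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = [0, 0, 0]                # G/C counts at codon positions 1, 2 and 3
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    n_codons = usable // 3
    gc1, gc2, gc3 = (c / n_codons for c in counts)
    return gc1, gc2, gc3, sum(counts) / usable
```

For example, the two codons `ATG` and `GCC` give GC1 = 0.5, GC2 = 0.5 and GC3 = 1.0.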

Neutrality plot
GC12 represents the average of GC1 and GC2. In the neutrality plot, a scatter plot is drawn with GC3 as the abscissa and GC12 as the ordinate, and each point represents an independent gene. If GC12 and GC3 are significantly correlated, the codon usage pattern is mainly affected by mutation pressure. Conversely, if there is no significant correlation between GC12 and GC3, the codon usage pattern is mainly affected by natural selection (Sueoka, 1988; Duan et al., 2021).
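A minimal sketch of how the neutrality plot data could be prepared is given below. It computes GC12 per gene and an ordinary least-squares line of GC12 on GC3 for drawing on the scatter plot. The function name `neutrality_fit` and the choice to fit a regression line are assumptions for illustration; the criterion used in this study is the significance of the correlation.

```python
def neutrality_fit(gc1, gc2, gc3):
    """Return (gc12, slope, intercept) for a neutrality plot.

    gc1, gc2, gc3: equal-length lists with one value per gene.
    The line is the least-squares fit of GC12 (ordinate) on GC3 (abscissa).
    """
    gc12 = [(a + b) / 2 for a, b in zip(gc1, gc2)]
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    return gc12, slope, mean_y - slope * mean_x
```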

ENC-plot
The ENC-plot is a plot of the effective number of codons, which reflects the relationship between codon usage bias and base composition. A two-dimensional scatter chart is drawn with the GC3 values as the abscissa and the ENC values as the ordinate, and a standard curve is added to the chart: $\mathrm{ENC} = 2 + \mathrm{GC3} + 29/[\mathrm{GC3}^2 + (1-\mathrm{GC3})^2]$. Points close to the standard curve indicate that codon usage bias might be driven only by mutation pressure; points that deviate from the curve indicate that factors other than mutation, especially natural selection, are involved (Wright, 1990; Gupta, Bhattacharyya and Ghosh, 2004).
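The standard curve can be evaluated directly from the formula above. The sketch below is illustrative; the function names `expected_enc` and `count_below_curve` are hypothetical, and GC3 is assumed to be expressed as a fraction between 0 and 1.

```python
def expected_enc(gc3):
    """Expected ENC under mutation pressure alone, for GC3 given as a fraction."""
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


def count_below_curve(points):
    """Count genes whose observed ENC lies below the standard curve.

    points: iterable of (gc3, observed_enc) pairs, one per gene.
    """
    return sum(1 for gc3, enc in points if enc < expected_enc(gc3))
```

For instance, a GC3 of 0.30 gives an expected ENC of 2 + 0.30 + 29/0.58 = 52.3, so a gene with that GC3 and an observed ENC below 52.3 falls under the curve.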

PR2-plot
PR2-plot analysis, also called parity preference analysis, explores whether the A, T, C and G bases at the third codon position of each gene are used in balance. A scatter diagram is drawn with G3/(G3+C3) as the abscissa and A3/(A3+T3) as the ordinate. The vector from the center represents the magnitude and direction of the PR2 bias. If A=T and C=G, codon usage is not biased by mutation or selection (Sueoka, 1999).
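The coordinates of one gene in the PR2-plot can be sketched as follows. This is an assumption-laden illustration: the function name `pr2_point` is hypothetical, the sequence is assumed to be in frame, and the third position of every codon is used.

```python
def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for one in-frame coding sequence.

    Returns None if either ratio is undefined (no G/C or no A/T at position 3).
    """
    third = cds.upper()[2::3]  # every third base, starting at codon position 3
    a, t, g, c = (third.count(base) for base in "ATGC")
    if g + c == 0 or a + t == 0:
        return None
    return g / (g + c), a / (a + t)
```

A point at (0.5, 0.5) corresponds to A=T and G=C at the third position; a point to the right of and below the center corresponds to G>C and T>A.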

Optimal codons
The genes were ranked by ENC value. The 10% of CDS with the highest ENC values and the 10% with the lowest ENC values were taken as two datasets (high/low expression). The RSCU value of each codon was calculated in the two datasets, and the difference between the two was recorded as ΔRSCU. Codons with RSCU>1.00 were defined as high-frequency codons. Codons with ΔRSCU>0.08 and RSCU>1.00 were defined as optimal codons (Wang et al., 2020a).
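The selection rule above can be sketched in Python. The code is a simplified illustration, not the software used in this study: the names `rscu` and `optimal_codons` are hypothetical, the standard genetic code assignments are assumed, ΔRSCU is taken as the high-expression value minus the low-expression value, and which ENC extreme is labeled "high expression" is left to the caller.

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks stop codons).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(seqs):
    """RSCU per codon, pooled over in-frame sequences.

    Stop codons and single-codon amino acids (Met, Trp) are skipped, as are
    amino acids that do not occur in the input.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or len(codons) == 1 or total == 0:
            continue
        for c in codons:
            # observed count divided by the mean count of the synonymous family
            values[c] = counts[c] * len(codons) / total
    return values


def optimal_codons(rscu_high, rscu_low, delta=0.08):
    """Codons with RSCU > 1 in the high set and delta-RSCU above the threshold."""
    return sorted(
        c for c, v in rscu_high.items()
        if c in rscu_low and v > 1.0 and v - rscu_low[c] > delta
    )
```

With three GAT codons and one GAC codon, the RSCU of GAT is 3 × 2 / 4 = 1.5, which illustrates why values above 1 indicate codons used more often than expected under equal synonymous usage.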

Correspondence analysis
Correspondence analysis (COA) is a multivariate statistical method used to explore the variation in codon usage among genes and to show the distribution of genes in a multi-dimensional space (Romero et al., 2003; Zhang et al., 2012); the data can be represented as vectors consisting of columns and rows. COA was performed on the RSCU values. The genes were plotted in a 59-dimensional space, in which each dimension corresponds to the RSCU value of one sense codon (Nair et al., 2012; Nie et al., 2014; Wang et al., 2020b).

Statistical analysis
SPSS 19.0 and Microsoft Excel were used for statistical analyses. Correlation analysis was based on Pearson's correlation coefficient. The neutrality plot, ENC-plot and PR2-plot were drawn in Microsoft Excel 2010. The box plot of GC contents was drawn using the online program ImageGP (http://www.ehbio.com/ImageGP/#opennewwindow).
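As a worked illustration of the correlation test, the sketch below computes Pearson's r and converts a tabulated two-tailed Student's t quantile into a critical value of r for a given sample size. The function names are hypothetical, and the t quantiles used in the example (about 2.0096 and 2.680 for 49 degrees of freedom at the 0.05 and 0.01 levels) are standard table values rather than figures from this study. With n = 51 they give critical values of about 0.2759 and 0.3575, which appear to match the thresholds quoted in the Results.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n paired observations (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the critical t for df = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)
```

Under these assumptions, a coefficient such as r = 0.2067 with n = 51 gives t of about 1.48, which is below 2.0096 and therefore not significant at the 0.05 level.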

Results
Base composition in Lonicera × heckrottii 'Gold Flame' chloroplast genome
The assembled chloroplast genome of Lonicera × heckrottii 'Gold Flame' was 155,437 bp in length. After removing genes shorter than 300 bp and repetitive sequences, a total of 51 CDS of Lonicera × heckrottii 'Gold Flame' were selected for analysis (Table 1). The average GC content of the chloroplast genes was 39.27% (Figure 1A; Table S2). The four nucleotides were unequally distributed: thymine (T), adenine (A), cytosine (C) and guanine (G) accounted for 30.82%, 29.91%, 21.32% and 17.94% of the 51 CDS, respectively. We summarized the number of CDS at each GC content level. The GC content of the 51 CDS ranged from 32.00% to 47.00% and was divided into three groups; the 39-40% group contained the most genes (seven), followed by the 42-43% group with six genes (Figure 1B). We also calculated the GC contents at the first, second and third codon positions (GC1, GC2 and GC3) (Figure 1C). The GC1 content of most genes was higher than the GC2 and GC3 contents, and the overall order was GC1>GC2>GC3. The range between the upper and lower quartiles was widest for GC2, followed by GC1, and narrowest for GC3.

The ENC values of the 51 CDS ranged from 35.55 to 57.74, with an average of 48.75, which shows that the codon usage bias of Lonicera × heckrottii 'Gold Flame' chloroplast genes was weak.
Correlation analysis showed that GC had an extremely significant positive correlation with GC1, GC2 and GC3, that GC3 had an extremely significant positive correlation with ENC (r=0.3575, p<0.01), and that GC1 was significantly correlated with GC2 and GC3 (r=0.2759, p<0.05) (Table 2). This indicates that GC content has an impact on codon usage bias.

The codon usage pattern of Lonicera × heckrottii 'Gold Flame' chloroplast genome
The number of amino acids encoded by the 51 genes ranged from 63 to 822, with an average of 345. The RSCU values varied greatly among codons. We identified a total of 29 codons with an RSCU value > 1 (Table 3). The codon TAA encoding Leu shows the highest RSCU value of 1.83, and the RSCU value of the termination codon TAC is only 0.32. Among the 29 codons, 16 end in T, 12 in A and 1 in G. This indicates that codon usage in the Lonicera × heckrottii 'Gold Flame' chloroplast genome tends towards A/T-ending codons.

Optimal codons
In this study (Table 4), 28 high-frequency codons were identified in the Lonicera × heckrottii 'Gold Flame' chloroplast genome (RSCU>1 in the high-expression group). Among them, 15 codons end with T, 10 with A, 2 with G and 1 with C. Eight optimal codons were selected (RSCU>1 and ΔRSCU>0.08): TCT, AGT, CCA, CAA, GAT and GGA end with A/T, and only ACG and CGC end with G/C. This further confirms that codons ending with G/C are less preferred in the chloroplast genome of Lonicera × heckrottii 'Gold Flame'.

Neutrality plot analysis
The relationship between GC3 and GC12 was analysed using the neutrality plot (Figure 2). GC3 ranged from 0.23 to 0.38 and GC12 from 0.35 to 0.55, with average values of 0.30 and 0.44, respectively. The correlation coefficient between GC3 and GC12 was 0.2067, below the critical value (α=0.05, r=0.2759), so the correlation was not significant. This suggests that codon usage in the chloroplast genome of Lonicera × heckrottii 'Gold Flame' was mainly affected by natural selection.

ENC-plot analysis
The ENC-plot was used to analyse the codon usage patterns of the 51 CDS in Lonicera × heckrottii 'Gold Flame' (Figure 3). The ENC values ranged from 35.55 to 55.74, with an average of 48.75. The GC3 content ranged from 0.23 to 0.38, with an average of 0.30. ENC and GC3 showed an extremely significant positive correlation (r=0.3575, P<0.01). Most of the genes were located near the standard curve, and 27 genes were located below it. This indicates that mutation pressure might play an important role in determining codon usage patterns.

Figure 3. ENC-plot analysis for chloroplast genes in Lonicera × heckrottii 'Gold Flame'

PR2-plot analysis
In the PR2-plot analysis, the points fell between 0.33 and 0.59 on A3/(A3+T3) and between 0.35 and 0.74 on G3/(G3+C3) (Figure 4). The points were not evenly distributed among the four quadrants of the PR2 plane, and most fell in the lower right quadrant relative to the center at 0.5. This indicates that, in terms of base usage frequency, T>A and G>C. This further confirms that the codon usage pattern of the Lonicera × heckrottii 'Gold Flame' chloroplast genome was affected not only by mutation pressure but also by other factors, such as selection.

Correspondence analysis
In this study, correspondence analysis revealed the main trends in codon usage of chloroplast genes in Lonicera × heckrottii 'Gold Flame' based on the variation of RSCU values. Axis 1 accounted for 11.60% and Axis 2 for 10.30% of the overall variation, whereas the next two axes accounted for 8.40% and 7.70%, respectively. Axis 1 and Axis 2 thus represented the main factors explaining the data. The correlation analysis suggested that Axis 1 and Axis 2 had no significant correlation with GC, GC1, GC2, GC3 or ENC (r=0.2759, P<0.05) (Table 5). Axis 3 was significantly correlated with GC and GC1. Axis 4 had a significant negative correlation with ENC, and a significant negative correlation with GC3 (r=0.3575, P<0.01). This result indicates that there is no obvious single trend in codon usage bias, probably because several factors are involved. In other words, the formation of the Lonicera × heckrottii 'Gold Flame' chloroplast codon usage pattern was complex.

Discussion
Codon usage bias is formed during the long-term evolution of organisms (Krasovec and Filatov, 2019), and it differs among species and genes (Karumathil et al., 2018). Many quantitative indicators reflect the pattern of codon usage, including GC, GC3 and ENC. ENC reflects the degree to which codon usage deviates from random selection, so it is often used as an important indicator of codon usage bias. The ENC value ranges from 20 to 61. An ENC value of 35 is usually used as the criterion to distinguish between strong and weak codon usage bias (Wright, 1990). The smaller the ENC value, the stronger the codon usage bias. In the present study, the average ENC value of the 51 CDS was 48.75, which shows that the Lonicera × heckrottii 'Gold Flame' chloroplast genome has a weak codon usage bias. This is consistent with Hemiptalea davidii, Helianthus annuus (Chen et al., 2021), Oryza (Chakraborty et al., 2020) and Delphinium grandiflorum L. (Duan et al., 2021).
The RSCU value reflects the codon usage pattern of different genes. If the RSCU value equals 1.0, the codon is used without bias. Codons with an RSCU value less than 1.0 have negative codon usage bias, and codons with an RSCU value greater than 1.0 have positive codon usage bias (Karumathil et al., 2018). In the present study, we identified 28 high-frequency codons with an RSCU value greater than 1.0 in the Lonicera × heckrottii 'Gold Flame' chloroplast genome. Among these codons, 25 were A/T-ending (T-ending: fifteen; A-ending: ten), two were G-ending and only one was C-ending. In addition, 8 optimal codons were identified in our investigation, the majority of which were A/T-ending. These results are similar to those for other plant species, such as Hemiptalea davidii, Helianthus annuus (Chen et al., 2021), the Poaceae family (Zhang et al., 2012), the Asteraceae family (Nie et al., 2014), Populus alba (Zhou et al., 2008) and Delphinium grandiflorum L. (Duan et al., 2021). However, high-frequency codons are not always conserved among species. For instance, TCC (Ser) is a high-frequency codon commonly used in monocots but not in dicots. Similarly, in the nuclear genome, opposite patterns of codon-ending bases might reflect differences between dicot and monocot species: codons in dicots are biased towards A/T-ending, whereas codons in monocots are biased towards G/C-ending (Zhang et al., 2012; Mazumdar et al., 2017). Codon characteristics show both species and genome specificity (Kawabe & Miyashita 2003; Chakraborty et al., 2020; Liu et al., 2020). In the present study, the chloroplast genomes of dicot, monocot and woody plants were not systematically analysed, and these patterns need to be verified in future work.

Factors affecting codon usage in the chloroplast genome of Lonicera × heckrottii 'Gold Flame'
Codon usage bias is an important evolutionary feature of genomes and has been widely documented in various organisms (Wang et al., 2020b). Previous studies have shown that various biological factors are involved in shaping synonymous codon usage patterns, such as ribosomal frameshifting, gene length, tRNA abundance, methylation, gene expression level, GC composition and mutation bias (Zhang et al., 2012; Mazumdar et al., 2017; Bergman and Tuller, 2020). However, mutation pressure and natural selection are the main factors affecting codon bias in many organisms, for example in Solanum (Zhang et al., 2018). In the chloroplast genomes of Camellia sinensis var. (Yengkhom, Uddin and Chakraborty, 2019), wetland plants (Deng et al., 2020), Hippophae (Wang et al., 2020b), Pisum L. (Bhattacharyya et al., 2019) and Nitrariaceae (Chi et al., 2020), natural selection plays an important role in shaping codon usage patterns, whereas the codon usage patterns of the Coffea arabica (Nair et al., 2012) and Populus alba (Zhou et al., 2008) chloroplast genomes are dominated by mutational pressure. The neutrality plot, ENC-plot and PR2-plot analyses indicated that the codon usage pattern of the Lonicera × heckrottii 'Gold Flame' chloroplast genome was shaped by both mutational pressure and natural selection. Similar results were observed in Mesona chinensis Benth (Tang et al., 2020), the Poaceae family (Zhang et al., 2012), Biebersteiniaceae and Nitrariaceae (Chi et al., 2020). The results of the COA further suggest that there was no obvious single trend in the codon usage bias of Lonicera × heckrottii 'Gold Flame', probably because several factors are involved. Therefore, plant codon usage patterns are affected by many factors and merit further study.

The optimal codons of the chloroplast genome in Lonicera × heckrottii 'Gold Flame'
Optimal codons are important representatives of codon usage patterns. Eight optimal codons were identified in the Lonicera × heckrottii 'Gold Flame' chloroplast genome, most of which end with A/T. The number, type and distribution of optimal codons on chromosomes are also considered to be related to gene expression. Codon usage patterns affect the expression of genes: the higher the gene expression level, the stronger the codon preference (Hershberg and Petrov, 2008; Tang et al., 2020). Codon preference affects the expression level of exogenous genes by regulating the accuracy and efficiency of translation (Zhang et al., 2012). The codon usage patterns and optimal codons identified in the Lonicera × heckrottii 'Gold Flame' chloroplast genome may therefore be important for exploring species evolution and for increasing the expression of exogenous genes in host cells.

Conclusions
This study systematically analysed the codon usage patterns in the chloroplast genome of Lonicera × heckrottii 'Gold Flame' and explored the factors influencing codon usage bias. The codon usage bias of the Lonicera × heckrottii 'Gold Flame' chloroplast genome was weak, with a preference for A/T-ending codons. The codon usage pattern was mainly affected by mutation pressure and natural selection. This study reveals for the first time the codon usage patterns and influencing factors in the chloroplast genome of Lonicera × heckrottii 'Gold Flame', and provides theoretical support for further research on codon optimization and phylogenetic mechanisms.

Authors' Contributions
Conceptualization: JQZ; Data curation and Writing-Original draft: JQZ; Investigation: HCL and JQZ; Methodology: HCL and WTX; Writing-review and editing: JQZ and KYZ. All authors read and approved the final manuscript.

Ethical approval (for research involving animals or humans)
Not applicable.