Evaluation of Variability in Tunisian Olea europaea L. Accessions using Morphological Characters and Computational Approaches

The olive tree (Olea europaea L.) has been cultivated for millennia in the Mediterranean basin, and its oil has been an important part of human nutrition in the region. Morphological and biological characters have been widely used to describe and distinguish olive accessions. A comparative study of the morphological characters of olive accessions grown in Tunisia was carried out and analyzed using Bayesian Networks (BN) and Principal Component Analysis (PCA). The average fruit and kernel weights were 2.27 grams and 0.41 grams, respectively. A relatively moderate share of the variation (51.22%) was explained by four principal components. BN revealed that geographical localisation plays a role in the increase of tree habit, size of lenticels and leaf shape. A dendrogram was constructed to classify the studied olive accessions; cutting it at a similarity value of 0.645 divided the accessions into four main groups. We propose a novel method of analysis based on a three-step scheme: first the data set is clustered, then olive tree features are evaluated and the relationships among them are studied and highlighted, and finally the collected features are subjected to a global principal component analysis. The results showed that core surface was negatively correlated with geographical location (r = -0.52, p < 0.05) and maturation period (r = -0.539, p < 0.05). Number of lenticels was positively correlated with lenticel size (r = 0.632, p < 0.05). Core shape was negatively correlated with fruit shape (r = -0.759, p < 0.05). On the basis of these findings, this research confirmed that morphological markers are a preliminary tool to characterize olive oil accessions.


Introduction
The cultivated olive (Olea europaea L.) is a long-lived evergreen tree native to the Mediterranean basin (Poljuha et al., 2008); however, it is now found in several countries in Europe, North and South America, and Australia (Therios, 2009). The Mediterranean climate offers ideal growing conditions for olives: a long and hot growing season with a relatively cool winter (Connell, 1994). The leaves of an olive tree are thick, leathery and oppositely arranged, and live for two to three years (Martin, 1994). The leaves are grey-green in colour. Stomata are located only on the lower surfaces of the leaves, and are involved in restricting water loss and protecting the leaves from harmful ultraviolet radiation (Fernandez et al., 1997). The fruit is a drupe which results from the growth of the ovary with only one of the ovules developing (Proietti et al., 1999). The size of the olive fruit is variable, even within the same tree, and depends on accession, fruit load, soil fertility, available water and cultural practices (Therios, 2009). Complete fruit development, from anthesis to ripening, lasts 25-30 weeks, and after the first four weeks the parts of the fruit can be clearly identified. The endocarp consists of the pit and seed. Volatile and phenolic compounds found in the fruit are responsible for the aroma, taste and many of the health benefits associated with olive oil (Mez-Rico et al., 2006). The concentrations of these compounds change with the degree of ripening of the fruit. The quality of pickling olives is greatly affected by the concentrations of organic acids and sugars in the fruit, owing to the role of these compounds in the fermentation process (Patumi et al., 1999). It is therefore important to know when maturation has been reached, in order to harvest the fruit when these compounds are at optimal levels. The fruit maturation process is characterized by a change in fruit colour.

Morphological characterisation and clustering analysis of the data
Morphological traits were recorded over three successive years (2013, 2014 and 2015), either in collection fields or on farms, following the indications provided in the literature (Ben Ayed et al., 2016). Quantitative traits (fruit and kernel weights) were transformed into ordinal characters by discretization to minimise environmental effects. The experiment covered 21 traits recorded on 30 Olea europaea L. accessions and was carried out to highlight possible connections between traits and accession characteristics using statistical analysis and classification systems. For each accession, five healthy trees were selected and measured. For each tree, morphological observations were made on 20 leaves and 20 fruits. Maximum length and width of the leaves were measured using a mm scale, and maximum length and diameter of the fruits were measured using a screw gauge. After fruit characterization, the stone was removed and its dimensions were also measured using a screw gauge. The morphological study integrated both quantitative and qualitative variables. For the tree, the height and the circumference of both the canopy and the trunk were determined. For the leaf, the length and width were determined. For the flower, all observations on inflorescences and flowers were made on 20 flowers from the middle part of the fruit-bearing branch. For the fruit, measurements were carried out on 20 drupes collected uniformly; the polar length, cross-sectional width, weight and length/width ratio were determined. For the stone, the polar length, cross-sectional width, weight, length/width ratio, number of grooves and flesh-to-stone ratio were determined. Endocarp observations were made on 20 well-cleaned endocarps taken from the same fruits used for the fruit description. Furthermore, other qualitative variables were recorded according to the methodology for primary characterization of olive varieties (IOOC, 1997; Barranco et al., 2000; Ben Ayed et al., 2016).

Data analysis dendrogram
Qualitative and quantitative data sets were analysed. Quantitative variables were standardized (mean = 0, variance = 1) for numerical analysis (Manly, 1986). Qualitative data sets were scored as the presence or absence of a character. The data matrix was converted into a matrix of similarity (S) values using the Jaccard coefficient (Jaccard, 1908). For a pair of accessions, i and j, this coefficient is calculated as:

$$S_{ij} = \frac{n_{ij}}{n_{ij} + n_i + n_j}$$

where n_i is the number of bands (here, character states) present in accession i and absent in accession j, n_j is the number of bands present in j and absent in i, and n_ij is the number of bands shared by the two accessions i and j. A tree was then inferred using the unweighted pair group method with arithmetic average (UPGMA) clustering algorithm. All analyses were done using the NTSysPc program version 2.1 (Rohlf, 1999). Qualitative traits were encoded by rank-number coding, with multi-state traits encoded according to their class. The quantitative traits were not encoded and were analysed directly in raw form in the next operation. Five values were used to calculate the average of each quantitative trait.
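The similarity and clustering steps described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the NTSysPc procedure itself: the accession names, the presence/absence profiles and the cut value of 0.6 are hypothetical, and the dendrogram cut is emulated by stopping the UPGMA merges once the best average similarity falls below the cut.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two presence/absence (0/1) vectors."""
    n_ij = sum(1 for x, y in zip(a, b) if x and y)       # shared by both
    n_i = sum(1 for x, y in zip(a, b) if x and not y)    # present only in a
    n_j = sum(1 for x, y in zip(a, b) if y and not x)    # present only in b
    denom = n_ij + n_i + n_j
    return n_ij / denom if denom else 1.0

def upgma(profiles, cut):
    """Average-linkage (UPGMA) merging, stopped at similarity `cut`."""
    names = list(profiles)
    sim = {frozenset(p): jaccard(profiles[p[0]], profiles[p[1]])
           for p in combinations(names, 2)}
    clusters = [(n,) for n in names]

    def avg_sim(c1, c2):
        # Unweighted arithmetic average over all cross-cluster pairs.
        total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        s, c1, c2 = max(((avg_sim(a, b), a, b)
                         for a, b in combinations(clusters, 2)),
                        key=lambda t: t[0])
        if s < cut:          # equivalent to cutting the dendrogram here
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return clusters

# Hypothetical presence/absence profiles for four accessions.
profiles = {
    "acc1": [1, 1, 0, 1, 0, 1],
    "acc2": [1, 1, 0, 1, 0, 0],
    "acc3": [0, 0, 1, 0, 1, 1],
    "acc4": [0, 0, 1, 0, 1, 0],
}
groups = upgma(profiles, cut=0.6)
print(groups)
```

With these illustrative profiles, the two pairs of similar accessions are merged and the two resulting groups stay separate because their average similarity is far below the cut.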

Bayesian networks modelling
Bayesian networks (BNs) are a powerful framework for decision support under uncertain knowledge.

During maturation, the fruit colour changes from green to dark purple or black (Proietti et al., 1999). Each accession has a different intensity of the final colour, which indicates ripeness. Olives have great commercial, economic and social importance in Tunisia, which is a major producer and exporter of olive oil (Eyres, 2012). Selecting adequate accessions is considered a crucial requirement because of the need to improve efficiency in growing olives and extracting the oil. Tunisian accessions have rich genetic diversity, and many cases of synonymy and homonymy have been found among them.
Olive trees can be considered very complex dynamical systems, showing a remarkable ability to adapt their metabolism to different environmental conditions. Many important processes in olive tree physiology are controlled by dynamic interactions between various endogenous and exogenous parameters. In this context, computational methods such as classification and clustering have been used successfully by biologists to analyze experimental data. In fact, the most commonly used computational method for analysing olive tree data is clustering. While clustering provides a compact summarisation of the data and might point to functional relationships between clustered varieties, it suffers from the following shortcomings. Firstly, clustering is based on a global correlation measure, which obscures relationships that exist over only a subset of the data. Secondly, clustering fails to detect interactions between different varieties that go beyond linear correlation. Finally, it is impossible to incorporate additional types of information, such as experimental details.
Therefore, we propose an alternative unified approach for data estimation, based on Bayesian Networks (BNs), that takes account of the hierarchical structure among covariates. A BN is a probabilistic graphical model for describing relationships in a wide variety of domains (Adusei-Poku et al., 2007), including artificial intelligence, decision theory and data fusion (Pourret et al., 2008). Once the graph structure and the conditional probability distribution of each node given its parents are specified, the BN fully determines the joint probability distribution over all the variables in the domain. As a result, we can answer various probabilistic queries (probabilistic inference) about the domain by conditioning on and marginalizing over sets of variables (Koller and Friedman, 2009). The aim of this study was firstly to evaluate the associations among autochthonous olive accessions grown in Tunisia on the basis of a combination of morphological and agronomic parameters by clustering, and secondly to analyse accurately the type and origin of the relationships between the studied parameters using Bayesian Networks (BN) and Principal Component Analysis (PCA).
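The statement that a BN determines the joint distribution can be written out explicitly. The following is the standard factorization for a directed acyclic graph, given here as an illustration; the notation Pa(X_k) for the parents of node X_k is introduced for this sketch and does not appear in the original text.

```latex
P(X_1, \dots, X_n) = \prod_{k=1}^{n} P\bigl(X_k \mid \mathrm{Pa}(X_k)\bigr)
```

As a small example using trait labels from this study, if geographical localisation A were the only parent of tree habit B and of leaf shape E, the joint distribution of these three traits would factor as:

```latex
P(A, B, E) = P(A)\, P(B \mid A)\, P(E \mid A)
```

A query such as P(B | E) can then be obtained by marginalizing over A and conditioning on E.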

Plant material
In this investigation, 30 accessions of Olea europaea L., representing the diversity of Tunisian olive germplasm, were collected from different local farms. For each accession, five 30-40-year-old olive trees were selected. Trees were watered according to usual agricultural practice and were exposed to natural sunlight. All measurements were made between November and early December. Table 1 presents the Olea europaea L. accessions and their studied traits. All samples were collected shortly after sunrise, placed within wet paper layers, wrapped in polyethylene bags and transferred immediately to the laboratory.

Bayesian networks come from artificial intelligence studies and constitute one of the most coherent techniques for the acquisition and modelling of complex systems. They have been applied to a large range of problems, including in biology (Ennouri et al., 2015, 2016). Since the data are discontinuous and the experimental data are limited, and since it is well known that the application of BN requires a lot of data for the learning and testing procedures, our proposed methodology includes the following stages for building the model: data normalization and construction of the Bayesian network. Data from the different experiments were normalized and a Bayesian network (Pearl, 2000) was generated as follows: two nodes i and j having a partial correlation are connected by a non-oriented edge. The orientation is determined by a heuristic method based on the following test, where $B_{ij} = w_{jj}\sigma_{ii} / (w_{ii}\sigma_{jj})$: if $B_{ij} > 1$, the arc is oriented from i to j; if $B_{ij} < 1$, the arc is oriented from j to i. The other edges, with $B_{ij} = 1$, remain undirected. The graph with all directed arcs constitutes the Bayesian network. Note that it does not necessarily include all nodes contained in the original network (Opgen-Rhein and Strimmer, 2007). An advantage of the Bayesian network is that it allows the parent nodes on which child nodes directly depend to be deduced. The R program was used to analyze the data.
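The edge-orientation test can be illustrated with a short plain-Python sketch. The text does not define w and σ; the sketch assumes that w_ii are the diagonal entries of the inverse covariance (precision) matrix and σ_ii the variances, which is one plausible reading in the spirit of Opgen-Rhein and Strimmer (2007). The covariance matrix, the function names and the tolerance are hypothetical, and the code is not the R procedure used in the study.

```python
def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_corr(omega, i, j):
    """Partial correlation of i and j given all other variables."""
    return -omega[i][j] / (omega[i][i] * omega[j][j]) ** 0.5

def orient(sigma, omega, i, j, tol=1e-9):
    """Return (parent, child) for edge i-j, or None if undirected."""
    b = (omega[j][j] * sigma[i][i]) / (omega[i][i] * sigma[j][j])
    if b > 1 + tol:
        return (i, j)      # arc oriented from i to j
    if b < 1 - tol:
        return (j, i)      # arc oriented from j to i
    return None            # B_ij = 1: edge stays undirected

# Hypothetical covariance matrix of three traits (assumed values).
sigma = [[4.0, 1.2, 0.0],
         [1.2, 1.0, 0.3],
         [0.0, 0.3, 1.0]]
omega = invert(sigma)      # assumed meaning of w: precision matrix

print(round(partial_corr(omega, 0, 1), 3), orient(sigma, omega, 0, 1))
```

In practice an edge would be drawn only when the partial correlation is judged significantly different from zero; that testing step is omitted from this sketch.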

Principal Component Analysis
Principal component analysis (PCA) is a multivariate statistical technique used by almost all scientific disciplines. PCA analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated. Its goal is to extract the important information from the data table and to express this information as a set of new orthogonal variables called principal components. PCA of the phenotypic data was performed using OriginPro 9.1. An absolute value of 0.50 was used in the loading matrices to select the traits in a particular principal component (PC). Correlations between variables were calculated with Pearson correlation coefficients. Pearson's correlation coefficient is a statistical method of quantifying the association between two variables.
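The two numerical rules mentioned above, the Pearson coefficient and the 0.50 loading cutoff, can be sketched as follows. The ordinal scores and loadings are hypothetical values chosen for illustration, and treating the cutoff as inclusive (|loading| >= 0.50) is an assumption, since the text does not say whether the bound itself is included.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def traits_in_pc(loadings, cutoff=0.50):
    """Traits whose absolute loading on a PC reaches the cutoff."""
    return [trait for trait, value in loadings.items() if abs(value) >= cutoff]

# Hypothetical ordinal scores for two traits on five accessions.
lenticel_number = [1, 2, 2, 3, 3]
lenticel_size = [1, 1, 2, 2, 3]
r = pearson(lenticel_number, lenticel_size)

# Hypothetical loadings of three traits on one principal component.
selected = traits_in_pc({"R": -0.71, "A": 0.66, "B": 0.12})
print(round(r, 3), selected)
```

The sign of a selected loading is kept separately from the selection step, so that negatively loaded traits such as "R" in this example are still retained.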

Results and Discussion
Accession names, geographical localisation, fruit weight and kernel weight are presented in Table 1. Fruit weight ranged from 0.68 grams to 9.6 grams, and kernel weight ranged from 0.15 grams to 1.35 grams. The average fruit and kernel weights of the studied accessions were 2.27 grams and 0.41 grams, respectively.

Genetic diversity and relationship between Tunisian olive accessions
To explain the genetic relationships among the 30 Tunisian olive accessions, a dendrogram was produced using UPGMA cluster analysis and the Jaccard similarity coefficients over 21 morphological characters (Fig. 1). The accessions studied can be divided into seven main groups by cutting the dendrogram at a similarity value of 0.75. The first group comprises the 18 accessions at the top of the dendrogram, including accessions originating from the centre and south regions. The second group contains 'Fakhari', 'Meski', 'Gerboui' and 'Besbessi'. The third group contains 5 accessions. The fourth group comprises 3 accessions, which are also clonally related. The UPGMA clustering obtained from the agronomic distance matrix showed rather high variability among most Tunisian olive accessions, which generally clustered according to their geographical origin. For example, North Tunisia accessions were clustered in Groups 2, 3 and 4, whereas accessions that originate from the centre and the south of Tunisia were clustered in Group 1. Classification of accessions based on their geographic origin was also demonstrated in a study on a larger geographic scale by Belaj et al. (2001), with accessions from numerous Mediterranean Basin countries. Moreover, several previous studies demonstrated that geographic and genetic structure is observed not only among accessions of different countries, but also among accessions within the same country (Claros et al., 2000; Carriero et al., 2002; Ben Ayed et al., 2011, 2015a). Likewise, Sanz-Cortés et al. (2001) observed, in a specific region of Spain, subclustering according to geographic origin within that region.
The grouping of accessions from the same or nearby regions suggests a common genetic base and an autochthonous source for these accessions. This result coincides with the hypothesis of an autochthonous origin of most olive accessions as well as their limited diffusion from their centers of origin (Barranco and Rallo, 2000; Belaj et al., 2001; Besnard et al., 2001). Accession intercrossing and crosses with wild accessions, along with local selection of outstanding seedlings and subsequent vegetative cloning, could have led to a large number of varieties around their possible original areas of cultivation. Conversely, using DNA-based markers such as SSR and SNP, Ben Ayed et al. (2011, 2015a) reported that most Tunisian olive accessions clustered according to their fruit size or commercial use (table or oil olive), but no classification based on geographical origin was shown. In addition, they found a comparable grouping pattern among 'Chetoui' and 'Rkhaymi', which were grouped with the northern large-fruited accessions.

Fig. 1. Dendrogram of 30 Tunisian olive accessions generated by unweighted pair group method using an arithmetic average cluster analysis using Jaccard similarity coefficients from 21 morphological characters.

Fig. 2. Circle of correlations based on morphological data in olive varieties. Plot of first two PCs with contributing phenotypic traits (geographical localisation: A, tree habit: B, tree vigor: C, foliage density: D, leaf shape: E, fruit shape: F, symmetry of fruit: G, position of maximum diameter: H, nipple of fruit: I, type of the top of fruit: J, base of fruit: K, number of lenticels of fruit: L, Size of lenticels: M, core form: N, core symmetry: O, top of core: P, base of core: Q, core surface: R, flowering period: S, maturation period: T and oil production level: U).

Principal Component Analysis
Principal component analysis (PCA) on the basis of phenotypic data on fruit morphology and colour traits identified four principal components (PCs) explaining more than 51% of the total variation (Table 2). The circle of correlations is presented in Fig. 2. In a correlation circle, each measured variable is shown as a vector, which signals the combined strength of the relationships between the measured variable and two PCs (vector length) and whether these relationships are positive or negative (vector direction). The angle between two vectors signals the degree of correlation between two measured variables. A right angle indicates that two variables are completely uncorrelated; an angle of zero or 180 degrees between two variables indicates complete positive or negative correlation, respectively. Correlation circles allow for graphical examination of the relationships among indices, and of the consistency of these relationships among olive groups (Ben Ayed et al., 2015a, 2015b). The eigenvalues, contribution and cumulative contribution of the studied characters are summarized in Table 2. The first PC had high loadings for core surface "R", geographical localisation "A", base of core "Q", maturation period "T", oil production level "U", size of lenticels "M", fruit shape "F" and leaf shape "E", and explained 18% of the total phenotypic variation. Core surface "R", fruit shape "F" and base of core "Q" were negatively correlated, whereas geographical localisation "A", maturation period "T", oil production level "U", size of lenticels "M" and leaf shape "E" were positively correlated. The second PC had significant loadings for type of the top of fruit "J", symmetry of fruit "G", core form "N", top of core "P", core symmetry "O", flowering period "S" and fruit shape "F", and explained 16% of the total phenotypic variation. The highest positive loading was for type of the top of fruit "J" and the highest negative loading for fruit shape "F". The third PC explained 9% of the total phenotypic variation. Finally, the fourth component explained 8% of the total phenotypic variation. This was a relatively moderate level of variation (51.22%) explained by only four PCs using morphological data. The eigenvalue is the variance explained by a PC. This variance can also be presented as a proportion of the total phenotypic variance, which is 100% in total.

Table 2. Eigenvalues, contribution and cumulative contribution of principal components of 21 characters (Geographical localisation: A, tree habit: B, tree vigor: C, foliage density: D, leaf shape: E, fruit shape: F, symmetry of fruit: G, position of maximum diameter: H, nipple of fruit: I, type of the top of fruit: J, base of fruit: K, number of lenticels of fruit: L, Size of lenticels: M, core form: N, core symmetry: O, top of core: P, base of core: Q, core surface: R, flowering period: S, maturation period: T and oil production level: U)
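Two of the readings described above lend themselves to a short numerical check: the cumulative contribution of the four PCs, and the link between the angle of two vectors in the correlation circle and their correlation. The sketch below uses the rounded percentages quoted in the text (18, 16, 9 and 8), so it reproduces 51% rather than the 51.22% obtained from the unrounded values; the example vectors are hypothetical. Reading the cosine of the angle as a correlation is only a good approximation for variables whose vectors lie close to the unit circle.

```python
from math import acos, degrees, sqrt

def cumulative(shares):
    """Running total of the percentage of variance explained per PC."""
    total, out = 0.0, []
    for share in shares:
        total += share
        out.append(total)
    return out

def angle_deg(u, v):
    """Angle in degrees between two vectors of the correlation circle."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return degrees(acos(cosine))

# Rounded contributions of PC1 to PC4 as quoted in the text.
shares = [18, 16, 9, 8]
print(cumulative(shares))

# Hypothetical variable vectors (coordinates on PC1 and PC2).
print(angle_deg((1.0, 0.0), (0.0, 1.0)))    # right angle: uncorrelated
print(angle_deg((0.8, 0.2), (-0.8, -0.2)))  # opposite: negative correlation
```

If the PCA is run on the 21 standardized characters, the total variance equals 21, so a contribution of 18% would correspond to an eigenvalue of about 3.8; this conversion is an assumption about how the analysis was set up, not a value taken from Table 2.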

Bayesian networks modelling
BNs are directed acyclic graphs composed of nodes (the variables of the problem) and arcs that encode conditional probabilistic independencies between the nodes. These graphical models are very attractive for their ability to explain probabilistic interactions connecting variables.
In fact, they have been shown to capture causal relationships between variables, and they can show excellent forecast accuracy even with relatively small sample sizes (Benson, 2015). To achieve the objectives mentioned above, Bayesian network modelling was used. We considered 21 nodes, as represented in Fig. 3. Correlation coefficients among all morphological traits in olive accessions are presented in Table 3. The physical and mechanical properties of agricultural products are very important factors in the design of processing, grading, transporting and other agricultural machinery (Altuntaş et al., 2007). Moreover, the shape and the size of the product are the most important physical properties (Altuntaş et al., 2005). In our study, base of core "Q" has a double connection: "Q" is negatively related with core symmetry "O" and positively with base of fruit "K". Core symmetry "O" and core form "N" were both negatively influenced by fruit shape "F", and core symmetry "O" directly influenced symmetry of fruit "G". El-Soaly (2008) found that olive fruit length, diameter and weight were directly proportional to those of the pit for the investigated varieties.
Geographical localization "A" plays a role in the increase of tree habit "B", size of lenticels "M" and leaf shape "E". In fact, olive leaves are affected by geographical region (Al-Rimawi et al., 2014). However, localization "A" is involved in the decrease of foliage density "D", core surface "R" and top of core "P". It has been demonstrated that pit development is a continuous and progressive process and that the length of this period can vary according to water status (Rapaport et al., 2004). Several studies have shown that climatic factors such as temperature and precipitation, which are closely related to geographical location, have an effect on plant physiology (Pannelli et al., 1994; Ryan et al., 1998). Furthermore, symmetry of fruit "G" has an active function in the enhancement of nipple of fruit "I" and the opposite effect on position of maximum diameter "H". Moreover, position of maximum diameter "H" is affected by leaf shape "E". The top of core "P" positively influenced the type of the top of fruit "J", and the base of fruit "K" is indirectly connected with the top of core "P" through the type of the top of fruit "J". Moreover, core surface "R" has simultaneous associations: "R" is negatively connected with maturation period "T" and oil production level "U". The number of lenticels of fruit "L" directly influenced size of lenticels "M", and "M" has a retroactive effect on tree vigor "C". Indeed, tree vigor has been defined as the overall physiological condition or "health" of a tree in a given environment (Wargo, 1978). The physiological condition of the tree determines its response to defoliation, but it is difficult to measure (Kozlowski, 1969). Difficult environmental conditions such as drought, late spring frosts, ice storms, excessive moisture, competition, cutting, slash disposal and other disturbances can cause stress on trees that influences and changes overall tree vigor. Trees can recover from stresses over time. Moreover, our findings are in concordance with several studies. In fact, De La Rosa et al. (2007) and Tous et al. (2010) demonstrated that 'Arbequina', a low-vigor accession, is currently considered the most important accession for super high-density olive groves. Lenticels are lens-shaped macroscopic openings that occur on the surfaces of roots, shoots and some fruits (Kuo-Huang and Hung, 1995). Lenticels are essential to the plant, since they control gaseous exchange for photosynthesis, respiration and transpiration in the absence of stomata (Mauseth, 1988). The gross anatomy of mature lenticels in many plants has been described, but only a few published reports refer to the development of lenticels (Jacob et al., 1989).

Conclusions
In conclusion, this research indicated that morphological markers are a preliminary tool to characterize olive oil accessions. In fact, this study revealed that the distribution of Tunisian olive accessions is based on geographic origin (North, Center and South). The use of DNA-based markers such as SNPs would be suitable to confirm our findings and to provide automated tools for the identification, characterization and classification of olive accessions. Since the clustering of Tunisian olive accessions is a complicated task, a combination of several marker systems is needed to provide a more complete understanding of the diversity of the available Tunisian olive tree accessions and of the way in which it can best be used for olive breeding.

Fig. 3. Bayesian networks connecting related variables (geographical localisation: A, tree habit: B, tree vigor: C, foliage density: D, leaf shape: E, fruit shape: F, symmetry of fruit: G, position of maximum diameter: H, nipple of fruit: I, type of the top of fruit: J, base of fruit: K, number of lenticels of fruit: L, Size of lenticels: M, core form: N, core symmetry: O, top of core: P, base of core: Q, core surface: R, flowering period: S, maturation period: T and oil production level: U).

Table 3. Correlation coefficients among all morphological traits in olive varieties