Chestnut (Castanea sativa Mill.) cultivar classification: an artificial neural network approach

The present study investigated the possible use of artificial neural networks (ANN) to classify five chestnut (Castanea sativa Mill.) varieties. For chestnut classification, back-propagation neural networks were built on the basis of physical and mechanical parameters. Seven physical and mechanical characteristics of chestnut (geometric mean diameter, sphericity, volume of nut, surface area, shell thickness, shearing force and shearing strength) were determined. These characteristics differed significantly among varieties and could therefore be used in the classification of varieties. In the developed ANN model, the network structure is 7-(5-6)-1: 7 input neurons, two hidden layers with 5 and 6 neurons, and 1 output neuron. Tansig transfer functions were used in both hidden layers, while a linear transfer function was used in the output layer. For training, the ANN model gave an R value of 0.99999 and an RMSE of 0.000083. For testing, the R value was 0.99999 and the RMSE was 0.00031. The average error between the values predicted by the ANN model and the measured values was 0.011%. The results of the ANN model agreed very closely with the measured data, and the model can classify chestnut varieties in a fast and reliable way.


Introduction
Chestnut is an important plant that is consumed in many different ways and is highly valued for its rich nutritional content. Health and composition studies show that chestnut and extracts of chestnut trees have significant potential as a food ingredient or functional food. Chestnut is produced mainly in China, Bolivia, Turkey, Republic of Korea, Italy and Portugal. Globally, around 2.3 million tons is produced, and Turkey is the third largest producer with 63 thousand tons according to the FAO statistics (FAO, 2017). In terms of geography, chestnut is found in three major areas: Europe with Castanea sativa Mill., Asia with Castanea crenata Sieb. and Zucc. (Japan) and Castanea mollissima Bl. (China and Korea), and North America with Castanea dentata Borkh. (Bounous, 2005; Lang et al., 2006). Anatolia is a gene center and one of the oldest areas where Castanea sativa Mill., which is also known as European chestnut or sweet chestnut, is cultivated (Soylu, 2004). Turkey is the main producer of Castanea sativa, global production of which is around 190 000 tonnes/year (Bounous et al., 2001). Chestnut is an important product in Turkey both as an orchard crop and in terms of agroforestry; it is mainly produced in the Black Sea, Marmara and Aegean regions of Anatolia. In Turkey, there are 17 varieties, all of which belong to the species Castanea sativa Mill., registered by the Ministry of Agriculture and Forestry Seed Registration and Certification Directorate (TTSM, 2019).
In addition to being consumed fresh after frying or boiling, chestnut is also widely consumed after being processed in the food industry into dried, flour or confectionery products. Because size, taste, texture, and starch and sugar content differ among varieties, determining the variety and preparing the product for consumption or processing is an important post-harvest stage for both types of consumption. For this reason, it is important to classify chestnut varieties in a fast, reliable and highly accurate way on the basis of a few classification characteristics. Models obtained in this way for identifying and classifying products can be transferred to many different real-time applications.
In agricultural product identification and classification, artificial neural networks have gained common acceptance. The choice of classification criteria can influence classification accuracy: the data set is separated according to these criteria, and each object is assigned to one of the resulting categories. In recent years, neural network (NN) methods have frequently been used to develop classification criteria, and they are commonly used for pattern identification. These networks are inspired by biological nervous systems and have proven powerful in handling ambiguous data and problems that require interpolation over large amounts of data. NNs do not perform a program of instructions sequentially; instead, they explore a great number of hypotheses at the same time through massive parallelism (Lippman, 1987), although this holds only when a specific hardware implementation is used. Neural network methods build a computing network of highly interconnected processing elements called "neurons" or "nodes". Neural networks can solve problems in which some inputs and the corresponding output values are known but the relationship between them is difficult to understand or to express as a mathematical function. For this reason, these classifiers have great potential in grading, sorting and identifying agricultural products (Visen et al., 2002; Dubey et al., 2006).
An ANN is structured in three kinds of layers: input, hidden and output (Figure 1). The layers consist of neurons that are linked to each other by weights. Although there are many learning algorithms for determining the weights, the most popular is the back-propagation learning algorithm, which minimizes the total error by changing the weights. The inputs from the previous layer are multiplied by the weights of the corresponding connections.
Every neuron passes its weighted input through a transfer function to produce its output. The transfer function can be linear or nonlinear. The data are divided into a training set and a test set. The aim is to determine the weight values that minimize the difference between the actual and the estimated output values at the output layer. The trained network is then tested with the data in the test set. When the test error reaches the predetermined tolerance value, the training of the network is terminated (Kalogirou, 2001).
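The idea of reducing the output error by changing the weights can be illustrated with the simplest possible case. The sketch below trains a single linear neuron by gradient descent on the squared error (the delta rule), which back-propagation generalizes to networks with hidden layers. It is only an illustration under stated assumptions: the function name, learning rate, epoch count and the toy data are hypothetical and are not taken from the study, which trained a multi-layer network in MATLAB.

```python
def train_linear_neuron(samples, lr=0.2, epochs=500):
    """Fit output = w . x + b by per-sample gradient descent on squared error."""
    n_inputs = len(samples[0][0])
    w = [0.0] * n_inputs  # connection weights, started at zero
    b = 0.0               # bias term
    for _ in range(epochs):
        for x, target in samples:
            # Weighted sum of the inputs (linear transfer function)
            y = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = y - target
            # Move each weight against the gradient of 0.5 * err**2
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Hypothetical toy data generated from target = 2 * x + 1
toy_samples = [([0.0], 1.0), ([0.5], 2.0), ([1.0], 3.0)]
weights, bias = train_linear_neuron(toy_samples)
print(weights, bias)
```

On this noise-free toy data the weight and bias approach the generating values; with real measurements the error would settle at a nonzero minimum instead.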
The size, shape, texture and content (starch, sucrose, etc.) of chestnut are important quality indicators. Size and shape, which differ among varieties, are prominent criteria for fresh consumption and for the pastry and sugar industries. Variety-based classification of chestnut according to these characteristics, which are driven by consumer demand, is therefore also commercially important. Although these factors largely determine the grade and price of chestnut, an objective and rapid measurement is not possible without precision instruments and human expertise. Considering the large number of chestnut genotypes and varieties, smart systems that use contemporary technologies to identify and classify the product rapidly and reliably from its specific characteristics have become a necessity.
This paper describes a statistical pattern classification technique developed to identify chestnut varieties objectively and consistently from these properties. Identification and classification methods that can easily be adapted to practical use were chosen. The aim of this study was to classify chestnut cultivars by their physical and mechanical characteristics using an ANN approach.

Materials and Methods
In this study, the commercially important chestnut cultivars 'Erfelek', 'Ünal', 'Marigoule', 'Sarıaşlama' and 'Işıklar' were used for the experiments. The 'Erfelek', 'Ünal' and 'Marigoule' cultivars were obtained from the Black Sea Region (Samsun and Sinop), the 'Işıklar' cultivar from the Aegean Region (Aydın), and the 'Sarıaşlama' cultivar from the Marmara Region (Bursa). For each cultivar, the samples were harvested randomly from 5 different trees during the harvesting season in October and November 2017. The samples were kept in perforated polyethylene bags in cold storage (0 °C, 75-85% humidity) until they were used. Care was taken to conduct the experiments in the shortest time possible after harvest. The chestnuts were cleaned manually to remove all foreign matter and immature, broken or spoilt nuts.
Physical features (length, width and thickness of chestnuts, geometric mean diameter, sphericity, volume of nut, surface area, and shell thickness) and mechanical features of the chestnut skin (shearing force and shearing strength) were determined for each chestnut.
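Two of the size descriptors listed above are derived from the three measured axes. The sketch below assumes the widely used definitions, namely the geometric mean diameter as the cube root of length × width × thickness and the sphericity as that diameter expressed as a percentage of the length. The text does not reproduce the formulas actually applied, so these definitions, the function names and the sample dimensions are assumptions for illustration only. Volume and surface area are left out because several competing formulas exist and the source does not state which one was used.

```python
def geometric_mean_diameter(length, width, thickness):
    """Cube root of the product of the three axial dimensions (mm)."""
    return (length * width * thickness) ** (1.0 / 3.0)


def sphericity_percent(length, width, thickness):
    """Geometric mean diameter as a percentage of the longest axis (length)."""
    return 100.0 * geometric_mean_diameter(length, width, thickness) / length


# Hypothetical nut dimensions in mm (not measurements from the study)
dg = geometric_mean_diameter(32.0, 27.0, 18.0)
sph = sphericity_percent(32.0, 27.0, 18.0)
print(round(dg, 2), round(sph, 2))
```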

Physical and mechanical properties
In order to determine the average size, 100 chestnuts of each variety were selected randomly. The length, width and thickness of the chestnuts were measured with a digital caliper with an accuracy of 0.01 mm. The geometric mean diameter, sphericity, surface area and volume of the chestnuts were calculated with the formulas given by Yurtlu et al. (2010) and Taner et al. (2018).
A Lloyd Instruments universal testing machine (Lloyd Instrument LRX Plus, Lloyd Instruments Ltd, an AMETEK Company, with NEXYGEN Plus software) (Yurtlu and Yeşiloğlu, 2011) was used to determine the mechanical properties of the chestnut shells under compression load. For the shell measurements, the device was equipped with a load cell of 55 N with a measurement accuracy of 0.5%. The shell was removed carefully from each chestnut; shell thickness was measured at 3 different points and the average shell thickness was recorded. To determine the shearing force and shearing strength of the chestnut skin, the shell was placed on the apparatus and pressed by the steel plunger on the moving crosshead at a speed of 10 mm/min until puncture. The puncture point was identified from the force-deformation curve as a rapid decrease in force. Shearing force and shearing strength were used to express the mechanical properties of the chestnut shell. The shearing force was recorded for each test, and the shearing strength at the puncture point was calculated following Mohsenin (1980).

Artificial neural networks
The MATLAB NN Toolbox was used to develop the ANN model. A total of 500 data were used in the model. Geometric mean diameter, sphericity, volume of nut, surface area, shell thickness, shearing force and shearing strength were used as input parameters, and variety was used as the output parameter. While forming the ANN model, all the data were normalized between 0 and 1 (Purushothaman and Srinivasa, 1994).
The actual values were recovered from the normalized values by solving the normalization formula for "yi".
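Scaling data to the interval from 0 to 1 and recovering the actual values afterwards can be sketched as follows. The sketch assumes ordinary min-max scaling, which is the usual way to map data onto [0, 1]; the exact formula used in the study is not reproduced in the text, so this choice, the function names and the sample values are assumptions.

```python
def normalize(values):
    """Min-max scale a list of numbers to [0, 1]; also return the bounds."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(normalized, lo, hi):
    """Invert the min-max scaling to recover values in the original units."""
    return [n * (hi - lo) + lo for n in normalized]


# Hypothetical values of one input parameter (not data from the study)
raw = [20.0, 25.0, 30.0]
scaled, lo, hi = normalize(raw)
restored = denormalize(scaled, lo, hi)
print(scaled, restored)
```

The bounds of each parameter have to be stored with the model, because the same bounds are needed to scale new samples and to convert predictions back.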
In order to develop the ANN model, the normalized values were divided into a training set and a test set. 440 data were used in the training set and 60 data in the test set.
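A 440/60 division of 500 records can be sketched as below. The text does not say how the records were assigned to the two sets, so the random shuffle, the fixed seed and the function name are assumptions made only for illustration; integer placeholders stand in for the real feature vectors.

```python
import random


def split_train_test(records, n_test=60, seed=0):
    """Shuffle a copy of the records and hold out n_test of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder records: indices 0..499 standing in for the 500 samples
records = list(range(500))
train_set, test_set = split_train_test(records)
print(len(train_set), len(test_set))
```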
The most appropriate numbers of neurons in the hidden layers were found by trial and error, trying values from 2 to 25. Epoch numbers from 1 to 10,000 were likewise tried in order to determine the most appropriate epoch number for the model. A feed-forward back-propagation multilayer perceptron network was used in the ANN model. The most popular and most widely used algorithm for this type of network is the back-propagation algorithm, which reduces the total error by changing the weights to improve the performance of the network (Jacobs, 1988; Minai and Williams, 1990). The training algorithm used was the Levenberg-Marquardt algorithm (Levenberg, 1944; Marquardt, 1963). Network training continued until the predetermined tolerance value was reached for the test error. After training ended, the test data were used to test the network (Kalogirou, 2001).
To evaluate the performance of the results, RMSE and R², which are based on the concept of mean error and are among the most widely used accuracy criteria, were calculated (Bechtler, 2001). The relative error between the measured and the estimated values was also calculated (Bağırkan, 1993).
The data on the physical and mechanical properties were assessed by analysis of variance according to a randomized block design in SPSS 21 (Yurtsever, 1984).

Results and Discussion
Table 1 shows the average and standard deviation values of the physical and mechanical properties of the chestnut varieties. The differences among varieties were statistically significant for all of the physical and mechanical properties (P<0.05).
Geometric mean diameter values ranged from 24.64 to 30.76 mm. 'Marigoule' had the highest value, while 'Erfelek' and 'Ünal' had the lowest values and fell in the same statistical group. For different chestnut varieties, Hamleci and Güner (2015) reported values between 28.9 and 32.9 mm, and Yurtlu and Yeşiloğlu (2011) reported values between 21.4 and 26.4 mm.
Sphericity values were between 78.69% and 84.48%. 'Ünal' had a high value, while 'Işıklar' and 'Marigoule' had the lowest values and fell in the same group. Hamleci and Güner (2015) reported sphericity between 77.2% and 80.5% in different varieties, while Yurtlu and Yeşiloğlu (2011) reported values between 79% and 86%.
Volume of nut ranged from 6325.58 mm³ to 12017.66 mm³. 'Marigoule' had the highest value, while 'Erfelek' and 'Ünal' had the lowest values. Surface area varied between 1685.34 and 2600.67 mm². 'Marigoule' again had the highest value, while 'Erfelek' and 'Ünal' had the lowest values.
Shell thickness had values between 0.49 and 0.65 mm; the highest values were found in 'Marigoule' and 'Işıklar', and the lowest value in 'Sarıaşlama'.
Shearing force ranged from 82.70 to 157.55 N. 'Marigoule' had the highest value and 'Sarıaşlama' the lowest. Shearing strength was between 10.92 and 15.59 N mm⁻². 'Marigoule' had a high value, while 'Işıklar' and 'Sarıaşlama' had the lowest values.
The statistically significant differences among varieties in the assessed physical and mechanical properties showed that these properties were meaningful parameters for classifying the varieties and that a successful classification could be made.
In the ANN model, the network structure was 7-(5-6)-1: 7 input neurons, two hidden layers with 5 and 6 neurons, and 1 output neuron (Figure 1). The tansig transfer function was used in both the first and the second hidden layers, while a linear function was used in the output layer. The lowest training error was obtained at epoch 901.
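A forward pass through a network of this shape can be sketched as follows. The layer sizes (7, 5, 6, 1) and the transfer functions follow the description above, and tansig is written as MATLAB defines it, 2/(1+exp(-2n))-1, which is mathematically equal to tanh(n). Everything else is an assumption: the weights and biases are random placeholders rather than the trained values of the study, and the helper names are hypothetical.

```python
import math
import random


def tansig(n):
    """Hyperbolic tangent sigmoid as defined for MATLAB's tansig."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0


def linear(n):
    """Linear (identity) transfer function used in the output layer."""
    return n


def dense(inputs, weights, biases, transfer):
    """One layer: transfer(weighted sum of inputs + bias) for every neuron."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_out, n_in, rng):
    """Placeholder weights and biases; NOT the trained values of the study."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def forward(x, layers):
    """7 inputs -> 5 tansig -> 6 tansig -> 1 linear output."""
    h1 = dense(x, *layers[0], tansig)
    h2 = dense(h1, *layers[1], tansig)
    return dense(h2, *layers[2], linear)[0]


rng = random.Random(0)
layers = [random_layer(5, 7, rng), random_layer(6, 5, rng), random_layer(1, 6, rng)]
x = [0.5] * 7  # seven normalized input parameters (hypothetical sample)
y = forward(x, layers)
print(y)
```

With trained weights and biases substituted for the placeholders, the single output would be the normalized value that encodes the variety.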
The output of the ANN model is calculated from the inputs through the two hidden layers and the output layer, with the tansig transfer function applied in the first hidden layer (Fj) and in the second hidden layer (Fk). Tables 2-4 show the weights, while Table 5 shows the bias values. In the ANN model, the R² value was 0.99999 and the RMSE value was 0.000083 for training. For testing, the R² value was 0.99999 and the RMSE value was 0.00031. Table 6 shows the experimental data, the predicted values calculated from the ANN model, and the error values between them. The average error of the values obtained with the ANN model compared with the measured values was 0.011%. For all parameters, the average error was below the acceptable limit (10%) (Çarman and Taner, 2012).
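The accuracy criteria reported above can be computed along the following lines. This is a sketch under assumptions: RMSE is the standard root mean square error, R² is computed here as the common coefficient of determination (1 minus the residual sum of squares over the total sum of squares), and the relative error is the mean absolute percentage deviation from the measured value. The formulas cited in the study may differ in detail, and the sample numbers are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def mean_relative_error_percent(measured, predicted):
    """Mean of |measured - predicted| / |measured|, in percent.

    Measured values must be nonzero.
    """
    n = len(measured)
    return 100.0 * sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted)) / n


# Hypothetical measured and predicted values (not data from the study)
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.9]
print(rmse(measured, predicted),
      r_squared(measured, predicted),
      mean_relative_error_percent(measured, predicted))
```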
The measured values and the results of the test data obtained from the ANN model were compared (Figure 2). The test data obtained with the ANN model were in close agreement with the measured data.
The determination coefficient (R²) of the relationship between the measured data and the data calculated from the ANN model was 99.99% (Figure 3).
Figure 1. Structure of the ANN model
The success of the results of this study shows that this method can be transferred to modern and innovative applications for classifying chestnut. In this way, chestnut varieties can be determined without the need for experts and expensive methods. Since the physical and mechanical properties gave high accuracy in our study, a database created in this way could also be used in future biotechnological research. The results of our study demonstrate that the identification method tested is inexpensive, fast and reliable with high accuracy, and for this reason it can be considered as an alternative to genetic applications in identifying chestnut cultivars.

Conclusions
A classification method for chestnut based on ANN was proposed in this study. Physical properties of the nut and some basic mechanical properties of the chestnut skin were used together. The results showed that the ANN achieved a high classification accuracy of 99.99%. The contributions of the study can be expressed as follows. A hybrid feature set, which included shape information and mechanical information, was proposed. The back-propagation algorithm was employed for training the ANN. Identification and classification methods that can easily be adapted into practice were chosen to classify chestnut varieties. The chosen identification and classification properties can be used in the design of many different systems, from a simple hand-held device to a complex automated classification system. It is thought that this method will contribute to solving the problem of chestnut classification. Further studies should focus on using more chestnut varieties and more data.