SELECTING THE BEST TARGET FUNCTION TO PREDICT CROP YIELDS USING THEIR WATER USE THROUGH REGRESSION ANALYSIS

Summary. Crop yield prediction is a relevant topic in current agricultural research. Although many mathematical methods exist for predicting agricultural yields, regression analysis remains one of the more popular. The effectiveness of the prediction model is crucial

SECTION XII. AGRICULTURAL SCIENCES AND FOODSTUFFS

danger of drought and an increase in aridity, one of the key variables influencing crop yields would be the availability of water [13]. Therefore, it is reasonable to consider this factor as the major input.
The main objective of this study is to determine which of the popular regression analysis techniques currently employed in agricultural yield modeling are best for predicting the yield of important crops grown in the South of Ukraine from the amount of water they use. Such models are crucial for sustainable crop production in the region, which belongs to a zone of high-risk agriculture owing to a persistent shortage of natural water supply [14].
Materials and methods. The study was based on retrospective yield data for winter wheat, grain corn, and soybeans, recorded on the irrigated and non-irrigated plots of the Institute of Climate-Smart Agriculture in 1970-2020. The initial dataset comprised 45 "yield-water use" pairs for winter wheat (1971-2016), 47 pairs for grain corn (1970-2016), and 53 pairs for soybeans (1981-2020). The yield data were obtained by direct harvesting in the field experiments, followed by recalculation of the yield to standard moisture (14% for winter wheat and corn grain, 12% for soybeans). The water use of the studied crops was determined by the common methodology described in [15], using Eq. (1):

$$WU = ER + SM + IR \tag{1}$$

where WU is water use, m³/ha; ER is effective rainfall (rainfall of more than 50 m³/ha); SM is soil moisture (only the moisture taken up by the crops), m³/ha; IR is the irrigation rate, m³/ha.
Effective rainfall was measured under field conditions with a rain gauge and converted from millimeters to m³/ha by multiplying by 10. The soil moisture consumed by the crops was determined as the difference between the soil moisture at sowing and at harvesting. Soil moisture was measured by the gravimetric method [16].
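The water use calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that Eq. (1) sums the three listed components, and that the 50 m³/ha threshold for effective rainfall is applied to each rain event separately. The function names and the sample figures are hypothetical and are not taken from the study.

```python
MM_TO_M3_PER_HA = 10.0  # 1 mm of water over 1 ha equals 10 m3


def effective_rainfall_m3_ha(rain_events_mm, threshold_m3_ha=50.0):
    """Sum rain events that exceed the threshold, in m3/ha.

    Assumption: the threshold (50 m3/ha, i.e. 5 mm) is applied per event.
    """
    total = 0.0
    for mm in rain_events_mm:
        volume = mm * MM_TO_M3_PER_HA
        if volume > threshold_m3_ha:
            total += volume
    return total


def water_use_m3_ha(er, sm_sowing, sm_harvest, ir=0.0):
    """Water use as the sum ER + SM + IR (assumed form of Eq. (1)).

    SM is the soil moisture consumed by the crop: the moisture at
    sowing minus the moisture at harvesting, both in m3/ha.
    """
    sm = sm_sowing - sm_harvest
    return er + sm + ir


# Hypothetical season: four rain events (mm), soil moisture and irrigation in m3/ha.
er = effective_rainfall_m3_ha([3.0, 12.0, 5.0, 20.0])
wu = water_use_m3_ha(er, sm_sowing=1500.0, sm_harvest=900.0, ir=2000.0)
print(er, wu)
```

For a non-irrigated plot the irrigation rate is simply left at its default of zero.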
The yields of the crops were modeled as a function of their water use with the best subsets regression analysis method in the BioStat v.7 software [9,17,18]. The approach included nine regression target functions, which are presented in Table 1. The Pearson's correlation coefficient (R; the greater, the better) was used to assess how well the regression models fit the data, while accuracy was assessed by the mean absolute percentage error (MAPE; the less, the better), the maximum absolute error (MAE; the less, the better), and the amplitude of the absolute errors (A; the less, the better) [19,20].
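As a rough illustration of how candidate functions might be fitted and the four indices computed, consider the NumPy sketch below. It is not the BioStat procedure used in the study. Several details are assumptions: R is taken as the Pearson correlation between observed and predicted yields, A is taken as the difference between the largest and the smallest absolute error, the functional forms shown (polynomial, logarithmic, power) are common textbook forms that may differ from those in Table 1, and the data are invented for the example.

```python
import numpy as np


def fit_metrics(y_true, y_pred):
    """Return R, MAPE (%), maximum absolute error and error amplitude."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    return {
        "R": float(np.corrcoef(y_true, y_pred)[0, 1]),
        # Yields are assumed strictly positive, so the division is safe.
        "MAPE": float(np.mean(abs_err / np.abs(y_true)) * 100.0),
        "MAE": float(abs_err.max()),  # maximum absolute error, as in the text
        "A": float(abs_err.max() - abs_err.min()),  # assumed definition
    }


def fit_polynomial(x, y, degree):
    """Least-squares polynomial (degree 1 = linear, 2 = quadratic, 3 = cubic)."""
    return np.polyval(np.polyfit(x, y, degree), x)


def fit_logarithmic(x, y):
    """y = a + b * ln(x), fitted by least squares."""
    b, a = np.polyfit(np.log(x), y, 1)
    return a + b * np.log(x)


def fit_power(x, y):
    """y = a * x**b, fitted by least squares on the log-log scale."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_a) * x ** b


# Invented data: water use in thousand m3/ha, yield in t/ha.
water_use = np.array([2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5])
yield_t_ha = np.array([2.1, 3.0, 3.8, 4.3, 4.7, 4.8, 4.7])

predictions = {
    "linear": fit_polynomial(water_use, yield_t_ha, 1),
    "quadratic": fit_polynomial(water_use, yield_t_ha, 2),
    "cubic": fit_polynomial(water_use, yield_t_ha, 3),
    "logarithmic": fit_logarithmic(water_use, yield_t_ha),
    "power": fit_power(water_use, yield_t_ha),
}
results = {name: fit_metrics(yield_t_ha, pred) for name, pred in predictions.items()}
for name, metrics in results.items():
    print(name, metrics)
```

Water use is expressed in thousands of m³/ha here only to keep the polynomial fit numerically well conditioned.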
The final judgment on model quality was made by calculating a total score for each examined model. For each regression function, one point was awarded for every statistical index on which it achieved the best value, and the points were summed into a total score. If models have equal total scores, the model with the greatest R and the lowest MAPE should be chosen.
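The scoring rule can be sketched as follows. This is a minimal illustration for a single dataset; the index values are hypothetical, and the order of the tie-break (R first, then MAPE) is an assumption, since the text does not say which of the two takes priority. Scores for several crops could be summed in the same way.

```python
HIGHER_IS_BETTER = {"R"}  # for MAPE, MAE and A the lowest value is best
INDICES = ("R", "MAPE", "MAE", "A")


def total_scores(metrics_by_model):
    """Award 1 point per index to every model that has the best value."""
    scores = {name: 0 for name in metrics_by_model}
    for index in INDICES:
        values = {name: m[index] for name, m in metrics_by_model.items()}
        if index in HIGHER_IS_BETTER:
            best = max(values.values())
        else:
            best = min(values.values())
        for name, value in values.items():
            if value == best:
                scores[name] += 1
    return scores


def pick_best(metrics_by_model):
    """Highest total score; ties broken by greatest R, then lowest MAPE."""
    scores = total_scores(metrics_by_model)
    return max(
        metrics_by_model,
        key=lambda n: (scores[n], metrics_by_model[n]["R"], -metrics_by_model[n]["MAPE"]),
    )


# Hypothetical index values, not taken from Table 3.
example = {
    "linear": {"R": 0.80, "MAPE": 15.0, "MAE": 1.9, "A": 1.8},
    "quadratic": {"R": 0.90, "MAPE": 10.5, "MAE": 1.2, "A": 1.1},
    "cubic": {"R": 0.92, "MAPE": 10.0, "MAE": 1.3, "A": 1.2},
}
print(total_scores(example))
print(pick_best(example))
```

In this invented example the quadratic and cubic functions tie on total score, and the tie-break selects the cubic function because it has the greater R and the lower MAPE.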
Results. Statistical analysis of the crop yield data produced a total of 27 mathematical models for predicting yield from crop water use (Table 2). The statistical indices chosen to assess the fitting quality and accuracy of the models are shown in Table 3. The optimal regression function is determined by the best total score, given in the corresponding column of Table 3. Examination of the models revealed that soybeans had the best overall model quality (in terms of both fitting quality and prediction accuracy), and winter wheat the worst. This finding may be explained by the greater homogeneity of the input dataset for soybeans (the crop varieties varied less and all crops were irrigated) and the highest variability of the winter wheat dataset (many different varieties, cultivation under both irrigated and non-irrigated conditions). The crops also differ in which regression functions describe them best: soybeans and grain corn are usually described well by polynomial functions (quadratic and cubic), whereas winter wheat is best described by the exponential-1 and reverse functions.

When comparing the final scores for each model, we found that the cubic and quadratic functions received an equal score of 4 points. We propose that the cubic function should be the first option for agricultural modeling, because it performs better if the less significant indices (the maximum absolute error and the amplitude of the absolute errors) are ignored. We advise against using the linear, power (stepwise), logarithmic, and exponential-2 functions in crop modeling unless there are compelling reasons to do so. This study has its limitations: the recommendation holds for models built on medium-sized datasets (45-55 input pairs) such as those used in this work.
Discussion. Crop yield forecasting is an important and difficult task for modern agricultural research. As soon as the relevance of crop yield prediction was acknowledged, scientists all over the world began looking for suitable mathematical techniques for this purpose [21]. Regression analysis was the first statistical technique to be used for crop yield prediction [22].
Beginning with a simple linear function, the regression approach gained importance over time, and the mathematical functions it employed grew more complex. Regression analysis gradually evolved into a dominant method for predicting crop yield from a variety of inputs (climate models, field experiment results, remote sensing data, etc.), involving a wide range of computation techniques and target functions, including polynomial functions, multiple and multivariate regression, stepwise regression, logistic regression, etc. [23,24,25,26]. Further innovative regression approaches are still being developed and introduced, for example fuzzy regression and interaction regression models, which have been shown to be quite trustworthy and accurate in yield analysis [27,28]. Now that yield data analysis has become critically important, researchers should shift their attention to identifying the optimal statistical methodology for yield prediction in terms of fitting quality and prediction accuracy [29,30].
Studies focused specifically on this issue are scarce. The study [31] compares various regression models for predicting agricultural production from rainfall amounts, which is similar to our approach. Another study [32] compared the effectiveness of Lasso and traditional polynomial regression methods for predicting crop production. Gonzalez-Sanchez et al. [33] conducted a thorough study of widely used state-of-the-art approaches for agricultural yield prediction, including multiple linear regression and stepwise linear regression. Our study adds to the insights of these studies into selecting the best modeling approach by providing a pure intra-comparison of regression analysis techniques. Although the findings of each study vary considerably, the underlying principle remains the same: the better the regression function used, the more accurate the predictions of crop yields will be.
A few words should be said about more modern mathematical techniques for analyzing crop yields that use artificial neural networks, which are becoming more and more popular among scientists. Convolutional neural networks, long short-term memory networks, and deep neural networks are in high demand and are becoming more significant in a variety of models for predicting agricultural yields [34,35]. These methods frequently appear to be somewhat more accurate than traditional ones, such as regression analysis, especially when dealing with large datasets [36,37,38]. Nevertheless, artificial neural networks have not entirely displaced regression analysis.
For yield prediction in precision agriculture, deep learning algorithms are integrated with multiple regression analysis in a variety of ways. This approach has been used with success in various studies [21,39]. Because of the so-called "black box nature" of neural networks, it is extremely difficult to understand how a specific network has arrived at its modeling results. Such combined models address this problem by providing a clear equation of yield prediction. In addition, overfitting is occasionally a problem with artificial neural networks, whereas with regression analysis the researcher retains control over it [40]. Regression analysis has thus retained its value even now, when it is tightly linked with cutting-edge data analysis techniques. Just as when regression analysis is used as a stand-alone data analysis tool, it is crucial to pick the appropriate function so that it works well with deep learning methods. We will investigate this matter further in the future.
Conclusions. Regression analysis is frequently used in agricultural sciences, despite being considered an outdated technique. The success of yield modeling depends on the correct selection of the target function as well as on the quality and quantity of the model inputs. Examination of the regression statistics of the yield prediction models for the three crops under study showed that a cubic function is the best choice for medium-sized datasets of paired observations. The linear, power (stepwise), logarithmic, and exponential-2 functions should be avoided, since they are very unlikely to provide good fitting and accurate predictions.