This project uses Logistic Regression, Gaussian Naive Bayes, Random Forest, and XGBoost to help farmers make informed decisions about which crops to cultivate.
The best model (Random Forest) achieves near-perfect accuracy at recommending the correct crop from 7 features (N, P, K, temperature, humidity, ph, rainfall).
Precision agriculture is a management technique based on observing, measuring, and responding to inter- and intra-field variability in crops.
With the advent of technologies such as GPS and GNSS, farmers and researchers can measure many variables, such as crop yield, terrain features, organic matter content, moisture levels, nitrogen levels, potassium levels, and more. These data can also be collected by sensor arrays, with real-time sensors measuring everything from chlorophyll levels to plant water status.
All of this can be used to optimize crop inputs such as water, fertilizer, or chemicals. Given these features, a model can suggest the optimal crop for maximum yield and profit, helping farmers reduce crop failure and make informed decisions about their farming strategy.
The dataset is obtained from Kaggle and has these data fields:
This is a supervised learning task that identifies which category an object belongs to, so I'll try several commonly used classification algorithms to build the model.
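As a minimal sketch of the setup (using a synthetic stand-in for the Kaggle data, since the real CSV isn't loaded here):

```python
# Sketch of the supervised classification setup.
# NOTE: synthetic stand-in for the Kaggle crop dataset;
# in practice you would load the CSV with pandas instead.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 7 features mirroring N, P, K, temperature, humidity, ph, rainfall
X, y = make_classification(n_samples=500, n_features=7,
                           n_informative=5, n_classes=4,
                           random_state=42)

# Hold out 20% of the rows for evaluation, stratified by crop label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)  # (400, 7) (100, 7)
```

Each model below can then be fit on `X_train` and scored on `X_test`.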
Logistic regression is commonly used for binary classification problems; it uses the sigmoid function to return the probability of a label. The probability output from the sigmoid function is compared with a pre-defined threshold to generate a label.
A modified version, multinomial logistic regression, extends this to predict a probability over multiple classes.
common hyperparameters: penalty, max_iter, C, solver
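A minimal multinomial logistic regression sketch with those hyperparameters (toy data used in place of the real crop features):

```python
# Multinomial logistic regression on hypothetical toy data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# penalty/C control regularization; max_iter bounds the lbfgs solver
clf = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs", max_iter=1000)
clf.fit(X_train, y_train)

# one probability per class for each test row; each row sums to 1
proba = clf.predict_proba(X_test)
print(proba.shape, clf.score(X_test, y_test))
```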
Random forest is a commonly used ensemble method that aggregates the results of multiple predictors (a collection of decision trees). It uses bagging: each tree is trained on a random sample of the original dataset, and the forest takes a majority vote across trees.
The advantage of random forest is that it generalizes better than a single decision tree.
common hyperparameters: n_estimators, max_features, max_depth, min_samples_split, min_samples_leaf, bootstrap
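A sketch of a random forest with those hyperparameters spelled out (again on hypothetical toy data):

```python
# Random forest: bagging over decision trees (toy data stand-in).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)

# bootstrap=True: each of the 200 trees sees a bootstrap sample of the
# rows; max_features="sqrt" decorrelates trees by subsampling features
rf = RandomForestClassifier(n_estimators=200, max_depth=None,
                            min_samples_split=2, min_samples_leaf=1,
                            max_features="sqrt", bootstrap=True,
                            random_state=0)
rf.fit(X, y)
print(rf.predict(X[:3]))  # majority vote across the 200 trees
```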
Naive Bayes is an algorithm based on Bayes' theorem. The "naive" assumption is that the features are independent of each other, so the conditional probability of each class can be computed from prior knowledge.
An advantage of naive Bayes is that it does not require a huge dataset. Gaussian Naive Bayes is a common variant that assumes each feature follows a normal distribution.
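A small sketch of the Gaussian assumption in action, using two well-separated synthetic classes (a hypothetical stand-in for the crop features):

```python
# Gaussian Naive Bayes: each feature is modeled per class as an
# independent normal distribution (synthetic two-class toy data).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# class 0 centered at 0, class 1 centered at 3, unit variance
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

gnb = GaussianNB().fit(X, y)
print(gnb.theta_)  # per-class feature means learned from the data
print(gnb.predict([[0, 0, 0], [3, 3, 3]]))  # → [0 1]
```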
XGBoost is also an ensemble technique, but it takes an iterative approach: rather than each tree being trained in isolation, the trees are trained in sequence, each one correcting the errors made by the previous one.
The advantage is that each model added focuses on correcting the mistakes of the previous ones rather than repeating them.