A Summary of Sun, Y., 2019. Real Estate Evaluation Model Based on Genetic Algorithm optimized Neural Network

Introduction

The rapid development of the Chinese real estate economy has led to growing fears of a massive bubble, highlighting the urgent need for an improved and optimized real estate valuation model. This research paper finds that recent applications of Back-Propagation Neural Network modeling (BP Neural Network modeling) to real estate valuation are improved when optimized with a Genetic Algorithm.

Real estate assessment is a multi-faceted and complex calculation with inherent risk (risk in this case being the margin of error between the predicted value and the true value). The author points out that many factors influence a valuation, including the level of economic development, degree of urbanization, social security situation, government policy, the assessor’s professional accomplishment, architectural style characteristics, market personnel conditions, geographical environment, economic environment, ecological environment and so on.

The author notes that many academics agree that, given the pressures mounting on the Chinese real estate market, an alternative to the traditional Capital Asset Pricing Models of the past is necessary. BP Neural Network modeling has already provided a significant start, but the popular method is not without its drawbacks.

Back-Propagation Neural Networks

BP Neural Network modeling is a popular machine learning method. It is best visualized as a network of connections between inputs, outputs, and some chosen number of “hidden layers” in between. Each layer is composed of nodes, each of which performs a small part of a complex calculation. The first step of BP runs the input through the network from left to right (input to output) and calculates outputs; the total error is then calculated by comparing the predictions with the actual values. The second step sends that error back through the network from right to left, and the weights each node holds are adjusted to reduce the error. Despite its popularity, the method suffers from low computational efficiency, a tendency to settle into local extrema, and a lack of global searching ability.
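The two steps can be sketched in a few lines of plain Python. This is only an illustration, not the author's MATLAB code: the function name train_bp, the toy samples, the sigmoid hidden layer, the learning rate and the epoch count are all assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bp(samples, n_hidden=3, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer network; return the mean squared error per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Weights and thresholds (biases) start as small random values.
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hid = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = rng.uniform(-1, 1)
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            # Step 1, forward pass: input layer -> hidden layer -> single output node.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(w_hid, b_hid)]
            y = sum(w * hi for w, hi in zip(w_out, h)) + b_out
            err = y - target
            total += err ** 2
            # Step 2, backward pass: send the error back and adjust each weight.
            for j in range(n_hidden):
                grad_h = err * w_out[j] * h[j] * (1.0 - h[j])
                w_out[j] -= lr * err * h[j]
                for i in range(n_in):
                    w_hid[j][i] -= lr * grad_h * x[i]
                b_hid[j] -= lr * grad_h
            b_out -= lr * err
        history.append(total / len(samples))
    return history

# Hypothetical, already-normalized samples: ([feature_1, feature_2], price).
toy_samples = [([0.0, 0.0], 0.1), ([0.0, 1.0], 0.4),
               ([1.0, 0.0], 0.6), ([1.0, 1.0], 0.9)]
mse_history = train_bp(toy_samples)
```

Comparing mse_history[0] with mse_history[-1] shows the error shrinking as the weights are adjusted. Where training ends up still depends on the random starting weights, which is the weakness discussed next.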

This notion of settling into a local extremum can be explained by imagining the error as a curve with more than one dip, like a hilly landscape down which one could roll a ball, as in the simple figure to the left. The ball can come to rest in a shallow dip, and the model might get “tricked” into thinking it has found the minimum there if it doesn’t evaluate points further right on this hypothetical x axis.
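The rolling-ball picture can be reproduced numerically. The curve below is a made-up example with two dips (it is not taken from the paper), and the step size and iteration count are arbitrary assumptions.

```python
def error(x):
    # Hypothetical error curve with a shallow dip on the left
    # and a deeper dip on the right.
    return x**4 - 3 * x**2 - x

def slope(x):
    # Derivative of error(x).
    return 4 * x**3 - 6 * x - 1

def roll_downhill(x, step=0.01, iterations=2000):
    """Plain gradient descent: keep moving a little way downhill."""
    for _ in range(iterations):
        x -= step * slope(x)
    return x

stuck = roll_downhill(-2.0)   # settles in the shallow dip near x = -1.13
better = roll_downhill(2.0)   # settles in the deeper dip near x = 1.30
```

Both runs stop where the slope is zero, but error(stuck) is clearly higher than error(better). A search that starts on the left never learns that a lower point exists further right.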

Genetic Algorithms for Optimization

To correct for this, the author makes use of a Genetic Algorithm (GA), which achieves optimization by applying the principles of Darwin’s theory of evolution. The BP Neural Network is fed training samples and trains for a certain number of iterations before being passed to the GA. The GA uses predetermined selection conditions, encoded in a “fitness function”, to select candidates, and then applies crossover and mutation with set probabilities, mixing successful genes with random new values. This step increases the global searching ability and reduces the likelihood that the model will settle into a local extremum in its final iteration. Much like the theorized process of natural selection, mutations and combinations are tested iteratively: those that improve the valuation method survive the fitness function, and those that don’t fail to propagate into the next iteration.
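A minimal sketch of the select, cross over, mutate loop is shown below. It is a generic GA under stated assumptions rather than the author's implementation: the name genetic_search, roulette-wheel selection, single-point crossover, Gaussian mutation, the elitism step, every default parameter and the toy error function are illustrative choices.

```python
import random

def genetic_search(error_fn, n_genes, pop_size=20, generations=50,
                   crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Evolve real-valued chromosomes (e.g. candidate weights) to lower error_fn.

    error_fn must return a non-negative number.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-2, 2) for _ in range(n_genes)] for _ in range(pop_size)]

    def fitness(chromosome):
        # Fitness function: lower error means higher fitness.
        return 1.0 / (1.0 + error_fn(chromosome))

    best_per_generation = []
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        elite = pop[scores.index(max(scores))]
        best_per_generation.append(error_fn(elite))
        next_pop = [elite[:]]  # elitism: the best candidate always survives
        while len(next_pop) < pop_size:
            # Selection: fitter chromosomes are drawn more often.
            mum, dad = rng.choices(pop, weights=scores, k=2)
            child = mum[:]
            # Crossover: splice the front of one parent onto the back of the other.
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                child = mum[:cut] + dad[cut:]
            # Mutation: occasionally nudge a gene by a random amount.
            child = [g + rng.gauss(0, 0.3) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=error_fn), best_per_generation

def toy_error(weights):
    # Hypothetical stand-in for a network's error: squared distance
    # from a made-up set of ideal weights.
    ideal = [0.5, -1.0, 1.5]
    return sum((w - t) ** 2 for w, t in zip(weights, ideal))

best_weights, ga_history = genetic_search(toy_error, n_genes=3)
```

Because the fittest chromosome is copied unchanged into each new generation, the best error recorded in ga_history never gets worse. In a combined model, the chromosome would hold the network's weights and thresholds and the error would come from the network's predictions.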

Results

The author uses MATLAB 7.0 on 20 groups of real estate data obtained from a real estate market survey taken in an unnamed test city from October 2018 to January 2019. Typical standardization and data preprocessing steps were performed before the data was split into training and testing samples. The number of nodes in the input layer of the neural network was determined directly by the number of features in the real estate data, and the output layer was set to a single node, since the intent of the model was to predict only price. The author tuned the number of nodes in the hidden layer by trial and error, testing 3, then 4, and so on, until settling on 8 as the optimal number of hidden layer nodes. Through testing, the experimental error was found to be minimized when the training samples were split into 60 sub-samples to allow BP modeling and genetic competition to work. The number of iterations through the model was set to 7.
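The preparation steps described here might look like the sketch below. The summary does not say which standardization was used, how the 20 groups were divided, or what features they contained, so min-max scaling, the 75/25 split, the made-up rows and the stand-in error function are all assumptions.

```python
import random

def min_max_scale(rows):
    """Rescale every column to the range 0..1 (one common standardization)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]

def split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows, then cut them into training and testing samples."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def pick_hidden_nodes(candidates, validation_error):
    """Trial and error: try each candidate size, keep the lowest-error one."""
    return min(candidates, key=validation_error)

# Hypothetical rows: [floor area, number of rooms].
scaled = min_max_scale([[50, 1], [100, 3], [150, 2]])
train, test = split([[i] for i in range(20)])
# Stand-in for "train a network with n hidden nodes and measure its error".
chosen = pick_hidden_nodes(range(3, 13), lambda n: (n - 8) ** 2)
```

In a real run, the function passed to pick_hidden_nodes would train a network with n hidden nodes and return its error on held-out data; the quadratic used here only makes the selection step visible.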

The author ran two models for a side-by-side comparison. The control was a standard BP Neural Network model, whereas the test model was a BP Neural Network model with an added GA selection step. Both models performed better as training time increased, but the BP Neural Network with GA model clearly outperformed the simple BP Neural Network model, with a smaller mean square error (MSE). The author concluded that the accuracy of the BP Neural Network with GA model in real estate data evaluation was superior. Additionally, the author noted that the BP Neural Network with GA model converged much faster than the BP Neural Network model, the stated reason being that the GA step had screened out the most appropriate weights and thresholds, which greatly improved convergence speed as those values propagated through the 60 generations. The author finds that the overall fit of the BP Neural Network model was low compared to that of the BP Neural Network with GA model.

Conclusions

In this study the author examines the problem of real estate valuation and puts forward a real estate evaluation model based on a genetic algorithm optimized neural network in place of a simpler BP neural network model. The author states that the work proved the feasibility of the proposed approach as a new method for the valuation of real estate, while noting some deficiencies in the study. The author cites insufficient analysis of the real estate market and a small sample size as two limitations of the experiment, and notes that in future work the size of the training samples needs to be calibrated more precisely and that other factors influencing real estate prices should be included in the study.

A link to the full article and bibliography on Data Science Journal: