
Using Machine Learning Algorithms for Housing Price Prediction: The Case of Islamabad Housing Data

July 2021
Fundamenta Informaticae 1(1):11-23


DOI:10.22995/scmi.2021.1.1.03

Authors:

Imran

gachon university medical campus

Umar Zaman

Muhammad Waqar

Staffordshire University

Atif Zaman



Soft Computing and Machine Intelligence Journal, Vol (1), Issue (1), 2021

Using Machine Learning Algorithms for Housing Price Prediction: The Case of Islamabad Housing Data

Imran 1,*, Umar Zaman 2, Muhammad Waqar 1 and Atif Zaman 1

1 Department of Computer Science, Bahria University Islamabad, Pakistan

2 Department of Computer Science, Iqra University Islamabad, Pakistan

* Correspondence: imranjejunu@gmail.com; Tel.: +82-1093369498

Abstract:

House price prediction supports a significant financial decision for people working in the housing market as well as for potential buyers. Whether investing or buying a house to live in, a person entering the housing market is interested in the potential gain. This paper applies machine learning algorithms to develop intelligent regression models for house price prediction. The proposed research methodology consists of four stages: data collection; preprocessing the collected data and transforming it into a suitable format; developing intelligent models using machine learning algorithms; and training, testing, and validating the models on house prices from the housing market of the capital, Islamabad. The data used for model validation and testing are asking prices from online property stores, which provide a reasonable estimate of the city's housing market. The prediction models can assist in forecasting future housing prices in Pakistan. The regression results are encouraging and give promising directions for future prediction work on the collected dataset.

Keywords:

machine learning for regression; housing dataset; property stores; house price prediction; housing property value; real estate market

1. Introduction

The real estate market in Pakistan is a widespread trade, and with projects like CPEC, the property dynamics are changing quickly. Investors as well as individuals want to invest money in the housing sector. Buyers and owners observe real estate trends, particularly in the housing market; these trends also reflect the economic situation and social sentiment of a developing country. House price estimation informs a significant financial decision for people working in the housing market and for potential buyers. Whether investing or buying a house to live in, a person entering the housing market is interested in the potential gain. To give the background of this study, we first survey the housing market of Pakistan and then describe the dataset used. Many factors determine house prices. For real estate in general, a rise in the market is commonly explained by a rise in the income of the area's inhabitants. However, careful analysis suggests that such factors, including demand-oriented variables, explain price increases only temporarily, so the relevant factors change over time. House prices depend on the income of the inhabitants of the area, the supply of housing stock, and the payment system, that is, whether the seller accepts installments or requires cash payment.

Other essential variables include affordability, the unemployment rate, and demographics, but house prices can broadly be explained as a function of income. This study does not consider all the possible variables that can be used to predict housing prices. We use only the housing data available from property websites and predict housing prices from recent trends. Pakistan's economy is slowly recovering. The unemployment rate is trending downwards, and consumer spending is going up. Nevertheless, the growth rate is still struggling, which indicates that Pakistan's economy has a long way to go before it is up and running.

In Europe and other advanced economies, real estate companies run challenges to develop algorithms that forecast property prices more accurately. Researchers use well-known housing datasets, e.g., the Boston and King County (USA) datasets. One gap for Pakistan is the absence of a comprehensive housing dataset. Some real estate property sites in Pakistan provide a reasonable estimate of the Pakistan housing market, but they currently do not use house price forecasting tools. Websites like Zillow¹, a US real estate marketplace, organize competitions on Kaggle² to encourage researchers to come up with accurate house price forecasting algorithms. Such challenges are not yet part of the Pakistan housing market, which makes it difficult for a researcher to develop forecasting algorithms for Pakistan real estate data; the only sources of housing data are the online property stores.

In the Pakistan real estate market, no machine-based forecasting tools are currently used to value houses or other real estate properties. There are some blogs and magazines in which human market experts offer forecasts. The lack of scientific research competitions for forecasting Pakistan's housing prices, and hence the lack of a housing dataset, makes housing price prediction for Pakistan real estate a challenging task.

Figure 1. Properties count based on locations

Figure 1 displays the property counts of the dataset by sector of the capital, Islamabad. We collected the dataset for this study from the leading online property websites based in Pakistan. These websites contain details of property listings from various cities of Pakistan; the dataset for this study covers Islamabad. The dataset is in tabular textual format, consisting of 23 columns and 44,647 rows collected over a period of one year.

¹ www.zillow.com

² www.kaggle.com


2. Related Work

In the literature, the approaches used for house price prediction can be classified as regression models, machine learning models, and hybrid models. A variety of research has been done on estimating housing prices. A Gaussian process (GP) regression model benefits from the spatial structure of the London housing dataset; for this purpose, smaller local models are developed that work independently of each other. Once the local models are trained, the overall predictions are obtained by recombining the predictions from the local models. To present visualizations to clients on mobile devices [ ], the model is trained on the server side, and predictions are generated for the user via a mobile app.

Linear regression and gradient boosting methods have been used to predict the Zillow estimation error. Zillow offers a competition on Kaggle to develop the most accurate property value forecasting algorithm. The authors used property data to train linear regression and gradient boosting models, with which they make predictions about other properties. For the gradient boosting models, they use grid search to fine-tune the hyperparameters. Oladunni et al. [ ] reduce errors in the hedonic housing regression model by investigating the spatial dependency and substitutability of submarket and geospatial attributes. The model is trained using best-subset linear regression and regression tree algorithms. The Bayesian information criterion and residual mean deviance are used as performance metrics.

Ahmed et al. [ ] design a neural network-based model for predicting housing market performance. The model is trained on a historical market performance dataset to predict unseen future performance. Model testing and validation show that the prediction error of the neural network lies between -2 and +2 percent. To predict the Singapore housing market, Lim et al. [ ] design neural networks. They used two algorithms for prediction, the multilayer perceptron and the autoregressive integrated moving average. The model with the higher accuracy score is used for prediction, and the lower mean squared error (MSE) of the ANN models shows that the ANN outperforms the other predictive tools. Chica et al. [ ] applied cokriging, a multivariate spatial method, to predict housing location prices. The method estimates correlated spatial variables and creates interpolated maps of house prices, providing information about house location prices to appraisers and real estate agents. In the experiment, the housing location price is estimated using two methods, isotopic data cokriging and heterotopic data cokriging. The results of both methods are compared, and the prediction from the better method is selected.

Bahia et al. [ ] applied a data mining model based on an artificial neural network to the real estate market. Two network models, FFB and CFBP, were developed during the study. Both models were trained on the Boston dataset, and the performance metric was the regression value. The CFBP prediction results are best, with a regression value of 0.964; the study suggests that the CFBP prediction accuracy is 96 percent. Stevens et al. [ ] used text mining to predict housing prices. Their predictions involve pricing indicators, e.g., selling price, asking price, and price fluctuation. The study shows that the SGD classifier performed best for all pricing indicators. It uses stemmed n-grams for classification and regression. The R² value for prediction is 0.303. The study suggests that both of these results are good given the complex nature of the task.

Nissan et al. [ ] used various algorithms to predict real estate property prices in Montreal. The study proposes a prediction model for asking and selling prices based on features such as location, area, rooms, and the nearest police station and fire station. They used several regression methods, including linear regression, SVR, kNN, regression trees, and random forest regression. The proposed models predict the asking price with an error of 0.0985 and the selling price with an error of 0.023.

Nghiep et al. [ ] compared multiple regression analysis (MRA) to artificial neural networks (ANN) using three different-sized training sets of single-family houses. The prediction model uses features such as area, number of bathrooms and bedrooms, the year the property was built (which gives its age in years), number of quarters, selling status, and whether or not the property has a garage or carport. The researchers found that while MRA performs best on smaller training sets, ANN performs better as the dataset size increases. Byeonghwa and Jae [ ] applied various prediction techniques to predict house prices in Fairfax County, VA. They built various models on 5359 townhouses. They evaluated and compared models including RIPPER, Naive Bayesian, and AdaBoost, and found that RIPPER outperforms the other prediction models for housing price prediction.

In the literature, descriptive analysis has been applied to the effective management of waste data [ ], healthcare and thermal comfort applications [ ], and the energy optimization domain [ ]. Predictive analysis has proven helpful for forecasting the outcome of a situation in the near future. It has recently been applied to recommendation systems [ ], safety applications [ ], policy-making [ ], and convergence applications [20].

3. Materials and Methods

We present the experimental design in three stages: the first covers data collection, the second covers the preprocessing steps, and the third covers the regression models for house price prediction.

[Figure 2 diagram labels. Data Collection Layer: house feature variables. Pre-Processing Layer: handling missing values; simple data or normalized data. Prediction Layer: input, hidden, and output layers labeled I=4, H=10, O=1 and I=8, H=20, O=1; neuron output; moving average; batch normalization; smoothing. Performance Evaluation Layer: root mean square error, mean absolute error, mean absolute percentage error.]

Figure 2. Experimental Design.

3.1. Data Collection

Data is collected using scraping software, which gathers data from the internet in a format that the machine learning model can use. When parsed, the output is readily interpreted by a machine but not easily understood by a human. Data scraping is also referred to as data extraction. Scraping is useful because manual collection of data from the internet is error-prone, whereas machines transfer data between programs as data structures, which preserves the integrity of the data. However, a scraping script is written for a predetermined format, and the data may not always arrive in that format. The data may therefore have issues of consistency and correctness. Consequently, scraping only collects raw data, which requires extensive preprocessing and, in some situations, human involvement. The scraping activity depends primarily on the internet sources from which data is collected and cannot be fully automated. For example, when scraping a website, the ideal case is that the developer has assigned an id attribute to each unique HTML element and a class attribute to each item of the same group. This makes it possible to write a script in almost any programming language. A comparative study [ ] of open-source scraping tools suggests that Scrapy is the best open-source tool for scraping, so in this study we use the Python Scrapy library to create our crawler.
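
As a minimal sketch of why id and class attributes make extraction straightforward, the example below pulls fields out of a small HTML fragment using only the Python standard library. It is not the crawler used in the study (which is built with Scrapy); the markup, the class names (listing, price, beds, location), and the values are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical listing markup: the ids, class names, and values are
# invented for illustration, not taken from any real property website.
SAMPLE_HTML = """
<div class="listing" id="listing-101">
  <span class="price">1.2 crore</span>
  <span class="beds">4</span>
  <span class="location">F-10</span>
</div>
<div class="listing" id="listing-102">
  <span class="price">85 lakh</span>
  <span class="beds">3</span>
  <span class="location">G-11</span>
</div>
"""

FIELDS = {"price", "beds", "location"}


class ListingParser(HTMLParser):
    """Collect one dict per <div class="listing">, keyed by span class."""

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # class of the <span> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "listing":
            self.listings.append({"id": attrs.get("id")})
        elif tag == "span" and attrs.get("class") in FIELDS and self.listings:
            self._field = attrs["class"]

    def handle_data(self, data):
        if self._field is not None:
            self.listings[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ListingParser()
parser.feed(SAMPLE_HTML)
for row in parser.listings:
    print(row)
```

Because every field carries its own class, the parser needs no positional guessing; if a site changes its markup, only the FIELDS set and the tag checks would need to change.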

However, scraping the websites of well-established companies is not trivial, as these companies use defensive algorithms and software to prevent unwanted access to their websites. Most of them block any type of script in their robots.txt. So the idea is to write a script that scrapes data intelligently, like a human being, by automating human browsing behavior. For example, if each request is delayed by 5 or 10 seconds, the system may not recognize the data extraction and could consider it regular activity. Data scraping is performed on data that is publicly available through a browser, either without login or after authenticating to the website. By contrast, using SQL injection to break into a database is a serious internet offense. Web search engines, e.g., Google, Yahoo, and Bing, play an essential role in reaching a website. For example, we type a keyword, and the search engine returns results based on that query. This helps find a data host, and it benefits both the data host and the person who is scraping. Search engines use the same mechanism as web scraping, but they are not blamed for data scraping because the data is used for the user's convenience.
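
To make the robots.txt mechanism concrete, the sketch below shows how a script can read such rules and a requested crawl delay with the Python standard library. The robots.txt content and the example.com URLs are hypothetical, and this is an illustration of the file format rather than the procedure followed in the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 5
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())  # parsed in memory, no network access

# Ask whether a generic user agent may fetch two hypothetical URLs.
listing_ok = rules.can_fetch("*", "https://example.com/listing/101")
search_ok = rules.can_fetch("*", "https://example.com/search?q=f-10")

# Delay in seconds requested by the site for this user agent, if any.
delay = rules.crawl_delay("*")

print(listing_ok, search_ok, delay)
```

A crawler can consult these values before each request, for example by sleeping for the requested delay between fetches.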

Google's search engine, for instance, has two parts. The first is Googlebot, a software bot that crawls billions of web pages on the internet and stores them in Google's data hosts. The second is an algorithm that answers user queries based on the crawled data and displays results to the user with the help of a ranking algorithm. On whether web scraping is legal, Michael Mahoney observes [ ] that legal action has been taken against airline price aggregators such as Orbitz, Kayak, and Expedia. Another example is Facebook, which has a history of suing third-party applications that have accessed and republished Facebook user data [23].

Another example is Craigslist [ ], which accused services like Padmapper and 3Taps of improperly gathering its information and reposting it as a map interface that plots the locations of user-generated ads. The author states there is "no direct legal protection for databases. However, data hosts can file a case against scraper if they can prove the scraper has harmed them in any way". One relevant case is Intel v. Hamidi, which ruled that server inconveniences do not constitute an actionable harm [ ]. Scraping may consume the bandwidth of websites and, in extreme cases, crash a website or server. In summary, on the legality of scraping [ ]: multiple instances of data hosts partnering with scrapers show that data hosts should seek ways to embrace scrapers that seek to improve their services. Further, scrapers should review their business model. If a data host considers a scraper parasitic, it can sue the scraper. Table 1 and Table 2 present the physical, geographical, and other features of the collected dataset.


Name of the attribute       Description                                  Data type
Area                        Living area in square feet                   Numeric
Bedrooms                    Number of bedrooms                           Numeric
Bathrooms                   Number of bathrooms                          Numeric
Dining Room                 Dining room? (yes/no)                        Binary
Drawing Room                Drawing room? (yes/no)                       Binary
Laundry Room                Laundry room? (yes/no)                       Binary
Lounge                      Lounge? (yes/no)                             Binary
Garden                      Garden? (yes/no)                             Binary
Flooring                    Flooring? (yes/no)                           Binary
Study Room                  Study room? (yes/no)                         Binary
Swimming Pool               Swimming pool? (yes/no)                      Binary
Central Air Conditioning    Central air conditioning system? (yes/no)    Binary
Build                       House build type? (old/new)                  Binary

Table 1. List of physical features selected for the dataset.

Name of the attribute       Description                                  Data type
Location                    Sector name of the location                  Nominal
Nearby Hospitals            Nearby hospital? (yes/no)                    Binary
Nearby Schools              Nearby school? (yes/no)                      Binary
Nearby Shopping Malls       Nearby shopping malls? (yes/no)              Binary
Maintenance Staff           Maintenance staff? (yes/no)                  Binary
Security Staff              Security staff? (yes/no)                     Binary
Nearby Airport              Nearby airport? (yes/no)                     Binary
View                        House view? (good/best/normal)               Nominal
Parking Spaces              Parking spaces? (yes/no)                     Binary
Price                       Price of the house in PKR                    Numeric

Table 2. List of geographic and environmental features.

3.2. Preprocessing

Data preprocessing transforms the raw dataset into a clean dataset suitable for building machine learning models. Preprocessing techniques are applied to data in raw format, which is not suitable for analysis. In our case, the data was collected from different property websites, where property agents entered it, so it contains missing values, data in various formats, and incorrect data. We performed data integration to combine the data from the various sectors of the capital into one integrated dataset. Data transformation methods were then applied to convert the records into a format suitable for machine learning analysis.

To perform iterative analysis on the data, we cleaned the dataset of missing and incorrect values. Data wrangling and data munging are similar terms used in the data science community for techniques that convert raw data into a format suitable for use. In our case, we converted textual values such as yes and no to binary variables. Locations, views, and other nominal variables were encoded as numbers.
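
The encoding step can be sketched as follows. The column names follow Tables 1 and 2, but the rows are invented, and the integer coding of nominal values (sorted order) is an assumption; the text does not state which coding scheme was used.

```python
# Hypothetical raw rows; column names follow Tables 1 and 2, values are invented.
raw_rows = [
    {"Garden": "yes", "Lounge": "no", "Location": "F-10", "View": "best"},
    {"Garden": "no", "Lounge": "yes", "Location": "G-11", "View": "normal"},
    {"Garden": "Yes", "Lounge": "no", "Location": "F-10", "View": "good"},
]

BINARY_COLUMNS = ["Garden", "Lounge"]
NOMINAL_COLUMNS = ["Location", "View"]


def encode_binary(value):
    """Map yes/no text to 1/0, ignoring case and surrounding spaces."""
    return 1 if value.strip().lower() == "yes" else 0


def build_codes(rows, column):
    """Assign an integer code to each distinct value, in sorted order
    (an assumed scheme, chosen only to make the codes reproducible)."""
    return {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}


codes = {col: build_codes(raw_rows, col) for col in NOMINAL_COLUMNS}

encoded_rows = []
for row in raw_rows:
    out = {col: encode_binary(row[col]) for col in BINARY_COLUMNS}
    out.update({col: codes[col][row[col]] for col in NOMINAL_COLUMNS})
    encoded_rows.append(out)

print(codes)
print(encoded_rows)
```

Integer codes impose an artificial ordering on nominal values such as sector names, which linear models may interpret as magnitude; one-hot encoding is a common alternative when that matters.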

We computed the binary variable Build from the house's year of construction. The asking prices were listed in various currencies and units, e.g., thousand, lakh, and crore. We converted them all to PKR, expressed in lakh. Machine learning algorithms such as neural networks perform best on values in the range 0 to 1, so we scaled the dataset values to between 0 and 1 using min-max scaling. Later, for performance evaluation, the values are scaled back to their original range. Equation 1 shows how values are scaled to between 0 and 1.

\[ x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{1} \]
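
The unit conversion and Equation 1 can be sketched together as below. The conversion factors are the standard ones (1 lakh = 100,000 and 1 crore = 100 lakh); the function names and the example prices are hypothetical, and currency conversion is left out.

```python
# Conversion factors to lakh (1 lakh = 100,000; 1 crore = 100 lakh).
UNITS_IN_LAKH = {"thousand": 0.01, "lakh": 1.0, "crore": 100.0}


def to_lakh(amount, unit):
    """Convert an asking price given in thousand/lakh/crore PKR to lakh PKR."""
    return amount * UNITS_IN_LAKH[unit]


def min_max_scale(values):
    """Equation 1: map values linearly onto [0, 1]; also return min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for constant values")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def min_max_inverse(scaled, lo, hi):
    """Undo Equation 1 to recover values in the original range."""
    return [s * (hi - lo) + lo for s in scaled]


# Invented example prices in mixed units.
prices = [to_lakh(85, "lakh"), to_lakh(1.5, "crore"), to_lakh(500, "thousand")]
scaled, lo, hi = min_max_scale(prices)
restored = min_max_inverse(scaled, lo, hi)
print(prices, scaled, restored)
```

Keeping lo and hi from the training data is what allows predictions to be scaled back to prices for the performance evaluation.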


Figure 3 shows the correlation between the housing features. It is used to measure the strength of the relationship between housing features, i.e., Bedrooms, Bathrooms, Build, and Dining Room, and the price feature. The correlation coefficients of Bedrooms and Bathrooms with respect to price are higher than those of the rest of the features, which shows that price has a strong relationship with Bedrooms and Bathrooms, and hence these will contribute more than other features to house price prediction. It is clear from Figure 3 that all the features except Area, Central Air Conditioning, Location, and View have some relationship with the price feature. In this study, we used the Pearson correlation coefficient to measure the strength of the relationship between feature variables. It can be calculated using Equation 2.

\[ \rho = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \tag{2} \]
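
A minimal sketch of Equation 2 in plain Python follows. The toy bedroom counts and prices are invented and only illustrate the calculation; they are not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Equation 2: covariance of X and Y divided by the product of their
    standard deviations (the 1/n factors cancel, so they are omitted)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy data: bedrooms and asking price in lakh PKR.
bedrooms = [2, 3, 3, 4, 5]
price = [60, 85, 90, 130, 170]
print(round(pearson(bedrooms, price), 3))
```

The coefficient lies between -1 and 1; values near zero indicate little linear relationship with price, although a nonlinear relationship may still exist.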

Figure 3. Correlation between features

After preprocessing, we partition the dataset into training, validation, and testing subsets. Each of these subsets is further divided into the independent variables X and the dependent variable Y.
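
The partitioning can be sketched as below. The 70/15/15 split, the fixed seed, and the function name split_dataset are assumptions made for illustration; the text does not report the proportions used.

```python
import random


def split_dataset(rows, target, train_pct=70, val_pct=15, seed=0):
    """Shuffle rows and split them into train/validation/test parts,
    each returned as (X, y) with the target column separated out."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * train_pct // 100
    n_val = len(rows) * val_pct // 100
    parts = (
        rows[:n_train],
        rows[n_train:n_train + n_val],
        rows[n_train + n_val:],  # remaining rows form the test set
    )

    def to_xy(part):
        X = [{k: v for k, v in r.items() if k != target} for r in part]
        y = [r[target] for r in part]
        return X, y

    return tuple(to_xy(p) for p in parts)


# Invented rows; the split ratio is an assumption, not the paper's.
data = [{"Bedrooms": b, "Bathrooms": b - 1, "Price": 40 * b} for b in range(1, 21)]
(train_X, train_y), (val_X, val_y), (test_X, test_y) = split_dataset(data, "Price")
print(len(train_X), len(val_X), len(test_X))
```

Shuffling before splitting avoids a test set made up of only the most recently scraped listings; a fixed seed keeps the partition reproducible.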

3.3. Analysis procedure

This study developed various regression models, including intelligent machine learning-based models, and applied them to our dataset. The development environment used for the regression models is Anaconda Spyder. We now discuss the ten machine learning regression procedures applied to our dataset.

3.3.1. Machine Learning Regression Methods

Linear regression (LR) [ ] is widely used because it is simple and easy to understand. It is one of the most basic and popular algorithms in machine learning. In this study, we build a multivariate LR model to predict housing prices. The LR model finds the best-fitting line (a hyperplane in the multivariate case) for the training set and then predicts the unseen house prices of the test set. We also applied Support Vector Regression (SVR) [ ] to the same housing dataset. SVR differs slightly from the well-known Support Vector Machine (SVM): SVM is used for classification, whereas SVR is used for regression. In SVM, a hyperplane separates the classes. In SVR, the hyperplane is used to predict a continuous value, here the house price. Other concepts, i.e., the boundary lines and support vectors, are shared by SVM and SVR.
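
For the linear case, the difference between the two models can be written compactly. The notation below (weights $w$, bias $b$, tube width $\varepsilon$) is introduced here for illustration and is not taken from the paper: both models predict with the same linear function, LR penalizes every squared residual, and SVR ignores residuals smaller than $\varepsilon$.

```latex
\hat{y}_i = w^{\top} x_i + b
\qquad
L_{\mathrm{LR}} = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
\qquad
L_{\varepsilon}\bigl(y_i, \hat{y}_i\bigr) = \max\bigl(0,\; |y_i - \hat{y}_i| - \varepsilon\bigr)
```

Training points whose residual reaches or exceeds $\varepsilon$ are the ones that act as support vectors in SVR.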

We also modeled the housing price prediction problem with a probabilistic machine learning model called Bayesian Ridge Regression (BRR) [ ]. We assume the house prices to be Gaussian distributed around the prediction made from the independent housing features. The main advantages of BRR for house price prediction and other regression problems are, first, that it adapts to the data at hand and, second, that regularization parameters can be included in the estimation procedure.
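
One common formulation of Bayesian ridge regression, given here as an assumption because the text does not state the priors used, places a Gaussian likelihood on the prices and a zero-mean Gaussian prior on the weights. The symbols $w$, $\alpha$ (noise precision), and $\lambda$ (weight precision) are notation introduced for this sketch.

```latex
p(y \mid X, w, \alpha) = \mathcal{N}\bigl(y \mid Xw,\; \alpha^{-1} I\bigr),
\qquad
p(w \mid \lambda) = \mathcal{N}\bigl(w \mid 0,\; \lambda^{-1} I\bigr)
```

Under this formulation the prior on $w$ plays the role of the ridge penalty, and $\alpha$ and $\lambda$ are estimated from the data rather than fixed by hand.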

LassoLars regression [ ] is a simple technique for reducing model complexity and preventing the over-fitting that can result from simple linear regression. Lasso regression helps both in reducing over-fitting and in feature selection. As with ridge regression, the regularization parameter can be tuned for better estimation of housing prices. Elastic Net [ ] emerged from a critique of lasso regression, whose variable selection can depend too strongly on the data and thus be unstable. The solution is to combine the penalties of ridge regression and lasso to get the best of both worlds. Elastic Net then minimizes the resulting penalized loss function.
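
The combined penalty can be written as the following objective. This is one widely used parameterization, assumed here for illustration; $\alpha$ (overall penalty strength) and $\rho$ (mixing weight between the two penalties) are notation introduced for this sketch.

```latex
\min_{w}\; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \rho \lVert w \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
```

Setting $\rho = 1$ leaves only the lasso ($\ell_1$) penalty, and $\rho = 0$ leaves only the ridge ($\ell_2$) penalty.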

Gradient boosting regression (GBR) [ ] is a machine learning technique that builds a prediction model for regression problems such as house price prediction in the form of an ensemble of weak prediction models. GBR repeatedly fits new weak models to the patterns in the residuals, so that a model with weak predictions is strengthened step by step. The aim is to minimize the loss function so that the test loss reaches its minimum. Random Forest (RF) [ ] is an ensemble technique capable of performing both regression and classification tasks using multiple decision trees and a technique called bootstrap aggregation, commonly known as bagging.

Stochastic gradient descent (SGD) [ ] is a variant of gradient descent. It is an iterative method for optimizing an objective function and is mostly used as a black-box optimizer. SGD can be regarded as a stochastic approximation of gradient descent optimization. Passive-Aggressive algorithms [ ] are a family of online learning algorithms. We use the Passive-Aggressive regression (PAR) model for the house price prediction problem. The idea is simple, and its house price estimates compare well with those of many alternative methods. The Theil-Sen estimator is a method for simple linear regression that chooses the median of the slopes of all lines through pairs of points.
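
The Theil-Sen idea for a single feature can be sketched in a few lines. The data is invented, and the intercept rule (median of y - slope * x) is one common choice assumed here; the experiments in this study apply the estimator to multiple features through a library implementation, which this sketch does not reproduce.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Fit y = slope * x + intercept with the Theil-Sen estimator."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # pairs with equal x have no defined slope
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    # Assumed intercept rule: median of the offsets y - slope * x.
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Invented data on the line y = 2x + 1, with one gross outlier at x = 5.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, 9, 100, 13]
slope, intercept = theil_sen(xs, ys)
print(slope, intercept)
```

Because the median ignores the few extreme slopes produced by the outlier, the fitted line stays on the bulk of the points, which is useful for listings with mistyped prices.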


4. Results

This section presents the experimental results of the machine learning models used in the study for house price prediction. Figure 4 compares the original listing price of each house with the price predicted by the various machine learning algorithms.

[Figure 4 panels: (a) Bayesian Regression; (b) Linear Regression; (e) ElasticNet Regression; (f) Gradient Boosting Regression; (g) Passive Aggressive Regression; (h) Theil-Sen Regression]

Figure 4. Predictive analysis of house price prediction with ML models.


In each subfigure, the listing price is shown as a dashed blue line, whereas the price predicted by the machine learning algorithm is shown as a solid orange line. The horizontal axis of each chart represents the housing property instance, and the vertical axis represents the price.

4.1. Performance metrics

The performance metrics used to evaluate the regression models are MAPE (mean absolute percentage error), RMSE (root mean squared error), and MAE (mean absolute error). Table 3 compares the performance of the regression methods on the prepared housing dataset.

4.1.1. Mean absolute percentage error

This performance measure computes the average relative deviation of the predicted house prices from the actual listing prices. MAPE is calculated by dividing each absolute difference between the actual price and the price predicted by the machine learning algorithm by the actual price, summing these ratios, dividing by the total number of price values n, and multiplying by 100%. Here y_t is the actual price of item t and e_t is the difference between its actual and predicted price.

\[ \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right| \tag{3} \]

4.1.2. Root Mean Squared Error

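MSE squares the errors, which inflates them and makes the actual error magnitude difficult to interpret. The RMSE measure resolves this problem by taking the square root of MSE.

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{d_i - f_i}{\sigma_i} \right)^{2}} \tag{4} \]

Here d_i is the listing price and f_i the predicted price of item i, and σ_i is a scaling term for item i; with σ_i = 1 the expression is the square root of MSE.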

4.1.3. Mean absolute error

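Mean absolute error measures the difference between two continuous variables. In our case, these variables are the listing price and the predicted price of the house property. Here e_t is the difference between the listing price and the predicted price of item t, and n is the number of items.

\[ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |e_t| \tag{5} \]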
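
The three measures of Equations 3 to 5 can be sketched in plain Python as follows. The prices are invented, and the RMSE function takes every σ_i in Equation 4 as 1, which is an assumption made for this sketch.

```python
from math import sqrt


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (Equation 3).
    Undefined when any actual value is zero."""
    return 100.0 / len(actual) * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    )


def rmse(actual, predicted):
    """Root mean squared error; Equation 4 with every sigma_i taken as 1."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error (Equation 5)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented listing and predicted prices in lakh PKR.
listing = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mape(listing, predicted), rmse(listing, predicted), mae(listing, predicted))
```

RMSE weights large errors more heavily than MAE, so the two can rank models differently when a few listings are predicted very badly.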

Method        MAPE          MAE           RMSE
LR            5627.9369     10928.2603    16658.4158
BRR           7383.9969     10930.6388    16661.3350
SVR           1918.4957     8595.6057     18209.5558
SGDR          10698.1442    13139.1928    17345.1444
ElasticNet    7388.1547     10927.7181    16658.2267
GBR           5267.4830     9563.4324     16772.3870
LassoLars     7382.6600     10938.5807    16670.3489
RF            7371.0746     10902.9762    17105.2596
PAR           2133.8370     8621.9391     18069.2298
Theil-Sen     6031.6336     10151.4884    16754.2930

Table 3. Comparison of regression methods performance
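
Since lower values are better for all three measures, the best method per measure can be read off Table 3 mechanically. The short sketch below does this with the values copied from the table; the variable names are introduced here for illustration.

```python
# (MAPE, MAE, RMSE) values copied from Table 3.
TABLE_3 = {
    "LR": (5627.9369, 10928.2603, 16658.4158),
    "BRR": (7383.9969, 10930.6388, 16661.3350),
    "SVR": (1918.4957, 8595.6057, 18209.5558),
    "SGDR": (10698.1442, 13139.1928, 17345.1444),
    "ElasticNet": (7388.1547, 10927.7181, 16658.2267),
    "GBR": (5267.4830, 9563.4324, 16772.3870),
    "LassoLars": (7382.6600, 10938.5807, 16670.3489),
    "RF": (7371.0746, 10902.9762, 17105.2596),
    "PAR": (2133.8370, 8621.9391, 18069.2298),
    "Theil-Sen": (6031.6336, 10151.4884, 16754.2930),
}
METRICS = ("MAPE", "MAE", "RMSE")

# For each measure, pick the method with the smallest value.
best = {
    name: min(TABLE_3, key=lambda method, i=i: TABLE_3[method][i])
    for i, name in enumerate(METRICS)
}
print(best)
```

On these numbers, SVR has the lowest MAPE and MAE with PAR close behind, while ElasticNet has the lowest RMSE, narrowly ahead of LR.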


5. Conclusions

In this study, we explored eleven machine learning algorithms for developing housing price prediction models that estimate future house prices in the capital, Islamabad. One of our contributions is collecting housing data and developing the first scientific housing dataset for the Pakistan housing market. Machine learning algorithms such as Passive-Aggressive regression, Support Vector Regression, and a deep learning network can estimate prices very close to the listing price. The results show that SVR performs better than the rest of the machine learning algorithms, with the lowest MAPE and MAE in Table 3. We compared the performance of various machine learning regression models to find the best model for housing price prediction. To the best of our knowledge, no machine learning or other house price forecasting tools are currently in use in the Pakistan real estate market. We strongly believe that machine learning house price prediction models will help those who work in the real estate market, as well as potential buyers, make good house purchasing decisions. In the future, this work can serve as a base for several types of studies, including real estate market analysis, stock price prediction, and oil and petroleum price forecasting. The textual tabular dataset can also be combined with visual features of the houses, such as images of their interiors and exteriors, to build a more robust, novel house price prediction model. Lastly, the housing market can be influenced by macro-economic variables such as the price of gold, the stock price index, property tax, and the appraised value of a property; considering these can help develop models that estimate house prices more accurately.

Acknowledgments:

We are thankful to Dr. Muhammad Muzammal, Associate Professor, Bahria University, for his supervision and valuable suggestions during this research study.

