{"id":46645,"date":"2021-05-11T00:00:00","date_gmt":"2021-05-11T07:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/bank-loan-classification\/"},"modified":"2025-11-13T12:55:21","modified_gmt":"2025-11-13T20:55:21","slug":"bank-loan-classification","status":"publish","type":"post","link":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/","title":{"rendered":"Bank Loan Classification"},"content":{"rendered":"<h2>Introduction<\/h2>\n<p>Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. In this post, we will cover the following:<\/p>\n<p><a href=\"#adding\">1&#46; Storing the data in GridDB<\/a><br \/>\n<a href=\"#extracting\">2&#46; Extracting the data from GridDB<\/a><br \/>\n<a href=\"#building\">3&#46; Building a Logistic Regression Model using Pandas<\/a><br \/>\n<a href=\"#evaluating\">4&#46; Evaluating our model using a confusion matrix and heat map<\/a><\/p>\n<p>We will begin by installing the prerequisites and setting up our environment. We will be using <a href=\"https:\/\/github.com\/griddb\/python_client\">GridDB&#8217;s Python connector<\/a> for this tutorial, as the primary language used for model building is Python. GridDB has an extensive range of connecting <a href=\"https:\/\/github.com\/griddb\/griddb#client-and-connector\">libraries and APIs<\/a>, which make it easier to work with both SQL and NoSQL interfaces.<\/p>\n<h2>Environment and Prerequisites<\/h2>\n<p>The following tutorial is carried out on the Ubuntu operating system (v. 18.04) with gcc version 7.5.0, GridDB (v. 4.5.2), and Jupyter Notebooks (Anaconda Navigator). As the code is in Python 3, it can be executed on any device or code editor that supports the language. 
In case you are new to Python and Machine Learning, the following links will help you with the installation:<\/p>\n<ol>\n<li><a href=\"https:\/\/www.python.org\/downloads\/\">Installing Python<\/a><\/li>\n<li><a href=\"https:\/\/docs.anaconda.com\/anaconda\/install\/\">Installing Anaconda<\/a><\/li>\n<\/ol>\n<p>GridDB can be installed using the comprehensive guide available on <a href=\"https:\/\/github.com\/griddb\/griddb\">Github<\/a>. Prerequisites for GridDB&#8217;s <a href=\"https:\/\/github.com\/griddb\/python_client\">python-connector<\/a> include:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/griddb\/c_client\">GridDB C-client<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/griddb\/python_client#preparations\">SWIG<\/a><\/li>\n<li>pcre<\/li>\n<\/ol>\n<p>A tutorial on the Python-client installation can be found <a href=\"https:\/\/griddb.net\/en\/blog\/python-client\/\">here<\/a>.<\/p>\n<h2>Dataset<\/h2>\n<p>For this tutorial, we will be working with a Bank Loan Classification dataset which is publicly available on <a href=\"https:\/\/www.kaggle.com\/sriharipramod\/bank-loan-classification\">Kaggle<\/a>. There are a total of 5000 instances in the dataset, along with 14 attributes. The attributes capture customer information such as income, age, and experience. The response variable, in this case, is the &#8216;Personal Loan&#8217; variable, which is binary in nature. A label of 0 implies a rejection of the loan application, whereas 1 conveys an acceptance.<\/p>\n<p>The objective is to classify an instance into one of those two categories &#8211; 0 or 1 (rejected or accepted) &#8211; based on the rest of the explanatory variables. This is a two-class classification task that can be accomplished using a Logistic Regression Model. 
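<\/p>
<p>As a primer, logistic regression turns a weighted sum of the explanatory variables into a probability for the positive class via the sigmoid function, and predicts 1 when that probability crosses 0.5. The sketch below is a plain-Python illustration with made-up weights, not the fitted model we build later:<\/p>

```python
import math

def sigmoid(z):
    # squash any real-valued score into the (0, 1) probability range
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    # weighted sum of the features plus an intercept, then sigmoid
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# hypothetical scaled feature values and weights, for illustration only
p = predict_proba([0.5, -1.2, 2.0], [0.8, 0.1, 1.5], bias=-2.0)
label = 1 if p >= 0.5 else 0  # accept the loan application when p >= 0.5
```

<p>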
So let&#8217;s dive right in!<\/p>\n<h2>Set the necessary paths<\/h2>\n<p>The following commands need to be executed in your terminal (Ubuntu, or whichever operating system you are using) after you have installed <a href=\"https:\/\/github.com\/griddb\/griddb\">GridDB<\/a> and the prerequisites mentioned above.<\/p>\n<pre><code>export CPATH=$CPATH:&lt;Python header file directory path&gt;\n\nexport PYTHONPATH=$PYTHONPATH:&lt;installed directory path&gt;\n\nexport LIBRARY_PATH=$LIBRARY_PATH:&lt;C client library file directory path&gt;\n<\/code><\/pre>\n<h2>Importing necessary libraries<\/h2>\n<p>The first step is, of course, importing the necessary libraries. The libraries we will be using for this tutorial are as follows:<\/p>\n<ol>\n<li><a href=\"https:\/\/numpy.org\/\">NumPy<\/a><\/li>\n<li><a href=\"https:\/\/pandas.pydata.org\/\">Pandas<\/a><\/li>\n<li><a href=\"https:\/\/matplotlib.org\/\">Matplotlib<\/a><\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/\">scikit-learn<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/griddb\/python_client\">griddb_python<\/a><\/li>\n<\/ol>\n<p>Note that you might encounter an installation error if you&#8217;re using these libraries for the first time. Try <code>pip install package-name<\/code> or <code>conda install package-name<\/code>.<\/p>\n<p>Once the installation of these libraries is complete, we can import the <code>griddb_python<\/code> library. You could run the following command individually in the Python console or create a .py file at your convenience. If the installation is successful, the import command should not return any error.<\/p>\n<pre><code>import griddb_python\n<\/code><\/pre>\n<p><strong>Troubleshooting<\/strong>: If the build is not successful, run the <code>make<\/code> command again after setting up the paths. 
You could also <a href=\"https:\/\/github.com\/griddb\/python_client#how-to-run-sample\">run a sample program<\/a> to figure out the specifics.<\/p>\n<h2><span id=\"adding\">Adding data to GridDB<\/span><\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">import griddb_python as griddb\nimport pandas as pd\n\nfactory = griddb.StoreFactory.get_instance()\n\n# Container Initialization\ntry:\n    gridstore = factory.get_store(host=your_host, port=your_port, \n            cluster_name=your_cluster_name, username=your_username, \n            password=your_password)\n\n    conInfo = griddb.ContainerInfo(\"Dataset_Name\",\n                    [[\"attribute1\", griddb.Type.INTEGER],[\"attribute2\",griddb.Type.FLOAT],\n                    ....],\n                    griddb.ContainerType.COLLECTION, True)\n    \n    cont = gridstore.put_container(conInfo)\n    cont.create_index(\"id\", griddb.IndexType.DEFAULT)\n    \n    dataset = pd.read_csv(\"BankLoanClassification.csv\")\n    \n    # Adding data to the container\n    for i in range(len(dataset)):\n        ret = cont.put(dataset.iloc[i, :])\n    print(\"Data has been added\")\n\nexcept griddb.GSException as e:\n    for i in range(e.get_error_stack_size()):\n        print(\"[\", i, \"]\")\n        print(e.get_error_code(i))\n        print(e.get_location(i))\n        print(e.get_message(i))<\/code><\/pre>\n<\/div>\n<p><strong>Things to note<\/strong>:<\/p>\n<ol>\n<li>Replace the parameters of the <code>factory.get_store()<\/code> function with your cluster credentials. Alternatively, <a href=\"https:\/\/github.com\/griddb\/griddb#start-a-server-1\">see<\/a> how to create a new cluster.<\/li>\n<li>Pass the relevant attribute names, dataset name, and data types to the <code>griddb.ContainerInfo()<\/code> function.<\/li>\n<\/ol>\n<h2><span id=\"extracting\">Accessing data from GridDB using SQL<\/span><\/h2>\n<p>The <code>griddb_python<\/code> library allows the user to access the data via SQL in Python. This helps us use both languages to our advantage. 
We can now access what is stored in the database simply by passing a query within our Python file, like so:<\/p>\n<pre><code>statement = ('SELECT * FROM Dataset_Name')\nsql_query = pd.read_sql_query(statement, cont)\n<\/code><\/pre>\n<p>Note that the <code>pd.read_sql_query()<\/code> function converts the result of the SQL query into a pandas dataframe. You can then work directly on the dataframe and build your machine learning model. Alternatively, you can import a csv file directly, like so:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">import os\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom sklearn.metrics import confusion_matrix, accuracy_score\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.linear_model import LogisticRegression\n<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">APP_PATH = os.getcwd()\nAPP_PATH\n<\/code><\/pre>\n<\/div>\n<pre><code>'C:\Users\SHRIPRIYA\Desktop\GridDB'\n<\/code><\/pre>\n<h2><span id=\"building\">Loading our dataset<\/span><\/h2>\n<p>Note that if you have loaded your dataset using GridDB&#8217;s <code>python_client<\/code>, you can skip this step as it is redundant.<\/p>\n<p>The <code>APP_PATH<\/code> variable contains the current working directory, which is joined with the file name in the command below. To avoid any complications, make sure that the Python file and the csv file are in the same directory. 
Otherwise, provide the full path of the dataset file to avoid a <code>FileNotFoundError<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset = pd.read_csv(os.path.join(APP_PATH, 'UniversalBank.csv'))<\/code><\/pre>\n<\/div>\n<h2>Exploring the dataset<\/h2>\n<p>We will run a few simple commands to get an overview of our dataset before building a model. Preprocessing and analyzing your data is good practice and helps ensure a more robust and effective model. The <code>head<\/code> command returns the first 5 rows of the dataset. In case you want to display more, pass a number as an argument to the <code>head<\/code> command. For instance, <code>dataset.head(20)<\/code> would return the first 20 rows, whereas <code>dataset.head(-5)<\/code> would return all rows except the last 5.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset.head()\n<\/code><\/pre>\n<\/div>\n<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          ID\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Experience\n        <\/th>\n<th>\n          Income\n        <\/th>\n<th>\n          ZIP Code\n        <\/th>\n<th>\n          Family\n        <\/th>\n<th>\n          CCAvg\n        <\/th>\n<th>\n          Education\n        <\/th>\n<th>\n          Mortgage\n        <\/th>\n<th>\n          Personal Loan\n        <\/th>\n<th>\n          Securities Account\n        <\/th>\n<th>\n          CD Account\n        <\/th>\n<th>\n          Online\n        <\/th>\n<th>\n          CreditCard\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          1\n        
<\/td>\n<td>\n          25\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          49\n        <\/td>\n<td>\n          91107\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          1.6\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          2\n        <\/td>\n<td>\n          45\n        <\/td>\n<td>\n          19\n        <\/td>\n<td>\n          34\n        <\/td>\n<td>\n          90089\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          1.5\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          3\n        <\/td>\n<td>\n          39\n        <\/td>\n<td>\n          15\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          94720\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          4\n        <\/td>\n<td>\n          35\n        <\/td>\n<td>\n          9\n        <\/td>\n<td>\n          100\n        <\/td>\n<td>\n          94112\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          2.7\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n      
  <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          5\n        <\/td>\n<td>\n          35\n        <\/td>\n<td>\n          8\n        <\/td>\n<td>\n          45\n        <\/td>\n<td>\n          91330\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>Let&#8217;s check out the total length of the dataset which would later be split into testing and training.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">len(dataset)<\/code><\/pre>\n<\/div>\n<pre><code>5000\n<\/code><\/pre>\n<p>The <code>dataset.dtypes<\/code> command tells us what kind of datatype each attribute has. This step is essential as we will have to scale numeric variables and in cases of text data, we will need to create dummy variables or use an encoding scheme. Therefore, it is a good idea to get a look at our data so that we could plan these steps prior to model building<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset.dtypes<\/code><\/pre>\n<\/div>\n<pre><code>ID                      int64\nAge                     int64\nExperience              int64\nIncome                  int64\nZIP Code                int64\nFamily                  int64\nCCAvg                 float64\nEducation               int64\nMortgage                int64\nPersonal Loan           int64\nSecurities Account      int64\nCD Account              int64\nOnline                  int64\nCreditCard              int64\ndtype: object\n<\/code><\/pre>\n<p>It is now time to check for missing values in the dataset. 
The <code>isnull()<\/code> function flags missing values; chained as <code>dataset.isnull().values.any()<\/code>, it returns <code>True<\/code> if there is at least one null value in the dataset. Null values are either deleted or replaced by a predetermined value before the dataset is passed on for training. Fortunately, as there are no null values in our case, we will simply move forward. However, to get a total count of null values per attribute, you can type <code>dataset[key].isnull().sum()<\/code>, where <code>key<\/code> is the attribute name you are interested in. The <code>dataset.dropna()<\/code> command will drop any row containing at least one null value.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset.isnull().values.any()<\/code><\/pre>\n<\/div>\n<pre><code>False\n<\/code><\/pre>\n<p>We will be dropping the columns <code>ID<\/code> and <code>ZIP Code<\/code> as they play little to no role in predicting a loan application outcome.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset.drop([\"ID\",\"ZIP Code\"],axis=1,inplace=True)<\/code><\/pre>\n<\/div>\n<p>Displaying the columns to verify the previous step:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset.keys()<\/code><\/pre>\n<\/div>\n<pre><code>Index(['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education',\n       'Mortgage', 'Personal Loan', 'Securities Account', 'CD Account',\n       'Online', 'CreditCard'],\n      dtype='object')\n<\/code><\/pre>\n<p>It is important to convert the categorical data into dummy variables before moving on to creating a model. This makes these attributes easier for the model to interpret. 
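<\/p>
<p>To see where the dummy columns come from, the drop-first encoding can be imitated in plain Python: each distinct level except the first gets its own 0\/1 indicator column. This is an illustrative sketch, not what pandas <code>get_dummies<\/code> does internally:<\/p>

```python
def drop_first_dummies(values):
    # distinct levels in sorted order; the first becomes the implicit baseline
    levels = sorted(set(values))
    kept = levels[1:]
    # one 0/1 indicator column per remaining level
    return {level: [1 if v == level else 0 for v in values] for level in kept}

family = [4, 3, 1, 1, 4, 2]           # a variable with 4 distinct levels...
dummies = drop_first_dummies(family)  # ...yields 3 indicator columns: 2, 3, 4
```

<p>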
We will leave out the numerical data, as those columns are taken as-is by the model.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">cat_cols = [\"Family\",\"Education\",\"Personal Loan\",\"Securities Account\",\"CD Account\",\"Online\",\"CreditCard\"]\ndataset = pd.get_dummies(dataset,columns=cat_cols,drop_first=True)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset.keys()<\/code><\/pre>\n<\/div>\n<pre><code>Index(['Age', 'Experience', 'Income', 'CCAvg', 'Mortgage', 'Family_2',\n       'Family_3', 'Family_4', 'Education_2', 'Education_3', 'Personal Loan_1',\n       'Securities Account_1', 'CD Account_1', 'Online_1', 'CreditCard_1'],\n      dtype='object')\n<\/code><\/pre>\n<p>We can see that 3 dummy columns are created for the <code>Family<\/code> attribute, and similarly for <code>Education<\/code>, <code>Personal Loan<\/code>, etc. The number of columns depends on the number of levels a particular attribute has. To put it simply, with <code>drop_first=True<\/code> a variable with 4 levels &#8211; 1, 2, 3, 4 &#8211; produces 3 dummy columns, since the first level serves as the baseline. That is why <code>Family<\/code> yields <code>Family_2<\/code>, <code>Family_3<\/code>, and <code>Family_4<\/code>.<\/p>\n<p>Now our data is ready to be split into X and y &#8211; the independent variables and the response variable. 
<code>y<\/code> will contain the labels, which in our case come from the attribute <code>Personal Loan_1<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">X = dataset.copy().drop(\"Personal Loan_1\",axis=1)\ny = dataset[\"Personal Loan_1\"]<\/code><\/pre>\n<\/div>\n<p>Let&#8217;s see how many labels we have of each category:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">dataset[\"Personal Loan_1\"].value_counts()<\/code><\/pre>\n<\/div>\n<pre><code>0    4520\n1     480\nName: Personal Loan_1, dtype: int64\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">sns.countplot(x ='Personal Loan_1', data=dataset, palette='hls')\nplt.show()<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/05\/output_50_0.png\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/05\/output_50_0.png\" alt=\"\" width=\"395\" height=\"263\" class=\"aligncenter size-full wp-image-27465\" srcset=\"\/wp-content\/uploads\/2021\/05\/output_50_0.png 395w, \/wp-content\/uploads\/2021\/05\/output_50_0-300x200.png 300w\" sizes=\"(max-width: 395px) 100vw, 395px\" \/><\/a><\/p>\n<h2>Splitting the dataset into Test and Train<\/h2>\n<p>Let us now split our dataset into training and testing sets. We have used an 80-20 ratio in our case. You could also use a 66-33 configuration, which is also common. 
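<\/p>
<p>Before training, it is worth noting what the 4520 to 480 class counts above imply: a model that always predicts the majority class would already score about 90% accuracy, so any classifier should be judged against that baseline. A quick check in plain Python:<\/p>

```python
from collections import Counter

# class counts taken from the value_counts() output above
counts = Counter({0: 4520, 1: 480})

# a trivial classifier that always predicts the majority class
majority_label, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / sum(counts.values())  # 4520 / 5000 = 0.904
```

<p>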
However, the resulting 1000 test instances are enough for a reliable evaluation in this case.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">trainx, testx, trainy, testy = train_test_split(X, y, test_size=0.20)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">print(trainx.shape)\nprint(testx.shape)\nprint(trainy.shape)\nprint(testy.shape)<\/code><\/pre>\n<\/div>\n<pre><code>(4000, 14)\n(1000, 14)\n(4000,)\n(1000,)\n<\/code><\/pre>\n<h2>Scaling the features<\/h2>\n<p>Data normalization or standardization is often performed when dealing with numeric data, and ensures that each attribute lies within the same range. Standardization means rescaling each feature so that it has a mean of 0 and a standard deviation of 1. We will be using scikit-learn&#8217;s <code>StandardScaler()<\/code> for this purpose.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">scaler = StandardScaler()\nscaler.fit(trainx.iloc[:,:5])\n\ntrainx.iloc[:,:5] = scaler.transform(trainx.iloc[:,:5])\ntestx.iloc[:,:5] = scaler.transform(testx.iloc[:,:5])<\/code><\/pre>\n<\/div>\n<pre><code>C:\Users\SHRIPRIYA\anaconda3\lib\site-packages\pandas\core\indexing.py:966: SettingWithCopyWarning: \nA value is trying to be set on a copy of a slice from a DataFrame.\nTry using .loc[row_indexer,col_indexer] = value instead\n\nSee the caveats in the documentation: https:\/\/pandas.pydata.org\/pandas-docs\/stable\/user_guide\/indexing.html#returning-a-view-versus-a-copy\n  self.obj[item] = s\nC:\Users\SHRIPRIYA\anaconda3\lib\site-packages\pandas\core\indexing.py:966: SettingWithCopyWarning: \nA value is trying to be set on a copy of a slice from a DataFrame.\nTry using .loc[row_indexer,col_indexer] = value instead\n\nSee the caveats in the documentation: https:\/\/pandas.pydata.org\/pandas-docs\/stable\/user_guide\/indexing.html#returning-a-view-versus-a-copy\n  self.obj[item] = s\n<\/code><\/pre>\n<h2>Building our Model<\/h2>\n<p>Now that our dataset is done with 
preprocessing, splitting, and standardizing, it is time to pass it to the classification model. The training set and its output labels are passed to the <code>model.fit<\/code> function for model building.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">X = trainx\ny = trainy\n\nmodel = LogisticRegression()\nmodel.fit(X , y)\npredicted_classes = model.predict(X)\naccuracy = accuracy_score(y,predicted_classes)\nparameters = model.coef_<\/code><\/pre>\n<\/div>\n<h2><span id=\"evaluating\">Model Evaluation<\/span><\/h2>\n<h3>Accuracy and Coefficients<\/h3>\n<p>The <code>accuracy<\/code> here is the training accuracy, and <code>parameters<\/code> holds the coefficients of the fitted model.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">print(accuracy)\nprint(parameters)\nprint(model)<\/code><\/pre>\n<\/div>\n<pre><code>0.9615\n[[-0.11052202  0.21417872  2.70013875  0.30873518  0.04262124 -0.19654911\n   1.68376406  1.39637464  3.44973088  3.7034019  -0.56957366  3.35123915\n  -0.64710272 -0.72719671]]\nLogisticRegression()\n<\/code><\/pre>\n<p>Let us now pass the test dataset to evaluate the accuracy of our model on unseen data of 1000 instances. Note that the model is only used for prediction here; refitting it on the test set would invalidate the evaluation.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">predicted_classes_test = model.predict(testx)\naccuracy = accuracy_score(testy,predicted_classes_test)\nprint(accuracy)<\/code><\/pre>\n<\/div>\n<pre><code>0.962\n<\/code><\/pre>\n<h3>Confusion Matrix and Heat Map<\/h3>\n<p>A confusion matrix shows 4 categories: 1. <strong>True Negative:<\/strong> Zeros predicted correctly (Actual and Predicted &#8211; 0) 2. <strong>False Negative:<\/strong> Ones wrongly predicted as zeros (Actual &#8211; 1, Predicted &#8211; 0) 3. <strong>False Positive:<\/strong> Zeros wrongly predicted as ones (Actual &#8211; 0, Predicted &#8211; 1) 4. 
<strong>True Positive:<\/strong> Ones predicted correctly (Actual and Predicted &#8211; 1)<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">cm = confusion_matrix(testy,predicted_classes_test)\nfig, ax = plt.subplots(figsize=(6, 6))\nax.imshow(cm)\nax.grid(False)\nax.xaxis.set(ticks=(0, 1), ticklabels=('Predicted 0s', 'Predicted 1s'))\nax.yaxis.set(ticks=(0, 1), ticklabels=('Actual 0s', 'Actual 1s'))\nax.set_ylim(1.5, -0.5)\nfor i in range(2):\n    for j in range(2):\n        ax.text(j, i, cm[i, j], ha='center', va='center', color='red')\nplt.show()<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/05\/output_69_0.png\"><img decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/05\/output_69_0.png\" alt=\"\" width=\"393\" height=\"357\" class=\"aligncenter size-full wp-image-27466\" srcset=\"\/wp-content\/uploads\/2021\/05\/output_69_0.png 393w, \/wp-content\/uploads\/2021\/05\/output_69_0-300x273.png 300w\" sizes=\"(max-width: 393px) 100vw, 393px\" \/><\/a><\/p>\n<h2>Conclusion and Future Scope<\/h2>\n<p>Our Bank Loan Classification model achieved an accuracy of about <code>96%<\/code> on a dataset of 5000 instances, which is decent. The confusion matrix also revealed the specifics of the false negatives and false positives. Other models that may yield better accuracy include Naive Bayes, the KNN classifier, Decision Trees, and SVMs. Explore more on the dataset home page <a href=\"https:\/\/www.kaggle.com\/sriharipramod\/bank-loan-classification\">here<\/a>.<\/p>\n<p>In this tutorial, we saw how we can insert our data into GridDB and access it using the <code>python-client<\/code>. We also used a simple SQL query to get the data from GridDB into a pandas dataframe. This ability makes GridDB versatile and thus a popular choice for storing time-series data. There is a lot more you can do with GridDB. 
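<\/p>
<p>As a closing note on evaluation, the four confusion-matrix cells can also be turned into precision and recall, which are more informative than raw accuracy on an imbalanced dataset like this one. A small plain-Python helper with hypothetical cell counts (not the values from the plot above):<\/p>

```python
def precision_recall(tn, fp, fn, tp):
    # precision: of the applications predicted as accepted, how many truly were
    precision = tp / (tp + fp)
    # recall: of the truly accepted applications, how many the model caught
    recall = tp / (tp + fn)
    return precision, recall

# hypothetical confusion-matrix cells, for illustration only
prec, rec = precision_recall(tn=900, fp=10, fn=25, tp=65)
```

<p>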
Check out our <a href=\"https:\/\/griddb.net\/en\/blog\/\">online community<\/a> today!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. In this post, we will cover the following: 1&#46; Storing the data in GridDB 2&#46; Extracting the data from GridDB 3&#46; Building a Logistic Regression Model using Pandas 4&#46; Evaluating our model using heat map and [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":27479,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46645","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Bank Loan Classification | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"Introduction Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. In this post, we will cover the\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Bank Loan Classification | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"Introduction Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. 
In this post, we will cover the\" \/>\n<meta property=\"og:url\" content=\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2021-05-11T07:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T20:55:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1707\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"griddb-admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"griddb-admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/\"},\"author\":{\"name\":\"griddb-admin\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\"},\"headline\":\"Bank Loan Classification\",\"datePublished\":\"2021-05-11T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/\"},\"wordCount\":1457,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/\",\"url\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/\",\"name\":\"Bank Loan Classification | GridDB: Open Source Time Series Database for 
IoT\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg\",\"datePublished\":\"2021-05-11T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:21+00:00\",\"description\":\"Introduction Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. In this post, we will cover the\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage\",\"url\":\"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg\",\"contentUrl\":\"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg\",\"width\":2560,\"height\":1707},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/griddb.net\/en\/#website\",\"url\":\"https:\/\/griddb.net\/en\/\",\"name\":\"GridDB: Open Source Time Series Database for IoT\",\"description\":\"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL\",\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/griddb.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/griddb.net\/en\/#organization\",\"name\":\"Fixstars\",\"url\":\"https:\/\/griddb.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"contentUrl\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"width\":200,\"height\":83,\"caption\":\"Fixstars\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/griddbcommunity\/\",\"https:\/\/x.com\/GridDBCommunity\",\"https:\/\/www.linkedin.com\/company\/griddb-by-toshiba\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\",\"name\":\"griddb-admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"caption\":\"griddb-admin\"},\"url\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/author\/griddb-admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Bank Loan Classification | GridDB: Open Source Time Series Database for IoT","description":"Introduction Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. In this post, we will cover the","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/","og_locale":"en_US","og_type":"article","og_title":"Bank Loan Classification | GridDB: Open Source Time Series Database for IoT","og_description":"Introduction Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. In this post, we will cover the","og_url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/","og_site_name":"GridDB: Open Source Time Series Database for IoT","article_publisher":"https:\/\/www.facebook.com\/griddbcommunity\/","article_published_time":"2021-05-11T07:00:00+00:00","article_modified_time":"2025-11-13T20:55:21+00:00","og_image":[{"width":2560,"height":1707,"url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg","type":"image\/jpeg"}],"author":"griddb-admin","twitter_card":"summary_large_image","twitter_creator":"@GridDBCommunity","twitter_site":"@GridDBCommunity","twitter_misc":{"Written by":"griddb-admin","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#article","isPartOf":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/"},"author":{"name":"griddb-admin","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233"},"headline":"Bank Loan Classification","datePublished":"2021-05-11T07:00:00+00:00","dateModified":"2025-11-13T20:55:21+00:00","mainEntityOfPage":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/"},"wordCount":1457,"commentCount":0,"publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"image":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg","articleSection":["Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/","url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/","name":"Bank Loan Classification | GridDB: Open Source Time Series Database for 
IoT","isPartOf":{"@id":"https:\/\/griddb.net\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage"},"image":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg","datePublished":"2021-05-11T07:00:00+00:00","dateModified":"2025-11-13T20:55:21+00:00","description":"Introduction Today, we will be building a Bank Loan Classification model from scratch using the data stored in GridDB. In this post, we will cover the","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/bank-loan-classification\/#primaryimage","url":"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg","contentUrl":"\/wp-content\/uploads\/2021\/05\/accounting_2560x1707.jpeg","width":2560,"height":1707},{"@type":"WebSite","@id":"https:\/\/griddb.net\/en\/#website","url":"https:\/\/griddb.net\/en\/","name":"GridDB: Open Source Time Series Database for IoT","description":"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL","publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/griddb.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/griddb.net\/en\/#organization","name":"Fixstars","url":"https:\/\/griddb.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","contentUrl":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","width":200,"height":83,"caption":"Fixstars"},"image":{"@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/griddbcommunity\/","https:\/\/x.com\/GridDBCommunity","https:\/\/www.linkedin.com\/company\/griddb-by-toshiba"]},{"@type":"Person","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233","name":"griddb-admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","caption":"griddb-admin"},"url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/author\/griddb-admin\/"}]}},"_links":{"self":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46645","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/griddb-linux-hte8hndjf8
cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/comments?post=46645"}],"version-history":[{"count":1,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46645\/revisions"}],"predecessor-version":[{"id":51320,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46645\/revisions\/51320"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/media\/27479"}],"wp:attachment":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/media?parent=46645"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/categories?post=46645"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/tags?post=46645"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}