{"id":46673,"date":"2021-11-12T00:00:00","date_gmt":"2021-11-12T08:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/exploring-customers-personalities-using-python\/"},"modified":"2025-11-13T12:55:39","modified_gmt":"2025-11-13T20:55:39","slug":"exploring-customers-personalities-using-python","status":"publish","type":"post","link":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/exploring-customers-personalities-using-python\/","title":{"rendered":"Exploring customers&#8217; personalities using Python"},"content":{"rendered":"<p>In this tutorial, we will be exploring a dataset containing customers&#8217; information such as Marital Status, Education, Income, etc. Getting an insight into the kinds of customers a firm deals with helps to provide better and customized services. Therefore, customer analysis and segmentation play a crucial role in a business.<\/p>\n<p>The outline for this tutorial is as follows:<\/p>\n<ol>\n<li>About the Dataset<\/li>\n<li>Importing the necessary libraries<\/li>\n<li>Exploratory Data Analysis<\/li>\n<li>Plotting the Correlation<\/li>\n<li>Encoding the data<\/li>\n<li>Scaling the data<\/li>\n<li>Dimensionality reduction<\/li>\n<li>CLustering<\/li>\n<li>Visualizing the clusters<\/li>\n<li>Observation<\/li>\n<li>Conclusion<\/li>\n<\/ol>\n<h2>1&#46; About the Dataset<\/h2>\n<p>The dataset used for this tutorial is publicly available on <a href=\"https:\/\/www.kaggle.com\/karnikakapoor\/customer-segmentation-clustering\/data\">Kaggle<\/a>. The total number of instances (or rows) are 2240 whereas the total number of attributes (or columns) is 29. As mentioned above, each attribute corresponds to a person trait important for customer classification such as Marital Status, Income, Customer ID, etc.<\/p>\n<p>Go ahead and download the dataset! 
We will next be importing libraries to begin our analysis.<\/p>\n<h2>2&#46; Importing the necessary libraries<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">import numpy as np\nimport pandas as pd\nimport datetime\nimport matplotlib\nimport matplotlib.pyplot as plt\nfrom matplotlib import colors\nimport seaborn as sns\nfrom sklearn.preprocessing import StandardScaler\nfrom sklearn.cluster import AgglomerativeClustering\nfrom mpl_toolkits.mplot3d import Axes3D\nfrom matplotlib.colors import ListedColormap\nfrom sklearn import metrics\nfrom sklearn.preprocessing import LabelEncoder\nfrom sklearn.decomposition import PCA<\/code><\/pre>\n<\/div>\n<p>In case you run into a package installation error, you can install the missing package by typing <code>pip install package-name<\/code> in the command line. Alternatively, if you&#8217;re using a conda virtual environment, you can type <code>conda install package-name<\/code>.<\/p>\n<h2>3&#46; Loading our dataset and Exploratory Data Analysis (EDA)<\/h2>\n<p>Let&#8217;s go ahead and load our CSV file.<\/p>\n<h3>Using GridDB<\/h3>\n<p>For large amounts of data, a CSV file can become cumbersome and chaotic. <a href=\"https:\/\/griddb.net\/en\/\">GridDB<\/a> serves as a great alternative, as it is an open-source, highly scalable database. It is optimized for IoT and Big Data, so you can store your time-series data effectively. It utilizes an in-memory data architecture along with parallel processing for higher performance. 
You can <a href=\"https:\/\/griddb.net\/en\/downloads\/\">download GridDB<\/a> from the official website.<\/p>\n<p>By using the <a href=\"https:\/\/github.com\/griddb\/python_client\">GridDB-Python<\/a> client, we can load our data directly into the Python environment as a dataframe.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import griddb_python as griddb\nimport pandas as pd\n\nsql_statement = ('SELECT * FROM marketing_campaign')\ndata = pd.read_sql_query(sql_statement, cont)<\/code><\/pre>\n<\/div>\n<p>Note that the <code>cont<\/code> variable holds the container information where our data is stored. Replace <code>marketing_campaign<\/code> with the name of your container. You can also refer to the <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">tutorial on reading and writing to GridDB<\/a> for more information.<\/p>\n<h3>Using Pandas<\/h3>\n<p>If you do not have GridDB, you can load the file using Pandas. Note that the file is tab-separated, so we pass <code>sep=\"\\t\"<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">data = pd.read_csv('marketing_campaign.csv', sep=\"\\t\")<\/code><\/pre>\n<\/div>\n<p>We&#8217;ll display the first five rows using the <code>head<\/code> command to get a gist of our data.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow: auto;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          ID\n        <\/th>\n<th>\n          Year_Birth\n        <\/th>\n<th>\n          Education\n        <\/th>\n<th>\n         
 Marital_Status\n        <\/th>\n<th>\n          Income\n        <\/th>\n<th>\n          Kidhome\n        <\/th>\n<th>\n          Teenhome\n        <\/th>\n<th>\n          Dt_Customer\n        <\/th>\n<th>\n          Recency\n        <\/th>\n<th>\n          MntWines\n        <\/th>\n<th>\n          &#8230;\n        <\/th>\n<th>\n          NumWebVisitsMonth\n        <\/th>\n<th>\n          AcceptedCmp3\n        <\/th>\n<th>\n          AcceptedCmp4\n        <\/th>\n<th>\n          AcceptedCmp5\n        <\/th>\n<th>\n          AcceptedCmp1\n        <\/th>\n<th>\n          AcceptedCmp2\n        <\/th>\n<th>\n          Complain\n        <\/th>\n<th>\n          Z_CostContact\n        <\/th>\n<th>\n          Z_Revenue\n        <\/th>\n<th>\n          Response\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          5524\n        <\/td>\n<td>\n          1957\n        <\/td>\n<td>\n          Graduation\n        <\/td>\n<td>\n          Single\n        <\/td>\n<td>\n          58138.0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          04-09-2012\n        <\/td>\n<td>\n          58\n        <\/td>\n<td>\n          635\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          7\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          2174\n        <\/td>\n<td>\n          1954\n        <\/td>\n<td>\n          Graduation\n        <\/td>\n<td>\n          Single\n        <\/td>\n<td>\n          46344.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          08-03-2014\n        <\/td>\n<td>\n 
         38\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          5\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          4141\n        <\/td>\n<td>\n          1965\n        <\/td>\n<td>\n          Graduation\n        <\/td>\n<td>\n          Together\n        <\/td>\n<td>\n          71613.0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          21-08-2013\n        <\/td>\n<td>\n          26\n        <\/td>\n<td>\n          426\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          6182\n        <\/td>\n<td>\n          1984\n        <\/td>\n<td>\n          Graduation\n        <\/td>\n<td>\n          Together\n        <\/td>\n<td>\n          26646.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          10-02-2014\n        <\/td>\n<td>\n          26\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          6\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n    
     <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          5324\n        <\/td>\n<td>\n          1981\n        <\/td>\n<td>\n          PhD\n        <\/td>\n<td>\n          Married\n        <\/td>\n<td>\n          58293.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          19-01-2014\n        <\/td>\n<td>\n          94\n        <\/td>\n<td>\n          173\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          5\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\n    5 rows &#215; 29 columns\n  <\/p>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">len(data)<\/code><\/pre>\n<\/div>\n<pre><code>2240\n<\/code><\/pre>\n<p>As mentioned earlier, we have a total of 2240 instances. Let&#8217;s eliminate any rows with missing values, since they can behave abnormally during mathematical operations.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data = data.dropna()\nlen(data)<\/code><\/pre>\n<\/div>\n<pre><code>2216\n<\/code><\/pre>\n<p>We had 24 missing instances in our dataset. 
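If you want to see where those gaps were before dropping them, `isna().sum()` gives a per-column count of missing values. Here is a minimal sketch of the pattern on a small toy DataFrame (not the actual Kaggle file):

```python
import pandas as pd
import numpy as np

# Toy frame mimicking the dataset's structure: Income has missing entries
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [58138.0, np.nan, 71613.0, np.nan],
    "Kidhome": [0, 1, 0, 1],
})

# Count missing values per column before deciding to drop anything
print(df.isna().sum())   # Income shows 2 missing values

# dropna() removes every row containing at least one NaN
clean = df.dropna()
print(len(clean))        # 2 rows remain
```

Inspecting the counts first tells you whether dropping rows is cheap (a handful of gaps, as here) or whether imputation would be the safer route.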
Let&#8217;s also convert the <code>Dt_Customer<\/code> attribute to the DateTime format, as it will later be used to calculate how long a customer has been active. Since the dates are stored in day-first form (e.g. 04-09-2012), we pass an explicit format string so that Pandas does not misread them as month-first.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data[\"Dt_Customer\"] = pd.to_datetime(data[\"Dt_Customer\"], format=\"%d-%m-%Y\")\ndates = []\nfor i in data[\"Dt_Customer\"]:\n    i = i.date()\n    dates.append(i) <\/code><\/pre>\n<\/div>\n<p>We will be creating a new attribute &#8211; <code>Customer_dur<\/code> &#8211; to capture the duration for which a customer has been associated with the firm. For simplicity, we take the enrollment date of the most recent customer and measure every other customer&#8217;s duration relative to it. Note that <code>pd.to_numeric<\/code> converts the resulting timedeltas to nanoseconds, which is why <code>Customer_dur<\/code> will show very large values in the summary statistics later on.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">days = []\nd1 = max(dates) \nfor i in dates:\n    delta = d1 - i\n    days.append(delta)\ndata[\"Customer_dur\"] = days\ndata[\"Customer_dur\"] = pd.to_numeric(data[\"Customer_dur\"], errors=\"coerce\")<\/code><\/pre>\n<\/div>\n<p>Great! Now let&#8217;s have a look at the next attribute &#8211; <code>Marital_Status<\/code>. From the <code>head<\/code> command, it seems like there are more than two categories.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data[\"Marital_Status\"].value_counts()<\/code><\/pre>\n<\/div>\n<pre><code>Married     857\nTogether    573\nSingle      471\nDivorced    232\nWidow        76\nAlone         3\nAbsurd        2\nYOLO          2\nName: Marital_Status, dtype: int64\n<\/code><\/pre>\n<p>This attribute has 8 distinct categories. However, we will group them together to simplify our segmentation process. 
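The grouping can be done with `Series.replace`, which maps each old category onto a new one. A minimal sketch on a toy series (the values below are illustrative, not rows from the dataset):

```python
import pandas as pd

# Toy marital-status series using the same eight categories as the dataset
status = pd.Series(["Married", "YOLO", "Single", "Together", "Absurd"])

# Map the eight original categories onto two broader groups
grouped = status.replace({
    "Married": "Partner", "Together": "Partner",
    "Single": "Alone", "Divorced": "Alone", "Widow": "Alone",
    "Alone": "Alone", "Absurd": "Alone", "YOLO": "Alone",
})

print(grouped.tolist())  # ['Partner', 'Alone', 'Alone', 'Partner', 'Alone']
```

Any category missing from the mapping dictionary would pass through unchanged, so it is worth listing all eight explicitly.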
We will be creating a binary attribute &#8211; <code>Living_With<\/code> &#8211; taking one of two values: <code>Partner<\/code> or <code>Alone<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data[\"Living_With\"]=data[\"Marital_Status\"].replace({\"Married\":\"Partner\", \"Together\":\"Partner\", \"Absurd\":\"Alone\", \"Widow\":\"Alone\", \"YOLO\":\"Alone\", \"Divorced\":\"Alone\", \"Single\":\"Alone\"})<\/code><\/pre>\n<\/div>\n<p>Now that we have this derived feature, <code>Marital_Status<\/code> is redundant, so let&#8217;s go ahead and drop it. We&#8217;ll also drop some columns that contain promotion and deal information, as they are not relevant for our customer segmentation.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">cols_del = ['Marital_Status', 'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1','AcceptedCmp2', 'Complain', 'Response']\ndata = data.drop(cols_del, axis=1)<\/code><\/pre>\n<\/div>\n<p>Let&#8217;s have a look at how many categories the <code>Education<\/code> attribute holds.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data[\"Education\"].value_counts()<\/code><\/pre>\n<\/div>\n<pre><code>Graduation    1116\nPhD            481\nMaster         365\n2n Cycle       200\nBasic           54\nName: Education, dtype: int64\n<\/code><\/pre>\n<p>We will repeat the same grouping step with the <code>Education<\/code> attribute. In this case, the categories will be <code>Undergraduate, Graduate, Postgraduate<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data[\"Education\"]=data[\"Education\"].replace({\"Basic\":\"Undergraduate\",\"2n Cycle\":\"Undergraduate\", \"Graduation\":\"Graduate\", \"Master\":\"Postgraduate\", \"PhD\":\"Postgraduate\"})<\/code><\/pre>\n<\/div>\n<p>We will now derive some new features from the original ones to simplify calculations in the later sections. 
These features are as follows:<\/p>\n<ol>\n<li><code>Age<\/code>: derived from <code>Year_Birth<\/code>.<\/li>\n<li><code>Spent<\/code>: the total amount spent across all product categories (wines, fruits, meat, fish, sweets, and gold).<\/li>\n<li><code>Children<\/code>: sum of <code>Kidhome<\/code> and <code>Teenhome<\/code>.<\/li>\n<li><code>Family_Size<\/code>: derived from the <code>Living_With<\/code> attribute (1 for Alone, 2 for Partner) plus the <code>Children<\/code> attribute.<\/li>\n<li><code>Is_Parent<\/code>: a binary attribute that is 1 if the customer has at least one child and 0 otherwise.<\/li>\n<\/ol>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data[\"Age\"] = 2021-data[\"Year_Birth\"]\ndata[\"Spent\"] = data[\"MntWines\"]+ data[\"MntFruits\"]+ data[\"MntMeatProducts\"]+ data[\"MntFishProducts\"]+ data[\"MntSweetProducts\"]+ data[\"MntGoldProds\"]\ndata[\"Children\"]=data[\"Kidhome\"]+data[\"Teenhome\"]\ndata[\"Family_Size\"] = data[\"Living_With\"].replace({\"Alone\": 1, \"Partner\":2})+ data[\"Children\"]\ndata[\"Is_Parent\"] = np.where(data.Children> 0, 1, 0)<\/code><\/pre>\n<\/div>\n<p>As before, we&#8217;ll drop the attributes that are now redundant.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">to_drop = [\"Dt_Customer\", \"Z_CostContact\", \"Z_Revenue\", \"Year_Birth\", \"ID\"]\ndata = data.drop(to_drop, axis=1)<\/code><\/pre>\n<\/div>\n<p>After all the changes, our data looks like this:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data.describe()<\/code><\/pre>\n<\/div>\n<div style=\"overflow: auto;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Income\n        <\/th>\n<th>\n          Kidhome\n        
<\/th>\n<th>\n          Teenhome\n        <\/th>\n<th>\n          Recency\n        <\/th>\n<th>\n          MntWines\n        <\/th>\n<th>\n          MntFruits\n        <\/th>\n<th>\n          MntMeatProducts\n        <\/th>\n<th>\n          MntFishProducts\n        <\/th>\n<th>\n          MntSweetProducts\n        <\/th>\n<th>\n          MntGoldProds\n        <\/th>\n<th>\n          &#8230;\n        <\/th>\n<th>\n          NumWebPurchases\n        <\/th>\n<th>\n          NumCatalogPurchases\n        <\/th>\n<th>\n          NumStorePurchases\n        <\/th>\n<th>\n          NumWebVisitsMonth\n        <\/th>\n<th>\n          Customer_dur\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Spent\n        <\/th>\n<th>\n          Children\n        <\/th>\n<th>\n          Family_Size\n        <\/th>\n<th>\n          Is_Parent\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          count\n        <\/th>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2.216000e+03\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<td>\n          2216.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          mean\n        <\/th>\n<td>\n          52247.251354\n        <\/td>\n<td>\n          
0.441787\n        <\/td>\n<td>\n          0.505415\n        <\/td>\n<td>\n          49.012635\n        <\/td>\n<td>\n          305.091606\n        <\/td>\n<td>\n          26.356047\n        <\/td>\n<td>\n          166.995939\n        <\/td>\n<td>\n          37.637635\n        <\/td>\n<td>\n          27.028881\n        <\/td>\n<td>\n          43.965253\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          4.085289\n        <\/td>\n<td>\n          2.671029\n        <\/td>\n<td>\n          5.800993\n        <\/td>\n<td>\n          5.319043\n        <\/td>\n<td>\n          4.423735e+16\n        <\/td>\n<td>\n          52.179603\n        <\/td>\n<td>\n          607.075361\n        <\/td>\n<td>\n          0.947202\n        <\/td>\n<td>\n          2.592509\n        <\/td>\n<td>\n          0.714350\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          std\n        <\/th>\n<td>\n          25173.076661\n        <\/td>\n<td>\n          0.536896\n        <\/td>\n<td>\n          0.544181\n        <\/td>\n<td>\n          28.948352\n        <\/td>\n<td>\n          337.327920\n        <\/td>\n<td>\n          39.793917\n        <\/td>\n<td>\n          224.283273\n        <\/td>\n<td>\n          54.752082\n        <\/td>\n<td>\n          41.072046\n        <\/td>\n<td>\n          51.815414\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          2.740951\n        <\/td>\n<td>\n          2.926734\n        <\/td>\n<td>\n          3.250785\n        <\/td>\n<td>\n          2.425359\n        <\/td>\n<td>\n          2.008532e+16\n        <\/td>\n<td>\n          11.985554\n        <\/td>\n<td>\n          602.900476\n        <\/td>\n<td>\n          0.749062\n        <\/td>\n<td>\n          0.905722\n        <\/td>\n<td>\n          0.451825\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          min\n        <\/th>\n<td>\n          1730.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n   
     <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000e+00\n        <\/td>\n<td>\n          25.000000\n        <\/td>\n<td>\n          5.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          25%\n        <\/th>\n<td>\n          35303.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          24.000000\n        <\/td>\n<td>\n          24.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          16.000000\n        <\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          9.000000\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          2.937600e+16\n        <\/td>\n<td>\n          44.000000\n        <\/td>\n<td>\n          69.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          50%\n        <\/th>\n<td>\n          51381.500000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          49.000000\n        <\/td>\n<td>\n          174.500000\n        <\/td>\n<td>\n          8.000000\n        <\/td>\n<td>\n        
  68.000000\n        <\/td>\n<td>\n          12.000000\n        <\/td>\n<td>\n          8.000000\n        <\/td>\n<td>\n          24.500000\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          4.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          5.000000\n        <\/td>\n<td>\n          6.000000\n        <\/td>\n<td>\n          4.432320e+16\n        <\/td>\n<td>\n          51.000000\n        <\/td>\n<td>\n          396.500000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          75%\n        <\/th>\n<td>\n          68522.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          74.000000\n        <\/td>\n<td>\n          505.000000\n        <\/td>\n<td>\n          33.000000\n        <\/td>\n<td>\n          232.250000\n        <\/td>\n<td>\n          50.000000\n        <\/td>\n<td>\n          33.000000\n        <\/td>\n<td>\n          56.000000\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          6.000000\n        <\/td>\n<td>\n          4.000000\n        <\/td>\n<td>\n          8.000000\n        <\/td>\n<td>\n          7.000000\n        <\/td>\n<td>\n          5.927040e+16\n        <\/td>\n<td>\n          62.000000\n        <\/td>\n<td>\n          1048.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          max\n        <\/th>\n<td>\n          666666.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          99.000000\n        <\/td>\n<td>\n          1493.000000\n        <\/td>\n<td>\n          199.000000\n        <\/td>\n<td>\n          1725.000000\n        <\/td>\n<td>\n          259.000000\n        <\/td>\n<td>\n          
262.000000\n        <\/td>\n<td>\n          321.000000\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          27.000000\n        <\/td>\n<td>\n          28.000000\n        <\/td>\n<td>\n          13.000000\n        <\/td>\n<td>\n          20.000000\n        <\/td>\n<td>\n          9.184320e+16\n        <\/td>\n<td>\n          128.000000\n        <\/td>\n<td>\n          2525.000000\n        <\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          5.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\n    8 rows &#215; 21 columns\n  <\/p>\n<\/div>\n<h2>4&#46; Plotting the correlation<\/h2>\n<p>We will now plot the newly created features pair-wise, using <code>Is_Parent<\/code> as the hue for classification.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">To_Plot = [ \"Income\", \"Age\", \"Spent\", \"Is_Parent\"]\nsns.pairplot(data[To_Plot], hue= \"Is_Parent\") \nplt.show()<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_38_1-1.png\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_38_1-1.png\" alt=\"\" width=\"621\" height=\"550\" class=\"aligncenter size-full wp-image-27872\" srcset=\"\/wp-content\/uploads\/2021\/11\/output_38_1-1.png 621w, \/wp-content\/uploads\/2021\/11\/output_38_1-1-300x266.png 300w, \/wp-content\/uploads\/2021\/11\/output_38_1-1-600x531.png 600w\" sizes=\"(max-width: 621px) 100vw, 621px\" \/><\/a><\/p>\n<p>We can see that we have a few outliers:<\/p>\n<ol>\n<li>\n<p><code>Age&gt;100<\/code> is highly unlikely and suggests that some records are outdated. Since the vast majority of customers are younger than 90, we&#8217;ll eliminate the few instances above that age.<\/p>\n<\/li>\n<li>\n<p><code>Income&gt;600000<\/code> occurs in only one instance.<\/p>\n<\/li>\n<\/ol>\n<p>We&#8217;ll go ahead and delete these outliers.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data = data[(data[\"Age\"]&lt;90)]\ndata = data[(data[\"Income\"]&lt;600000)]<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">len(data)<\/code><\/pre>\n<\/div>\n<pre><code>2212\n<\/code><\/pre>\n<h2>5&#46; Encoding the data<\/h2>\n<p>Categorical variables need to be encoded before a machine learning task, since string values cannot be used directly in mathematical operations. Let&#8217;s print out the categorical attributes in our dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">s = (data.dtypes == 'object')\nobject_cols = list(s[s].index)\n\nobject_cols<\/code><\/pre>\n<\/div>\n<pre><code>['Education', 'Living_With']\n<\/code><\/pre>\n<p>We will use scikit-learn&#8217;s <code>LabelEncoder<\/code>, which encodes each category as an integer from 0 to n-1. In our case, <code>Education<\/code> has three categories, so the values will be denoted by <code>0, 1, 2<\/code> respectively. 
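To see the 0-to-(n-1) mapping concretely: `LabelEncoder` sorts the classes (alphabetically, for strings) before assigning codes. A minimal sketch with our three grouped Education categories:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# Fit and transform a small sample of the grouped Education categories
codes = le.fit_transform(["Graduate", "Postgraduate", "Undergraduate", "Graduate"])

# classes_ holds the learned (sorted) categories; their positions are the codes
print(list(le.classes_))   # ['Graduate', 'Postgraduate', 'Undergraduate']
print(codes.tolist())      # [0, 1, 2, 0]
```

Because the mapping depends on alphabetical order rather than any natural ordering of education levels, treat the resulting integers as labels, not magnitudes.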
Similarly, <code>Living_With<\/code> is a binary attribute, therefore, the values will take the form of <code>0 or 1<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">LE=LabelEncoder()\nfor i in object_cols:\n    data[i]=data[[i]].apply(LE.fit_transform)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">data.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow: auto;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Education\n        <\/th>\n<th>\n          Income\n        <\/th>\n<th>\n          Kidhome\n        <\/th>\n<th>\n          Teenhome\n        <\/th>\n<th>\n          Recency\n        <\/th>\n<th>\n          MntWines\n        <\/th>\n<th>\n          MntFruits\n        <\/th>\n<th>\n          MntMeatProducts\n        <\/th>\n<th>\n          MntFishProducts\n        <\/th>\n<th>\n          MntSweetProducts\n        <\/th>\n<th>\n          &#8230;\n        <\/th>\n<th>\n          NumCatalogPurchases\n        <\/th>\n<th>\n          NumStorePurchases\n        <\/th>\n<th>\n          NumWebVisitsMonth\n        <\/th>\n<th>\n          Customer_dur\n        <\/th>\n<th>\n          Living_With\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Spent\n        <\/th>\n<th>\n          Children\n        <\/th>\n<th>\n          Family_Size\n        <\/th>\n<th>\n          Is_Parent\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          0\n        <\/td>\n<td>\n          58138.0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          58\n        <\/td>\n<td>\n          
635\n        <\/td>\n<td>\n          88\n        <\/td>\n<td>\n          546\n        <\/td>\n<td>\n          172\n        <\/td>\n<td>\n          88\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          10\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          7\n        <\/td>\n<td>\n          83894400000000000\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          64\n        <\/td>\n<td>\n          1617\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          0\n        <\/td>\n<td>\n          46344.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          38\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          6\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          5\n        <\/td>\n<td>\n          10800000000000000\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          67\n        <\/td>\n<td>\n          27\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          0\n        <\/td>\n<td>\n          71613.0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          26\n        <\/td>\n<td>\n          426\n        <\/td>\n<td>\n          49\n        <\/td>\n<td>\n          127\n        <\/td>\n<td>\n          111\n        <\/td>\n<td>\n          21\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          10\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          40780800000000000\n        
<\/td>\n<td>\n          1\n        <\/td>\n<td>\n          56\n        <\/td>\n<td>\n          776\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          0\n        <\/td>\n<td>\n          26646.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          26\n        <\/td>\n<td>\n          11\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          20\n        <\/td>\n<td>\n          10\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          6\n        <\/td>\n<td>\n          5616000000000000\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          37\n        <\/td>\n<td>\n          53\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          1\n        <\/td>\n<td>\n          58293.0\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          0\n        <\/td>\n<td>\n          94\n        <\/td>\n<td>\n          173\n        <\/td>\n<td>\n          43\n        <\/td>\n<td>\n          118\n        <\/td>\n<td>\n          46\n        <\/td>\n<td>\n          27\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          6\n        <\/td>\n<td>\n          5\n        <\/td>\n<td>\n          27734400000000000\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          40\n        <\/td>\n<td>\n          422\n        <\/td>\n<td>\n          1\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          1\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\n    5 rows \u00c3\u0097 23 columns\n  <\/p>\n<\/div>\n<h2>6&#46; Scaling the data<\/h2>\n<p>It is also 
important to note that these numerical values differ in scale. This could lead to a biased model that gives more significance to one attribute than to another. Therefore, it is important to map them onto a similar scale.<\/p>\n<p><code>StandardScaler<\/code> removes the mean and scales each feature to unit variance. More information on how the scaling is performed can be found <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.preprocessing.StandardScaler.html\">here<\/a>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">scaler = StandardScaler()\nscaler.fit(data)\nscaled_data = pd.DataFrame(scaler.transform(data), columns=data.columns)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">scaled_data.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow: auto;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Education\n        <\/th>\n<th>\n          Income\n        <\/th>\n<th>\n          Kidhome\n        <\/th>\n<th>\n          Teenhome\n        <\/th>\n<th>\n          Recency\n        <\/th>\n<th>\n          MntWines\n        <\/th>\n<th>\n          MntFruits\n        <\/th>\n<th>\n          MntMeatProducts\n        <\/th>\n<th>\n          MntFishProducts\n        <\/th>\n<th>\n          MntSweetProducts\n        <\/th>\n<th>\n          &#8230;\n        <\/th>\n<th>\n          NumCatalogPurchases\n        <\/th>\n<th>\n          NumStorePurchases\n        <\/th>\n<th>\n          NumWebVisitsMonth\n        <\/th>\n<th>\n          Customer_dur\n        <\/th>\n<th>\n          Living_With\n        <\/th>\n<th>\n          Age\n        <\/th>\n<th>\n          Spent\n        
<\/th>\n<th>\n          Children\n        <\/th>\n<th>\n          Family_Size\n        <\/th>\n<th>\n          Is_Parent\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          -0.893586\n        <\/td>\n<td>\n          0.287105\n        <\/td>\n<td>\n          -0.822754\n        <\/td>\n<td>\n          -0.929699\n        <\/td>\n<td>\n          0.310353\n        <\/td>\n<td>\n          0.977660\n        <\/td>\n<td>\n          1.552041\n        <\/td>\n<td>\n          1.690293\n        <\/td>\n<td>\n          2.453472\n        <\/td>\n<td>\n          1.483713\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          2.503607\n        <\/td>\n<td>\n          -0.555814\n        <\/td>\n<td>\n          0.692181\n        <\/td>\n<td>\n          1.973583\n        <\/td>\n<td>\n          -1.349603\n        <\/td>\n<td>\n          1.018352\n        <\/td>\n<td>\n          1.676245\n        <\/td>\n<td>\n          -1.264598\n        <\/td>\n<td>\n          -1.758359\n        <\/td>\n<td>\n          -1.581139\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          -0.893586\n        <\/td>\n<td>\n          -0.260882\n        <\/td>\n<td>\n          1.040021\n        <\/td>\n<td>\n          0.908097\n        <\/td>\n<td>\n          -0.380813\n        <\/td>\n<td>\n          -0.872618\n        <\/td>\n<td>\n          -0.637461\n        <\/td>\n<td>\n          -0.718230\n        <\/td>\n<td>\n          -0.651004\n        <\/td>\n<td>\n          -0.634019\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.571340\n        <\/td>\n<td>\n          -1.171160\n        <\/td>\n<td>\n          -0.132545\n        <\/td>\n<td>\n          -1.665144\n        <\/td>\n<td>\n          -1.349603\n        <\/td>\n<td>\n          1.274785\n        <\/td>\n<td>\n          -0.963297\n        <\/td>\n<td>\n          1.404572\n        <\/td>\n<td>\n          0.449070\n        
<\/td>\n<td>\n          0.632456\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          -0.893586\n        <\/td>\n<td>\n          0.913196\n        <\/td>\n<td>\n          -0.822754\n        <\/td>\n<td>\n          -0.929699\n        <\/td>\n<td>\n          -0.795514\n        <\/td>\n<td>\n          0.357935\n        <\/td>\n<td>\n          0.570540\n        <\/td>\n<td>\n          -0.178542\n        <\/td>\n<td>\n          1.339513\n        <\/td>\n<td>\n          -0.147184\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.229679\n        <\/td>\n<td>\n          1.290224\n        <\/td>\n<td>\n          -0.544908\n        <\/td>\n<td>\n          -0.172664\n        <\/td>\n<td>\n          0.740959\n        <\/td>\n<td>\n          0.334530\n        <\/td>\n<td>\n          0.280110\n        <\/td>\n<td>\n          -1.264598\n        <\/td>\n<td>\n          -0.654644\n        <\/td>\n<td>\n          -1.581139\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          -0.893586\n        <\/td>\n<td>\n          -1.176114\n        <\/td>\n<td>\n          1.040021\n        <\/td>\n<td>\n          -0.929699\n        <\/td>\n<td>\n          -0.795514\n        <\/td>\n<td>\n          -0.872618\n        <\/td>\n<td>\n          -0.561961\n        <\/td>\n<td>\n          -0.655787\n        <\/td>\n<td>\n          -0.504911\n        <\/td>\n<td>\n          -0.585335\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          -0.913000\n        <\/td>\n<td>\n          -0.555814\n        <\/td>\n<td>\n          0.279818\n        <\/td>\n<td>\n          -1.923210\n        <\/td>\n<td>\n          0.740959\n        <\/td>\n<td>\n          -1.289547\n        <\/td>\n<td>\n          -0.920135\n        <\/td>\n<td>\n          0.069987\n        <\/td>\n<td>\n          0.449070\n        <\/td>\n<td>\n          0.632456\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n         
 0.571657\n        <\/td>\n<td>\n          0.294307\n        <\/td>\n<td>\n          1.040021\n        <\/td>\n<td>\n          -0.929699\n        <\/td>\n<td>\n          1.554453\n        <\/td>\n<td>\n          -0.392257\n        <\/td>\n<td>\n          0.419540\n        <\/td>\n<td>\n          -0.218684\n        <\/td>\n<td>\n          0.152508\n        <\/td>\n<td>\n          -0.001133\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          0.111982\n        <\/td>\n<td>\n          0.059532\n        <\/td>\n<td>\n          -0.132545\n        <\/td>\n<td>\n          -0.822130\n        <\/td>\n<td>\n          0.740959\n        <\/td>\n<td>\n          -1.033114\n        <\/td>\n<td>\n          -0.307562\n        <\/td>\n<td>\n          0.069987\n        <\/td>\n<td>\n          0.449070\n        <\/td>\n<td>\n          0.632456\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\n    5 rows \u00d7 23 columns\n  <\/p>\n<\/div>\n<h2>7&#46; Dimensionality reduction<\/h2>\n<p>We now have the scaled data, but the number of columns is still large to deal with. A higher number of columns means higher dimensionality, which makes the data harder to work with. 
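Before deciding how many dimensions to keep, it helps to check how much information each dimension would retain. The following is a minimal, self-contained sketch on synthetic data (not part of the original notebook) of inspecting the cumulative explained variance with PCA:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">import numpy as np\nfrom sklearn.decomposition import PCA\n\nrng = np.random.default_rng(42)\n# synthetic data: 200 samples whose 10 features are driven by 3 latent factors\nbase = rng.normal(size=(200, 3))\nX = np.hstack([base, base @ rng.normal(size=(3, 7)) + 0.1 * rng.normal(size=(200, 7))])\n\npca = PCA().fit(X)\n# cumulative share of the total variance captured by the first n components\nprint(pca.explained_variance_ratio_.cumsum())<\/code><\/pre>\n<\/div>\n<p>In this synthetic example the first three components capture almost all of the variance, so keeping 3 dimensions loses very little information. 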
For the sake of simplicity, we will be reducing the total number of columns to 3 as some of them are already redundant.<\/p>\n<p>Principal Component Analysis (PCA) is one of the popular techniques used for dimensionality reduction as it minimizes information loss.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">pca = PCA(n_components=3)\npca.fit(scaled_data)\npca_data = pd.DataFrame(pca.transform(scaled_data), columns=([\"c1\",\"c2\", \"c3\"]))<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">pca_data.describe().T<\/code><\/pre>\n<\/div>\n<div style=\"overflow: auto;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          count\n        <\/th>\n<th>\n          mean\n        <\/th>\n<th>\n          std\n        <\/th>\n<th>\n          min\n        <\/th>\n<th>\n          25%\n        <\/th>\n<th>\n          50%\n        <\/th>\n<th>\n          75%\n        <\/th>\n<th>\n          max\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          c1\n        <\/th>\n<td>\n          2212.0\n        <\/td>\n<td>\n          -2.549698e-16\n        <\/td>\n<td>\n          2.878377\n        <\/td>\n<td>\n          -5.969394\n        <\/td>\n<td>\n          -2.538494\n        <\/td>\n<td>\n          -0.780421\n        <\/td>\n<td>\n          2.383290\n        <\/td>\n<td>\n          7.444304\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          c2\n        <\/th>\n<td>\n          2212.0\n        <\/td>\n<td>\n          -3.924929e-17\n        <\/td>\n<td>\n          1.706839\n        <\/td>\n<td>\n          -4.312236\n        <\/td>\n<td>\n          -1.328300\n        <\/td>\n<td>\n          
-0.158123\n        <\/td>\n<td>\n          1.242307\n        <\/td>\n<td>\n          6.142677\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          c3\n        <\/th>\n<td>\n          2212.0\n        <\/td>\n<td>\n          6.936384e-17\n        <\/td>\n<td>\n          1.221957\n        <\/td>\n<td>\n          -3.530349\n        <\/td>\n<td>\n          -0.828741\n        <\/td>\n<td>\n          -0.021947\n        <\/td>\n<td>\n          0.799472\n        <\/td>\n<td>\n          6.614546\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2>8&#46; Clustering<\/h2>\n<p>Now that our data has been cleaned and reduced to 3 dimensions, we can go ahead and divide it into clusters. For this tutorial, we will be using Agglomerative Clustering from the scikit-learn library. Agglomerative Clustering is a hierarchical, bottom-up method that starts with each point in its own cluster and repeatedly merges the closest pair of clusters according to a linkage criterion. More information can be found <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cluster.AgglomerativeClustering.html\">here<\/a>.<\/p>\n<p>The number of clusters is chosen as 4. However, for real-world models, there are approaches such as the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Elbow_method_(clustering)#:~:text=In%20cluster%20analysis%2C%20the%20elbow,number%20of%20clusters%20to%20use.\">Elbow method<\/a> to estimate the ideal number of clusters for a sample of data.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">ac = AgglomerativeClustering(n_clusters=4)\ncustomer_ac = ac.fit_predict(pca_data)<\/code><\/pre>\n<\/div>\n<p>Let&#8217;s add the cluster labels to both dataframes.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">pca_data[\"Clusters\"] = customer_ac\ndata[\"Clusters\"] = customer_ac<\/code><\/pre>\n<\/div>\n<h2>9&#46; Visualizing the clusters<\/h2>\n<p>Great! 
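Before plotting, it is worth recalling the Elbow method mentioned above. As a quick, self-contained sketch (synthetic blobs and KMeans, since the idea is independent of the clustering algorithm used here), we can watch the within-cluster sum of squares flatten once the true number of clusters is reached:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">import numpy as np\nfrom sklearn.cluster import KMeans\n\nrng = np.random.default_rng(0)\n# three well-separated synthetic blobs in 3-D\nX = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 3)) for c in (0, 5, 10)])\n\n# inertia (within-cluster sum of squares) for k = 1..7\ninertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_\n            for k in range(1, 8)]\nprint(inertias)  # the drop flattens sharply after k = 3: the \"elbow\"<\/code><\/pre>\n<\/div>\n<p>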
It is now time to plot our clusters and see how they look in 3D.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">x = pca_data[\"c1\"]\ny = pca_data[\"c2\"]\nz = pca_data[\"c3\"]\n\nfig = plt.figure(figsize=(8,8))\nax = plt.subplot(111, projection='3d')\nax.scatter(x, y, z, s=20, c=pca_data[\"Clusters\"])\nplt.show()<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_64_0.png\"><img decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_64_0.png\" alt=\"\" width=\"460\" height=\"449\" class=\"aligncenter size-full wp-image-27866\" srcset=\"\/wp-content\/uploads\/2021\/11\/output_64_0.png 460w, \/wp-content\/uploads\/2021\/11\/output_64_0-300x293.png 300w\" sizes=\"(max-width: 460px) 100vw, 460px\" \/><\/a><\/p>\n<p>We will now plot the <code>Spent<\/code> attribute against each of the other attributes we created earlier in this tutorial. This will give us insight into how the final results are affected by each attribute and what kind of customers belong to each cluster.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-py\">columns = [\"Customer_dur\", \"Age\", \"Family_Size\", \"Is_Parent\", \"Education\", \"Living_With\"]\nh = data[\"Clusters\"]\nfor col in columns:\n    plt.figure()\n    sns.jointplot(x=data[col], y=data[\"Spent\"], hue=h, kind=\"kde\")\n    plt.show()<\/code><\/pre>\n<\/div>\n<pre><code>C:\\Users\\Shripriya\\anaconda3\\lib\\site-packages\\seaborn\\distributions.py:434: UserWarning: The following kwargs were not used by contour: 'hue'\n  cset = contour_func(xx, yy, z, n_levels, **kwargs)\n\n\n\n&lt;Figure size 432x288 with 0 Axes&gt;\n<\/code><\/pre>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_2.png\"><img decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_2.png\" alt=\"\" width=\"434\" height=\"435\" class=\"aligncenter size-full wp-image-27867\" 
srcset=\"\/wp-content\/uploads\/2021\/11\/output_66_2.png 434w, \/wp-content\/uploads\/2021\/11\/output_66_2-300x300.png 300w, \/wp-content\/uploads\/2021\/11\/output_66_2-150x150.png 150w, \/wp-content\/uploads\/2021\/11\/output_66_2-230x230.png 230w, \/wp-content\/uploads\/2021\/11\/output_66_2-400x400.png 400w\" sizes=\"(max-width: 434px) 100vw, 434px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_5.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_5.png\" alt=\"\" width=\"434\" height=\"424\" class=\"aligncenter size-full wp-image-27868\" srcset=\"\/wp-content\/uploads\/2021\/11\/output_66_5.png 434w, \/wp-content\/uploads\/2021\/11\/output_66_5-300x293.png 300w\" sizes=\"(max-width: 434px) 100vw, 434px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_8.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_8.png\" alt=\"\" width=\"434\" height=\"424\" class=\"aligncenter size-full wp-image-27868\" \/><\/a><\/p>\n<p><a 
href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_11.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_11.png\" alt=\"\" width=\"434\" height=\"424\" class=\"aligncenter size-full wp-image-27868\" \/><\/a><\/p>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_14.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_14.png\" alt=\"\" width=\"434\" height=\"424\" class=\"aligncenter size-full wp-image-27868\" \/><\/a><\/p>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_17.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2021\/11\/output_66_17.png\" alt=\"\" width=\"434\" height=\"424\" class=\"aligncenter size-full wp-image-27868\" \/><\/a><\/p>\n<h2>10&#46; Observation<\/h2>\n<p>Some key insights from the above joint plots:<\/p>\n<ol>\n<li><strong>Cluster 0<\/strong>: Certainly a parent, 40-70 age group, family size of 2-4.<\/li>\n<li><strong>Cluster 1<\/strong>: Not a parent, 30-80 age group, family size of 1-2.<\/li>\n<li><strong>Cluster 2<\/strong>: The majority of customers are parents, all age groups from 20-80, family size of 1-3.<\/li>\n<li><strong>Cluster 3<\/strong>: Certainly a parent, 35-75 age group, family size of 2-5.<\/li>\n<\/ol>\n<p>The <code>Customer_dur<\/code> attribute spans 
all across the clusters and is not specific to any one of them, which is why its distribution is widely spread.<\/p>\n<p>Note: The seaborn library was not updated to its latest version on the system where this tutorial was executed, so some features such as the contour plots did not behave as expected. In an up-to-date environment, the graphs above will be much clearer, with distinct lines and cluster labels.<\/p>\n<h2>11&#46; Conclusion<\/h2>\n<p>In this tutorial, we explored customer profiles and saw how they can affect a firm&#8217;s business. We also segmented the customers using Agglomerative Clustering and identified some key features corresponding to each cluster. Finally, we introduced an alternative for storing large amounts of data in a highly efficient and scalable way &#8211; GridDB!<\/p>\n<p>This code was inspired by a notebook on <a href=\"https:\/\/www.kaggle.com\/karnikakapoor\/customer-segmentation-clustering\">Kaggle<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we will be exploring a dataset containing customers&#8217; information such as Marital Status, Education, Income, etc. Getting an insight into the kinds of customers a firm deals with helps to provide better and customized services. Therefore, customer analysis and segmentation play a crucial role in a business. 
The outline for this tutorial [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":27896,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46673","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Exploring customers&#039; personalities using Python | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"In this tutorial, we will be exploring a dataset containing customers&#039; information such as Marital Status, Education, Income, etc. Getting an insight into\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Exploring customers&#039; personalities using Python | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"In this tutorial, we will be exploring a dataset containing customers&#039; information such as Marital Status, Education, Income, etc. 
Getting an insight into\" \/>\n<meta property=\"og:url\" content=\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-12T08:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T20:55:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/wp-content\/uploads\/2021\/11\/pythoncustomer.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1160\" \/>\n\t<meta property=\"og:image:height\" content=\"653\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"griddb-admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"griddb-admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"11 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/\"},\"author\":{\"name\":\"griddb-admin\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\"},\"headline\":\"Exploring customers&#8217; personalities using Python\",\"datePublished\":\"2021-11-12T08:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/\"},\"wordCount\":1549,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/\",\"url\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/\",\"name\":\"Exploring customers' personalities using Python | GridDB: Open Source Time Series Database for 
IoT\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png\",\"datePublished\":\"2021-11-12T08:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:39+00:00\",\"description\":\"In this tutorial, we will be exploring a dataset containing customers' information such as Marital Status, Education, Income, etc. Getting an insight into\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage\",\"url\":\"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png\",\"contentUrl\":\"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png\",\"width\":1160,\"height\":653},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/griddb.net\/en\/#website\",\"url\":\"https:\/\/griddb.net\/en\/\",\"name\":\"GridDB: Open Source Time Series Database for IoT\",\"description\":\"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL\",\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/griddb.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/griddb.net\/en\/#organization\",\"name\":\"Fixstars\",\"url\":\"https:\/\/griddb.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"contentUrl\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"width\":200,\"height\":83,\"caption\":\"Fixstars\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/griddbcommunity\/\",\"https:\/\/x.com\/GridDBCommunity\",\"https:\/\/www.linkedin.com\/company\/griddb-by-toshiba\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\",\"name\":\"griddb-admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"caption\":\"griddb-admin\"},\"url\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/author\/griddb-admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Exploring customers' personalities using Python | GridDB: Open Source Time Series Database for IoT","description":"In this tutorial, we will be exploring a dataset containing customers' information such as Marital Status, Education, Income, etc. Getting an insight into","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/","og_locale":"en_US","og_type":"article","og_title":"Exploring customers' personalities using Python | GridDB: Open Source Time Series Database for IoT","og_description":"In this tutorial, we will be exploring a dataset containing customers' information such as Marital Status, Education, Income, etc. Getting an insight into","og_url":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/","og_site_name":"GridDB: Open Source Time Series Database for IoT","article_publisher":"https:\/\/www.facebook.com\/griddbcommunity\/","article_published_time":"2021-11-12T08:00:00+00:00","article_modified_time":"2025-11-13T20:55:39+00:00","og_image":[{"width":1160,"height":653,"url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/wp-content\/uploads\/2021\/11\/pythoncustomer.png","type":"image\/png"}],"author":"griddb-admin","twitter_card":"summary_large_image","twitter_creator":"@GridDBCommunity","twitter_site":"@GridDBCommunity","twitter_misc":{"Written by":"griddb-admin","Est. 
reading time":"11 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#article","isPartOf":{"@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/"},"author":{"name":"griddb-admin","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233"},"headline":"Exploring customers&#8217; personalities using Python","datePublished":"2021-11-12T08:00:00+00:00","dateModified":"2025-11-13T20:55:39+00:00","mainEntityOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/"},"wordCount":1549,"commentCount":0,"publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png","articleSection":["Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/","url":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/","name":"Exploring customers' personalities using Python | GridDB: Open Source Time Series Database for IoT","isPartOf":{"@id":"https:\/\/griddb.net\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png","datePublished":"2021-11-12T08:00:00+00:00","dateModified":"2025-11-13T20:55:39+00:00","description":"In this tutorial, we will be exploring a dataset 
containing customers' information such as Marital Status, Education, Income, etc. Getting an insight into","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/blog\/exploring-customers-personalities-using-python\/#primaryimage","url":"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png","contentUrl":"\/wp-content\/uploads\/2021\/11\/pythoncustomer.png","width":1160,"height":653},{"@type":"WebSite","@id":"https:\/\/griddb.net\/en\/#website","url":"https:\/\/griddb.net\/en\/","name":"GridDB: Open Source Time Series Database for IoT","description":"GridDB is an open source time-series database with the performance of NoSQL and convenience of SQL","publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/griddb.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/griddb.net\/en\/#organization","name":"Fixstars","url":"https:\/\/griddb.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","contentUrl":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","width":200,"height":83,"caption":"Fixstars"},"image":{"@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/griddbcommunity\/","https:\/\/x.com\/GridDBCommunity","https:\/\/www.linkedin.com\/company\/griddb-by-toshiba"]},{"@type":"Person","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233","name":"griddb-admin","image":{"@ty
pe":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","caption":"griddb-admin"},"url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/author\/griddb-admin\/"}]}},"_links":{"self":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46673","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/comments?post=46673"}],"version-history":[{"count":1,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46673\/revisions"}],"predecessor-version":[{"id":51347,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46673\/revisions\/51347"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/media\/27896"}],"wp:attachment":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/media?parent=46673"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/categories?post=46673"},{"taxonomy":"post_t
ag","embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/tags?post=46673"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}