{"id":46698,"date":"2022-04-01T00:00:00","date_gmt":"2022-04-01T07:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/multi-class-text-classification-using-python-and-griddb\/"},"modified":"2025-11-13T12:55:58","modified_gmt":"2025-11-13T20:55:58","slug":"multi-class-text-classification-using-python-and-griddb","status":"publish","type":"post","link":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/","title":{"rendered":"Multi Class Text Classification using Python and GridDB"},"content":{"rendered":"<p>On the Internet, there are a lot of sources that provide enormous amounts of daily news. Further, the demand for information by users has been growing continuously, so it is important to classify the news in a way that lets users access the information they are interested in quickly and efficiently. With such a model, users would be able to identify untracked news topics and\/or receive recommendations based on their prior interests. Thus, we aim to build models that take news headlines and short descriptions as inputs and produce news categories as outputs.<\/p>\n<p>The problem we will tackle is the classification of BBC News articles and their categories. Using the text as an input, we will predict what the category would be. There are five categories: business, entertainment, politics, sport, and technology.<\/p>\n<p>The outline of the tutorial is as follows:<\/p>\n<ol>\n<li>Prerequisites and Environment setup<\/li>\n<li>Dataset overview<\/li>\n<li>Importing required libraries<\/li>\n<li>Loading the dataset<\/li>\n<li>Data Cleaning and Preprocessing<\/li>\n<li>Building and Training a Machine Learning Model<\/li>\n<li>Conclusion<\/li>\n<\/ol>\n<h2>1&#46; Prerequisites and Environment setup<\/h2>\n<p>This tutorial is carried out in Anaconda Navigator (Python version \u2013 3.8.3) on the Windows operating system. 
The following packages need to be installed before you continue with the tutorial \u2013<\/p>\n<ol>\n<li>\n<p>Pandas<\/p>\n<\/li>\n<li>\n<p>NumPy<\/p>\n<\/li>\n<li>\n<p>tensorflow<\/p>\n<\/li>\n<li>\n<p>nltk<\/p>\n<\/li>\n<li>\n<p>csv<\/p>\n<\/li>\n<li>\n<p>griddb_python<\/p>\n<\/li>\n<li>\n<p>matplotlib<\/p>\n<\/li>\n<\/ol>\n<p>You can install these packages in Conda\u2019s virtual environment using <code>conda install package-name<\/code>. If you are using Python directly via the terminal\/command prompt, <code>pip install package-name<\/code> will do the job.<\/p>\n<h3>GridDB installation<\/h3>\n<p>While loading the dataset, this tutorial will cover two methods \u2013 using GridDB as well as reading a local CSV file with Python\u2019s <code>with<\/code> statement. To access GridDB using Python, the following packages also need to be installed beforehand:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/griddb\/c_client\">GridDB C-client<\/a><\/li>\n<li>SWIG (Simplified Wrapper and Interface Generator)<\/li>\n<li><a href=\"https:\/\/github.com\/griddb\/python_client\">GridDB Python Client<\/a><\/li>\n<\/ol>\n<h2>2&#46; Dataset Overview<\/h2>\n<p>Text documents are one of the richest sources of data for businesses.<\/p>\n<p>We\u2019ll use a public dataset from the BBC comprising 2225 articles, each labeled under one of 5 categories: business, entertainment, politics, sport or tech.<\/p>\n<p>The dataset used in this project is the BBC News Raw Dataset. 
It can be downloaded from here (<code>http:\/\/mlg.ucd.ie\/datasets\/bbc.html<\/code>).<\/p>\n<h2>3&#46; Importing Required Libraries<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import griddb_python as griddb\nimport csv\nimport tensorflow as tf\nimport numpy as np\nimport pandas as pd\nfrom tensorflow.keras.preprocessing.sequence import pad_sequences\nfrom tensorflow.keras.preprocessing.text import Tokenizer\nfrom tensorflow.keras.models import Sequential\nfrom tensorflow.keras.layers import Dense, Flatten, LSTM, Dropout, Activation, Embedding, Bidirectional\nimport nltk\nfrom nltk.corpus import stopwords\nimport matplotlib.pyplot as plt<\/code><\/pre>\n<\/div>\n<h2>4&#46; Loading the Dataset<\/h2>\n<p>Let\u2019s proceed and load the dataset into our notebook.<\/p>\n<h3>4&#46;a Using GridDB<\/h3>\n<p>Toshiba GridDB\u2122 is a highly scalable NoSQL database best suited for IoT and Big Data. It is built around a versatile data store that is optimized for IoT, highly scalable, tuned for high performance, and designed for high reliability.<\/p>\n<p>For large amounts of data, a CSV file can be cumbersome. GridDB serves as a perfect alternative, as it is an open-source, in-memory NoSQL database that makes it easy to store large amounts of data and scale with them. If you are new to GridDB, a tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">reading and writing to GridDB<\/a> can be useful.<\/p>\n<p>Assuming that you have already set up your database, we will now write the SQL query in Python to load our dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">sql_statement = ('SELECT * FROM bbc-text')\ndataset = pd.read_sql_query(sql_statement, cont)<\/code><\/pre>\n<\/div>\n<p>Note that the <code>cont<\/code> variable holds the container information where our data is stored. 
Replace <code>bbc-text<\/code> with the name of your container. More information can be found in the tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">reading and writing to GridDB<\/a>.<\/p>\n<p>When it comes to IoT and Big Data use cases, GridDB clearly stands out among other databases in the relational and NoSQL space. Overall, GridDB offers multiple reliability features for mission-critical applications that require high availability and data retention.<\/p>\n<h3>4&#46;b Using the with statement<\/h3>\n<p>In Python, you access a file by opening it with the open() function. open() returns a file object, which has methods and attributes for getting information about and manipulating the opened file. Both loading methods lead to the same result, as either way the data ends up available in our notebook.<\/p>\n<p>We import the nltk library and its stopwords function, and set the stop words for the English language. Some examples of English stop words are: has, hasn\u2019t, and, aren\u2019t, because, each, during.<\/p>\n<p>The process of converting data to something a computer can understand is referred to as pre-processing. One of the major forms of pre-processing is filtering out useless data. In natural language processing, useless words (data) are referred to as stop words.<\/p>\n<p>A stop word is a commonly used word (such as \u201cthe\u201d, \u201ca\u201d, \u201can\u201d, \u201cin\u201d) that a search engine has been programmed to ignore, both when indexing entries for searching and when retrieving them as the result of a search query. We would not want these words to take up space in our database or to take up valuable processing time. 
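<\/p>
<p>The filtering idea can be shown with a tiny, self-contained sketch. Note that the short stop-word list and the helper name remove_stopwords below are made up for illustration; the tutorial itself downloads the full English list from NLTK in the next step. Splitting the text into words avoids the whitespace bookkeeping of the replace-based approach used later:<\/p>

```python
# Illustrative only: a tiny stop-word list; the tutorial uses
# the full English list from nltk.corpus.stopwords instead.
STOPWORDS = {'the', 'a', 'an', 'in', 'and'}

def remove_stopwords(text):
    # Keep only the words that are not in the stop-word set.
    return ' '.join(w for w in text.split() if w.lower() not in STOPWORDS)

print(remove_stopwords('the markets rallied in the morning'))
# markets rallied morning
```

<p>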
For this, we can remove them easily by storing a list of words that we consider to be stop words.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">nltk.download('stopwords')\nSTOPWORDS = set(stopwords.words('english'))<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\"># We populate the lists of articles and labels from the data and also remove the stopwords.\narticles = []\nlabels = []\n\nwith open(\"bbc-text.csv\", 'r') as csvfile:\n    reader = csv.reader(csvfile, delimiter=',')\n    next(reader)  # skip the header row\n    for row in reader:\n        labels.append(row[0])\n        article = row[1]\n        for word in STOPWORDS:\n            token = ' ' + word + ' '\n            article = article.replace(token, ' ')\n            article = article.replace('  ', ' ')  # collapse the double spaces left behind\n        articles.append(article)<\/code><\/pre>\n<\/div>\n<p>We set the hyperparameters that are required to build and train the model.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">vocab_size = 5000\nembedding_dim = 64\nmax_length = 200\ntrunc_type = 'post'\npadding_type = 'post'\noov_tok = '&lt;oov&gt;'  # OOV = Out of Vocabulary\ntraining_portion = 0.8<\/code><\/pre>\n<\/div>\n<p>Once the dataset is loaded, let us explore it. We&#8217;ll look at the first article and its label.<\/p>\n<h2>5&#46; Data Cleaning and Preprocessing<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">articles[:1]<\/code><\/pre>\n<\/div>\n<pre><code>['tv future hands viewers home theatre systems  plasma high-definition tvs  digital video recorders moving living room  way people watch tv radically different five years  time.  according expert panel gathered annual consumer electronics show las vegas discuss new technologies impact one favourite pastimes. 
us leading trend  programmes content delivered viewers via home networks  cable  satellite  telecoms companies  broadband service providers front rooms portable devices.  one talked-about technologies ces digital personal video recorders (dvr pvr). set-top boxes  like us tivo uk sky+ system  allow people record  store  play  pause forward wind tv programmes want.  essentially  technology allows much personalised tv. also built-in high-definition tv sets  big business japan us  slower take europe lack high-definition programming. people forward wind adverts  also forget abiding network channel schedules  putting together a-la-carte entertainment. us networks cable satellite companies worried means terms advertising revenues well  brand identity  viewer loyalty channels. although us leads technology moment  also concern raised europe  particularly growing uptake services like sky+.  happens today  see nine months years  time uk   adam hume  bbc broadcast futurologist told bbc news website. likes bbc  issues lost advertising revenue yet. pressing issue moment commercial uk broadcasters  brand loyalty important everyone.  talking content brands rather network brands   said tim hanlon  brand communications firm starcom mediavest.  reality broadband connections  anybody producer content.  added:  challenge hard promote programme much choice.   means  said stacey jolna  senior vice president tv guide tv group  way people find content want watch simplified tv viewers. means networks  us terms  channels could take leaf google book search engine future  instead scheduler help people find want watch. kind channel model might work younger ipod generation used taking control gadgets play them. might suit everyone  panel recognised. older generations comfortable familiar schedules channel brands know getting. perhaps want much choice put hands  mr hanlon suggested.  end  kids diapers pushing buttons already - everything possible available   said mr hanlon.  
ultimately  consumer tell market want.   50 000 new gadgets technologies showcased ces  many enhancing tv-watching experience. high-definition tv sets everywhere many new models lcd (liquid crystal display) tvs launched dvr capability built  instead external boxes. one example launched show humax 26-inch lcd tv 80-hour tivo dvr dvd recorder. one us biggest satellite tv companies  directtv  even launched branded dvr show 100-hours recording capability  instant replay  search function. set pause rewind tv 90 hours. microsoft chief bill gates announced pre-show keynote speech partnership tivo  called tivotogo  means people play recorded programmes windows pcs mobile devices. reflect increasing trend freeing multimedia people watch want  want.']\n<\/code><\/pre>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">labels[:1]<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    ['tech']<\/code><\/pre>\n<\/div>\n<p>Now, let&#8217;s proceed to building and evaluating machine learning models on our BBC News dataset. We&#8217;ll first create <code>features<\/code> and <code>labels<\/code> for our model and split them into a training set and a validation set. We set 80% (training_portion = 0.8) of the data aside for training and the remaining 20% for validation.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train_size = int(len(articles) * training_portion)\n\ntrain_articles = articles[0: train_size]\ntrain_labels = labels[0: train_size]\n\nvalidation_articles = articles[train_size:]\nvalidation_labels = labels[train_size:]<\/code><\/pre>\n<\/div>\n<h3>5&#46;a Tokenization<\/h3>\n<p>The Tokenizer is created with num_words equal to vocab_size (5000) and oov_token equal to &#8216;&lt;oov&gt;&#8217;. The method fit_on_texts is called on train_articles. By using word frequency, this method creates the vocabulary index. 
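<\/p>
<p>The frequency-based indexing rule can be reproduced in a few lines of plain Python. This is a sketch of the rule only, not Keras\u2019 actual implementation, and the helper name build_word_index is made up for illustration:<\/p>

```python
from collections import Counter

def build_word_index(texts, oov_token='<oov>'):
    # Count word frequencies across all texts (lowercased, with the
    # final period stripped) -- roughly what fit_on_texts does.
    counts = Counter()
    for text in texts:
        counts.update(text.lower().replace('.', '').split())
    # The OOV token gets index 1; remaining words are indexed by
    # descending frequency, with ties keeping first-seen order.
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

print(build_word_index(['The cat sat on the mat.']))
# {'<oov>': 1, 'the': 2, 'cat': 3, 'sat': 4, 'on': 5, 'mat': 6}
```

<p>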
In the example given, &#8220;The cat sat on the mat.&#8221;, it will create a dictionary {&#8216;&lt;oov&gt;&#8217;: 1, &#8216;cat&#8217;: 3, &#8216;mat&#8217;: 6, &#8216;on&#8217;: 5, &#8216;sat&#8217;: 4, &#8216;the&#8217;: 2}.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">tokenizer = Tokenizer(num_words = vocab_size, oov_token=oov_tok)\ntokenizer.fit_on_texts(train_articles)\nword_index = tokenizer.word_index<\/code><\/pre>\n<\/div>\n<p>The oov_token, \u2018&lt;oov&gt;\u2019, is the placeholder value used for any word that is not listed in the dictionary.<\/p>\n<h3>5&#46;b Convert to Sequences<\/h3>\n<p>Tokenization is followed by the method texts_to_sequences. It converts each text in texts into an integer sequence: each word in the text is replaced with its corresponding integer from the dictionary tokenizer.word_index. If a word is not in the dictionary, it is assigned the value 1 (the index of the OOV token).<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train_sequences = tokenizer.texts_to_sequences(train_articles)<\/code><\/pre>\n<\/div>\n<h3>5&#46;c Sequence Truncation and Padding<\/h3>\n<p>When we train a neural network on text, all input sequences need to have the same length (a concrete shape). 
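<\/p>
<p>Before wiring this into Keras, it helps to see what post-padding and post-truncation actually do. The sketch below is illustrative only (the helper name pad_post is made up); the real work is done by Keras\u2019 pad_sequences, which also supports pre-padding and batches:<\/p>

```python
def pad_post(seq, maxlen, value=0):
    # 'post' truncation: keep only the first maxlen elements.
    seq = seq[:maxlen]
    # 'post' padding: append the pad value until maxlen is reached.
    return seq + [value] * (maxlen - len(seq))

print(pad_post([5, 8, 2], 5))           # [5, 8, 2, 0, 0]
print(pad_post([5, 8, 2, 9, 4, 7], 5))  # [5, 8, 2, 9, 4]
```

<p>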
To make sure that all sequences are the same size, we will pad and truncate them.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train_padded = pad_sequences(train_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)<\/code><\/pre>\n<\/div>\n<p>We apply tokenization, sequence conversion, and padding\/truncation to both train_articles and validation_articles.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">tokenizer = Tokenizer(num_words = vocab_size, oov_token=oov_tok)\ntokenizer.fit_on_texts(train_articles)\nword_index = tokenizer.word_index\n\ntrain_sequences = tokenizer.texts_to_sequences(train_articles)\ntrain_padded = pad_sequences(train_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)\n\nvalidation_sequences = tokenizer.texts_to_sequences(validation_articles)\nvalidation_padded = pad_sequences(validation_sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)<\/code><\/pre>\n<\/div>\n<p>The labels need the same treatment: the model does not understand words, so we convert each label to a number. We tokenize them and convert them to sequences just as before, except that we do not set a vocab size or an oov_token here.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">label_tokenizer = Tokenizer()\nlabel_tokenizer.fit_on_texts(labels)\n\ntraining_label_seq = np.array(label_tokenizer.texts_to_sequences(train_labels))\nvalidation_label_seq = np.array(label_tokenizer.texts_to_sequences(validation_labels))<\/code><\/pre>\n<\/div>\n<h2>6&#46; Machine Learning Model Building<\/h2>\n<p>Now we are ready to create the Neural Network model. 
The model architecture consists of an Embedding layer, a Dropout layer, a Bidirectional LSTM layer, and a Dense output layer with softmax activation:<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">model = Sequential()\n\nmodel.add(Embedding(vocab_size, embedding_dim))\nmodel.add(Dropout(0.5))\nmodel.add(Bidirectional(LSTM(embedding_dim)))\nmodel.add(Dense(6, activation='softmax'))\n\nmodel.summary()<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    Model: \"sequential\"\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #   \n    =================================================================\n    embedding (Embedding)        (None, None, 64)          320000    \n    _________________________________________________________________\n    dropout (Dropout)            (None, None, 64)          0         \n    _________________________________________________________________\n    bidirectional (Bidirectional (None, 128)               66048     \n    _________________________________________________________________\n    dense (Dense)                (None, 6)                 774       \n    =================================================================\n    Total params: 386,822\n    Trainable params: 386,822\n    Non-trainable params: 0\n    _________________________________________________________________<\/code><\/pre>\n<\/div>\n<p>Note that the output layer has 6 units even though there are only 5 categories: the label tokenizer assigns indices starting from 1, so index 0 is never used.<\/p>\n<p>We then compile the model to configure the training process with the loss sparse_categorical_crossentropy since we didn\u2019t one-hot encode the labels. 
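<\/p>
<p>For a single example, sparse categorical crossentropy is just the negative log of the probability the model assigned to the true class index, which is why integer labels are enough. A minimal illustration follows (the helper name sparse_cce is made up; TensorFlow\u2019s implementation also handles batching and numerical stability):<\/p>

```python
import math

def sparse_cce(probs, true_index):
    # Negative log-probability of the correct class; the integer
    # label indexes directly into the softmax output.
    return -math.log(probs[true_index])

probs = [0.1, 0.7, 0.2]               # softmax output for one example
print(round(sparse_cce(probs, 1), 4))  # 0.3567
```

<p>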
We use the Adam optimizer with a learning rate of 0.001.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)\nmodel.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">num_epochs = 12\nhistory = model.fit(train_padded, training_label_seq, epochs=num_epochs, validation_data=(validation_padded, validation_label_seq), verbose=2)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    Epoch 1\/12\n    56\/56 - 8s - loss: 1.6055 - accuracy: 0.2949 - val_loss: 1.4597 - val_accuracy: 0.3191\n    Epoch 2\/12\n    56\/56 - 5s - loss: 1.0623 - accuracy: 0.5854 - val_loss: 0.7767 - val_accuracy: 0.8000\n    Epoch 3\/12\n    56\/56 - 5s - loss: 0.6153 - accuracy: 0.7989 - val_loss: 0.7209 - val_accuracy: 0.7910\n    Epoch 4\/12\n    56\/56 - 5s - loss: 0.3402 - accuracy: 0.9101 - val_loss: 0.5048 - val_accuracy: 0.8135\n    Epoch 5\/12\n    56\/56 - 6s - loss: 0.1731 - accuracy: 0.9685 - val_loss: 0.1699 - val_accuracy: 0.9618\n    Epoch 6\/12\n    56\/56 - 6s - loss: 0.0448 - accuracy: 0.9955 - val_loss: 0.1592 - val_accuracy: 0.9663\n    Epoch 7\/12\n    56\/56 - 6s - loss: 0.0333 - accuracy: 0.9966 - val_loss: 0.1428 - val_accuracy: 0.9663\n    Epoch 8\/12\n    56\/56 - 5s - loss: 0.0400 - accuracy: 0.9927 - val_loss: 0.1245 - val_accuracy: 0.9685\n    Epoch 9\/12\n    56\/56 - 6s - loss: 0.0178 - accuracy: 0.9972 - val_loss: 0.1179 - val_accuracy: 0.9685\n    Epoch 10\/12\n    56\/56 - 5s - loss: 0.0135 - accuracy: 0.9972 - val_loss: 0.1557 - val_accuracy: 0.9573\n    Epoch 11\/12\n    56\/56 - 5s - loss: 0.0264 - accuracy: 0.9983 - val_loss: 0.1193 - val_accuracy: 0.9685\n    Epoch 12\/12\n    56\/56 - 6s - loss: 0.0102 - accuracy: 0.9994 - val_loss: 0.1306 - val_accuracy: 0.9663<\/code><\/pre>\n<\/div>\n<p>We plot the history for accuracy and loss and see if there is 
overfitting.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">def plot_graphs(history, string):\n  plt.plot(history.history[string])\n  plt.plot(history.history['val_'+string])\n  plt.xlabel(\"Epoch Count\")\n  plt.ylabel(string)\n  plt.legend([string, 'val_'+string])\n  plt.show()\n  \nplot_graphs(history, \"accuracy\")\nplot_graphs(history, \"loss\")<\/code><\/pre>\n<\/div>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/03\/output_49_0.png\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/03\/output_49_0.png\" alt=\"\" width=\"386\" height=\"262\" class=\"aligncenter size-full wp-image-28135\" srcset=\"\/wp-content\/uploads\/2022\/03\/output_49_0.png 386w, \/wp-content\/uploads\/2022\/03\/output_49_0-300x204.png 300w\" sizes=\"(max-width: 386px) 100vw, 386px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/03\/output_49_1.png\"><img decoding=\"async\" src=\"https:\/\/griddb.net\/wp-content\/uploads\/2022\/03\/output_49_1.png\" alt=\"\" width=\"386\" height=\"262\" class=\"aligncenter size-full wp-image-28136\" srcset=\"\/wp-content\/uploads\/2022\/03\/output_49_1.png 386w, \/wp-content\/uploads\/2022\/03\/output_49_1-300x204.png 300w\" sizes=\"(max-width: 386px) 100vw, 386px\" \/><\/a><\/p>\n<p>Finally, we call the method predict() to perform prediction on a sample text.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">txt = [\"Only bonds issued by the Russian government can be traded as part of a phased re-opening of the market. The exchange closed hours after Russian President Vladimir Putin sent thousands of troops into Ukraine on 24 February.Andrei Braginsky, a spokesman for the Moscow Exchange, said he hoped that trading in stocks would be able to start again soon. 
Technically everything is ready, and we are hoping this will resume in the near future, he said.\"]\n\nseq = tokenizer.texts_to_sequences(txt)\npadded = pad_sequences(seq, maxlen=max_length)\npred = model.predict(padded)\nlabels = ['sport', 'business', 'politics', 'tech', 'entertainment']\n\nprint(pred)\nprint(np.argmax(pred))\nprint(labels[np.argmax(pred)-1])<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    [[2.6411068e-04 2.1545513e-02 9.6170175e-01 7.2104726e-03 1.0733245e-03\n      8.2047796e-03]]\n    2\n    business<\/code><\/pre>\n<\/div>\n<h2>7&#46; Conclusion<\/h2>\n<p>In this tutorial, we&#8217;ve built a text classification model with an LSTM to predict the category of BBC News articles. We examined two ways to import our data: (1) using GridDB and (2) using the with statement. For large datasets, GridDB provides an excellent alternative for importing data into your notebook, as it is open-source and highly scalable. <a href=\"https:\/\/griddb.net\/en\/downloads\/\">Download GridDB<\/a> today!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>On the Internet, there are a lot of sources that provide enormous amounts of daily news. Further, the demand for information by users has been growing continuously, so it is important to classify the news in a way that lets users access the information they are interested in quickly and efficiently. 
Using this model, users [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":28168,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46698","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Multi Class Text Classification using Python and GridDB | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"On the Internet, there are a lot of sources that provide enormous amounts of daily news. Further, the demand for information by users has been growing\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Multi Class Text Classification using Python and GridDB | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"On the Internet, there are a lot of sources that provide enormous amounts of daily news. 
Further, the demand for information by users has been growing\" \/>\n<meta property=\"og:url\" content=\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-04-01T07:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T20:55:58+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/wp-content\/uploads\/2022\/03\/silence-word-magnified_2560x1696.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1696\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"griddb-admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"griddb-admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/\"},\"author\":{\"name\":\"griddb-admin\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\"},\"headline\":\"Multi Class Text Classification using Python and GridDB\",\"datePublished\":\"2022-04-01T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:58+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/\"},\"wordCount\":1276,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2022\/03\/silence-word-magnified_2560x1696.jpg\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/\",\"url\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/\",\"name\":\"Multi Class Text Classification using Python and GridDB | GridDB: Open Source Time Series Database for 
IoT\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2022\/03\/silence-word-magnified_2560x1696.jpg\",\"datePublished\":\"2022-04-01T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:58+00:00\",\"description\":\"On the Internet, there are a lot of sources that provide enormous amounts of daily news. Further, the demand for information by users has been growing\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/multi-class-text-classification-using-python-and-griddb\/#primaryimage\",\"url\":\"\/wp-content\/uploads\/2022\/03\/silence-word-magnified_2560x1696.jpg\",\"contentUrl\":\"\/wp-content\/uploads\/2022\/03\/silence-word-magnified_2560x1696.jpg\",\"width\":2560,\"height\":1696},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/griddb.net\/en\/#website\",\"url\":\"https:\/\/griddb.net\/en\/\",\"name\":\"GridDB: Open Source Time Series Database for IoT\",\"description\":\"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL\",\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/griddb.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/griddb.net\/en\/#organization\",\"name\":\"Fixstars\",\"url\":\"https:\/\/griddb.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"contentUrl\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"width\":200,\"height\":83,\"caption\":\"Fixstars\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/griddbcommunity\/\",\"https:\/\/x.com\/GridDBCommunity\",\"https:\/\/www.linkedin.com\/company\/griddb-by-toshiba\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\",\"name\":\"griddb-admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"caption\":\"griddb-admin\"},\"url\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/author\/griddb-admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->"}
SQL","publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/griddb.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/griddb.net\/en\/#organization","name":"Fixstars","url":"https:\/\/griddb.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","contentUrl":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","width":200,"height":83,"caption":"Fixstars"},"image":{"@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/griddbcommunity\/","https:\/\/x.com\/GridDBCommunity","https:\/\/www.linkedin.com\/company\/griddb-by-toshiba"]},{"@type":"Person","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233","name":"griddb-admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","caption":"griddb-admin"},"url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/author\/griddb-admin\/"}]}},"_links":{"self":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46698","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/griddb-linux-hte8hndjf8
cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/comments?post=46698"}],"version-history":[{"count":1,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46698\/revisions"}],"predecessor-version":[{"id":51372,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/posts\/46698\/revisions\/51372"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/media\/28168"}],"wp:attachment":[{"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/media?parent=46698"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/categories?post=46698"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/wp-json\/wp\/v2\/tags?post=46698"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}