- Exploring Stellar Wonders: Unraveling the Enigma with GridDB
- Introduction
- About the Dataset
- Instructions to Download the Dataset
- Understanding the Dataset Abbreviations
- Importing Libraries
- About GridDB
- Setting up the Authorization and Request
- Creating the Data Container
- Row Registration (Data Loading)
- Unveiling the Celestial Mysteries: Exploring the Captivating Astronomy Dataset
- What does the distribution of the Top 20,000 stars look like?
- What are some known stars and their spectral classification and magnitude?
- What are the Top 5 Constellations by the Number of Stars?
- Which constellations have some of the brightest stars?
- Visualizing the 'Orion' Constellation
- What are the spectral types for known stars across all constellations?
- What does the spectral distribution of Stars look like across each constellation?
- Concluding Remarks
Exploring Stellar Wonders: Unraveling the Enigma with GridDB¶
Introduction¶
Many of us have grown up fascinated by the night sky and its stars. The mere sight of a vast, twinkling canvas above us ignites a sense of wonder and curiosity. As we gaze at the celestial tapestry, our imaginations soar, pondering the mysteries of the universe. The stars, like ancient storytellers, hold secrets and tales of cosmic proportions. They have inspired countless myths, guided explorers across vast oceans, and propelled humanity's quest for knowledge and understanding. From the timeless beauty of the Milky Way to the fiery brilliance of distant supernovae, the sky beckons us to delve deeper into its mind-boggling wonders. It is in this awe-inspiring realm that GridDB, a powerful analytical tool, serves as a guiding light, enabling us to unlock the hidden patterns and insights woven into the intricate fabric of the stars. With GridDB, we embark on a captivating journey of exploration, unraveling the enigmatic nature of the universe and painting a vivid portrait of our cosmic neighbors. Let's get started!
About the Dataset¶
The dataset is obtained from the web portal Astronomy Nexus, which is maintained by astronomy enthusiast and stargazer David Nash. For the purpose of this analysis, the HYG 3.0 dataset has been downloaded. Please refer to the dataset's Creative Commons Attribution-ShareAlike license here.
Instructions to Download the Dataset¶
The dataset is distributed as a gzip (.gz) archive. Use a tool such as 7-Zip File Manager to extract it. Below are the instructions to extract the dataset from the compressed file.
- Select the .gz file (hygdata_v3.csv.gz) and click the 'Extract' button.
- In the screen that follows, click on 'Extract'.
- The file 'hygdata_v3.csv' gets extracted inside a folder of the same name.
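Alternatively, pandas can read gzip-compressed CSV files directly (it infers the compression from the .gz extension), so the manual extraction step can be skipped entirely. A minimal sketch, assuming the archive sits in the working directory:
import pandas as pd

#pandas infers gzip compression from the .gz extension, so no manual extraction is needed
hygdata_v3 = pd.read_csv('hygdata_v3.csv.gz')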
Understanding the Dataset Abbreviations¶
The abbreviated column names represent the different properties or attributes associated with each star in the dataset.
- id: Star ID
- hip: Hipparcos ID
- hd: Henry Draper Catalog ID
- hr: Harvard Revised Catalog ID
- gl: Gliese Catalog ID
- bf: Bayer/Flamsteed designation
- proper: Proper name of the star
- ra: Right Ascension (RA) in decimal degrees or hours
- dec: Declination (Dec) in decimal degrees
- dist: Distance to the star in parsecs
- pmra: Proper motion in right ascension
- pmdec: Proper motion in declination
- rv: Radial velocity
- mag: Apparent magnitude
- absmag: Absolute magnitude
- spect: Spectral type
- ci: Color index
- x, y, z: Cartesian coordinates
- vx, vy, vz: Cartesian velocities
- rarad, decrad: Right Ascension and Declination in radians
- pmrarad, pmdecrad: Proper motion in right ascension and declination in radians
- bayer: Bayer designation
- flam: Flamsteed number
- con: Constellation abbreviation
- comp: Component identifier for multiple stars
- comp_primary: Primary component identifier for multiple stars
- base: Base catalog ID for a multi-star system
- lum: Luminosity as a multiple of solar luminosity
- var: Variable star designation
- var_min, var_max: Minimum and maximum magnitudes for variable stars
Importing Libraries¶
import os
import json
import http
import requests
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import matplotlib.pyplot as plt
import astropy.units as u
from astropy.coordinates import SkyCoord
from IPython.display import Image

#Confirm the current working directory (the dataset should be placed here)
os.getcwd()

#Enable verbose logging of HTTP traffic (useful when debugging WebAPI calls)
http.client.HTTPConnection.debuglevel = 1
About GridDB¶
GridDB is a reliable option to store and analyze astronomical datasets. GridDB's features, such as its high performance, scalability, and real-time analytics capabilities, make it well-suited for handling the large and complex datasets encountered in astronomy. Astronomical data often requires complex queries to extract meaningful information. GridDB supports SQL-like queries, enabling researchers and astronomers to perform various operations such as filtering, aggregation, and joining of multiple astronomical datasets and analyzing them for powerful insights. Moreover, GridDB's in-memory computing architecture, combined with its disk-based storage, enables fast data processing and low-latency access to data.
In this case, we use Python's requests package to connect to GridDB's WebAPI (GridDB Cloud) to create the data container, load data into the container, and write queries against the data.
Reading the Dataset into a Dataframe¶
We will use this dataframe to load data into the container later on.
#The archive extracts into a folder named 'hygdata_v3.csv' that contains the CSV file of the same name
hygdata_v3 = pd.read_csv(os.path.join(os.getcwd(), 'hygdata_v3.csv', 'hygdata_v3.csv'))
Setting up the Authorization and Request¶
To connect to the GridDB Cloud WebAPI, the request that is constructed should carry the username and password encoded to the Base64 standard. To determine the Base64-encoded credentials for your username and password, a tool such as https://www.base64encode.org/ can be used. Refer to the resource listed here to learn more about the different entities involved in creating a request - https://griddb.net/en/blog/using-griddb-cloud-version-1-2/. The basic authentication format in GridDB is 'username:password', where the case-sensitive username and password should be used for encoding.
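Alternatively, the encoded value can be computed locally with Python's standard base64 module. Below is a minimal sketch; the credentials shown are placeholders.
import base64

#Placeholder credentials - replace with your own GridDB Cloud username and password
credentials = 'my_username:my_password'
encoded_credentials = base64.b64encode(credentials.encode('utf-8')).decode('ascii')
print(encoded_credentials) #use this value in the Authorization header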
Construct the base URL based on the GridDB cluster you'd like to connect to. Ensure that you replace the placeholders in the base_url below with the correct values that correspond to your GridDB instance.
#Construct an object to hold the request headers
#Ensure that you replace the XXXX placeholder with the encoded credentials for your GridDB instance
header_obj = {"Authorization":"XXXX","Content-Type":"application/json; charset=UTF-8","User-Agent":"PostmanRuntime/7.29.0"}
#Construct the base URL based on the GridDB cluster you'd like to connect to
#Ensure that you replace the placeholders in the URL below with the correct values that correspond to your GridDB instance
base_url = 'https://[host]:[port]/griddb/v2/[clustername]/dbs/[database_name]/'
Creating the Data Container¶
GridDB supports two types of containers, namely collections and time-series containers. While collections are general-purpose containers that store records with an arbitrary schema, time-series containers are specifically designed for data where each record is associated with a timestamp. In this case, a collection is ideal, as it is suitable for storing diverse types of data. Refer to this official resource to learn more about GridDB containers and their characteristics.
An important pre-requisite to creating a container is to determine the data types of each data column you have. GridDB supports both primitive and complex datatypes. Among the primitive datatypes, GridDB supports Integer types, Boolean, String, Floating Point types and Time variables. Among the complex datatypes, GridDB supports Arrays, blob storage and Geospatial data storage. GridDB also has specialized datatypes that support TIMESTAMP-based data storage and retrieval optimizations and options to store Enumerated data types. Refer to this resource to learn more.
Below is a template to create a container object. For Collection containers, the rowkey is usually set to False, whereas for Timeseries containers, the rowkey has to be set to True. When we set 'rowkey' to True, we are effectively telling GridDB that the container has a primary key column; Timeseries containers require the timestamp variable to be the primary key. A Collection container can have a rowkey of either True or False; when the rowkey is set to False, GridDB creates the Collection container without a primary key.
The choice between using a row key or not depends on the requirements of your application and the access patterns you anticipate. If you need to perform frequent key-based lookups or retrieve individual records based on their unique identifiers, setting the row key to "true" would be appropriate. On the other hand, if your application primarily relies on querying and filtering records based on other field values, setting the row key to "false" is suitable.
data_obj = {
    "container_name": "Container_Name",
    "container_type": "Container Type",
    "rowkey": False,
    "columns": [
        {
            "name": "Name of column 1",
            "type": "Datatype of column 1"
        },
        {
            "name": "Name of column 2",
            "type": "Datatype of column 2"
        },
        {
            "name": "Name of column 3",
            "type": "Datatype of column 3"
        }
    ]
}
Rules for naming Containers and columns within a container object¶
Note that special characters such as '(', '/', and spaces should be avoided in container names and column names; an underscore ( _ ) can, however, be used. If there are such characters in your container declaration, the request fails with the error message '{"version":"v2","errorCode":145007,"errorMessage":"[145007:JC_ILLEGAL_SYMBOL_CHARACTER] Illegal character found (character=\" \"(\\u0020), component=base, containerName=\"Astronomy Dataset\", index=9)"}'. Here is a comprehensive list of error codes for GridDB - http://www.toshiba-sol.co.jp/en/pro/griddb/docs-en/v4_3/GridDB_ErrorCodes.html
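As a precaution, names can be pre-checked before sending the request. The sketch below assumes a conservative rule (letters, digits and underscores only), inferred from the error message above rather than from an official specification:
import re

def is_safe_name(name):
    #Conservative check: letters, digits and underscores only (an assumption based on the error above)
    return re.fullmatch(r'[A-Za-z_][A-Za-z0-9_]*', name) is not None

print(is_safe_name('Astronomy Dataset')) #False - contains a space
print(is_safe_name('Astronomy_Dataset')) #True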
A pythonic way to create a container¶
Tapping into Python's capabilities, the code below iterates over the input_variables list and creates a column dictionary for each variable. The 'name' field of each column in the 'Collection' container object is populated from the list 'input_variables', and the 'type' field from the corresponding entry in the list 'data_types'.
The zip() function is used to iterate over both lists simultaneously, allowing you to access the corresponding data type for each input variable. The updated data_obj will contain the dynamically generated column names based on the input variables.
data_obj = {
    "container_name": "Astronomy_Dataset",
    "container_type": "COLLECTION",
    "rowkey": False,
    "columns": []
}
input_variables = [
"id", "hip", "hd", "hr", "gl", "bf", "proper", "ra", "dec", "dist", "pmra", "pmdec",
"rv", "mag", "absmag", "spect", "ci", "x", "y", "z", "vx", "vy", "vz", "rarad", "decrad",
"pmrarad", "pmdecrad", "bayer", "flam", "con", "comp", "comp_primary", "base", "lum",
"var", "var_min", "var_max"
]
data_types = [
"INTEGER", "INTEGER", "INTEGER", "INTEGER", "STRING", "STRING", "STRING", "DOUBLE",
"DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "STRING",
"DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE", "DOUBLE",
"DOUBLE", "DOUBLE", "DOUBLE", "STRING", "INTEGER", "STRING", "INTEGER", "INTEGER",
"STRING", "DOUBLE", "STRING", "DOUBLE","DOUBLE"
]
for variable, data_type in zip(input_variables, data_types):
    column = {
        "name": variable,
        "type": data_type
    }
    data_obj["columns"].append(column)
#Set up the GridDB WebAPI URL
url = base_url + 'containers'
#Invoke the GridDB WebAPI with the headers and the request body
x = requests.post(url, json = data_obj, headers = header_obj)
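It's worth validating the response before moving on; a 2xx status code indicates that the container was created successfully.
#Validate the response - a 2xx status code indicates the container was created
print(x.status_code, x.text)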
Row Registration (Data Loading)¶
In GridDB, the process of loading data into a container is referred to as "Row Registration" because it involves registering or adding rows to the container. When you register a row, you are essentially inserting a new record into the container, specifying the values for each column in that row. This terminology aligns with the concept of a database table, where rows represent individual records or entities, and registering a row signifies the act of adding a new record to the table. Refer to this resource to learn more about what happens in the backend while using Row Registration. https://www.toshiba-sol.co.jp/en/pro/griddb/docs-en/v4_6/GridDB_Web_API_Reference.html#row-registration
The good thing about using Python in conjunction with GridDB is that we can use both built-in Python functions and packages such as json and pandas to manipulate data easily and effectively. In this case, we use the 'to_json' method of the pandas DataFrame.
The dataframe 'hygdata_v3' (refer to the section 'Reading the Dataset into a Dataframe') is used to load data into the container after a few transformations. The lines of code below convert the data in the DataFrame to the JSON format (using the DataFrame's to_json method). The JSON object is then sent in a PUT request to the GridDB WebAPI to register the rows in the specified container.
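To see the shape that to_json(orient='values') produces, here is a quick illustration on a tiny hypothetical dataframe; the row-registration endpoint expects exactly this array-of-rows JSON format:
#A tiny illustrative dataframe (not part of the dataset)
sample = pd.DataFrame({'id': [1, 2], 'proper': ['Sol', 'Sirius']})
print(sample.to_json(orient='values')) #prints [[1,"Sol"],[2,"Sirius"]]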
Batch Loading properties of GridDB¶
GridDB supports batch loading. If you are using the SDK, here is a useful resource - https://griddb.net/en/blog/griddb-optimization-with-multi-put-and-query/. If you use the WebAPI, you can simply use a for loop to load subsets of your data into the GridDB container. There are two major advantages to registering rows in batches: large datasets can be loaded reliably in manageable chunks, and issues with the data are easier to isolate and troubleshoot.
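#Convert the ID columns to pandas' nullable Int64 type so that missing values stay as nulls (rather than becoming floats) in the JSON sent to GridDB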
hygdata_v3['id'] = hygdata_v3['id'].astype('Int64')
hygdata_v3['hip'] = hygdata_v3['hip'].astype('Int64')
hygdata_v3['hd'] = hygdata_v3['hd'].astype('Int64')
hygdata_v3['hr'] = hygdata_v3['hr'].astype('Int64')
hygdata_v3['flam'] = hygdata_v3['flam'].astype('Int64')
hygdata_v3['comp'] = hygdata_v3['comp'].astype('Int64')
hygdata_v3['comp_primary'] = hygdata_v3['comp_primary'].astype('Int64')
#creating 12 subsets of the data
Star_Subsets = np.array_split(hygdata_v3, 12)
#Setup the URL to be used to invoke the GridDB WebAPI to register rows in the container created previously
url = base_url + 'containers/Astronomy_Dataset/rows'
#Invoke the GridDB WebAPI using the request constructed
for subset in Star_Subsets:
    #Convert the data in the dataframe to the JSON format
    Star_Subsets_json = subset.to_json(orient='values')
    request_body_subset = Star_Subsets_json
    x = requests.put(url, data=request_body_subset, headers=header_obj)
    print('______________________________________')
    print('_______________',x.text,'___________')
    print('______________________________________')
    if x.status_code > 299: #2xx response codes (200, 201, etc.) indicate success
        print(x.status_code) #an error has occurred, so stop the process here
        print(Star_Subsets_json)
        break
    else:
        print('Success for chunk..')
______________________________________
_______________ {"count":9968} ___________
______________________________________
Success for chunk..
(the same output repeats for each of the 12 chunks; the final two chunks report a count of 9967)
______________________________________
_______________ {"count":9967} ___________
______________________________________
Success for chunk..
Unveiling the Celestial Mysteries: Exploring the Captivating Astronomy Dataset¶
GridDB provides a query language called GQL, which is similar in syntax to SQL. GQL allows you to perform various operations on GridDB databases, such as querying data, filtering results, aggregating data, and more. While GQL shares similarities with SQL, it also has its own unique features and syntax tailored specifically to GridDB's distributed architecture and data model. For Timeseries containers, GridDB provides a specialized query language called TQL (TimeSeries Query Language) to support time-based aggregations, time-based selections, and sliding windows. In this case, we stick to the SQL capabilities of GridDB, as we have created a Collection container. But if you have other use cases that are time-based, check out this TQL Reference. To refer to the SQL reference, visit this page.
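For completeness, a TQL acquisition through the WebAPI looks roughly like the sketch below. This is an assumption based on the WebAPI reference linked earlier (we don't use TQL in this analysis, since our container is a Collection), so verify the endpoint format against your GridDB version.
#Hypothetical TQL request body - not used in this analysis
tql_body = json.dumps([{"name": "Astronomy_Dataset", "stmt": "select * limit 10", "columns": None}])
tql_req = requests.post(base_url + 'tql', data=tql_body, headers=header_obj)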
GridDB supports conditional expressions like 'Case When'. Refer to this resource to learn more.
What does the distribution of the Top 20,000 stars look like?¶
sql_query1 = (f"""SELECT id, ra, dec,dist,lum,CASE WHEN proper IS NULL THEN 'Anonymous' ELSE proper END as proper FROM Astronomy_Dataset order by lum desc LIMIT 20000""")
#Setup the URL to be used to invoke the GridDB WebAPI to retrieve data from the container
url = base_url + 'sql'
#Construct the request body
request_body = '[{"type":"sql-select", "stmt":"'+sql_query1+'"}]'
#Validate the constructed request body
request_body
'[{"type":"sql-select", "stmt":"SELECT id, ra, dec,dist,lum,CASE WHEN proper IS NULL THEN \'Anonymous\' ELSE proper END as proper FROM Astronomy_Dataset order by lum desc LIMIT 20000"}]'
#Invoke the GridDB WebAPI
data_req1 = requests.post(url, data=request_body, headers=header_obj)
#Process the response received and construct a Pandas dataframe with the data from the response
myJson = data_req1.json()
Celestial_Coords = pd.DataFrame(myJson[0]["results"], columns=[col["name"] for col in myJson[0]["columns"]])
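Since this request/response pattern repeats for every query in this analysis, a small helper like the one below could reduce the boilerplate. It's a convenience sketch; the cells that follow keep the explicit version so each step stays visible.
def sql_to_df(stmt):
    #Run a sql-select statement via the GridDB WebAPI and return the result as a dataframe
    response = requests.post(base_url + 'sql', data=json.dumps([{"type": "sql-select", "stmt": stmt}]), headers=header_obj)
    payload = response.json()[0]
    return pd.DataFrame(payload["results"], columns=[col["name"] for col in payload["columns"]])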
# Sort the DataFrame by luminosity in descending order and select the top 20,000 rows
top_stars = Celestial_Coords.sort_values('lum', ascending=False).head(20000)
# Create a scatter plot using Plotly Express
scatter_plot = px.scatter(top_stars, x='ra', y='dec', size='lum', color='lum',
                          color_continuous_scale='plasma', hover_name='proper')
# Customize the plot
scatter_plot.update_layout(
title='Top 20000 Stars by Luminosity',
xaxis_title='Right Ascension (RA)',
yaxis_title='Declination (Dec)'
)
scatter_plot.update_layout(template='plotly_dark')
# Show the plot
scatter_plot.show()
Insight(s):
- When we look at the celestial coordinates of the Top 20,000 stars, we get the above distribution.
- The brightest star in this case has a luminosity of 400M; the next brightest appears to have a luminosity of about 250M.
- Most stars fall between 0 and 50M in luminosity.
What are some known stars and their spectral classification and magnitude?¶
GridDB supports NULL values and provides special functions for handling them. Refer to this resource to learn more.
sql_query2 = (f"""SELECT distinct spect,proper,mag FROM Astronomy_Dataset where proper IS NOT NULL""")
#Setup the URL to be used to invoke the GridDB WebAPI to retrieve data from the container
url = base_url + 'sql'
#Construct the request body
request_body = '[{"type":"sql-select", "stmt":"'+sql_query2+'"}]'
#Validate the constructed request body
request_body
'[{"type":"sql-select", "stmt":"SELECT distinct spect,proper,mag FROM Astronomy_Dataset where proper IS NOT NULL"}]'
#Invoke the GridDB WebAPI
data_req2 = requests.post(url, data=request_body, headers=header_obj)
#Process the response received and construct a Pandas dataframe with the data from the response
myJson = data_req2.json()
spec_mag = pd.DataFrame(myJson[0]["results"], columns=[col["name"] for col in myJson[0]["columns"]])
colorscale = [[0, '#4d004c'],[.5, '#f2e5ff'],[1, '#ffffff']]
fig = ff.create_table(spec_mag,colorscale=colorscale)
fig.show()
Insight(s): In astronomy, magnitude is a logarithmic scale used to quantify the brightness of celestial objects. The lower the magnitude value, the brighter the object.
- Sirius has a magnitude of -1.44, making it one of the brightest stars in the night sky.
- In the case of 3C 273 (in fact a quasar, the optically brightest one in our sky), a magnitude of 12.88 indicates a relatively bright object, although not visible to the naked eye; it may be observable with telescopes or other astronomical instruments.
- Sirius, Proxima Centauri, Alpha Centauri (Rigil Kentaurus), Vega, Betelgeuse, Rigel, and Antares are noteworthy due to their brightness, proximity to Earth, or prominent positions in constellations.
- The apparent magnitudes vary significantly across the dataset, ranging from -26.7 (Sol, our Sun) to 12.88 (3C 273). This wide range represents the vast differences in luminosity between stars, with the brightest stars appearing much brighter than the dimmer ones.
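To make the logarithmic scale concrete: a difference of 5 magnitudes corresponds to a brightness factor of exactly 100, so each magnitude step is a factor of about 2.512. A quick back-of-the-envelope check for Sirius versus 3C 273:
#Apparent brightness ratio between Sirius (mag -1.44) and 3C 273 (mag 12.88)
ratio = 100 ** ((12.88 - (-1.44)) / 5)
print(round(ratio)) #roughly 535,000 - Sirius appears about half a million times brighter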
What are the Top 5 Constellations by the Number of Stars?¶
GridDB provides null-handling functions such as ISNULL, IFNULL and COALESCE among its SQL functions. Refer to this resource to learn more. GridDB also supports subqueries, as seen in this resource.
sql_query3 = (f"""SELECT * FROM (SELECT IFNULL(con,'Anonymous Constellation') as con,count(id) as Num_Stars FROM Astronomy_Dataset group by 1) largest_constellations order by con desc""")
#Setup the URL to be used to invoke the GridDB WebAPI to retrieve data from the container
url = base_url + 'sql'
#Construct the request body
request_body = '[{"type":"sql-select", "stmt":"'+sql_query3+'"}]'
#Validate the constructed request body
request_body
'[{"type":"sql-select", "stmt":"SELECT * FROM (SELECT IFNULL(con,\'Anonymous Constellation\') as con,count(id) as Num_Stars FROM Astronomy_Dataset group by 1) largest_constellations order by con desc"}]'
#Invoke the GridDB WebAPI
data_req3 = requests.post(url, data=request_body, headers=header_obj)
#Process the response received and construct a Pandas dataframe with the data from the response
myJson = data_req3.json()
Stars_in_each_Constellation = pd.DataFrame(myJson[0]["results"], columns=[col["name"] for col in myJson[0]["columns"]])
# Sort the dataframe by the 'Num_Stars' column in descending order and keep the top 5
Top_5_Constellations = Stars_in_each_Constellation.sort_values('Num_Stars', ascending=False).head(5)
fig = px.bar(Top_5_Constellations, y='con', x='Num_Stars',text='Num_Stars',orientation='h')
fig.update_layout(height=800,title_text='Top Constellations in terms of the number of Stars')
fig.show()
Insight(s):
- In terms of the number of stars, Centaurus is the largest constellation, followed by Ursa Major.
- The 3rd, 4th and 5th spots are taken by Hercules, Cygnus and Hydra respectively.
Which constellations have some of the brightest stars?¶
sql_query4 = (f"""
SELECT con, proper, mag
FROM Astronomy_Dataset
ORDER BY mag asc
limit 10
""")
# Construct the request body
request_body = json.dumps([
    {
        "type": "sql-select",
        "stmt": sql_query4
    }
])
# Validate the constructed request body
print(request_body)
[{"type": "sql-select", "stmt": "\n SELECT con, proper, mag\n FROM Astronomy_Dataset\n ORDER BY mag asc\n limit 10\n"}]
#Invoke the GridDB WebAPI
data_req4 = requests.post(url, data=request_body, headers=header_obj)
#Process the response received and construct a Pandas dataframe with the data from the response
myJson = data_req4.json()
Brightest_Constellations = pd.DataFrame(myJson[0]["results"], columns=[col["name"] for col in myJson[0]["columns"]])
table = ff.create_table(Brightest_Constellations, index=False)
# Make text size larger
for i in range(len(table.layout.annotations)):
    table.layout.annotations[i].font.size = 15
# Update the layout to center align the table
table.update_layout(
margin=dict(l=70, r=30, t=70, b=30),
width=500,
height=500,
paper_bgcolor='white',
title='Constellations with the brightest stars'
)
# Display the table
table.show()
Insight(s): Below is a summary of the constellations with the brightest stars:
- The star Sol, i.e. the Sun, tops the list. It is not considered part of any constellation in the traditional sense: being the star at the center of our solar system, it does not fall within the framework of the 88 constellations and hence has no constellation name.
- The next brightest star is Sirius, which belongs to the constellation CMa (Canis Major).
- This is followed by Canopus, which belongs to the constellation Car (Carina).
Visualizing the 'Orion' Constellation¶
sql_query5 = (f"""
SELECT proper,id,ra,dec,lum,dist
FROM Astronomy_Dataset
WHERE lower(con) like '%ori%'
""")
# Construct the request body
request_body = json.dumps([
    {
        "type": "sql-select",
        "stmt": sql_query5
    }
])
# Validate the constructed request body
print(request_body)
[{"type": "sql-select", "stmt": "\n SELECT proper,id,ra,dec,lum,dist\n FROM Astronomy_Dataset\n WHERE lower(con) like '%ori%'\n"}]
#Invoke the GridDB WebAPI
data_req5 = requests.post(url, data=request_body, headers=header_obj)
#Process the response received and construct a Pandas dataframe with the data from the response
myJson = data_req5.json()
orion_subset = pd.DataFrame(myJson[0]["results"], columns=[col["name"] for col in myJson[0]["columns"]])
# Extract the coordinate and luminosity columns
ra = orion_subset['ra']
dec = orion_subset['dec']
lum = orion_subset['lum']
# Create SkyCoord object with spherical coordinates
coords = SkyCoord(ra=ra*u.deg, dec=dec*u.deg, frame='icrs')
# Plotting the scatter plot
fig, ax = plt.subplots(figsize=(15, 10))
# Set the plot background color to black
fig.patch.set_facecolor('black')
ax.set_facecolor('black')
# Plot white stars
ax.scatter(coords.ra.wrap_at(180*u.deg).radian, coords.dec.radian, s=5,marker='*', color='white', alpha=0.5)
# Set plot labels and limits
ax.set_xlabel('RA (radians)')
ax.set_ylabel('Dec (radians)')
# Set tick colors to white
ax.xaxis.set_tick_params(color='white')
ax.yaxis.set_tick_params(color='white')
# Set plot title color to white
ax.set_title('Orion Constellation - Scatter Plot', color='white')
# Set the axis labels color to white
ax.xaxis.label.set_color('white')
ax.yaxis.label.set_color('white')
# Set the axis tick labels color to white
ax.tick_params(axis='x', colors='white')
ax.tick_params(axis='y', colors='white')
plt.show()
Insight(s): The above plot shows the coordinates of all stars in the Orion Constellation.
What are the spectral types for known stars across all constellations?¶
sql_query6 = (f"""
SELECT con,spect,CASE WHEN count(id) between 1 and 20 then '1-20 stars'
WHEN count(id) between 21 and 50 then '21-50 stars'
WHEN count(id) between 51 and 100 then '51-100 stars'
WHEN count(id) between 101 and 200 then '101-200 stars'
WHEN count(id) between 201 and 500 then '201-500 stars'
WHEN count(id) > 500 then 'More than 500 stars'
END as Num_Stars
FROM Astronomy_Dataset
GROUP BY 1,2
ORDER BY 3 desc
""") # proper,id,ra,dec,lum,dist; WHERE proper IS NOT NULL
# Construct the request body
request_body = json.dumps([
    {
        "type": "sql-select",
        "stmt": sql_query6
    }
])
# Validate the constructed request body
print(request_body)
[{"type": "sql-select", "stmt": "\n SELECT con,spect,CASE WHEN count(id) between 1 and 20 then '1-20 stars'\n WHEN count(id) between 21 and 50 then '21-50 stars'\n WHEN count(id) between 51 and 100 then '51-100 stars'\n WHEN count(id) between 101 and 200 then '101-200 stars'\n WHEN count(id) between 201 and 500 then '201-500 stars'\n WHEN count(id) > 500 then 'More than 500 stars'\n END as Num_Stars\n FROM Astronomy_Dataset\n GROUP BY 1,2\n ORDER BY 3 desc\n"}]
#Invoke the GridDB WebAPI
data_req6 = requests.post(url, data=request_body, headers=header_obj)
#Process the response received and construct a Pandas dataframe with the data from the response
myJson = data_req6.json()
stellar_types = pd.DataFrame(myJson[0]["results"], columns=[col["name"] for col in myJson[0]["columns"]])
Constellations_with_more_than_200_stars = stellar_types[(stellar_types['Num_Stars']=='More than 500 stars')|(stellar_types['Num_Stars']=='201-500 stars')]
fig = px.strip(Constellations_with_more_than_200_stars, x="con", y="spect", color="Num_Stars")
fig.show()
Insight(s):
- The Hercules constellation has more than 500 stars with the K0 spectral type.
- K0 stars have a temperature range of approximately 4,500 to 5,000 Kelvin and generally appear orange in color.
- Several other constellations, such as Ursa Major, Camelopardalis and Virgo, have between 201 and 500 K0 stars. G-type stars, also known as yellow dwarf stars, include stars like our Sun (classified as G2V); their average temperature is around 5,000 to 6,000 Kelvin (K). Ursa Major, Virgo, Boötes and Cetus have between 201 and 500 G0 stars.
- It's worth noting that Ursa Major, Virgo, Boötes and Cetus also have a large number of G5 stars.
- Hercules, Draco, Leo and Pisces have G5 stars too.
- G0 stars are hotter than G5 stars and have temperatures closer to the upper end of the G-type range. G0 stars have temperatures around 5,900 to 6,000 Kelvin, while G5 stars have temperatures around 5,500 to 5,600 Kelvin.
What does the spectral distribution of Stars look like across each constellation?¶
Here, we use GridDB's conditional expressions to create categories of spectral types and to bin the number of stars in each spectral type and constellation.
sql_query7 = (f"""
SELECT con,
CASE WHEN lower(spect) like 'a%' then 'A type'
WHEN lower(spect) like 'b%' then 'B type'
WHEN lower(spect) like 'c%' then 'C type'
WHEN lower(spect) like 'd%' then 'D type'
WHEN lower(spect) like 'e%' then 'E type'
WHEN lower(spect) like 'f%' then 'F type'
WHEN lower(spect) like 'g%' then 'G type'
WHEN lower(spect) like 'k%' then 'K type'
WHEN (lower(spect) like 'm%' or lower(spect) like 'n%' or lower(spect) like 'r%') then 'M-N type'
WHEN lower(spect) like 'o%' then 'O type'
WHEN lower(spect) = '' then 'None'
ELSE 'Others'
END AS spectral_classification,
CASE WHEN count(id) between 1 and 20 then '1-20 stars'
WHEN count(id) between 21 and 50 then '21-50 stars'
WHEN count(id) between 51 and 100 then '51-100 stars'
WHEN count(id) between 101 and 200 then '101-200 stars'
WHEN count(id) between 201 and 500 then '201-500 stars'
WHEN count(id) > 500 then 'More than 500 stars'
END as Num_Stars
FROM Astronomy_Dataset
GROUP BY 1,2
ORDER BY 3 desc
""") # proper,id,ra,dec,lum,dist; WHERE proper IS NOT NULL
# Construct the request body
request_body = json.dumps([
    {
        "type": "sql-select",
        "stmt": sql_query7
    }
])
# Validate the constructed request body
print(request_body)
[{"type": "sql-select", "stmt": "\n SELECT con,\n CASE WHEN lower(spect) like 'a%' then 'A type'\n WHEN lower(spect) like 'b%' then 'B type'\n WHEN lower(spect) like 'c%' then 'C type'\n WHEN lower(spect) like 'd%' then 'D type'\n WHEN lower(spect) like 'e%' then 'E type'\n WHEN lower(spect) like 'f%' then 'F type'\n WHEN lower(spect) like 'g%' then 'G type'\n WHEN lower(spect) like 'k%' then 'K type'\n WHEN (lower(spect) like 'm%' or lower(spect) like 'n%' or lower(spect) like 'r%') then 'M-N type'\n WHEN lower(spect) like 'o%' then 'O type'\n WHEN lower(spect) = '' then 'None'\n ELSE 'Others'\n END AS spectral_classification,\n CASE WHEN count(id) between 1 and 20 then '1-20 stars'\n WHEN count(id) between 21 and 50 then '21-50 stars'\n WHEN count(id) between 51 and 100 then '51-100 stars'\n WHEN count(id) between 101 and 200 then '101-200 stars'\n WHEN count(id) between 201 and 500 then '201-500 stars'\n WHEN count(id) > 500 then 'More than 500 stars'\n END as Num_Stars\n FROM Astronomy_Dataset\n GROUP BY 1,2\n ORDER BY 3 desc\n"}]
#Invoke the GridDB WebAPI
data_req7 = requests.post(url, data=request_body, headers=header_obj)
#Process the response received and construct a Pandas dataframe with the data from the response
myJson = data_req7.json()
stellar_classification_table = pd.DataFrame(myJson[0]["results"], columns=[col["name"] for col in myJson[0]["columns"]])
Stellar_constellations = stellar_classification_table[stellar_classification_table['spectral_classification']!='Others']
# Define colors and marker symbols for each spectral classification
# (the keys must match the 'spectral_classification' values produced by the query above)
colors = {'O type': 'blue', 'B type': 'green', 'A type': 'orange', 'F type': 'red', 'G type': 'purple'}
symbols = {'A type': 'star', 'B type': 'hexagram', 'F type': 'diamond', 'G type': 'circle', 'K type': 'square'}
# Create scatter plot using Plotly Express
fig = px.scatter(Stellar_constellations, x='con', y='Num_Stars', color='spectral_classification',
                 symbol='spectral_classification', symbol_map=symbols, color_discrete_map=colors,
                 labels={'spectral_classification': 'Spectral Classification', 'Num_Stars': 'Number of Stars'})
# Customize layout
fig.update_layout(title='Stars in Different Constellations by Spectral Type',
                  xaxis_title='Constellation', yaxis_title='Number of Stars')
# Show the plot
fig.show()
Insight(s):
- C-type, M-N types and K-type stars seem to be widely distributed across all constellations.
- Aquila (Aql), Octans (Oct), Crux (Cru) and Lepus (Lep) are some constellations that have G-type stars, also known as G dwarfs or yellow dwarf stars.
- O-type stars are less common. Dorado (Dor), Leo Minor (LMi), Coma Berenices (Com), Chamaeleon (Cha) and a few other constellations have O-type stars.
Concluding Remarks¶
In conclusion, the analysis conducted on the Astronomy Nexus data using GridDB has provided valuable insights into the complex world of astronomy. The utilization of GridDB's powerful features, such as subqueries and case statements, has greatly enhanced the effectiveness and efficiency of the analysis process.
Through the implementation of subqueries, we were able to extract specific subsets of data, allowing for targeted analysis on various astronomical phenomena. This enabled us to explore specific aspects of the data and uncover correlations and patterns that would have otherwise been challenging to identify.
The utilization of case statements within GridDB offered a flexible and intuitive way to handle complex conditions and derive meaningful information from the data. By applying CASE WHEN statements and null handling, we were able to categorize and classify astronomical objects based on their specific characteristics, facilitating a more comprehensive understanding of the dataset.
The Astronomy Nexus data presented unique challenges due to its complexity and vastness. However, GridDB proved to be an invaluable tool in handling and processing such intricate data. Its ability to efficiently store, retrieve, and analyze massive datasets while providing high availability and scalability allowed for seamless exploration and manipulation of the Astronomy Nexus data.
In summary, the combination of GridDB's advanced features, including subqueries and case statements, with the rich and intricate Astronomy Nexus data has demonstrated the effectiveness and utility of GridDB in the field of astronomy. By leveraging GridDB's capabilities, astronomers and researchers can unlock deeper insights and make significant advancements in our understanding of the universe.
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.
