Table of contents¶
- Introduction to Gaia
- What is Astroquery
- Querying databases with Astroquery
- Working with Pandas Dataframes
- Calculating the Distance to Stars
- Converting units with Astropy
- Converting from apparent to absolute magnitude
- Exploratory Data Analysis with Pandas
- Understanding Correlations: Unraveling Relationships in Data
- Making Publication Quality Plots
- Understanding Color Maps and Accessibility
- The Hertzsprung-Russell Diagram
- Mapping Stellar Density in Galactic Coordinates
- Open Clusters
- Clustering
Accessing and Manipulating Astronomical Data Using Python¶
In this session, we will learn how to use the Python package Astroquery to access astronomical data. We will then utilize Astropy to simplify various calculations and unit conversions.
Our primary focus will be on data obtained by the Gaia mission. Gaia is a European Space Agency mission whose goal is to create a 3D map of the Milky Way by accurately measuring the positions of over one billion stars.
Image: Gaia - Exploring the Multi-dimensional Milky Way
We will be working with data collected using Gaia's Blue and Red Photometers (BP & RP) and making use of parallax measurements to estimate the distances to celestial objects.
The use of Gaia data for an HR diagram is based on a blog post by Vlas Sokolov.
import astroquery
What is Astroquery?¶
Astroquery is a Python package that streamlines the process of querying astronomical databases and catalogs. It offers a wide range of options for accessing astronomical data. Some of the available options include:
- Querying databases of celestial objects.
- Retrieving data from sky surveys and telescopes.
- Accessing information from observatories and space missions.
Image: A selection of options provided by Astroquery
For our tutorial, we will focus on using Astroquery to work with data from the Gaia mission. To get started, let's import the Gaia module:
from astroquery.gaia import Gaia
In astronomy, there are often vast databases filled with valuable information, and sometimes, we might not know exactly what we're looking for. That's where the ability to explore and discover available database tables becomes essential.
tables = Gaia.load_tables()
INFO: Retrieving tables... [astroquery.utils.tap.core]
INFO: Parsing tables... [astroquery.utils.tap.core]
INFO: Done. [astroquery.utils.tap.core]
for i, tab in enumerate(tables):
print (i, tab.name)
0 external.apassdr9 1 external.gaiadr2_astrophysical_parameters 2 external.gaiadr2_geometric_distance 3 external.gaiaedr3_distance 4 external.gaiaedr3_gcns_main_1 5 external.gaiaedr3_gcns_rejected_1 6 external.gaiaedr3_spurious 7 external.galex_ais 8 external.ravedr5_com 9 external.ravedr5_dr5 10 external.ravedr5_gra 11 external.ravedr5_on 12 external.ravedr6 13 external.sdssdr13_photoprimary 14 external.skymapperdr1_master 15 external.skymapperdr2_master 16 external.tmass_xsc 17 gaiadr1.aux_qso_icrf2_match 18 gaiadr1.ext_phot_zero_point 19 gaiadr1.allwise_best_neighbour 20 gaiadr1.allwise_neighbourhood 21 gaiadr1.gsc23_best_neighbour 22 gaiadr1.gsc23_neighbourhood 23 gaiadr1.ppmxl_best_neighbour 24 gaiadr1.ppmxl_neighbourhood 25 gaiadr1.sdss_dr9_best_neighbour 26 gaiadr1.sdss_dr9_neighbourhood 27 gaiadr1.tmass_best_neighbour 28 gaiadr1.tmass_neighbourhood 29 gaiadr1.ucac4_best_neighbour 30 gaiadr1.ucac4_neighbourhood 31 gaiadr1.urat1_best_neighbour 32 gaiadr1.urat1_neighbourhood 33 gaiadr1.cepheid 34 gaiadr1.phot_variable_time_series_gfov 35 gaiadr1.phot_variable_time_series_gfov_statistical_parameters 36 gaiadr1.rrlyrae 37 gaiadr1.variable_summary 38 gaiadr1.allwise_original_valid 39 gaiadr1.gsc23_original_valid 40 gaiadr1.ppmxl_original_valid 41 gaiadr1.sdssdr9_original_valid 42 gaiadr1.tmass_original_valid 43 gaiadr1.ucac4_original_valid 44 gaiadr1.urat1_original_valid 45 gaiadr1.gaia_source 46 gaiadr1.tgas_source 47 gaiadr2.aux_allwise_agn_gdr2_cross_id 48 gaiadr2.aux_iers_gdr2_cross_id 49 gaiadr2.aux_sso_orbit_residuals 50 gaiadr2.aux_sso_orbits 51 gaiadr2.dr1_neighbourhood 52 gaiadr2.allwise_best_neighbour 53 gaiadr2.allwise_neighbourhood 54 gaiadr2.apassdr9_best_neighbour 55 gaiadr2.apassdr9_neighbourhood 56 gaiadr2.gsc23_best_neighbour 57 gaiadr2.gsc23_neighbourhood 58 gaiadr2.hipparcos2_best_neighbour 59 gaiadr2.hipparcos2_neighbourhood 60 gaiadr2.panstarrs1_best_neighbour 61 gaiadr2.panstarrs1_neighbourhood 62 gaiadr2.ppmxl_best_neighbour 63 
gaiadr2.ppmxl_neighbourhood 64 gaiadr2.ravedr5_best_neighbour 65 gaiadr2.ravedr5_neighbourhood 66 gaiadr2.sdssdr9_best_neighbour 67 gaiadr2.sdssdr9_neighbourhood 68 gaiadr2.tmass_best_neighbour 69 gaiadr2.tmass_neighbourhood 70 gaiadr2.tycho2_best_neighbour 71 gaiadr2.tycho2_neighbourhood 72 gaiadr2.urat1_best_neighbour 73 gaiadr2.urat1_neighbourhood 74 gaiadr2.sso_observation 75 gaiadr2.sso_source 76 gaiadr2.vari_cepheid 77 gaiadr2.vari_classifier_class_definition 78 gaiadr2.vari_classifier_definition 79 gaiadr2.vari_classifier_result 80 gaiadr2.vari_long_period_variable 81 gaiadr2.vari_rotation_modulation 82 gaiadr2.vari_rrlyrae 83 gaiadr2.vari_short_timescale 84 gaiadr2.vari_time_series_statistics 85 gaiadr2.panstarrs1_original_valid 86 gaiadr2.gaia_source 87 gaiadr2.ruwe 88 gaiadr3.gaia_source 89 gaiadr3.gaia_source_lite 90 gaiadr3.astrophysical_parameters 91 gaiadr3.astrophysical_parameters_supp 92 gaiadr3.oa_neuron_information 93 gaiadr3.oa_neuron_xp_spectra 94 gaiadr3.total_galactic_extinction_map 95 gaiadr3.total_galactic_extinction_map_opt 96 gaiadr3.commanded_scan_law 97 gaiadr3.allwise_best_neighbour 98 gaiadr3.allwise_neighbourhood 99 gaiadr3.apassdr9_best_neighbour 100 gaiadr3.apassdr9_join 101 gaiadr3.apassdr9_neighbourhood 102 gaiadr3.dr2_neighbourhood 103 gaiadr3.gsc23_best_neighbour 104 gaiadr3.gsc23_join 105 gaiadr3.gsc23_neighbourhood 106 gaiadr3.hipparcos2_best_neighbour 107 gaiadr3.hipparcos2_neighbourhood 108 gaiadr3.panstarrs1_best_neighbour 109 gaiadr3.panstarrs1_join 110 gaiadr3.panstarrs1_neighbourhood 111 gaiadr3.ravedr5_best_neighbour 112 gaiadr3.ravedr5_join 113 gaiadr3.ravedr5_neighbourhood 114 gaiadr3.ravedr6_best_neighbour 115 gaiadr3.ravedr6_join 116 gaiadr3.ravedr6_neighbourhood 117 gaiadr3.sdssdr13_best_neighbour 118 gaiadr3.sdssdr13_join 119 gaiadr3.sdssdr13_neighbourhood 120 gaiadr3.skymapperdr2_best_neighbour 121 gaiadr3.skymapperdr2_join 122 gaiadr3.skymapperdr2_neighbourhood 123 gaiadr3.tmass_psc_xsc_best_neighbour 124 
gaiadr3.tmass_psc_xsc_join 125 gaiadr3.tmass_psc_xsc_neighbourhood 126 gaiadr3.tycho2tdsc_merge_best_neighbour 127 gaiadr3.tycho2tdsc_merge_neighbourhood 128 gaiadr3.urat1_best_neighbour 129 gaiadr3.urat1_neighbourhood 130 gaiadr3.galaxy_candidates 131 gaiadr3.galaxy_catalogue_name 132 gaiadr3.qso_candidates 133 gaiadr3.qso_catalogue_name 134 gaiadr3.nss_acceleration_astro 135 gaiadr3.nss_non_linear_spectro 136 gaiadr3.nss_two_body_orbit 137 gaiadr3.nss_vim_fl 138 gaiadr3.binary_masses 139 gaiadr3.chemical_cartography 140 gaiadr3.gold_sample_carbon_stars 141 gaiadr3.gold_sample_fgkm_stars 142 gaiadr3.gold_sample_oba_stars 143 gaiadr3.gold_sample_solar_analogues 144 gaiadr3.gold_sample_spss 145 gaiadr3.gold_sample_ucd 146 gaiadr3.sso_orbits 147 gaiadr3.synthetic_photometry_gspc 148 gaiadr3.vari_spurious_signals 149 gaiadr3.agn_cross_id 150 gaiadr3.frame_rotator_source 151 gaiadr3.gaia_crf3_xm 152 gaiadr3.alerts_mixedin_sourceids 153 gaiadr3.science_alerts 154 gaiadr3.gaia_source_simulation 155 gaiadr3.gaia_universe_model 156 gaiadr3.sso_observation 157 gaiadr3.sso_reflectance_spectrum 158 gaiadr3.sso_source 159 gaiadr3.xp_summary 160 gaiadr3.vari_agn 161 gaiadr3.vari_cepheid 162 gaiadr3.vari_classifier_class_definition 163 gaiadr3.vari_classifier_definition 164 gaiadr3.vari_classifier_result 165 gaiadr3.vari_compact_companion 166 gaiadr3.vari_eclipsing_binary 167 gaiadr3.vari_epoch_radial_velocity 168 gaiadr3.vari_long_period_variable 169 gaiadr3.vari_microlensing 170 gaiadr3.vari_ms_oscillator 171 gaiadr3.vari_planetary_transit 172 gaiadr3.vari_planetary_transit_13june2022 173 gaiadr3.vari_rad_vel_statistics 174 gaiadr3.vari_rotation_modulation 175 gaiadr3.vari_rrlyrae 176 gaiadr3.vari_short_timescale 177 gaiadr3.vari_summary 178 gaiadr3.tycho2tdsc_merge 179 gaiaedr3.gaia_source 180 gaiaedr3.agn_cross_id 181 gaiaedr3.commanded_scan_law 182 gaiaedr3.dr2_neighbourhood 183 gaiaedr3.frame_rotator_source 184 gaiaedr3.allwise_best_neighbour 185 
gaiaedr3.allwise_neighbourhood 186 gaiaedr3.apassdr9_best_neighbour 187 gaiaedr3.apassdr9_join 188 gaiaedr3.apassdr9_neighbourhood 189 gaiaedr3.gsc23_best_neighbour 190 gaiaedr3.gsc23_join 191 gaiaedr3.gsc23_neighbourhood 192 gaiaedr3.hipparcos2_best_neighbour 193 gaiaedr3.hipparcos2_neighbourhood 194 gaiaedr3.panstarrs1_best_neighbour 195 gaiaedr3.panstarrs1_join 196 gaiaedr3.panstarrs1_neighbourhood 197 gaiaedr3.ravedr5_best_neighbour 198 gaiaedr3.ravedr5_join 199 gaiaedr3.ravedr5_neighbourhood 200 gaiaedr3.sdssdr13_best_neighbour 201 gaiaedr3.sdssdr13_join 202 gaiaedr3.sdssdr13_neighbourhood 203 gaiaedr3.skymapperdr2_best_neighbour 204 gaiaedr3.skymapperdr2_join 205 gaiaedr3.skymapperdr2_neighbourhood 206 gaiaedr3.tmass_psc_xsc_best_neighbour 207 gaiaedr3.tmass_psc_xsc_join 208 gaiaedr3.tmass_psc_xsc_neighbourhood 209 gaiaedr3.tycho2tdsc_merge_best_neighbour 210 gaiaedr3.tycho2tdsc_merge_neighbourhood 211 gaiaedr3.urat1_best_neighbour 212 gaiaedr3.urat1_neighbourhood 213 gaiaedr3.gaia_source_simulation 214 gaiaedr3.gaia_universe_model 215 gaiaedr3.tycho2tdsc_merge 216 gaiafpr.crowded_field_source 217 gaiafpr.lens_candidates 218 gaiafpr.lens_catalogue_name 219 gaiafpr.lens_observation 220 gaiafpr.lens_outlier 221 gaiafpr.sso_observation 222 gaiafpr.sso_source 223 gaiafpr.interstellar_medium_params 224 gaiafpr.interstellar_medium_spectra 225 gaiafpr.vari_epoch_radial_velocity 226 gaiafpr.vari_long_period_variable 227 gaiafpr.vari_rad_vel_statistics 228 public.hipparcos 229 public.hipparcos_newreduction 230 public.hubble_sc 231 public.igsl_source 232 public.igsl_source_catalog_ids 233 public.tycho2 234 public.dual 235 tap_config.coord_sys 236 tap_config.properties 237 tap_schema.columns 238 tap_schema.key_columns 239 tap_schema.keys 240 tap_schema.schemas 241 tap_schema.tables
We're interested in the "gaiadr3.gaia_source" table, which is entry 88 in the list above. We can also see which columns are included in this table:
cols = tables[88].columns
for col in cols:
print (col.name)
solution_id designation source_id random_index ref_epoch ra ra_error dec dec_error parallax parallax_error parallax_over_error pm pmra pmra_error pmdec pmdec_error ra_dec_corr ra_parallax_corr ra_pmra_corr ra_pmdec_corr dec_parallax_corr dec_pmra_corr dec_pmdec_corr parallax_pmra_corr parallax_pmdec_corr pmra_pmdec_corr astrometric_n_obs_al astrometric_n_obs_ac astrometric_n_good_obs_al astrometric_n_bad_obs_al astrometric_gof_al astrometric_chi2_al astrometric_excess_noise astrometric_excess_noise_sig astrometric_params_solved astrometric_primary_flag nu_eff_used_in_astrometry pseudocolour pseudocolour_error ra_pseudocolour_corr dec_pseudocolour_corr parallax_pseudocolour_corr pmra_pseudocolour_corr pmdec_pseudocolour_corr astrometric_matched_transits visibility_periods_used astrometric_sigma5d_max matched_transits new_matched_transits matched_transits_removed ipd_gof_harmonic_amplitude ipd_gof_harmonic_phase ipd_frac_multi_peak ipd_frac_odd_win ruwe scan_direction_strength_k1 scan_direction_strength_k2 scan_direction_strength_k3 scan_direction_strength_k4 scan_direction_mean_k1 scan_direction_mean_k2 scan_direction_mean_k3 scan_direction_mean_k4 duplicated_source phot_g_n_obs phot_g_mean_flux phot_g_mean_flux_error phot_g_mean_flux_over_error phot_g_mean_mag phot_bp_n_obs phot_bp_mean_flux phot_bp_mean_flux_error phot_bp_mean_flux_over_error phot_bp_mean_mag phot_rp_n_obs phot_rp_mean_flux phot_rp_mean_flux_error phot_rp_mean_flux_over_error phot_rp_mean_mag phot_bp_rp_excess_factor phot_bp_n_contaminated_transits phot_bp_n_blended_transits phot_rp_n_contaminated_transits phot_rp_n_blended_transits phot_proc_mode bp_rp bp_g g_rp radial_velocity radial_velocity_error rv_method_used rv_nb_transits rv_nb_deblended_transits rv_visibility_periods_used rv_expected_sig_to_noise rv_renormalised_gof rv_chisq_pvalue rv_time_duration rv_amplitude_robust rv_template_teff rv_template_logg rv_template_fe_h rv_atm_param_origin vbroad vbroad_error vbroad_nb_transits grvs_mag 
grvs_mag_error grvs_mag_nb_transits rvs_spec_sig_to_noise phot_variable_flag l b ecl_lon ecl_lat in_qso_candidates in_galaxy_candidates non_single_star has_xp_continuous has_xp_sampled has_rvs has_epoch_photometry has_epoch_rv has_mcmc_gspphot has_mcmc_msc in_andromeda_survey classprob_dsc_combmod_quasar classprob_dsc_combmod_galaxy classprob_dsc_combmod_star teff_gspphot teff_gspphot_lower teff_gspphot_upper logg_gspphot logg_gspphot_lower logg_gspphot_upper mh_gspphot mh_gspphot_lower mh_gspphot_upper distance_gspphot distance_gspphot_lower distance_gspphot_upper azero_gspphot azero_gspphot_lower azero_gspphot_upper ag_gspphot ag_gspphot_lower ag_gspphot_upper ebpminrp_gspphot ebpminrp_gspphot_lower ebpminrp_gspphot_upper libname_gspphot
What Data Are We Interested In?¶
We're interested in extracting specific data fields from the database, which will help us gain insights into celestial objects. Here's what we want to retrieve:
Source Location:
- ra (Right Ascension)
- dec (Declination)

Color-Magnitude Diagram:
- bp_rp (Blue minus Red color), the difference between the Blue and Red band magnitudes.

Brightness of the Source:
- phot_g_mean_mag, the average apparent G-band magnitude.

Distance Estimation:
- parallax, which measures the apparent shift of the star on the sky due to the Earth's motion around the Sun.
- parallax_over_error, the ratio of the parallax to its uncertainty.
Image: Measuring Stellar Distances by Parallax
By combining these fields, we will construct a comprehensive query that will help us obtain the data we need. Additionally, it's a good practice to save the query results to a CSV file for future reference, so we won't need to rerun the query.
# import pandas as pd
# query_size = 1000000 # Number of stars we want to get
# distance = 200 # Distance (in pc) out to which we will query
# job = Gaia.launch_job_async("select top {}".format(query_size)+
# " ra, dec, parallax, parallax_over_error, " # Getting source location and parallax
# " bp_rp, phot_g_mean_mag " # Getting color and magnitude measurements
# " from gaiadr3.gaia_source" # Selecting the data source
# # All of these are data quality checks
# " where parallax_over_error > 10"
# " and visibility_periods_used > 8"
# " and phot_g_mean_flux_over_error > 50"
# " and phot_bp_mean_flux_over_error > 20"
# " and phot_rp_mean_flux_over_error > 20"
# " and phot_bp_rp_excess_factor <"
# " 1.3+0.06*power(phot_bp_mean_mag-phot_rp_mean_mag,2)"
# " and phot_bp_rp_excess_factor >"
# " 1.0+0.015*power(phot_bp_mean_mag-phot_rp_mean_mag,2)"
# " and astrometric_chi2_al/(astrometric_n_good_obs_al-5)<"
# "1.44*greatest(1,exp(-0.4*(phot_g_mean_mag-19.5)))"
# # Filtering on distance
# +" and 1000/parallax <= {}".format(distance))
# r = job.get_results()
# # Convert to pandas
# df = r.to_pandas()
# # Save to a csv
# df.to_csv("gaia3.csv")
What is Pandas?¶
Pandas is a popular open-source data manipulation and analysis library for Python, providing versatile data structures and data analysis tools. A pandas dataframe
is a two-dimensional, labeled data structure that resembles a table and is used for data manipulation, analysis, and exploration in Python.
import pandas as pd
df = pd.read_csv("gaia3.csv")
df.head()
Unnamed: 0 | ra | dec | parallax | parallax_over_error | bp_rp | phot_g_mean_mag | pm | pmra | pmra_error | pmdec | pmdec_error | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 251.492554 | -50.749533 | 6.383756 | 103.153015 | 0.795505 | 10.546730 | 32.093870 | 31.380714 | 0.059630 | -6.728077 | 0.045104 |
1 | 1 | 251.589977 | -50.638635 | 13.398257 | 750.623540 | 2.165551 | 13.470652 | 98.008840 | -49.358441 | 0.017024 | -84.672770 | 0.012676 |
2 | 2 | 251.093139 | -51.007072 | 6.106724 | 448.880800 | 1.269135 | 12.529914 | 9.986012 | 5.650984 | 0.013578 | 8.233275 | 0.009894 |
3 | 3 | 251.746461 | -51.201058 | 7.481942 | 87.461440 | 2.588948 | 16.969418 | 147.466350 | 10.335827 | 0.085701 | -147.103700 | 0.056509 |
4 | 4 | 250.188154 | -51.273050 | 8.606533 | 163.012250 | 2.755577 | 16.339110 | 5.618203 | -1.305274 | 0.064197 | 5.464473 | 0.043635 |
df.describe()
Unnamed: 0 | ra | dec | parallax | parallax_over_error | bp_rp | phot_g_mean_mag | pm | pmra | pmra_error | pmdec | pmdec_error | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 |
mean | 7.168105e+05 | 1.826020e+02 | -5.932620e-01 | 7.951749e+00 | 2.770547e+02 | 2.157298e+00 | 1.482734e+01 | 6.438331e+01 | -1.552588e+00 | 5.269976e-02 | -2.148792e+01 | 4.799410e-02 |
std | 4.138512e+05 | 1.039087e+02 | 3.972358e+01 | 4.958763e+00 | 2.882675e+02 | 8.357065e-01 | 2.650869e+00 | 6.995515e+01 | 6.803307e+01 | 4.709206e-02 | 6.281956e+01 | 4.192370e-02 |
min | 0.000000e+00 | 1.786275e-04 | -8.991512e+01 | 5.000001e+00 | 1.001268e+01 | -6.166201e-01 | 2.777715e+00 | 1.365079e-02 | -4.406469e+03 | 4.317274e-03 | -5.817800e+03 | 5.807723e-03 |
25% | 3.584052e+05 | 9.176268e+01 | -3.176953e+01 | 5.610729e+00 | 9.643445e+01 | 1.533804e+00 | 1.329365e+01 | 2.768513e+01 | -2.899033e+01 | 2.056515e-02 | -4.296366e+01 | 1.857064e-02 |
50% | 7.168105e+05 | 1.847296e+02 | -6.636379e-01 | 6.571446e+00 | 1.894774e+02 | 2.433110e+00 | 1.555917e+01 | 4.784237e+01 | -1.827989e+00 | 3.864415e-02 | -1.609957e+01 | 3.580896e-02 |
75% | 1.075216e+06 | 2.738694e+02 | 3.016465e+01 | 8.494259e+00 | 3.672869e+02 | 2.789234e+00 | 1.682265e+01 | 7.966884e+01 | 2.575548e+01 | 7.004569e-02 | 6.412341e+00 | 6.389684e-02 |
max | 1.433621e+06 | 3.599994e+02 | 8.994934e+01 | 7.680665e+02 | 1.540048e+04 | 5.435446e+00 | 2.058442e+01 | 1.039335e+04 | 6.765995e+03 | 2.072775e+00 | 1.036239e+04 | 2.114847e+00 |
How Do We Calculate the Distance to Stars?¶
Determining the distance to stars in astronomy involves trigonometry. We can calculate the distance ($d$) to a star using the formula:
$$d = \frac{1}{p},$$
Here, $d$ is the distance in a unit called "parsecs," and $p$ is the parallax angle in arcseconds. It's important to note that Gaia records the parallax angle in units of milliarcseconds (mas), so we need to perform a unit conversion.
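As a quick sanity check on the formula before we bring in Astropy, here is the calculation in plain Python (the parallax value is just an illustrative number):

```python
# d [pc] = 1 / p [arcsec]; Gaia reports p in milliarcseconds (mas),
# so convert mas -> arcsec before inverting.
parallax_mas = 10.0                      # hypothetical parallax of 10 mas
parallax_arcsec = parallax_mas / 1000.0  # 0.01 arcsec
d_pc = 1.0 / parallax_arcsec             # 100.0 pc
print(d_pc)
```

Equivalently, $d_{\mathrm{pc}} = 1000 / p_{\mathrm{mas}}$, which is the form used in the distance cut of the query above.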
How Do We Convert Units?¶
To handle this unit conversion, we'll utilize astropy.units
. To further simplify the process, we'll leverage Astropy not only for unit conversion but also for distance calculation. In addition, we'll use NumPy arrays to perform mathematical operations on large lists or arrays of numbers.
import numpy as np
from astropy import units as u
from astropy.coordinates import Distance
def get_distance(parallax):
"""
Calculate the distance to a star using its parallax angle.
Parameters:
-----------
parallax : float
Parallax angle in milliarcseconds (mas).
Returns:
--------
astropy.coordinates.Distance
The distance to the star in parsecs (pc).
Example:
--------
>>> parallax = 12.34 # Parallax angle in mas
>>> distance = get_distance(parallax)
>>> print(f"{distance:.2f}")
81.04 pc
"""
# Attach milliarcsecond (mas) units to the parallax value
p = parallax * u.mas
# Calculate the distance using astropy's Distance class
distance = Distance(parallax=p)
return distance
star_distances = get_distance( np.array(df["parallax"]) )
min_distance = star_distances.min()
max_distance = star_distances.max()
print (min_distance, max_distance)
1.3019705311704333 pc 199.9999448214332 pc
So, let's recap our progress:
- We utilized Astroquery to retrieve data from the Gaia mission, obtaining essential information about stars.
- With the help of Astropy, we handled the conversion of parallax angles from milliarcseconds to arcseconds and subsequently calculated the distances to these stars in parsecs.
The output distances are followed by pc, signifying the unit of measurement (parsecs). Astropy makes it easy to work with these units and perform conversions as needed.
print (f'Distance in light years {min_distance.to("lightyear"):0.2f}')
print (f'Distance in meters {min_distance.to("m"):0.2e}')
# If we want to use imperial units we must enable it
# (because we should always use metric!)
u.imperial.enable()
print (f'Distance in inches {min_distance.to("inch"):0.2e}')
Distance in light years 4.25 lyr Distance in meters 4.02e+16 m Distance in inches 1.58e+18 inch
Converting between units with Astropy is as simple as using .to("unit")
. This not only saves time but also minimizes the risk of human errors when performing unit conversions.
Converting from Apparent Magnitude to Absolute Magnitude¶
When assessing the brightness of an astronomical object, it's crucial to consider the impact of distance on our measurements.
The brightness of an object appears different depending on its proximity to us; it will appear brighter if it's closer and dimmer if it's farther away. In astronomy, we commonly use "Apparent Magnitude" to describe how bright an object appears from our vantage point and "Absolute Magnitude" to standardize brightness by accounting for the object's distance. Absolute Magnitude quantifies how bright an object would appear if it were located precisely 10 parsecs away.
Image: Absolute Magnitude Concept
By knowing the distance in parsecs, we can convert from apparent to absolute magnitude using the formula:
$$M = m - 5 \log_{10}(d_{pc}) + 5,$$
Where:
- $M$ is the Absolute Magnitude.
- $m$ is the Apparent Magnitude.
- $d_{pc}$ is the distance in parsecs.
This formula enables us to correct for the influence of distance when comparing the brightness of celestial objects.
def calculate_absolute_magnitude(apparent_magnitude, distance):
"""
Calculate the Absolute Magnitude of an astronomical object given its Apparent Magnitude and distance.
Parameters:
-----------
apparent_magnitude : float
The Apparent Magnitude of the object.
distance : astropy.units.Quantity
The distance to the object in parsecs (pc).
Returns:
--------
float
The Absolute Magnitude of the object.
Example:
--------
>>> apparent_mag = 5.0 # Apparent Magnitude
>>> distance = 10 * u.pc # Distance in parsecs
>>> absolute_mag = calculate_absolute_magnitude(apparent_mag, distance)
>>> print(absolute_mag)
5.0
"""
# Convert distance to parsecs
dist_pc = distance.to(u.pc)
# Calculate the Absolute Magnitude using the formula
M = apparent_magnitude - 5 * np.log10(dist_pc.value) + 5
return M
AbsM = calculate_absolute_magnitude(np.array(df["phot_g_mean_mag"]), star_distances)
Let's add the values we've determined back into our data frame:
df["AbsM"] = AbsM
df["Distance"] = star_distances.value
df.head()
Unnamed: 0 | ra | dec | parallax | parallax_over_error | bp_rp | phot_g_mean_mag | pm | pmra | pmra_error | pmdec | pmdec_error | AbsM | Distance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 251.492554 | -50.749533 | 6.383756 | 103.153015 | 0.795505 | 10.546730 | 32.093870 | 31.380714 | 0.059630 | -6.728077 | 0.045104 | 4.572111 | 156.647594 |
1 | 1 | 251.589977 | -50.638635 | 13.398257 | 750.623540 | 2.165551 | 13.470652 | 98.008840 | -49.358441 | 0.017024 | -84.672770 | 0.012676 | 9.105894 | 74.636572 |
2 | 2 | 251.093139 | -51.007072 | 6.106724 | 448.880800 | 1.269135 | 12.529914 | 9.986012 | 5.650984 | 0.013578 | 8.233275 | 0.009894 | 6.458955 | 163.753924 |
3 | 3 | 251.746461 | -51.201058 | 7.481942 | 87.461440 | 2.588948 | 16.969418 | 147.466350 | 10.335827 | 0.085701 | -147.103700 | 0.056509 | 11.339490 | 133.655144 |
4 | 4 | 250.188154 | -51.273050 | 8.606533 | 163.012250 | 2.755577 | 16.339110 | 5.618203 | -1.305274 | 0.064197 | 5.464473 | 0.043635 | 11.013251 | 116.190808 |
Exploratory Data Analysis with Pandas¶
Explore your dataset through the lens of Exploratory Data Analysis (EDA). EDA is all about unveiling hidden patterns and stories within your data. It's your detective work to discover insights, detect anomalies, and gain a deeper understanding of your dataset.
We'll utilize pandas for data manipulation and Matplotlib/Seaborn for visualization. In this section, we'll delve into techniques for exploring your dataset, from basic properties to captivating visualizations. Welcome to the world of EDA!
Some methods of EDA:
Descriptive Statistics:
- Summary statistics
- Count of missing values
Data Distribution:
- Histograms
- Box plots
- Probability density functions (PDFs)
- Skewness and kurtosis
Data Relationships:
- Correlation matrix
- Scatter plots
- Heatmaps
- Pair plots
Categorical Data:
- Frequency counts
- Bar plots
- Cross-tabulations
Data Quality:
- Outliers
- Missing data
- Data consistency
Dimensionality Reduction:
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
Data Visualization:
- Various plots and charts
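Several of the descriptive-statistics checks above take only a line of pandas each. A minimal sketch on a toy dataframe (the values below are made up, standing in for the Gaia columns):

```python
import pandas as pd
import numpy as np

# Toy stand-in for the Gaia dataframe, with one deliberately missing value
toy = pd.DataFrame({
    "parallax": [6.4, 13.4, np.nan, 7.5],
    "phot_g_mean_mag": [10.5, 13.5, 12.5, 17.0],
})

print(toy.describe())                 # summary statistics per column
print(toy.isna().sum())               # count of missing values per column
print(toy["phot_g_mean_mag"].skew())  # skewness of the distribution
```

We already used `df.describe()` on the real dataset above; `isna().sum()` is a quick way to confirm a query returned complete rows before diving into plots.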
import matplotlib.pyplot as plt
# Get the columns to plot (skipping the first column, "Unnamed: 0", the index saved with the CSV)
cols = df.columns[1:]
# Calculate the number of rows and columns for subplots
nrows = 2
ncols = (len(cols) + 1) // 2
# Create a figure with subplots
fig, axs = plt.subplots(nrows, ncols, figsize=(24, 6))
# Flatten the axs array to loop through the subplots
for i, ax in enumerate(axs.ravel()):
# Check if there are more columns to plot
if i < len(cols):
# Plot a histogram for the current column
df[cols[i]].plot(kind='hist', ax=ax)
ax.set_title(cols[i])
# Adjust subplot layout for better spacing
plt.tight_layout()
plt.show()
Seaborn also has a lot of useful plotting capabilities:
import seaborn as sns
# Create a figure with subplots using Seaborn
fig, axs = plt.subplots(nrows, ncols, figsize=(24, 6))
# Flatten the axs array to loop through the subplots
for i, ax in enumerate(axs.ravel()):
# Check if there are more columns to plot
if i < len(cols):
# Use Seaborn to create a histogram for the current column
sns.histplot(data=df[:100000], x=cols[i], ax=ax, kde=True) # Set kde=True for a kernel density estimate
ax.set_title(cols[i])
# Adjust subplot layout for better spacing
plt.tight_layout()
plt.show()
Understanding Correlations: Unraveling Relationships in Data¶
In the realm of data analysis, uncovering the relationships and dependencies between different variables is a fundamental task. Correlations are the keys that unlock the doors to understanding the hidden patterns, associations, and causations within datasets. These intriguing statistical relationships offer a wealth of insights, guiding our understanding of how changes in one variable can impact another.
Correlations enable us to answer questions such as:
- Does an increase in one variable lead to a corresponding increase or decrease in another?
- Are two variables completely independent, or is there a subtle connection waiting to be discovered?
- Which factors are most influential in affecting a specific outcome?
The correlation between two parameters (A and B) can be interpreted as follows:
- 1: when A increases, so does B. These are said to be "correlated".
- 0: when A increases, we see no change in B. These are said to be "uncorrelated".
- -1: when A increases, B decreases. These are said to be "anti-correlated".
In the cases of 1 and -1 we have 100% correlation and anti-correlation; the correlation can take any value between these two extremes.
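These three cases are easy to verify with NumPy's `np.corrcoef` on synthetic arrays:

```python
import numpy as np

a = np.arange(100, dtype=float)
b = 2 * a + 1                    # moves with a      -> correlation +1
c = -3 * a + 5                   # moves against a   -> correlation -1
rng = np.random.default_rng(0)
d = rng.normal(size=a.size)      # unrelated noise   -> correlation near 0

print(np.corrcoef(a, b)[0, 1])   # +1 (up to float rounding)
print(np.corrcoef(a, c)[0, 1])   # -1 (up to float rounding)
print(np.corrcoef(a, d)[0, 1])   # small value close to 0
```

Note that any *linear* rescaling of a variable leaves its correlation at exactly ±1; real data falls somewhere in between, which is what the heatmap below summarizes for every pair of columns at once.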
# Calculate the correlation matrix
cormat = df[cols].corr()
# Set the colormap and color range
cmap = sns.color_palette("coolwarm", as_cmap=True) # You can change the colormap as needed
vmin, vmax = -1, 1 # Set the range for the color scale
# Create the correlation heatmap
plt.figure(figsize=(10, 8)) # Set the figure size
sns.heatmap(cormat, cmap=cmap, vmin=vmin, vmax=vmax, annot=True, fmt=".2f") # You can add annotations with fmt=".2f"
# Set plot title
plt.title("Correlation Heatmap")
# Show the plot
plt.show()
With just a few lines of code, we can quickly discern significant insights from a correlation plot. We observe a clear correlation between the distance (Distance
) and the brightness, as indicated by the phot_g_mean_mag
value. However, this correlation disappears when we convert the brightness to Absolute Magnitude (AbsM
).
Making Publication Quality Plots¶
Now, let's proceed with building the color-magnitude diagram by plotting bp_rp
against AbsM
.
Creating a high-quality scientific plot is essential for effectively conveying your data. A well-crafted plot should be designed to clearly communicate the key information to the viewer. It should include the following elements to enhance data understanding:
Labelled Axes: Clearly annotated x and y axes provide context and meaning to the data.
Descriptive Title: A meaningful title summarizing the plot's content is crucial for quick comprehension.
Grid Lines: Grid lines assist in reading data values accurately and help viewers align data points.
Sensible Scaling: Appropriate axis scaling ensures that data is presented in a visually understandable manner without distortion.
Legend (if applicable): If multiple data series are displayed, a legend should explain their meaning.
Colorbar (if applicable): When using color to represent data values, a colorbar helps viewers interpret the color-coding.
By incorporating these elements, your scientific plot becomes a powerful tool for sharing insights and findings with your audience.
ax = plt.subplot()
# Lets plot each point
# alpha = 0.05, make the points transparent (5% opacity)
# s = 1, make the points small
# color = "k" make the points black
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k')
# Invert the y axis, smaller magnitude -> Brighter star
ax.invert_yaxis()
While the scatter plot does reveal some underlying structure, it may not provide a highly informative representation of the data. The plot becomes saturated in high-density regions, making it challenging to discern details. To address this, we can use a histogram to visualize the density distribution more effectively.
ax = plt.subplot()
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k', zorder=0)
# Create a 2D histogram
# bins = 100, require 100 bins in the x and y directions
# cmin = 100, require at least 100 entries in a bin to be shown
# cmap = "jet", use the "jet" color map
h = ax.hist2d(df["bp_rp"], df["AbsM"], bins=100, cmin=100, cmap="jet")
ax.invert_yaxis()
This looks better, but we've made a critical error...
Understanding Color Maps and Accessibility¶
Color maps play a crucial role in data visualization, helping us represent data through a spectrum of colors. However, not all color maps are created equal, and some, like the "jet" color map, have raised concerns due to their perceptual issues.
The "jet" color map, characterized by its vibrant and rainbow-like appearance, is notorious for its drawbacks, including:
Perceptual Uniformity: It fails to maintain a perceptual uniformity, which means that equal steps in data values may not appear equally spaced in the color map, leading to misleading visualizations.
Color Saturation: The "jet" color map can saturate at high and low ends, making it challenging to distinguish different data values in these regions.
Colorblind Accessibility: Accessibility is a vital concern. Many individuals with color vision deficiencies, commonly referred to as colorblindness, may find it difficult to interpret data represented using color maps like "jet."
To create effective and accessible visualizations, it's essential to consider the choice of color map and its impact on how data is perceived, especially in cases where colorblind individuals may interact with the visualizations.
What information do you get from the above plot?
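One way to see the problem quantitatively is to approximate each colormap's perceived lightness and check whether it increases monotonically. The relative-luminance weighting below is only a rough proxy for proper CIE lightness, but it is enough to expose "jet":

```python
import numpy as np
from matplotlib import colormaps

def luminance(name, n=256):
    """Approximate perceived lightness of a colormap via relative luminance."""
    rgb = colormaps[name](np.linspace(0, 1, n))[:, :3]
    return rgb @ np.array([0.2126, 0.7152, 0.0722])

jet = luminance("jet")
viridis = luminance("viridis")

# "jet" brightens toward yellow and then darkens again toward red,
# so equal data steps do not map to equal lightness steps
print(np.all(np.diff(jet) > 0))   # False
# "viridis" runs steadily from dark to light
print(viridis[-1] > viridis[0])   # True
```

A colormap whose lightness rises steadily also survives conversion to grayscale, which is one reason perceptually uniform maps are friendlier to colorblind readers and to printed figures.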
CMasher is a valuable Python package offering accessible scientific colormaps. All the color maps included in CMasher are designed to be colorblind-friendly and to exhibit a linear increase in brightness.
import cmasher as cms
ax = plt.subplot()
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k', zorder=0)
# Create a 2D histogram
# bins = 100, require 100 bins in the x and y directions
# cmin = 50, require at least 50 entries in a bin to be shown
# cmap = cms.toxic, use the "toxic" color map from CMasher
h = ax.hist2d(df["bp_rp"], df["AbsM"], bins=100, cmin=50, cmap=cms.toxic)
ax.invert_yaxis()
This looks better, but we still need to clean things up: a reader has no idea what is being plotted.
# use colors to get a log color scale
from matplotlib import colors
# Create a subplot
ax = plt.subplot()
# Scatter plot of data points with customizations
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k', zorder=0)
# Create a 2D histogram (density plot) with a log-normal color scale
# - bins: Use 100 bins in both the x and y directions
# - cmin: Require at least 50 entries in a bin to be shown
# - cmap: Use the "toxic" color map from the cms module
# - norm: Utilize a logarithmic color scale to better visualize data distribution
h = ax.hist2d(df["bp_rp"], df["AbsM"], bins=100, cmin=50, norm=colors.LogNorm(), zorder=0.5, cmap=cms.toxic)
# Invert the y-axis to match the typical appearance of a color-magnitude diagram
ax.invert_yaxis()
# Add a color bar for the density plot
cb = plt.colorbar(h[3], ax=ax, pad=0.02)
# Set labels for the x and y axes
ax.set_xlabel(r'$G_{BP} - G_{RP}$')
ax.set_ylabel(r'$M_G$')
# Set a label for the color bar
cb.set_label(r"$\mathrm{Stellar~density}$")
# Add grid lines to the plot
ax.grid()
So, what insights can we gather from this plot?
- The plot's left side corresponds to 'blue,' while the right side corresponds to 'red.'
- The lower section of the plot represents 'dim' stars, whereas the upper section showcases 'bright' stars.
- The majority of stars form a discernible diagonal pattern, with dimmer stars appearing 'redder' and brighter stars appearing 'bluer.'
- Notably, we observe a distinct grouping of dim, blue stars in the bottom-left quadrant of the plot.
- There is a noticeable 'turn-off' branch where we encounter both bright blue and bright red stars.
- As we progress along this branch, the bright red stars appear to level off in brightness and become more 'red.'
- Stars residing in the bright red and dim blue regions are relatively scarce.
In the realm of astronomy, we've uncovered one of the most profound tools: the Hertzsprung-Russell (HR) Diagram.
The HR Diagram serves as a stellar narrative, shedding light on the intricate lives of stars. It all commences with a star's birth, and its initial position on the diagram is determined by its mass. Stars embark on their cosmic journey along the 'Main Sequence,' where they sustain themselves by converting hydrogen into helium—a process known as hydrogen burning. This phase constitutes the majority of a star's existence.
However, as time advances, stars deplete their hydrogen fuel. When this occurs, they transition to the 'Giant' branch. Giants display a cooler surface temperature, giving them a 'reddish' appearance. Concurrently, they experience a considerable increase in luminosity, shining much brighter. This phenomenon is accompanied by a significant expansion in their physical size, leading to their classification as 'red giants.'
Now, here's the fascinating twist: Stars like our Sun, after the exhaustion of helium, evolve into 'White Dwarfs.' This transformation arises from the fact that they lack the necessary mass to facilitate the fusion of elements heavier than helium. These stars have meticulously fused all their available helium into carbon. Unlike most stars that maintain a delicate equilibrium between gravitational forces and nuclear fusion, White Dwarfs are held in place by a phenomenon known as 'electron degeneracy.'
In essence, the HR Diagram serves as a window into the diverse chapters of stellar existence, offering us valuable insights into the captivating stories of these cosmic marvels.
Mapping Stellar Density in Galactic Coordinates¶
Our celestial journey continues as we leverage the rich data from the Gaia mission that we previously dissected. In this leg of our astronomical exploration, we're about to embark on an exciting mission: mapping the stellar density within our Milky Way galaxy.
Armed with Gaia's invaluable data, we'll now venture into the realm of galactic coordinates. These coordinates provide us with a unique perspective, allowing us to chart the positions and densities of stars across the vast expanse of our galactic home.
By harnessing the power of data analysis and visualization, we'll unveil a stellar density map that offers an intricate view of the distribution and concentration of stars within the Milky Way. Our map will not only guide us through the celestial neighborhoods but also shed light on the interplay between stars, clusters, and the galactic structures that shape our night sky.
So, prepare to join us in this thrilling endeavor as we decode the secrets of the stars and craft a stellar density map in the wondrous realm of galactic coordinates.
from astropy.coordinates import SkyCoord
coord = SkyCoord(df["ra"], df["dec"], unit="deg")
gal_coords = coord.galactic
plt.subplot(111, projection='aitoff')
plt.grid(True)
plt.scatter(gal_coords.l.wrap_at('180d').radian, gal_coords.b.radian, alpha=0.002)
def get_bin_centers(binning):
"""
Calculate the bin centers given a set of bin edges.
Parameters:
binning (numpy.ndarray): An array containing the bin edges.
Returns:
numpy.ndarray: An array of bin centers.
Example:
>>> bin_edges = np.array([0, 1, 2, 3, 4, 5])
>>> centers = get_bin_centers(bin_edges)
>>> print(centers)
[0.5 1.5 2.5 3.5 4.5]
"""
# Calculate the width of each bin by subtracting the previous bin's edge from the current bin's edge
width = binning[1:] - binning[:-1]
# Calculate the bin centers by adding half of the bin width to the left bin edge
bin_centers = binning[:-1] + 0.5 * width
return bin_centers
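As a quick usage example, the helper pairs naturally with NumPy's histogram functions, which return bin edges rather than centers (the sample values here are made up for illustration):

```python
import numpy as np

def get_bin_centers(binning):
    """Midpoints of a set of bin edges (same helper as above)."""
    width = binning[1:] - binning[:-1]
    return binning[:-1] + 0.5 * width

# Histogram some values, then recover the x-position of each bar
values = np.array([0.2, 0.4, 1.1, 1.9, 2.5, 2.6])
counts, edges = np.histogram(values, bins=3, range=(0, 3))
centers = get_bin_centers(edges)
print(counts)   # [2 2 2]
print(centers)  # [0.5 1.5 2.5]
```

Note that `np.histogram` returns one more edge than there are bins, so `centers` has exactly one entry per count.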
# Define the number of bins and the bin edges for galactic longitude (l) and latitude (b)
nbins = 100
l_bins = np.linspace(-180, 180, nbins)
b_bins = np.linspace(-90, 90, nbins)
# Calculate the bin centers for galactic longitude (l) and latitude (b)
l_center = get_bin_centers(l_bins)
b_center = get_bin_centers(b_bins)
# Calculate the 2D histogram (counts) of galactic coordinates l and b
# Use np.histogram2d to create a 2D histogram from galactic coordinates
# Wrap galactic longitude (l) at 180 degrees to ensure it covers the full range
counts, _, _ = np.histogram2d(
gal_coords.l.wrap_at('180d').radian, # Convert and wrap l in radians
gal_coords.b.radian, # Convert b in radians
bins=[np.deg2rad(l_bins), np.deg2rad(b_bins)] # Convert bin edges to radians
)
# Create a subplot with an Aitoff projection for a celestial map
ax = plt.subplot(111, projection='aitoff')
# Create a pseudo-color mesh plot (pcolormesh) for the stellar density map
# Use np.deg2rad to convert bin centers from degrees to radians for the x and y axes
# Use 'counts.T' to transpose the counts array for proper orientation
# Apply a PowerNorm to enhance visualization (power=0.7)
# Set zorder to control the layering of the plot
# Use the 'savanna' color map from cms module
h = ax.pcolormesh(np.deg2rad(l_center), np.deg2rad(b_center), counts.T, norm=colors.PowerNorm(0.7), zorder=0.5, cmap=cms.savanna)
# Add a color bar for the density map
cb = plt.colorbar(h, ax=ax, pad=0.02)
# Set labels for the x and y axes
ax.set_xlabel(r'Galactic Longitude')
ax.set_ylabel(r'Galactic Latitude')
# Set a label for the color bar
cb.set_label(r"$\mathrm{Stellar~density}$")
# Add grid lines to the celestial map
ax.grid()
We observe that most stars fall along a latitude of 0. This is the galactic plane. We also notice some spots of high stellar density. These are likely stellar clusters: gravitationally bound groups of stars.
Open Clusters¶
Open cluster: NGC 2164
Open clusters are remarkable gatherings of young stars that form from the same cloud of interstellar gas and dust. These stellar congregations are a captivating feature of our galaxy, the Milky Way, and can be found in various regions of its spiral arms. Open clusters typically contain hundreds to thousands of stars.
Open clusters provide insights into stellar evolution and the early stages of star formation. The young stars within open clusters often share similar ages, compositions, and distances, making them ideal laboratories for studying stellar properties and the effects of their environment on their development.
Over time, open clusters gradually disperse due to gravitational interactions and tidal forces from the Milky Way. As a result, the stars within these clusters eventually go their separate ways, becoming part of the general stellar population of our galaxy.
Studying open clusters offers valuable clues about the dynamics of our galaxy, the formation of stars, and the life cycles of stellar systems. Astronomers continue to explore and catalog these captivating celestial gatherings to deepen our understanding of the cosmos.
df_open_clusters = pd.read_csv("./OpenClusters.csv")
df_open_clusters_near = df_open_clusters[df_open_clusters["Distance (parsecs)"] < 200].copy()
df_open_clusters_near.head()
Unnamed: 0 | Cluster identifier | Unnamed: 1_level_0 | Unnamed: 2_level_0 | Constellation | Distance (parsecs) | Age (Myr) | Diameter | Apparent magnitude | Notes | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Hyades | 04h 26.9m | +15d 52m | Taurus | 47.0 | 625 | 330.0 | 0.5 | [3] |
1 | 1 | Coma Star Cluster | 12h 22.5m | +25d 51m | Coma Berenices | 86.0 | 400-500 | 120.0 | 1.8 | [4] |
20 | 20 | Messier 44, Beehive Cluster | 08h 40.4m | +19d 41m | Cancer | 187.0 | 830 | 70.0 | 3.1 | [10][11] |
21 | 21 | Messier 45, Pleiades | 03h 47.4m | +24d 07m | Taurus | 136.0 | 125 | 120.0 | 1.2 | [12] |
30 | 30 | IC 2602, Southern Pleiades | 10h 43.2m | -64d 24m | Carina | 167.0 | 30 | 100.0 | 1.9 | [14] |
cluster_ra = df_open_clusters_near["Unnamed: 1_level_0"]
cluster_dec = df_open_clusters_near["Unnamed: 2_level_0"]
cluster_coords = SkyCoord(cluster_ra,cluster_dec)
df_open_clusters_near["l"] = cluster_coords.galactic.l.rad
df_open_clusters_near["b"] = cluster_coords.galactic.b.rad
df_open_clusters_near.head()
Unnamed: 0 | Cluster identifier | Unnamed: 1_level_0 | Unnamed: 2_level_0 | Constellation | Distance (parsecs) | Age (Myr) | Diameter | Apparent magnitude | Notes | l | b | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Hyades | 04h 26.9m | +15d 52m | Taurus | 47.0 | 625 | 330.0 | 0.5 | [3] | 3.142706 | -0.389958 |
1 | 1 | Coma Star Cluster | 12h 22.5m | +25d 51m | Coma Berenices | 86.0 | 400-500 | 120.0 | 1.8 | [4] | 3.882823 | 1.455624 |
20 | 20 | Messier 44, Beehive Cluster | 08h 40.4m | +19d 41m | Cancer | 187.0 | 830 | 70.0 | 3.1 | [10][11] | 3.593651 | 0.567058 |
21 | 21 | Messier 45, Pleiades | 03h 47.4m | +24d 07m | Taurus | 136.0 | 125 | 120.0 | 1.2 | [12] | 2.908491 | -0.409449 |
30 | 30 | IC 2602, Southern Pleiades | 10h 43.2m | -64d 24m | Carina | 167.0 | 30 | 100.0 | 1.9 | [14] | 5.054886 | -0.085419 |
sns.set_palette("colorblind")
# Create a subplot with an Aitoff projection for a celestial map
ax = plt.subplot(111, projection='aitoff')
# Create a pseudo-color mesh plot (pcolormesh) for the stellar density map
# Use np.deg2rad to convert bin centers from degrees to radians for the x and y axes
# Use 'counts.T' to transpose the counts array for proper orientation
# Apply a PowerNorm to enhance visualization (power=0.7)
# Set zorder to control the layering of the plot
# Use the 'sepia' color map from the cms module
h = ax.pcolormesh(np.deg2rad(l_center), np.deg2rad(b_center), counts.T, norm=colors.PowerNorm(0.7), zorder=0.5, cmap=cms.sepia)
# Add a color bar for the density map
cb = plt.colorbar(h, ax=ax, pad=0.02)
# Set labels for the x and y axes
ax.set_xlabel(r'Galactic Longitude')
ax.set_ylabel(r'Galactic Latitude')
# Set a label for the color bar
cb.set_label(r"$\mathrm{Stellar~density}$")
# Set the number of brightest objects to overlay
N = 3 # Adjust this to select the top N brightest objects
# Sort the DataFrame by apparent magnitude (brightness)
sorted_data = df_open_clusters_near.sort_values(by='Apparent magnitude')
# Select the top N brightest objects
brightest_objects = sorted_data.head(N)
# Create a scatter plot of galactic coordinates (l, b)
circle_size = 140
for i in range(len(brightest_objects)):
ax.scatter(
brightest_objects['l'].iloc[i],
brightest_objects['b'].iloc[i],
facecolor='none', edgecolor=f'C{i}',
s=circle_size,
label=brightest_objects['Cluster identifier'].iloc[i])
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2) # Adjust the bbox_to_anchor to control the legend position
# Add grid lines to the celestial map
ax.grid()
Clustering¶
It's noteworthy that three of the brightest nearby open clusters align with three of the most prominent hotspots on our stellar density map. This alignment isn't surprising, as a density map naturally highlights concentrations of stars, including known star clusters.
Now, let's embark on the journey of exploring additional potential star clusters within our dataset. We'll employ a clustering algorithm known as DBSCAN. DBSCAN is a versatile algorithm capable of identifying clusters within data without prior knowledge of the number of clusters. Furthermore, it's designed to handle noise, such as random stars, ensuring that our focus remains on identifying genuine clusters without being perturbed by nearby stars.
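Before applying DBSCAN to the Gaia data, here is a minimal sketch of its behavior on synthetic 2D points (the data and parameters are invented for illustration): points in a tight clump become one cluster, while scattered points are labeled -1, DBSCAN's marker for noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)

# A tight clump of 50 points plus 10 scattered "field" points far away
clump = rng.normal(loc=[0.0, 0.0], scale=0.05, size=(50, 2))
field = rng.uniform(low=2.0, high=8.0, size=(10, 2))
points = np.vstack([clump, field])

# eps: neighborhood radius; min_samples: points needed to form a dense region
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(points)

print(set(labels[:50]))  # the clump forms a single cluster: {0}
print(set(labels[50:]))  # the scattered points are flagged as noise: {-1}
```

This is exactly the behavior we want for the Gaia sample: compact groups of stars are picked out as clusters, while the diffuse field population is ignored.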
from sklearn.cluster import DBSCAN
x = gal_coords.l.wrap_at('180d').radian
y = gal_coords.b.radian
z = np.log10(df["Distance"])
data = np.array([x,y,z]).T
normalized_data = (data - data.mean(axis=0)) / data.std(axis=0)
# Define DBSCAN parameters
eps_values = np.linspace(0.1, 1, 5)  # Epsilon neighborhood sizes to scan
min_samples = 1000                   # Minimum number of samples in a cluster
for eps in eps_values:
    # Apply DBSCAN to the normalized (l, b, log-distance) data
    dbscan = DBSCAN(eps=eps, min_samples=min_samples)
    labels = dbscan.fit_predict(normalized_data)
    # Compute the mean position of each cluster (the label -1 marks noise)
    mean_l = []
    mean_b = []
    mean_dist = []
    for lab in set(labels):
        if lab == -1:
            continue
        lab_mask = labels == lab
        mean_l.append(np.mean(x[lab_mask]))
        mean_b.append(np.mean(y[lab_mask]))
        mean_dist.append(np.mean(z[lab_mask]))
    fig = plt.figure(figsize=(11, 6))
    # Create a subplot with an Aitoff projection for a celestial map
    ax = plt.subplot(111, projection='aitoff')
    # Stellar density map: PowerNorm color scale, 'sepia' color map from cms
    h = ax.pcolormesh(np.deg2rad(l_center), np.deg2rad(b_center), counts.T, norm=colors.PowerNorm(0.7), zorder=0.5, cmap=cms.sepia)
    # Add a color bar for the density map
    cb = fig.colorbar(h, ax=ax, pad=0.02)
    # Set labels for the x and y axes
    ax.set_xlabel(r'Galactic Longitude')
    ax.set_ylabel(r'Galactic Latitude')
    # Set a label for the color bar
    cb.set_label(r"$\mathrm{Stellar~density}$")
    # Circle the mean position of each cluster found by DBSCAN
    circle_size = 140
    for i in range(len(mean_l)):
        ax.scatter(
            mean_l[i],
            mean_b[i],
            facecolor='none', edgecolor=f'C{i}',
            s=circle_size,
            label=f"Cluster {i+1}")
    # Adjust bbox_to_anchor to control the legend position
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2)
    # Add grid lines to the celestial map
    ax.grid()
    # Save one map per epsilon value for comparison
    fig.savefig(f"eps_{eps}.png")