Table of contents¶
- Introduction to Gaia
- What is Astroquery
- Querying databases with Astroquery
- Working with Pandas Dataframes
- Calculating the Distance to Stars
- Converting units with Astropy
- Converting from apparent to absolute magnitude
- Exploratory Data Analysis with Pandas
- Understanding Correlations: Unraveling Relationships in Data
- Making Publication Quality Plots
- Understanding Color Maps and Accessibility
- The Hertzsprung-Russell Diagram
- Mapping Stellar Density in Galactic Coordinates
- Open Clusters
- Clustering
Accessing and Manipulating Astronomical Data Using Python¶
In this session, we will learn how to use the Python package Astroquery to access astronomical data. We will then utilize Astropy to simplify various calculations and unit conversions.
Our primary focus will be on data obtained by the Gaia mission. Gaia is a European Space Agency mission whose goal is to create a 3D map of the Milky Way by accurately measuring the positions of over one billion stars.
Image: Gaia - Exploring the Multi-dimensional Milky Way
We will be working with data collected using Gaia's Blue and Red Photometers (BP & RP) and making use of parallax measurements to estimate the distances to celestial objects.
The use of Gaia data for an HR diagram is based on a blog post by Vlas Sokolov.
import astroquery
What is Astroquery?¶
Astroquery is a Python package that streamlines the process of querying astronomical databases and catalogs. It offers a wide range of options for accessing astronomical data. Some of the available options include:
- Querying databases of celestial objects.
- Retrieving data from sky surveys and telescopes.
- Accessing information from observatories and space missions.
Image: A selection of options provided by Astroquery
For our tutorial, we will focus on using Astroquery to work with data from the Gaia mission. To get started, let's import the Gaia module:
from astroquery.gaia import Gaia
In astronomy, there are often vast databases filled with valuable information, and sometimes, we might not know exactly what we're looking for. That's where the ability to explore and discover available database tables becomes essential.
tables = Gaia.load_tables()
INFO: Retrieving tables... [astroquery.utils.tap.core]
INFO: Parsing tables... [astroquery.utils.tap.core]
INFO: Done. [astroquery.utils.tap.core]
for i, tab in enumerate(tables):
print (i, tab.name)
0 external.apassdr9 1 external.gaiadr2_astrophysical_parameters 2 external.gaiadr2_geometric_distance 3 external.gaiaedr3_distance 4 external.gaiaedr3_gcns_main_1 5 external.gaiaedr3_gcns_rejected_1 6 external.gaiaedr3_spurious 7 external.galex_ais 8 external.ravedr5_com 9 external.ravedr5_dr5 10 external.ravedr5_gra 11 external.ravedr5_on 12 external.ravedr6 13 external.sdssdr13_photoprimary 14 external.skymapperdr1_master 15 external.skymapperdr2_master 16 external.tmass_xsc 17 gaiadr1.aux_qso_icrf2_match 18 gaiadr1.ext_phot_zero_point 19 gaiadr1.allwise_best_neighbour 20 gaiadr1.allwise_neighbourhood 21 gaiadr1.gsc23_best_neighbour 22 gaiadr1.gsc23_neighbourhood 23 gaiadr1.ppmxl_best_neighbour 24 gaiadr1.ppmxl_neighbourhood 25 gaiadr1.sdss_dr9_best_neighbour 26 gaiadr1.sdss_dr9_neighbourhood 27 gaiadr1.tmass_best_neighbour 28 gaiadr1.tmass_neighbourhood 29 gaiadr1.ucac4_best_neighbour 30 gaiadr1.ucac4_neighbourhood 31 gaiadr1.urat1_best_neighbour 32 gaiadr1.urat1_neighbourhood 33 gaiadr1.cepheid 34 gaiadr1.phot_variable_time_series_gfov 35 gaiadr1.phot_variable_time_series_gfov_statistical_parameters 36 gaiadr1.rrlyrae 37 gaiadr1.variable_summary 38 gaiadr1.allwise_original_valid 39 gaiadr1.gsc23_original_valid 40 gaiadr1.ppmxl_original_valid 41 gaiadr1.sdssdr9_original_valid 42 gaiadr1.tmass_original_valid 43 gaiadr1.ucac4_original_valid 44 gaiadr1.urat1_original_valid 45 gaiadr1.gaia_source 46 gaiadr1.tgas_source 47 gaiadr2.aux_allwise_agn_gdr2_cross_id 48 gaiadr2.aux_iers_gdr2_cross_id 49 gaiadr2.aux_sso_orbit_residuals 50 gaiadr2.aux_sso_orbits 51 gaiadr2.dr1_neighbourhood 52 gaiadr2.allwise_best_neighbour 53 gaiadr2.allwise_neighbourhood 54 gaiadr2.apassdr9_best_neighbour 55 gaiadr2.apassdr9_neighbourhood 56 gaiadr2.gsc23_best_neighbour 57 gaiadr2.gsc23_neighbourhood 58 gaiadr2.hipparcos2_best_neighbour 59 gaiadr2.hipparcos2_neighbourhood 60 gaiadr2.panstarrs1_best_neighbour 61 gaiadr2.panstarrs1_neighbourhood 62 gaiadr2.ppmxl_best_neighbour 63 
gaiadr2.ppmxl_neighbourhood 64 gaiadr2.ravedr5_best_neighbour 65 gaiadr2.ravedr5_neighbourhood 66 gaiadr2.sdssdr9_best_neighbour 67 gaiadr2.sdssdr9_neighbourhood 68 gaiadr2.tmass_best_neighbour 69 gaiadr2.tmass_neighbourhood 70 gaiadr2.tycho2_best_neighbour 71 gaiadr2.tycho2_neighbourhood 72 gaiadr2.urat1_best_neighbour 73 gaiadr2.urat1_neighbourhood 74 gaiadr2.sso_observation 75 gaiadr2.sso_source 76 gaiadr2.vari_cepheid 77 gaiadr2.vari_classifier_class_definition 78 gaiadr2.vari_classifier_definition 79 gaiadr2.vari_classifier_result 80 gaiadr2.vari_long_period_variable 81 gaiadr2.vari_rotation_modulation 82 gaiadr2.vari_rrlyrae 83 gaiadr2.vari_short_timescale 84 gaiadr2.vari_time_series_statistics 85 gaiadr2.panstarrs1_original_valid 86 gaiadr2.gaia_source 87 gaiadr2.ruwe 88 gaiadr3.gaia_source 89 gaiadr3.gaia_source_lite 90 gaiadr3.astrophysical_parameters 91 gaiadr3.astrophysical_parameters_supp 92 gaiadr3.oa_neuron_information 93 gaiadr3.oa_neuron_xp_spectra 94 gaiadr3.total_galactic_extinction_map 95 gaiadr3.total_galactic_extinction_map_opt 96 gaiadr3.commanded_scan_law 97 gaiadr3.allwise_best_neighbour 98 gaiadr3.allwise_neighbourhood 99 gaiadr3.apassdr9_best_neighbour 100 gaiadr3.apassdr9_join 101 gaiadr3.apassdr9_neighbourhood 102 gaiadr3.dr2_neighbourhood 103 gaiadr3.gsc23_best_neighbour 104 gaiadr3.gsc23_join 105 gaiadr3.gsc23_neighbourhood 106 gaiadr3.hipparcos2_best_neighbour 107 gaiadr3.hipparcos2_neighbourhood 108 gaiadr3.panstarrs1_best_neighbour 109 gaiadr3.panstarrs1_join 110 gaiadr3.panstarrs1_neighbourhood 111 gaiadr3.ravedr5_best_neighbour 112 gaiadr3.ravedr5_join 113 gaiadr3.ravedr5_neighbourhood 114 gaiadr3.ravedr6_best_neighbour 115 gaiadr3.ravedr6_join 116 gaiadr3.ravedr6_neighbourhood 117 gaiadr3.sdssdr13_best_neighbour 118 gaiadr3.sdssdr13_join 119 gaiadr3.sdssdr13_neighbourhood 120 gaiadr3.skymapperdr2_best_neighbour 121 gaiadr3.skymapperdr2_join 122 gaiadr3.skymapperdr2_neighbourhood 123 gaiadr3.tmass_psc_xsc_best_neighbour 124 
gaiadr3.tmass_psc_xsc_join 125 gaiadr3.tmass_psc_xsc_neighbourhood 126 gaiadr3.tycho2tdsc_merge_best_neighbour 127 gaiadr3.tycho2tdsc_merge_neighbourhood 128 gaiadr3.urat1_best_neighbour 129 gaiadr3.urat1_neighbourhood 130 gaiadr3.galaxy_candidates 131 gaiadr3.galaxy_catalogue_name 132 gaiadr3.qso_candidates 133 gaiadr3.qso_catalogue_name 134 gaiadr3.nss_acceleration_astro 135 gaiadr3.nss_non_linear_spectro 136 gaiadr3.nss_two_body_orbit 137 gaiadr3.nss_vim_fl 138 gaiadr3.binary_masses 139 gaiadr3.chemical_cartography 140 gaiadr3.gold_sample_carbon_stars 141 gaiadr3.gold_sample_fgkm_stars 142 gaiadr3.gold_sample_oba_stars 143 gaiadr3.gold_sample_solar_analogues 144 gaiadr3.gold_sample_spss 145 gaiadr3.gold_sample_ucd 146 gaiadr3.sso_orbits 147 gaiadr3.synthetic_photometry_gspc 148 gaiadr3.vari_spurious_signals 149 gaiadr3.agn_cross_id 150 gaiadr3.frame_rotator_source 151 gaiadr3.gaia_crf3_xm 152 gaiadr3.alerts_mixedin_sourceids 153 gaiadr3.science_alerts 154 gaiadr3.gaia_source_simulation 155 gaiadr3.gaia_universe_model 156 gaiadr3.sso_observation 157 gaiadr3.sso_reflectance_spectrum 158 gaiadr3.sso_source 159 gaiadr3.xp_summary 160 gaiadr3.vari_agn 161 gaiadr3.vari_cepheid 162 gaiadr3.vari_classifier_class_definition 163 gaiadr3.vari_classifier_definition 164 gaiadr3.vari_classifier_result 165 gaiadr3.vari_compact_companion 166 gaiadr3.vari_eclipsing_binary 167 gaiadr3.vari_epoch_radial_velocity 168 gaiadr3.vari_long_period_variable 169 gaiadr3.vari_microlensing 170 gaiadr3.vari_ms_oscillator 171 gaiadr3.vari_planetary_transit 172 gaiadr3.vari_planetary_transit_13june2022 173 gaiadr3.vari_rad_vel_statistics 174 gaiadr3.vari_rotation_modulation 175 gaiadr3.vari_rrlyrae 176 gaiadr3.vari_short_timescale 177 gaiadr3.vari_summary 178 gaiadr3.tycho2tdsc_merge 179 gaiaedr3.gaia_source 180 gaiaedr3.agn_cross_id 181 gaiaedr3.commanded_scan_law 182 gaiaedr3.dr2_neighbourhood 183 gaiaedr3.frame_rotator_source 184 gaiaedr3.allwise_best_neighbour 185 
gaiaedr3.allwise_neighbourhood 186 gaiaedr3.apassdr9_best_neighbour 187 gaiaedr3.apassdr9_join 188 gaiaedr3.apassdr9_neighbourhood 189 gaiaedr3.gsc23_best_neighbour 190 gaiaedr3.gsc23_join 191 gaiaedr3.gsc23_neighbourhood 192 gaiaedr3.hipparcos2_best_neighbour 193 gaiaedr3.hipparcos2_neighbourhood 194 gaiaedr3.panstarrs1_best_neighbour 195 gaiaedr3.panstarrs1_join 196 gaiaedr3.panstarrs1_neighbourhood 197 gaiaedr3.ravedr5_best_neighbour 198 gaiaedr3.ravedr5_join 199 gaiaedr3.ravedr5_neighbourhood 200 gaiaedr3.sdssdr13_best_neighbour 201 gaiaedr3.sdssdr13_join 202 gaiaedr3.sdssdr13_neighbourhood 203 gaiaedr3.skymapperdr2_best_neighbour 204 gaiaedr3.skymapperdr2_join 205 gaiaedr3.skymapperdr2_neighbourhood 206 gaiaedr3.tmass_psc_xsc_best_neighbour 207 gaiaedr3.tmass_psc_xsc_join 208 gaiaedr3.tmass_psc_xsc_neighbourhood 209 gaiaedr3.tycho2tdsc_merge_best_neighbour 210 gaiaedr3.tycho2tdsc_merge_neighbourhood 211 gaiaedr3.urat1_best_neighbour 212 gaiaedr3.urat1_neighbourhood 213 gaiaedr3.gaia_source_simulation 214 gaiaedr3.gaia_universe_model 215 gaiaedr3.tycho2tdsc_merge 216 gaiafpr.crowded_field_source 217 gaiafpr.lens_candidates 218 gaiafpr.lens_catalogue_name 219 gaiafpr.lens_observation 220 gaiafpr.lens_outlier 221 gaiafpr.sso_observation 222 gaiafpr.sso_source 223 gaiafpr.interstellar_medium_params 224 gaiafpr.interstellar_medium_spectra 225 gaiafpr.vari_epoch_radial_velocity 226 gaiafpr.vari_long_period_variable 227 gaiafpr.vari_rad_vel_statistics 228 public.hipparcos 229 public.hipparcos_newreduction 230 public.hubble_sc 231 public.igsl_source 232 public.igsl_source_catalog_ids 233 public.tycho2 234 public.dual 235 tap_config.coord_sys 236 tap_config.properties 237 tap_schema.columns 238 tap_schema.key_columns 239 tap_schema.keys 240 tap_schema.schemas 241 tap_schema.tables
We're interested in the "gaiadr3.gaia_source" table, which is entry 88 in the list above. We can also see which columns are included in this table:
cols = tables[88].columns
for col in cols:
print (col.name)
solution_id designation source_id random_index ref_epoch ra ra_error dec dec_error parallax parallax_error parallax_over_error pm pmra pmra_error pmdec pmdec_error ra_dec_corr ra_parallax_corr ra_pmra_corr ra_pmdec_corr dec_parallax_corr dec_pmra_corr dec_pmdec_corr parallax_pmra_corr parallax_pmdec_corr pmra_pmdec_corr astrometric_n_obs_al astrometric_n_obs_ac astrometric_n_good_obs_al astrometric_n_bad_obs_al astrometric_gof_al astrometric_chi2_al astrometric_excess_noise astrometric_excess_noise_sig astrometric_params_solved astrometric_primary_flag nu_eff_used_in_astrometry pseudocolour pseudocolour_error ra_pseudocolour_corr dec_pseudocolour_corr parallax_pseudocolour_corr pmra_pseudocolour_corr pmdec_pseudocolour_corr astrometric_matched_transits visibility_periods_used astrometric_sigma5d_max matched_transits new_matched_transits matched_transits_removed ipd_gof_harmonic_amplitude ipd_gof_harmonic_phase ipd_frac_multi_peak ipd_frac_odd_win ruwe scan_direction_strength_k1 scan_direction_strength_k2 scan_direction_strength_k3 scan_direction_strength_k4 scan_direction_mean_k1 scan_direction_mean_k2 scan_direction_mean_k3 scan_direction_mean_k4 duplicated_source phot_g_n_obs phot_g_mean_flux phot_g_mean_flux_error phot_g_mean_flux_over_error phot_g_mean_mag phot_bp_n_obs phot_bp_mean_flux phot_bp_mean_flux_error phot_bp_mean_flux_over_error phot_bp_mean_mag phot_rp_n_obs phot_rp_mean_flux phot_rp_mean_flux_error phot_rp_mean_flux_over_error phot_rp_mean_mag phot_bp_rp_excess_factor phot_bp_n_contaminated_transits phot_bp_n_blended_transits phot_rp_n_contaminated_transits phot_rp_n_blended_transits phot_proc_mode bp_rp bp_g g_rp radial_velocity radial_velocity_error rv_method_used rv_nb_transits rv_nb_deblended_transits rv_visibility_periods_used rv_expected_sig_to_noise rv_renormalised_gof rv_chisq_pvalue rv_time_duration rv_amplitude_robust rv_template_teff rv_template_logg rv_template_fe_h rv_atm_param_origin vbroad vbroad_error vbroad_nb_transits grvs_mag 
grvs_mag_error grvs_mag_nb_transits rvs_spec_sig_to_noise phot_variable_flag l b ecl_lon ecl_lat in_qso_candidates in_galaxy_candidates non_single_star has_xp_continuous has_xp_sampled has_rvs has_epoch_photometry has_epoch_rv has_mcmc_gspphot has_mcmc_msc in_andromeda_survey classprob_dsc_combmod_quasar classprob_dsc_combmod_galaxy classprob_dsc_combmod_star teff_gspphot teff_gspphot_lower teff_gspphot_upper logg_gspphot logg_gspphot_lower logg_gspphot_upper mh_gspphot mh_gspphot_lower mh_gspphot_upper distance_gspphot distance_gspphot_lower distance_gspphot_upper azero_gspphot azero_gspphot_lower azero_gspphot_upper ag_gspphot ag_gspphot_lower ag_gspphot_upper ebpminrp_gspphot ebpminrp_gspphot_lower ebpminrp_gspphot_upper libname_gspphot
What Data Are We Interested In?¶
We're interested in extracting specific data fields from the database, which will help us gain insights into celestial objects. Here's what we want to retrieve:
Source Location:
- ra (Right Ascension)
- dec (Declination)

Color-Magnitude Diagram:
- bp_rp (Blue minus Red color), the difference between the Blue and Red band magnitudes.

Brightness of the Source:
- phot_g_mean_mag, the average apparent G-band magnitude.

Distance Estimation:
- parallax, which measures the apparent shift of the star on the sky due to the Earth's motion around the Sun.
- parallax_over_error, the ratio of the parallax to its uncertainty.
Image: Measuring Stellar Distances by Parallax
By combining these fields, we will construct a comprehensive query that will help us obtain the data we need. Additionally, it's a good practice to save the query results to a CSV file for future reference, so we won't need to rerun the query.
# import pandas as pd
# query_size = 1000000 # Number of stars we want to get
# distance = 200 # Distance (in pc) out to which we will query
# job = Gaia.launch_job_async("select top {}".format(query_size)+
# " ra, dec, parallax, parallax_over_error, " # Getting source location and parallax
# " bp_rp, phot_g_mean_mag " # Getting color and magnitude measurements
# " from gaiadr3.gaia_source" # Selecting the data source
# # All of these are data quality checks
# " where parallax_over_error > 10"
# " and visibility_periods_used > 8"
# " and phot_g_mean_flux_over_error > 50"
# " and phot_bp_mean_flux_over_error > 20"
# " and phot_rp_mean_flux_over_error > 20"
# " and phot_bp_rp_excess_factor <"
# " 1.3+0.06*power(phot_bp_mean_mag-phot_rp_mean_mag,2)"
# " and phot_bp_rp_excess_factor >"
# " 1.0+0.015*power(phot_bp_mean_mag-phot_rp_mean_mag,2)"
# " and astrometric_chi2_al/(astrometric_n_good_obs_al-5)<"
# "1.44*greatest(1,exp(-0.4*(phot_g_mean_mag-19.5)))"
# # Filtering on distance
# +" and 1000/parallax <= {}".format(distance))
# r = job.get_results()
# # Convert to pandas
# df = r.to_pandas()
# # Save to a csv
# df.to_csv("gaia3.csv")
What is Pandas?¶
Pandas is a popular open-source data manipulation and analysis library for Python, providing versatile data structures and data analysis tools. A pandas dataframe
is a two-dimensional, labeled data structure that resembles a table and is used for data manipulation, analysis, and exploration in Python.
import pandas as pd
df = pd.read_csv("gaia3.csv")
df.head()
Unnamed: 0 | ra | dec | parallax | parallax_over_error | bp_rp | phot_g_mean_mag | pm | pmra | pmra_error | pmdec | pmdec_error | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 251.492554 | -50.749533 | 6.383756 | 103.153015 | 0.795505 | 10.546730 | 32.093870 | 31.380714 | 0.059630 | -6.728077 | 0.045104 |
1 | 1 | 251.589977 | -50.638635 | 13.398257 | 750.623540 | 2.165551 | 13.470652 | 98.008840 | -49.358441 | 0.017024 | -84.672770 | 0.012676 |
2 | 2 | 251.093139 | -51.007072 | 6.106724 | 448.880800 | 1.269135 | 12.529914 | 9.986012 | 5.650984 | 0.013578 | 8.233275 | 0.009894 |
3 | 3 | 251.746461 | -51.201058 | 7.481942 | 87.461440 | 2.588948 | 16.969418 | 147.466350 | 10.335827 | 0.085701 | -147.103700 | 0.056509 |
4 | 4 | 250.188154 | -51.273050 | 8.606533 | 163.012250 | 2.755577 | 16.339110 | 5.618203 | -1.305274 | 0.064197 | 5.464473 | 0.043635 |
df.describe()
Unnamed: 0 | ra | dec | parallax | parallax_over_error | bp_rp | phot_g_mean_mag | pm | pmra | pmra_error | pmdec | pmdec_error | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 | 1.433622e+06 |
mean | 7.168105e+05 | 1.826020e+02 | -5.932620e-01 | 7.951749e+00 | 2.770547e+02 | 2.157298e+00 | 1.482734e+01 | 6.438331e+01 | -1.552588e+00 | 5.269976e-02 | -2.148792e+01 | 4.799410e-02 |
std | 4.138512e+05 | 1.039087e+02 | 3.972358e+01 | 4.958763e+00 | 2.882675e+02 | 8.357065e-01 | 2.650869e+00 | 6.995515e+01 | 6.803307e+01 | 4.709206e-02 | 6.281956e+01 | 4.192370e-02 |
min | 0.000000e+00 | 1.786275e-04 | -8.991512e+01 | 5.000001e+00 | 1.001268e+01 | -6.166201e-01 | 2.777715e+00 | 1.365079e-02 | -4.406469e+03 | 4.317274e-03 | -5.817800e+03 | 5.807723e-03 |
25% | 3.584052e+05 | 9.176268e+01 | -3.176953e+01 | 5.610729e+00 | 9.643445e+01 | 1.533804e+00 | 1.329365e+01 | 2.768513e+01 | -2.899033e+01 | 2.056515e-02 | -4.296366e+01 | 1.857064e-02 |
50% | 7.168105e+05 | 1.847296e+02 | -6.636379e-01 | 6.571446e+00 | 1.894774e+02 | 2.433110e+00 | 1.555917e+01 | 4.784237e+01 | -1.827989e+00 | 3.864415e-02 | -1.609957e+01 | 3.580896e-02 |
75% | 1.075216e+06 | 2.738694e+02 | 3.016465e+01 | 8.494259e+00 | 3.672869e+02 | 2.789234e+00 | 1.682265e+01 | 7.966884e+01 | 2.575548e+01 | 7.004569e-02 | 6.412341e+00 | 6.389684e-02 |
max | 1.433621e+06 | 3.599994e+02 | 8.994934e+01 | 7.680665e+02 | 1.540048e+04 | 5.435446e+00 | 2.058442e+01 | 1.039335e+04 | 6.765995e+03 | 2.072775e+00 | 1.036239e+04 | 2.114847e+00 |
How Do We Calculate the Distance to Stars?¶
Determining the distance to stars in astronomy involves trigonometry. We can calculate the distance ($d$) to a star using the formula:
$$d = \frac{1}{p},$$
Here, $d$ is the distance in a unit called "parsecs," and $p$ is the parallax angle in arcseconds. It's important to note that Gaia records the parallax angle in units of milliarcseconds (mas), so we need to perform a unit conversion.
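As a quick sanity check on the formula before we bring in Astropy, here is the calculation in plain Python (the parallax value is just an illustrative number):

```python
# d [pc] = 1 / p [arcsec]; Gaia reports p in milliarcseconds (mas),
# so convert mas -> arcsec before inverting.
parallax_mas = 10.0                      # hypothetical parallax of 10 mas
parallax_arcsec = parallax_mas / 1000.0  # 0.01 arcsec
d_pc = 1.0 / parallax_arcsec             # 100.0 pc
print(d_pc)
```

Equivalently, $d_{\mathrm{pc}} = 1000 / p_{\mathrm{mas}}$, which is the form used in the distance cut of the query above.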
How Do We Convert Units?¶
To handle this unit conversion, we'll utilize astropy.units
. To further simplify the process, we'll leverage Astropy not only for unit conversion but also for distance calculation. In addition, we'll use NumPy arrays to perform mathematical operations on large lists or arrays of numbers.
import numpy as np
from astropy import units as u
from astropy.coordinates import Distance
def get_distance(parallax):
"""
Calculate the distance to a star using its parallax angle.
Parameters:
-----------
parallax : float
Parallax angle in milliarcseconds (mas).
Returns:
--------
astropy.coordinates.Distance
The distance to the star in parsecs (pc).
Example:
--------
>>> parallax = 12.34 # Parallax angle in mas
>>> distance = get_distance(parallax)
>>> print(f"{distance:.2f}")
81.04 pc
"""
# Attach milliarcsecond (mas) units to the parallax value
p = parallax * u.mas
# Calculate the distance using astropy's Distance class
distance = Distance(parallax=p)
return distance
star_distances = get_distance( np.array(df["parallax"]) )
min_distance = star_distances.min()
max_distance = star_distances.max()
print (min_distance, max_distance)
1.3019705311704333 pc 199.9999448214332 pc
So, let's recap our progress:
- We utilized Astroquery to retrieve data from the Gaia mission, obtaining essential information about stars.
- With the help of Astropy, we handled the conversion of parallax angles from milliarcseconds to arcseconds and subsequently calculated the distances to these stars in parsecs.
The output distances are followed by pc, signifying the unit of measurement (parsecs). Astropy makes it easy to work with these units and perform conversions as needed.
print (f'Distance in light years {min_distance.to("lightyear"):0.2f}')
print (f'Distance in meters {min_distance.to("m"):0.2e}')
# If we want to use imperial units we must enable it
# (because we should always use metric!)
u.imperial.enable()
print (f'Distance in inches {min_distance.to("inch"):0.2e}')
Distance in light years 4.25 lyr Distance in meters 4.02e+16 m Distance in inches 1.58e+18 inch
Converting between units with Astropy is as simple as using .to("unit")
. This not only saves time but also minimizes the risk of human errors when performing unit conversions.
Converting from Apparent Magnitude to Absolute Magnitude¶
When assessing the brightness of an astronomical object, it's crucial to consider the impact of distance on our measurements.
The brightness of an object appears different depending on its proximity to us; it will appear brighter if it's closer and dimmer if it's farther away. In astronomy, we commonly use "Apparent Magnitude" to describe how bright an object appears from our vantage point and "Absolute Magnitude" to standardize brightness by accounting for the object's distance. Absolute Magnitude quantifies how bright an object would appear if it were located precisely 10 parsecs away.
Image: Absolute Magnitude Concept
By knowing the distance in parsecs, we can convert from apparent to absolute magnitude using the formula:
$$M = m - 5 \log_{10}(d_{pc}) + 5,$$
Where:
- $M$ is the Absolute Magnitude.
- $m$ is the Apparent Magnitude.
- $d_{pc}$ is the distance in parsecs.
This formula enables us to correct for the influence of distance when comparing the brightness of celestial objects.
def calculate_absolute_magnitude(apparent_magnitude, distance):
"""
Calculate the Absolute Magnitude of an astronomical object given its Apparent Magnitude and distance.
Parameters:
-----------
apparent_magnitude : float
The Apparent Magnitude of the object.
distance : astropy.units.Quantity
The distance to the object in parsecs (pc).
Returns:
--------
float
The Absolute Magnitude of the object.
Example:
--------
>>> apparent_mag = 5.0 # Apparent Magnitude
>>> distance = 10 * u.pc # Distance in parsecs
>>> absolute_mag = calculate_absolute_magnitude(apparent_mag, distance)
>>> print(absolute_mag)
5.0
"""
# Convert distance to parsecs
dist_pc = distance.to(u.pc)
# Calculate the Absolute Magnitude using the formula
M = apparent_magnitude - 5 * np.log10(dist_pc.value) + 5
return M
AbsM = calculate_absolute_magnitude(np.array(df["phot_g_mean_mag"]), star_distances)
Let's add the values we've determined back into our data frame:
df["AbsM"] = AbsM
df["Distance"] = star_distances.value
df.head()
Unnamed: 0 | ra | dec | parallax | parallax_over_error | bp_rp | phot_g_mean_mag | pm | pmra | pmra_error | pmdec | pmdec_error | AbsM | Distance | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 251.492554 | -50.749533 | 6.383756 | 103.153015 | 0.795505 | 10.546730 | 32.093870 | 31.380714 | 0.059630 | -6.728077 | 0.045104 | 4.572111 | 156.647594 |
1 | 1 | 251.589977 | -50.638635 | 13.398257 | 750.623540 | 2.165551 | 13.470652 | 98.008840 | -49.358441 | 0.017024 | -84.672770 | 0.012676 | 9.105894 | 74.636572 |
2 | 2 | 251.093139 | -51.007072 | 6.106724 | 448.880800 | 1.269135 | 12.529914 | 9.986012 | 5.650984 | 0.013578 | 8.233275 | 0.009894 | 6.458955 | 163.753924 |
3 | 3 | 251.746461 | -51.201058 | 7.481942 | 87.461440 | 2.588948 | 16.969418 | 147.466350 | 10.335827 | 0.085701 | -147.103700 | 0.056509 | 11.339490 | 133.655144 |
4 | 4 | 250.188154 | -51.273050 | 8.606533 | 163.012250 | 2.755577 | 16.339110 | 5.618203 | -1.305274 | 0.064197 | 5.464473 | 0.043635 | 11.013251 | 116.190808 |
Exploratory Data Analysis with Pandas¶
Explore your dataset through the lens of Exploratory Data Analysis (EDA). EDA is all about unveiling hidden patterns and stories within your data. It's your detective work to discover insights, detect anomalies, and gain a deeper understanding of your dataset.
We'll utilize pandas for data manipulation and Matplotlib/Seaborn for visualization. In this section, we'll delve into techniques for exploring your dataset, from basic properties to captivating visualizations. Welcome to the world of EDA!
Some methods of EDA:
Descriptive Statistics:
- Summary statistics
- Count of missing values
Data Distribution:
- Histograms
- Box plots
- Probability density functions (PDFs)
- Skewness and kurtosis
Data Relationships:
- Correlation matrix
- Scatter plots
- Heatmaps
- Pair plots
Categorical Data:
- Frequency counts
- Bar plots
- Cross-tabulations
Data Quality:
- Outliers
- Missing data
- Data consistency
Dimensionality Reduction:
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
Data Visualization:
- Various plots and charts
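Several of the descriptive-statistics checks above take only a line of pandas each. A minimal sketch on a toy dataframe (the values below are made up, standing in for the Gaia columns):

```python
import pandas as pd
import numpy as np

# Toy stand-in for the Gaia dataframe, with one deliberately missing value
toy = pd.DataFrame({
    "parallax": [6.4, 13.4, np.nan, 7.5],
    "phot_g_mean_mag": [10.5, 13.5, 12.5, 17.0],
})

print(toy.describe())                 # summary statistics per column
print(toy.isna().sum())               # count of missing values per column
print(toy["phot_g_mean_mag"].skew())  # skewness of the distribution
```

We already used `df.describe()` on the real dataset above; `isna().sum()` is a quick way to confirm a query returned complete rows before diving into plots.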
import matplotlib.pyplot as plt
# Get the columns to plot (skipping the first column, "Unnamed: 0", the index saved with the CSV)
cols = df.columns[1:]
# Calculate the number of rows and columns for subplots
nrows = 2
ncols = (len(cols) + 1) // 2
# Create a figure with subplots
fig, axs = plt.subplots(nrows, ncols, figsize=(24, 6))
# Flatten the axs array to loop through the subplots
for i, ax in enumerate(axs.ravel()):
# Check if there are more columns to plot
if i < len(cols):
# Plot a histogram for the current column
df[cols[i]].plot(kind='hist', ax=ax)
ax.set_title(cols[i])
# Adjust subplot layout for better spacing
plt.tight_layout()
plt.show()
Seaborn also has a lot of useful plotting capabilities:
import seaborn as sns
# Create a figure with subplots using Seaborn
fig, axs = plt.subplots(nrows, ncols, figsize=(24, 6))
# Flatten the axs array to loop through the subplots
for i, ax in enumerate(axs.ravel()):
# Check if there are more columns to plot
if i < len(cols):
# Use Seaborn to create a histogram for the current column
sns.histplot(data=df[:100000], x=cols[i], ax=ax, kde=True) # Set kde=True for a kernel density estimate
ax.set_title(cols[i])
# Adjust subplot layout for better spacing
plt.tight_layout()
plt.show()
Understanding Correlations: Unraveling Relationships in Data¶
In the realm of data analysis, uncovering the relationships and dependencies between different variables is a fundamental task. Correlations are the keys that unlock the doors to understanding the hidden patterns, associations, and causations within datasets. These intriguing statistical relationships offer a wealth of insights, guiding our understanding of how changes in one variable can impact another.
Correlations enable us to answer questions such as:
- Does an increase in one variable lead to a corresponding increase or decrease in another?
- Are two variables completely independent, or is there a subtle connection waiting to be discovered?
- Which factors are most influential in affecting a specific outcome?
The correlation between two parameters (A and B) can be interpreted as follows:
- 1: when A increases, so does B. These are said to be "correlated".
- 0: when A increases, we see no change in B. These are said to be "uncorrelated".
- -1: when A increases, B decreases. These are said to be "anti-correlated".
In the cases of 1 and -1 we have 100% correlation and anti-correlation; the correlation can take any value between these two extremes.
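These three cases are easy to verify with NumPy's `np.corrcoef` on synthetic arrays:

```python
import numpy as np

a = np.arange(100, dtype=float)
b = 2 * a + 1                    # moves with a      -> correlation +1
c = -3 * a + 5                   # moves against a   -> correlation -1
rng = np.random.default_rng(0)
d = rng.normal(size=a.size)      # unrelated noise   -> correlation near 0

print(np.corrcoef(a, b)[0, 1])   # +1 (up to float rounding)
print(np.corrcoef(a, c)[0, 1])   # -1 (up to float rounding)
print(np.corrcoef(a, d)[0, 1])   # small value close to 0
```

Note that any *linear* rescaling of a variable leaves its correlation at exactly ±1; real data falls somewhere in between, which is what the heatmap below summarizes for every pair of columns at once.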
# Calculate the correlation matrix
cormat = df[cols].corr()
# Set the colormap and color range
cmap = sns.color_palette("coolwarm", as_cmap=True) # You can change the colormap as needed
vmin, vmax = -1, 1 # Set the range for the color scale
# Create the correlation heatmap
plt.figure(figsize=(10, 8)) # Set the figure size
sns.heatmap(cormat, cmap=cmap, vmin=vmin, vmax=vmax, annot=True, fmt=".2f") # You can add annotations with fmt=".2f"
# Set plot title
plt.title("Correlation Heatmap")
# Show the plot
plt.show()
With just a few lines of code, we can quickly discern significant insights from a correlation plot. We observe a clear correlation between the distance (Distance
) and the brightness, as indicated by the phot_g_mean_mag
value. However, this correlation disappears when we convert the brightness to Absolute Magnitude (AbsM
).
Making Publication Quality Plots¶
Now, let's proceed with building the color-magnitude diagram by plotting bp_rp
against AbsM
.
Creating a high-quality scientific plot is essential for effectively conveying your data. A well-crafted plot should be designed to clearly communicate the key information to the viewer. It should include the following elements to enhance data understanding:
Labelled Axes: Clearly annotated x and y axes provide context and meaning to the data.
Descriptive Title: A meaningful title summarizing the plot's content is crucial for quick comprehension.
Grid Lines: Grid lines assist in reading data values accurately and help viewers align data points.
Sensible Scaling: Appropriate axis scaling ensures that data is presented in a visually understandable manner without distortion.
Legend (if applicable): If multiple data series are displayed, a legend should explain their meaning.
Colorbar (if applicable): When using color to represent data values, a colorbar helps viewers interpret the color-coding.
By incorporating these elements, your scientific plot becomes a powerful tool for sharing insights and findings with your audience.
ax = plt.subplot()
# Lets plot each point
# alpha = 0.05, make the points transparent (5% opacity)
# s = 1, make the points small
# color = "k" make the points black
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k')
# Invert the y axis, smaller magnitude -> Brighter star
ax.invert_yaxis()
While the scatter plot does reveal some underlying structure, it may not provide a highly informative representation of the data. The plot becomes saturated in high-density regions, making it challenging to discern details. To address this, we can use a histogram to visualize the density distribution more effectively.
ax = plt.subplot()
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k', zorder=0)
# Create a 2D histogram
# bins = 100, require 100 bins in the x and y directions
# cmin = 100, require at least 100 entries in a bin to be shown
# cmap = "jet", use the "jet" color map
h = ax.hist2d(df["bp_rp"], df["AbsM"], bins=100, cmin=100, cmap="jet")
ax.invert_yaxis()
This looks better, but we've made a critical error...
Understanding Color Maps and Accessibility¶
Color maps play a crucial role in data visualization, helping us represent data through a spectrum of colors. However, not all color maps are created equal, and some, like the "jet" color map, have raised concerns due to their perceptual issues.
The "jet" color map, characterized by its vibrant and rainbow-like appearance, is notorious for its drawbacks, including:
Perceptual Uniformity: It fails to maintain a perceptual uniformity, which means that equal steps in data values may not appear equally spaced in the color map, leading to misleading visualizations.
Color Saturation: The "jet" color map can saturate at high and low ends, making it challenging to distinguish different data values in these regions.
Colorblind Accessibility: Accessibility is a vital concern. Many individuals with color vision deficiencies, commonly referred to as colorblindness, may find it difficult to interpret data represented using color maps like "jet."
To create effective and accessible visualizations, it's essential to consider the choice of color map and its impact on how data is perceived, especially in cases where colorblind individuals may interact with the visualizations.
What information do you get from the above plot?
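One way to see the problem quantitatively is to approximate each colormap's perceived lightness and check whether it increases monotonically. The relative-luminance weighting below is only a rough proxy for proper CIE lightness, but it is enough to expose "jet":

```python
import numpy as np
from matplotlib import colormaps

def luminance(name, n=256):
    """Approximate perceived lightness of a colormap via relative luminance."""
    rgb = colormaps[name](np.linspace(0, 1, n))[:, :3]
    return rgb @ np.array([0.2126, 0.7152, 0.0722])

jet = luminance("jet")
viridis = luminance("viridis")

# "jet" brightens toward yellow and then darkens again toward red,
# so equal data steps do not map to equal lightness steps
print(np.all(np.diff(jet) > 0))   # False
# "viridis" runs steadily from dark to light
print(viridis[-1] > viridis[0])   # True
```

A colormap whose lightness rises steadily also survives conversion to grayscale, which is one reason perceptually uniform maps are friendlier to colorblind readers and to printed figures.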
CMasher is a valuable Python package offering accessible scientific colormaps. All the color maps included in CMasher are designed to be colorblind-friendly and to exhibit a linear increase in brightness.
import cmasher as cms
ax = plt.subplot()
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k', zorder=0)
# Create a 2D histogram
# bins = 100, require 100 bins in the x and y directions
# cmin = 50, require at least 50 entries in a bin to be shown
# cmap = cms.toxic, use the "toxic" color map from CMasher
h = ax.hist2d(df["bp_rp"], df["AbsM"], bins=100, cmin=50, cmap=cms.toxic)
ax.invert_yaxis()
This looks better, but we still need to clean things up: a reader has no idea what is being plotted.
# use colors to get a log color scale
from matplotlib import colors
# Create a subplot
ax = plt.subplot()
# Scatter plot of data points with customizations
ax.scatter(df["bp_rp"], df["AbsM"], alpha=0.05, s=1, color='k', zorder=0)
# Create a 2D histogram (density plot) with a log-normal color scale
# - bins: Use 100 bins in both the x and y directions
# - cmin: Require at least 50 entries in a bin to be shown
# - cmap: Use the "toxic" color map from the cms module
# - norm: Utilize a logarithmic color scale to better visualize data distribution
h = ax.hist2d(df["bp_rp"], df["AbsM"], bins=100, cmin=50, norm=colors.LogNorm(), zorder=0.5, cmap=cms.toxic)
# Invert the y-axis to match the typical appearance of a color-magnitude diagram
ax.invert_yaxis()
# Add a color bar for the density plot
cb = plt.colorbar(h[3], ax=ax, pad=0.02)
# Set labels for the x and y axes
ax.set_xlabel(r'$G_{BP} - G_{RP}$')
ax.set_ylabel(r'$M_G$')
# Set a label for the color bar
cb.set_label(r"$\mathrm{Stellar~density}$")
# Add grid lines to the plot
ax.grid()
So, what insights can we gather from this plot?
- The plot's left side corresponds to 'blue,' while the right side corresponds to 'red.'
- The lower section of the plot represents 'dim' stars, whereas the upper section showcases 'bright' stars.
- The majority of stars form a discernible diagonal pattern, with dimmer stars appearing 'redder' and brighter stars appearing 'bluer.'
- Notably, we observe a distinct grouping of dim, blue stars in the bottom-left quadrant of the plot.
- There is a noticeable 'turn-off' branch where we encounter both bright blue and bright red stars.
- As we progress along this branch, the bright red stars appear to level off in brightness and become more 'red.'
- Stars residing in the bright red and dim blue regions are relatively scarce.
In the realm of astronomy, we've uncovered one of the most profound tools: the Hertzsprung-Russell (HR) Diagram.
The HR Diagram serves as a stellar narrative, shedding light on the intricate lives of stars. It all commences with a star's birth, and its initial position on the diagram is determined by its mass. Stars embark on their cosmic journey along the 'Main Sequence,' where they sustain themselves by converting hydrogen into helium—a process known as hydrogen burning. This phase constitutes the majority of a star's existence.
However, as time advances, stars deplete their hydrogen fuel. When this occurs, they transition to the 'Giant' branch. Giants display a cooler surface temperature, giving them a 'reddish' appearance. Concurrently, they experience a considerable increase in luminosity, shining much brighter. This phenomenon is accompanied by a significant expansion in their physical size, leading to their classification as 'red giants.'
Now, here's the fascinating twist: Stars like our Sun, after the exhaustion of helium, evolve into 'White Dwarfs.' This transformation arises from the fact that they lack the necessary mass to facilitate the fusion of elements heavier than helium. These stars have meticulously fused all their available helium into carbon. Unlike most stars that maintain a delicate equilibrium between gravitational forces and nuclear fusion, White Dwarfs are held in place by a phenomenon known as 'electron degeneracy.'
In essence, the HR Diagram serves as a window into the diverse chapters of stellar existence, offering us valuable insights into the captivating stories of these cosmic marvels.
Mapping Stellar Density in Galactic Coordinates¶
Our celestial journey continues as we leverage the rich data from the Gaia mission that we previously dissected. In this leg of our astronomical exploration, we're about to embark on an exciting mission: mapping the stellar density within our Milky Way galaxy.
Armed with Gaia's invaluable data, we'll now venture into the realm of galactic coordinates. These coordinates provide us with a unique perspective, allowing us to chart the positions and densities of stars across the vast expanse of our galactic home.
By harnessing the power of data analysis and visualization, we'll unveil a stellar density map that offers an intricate view of the distribution and concentration of stars within the Milky Way. Our map will not only guide us through the celestial neighborhoods but also shed light on the interplay between stars, clusters, and the galactic structures that shape our night sky.
So, prepare to join us in this thrilling endeavor as we decode the secrets of the stars and craft a stellar density map in the wondrous realm of galactic coordinates.
from astropy.coordinates import SkyCoord
coord = SkyCoord(df["ra"], df["dec"], unit="deg")
gal_coords = coord.galactic
plt.subplot(111, projection='aitoff')
plt.grid(True)
plt.scatter(gal_coords.l.wrap_at('180d').radian, gal_coords.b.radian, alpha=0.002)
def get_bin_centers(binning):
"""
Calculate the bin centers given a set of bin edges.
Parameters:
binning (numpy.ndarray): An array containing the bin edges.
Returns:
numpy.ndarray: An array of bin centers.
Example:
>>> bin_edges = np.array([0, 1, 2, 3, 4, 5])
>>> centers = get_bin_centers(bin_edges)
>>> print(centers)
[0.5 1.5 2.5 3.5 4.5]
"""
# Calculate the width of each bin by subtracting the previous bin's edge from the current bin's edge
width = binning[1:] - binning[:-1]
# Calculate the bin centers by adding half of the bin width to the left bin edge
bin_centers = binning[:-1] + 0.5 * width
return bin_centers
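As a quick usage example, the helper pairs naturally with NumPy's histogram functions, which return bin edges rather than centers (the sample values here are made up for illustration):

```python
import numpy as np

def get_bin_centers(binning):
    """Midpoints of a set of bin edges (same helper as above)."""
    width = binning[1:] - binning[:-1]
    return binning[:-1] + 0.5 * width

# Histogram some values, then recover the x-position of each bar
values = np.array([0.2, 0.4, 1.1, 1.9, 2.5, 2.6])
counts, edges = np.histogram(values, bins=3, range=(0, 3))
centers = get_bin_centers(edges)
print(counts)   # [2 2 2]
print(centers)  # [0.5 1.5 2.5]
```

Note that `np.histogram` returns one more edge than there are bins, so `centers` has exactly one entry per count.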
# Define the number of bins and the bin edges for galactic longitude (l) and latitude (b)
nbins = 100
l_bins = np.linspace(-180, 180, nbins)
b_bins = np.linspace(-90, 90, nbins)
# Calculate the bin centers for galactic longitude (l) and latitude (b)
l_center = get_bin_centers(l_bins)
b_center = get_bin_centers(b_bins)
# Calculate the 2D histogram (counts) of galactic coordinates l and b
# Use np.histogram2d to create a 2D histogram from galactic coordinates
# Wrap galactic longitude (l) at 180 degrees to ensure it covers the full range
counts, _, _ = np.histogram2d(
gal_coords.l.wrap_at('180d').radian, # Convert and wrap l in radians
gal_coords.b.radian, # Convert b in radians
bins=[np.deg2rad(l_bins), np.deg2rad(b_bins)] # Convert bin edges to radians
)
# Create a subplot with an Aitoff projection for a celestial map
ax = plt.subplot(111, projection='aitoff')
# Create a pseudo-color mesh plot (pcolormesh) for the stellar density map
# Use np.deg2rad to convert bin centers from degrees to radians for the x and y axes
# Use 'counts.T' to transpose the counts array for proper orientation
# Apply a PowerNorm to enhance visualization (power=0.7)
# Set zorder to control the layering of the plot
# Use the 'savanna' color map from cms module
h = ax.pcolormesh(np.deg2rad(l_center), np.deg2rad(b_center), counts.T, norm=colors.PowerNorm(0.7), zorder=0.5, cmap=cms.savanna)
# Add a color bar for the density map
cb = plt.colorbar(h, ax=ax, pad=0.02)
# Set labels for the x and y axes
ax.set_xlabel(r'Galactic Longitude')
ax.set_ylabel(r'Galactic Latitude')
# Set a label for the color bar
cb.set_label(r"$\mathrm{Stellar~density}$")
# Add grid lines to the celestial map
ax.grid()
We observe that most stars fall along a latitude of 0. This is the galactic plane. We also notice some spots of high stellar density. These are likely stellar clusters: gravitationally bound groups of stars.
Open Clusters¶
Open cluster: NGC 2164
Open clusters are remarkable gatherings of young stars that form from the same cloud of interstellar gas and dust. These stellar congregations are a captivating feature of our galaxy, the Milky Way, and can be found in various regions of its spiral arms. Open clusters typically contain hundreds to thousands of stars.
Open clusters provide insights into stellar evolution and the early stages of star formation. The young stars within open clusters often share similar ages, compositions, and distances, making them ideal laboratories for studying stellar properties and the effects of their environment on their development.
Over time, open clusters gradually disperse due to gravitational interactions and tidal forces from the Milky Way. As a result, the stars within these clusters eventually go their separate ways, becoming part of the general stellar population of our galaxy.
Studying open clusters offers valuable clues about the dynamics of our galaxy, the formation of stars, and the life cycles of stellar systems. Astronomers continue to explore and catalog these captivating celestial gatherings to deepen our understanding of the cosmos.
df_open_clusters = pd.read_csv("./OpenClusters.csv")
df_open_clusters_near = df_open_clusters[df_open_clusters["Distance (parsecs)"] < 200].copy()
df_open_clusters_near.head()
Unnamed: 0 | Cluster identifier | Unnamed: 1_level_0 | Unnamed: 2_level_0 | Constellation | Distance (parsecs) | Age (Myr) | Diameter | Apparent magnitude | Notes | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Hyades | 04h 26.9m | +15d 52m | Taurus | 47.0 | 625 | 330.0 | 0.5 | [3] |
1 | 1 | Coma Star Cluster | 12h 22.5m | +25d 51m | Coma Berenices | 86.0 | 400-500 | 120.0 | 1.8 | [4] |
20 | 20 | Messier 44, Beehive Cluster | 08h 40.4m | +19d 41m | Cancer | 187.0 | 830 | 70.0 | 3.1 | [10][11] |
21 | 21 | Messier 45, Pleiades | 03h 47.4m | +24d 07m | Taurus | 136.0 | 125 | 120.0 | 1.2 | [12] |
30 | 30 | IC 2602, Southern Pleiades | 10h 43.2m | -64d 24m | Carina | 167.0 | 30 | 100.0 | 1.9 | [14] |
cluster_ra = df_open_clusters_near["Unnamed: 1_level_0"]
cluster_dec = df_open_clusters_near["Unnamed: 2_level_0"]
cluster_coords = SkyCoord(cluster_ra,cluster_dec)
df_open_clusters_near["l"] = cluster_coords.galactic.l.rad
df_open_clusters_near["b"] = cluster_coords.galactic.b.rad
df_open_clusters_near.head()
Unnamed: 0 | Cluster identifier | Unnamed: 1_level_0 | Unnamed: 2_level_0 | Constellation | Distance (parsecs) | Age (Myr) | Diameter | Apparent magnitude | Notes | l | b | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Hyades | 04h 26.9m | +15d 52m | Taurus | 47.0 | 625 | 330.0 | 0.5 | [3] | 3.142706 | -0.389958 |
1 | 1 | Coma Star Cluster | 12h 22.5m | +25d 51m | Coma Berenices | 86.0 | 400-500 | 120.0 | 1.8 | [4] | 3.882823 | 1.455624 |
20 | 20 | Messier 44, Beehive Cluster | 08h 40.4m | +19d 41m | Cancer | 187.0 | 830 | 70.0 | 3.1 | [10][11] | 3.593651 | 0.567058 |
21 | 21 | Messier 45, Pleiades | 03h 47.4m | +24d 07m | Taurus | 136.0 | 125 | 120.0 | 1.2 | [12] | 2.908491 | -0.409449 |
30 | 30 | IC 2602, Southern Pleiades | 10h 43.2m | -64d 24m | Carina | 167.0 | 30 | 100.0 | 1.9 | [14] | 5.054886 | -0.085419 |
sns.set_palette("colorblind")
# Create a subplot with an Aitoff projection for a celestial map
ax = plt.subplot(111, projection='aitoff')
# Create a pseudo-color mesh plot (pcolormesh) for the stellar density map
# Use np.deg2rad to convert bin centers from degrees to radians for the x and y axes
# Use 'counts.T' to transpose the counts array for proper orientation
# Apply a PowerNorm to enhance visualization (power=0.7)
# Set zorder to control the layering of the plot
# Use the 'sepia' color map from the cms module
h = ax.pcolormesh(np.deg2rad(l_center), np.deg2rad(b_center), counts.T, norm=colors.PowerNorm(0.7), zorder=0.5, cmap=cms.sepia)
# Add a color bar for the density map
cb = plt.colorbar(h, ax=ax, pad=0.02)
# Set labels for the x and y axes
ax.set_xlabel(r'Galactic Longitude')
ax.set_ylabel(r'Galactic Latitude')
# Set a label for the color bar
cb.set_label(r"$\mathrm{Stellar~density}$")
# Set the number of brightest objects to overlay
N = 3 # Adjust this to select the top N brightest objects
# Sort the DataFrame by apparent magnitude (brightness)
sorted_data = df_open_clusters_near.sort_values(by='Apparent magnitude')
# Select the top N brightest objects
brightest_objects = sorted_data.head(N)
# Create a scatter plot of galactic coordinates (l, b)
circle_size = 140
for i in range(len(brightest_objects)):
ax.scatter(
brightest_objects['l'].iloc[i],
brightest_objects['b'].iloc[i],
facecolor='none', edgecolor=f'C{i}',
s=circle_size,
label=brightest_objects['Cluster identifier'].iloc[i])
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2) # Adjust the bbox_to_anchor to control the legend position
# Add grid lines to the celestial map
ax.grid()
Clustering¶
It's noteworthy that three of the brightest nearby open clusters align with three of the most prominent hotspots on our stellar density map. This alignment isn't surprising, as a density map naturally highlights concentrations of stars, including known star clusters.
Now, let's embark on the journey of exploring additional potential star clusters within our dataset. We'll employ a clustering algorithm known as DBSCAN. DBSCAN is a versatile algorithm capable of identifying clusters within data without prior knowledge of the number of clusters. Furthermore, it's designed to handle noise, such as random stars, ensuring that our focus remains on identifying genuine clusters without being perturbed by nearby stars.
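Before applying DBSCAN to the Gaia data, here is a minimal sketch of its behavior on synthetic 2D points (the data and parameters are invented for illustration): points in a tight clump become one cluster, while scattered points are labeled -1, DBSCAN's marker for noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)

# A tight clump of 50 points plus 10 scattered "field" points far away
clump = rng.normal(loc=[0.0, 0.0], scale=0.05, size=(50, 2))
field = rng.uniform(low=2.0, high=8.0, size=(10, 2))
points = np.vstack([clump, field])

# eps: neighborhood radius; min_samples: points needed to form a dense region
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(points)

print(set(labels[:50]))  # the clump forms a single cluster: {0}
print(set(labels[50:]))  # the scattered points are flagged as noise: {-1}
```

This is exactly the behavior we want for the Gaia sample: compact groups of stars are picked out as clusters, while the diffuse field population is ignored.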
from sklearn.cluster import DBSCAN
x = gal_coords.l.wrap_at('180d').radian
y = gal_coords.b.radian
z = np.log10(df["Distance"])
data = np.array([x,y,z]).T
normalized_data = (data - data.mean(axis=0)) / data.std(axis=0)
# Define DBSCAN parameters
eps_values = np.linspace(0.1, 1, 5)  # Epsilon neighborhood sizes to scan
min_samples = 1000                   # Minimum number of samples in a cluster
for eps in eps_values:
    # Apply DBSCAN to the normalized (l, b, log-distance) data
    dbscan = DBSCAN(eps=eps, min_samples=min_samples)
    labels = dbscan.fit_predict(normalized_data)
    # Compute the mean position of each cluster (the label -1 marks noise)
    mean_l = []
    mean_b = []
    mean_dist = []
    for lab in set(labels):
        if lab == -1:
            continue
        lab_mask = labels == lab
        mean_l.append(np.mean(x[lab_mask]))
        mean_b.append(np.mean(y[lab_mask]))
        mean_dist.append(np.mean(z[lab_mask]))
    fig = plt.figure(figsize=(11, 6))
    # Create a subplot with an Aitoff projection for a celestial map
    ax = plt.subplot(111, projection='aitoff')
    # Stellar density map: PowerNorm color scale, 'sepia' color map from cms
    h = ax.pcolormesh(np.deg2rad(l_center), np.deg2rad(b_center), counts.T, norm=colors.PowerNorm(0.7), zorder=0.5, cmap=cms.sepia)
    # Add a color bar for the density map
    cb = fig.colorbar(h, ax=ax, pad=0.02)
    # Set labels for the x and y axes
    ax.set_xlabel(r'Galactic Longitude')
    ax.set_ylabel(r'Galactic Latitude')
    # Set a label for the color bar
    cb.set_label(r"$\mathrm{Stellar~density}$")
    # Circle the mean position of each cluster found by DBSCAN
    circle_size = 140
    for i in range(len(mean_l)):
        ax.scatter(
            mean_l[i],
            mean_b[i],
            facecolor='none', edgecolor=f'C{i}',
            s=circle_size,
            label=f"Cluster {i+1}")
    # Adjust bbox_to_anchor to control the legend position
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=2)
    # Add grid lines to the celestial map
    ax.grid()
    # Save one map per epsilon value for comparison
    fig.savefig(f"eps_{eps}.png")