What is Python?

Python is one of the most widely used programming languages today. It is a high-level, interpreted programming language that emphasizes readability, which makes it a common choice both for people learning to program and for large collaborative projects.

Unlike compiled languages such as C++ and Rust, Python is an interpreted language. Rather than being compiled ahead of time to a native binary, Python code is translated to bytecode and executed by the Python interpreter at runtime. This makes Python an excellent language for quickly developing and debugging code.

Interactive environments like Jupyter Notebooks provide a convenient way to develop, test, share, and present code.

Installing Python and working with Environments

We will use virtual environments in this tutorial. I recommend that you use an environment manager such as conda or mamba/micromamba.

Mamba/Micromamba is a "fast, robust, and cross-platform package manager" that offers significant performance advantages over conda when installing and resolving packages.

We can create a new environment using the following command (here and below, mamba/conda means that either mamba or conda can be used):

mamba/conda create -n my_environment python=3.9 numpy matplotlib

Here, we are creating a new environment called my_environment, which installs Python 3.9 and the packages numpy and matplotlib. We can activate this environment using:

mamba/conda activate my_environment

Running which python confirms that we are utilizing the Python installed within our environment.
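The same check can be done from inside Python (for example in a Jupyter notebook) with the standard-library sys module. This is a minimal sketch; the exact paths printed depend on your installation and on which environment is active:

```python
import sys

# Full path of the interpreter running this code; inside an activated
# environment it should point into that environment's folder
print(sys.executable)

# Root folder (prefix) of the active environment
print(sys.prefix)

# Version of the running interpreter, e.g. to confirm which python= version was installed
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
```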

Environments let us keep conflicting package versions for different projects isolated from one another. For instance, suppose we need to run older code that depends on Python 2.7, which is incompatible with modern packages. In that scenario, we can create an environment with Python 2.7 and install package versions compatible with it. This won't affect any other environment we've created.

For this workshop, we will utilize the following environment:

mamba/conda create -n workshop -c conda-forge python=3.10 numpy matplotlib scipy pandas jupyter jupyterlab ipykernel

Installing additional packages

Once in the environment, we can install additional packages using the install command:

mamba/conda install scipy

This installs SciPy, a scientific computing library built on NumPy that provides routines for statistics, optimization, integration, interpolation and more. We can also remove a package using:

mamba/conda remove scipy

which removes scipy from the environment. Some packages must be installed from a specific channel (a collection of packages), which we select with the -c flag:

mamba/conda install -c conda-forge astroquery

This installs astroquery, a package for querying astronomical databases such as SIMBAD, from the conda-forge channel.

If we ever want to see which packages are currently installed, we can use something like:

mamba/conda list

which gives a list of installed packages and their versions. We can output this to a machine-readable file using:

mamba/conda list -e > requirements.txt

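Sharing this file, so that all users work in the same standard environment, can help debug version-specific bugs. The environment can be recreated from the file with mamba/conda create -n my_environment --file requirements.txt.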

Hello World

Python is a dynamically typed language, which means that variables are not declared with a type: the type belongs to the value a variable currently refers to and is checked when the code runs. This also means that we can change the type of a variable at any stage of the code.

We can define variables like:

my_string = "Hello"

We can also overwrite variables like:

# Defining a bool
my_variable = True
# Redefining as a float
my_variable = 4.23
# Redefining as a string
my_variable = "goodbye"

In Python we can print output using the print() function:

# Defining as a string
my_variable = "Hello, world"
print(my_variable)
# Redefining as a bool
my_variable = True
print(my_variable)
# Redefining as a float
my_variable = 4.23
print(my_variable)
print("Goodbye, World!")

Hello, world
True
4.23
Goodbye, World!

When printing, we can format strings using f-strings (string literals prefixed with f, with expressions inside {}):

pi = 3.14159265359
print(f"Pi to 5 decimal places = {pi:0.5f}")
print(f"Pi to 3 decimal places = {pi:0.3f}")
print(f"Pi to 4 decimal places in scientific notation = {pi:0.4e}")
print(f"Pi rounded to an integer = {pi:0.0f}, minus pi to 1 decimal place {-pi:0.1f}")

Pi to 5 decimal places = 3.14159
Pi to 3 decimal places = 3.142
Pi to 4 decimal places in scientific notation = 3.1416e+00
Pi rounded to an integer = 3, minus pi to 1 decimal place -3.1
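The part after the colon inside {} is a format specification, and it can do more than set the number of decimal places. A short sketch of a few other common specifiers; the variable names and values are arbitrary examples:

```python
n = 1234567
fraction = 0.25
value = 3.14159
label = "left"

print(f"{n:,}")           # 1,234,567  (comma as thousands separator)
print(f"{fraction:.1%}")  # 25.0%      (multiplied by 100, with a % sign)
print(f"{value:08.3f}")   # 0003.142   (zero-padded to a total width of 8)
print(f"[{label:<8}]")    # [left    ] (left-aligned in a width of 8)
print(f"[{42:>5}]")       # [   42]    (right-aligned in a width of 5)
```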

We can also define strings to format later:

my_string = "{name}'s favorite number is {number}"

formatted = my_string.format(name="Ste", number=42)
print(formatted)

Ste's favorite number is 42

We can also join strings together with + and take slices of strings:

# Adding to a string
extended = formatted + " and he likes python"
print(extended)
# Take everything except the last 6 characters and append "pi"
print(extended[:-6] + "pi")

Ste's favorite number is 42 and he likes python
Ste's favorite number is 42 and he likes pi
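Individual characters and slices of a string are accessed with the same indexing rules. A minimal sketch using an arbitrary word; it also shows that strings are immutable, so "changing" one means building a new string:

```python
word = "python"

print(word[0])     # p       (first character)
print(word[-1])    # n       (last character)
print(word[1:4])   # yth     (indices 1, 2 and 3)
print(word[::-1])  # nohtyp  (an optional step of -1 reverses the string)

# Strings cannot be changed in place: item assignment raises a TypeError...
try:
    word[0] = "P"
except TypeError as err:
    print("TypeError:", err)

# ...so we build a new string instead
capitalised = "P" + word[1:]
print(capitalised)  # Python
```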

Basic Operations

Addition: a + b
Subtraction: a - b
Multiplication: a * b
Division: a / b
Floor (integer) division: a // b
Modulus (remainder): a % b
Power: a ** b
Equal: a == b
Not equal: a != b
Less than: a < b
Less than or equal to: a <= b
Greater than: a > b
Greater than or equal to: a >= b
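The arithmetic operators can be tried on two small integers. This is a minimal sketch with arbitrary values a = 7 and b = 2; note in particular how // and % behave with a negative operand, since // rounds down rather than towards zero:

```python
a, b = 7, 2

print(a + b, a - b, a * b)  # 9 5 14
print(a / b)                # 3.5  (/ always returns a float)
print(a // b, a % b)        # 3 1
print(a ** b)               # 49

# // rounds down (towards negative infinity), and % takes the sign of
# the divisor, so that (a // b) * b + a % b == a always holds
print(-a // b, -a % b)                 # -4 1
print((-a // b) * b + (-a % b) == -a)  # True
```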

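Other logical statements:

Or:
- a or b
- a | b
And:
- a and b
- a & b

The keywords or and and are Python's logical operators, and they short-circuit: b is only evaluated if the result is not already decided by a. The symbols | and & are bitwise operators; they give the same result for plain True/False values (and are what NumPy uses for element-wise logic on arrays), but they always evaluate both sides and bind more tightly than comparisons, so comparisons combined with them must be wrapped in parentheses.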

For example, if a multiplied by b is less than c divided by d, and e is greater than 10:

(a * b < c / d) and (e > 10)
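A short sketch of why the parentheses matter when using & instead of and. The values of x and y are arbitrary examples: without parentheses, & is applied before the comparisons, which silently changes the meaning, and & does not short-circuit:

```python
x = 12

# With parentheses, & combines two booleans: True & False -> False
with_parens = (x > 3) & (x < 10)

# Without parentheses, & is applied first: x > (3 & x) < 10, i.e. 12 > 0 < 10 -> True
without_parens = x > 3 & x < 10

# `and` binds more loosely than comparisons, so no parentheses are needed
with_and = x > 3 and x < 10

print(with_parens, without_parens, with_and)  # False True False

# `and` short-circuits: the division is never evaluated when y == 0
y = 0
safe = y != 0 and 10 / y > 1
print(safe)  # False

# `&` evaluates both sides, so the same check raises ZeroDivisionError
try:
    unsafe = (y != 0) & (10 / y > 1)
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```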

a = 7
b = 2.2
# Normal division
c = a / b
# Floor (integer) division
d = a // b
# Modulus (remainder)
e = a % b

print(f"{a} / {b} = {c}")
print(f"{a} // {b} = {d}")
print(f"{a} % {b} = {e}")

print(f"a > 2: {a > 2}")

7 / 2.2 = 3.1818181818181817
7 // 2.2 = 3.0
7 % 2.2 = 0.39999999999999947
a > 2: True

Basic Data Types

Python has several basic data types:

int: integers: -3, -2, -1, 0, 1, 2, 3, etc.
float: floating-point numbers: 3.14, 42.0, etc.
bool: boolean. Note in Python, True/False start with a capital letter.
- x = false will give a NameError, while x = False will not.
str: strings of characters. In Python, strings are wrapped in single ('') or double ("") quotation marks. One kind of quotation mark can be used inside a string wrapped in the other:
- my_str = "hello", my_str = 'apple', answer = 'Computer says "no"' - all of these will work just fine.
- my_str = "Goodbye' will not work since we need to match the quotation marks properly.

We can cast from one data type to another using the format:

x = 1.3
y = int(x)

Here, x is cast to an int and the result is stored in y. We can also determine the type of a variable using the type function:

type(x)
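Casting does not always behave like rounding. A minimal sketch with arbitrary values, showing that int() truncates towards zero, how round() differs, and what happens when a string is cast to a number:

```python
# int() truncates towards zero rather than rounding
print(int(1.3), int(1.7), int(-1.7))        # 1 1 -1

# round() rounds to the nearest integer (exact ties go to the even neighbour)
print(round(1.7), round(-1.7), round(2.5))  # 2 -2 2

# Strings can be cast to numbers if they contain a valid literal
print(int("42") + 1, float("3.5") * 2)      # 43 7.0

# A string that is not a valid int literal raises a ValueError
try:
    int("3.5")
except ValueError as err:
    print("ValueError:", err)
```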

x = 10
y = float(x)
z = bool(x)
v = str(x)
print(x, y, z, v)
print(type(x), type(y), type(z), type(v))

10 10.0 True 10
<class 'int'> <class 'float'> <class 'bool'> <class 'str'>

Basic Collections of Data

Python has several built-in collection types for grouping data, including lists, sets, tuples and dictionaries. They differ in whether they are ordered, mutable, or restricted to unique elements, which makes each one suited to different tasks.

Lists

A list in Python is a versatile and mutable collection of items, ordered and enclosed within square brackets []. It allows storing various data types, including integers, strings, or even other lists. Lists are dynamic, meaning elements can be added, removed, or changed after creation using methods like append(), insert(), remove(), or by directly assigning values to specific indices.

my_list = [1, 2, 3, 'apple', 'banana', 'cherry']

# Create a list
my_list = [4, 5.2, -1.3]
print(my_list)

# Add a new element to the end
my_list.append(21)
print(my_list)

# "Pop" out the element at index 1 (the second element)
element = my_list.pop(1)
print(element, my_list)

# Reassign a value
my_list[0] = -999
print(my_list)


# Lists can include multiple data types
my_list[-2] = "Hello"
print(my_list)

[4, 5.2, -1.3]
[4, 5.2, -1.3, 21]
5.2 [4, -1.3, 21]
[-999, -1.3, 21]
[-999, 'Hello', 21]
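The insert() and remove() methods mentioned above are not used in the example cell. A minimal sketch with an arbitrary list of fruit names; note that remove() takes a value, not an index:

```python
fruits = ["apple", "cherry"]

# insert(index, value) places the value before the given index
fruits.insert(1, "banana")
print(fruits)  # ['apple', 'banana', 'cherry']

# remove(value) deletes the first matching value
fruits.append("apple")
fruits.remove("apple")
print(fruits)  # ['banana', 'cherry', 'apple']

# Removing a value that is not in the list raises a ValueError
try:
    fruits.remove("durian")
except ValueError as err:
    print("ValueError:", err)
```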

List slicing

In Python we can slice lists (and arrays, more on this later) to access subsections of the list. We use the syntax:

my_list[start:stop]

where start and stop define the range that we want to access, with start inclusive and stop exclusive. We can also access the last element with:

my_list[-1]

with -1 being the last element (-2 being the second last, etc.). To slice from index 2 (the third element) up to, but not including, the second-last element we would do:

my_list[2:-2]
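A few more slicing forms, sketched on an arbitrary six-element list: either bound can be omitted, an optional third value sets the step, and out-of-range bounds are clipped rather than raising an error:

```python
my_list = [10, 20, 30, 40, 50, 60]

print(my_list[2:-2])   # [30, 40]      index 2 up to (not including) the second-last
print(my_list[:3])     # [10, 20, 30]  omitting start begins at index 0
print(my_list[3:])     # [40, 50, 60]  omitting stop runs to the end
print(my_list[::2])    # [10, 30, 50]  the third value is the step
print(my_list[-1])     # 60            negative indices count from the end

# Slices never raise IndexError; out-of-range bounds are clipped
print(my_list[4:100])  # [50, 60]
```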

# Define a new list
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4]
print(my_list)

# Create a slice excluding the first and last elements
my_sub_list = my_list[1:-1]
print(my_sub_list)

# Get the length of the list
print(f"The list is {len(my_list)} elements long")

[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4]
[2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3]
The list is 13 elements long

Sets

Sets are another useful collection in Python: an unordered, mutable collection of unique elements. Sets are written with curly braces {} (although an empty {} creates a dictionary, so an empty set must be created with set()) and support set operations like union, intersection, and difference. They are efficient for tasks requiring unique elements and membership testing.

# Create a set using {}; the duplicate 4 is dropped
first_set = {1, 2, 3, 4, 4, 5}
print(first_set)

# Create a set from the list
my_set = set(my_list)
print(my_set)

# Add values to the set (adding a value that is already present has no effect)
my_set.add(13)
my_set.add(1)
print(my_set)

# Sets can hold multiple data types
my_set.add("Hello")
print(my_set)

{1, 2, 3, 4, 5}
{1, 2, 3, 4, 5, 6, 7, 8, 9}
{1, 2, 3, 4, 5, 6, 7, 8, 9, 13}
{1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 'Hello'}
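The set operations mentioned above have operator forms (|, &, - and ^; the methods union(), intersection(), difference() and symmetric_difference() do the same). A minimal sketch with two arbitrary sets; the results are wrapped in sorted() only so the printed order is predictable:

```python
evens = {0, 2, 4, 6, 8}
small = {0, 1, 2, 3}

print(sorted(evens | small))  # union:                [0, 1, 2, 3, 4, 6, 8]
print(sorted(evens & small))  # intersection:         [0, 2]
print(sorted(evens - small))  # difference:           [4, 6, 8]
print(sorted(evens ^ small))  # symmetric difference: [1, 3, 4, 6, 8]

# Membership testing
print(3 in small, 3 in evens)  # True False
```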

Tuples

A tuple is similar to a list but is immutable once created, and is denoted by parentheses (). Tuples are often used to group related pieces of information, and their immutability makes them slightly cheaper to create and store than lists. They are commonly used for items that shouldn't be changed, such as coordinates or configuration settings.

my_tuple = (4, 5, 6, 'dog', 'cat', 'rabbit')

my_tup = (1,2,3,4, "Apple")
print (my_tup)
my_tup[0] = -2
print (my_tup)

(1, 2, 3, 4, 'Apple')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[12], line 3
      1 my_tup = (1,2,3,4, "Apple")
      2 print (my_tup)
----> 3 my_tup[0] = -2
      4 print (my_tup)

TypeError: 'tuple' object does not support item assignment
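Since a tuple cannot be modified in place, the usual approach is to build a new one. A minimal sketch using an arbitrary 2D point, which also shows tuple unpacking:

```python
point = (3.0, 4.0)

# Tuples can be unpacked into separate variables
x, y = point
print(x, y)  # 3.0 4.0

# To "change" a tuple, build a new one (here via a temporary list)
as_list = list(point)
as_list[0] = -2.0
moved = tuple(as_list)
print(point, moved)  # (3.0, 4.0) (-2.0, 4.0)
```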

Dictionaries

A dictionary is a collection of key-value pairs enclosed in curly braces {}. Since Python 3.7, dictionaries preserve the order in which keys were inserted. Each element in a dictionary is accessed by its associated key rather than by a numerical index. Dictionaries are suitable for storing data where retrieval by a specific key is a priority. They are flexible and allow storing various data types as values.

my_dict = {'name': 'Alice', 'age': 25, 'country': 'USA'}

my_dict = {}
fmt_string = "entry_{entry}"
for i in range(10):
    my_dict[fmt_string.format(entry=i)] = -1

print(my_dict["entry_9"])

if "new_key" in my_dict:
    print("Key exists")

if "entry_1" in my_dict:
    print("Key exists: ", my_dict["entry_1"])

-1
Key exists:  -1
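Some other everyday dictionary operations, sketched with an arbitrary dictionary of names and ages: looking up a key that may be missing, updating entries, looping over key-value pairs and deleting an entry:

```python
ages = {"Alice": 25, "Bob": 31}

# ages["Carol"] would raise a KeyError; .get() returns a default instead
print(ages.get("Carol"))     # None
print(ages.get("Carol", 0))  # 0

# Adding or updating an entry uses the same assignment syntax
ages["Carol"] = 40
ages["Alice"] += 1

# .items() yields (key, value) pairs in insertion order
for name, age in ages.items():
    print(f"{name} is {age}")

# del removes an entry
del ages["Bob"]
print(ages)  # {'Alice': 26, 'Carol': 40}
```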

Looping

In Python, there are two primary methods of looping: for loops and while loops.

For Loops

for loops use the syntax for variable in iterable, where iterable is some sequence-like object, and variable takes the value of the current item on each pass through the loop. The block of code to be executed within the loop is designated by indentation. PEP 8, Python's style guide, recommends 4 spaces per indentation level; tabs also work if used consistently, but Python 3 raises a TabError when tabs and spaces are mixed inconsistently. For example:

my_list = [1, 2, 3, 4, 5]
for num in my_list:
    print(num)

This loop iterates through the elements of my_list, assigning each element to the variable num, and then prints each element.

The range() function is often used with for loops to generate a sequence of numbers. It allows iterating a specific number of times or generating a sequence within a range.

# Create an empty list
x = []

# range(10) returns an iterable that goes from 0 up to, but not including, 10 (0, 1, ..., 9)
for i in range(10):
    # Add i to our list
    x.append(i)

print(x)

# The list x is also iterable
for ix in x:
    # Print the value squared
    print(ix**2)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0
1
4
9
16
25
36
49
64
81
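range() also accepts a start and a step, and the built-in enumerate() is a common companion in for loops when the index of each item is needed. A short sketch with arbitrary bounds:

```python
# range(stop), range(start, stop) and range(start, stop, step)
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]

# enumerate() yields (index, item) pairs
for index, letter in enumerate(["a", "b", "c"]):
    print(index, letter)
```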

We can also use "list comprehension" to generate a list from simple loops. This takes the format:

array = [ some_function(i) for i in some_iterable ]

We can also use list comprehension to do some filtering, by placing an if after the for:

array = [ some_function(i) for i in some_iterable if some_condition(i) ]

An if can also go before the for, but there it is a conditional expression that chooses a value for every element rather than filtering elements out, so it must include an else:

array = [ some_function(i) if some_condition(i) else 0 for i in some_iterable ]

# Create a list with the numbers from 0 to 99
my_list = [x for x in range(100)]
# Create a list with only the even numbers from 0 to 99
my_even_list = [x for x in range(100) if x % 2 == 0]
# Create a list with 0 for even numbers and 1 for odd numbers from 0 to 99
my_conditional_list = [0 if x % 2 == 0 else 1 for x in range(100)]

print(my_list[:5])
print(my_even_list[:5])
print(my_conditional_list[:5])

[0, 1, 2, 3, 4]
[0, 2, 4, 6, 8]
[0, 1, 0, 1, 0]
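A filtering comprehension is shorthand for an ordinary loop that appends to a list. A minimal sketch, with an arbitrary range and expression, checking that the two forms give the same result:

```python
# The comprehension...
squares_of_evens = [x**2 for x in range(10) if x % 2 == 0]

# ...is equivalent to this explicit loop
loop_result = []
for x in range(10):
    if x % 2 == 0:
        loop_result.append(x**2)

print(squares_of_evens)                 # [0, 4, 16, 36, 64]
print(squares_of_evens == loop_result)  # True
```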

While

while loops execute a block of code as long as a specified condition is True. Care should be taken to avoid infinite loops where the condition always remains True. The syntax for a while loop is while condition: followed by an indented block of code.

The break statement can be used to exit a loop prematurely based on a condition, while continue skips the current iteration and proceeds to the next one.

When evaluating the condition, any value that is not "falsy" is treated as True. Falsy values include False, None, numeric zero (0 and 0.0) and empty containers such as "", [], {} and set().
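The built-in bool() applies the same truth test that while and if use, so it is a quick way to check how a value will be treated. A short sketch over an arbitrary selection of values:

```python
values = [False, None, 0, 0.0, "", [], {}, set(), True, -1, "0", [0]]

for value in values:
    # The first eight values are falsy; the last four are truthy
    # (note that the string "0" and the list [0] are not empty)
    print(repr(value), "->", bool(value))
```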

i = 0
# This will not run, because None is falsy
while None:
    i += 1
    print(i)
    if i > 5:
        break

i = 0
# Use a while loop to print the numbers from 0 to 9
while i < 10:
    print(i)
    i += 1

i = 0
# "hello" is always truthy, so use an if condition to break this infinite loop after the 5th iteration
while "hello":
    print(i)
    i += 1

    if i >= 5:
        break

Without the if statement here we would have an infinite loop! We can exit a loop with the break statement, or skip the rest of the current iteration with the continue statement.

Let's use if statements to see how these work.

if-elif-else Statements

if-elif-else statements allow us to control the flow of the code based on conditions. They take the syntax:

if condition1:
    # condition 1 code
elif condition2:
    # condition 2 code
elif condition3:
    # condition 3 code
else:
    # default code

Notice that the if and elif statements take logical expressions, while else does not. You can have any number of elif branches but only one if branch and at most one else branch.

This construct allows for branching based on multiple conditions. Python evaluates each condition sequentially. If condition1 is true, it executes the code block under condition1. If condition1 is false, it checks condition2, and so on. If none of the conditions are true, the code block under else (if provided) is executed as the default action.

even_sum = 0
odd_sum = 0

for i in range(100):
    # Exit the loop once i reaches 10
    if i >= 10:
        break
    # Skip the i = 3 and i = 0 iterations
    elif i == 3 or i == 0:
        continue
    elif i % 2 == 0:
        even_sum += i
        print(f"{i} -> Even Sum: {even_sum}")
    else:
        odd_sum += i
        print(f"{i} -> Odd Sum: {odd_sum}")

1 -> Odd Sum: 1
2 -> Even Sum: 2
4 -> Even Sum: 6
5 -> Odd Sum: 6
6 -> Even Sum: 12
7 -> Odd Sum: 13
8 -> Even Sum: 20
9 -> Odd Sum: 22
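Because the conditions are checked in order and only the first true branch runs, the order of the branches matters. A minimal sketch with arbitrary test values and labels: -5 also satisfies n < 10, but it never reaches that test because the first branch already matched:

```python
labels = []
for n in (-5, 0, 3, 10):
    # Conditions are checked top to bottom; only the first true branch runs
    if n < 0:
        label = "negative"
    elif n < 10:
        label = "between 0 and 9"
    else:
        label = "10 or more"
    labels.append(label)
    print(n, label)
```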

Functions

Functions make code reusable and easier to debug. When a block of code needs to be executed multiple times, wrapping it in a function means that any change only has to be made in one place, which reduces the chance of human error. Splitting code into small functions also makes it easier to read and to debug.

In Python, we define a function using the def keyword:

# Simple function to print hello
def print_hello():
    print("Hello")

# We can take in arguments
def print_message(msg):
    print(msg)


# Simple function that takes in two arguments, adds them together, prints and returns the sum
def add_numbers(a, b):
    c = a + b
    # Use the print_message function to print c
    print_message(c)
    return c


# We can also pass a function to a function
def repeat(func, n, args):
    for _ in range(n):
        # The * unpacks the args tuple into positional arguments for func
        func(*args)

print_hello()
print_message(42)

c = add_numbers(1.3, 5)

# We can specify which variable is which by giving the argument names
repeat(func=add_numbers, n=5, args=(1.3, 2.1))

Hello
42
6.3
3.4000000000000004
3.4000000000000004
3.4000000000000004
3.4000000000000004
3.4000000000000004
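Two other common function features are default argument values and returning several values at once (which actually returns a tuple). A minimal sketch; scale and min_max are hypothetical example functions, not part of the code above:

```python
def scale(value, factor=2.0):
    # factor has a default value, so it can be omitted when calling
    return value * factor

def min_max(values):
    # Returning several values actually returns a single tuple
    return min(values), max(values)

print(scale(3))             # 6.0
print(scale(3, factor=10))  # 30
low, high = min_max([4, -1, 7])
print(low, high)            # -1 7
```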

We can also write lambda functions, which are short anonymous functions defined by a single expression:

# Lambda function to get the square root
my_function = lambda x: x**0.5

# A regular function that takes two arguments
def larger_function(x, y):
    return x**2 / y**3

# Lambda function that acts as a wrapper, calling larger_function with y = 1.5
my_wrapper = lambda x: larger_function(x, 1.5)


print(my_function(4))
print(my_wrapper(3))
print(larger_function(3, 4))

2.0
2.6666666666666665
0.140625
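A very common use of a lambda is as a short key function, for example when sorting. For the wrapper case, the standard library's functools.partial is an alternative that fixes some arguments of an existing function. A minimal sketch; the list of points is an arbitrary example, and larger_function is redefined so the block runs on its own:

```python
from functools import partial

points = [(1, 5), (3, 2), (2, 8)]

# Sort the points by their second coordinate using a lambda as the key
by_second = sorted(points, key=lambda p: p[1])
print(by_second)  # [(3, 2), (1, 5), (2, 8)]

def larger_function(x, y):
    return x**2 / y**3

# partial fixes y = 1.5, much like the my_wrapper lambda above
my_partial = partial(larger_function, y=1.5)
print(my_partial(3))  # same value as my_wrapper(3)
```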

Functions, Naming Conventions and Documentation

When writing functions and classes (more on this later), we should conform to a consistent convention. This helps both users and developers to better understand the code, improving the ability to use and develop the code.

The convention we'll follow in this example is the Google Python Style Guide. Let's look at some examples of why this is useful.

Consider the following. We have a function calc which takes three arguments (x, b and i).

def calc(x, b, i):
    x[i] = x[i] / b
    return x

The function scales an element in x by 1/b. We can choose a name to better describe what the function does.

def scale_element(x, b, i):
    x[i] = x[i] / b
    return x

The user now knows that the function will scale an element of the list, but still doesn't know what the arguments are or what is returned. We can add a docstring to help with this.

help(scale_element)

Help on function scale_element in module __main__:

scale_element(x, b, i)

This help message is automatically generated from the function's signature and its "docstring"; since scale_element has no docstring yet, only the signature is shown. The docstring is a short description of the function that we write at the start of the function body.

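Following the Google Python Style Guide, a good template for your docstring is shown below. The r prefix makes the docstring a raw string, so that backslashes (such as the one in the LaTeX \frac) are not treated as escape sequences:

def function(x,y,z):
    r"""One line summary of my function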

    More detailed description of my function, potentially showing
    some math relation:
    $\frac{dy}{dx} = x^2$

    Args:
        x: description of x
        y: description of y
        z: description of z

    Returns:
        description of what is returned

    Examples:
        Some example
        >>> function (x,y,z)
        return_value

    Raises:
        Error: Error raised and description of that error
    """

This is quite verbose, but it has huge benefits for the user and allows us to self-document our code.

def scale_element(x, b, i):
    """Scale an element of the input array

    Scale an element of the array by a constant

    Args:
        x: list of values
        b: constant to scale by
        i: index of element to scale

    Returns:
        Scaled a list with the element i scaled by 1/b

    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]

    """
    x[i] = x[i] / b
    return x

Let's break down the sections here:

    """Scale an element of the input array

We start off with a short, one-line summary of the function.

    Scale an element of the array by a constant

We then give a more detailed description of the function and how to use it.

    Args:
        x: list of values
        b: constant to scale by
        i: index of element to scale

We list the arguments by name and what they are.


    Returns:
        Scaled a list with the element i scaled by 1/b

We describe what the function returns.

    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]

    """

We give a usage example of the function and its expected output. The user can then access this help message at any time using:

help(scale_element)

Help on function scale_element in module __main__:

scale_element(x, b, i)
    Scale an element of the input array
    
    Scale an element of the array by a constant
    
    Args:
        x: list of values
        b: constant to scale by
        i: index of element to scale
    
    Returns:
        Scaled a list with the element i scaled by 1/b
    
    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]
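Because the Examples section uses the >>> format, it can also be checked automatically with the standard-library doctest module. The sketch below is one way to do that; it redefines scale_element with a shortened docstring so the block is self-contained. In a script, calling doctest.testmod() is the more usual route:

```python
import doctest

def scale_element(x, b, i):
    """Scale an element of the input array

    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]
    """
    x[i] = x[i] / b
    return x

# Collect the >>> examples from the docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(scale_element, "scale_element",
                        globs={"scale_element": scale_element}):
    runner.run(test)

# summarize() returns the number of failed and attempted examples;
# any failures are also printed in detail
results = runner.summarize(verbose=False)
print(results)
```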

We can do better still. The user doesn't know what type of data to pass: for example, if x is a float rather than a list, the function will fail. We can address this with type hints. Type hints are an optional feature in Python that lets us annotate the types a function expects for its arguments and the type it returns.

For example:

def multiply(a, b):
    return a * b

# what happens when we pass two floats to this function?
multiply(1.5, 2.3)

3.4499999999999997

# What happens when we pass a string and an int?
multiply("apple", 6)

'appleappleappleappleappleapple'

We originally intended our function for ints or floats, but we get behaviour we weren't expecting when passing a string. We can use type hints to be explicit about what should be passed to the function.

def multiply(a: float, b: float) -> float:
    return a * b

# what happens when we pass two floats to this function?
multiply(1.5, 2.3)

3.4499999999999997

# What happens when we pass a string and an int?
multiply("apple", 6)

'appleappleappleappleappleapple'

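This doesn't stop us from calling the function with a string and an int: Python does not enforce type hints at runtime (static checkers such as mypy can flag mismatches before the code is run). It does, however, provide additional information to the help() function.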
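The hints are stored on the function object in its __annotations__ dictionary, where tools such as help() can find them. A small sketch, redefining multiply as above, showing that the hints are recorded but not checked when the function is called:

```python
def multiply(a: float, b: float) -> float:
    return a * b

# The annotations are kept on the function object
print(multiply.__annotations__)

# ...but they are not enforced: a str and an int are still accepted
print(multiply("apple", 2))  # appleapple
```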

help(multiply)

Help on function multiply in module __main__:

multiply(a: float, b: float) -> float

def scale_element(x : list, b : float, i : int) -> list:
    """Scale an element of the input array

    Scale an element of the array by a constant

    Args:
        x: list of values
        b: constant to scale by
        i: index of element to scale

    Returns:
        Scaled a list with the element i scaled by 1/b

    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]

    """
    x[i] = x[i] / b
    return x

From the first line we can see:

def scale_element(x : list, b : float, i : int) -> list:

That x is expected to be a list, b a float and i an int, and that the function will return a list.

help(scale_element)

Help on function scale_element in module __main__:

scale_element(x: list, b: float, i: int) -> list
    Scale an element of the input array
    
    Scale an element of the array by a constant
    
    Args:
        x: list of values
        b: constant to scale by
        i: index of element to scale
    
    Returns:
        Scaled a list with the element i scaled by 1/b
    
    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]

This is better, but let's try to anticipate potential errors:

scale_element(3, 2, 1)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[40], line 1
----> 1 scale_element(3, 2, 1)

Cell In[38], line 19, in scale_element(x, b, i)
      1 def scale_element(x : list, b : float, i : int) -> list:
      2     """Scale an element of the input array
      3 
      4     Scale an element of the array by a constant
   (...)
     17 
     18     """
---> 19     x[i] = x[i] / b
     20     return x

TypeError: 'int' object is not subscriptable

This error message isn't very helpful. Let's raise our own, more descriptive errors:

def scale_element(x : list, b : float, i : int) -> list:
    """Scale an element of the input array

    Scale an element of the array by a constant

    Args:
        x: list of values
        b: constant to scale by
        i: index of element to scale

    Returns:
        Scaled a list with the element i scaled by 1/b

    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]

    Raises:
        ValueError: If x is not a list, i is not an int, or b is neither an int nor a float
        IndexError: If i is out of bounds in list x

    """
    if not isinstance(x, list):
        raise ValueError(f'x is expected to be a list, recieved {type(x)}')
    if not isinstance(i, int):
        raise ValueError(f'i is expected to be an int, recieved {type(i)}')
    if not (isinstance(b, int) or isinstance(b, float)):
        raise ValueError(f'b is expected to be an int or float, recieved {type(b)}')

    if i >= len(x):
        raise IndexError(f'Index i ({i}) is out of bounds of array x (with len {len(x)})')
    x[i] = x[i] / b
    return x

In [43]:

Copied!

scale_element(3, 2, 1)
scale_element(3, 2, 1)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[43], line 1
----> 1 scale_element(3, 2, 1)

Cell In[42], line 24, in scale_element(x, b, i)
      2 """Scale an element of the input array
      3 
      4 Scale an element of the array by a constant
   (...)
     21 
     22 """
     23 if not isinstance(x, list):
---> 24     raise ValueError(f'x is expected to be a list, recieved {type(x)}')
     25 if not isinstance(i, int):
     26     raise ValueError(f'i is expected to be an int, recieved {type(i)}')

ValueError: x is expected to be a list, recieved <class 'int'>

In [44]:

Copied!

scale_element([3,2,2], 2, 5)
scale_element([3,2,2], 2, 5)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[44], line 1
----> 1 scale_element([3,2,2], 2, 5)

Cell In[42], line 31, in scale_element(x, b, i)
     28     raise ValueError(f'b is expected to be an int or float, recieved {type(b)}')
     30 if i >= len(x):
---> 31     raise IndexError(f'Index i ({i}) is out of bounds of array x (with len {len(x)})')
     32 x[i] = x[i] / b
     33 return x

IndexError: Index i (5) is out of bounds of array x (with len 3)

In [45]:

Copied!

help(scale_element)
help(scale_element)

Help on function scale_element in module __main__:

scale_element(x: list, b: float, i: int) -> list
    Scale an element of the input array
    
    Scale an element of the array by a constant
    
    Args:
        x: list of values
        b: constant to scale by
        i: index of element to scale
    
    Returns:
        Scaled a list with the element i scaled by 1/b
    
    Examples:
        >>> scale_element([1,2,3,4], 2, 1)
        [1, 1.0, 3, 4]
    
    Raises:
        ValueError: If x is not a list, i in not an int or b is neither an int or float
        IndexError: If i is out of bounds in list x

In [47]:

import doctest
doctest.testmod(verbose=True)

Trying:
    scale_element([1,2,3,4], 2, 1)
Expecting:
    [1, 1.0, 3, 4]
ok
3 items had no tests:
    __main__
    __main__.calc
    __main__.multiply
1 items passed all tests:
   1 tests in __main__.scale_element
1 tests in 4 items.
1 passed and 0 failed.
Test passed.

Out[47]:

TestResults(failed=0, attempted=1)
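testmod collects every docstring example in the module, which is why it also reports on calc and multiply. To check a single function, doctest's DocTestFinder and DocTestRunner classes can be used directly. A minimal sketch, using a simplified stand-in for scale_element (without the input checks):

```python
import doctest


def scale_element(x, b, i):
    """Scale element i of list x by 1/b, in place.

    >>> scale_element([1, 2, 3, 4], 2, 1)
    [1, 1.0, 3, 4]
    """
    x[i] = x[i] / b
    return x


# Collect the examples from this one docstring only
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(scale_element, "scale_element",
                        globs={"scale_element": scale_element}):
    runner.run(test)  # prints a report only if an example fails

# summarize() returns a TestResults(failed, attempted) tuple
results = runner.summarize(verbose=False)
print(results)
```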

Packages¶

Python has an extensive ecosystem of packages developed by the community. We use the import statement to bring packages, or specific parts of packages, into our code.

import package as p

In the example above, we import a package named package. The as p clause assigns the alias p to the imported package, so its contents are accessed as p.name. Because objects are always accessed through their package's namespace, objects that share a name across different packages do not clash. For instance:

In [48]:

import numpy as np
import math as m

print(np.sin(0))
print(m.sin(0))

0.0
0.0

Here we have imported numpy using the alias np and math using the alias m. Then, we call the sin function from both packages, specifying which version of the sin function we want to invoke.
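The two sin functions are not interchangeable: np.sin works element by element on whole arrays, while math.sin only accepts a single number. A small sketch (the angles are arbitrary example values):

```python
import math

import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

# numpy's sin is applied to every element of the array at once
np_result = np.sin(angles)
print(np_result)

# math's sin needs a single number, so a length-3 array is rejected
try:
    math.sin(angles)
    math_accepts_arrays = True
except TypeError as err:
    math_accepts_arrays = False
    print("math.sin raised TypeError:", err)

# A single element is fine for math.sin
print(math.sin(angles[1]))
```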

Numpy is a crucial package for scientific programming, and we'll look at its functionality in more detail shortly.

We can also import only a section of a package. For example:

In [50]:

import matplotlib.pyplot as plt
from scipy.stats import chi

Here, we import the pyplot module from the larger matplotlib package and give it the alias plt. We also import the chi distribution object from the stats module of the scipy package.

Both packages are widely used:

Matplotlib is an extensive library enabling the creation of static, animated, and interactive visualizations in Python. It offers a plethora of tools for various types of plots, charts, and graphical representations.
SciPy encompasses a wide range of scientific computing tools, providing algorithms for optimization, integration, interpolation, solving eigenvalue problems, handling algebraic and differential equations, statistical computations, and more. It's a fundamental package for scientific and technical computing in Python.
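As a small illustration of what from scipy.stats import chi gives us: chi is a distribution object, and calling it with its shape parameter df returns a "frozen" distribution with methods such as mean, pdf and rvs. A sketch with df = 1, an arbitrary choice for which the chi distribution is the half-normal distribution, with mean sqrt(2/pi):

```python
import math

from scipy.stats import chi

# Freeze the chi distribution with 1 degree of freedom
dist = chi(df=1)

print(dist.mean())                 # compare with sqrt(2/pi) below
print(math.sqrt(2 / math.pi))
print(dist.pdf([0.5, 1.0, 2.0]))   # probability density at several points

# Five random draws, seeded so the result is reproducible
samples = dist.rvs(size=5, random_state=0)
print(samples)
```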

Working with Numpy¶

Numpy offers highly optimized functionality for typical matrix and vector operations, with the cornerstone being the numpy array. Arrays resemble lists in their mutability but differ in that they can only contain a single data type.
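A short sketch of the single-data-type rule (the values are arbitrary): NumPy picks one dtype for the whole array, upcasting where needed, and converts anything assigned into the array to that dtype.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float forces the whole array to float
print(ints.dtype, mixed.dtype)

# Arrays are mutable, but assigned values are converted to the array's dtype
ints[0] = 9.7                   # stored as 9 in an integer array
print(ints)                     # [9 2 3]

# A Python list, by contrast, can mix types freely
py_list = [1, "two", 3.0]
print(py_list)
```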

In [52]:

# Define an array
x = np.array([0,1,2,3,4])
y = x**2
print (x)
print (y)

def add_2(arr : np.ndarray) -> np.ndarray:
    """Add 2 to every element of the array

    Args:
        arr: array of values

    Returns:
        arr + 2
    """
    return arr + 2

z = add_2(x)
print(z)
print ( (y - x) / z )

[0 1 2 3 4]
[ 0  1  4  9 16]
[2 3 4 5 6]
[0.  0.  0.5 1.2 2. ]

In [55]:

def subtract_and_add(x : np.ndarray) -> np.ndarray:
    """Subtract and add to an array

    Subtract 5 from the array and return the array + 10

    Args:
        x : input numpy array

    Returns:
        z : x - 5 + 10
    """
    z = x
    z -= 5
    return z + 10

x_data = np.arange(0,10,1)
print (x_data)
y_data = subtract_and_add(x_data)

# What happened here? x_data has changed!
print (x_data)

print (y_data)

[0 1 2 3 4 5 6 7 8 9]
[-5 -4 -3 -2 -1  0  1  2  3  4]
[ 5  6  7  8  9 10 11 12 13 14]
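What happened: z = x does not copy the array, it only gives the same array a second name, and z -= 5 then modifies that array in place. A minimal sketch isolating the effect (np.shares_memory checks whether two arrays overlap in memory):

```python
import numpy as np

x_data = np.arange(0, 10, 1)

z = x_data            # no copy: z and x_data are two names for the same array
print(z is x_data)    # True

z -= 5                # augmented assignment works in place on that shared array
print(x_data[:3])     # [-5 -4 -3]: the "original" changed too

v = x_data
v = v - 5             # plain assignment builds a NEW array and rebinds v
print(v is x_data)    # False

w = x_data.copy()     # .copy() allocates independent memory
print(np.shares_memory(w, x_data))  # False
```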

In [58]:

def subtract_and_add_copy(x : np.ndarray) -> np.ndarray:
    """Subtract and add to an array

    Subtract 5 from the array and return the array + 10.
    A copy is used to prevent modifying the input data

    Args:
        x : input numpy array

    Returns:
        z : x - 5 + 10
    """
    z = x.copy()
    z -= 5
    return z + 10

x_data = np.arange(0,10,1)
print (x_data)
y_data = subtract_and_add_copy(x_data)

# This time x_data is unchanged
print (x_data)

print (y_data)

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[ 5  6  7  8  9 10 11 12 13 14]

Numpy arrays can be filtered with a boolean mask: indexing an array with a boolean array of the same length selects only the elements where the mask is True.

In [62]:

x = np.array([1,2,3,4])
x_mask = np.array([True, False, True, True])

# We can select by indexing by the mask
print (x[x_mask])

# We can invert the selection using ~
print (x[~x_mask])

[1 3 4]
[2]

In [59]:

# We can mask and filter numpy arrays too
print (x_data)
# Get the odd numbers greater than 4
mask = (x_data > 4) & (x_data %2 == 1)
print (mask)
print (x_data[mask])
print (x_data[mask].sum())

[0 1 2 3 4 5 6 7 8 9]
[False False False False False  True False  True False  True]
[5 7 9]
21
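Note the parentheses around each condition in the mask. A sketch of why they are needed, and of np.logical_and as an equivalent spelling (Python's and keyword fails on arrays for the same truth-value reason):

```python
import numpy as np

x_data = np.arange(0, 10, 1)

# & combines two boolean arrays element by element; the parentheses matter
mask = (x_data > 4) & (x_data % 2 == 1)
print(x_data[mask])                      # [5 7 9]

# np.logical_and is the function form of the same operation
same_mask = np.logical_and(x_data > 4, x_data % 2 == 1)
print(np.array_equal(mask, same_mask))   # True

# Without parentheses, & binds more tightly than > and ==, so the line
# below is parsed as the chained comparison x_data > (4 & x_data % 2) == 1,
# which asks Python for the truth value of a whole array
raised = False
try:
    x_data[x_data > 4 & x_data % 2 == 1]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```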

In [73]:

# use numpy's random number generator to get normal random numbers:
x_rnd = np.random.normal(loc = 0, scale = 1, size = 1000)
great_that_0 = x_rnd > 0


# Use plt.hist to create histograms of the values
plt.hist(x_rnd)
plt.hist(x_rnd[great_that_0])
plt.hist(x_rnd[~great_that_0])

Out[73]:

(array([  2.,   2.,   3.,   4.,  21.,  45.,  62., 113., 100., 132.]),
 array([-3.56610817, -3.20994554, -2.85378291, -2.49762028, -2.14145765,
        -1.78529501, -1.42913238, -1.07296975, -0.71680712, -0.36064449,
        -0.00448186]),
 <BarContainer object of 10 artists>)
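NumPy's own documentation recommends the Generator interface (np.random.default_rng) for new code. A sketch of the same sampling with a seeded generator, which also makes the results reproducible (the seed 42 and the name greater_than_0 are arbitrary choices for this sketch):

```python
import numpy as np

# A Generator seeded with a fixed value gives reproducible draws
rng = np.random.default_rng(seed=42)
x_rnd = rng.normal(loc=0, scale=1, size=1000)
greater_than_0 = x_rnd > 0

# Every value lands in exactly one of the two masked subsets
print(greater_than_0.sum() + (~greater_than_0).sum())  # 1000

# A new generator with the same seed reproduces exactly the same numbers
x_again = np.random.default_rng(seed=42).normal(loc=0, scale=1, size=1000)
print(np.array_equal(x_rnd, x_again))  # True
```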


In [86]:

# alpha = transparency of the histogram
# color = color of the histogram
# bins = bin edges to use

# np.linspace(start, stop, num) returns num evenly spaced numbers,
# here 20 bin edges (i.e. 19 bins) between -5 and 5
binning = np.linspace(-5,5, 20)
plt.hist(x_rnd, bins= binning, 
         alpha = 0.5, color = "magenta", label = "All", hatch = "/")
plt.hist(x_rnd[great_that_0], bins= binning, 
         alpha = 0.5, color = "black", label = "X>0", hatch = "o")
# Raw string so that the LaTeX backslash is not treated as an escape sequence
plt.hist(x_rnd[~great_that_0], bins= binning, 
         alpha = 0.5, color = "darkorange", label = r"$X \leq 0$", hatch = "*")
plt.xlabel("X Value")
plt.ylabel("dN/dX")
plt.grid()
plt.legend()

Out[86]:

<matplotlib.legend.Legend at 0x72ec91609050>

We can define functions that operate on numpy arrays:

In [125]:

def sqrt(x):
    return np.sqrt(x)

In [126]:

my_values = np.linspace(0,100)
my_sqrts = sqrt(my_values)
print (my_sqrts[:5])

[0.         1.42857143 2.02030509 2.4743583  2.85714286]

However, we need to be careful about how we write our functions:

In [138]:

def capped_sqrt(x):
    if x > 0: 
        return np.sqrt(x)
    else:
        return 0.

In [139]:

my_values = np.linspace(-10,10)
my_sqrts = capped_sqrt(my_values)
print (my_sqrts[:5])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[139], line 2
      1 my_values = np.linspace(-10,10)
----> 2 my_sqrts = capped_sqrt(my_values)
      3 print (my_sqrts[:5])

Cell In[138], line 2, in capped_sqrt(x)
      1 def capped_sqrt(x):
----> 2     if x > 0: 
      3         return np.sqrt(x)
      4     else:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

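We can get around this with np.vectorize, which wraps a function written for single values so that it is applied element by element to a whole array. This saves us from writing the loop ourselves, although np.vectorize is a convenience rather than a speed-up: internally it still loops over the elements.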

In [140]:

@np.vectorize
def capped_sqrt(x):
    if x > 0: 
        return np.sqrt(x)
    else:
        return 0.

In [141]:

my_values = np.arange(-1,10)
my_sqrts = capped_sqrt(my_values)
print (my_sqrts[:5])

[0.         0.         1.         1.41421356 1.73205081]
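For speed, it is usually better to express the function with whole-array NumPy operations. One possible alternative to the vectorized capped_sqrt, assuming the same behaviour is wanted (square root for positive values, 0 otherwise), clips negative values to 0 before taking the root. The name capped_sqrt_clip is hypothetical:

```python
import numpy as np


def capped_sqrt_clip(x):
    """sqrt(x) where x > 0 and 0 elsewhere, computed on the whole array at once."""
    x = np.asarray(x, dtype=float)
    # np.clip replaces every negative value by 0 (None means no upper bound),
    # so np.sqrt never sees a negative number and no Python loop is needed
    return np.sqrt(np.clip(x, 0.0, None))


my_values = np.arange(-1, 10)
print(capped_sqrt_clip(my_values)[:5])
```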

Decorators¶

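Decorators allow us to modify the behavior of a function. A decorator is a function that takes another function as an argument and returns a new function with modified behavior. Writing @decorator on the line above a def is shorthand for func = decorator(func); the @np.vectorize used above is one example.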

Let's define a logging decorator:

In [153]:

import functools

def my_logger(func):
    # functools.wraps copies func's name and docstring onto the wrapper
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print( f"Running {func.__name__} with:\n\t args = {args}\n\t kwargs = {kwargs}")
        ret = func(*args, **kwargs)
        print( f"Returning {ret}")
        return ret
    return wrapper

In [154]:

@my_logger
def sqrt(x):
    return np.sqrt(x)

In [155]:

sqrt(4)

Running sqrt with:
	 args = (4,)
	 kwargs = {}
Returning 2.0

Out[155]:

2.0
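Decorators can also carry state. A sketch of a hypothetical count_calls decorator (not part of any library) that records how many times the decorated function has been called. It also shows that the @ syntax is shorthand for reassigning the function:

```python
import functools


def count_calls(func):
    """Hypothetical example decorator: count how often func is called."""
    @functools.wraps(func)          # keep func's __name__ and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1          # the count is stored on the wrapper itself
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    return x * x


# Equivalent to decorating with @count_calls
def cube(x):
    return x ** 3


cube = count_calls(cube)

square(2)
square(3)
print(square.calls)      # 2
print(square.__name__)   # square
```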

Working with Pandas¶

Pandas is an open-source data manipulation and analysis library in Python that's built on top of NumPy. It provides high-level data structures and a variety of tools for working with structured data.

The core data structure in Pandas is the DataFrame, a two-dimensional table with labeled axes (rows and columns). Its columns are typically backed by NumPy arrays, so DataFrames benefit from NumPy's efficient operations and functions.

In [38]:

import pandas as pd

# Creating a Pandas DataFrame
data = {
    'A': np.random.randn(5),  # Creating a NumPy array for column 'A'
    'B': np.random.rand(5)    # Creating a NumPy array for column 'B'
}

df = pd.DataFrame(data)
print("Pandas DataFrame:")
print(df)

Pandas DataFrame:
          A         B
0 -1.591723  0.883364
1 -1.550968  0.513569
2  0.753827  0.700477
3  0.557346  0.619088
4  1.900134  0.677126

In [39]:

help(np.random.randn)

Help on built-in function randn:

randn(...) method of numpy.random.mtrand.RandomState instance
    randn(d0, d1, ..., dn)
    
    Return a sample (or samples) from the "standard normal" distribution.
    
    .. note::
        This is a convenience function for users porting code from Matlab,
        and wraps `standard_normal`. That function takes a
        tuple to specify the size of the output, which is consistent with
        other NumPy functions like `numpy.zeros` and `numpy.ones`.
    
    .. note::
        New code should use the
        `~numpy.random.Generator.standard_normal`
        method of a `~numpy.random.Generator` instance instead;
        please see the :ref:`random-quick-start`.
    
    If positive int_like arguments are provided, `randn` generates an array
    of shape ``(d0, d1, ..., dn)``, filled
    with random floats sampled from a univariate "normal" (Gaussian)
    distribution of mean 0 and variance 1. A single float randomly sampled
    from the distribution is returned if no argument is provided.
    
    Parameters
    ----------
    d0, d1, ..., dn : int, optional
        The dimensions of the returned array, must be non-negative.
        If no argument is given a single Python float is returned.
    
    Returns
    -------
    Z : ndarray or float
        A ``(d0, d1, ..., dn)``-shaped array of floating-point samples from
        the standard normal distribution, or a single such float if
        no parameters were supplied.
    
    See Also
    --------
    standard_normal : Similar, but takes a tuple as its argument.
    normal : Also accepts mu and sigma arguments.
    random.Generator.standard_normal: which should be used for new code.
    
    Notes
    -----
    For random samples from the normal distribution with mean ``mu`` and
    standard deviation ``sigma``, use::
    
        sigma * np.random.randn(...) + mu
    
    Examples
    --------
    >>> np.random.randn()
    2.1923875335537315  # random
    
    Two-by-four array of samples from the normal distribution with
    mean 3 and standard deviation 2.5:
    
    >>> 3 + 2.5 * np.random.randn(2, 4)
    array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
           [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random

In [40]:

# Accessing the underlying NumPy array of column 'A'
# (pandas recommends .to_numpy() over the older .values attribute)
numpy_array = df['A'].to_numpy()
print("Numpy array from Pandas DataFrame:")
print(numpy_array)

Numpy array from Pandas DataFrame:
[-1.59172312 -1.55096775  0.75382695  0.55734567  1.90013409]
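Because DataFrame columns behave like NumPy arrays, the masking and arithmetic from the Numpy section carry over. A sketch on a small random DataFrame (the seed and the new column C are illustrative choices):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.standard_normal(5), "B": rng.random(5)})

# Column arithmetic is element-wise, just like NumPy arrays
df["C"] = df["A"] * df["B"]

# A boolean mask selects rows, as it selects elements of a NumPy array
positive = df[df["A"] > 0]
print(positive)

# .to_numpy() hands back a plain NumPy array
print(type(df["C"].to_numpy()))
```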

In [41]:

# Loading a csv file using pandas
url="https://r2.datahub.io/clt98lqg6000el708ja5zbtz0/master/raw/data/monthly.csv"
df=pd.read_csv(url)
df.head()

Out[41]:

	Source	Date	Mean
0	GCAG	2016-12	0.7895
1	GISTEMP	2016-12	0.8100
2	GCAG	2016-11	0.7504
3	GISTEMP	2016-11	0.9300
4	GCAG	2016-10	0.7292

In [42]:

df.tail()

Out[42]:

	Source	Date	Mean
3283	GISTEMP	1880-03	-0.1800
3284	GCAG	1880-02	-0.1229
3285	GISTEMP	1880-02	-0.2100
3286	GCAG	1880-01	0.0009
3287	GISTEMP	1880-01	-0.3000

In [43]:

df.plot(x = "Date", y = "Mean")

Out[43]:

<Axes: xlabel='Date'>
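The plot above treats Date as text and joins the GCAG and GISTEMP series into a single line. A sketch of one way to separate them, using the five rows shown by df.head() so that it runs without downloading the file: parse the dates with pd.to_datetime, then reshape with DataFrame.pivot so that each source gets its own column.

```python
import pandas as pd

# The first rows shown by df.head() for the monthly temperature data
df = pd.DataFrame({
    "Source": ["GCAG", "GISTEMP", "GCAG", "GISTEMP", "GCAG"],
    "Date": ["2016-12", "2016-12", "2016-11", "2016-11", "2016-10"],
    "Mean": [0.7895, 0.8100, 0.7504, 0.9300, 0.7292],
})

# Parse the "YYYY-MM" strings into real dates
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m")

# One column per source, indexed by date (missing combinations become NaN)
wide = df.pivot(index="Date", columns="Source", values="Mean").sort_index()
print(wide)

# wide.plot() would then draw one line per source against a date axis
```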

Data Analysis with Python¶

Python is well suited to high-level data analysis, and Jupyter notebooks make a convenient "analysis notebook" for documenting the analysis and displaying results.

Let's look at how we might reduce and analyze data using Python and extract some meaningful results.

Fitting a model to data¶

scipy optimize package
numpy polyfit
Error propagation
bootstrapping
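Since our model is a polynomial, numpy polyfit can also fit it directly. A sketch on simulated data of the same form as in this section (seeded generator, arbitrary seed). polyfit returns coefficients with the highest power first, matching the order of p0, p1, p2 in model. The weights w are 1/sigma, and cov="unscaled" treats the errors as absolute:

```python
import numpy as np

rng = np.random.default_rng(1)

# Quadratic with Gaussian noise of width 0.2, as in this section
p_true = [0.02, 0.1, -2.5]              # highest power first
x = 10 * rng.random(100)
y_err = 0.2 * np.ones_like(x)
y = np.polyval(p_true, x) + rng.normal(0, 0.2, size=x.shape)

# Weighted least-squares polynomial fit of degree 2
coeffs, cov = np.polyfit(x, y, deg=2, w=1 / y_err, cov="unscaled")
coeff_errors = np.sqrt(np.diag(cov))

for c, c_err in zip(coeffs, coeff_errors):
    print(f"{c:0.3f} +/- {c_err:0.3f}")
```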

Let's start by creating a data set using numpy.

In [121]:

# Define the true model
def model(x, p0, p1, p2):
    return p0 * x**2 + p1 * x + p2

# Set the true parameters
p_true = [0.02, 0.1, -2.5]

# Let the x points be random floats between 0 and 10
x = 10*np.random.rand(100)
y = model(x, p_true[0], p_true[1], p_true[2])

# Add some Gaussian noise with standard deviation 0.2
y_noisey = y + np.random.normal(loc = 0, scale = 0.2, size = x.shape)
# Define our y error as 0.2
y_err = 0.2 * np.ones(x.shape)

# Plot the data
plt.errorbar(x, y_noisey, yerr = y_err , fmt = "C0o", label = "Measured")
plt.ylabel("Y Values")
plt.xlabel("X Values")
plt.grid()

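Let's use scipy.optimize.curve_fit.

curve_fit performs a least-squares minimization, finding the parameters $\theta$ that minimize $$ \sum_i \left(y_i - y_{model}(x_i, \theta)\right)^2 $$

If errors $\Delta y_i$ are provided, it instead performs a $\chi^2$ minimization: $$ \chi^2 = \sum_i \frac{\left(y_i - y_{model}(x_i, \theta)\right)^2}{\Delta y_i^2} $$

curve_fit returns the optimal parameters (popt) and their estimated covariance matrix (pcov). The square roots of the diagonal elements of pcov give the one-standard-deviation uncertainties on the parameters. Note that by default (absolute_sigma=False) curve_fit treats sigma as relative weights and rescales pcov by the reduced $\chi^2$ of the fit; pass absolute_sigma=True if sigma holds absolute errors.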
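The absolute_sigma argument of curve_fit decides whether sigma is taken at face value. A sketch on freshly simulated data (arbitrary seed) showing that with the default absolute_sigma=False, multiplying every error by 10 leaves pcov unchanged, whereas with absolute_sigma=True it multiplies the variances by 100:

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, p0, p1, p2):
    return p0 * x**2 + p1 * x + p2


rng = np.random.default_rng(2)
x = 10 * rng.random(100)
y = model(x, 0.02, 0.1, -2.5) + rng.normal(0, 0.2, size=x.shape)
sigma = 0.2 * np.ones_like(x)

# Default: only the relative size of sigma matters, because pcov is
# rescaled by the reduced chi-square of the fit
_, pcov_a = curve_fit(model, x, y, sigma=sigma)
_, pcov_b = curve_fit(model, x, y, sigma=10 * sigma)
print(np.allclose(pcov_a, pcov_b, rtol=1e-4))

# absolute_sigma=True: 10x larger errors give 100x larger variances
_, pcov_c = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)
_, pcov_d = curve_fit(model, x, y, sigma=10 * sigma, absolute_sigma=True)
print(np.allclose(pcov_d, 100 * pcov_c, rtol=1e-4))
```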

In [122]:

# Use scipy curve_fit to perform a fit
from scipy.optimize import curve_fit

# Returns the optimal parameters (popt) and their covariance matrix (pcov)
popt, pcov = curve_fit(
    model, # Function we want to fit
    x,     # x data
    y_noisey,  # y data
    p0 = [-0.2, 1, 5],  # initial guess
    sigma=y_err   # error on y
)

x_plot = np.linspace(0,10)
plt.errorbar(x, y_noisey, yerr = y_err , fmt = "C0o", label = "Measured")
plt.plot(x_plot, model(x_plot, *popt), "r--", label = "Best fit")
plt.ylabel("Y Values")
plt.xlabel("X Values")
plt.legend()
plt.grid()

# One-sigma parameter uncertainties from the diagonal of the covariance matrix
parameter_errors = np.sqrt(np.diag(pcov))
for p, perr in zip(popt, parameter_errors):
    print (f"{p:0.3f} +/- {perr:0.3f}")

0.018 +/- 0.003
0.117 +/- 0.031
-2.519 +/- 0.071

Are we confident in our uncertainty?¶

It can often be difficult to quantify our uncertainties. Bootstrapping is a useful method for estimating them.

Assuming the data points are independent, we can resample our data with replacement, refit each resampled data set, and repeat this many times. The spread of the resulting best-fit values estimates the uncertainty on each parameter.
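The procedure just described is the nonparametric bootstrap. A generic sketch, with a hypothetical helper bootstrap(data, statistic) applied to the mean of Gaussian samples, where the bootstrap spread can be compared with the textbook standard error sigma/sqrt(N) (seeds and sample sizes are arbitrary):

```python
import numpy as np


def bootstrap(data, statistic, n_resamples=2000, seed=None):
    """Hypothetical helper: evaluate statistic on n_resamples resamples of data.

    Each resample draws len(data) points with replacement, so some points
    appear several times and others not at all.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    results = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)   # random indices, repeats allowed
        results.append(statistic(data[idx]))
    return np.array(results)


sample = np.random.default_rng(3).normal(loc=0, scale=1, size=1000)

boot_means = bootstrap(sample, np.mean, seed=4)
analytic_se = sample.std(ddof=1) / np.sqrt(len(sample))
print("bootstrap std of the mean:", boot_means.std())
print("sigma / sqrt(N):          ", analytic_se)

# A 68% interval from the percentiles of the bootstrap distribution
low, high = np.percentile(boot_means, [16, 84])
print(low, high)
```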

In [123]:

# bootstrapping
samples = []

for i in range(100):
    # Get random indices
    # replace = True allows us to reuse indices,
    # so we could be drawing an estimate from the [0th, 11th, 81st, 0th, ...] elements of our array
    rnd_int = np.random.choice(np.arange(len(x)), size=len(x), replace=True)
    # Extract the corresponding values
    x_samp = x[rnd_int]
    y_samp = y_noisey[rnd_int]
    y_samp_err = y_err[rnd_int]

    # Apply fit
    p, _ = curve_fit(model, x_samp, y_samp, sigma = y_samp_err)
    # Store result
    samples.append(p)
samples = np.array(samples)

In [96]:

fig, axs = plt.subplots(1,3, figsize = (18,6))
for i in range(3):
    mean = np.mean(samples[:,i])
    std = np.std(samples[:,i])
    
    axs[i].hist(samples[:,i], alpha = 0.5)
    axs[i].axvline(popt[i], color = "r", label = "From Fit")
    axs[i].axvline(popt[i] - parameter_errors[i], ls = "--", color = "r")
    axs[i].axvline(popt[i] + parameter_errors[i], ls = "--", color = "r")

    axs[i].axvline(mean, color = "C4", label = "Bootstrap")
    axs[i].axvline(mean - std, ls = "--", color = "C4")
    axs[i].axvline(mean + std, ls = "--", color = "C4")
    axs[i].axvline(p_true[i], color = "k", label = "True")
    axs[i].grid()
    axs[i].legend()
    

What if we underestimate our errors?¶

In [98]:

# Regenerate the Gaussian noise with a larger spread (0.3)
y_noisey = y + np.random.normal(loc = 0, scale = 0.3, size = x.shape)
# but quote a smaller y error of 0.1, i.e. underestimate the errors
y_err = 0.1 * np.ones(x.shape)

In [99]:

# Returns the optimal parameters (popt) and their covariance matrix (pcov)
popt, pcov = curve_fit(
    model, # Function we want to fit
    x,     # x data
    y_noisey,  # y data
    p0 = [-0.2, 1, 5],  # initial guess
    sigma=y_err   # error on y
)
# Recompute the parameter uncertainties for this new fit
parameter_errors = np.sqrt(np.diag(pcov))

x_plot = np.linspace(0,10)
plt.errorbar(x, y_noisey, yerr = y_err , fmt = "C0o", label = "Measured")
plt.plot(x_plot, model(x_plot, *popt), "r--", label = "Best fit")
plt.ylabel("Y Values")
plt.xlabel("X Values")
plt.legend()
plt.grid()

In [100]:

# bootstrapping
samples = []
for i in range(100):
    rnd_int = np.random.choice(np.arange(len(x)), size=len(x), replace=True)
    x_samp = x[rnd_int]
    y_samp = y_noisey[rnd_int]
    y_samp_err = y_err[rnd_int]

    p, _ = curve_fit(model, x_samp, y_samp, sigma = y_samp_err)
    samples.append(p)
samples = np.array(samples)

In [101]:

fig, axs = plt.subplots(1,3, figsize = (18,6))
for i in range(3):
    mean = np.mean(samples[:,i])
    std = np.std(samples[:,i])
    
    axs[i].hist(samples[:,i], alpha = 0.5)
    axs[i].axvline(popt[i], color = "r", label = "From Fit")
    axs[i].axvline(popt[i] - parameter_errors[i], ls = "--", color = "r")
    axs[i].axvline(popt[i] + parameter_errors[i], ls = "--", color = "r")

    axs[i].axvline(mean, color = "C4", label = "Bootstrap")
    axs[i].axvline(mean - std, ls = "--", color = "C4")
    axs[i].axvline(mean + std, ls = "--", color = "C4")
    axs[i].axvline(p_true[i], color = "k", label = "True")
    axs[i].grid()
    axs[i].legend()
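A quick diagnostic for mis-estimated errors is the reduced chi-square, $\chi^2$ divided by the number of degrees of freedom: it should be close to 1 when the quoted errors match the actual scatter. A sketch reproducing this setup on fresh, seeded data (200 points, true scatter 0.3, quoted error 0.1), where we expect a value of roughly (0.3/0.1)^2 = 9:

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, p0, p1, p2):
    return p0 * x**2 + p1 * x + p2


rng = np.random.default_rng(5)
x = 10 * rng.random(200)
# True scatter is 0.3, but we (wrongly) quote errors of 0.1
y = model(x, 0.02, 0.1, -2.5) + rng.normal(0, 0.3, size=x.shape)
y_err = 0.1 * np.ones_like(x)

popt, pcov = curve_fit(model, x, y, sigma=y_err)

# Reduced chi-square: chi^2 divided by (number of points - number of parameters)
residuals = (y - model(x, *popt)) / y_err
dof = len(x) - len(popt)
chi2_red = np.sum(residuals**2) / dof
print(f"reduced chi2 = {chi2_red:.1f}")

# sqrt(chi2_red) estimates the factor by which the errors were underestimated
print(f"error scale factor ~ {np.sqrt(chi2_red):.1f}")
```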
    

Using bootstrapping to handle non-Gaussian errors¶

In [103]:

# Note: despite its name, this is a power law, n * x**(-tau)
def exp_model(x, n, tau):
    return n*x**-tau

In [104]:

p_true = [10, 0.5]
# Start the plotting range just above 0, where the power law diverges
x_plot = np.linspace(0.1,10)
plt.plot(x_plot, exp_model(x_plot, *p_true))

Out[104]:

[<matplotlib.lines.Line2D at 0x72ec8e584b10>]

Poisson Distribution¶

$$p(X = k ; \lambda) = \frac{e^{-\lambda}\lambda^{k}}{k!}$$

Here $k$ is the observed number of counts and $\lambda$ is the expected (mean) number of counts. The distribution has mean $\lambda$ and standard deviation $\sqrt{\lambda}$. In counting experiments we therefore typically say that if we measure $f = N$ counts, then $\Delta f = \sqrt{N}$.
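The formula above can be checked numerically. A sketch that evaluates it directly, compares it with scipy.stats.poisson.pmf, and recovers the mean $\lambda$ and standard deviation $\sqrt{\lambda}$ (here $\lambda = 4$, an arbitrary choice; poisson_pmf is a hypothetical helper name):

```python
import math

import numpy as np
from scipy.stats import poisson


def poisson_pmf(k, lam):
    """p(X = k; lam) = exp(-lam) * lam**k / k!, written out directly."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 4.0
ks = np.arange(0, 60)   # far enough into the tail that the rest is negligible
p = np.array([poisson_pmf(int(k), lam) for k in ks])

# The hand-written formula agrees with scipy's implementation
print(np.allclose(p, poisson.pmf(ks, lam)))

# Mean and standard deviation computed from the distribution itself
mean = np.sum(ks * p)
std = np.sqrt(np.sum((ks - mean) ** 2 * p))
print(mean, std)   # close to lam = 4 and sqrt(lam) = 2
```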

Does this mean that it's appropriate to use $\sqrt{N}$ as the error in a $\chi^2$ fit?

In [105]:

lam = np.arange(6)
fig, axs = plt.subplots(2,3, figsize = (11,6))

for l, ax in zip(lam, axs.ravel()):
    # Draw 1000 Poisson samples with mean l and histogram them (bins centred on integers)
    rnd_x = np.random.poisson(lam = l, size = 1000)
    ax.hist(rnd_x, alpha = 0.5, bins = -0.5 + np.arange(0,15))
    # Mark the mean and the mean +/- one standard deviation, sqrt(lambda)
    ax.axvline(l)
    ax.axvline(l - np.sqrt(l))
    ax.axvline(l + np.sqrt(l))
    # Raw f-string so that the LaTeX backslash is not treated as an escape sequence
    ax.set_title(rf"$\lambda$ = {l}")
    ax.grid()
fig.tight_layout()

In [110]:

x = 10*np.random.random(100)
# Draw one Poisson count per x value, with mean given by the model.
# Equivalent, but slower, loop version:
# y = np.array([ 
#     np.random.poisson(lam = exp_model(x_i, *p_true), size = 1) 
#     for x_i in x 
# ])[:,0]
y = np.random.poisson(lam = exp_model(x, *p_true))

In [111]:

Copied!





x_plot = np.linspace(0,10)
y_err = np.sqrt(y)
# popt, pcov = curve_fit(exp_model, x, y, sigma = y_err)
popt, pcov = curve_fit(exp_model, x, y)

plt.plot(x_plot, exp_model(x_plot, *p_true))
plt.errorbar(x, y, yerr = y_err, fmt = "C0o")
plt.plot(x_plot, exp_model(x_plot, *popt))
plt.grid()

parameter_errors = np.sqrt(np.diag(pcov))
for pt, p, perr in zip(p_true, popt, parameter_errors):
    print (f"{pt:0.3f} -> {p:0.3f} +/- {perr:0.3f}")
x_plot = np.linspace(0,10)
y_err = np.sqrt(y)
# popt, pcov = curve_fit(exp_model, x, y, sigma = y_err)
popt, pcov = curve_fit(exp_model, x, y)

plt.plot(x_plot, exp_model(x_plot, *p_true))
plt.errorbar(x, y, yerr = y_err, fmt = "C0o")
plt.plot(x_plot, exp_model(x_plot, *popt))
plt.grid()

parameter_errors = np.sqrt(np.diag(pcov))
for pt, p, perr in zip(p_true, popt, parameter_errors):
    print (f"{pt:0.3f} -> {p:0.3f} +/- {perr:0.3f}")

10.000 -> 9.989 +/- 0.369
0.500 -> 0.485 +/- 0.016

/tmp/ipykernel_261108/1764079388.py:2: RuntimeWarning: divide by zero encountered in power
  return n*x**-tau

# bootstrapping: resample the (x, y) pairs with replacement and refit each resample
samples = []
for i in range(100):
    rnd_int = np.random.choice(np.arange(len(x)), size=len(x), replace=True)
    x_samp = x[rnd_int]
    y_samp = y[rnd_int]
    y_samp_err = y_err[rnd_int]  # not used by the unweighted fit below

    p, _ = curve_fit(exp_model, x_samp, y_samp)
    samples.append(p)
samples = np.array(samples)

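The cell above is a nonparametric bootstrap: it resamples the (x, y) pairs with replacement, refits each resample, and treats the spread of the refitted parameters as an estimate of their uncertainty. As a sanity check of the idea on a simpler statistic, the sketch below bootstraps the mean of a sample and compares the bootstrap standard deviation with the textbook standard error s/√n. The helper bootstrap, the exponential test data, and the number of resamples are hypothetical choices, not part of the workshop code.

```python
from typing import Callable, Optional

import numpy as np

def bootstrap(data: np.ndarray, statistic: Callable[[np.ndarray], float],
              n_boot: int = 2000, rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return n_boot values of `statistic` computed on resamples of `data`.

    Each resample draws len(data) indices with replacement, mirroring
    np.random.choice(..., replace=True) in the cell above.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    idx = rng.integers(0, n, size=(n_boot, n))  # one row of indices per resample
    return np.array([statistic(data[row]) for row in idx])

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=200)

boot_means = bootstrap(data, np.mean, rng=rng)
analytic_se = data.std(ddof=1) / np.sqrt(len(data))

print(f"bootstrap SE: {boot_means.std():.4f}")
print(f"analytic SE:  {analytic_se:.4f}")
```

The two numbers should agree to within a few percent; the bootstrap becomes useful precisely where no analytic formula like s/√n is available, as with the fit parameters above.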

fig, axs = plt.subplots(1,2, figsize = (18,6))
for i in range(2):
    mean = np.mean(samples[:,i])
    std = np.std(samples[:,i])
    
    axs[i].hist(samples[:,i], alpha = 0.5)
    axs[i].axvline(popt[i], color = "r", label = "From Fit")
    axs[i].axvline(popt[i] - parameter_errors[i], ls = "--", color = "r")
    axs[i].axvline(popt[i] + parameter_errors[i], ls = "--", color = "r")

    axs[i].axvline(mean, color = "C4", label = "Bootstrap")
    axs[i].axvline(mean - std, ls = "--", color = "C4")
    axs[i].axvline(mean + std, ls = "--", color = "C4")
    axs[i].axvline(p_true[i], color = "k", label = "True")
    axs[i].grid()
    axs[i].legend()

We can see that the bootstrapped distributions are highly non-Gaussian, so it may not make sense to report the uncertainty as a 1-sigma error on the fit parameters. Instead, we can report quantiles of the bootstrapped distribution. A common choice is a 90% interval, here running from the 5th to the 95th percentile (called a confidence interval in frequentist language and a credible interval in Bayesian language). Read as a confidence interval, it says:

If we were to repeat this experiment many times, an interval constructed this way would contain the true value about 90% of the time.

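That interpretation can be checked by simulation, assuming data drawn from a normal distribution with a known mean: generate many independent datasets, build a 90% percentile-bootstrap interval for the mean from each one, and count how often the interval contains the true mean. All settings here (normal data, 30 points per dataset, 200 experiments, 1000 resamples) and the helper percentile_interval are illustrative assumptions. Percentile-bootstrap intervals are approximate, so for small samples the observed coverage typically lands somewhat below the nominal 90%.

```python
import numpy as np

rng = np.random.default_rng(123)

true_mean = 5.0
n_points = 30        # size of each simulated dataset
n_experiments = 200  # number of repeated "experiments"
n_boot = 1000        # bootstrap resamples per experiment

def percentile_interval(data, rng, n_boot=1000, level=0.90):
    """Percentile-bootstrap interval for the mean of `data`."""
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    boot_means = data[idx].mean(axis=1)  # one mean per resample
    alpha = 1.0 - level
    return np.quantile(boot_means, [alpha / 2, 1.0 - alpha / 2])

hits = 0
for _ in range(n_experiments):
    data = rng.normal(loc=true_mean, scale=2.0, size=n_points)
    low, high = percentile_interval(data, rng, n_boot)
    hits += int(low <= true_mean <= high)  # did this interval cover the truth?

coverage = hits / n_experiments
print(f"fraction of intervals containing the true mean: {coverage:.2f}")
```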

fig, axs = plt.subplots(1,2, figsize = (18,6))


for i, pt in enumerate(p_true):
    
    axs[i].hist(samples[:,i], alpha = 0.5)
    axs[i].axvline(popt[i], color = "r", label = "From Fit")
    axs[i].axvline(popt[i] - parameter_errors[i], ls = "--", color = "r")
    axs[i].axvline(popt[i] + parameter_errors[i], ls = "--", color = "r")

    quan = np.quantile(samples[:,i], [0.05, 0.5, 0.95])

    
    axs[i].axvline(quan[1], color = "C4", label = "Bootstrap")
    axs[i].axvline(quan[0], ls = "--", color = "C4")
    axs[i].axvline(quan[2], ls = "--", color = "C4")
    axs[i].axvline(p_true[i], color = "k", label = "True")
    axs[i].grid()
    axs[i].legend()
    
    print (f"{pt:0.3f} -> {quan[1]:0.3f} [{quan[0]:0.3f}, {quan[2]:0.3f}]")

10.000 -> 10.025 [9.340, 10.646]
0.500 -> 0.491 [0.469, 0.541]

Classes in Python

Classes in Python serve as templates or blueprints defining the attributes (data) and behaviors (methods) of objects. They encapsulate both data and methods that operate on that data within a single structure, promoting code organization and reusability.

To create a class in Python, use the class keyword followed by the class name; the attributes and methods are then defined in the indented block beneath it.

Attributes and Methods

Attributes represent the data associated with a class, while methods are functions defined within the class that can access and manipulate this data. These methods can perform various operations on the attributes, thereby altering or providing access to the data encapsulated within the class.

Example of a Simple Class

Consider the following example of a basic class in Python:

class Car:
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year

    def get_details(self):
        return f"{self.year} {self.make} {self.model}"

The __init__ method is the initializer (often called the constructor) of the class. Python calls it automatically whenever a new object is created, passing the new object as self followed by the arguments given to the class.

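Creating objects from the Car class looks like calling a function. The short sketch below, with arbitrary example values, shows that each instance keeps its own attributes and that get_details reads whatever values the instance currently holds.

```python
class Car:
    def __init__(self, make, model, year):
        # Runs automatically when Car(...) is called and stores
        # the arguments as attributes on the new instance (self)
        self.make = make
        self.model = model
        self.year = year

    def get_details(self):
        # Methods receive the instance as self, so they can read its attributes
        return f"{self.year} {self.make} {self.model}"

# Each call to Car(...) creates a separate object with its own attributes
car_a = Car("Toyota", "Corolla", 2020)
car_b = Car("Honda", "Civic", 2018)

print(car_a.get_details())  # 2020 Toyota Corolla
print(car_b.get_details())  # 2018 Honda Civic

# Attributes can be changed after creation; methods see the updated value
car_b.year = 2019
print(car_b.get_details())  # 2019 Honda Civic
```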

class Data():
    """Data Class

    Class for holding x/y data with methods to calculate the properties 
    and some plotting functionalities. 
    """
    # "self" refers to the instance; attributes assigned to self belong to that object
    def __init__(self, x_data : np.ndarray, y_data : np.ndarray) -> None:
        """Initialization function

        Copy x_data and y_data

        Args:
            x_data : data on the x axis
            y_data : data on the y axis

        Returns:
            None
        """
        self.x_data = x_data.copy()
        self.y_data = y_data.copy()

    # Member functions take "self" as the first argument
    def calculate_properties(self) -> None:
        """Calculate properties of X and Y data

        Determine the mean and standard deviation of x_data and y_data.
        Mean and standard deviation are stored as attributes within Data class

        Args:
            None

        Returns:
            None
        """
        self.x_mean = np.mean(self.x_data)
        self.y_mean = np.mean(self.y_data)

        self.x_std = np.std(self.x_data)
        self.y_std = np.std(self.y_data)

    def plot_data(self, x_label : str = None, y_label : str = None) -> plt.Figure:
        """Plot X and Y data

        Plot the X and Y data and return the figure. Add optional x/y labels.
        Lines are added for the mean x/y and their standard deviations,
        so calculate_properties() must be called first.

        Args:
            x_label : (optional) string for the label of the x axis. (Default None)
            y_label : (optional) string for the label of the y axis. (Default None)

        Returns:
            figure with the plot of y(x)
        """

        fig = plt.figure(figsize = (11,6))
        plt.plot(self.x_data, self.y_data)

        # Plotting the means as solid lines and the +/- 1 sigma as dashed lines
        plt.axhline(self.y_mean, color = "C1", ls = "-", label = r"$\mu_{y}$")
        plt.axhline(self.y_mean + self.y_std, color = "C1", ls = "--")
        plt.axhline(self.y_mean - self.y_std, color = "C1", ls = "--", label = r"$\mu_{y} \pm  \sigma_{y}$")
        
        plt.axvline(self.x_mean, color = "C2", ls = "-", label = r"$\mu_{x}$")
        plt.axvline(self.x_mean + self.x_std, color = "C2", ls = "--", )
        plt.axvline(self.x_mean - self.x_std, color = "C2", ls = "--", label = r"$\mu_{x} \pm  \sigma_{x}$")
        
        if x_label is not None:
            plt.xlabel(x_label)
        if y_label is not None:
            plt.ylabel(y_label)
        plt.legend()
        plt.grid()

        return fig

x = np.linspace(-3 * np.pi, 3 * np.pi, 100 )
y = np.sin(x)

my_data = Data(x, y)
my_data.calculate_properties()
fig = my_data.plot_data(y_label="Sin(x)")

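Data.__init__ stores x_data.copy() and y_data.copy() rather than the arrays that were passed in. The stripped-down, hypothetical classes below (not part of the workshop code) illustrate why: Python passes a reference to the same array object, so without the copy an in-place change made through the object, such as the += performed by TimeSeries.add_to_data further on, would also modify the caller's array.

```python
import numpy as np

class SharingHolder:
    """Hypothetical class that stores a reference to the caller's array."""
    def __init__(self, y_data: np.ndarray) -> None:
        self.y_data = y_data  # no copy: same underlying array

class CopyingHolder:
    """Hypothetical class that stores its own copy, as Data does."""
    def __init__(self, y_data: np.ndarray) -> None:
        self.y_data = y_data.copy()  # independent array

y = np.zeros(3)
shared = SharingHolder(y)
shared.y_data += 1   # in-place addition modifies y as well
print(y)             # [1. 1. 1.]

y = np.zeros(3)
copied = CopyingHolder(y)
copied.y_data += 1   # only the copy changes
print(y)             # [0. 0. 0.]
```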
Inheritance in Python

Inheritance is a fundamental concept in object-oriented programming that allows a new class to inherit properties and behaviors (attributes and methods) from an existing class. This concept promotes code reuse, enhances readability, and enables the creation of more specialized classes.

Basics of Inheritance

In Python, inheritance is achieved by specifying the name of the parent class(es) inside the definition of a new class. The new class, also known as the child class or subclass, inherits all attributes and methods from its parent class or classes, referred to as the base class or superclass.

Syntax for Inheriting Classes

The syntax for creating a subclass that inherits from a superclass involves passing the name of the superclass inside parentheses when defining the subclass:

class ParentClass:
    # Parent class attributes and methods
    pass

class ChildClass(ParentClass):
    # Child class attributes and methods
    pass

Here, ChildClass is inheriting from ParentClass, which means ChildClass will inherit all attributes and methods defined in ParentClass.

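A minimal runnable version of this pattern, using hypothetical Shape and Square classes, shows the child inheriting describe() unchanged, overriding area(), and reusing the parent's initializer through super(). The __mro__ attribute lists the order in which Python searches classes when looking up a method.

```python
class Shape:
    """Hypothetical parent class."""
    def __init__(self, name: str) -> None:
        self.name = name

    def area(self) -> float:
        return 0.0

    def describe(self) -> str:
        # Inherited as-is by subclasses; self.area() uses the subclass version if present
        return f"{self.name} with area {self.area()}"

class Square(Shape):
    """Hypothetical child class: inherits describe(), overrides area()."""
    def __init__(self, side: float) -> None:
        super().__init__("square")  # reuse the parent's initializer
        self.side = side

    def area(self) -> float:
        return self.side ** 2

sq = Square(3.0)
print(sq.describe())          # square with area 9.0
print(isinstance(sq, Shape))  # True
print(Square.__mro__)         # lookup order: Square, then Shape, then object
```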

class TimeSeries(Data):
    """Time Series Data Class

    Class for holding x/y data with methods to calculate the properties 
    and some plotting functionalities. 
    The X data is assumed to be time
    """
    def __init__(self, x_data : np.ndarray, y_data: np.ndarray) -> None:
        """Initialization function

        Copy x_data and y_data

        Args:
            x_data : data on the x axis
            y_data : data on the y axis

        Returns:
            None
        """
        # super() is a built-in function that gives access to the parent class's methods
        super().__init__(x_data, y_data)

    # We can override methods inherited from the parent class
    def plot_data(self) -> plt.Figure:
        """Plot X and Y data

        Plot the X and Y data with fixed axis labels ("Time" and "AU") and return the figure.
        Lines are added for the mean x/y and their standard deviations.

        Args:
            None

        Returns:
            figure with the plot of y(x)
        """
        return super().plot_data(x_label = "Time", y_label = "AU")
    
    # We can also define new functions
    def add_to_data(self, y : float) -> None:
        """Add a constant offset to y data.

        A constant offset is added to self.y_data. 
        The properties (mean and std) are calculated for the adjusted dataset

        Args:
            y : constant offset to be added to self.y_data

        Returns:
            None
        """
        self.y_data += y
        # Recalculate the properties with the method inherited from Data
        self.calculate_properties()

my_time_series = TimeSeries(x, y)
# Calling a function from the parent class
my_time_series.calculate_properties()
# Call a function that only exists in the child class
my_time_series.add_to_data(10)
# Call overridden function
fig = my_time_series.plot_data()

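add_to_data recalculates the statistics by calling calculate_properties. Because TimeSeries does not override that method, super().calculate_properties() and self.calculate_properties() behave identically here. They differ once a child class overrides the method: super() starts the lookup after the current class, skipping the override, while self finds the override first. The hypothetical Base and Child classes below make the difference visible.

```python
class Base:
    def stats(self) -> str:
        return "Base.stats"

class Child(Base):
    def stats(self) -> str:
        return "Child.stats"

    def via_self(self) -> str:
        return self.stats()     # normal lookup: finds Child.stats first

    def via_super(self) -> str:
        return super().stats()  # lookup starts after Child, so Base.stats is used

c = Child()
print(c.via_self())   # Child.stats
print(c.via_super())  # Base.stats
```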