1. Introduction to Python's Mathematical Modules
Python's standard library provides a rich ecosystem of modules designed to handle various mathematical operations, from basic arithmetic and complex numbers to advanced statistical analysis, precision control, and cryptographic hashing. Understanding these modules is crucial for developing robust and accurate applications in scientific computing, data analysis, finance, security, and many other domains. This section introduces the core mathematical modules available out-of-the-box in Python.
Overview of the Python math module
The math module provides access to common mathematical functions and constants, largely mirroring the functions defined by the C standard. It works only with real numbers (ints and floats) and cannot be used with complex numbers.
- ✅ Purpose: Standard mathematical functions for real numbers.
- 🔑 Key Features: Trigonometric, logarithmic, exponential functions, constants (pi, e, inf, nan), number theory functions (gcd, factorial).
- ❌ Limitation: Does not support complex numbers.
Overview of the Python cmath module
The cmath module provides mathematical functions for complex numbers. Its functions typically return complex numbers, even if the result could be expressed as a real number.
- ✅ Purpose: Mathematical functions specifically designed for complex numbers.
- 🔑 Key Features: Complex trigonometric, logarithmic, exponential functions, phase, polar and rectangular coordinate conversions.
- ❌ Limitation: Not suitable for real-number only calculations where math is preferred for potentially better performance or simpler type handling.
Overview of the Python decimal module
The decimal module offers a Decimal floating-point arithmetic implementation, designed for situations where exact decimal representation is needed, such as financial calculations, to avoid floating-point inaccuracies.
- ✅ Purpose: Arbitrary-precision decimal floating-point arithmetic.
- 🔑 Key Features: Exact representation of decimal numbers, configurable precision, controlled rounding, avoidance of common binary floating-point issues.
- ❌ Trade-off: Slower performance compared to native float operations.
Overview of the Python fractions module
The fractions module provides support for rational number arithmetic. A Fraction instance can be constructed from a pair of integers, a float, or a string.
- ✅ Purpose: Exact rational number arithmetic (numerator and denominator).
- 🔑 Key Features: Represents numbers as fractions, no loss of precision in arithmetic operations, useful for exact results in mathematics.
- ❌ Trade-off: Can lead to very large numerators and denominators for complex calculations, potentially impacting performance.
Overview of the Python random module
The random module implements pseudo-random number generators for various distributions. It is suitable for simulations, games, and non-cryptographic applications.
- ✅ Purpose: Generate pseudo-random numbers for simulations and non-security-sensitive tasks.
- 🔑 Key Features: Integers (randint), floats (uniform), sequence selection (choice, sample), various probability distributions (gauss, expovariate).
- ❌ Caution: Not cryptographically secure; do not use for security-sensitive applications.
Overview of the Python secrets module
The secrets module is designed for generating cryptographically strong random numbers suitable for managing sensitive data such as passwords, authentication tokens, and security-critical keys.
- ✅ Purpose: Generate cryptographically strong random numbers for security purposes.
- 🔑 Key Features: Secure token generation (token_hex, token_urlsafe), secure choice from sequences.
- ❌ Caveat: May be slightly slower than random, because it draws on the operating system's secure randomness source instead of a fast in-process generator.
Overview of the Python statistics module
The statistics module provides functions for calculating mathematical statistics of numerical data. It handles common descriptive statistics.
- ✅ Purpose: Basic descriptive statistics for numerical data.
- 🔑 Key Features: Measures of central tendency (mean, median, mode), measures of dispersion (variance, standard deviation).
- ❌ Limitation: Not a full-fledged statistical analysis package like SciPy or NumPy; focuses on basic, common statistics.
Overview of the Python hashlib module
The hashlib module implements a common interface to many different secure hash and message digest algorithms. These are "one-way" functions critical for data integrity and security.
- ✅ Purpose: Secure hashing and message digest algorithms.
- 🔑 Key Features: Supports SHA256, SHA3, BLAKE2, and other algorithms, password hashing (PBKDF2), data integrity checks.
- ❌ Warning: Requires careful use to avoid common security pitfalls (e.g., using weak hashes, not salting passwords).
2. Phase 1: Foundations of Mathematical Functions
2.1. Basic Constants and Arithmetic (math module)
The math module is the cornerstone for fundamental mathematical operations on real numbers in Python. It provides access to essential mathematical constants and functions for number theory, rounding, and robust floating-point comparisons.
Utilization of Mathematical Constants
The math module defines several useful constants that represent mathematical concepts or specific numerical values.
- 🔑 math.pi: The mathematical constant π (pi), approximately 3.14159. Used in geometry and trigonometry.
- 🔑 math.e: The mathematical constant e, approximately 2.71828. The base of the natural logarithm.
- 🔑 math.inf: Positive floating-point infinity. Useful for comparisons involving unbounded values.
- 🔑 math.nan: Not a Number (NaN). Represents undefined or unrepresentable floating-point results, such as math.inf - math.inf. Note that math.sqrt(-1) and 0/0 raise exceptions in Python instead of returning NaN.
import math
print(f"Pi: {math.pi}")
print(f"e: {math.e}")
print(f"Infinity: {math.inf}")
print(f"Not a Number: {math.nan}")
# Examples of float arithmetic that produces infinity or NaN
print(f"1e308 * 10: {1e308 * 10}") # Overflowing multiplication yields inf
print(f"inf - inf: {math.inf - math.inf}") # An undefined result yields nan
# Note: 10.0 / 0.0 raises ZeroDivisionError and math.sqrt(-1) raises ValueError
In Python, division by zero raises a ZeroDivisionError for both integers (10 / 0) and floats (10.0 / 0.0); it does not return math.inf. Similarly, math.sqrt(-1) raises a ValueError because math operates on real numbers; for complex numbers, you'd use cmath.
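The behaviour of division by zero, infinity, and NaN can be checked directly. The following is a minimal sketch; the variable names are illustrative only, and math.isnan() and math.isinf() are standard math functions not otherwise covered in this section.

```python
import math

# Float division by zero raises an exception rather than returning inf.
try:
    10.0 / 0.0
    raised = False
except ZeroDivisionError:
    raised = True

# Infinity and NaN usually come from explicit construction or from
# arithmetic on values that are already infinite.
pos_inf = float("inf")              # same value as math.inf
not_a_number = math.inf - math.inf  # inf - inf is undefined, so the result is NaN

print(raised)                        # True
print(pos_inf == math.inf)           # True
print(math.isnan(not_a_number))      # True
print(not_a_number == not_a_number)  # False: NaN never compares equal, even to itself
print(math.isinf(-math.inf))         # True
```

Because NaN is not equal to itself, math.isnan() is the reliable way to test for it.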
Number Theory Functions
- 🔑 math.gcd(a, b): Returns the greatest common divisor of the integers a and b. If either a or b is non-zero, the value of gcd(a, b) is the largest positive integer that divides both a and b. gcd(0, 0) returns 0.
- 🔑 math.factorial(x): Returns x factorial. Raises ValueError if x is negative. Non-integer arguments are rejected as well (recent Python versions raise TypeError for floats).
import math
# GCD example
print(f"GCD(48, 18): {math.gcd(48, 18)}") # Output: 6
print(f"GCD(17, 34): {math.gcd(17, 34)}") # Output: 17
# Factorial example
print(f"Factorial(5): {math.factorial(5)}") # Output: 120 (5*4*3*2*1)
print(f"Factorial(0): {math.factorial(0)}") # Output: 1
Rounding Techniques
Python offers various methods for rounding numbers, each with a specific behavior. The math module provides functions for rounding up and down, distinct from Python's built-in round().
- 🔑 math.ceil(x): Returns the smallest integer greater than or equal to x (rounds up).
- 🔑 math.floor(x): Returns the largest integer less than or equal to x (rounds down).
Comparison with built-in round():
| Input | math.ceil() | math.floor() | round() (Python built-in) |
|---|---|---|---|
| 3.14 | 4 | 3 | 3 |
| 3.7 | 4 | 3 | 4 |
| -3.14 | -3 | -4 | -3 |
| -3.7 | -3 | -4 | -4 |
| 2.5 | 3 | 2 | 2 (rounds to nearest even) |
| 3.5 | 4 | 3 | 4 (rounds to nearest even) |
import math
value1 = 3.14
value2 = -3.7
print(f"Value: {value1}")
print(f" math.ceil({value1}): {math.ceil(value1)}")
print(f" math.floor({value1}): {math.floor(value1)}")
print(f" round({value1}): {round(value1)}")
print(f"\nValue: {value2}")
print(f" math.ceil({value2}): {math.ceil(value2)}")
print(f" math.floor({value2}): {math.floor(value2)}")
print(f" round({value2}): {round(value2)}")
print(f"\nRound half to even (banker's rounding) examples:")
print(f" round(2.5): {round(2.5)}") # Output: 2
print(f" round(3.5): {round(3.5)}") # Output: 4
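Two related details are worth a quick sketch: math.trunc() (a standard function not listed above) rounds toward zero, and round() accepts a second argument for the number of decimal places. The last line shows a case where the binary representation of the input, not the rounding rule, decides the outcome.

```python
import math

# math.trunc() drops the fractional part, i.e. it rounds toward zero.
print(math.trunc(3.7), math.trunc(-3.7))   # 3 -3
# math.floor() always rounds toward negative infinity.
print(math.floor(-3.7))                    # -4

# round() accepts an optional number of decimal places.
print(round(3.14159, 2))                   # 3.14

# 2.675 has no exact binary representation; the stored value is slightly
# below 2.675, so this rounds down even though it looks like a tie.
print(round(2.675, 2))                     # 2.67
```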
Floating-point comparison with math.isclose()
Directly comparing floating-point numbers using == can be problematic due to their inherent approximations and precision limitations. Small discrepancies, often unnoticeable to humans, can cause equality checks to fail unexpectedly.
math.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) is designed to safely compare two floating-point values by checking if they are "close enough" to each other within specified tolerances.
- rel_tol (relative tolerance): The maximum difference allowed relative to the larger of the two absolute values of the arguments. This is useful for numbers of vastly different magnitudes. The default is 1e-09 (0.000000001), which requires the two values to agree to roughly 9 significant digits.
- abs_tol (absolute tolerance): The maximum difference allowed, regardless of the magnitude of the arguments. Useful for comparisons near zero or when absolute error bounds are known.
Two numbers a and b are considered close if their absolute difference is less than or equal to either the relative tolerance times the maximum absolute value of a and b, OR the absolute tolerance. Mathematically:
abs(a-b) <= max(rel_tol * abs(a), rel_tol * abs(b)) OR abs(a-b) <= abs_tol
This formula ensures that comparisons work well for both very large and very small numbers.
import math
# Direct comparison problem
x = 0.1 + 0.1 + 0.1
y = 0.3
print(f"x = {x}, y = {y}")
print(f"x == y: {x == y}") # Output: False (due to precision)
# Using math.isclose()
print(f"math.isclose(x, y): {math.isclose(x, y)}") # Output: True (default relative tolerance)
# Custom tolerances
a = 1000.000000001
b = 1000.000000002
print(f"math.isclose(a, b, rel_tol=1e-10): {math.isclose(a, b, rel_tol=1e-10)}") # True, difference ~1e-9 is within 1e-10 * 1000 = 1e-7
print(f"math.isclose(a, b, rel_tol=1e-13): {math.isclose(a, b, rel_tol=1e-13)}") # False, difference ~1e-9 exceeds 1e-13 * 1000 = 1e-10
print(f"math.isclose(a, b, abs_tol=1e-08): {math.isclose(a, b, abs_tol=1e-08)}") # True, difference 1e-9 is within 1e-8
# Small numbers and abs_tol
c = 1e-9
d = 2e-9
print(f"math.isclose(c, d): {math.isclose(c, d)}") # False (the values differ by 50% of the larger one, far more than the default rel_tol of 1e-09)
print(f"math.isclose(c, d, abs_tol=1e-8): {math.isclose(c, d, abs_tol=1e-8)}") # True, difference 1e-9 is within 1e-8
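The closeness rule described in this subsection can be written out by hand. The sketch below uses a hypothetical helper, is_close_sketch, that is not part of the math module and only mirrors the documented formula for finite inputs; it is meant to make the formula concrete, not to replace math.isclose().

```python
import math

def is_close_sketch(a, b, rel_tol=1e-09, abs_tol=0.0):
    """Hypothetical helper mirroring the documented closeness rule for finite inputs."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

pairs = [
    (0.1 + 0.1 + 0.1, 0.3),   # tiny rounding difference
    (1e-9, 2e-9),             # small values that differ by a factor of two
    (1.0, 1.0 + 1e-12),       # relative difference well below 1e-09
]
for a, b in pairs:
    print(a, b, is_close_sketch(a, b), math.isclose(a, b))

# Comparing against zero: a relative tolerance scales down to nothing,
# so an absolute tolerance is needed.
print(math.isclose(1e-12, 0.0))                 # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))   # True
```

The comparison against zero is the usual reason to pass abs_tol explicitly.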
2.2. Floating Point Utilities
Beyond basic operations, the math module offers utilities for deeper understanding and manipulation of floating-point numbers, particularly addressing their internal representation and potential precision loss in computations.
Understanding Internal Representation: math.frexp()
Floating-point numbers are stored internally in a format similar to scientific notation, where a number x is represented as m * 2^e. Here, m is the mantissa (or significand) and e is the exponent.
The function math.frexp(x) breaks the float x into its mantissa and exponent components, returning a tuple (mantissa, exponent). The mantissa is a float m such that 0.5 <= abs(m) < 1, and the exponent is an integer e. If x is zero, (0.0, 0) is returned.
import math
x = 12.5
mantissa, exponent = math.frexp(x)
print(f"For {x}: Mantissa = {mantissa}, Exponent = {exponent}")
print(f"Reconstructed: {mantissa * (2**exponent)}") # Output: Reconstructed: 12.5
x = 0.25
mantissa, exponent = math.frexp(x)
print(f"For {x}: Mantissa = {mantissa}, Exponent = {exponent}")
print(f"Reconstructed: {mantissa * (2**exponent)}") # Output: Reconstructed: 0.25
x = -0.75
mantissa, exponent = math.frexp(x)
print(f"For {x}: Mantissa = {mantissa}, Exponent = {exponent}")
print(f"Reconstructed: {mantissa * (2**exponent)}") # Output: Reconstructed: -0.75
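The decomposition can be reversed with math.ldexp(), the standard counterpart of math.frexp(). The short sketch below also prints float.hex(), which exposes the same mantissa-and-exponent structure in hexadecimal form; neither function is covered elsewhere in this section.

```python
import math

x = 12.5
mantissa, exponent = math.frexp(x)       # (0.78125, 4)

# math.ldexp(m, e) computes m * 2**e, undoing frexp().
rebuilt = math.ldexp(mantissa, exponent)
print(mantissa, exponent, rebuilt)       # 0.78125 4 12.5

# float.hex() shows the same binary structure in hexadecimal notation.
print(x.hex())                           # 0x1.9000000000000p+3
```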
Accurate Summation: math.fsum()
When summing a large series of floating-point numbers, especially if they vary widely in magnitude, the cumulative effect of small precision errors can lead to a significant loss of accuracy when using the built-in sum() function. This is because sum() adds numbers sequentially, and adding a very small number to a very large number can cause the small number's significant digits to be lost.
math.fsum(iterable) overcomes this by tracking multiple intermediate partial sums, so the low-order bits that a single addition would discard are carried forward and folded back into the final result. CPython's implementation is based on Shewchuk's algorithm, a more thorough relative of Kahan (compensated) summation.
- sum() (default): less accurate
- math.fsum(): more accurate
import math
# Create a list of numbers that highlight precision issues
# Many small numbers summed with one large number
data = [0.0000001] * 100000 + [10000000.0]
# Using built-in sum()
sum_result = sum(data)
print(f"Sum using built-in sum(): {sum_result:.10f}")
# Using math.fsum()
fsum_result = math.fsum(data)
print(f"Sum using math.fsum(): {fsum_result:.10f}")
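# The true sum should be 10000000.0 + (0.0000001 * 100000) = 10000000.0 + 0.01 = 10000000.01
# sum() can drift from this in the last digits, while math.fsum() returns the float closest to the exact sum of the stored values
# (Python 3.12+ also uses compensated summation inside sum(), so the gap there is often small or zero)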
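A smaller case makes the effect easy to see. The sketch below accumulates ten copies of 0.1 in an explicit loop (so the result does not depend on how a given Python version implements sum()) and compares it with math.fsum().

```python
import math

values = [0.1] * 10

# Plain left-to-right accumulation: every addition rounds, and the
# rounding errors add up.
naive_total = 0.0
for v in values:
    naive_total += v

exact_total = math.fsum(values)

print(naive_total)   # 0.9999999999999999
print(exact_total)   # 1.0
```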
Float Modulo Operations: math.fmod() vs % operator
Both math.fmod(x, y) and the built-in x % y operator compute the remainder of a division. However, they handle negative numbers differently, aligning with different mathematical conventions.
- math.fmod(x, y): Follows the C standard, where the result has the same sign as the dividend (x).
- x % y: Follows the Python convention, where the result has the same sign as the divisor (y).
| Operands (x, y) | math.fmod(x, y) | x % y | Sign of Result |
|---|---|---|---|
| 5.0, 2.0 | 1.0 | 1.0 | Same as dividend (positive) |
| -5.0, 2.0 | -1.0 | 1.0 | fmod: dividend (negative); %: divisor (positive) |
| 5.0, -2.0 | 1.0 | -1.0 | fmod: dividend (positive); %: divisor (negative) |
| -5.0, -2.0 | -1.0 | -1.0 | Same as dividend (negative) |
import math
print(f"math.fmod(5.0, 2.0) : {math.fmod(5.0, 2.0)}") # 1.0
print(f"5.0 % 2.0 : {5.0 % 2.0}") # 1.0
print(f"math.fmod(-5.0, 2.0): {math.fmod(-5.0, 2.0)}") # -1.0
print(f"-5.0 % 2.0 : {-5.0 % 2.0}") # 1.0
print(f"math.fmod(5.0, -2.0): {math.fmod(5.0, -2.0)}") # 1.0
print(f"5.0 % -2.0 : {5.0 % -2.0}") # -1.0
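The two conventions differ only in how the quotient is rounded: % uses floor division, while fmod truncates toward zero. The helpers below, python_mod and c_style_mod, are hypothetical names written for illustration; for large or awkward operands the rounding of x / y can make them differ from the real operations, so treat this as a sketch of the idea rather than a replacement.

```python
import math

def python_mod(x, y):
    """Hypothetical helper: remainder based on floor division, like x % y."""
    return x - math.floor(x / y) * y

def c_style_mod(x, y):
    """Hypothetical helper: remainder based on truncation, like math.fmod(x, y)."""
    return x - math.trunc(x / y) * y

cases = [(5.0, 2.0), (-5.0, 2.0), (5.0, -2.0), (-7.5, 2.0)]
for x, y in cases:
    print(x, y, python_mod(x, y), x % y, c_style_mod(x, y), math.fmod(x, y))
```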
2.3. Advanced Mathematical Operations
The math module provides a comprehensive set of functions for more complex mathematical computations, including exponential, logarithmic, trigonometric, and vector operations.
Exponential and Logarithmic Functions
- 🔑 math.exp(x): Returns e^x.
- 🔑 math.log(x[, base]): With one argument, returns the natural logarithm of x (to base e). With two arguments, returns the logarithm of x to the given base.
- 🔑 math.log10(x): Returns the base-10 logarithm of x.
- 🔑 math.pow(x, y): Returns x raised to the power y (x**y). Note that this function always returns a float.
- 🔑 math.sqrt(x): Returns the square root of x. Raises ValueError if x is negative.
import math
print(f"e^2: {math.exp(2)}") # ~7.389
print(f"ln(10): {math.log(10)}") # ~2.302 (natural logarithm)
print(f"log base 2 of 8: {math.log(8, 2)}") # 3.0
print(f"log10(100): {math.log10(100)}") # 2.0
print(f"2^10 using pow: {math.pow(2, 10)}") # 1024.0
print(f"Square root of 81: {math.sqrt(81)}") # 9.0
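A few relationships between these functions can be verified in a short sketch. math.log2() is a standard function not listed above; comparisons use math.isclose() because results of exp() and log() are subject to rounding.

```python
import math

x = 8.0

# math.log2() is a dedicated base-2 logarithm.
print(math.log2(x))                             # 3.0

# Change of base: log_b(x) = ln(x) / ln(b).
by_hand = math.log(x) / math.log(2)
print(math.isclose(by_hand, math.log(x, 2)))    # True

# exp() and log() are inverses, up to rounding.
print(math.isclose(math.exp(math.log(x)), x))   # True

# math.pow() always returns a float; the ** operator keeps ints as ints.
print(math.pow(2, 10), 2 ** 10)                 # 1024.0 1024
```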
Trigonometric Functions
All trigonometric functions in the math module operate on angles expressed in radians. If you have angles in degrees, use math.radians() to convert them before using trigonometric functions, and math.degrees() to convert results back.
- 🔑 math.sin(x), math.cos(x), math.tan(x): Returns the sine, cosine, and tangent of x (in radians), respectively.
- 🔑 math.asin(x), math.acos(x), math.atan(x): Returns the inverse sine, cosine, and tangent of x, returning an angle in radians.
- 🔑 math.degrees(x), math.radians(x): Convert angle x from radians to degrees and vice versa.
- 🔑 Hyperbolic variants: math.sinh(x), math.cosh(x), math.tanh(x) for hyperbolic sine, cosine, and tangent.
import math
angle_degrees = 45
angle_radians = math.radians(angle_degrees)
print(f"45 degrees in radians: {angle_radians}")
print(f"Sine of {angle_degrees} degrees: {math.sin(angle_radians):.4f}") # ~0.7071
print(f"Cosine of {angle_degrees} degrees: {math.cos(angle_radians):.4f}") # ~0.7071
print(f"Tangent of {angle_degrees} degrees: {math.tan(angle_radians):.4f}") # ~1.0000
# Inverse functions
inv_sin_result_radians = math.asin(0.7071067811865476)
print(f"Inverse sine of ~0.7071 (radians): {inv_sin_result_radians:.4f}")
print(f"Inverse sine of ~0.7071 (degrees): {math.degrees(inv_sin_result_radians):.2f}")
# Hyperbolic
print(f"Hyperbolic sine of 1: {math.sinh(1):.4f}")
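When an angle has to be recovered from x and y coordinates, math.atan2(y, x) is usually a better fit than math.atan(y / x), because it keeps track of the quadrant. math.atan2() is a standard function not covered in the list above; the radius and angle below are arbitrary illustrative values.

```python
import math

# Convert polar coordinates (radius, angle in degrees) to Cartesian x, y.
radius = 2.0
angle_deg = 135.0
theta = math.radians(angle_deg)
x = radius * math.cos(theta)
y = radius * math.sin(theta)

# math.atan(y / x) loses the quadrant; math.atan2(y, x) keeps it.
recovered_deg = math.degrees(math.atan2(y, x))
atan_only_deg = math.degrees(math.atan(y / x))

print(f"x = {x:.4f}, y = {y:.4f}")                   # x = -1.4142, y = 1.4142
print(f"atan2 angle: {recovered_deg:.1f} degrees")   # atan2 angle: 135.0 degrees
print(f"atan angle:  {atan_only_deg:.1f} degrees")   # atan angle:  -45.0 degrees
```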
Vector Mathematics
The math module provides functions for calculating Euclidean distances, which are fundamental in geometry, physics, and machine learning.
- 🔑 math.hypot(x, y): Returns the Euclidean norm (magnitude) of a 2D vector (x, y), which is sqrt(x*x + y*y). This is equivalent to the length of the hypotenuse of a right-angled triangle with sides x and y.
- 🔑 math.dist(p, q): Returns the Euclidean distance between two points p and q, each given as a sequence (tuple or list) of coordinates. This supports N-dimensional points.
import math
# 2D Euclidean distance using hypot()
x1, y1 = 0, 0
x2, y2 = 3, 4
distance_2d = math.hypot(x2 - x1, y2 - y1)
print(f"2D distance (hypot) between (0,0) and (3,4): {distance_2d}") # 5.0
# N-dimensional Euclidean distance using dist()
point_a = (1, 2, 3)
point_b = (4, 6, 9)
distance_nd = math.dist(point_a, point_b)
print(f"ND distance (dist) between {point_a} and {point_b}: {distance_nd:.2f}") # ~7.81
point_c = (10, 20)
point_d = (13, 24)
distance_2d_dist = math.dist(point_c, point_d)
print(f"2D distance (dist) between {point_c} and {point_d}: {distance_2d_dist}") # 5.0
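On Python 3.8 and later, math.hypot() is not limited to two coordinates. The sketch below assumes such a version and checks that the norm of a point equals its distance from the origin.

```python
import math

# Since Python 3.8, math.hypot() accepts any number of coordinates.
length_3d = math.hypot(3, 4, 12)
print(f"{length_3d:.4f}")                                   # 13.0000

# The distance from the origin to a point equals the norm of that point.
point = (3, 4, 12)
origin = (0, 0, 0)
print(math.isclose(math.dist(origin, point), length_3d))    # True
```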
2.4. Complex Numbers (cmath module)
When calculations involve the square root of negative numbers, or complex exponentials and logarithms, the math module will raise errors because it only handles real numbers. This is where the cmath (complex math) module becomes essential.
Distinction between math and cmath modules
The primary difference lies in their domain: math works with real floating-point numbers, while cmath extends these operations to the complex plane.
| Feature | math Module | cmath Module |
|---|---|---|
| Domain | Real numbers (float) | Complex numbers (complex) |
| Return Type | float | complex (even if the imaginary part is zero) |
| Error Handling (e.g., sqrt(-1)) | Raises ValueError | Returns 1j (the complex square root) |
| Constants | math.pi, math.e, math.inf, math.nan | cmath.pi, cmath.e, cmath.inf, cmath.nan (the same float values), plus the complex constants cmath.infj and cmath.nanj |
import math
import cmath
# Square root of a negative number
try:
    print(f"math.sqrt(-1): {math.sqrt(-1)}")
except ValueError as e:
    print(f"math.sqrt(-1) error: {e}") # Output: math.sqrt(-1) error: math domain error
print(f"cmath.sqrt(-1): {cmath.sqrt(-1)}") # Output: 1j
Use cmath explicitly when dealing with complex numbers. Overlapping function names (e.g., sqrt, log) mean that simply importing * from both modules can lead to unexpected behavior or shadowing.
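A short sketch of both points: cmath returns complex results even for real inputs, and keeping the module names qualified makes it obvious which sqrt is being called. Euler's identity is included as a familiar check; cmath.isclose() is the complex counterpart of math.isclose().

```python
import cmath
import math

# cmath functions return complex results even for real inputs.
root = cmath.sqrt(4)
print(root, type(root).__name__)        # (2+0j) complex

# Euler's identity: e^(i*pi) = -1. The tiny imaginary part is rounding error.
euler = cmath.exp(1j * math.pi)
print(euler)
print(cmath.isclose(euler, -1))         # True

# Qualified names avoid the shadowing problem of `from ... import *`.
print(math.sqrt(9), cmath.sqrt(-9))     # 3.0 3j
```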
Coordinate Conversions
Complex numbers can be represented in two main forms:
- Rectangular (Cartesian) form: z = x + yj, where x is the real part and y is the imaginary part.
- Polar form: z = r * e^(i*phi), where r is the magnitude (radius) and phi is the phase (angle/argument).
The cmath module provides functions to convert between these representations.
- 🔑 cmath.phase(z): Returns the phase of the complex number z, also known as the argument or angle. It's a float in the range [-pi, pi].
- 🔑 cmath.polar(z): Returns the polar representation of z as a pair (r, phi), where r is the magnitude and phi is the phase.
- 🔑 cmath.rect(r, phi): Returns the complex number x + yj given its polar coordinates magnitude r and phase phi.
import cmath
z = 1 + 1j # Complex number: 1 + i
# Get phase
phase_z = cmath.phase(z)
print(f"Phase of {z}: {phase_z:.4f} radians") # ~0.7854 (pi/4)
# Convert to polar coordinates
magnitude, phase_from_polar = cmath.polar(z)
print(f"Polar coordinates of {z}: Magnitude = {magnitude:.4f}, Phase = {phase_from_polar:.4f}") # Magnitude ~1.414, Phase ~0.7854
# Convert back to rectangular from polar
reconstructed_z = cmath.rect(magnitude, phase_from_polar)
print(f"Reconstructed from polar: {reconstructed_z}") # Output: approximately (1+1j); rounding may show a value such as (1.0000000000000002+1j)
# Example with another complex number
z2 = -2 - 2j
magnitude2, phase2 = cmath.polar(z2)
print(f"Polar coordinates of {z2}: Magnitude = {magnitude2:.4f}, Phase = {phase2:.4f}") # Magnitude ~2.828, Phase ~-2.3562 (which is -3pi/4)
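Polar form also explains why multiplying by a unit-magnitude complex number rotates a point: magnitudes multiply and phases add. The following sketch rotates 1 + 1j by 90 degrees this way; the angle is an arbitrary choice for illustration.

```python
import cmath
import math

z = 1 + 1j
theta = math.pi / 2                      # rotate by 90 degrees

# A unit-magnitude complex number with phase theta acts as a rotation.
rotation = cmath.rect(1.0, theta)
z_rotated = z * rotation

r_before, phi_before = cmath.polar(z)
r_after, phi_after = cmath.polar(z_rotated)

print(f"rotated: {z_rotated.real:.4f} + {z_rotated.imag:.4f}j")   # rotated: -1.0000 + 1.0000j
print(math.isclose(r_before, r_after))               # True: magnitude unchanged
print(math.isclose(phi_after - phi_before, theta))   # True: phase advanced by theta
```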
Practice & Application
🎯 Challenge 1: Precision in Scientific Measurements
You are analyzing data from a high-precision sensor. You have two measurements, measurement_a and measurement_b, and a known reference value, reference_val.
- Define measurement_a = 0.0000001 * 1000000 + 0.000000000001 and measurement_b = 0.1 + 0.000000000001.
- Use math.isclose() to determine if measurement_a and measurement_b are considered practically equal. Start with default tolerances, then try with abs_tol=1e-10. What do you observe?
- For value_to_round = 5.67, use math.ceil() and math.floor(). Also, use the built-in round() and explain why its output might be different for a value like 2.5.
- Imagine you need to sum a list of numbers: numbers = [0.000000000001] * 1000000 + [10.0]. Calculate the sum using both sum() and math.fsum(). Compare their results for precision. The true sum should be 10.000001.
import math
# Part 1 & 2: Floating-point comparison
measurement_a = 0.0000001 * 1000000 + 0.000000000001
measurement_b = 0.1 + 0.000000000001
print(f"Measurement A: {measurement_a:.15f}")
print(f"Measurement B: {measurement_b:.15f}")
print(f"\nAre A and B close (default)? {math.isclose(measurement_a, measurement_b)}")
print(f"Are A and B close (abs_tol=1e-10)? {math.isclose(measurement_a, measurement_b, abs_tol=1e-10)}")
# Both comparisons return True here: the two values differ by far less than
# the default rel_tol of 1e-09 times their magnitude (about 1e-10).
# abs_tol=1e-10 additionally accepts any pair whose absolute difference is at most 1e-10.
# Part 3: Rounding techniques
value_to_round = 5.67
print(f"\nValue to round: {value_to_round}")
print(f"math.ceil({value_to_round}): {math.ceil(value_to_round)}")
print(f"math.floor({value_to_round}): {math.floor(value_to_round)}")
print(f"round({value_to_round}): {round(value_to_round)}")
print(f"\nRounding 2.5: {round(2.5)}")
print(f"Rounding 3.5: {round(3.5)}")
# Explanation: Python's built-in round() uses "round half to even" or banker's rounding.
# This means that numbers ending in .5 are rounded to the nearest even integer.
# So, 2.5 rounds to 2, and 3.5 rounds to 4.
# Part 4: Accurate summation
numbers = [0.000000000001] * 1000000 + [10.0]
true_sum = 10.000001 # 10.0 + (1e-12 * 1e6) = 10.0 + 1e-6 = 10.000001
sum_builtin = sum(numbers)
sum_fsum = math.fsum(numbers)
print(f"\nTrue sum: {true_sum:.10f}")
print(f"Sum using built-in sum(): {sum_builtin:.10f}")
print(f"Sum using math.fsum(): {sum_fsum:.10f}")
# Observation: math.fsum() should provide a result closer to the true sum due to its
# algorithm that minimizes cumulative floating-point errors.
🎯 Challenge 2: Complex Rotations and Modulo Differences
Explore transformations of complex numbers and the nuances of modulo operations with negative floats.
- Define a complex number
z = 1 + 1j. Calculate its magnitude and phase usingcmath.polar(). - "Rotate" the complex number by adding
math.pi / 4(45 degrees) to its phase. Convert this new polar representation back to a rectangular complex number usingcmath.rect(). Print the original and rotated complex numbers. - Consider the operation
-7.5modulo2.0. Calculate the result using both the%operator andmath.fmod(). Explain why the results are different.
import math
import cmath
# Part 1 & 2: Complex number rotation
z_original = 1 + 1j
print(f"Original complex number: {z_original}")
# Get polar coordinates
magnitude, phase = cmath.polar(z_original)
print(f"Polar form: Magnitude = {magnitude:.4f}, Phase = {phase:.4f} radians ({math.degrees(phase):.2f} degrees)")
# Rotate by adding pi/4 to the phase
rotated_phase = phase + (math.pi / 4)
# Note: magnitude remains the same for rotation
z_rotated = cmath.rect(magnitude, rotated_phase)
print(f"Rotated complex number (by 45 deg): {z_rotated}")
# Expected: z_original (1+1j) is at 45 deg. Adding another 45 deg makes it 90 deg (purely imaginary).
# For z_original = 1+1j, magnitude = sqrt(2), phase = pi/4.
# rotated_phase = pi/4 + pi/4 = pi/2.
# cmath.rect(sqrt(2), pi/2) should be approximately 0 + sqrt(2)j (0 + 1.414j)
# Part 3: Modulo differences
num = -7.5
divisor = 2.0
# Using % operator
remainder_percent = num % divisor
print(f"\n{num} % {divisor} = {remainder_percent}")
# Using math.fmod()
remainder_fmod = math.fmod(num, divisor)
print(f"math.fmod({num}, {divisor}) = {remainder_fmod}")
# Explanation:
# The % operator in Python returns a result with the same sign as the divisor.
# It is based on floor division: floor(-7.5 / 2.0) = floor(-3.75) = -4.
# -7.5 - (-4 * 2.0) = -7.5 + 8.0 = 0.5 (positive, like 2.0).
# math.fmod() returns a result with the same sign as the dividend.
# It follows the C convention: fmod(x, y) = x - n*y, where n is x/y truncated towards zero.
# -7.5 / 2.0 = -3.75, and truncating towards zero gives n = -3.
# -7.5 - (-3 * 2.0) = -7.5 + 6.0 = -1.5 (negative, like -7.5).
3. Phase 2: Precision and Rational Arithmetic
While Python's built-in float type is highly optimized for performance and generally sufficient for scientific computing, it has inherent limitations due to its binary representation. This phase explores modules that provide alternatives for scenarios demanding exact decimal precision or precise rational number arithmetic: decimal for arbitrary-precision floats and fractions for exact rational numbers.
3.1. Arbitrary Precision Floats (decimal module)
Understanding "Floating Point Error"
Standard floating-point numbers (float in Python, typically IEEE 754 double-precision) are stored internally in binary. This means that many decimal fractions, like 0.1, cannot be represented exactly in binary. They become repeating binary fractions, similar to how 1/3 is a repeating decimal (0.333...). When these inexact binary representations are used in calculations, small errors can accumulate, leading to results that are slightly off from their true mathematical values.
# A classic example of floating-point inaccuracy
print(0.1 + 0.1 + 0.1 == 0.3) # Output: False
print(0.1 + 0.1 + 0.1) # Output: 0.30000000000000004
- 🔑 Root Cause: Binary representation of decimal fractions leads to approximation.
- ❌ Consequence: Inaccurate equality checks, accumulated errors in sums or repeated operations, potential issues in financial calculations where exactness is paramount.
- ✅ Mitigation: For critical precision, use decimal.Decimal or fractions.Fraction.
Implementing Decimal type for exact representation
The decimal module provides a Decimal type for decimal floating-point arithmetic. This type can represent numbers exactly (up to its defined precision) and handles operations in base 10, thus avoiding the binary floating-point issues for decimal numbers.
To ensure exactness, it is highly recommended to construct Decimal objects from strings rather than floats. If you convert a float to a Decimal, the float's inherent inaccuracies are carried over.
from decimal import Decimal
Decimal('0.1') # Exact
Decimal(0.1) # Inherits the float's inaccuracy: Decimal stores the *exact* value of the binary float nearest to 0.1.
from decimal import Decimal
# Using floats (prone to error)
float_sum = 0.1 + 0.1 + 0.1
print(f"Float sum: {float_sum}")
# Using Decimal from strings (recommended for exactness)
decimal_sum = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
print(f"Decimal sum (from strings): {decimal_sum}")
print(f"Decimal sum == Decimal('0.3'): {decimal_sum == Decimal('0.3')}") # Output: True
# Using Decimal from floats (demonstrates float's underlying inaccuracy)
decimal_from_float = Decimal(0.1)
print(f"Decimal from float(0.1): {decimal_from_float}")
# Notice the extended precision reflects the actual value of float 0.1
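For money-style calculations, exact construction is usually combined with explicit rounding to a fixed number of decimal places. The sketch below uses Decimal.quantize() with ROUND_HALF_UP, both part of the decimal module; the prices and the 8.25% tax rate are made-up values for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical prices, constructed from strings so they are exact.
prices = [Decimal("19.99"), Decimal("5.01"), Decimal("0.10")]
subtotal = sum(prices)

tax_rate = Decimal("0.0825")   # assumed 8.25% rate, for illustration only
cent = Decimal("0.01")
tax = (subtotal * tax_rate).quantize(cent, rounding=ROUND_HALF_UP)

print(subtotal)          # 25.10
print(tax)               # 2.07
print(subtotal + tax)    # 27.17

# Unlike round(2.675, 2) on a float, a Decimal tie rounds as written.
half_up = Decimal("2.675").quantize(cent, rounding=ROUND_HALF_UP)
print(half_up)           # 2.68
```

Passing the rounding mode explicitly keeps the behaviour independent of whatever the current context happens to specify.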
Context management for precision control
The decimal module allows you to control the precision of calculations globally using a "context." The most common setting to adjust is prec, which defines the number of significant digits for new Decimal results.
- 🔑 decimal.getcontext(): Returns the current context for the active thread.
- 🔑 decimal.getcontext().prec: An integer specifying the precision (total number of significant digits) for arithmetic operations. The default is 28.
from decimal import Decimal, getcontext, localcontext
# Default precision
print(f"Default precision: {getcontext().prec}") # Output: 28
# Perform a calculation with default precision
result_default_prec = Decimal('1') / Decimal('7')
print(f"1/7 with default precision: {result_default_prec}") # Output: 0.1428571428571428571428571429
# Change precision for the current thread's context
getcontext().prec = 10 # Set precision to 10 significant digits
result_10_digits = Decimal('1') / Decimal('7')
print(f"1/7 with 10-digit precision: {result_10_digits}") # Output: 0.1428571429
# Reset precision (good practice, or use `localcontext`)
getcontext().prec = 28
# localcontext() is a context manager that works on a temporary copy of the context
with localcontext() as ctx:
    ctx.prec = 5
    result_low_prec = Decimal('1') / Decimal('7')
    print(f"1/7 with 5-digit precision (localcontext): {result_low_prec}") # Output: 0.14286
print(f"Precision after localcontext block: {getcontext().prec}") # Back to 28
- float precision: fixed (~15-17 decimal digits)
- Decimal precision: configurable (default 28, can be much higher)
3.2. Exact Rational Arithmetic (fractions module)
Storing numbers as numerator/denominator pairs
The fractions module provides the Fraction type, which represents a rational number as a pair of integers: a numerator and a denominator. This representation allows for arithmetic to be performed with absolute precision, as long as the numbers involved are rational.
- 🔑 fractions.Fraction(numerator=0, denominator=1): Constructor. Can take two integers, one integer, a float (caution: subject to float inaccuracies), or a string representation of a fraction or float.
from fractions import Fraction
# From two integers
f1 = Fraction(3, 4)
print(f"Fraction(3, 4): {f1}") # Output: 3/4
# From an integer
f2 = Fraction(5)
print(f"Fraction(5): {f2}") # Output: 5
# From a string (recommended for exactness)
f3 = Fraction('1.25')
print(f"Fraction('1.25'): {f3}") # Output: 5/4
f4 = Fraction('1/3')
print(f"Fraction('1/3'): {f4}") # Output: 1/3
# From a float (inherits float's inaccuracy, see below)
f5 = Fraction(0.1)
print(f"Fraction(0.1): {f5}")
# Output: 3602879701896397/36028797018963968 (this is the exact fraction for the float 0.1)
As with the Decimal type, converting a float directly to a Fraction can embed the float's inherent binary inaccuracies into the Fraction. For exact decimal fractions, always use a string representation (e.g., Fraction('0.25') or Fraction('1/4')).
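When only a float is available, Fraction.limit_denominator() can often recover the intended value by searching for the closest fraction with a bounded denominator. The sketch below compares the construction routes; the bound of 1000 is an arbitrary choice for illustration.

```python
from decimal import Decimal
from fractions import Fraction

from_float = Fraction(0.1)                    # exact value of the binary float
from_string = Fraction("0.1")                 # exactly 1/10
from_decimal = Fraction(Decimal("0.1"))       # also exactly 1/10

print(from_float == from_string)              # False
print(from_string, from_decimal)              # 1/10 1/10

# limit_denominator() finds the closest fraction whose denominator does not
# exceed the bound, which often recovers the intended value from a float.
recovered = from_float.limit_denominator(1000)
print(recovered)                              # 1/10
```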
Performing arithmetic without precision loss
Arithmetic operations on Fraction objects automatically handle common denominators and simplification, ensuring that the result is always an exact rational number.
from fractions import Fraction
from decimal import Decimal
# Demonstrating precision with fractions
third = Fraction(1, 3)
sum_of_thirds = third + third + third
print(f"Sum of three 1/3s using Fraction: {sum_of_thirds}") # Output: 1
# Compare with floats and Decimals
float_sum = 1/3 + 1/3 + 1/3
print(f"Sum of three 1/3s using float: {float_sum}") # Output: 1.0 (the rounding errors happen to cancel here; this is not guaranteed in general)
decimal_sum = Decimal('1') / Decimal('3') + Decimal('1') / Decimal('3') + Decimal('1') / Decimal('3')
print(f"Sum of three 1/3s using Decimal (default prec=28): {decimal_sum}") # Output: 0.9999999999999999999999999999
# Basic operations
f_a = Fraction('1/2')
f_b = Fraction('1/3')
print(f"\n{f_a} + {f_b} = {f_a + f_b}") # Output: 5/6
print(f"{f_a} - {f_b} = {f_a - f_b}") # Output: 1/6
print(f"{f_a} * {f_b} = {f_a * f_b}") # Output: 1/6
print(f"{f_a} / {f_b} = {f_a / f_b}") # Output: 3/2
| Type | Exactness |
|---|---|
| float | Low (binary approximation) |
| Decimal | High (decimal approximation, configurable precision) |
| Fraction | Perfect (rational numbers only) |
4. Phase 3: Probability and Randomness
Randomness plays a pivotal role in many computational tasks, from simulations and games to cryptographic security and statistical sampling. Python provides two primary modules for generating random numbers: the random module for general-purpose pseudorandomness and the secrets module for cryptographically strong random numbers.
Deterministic algorithms cannot produce truly random numbers. Instead, they produce pseudorandom numbers: starting from a "seed" value, they generate a sequence that appears random but is entirely predictable if the seed is known. For most simulations and games, this is perfectly adequate. Security-critical applications need cryptographically strong random numbers, which are designed to be infeasible to predict.
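The predictability described here is easy to observe. The sketch below creates two independent random.Random generators with the same seed (the value 2024 is arbitrary) and shows that they emit the same sequence.

```python
import random

# Two independent generators seeded with the same (arbitrary) value.
gen_a = random.Random(2024)
gen_b = random.Random(2024)

seq_a = [gen_a.randint(1, 100) for _ in range(5)]
seq_b = [gen_b.randint(1, 100) for _ in range(5)]

print(seq_a)
print(seq_b)
print(seq_a == seq_b)  # True: same seed, same sequence
```

Anyone who learns the seed can regenerate every value, which is why this kind of generator suits simulations but not secrets.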
4.1. Pseudorandom Generators (random module)
The random module implements pseudorandom number generators for various distributions. It's built upon the Mersenne Twister algorithm, which is fast and generates high-quality pseudorandom numbers suitable for most non-cryptographic purposes.
Do NOT use the random module for generating security-sensitive data like passwords, encryption keys, or authentication tokens. For such purposes, use the secrets module.
Basic Random Number Generation
- 🔑 random.randint(a, b): Returns a random integer N such that a <= N <= b. Note that both endpoints are inclusive.
- 🔑 random.uniform(a, b): Returns a random floating-point number N such that a <= N <= b for a <= b and b <= N <= a for b < a.
- 🔑 random.random(): Returns a random float r in the range [0.0, 1.0) (i.e., 0.0 <= r < 1.0).
import random
# Random integer between 1 and 10 (inclusive)
print(f"Random integer (1-10): {random.randint(1, 10)}")
# Random float between 0.0 and 1.0
print(f"Random float (0.0-1.0): {random.random():.4f}")
# Random float between 10.0 and 20.0
print(f"Random uniform (10.0-20.0): {random.uniform(10.0, 20.0):.4f}")
Sequence Operations
The random module also provides functions to pick elements from sequences (lists, tuples, strings).
- 🔑 random.choice(seq): Returns a randomly selected element from the non-empty sequence seq.
- 🔑 random.choices(population, weights=None, k=1): Returns a list of k elements chosen from population with replacement. If weights is provided, it's a list of relative weights for each item.
- 🔑 random.sample(population, k): Returns a list of k unique elements chosen from the population sequence without replacement. The size of the sample k must be less than or equal to the population size.
- 🔑 random.shuffle(x): Shuffles the sequence x in place.
import random
my_list = ['apple', 'banana', 'cherry', 'date']
# Single item selection
print(f"Random choice from list: {random.choice(my_list)}")
# Weighted selection (with replacement)
# 'banana' is twice as likely as 'apple', 'cherry', 'date'
weighted_choices = random.choices(my_list, weights=[10, 20, 10, 10], k=3)
print(f"Weighted choices (k=3): {weighted_choices}")
# Selection without replacement
sample_items = random.sample(my_list, k=2)
print(f"Random sample (k=2): {sample_items}")
# Shuffling a list in place
numbers = [1, 2, 3, 4, 5]
random.shuffle(numbers)
print(f"Shuffled numbers: {numbers}")
Generator State Management
Pseudorandom generators produce sequences deterministically based on an initial "seed." By saving and restoring the generator's internal state, you can reproduce the exact same sequence of "random" numbers. This is incredibly useful for debugging, testing, and ensuring reproducibility in scientific simulations.
- 🔑 random.getstate(): Returns an object capturing the current internal state of the generator.
- 🔑 random.setstate(state): Restores the internal state of the generator to the state obtained from a previous call to getstate().
- 🔑 random.seed(a=None, version=2): Initializes the random number generator. If a is omitted or None, the generator is seeded from an operating-system randomness source when one is available, and from the current system time otherwise.
import random
# Set an initial seed for reproducibility
random.seed(123)
print("First sequence:")
for _ in range(3):
    print(random.randint(1, 100))
# Save the current state
current_state = random.getstate()
print("\nSecond sequence (continues from previous):")
for _ in range(3):
    print(random.randint(1, 100))
# Restore the saved state
random.setstate(current_state)
print("\nThird sequence (reproduced from saved state):")
for _ in range(3):
    print(random.randint(1, 100))
# The numbers in the 'Second sequence' and 'Third sequence' will be identical.
4.2. Probability Distributions
The random module can generate numbers following specific statistical distributions, which is essential for simulations, modeling, and statistical analysis.
Generating numbers based on statistical models
- 🔑 random.gauss(mu, sigma): Returns a random floating-point number from a Gaussian (Normal) distribution. mu is the mean, and sigma is the standard deviation. This is generally preferred over normalvariate() for speed.
- Other distributions: random.expovariate(lambd) (exponential), random.lognormvariate(mu, sigma) (lognormal), random.vonmisesvariate(mu, kappa) (circular), etc.
import random
# import matplotlib.pyplot as plt # Uncomment together with the plotting lines below if matplotlib is installed
# Generate 10,000 numbers from a Gaussian distribution
# Mean (mu) = 0, Standard Deviation (sigma) = 1 (Standard Normal Distribution)
gaussian_numbers = [random.gauss(0, 1) for _ in range(10000)]
# Plotting a histogram to visualize the distribution
# (Requires matplotlib: pip install matplotlib)
# plt.hist(gaussian_numbers, bins=50, density=True, alpha=0.7, color='skyblue')
# plt.title("Gaussian (Normal) Distribution Sample")
# plt.xlabel("Value")
# plt.ylabel("Density")
# plt.grid(True)
# plt.show()
The histogram above (if visualized) would show the characteristic bell-curve shape of a Normal distribution, demonstrating the statistical properties of the generated numbers.
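The same statistical properties can be checked numerically without a plotting library. The sketch below draws 10,000 samples from random.gauss(0, 1) and summarizes them with the statistics module; the seed value 0 is an arbitrary choice that only makes the run repeatable.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable
samples = [random.gauss(0, 1) for _ in range(10_000)]

sample_mean = statistics.mean(samples)
sample_stdev = statistics.stdev(samples)

# Share of samples within one standard deviation of the mean.
# For a normal distribution this is about 68%.
within_one_sigma = sum(1 for x in samples if abs(x) <= 1) / len(samples)

print(f"Sample mean:  {sample_mean:.3f}")   # close to 0
print(f"Sample stdev: {sample_stdev:.3f}")  # close to 1
print(f"Within 1 sigma: {within_one_sigma:.1%}")
```

With this many samples, the mean and standard deviation should land close to the requested mu and sigma, though never exactly on them.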
4.3. Secure Randomness (secrets module)
For applications requiring strong security, where unpredictability is paramount, the random module is inadequate. The secrets module is explicitly designed for cryptographic purposes, leveraging the operating system's most secure random number sources.
Comparison of SystemRandom and the secrets module
The random module includes a class called SystemRandom, which also uses os.urandom() for cryptographically secure random numbers. The secrets module is built on top of SystemRandom but provides a higher-level, more convenient API tailored for common security needs like token generation.
| Feature | random module | random.SystemRandom | secrets module |
|---|---|---|---|
| Randomness Source | Mersenne Twister (deterministic algorithm) | os.urandom() (OS-provided source) | os.urandom() (OS-provided source) |
| Security Level | Non-cryptographic | Cryptographically secure | Cryptographically secure |
| Use Cases | Simulations, games, non-security sampling | Lower-level access to secure random bytes/numbers | Password generation, API keys, security tokens, temporary URLs |
| API Style | General-purpose number generation | Instance methods (e.g., SystemRandom().randint()) | High-level, function-based for common security tasks |
For security-sensitive code, prefer the secrets module: its specialized, high-level functions reduce the chance of misuse.
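To make the comparison concrete, here is a small sketch that performs the same task, a secure roll of a six-sided die, through both routes. The die example is purely illustrative.

```python
import random
import secrets

# Lower-level route: a SystemRandom instance exposes the familiar
# random-module methods, backed by the operating system's source.
secure_rng = random.SystemRandom()
roll_system = secure_rng.randint(1, 6)

# Higher-level route: secrets offers purpose-built helpers.
# randbelow(n) returns an integer in the range [0, n).
roll_secrets = secrets.randbelow(6) + 1
token = secrets.token_hex(16)  # 16 random bytes as 32 hex characters

print(f"SystemRandom roll: {roll_system}")
print(f"secrets roll:      {roll_secrets}")
print(f"secrets token:     {token}")
```

Note that seeding has no effect on either route: the values come from the operating system and cannot be reproduced.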
Generating secure tokens and URLs
The secrets module provides functions tailored for generating data suitable for security applications.
- 🔑 secrets.token_bytes(nbytes=None): Returns a random byte string containing nbytes random bytes. If nbytes is None or not supplied, a reasonable default is used.
- 🔑 secrets.token_hex(nbytes=None): Returns a random text string in hexadecimal, suitable for a secure token.
- 🔑 secrets.token_urlsafe(nbytes=None): Returns a random URL-safe text string, containing nbytes random bytes.
- 🔑 secrets.choice(sequence): Returns a randomly chosen element from a non-empty sequence using a cryptographically secure random source.
import secrets
# Generate a secure hexadecimal token (e.g., for API keys)
hex_token = secrets.token_hex(16) # 16 bytes = 32 hex characters
print(f"Secure Hex Token: {hex_token}")
# Generate a URL-safe token (e.g., for password reset links)
url_safe_token = secrets.token_urlsafe(24) # 24 bytes base64-encoded
print(f"URL-Safe Token: {url_safe_token}")
# Choose a secure random item from a list (e.g., for a temporary password character)
characters = ['a', 'b', 'c', '1', '2', '3', '!', '@']
secure_char_choice = secrets.choice(characters)
print(f"Secure random character: {secure_char_choice}")
# Generate a temporary password
temp_password_chars = [secrets.choice('abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()') for _ in range(12)]
temp_password = "".join(temp_password_chars)
print(f"Generated temp password: {temp_password}")
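The nbytes argument counts random bytes, not output characters, so the three token functions return different lengths for the same argument. The sketch below shows the relationship for 24 bytes, a value chosen only for illustration.

```python
import secrets

raw_token = secrets.token_bytes(24)      # 24 raw bytes
hex_token_24 = secrets.token_hex(24)     # 2 hex characters per byte
url_token_24 = secrets.token_urlsafe(24) # Base64: 4 characters per 3 bytes, padding stripped

print(len(raw_token), len(hex_token_24), len(url_token_24))  # 24 48 32
```

When a specific character count is required, work backwards from the encoding: halve the target for token_hex, and multiply it by 3/4 for token_urlsafe.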
Practice & Application
🎯 Challenge: Secure Lottery Simulation & OTP Generation
You are tasked with building a small system that handles two distinct randomness requirements:
Part 1: Lottery Draw (Non-Cryptographic)
Simulate a lottery draw where:
- Six unique "main numbers" are drawn from a pool of 1 to 49 (inclusive).
- One "bonus number" is drawn from the same pool (1 to 49), ensuring it is unique from the six main numbers.
- From a list of "lucky phrases" (e.g., ["Feeling Lucky!", "Big Winner!", "Keep Trying!", "Almost There!"]), select three phrases with replacement. Make "Big Winner!" twice as likely to be chosen as any other phrase.
Part 2: Secure One-Time Passcode (Cryptographic)
For a user authentication system, generate:
- A 12-character hexadecimal one-time password (OTP) that must be cryptographically secure.
- A 32-character URL-safe string for a password reset link.
import random
import secrets
# Part 1: Lottery Draw (Non-Cryptographic)
# 1. Draw six unique main numbers from 1 to 49
pool = list(range(1, 50))
main_numbers = random.sample(pool, k=6)
main_numbers.sort() # Sort for display purposes
print(f"Lottery Main Numbers: {main_numbers}")
# 2. Draw one bonus number, unique from the main numbers
# Remove main numbers from the pool for bonus draw
remaining_pool = [n for n in pool if n not in main_numbers]
bonus_number = random.choice(remaining_pool)
print(f"Lottery Bonus Number: {bonus_number}")
# 3. Select three lucky phrases with weighted probability
lucky_phrases = ["Feeling Lucky!", "Big Winner!", "Keep Trying!", "Almost There!"]
# Assign weights: Big Winner is 2, others are 1
weights = [1, 2, 1, 1]
selected_phrases = random.choices(lucky_phrases, weights=weights, k=3)
print(f"Lucky Dip Phrases: {selected_phrases}")
print("\n" + "="*40 + "\n")
# Part 2: Secure One-Time Passcode (Cryptographic)
# 1. Generate a 12-character hexadecimal OTP
# Each byte becomes 2 hex characters, so 6 bytes for 12 hex chars
otp = secrets.token_hex(6)
print(f"Secure One-Time Password (OTP): {otp}")
# 2. Generate a 32-character URL-safe string
# token_urlsafe() produces base64 strings, which are ~4/3 the length of bytes
# For 32 chars, we need roughly 32 * 3/4 = 24 bytes
recovery_link_token = secrets.token_urlsafe(24)
print(f"Password Reset Token (URL-safe): {recovery_link_token}")
5. Phase 4: Statistical Analysis
Understanding and summarizing data is a fundamental aspect of many fields, including science, finance, and social studies. Python's statistics module provides a straightforward way to calculate common mathematical statistics of numerical data. While not as extensive as specialized libraries like NumPy or SciPy, it's excellent for basic descriptive statistics on collections of real numbers.
5.1. Measures of Central Tendency (statistics module)
Measures of central tendency aim to describe the "center" or typical value of a dataset. They provide a single value that represents the entire distribution.
Calculating the "center" of data
- 🔑 statistics.mean(data): Returns the arithmetic mean ("average") of the data. It's the sum of all values divided by the count of values. Suitable for symmetrically distributed data without extreme outliers.
- 🔑 statistics.median(data): Returns the median (middle value) of the data. If the data count is odd, it's the middle element. If even, it's the average of the two middle elements. The median is robust to outliers.
- 🔑 statistics.mode(data): Returns the single most common data point from discrete or nominal data. If several values tie for the highest frequency, Python 3.8 and later return the first one encountered (earlier versions raised StatisticsError). Raises StatisticsError if data is empty.
import statistics
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [1, 1, 2, 3, 4, 5, 5, 5, 6, 7]
data3 = [10, 20, 30, 40, 1000] # Data with an outlier
print(f"Data 1: {data1}")
print(f" Mean: {statistics.mean(data1)}") # 5.5
print(f" Median: {statistics.median(data1)}") # 5.5 (average of 5 and 6)
try:
    print(f" Mode: {statistics.mode(data1)}") # 1 on Python 3.8+ (all values tie, so the first is returned)
except statistics.StatisticsError as e:
    print(f" Mode: {e}") # Python < 3.8 only: no unique mode
print(f"\nData 2: {data2}")
print(f" Mean: {statistics.mean(data2)}") # 3.9
print(f" Median: {statistics.median(data2)}") # 4.5 (average of 4 and 5)
print(f" Mode: {statistics.mode(data2)}") # 5
print(f"\nData 3 (with outlier): {data3}")
print(f" Mean: {statistics.mean(data3)}") # (10+20+30+40+1000)/5 = 220 (heavily influenced by 1000)
print(f" Median: {statistics.median(data3)}") # 30 (more robust to outlier)
print(f" Mode: {statistics.mode(data3)}") # 10 on Python 3.8+ (all values tie; older versions raise StatisticsError)
Handling multi-modal data
When a dataset has two or more values that occur with the same highest frequency, it is called multi-modal. In this scenario statistics.mode() reports only the first such value on Python 3.8 and later, and raises StatisticsError on earlier versions. To obtain every mode, use statistics.multimode() (added in Python 3.8).
- 🔑 statistics.multimode(data): Returns a list of the most frequently occurring values in the order they were first encountered in the data. If no unique mode exists, it will return all values that have the highest frequency.
import statistics
data_multimodal = [1, 2, 2, 3, 3, 4, 5]
# mode() reports only the first of the tied values (Python 3.8+; older versions raise StatisticsError)
try:
    print(f"Mode for {data_multimodal}: {statistics.mode(data_multimodal)}") # Output: 2
except statistics.StatisticsError as e:
    print(f"statistics.mode() error for {data_multimodal}: {e}") # Python < 3.8 only
# Using multimode() handles it correctly
print(f"Multimode for {data_multimodal}: {statistics.multimode(data_multimodal)}") # Output: [2, 3]
data_no_mode = [1, 2, 3, 4, 5]
print(f"Multimode for {data_no_mode}: {statistics.multimode(data_no_mode)}") # Output: [1, 2, 3, 4, 5] (all unique, so all are modes)
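One way to see what multimode() reports is to count occurrences by hand. The sketch below does so with collections.Counter; the helper name all_modes is hypothetical and is not part of the statistics module.

```python
from collections import Counter
import statistics

def all_modes(data):
    """Return every value that attains the highest frequency, in first-seen order."""
    counts = Counter(data)  # keeps first-seen order of the keys
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

tied_data = [1, 2, 2, 3, 3, 4, 5]
print(all_modes(tied_data))                                     # [2, 3]
print(all_modes(tied_data) == statistics.multimode(tied_data))  # True
```

When every value occurs once, the highest frequency is 1 and every value qualifies, which is why multimode() returns the whole list in that case.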
5.2. Measures of Dispersion
Measures of dispersion (or variability) describe how spread out the data points are. They quantify the range, variance, or standard deviation from the central tendency.
Analyzing data spread
- 🔑 statistics.variance(data, xbar=None): Calculates the sample variance of data. The variance measures the average of the squared differences from the mean. It's often used when analyzing a subset (sample) of a larger population.
- 🔑 statistics.stdev(data, xbar=None): Calculates the sample standard deviation of data. The standard deviation is the square root of the variance and provides a measure of spread in the original units of the data.
import statistics
data = [60, 65, 70, 75, 80] # Scores of a sample of students
# Calculate mean first (or let functions do it)
mean_data = statistics.mean(data)
print(f"Data: {data}")
print(f"Mean: {mean_data}")
# Sample Variance
sample_variance = statistics.variance(data)
print(f"Sample Variance: {sample_variance:.2f}") # 62.50
# Sample Standard Deviation
sample_stdev = statistics.stdev(data)
print(f"Sample Standard Deviation: {sample_stdev:.2f}") # 7.91
Distinction between Population and Sample calculations
A crucial distinction in statistics is whether you are analyzing an entire population or just a sample from that population.
- Population: Refers to the entire group of individuals or instances about which we want to draw conclusions.
- Sample: A subset of the population used to infer characteristics of the whole population.
The formulas for variance and standard deviation differ slightly depending on whether you're working with a population or a sample. Specifically, sample variance and standard deviation use a denominator of n-1 (Bessel's correction) instead of n to provide an unbiased estimate of the population variance.
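Written out in standard notation (the symbols are textbook conventions, chosen here to match the mu and xbar parameter names), the two variance formulas for n values with population mean μ or sample mean x̄ are:

```latex
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
```

The corresponding standard deviations are the square roots, σ and s. The sum of squared deviations is computed the same way in both cases; only the divisor changes.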
- 🔑 Population:
  - statistics.pvariance(data, mu=None): Calculates the population variance of data.
  - statistics.pstdev(data, mu=None): Calculates the population standard deviation of data.
- 🔑 Sample:
  - statistics.variance(data, xbar=None): Calculates the sample variance of data. (Uses n-1 in denominator)
  - statistics.stdev(data, xbar=None): Calculates the sample standard deviation of data. (Uses n-1 in denominator)
| Measure | Population Function | Sample Function | Denominator for Variance |
|---|---|---|---|
| Variance | statistics.pvariance() | statistics.variance() | Population: n; Sample: n-1 |
| Standard Deviation | statistics.pstdev() | statistics.stdev() | Population: n; Sample: n-1 |
When estimating population variance from a sample, deviations are measured from the sample mean, which sits closer to the sample's own values than the true population mean does. Dividing the sum of squared deviations by n (where n is the sample size) therefore tends to underestimate the population variance; dividing by n-1 corrects for this. The correction is known as Bessel's correction and is automatically applied by statistics.variance() and statistics.stdev().
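The effect of the divisor can be reproduced by hand. The sketch below computes the sum of squared deviations once and divides it both ways, then compares the results with the statistics functions; the five scores are illustrative data.

```python
import statistics

scores = [60, 65, 70, 75, 80]
n = len(scores)
mean = sum(scores) / n                                # 70.0
sum_squares = sum((x - mean) ** 2 for x in scores)    # 250.0

divided_by_n = sum_squares / n                # population formula: 50.0
divided_by_n_minus_1 = sum_squares / (n - 1)  # sample formula:     62.5

print(divided_by_n == statistics.pvariance(scores))         # True
print(divided_by_n_minus_1 == statistics.variance(scores))  # True
```

The gap between the two results shrinks as n grows, since n and n-1 become proportionally closer.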
import statistics
# Assume this represents an entire small population
population_data = [10, 15, 20, 25, 30]
# Assume this is a sample drawn from a larger population
sample_data = [12, 18, 22, 28]
print(f"Population Data: {population_data}")
print(f" Population Variance: {statistics.pvariance(population_data):.2f}") # (50.00, using n=5)
print(f" Population StDev: {statistics.pstdev(population_data):.2f}") # (7.07, using n=5)
print(f"\nSample Data: {sample_data}")
print(f" Sample Variance: {statistics.variance(sample_data):.2f}") # (45.33, using n-1=3)
print(f" Sample StDev: {statistics.stdev(sample_data):.2f}") # (6.73, using n-1=3)
# To illustrate the difference for the *same* data
# If we treat population_data as a sample:
print(f"\nPopulation Data treated as Sample:")
print(f" Sample Variance: {statistics.variance(population_data):.2f}") # (62.50, using n-1=4)
print(f" Sample StDev: {statistics.stdev(population_data):.2f}") # (7.91, using n-1=4)
Practice & Application
🎯 Challenge: Analyzing Student Exam Scores
You have been provided with a list of exam scores for a group of students. Your task is to perform a basic statistical analysis using Python's statistics module.
Use the following dataset: exam_scores = [78, 85, 92, 78, 95, 88, 80, 92, 78, 85, 90, 92, 65, 100]
- Calculate and print the mean of the exam scores.
- Calculate and print the median of the exam scores.
- Find the mode using statistics.mode(). On Python 3.8 and later it returns only the first of several tied values (older versions raise StatisticsError instead); explain why that answer is incomplete for this dataset, then use statistics.multimode() to find all of the most frequent scores.
- Assuming these scores represent a sample of student performance, calculate and print the sample variance and sample standard deviation.
- Now, assuming these scores represent the entire population of a very small class, calculate and print the population variance and population standard deviation.
- Compare the sample and population standard deviation values and briefly explain the reason for any difference.
import statistics
exam_scores = [78, 85, 92, 78, 95, 88, 80, 92, 78, 85, 90, 92, 65, 100]
print(f"Exam Scores: {exam_scores}")
print(f"Number of scores: {len(exam_scores)}")
# 1. Calculate the mean
mean_score = statistics.mean(exam_scores)
print(f"1. Mean Score: {mean_score:.2f}")
# 2. Calculate the median
median_score = statistics.median(exam_scores)
print(f"2. Median Score: {median_score:.2f}")
# 3. Find the mode(s)
print("\n3. Mode(s):")
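try:
    single_mode = statistics.mode(exam_scores)
    # Python 3.8+ returns the first of several tied modes instead of raising
    print(f" First Mode (statistics.mode()): {single_mode}")
except statistics.StatisticsError as e:
    # Python < 3.8 raises when there is no unique mode
    print(f" Error with statistics.mode(): {e}")
# 78 and 92 both appear three times, so report every mode
multi_modes = statistics.multimode(exam_scores)
print(f" All Modes (statistics.multimode()): {multi_modes}")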
# 4. Calculate sample variance and standard deviation
print("\n4. Sample Statistics:")
sample_variance = statistics.variance(exam_scores)
print(f" Sample Variance: {sample_variance:.2f}")
sample_stdev = statistics.stdev(exam_scores)
print(f" Sample Standard Deviation: {sample_stdev:.2f}")
# 5. Calculate population variance and standard deviation
print("\n5. Population Statistics:")
population_variance = statistics.pvariance(exam_scores)
print(f" Population Variance: {population_variance:.2f}")
population_stdev = statistics.pstdev(exam_scores)
print(f" Population Standard Deviation: {population_stdev:.2f}")
# 6. Comparison and Explanation
print("\n6. Comparison and Explanation:")
print(f"The Sample Standard Deviation ({sample_stdev:.2f}) is higher than the Population Standard Deviation ({population_stdev:.2f}).")
print("This is because the sample standard deviation (and variance) uses Bessel's correction (dividing by n-1 instead of n).")
print("Bessel's correction accounts for the fact that a sample's variability tends to underestimate the true variability of the larger population from which it was drawn, providing a less biased estimate.")
6. Phase 5: Data Integrity and Hashing
In the digital world, ensuring the integrity of data and the security of sensitive information like passwords is paramount. Cryptographic hash functions provide a robust mechanism for achieving these goals. Python's hashlib module offers a comprehensive suite of secure hash algorithms.
6.1. Cryptographic Hashing (hashlib module)
Understanding "One-way" hash functions
A cryptographic hash function is a mathematical algorithm that maps data of arbitrary size (the "message") to a bit array of a fixed size (the "hash value," "hash," or "message digest"). These functions are specifically designed to be "one-way" and possess several critical properties:
- ✅ Deterministic: The same input message always produces the same hash value.
- ✅ Quick Computation: It's computationally efficient to compute the hash value for any given message.
- ✅ Pre-image Resistance (One-Way Property): Given a hash value, it's computationally infeasible to find the original input message.
- ✅ Second Pre-image Resistance: Given an input message M1, it's computationally infeasible to find a different input message M2 that has the same hash value as M1.
- ✅ Collision Resistance: It's computationally infeasible to find two different input messages M1 and M2 that produce the same hash value.
Common applications include data integrity verification (if a file's hash changes, the file has been altered), digital signatures, and secure password storage.
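Several of these properties can be observed directly. The sketch below uses SHA-256 from hashlib to show determinism, the fixed output size, and how a one-character change in the input produces an unrelated digest; the helper name sha256_hex and the sample messages are illustrative.

```python
import hashlib

def sha256_hex(text):
    """Hash a string with SHA-256 and return the hex digest (illustrative helper)."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()

first = sha256_hex("transfer $100 to Alice")
repeat = sha256_hex("transfer $100 to Alice")
altered = sha256_hex("transfer $900 to Alice")
large = sha256_hex("x" * 1_000_000)

print(first == repeat)         # True: deterministic
print(len(first), len(large))  # 64 64: fixed-size output regardless of input size
print(first == altered)        # False: a small change alters the digest

# Count how many of the 64 hex positions differ between the two digests.
differing = sum(1 for a, b in zip(first, altered) if a != b)
print(f"Differing hex positions: {differing} of 64")
```

This is the basis of integrity checking: publishing the digest of a file lets anyone detect a modification by recomputing and comparing.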
Implementing various hash algorithms
The hashlib module provides constructors for various common hash algorithms. To use them, you typically create a hash object, feed it bytes (strings must be encoded), and then retrieve the digest in hexadecimal or binary format.
- 🔑 SHA-2 (Secure Hash Algorithm 2): A family of cryptographic hash functions published by NIST. SHA-256 is commonly used.
- 🔑 SHA-3 (Secure Hash Algorithm 3): The latest generation of the Secure Hash Algorithm, chosen through a public competition. It is built on a different internal construction from SHA-2, so a weakness found in one family need not affect the other.
- 🔑 BLAKE2: A cryptographically secure hash function that is typically faster than SHA-2 and SHA-3 in software on modern processors, while still offering similar security strength.
import hashlib
data = "This is a secret message to be hashed.".encode('utf-8') # Input must be bytes!
# SHA-256
sha256_hash_obj = hashlib.sha256()
sha256_hash_obj.update(data)
sha256_digest = sha256_hash_obj.hexdigest()
print(f"SHA-256 Hash: {sha256_digest}")
print(f"Length (chars): {len(sha256_digest)}") # 256 bits = 64 hex characters
# SHA-3 (e.g., SHA-3 256-bit variant)
sha3_hash_obj = hashlib.sha3_256()
sha3_hash_obj.update(data)
sha3_digest = sha3_hash_obj.hexdigest()
print(f"SHA-3 256-bit Hash: {sha3_digest}")
print(f"Length (chars): {len(sha3_digest)}") # 256 bits = 64 hex characters
# BLAKE2s (optimized for 8- to 32-bit platforms, 256-bit digest)
blake2s_hash_obj = hashlib.blake2s()
blake2s_hash_obj.update(data)
blake2s_digest = blake2s_hash_obj.hexdigest()
print(f"BLAKE2s Hash: {blake2s_digest}")
print(f"Length (chars): {len(blake2s_digest)}") # 256 bits = 64 hex characters
# BLAKE2b (optimized for 64-bit platforms, 512-bit digest)
blake2b_hash_obj = hashlib.blake2b()
blake2b_hash_obj.update(data)
blake2b_digest = blake2b_hash_obj.hexdigest()
print(f"BLAKE2b Hash: {blake2b_digest}")
print(f"Length (chars): {len(blake2b_digest)}") # 512 bits = 128 hex characters
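Because update() can be called repeatedly, large inputs can be hashed piece by piece instead of being loaded into memory at once. The sketch below, which splits the message into arbitrary chunks for illustration, shows that the incremental digest matches the one-shot digest.

```python
import hashlib

chunks = [b"This is a secret ", b"message ", b"to be hashed."]

# Feed the data piece by piece, as one would when reading a file in blocks.
incremental = hashlib.sha256()
for chunk in chunks:
    incremental.update(chunk)

# Hash the same bytes in a single call.
one_shot = hashlib.sha256(b"".join(chunks))

print(incremental.hexdigest() == one_shot.hexdigest())  # True

# Algorithms that hashlib guarantees on every platform.
print(sorted(hashlib.algorithms_guaranteed))
```

The same loop works for a file opened in binary mode, reading a fixed-size block on each iteration.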
Identifying and avoiding deprecated algorithms
The field of cryptography is constantly evolving. Algorithms once considered secure can become vulnerable due to new techniques for breaking codes (cryptanalysis) or increases in computing power. It's vital to stay current and avoid using outdated (deprecated) algorithms for new security-sensitive applications.
- ❌ MD5 (Message-Digest Algorithm 5): Once widely used, MD5 has been shown to be severely compromised by collision attacks. This means it is feasible to find two different inputs that produce the same MD5 hash, undermining its integrity verification and digital signature uses.
- ❌ SHA-1 (Secure Hash Algorithm 1): Similar to MD5, practical collision attacks have been demonstrated against SHA-1. While harder to execute than MD5 collisions, it is no longer considered safe for cryptographic purposes. Major browsers and security organizations have phased out support or flagged SHA-1 certificates as insecure.
These algorithms are broken and susceptible to attacks that could compromise data integrity, digital signatures, and password security. Always use modern, strong algorithms like SHA-256, SHA-3, or BLAKE2 for security-sensitive tasks.
6.2. Password Security
Storing user passwords requires special care. Directly hashing passwords, even with strong algorithms, is not enough. Attackers can employ techniques like rainbow tables and brute-force attacks. Two essential techniques mitigate these risks: salting and key derivation functions (KDFs).
Salting to prevent Rainbow Table attacks
A rainbow table is a precomputed table for reversing cryptographic hash functions, typically used to crack password hashes. If many users have common passwords (e.g., "password123"), their hashes will be identical. An attacker can compute the hash of "password123" once, then look for that hash across all stolen user hashes.
Salting prevents this by adding a unique, random string (the "salt") to each password *before* hashing. This means even if two users choose the same password, their combined (password + salt) inputs will be different, resulting in different hash values.
- 🔑 Salt: A random, unique string (or sequence of bytes) generated for each password.
- 🔑 Process: The password and its unique salt are concatenated, and then the combined string is hashed. The salt is stored in plain text alongside the hash.
- ✅ Benefit: A single rainbow table cannot be used to crack multiple passwords. Each password requires a new, independent hash calculation attempt.
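The effect of a salt can be seen in a few lines. The sketch below hashes the same password for two hypothetical users, first without and then with a per-user random salt. A single SHA-256 pass is used only to keep the illustration short; it is not a recommended way to store passwords.

```python
import hashlib
import os

password = "password123".encode('utf-8')

# Without a salt, identical passwords produce identical hashes.
unsalted_alice = hashlib.sha256(password).hexdigest()
unsalted_bob = hashlib.sha256(password).hexdigest()
print(unsalted_alice == unsalted_bob)  # True: one lookup cracks both accounts

# With a unique random salt per user, the same password hashes differently.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)
salted_alice = hashlib.sha256(salt_alice + password).hexdigest()
salted_bob = hashlib.sha256(salt_bob + password).hexdigest()
print(salted_alice == salted_bob)      # False: each hash must be attacked separately
```

The salt does not need to be secret. Its job is to make every stored hash unique, so precomputed tables lose their value.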
Key derivation to slow brute-force attempts
Even with salting, if an attacker gets access to hashed passwords, they can still try to brute-force individual password hashes by guessing passwords, salting them, and comparing the result to the stored hash. Modern CPUs are incredibly fast at computing standard hash functions, making brute-forcing weak passwords a significant threat.
Key Derivation Functions (KDFs) are specifically designed to be computationally expensive (slow) to execute. This intentional slowness makes brute-force attacks prohibitively time-consuming for attackers, even with powerful hardware.
- 🔑 PBKDF2 (Password-Based Key Derivation Function 2): A widely recommended KDF standard.
- 🔑 Work Factor (Iterations): KDFs like PBKDF2 involve repeating the hashing process many times (thousands or millions of "iterations"). This value should be adjusted over time as computational power increases.
- ✅ Benefit: Significantly increases the time and resources required for an attacker to test each password guess, making brute-force attacks impractical.
While standard hash functions like SHA-256 are fast and good for integrity checking, they are *not* designed to resist brute-force password cracking. KDFs like PBKDF2, bcrypt, or scrypt are purpose-built for password hashing by intentionally adding computational cost.
The hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None) function is Python's standard library implementation for PBKDF2.
- hash_name: The desired hash algorithm (e.g., 'sha256').
- password: The user's password (as bytes).
- salt: A unique, random salt (as bytes).
- iterations: The number of hashing rounds. This should be as high as feasible without impacting user experience too much (e.g., 100,000 to 600,000+).
- dklen: The desired length of the derived key (hash output).
import hashlib
import os
# User's raw password
password_raw = "mySuperSecretPassword123!"
# Generate a random salt for this user (should be unique for each user)
# A good practice is to use at least 16 bytes for the salt.
salt = os.urandom(16)
print(f"Generated Salt (hex): {salt.hex()}")
# Convert password to bytes
password_bytes = password_raw.encode('utf-8')
# Number of iterations (work factor) - adjust based on current hardware
# Start with at least 100,000, typically much higher (e.g., 400,000 or more)
iterations = 300_000 # Using underscores for readability
# Derived Key Length (e.g., 32 bytes for a 256-bit key)
dklen = 32
# Perform PBKDF2_HMAC
derived_key = hashlib.pbkdf2_hmac(
    'sha256', # Hash algorithm to use internally
    password_bytes, # The password as bytes
    salt, # The salt as bytes
    iterations, # The number of iterations
    dklen=dklen # The length of the derived key
)
# Store the salt and derived_key (hash) in your database
# Both should be stored as hex strings or base64 for text-based storage
print(f"Stored Hash (hex): {derived_key.hex()}")
# To verify a password:
# 1. Retrieve the stored_salt and stored_hash from the database for the user.
# 2. Get the new_password_attempt from the user.
# 3. Calculate new_derived_key = hashlib.pbkdf2_hmac('sha256', new_password_attempt.encode('utf-8'), stored_salt, iterations, dklen=dklen)
# 4. Compare new_derived_key == stored_hash (using a constant-time comparison if available, like secrets.compare_digest)
Always store password hashes using a combination of a unique salt and a strong Key Derivation Function like PBKDF2, bcrypt, or scrypt. The secrets module's compare_digest() function should be used for comparing hashes to prevent timing attacks.
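The verification steps listed in the comments above can be sketched as a pair of functions. The names hash_password and verify_password are hypothetical, and the default of 300,000 iterations simply mirrors the earlier example; a real system would also store the iteration count alongside the salt and hash so it can be raised later.

```python
import hashlib
import os
import secrets

def hash_password(password, iterations=300_000):
    """Return (salt, derived_key) for a new password."""
    salt = os.urandom(16)  # unique random salt per password
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'),
                              salt, iterations, dklen=32)
    return salt, key

def verify_password(attempt, stored_salt, stored_key, iterations=300_000):
    """Recompute the key from the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac('sha256', attempt.encode('utf-8'),
                                    stored_salt, iterations,
                                    dklen=len(stored_key))
    return secrets.compare_digest(candidate, stored_key)

stored_salt, stored_key = hash_password("mySuperSecretPassword123!")
print(verify_password("mySuperSecretPassword123!", stored_salt, stored_key))  # True
print(verify_password("wrong-guess", stored_salt, stored_key))                # False
```

The iteration count passed to verify_password must match the one used when the hash was created, otherwise a correct password will fail to verify.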
Homework / Challenges
📚 Homework 1: Exact Financial Calculations
You are managing a small investment portfolio and need to ensure absolute precision for all monetary values and share counts. Use the decimal and fractions modules to perform the following calculations:
- Initial Investment: You buy 150 shares of "TechCo" at $23.45 per share. Calculate the total initial investment. Represent all monetary values using Decimal and ensure calculations are done with a precision of 4 decimal places for currency.
- Stock Split: TechCo announces a 3-for-2 stock split. This means for every 2 shares you own, you now get 3. Calculate your new total number of shares using Fraction to maintain exactness, as share counts must be whole numbers.
- Profit Calculation: After the split, you sell 200 shares at $15.63 per share. Calculate the revenue from this sale using Decimal. Then, calculate your overall profit or loss by comparing your total revenue (from this sale, and assuming no other sales) against your initial investment.
- Remaining Shares: How many shares do you have left after the sale? Ensure this is an exact integer.
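from decimal import Decimal
from fractions import Fraction
# Note: getcontext().prec counts significant digits, not decimal places.
# Setting it to 4 would round 3517.50 to 3518, so round currency to
# 4 decimal places with quantize() instead.
FOUR_PLACES = Decimal('0.0001')
# 1. Initial Investment
shares_bought = Decimal('150')
price_per_share = Decimal('23.45')
initial_investment = (shares_bought * price_per_share).quantize(FOUR_PLACES)
print(f"1. Initial Investment: ${initial_investment}") # $3517.5000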
# 2. Stock Split
# Represent current shares as a Fraction to handle exact splits
current_shares_fraction = Fraction(int(shares_bought), 1)
split_ratio = Fraction(3, 2)
new_shares_fraction = current_shares_fraction * split_ratio
new_shares = int(new_shares_fraction) # Convert back to integer for actual share count
print(f"2. After 3-for-2 split: {new_shares} shares")
# 3. Profit Calculation
shares_sold = Decimal('200')
sale_price_per_share = Decimal('15.63')
revenue_from_sale = shares_sold * sale_price_per_share
print(f"3. Revenue from sale: ${revenue_from_sale}")
# Calculate overall profit/loss
# For simplicity, assuming the initial investment applies to all shares acquired pre-split
# and the cost basis for all new shares is effectively adjusted.
# A more complex scenario would require accounting for cost basis per share.
# Here, we compare total revenue from sold shares against a proportional part of the initial investment.
# This approach simplifies to (shares_sold / initial_shares) * initial_investment
# For a proper profit calculation, consider initial investment per share.
# However, a simpler profit/loss against *initial investment* for *sold shares* is a better exercise.
# Cost basis per share adjusted for split: original_price_per_share * (2/3)
cost_per_share_adjusted = price_per_share * Fraction(2,3) # Using Fraction for exactness
cost_of_sold_shares = shares_sold * Decimal(str(cost_per_share_adjusted.limit_denominator(1000))) # Convert Fraction to Decimal for multiplication
# limit_denominator is used here as an example to avoid extreme precision, could also just convert str(cost_per_share_adjusted)
# Let's simplify: compare total revenue against total initial investment for a rough estimate
# For a realistic profit, one needs to track the cost basis of the *specific* shares sold.
# A simpler approach: (revenue_from_sale / shares_sold) * new_shares_fraction.denominator
# The problem asks for *overall profit/loss* against *initial investment*.
# Let's consider the profit from the 200 shares sold based on their adjusted cost.
profit_loss = revenue_from_sale - (Decimal(str(cost_per_share_adjusted)) * shares_sold)
print(f" Profit/Loss from sold shares: ${profit_loss:.4f}")
# 4. Remaining Shares
remaining_shares = new_shares - int(shares_sold)
print(f"4. Remaining shares: {remaining_shares}")
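One detail of this exercise is easy to get wrong: the decimal context's prec setting limits the number of significant digits, not the number of digits after the decimal point. The short sketch below contrasts the two using the initial investment figure from this homework. It uses localcontext() so the global context is left untouched; the variable names are illustrative only.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP

amount = Decimal('150') * Decimal('23.45')
print(amount)  # 3517.50 (exact under the default 28-digit precision)

# With prec = 4 the result keeps only 4 significant digits.
with localcontext() as ctx:
    ctx.prec = 4
    rounded_sig = Decimal('150') * Decimal('23.45')
print(rounded_sig)  # 3518

# quantize() fixes the number of decimal places instead.
four_places = amount.quantize(Decimal('0.0001'), rounding=ROUND_HALF_UP)
print(four_places)  # 3517.5000
```

For currency, keeping the default precision and calling quantize() at the points where a value is reported or stored is usually the safer pattern.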
📚 Homework 2: Analyzing a Biased Coin Experiment
You suspect a coin is biased towards "Heads" (H). You decide to conduct an experiment by flipping the coin many times and then analyze the results statistically.
- Simulate Coin Flips: Write a function simulate_flips(num_flips, heads_prob, seed=None) that simulates num_flips coin flips. Each flip should have a heads_prob chance of being 'H' and (1 - heads_prob) chance of being 'T'. Use random.choices() for this. If a seed is provided, use random.seed() to initialize the generator for reproducibility.
- Experiment Setup: Call your function to simulate 1000 coin flips with heads_prob = 0.6. Store the results (a list of 'H' or 'T').
- Statistical Analysis:
  - Calculate the actual proportion of 'H' and 'T' in your simulation results.
  - Calculate the number of consecutive 'H' streaks of length 2 or more (e.g., 'HH', 'HHH', etc.) and 'T' streaks of length 2 or more.
  - (Optional: If you converted results to numbers, calculate mean/median).
- Reproducibility Test:
  - Run the simulate_flips function once with num_flips = 10, heads_prob = 0.5, and seed = 42. Print the results.
  - Save the state of the random generator after these 10 flips.
  - Generate 5 more flips without a seed.
  - Restore the saved state, and then generate those same 5 flips again. Confirm they are identical.
import random
from itertools import groupby
def simulate_flips(num_flips, heads_prob, seed=None):
    if seed is not None:
        random.seed(seed)
    outcomes = ['H', 'T']
    weights = [heads_prob, 1 - heads_prob]
    flips = random.choices(outcomes, weights=weights, k=num_flips)
    return flips
def count_streaks(flips, face, min_length=2):
    # groupby() yields one group per maximal run of identical flips,
    # so each streak is counted once regardless of its length.
    return sum(1 for value, run in groupby(flips)
               if value == face and len(list(run)) >= min_length)
# 2. Experiment Setup
num_flips_experiment = 1000
heads_probability = 0.6
experiment_results = simulate_flips(num_flips_experiment, heads_probability)
# 3. Statistical Analysis
count_H = experiment_results.count('H')
count_T = experiment_results.count('T')
prop_H = count_H / num_flips_experiment
prop_T = count_T / num_flips_experiment
print(f"--- Coin Flip Experiment ({num_flips_experiment} flips, P(H)={heads_probability}) ---")
print(f" Actual Heads: {count_H} ({prop_H:.2%})")
print(f" Actual Tails: {count_T} ({prop_T:.2%})")
# Count streaks of length 2 or more (each maximal run counts once)
num_h_streaks = count_streaks(experiment_results, 'H')
num_t_streaks = count_streaks(experiment_results, 'T')
print(f" 'H' streaks of length 2 or more: {num_h_streaks}")
print(f" 'T' streaks of length 2 or more: {num_t_streaks}")
# 4. Reproducibility Test
print("\n--- Reproducibility Test ---")
# Run first 10 flips with a seed
first_10_flips = simulate_flips(10, 0.5, seed=42)
print(f" First 10 flips (seeded): {''.join(first_10_flips)}")
# Save state
saved_state = random.getstate()
# Generate 5 more flips
next_5_flips_1 = simulate_flips(5, 0.5) # No seed, continues from last state
print(f" Next 5 flips (continued): {''.join(next_5_flips_1)}")
# Restore state
random.setstate(saved_state)
# Generate 5 more flips again (should be identical to next_5_flips_1)
next_5_flips_2 = simulate_flips(5, 0.5)
print(f" Next 5 flips (restored): {''.join(next_5_flips_2)}")
print(f" Are the two sets of 5 flips identical? {next_5_flips_1 == next_5_flips_2}")
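The reproducibility test above relies on the module-level generator, which any other code calling random functions also advances. As an alternative sketch, the same save-and-restore idea can be applied to a private random.Random instance. The function name simulate_flips_local and its rng parameter are hypothetical and not part of the exercise.

```python
import random

def simulate_flips_local(num_flips, heads_prob, rng):
    """Flip a coin using the supplied random.Random instance (hypothetical variant)."""
    return rng.choices(['H', 'T'], weights=[heads_prob, 1 - heads_prob], k=num_flips)

rng = random.Random(42)            # private generator, independent of the global one
first_10 = simulate_flips_local(10, 0.5, rng)

state = rng.getstate()             # snapshot after the first 10 flips
next_5_a = simulate_flips_local(5, 0.5, rng)

random.random()                    # using the global generator does not affect rng
rng.setstate(state)                # rewind to the snapshot
next_5_b = simulate_flips_local(5, 0.5, rng)

print(next_5_a == next_5_b)  # True
```

Passing the generator in explicitly keeps an experiment reproducible even when unrelated code draws random numbers in between.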
📚 Homework 3: File Integrity and Secure Password Storage
This challenge combines concepts from data integrity and password security using the hashlib and secrets modules.
- File Integrity Check:
  - Create a text file named document.txt with some arbitrary content (e.g., "This is the original document.").
  - Calculate its SHA-256 hash. Print the hash.
  - Modify document.txt (e.g., add "Updated content.") and calculate its SHA-256 hash again. Print the new hash and observe the difference.
- Secure Password Storage Simulation:
  - Define a dummy user password (e.g., "StrongPa$$w0rd!").
  - Generate a unique, cryptographically secure 16-byte salt using os.urandom(). Print this salt in hexadecimal format.
  - Hash the password using hashlib.pbkdf2_hmac() with 'sha256', your generated salt, and at least 250,000 iterations. Use a derived key length (dklen) of 32 bytes. Print the resulting hash in hexadecimal format.
- Password Verification Simulation:
  - Simulate a login attempt with the correct password. Use the stored salt and iterations to re-hash the submitted password. Compare the newly generated hash with the stored hash using secrets.compare_digest(). Print whether the login was successful.
  - Simulate a login attempt with an incorrect password (e.g., "WrongPa$$w0rd!"). Repeat the hashing and comparison process. Print whether this login was successful.
import hashlib
import os
import secrets # Used for compare_digest
# --- Part 1: File Integrity Check ---
file_name = "document.txt"
original_content = "This is the original document. It contains important information."
# Create the original file
with open(file_name, "w") as f:
    f.write(original_content)
print("1. File Integrity Check:")
# Calculate hash of the original file
with open(file_name, "rb") as f:
    original_file_bytes = f.read()
original_hash = hashlib.sha256(original_file_bytes).hexdigest()
print(f" Original '{file_name}' SHA-256 hash: {original_hash}")
# Modify the file
modified_content = original_content + "\nUpdated content added later."
with open(file_name, "w") as f:
    f.write(modified_content)
# Calculate hash of the modified file
with open(file_name, "rb") as f:
    modified_file_bytes = f.read()
modified_hash = hashlib.sha256(modified_file_bytes).hexdigest()
print(f" Modified '{file_name}' SHA-256 hash: {modified_hash}")
print(f" Hashes are different: {original_hash != modified_hash}")
# Clean up the dummy file
os.remove(file_name)
# --- Part 2: Secure Password Storage Simulation ---
print("\n2. Secure Password Storage Simulation:")
# Dummy user password
user_password_raw = "StrongPa$$w0rd!"
# Generate a unique, cryptographically secure salt (16 bytes)
stored_salt = os.urandom(16)
print(f" Generated Salt (hex): {stored_salt.hex()}")
# Convert password to bytes
password_bytes = user_password_raw.encode('utf-8')
# Number of iterations for PBKDF2 (adjust for desired work factor)
iterations = 250_000
# Desired derived key length (32 bytes = 256 bits)
dklen = 32
# Hash the password
stored_hash = hashlib.pbkdf2_hmac(
    'sha256',
    password_bytes,
    stored_salt,
    iterations,
    dklen=dklen
)
print(f" Stored Hash (hex): {stored_hash.hex()}")
# --- Part 3: Password Verification Simulation ---
print("\n3. Password Verification Simulation:")
# --- Correct password attempt ---
login_attempt_correct = "StrongPa$$w0rd!"
attempt_password_bytes_correct = login_attempt_correct.encode('utf-8')
# Re-hash the submitted password with the stored salt and iterations
attempt_hash_correct = hashlib.pbkdf2_hmac(
    'sha256',
    attempt_password_bytes_correct,
    stored_salt,
    iterations,
    dklen=dklen
)
# Compare hashes using secrets.compare_digest for security against timing attacks
if secrets.compare_digest(stored_hash, attempt_hash_correct):
    print(f" Login with correct password '{login_attempt_correct}': SUCCESS")
else:
    print(f" Login with correct password '{login_attempt_correct}': FAILED (ERROR!)")
# --- Incorrect password attempt ---
login_attempt_incorrect = "WrongPa$$w0rd!"
attempt_password_bytes_incorrect = login_attempt_incorrect.encode('utf-8')
# Re-hash the submitted password with the stored salt and iterations
attempt_hash_incorrect = hashlib.pbkdf2_hmac(
    'sha256',
    attempt_password_bytes_incorrect,
    stored_salt,
    iterations,
    dklen=dklen
)
# Compare hashes
if secrets.compare_digest(stored_hash, attempt_hash_incorrect):
    print(f" Login with incorrect password '{login_attempt_incorrect}': SUCCESS (ERROR!)")
else:
    print(f" Login with incorrect password '{login_attempt_incorrect}': FAILED")
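The file integrity part of this solution reads the whole file into memory before hashing, which is fine for a short text file. For larger files, a hash object can be fed in pieces through update(). The sketch below shows one way to do that; the helper name sha256_of_file and the 64 KiB chunk size are assumptions chosen for illustration.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Hash a file incrementally so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a temporary file that spans several chunks
data = b"This is the original document." * 10_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    tmp_path = tmp.name

chunked_hash = sha256_of_file(tmp_path)
os.remove(tmp_path)
print(chunked_hash == hashlib.sha256(data).hexdigest())  # True
```

Feeding the data in chunks produces the same digest as hashing it in one call, so the two approaches are interchangeable for integrity checks.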