Python Math Modules Precision, Randomness, Statistics Guide

1. Introduction to Python's Mathematical Modules

Python's standard library provides a rich ecosystem of modules for mathematical work, from basic arithmetic and complex numbers to descriptive statistics, precision control, and cryptographic hashing. Understanding these modules is crucial for developing robust and accurate applications in scientific computing, data analysis, finance, security, and many other domains. This section introduces the core mathematical modules available out of the box in Python.

Overview of the Python math module

The math module provides access to common mathematical functions and constants. It operates on real numbers (int and float arguments), most of its functions return floats, and it largely wraps the mathematical functions defined by the C standard.

  • Purpose: Standard mathematical functions for real numbers.
  • 🔑 Key Features: Trigonometric, logarithmic, exponential functions, constants (pi, e, inf, nan), number theory functions (gcd, factorial).
  • Limitation: Does not support complex numbers.

Overview of the Python cmath module

The cmath module provides mathematical functions for complex numbers. Its functions typically return complex numbers, even if the result could be expressed as a real number.

  • Purpose: Mathematical functions specifically designed for complex numbers.
  • 🔑 Key Features: Complex trigonometric, logarithmic, exponential functions, phase, polar and rectangular coordinate conversions.
  • Limitation: For real-only calculations, math is usually the better choice. It returns plain floats and raises an error for out-of-domain inputs, whereas cmath silently switches to complex results.

Overview of the Python decimal module

The decimal module offers a Decimal floating-point arithmetic implementation, designed for situations where exact decimal representation is needed, such as financial calculations, to avoid floating-point inaccuracies.

  • Purpose: Arbitrary-precision decimal floating-point arithmetic.
  • 🔑 Key Features: Exact representation of decimal numbers, configurable precision, controlled rounding, avoidance of common binary floating-point issues.
  • Trade-off: Slower performance compared to native float operations.

Overview of the Python fractions module

The fractions module provides support for rational number arithmetic. A Fraction instance can be constructed from a pair of integers, a float, or a string.

  • Purpose: Exact rational number arithmetic (numerator and denominator).
  • 🔑 Key Features: Represents numbers as fractions, no loss of precision in arithmetic operations, useful for exact results in mathematics.
  • Trade-off: Can lead to very large numerators and denominators for complex calculations, potentially impacting performance.

Overview of the Python random module

The random module implements pseudo-random number generators for various distributions. It is suitable for simulations, games, and non-cryptographic applications.

  • Purpose: Generate pseudo-random numbers for simulations and non-security-sensitive tasks.
  • 🔑 Key Features: Integers (randint), floats (uniform), sequence selection (choice, sample), various probability distributions (gauss, expovariate).
  • Caution: Not cryptographically secure; do not use for security-sensitive applications.

Overview of the Python secrets module

The secrets module is designed for generating cryptographically strong random numbers suitable for managing sensitive data such as passwords, authentication tokens, and security-critical keys.

  • Purpose: Generate cryptographically strong random numbers for security purposes.
  • 🔑 Key Features: Secure token generation (token_hex, token_urlsafe), secure choice from sequences.
  • Caveat: Can be slower than random because it draws on the operating system's secure randomness source. It also cannot be seeded, so its output is not reproducible.

Overview of the Python statistics module

The statistics module provides functions for calculating mathematical statistics of numerical data. It handles common descriptive statistics.

  • Purpose: Basic descriptive statistics for numerical data.
  • 🔑 Key Features: Measures of central tendency (mean, median, mode), measures of dispersion (variance, standard deviation).
  • Limitation: Not a full-fledged statistical analysis package like SciPy or NumPy; focuses on basic, common statistics.
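The sketch below computes the listed measures for a small hypothetical sample, chosen so the results are easy to check by hand. Note the distinction between sample statistics (variance(), stdev(), which divide by n - 1) and population statistics (pvariance(), pstdev(), which divide by n).

```python
import statistics

# Hypothetical sample data chosen for easy-to-check results
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(f"mean:      {statistics.mean(data)}")       # 5
print(f"median:    {statistics.median(data)}")     # 4.5 (average of the two middle values)
print(f"mode:      {statistics.mode(data)}")       # 4
print(f"pvariance: {statistics.pvariance(data)}")  # population variance: 32 / 8 = 4
print(f"pstdev:    {statistics.pstdev(data)}")     # 2.0
print(f"variance:  {statistics.variance(data)}")   # sample variance: 32 / 7, about 4.571
print(f"stdev:     {statistics.stdev(data)}")      # about 2.138
```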

Overview of the Python hashlib module

The hashlib module implements a common interface to many different secure hash and message digest algorithms. These are "one-way" functions critical for data integrity and security.

  • Purpose: Secure hashing and message digest algorithms.
  • 🔑 Key Features: Supports SHA256, SHA3, BLAKE2, and other algorithms, password hashing (PBKDF2), data integrity checks.
  • Warning: Requires careful use to avoid common security pitfalls (e.g., using weak hashes, not salting passwords).
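The sketch below shows the two uses named above: a SHA-256 digest for an integrity check, and salted password hashing with hashlib.pbkdf2_hmac(). The password, the 16-byte salt length, the iteration count, and the helper verify_password() are illustrative assumptions, not recommendations from this guide. Check current security guidance before choosing real parameters.

```python
import hashlib
import hmac
import secrets

# Integrity check: identical input always yields the identical digest
digest = hashlib.sha256(b"hello world").hexdigest()
print(f"SHA-256 digest: {digest}")  # 64 hexadecimal characters

# Password hashing: a random per-user salt defeats precomputed lookup tables
password = b"correct horse battery staple"  # hypothetical password
salt = secrets.token_bytes(16)
iterations = 600_000  # illustrative work factor (assumption)
stored_hash = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

def verify_password(candidate: bytes) -> bool:
    """Hypothetical helper: recompute with the stored salt, compare in constant time."""
    candidate_hash = hashlib.pbkdf2_hmac("sha256", candidate, salt, iterations)
    return hmac.compare_digest(candidate_hash, stored_hash)

print(verify_password(b"correct horse battery staple"))  # True
print(verify_password(b"wrong guess"))                   # False
```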

2. Phase 1: Foundations of Mathematical Functions

2.1. Basic Constants and Arithmetic (math module)

The math module is the cornerstone for fundamental mathematical operations on real numbers in Python. It provides access to essential mathematical constants and functions for number theory, rounding, and robust floating-point comparisons.

Utilization of Mathematical Constants

The math module defines several useful constants that represent mathematical concepts or specific numerical values.

  • 🔑 math.pi: The mathematical constant π (pi), approximately 3.14159. Used in geometry and trigonometry.
  • 🔑 math.e: The mathematical constant e, approximately 2.71828. The base of the natural logarithm.
  • 🔑 math.inf: Positive floating-point infinity. Useful for comparisons involving unbounded values.
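  • 🔑 math.nan: Not a Number (NaN). Represents undefined floating-point results, such as math.inf - math.inf or 0 * math.inf. NaN compares unequal to everything, including itself, so test for it with math.isnan(). Python raises exceptions rather than returning NaN for math.sqrt(-1) and 0.0 / 0.0.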

import math

print(f"Pi: {math.pi}")
print(f"e: {math.e}")
print(f"Infinity: {math.inf}")
print(f"Not a Number: {math.nan}")

# Operations that produce infinity or NaN
print(f"inf + 1: {math.inf + 1}")            # inf
print(f"inf - inf: {math.inf - math.inf}")    # nan
print(f"nan == nan: {math.nan == math.nan}")  # False: NaN is not equal even to itself

# Python raises exceptions here instead of returning inf or NaN
try:
    10.0 / 0.0
except ZeroDivisionError as exc:
    print(f"10.0 / 0.0 raises ZeroDivisionError: {exc}")
try:
    math.sqrt(-1)
except ValueError as exc:
    print(f"math.sqrt(-1) raises ValueError: {exc}")

Warning: Python differs from IEEE 754 hardware defaults and from libraries such as NumPy here. It raises ZeroDivisionError for division by zero with both integers and floats, so 10 / 0 and 10.0 / 0.0 both fail. Similarly, math.sqrt(-1) raises ValueError because math operates on real numbers; for complex results, use cmath.

Number Theory Functions

  • 🔑 math.gcd(a, b): Returns the greatest common divisor of the integers a and b. If either a or b is non-zero, the value of gcd(a, b) is the largest positive integer that divides both a and b. gcd(0, 0) returns 0.
  • 🔑 math.factorial(x): Returns x factorial. Raises ValueError if x is negative. Since Python 3.10, non-integer arguments raise TypeError, including integral floats such as 5.0.

import math

# GCD example
print(f"GCD(48, 18): {math.gcd(48, 18)}") # Output: 6
print(f"GCD(17, 34): {math.gcd(17, 34)}") # Output: 17

# Factorial example
print(f"Factorial(5): {math.factorial(5)}") # Output: 120 (5*4*3*2*1)
print(f"Factorial(0): {math.factorial(0)}") # Output: 1

Rounding Techniques

Python offers various methods for rounding numbers, each with a specific behavior. The math module provides functions for rounding up and down, distinct from Python's built-in round().

  • 🔑 math.ceil(x): Returns the smallest integer greater than or equal to x (rounds up).
  • 🔑 math.floor(x): Returns the largest integer less than or equal to x (rounds down).

Comparison with built-in round():

| Input | math.ceil() | math.floor() | round() (built-in) |
|---|---|---|---|
| 3.14 | 4 | 3 | 3 |
| 3.7 | 4 | 3 | 4 |
| -3.14 | -3 | -4 | -3 |
| -3.7 | -3 | -4 | -4 |
| 2.5 | 3 | 2 | 2 (rounds half to even) |
| 3.5 | 4 | 3 | 4 (rounds half to even) |

import math

value1 = 3.14
value2 = -3.7

print(f"Value: {value1}")
print(f"  math.ceil({value1}): {math.ceil(value1)}")
print(f"  math.floor({value1}): {math.floor(value1)}")
print(f"  round({value1}): {round(value1)}")

print(f"\nValue: {value2}")
print(f"  math.ceil({value2}): {math.ceil(value2)}")
print(f"  math.floor({value2}): {math.floor(value2)}")
print(f"  round({value2}): {round(value2)}")

print(f"\nRound half to even (banker's rounding) examples:")
print(f"  round(2.5): {round(2.5)}") # Output: 2
print(f"  round(3.5): {round(3.5)}") # Output: 4
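The half-to-even rule applies to the value the float actually stores, which is not always the decimal you typed. A commonly cited example is round(2.675, 2). The literal 2.675 is stored as a binary value slightly below 2.675, so it rounds down. The short sketch below shows this, using a Decimal conversion (covered in Phase 2) to reveal the stored value.

```python
from decimal import Decimal

value = 2.675
print(round(value, 2))  # 2.67, not 2.68
print(Decimal(value))   # the exact stored value, slightly less than 2.675
```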

Floating-point comparison with math.isclose()

Directly comparing floating-point numbers using == can be problematic due to their inherent approximations and precision limitations. Small discrepancies, often unnoticeable to humans, can cause equality checks to fail unexpectedly.

Typical failure path: perform calculation A → result A (float); perform calculation B → result B (float); test result A == result B → False, due to a tiny precision difference → unexpected program behavior.

math.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) is designed to safely compare two floating-point values by checking if they are "close enough" to each other within specified tolerances.

  • rel_tol (relative tolerance): The maximum difference allowed, relative to the larger of the two absolute values of the arguments. This keeps comparisons meaningful across vastly different magnitudes. The default is 1e-09, which roughly requires the two values to agree to about nine significant digits.
  • abs_tol (absolute tolerance): The maximum difference allowed, regardless of the magnitude of the arguments. Useful for comparisons near zero or when absolute error bounds are known.
Key Concept: Comparing Floats

Two numbers a and b are considered close if their absolute difference is less than or equal to either the relative tolerance times the maximum absolute value of a and b, OR the absolute tolerance. Mathematically:

abs(a-b) <= max(rel_tol * abs(a), rel_tol * abs(b)) OR abs(a-b) <= abs_tol

The relative term keeps comparisons meaningful for large and small magnitudes alike. The absolute term handles values at or near zero, where any relative tolerance shrinks toward nothing.


import math

# Direct comparison problem
x = 0.1 + 0.1 + 0.1
y = 0.3
print(f"x = {x}, y = {y}")
print(f"x == y: {x == y}") # Output: False (due to precision)

# Using math.isclose()
print(f"math.isclose(x, y): {math.isclose(x, y)}") # Output: True (default relative tolerance)

# Custom tolerances
a = 1000.000000001
b = 1000.000000002
# The difference between a and b is about 1e-9
print(f"math.isclose(a, b, rel_tol=1e-10): {math.isclose(a, b, rel_tol=1e-10)}") # True: allowed difference is 1e-10 * 1000 = 1e-7
print(f"math.isclose(a, b, rel_tol=1e-13): {math.isclose(a, b, rel_tol=1e-13)}") # False: allowed difference is only 1e-13 * 1000 = 1e-10
print(f"math.isclose(a, b, abs_tol=1e-08): {math.isclose(a, b, abs_tol=1e-08)}") # True: difference 1e-9 is within 1e-8

# Small numbers and abs_tol
c = 1e-9
d = 2e-9
print(f"math.isclose(c, d): {math.isclose(c, d)}") # False: d is twice c, far outside any small relative tolerance
print(f"math.isclose(c, d, abs_tol=1e-8): {math.isclose(c, d, abs_tol=1e-8)}") # True: difference 1e-9 is within 1e-8
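To connect the formula above to actual behavior, the sketch below re-implements it as a hypothetical helper, is_close_sketch(), and checks it against math.isclose() on a few sample pairs. It is only an illustration. It omits the argument validation the real function performs, such as rejecting negative tolerances, and it treats infinities as close only to themselves.

```python
import math

def is_close_sketch(a: float, b: float, rel_tol: float = 1e-09, abs_tol: float = 0.0) -> bool:
    """Hypothetical re-implementation of the math.isclose() formula."""
    if a == b:  # exact equality, which also covers inf == inf
        return True
    if math.isinf(a) or math.isinf(b):  # an infinity is close only to itself
        return False
    diff = abs(a - b)  # NaN makes every comparison below False
    return diff <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

pairs = [
    (0.1 + 0.1 + 0.1, 0.3, {}),
    (1e-9, 2e-9, {}),
    (1e-9, 2e-9, {"abs_tol": 1e-8}),
    (1000.000000001, 1000.000000002, {"rel_tol": 1e-13}),
    (math.inf, math.inf, {}),
]
for a, b, tolerances in pairs:
    ours = is_close_sketch(a, b, **tolerances)
    theirs = math.isclose(a, b, **tolerances)
    print(f"{a!r} vs {b!r} {tolerances}: sketch={ours}, math.isclose={theirs}")
```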

2.2. Floating Point Utilities

Beyond basic operations, the math module offers utilities for deeper understanding and manipulation of floating-point numbers, particularly addressing their internal representation and potential precision loss in computations.

Understanding Internal Representation: math.frexp()

Floating-point numbers are stored internally in a format similar to scientific notation, where a number x is represented as m * 2^e. Here, m is the mantissa (or significand) and e is the exponent.

The function math.frexp(x) breaks the float x into its mantissa and exponent components, returning a tuple (mantissa, exponent). The mantissa is a float m such that 0.5 <= abs(m) < 1, and the exponent is an integer e. If x is zero, (0.0, 0) is returned.


import math

x = 12.5
mantissa, exponent = math.frexp(x)
print(f"For {x}: Mantissa = {mantissa}, Exponent = {exponent}")
print(f"Reconstructed: {mantissa * (2**exponent)}") # Output: Reconstructed: 12.5

x = 0.25
mantissa, exponent = math.frexp(x)
print(f"For {x}: Mantissa = {mantissa}, Exponent = {exponent}")
print(f"Reconstructed: {mantissa * (2**exponent)}") # Output: Reconstructed: 0.25

x = -0.75
mantissa, exponent = math.frexp(x)
print(f"For {x}: Mantissa = {mantissa}, Exponent = {exponent}")
print(f"Reconstructed: {mantissa * (2**exponent)}") # Output: Reconstructed: -0.75
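math.ldexp(m, e) performs the reverse operation, computing m * 2**e exactly. float.hex() prints the stored significand and exponent without any decimal rounding. The sketch below uses both to show that 0.1 is stored as a repeating binary pattern, which is the root of the precision issues discussed in this guide.

```python
import math

for x in (12.5, 0.1, -0.75):
    mantissa, exponent = math.frexp(x)
    rebuilt = math.ldexp(mantissa, exponent)  # exact inverse of frexp
    print(f"{x!r}: frexp -> ({mantissa!r}, {exponent}), ldexp -> {rebuilt!r}, hex -> {x.hex()}")

# 0.1 has a repeating binary fraction (the run of 9s in hex), so it is only approximated
print((0.1).hex())  # 0x1.999999999999ap-4
```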

Accurate Summation: math.fsum()

Summing a long series of floating-point numbers can lose significant accuracy, especially when the values vary widely in magnitude, because small rounding errors accumulate. Classically, sum() adds numbers sequentially, so adding a very small number to a very large running total can discard the small number's low-order digits. Since Python 3.12, sum() applies compensated summation to floats, which narrows this gap considerably.

math.fsum(iterable) avoids the loss by tracking multiple intermediate partial sums (an approach based on Shewchuk's algorithm rather than classic Kahan summation). The low-order bits are carried along, and only the final result is rounded.

Accuracy at a glance:
  • sum() (default): less accurate
  • math.fsum(): more accurate

import math

# One large value followed by many small ones: each small addition to the
# large running total is rounded, and those rounding errors accumulate
data = [10000000.0] + [0.0000001] * 100000

# Using built-in sum()
sum_result = sum(data)
print(f"Sum using built-in sum(): {sum_result!r}")

# Using math.fsum()
fsum_result = math.fsum(data)
print(f"Sum using math.fsum(): {fsum_result!r}")

# The true sum is 10000000.0 + (0.0000001 * 100000) = 10000000.01, and
# math.fsum() prints 10000000.01. On Python 3.11 and earlier, sum() drifts to
# roughly 10000000.01006; Python 3.12+ compensates inside sum(), so the two
# results may agree or differ only slightly.
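The idea of "tracking the lost low-order bits" is easiest to see in classic Kahan (compensated) summation. The sketch below implements it as a hypothetical helper, kahan_sum(), alongside a plain naive_sum() loop. Kahan summation is a simpler technique than the partial-sums algorithm math.fsum() uses, and it is not guaranteed to give the correctly rounded result. It does capture the core trick: a compensation term c stores the rounding error of each addition and feeds it back into the next one. The naive loop is written out by hand because, on Python 3.12+, the built-in sum() already compensates.

```python
import math

def kahan_sum(values):
    """Hypothetical helper: classic Kahan compensated summation."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - c            # re-inject the error captured in the previous step
        t = total + y        # low-order bits of y may be lost in this addition...
        c = (t - total) - y  # ...and are recovered algebraically (with opposite sign)
        total = t
    return total

def naive_sum(values):
    """Plain left-to-right addition with no compensation."""
    total = 0.0
    for x in values:
        total += x
    return total

tenths = [0.1] * 10
print(naive_sum(tenths))   # 0.9999999999999999
print(kahan_sum(tenths))   # 1.0
print(math.fsum(tenths))   # 1.0
```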

Float Modulo Operations: math.fmod() vs % operator

Both math.fmod(x, y) and the built-in x % y operator compute the remainder of a division. However, they handle negative numbers differently, aligning with different mathematical conventions.

  • math.fmod(x, y): Follows the C standard, where the result has the same sign as the dividend (x).
  • x % y: Follows the Python convention, where the result has the same sign as the divisor (y).

| x, y | math.fmod(x, y) | x % y | Sign of result |
|---|---|---|---|
| 5.0, 2.0 | 1.0 | 1.0 | Both positive |
| -5.0, 2.0 | -1.0 | 1.0 | fmod follows the dividend (negative); % follows the divisor (positive) |
| 5.0, -2.0 | 1.0 | -1.0 | fmod follows the dividend (positive); % follows the divisor (negative) |
| -5.0, -2.0 | -1.0 | -1.0 | Both negative |

import math

print(f"math.fmod(5.0, 2.0) : {math.fmod(5.0, 2.0)}") # 1.0
print(f"5.0 % 2.0         : {5.0 % 2.0}")           # 1.0

print(f"math.fmod(-5.0, 2.0): {math.fmod(-5.0, 2.0)}") # -1.0
print(f"-5.0 % 2.0        : {-5.0 % 2.0}")          # 1.0

print(f"math.fmod(5.0, -2.0): {math.fmod(5.0, -2.0)}") # 1.0
print(f"5.0 % -2.0        : {5.0 % -2.0}")         # -1.0
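Both operations return x - n * y; they differ only in how the integer quotient n is chosen. The % operator floors x / y (rounds toward negative infinity), while fmod() truncates it toward zero. The sketch below reproduces both rules with math.floor() and math.trunc() for small inputs. The helper names are hypothetical, and the sketch is for illustration only. With very large or awkward operands, computing x / y first can introduce rounding that the real implementations avoid.

```python
import math

def python_mod_sketch(x: float, y: float) -> float:
    """Like x % y: the quotient is rounded toward negative infinity (floor)."""
    return x - math.floor(x / y) * y

def c_fmod_sketch(x: float, y: float) -> float:
    """Like math.fmod(x, y): the quotient is truncated toward zero."""
    return x - math.trunc(x / y) * y

for x, y in [(5.0, 2.0), (-5.0, 2.0), (5.0, -2.0), (-5.0, -2.0), (-7.5, 2.0)]:
    print(f"x={x}, y={y}: %={x % y} (sketch {python_mod_sketch(x, y)}), "
          f"fmod={math.fmod(x, y)} (sketch {c_fmod_sketch(x, y)})")
```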

2.3. Advanced Mathematical Operations

The math module provides a comprehensive set of functions for more complex mathematical computations, including exponential, logarithmic, trigonometric, and vector operations.

Exponential and Logarithmic Functions

  • 🔑 math.exp(x): Returns e^x.
  • 🔑 math.log(x[, base]): With one argument, returns the natural logarithm of x (to base e). With two arguments, returns the logarithm of x to the given base.
  • 🔑 math.log10(x): Returns the base-10 logarithm of x.
  • 🔑 math.pow(x, y): Returns x raised to the power y (x**y). Note that this function always returns a float.
  • 🔑 math.sqrt(x): Returns the square root of x. Raises ValueError if x is negative.

import math

print(f"e^2: {math.exp(2)}") # ~7.389
print(f"ln(10): {math.log(10)}") # ~2.302 (natural logarithm)
print(f"log base 2 of 8: {math.log(8, 2)}") # 3.0
print(f"log10(100): {math.log10(100)}") # 2.0
print(f"2^10 using pow: {math.pow(2, 10)}") # 1024.0
print(f"Square root of 81: {math.sqrt(81)}") # 9.0
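math.log(x, base) is computed as a ratio of two natural logarithms, so its result can carry a small rounding error even when the exact answer is an integer. The Python documentation notes that math.log2() and math.log10() are usually more accurate than log(x, 2) and log(x, 10), so prefer the dedicated functions for those bases. A short comparison:

```python
import math

# General two-argument form: log(x) / log(base), which can round slightly
print(math.log(1000, 10))  # may print 2.9999999999999996 rather than 3.0
# Dedicated functions for the common bases
print(math.log10(1000))    # 3.0
print(math.log2(1024))     # 10.0
```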

Trigonometric Functions

All trigonometric functions in the math module operate on angles expressed in radians. If you have angles in degrees, use math.radians() to convert them before using trigonometric functions, and math.degrees() to convert results back.

  • 🔑 math.sin(x), math.cos(x), math.tan(x): Returns the sine, cosine, and tangent of x (in radians), respectively.
  • 🔑 math.asin(x), math.acos(x), math.atan(x): Returns the inverse sine, cosine, and tangent of x, returning an angle in radians.
  • 🔑 math.degrees(x), math.radians(x): Convert angle x from radians to degrees and vice versa.
  • 🔑 Hyperbolic variants: math.sinh(x), math.cosh(x), math.tanh(x) for hyperbolic sine, cosine, and tangent.

import math

angle_degrees = 45
angle_radians = math.radians(angle_degrees)
print(f"45 degrees in radians: {angle_radians}")

print(f"Sine of {angle_degrees} degrees: {math.sin(angle_radians):.4f}") # ~0.7071
print(f"Cosine of {angle_degrees} degrees: {math.cos(angle_radians):.4f}") # ~0.7071
print(f"Tangent of {angle_degrees} degrees: {math.tan(angle_radians):.4f}") # ~1.0000

# Inverse functions
inv_sin_result_radians = math.asin(0.7071067811865476)
print(f"Inverse sine of ~0.7071 (radians): {inv_sin_result_radians:.4f}")
print(f"Inverse sine of ~0.7071 (degrees): {math.degrees(inv_sin_result_radians):.2f}")

# Hyperbolic
print(f"Hyperbolic sine of 1: {math.sinh(1):.4f}")

Vector Mathematics

The math module provides functions for calculating Euclidean distances, which are fundamental in geometry, physics, and machine learning.

  • 🔑 math.hypot(x, y): Returns the Euclidean norm (magnitude) of a 2D vector (x, y), which is sqrt(x*x + y*y). This is the length of the hypotenuse of a right triangle with legs x and y. Since Python 3.8, hypot() also accepts any number of coordinates.
  • 🔑 math.dist(p, q): Returns the Euclidean distance between two points p and q, each given as a sequence (tuple or list) of coordinates. This supports N-dimensional points.

import math

# 2D Euclidean distance using hypot()
x1, y1 = 0, 0
x2, y2 = 3, 4
distance_2d = math.hypot(x2 - x1, y2 - y1)
print(f"2D distance (hypot) between (0,0) and (3,4): {distance_2d}") # 5.0

# N-dimensional Euclidean distance using dist()
point_a = (1, 2, 3)
point_b = (4, 6, 9)
distance_nd = math.dist(point_a, point_b)
print(f"ND distance (dist) between {point_a} and {point_b}: {distance_nd:.2f}") # ~7.81

point_c = (10, 20)
point_d = (13, 24)
distance_2d_dist = math.dist(point_c, point_d)
print(f"2D distance (dist) between {point_c} and {point_d}: {distance_2d_dist}") # 5.0

2.4. Complex Numbers (cmath module)

When calculations involve the square root of negative numbers, or complex exponentials and logarithms, the math module will raise errors because it only handles real numbers. This is where the cmath (complex math) module becomes essential.

Distinction between math and cmath modules

The primary difference lies in their domain: math works with real floating-point numbers, while cmath extends these operations to the complex plane.

| Feature | math module | cmath module |
|---|---|---|
| Number type | Real numbers (int, float) | Complex numbers (real inputs are also accepted) |
| Return type | float (a few functions, such as gcd and factorial, return int) | complex for most functions, even when the imaginary part is zero |
| sqrt(-1) | Raises ValueError | Returns 1j (the complex square root) |
| Constants | math.pi, math.e, math.inf, math.nan | cmath.pi, cmath.e, cmath.inf, cmath.nan (plain floats), plus cmath.infj and cmath.nanj (complex) |

import math
import cmath

# Square root of a negative number
try:
    print(f"math.sqrt(-1): {math.sqrt(-1)}")
except ValueError as e:
    print(f"math.sqrt(-1) error: {e}") # e.g. "math domain error" (exact wording varies by Python version)

print(f"cmath.sqrt(-1): {cmath.sqrt(-1)}") # Output: 1j
Warning: Keep math and cmath in separate namespaces (import math, import cmath) and call functions with the module prefix. The modules share function names such as sqrt, log, and exp. Running from math import * followed by from cmath import * therefore silently shadows the math versions, which can lead to unexpected complex results.

Coordinate Conversions

Complex numbers can be represented in two main forms:

  • Rectangular (Cartesian) form: z = x + yj, where x is the real part and y is the imaginary part.
  • Polar form: z = r * e^(i*phi), where r is the magnitude (radius) and phi is the phase (angle/argument).

The cmath module provides functions to convert between these representations.

  • 🔑 cmath.phase(z): Returns the phase of the complex number z, also known as the argument or angle. It is a float in the range [-pi, pi], with the branch cut along the negative real axis.
  • 🔑 cmath.polar(z): Returns the polar representation of z as a pair (r, phi), where r is the magnitude and phi is the phase.
  • 🔑 cmath.rect(r, phi): Returns the complex number x + yj given its polar coordinates magnitude r and phase phi.
Conversion flow: z = x + yj → cmath.polar(z) → (r, phi); (r, phi) → cmath.rect(r, phi) → reconstructed z = x + yj; cmath.phase(z) → phi alone.

import cmath

z = 1 + 1j # Complex number: 1 + i

# Get phase
phase_z = cmath.phase(z)
print(f"Phase of {z}: {phase_z:.4f} radians") # ~0.7854 (pi/4)

# Convert to polar coordinates
magnitude, phase_from_polar = cmath.polar(z)
print(f"Polar coordinates of {z}: Magnitude = {magnitude:.4f}, Phase = {phase_from_polar:.4f}") # Magnitude ~1.414, Phase ~0.7854

# Convert back to rectangular from polar
reconstructed_z = cmath.rect(magnitude, phase_from_polar)
print(f"Reconstructed from polar: {reconstructed_z}") # Approximately (1+1j); a tiny rounding error may appear in the last digit

# Example with another complex number
z2 = -2 - 2j
magnitude2, phase2 = cmath.polar(z2)
print(f"Polar coordinates of {z2}: Magnitude = {magnitude2:.4f}, Phase = {phase2:.4f}") # Magnitude ~2.828, Phase ~-2.3562 (which is -3pi/4)

Practice & Application

🎯 Challenge 1: Precision in Scientific Measurements

You are analyzing data from a high-precision sensor. You have two measurements, measurement_a and measurement_b, and a known reference value, reference_val.

  1. Define measurement_a = 0.0000001 * 1000000 + 0.000000000001 and measurement_b = 0.1 + 0.000000000001.
  2. Use math.isclose() to determine if measurement_a and measurement_b are considered practically equal. Start with default tolerances, then try with abs_tol=1e-10. What do you observe?
  3. For value_to_round = 5.67, use math.ceil() and math.floor(). Also, use the built-in round() and explain why its output might be different for a value like 2.5.
  4. Imagine you need to sum a list of numbers: numbers = [0.000000000001] * 1000000 + [10.0]. Calculate the sum using both sum() and math.fsum(). Compare their results for precision. The true sum should be 10.000001.

import math

# Part 1 & 2: Floating-point comparison
measurement_a = 0.0000001 * 1000000 + 0.000000000001
measurement_b = 0.1 + 0.000000000001
print(f"Measurement A: {measurement_a:.15f}")
print(f"Measurement B: {measurement_b:.15f}")

print(f"\nAre A and B close (default)? {math.isclose(measurement_a, measurement_b)}")
print(f"Are A and B close (abs_tol=1e-10)? {math.isclose(measurement_a, measurement_b, abs_tol=1e-10)}")
# Both checks print True: the two values differ only by floating-point rounding
# (at most a few units in the last place, around 1e-17), far inside either
# tolerance. rel_tol scales with the inputs' magnitude; abs_tol sets a fixed floor.

# Part 3: Rounding techniques
value_to_round = 5.67
print(f"\nValue to round: {value_to_round}")
print(f"math.ceil({value_to_round}): {math.ceil(value_to_round)}")
print(f"math.floor({value_to_round}): {math.floor(value_to_round)}")
print(f"round({value_to_round}): {round(value_to_round)}")

print(f"\nRounding 2.5: {round(2.5)}")
print(f"Rounding 3.5: {round(3.5)}")
# Explanation: Python's built-in round() uses "round half to even" or banker's rounding.
# This means that numbers ending in .5 are rounded to the nearest even integer.
# So, 2.5 rounds to 2, and 3.5 rounds to 4.

# Part 4: Accurate summation
numbers = [0.000000000001] * 1000000 + [10.0]
true_sum = 10.000001 # 10.0 + (1e-12 * 1e6) = 10.0 + 1e-6 = 10.000001

sum_builtin = sum(numbers)
sum_fsum = math.fsum(numbers)

print(f"\nTrue sum: {true_sum:.10f}")
print(f"Sum using built-in sum(): {sum_builtin:.10f}")
print(f"Sum using math.fsum(): {sum_fsum:.10f}")
# Observation: math.fsum() avoids accumulated rounding error. Because the small
# values are added first here, sum() may print the same digits; on Python 3.11
# and earlier, putting 10.0 first in the list makes sum() drift visibly.

🎯 Challenge 2: Complex Rotations and Modulo Differences

Explore transformations of complex numbers and the nuances of modulo operations with negative floats.

  1. Define a complex number z = 1 + 1j. Calculate its magnitude and phase using cmath.polar().
  2. "Rotate" the complex number by adding math.pi / 4 (45 degrees) to its phase. Convert this new polar representation back to a rectangular complex number using cmath.rect(). Print the original and rotated complex numbers.
  3. Consider the operation -7.5 modulo 2.0. Calculate the result using both the % operator and math.fmod(). Explain why the results are different.

import math
import cmath

# Part 1 & 2: Complex number rotation
z_original = 1 + 1j
print(f"Original complex number: {z_original}")

# Get polar coordinates
magnitude, phase = cmath.polar(z_original)
print(f"Polar form: Magnitude = {magnitude:.4f}, Phase = {phase:.4f} radians ({math.degrees(phase):.2f} degrees)")

# Rotate by adding pi/4 to the phase
rotated_phase = phase + (math.pi / 4)
# Note: magnitude remains the same for rotation
z_rotated = cmath.rect(magnitude, rotated_phase)
print(f"Rotated complex number (by 45 deg): {z_rotated}")

# Expected: z_original (1+1j) is at 45 deg. Adding another 45 deg makes it 90 deg (purely imaginary).
# For z_original = 1+1j, magnitude = sqrt(2), phase = pi/4.
# rotated_phase = pi/4 + pi/4 = pi/2.
# cmath.rect(sqrt(2), pi/2) should be approximately 0 + sqrt(2)j (0 + 1.414j)

# Part 3: Modulo differences
num = -7.5
divisor = 2.0

# Using % operator
remainder_percent = num % divisor
print(f"\n{num} % {divisor} = {remainder_percent}")

# Using math.fmod()
remainder_fmod = math.fmod(num, divisor)
print(f"math.fmod({num}, {divisor}) = {remainder_fmod}")

# Explanation:
# Both results have the form x - n * y; they differ in how n is chosen.
# The % operator floors the quotient: n = floor(-7.5 / 2.0) = floor(-3.75) = -4.
#   -7.5 - (-4 * 2.0) = -7.5 + 8.0 = 0.5 (same sign as the divisor 2.0)
# math.fmod() truncates the quotient toward zero: n = trunc(-3.75) = -3.
#   -7.5 - (-3 * 2.0) = -7.5 + 6.0 = -1.5 (same sign as the dividend -7.5)
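Adding theta to the phase is equivalent to multiplying by the unit complex number e^(j*theta), which cmath.exp(1j * theta) computes directly. The sketch below checks that the two approaches agree, using cmath.isclose() to allow for rounding differences.

```python
import cmath
import math

z = 1 + 1j
theta = math.pi / 4

# Approach 1: convert to polar form, add theta to the phase, convert back
r, phi = cmath.polar(z)
rotated_polar = cmath.rect(r, phi + theta)

# Approach 2: multiply by the unit complex number e^(j*theta)
rotated_mult = z * cmath.exp(1j * theta)

print(rotated_polar, rotated_mult)
print(cmath.isclose(rotated_polar, rotated_mult))  # True
# Both now point along the positive imaginary axis: approximately sqrt(2) * 1j
print(cmath.isclose(rotated_mult, math.sqrt(2) * 1j, abs_tol=1e-12))  # True
```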

3. Phase 2: Precision and Rational Arithmetic

While Python's built-in float type is highly optimized for performance and generally sufficient for scientific computing, it has inherent limitations due to its binary representation. This phase explores modules that provide alternatives for scenarios demanding exact decimal precision or precise rational number arithmetic: decimal for arbitrary-precision floats and fractions for exact rational numbers.

3.1. Arbitrary Precision Floats (decimal module)

Understanding "Floating Point Error"

Standard floating-point numbers (float in Python, typically IEEE 754 double-precision) are stored internally in binary. This means that many decimal fractions, like 0.1, cannot be represented exactly in binary. They become repeating binary fractions, similar to how 1/3 is a repeating decimal (0.333...). When these inexact binary representations are used in calculations, small errors can accumulate, leading to results that are slightly off from their true mathematical values.


# A classic example of floating-point inaccuracy
print(0.1 + 0.1 + 0.1 == 0.3) # Output: False
print(0.1 + 0.1 + 0.1)        # Output: 0.30000000000000004
  • 🔑 Root Cause: Binary representation of decimal fractions leads to approximation.
  • Consequence: Inaccurate equality checks, accumulated errors in sums or repeated operations, potential issues in financial calculations where exactness is paramount.
  • Mitigation: For critical precision, use decimal.Decimal or fractions.Fraction.
Error flow: decimal fraction (e.g., 0.1) → conversion to binary float → inexact binary representation → arithmetic operations → accumulation of small errors → result slightly off from the true value → failed equality checks or financial discrepancies.

Implementing Decimal type for exact representation

The decimal module provides a Decimal type for decimal floating-point arithmetic. This type can represent numbers exactly (up to its defined precision) and handles operations in base 10, thus avoiding the binary floating-point issues for decimal numbers.

Key Concept: Constructing Decimals

To ensure exactness, it is highly recommended to construct Decimal objects from strings rather than floats. If you convert a float to a Decimal, the float's inherent inaccuracies are carried over.

from decimal import Decimal
Decimal('0.1') # Exact
Decimal(0.1)   # Exact value of the binary float: 0.1000000000000000055511151231257827021181583404541015625

from decimal import Decimal

# Using floats (prone to error)
float_sum = 0.1 + 0.1 + 0.1
print(f"Float sum: {float_sum}")

# Using Decimal from strings (recommended for exactness)
decimal_sum = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')
print(f"Decimal sum (from strings): {decimal_sum}")
print(f"Decimal sum == Decimal('0.3'): {decimal_sum == Decimal('0.3')}") # Output: True

# Using Decimal from floats (demonstrates float's underlying inaccuracy)
decimal_from_float = Decimal(0.1)
print(f"Decimal from float(0.1): {decimal_from_float}")
# Notice the extended precision reflects the actual value of float 0.1
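Financial code usually needs to round to a fixed number of decimal places under an explicit rule, and Decimal.quantize() does this. The sketch below rounds a hypothetical amount to cents under two rounding modes. It shows that the choice matters for values exactly halfway between two cents.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

amount = Decimal('2.665')  # hypothetical amount exactly halfway between two cents
cents = Decimal('0.01')

print(amount.quantize(cents, rounding=ROUND_HALF_UP))    # 2.67 (ties go away from zero)
print(amount.quantize(cents, rounding=ROUND_HALF_EVEN))  # 2.66 (ties go to the even digit; the default)
```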

Context management for precision control

The decimal module allows you to control the precision of calculations globally using a "context." The most common setting to adjust is prec, which defines the number of significant digits for new Decimal results.

  • 🔑 decimal.getcontext(): Returns the current context for the active thread.
  • 🔑 decimal.getcontext().prec: An integer specifying the precision (total number of significant digits) for arithmetic operations. The default is 28.

from decimal import Decimal, getcontext, localcontext

# Default precision
print(f"Default precision: {getcontext().prec}") # Output: 28

# Perform a calculation with default precision
result_default_prec = Decimal('1') / Decimal('7')
print(f"1/7 with default precision: {result_default_prec}") # Output: 0.1428571428571428571428571429

# Changing the context directly affects every later calculation in this thread
getcontext().prec = 10 # Set precision to 10 significant digits
result_10_digits = Decimal('1') / Decimal('7')
print(f"1/7 with 10-digit precision: {result_10_digits}") # Output: 0.1428571429

# Reset precision manually (or avoid the global change with localcontext)
getcontext().prec = 28

# localcontext() works on a copy of the current context and restores the original on exit
with localcontext() as ctx:
    ctx.prec = 5
    result_low_prec = Decimal('1') / Decimal('7')
    print(f"1/7 with 5-digit precision (localcontext): {result_low_prec}") # Output: 0.14286
print(f"Precision after localcontext block: {getcontext().prec}") # Back to 28

Precision at a glance:
  • float: Fixed (about 15-17 significant decimal digits)
  • Decimal: Configurable (default 28 significant digits, can be set much higher)
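One subtlety, noted in the decimal documentation, is that context precision applies to the results of arithmetic, not to constructing a Decimal. A newly constructed Decimal keeps every digit you give it. Unary plus (+d) is a simple way to apply the current context's precision and rounding, as the sketch below shows.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 5
    d = Decimal('3.14159265358979323846')
    rounded = +d      # unary plus applies the 5-digit context
    product = d * 1   # so does any arithmetic operation
    print(d)          # 3.14159265358979323846 (construction keeps every digit)
    print(rounded)    # 3.1416
    print(product)    # 3.1416
```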

3.2. Exact Rational Arithmetic (fractions module)

Storing numbers as numerator/denominator pairs

The fractions module provides the Fraction type, which represents a rational number as a pair of integers: a numerator and a denominator. This representation allows for arithmetic to be performed with absolute precision, as long as the numbers involved are rational.

  • 🔑 fractions.Fraction(numerator=0, denominator=1): Constructor. Can take two integers, one integer, a float (caution: subject to float inaccuracies), or a string representation of a fraction or float.

from fractions import Fraction

# From two integers
f1 = Fraction(3, 4)
print(f"Fraction(3, 4): {f1}") # Output: 3/4

# From an integer
f2 = Fraction(5)
print(f"Fraction(5): {f2}") # Output: 5

# From a string (recommended for exactness)
f3 = Fraction('1.25')
print(f"Fraction('1.25'): {f3}") # Output: 5/4

f4 = Fraction('1/3')
print(f"Fraction('1/3'): {f4}") # Output: 1/3

# From a float (inherits float's inaccuracy, see below)
f5 = Fraction(0.1)
print(f"Fraction(0.1): {f5}")
# Output: 3602879701896397/36028797018963968 (this is the exact fraction for the float 0.1)
Warning: Similar to the Decimal type, converting a float directly to a Fraction can embed the float's inherent binary inaccuracies into the Fraction. For exact decimal fractions, always use a string representation (e.g., Fraction('0.25') or Fraction('1/4')).
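If a value has already been computed as a float, Fraction.limit_denominator() finds the closest fraction whose denominator does not exceed a given bound (1,000,000 by default), which usually recovers the intended value. Converting from Decimal is exact, because a Decimal already stores a decimal value. A short sketch:

```python
from decimal import Decimal
from fractions import Fraction

from_float = Fraction(0.1)
print(from_float)  # 3602879701896397/36028797018963968

# Closest fraction with denominator <= 1,000,000 (the default bound)
recovered = from_float.limit_denominator()
print(recovered)  # 1/10

# A tighter bound gives a classic rational approximation of pi
pi_approx = Fraction(3.141592653589793).limit_denominator(1000)
print(pi_approx)  # 355/113

# Decimal -> Fraction conversion is exact
from_decimal = Fraction(Decimal("0.1"))
print(from_decimal)  # 1/10
```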

Performing arithmetic without precision loss

Arithmetic operations on Fraction objects automatically handle common denominators and simplification, ensuring that the result is always an exact rational number.


from fractions import Fraction
from decimal import Decimal

# Demonstrating precision with fractions
third = Fraction(1, 3)
sum_of_thirds = third + third + third
print(f"Sum of three 1/3s using Fraction: {sum_of_thirds}") # Output: 1

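# Compare with floats and Decimals
float_sum = 1/3 + 1/3 + 1/3
print(f"Sum of three 1/3s using float: {float_sum}") # Output: 1.0 (the rounding errors happen to cancel here)
print(f"0.1 + 0.2 using float: {0.1 + 0.2}") # Output: 0.30000000000000004
print(f"0.1 + 0.2 using Fraction: {Fraction('0.1') + Fraction('0.2')}") # Output: 3/10

decimal_sum = Decimal('1') / Decimal('3') + Decimal('1') / Decimal('3') + Decimal('1') / Decimal('3')
print(f"Sum of three 1/3s using Decimal (default prec=28): {decimal_sum}") # Output: 0.9999999999999999999999999999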

# Basic operations
f_a = Fraction('1/2')
f_b = Fraction('1/3')

print(f"\n{f_a} + {f_b} = {f_a + f_b}") # Output: 5/6
print(f"{f_a} - {f_b} = {f_a - f_b}") # Output: 1/6
print(f"{f_a} * {f_b} = {f_a * f_b}") # Output: 1/6
print(f"{f_a} / {f_b} = {f_a / f_b}") # Output: 3/2
  • float exactness: low (binary approximation)
  • Decimal exactness: high (decimal approximation, configurable precision)
  • Fraction exactness: perfect (rational numbers only)
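Because every result is reduced to lowest terms, Fractions are handy for accumulating many rational terms without drift. As an illustration (the harmonic-sum example is our own choice), the sum 1/1 + 1/2 + ... + 1/10 stays exact, and conversion to float happens only for display:

```python
from fractions import Fraction

# Results are always normalized to lowest terms
print(Fraction(6, 8))  # 3/4

# Exact partial sum of the harmonic series: 1/1 + 1/2 + ... + 1/10
harmonic_10 = sum(Fraction(1, k) for k in range(1, 11))
print(harmonic_10)  # 7381/2520
print(harmonic_10.numerator, harmonic_10.denominator)  # 7381 2520

# Convert to float only at the end, for display
print(f"{float(harmonic_10):.6f}")  # 2.928968
```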

4. Phase 3: Probability and Randomness

Randomness plays a pivotal role in many computational tasks, from simulations and games to cryptographic security and statistical sampling. Python provides two primary modules for generating random numbers: the random module for general-purpose pseudorandomness and the secrets module for cryptographically strong random numbers.

Key Concept: Pseudorandom vs. True Random

Deterministic algorithms cannot produce truly random numbers on their own, so most software generates pseudorandom numbers instead. These algorithms start from a "seed" value and generate a sequence of numbers that appears random but is entirely predictable if the seed (or the generator's internal state) is known. For most simulations and games, this is perfectly adequate. Security-critical applications require cryptographically secure random numbers, typically supplied by the operating system from entropy it gathers, so that future outputs cannot feasibly be predicted even by someone who has observed earlier ones.
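The determinism described above is easy to observe. This sketch creates two independent random.Random instances with the same (arbitrary) seed, 42, and shows that they produce identical sequences; using a dedicated instance also keeps these draws separate from the module-level generator.

```python
import random

# Two separate generators seeded identically
gen_a = random.Random(42)
gen_b = random.Random(42)

seq_a = [gen_a.randint(1, 100) for _ in range(5)]
seq_b = [gen_b.randint(1, 100) for _ in range(5)]

print(seq_a)
print(seq_b)
print(f"Identical sequences: {seq_a == seq_b}")  # True
```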

4.1. Pseudorandom Generators (random module)

The random module implements pseudorandom number generators for various distributions. It's built upon the Mersenne Twister algorithm, which is fast and generates high-quality pseudorandom numbers suitable for most non-cryptographic purposes.

Warning: Not Cryptographically Secure!

Do NOT use the random module for generating security-sensitive data like passwords, encryption keys, or authentication tokens. For such purposes, use the secrets module.

Basic Random Number Generation

  • 🔑 random.randint(a, b): Returns a random integer N such that a <= N <= b. Note that both endpoints are inclusive.
  • 🔑 random.uniform(a, b): Returns a random floating-point number N such that a <= N <= b for a <= b and b <= N <= a for b < a.
  • 🔑 random.random(): Returns a random float r in the range [0.0, 1.0) (i.e., 0.0 <= r < 1.0).

import random

# Random integer between 1 and 10 (inclusive)
print(f"Random integer (1-10): {random.randint(1, 10)}")

# Random float in the half-open range [0.0, 1.0)
print(f"Random float (0.0-1.0): {random.random():.4f}")

# Random float between 10.0 and 20.0
print(f"Random uniform (10.0-20.0): {random.uniform(10.0, 20.0):.4f}")
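As a small illustration of random.random() in a simulation (a classic Monte Carlo exercise, not part of the module itself), the fraction of uniformly random points in the unit square that fall inside the quarter circle approximates π/4. The helper name estimate_pi, the fixed seed, and the sample size are arbitrary choices made so the run is reproducible.

```python
import math
import random

def estimate_pi(samples: int, seed: int = 2024) -> float:
    """Hypothetical helper: Monte Carlo estimate of pi from points in [0, 1) x [0, 1)."""
    rng = random.Random(seed)  # private generator, so the result is reproducible
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:  # point lies inside the quarter circle
            inside += 1
    return 4 * inside / samples

estimate = estimate_pi(100_000)
print(f"Estimate: {estimate:.4f}, math.pi: {math.pi:.4f}")
```

The error shrinks roughly with the square root of the sample size, so each extra decimal digit of accuracy costs about 100 times more samples.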

Sequence Operations

The random module also provides functions to pick elements from sequences (lists, tuples, strings).

  • 🔑 random.choice(seq): Returns a randomly selected element from the non-empty sequence seq.
  • 🔑 random.choices(population, weights=None, k=1): Returns a list of k elements chosen from population with replacement. If weights is provided, it's a list of relative weights for each item.
  • 🔑 random.sample(population, k): Returns a list of k unique elements chosen from the population sequence without replacement. The size of the sample k must be less than or equal to the population size.
  • 🔑 random.shuffle(x): Shuffles the mutable sequence x in place and returns None, so it cannot be used on tuples or strings.

import random

my_list = ['apple', 'banana', 'cherry', 'date']

# Single item selection
print(f"Random choice from list: {random.choice(my_list)}")

# Weighted selection (with replacement)
# 'banana' is twice as likely as 'apple', 'cherry', 'date'
weighted_choices = random.choices(my_list, weights=[10, 20, 10, 10], k=3)
print(f"Weighted choices (k=3): {weighted_choices}")

# Selection without replacement
sample_items = random.sample(my_list, k=2)
print(f"Random sample (k=2): {sample_items}")

# Shuffling a list in place
numbers = [1, 2, 3, 4, 5]
random.shuffle(numbers)
print(f"Shuffled numbers: {numbers}")
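Because random.shuffle() works in place, it fails on immutable sequences such as tuples. The Python documentation suggests random.sample(x, k=len(x)) to obtain a new shuffled list instead. The sketch below shows that, plus cum_weights, an equivalent way to express the relative weights [10, 20, 10, 10] used above (the tuple contents are illustrative).

```python
import random

colors = ("red", "green", "blue", "yellow")  # a tuple cannot be shuffled in place

# sample() with k=len(...) returns a new shuffled list and leaves the tuple untouched
shuffled_colors = random.sample(colors, k=len(colors))
print(f"Original: {colors}")
print(f"Shuffled copy: {shuffled_colors}")

# Relative weights [10, 20, 10, 10] expressed as running totals (cumulative weights)
fruits = ['apple', 'banana', 'cherry', 'date']
picks = random.choices(fruits, cum_weights=[10, 30, 40, 50], k=5)
print(f"Picks with cum_weights: {picks}")
```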

Generator State Management

Pseudorandom generators produce sequences deterministically based on an initial "seed." By saving and restoring the generator's internal state, you can reproduce the exact same sequence of "random" numbers. This is incredibly useful for debugging, testing, and ensuring reproducibility in scientific simulations.

  • 🔑 random.getstate(): Returns an object capturing the current internal state of the generator.
  • 🔑 random.setstate(state): Restores the internal state of the generator to the state obtained from a previous call to getstate().
  • 🔑 random.seed(a=None, version=2): Initializes the random number generator. If a is omitted or None, the operating system's randomness source is used if available; otherwise, the current system time is used.
Typical workflow: initialize the generator (e.g., random.seed(42)) → generate some numbers → save the state (state_1 = random.getstate()) → generate more numbers and continue the program → when a bug appears or a run must be reproduced, restore the state (random.setstate(state_1)) → the same sequence is regenerated from the saved point.

import random

# Set an initial seed for reproducibility
random.seed(123)

print("First sequence:")
for _ in range(3):
    print(random.randint(1, 100))

# Save the current state
current_state = random.getstate()

print("\nSecond sequence (continues from previous):")
for _ in range(3):
    print(random.randint(1, 100))

# Restore the saved state
random.setstate(current_state)

print("\nThird sequence (reproduced from saved state):")
for _ in range(3):
    print(random.randint(1, 100))
# The numbers in the 'Second sequence' and 'Third sequence' will be identical.

4.2. Probability Distributions

The random module can generate numbers following specific statistical distributions, which is essential for simulations, modeling, and statistical analysis.

Generating numbers based on statistical models

  • 🔑 random.gauss(mu, sigma): Returns a random floating-point number from a Gaussian (Normal) distribution. mu is the mean, and sigma is the standard deviation. It is slightly faster than normalvariate(), but it is not thread-safe: two threads calling it at the same time may receive the same value.
  • Other distributions: random.expovariate(lambd) (exponential), random.lognormvariate(mu, sigma) (lognormal), random.vonmisesvariate(mu, kappa) (circular), etc.

import random

# Generate 10,000 numbers from a Gaussian distribution
# Mean (mu) = 0, Standard Deviation (sigma) = 1 (Standard Normal Distribution)
gaussian_numbers = [random.gauss(0, 1) for _ in range(10000)]

# Plotting a histogram to visualize the distribution
# (Requires matplotlib: pip install matplotlib)
# import matplotlib.pyplot as plt
# plt.hist(gaussian_numbers, bins=50, density=True, alpha=0.7, color='skyblue')
# plt.title("Gaussian (Normal) Distribution Sample")
# plt.xlabel("Value")
# plt.ylabel("Density")
# plt.grid(True)
# plt.show()

If you uncomment the plotting code, the histogram shows the characteristic bell-curve shape of a Normal distribution, centered near 0 with nearly all values between -3 and 3.
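Without plotting, a quick numeric check is to compare sample statistics with the theoretical values: a standard normal has mean 0 and standard deviation 1, and random.expovariate(lambd) has mean 1 / lambd. The seed, sample size, and rate 2.0 below are arbitrary; statistics.fmean() requires Python 3.8 or later.

```python
import random
import statistics

rng = random.Random(123)  # seeded so the check is reproducible
n = 100_000

normal_samples = [rng.gauss(0, 1) for _ in range(n)]
print(f"Normal mean:  {statistics.fmean(normal_samples):.3f} (expected 0)")
print(f"Normal stdev: {statistics.stdev(normal_samples):.3f} (expected 1)")

rate = 2.0  # lambd parameter; the mean of the distribution is 1 / rate
exp_samples = [rng.expovariate(rate) for _ in range(n)]
print(f"Exponential mean: {statistics.fmean(exp_samples):.3f} (expected {1 / rate})")
```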

4.3. Secure Randomness (secrets module)

For applications requiring strong security, where unpredictability is paramount, the random module is inadequate. The secrets module is explicitly designed for cryptographic purposes, leveraging the operating system's most secure random number sources.

Comparison of SystemRandom and the secrets module

The random module includes a class called SystemRandom that, unlike the module's default Mersenne Twister generator, draws from os.urandom() and is therefore suitable for cryptographic use. The secrets module is built on top of SystemRandom and os.urandom(), but provides a higher-level, more convenient API tailored for common security needs like token generation.

| Feature | random module | random.SystemRandom | secrets module |
| --- | --- | --- | --- |
| Randomness source | Mersenne Twister (software algorithm) | os.urandom() (OS-provided source) | os.urandom() (OS-provided source) |
| Security level | Non-cryptographic | Cryptographically secure | Cryptographically secure |
| Use cases | Simulations, games, non-security sampling | Lower-level access to secure random bytes/numbers | Password generation, API keys, security tokens, temporary URLs |
| API style | General-purpose number generation | Instance methods (e.g., SystemRandom().randint()) | High-level, function-based for common security tasks |
Recommendation: For security-critical applications, always prefer the secrets module due to its specialized, high-level functions that reduce the chance of misuse.
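For comparison, here is a brief sketch of random.SystemRandom next to an equivalent secrets call. Because SystemRandom reads from the operating system, it has no reproducible state: seed() is ignored and getstate() raises NotImplementedError.

```python
import random
import secrets

sys_rng = random.SystemRandom()

# Same method names as the random module, but backed by os.urandom()
die_roll = sys_rng.randint(1, 6)
print(f"SystemRandom die roll: {die_roll}")

# secrets exposes a function-based equivalent: randbelow(n) returns 0 <= x < n
secrets_roll = secrets.randbelow(6) + 1
print(f"secrets die roll: {secrets_roll}")

# There is no internal state to save or restore
try:
    sys_rng.getstate()
    has_state = True
except NotImplementedError:
    has_state = False
print(f"SystemRandom supports getstate(): {has_state}")  # False
```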

Generating secure tokens and URLs

The secrets module provides functions tailored for generating data suitable for security applications.

  • 🔑 secrets.token_bytes(nbytes=None): Returns a random byte string containing nbytes random bytes. If nbytes is None or not supplied, a reasonable default is used.
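  • 🔑 secrets.token_hex(nbytes=None): Returns a random text string in hexadecimal containing nbytes random bytes; each byte becomes two hex digits. Suitable for a secure token.
  • 🔑 secrets.token_urlsafe(nbytes=None): Returns a random URL-safe text string containing nbytes random bytes, Base64-encoded, so the result averages about 1.3 characters per byte.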
  • 🔑 secrets.choice(sequence): Returns a randomly chosen element from a non-empty sequence using a cryptographically secure random source.

import secrets

# Generate a secure hexadecimal token (e.g., for API keys)
hex_token = secrets.token_hex(16) # 16 bytes = 32 hex characters
print(f"Secure Hex Token: {hex_token}")

# Generate a URL-safe token (e.g., for password reset links)
url_safe_token = secrets.token_urlsafe(24) # 24 bytes -> 32 Base64 characters
print(f"URL-Safe Token: {url_safe_token}")

# Choose a secure random item from a list (e.g., for a temporary password character)
characters = ['a', 'b', 'c', '1', '2', '3', '!', '@']
secure_char_choice = secrets.choice(characters)
print(f"Secure random character: {secure_char_choice}")

# Generate a temporary password
temp_password_chars = [secrets.choice('abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()') for _ in range(12)]
temp_password = "".join(temp_password_chars)
print(f"Generated temp password: {temp_password}")
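The password example above can produce a password with no digit or symbol at all. One approach, adapted from a recipe in the secrets documentation, is to draw candidates from a string-module alphabet and retry until every required character class appears. The function name make_password and its rules (at least one lowercase letter, uppercase letter, digit, and symbol) are illustrative assumptions.

```python
import secrets
import string

SYMBOLS = "!@#$%^&*()"
ALPHABET = string.ascii_letters + string.digits + SYMBOLS

def make_password(length: int = 12) -> str:
    """Hypothetical helper: random password with at least one char from each class."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit every character class")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            # Rejection sampling keeps every valid password equally likely
            return candidate

password = make_password()
print(f"Generated password: {password}")

# Token lengths from the example above
print(len(secrets.token_hex(16)), len(secrets.token_urlsafe(24)))  # 32 32
```

Retrying instead of forcing characters into fixed positions avoids making the password structure predictable.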

Practice & Application

🎯 Challenge: Secure Lottery Simulation & OTP Generation

You are tasked with building a small system that handles two distinct randomness requirements:
Part 1: Lottery Draw (Non-Cryptographic) Simulate a lottery draw where:

  1. Six unique "main numbers" are drawn from a pool of 1 to 49 (inclusive).
  2. One "bonus number" is drawn from the same pool (1 to 49), ensuring it is unique from the six main numbers.
  3. From a list of "lucky phrases" (e.g., ["Feeling Lucky!", "Big Winner!", "Keep Trying!", "Almost There!"]), select three phrases with replacement. Make "Big Winner!" twice as likely to be chosen as any other phrase.

Part 2: Secure One-Time Passcode (Cryptographic) For a user authentication system, generate:

  1. A 12-character hexadecimal one-time password (OTP) that must be cryptographically secure.
  2. A 32-character URL-safe string for a password reset link.

import random
import secrets

# Part 1: Lottery Draw (Non-Cryptographic)

# 1. Draw six unique main numbers from 1 to 49
pool = list(range(1, 50))
main_numbers = random.sample(pool, k=6)
main_numbers.sort() # Sort for display purposes
print(f"Lottery Main Numbers: {main_numbers}")

# 2. Draw one bonus number, unique from the main numbers
# Remove main numbers from the pool for bonus draw
remaining_pool = [n for n in pool if n not in main_numbers]
bonus_number = random.choice(remaining_pool)
print(f"Lottery Bonus Number: {bonus_number}")

# 3. Select three lucky phrases with weighted probability
lucky_phrases = ["Feeling Lucky!", "Big Winner!", "Keep Trying!", "Almost There!"]
# Assign weights: Big Winner is 2, others are 1
weights = [1, 2, 1, 1]
selected_phrases = random.choices(lucky_phrases, weights=weights, k=3)
print(f"Lucky Dip Phrases: {selected_phrases}")

print("\n" + "="*40 + "\n")

# Part 2: Secure One-Time Passcode (Cryptographic)

# 1. Generate a 12-character hexadecimal OTP
# Each byte becomes 2 hex characters, so 6 bytes for 12 hex chars
otp = secrets.token_hex(6)
print(f"Secure One-Time Password (OTP): {otp}")

# 2. Generate a 32-character URL-safe string
# token_urlsafe() Base64-encodes the bytes: every 3 bytes become 4 characters
# 24 bytes is a multiple of 3, so it yields exactly 32 characters (no padding)
recovery_link_token = secrets.token_urlsafe(24)
print(f"Password Reset Token (URL-safe): {recovery_link_token}")

5. Phase 4: Statistical Analysis

Understanding and summarizing data is a fundamental aspect of many fields, including science, finance, and social studies. Python's statistics module provides a straightforward way to calculate common mathematical statistics of numerical data. While not as extensive as specialized libraries like NumPy or SciPy, it's excellent for basic descriptive statistics on collections of real numbers.

5.1. Measures of Central Tendency (statistics module)

Measures of central tendency aim to describe the "center" or typical value of a dataset. They provide a single value that represents the entire distribution.

From a raw data set: the mean gives the average value, the median gives the middle value of the ordered data, and the mode(s) give the most frequent value(s).

Calculating the "center" of data

  • 🔑 statistics.mean(data): Returns the arithmetic mean ("average") of the data. It's the sum of all values divided by the count of values. Suitable for symmetrically distributed data without extreme outliers.
  • 🔑 statistics.median(data): Returns the median (middle value) of the data. If the data count is odd, it's the middle element. If even, it's the average of the two middle elements. The median is robust to outliers.
  • 🔑 statistics.mode(data): Returns the single most common data point from discrete or nominal data. Since Python 3.8, if several values tie for the highest frequency, it returns the first one encountered in the data instead of raising an error; it raises StatisticsError only if data is empty.

import statistics

data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [1, 1, 2, 3, 4, 5, 5, 5, 6, 7]
data3 = [10, 20, 30, 40, 1000] # Data with an outlier

print(f"Data 1: {data1}")
print(f"  Mean: {statistics.mean(data1)}")    # 5.5
print(f"  Median: {statistics.median(data1)}")  # 5.5 (average of 5 and 6)
# Since Python 3.8, mode() returns the first value encountered when values tie
print(f"  Mode: {statistics.mode(data1)}") # 1 (every value appears once)

print(f"\nData 2: {data2}")
print(f"  Mean: {statistics.mean(data2)}")    # 3.9
print(f"  Median: {statistics.median(data2)}")  # 4.5 (average of 4 and 5)
print(f"  Mode: {statistics.mode(data2)}")    # 5

print(f"\nData 3 (with outlier): {data3}")
print(f"  Mean: {statistics.mean(data3)}")    # (10+20+30+40+1000)/5 = 220 (heavily influenced by 1000)
print(f"  Median: {statistics.median(data3)}")  # 30 (more robust to outlier)
print(f"  Mode: {statistics.mode(data3)}")    # 10 (all values tie, so the first one is returned)
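When the data count is even, statistics.median() averages the two middle values, which can produce a value not present in the data (5.5 for data1 above). The module also offers statistics.median_low() and statistics.median_high(), which always return an actual data point:

```python
import statistics

data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(statistics.median(data1))       # 5.5 (average of the two middle values)
print(statistics.median_low(data1))   # 5   (smaller middle value)
print(statistics.median_high(data1))  # 6   (larger middle value)
```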

Handling multi-modal data

When a dataset has two or more values that occur with the same highest frequency, it is called multi-modal. Since Python 3.8, statistics.mode() does not raise an error in this case; it silently returns the first of the tied values, which can hide the fact that the data is multi-modal (earlier versions raised StatisticsError). To see every mode, use statistics.multimode().

  • 🔑 statistics.multimode(data): Returns a list of the most frequently occurring values in the order they were first encountered in the data. If several values tie for the highest frequency, all of them are returned; if data is empty, an empty list is returned.

import statistics

data_multimodal = [1, 2, 2, 3, 3, 4, 5]

# Since Python 3.8, mode() returns the first of the tied values instead of raising
print(f"Mode for {data_multimodal}: {statistics.mode(data_multimodal)}") # Output: 2

# multimode() reports every value tied for the highest frequency
print(f"Multimode for {data_multimodal}: {statistics.multimode(data_multimodal)}") # Output: [2, 3]

data_no_mode = [1, 2, 3, 4, 5]
print(f"Multimode for {data_no_mode}: {statistics.multimode(data_no_mode)}") # Output: [1, 2, 3, 4, 5] (all unique, so all are modes)

5.2. Measures of Dispersion

Measures of dispersion (or variability) describe how spread out the data points are around the central tendency. Common measures include the range, variance, and standard deviation.

Analyzing data spread

  • 🔑 statistics.variance(data, xbar=None): Calculates the sample variance of data: the sum of the squared differences from the mean, divided by n-1. It's used when the data is a subset (sample) of a larger population. If given, xbar should be the mean of data.
  • 🔑 statistics.stdev(data, xbar=None): Calculates the sample standard deviation of data. The standard deviation is the square root of the variance and provides a measure of spread in the original units of the data.

import statistics

data = [60, 65, 70, 75, 80] # Scores of a sample of students

# Calculate mean first (or let functions do it)
mean_data = statistics.mean(data)
print(f"Data: {data}")
print(f"Mean: {mean_data}")

# Sample Variance
sample_variance = statistics.variance(data)
print(f"Sample Variance: {sample_variance:.2f}") # 62.50

# Sample Standard Deviation
sample_stdev = statistics.stdev(data)
print(f"Sample Standard Deviation: {sample_stdev:.2f}") # 7.91

Distinction between Population and Sample calculations

A crucial distinction in statistics is whether you are analyzing an entire population or just a sample from that population.

  • Population: Refers to the entire group of individuals or instances about which we want to draw conclusions.
  • Sample: A subset of the population used to infer characteristics of the whole population.

The formulas for variance and standard deviation differ slightly depending on whether you're working with a population or a sample. Specifically, sample variance and standard deviation use a denominator of n-1 (Bessel's correction) instead of n to provide an unbiased estimate of the population variance.
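In symbols, for a population of N values with mean μ, and a sample of n values with mean x̄, the two variance formulas are:

$$
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
$$

The standard deviations are the square roots, σ = √σ² and s = √s². For the scores [60, 65, 70, 75, 80] used earlier, the squared deviations from the mean 70 sum to 100 + 25 + 0 + 25 + 100 = 250, so s² = 250 / 4 = 62.5, matching statistics.variance().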

  • 🔑 Population:
    • statistics.pvariance(data, mu=None): Calculates the population variance of data.
    • statistics.pstdev(data, mu=None): Calculates the population standard deviation of data.
  • 🔑 Sample:
    • statistics.variance(data, xbar=None): Calculates the sample variance of data. (Uses n-1 in denominator)
    • statistics.stdev(data, xbar=None): Calculates the sample standard deviation of data. (Uses n-1 in denominator)
| Measure | Population function | Sample function | Variance denominator |
| --- | --- | --- | --- |
| Variance | statistics.pvariance() | statistics.variance() | Population: n; Sample: n-1 |
| Standard deviation | statistics.pstdev() | statistics.stdev() | Population: n; Sample: n-1 |
Key Concept: Bessel's Correction (n-1)

When estimating population variance from a sample, dividing by n tends to underestimate it, because the deviations are measured from the sample mean (which minimizes the sum of squared deviations for that sample) rather than from the unknown population mean. Dividing by n-1 instead, where n is the sample size, corrects this bias. This correction is known as Bessel's correction and is automatically applied by statistics.variance() and statistics.stdev().


import statistics

# Assume this represents an entire small population
population_data = [10, 15, 20, 25, 30]

# Assume this is a sample drawn from a larger population
sample_data = [12, 18, 22, 28]

print(f"Population Data: {population_data}")
print(f"  Population Variance: {statistics.pvariance(population_data):.2f}") # (50.00)
print(f"  Population StDev: {statistics.pstdev(population_data):.2f}")   # (7.07)

print(f"\nSample Data: {sample_data}")
print(f"  Sample Variance: {statistics.variance(sample_data):.2f}")    # (45.33, using n-1=3)
print(f"  Sample StDev: {statistics.stdev(sample_data):.2f}")      # (6.73, using n-1=3)

# To illustrate the difference for the *same* data
# If we treat population_data as a sample:
print(f"\nPopulation Data treated as Sample:")
print(f"  Sample Variance: {statistics.variance(population_data):.2f}")    # (62.50, using n-1=4)
print(f"  Sample StDev: {statistics.stdev(population_data):.2f}")      # (7.91, using n-1=4)
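To make the n versus n-1 difference concrete, this sketch computes both variances directly from the formulas (the helper name squared_deviations is our own) and checks them against the statistics functions, using the same population_data as above.

```python
import math
import statistics

population_data = [10, 15, 20, 25, 30]

def squared_deviations(values):
    """Hypothetical helper: sum of squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

ss = squared_deviations(population_data)  # 100 + 25 + 0 + 25 + 100 = 250
n = len(population_data)

manual_pvariance = ss / n        # divide by n     -> 50.0
manual_variance = ss / (n - 1)   # divide by n - 1 -> 62.5

print(manual_pvariance, statistics.pvariance(population_data))
print(manual_variance, statistics.variance(population_data))
print(f"{math.sqrt(manual_variance):.2f}")  # 7.91, same as statistics.stdev()
```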

Practice & Application

🎯 Challenge: Analyzing Student Exam Scores

You have been provided with a list of exam scores for a group of students. Your task is to perform a basic statistical analysis using Python's statistics module.

Use the following dataset: exam_scores = [78, 85, 92, 78, 95, 88, 80, 92, 78, 85, 90, 92, 65, 100]

  1. Calculate and print the mean of the exam scores.
  2. Calculate and print the median of the exam scores.
  3. Find the mode using statistics.mode(), then use statistics.multimode() to find all of the most frequent scores. Explain why the two results differ (since Python 3.8, mode() returns the first of several tied values instead of raising an error).
  4. Assuming these scores represent a sample of student performance, calculate and print the sample variance and sample standard deviation.
  5. Now, assuming these scores represent the entire population of a very small class, calculate and print the population variance and population standard deviation.
  6. Compare the sample and population standard deviation values and briefly explain the reason for any difference.

import statistics

exam_scores = [78, 85, 92, 78, 95, 88, 80, 92, 78, 85, 90, 92, 65, 100]

print(f"Exam Scores: {exam_scores}")
print(f"Number of scores: {len(exam_scores)}")

# 1. Calculate the mean
mean_score = statistics.mean(exam_scores)
print(f"1. Mean Score: {mean_score:.2f}")

# 2. Calculate the median
median_score = statistics.median(exam_scores)
print(f"2. Median Score: {median_score:.2f}")

# 3. Find the mode(s)
print("\n3. Mode(s):")
# Since Python 3.8, mode() returns the first of several tied values instead of raising
single_mode = statistics.mode(exam_scores)
print(f"  statistics.mode(): {single_mode}") # 78 (first of the tied values)
multi_modes = statistics.multimode(exam_scores)
print(f"  statistics.multimode(): {multi_modes}") # [78, 92] (each appears 3 times)
if len(multi_modes) > 1:
    print("  The scores are multi-modal, so mode() reported only the first tied value.")

# 4. Calculate sample variance and standard deviation
print("\n4. Sample Statistics:")
sample_variance = statistics.variance(exam_scores)
print(f"  Sample Variance: {sample_variance:.2f}")
sample_stdev = statistics.stdev(exam_scores)
print(f"  Sample Standard Deviation: {sample_stdev:.2f}")

# 5. Calculate population variance and standard deviation
print("\n5. Population Statistics:")
population_variance = statistics.pvariance(exam_scores)
print(f"  Population Variance: {population_variance:.2f}")
population_stdev = statistics.pstdev(exam_scores)
print(f"  Population Standard Deviation: {population_stdev:.2f}")

# 6. Comparison and Explanation
print("\n6. Comparison and Explanation:")
print(f"The Sample Standard Deviation ({sample_stdev:.2f}) is higher than the Population Standard Deviation ({population_stdev:.2f}).")
print("This is because the sample standard deviation (and variance) uses Bessel's correction (dividing by n-1 instead of n).")
print("Bessel's correction accounts for the fact that a sample's variability tends to underestimate the true variability of the larger population from which it was drawn, providing a less biased estimate.")

6. Phase 5: Data Integrity and Hashing

In the digital world, ensuring the integrity of data and the security of sensitive information like passwords is paramount. Cryptographic hash functions provide a robust mechanism for achieving these goals. Python's hashlib module offers a comprehensive suite of secure hash algorithms.

6.1. Cryptographic Hashing (hashlib module)

Understanding "One-way" hash functions

A cryptographic hash function is a mathematical algorithm that maps data of arbitrary size (the "message") to a bit array of a fixed size (the "hash value," "hash," or "message digest"). These functions are specifically designed to be "one-way" and possess several critical properties:

Key Concept: Properties of Cryptographic Hash Functions
  • Deterministic: The same input message always produces the same hash value.
  • Quick Computation: It's computationally efficient to compute the hash value for any given message.
  • Pre-image Resistance (One-Way Property): Given a hash value, it's computationally infeasible to find the original input message.
  • Second Pre-image Resistance: Given an input message M1, it's computationally infeasible to find a different input message M2 that has the same hash value as M1.
  • Collision Resistance: It's computationally infeasible to find two different input messages M1 and M2 that produce the same hash value.
Input data (e.g., "Hello World!") → cryptographic hash function (e.g., SHA-256) → fixed-size hash value (e.g., "d2ed...a3c"). The reverse direction, from the hash value back to the original data, is infeasible (one-way).

Common applications include data integrity verification (if a file's hash changes, the file has been altered), digital signatures, and secure password storage.
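Two of these properties are easy to observe with hashlib.sha256(), which accepts bytes directly: hashing the same input twice gives the same digest, while changing a single character (here "!" to ".") produces an unrelated-looking digest of the same fixed length.

```python
import hashlib

digest_1 = hashlib.sha256(b"Hello World!").hexdigest()
digest_2 = hashlib.sha256(b"Hello World!").hexdigest()
digest_3 = hashlib.sha256(b"Hello World.").hexdigest()

print(digest_1)
print(digest_3)
print(f"Deterministic: {digest_1 == digest_2}")  # True
print(f"Same length: {len(digest_1) == len(digest_3) == 64}")  # True

# Count hex positions that differ between the two digests
differing = sum(a != b for a, b in zip(digest_1, digest_3))
print(f"Differing hex characters: {differing} of 64")
```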

Implementing various hash algorithms

The hashlib module provides constructors for various common hash algorithms. To use them, you typically create a hash object, feed it bytes (strings must be encoded), and then retrieve the digest in hexadecimal or binary format.

  • 🔑 SHA-2 (Secure Hash Algorithm 2): A family of cryptographic hash functions published by NIST. SHA-256 is commonly used.
  • 🔑 SHA-3 (Secure Hash Algorithm 3): The latest generation of the Secure Hash Algorithm, chosen through a public competition. Provides a different construction from SHA-2.
  • 🔑 BLAKE2: A cryptographically secure hash function that is typically faster than SHA-2 and SHA-3 in software on modern processors, while offering comparable security strength.

import hashlib

data = "This is a secret message to be hashed.".encode('utf-8') # Input must be bytes!

# SHA-256
sha256_hash_obj = hashlib.sha256()
sha256_hash_obj.update(data)
sha256_digest = sha256_hash_obj.hexdigest()
print(f"SHA-256 Hash: {sha256_digest}")
print(f"Length (chars): {len(sha256_digest)}") # 256 bits = 64 hex characters

# SHA-3 (e.g., SHA-3 256-bit variant)
sha3_hash_obj = hashlib.sha3_256()
sha3_hash_obj.update(data)
sha3_digest = sha3_hash_obj.hexdigest()
print(f"SHA-3 256-bit Hash: {sha3_digest}")
print(f"Length (chars): {len(sha3_digest)}") # 256 bits = 64 hex characters

# BLAKE2s (smaller, faster, 256-bit digest)
blake2s_hash_obj = hashlib.blake2s()
blake2s_hash_obj.update(data)
blake2s_digest = blake2s_hash_obj.hexdigest()
print(f"BLAKE2s Hash: {blake2s_digest}")
print(f"Length (chars): {len(blake2s_digest)}") # 256 bits = 64 hex characters

# BLAKE2b (larger, 512-bit digest)
blake2b_hash_obj = hashlib.blake2b()
blake2b_hash_obj.update(data)
blake2b_digest = blake2b_hash_obj.hexdigest()
print(f"BLAKE2b Hash: {blake2b_digest}")
print(f"Length (chars): {len(blake2b_digest)}") # 512 bits = 128 hex characters
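Calling update() repeatedly is equivalent to hashing the concatenated data, which makes it possible to hash large files in fixed-size chunks instead of loading them into memory. A minimal sketch follows; the function name file_sha256 and the 64 KiB chunk size are illustrative choices (Python 3.11+ also provides hashlib.file_digest() for this task).

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hypothetical helper: SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):  # empty bytes at EOF ends the loop
            hasher.update(chunk)
    return hasher.hexdigest()

# Demonstration with a temporary file
payload = b"This is a secret message to be hashed." * 10_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

try:
    chunked = file_sha256(tmp_path)
    one_shot = hashlib.sha256(payload).hexdigest()
    print(f"Chunked digest matches one-shot digest: {chunked == one_shot}")  # True
finally:
    os.remove(tmp_path)
```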

Identifying and avoiding deprecated algorithms

The field of cryptography is constantly evolving. Algorithms once considered secure can become vulnerable due to new techniques for breaking codes (cryptanalysis) or increases in computing power. It's vital to stay current and avoid using outdated (deprecated) algorithms for new security-sensitive applications.

  • MD5 (Message-Digest Algorithm 5):

    Once widely used, MD5 has been shown to be severely compromised by collision attacks. This means it is feasible to find two different inputs that produce the same MD5 hash, undermining its integrity verification and digital signature uses.

  • SHA-1 (Secure Hash Algorithm 1):

    Similar to MD5, practical collision attacks have been demonstrated against SHA-1. While harder to execute than MD5 collisions, it is no longer considered safe for cryptographic purposes. Major browsers and security organizations have phased out support or flagged SHA-1 certificates as insecure.

Critical Warning: Do NOT use MD5 or SHA-1 for new cryptographic applications!

These algorithms are broken and susceptible to attacks that could compromise data integrity, digital signatures, and password security. Always use modern, strong algorithms like SHA-256, SHA-3, or BLAKE2 for security-sensitive tasks.

  • MD5: Broken
  • SHA-1: Deprecated
  • SHA-256: Good
  • SHA-3 / BLAKE2: Excellent
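You can check which algorithms every Python build guarantees with hashlib.algorithms_guaranteed. If MD5 must still be computed for a non-security purpose, such as matching a legacy checksum, Python 3.9+ lets you state that intent with usedforsecurity=False. The flag documents intent and may be needed on restricted (e.g., FIPS-configured) builds, but it does not make MD5 any safer. The input bytes below are illustrative.

```python
import hashlib

print(sorted(hashlib.algorithms_guaranteed))

# Preferred for new security-sensitive work
modern = hashlib.sha256(b"release-1.0.tar.gz contents").hexdigest()

# Legacy, non-security checksum only (Python 3.9+ keyword argument)
legacy = hashlib.md5(b"release-1.0.tar.gz contents", usedforsecurity=False).hexdigest()

print(f"SHA-256: {modern}")
print(f"MD5 (legacy checksum only): {legacy}")
```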

6.2. Password Security

Storing user passwords requires special care. Directly hashing passwords, even with strong algorithms, is not enough. Attackers can employ techniques like rainbow tables and brute-force attacks. Two essential techniques mitigate these risks: salting and key derivation functions (KDFs).

Salting to prevent Rainbow Table attacks

A rainbow table is a precomputed table for reversing cryptographic hash functions, typically used to crack password hashes. If many users have common passwords (e.g., "password123"), their hashes will be identical. An attacker can compute the hash of "password123" once, then look for that hash across all stolen user hashes.

Salting prevents this by adding a unique, random string (the "salt") to each password *before* hashing. This means even if two users choose the same password, their combined (password + salt) inputs will be different, resulting in different hash values.

  • 🔑 Salt: A random, unique string (or sequence of bytes) generated for each password.
  • 🔑 Process: The password and its unique salt are concatenated, and then the combined string is hashed. The salt is stored in plain text alongside the hash.
  • Benefit: A single rainbow table cannot be used to crack multiple passwords. Each password requires a new, independent hash calculation attempt.
Flow: user chooses a password → generate a random, unique salt → concatenate (password + salt) → hash function (e.g., SHA-256) → store (salt, hash).
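The salting flow can be sketched with hashlib and os.urandom(): two hypothetical users with the same password get different stored hashes because their salts differ. This uses a single fast SHA-256 pass purely to illustrate salting; as the next subsection explains, real password storage should use a slow key derivation function instead.

```python
import hashlib
import os

password = "password123".encode("utf-8")

# Each user gets a fresh 16-byte random salt
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

# Concatenate (password + salt) and hash (illustration only, not for production)
hash_alice = hashlib.sha256(password + salt_alice).hexdigest()
hash_bob = hashlib.sha256(password + salt_bob).hexdigest()

# Store (salt, hash) pairs; the salt itself is not secret
stored_alice = (salt_alice.hex(), hash_alice)
stored_bob = (salt_bob.hex(), hash_bob)

print(stored_alice)
print(stored_bob)
print(f"Same password, same hash? {hash_alice == hash_bob}")  # False
```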

Key derivation for slow brute-force attempts

Even with salting, if an attacker gets access to hashed passwords, they can still try to brute-force individual password hashes by guessing passwords, salting them, and comparing the result to the stored hash. Modern CPUs are incredibly fast at computing standard hash functions, making brute-forcing weak passwords a significant threat.

Key Derivation Functions (KDFs) are specifically designed to be computationally expensive (slow) to execute. This intentional slowness makes brute-force attacks prohibitively time-consuming for attackers, even with powerful hardware.

  • 🔑 PBKDF2 (Password-Based Key Derivation Function 2): A widely recommended KDF standard.
  • 🔑 Work Factor (Iterations): KDFs like PBKDF2 involve repeating the hashing process many times (thousands or millions of "iterations"). This value should be adjusted over time as computational power increases.
  • Benefit: Significantly increases the time and resources required for an attacker to test each password guess, making brute-force attacks impractical.
Key Concept: Hashing vs. Key Derivation

While standard hash functions like SHA-256 are fast and good for integrity checking, they are *not* designed to resist brute-force password cracking. KDFs like PBKDF2, bcrypt, or scrypt are purpose-built for password hashing by intentionally adding computational cost.

The hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None) function is Python's standard library implementation for PBKDF2.

  • hash_name: The desired hash algorithm (e.g., 'sha256').
  • password: The user's password (as bytes).
  • salt: A unique, random salt (as bytes).
  • iterations: The number of hashing rounds. This should be as high as feasible without impacting user experience too much (e.g., 100,000 to 600,000+).
  • dklen: The desired length of the derived key in bytes. If None, it defaults to the digest size of hash_name (32 bytes for 'sha256').

import hashlib
import os

# User's raw password
password_raw = "mySuperSecretPassword123!"

# Generate a random salt for this user (should be unique for each user)
# A good practice is to use at least 16 bytes for the salt.
salt = os.urandom(16)
print(f"Generated Salt (hex): {salt.hex()}")

# Convert password to bytes
password_bytes = password_raw.encode('utf-8')

# Number of iterations (work factor): tune it to your hardware and raise it over time.
# Values in the hundreds of thousands are typical for PBKDF2-HMAC-SHA256.
iterations = 300_000  # Underscores improve readability

# Derived Key Length (e.g., 32 bytes for a 256-bit key)
dklen = 32

# Perform PBKDF2_HMAC
derived_key = hashlib.pbkdf2_hmac(
    'sha256',          # Hash algorithm to use internally
    password_bytes,    # The password as bytes
    salt,              # The salt as bytes
    iterations,        # The number of iterations
    dklen=dklen        # The length of the derived key
)

# Store the salt, the iteration count, and derived_key (the hash) in your database.
# Encode the bytes as hex or base64 if the storage column is text-based.
print(f"Stored Hash (hex): {derived_key.hex()}")

# To verify a password:
# 1. Retrieve the stored_salt, stored_iterations, and stored_hash for the user from the database.
# 2. Get the password_attempt from the user.
# 3. Recompute new_derived_key = hashlib.pbkdf2_hmac('sha256', password_attempt.encode('utf-8'), stored_salt, stored_iterations, dklen=dklen)
# 4. Compare with secrets.compare_digest(new_derived_key, stored_hash), a constant-time comparison, instead of ==
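The verification steps in the comments above can be packaged into two helper functions. This is a sketch under stated assumptions: `hash_password`, `verify_password`, and the `pbkdf2_sha256$<iterations>$<salt_hex>$<hash_hex>` storage string are hypothetical choices for this example, not a standard format. Storing the iteration count alongside the salt lets you raise the work factor later without breaking existing accounts.

```python
import hashlib
import os
import secrets

# Hypothetical storage format: "pbkdf2_sha256$<iterations>$<salt_hex>$<hash_hex>"
ALGORITHM = "sha256"
DEFAULT_ITERATIONS = 300_000

def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Derive a key with a fresh random salt and return a single storable string."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(ALGORITHM, password.encode("utf-8"), salt, iterations, dklen=32)
    return f"pbkdf2_{ALGORITHM}${iterations}${salt.hex()}${derived.hex()}"

def verify_password(password_attempt: str, stored: str) -> bool:
    """Re-derive the key with the stored parameters and compare in constant time."""
    scheme, iterations_str, salt_hex, hash_hex = stored.split("$")
    if scheme != f"pbkdf2_{ALGORITHM}":
        raise ValueError(f"Unsupported scheme: {scheme}")
    stored_hash = bytes.fromhex(hash_hex)
    attempt = hashlib.pbkdf2_hmac(
        ALGORITHM,
        password_attempt.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations_str),
        dklen=len(stored_hash),
    )
    return secrets.compare_digest(attempt, stored_hash)

record = hash_password("mySuperSecretPassword123!")
print(verify_password("mySuperSecretPassword123!", record))  # True
print(verify_password("wrong-guess", record))                # False
```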
Critical Warning: Never store raw passwords!

Always store password hashes using a combination of a unique salt and a strong Key Derivation Function like PBKDF2, bcrypt, or scrypt. The secrets module's compare_digest() function should be used for comparing hashes to prevent timing attacks.

Homework / Challenges

📚 Homework 1: Exact Financial Calculations

You are managing a small investment portfolio and need to ensure absolute precision for all monetary values and share counts. Use the decimal and fractions modules to perform the following calculations:

  1. Initial Investment: You buy 150 shares of "TechCo" at $23.45 per share. Calculate the total initial investment. Represent all monetary values using Decimal and ensure calculations are done with a precision of 4 decimal places for currency.
  2. Stock Split: TechCo announces a 3-for-2 stock split. This means for every 2 shares you own, you now get 3. Calculate your new total number of shares using Fraction to maintain exactness, as share counts must be whole numbers.
  3. Profit Calculation: After the split, you sell 200 shares at $15.63 per share. Calculate the revenue from this sale using Decimal. Then, calculate your overall profit or loss by comparing your total revenue (from this sale, and assuming no other sales) against your initial investment.
  4. Remaining Shares: How many shares do you have left after the sale? Ensure this is an exact integer.

from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

# Currency results are rounded to 4 decimal places with quantize().
# getcontext().prec sets significant digits, not decimal places, so the
# default precision (28 digits) is kept for intermediate arithmetic.
FOUR_PLACES = Decimal('0.0001')

# 1. Initial Investment
shares_bought = 150
price_per_share = Decimal('23.45')
initial_investment = (shares_bought * price_per_share).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
print(f"1. Initial Investment: ${initial_investment}")

# 2. Stock Split (3-for-2): every 2 shares become 3
split_ratio = Fraction(3, 2)
new_shares_fraction = shares_bought * split_ratio  # Fraction(225, 1)
# Share counts must be whole numbers, so check before converting
if new_shares_fraction.denominator != 1:
    raise ValueError(f"Split produces a fractional share count: {new_shares_fraction}")
new_shares = int(new_shares_fraction)
print(f"2. After 3-for-2 split: {new_shares} shares")

# 3. Revenue and profit/loss
shares_sold = 200
sale_price_per_share = Decimal('15.63')
revenue_from_sale = (shares_sold * sale_price_per_share).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
print(f"3. Revenue from sale: ${revenue_from_sale}")

# Overall profit/loss as the task defines it: revenue from this sale vs. the total initial investment
overall_profit_loss = revenue_from_sale - initial_investment
print(f"   Overall profit/loss vs. initial investment: ${overall_profit_loss}")

# Optional: profit/loss on the sold shares only, using the split-adjusted cost basis.
# Decimal and Fraction cannot be multiplied together directly (TypeError), so convert
# the price to an exact Fraction first, then back to Decimal for the currency result.
adjusted_cost_per_share = Fraction(price_per_share) / split_ratio  # 23.45 * 2/3, kept exact
cost_of_sold = shares_sold * adjusted_cost_per_share
cost_of_sold_decimal = (Decimal(cost_of_sold.numerator) / Decimal(cost_of_sold.denominator)).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
profit_on_sold_shares = revenue_from_sale - cost_of_sold_decimal
print(f"   Profit/loss on the {shares_sold} sold shares (adjusted cost basis): ${profit_on_sold_shares}")

# 4. Remaining Shares
remaining_shares = new_shares - shares_sold
print(f"4. Remaining shares: {remaining_shares}")
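A common pitfall with task 1's "4 decimal places" requirement: getcontext().prec counts significant digits, not digits after the decimal point. The sketch below uses decimal.localcontext() so the global context is left untouched, and compares a precision of 4 with quantize() on the same multiplication.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP

total = Decimal("150") * Decimal("23.45")   # default context: 28 significant digits

with localcontext() as ctx:
    ctx.prec = 4                             # 4 *significant digits*, not 4 decimal places
    rounded_by_prec = Decimal("150") * Decimal("23.45")

# quantize() fixes the number of decimal places instead
quantized = total.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

print(total)            # 3517.50
print(rounded_by_prec)  # 3518
print(quantized)        # 3517.5000
```

With prec = 4 the initial investment silently loses fifty cents, which is exactly the kind of error the decimal module is meant to prevent.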

📚 Homework 2: Analyzing a Biased Coin Experiment

You suspect a coin is biased towards "Heads" (H). You decide to conduct an experiment by flipping the coin many times and then analyze the results statistically.

  1. Simulate Coin Flips: Write a function simulate_flips(num_flips, heads_prob, seed=None) that simulates num_flips coin flips. Each flip should have a heads_prob chance of being 'H' and (1 - heads_prob) chance of being 'T'. Use random.choices() for this. If a seed is provided, use random.seed() to initialize the generator for reproducibility.
  2. Experiment Setup: Call your function to simulate 1000 coin flips with a heads_prob = 0.6. Store the results (a list of 'H' or 'T').
  3. Statistical Analysis:
    • Calculate the actual proportion of 'H' and 'T' in your simulation results.
    • Calculate the number of consecutive 'H' streaks of length 2 or more (e.g., 'HH', 'HHH', etc.) and 'T' streaks of length 2 or more.
    • (Optional: If you converted results to numbers, calculate mean/median).
  4. Reproducibility Test:
    • Run the simulate_flips function once with num_flips = 10, heads_prob = 0.5, and seed = 42. Print the results.
    • Save the state of the random generator after these 10 flips.
    • Generate 5 more flips without a seed.
    • Restore the saved state, and then generate those same 5 flips again. Confirm they are identical.

import random
import statistics

def simulate_flips(num_flips, heads_prob, seed=None):
    if seed is not None:
        random.seed(seed)
    
    outcomes = ['H', 'T']
    weights = [heads_prob, 1 - heads_prob]
    
    flips = random.choices(outcomes, weights=weights, k=num_flips)
    return flips

# 2. Experiment Setup
num_flips_experiment = 1000
heads_probability = 0.6
experiment_results = simulate_flips(num_flips_experiment, heads_probability)

# 3. Statistical Analysis
count_H = experiment_results.count('H')
count_T = experiment_results.count('T')
prop_H = count_H / num_flips_experiment
prop_T = count_T / num_flips_experiment

print(f"--- Coin Flip Experiment ({num_flips_experiment} flips, P(H)={heads_probability}) ---")
print(f"  Actual Heads: {count_H} ({prop_H:.2%})")
print(f"  Actual Tails: {count_T} ({prop_T:.2%})")

# Count streaks: maximal runs of the same face with length 2 or more.
# itertools.groupby collapses consecutive equal items into (face, group) pairs.
from itertools import groupby

run_lengths = [(face, len(list(group))) for face, group in groupby(experiment_results)]
streaks_H = sum(1 for face, length in run_lengths if face == 'H' and length >= 2)
streaks_T = sum(1 for face, length in run_lengths if face == 'T' and length >= 2)

print(f"  'H' streaks of length >= 2: {streaks_H}")
print(f"  'T' streaks of length >= 2: {streaks_T}")

# Optional: encode H as 1 and T as 0, then summarize with the statistics module
numeric_results = [1 if face == 'H' else 0 for face in experiment_results]
print(f"  Mean (proportion of heads): {statistics.mean(numeric_results):.3f}")
print(f"  Median: {statistics.median(numeric_results)}")

# 4. Reproducibility Test
print("\n--- Reproducibility Test ---")

# Run first 10 flips with a seed
first_10_flips = simulate_flips(10, 0.5, seed=42)
print(f"  First 10 flips (seeded): {''.join(first_10_flips)}")

# Save state
saved_state = random.getstate()

# Generate 5 more flips
next_5_flips_1 = simulate_flips(5, 0.5) # No seed, continues from last state
print(f"  Next 5 flips (continued): {''.join(next_5_flips_1)}")

# Restore state
random.setstate(saved_state)

# Generate 5 more flips again (should be identical to next_5_flips_1)
next_5_flips_2 = simulate_flips(5, 0.5)
print(f"  Next 5 flips (restored): {''.join(next_5_flips_2)}")

print(f"  Are the two sets of 5 flips identical? {next_5_flips_1 == next_5_flips_2}")
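To go one step further in testing the suspicion that the coin favors heads, a one-sided z-test against a fair coin is a natural follow-up. This is an optional sketch, not part of the assignment: `heads_z_test` is a hypothetical helper, and it relies on the normal approximation to the binomial, which is reasonable when n·p0 and n·(1 − p0) are both large (as with 1000 flips). statistics.NormalDist provides the normal CDF.

```python
import math
from statistics import NormalDist

def heads_z_test(heads: int, n: int, p0: float = 0.5):
    """One-sided z-test (normal approximation) for H1: P(H) > p0.

    Returns (z, p_value).
    """
    expected = n * p0
    std_dev = math.sqrt(n * p0 * (1 - p0))
    z = (heads - expected) / std_dev
    # Upper-tail probability; cdf(-z) avoids precision loss from 1 - cdf(z)
    p_value = NormalDist().cdf(-z)
    return z, p_value

# Illustrative numbers: 600 heads in 1000 flips
z, p = heads_z_test(600, 1000)
print(f"z = {z:.2f}, one-sided p-value = {p:.2e}")
```

In the experiment above you would call heads_z_test(count_H, num_flips_experiment); a very small p-value means a fair coin would rarely produce that many heads.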

📚 Homework 3: File Integrity and Secure Password Storage

This challenge combines concepts from data integrity and password security using the hashlib, os, and secrets modules.

  1. File Integrity Check:
    • Create a text file named document.txt with some arbitrary content (e.g., "This is the original document.").
    • Calculate its SHA-256 hash. Print the hash.
    • Modify document.txt (e.g., add "Updated content.") and calculate its SHA-256 hash again. Print the new hash and observe the difference.
  2. Secure Password Storage Simulation:
    • Define a dummy user password (e.g., "StrongPa$$w0rd!").
    • Generate a unique, cryptographically secure 16-byte salt using os.urandom(). Print this salt in hexadecimal format.
    • Hash the password using hashlib.pbkdf2_hmac() with 'sha256', your generated salt, and at least 250,000 iterations. Use a derived key length (dklen) of 32 bytes. Print the resulting hash in hexadecimal format.
  3. Password Verification Simulation:
    • Simulate a login attempt with the correct password. Use the stored salt and iterations to re-hash the submitted password. Compare the newly generated hash with the stored hash using secrets.compare_digest(). Print whether the login was successful.
    • Simulate a login attempt with an incorrect password (e.g., "WrongPa$$w0rd!"). Repeat the hashing and comparison process. Print whether this login was successful.

import hashlib
import os
import secrets # Used for compare_digest

# --- Part 1: File Integrity Check ---
file_name = "document.txt"
original_content = "This is the original document. It contains important information."

# Create the original file
with open(file_name, "w") as f:
    f.write(original_content)

print("1. File Integrity Check:")

# Calculate hash of the original file
with open(file_name, "rb") as f:
    original_file_bytes = f.read()
    original_hash = hashlib.sha256(original_file_bytes).hexdigest()
print(f"  Original '{file_name}' SHA-256 hash: {original_hash}")

# Modify the file
modified_content = original_content + "\nUpdated content added later."
with open(file_name, "w") as f:
    f.write(modified_content)

# Calculate hash of the modified file
with open(file_name, "rb") as f:
    modified_file_bytes = f.read()
    modified_hash = hashlib.sha256(modified_file_bytes).hexdigest()
print(f"  Modified '{file_name}' SHA-256 hash: {modified_hash}")
print(f"  Hashes are different: {original_hash != modified_hash}")

# Clean up the dummy file
os.remove(file_name)


# --- Part 2: Secure Password Storage Simulation ---
print("\n2. Secure Password Storage Simulation:")

# Dummy user password
user_password_raw = "StrongPa$$w0rd!"

# Generate a unique, cryptographically secure salt (16 bytes)
stored_salt = os.urandom(16)
print(f"  Generated Salt (hex): {stored_salt.hex()}")

# Convert password to bytes
password_bytes = user_password_raw.encode('utf-8')

# Number of iterations for PBKDF2 (adjust for desired work factor)
iterations = 250_000

# Desired derived key length (32 bytes = 256 bits)
dklen = 32

# Hash the password
stored_hash = hashlib.pbkdf2_hmac(
    'sha256',
    password_bytes,
    stored_salt,
    iterations,
    dklen=dklen
)
print(f"  Stored Hash (hex): {stored_hash.hex()}")


# --- Part 3: Password Verification Simulation ---
print("\n3. Password Verification Simulation:")

# --- Correct password attempt ---
login_attempt_correct = "StrongPa$$w0rd!"
attempt_password_bytes_correct = login_attempt_correct.encode('utf-8')

# Re-hash the submitted password with the stored salt and iterations
attempt_hash_correct = hashlib.pbkdf2_hmac(
    'sha256',
    attempt_password_bytes_correct,
    stored_salt,
    iterations,
    dklen=dklen
)

# Compare hashes using secrets.compare_digest for security against timing attacks
if secrets.compare_digest(stored_hash, attempt_hash_correct):
    print(f"  Login with correct password '{login_attempt_correct}': SUCCESS")
else:
    print(f"  Login with correct password '{login_attempt_correct}': FAILED (ERROR!)")

# --- Incorrect password attempt ---
login_attempt_incorrect = "WrongPa$$w0rd!"
attempt_password_bytes_incorrect = login_attempt_incorrect.encode('utf-8')

# Re-hash the submitted password with the stored salt and iterations
attempt_hash_incorrect = hashlib.pbkdf2_hmac(
    'sha256',
    attempt_password_bytes_incorrect,
    stored_salt,
    iterations,
    dklen=dklen
)

# Compare hashes
if secrets.compare_digest(stored_hash, attempt_hash_incorrect):
    print(f"  Login with incorrect password '{login_attempt_incorrect}': SUCCESS (ERROR!)")
else:
    print(f"  Login with incorrect password '{login_attempt_incorrect}': FAILED")
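The file-integrity part reads the whole file into memory with f.read(), which is fine for a small document but not for large files. A common alternative, sketched below with the hypothetical helper `sha256_of_file`, feeds the hash object fixed-size chunks through update(). The 64 KiB chunk size is an arbitrary choice. On Python 3.11 and later, hashlib.file_digest(f, "sha256") does the same job for a file opened in binary mode.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never need to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Demo on a temporary file
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as tmp:
    tmp.write(b"This is the original document.")
    tmp_path = tmp.name

try:
    print(sha256_of_file(tmp_path))
    # Chunked hashing gives the same digest as hashing all bytes at once
    print(sha256_of_file(tmp_path) == hashlib.sha256(b"This is the original document.").hexdigest())  # True
finally:
    os.remove(tmp_path)
```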

