Dimensionality Reduction using Haar Wavelet Transform: Theory & Implementation

4 min read · Apr 18, 2024

Dimensionality reduction is a crucial technique in machine learning and data analysis, aimed at reducing the number of features or variables in a dataset while preserving its essential information. One powerful approach to dimensionality reduction is through the use of wavelet transforms, which decompose signals into localized frequency components. In this blog post, we’ll explore how to implement Haar Wavelet Transform, a simple yet effective wavelet transform, for dimensionality reduction in Python.

Understanding Haar Wavelet Transform

Introducing Haar Wavelet:
The Haar wavelet, named after the mathematician Alfréd Haar, is the simplest wavelet. Its mother wavelet is a square pulse that equals +1 on the first half of the unit interval and -1 on the second half. In discrete form, this means each pair of neighboring samples is replaced by a scaled sum (the approximation) and a scaled difference (the detail). Despite its simplicity, the Haar wavelet has been widely used in signal and image processing due to its fast computation and ability to capture sharp transitions and edges.
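
For reference, the standard textbook definition of the Haar mother wavelet and its scaling function can be written as follows. The symbols \psi, \phi and t are notation introduced here, not taken from the walkthrough below.

```latex
\psi(t) =
\begin{cases}
  \phantom{-}1, & 0 \le t < \tfrac{1}{2}, \\
  -1,           & \tfrac{1}{2} \le t < 1, \\
  \phantom{-}0, & \text{otherwise},
\end{cases}
\qquad
\phi(t) =
\begin{cases}
  1, & 0 \le t < 1, \\
  0, & \text{otherwise}.
\end{cases}
```

The scaling function \phi produces the approximation (local average) and the wavelet \psi produces the detail (local difference).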

Wavelet Transform:
Wavelet Transform is a mathematical operation that decomposes a signal into wavelets at different scales and positions. In the case of Haar Wavelet Transform, the signal is decomposed into Haar wavelets, which capture local features and discontinuities in the signal.
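
To make the decomposition concrete, here is a minimal NumPy-only sketch of one level of the orthonormal Haar transform. The function name haar_step is hypothetical, and the sketch assumes an even-length 1-D signal; the sign convention (even sample minus odd sample) is chosen to agree with what pywt.dwt(x, 'haar') is expected to return.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2 != 0:
        raise ValueError("this sketch assumes an even number of samples")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # scaled pairwise sums: low-frequency part
    detail = (even - odd) / np.sqrt(2)   # scaled pairwise differences: high-frequency part
    return approx, detail

approx, detail = haar_step([1.0, 2.0, 3.0, 4.0])
print("approximation:", approx)
print("detail:", detail)
```

Each output has half as many values as the input, which is where the dimensionality reduction in the rest of this post comes from.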

Implementation in Python

Setting Up the Environment:
Before we dive into the implementation, make sure you have Python installed on your system along with NumPy and PyWavelets. PyWavelets depends on NumPy, so installing it with pip brings in both:

pip install PyWavelets

Now, let’s break down the Python code step-by-step:

1. Import Libraries:

import numpy as np
import pywt

numpy (np): This library provides powerful tools for numerical computations and array manipulation. It's essential for working with data in Python.
pywt: This is the import name of the PyWavelets library, which deals specifically with wavelet transforms. We'll use its functions for applying the discrete wavelet transform (DWT) and thresholding.

2. Sample Data:

# Sample data (replace with your actual data)
data = np.random.rand(8, 20)

This line creates a sample data matrix (data) with 8 rows and 20 columns, filled with random values between 0 and 1. You'll replace this with your actual data in practice. Each row is treated as a separate 1-D signal, and the values must be numerical for the wavelet transform to apply.
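
If you want the sample data to be the same on every run, one option is a seeded random generator. This is a small sketch, not part of the original code; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded generator makes the random sample data reproducible.
rng = np.random.default_rng(seed=42)   # seed value is an arbitrary choice
data = rng.random((8, 20))             # same layout as above: 8 rows, 20 columns
print(data.shape)
```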

3. Define Threshold:

# Define threshold (experiment with different values)
threshold = 0.1

This line defines the threshold value (threshold) used to shrink wavelet coefficients. Coefficients whose magnitude falls below it are set to zero, so higher thresholds zero out more coefficients and remove more fine detail, at the risk of losing information. Experiment with different values (e.g., 0.05, 0.2) to find a good balance for your data. Note that the threshold changes coefficient values, not the number of coefficients.
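
One way to compare candidate thresholds is to count how many detail coefficients each one sets to zero. The sketch below assumes the same kind of random 8 x 20 data used in this post; the variable names are illustrative.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
data = rng.random((8, 20))

# Detail coefficients of a single-level Haar DWT along each row.
_, detail = pywt.dwt(data, 'haar', axis=-1)

fractions = []
for t in (0.05, 0.1, 0.2):
    shrunk = pywt.threshold(detail, value=t, mode='soft')
    zero_fraction = float(np.mean(shrunk == 0))   # share of coefficients zeroed out
    fractions.append(zero_fraction)
    print(f"threshold={t}: {zero_fraction:.0%} of detail coefficients set to zero")
```

The zeroed share can only grow as the threshold increases, which gives a quick feel for how aggressive a given value is on your data.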

4. Apply Haar DWT to Rows:

# Apply Haar DWT to each row
coeff_list = []
for row in data:
    coeff = pywt.dwt(row, 'haar')
    coeff_list.append(coeff)

We enter a loop that iterates through each row (row) of the data matrix.
Inside the loop, pywt.dwt(row, 'haar') applies a single-level Haar Wavelet Transform to the current row. It returns a pair of arrays: the approximation coefficients (capturing low-frequency information) and the detail coefficients (capturing high-frequency information). For a row of 20 values, each array holds 10 coefficients.
The resulting coefficients (coeff) for each row are appended to a list coeff_list.
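
As an alternative to the explicit loop, pywt.dwt accepts an axis argument, so all rows can be transformed in one call. The sketch below is a variation on the code above, with a cross-check against the row-by-row version.

```python
import numpy as np
import pywt

data = np.random.rand(8, 20)

# Transform every row at once by applying the DWT along the last axis.
cA, cD = pywt.dwt(data, 'haar', axis=-1)

# Cross-check against the row-by-row loop used in this post.
loop_cA = np.array([pywt.dwt(row, 'haar')[0] for row in data])

print("Approximation shape:", cA.shape)
print("Detail shape:", cD.shape)
print("Matches loop version:", np.allclose(cA, loop_cA))
```

The loop version is easier to follow step by step; the vectorized call is shorter and keeps the coefficients in regular 2-D arrays.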

5. Apply Soft Thresholding:

# Apply thresholding (soft thresholding example)
thresholded_coeff_list = []
for coeff in coeff_list:
    thresholded_coeff = [pywt.threshold(c, mode='soft', value=threshold) for c in coeff]
    thresholded_coeff_list.append(thresholded_coeff)

Another loop iterates through the coeff_list containing the DWT coefficients for each row.
Inside the loop, a list comprehension iterates over the two arrays (c) in the current row's coefficients (coeff): the approximation coefficients and the detail coefficients.
pywt.threshold(c, mode='soft', value=threshold) applies soft thresholding to every value in the array c. Values whose magnitude is below the threshold are set to zero, and the remaining values are shrunk toward zero by the threshold amount.
The thresholded coefficients for each row are stored in a new list thresholded_coeff_list.
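
To see what soft thresholding does to individual values, here is a small sketch that writes the rule out in NumPy and compares it with pywt.threshold. The helper name soft_threshold and the sample values are made up for illustration.

```python
import numpy as np
import pywt

def soft_threshold(c, t):
    """Soft thresholding written out by hand (illustrative helper)."""
    c = np.asarray(c, dtype=float)
    # Reduce each magnitude by t, clip at zero, then restore the sign.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

c = np.array([-0.30, -0.05, 0.00, 0.08, 0.25])
manual = soft_threshold(c, 0.1)
library = pywt.threshold(c, value=0.1, mode='soft')

print("manual: ", manual)
print("library:", library)
```

With a threshold of 0.1, the three values with magnitude below 0.1 become zero, while -0.30 and 0.25 move toward zero to roughly -0.20 and 0.15.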

6. Reduced Dimension Data:

# Reduced dimension data (keeping only approximation coefficients)
reduced_data = np.array([coeff[0] for coeff in thresholded_coeff_list])

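This line creates the final reduced-dimension data (reduced_data). It builds a NumPy array by selecting only the first element (coeff[0]) from each entry in thresholded_coeff_list. This first element holds the approximation coefficients, which capture the coarse, low-frequency content of each row. The detail coefficients are discarded entirely, so thresholding them has no effect on reduced_data. The approximation coefficients did pass through the soft-thresholding step, so their values are shrunk by the threshold as well.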
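
A rough way to check how much information the approximation coefficients retain is to reconstruct the rows from them alone and measure the error. The sketch below is an addition to the walkthrough; it assumes unthresholded approximation coefficients and uses a relative Frobenius-norm error as the (assumed) quality measure.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
data = rng.random((8, 20))

cA, cD = pywt.dwt(data, 'haar', axis=-1)

# Passing None for the detail coefficients treats them as zeros,
# so the reconstruction uses the approximation coefficients only.
reconstructed = pywt.idwt(cA, None, 'haar', axis=-1)

relative_error = np.linalg.norm(data - reconstructed) / np.linalg.norm(data)
print("Reconstructed shape:", reconstructed.shape)
print("Relative reconstruction error:", round(float(relative_error), 3))
```

Random data like this has no smooth structure, so expect a noticeably larger error than you would see on smooth or piecewise-constant signals.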

7. Print Data Shapes:

print("Original data shape:", data.shape)
print("Reduced data shape:", reduced_data.shape)

This section simply prints the shapes of the original data and the reduced data. This helps visualize the achieved dimensionality reduction (reduction in the number of columns).

By running this code, you’ll get the original data shape (8 rows, 20 columns) and the reduced data shape (8 rows, 10 columns). A single-level Haar transform halves the number of columns regardless of the threshold; the threshold only changes the coefficient values. The reduced data keeps the coarse, low-frequency content of each row and discards the fine detail.

[Figure: output of the code with threshold = 0.1]

Applications and Benefits

Dimensionality Reduction:
By keeping only the approximation coefficients obtained from the Haar Wavelet Transform, we can halve the number of features per decomposition level while preserving the coarse features and patterns of each row. This can reduce computation in downstream machine learning tasks, although how well accuracy holds up depends on how much relevant information sits in the discarded detail coefficients.
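
If halving the width is not enough, one option that goes beyond the single-level code in this post is a multi-level decomposition with pywt.wavedec, which returns the coarsest approximation coefficients as its first element. The levels chosen below are arbitrary.

```python
import numpy as np
import pywt

data = np.random.rand(8, 20)

shapes = {}
for level in (1, 2):
    coeffs = pywt.wavedec(data, 'haar', level=level, axis=-1)
    approx = coeffs[0]   # coarsest approximation coefficients come first
    shapes[level] = approx.shape
    print(f"level={level}: reduced shape {approx.shape}")
```

Each extra level roughly halves the number of columns again, at the cost of discarding progressively coarser detail.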

Signal and Image Processing:
Haar Wavelet Transform is widely used in signal and image processing applications for tasks such as denoising, compression, and feature extraction. Its ability to capture local features and sharp transitions makes it well-suited for analyzing signals and images with complex structures.
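
As a small illustration of the denoising use case, the sketch below thresholds only the detail coefficients of a noisy step signal and then reconstructs it. The function name haar_denoise, the test signal, the noise level and the threshold are all assumptions chosen for the example.

```python
import numpy as np
import pywt

def haar_denoise(signal, threshold):
    """Single-level Haar denoising sketch (illustrative helper)."""
    cA, cD = pywt.dwt(signal, 'haar')
    # Shrink only the detail coefficients; small ones are assumed to be noise.
    cD = pywt.threshold(cD, value=threshold, mode='soft')
    return pywt.idwt(cA, cD, 'haar')

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0], 16)                        # 32 samples with one sharp edge
noisy = clean + 0.05 * rng.standard_normal(clean.size)   # add mild Gaussian noise
denoised = haar_denoise(noisy, threshold=0.1)

print("Signal length:", denoised.shape[0])
```

Unlike the dimensionality-reduction code, this keeps the signal at its original length: the thresholding is what removes information here, not the dropping of coefficients.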

Conclusion

Haar Wavelet Transform offers a powerful tool for dimensionality reduction and signal processing tasks in Python. By leveraging its simplicity and efficiency, we can effectively reduce the dimensionality of datasets while preserving important information and patterns. Whether you’re working on machine learning projects or signal processing applications, Haar Wavelet Transform is a valuable technique to have in your toolkit.


Written by Saiyam Sakhuja