Vectors in Linear Algebra (Machine Learning)
Linear Algebra
Linear Algebra is a branch of mathematics that deals with linear equations, linear transformations, and vector spaces. It is a fundamental mathematical tool used in many aspects of Machine Learning (ML), including data preprocessing, feature extraction, model training, and prediction.
Some key topics in Linear Algebra include:
- Scalars
- Vectors
- Matrices
- Tensors
Why should we study linear algebra in ML?
- Generalizing concepts to higher dimensions: Many ML algorithms deal with data that is high-dimensional, which means it has many features or attributes. For example, an image can be represented as a high-dimensional array of pixels, and a text document can be represented as a high-dimensional vector of word frequencies. Linear Algebra provides the tools to manipulate and transform high-dimensional data in a meaningful way. For example, Principal Component Analysis (PCA) is a Linear Algebra technique used to reduce the dimensionality of high-dimensional data while retaining the most important information.
- Data Representation: ML often involves working with large datasets that are represented as matrices and tensors. Linear Algebra provides the tools to manipulate and transform these data structures, which is necessary for many ML algorithms.
We will cover vectors in this blog.
Vectors
A vector is a quantity that has both magnitude and direction. It can be represented graphically as an arrow, with its length representing its magnitude and its orientation indicating its direction.
Use of Vectors in ML
- Feature Vector: A feature vector is a representation of input data used in machine learning. It is a vector of numerical features that describe the characteristics of the data. The feature vector is used as input to machine learning models to train and make predictions. Examples of feature vectors include pixel values in an image, word frequencies in a document, or audio spectral features.
- Distance Metrics: Vectors are also used in ML to measure the similarity or dissimilarity between two data points. For instance, the Euclidean distance between two vectors can be used to measure the distance between two points in space.
Row and Column Vectors
In linear algebra, a vector is a one-dimensional array of numbers. There are two types of vectors: row vectors and column vectors.
Row Vector: A row vector is a vector that has a single row of numbers. It is denoted as a matrix with dimensions 1 x n, where n is the number of elements in the row vector.
Column Vector: A column vector is a vector that has a single column of numbers. It is denoted as a matrix with dimensions n x 1, where n is the number of elements in the column vector.
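A minimal NumPy sketch of the two forms (NumPy is the library used in the examples throughout this blog):
import numpy as np
# A row vector: 1 row, 3 columns
row_vector = np.array([[1, 2, 3]])
print(row_vector.shape)      # (1, 3)
# A column vector: 3 rows, 1 column
column_vector = np.array([[1], [2], [3]])
print(column_vector.shape)   # (3, 1)
# reshape converts one form into the other
print(row_vector.reshape(3, 1).shape)   # (3, 1)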
Example of Row and Column Vectors in ML
We can represent the Iris dataset as a set of row vectors, where each row represents an observation:
[5.1, 3.5, 1.4, 0.2] # row vector for observation 1
[4.9, 3.0, 1.4, 0.2] # row vector for observation 2
[4.7, 3.2, 1.3, 0.2] # row vector for observation 3
…
[6.5, 3.0, 5.2, 2.0] # row vector for observation 148
[6.2, 3.4, 5.4, 2.3] # row vector for observation 149
[5.9, 3.0, 5.1, 1.8] # row vector for observation 150
Alternatively, we can represent the Iris dataset as a set of column vectors, where each column represents a feature:
[5.1, 4.9, 4.7, …, 6.5, 6.2, 5.9] # column vector for sepal length
[3.5, 3.0, 3.2, …, 3.0, 3.4, 3.0] # column vector for sepal width
[1.4, 1.4, 1.3, …, 5.2, 5.4, 5.1] # column vector for petal length
[0.2, 0.2, 0.2, …, 2.0, 2.3, 1.8] # column vector for petal width
We can use either representation to perform machine learning tasks, such as classification using decision trees or k-nearest neighbors. The choice of representation may depend on the specific machine learning algorithm or task and may affect the efficiency of computations.
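As a small sketch, assuming the first three observations above are stored in a NumPy array, a row vector (an observation) and a column vector (a feature) are simply slices along different axes:
import numpy as np
# First three Iris observations: sepal length, sepal width, petal length, petal width
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [4.9, 3.0, 1.4, 0.2],
                 [4.7, 3.2, 1.3, 0.2]])
print(data[0])      # row vector for observation 1 -> [5.1 3.5 1.4 0.2]
print(data[:, 0])   # column vector for sepal length -> [5.1 4.9 4.7]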
Distance From Origin ||A||
The distance from the origin is a common concept used in mathematics and can also be applied in machine learning. In n-dimensional space, the distance from the origin is calculated as the square root of the sum of the squares of the individual coordinates, where each coordinate represents a dimension.
For example, in 2-dimensional space (a plane), the distance from the origin to a point (x, y) is calculated as:
distance = sqrt(x² + y²)
In 3-dimensional space (a space with three dimensions), the distance from the origin to a point (x, y, z) is calculated as:
distance = sqrt(x² + y² + z²)
In machine learning, the distance from the origin can be used to calculate the magnitude of a vector, which represents the strength or length of the vector. The magnitude of a vector is calculated as the distance from the origin to the endpoint of the vector.
For example, if we have a 2-dimensional vector (3, 4), we can calculate its magnitude as:
magnitude ||A|| = sqrt(3² + 4²) = 5
Similarly, if we have a 3-dimensional vector (1, 2, 3), we can calculate its magnitude as:
magnitude = sqrt(1² + 2² + 3²) = sqrt(14)
# distance from origin -> euclidean norm
import numpy as np
# Define an n-dimensional vector A
A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 112, 12])
# Calculate the Euclidean distance from the origin (L2 norm)
distance = np.linalg.norm(A)
print("Euclidean distance from the origin:", distance)
Output:
Euclidean distance from the origin: 114.86513831445988
Euclidean Distance
Euclidean distance is a measure of the distance between two points in n-dimensional space. It is the most commonly used distance metric in machine learning and is based on the Pythagorean theorem.
In two-dimensional space, the Euclidean distance between two points (x1, y1) and (x2, y2) is calculated as:
distance = sqrt((x2 - x1)² + (y2 - y1)²)
In n-dimensional space, the Euclidean distance between two points p = (p1, p2, …, pn) and q = (q1, q2, …, qn) is calculated as:
distance = sqrt((q1 - p1)² + (q2 - p2)² + … + (qn - pn)²)
where p1, p2, …, pn and q1, q2, …, qn are the coordinates of the two points in each dimension.
# to find euclidean distance in n-D
import numpy as np
# Define two n-dimensional vectors A and B
A = np.array([1, 2, 3, 4, 5])
B = np.array([6, 7, 8, 9, 10])
# Calculate the difference vector
difference = A - B
# Calculate the Euclidean distance between A and B (L2 norm of the difference)
distance = np.linalg.norm(difference)
print("Euclidean distance between A and B:", distance)
Output:
Euclidean distance between A and B: 11.180339887498949
Euclidean distance is used in various machine learning algorithms, such as k-nearest neighbors and k-means clustering, to determine the similarity or dissimilarity between two data points. The closer the Euclidean distance between two points, the more similar they are considered to be. The following example sketches a tiny nearest-neighbor classifier built on this idea.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from scipy.spatial.distance import euclidean
# 1. Generate 5 random 3D vectors
vectors = np.random.rand(5, 3)
# 2. Assign a random class (0 or 1) to each vector
classes = np.random.randint(0, 2, 5)
# 3. Plot the vectors on a 3D Matplotlib graph
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
for vec, cls in zip(vectors, classes):
    ax.quiver(0, 0, 0, vec[0], vec[1], vec[2], color=("r" if cls == 1 else "b"))
# 4. Get user input for a query point (3D vector)
query_vector = np.array(list(map(float, input("Enter the query point (3D vector) separated by space: ").split())))
# 5. Calculate the distance from the query vector to the 5 vectors and find the nearest neighbor
distances = [euclidean(query_vector, vec) for vec in vectors]
nearest_neighbor_index = np.argmin(distances)
# 6. Output the class of the nearest neighbor
print("The class of the nearest neighbor is:", classes[nearest_neighbor_index])
# 7. Plot the query vector with a different color
ax.quiver(0, 0, 0, query_vector[0], query_vector[1], query_vector[2], color="g")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_zlabel("Z-axis")
plt.show()
Finally, the code outputs the class of the nearest neighbor and plots the query vector with a different color.
This example shows how Euclidean distance can be used to find the nearest neighbor to a query point in a 3D vector space, which is a common task in machine learning algorithms such as k-nearest neighbors.
Scalar Addition/Subtraction (Shifting)
Scalar addition/subtraction, also known as shifting, is a simple operation in linear algebra that involves adding or subtracting a scalar value to/from each element of a vector or matrix.
For example, if we have a vector v = [1, 2, 3] and we want to shift it by 2, we can add 2 to each element of the vector to get v’ = [3, 4, 5]. Similarly, if we subtract 2 from each element of v, we get v’’ = [-1, 0, 1].
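A minimal NumPy sketch of the same shift, using the vector from the example above:
import numpy as np
v = np.array([1, 2, 3])
print(v + 2)   # shifted up by 2   -> [3 4 5]
print(v - 2)   # shifted down by 2 -> [-1  0  1]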
Some common applications of shifting in ML include:
- Centering data: Shifting the data so that the mean of each feature is zero. This can help certain algorithms converge faster and can also help with interpretability.
- Scaling data: Shifting the data so that the minimum value of each feature is zero (or some other value) can help bring features into a similar range, which can make certain algorithms (e.g. neural networks) train faster and perform better (a short sketch follows this list).
- Data augmentation: Adding random values to each pixel of an image can help to artificially expand the dataset, which can help improve the performance of machine learning algorithms.
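Here is a small illustrative sketch of the second point: subtracting the per-feature minimum shifts every feature so it starts at zero (the numbers are made up for illustration).
import numpy as np
# Two made-up features on very different scales
X = np.array([[10.0, 200.0],
              [12.0, 260.0],
              [15.0, 230.0]])
# Subtract each column's minimum so every feature starts at zero
X_shifted = X - X.min(axis=0)
print(X_shifted)   # rows become [0, 0], [2, 60], [5, 30]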
Let’s understand Mean centering!
Mean centering
Mean centering is a common data preprocessing technique used in machine learning. It involves shifting the data values so that the mean (average) of each feature is zero.
To mean center a dataset, we first calculate the mean value of each feature (column) in the dataset. We then subtract the mean from each data point in that feature. This ensures that the mean of the feature is centered at zero.
Mean centering is a useful preprocessing technique in machine learning that can be applied to a variety of tasks, such as:
- Principal Component Analysis (PCA)
- Linear Regression
- Gradient-based optimization algorithms
- Clustering algorithms
- Regularization
import numpy as np
import matplotlib.pyplot as plt
# Generate random 2D data with 100 points
data = np.random.rand(100, 2)
# Calculate the mean of the data along each dimension
data_mean = np.mean(data, axis=0)
# Perform mean centering
centered_data = data - data_mean
# Plot the original data
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.scatter(data[:, 0], data[:, 1], label="Original Data")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("Before Mean Centering")
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.legend()
# Plot the mean-centered data
plt.subplot(1, 2, 2)
plt.scatter(centered_data[:, 0], centered_data[:, 1], label="Mean-Centered Data", color="orange")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.title("After Mean Centering")
plt.axhline(0, color='black', linewidth=0.5)
plt.axvline(0, color='black', linewidth=0.5)
plt.legend()
plt.show()
Dot Product
The dot product, also known as the scalar product or inner product, is an operation between two vectors that results in a scalar. The dot product is calculated by taking the sum of the product of the corresponding elements of the two vectors. Mathematically, the dot product of two vectors a and b is denoted as a ⋅ b and can be calculated as:
a ⋅ b = a1 * b1 + a2 * b2 + … + an * bn
where a1, a2, …, an and b1, b2, …, bn are the corresponding elements of the two vectors.
import numpy as np
# Define two vectors
A = np.array([1, 2, 3])
B = np.array([4, 5, 6])
print(np.dot(A, B))
print(A@B)
Output:
32
32
Use of Dot Product in Vector Applications
Here are some examples of the use of dot products in vector applications:
- Projection: The dot product can be used to find the projection of one vector onto another. Given two vectors a and b, the projection of a onto b can be computed as (a · b / ||b||²) * b, where ||b|| is the magnitude of b (a short sketch follows this list).
- Orthogonality: Two vectors are orthogonal (perpendicular) if and only if their dot product is zero. This property is used in various applications, such as determining the angle between two vectors or checking for the linear independence of a set of vectors.
- Similarity: The dot product can be used to measure the similarity between two vectors. The cosine similarity between two vectors a and b can be computed as (a · b) / (||a|| * ||b||), where ||a|| and ||b|| are the magnitudes of the vectors. The cosine similarity ranges from -1 (dissimilar) to 1 (similar).
- Machine learning: The dot product is used in various machine learning algorithms, such as linear regression, logistic regression, and support vector machines. In these algorithms, the dot product is used to compute the distance or similarity between data points or to calculate the decision boundary between classes.
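A minimal NumPy sketch of the projection formula from the first point in the list above:
import numpy as np
a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])
# Projection of a onto b: (a · b / ||b||²) * b, where ||b||² equals b · b
projection = (np.dot(a, b) / np.dot(b, b)) * b
print(projection)   # [3. 0.] -> the component of a that lies along b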
Angle between vectors
The dot product of two vectors can be used to calculate the angle between them. Given two vectors A and B, the dot product of A and B is defined as:
A · B = ||A|| ||B|| cos(θ)
where ||A|| and ||B|| are the magnitudes (lengths) of the vectors A and B, and θ is the angle between them.
Rearranging the equation, we get:
cos(θ) = (A · B) / (||A|| ||B||)
Taking the inverse cosine of both sides, we get:
θ = cos^-1((A · B) / (||A|| ||B||))
If the dot product of two vectors is positive, then the angle between them is acute (less than 90 degrees). If the dot product is negative, then the angle is obtuse (greater than 90 degrees). If the dot product is zero, then the vectors are perpendicular.
import numpy as np
# Define two vectors
A = np.array([1, 2, 3])
B = np.array([-4, -5, -6])
C = np.array([5, 5, 5])
# Calculate the cosine similarity
cosine_similarity = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print("Cosine similarity between A and B:", cosine_similarity)
# Calculate the cosine similarity
cosine_similarity = np.dot(A, C) / (np.linalg.norm(A) * np.linalg.norm(C))
print("Cosine similarity between A and C:", cosine_similarity)
Output:
Cosine similarity between A and B: -0.9746318461970762
Cosine similarity between A and C: 0.9258200997725513
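To recover the angle itself, take the inverse cosine of the cosine similarity (np.arccos returns radians, so np.degrees converts the result). A small sketch using the same A and B as above:
import numpy as np
A = np.array([1, 2, 3])
B = np.array([-4, -5, -6])
# Cosine similarity, then the angle in degrees
cos_AB = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
theta = np.degrees(np.arccos(cos_AB))
print("Angle between A and B (degrees):", theta)   # roughly 167 degrees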
Hyperplane
A hyperplane in n-dimensional space is a flat subspace of dimension n - 1. In 3-dimensional space, a hyperplane is therefore a two-dimensional subspace, which can be visualized as a flat sheet or a plane.
In machine learning, hyperplanes are often used to separate data points in high-dimensional space. For example, in binary classification problems, we may have a dataset with two classes of points that are not linearly separable in the original feature space. However, by mapping the features to a higher-dimensional space, we may be able to find a hyperplane that can separate the two classes. This is the basis for many popular machine learning algorithms, such as support vector machines.
The equation of a hyperplane in n-dimensional space can be written as:
w0 + w1x1 + w2x2 + … + wnxn = 0
where w1, w2, …, wn are the coefficients of the hyperplane, x1, x2, …, xn are the coordinates of a point in the space, and w0 is a constant (the bias or intercept term).
To understand this equation, consider a simple example in two-dimensional space. A hyperplane in this space is just a line, and the equation of the line can be written as:
w1x1 + w2x2 + w0 = 0
where w0, w1, and w2 are the coefficients of the line, and x1 and x2 are the coordinates of a point on the line.
This equation can be rearranged as:
w1x1 + w2x2 = -w0
which is the standard form of the equation of a line. This equation tells us that any point (x1, x2) on the line must satisfy this equation. Geometrically, this means that the dot product of the vector [w1, w2] with the vector [x1, x2] is equal to -w0.
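As a small sketch of this idea, the sign of w0 + w1x1 + w2x2 tells us which side of the line a point lies on, which is exactly how a linear classifier assigns classes. The line used here, x1 + x2 - 1 = 0, is just an illustrative choice:
import numpy as np
# Coefficients of the line x1 + x2 - 1 = 0  (w = [1, 1], w0 = -1)
w = np.array([1.0, 1.0])
w0 = -1.0
points = np.array([[0.2, 0.3],   # below the line -> negative value
                   [0.9, 0.8],   # above the line -> positive value
                   [0.5, 0.5]])  # on the line    -> zero
for x in points:
    print(x, "->", w0 + np.dot(w, x))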