Distance Metrics
Description
Different metrics of distance are convenient for different types of analysis. Flink ML provides built-in implementations for many standard distance metrics. You can create custom distance metrics by implementing the DistanceMetric trait.
Built-in Implementations
Currently, FlinkML supports the following metrics:
| Metric | Description |
|---|---|
| Euclidean Distance | \(\(d(\x, \y) = \sqrt{\sum_{i=1}^n \left(x_i - y_i \right)^2}\)\) |
| Squared Euclidean Distance | \(\(d(\x, \y) = \sum_{i=1}^n \left(x_i - y_i \right)^2\)\) |
| Cosine Similarity | \(\(d(\x, \y) = 1 - \frac{\x^T \y}{\Vert \x \Vert \Vert \y \Vert}\)\) |
| Chebyshev Distance | \(\(d(\x, \y) = \max_{i}\left(\left \vert x_i - y_i \right\vert \right)\)\) |
| Manhattan Distance | \(\(d(\x, \y) = \sum_{i=1}^n \left\vert x_i - y_i \right\vert\)\) |
| Minkowski Distance | \(\(d(\x, \y) = \left( \sum_{i=1}^{n} \left( x_i - y_i \right)^p \right)^{\rfrac{1}{p}}\)\) |
| Tanimoto Distance | \(\(d(\x, \y) = 1 - \frac{\x^T\y}{\Vert \x \Vert^2 + \Vert \y \Vert^2 - \x^T\y}\)\) with \(\x\) and \(\y\) being bit-vectors |
Custom Implementation
You can create your own distance metric by implementing the DistanceMetric trait.
class MyDistance extends DistanceMetric {
override def distance(a: Vector, b: Vector) = ... // your implementation for distance metric }
object MyDistance {
def apply() = new MyDistance()
}
val myMetric = MyDistance()
