Implementing a self-organizing map, part 1: Weights and hexes

2025-03-10

The self-organizing map (SOM) is a special algorithm to me since it was the one that sparked my interest in machine learning back when I was a student. The SOM is a topology-preserving low-dimensional representation of a high-dimensional dataset, which basically means that it tries to visualize the significant relationships between the elements of a dataset. If you haven't read the original Self-organized formation of topologically correct feature maps paper or any of Kohonen's subsequent writings on the algorithm, they're definitely worth checking out. While I do count this as one of my all-time favorite algorithms, I've never taken a crack at implementing it myself, so I thought we'd give it a go in this series.

The SOM is made up of a grid of nodes that each have an N-dimensional weight vector, N being equal to the dimensionality of the dataset. The weight vectors are often initialized with random values, although some more sophisticated methods such as principal component initialization also see frequent use. The SOM is trained with competitive learning instead of error-correction learning. For each training sample the closest node (best-matching unit, BMU) in the map is selected based it's weight vector and some metric, and the weights of this node and it's neighbors are updated such that they move closer to the sample. When trained in this manner, the SOM starts to gradually form distinct areas that each respond to certain kinds of input, which can be thought of as the clusters present in the training data. Wikipedia has led me to believe that this not entirely unlike some processes that happen in the brain.

From an implementation point of view the SOM has some interesting features that make it quite fun to experiment with. The aspect that generates the biggest amount of work is the fact that the SOM usually employs hexagonal tiling. Square tiling is commonly used as well, but it doesn't look as cool, so we'll be implementing the classic version today. Just drawing a bunch of hexagons is not going to be enough though, since to get the neighborhood function used in the learning rule to work we'll also need a functioning coordinate system to be able to find the neighbors for any given node.

We'll build a SOM to recognize colors since that's one of my favorite examples to illustrate how the algorithm works. Colors represented as 3-dimensional vectors containing RGB data are great for illustration since they have a nice representation in two dimensions. This means that when the training of the map progresses it's very easy to see what's happening and why, which is definitely not the case with more complex kinds of data that usually require some analysis to see why certain clusters have formed. We will be training the map with vectors that represent red, orange, yellow, green, blue, indigo and violet, and from one training epoch to the next we should see the map organizing itself to form areas that correspond to these colors. Our nodes will be initialized with random values, so the map is likely going to look very different between runs of the program.

We will be programming in C++, but fear not, I'll try to keep the implementation below 500 lines. A significant chunk of these lines will be taken up by the UI, so the actual SOM implementation is not terribly complex.

Weights

The parts that make up a SOM node are a weight vector and a grid position. For the weight vectors we'll define a class called Vec, which is a thin wrapper around std::array<double, 3>. A more serious implementation would obviously use a proper linear algebra library such as Armadillo here.

class Vec { public: using SizeT = std::array<double, 3>::size_type; private: std::array<double, 3> data_; public: Vec() : data_{0.0, 0.0, 0.0} {} Vec(double r, double g, double b) : data_{r, g, b} {} Vec(const Vec&) = default; ... };

Doing math with Vecs requires some basic operators to be implemented as well. We'll stick to just the ones that are actually needed for our purposes here.

class Vec { ... Vec& operator+=(const Vec& other) { for (SizeT i = 0; i < data_.size(); ++i) { data_[i] += other.data_[i]; } return *this; } Vec& operator-=(const Vec& other) { for (SizeT i = 0; i < data_.size(); ++i) { data_[i] -= other.data_[i]; } return *this; } Vec& operator*=(double scalar) { for (SizeT i = 0; i < data_.size(); ++i) { data_[i] *= scalar; } return *this; } double& operator[](SizeT index) { return data_[index]; } const double& operator[](SizeT index) const { return data_[index]; } ... }; Vec operator+(Vec lhs, const Vec& rhs) { return lhs += rhs; } Vec operator-(Vec lhs, const Vec& rhs) { return lhs -= rhs; } Vec operator*(Vec lhs, double scalar) { return lhs *= scalar; } Vec operator*(double scalar, Vec rhs) { return rhs *= scalar; }

We'll also define a function to calculate the Euclidean distance between two Vecs. Depending on the kind of data that is going be fed to the SOM other metrics might be more appropriate.

class Vec { ... friend double EuclideanDistance(const Vec&, const Vec&); }; double EuclideanDistance(const Vec& lhs, const Vec& rhs) { double sum = 0.0; for (Vec::SizeT i = 0; i < lhs.data_.size(); ++i) { sum += std::pow(lhs.data_[i] - rhs.data_[i], 2); } return std::sqrt(sum); }

To wrap things up with Vec, we'll also add a method to randomize it's contents. This will be used in the SOM initialization later.

// Generates a random double in range [min, max). double RandomDouble(double min, double max) { static std::random_device random_device; static std::mt19937 random_engine(random_device()); std::uniform_real_distribution<> dist(min, max); return dist(random_engine); } class Vec { ... void Randomize(double min, double max) { for (SizeT i = 0; i < data_.size(); ++i) { data_[i] = RandomDouble(min, max); } } ... };

Hexes

The other half of a SOM node is it's position in the hex grid that makes up the map. There are many ways to represent hexagonal coordinates, but for this implementation we'll go with what are called cube coordinates. The definition of Hex is pretty simple.

struct Hex { const int q, r, s; Hex(int q, int r, int s) : q(q), r(r), s(s) { assert(q + r + s == 0); } Hex(const Hex&) = default; };

Finally we'll need the ability to calculate the distance between two Hexs to determine how close neighbors they are.

unsigned HexDistance(const Hex& lhs, const Hex& rhs) { auto diff = Hex(lhs.q - rhs.q, lhs.r - rhs.r, lhs.s - rhs.s); return unsigned( std::max({std::abs(diff.q), std::abs(diff.r), std::abs(diff.s)})); }

I think that's plenty for now. We'll continue with the node definition in part 2.