Implementing a self-organizing map, part 4: Training and drawing the map

2025-03-13

I have a good night of sleep behind me and a cup of coffee in front of me, so let's get started on training the SOM that we have built. We'll add a few lines to main() to create the SOM and some training data.

    int main(void) {
        ...
        auto som = Som(7, 11);
        auto training_data = GenerateTrainingData();
        ...
    }

We instantiate a SOM with a 7x11 grid, which fits nicely into the window with the UI layout constants that we defined in part 3. We also obtain a dataset for training with GenerateTrainingData(). Let's have a look at that next.

    std::vector<Vec> GenerateTrainingData() {
        std::vector<Vec> data;
        for (int i = 0; i < 2; ++i) {
            data.push_back({255.0, 0.0, 0.0});    // red
            data.push_back({255.0, 127.0, 0.0});  // orange
            data.push_back({255.0, 255.0, 0.0});  // yellow
            data.push_back({0.0, 255.0, 0.0});    // green
            data.push_back({0.0, 0.0, 255.0});    // blue
            data.push_back({75.0, 0.0, 130.0});   // indigo
            data.push_back({148.0, 0.0, 211.0});  // violet
        }
        return data;
    }

In GenerateTrainingData() we fill a vector with Vecs containing the ROYGBIV colors that I've been talking about. In a real application working with more complex data, the training data would most likely be loaded from a file or a database, and the dataset might well be too big to load into memory all at once. Since we're doing recreational programming here, we can happily ignore all that for now.
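For the curious, loading the training data from a file could look something like the sketch below. LoadTrainingData() is a hypothetical helper that is not part of the series' code; it assumes a simple CSV-like format with one comma-separated vector per line.

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    using Vec = std::vector<double>;

    // Hypothetical loader, not part of the series' code: reads one
    // comma-separated vector per line, e.g. "255.0,0.0,0.0".
    std::vector<Vec> LoadTrainingData(const std::string& path) {
        std::vector<Vec> data;
        std::ifstream file(path);
        std::string line;
        while (std::getline(file, line)) {
            Vec row;
            std::stringstream ss(line);
            std::string cell;
            while (std::getline(ss, cell, ',')) {
                row.push_back(std::stod(cell));
            }
            if (!row.empty()) {
                data.push_back(row);
            }
        }
        return data;
    }

For huge datasets you would stream batches from disk instead of returning one big vector, but this is enough for our purposes.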

Now that we have a SOM with nothing to do and a dataset waiting to be explored, we can start the training. We will run the training in its entirety before our frame-drawing loop begins and save a snapshot of the SOM state for each training epoch so that we can then cycle through them in the application. A method SaveSnapshot() will be added to the Som class to take this snapshot. Something like the memento pattern could be used here, but I think we can manage without it.

    class Som {
     public:
        ...
        NodesT SaveSnapshot() {
            NodesT snapshot;
            // Deep-copy each node so later training doesn't mutate the snapshot.
            for (auto node : nodes_) {
                snapshot.push_back(new Node(*node));
            }
            return snapshot;
        }
        ...
    };

    ...

    int main(void) {
        ...
        std::vector<Som::NodesT> epoch_snapshots;
        do {
            epoch_snapshots.push_back(som.SaveSnapshot());
        } while (som.TrainOneEpoch(training_data));
        ...
    }

We now have a fully trained SOM! When training in a more serious setting, the training dataset is often not used in its entirety in each epoch but is instead sampled, via bootstrapping or some other method. If the dataset is sufficiently big, this may even be necessary to keep the training time reasonable.
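As an illustration of that idea, a bootstrap-style sampler could look something like this. SampleForEpoch() is a hypothetical helper, not part of the series' code; it draws a fixed number of vectors with replacement for one epoch.

    #include <cstddef>
    #include <random>
    #include <vector>

    using Vec = std::vector<double>;

    // Hypothetical bootstrap sampler, not part of the series' code:
    // draws sample_size vectors with replacement from the full dataset.
    std::vector<Vec> SampleForEpoch(const std::vector<Vec>& data,
                                    std::size_t sample_size,
                                    std::mt19937& rng) {
        std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
        std::vector<Vec> sample;
        sample.reserve(sample_size);
        for (std::size_t i = 0; i < sample_size; ++i) {
            sample.push_back(data[pick(rng)]);
        }
        return sample;
    }

You would then pass the per-epoch sample to TrainOneEpoch() instead of the whole dataset.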

With the SOM trained all that's left is to write up the loop that draws our UI on each frame. The code is quite simple and uses methods from the grid and sidebar drawing helpers that we finished up earlier.

    int main(void) {
        ...
        unsigned selected_epoch = 0;
        while (!WindowShouldClose()) {
            BeginDrawing();
            ClearBackground(RAYWHITE);
            Node* selected_node = nullptr;
            for (const auto& node : epoch_snapshots[selected_epoch]) {
                Color color = {static_cast<unsigned char>(node->weight[0]),
                               static_cast<unsigned char>(node->weight[1]),
                               static_cast<unsigned char>(node->weight[2]),
                               255};
                if (hex_grid.DrawHex(node->hex, color)) {
                    selected_node = node;
                }
            }
            if (selected_node) {
                hex_grid.HighlightHex(selected_node->hex);
            }
            sidebar.Draw(selected_epoch, selected_node);
            EndDrawing();
        }
        CloseWindow();
        return 0;
    }

We keep track of the selected epoch number, since it determines which snapshot is drawn and is controlled via the sidebar buttons. HexGridHelper::DrawHex() returns true when the drawn hex is selected, so we can use that return value to keep track of the currently selected node as well. The selected node is highlighted, and information about it is displayed in the sidebar.

One thing that's very specific to our use of colors as the data is that we use the weight vectors of the nodes directly as the colors with which they're drawn onto the map. This is nice for illustrating how the algorithm works, but for higher-dimensional data, where the component values are not all neatly in the [0.0, 255.0] range, it isn't generally possible. The SOM is a very neat algorithm, but with high-dimensional data you still have to do some work to analyze the generated map and figure out why it formed the clusters that it did.
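If you did want to color nodes from data outside the [0.0, 255.0] range, one simple option is to min-max scale a chosen weight component across all nodes. ScaleComponentToByte() below is a hypothetical helper, not part of the series' code, just a sketch of the idea.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical helper, not part of the series' code: min-max scales
    // one weight component across all nodes into [0, 255] so it can be
    // used as a color channel even when the raw data isn't in that range.
    std::vector<std::uint8_t> ScaleComponentToByte(
        const std::vector<double>& values) {
        auto [lo_it, hi_it] = std::minmax_element(values.begin(), values.end());
        double lo = *lo_it;
        double hi = *hi_it;
        double span = (hi > lo) ? (hi - lo) : 1.0;  // avoid division by zero
        std::vector<std::uint8_t> out;
        out.reserve(values.size());
        for (double v : values) {
            out.push_back(static_cast<std::uint8_t>(255.0 * (v - lo) / span));
        }
        return out;
    }

This maps the smallest component value to 0 and the largest to 255, which is enough for a quick visual check, though it says nothing about what the clusters actually mean.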

That wraps up our treatment of the self-organizing map. You can check out the fully working version from GitHub, along with commands to compile and run it. If you haven't done so previously, please look up Kohonen's various writings on the topic throughout the years. He was definitely one of the all-time great computer scientists.