Sort a hash map by the values, not the keys?

Anybody has sample code for sorting a hash map by the values?

I’m not sure I understand what you’re after. What are you trying to accomplish exactly?

Let’s say we’re doint the counting words type of program where we have a string hash map with the words as keys and the frequency as the value. At the end, we want to print out the word frequencies in descending order so the most frequent words are easily spotted at the top of the output.

Word         Count
-------------------
the           10
fox            4
quick          1

But hash maps don’t retain any order, so this must be done some other way?

The most reliable way would be to dump the contents of the hash map to an array, sort it, and then print it:

$ cat hmsort.zig
const std = @import("std");

fn fill(hash_map: anytype) !void {
    try hash_map.putNoClobber("quick", 1);
    try hash_map.putNoClobber("fox", 4);
    try hash_map.putNoClobber("the", 10);
}

fn dump(allocator: *std.mem.Allocator, hash_map: anytype) !std.ArrayList(@TypeOf(hash_map).KV) {
    var list = try std.ArrayList(@TypeOf(hash_map).KV).initCapacity(allocator, hash_map.count());
    errdefer list.deinit();

    var it = hash_map.iterator();
    while (it.next()) |entry| {
        list.appendAssumeCapacity(.{
            .key = entry.key_ptr.*,
            .value = entry.value_ptr.*,
        });
    }

    return list;
}

fn desc(comptime HashMap: type) (fn (void, HashMap.KV, HashMap.KV) bool) {
    return struct {
        pub fn desc(context: void, lhs: HashMap.KV, rhs: HashMap.KV) bool {
            _ = context; // unused
            return lhs.value > rhs.value;
        }
    }.desc;
}

fn print(allocator: *std.mem.Allocator, hash_map: anytype, writer: anytype) !void {
    var arr = try dump(allocator, hash_map);
    defer arr.deinit();

    const HashMap = @TypeOf(hash_map);
    std.sort.sort(HashMap.KV, arr.items, {}, comptime desc(HashMap));

    try writer.writeAll("Word         Count \n");
    try writer.writeAll("-------------------\n");
    for (arr.items) |kv| {
        try writer.print("{s:<13}{d}\n", .{ kv.key, kv.value });
    }
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer if (gpa.deinit()) std.log.err("memory leak detected", .{});
    const allocator = &gpa.allocator;
    const stdout = std.io.getStdOut().writer();

    inline for (.{ std.StringHashMap(u32), std.StringArrayHashMap(u32) }) |HashMap, i| {
        if (i != 0) try stdout.writeByte('\n');
        try stdout.print(" -> {s}\n", .{@typeName(HashMap)});

        var array_hash_map = HashMap.init(allocator);
        defer array_hash_map.deinit();
        try fill(&array_hash_map);
        try print(allocator, array_hash_map, stdout);
    }
}

$ zig run hmsort.zig
 -> std.hash_map.HashMap([]const u8,u32,std.hash_map.StringContext,80)
Word         Count
-------------------
the          10
fox          4
quick        1

 -> std.array_hash_map.ArrayHashMap([]const u8,u32,std.array_hash_map.StringContext,true)
Word         Count
-------------------
the          10
fox          4
quick        1

Maybe it’s possible to avoid the extra allocation when using a variant of ArrayHashMapUnmanaged? Somebody smarter than me could probably help on that :slight_smile:

1 Like

@jmc this is an awesome solution, and totally generic too! Thanks! Just a minor fix: you named the sort order function asc when it should actually be desc.

Ah, of course, I’ll amend the name. Thanks!