> I run a site that stores a ton of chess lines, something like 100 million in total.
Because there are so many, one approach would be to sort the list of lines lexicographically to group similar ones together, then compress the result with a sliding-window compression algorithm.
A sliding-window compressor replaces a sequence it has already seen within its window with a back-reference, so repeated parts are not stored a second time. After sorting, lines that begin with the same moves sit next to each other, so their shared first parts repeat a lot and compress very well. Any other repeated sequences that fall within the window get the same treatment.
This assumes that it's OK to sort the lines, though (i.e. that you don't need to preserve their original order), which might or might not be true.
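Here is a minimal sketch of the idea, assuming the lines are plain text (space-separated moves, one line per `\n`). It uses Python's `zlib` (DEFLATE, i.e. LZ77 sliding-window matching plus Huffman coding) as one example of a sliding-window compressor; any LZ77-family tool would do. Real data isn't available here, so `make_sample_lines` is a hypothetical generator that builds move-like tokens branching from a few openings (not checked for legality), and `shared_prefix_len` / `avg_neighbor_prefix` are illustrative helpers.

```python
import random
import zlib


def make_sample_lines(n, seed=0):
    """Hypothetical stand-in for real data: lines that share opening moves.

    Tokens are move-like strings and are NOT checked for chess legality.
    """
    rng = random.Random(seed)
    openings = [
        "e4 e5 Nf3 Nc6 Bb5",
        "e4 c5 Nf3 d6 d4",
        "d4 d5 c4 e6 Nc3",
        "d4 Nf6 c4 g6 Nc3",
    ]
    continuations = ["a6", "Nf6", "Be7", "O-O", "h6", "Bd7", "Qe7", "c6"]
    lines = []
    for _ in range(n):
        tail = " ".join(rng.choice(continuations) for _ in range(rng.randint(2, 6)))
        lines.append(rng.choice(openings) + " " + tail)
    return lines


def shared_prefix_len(a, b):
    """Number of leading characters two lines have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def avg_neighbor_prefix(lines):
    """Average shared prefix between each line and the next one."""
    if len(lines) < 2:
        return 0.0
    total = sum(shared_prefix_len(a, b) for a, b in zip(lines, lines[1:]))
    return total / (len(lines) - 1)


def compress_lines(lines):
    """Join lines with newlines (assumes no line contains '\\n') and DEFLATE them."""
    data = "\n".join(lines).encode("utf-8")
    return zlib.compress(data, 9)


if __name__ == "__main__":
    lines = make_sample_lines(100_000)
    sorted_lines = sorted(lines)  # lexicographic sort groups shared prefixes

    raw_size = len("\n".join(lines).encode("utf-8"))
    unsorted_blob = compress_lines(lines)
    sorted_blob = compress_lines(sorted_lines)

    print("raw bytes:            ", raw_size)
    print("compressed, unsorted: ", len(unsorted_blob))
    print("compressed, sorted:   ", len(sorted_blob))
    print("avg neighbor prefix, unsorted:", round(avg_neighbor_prefix(lines), 1))
    print("avg neighbor prefix, sorted:  ", round(avg_neighbor_prefix(sorted_lines), 1))

    # Round-trip check: decompressing gives back the sorted list exactly.
    restored = zlib.decompress(sorted_blob).decode("utf-8").split("\n")
    assert restored == sorted_lines
```

Try it on a sample of your own lines and compare the two compressed sizes. Sorting matters especially because DEFLATE's window is only 32 KB: a repeat is found only if its earlier copy is within that distance, and sorting keeps lines with the same opening close together. For 100 million lines you would probably compress in chunks or stream through `zlib.compressobj()` rather than build one huge string in memory.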