I think the default should be the method that displays the most information. Why hide information if you don't have to? In the case of one-dimensional data, a dotplot shows the reader everything. Using a boxplot reduces the information content, and mean-plus-errorbars reduces it further. The mean plus errorbars imposes a probability distribution, which may be wrong; it doesn't reveal a hidden truth.
The same holds in two dimensions. Show me all the data, and include a regression line or a spline to highlight a trend. Only start hiding information when the scatterplot becomes misleading, that is, when overplotting prevents me from accurately assessing the actual distribution of the points.
Jumping immediately to a density plot also restricts me to your interpretation. The original data is lost. With a scatterplot, the raw data can be recovered from the plot, so I can do my own analysis should I be interested. This is common in meta-analyses that extract data from multiple published papers. If those original papers had used density plots instead of scatterplots, reanalysis would require direct access to the underlying data. Once the original author dies, or loses the data, any further use of the data becomes impossible.
The original data would be well represented in a 100x100 matrix, since the data (grades 0-100) is already discrete. Basically the first picture in the article, with an alpha setting that reaches 1 (1 = opaque) when multiplied by the maximum number of entries per cell, e.g. max entries = 5 => alpha = 1/5 = 0.2.
Alternatively, aggregating to 10x10, 20x20, 25x25 or 50x50 would work too if the data is too sparse. There is no need for hex binning in this case!
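For what it's worth, here is a minimal sketch of that count matrix in Python with NumPy. The function name `count_matrix` and its parameters are made up for illustration and do not come from the article. One detail to note: grades 0 to 100 inclusive take 101 distinct values, so the one-cell-per-grade case below uses 101 cells per axis; a coarser grid such as 10x10 is just a smaller `cells` value.

```python
import numpy as np

def count_matrix(x, y, cells=101, lo=0, hi=100):
    """Count how many (x, y) pairs fall into each cell of a cells x cells grid.

    Assumes integer grades in [lo, hi]. The upper edge is hi + 1 so that,
    with the default of 101 cells, every integer grade gets its own cell.
    """
    edges = np.linspace(lo, hi + 1, cells + 1)
    counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])
    return counts.astype(int)

# Hypothetical example data: four students, two of them with identical grades.
x = [0, 0, 50, 100]
y = [0, 0, 50, 100]
fine = count_matrix(x, y)              # one cell per grade
coarse = count_matrix(x, y, cells=10)  # aggregated to 10x10
print(fine[0, 0], fine.max(), coarse.shape)
```

The resulting matrix can be shown as an image (for example with matplotlib's `imshow`), mapping the count in each cell to colour intensity instead of relying on stacked transparency.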
When overplotting, the usual compositing operator gives a final alpha of
1 - (1-alpha)^N
So your alpha = 1/5 overdrawn 5 times would give a final opacity of ~0.672. By its very nature, there is no alpha < 1 which, when composited a finite number of times, gives a final alpha of exactly 1.
I was aware that this approach to alpha was oversimplistic to begin with; I should have pointed that out. Thanks for posting the correct formula.