Interesting Dashboard Design from the Interaction Design Foundation
Mo Data stashed this in Data Visualizations
Dashboard Display Media
Stacked Bar Graphs
A stacked bar chart is used to display groups of data in which each independent group or category contains a number of different instances. Stacked bar charts differ from grouped bar charts in that the bar for each group of data is broken down into its component groups or categories, whereas grouped bar charts are employed when the data sets are not composed of further clear and natural groups. A stacked bar chart plots two categorical variables against a quantitative-ratio variable. Typically, one categorical variable is plotted along the x-axis and the other is represented by the different layers of the bars; the two are denoted by position and colour respectively, and are plotted against one quantitative-ratio variable, conventionally displayed on the y-axis. The position referred to is the placement of each column, representing a series of related data, along the x-axis. The discrete value of this category variable is then given by a term read from the x-axis directly beneath the column.
An Example
For example, a graph with 5 columns, where each column represents a member of the sales team, would have 5 values in the positional category. Each of these 5 values would be the name of a member of staff, read off the x-axis under a column. In the exemplar image the position category is plotted on the x-axis and takes 2 values: expenditure and revenue. The data series, grouped together in columns, are further divided using colour. These colours represent the values of the second categorical variable, and each value has a distinct colour/shade. As is the case with the image chosen, every member of the series is assigned a different colour/shade, which is indicated in a legend. In this image the legend is displayed to the right. As the legend explains the association between each datum and its element in the column, it must be as clear as possible and carry as much information as it can. Some information is lost in this image because the legend is truncated. Even if a tooltip or pop-out is used to make this information available, it will make the viewing experience less efficient.
Take care not to have too many elements in the legend, or the data can be obscured by overcrowding. In addition, the more classes of data there are, the more colours and legend entries are needed. This reduces clarity because short-term memory is limited: users can only form a certain number of colour-category associations, and the general trends that the graph is required to illustrate may be lost in the process. In the case in question it is probable that the intention is to show the difference between total expenditure and total revenue, so the legend and the information it contains are not vital (but then why show the components at all?). The information could perhaps be illustrated just as well with a table of the data or even a simple calculation of profit/loss (total revenue - total expenditure). As it is, a large amount of space is used up to display the same information.
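The table-and-calculation alternative mentioned above can be sketched in a few lines of Python. The component names and amounts below are invented for illustration, since the truncated legend in the image does not give the real ones.

```python
# Hypothetical component sums; the names and amounts are assumptions,
# not values taken from the chart under discussion.
expenditure = {"salaries": 900_000, "rent": 300_000, "marketing": 150_000}
revenue = {"product sales": 1_200_000, "services": 400_000}


def totals_and_profit(expenditure, revenue):
    """Return (total expenditure, total revenue, profit or loss)."""
    total_exp = sum(expenditure.values())
    total_rev = sum(revenue.values())
    return total_exp, total_rev, total_rev - total_exp


total_exp, total_rev, profit = totals_and_profit(expenditure, revenue)

# A three-row text table carries the same headline comparison as the two columns.
for label, amount in [
    ("Total revenue", total_rev),
    ("Total expenditure", total_exp),
    ("Profit/loss", profit),
]:
    print(f"{label:<20}{amount:>12,}")
```

Three rows of text are enough here because the comparison being made is between the two totals, not between their parts.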
Why choose a stacked bar chart?
The main reasons to use a stacked (as opposed to a normal) bar chart are to enable part-to-whole comparisons for each data series and, more significantly for this type, to see trends in the whole for each data series. For the chosen image, a part-to-whole comparison would compare the individual sums in a column with each other and with the whole. A comparison of the wholes would measure total expenditure against total revenue. Using stacked bar charts for part-to-whole comparison is not effective, because only the data at the bottom of each column share a common baseline. This makes it difficult both to measure accurately any other part value within the column and, as a result, to make comparisons between parts. A grouped bar chart is a far better choice to enable comparison between the parts of a data series.
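The baseline problem can be made concrete by computing where each segment starts and ends on the y-axis. The following is a minimal sketch with made-up values; `segment_spans` is a hypothetical helper, not part of any charting library.

```python
from itertools import accumulate


def segment_spans(values):
    """Return (bottom, top) y-positions for segments stacked in the given order."""
    tops = list(accumulate(values))   # running total = top edge of each segment
    bottoms = [0] + tops[:-1]         # each segment starts where the previous one ended
    return list(zip(bottoms, tops))


# Hypothetical parts of two columns, listed from bottom to top.
column_a = [500, 200, 100]
column_b = [300, 350, 150]

spans_a = segment_spans(column_a)
spans_b = segment_spans(column_b)

# Only the first segment of each column starts at 0. The second segments start
# at different heights, so their sizes have to be judged without a shared baseline.
for (bottom_a, top_a), (bottom_b, top_b) in zip(spans_a, spans_b):
    print(f"A: {bottom_a}-{top_a}   B: {bottom_b}-{top_b}")
```

In a grouped bar chart every bar would have a bottom of 0, which is why that form suits comparison between parts.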
Important Considerations
In this particular chart, the data in each column are sorted according to their contribution to the whole, from largest at the bottom to smallest at the top. This allows the greatest contributor of each series (the largest colour block) to be measured more accurately, as it begins at the baseline. In addition, as these parts all share the same baseline, it is easier to make inter-column comparisons. These factors make this type of chart more useful when a significant contribution is made by one member of each data series, as the trend followed by the sum mirrors that of its largest component (mainly if the same factor is the largest contributor to each series/column). On the other hand, if the smaller contributors are significant for reasons other than their size, then this type of chart combined with the ordering used (placing the smallest members at the top of each column) will make it difficult to measure them with any accuracy. Additionally, if some of the parts are significantly smaller than the whole for any column, then, as the y-axis (the quantitative-ratio variable) must extend to a value large enough to include the peak of the column, the scale might not include small enough divisions to make interpretation of these parts feasible.
When ordering the members of each data series, if the values of the colour category are the same in each (the same colours are all present, but only once in each column) but have different quantitative-ratio values, then arranging the parts from largest to smallest might lead to a different ordering of the coloured segments from one stack to the next. In this situation, sum the data for each category value across all of the series and then order the colours according to the total for each.
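The ordering rule just described can be sketched as follows. The series names and review counts are invented for illustration, and `stacking_order` is a hypothetical helper.

```python
def stacking_order(series):
    """Order colour-category values by their total across all series, largest first."""
    totals = {}
    for counts in series.values():
        for category, value in counts.items():
            totals[category] = totals.get(category, 0) + value
    # The category with the largest overall total goes at the bottom of every column.
    return sorted(totals, key=lambda category: totals[category], reverse=True)


# Hypothetical review counts for four support staff.
ratings = {
    "A": {"bad": 2, "average": 9, "good": 5},
    "B": {"bad": 1, "average": 4, "good": 8},
    "C": {"bad": 3, "average": 6, "good": 7},
    "D": {"bad": 2, "average": 5, "good": 9},
}

order = stacking_order(ratings)

# Every column lists its segments bottom to top in the same shared order,
# even where a column's own largest part is not the overall largest.
columns = {
    name: [(category, counts[category]) for category in order]
    for name, counts in ratings.items()
}
print(order)
print(columns["A"])
```

Column A's own largest part is "average", yet it is stacked second, so that each colour sits in the same position in every column.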
Ordering by totals in this way has the added benefit of providing extra information within the chart, as the ordering from bottom to top of any column will indicate the relative distributions for all of the different category values. As an example, consider a stacked bar chart representing 4 customer support staff (A, B, C and D) and the ratings that users have assigned to the service each provided (the possible ratings are bad, average and good). Let |RATING| be the number of reviews of quality RATING. If A has reports such that |bad| < |good| < |average| and B such that |bad| < |average| < |good|, then sum all bad values into a bad total, and likewise for average and good. Then order these totals. This order should be the one used to arrange the components of each column, with the colour representing the largest total at the bottom, up to that for the smallest total at the top. This convention should also be followed for the legend, as it will make it easier to read: the key at the bottom should be for the value with the largest total, up to the key at the top for the value with the smallest total. Using this format will also make the chart more accessible to colour-blind users and enable a better interpretation of the information for all.
Stacked bar graphs are more useful when multiple series (each containing common values for the colour category variable) are to be compared. The example chosen only compares 2, and it would appear they contain no common values for the colour category variable.
The graph in the image is also 3-dimensional. Adding a 3rd dimension can make a graph misleading. Associating areas with a 1-dimensional change can lead to confusion over scaling. Using a pyramid (as in this image) goes some way towards solving this, but it is not clear what the decreases in area for the smaller contributing sums (those higher up the pyramid) represent or to what scale they are made. Presenting the user with an area at the top of the column may also cause confusion: should the y-axis value be read using the front or the back (in terms of perspective) of the square for the top element of data? The use of a 3-d isometric viewpoint also requires extra space around each column, making it harder to compare the heights of the columns and their parts. For a chart with more series, this style would allow fewer columns within the focal area than a 2-d type with the series plotted closely. This type is therefore not data dense and, as Edward Tufte states in Envisioning Information, "Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, require viewers to rely on visual memory - a weak skill - to make a contrast, a comparison, a choice". Tufte again criticises this display medium: "The user is also forced to remember things seen in one view so that he or she can use the other view effectively. This means that the user's short term memory is occupied with the incidentals rather than with the significant issues of analysis". Representing the chart in 3 dimensions might have the advantage of aesthetic appeal, and could attract the non-captive user to study the graph, although, as stated, it might then make it harder to interpret. If the user is captive, consider avoiding the use of a 3rd dimension.
It might be more informative to present the data contained in this particular 3-d pyramid chart as a 2-dimensional combination grouped bar and line chart. A more suitable partitioning of the raw data might enable deeper understanding. One option is to divide the component sums of both expenditure and revenue into a certain number of categories whose values are ranges.
For example, give the colour category 4 values (placing each element of data from the series for expenditure or revenue in one and only one), say components of expenditure/revenue with individual sums of <= 250,000, 250,001-500,000, 500,001-750,000 and > 750,000. This would both give an indication of how much small or large expenditures/revenues contribute to the total of each and allow the same legend to be used for each column/pyramid, with no loss of the information which the original intended to communicate. The grouped bar chart would then consist of 2 sets of columns, 4 in each, requiring 4 different colours/shades. The legend could now use one hue with different shades to represent each value. For example, the group with the lowest upper bound could be represented by a very light red, with each subsequent value getting gradually darker, until a very dark red is used for the key of the value with the highest lower bound. This would make it clear that there is a trend moving through each of the four grouped columns of both expenditure and revenue, especially if they were ordered from lightest to darkest, going left to right, for both positional category values (in this case expenditure and revenue).
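A minimal sketch of this repartitioning is given below. It assumes whole-number amounts and treats the top range as strictly above 750,000, so that every amount falls into exactly one range. The component amounts and the hex shades are invented for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first three ranges; the fourth range is open-ended.
UPPER_BOUNDS = [250_000, 500_000, 750_000]
LABELS = ["<= 250,000", "250,001-500,000", "500,001-750,000", "> 750,000"]
# Hypothetical shades of a single red hue, lightest to darkest.
SHADES = ["#fcbba1", "#fb6a4a", "#cb181d", "#67000d"]


def bin_index(amount):
    """Return the index of the one range that contains a whole-number amount."""
    return bisect_left(UPPER_BOUNDS, amount)


def bin_totals(components):
    """Sum the component amounts that fall into each of the four ranges."""
    totals = [0] * len(LABELS)
    for amount in components:
        totals[bin_index(amount)] += amount
    return totals


# Hypothetical component sums for the two positional category values.
expenditure_components = [120_000, 250_000, 400_000, 900_000]
revenue_components = [600_000, 750_000, 800_000]

expenditure_bars = bin_totals(expenditure_components)
revenue_bars = bin_totals(revenue_components)

# One legend (label plus shade) now serves both groups of four bars.
for label, shade, exp, rev in zip(LABELS, SHADES, expenditure_bars, revenue_bars):
    print(f"{label:<18}{shade:<10}{exp:>12,}{rev:>12,}")
```

Each group of four bars is drawn left to right in the order of `LABELS`, so the shade darkens as the range increases in both the expenditure and the revenue groups.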
In Summary
Ultimately, the stacked bar chart is best used when making comparisons between sets of data which can be grouped into series, where the information conveyed places emphasis on the relative sizes of the whole/sum of each series and not on part-to-whole. It is more informative for multiple series, especially when they contain common nominal elements. It is not recommended to use a 3-d version of this graphic if it contains no more information than is present in the 2-d case. The pyramid form does not compensate for the associated failings.
(From the course "Information Visualization: Getting Dashboards Right")
3:16 PM Sep 29 2014