Why color choice matters for data visualization

"Color choice is part of your statistical honesty: you are deciding how much effort a reader needs to correctly recover the pattern in your data."

Color is doing at least three jobs at once in a plot:

Encoding information
Different hues separate groups or categories.
Changes in lightness and saturation signal “more” or “less” in numeric scales.

Directing attention
A good palette lets the viewer’s eye land first where you want: the treatment of interest, the interaction that really changed, the outlier that matters. Over-saturated or noisy palettes pull attention everywhere at once, or worse, to the wrong place. 

Maintaining honesty and accessibility
Certain rainbow / “jet” colormaps distort the data: equal steps in the variable do not look like equal steps in color. They create artificial edges and blobs that do not exist in the underlying numbers, and they are hard or impossible to read for many people with color-vision deficiency.
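One way to see the problem is to check how lightness behaves along a rainbow. The sketch below is a rough illustration, not a measurement of any specific library's "jet" colormap: it assumes a simple full-saturation HSV hue sweep from blue to red as a stand-in rainbow, and it uses the WCAG relative-luminance formula as a proxy for perceived lightness. The function names are hypothetical.

```python
import colorsys

def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color with channels in 0..1."""
    def linearize(c):
        # Undo the sRGB transfer curve before weighting the channels.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def hsv_rainbow(n):
    """Stand-in rainbow (assumption): hue swept from blue (240 deg) to red (0 deg)."""
    return [colorsys.hsv_to_rgb((240 / 360) * (1 - i / (n - 1)), 1.0, 1.0)
            for i in range(n)]

def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

rainbow_lum = [relative_luminance(c) for c in hsv_rainbow(9)]
grey_lum = [relative_luminance((i / 8, i / 8, i / 8)) for i in range(9)]

print("rainbow lightness monotonic:", is_monotonic(rainbow_lum))
print("grey ramp lightness monotonic:", is_monotonic(grey_lum))
```

The rainbow's lightness rises and falls several times along the scale, which is where the artificial edges come from; the grey ramp moves in one direction only.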

The pieces of color theory that actually matter for data viz

You do not need the full art-school textbook. For scientific figures, a few principles cover most cases.

1. Match palette type to data type

This is the core rule.

Qualitative palettes – use distinct hues with similar lightness/saturation for categories with no intrinsic order (fungicide A vs B vs C; sites NSW / SA / WA).
Keep them limited and well separated; many guidelines suggest ~5–7 clearly distinct colors as a comfortable upper limit.

Sequential palettes – one hue, going from light to dark, for ordered or numeric data (low to high disease pressure, low to high biomass).
Here, perceived lightness should change smoothly and in one direction as the value increases. Perceptually uniform palettes such as viridis and cividis are designed for exactly this; they pass through more than one hue, but their lightness still rises steadily from one end to the other.

Diverging palettes – two opposing hues with a neutral middle for deviations around a meaningful centre (log2 fold change around 0, effect size around “no effect”).

If you mix these up (for example, using a sequential scale for unordered categories), the figure tells the wrong story about the structure of the data: it implies an order or a midpoint that is not there.
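The three palette types can be sketched in a few lines of Python. This is a minimal illustration using only the standard-library colorsys module; the function names, default hues, and lightness ranges are assumptions chosen for the example. HLS lightness is also only a rough stand-in for perceived lightness, so for real figures a perceptually uniform palette is the better choice.

```python
import colorsys

def to_hex(rgb):
    """Format an (r, g, b) tuple with channels in 0..1 as an uppercase hex code."""
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in rgb))

def qualitative(n, lightness=0.5, saturation=0.65):
    """Evenly spaced hues at the same HLS lightness and saturation."""
    return [to_hex(colorsys.hls_to_rgb(i / n, lightness, saturation))
            for i in range(n)]

def sequential(n, hue=0.58, saturation=0.7, lightest=0.9, darkest=0.2):
    """One hue, with lightness falling linearly from light to dark (needs n >= 2)."""
    steps = [lightest + (darkest - lightest) * i / (n - 1) for i in range(n)]
    return [to_hex(colorsys.hls_to_rgb(hue, l, saturation)) for l in steps]

def diverging(n_per_side, low_hue=0.58, high_hue=0.05, saturation=0.7,
              neutral=0.95, extreme=0.35):
    """Two hues that lighten towards a shared near-white neutral midpoint."""
    def arm(hue):
        # Runs from the dark extreme towards (but not including) the neutral.
        return [to_hex(colorsys.hls_to_rgb(
                    hue, extreme + (neutral - extreme) * i / n_per_side, saturation))
                for i in range(n_per_side)]
    middle = to_hex((neutral, neutral, neutral))
    return arm(low_hue) + [middle] + arm(high_hue)[::-1]

print(qualitative(3))   # e.g. fungicide A / B / C
print(sequential(5))    # e.g. low to high biomass
print(diverging(3))     # e.g. log2 fold change around 0
```

Each function encodes the structure of its data type: no order for qualitative, one direction for sequential, and two directions around a neutral centre for diverging.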

2. Think in hue, lightness, and saturation – not just “pretty colors”

A practical way to translate color theory into data viz:

Use hue (blue vs orange vs green) primarily to separate different groups.
Use lightness (how light/dark) to encode magnitude along a scale.
Use saturation sparingly to highlight a subset: vivid for the focal treatment, muted / greyed for background groups.

This roughly corresponds to the HCL (Hue–Chroma–Luminance) color space, which was designed to line up with human perception and is widely recommended for scientific palettes. 
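As a rough sketch of the "saturation as highlight" idea, the snippet below keeps one focal color vivid and mutes the rest. It assumes the standard-library HLS model as a simple approximation of HCL (the two are not the same space), and the helper names and the default keep factor are hypothetical. The two hex codes are the ones quoted elsewhere in this article.

```python
import colorsys

def hex_to_rgb(hex_code):
    """Parse '#RRGGBB' into an (r, g, b) tuple with channels in 0..1."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))

def to_hex(rgb):
    """Format an (r, g, b) tuple with channels in 0..1 as an uppercase hex code."""
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in rgb))

def mute(hex_code, keep=0.25):
    """Scale HLS saturation by `keep`, leaving hue and lightness unchanged."""
    h, l, s = colorsys.rgb_to_hls(*hex_to_rgb(hex_code))
    return to_hex(colorsys.hls_to_rgb(h, l, s * keep))

def highlight(palette, focal_index, keep=0.25):
    """Keep the focal color as-is and mute every other color in the palette."""
    return [c if i == focal_index else mute(c, keep)
            for i, c in enumerate(palette)]

palette = ["#126DA0", "#3112A0"]
print(highlight(palette, focal_index=0))
```

Setting keep to 0 turns the background groups fully grey, which matches the accent-plus-greyscale approach for single-highlight plots.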

3. Design for people who do not see color like you

Color-vision deficiency is common: the usually cited figures are roughly 8% of men and 0.5% of women. Most cases affect red–green discrimination; blue–yellow deficiencies are much rarer.

Practical consequences:

Avoid relying solely on a red vs green distinction. Combine hue with lightness difference or shape/line style where possible.
Check that adjacent colors in your palette differ not just in hue but also a bit in lightness or saturation.
For single-highlight plots, a strong accent color plus greyscale for everything else is often safest and clearest.
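The "adjacent colors should differ in lightness" check can be automated in a simple way. The sketch below uses the WCAG relative-luminance and contrast-ratio formulas to flag neighbouring palette entries that are too close in lightness. It does not simulate any type of color-vision deficiency, and the min_ratio threshold of 1.3 is an arbitrary assumption to tune for your own figures; the function names are hypothetical.

```python
def relative_luminance(hex_code):
    """WCAG relative luminance (0 = black, 1 = white) of an sRGB hex code."""
    h = hex_code.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve before weighting the channels.
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]

def contrast_ratio(a, b):
    """WCAG contrast ratio between two colors, from 1 (identical) to 21."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)

def low_contrast_neighbours(palette, min_ratio=1.3):
    """Return adjacent pairs whose lightness contrast falls below min_ratio."""
    return [(a, b) for a, b in zip(palette, palette[1:])
            if contrast_ratio(a, b) < min_ratio]

# Two nearly identical greys followed by white: only the first pair is flagged.
print(low_contrast_neighbours(["#777777", "#787878", "#FFFFFF"]))
```

A pair that passes this check still separates when the figure is printed in greyscale, which is a useful lower bar even though it is not a full accessibility test.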

How this palette app fits into this

The little app you have just built is not just a toy; it is a practical color-theory assistant:

It forces you to declare intent: qualitative, analogous, sequential, or diverging. That is already half of color theory in data viz – matching palette structure to data structure.
You can generate up to 20 colors, then custom-tune individual swatches. That matters when you discover, for example, that “Color 4 and Color 5” look too similar at small symbol sizes, or when one hue is clashing with a journal’s background.
Each chip shows whether it is a “light” or “dark” color (based on luminance), which nudges you toward good contrast choices and helps you avoid legends full of mid-value mush.
The hex codes are immediately copy-able into R (e.g. scale_colour_manual(values = c("#126DA0", "#3112A0", ...))), Python, Illustrator, PowerPoint, or your manuscript template. That keeps your visual language consistent across figures and over time.
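For readers who want to reproduce the light/dark label outside the app, here is one possible rule in Python. It is an assumption, not necessarily the rule the app uses: a color is called "light" when black text would contrast with it at least as well as white text, using the WCAG relative-luminance formula. The function names are hypothetical.

```python
def relative_luminance(hex_code):
    """WCAG relative luminance (0 = black, 1 = white) of an sRGB hex code."""
    h = hex_code.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve before weighting the channels.
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]

def light_or_dark(hex_code):
    """Label a color 'light' if black text contrasts with it at least as well as white."""
    lum = relative_luminance(hex_code)
    contrast_with_black = (lum + 0.05) / 0.05
    contrast_with_white = 1.05 / (lum + 0.05)
    return "light" if contrast_with_black >= contrast_with_white else "dark"

def label_palette(palette):
    """Map each hex code to its light/dark label."""
    return {code: light_or_dark(code) for code in palette}

print(label_palette(["#126DA0", "#3112A0", "#FFFFFF"]))
```

The same label tells you which text color to put on top of a swatch: dark text on "light" chips, white text on "dark" ones.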

With this tool you are not picking colors blindly from a wheel every time; you are iterating within a constrained, data-aware framework that you control.

Jamil Chowdhury, 21 June 2023