Data Types

Anyone who uses computer spreadsheets (e.g. Excel) will know about cell formatting and the different data types (numbers, text, currency, etc.), and a competent computer programmer will be all too aware of the need for consistent data types within variables.  With statistical analysis, it’s important to recognise whether numeric data is parametric or non-parametric; I also find it useful to classify non-numeric data into two types: poetry and prose.  In brief:

Numeric:parametric – real values such as length or weight.  These can be analysed with the more familiar statistical techniques (mean, standard deviation, Student’s t-test, etc.).
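For parametric data, the familiar summaries and a two-sample test can be sketched with Python’s standard `statistics` module. This is a minimal sketch, assuming two independent samples with roughly equal variances (the classic pooled Student’s t); the weights are made-up illustrative values and `pooled_t` is a hypothetical helper, not a library function. Turning the t statistic into a p-value still needs t-distribution tables or a library such as SciPy.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample Student's t statistic, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    # Sample variances use the n - 1 denominator
    va, vb = statistics.variance(a), statistics.variance(b)
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical weights (kg) from two production batches; illustrative only
batch_a = [10.2, 9.8, 10.1, 10.4, 9.9]
batch_b = [10.6, 10.9, 10.5, 10.8, 10.7]

print("mean:", statistics.mean(batch_a), "sd:", statistics.stdev(batch_a))
print("t:", pooled_t(batch_a, batch_b))
```

Mean, standard deviation and t are only meaningful here because differences between weights are real, measurable quantities.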

Numeric:non-parametric – symbolic values such as rankings (first, second, etc.), shoe size or a preference rating from 1 to 5.  Applying the more common techniques to these gives meaningless results; there is a separate suite of methods for them (chi-square, Mann-Whitney U-test, etc.).
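For ordinal data such as preference ratings, the Mann-Whitney U statistic compares two groups by looking at every pair of observations (one from each group) and counting how often the first group’s value is higher, with ties counting as half. The sketch below computes only the U statistic, by brute force; the ratings and the `mann_whitney_u` helper are hypothetical. For a p-value, use published tables or a library routine such as SciPy’s `scipy.stats.mannwhitneyu`. The median, rather than the mean, is reported as the summary.

```python
import statistics

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a's value is higher, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical preference ratings (1 = dislike, 5 = like) from two groups
group_1 = [2, 3, 3, 4, 2, 1]
group_2 = [4, 5, 3, 4, 5, 4]

u1 = mann_whitney_u(group_1, group_2)
u2 = mann_whitney_u(group_2, group_1)
# The two U values always sum to n1 * n2
print(u1, u2, len(group_1) * len(group_2))
print("medians:", statistics.median(group_1), statistics.median(group_2))
```

Only the order of the ratings matters to U, which is why it suits values where "4" is more than "2" but not necessarily twice as much.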

Non-numeric:poetry – facts that have a clear structure, such as the sequence of events in a process.  Process maps, flowcharts and tables often help in understanding these.
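Structured (“poetry”) facts can often be rearranged mechanically into a table. As a hypothetical illustration, the steps of an invented order-handling process are recorded as “this step comes after those steps” facts, and Python’s `graphlib.TopologicalSorter` (standard library, Python 3.9+) puts them into a sequence consistent with every fact, printed as a numbered table.

```python
from graphlib import TopologicalSorter

# Hypothetical process facts: each step mapped to the steps that must precede it
process = {
    "Receive order": set(),
    "Check stock": {"Receive order"},
    "Take payment": {"Receive order"},
    "Pack item": {"Check stock", "Take payment"},
    "Dispatch": {"Pack item"},
}

# Arrange the steps in an order that respects every "comes after" fact
order = list(TopologicalSorter(process).static_order())

# Print as a simple two-column table: step number and step name
for number, step in enumerate(order, start=1):
    print(f"{number:>2}  {step}")
```

Where two steps have no fixed order between them (here, checking stock and taking payment), any consistent sequence is valid, which is exactly the kind of detail a flowchart makes visible.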

Non-numeric:prose – facts that have no readily discernible structure, such as ad hoc comments in a survey.  The lack of structure makes it almost impossible to develop specific analytical techniques for these and, often, it’s necessary to take a step back and hope for inspiration.  Sometimes, patterns and relationships will emerge if the facts are translated or transposed into another form (or type).
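One simple way of transposing prose into another type is to count how many comments mention each word, turning free text into frequency counts that can then be tabulated or compared. This is a minimal sketch, assuming plain word matching is good enough; the comments, the stop-word list and the `word_counts` helper are all hypothetical, and real survey text usually needs more care (synonyms, spelling variants, phrases).

```python
import re
from collections import Counter

# Hypothetical free-text survey comments
comments = [
    "Delivery was slow but the staff were helpful.",
    "Helpful staff, slow delivery.",
    "Website was confusing; delivery slow.",
]

# Illustrative filler words to ignore (an assumption; tune to the actual data)
STOP_WORDS = {"was", "but", "the", "were", "a", "and"}

def word_counts(texts):
    """Count how many comments mention each word (each word once per comment)."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z']+", text.lower()))
        counts.update(words - STOP_WORDS)
    return counts

for word, n in word_counts(comments).most_common(3):
    print(word, n)
```

Counting each word once per comment means one long, repetitive comment cannot dominate the totals.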

These latter (non-numeric) types are not necessarily fixed, but recognising the distinction can help when trying to make sense of them.

(Posted as a blog 9th April 2018)