Identify data formats

We can store data in various formats, which fall into three broad types: structured, semi-structured, and unstructured.

We commonly group data around the entity it represents (such as customers, products, or sales orders). Each entity normally has one or more attributes; a customer, for example, might have a name and an address.

Structured Data

Structured data is tabular in nature: it is held in tables made up of rows and columns. Each row represents one instance of a data entity, and each column represents an attribute of that entity.

The relational model is designed for structured data. Multiple tables can reference one another by using key columns.
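To make the idea of key columns concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: every row in a table has the same columns.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE sales_order ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(id), "  # key column pointing at customer
    "total REAL)"
)

conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Joe Jones", "1 Main St."), (2, "Samir Nadoy", "123 Elm Pl.")],
)
conn.executemany(
    "INSERT INTO sales_order VALUES (?, ?, ?)",
    [(100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.5)],
)

# Follow the key column from each order back to its customer.
rows = conn.execute(
    "SELECT c.name, o.id, o.total "
    "FROM sales_order AS o JOIN customer AS c ON c.id = o.customer_id "
    "ORDER BY o.id"
).fetchall()

for name, order_id, total in rows:
    print(name, order_id, total)
```

Each sales order stores only the customer's id, and the join uses that key column to look up the matching customer row.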

Semi-structured Data

As the name suggests, semi-structured data has some structure but it allows for some variation between entity instances.

If the term ‘entity instance’ confuses you, consider it a row in a table. A single row is an entity instance.

In structured data, we know that each entity instance will have the same fields (columns). In semi-structured data, this is not the case: the specific fields may vary between entity instances. In structured data terms, this is like saying each row in a table may have its own specific set of columns. That is not possible in structured data, but it is possible in semi-structured data.

A common format for semi-structured data is JavaScript Object Notation (JSON).
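As a small illustration, the sketch below parses a JSON document with Python's built-in json module. The two customer records and their field names are made up for this example; the point is only that the two instances do not share the same set of fields.

```python
import json

# Hypothetical JSON document: two customers with different fields.
raw = """
[
  {"name": "Joe Jones", "address": "1 Main St.", "email": "joe@example.com"},
  {"name": "Samir Nadoy", "phones": ["555-0100", "555-0101"]}
]
"""

customers = json.loads(raw)

# The fields present on each entity instance.
field_sets = [sorted(c.keys()) for c in customers]

# Every field that appears on at least one instance.
all_fields = sorted(set().union(*(c.keys() for c in customers)))

# A field may be missing, so read it with a default instead of indexing.
emails = [c.get("email") for c in customers]

print(field_sets)
print(all_fields)
print(emails)
```

Because a field can be absent from any given instance, code that reads semi-structured data usually has to handle the missing case explicitly, as `dict.get` does here by returning `None`.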

Unstructured Data

Some data simply has no structure to it, such as images, audio, and video. This type of data is referred to as unstructured data.

Data Stores

Organisations store data for analysis and reporting. Data stores fall into two broad categories:

  • File stores
  • Databases