Data formats

Data is a collection of facts such as numbers, descriptions, and observations used to record information. The data structures in which this data is organized often represent entities that are important to an organization (such as customers, products, sales orders, and so on). Each entity typically has one or more attributes, or characteristics (for example, a customer might have a name, an address, a phone number, and so on).

You can classify data as structured, semi-structured, or unstructured.

Structured data

  • Is typically stored in relational databases (tables of rows and columns)
  • Requires a fixed schema
    • Each row of data must contain the same set of attributes in the same order
  • Uses a primary key to uniquely identify each row
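
As a minimal sketch of a fixed schema with a primary key, the example below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a relational database. The `customers` table and its columns are hypothetical. Rows that reuse a primary key value or do not match the column list are rejected.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")

# Fixed schema: every row has exactly these columns, in this order.
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- uniquely identifies each row
        name        TEXT NOT NULL,
        email       TEXT
    )
    """
)

conn.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace', 'ada@example.com')")
conn.execute("INSERT INTO customers VALUES (2, 'Alan Turing', 'alan@example.com')")

# Reusing an existing primary key value violates the uniqueness constraint.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Supplying fewer values than the schema defines is also rejected.
try:
    conn.execute("INSERT INTO customers VALUES (3, 'Grace Hopper')")
    short_row_rejected = False
except sqlite3.Error:
    short_row_rejected = True

rows = conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
print(rows)
```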

File formats

  • CSV (comma-separated values)
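
A short sketch of reading CSV with Python's standard csv module: the header row defines one shared list of columns, and every data row is mapped onto it. The sample content is hypothetical. CSV itself carries no types, so every value is read back as a string.

```python
import csv
import io

# Hypothetical CSV content; the header row defines the shared set of columns.
CSV_TEXT = """customer_id,name,email
1,Ada Lovelace,ada@example.com
2,Alan Turing,alan@example.com
"""

reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = list(reader)
columns = reader.fieldnames

# Every row is keyed by the same header fields, in the same order.
same_columns = all(list(row.keys()) == columns for row in rows)

print(columns)          # ['customer_id', 'name', 'email']
print(rows[0]["name"])  # Ada Lovelace
```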

Semi-structured data

  • Does not need to adhere to a strict schema
    • Each instance of a data element can have different attributes
    • Each attribute can be stored in a different order
  • Allows for variation between instances of data entities
    • Storage of different sets of attributes per instance of a data entity
    • For example, one record may contain a single email address and another two email addresses
  • Allows for a hierarchical schema
    • Some entities can be modeled in a parent/child relationship
    • For example, manager/employee
  • Stores each entity's attribute names together with its data (for example, as keys in a JSON object) rather than in the tables and columns of a relational database
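
The properties above can be sketched with a small JSON document parsed by Python's json module. The records and field names (`emails`, `reports`, `phone`) are hypothetical. The two top-level records have different attributes in a different order, the email lists hold one and two addresses, and `reports` nests employees under their manager.

```python
import json

# Hypothetical JSON document: two employee records with different attributes.
DOC = """
[
  {"name": "Ada Lovelace",
   "emails": ["ada@example.com"],
   "reports": [
     {"name": "Alan Turing",
      "emails": ["alan@example.com", "turing@example.org"]}
   ]},
  {"phone": "555-0100", "name": "Grace Hopper"}
]
"""

records = json.loads(DOC)

# No shared schema: each record has its own set of attribute names.
attribute_sets = [sorted(record) for record in records]


def walk(entities, depth=0):
    """Yield (depth, name, email count) for each entity in the manager/employee tree."""
    for entity in entities:
        yield depth, entity["name"], len(entity.get("emails", []))
        yield from walk(entity.get("reports", []), depth + 1)


for depth, name, email_count in walk(records):
    print("  " * depth + f"{name} ({email_count} email(s))")
```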

File formats

  • XML
  • JSON
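
For comparison, roughly the same records can be written as XML and read with Python's xml.etree.ElementTree. The element and attribute names are hypothetical. Repeated `<email>` elements express a variable number of addresses, and nesting `<employee>` elements expresses the manager/employee hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are illustrative only.
DOC = """
<employees>
  <employee name="Ada Lovelace">
    <email>ada@example.com</email>
    <employee name="Alan Turing">
      <email>alan@example.com</email>
      <email>turing@example.org</email>
    </employee>
  </employee>
  <employee name="Grace Hopper">
    <phone>555-0100</phone>
  </employee>
</employees>
"""

root = ET.fromstring(DOC.strip())

# iter() visits every <employee> at any depth; findall("email") matches direct children only.
summary = [
    (emp.get("name"), [e.text for e in emp.findall("email")])
    for emp in root.iter("employee")
]

for name, emails in summary:
    print(name, emails)
```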

Unstructured data

File formats

  • BLOB (binary large object), for example documents, images, audio, and video files
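
A minimal sketch of storing unstructured content as a BLOB, again using SQLite via Python's sqlite3 module as a stand-in. The `files` table, the filename, and the payload bytes are hypothetical. The database keeps the bytes as-is and knows nothing about their internal structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: metadata columns plus one BLOB column for the raw content.
conn.execute("CREATE TABLE files (file_id INTEGER PRIMARY KEY, filename TEXT, content BLOB)")

# Arbitrary bytes standing in for an image, audio clip, or other binary file.
payload = bytes(range(256))

conn.execute("INSERT INTO files (filename, content) VALUES (?, ?)", ("photo.bin", payload))

# The content comes back as the same opaque bytes object.
stored = conn.execute(
    "SELECT content FROM files WHERE filename = ?", ("photo.bin",)
).fetchone()[0]
print(type(stored), len(stored))
```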

Table components

A primary key is a column (or set of columns) whose value uniquely identifies each row in a table.

A foreign key is a column in one table that references rows in another table by that table's unique identifier (its primary key). This allows relationships to be constructed between the two tables.
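
A sketch of a foreign key using SQLite through Python's sqlite3 module, with hypothetical `customers` and `orders` tables. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)  -- foreign key
    )
    """
)

conn.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # references an existing customer

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Follow the relationship by joining the foreign key to the primary key.
joined = conn.execute(
    """
    SELECT o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    """
).fetchall()
print(orphan_rejected, joined)
```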

A row is an instance of a data entity. Each row has a primary key value that identifies it.

An index is created on one or more columns of a table to speed up queries that filter or sort on those columns.
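
A sketch of a two-column index in SQLite via Python's sqlite3 module; the table, data, and index name `idx_customers_name` are hypothetical. `EXPLAIN QUERY PLAN` reports how SQLite intends to run a query, and its exact wording varies between SQLite versions, so the sketch only checks that the index name appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (last_name, first_name, city) VALUES (?, ?, ?)",
    [("Lovelace", "Ada", "London"), ("Turing", "Alan", "Manchester"), ("Hopper", "Grace", "New York")],
)

# Composite index over two columns (hypothetical name).
conn.execute("CREATE INDEX idx_customers_name ON customers (last_name, first_name)")

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT city FROM customers WHERE last_name = ? AND first_name = ?",
    ("Turing", "Alan"),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```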

Last modified July 21, 2024: update (e2ae86c)