General

Batch and stream processing

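In batch processing, data is collected over a period of time and then processed together as a group, typically on a regular schedule rather than in real time.

In stream processing, each piece of data is processed as it arrives, in real time or near real time, in response to an event.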
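As a rough illustration of the difference, the sketch below totals the same events in two ways. The sample amounts and the function names `process_batch` and `process_stream` are made up for this example and do not come from any particular framework.

```python
from typing import Iterable, Iterator

# Hypothetical sample data: each item is one sale amount.
events = [12.0, 7.5, 30.0, 4.25]


def process_batch(collected: list[float]) -> float:
    """Batch: the whole collected dataset is processed in a single run."""
    return sum(collected)


def process_stream(incoming: Iterable[float]) -> Iterator[float]:
    """Stream: the result is updated each time a new event arrives."""
    running_total = 0.0
    for amount in incoming:
        running_total += amount
        yield running_total


batch_total = process_batch(events)           # one answer, after all data is in
stream_totals = list(process_stream(events))  # one updated answer per event
```

The batch version produces a single result once all the data has been collected, while the stream version produces an up-to-date result after every event.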

Database objects

View

A view is a virtual table based on the results of a SELECT query. The view stores the query rather than a copy of the data, so you can think of it as a window on specified rows and columns in one or more underlying tables.
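A minimal sketch of a view, using Python's built-in `sqlite3` module with an in-memory database. The `customers` table, the `uk_customers` view, and the sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO customers (name, country) VALUES
        ('Ada', 'UK'), ('Grace', 'US'), ('Alan', 'UK');

    -- The view stores only this query, not a copy of the rows.
    CREATE VIEW uk_customers AS
        SELECT id, name FROM customers WHERE country = 'UK';
""")

uk_before = conn.execute("SELECT name FROM uk_customers ORDER BY id").fetchall()

# A new row in the underlying table shows up in the view automatically.
conn.execute("INSERT INTO customers (name, country) VALUES ('Tim', 'UK')")
uk_after = conn.execute("SELECT name FROM uk_customers ORDER BY id").fetchall()
```

Because the view is evaluated when it is queried, the second result includes the newly inserted customer without the view being redefined.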

Stored procedure

A stored procedure is a named set of SQL statements, stored in the database, that can be run on command and can usually accept parameters. Stored procedures are used to encapsulate programmatic logic in a database for actions that applications need to perform when working with data.

Index

An index helps you search for data in a table. Think of an index over a table like an index at the back of a book. A book index contains a sorted set of references, with the pages on which each reference occurs. When you want to find a reference to an item in the book, you look it up through the index. You can use the page numbers in the index to go directly to the correct pages in the book. Without an index, you might have to read through the entire book to find the references you’re looking for.
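The effect of an index can be observed with SQLite's `EXPLAIN QUERY PLAN`, sketched here through Python's `sqlite3` module. The table, the index name `idx_customers_email`, and the generated email addresses are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)

query = "SELECT id FROM customers WHERE email = 'user500@example.com'"


def plan(sql: str) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " ".join(row[-1] for row in rows)


plan_without_index = plan(query)  # no index yet: the whole table is scanned

conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
plan_with_index = plan(query)     # the plan now mentions the index

match = conn.execute(query).fetchall()
```

Comparing the two plan strings shows the query switching from reading every row to looking the value up through the index, much like going from reading the whole book to using the index at the back.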

Data formats

Structured data

Structured data is data that adheres to a fixed schema, so all of the data has the same fields or properties. Most commonly, the schema for structured data entities is tabular: the data is represented in one or more tables, in which each row represents an instance of a data entity and each column represents an attribute of that entity.
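As a small sketch of tabular data, the example below reads a made-up customer table in CSV form with Python's `csv` module and checks that every row has exactly the same fields. The column names and values are illustrative only.

```python
import csv
import io

# Hypothetical customer table: every row has the same three columns.
table = io.StringIO(
    "id,name,email\n"
    "1,Ada,ada@example.com\n"
    "2,Grace,grace@example.com\n"
)

rows = list(csv.DictReader(table))

# Collect the distinct sets of field names; a fixed schema yields exactly one.
field_sets = {tuple(row.keys()) for row in rows}
```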

Semi-structured data

Semi-structured data is information that has some structure, but which allows for some variation between entity instances. For example, while most customers may have an email address, some might have multiple email addresses, and some might have none at all.
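JSON is one common way to represent this kind of data. The sketch below uses invented customer records in which the `emails` field holds one address, several addresses, or is absent altogether.

```python
import json

# Hypothetical customer records: the "emails" field varies between entities.
document = """
[
  {"name": "Ada",   "emails": ["ada@example.com"]},
  {"name": "Grace", "emails": ["grace@example.com", "g.hopper@example.org"]},
  {"name": "Alan"}
]
"""

customers = json.loads(document)

# Code reading semi-structured data has to allow for missing fields.
email_counts = {c["name"]: len(c.get("emails", [])) for c in customers}
```

Each record still has recognizable structure (named fields), but the reader cannot assume that every record carries the same fields.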

Unstructured data

Not all data is structured or even semi-structured. For example, documents, images, audio and video data, and binary files might not have a specific structure.

Last modified July 21, 2024: update (e2ae86c)