Explore Azure Storage for non-relational data
Many applications don’t need the rigid structure of a relational database and instead rely on non-relational (often referred to as NoSQL) storage.
Azure Blob Storage
- A service that enables you to store massive amounts of unstructured data as binary large objects, or blobs, in the cloud.
- You store blobs in containers.
- Within a container, you can organize blobs in a hierarchy of virtual folders by using a “/” character in the blob name.
- The folders are purely virtual: you can’t perform folder-level operations to control access or perform bulk operations.
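Because the folders exist only as prefixes of blob names, a folder listing can be derived from the names alone. The following is a minimal sketch of that idea in plain Python, not a call into any Azure SDK; `list_virtual_folder` and the sample blob names are hypothetical.

```python
def list_virtual_folder(blob_names, prefix="", delimiter="/"):
    """Return (folders, blobs) found directly under `prefix`, using names only."""
    folders, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so the name continues into a deeper virtual folder.
            folders.add(prefix + head + delimiter)
        else:
            # No delimiter left: the blob sits directly under the prefix.
            blobs.append(name)
    return sorted(folders), sorted(blobs)


# Hypothetical blob names in a single container.
names = [
    "logs/2024/01/app.log",
    "logs/2024/02/app.log",
    "logs/readme.txt",
    "images/logo.png",
]

top_folders, top_blobs = list_virtual_folder(names)
log_folders, log_blobs = list_virtual_folder(names, prefix="logs/")
print(top_folders, top_blobs)  # ['images/', 'logs/'] []
print(log_folders, log_blobs)  # ['logs/2024/'] ['logs/readme.txt']
```

Nothing in this model stores a folder object, which is why there is nothing to attach folder-level permissions or bulk operations to.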
Azure Blob Storage supports three different types of blob:
Block blobs. A block blob is handled as a set of blocks. Each block can vary in size, up to 100 MB. A block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently.
Page blobs. A page blob is organized as a collection of fixed-size 512-byte pages. A page blob is optimized to support random read and write operations; you can fetch and store data for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines.
Append blobs. An append blob is a block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn’t supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just over 195 GB.
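The size limits quoted for block and append blobs follow from simple arithmetic, which the sketch below reproduces. It assumes binary units (1 TB = 1024 GB = 1,048,576 MB) and assumes that the 50,000-block limit also applies to append blobs, which the 195 GB figure implies but the notes do not state. `page_range` is a hypothetical helper showing how a byte range maps onto 512-byte pages.

```python
MAX_BLOCKS = 50_000          # blocks per blob, as quoted for block blobs
BLOCK_BLOB_BLOCK_MB = 100    # maximum block size for a block blob
APPEND_BLOB_BLOCK_MB = 4     # maximum block size for an append blob
PAGE_SIZE = 512              # bytes per page in a page blob

# Assumption: binary units, so divide by 1024 per step.
block_blob_max_tb = MAX_BLOCKS * BLOCK_BLOB_BLOCK_MB / 1024 / 1024
append_blob_max_gb = MAX_BLOCKS * APPEND_BLOB_BLOCK_MB / 1024


def page_range(offset, length, page_size=PAGE_SIZE):
    """Return (first_page, last_page) indexes covering `length` bytes at `offset`."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return first, last


print(round(block_blob_max_tb, 2))  # 4.77
print(append_blob_max_gb)           # 195.3125
print(page_range(1000, 100))        # (1, 2): bytes 1000-1099 span pages 1 and 2
```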
Blob storage provides three access tiers, which help to balance access latency and storage cost:
Hot tier is the default. You use this tier for blobs that are accessed frequently. The blob data is stored on high-performance media.
Cool tier has lower performance and incurs reduced storage charges compared to the Hot tier. Use the Cool tier for data that is accessed infrequently. It’s common for newly created blobs to be accessed frequently initially, but less so as time passes. In these situations, you can create the blob in the Hot tier, but migrate it to the Cool tier later. You can migrate a blob from the Cool tier back to the Hot tier.
Archive tier provides the lowest storage cost, but with increased latency. The Archive tier is intended for historical data that mustn’t be lost, but is required only rarely. Blobs in the Archive tier are effectively stored in an offline state. Typical reading latency for the Hot and Cool tiers is a few milliseconds, but for the Archive tier, it can take hours for the data to become available. To retrieve a blob from the Archive tier, you must change the access tier to Hot or Cool. The blob will then be rehydrated. You can read the blob only when the rehydration process is complete.
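The Archive retrieval flow can be pictured as a small state machine: changing the tier starts rehydration, and reads only succeed once rehydration has finished. The class below is a toy model written for these notes, not the Azure SDK; `ToyBlob` and its method names are hypothetical, and the hours-long wait is reduced to an explicit `complete_rehydration` call.

```python
class ToyBlob:
    """Toy model of a blob whose readability depends on its access tier."""

    def __init__(self, data, tier="Hot"):
        self.data = data
        self.tier = tier
        self.rehydrating_to = None  # target tier while rehydration is pending

    def set_tier(self, new_tier):
        if self.tier == "Archive" and new_tier in ("Hot", "Cool"):
            # Leaving Archive is not immediate: the blob must be rehydrated first.
            self.rehydrating_to = new_tier
        else:
            self.tier = new_tier
            self.rehydrating_to = None

    def complete_rehydration(self):
        # Stands in for the (possibly hours-long) wait in the real service.
        if self.rehydrating_to is not None:
            self.tier = self.rehydrating_to
            self.rehydrating_to = None

    def read(self):
        if self.tier == "Archive":
            raise RuntimeError("blob is offline in the Archive tier")
        return self.data


blob = ToyBlob(b"2019 audit report", tier="Archive")
blob.set_tier("Hot")         # request rehydration; the blob is still offline
blob.complete_rehydration()  # after this the blob is in the Hot tier
print(blob.read())           # b'2019 audit report'
```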
You can create lifecycle management policies for blobs in a storage account. A lifecycle management policy can automatically move a blob from Hot to Cool, and then to the Archive tier, as it ages and is used less frequently. The policy is based on the number of days since the blob was last modified.
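The decision a lifecycle policy makes can be sketched as a function of the days since a blob was last modified. This is only an illustration of the rule, not the policy format Azure uses; the 30-day and 180-day thresholds and the function name `target_tier` are assumptions chosen for the example.

```python
from datetime import date

COOL_AFTER_DAYS = 30      # hypothetical threshold for Hot -> Cool
ARCHIVE_AFTER_DAYS = 180  # hypothetical threshold for Cool -> Archive


def target_tier(last_modified, today):
    """Pick the access tier for a blob from the days since its last modification."""
    age_days = (today - last_modified).days
    if age_days > ARCHIVE_AFTER_DAYS:
        return "Archive"
    if age_days > COOL_AFTER_DAYS:
        return "Cool"
    return "Hot"


print(target_tier(date(2024, 1, 1), date(2024, 1, 10)))  # Hot (9 days old)
print(target_tier(date(2024, 1, 1), date(2024, 3, 1)))   # Cool (60 days old)
print(target_tier(date(2023, 1, 1), date(2024, 1, 1)))   # Archive (365 days old)
```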
Azure Data Lake Storage Gen2
Azure Data Lake Store Gen1 is a separate service for hierarchical data storage for analytical data lakes, often used by so-called big data analytical solutions that work with structured, semi-structured, and unstructured data stored in files.
Azure Data Lake Storage Gen2 is a newer version of this service that is integrated into Azure Storage, enabling you to take advantage of the scalability of blob storage and the cost control of storage tiers, combined with the hierarchical file system capabilities and compatibility with major analytics systems of Azure Data Lake Store.
Systems like Hadoop in Azure HDInsight, Azure Databricks, and Azure Synapse Analytics can mount a distributed file system hosted in Azure Data Lake Storage Gen2 and use it to process huge volumes of data.
To create an Azure Data Lake Storage Gen2 file system, you must enable the Hierarchical Namespace option of an Azure Storage account. You can do this when initially creating the storage account, or you can upgrade an existing Azure Storage account to support Data Lake Storage Gen2.
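Analytics engines typically address files in a Data Lake Storage Gen2 file system through an `abfss://` URI. The helper below is a small sketch that assembles such a URI in the commonly documented form `abfss://<file-system>@<account>.dfs.core.windows.net/<path>`; the function name, account name, and file system name are hypothetical, and the notes above do not cover this addressing scheme themselves.

```python
def abfss_uri(account, file_system, path=""):
    """Build an abfss:// URI for a path in a Data Lake Storage Gen2 file system."""
    # Strip leading slashes so the path is not doubled after the host name.
    return f"abfss://{file_system}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical account ("contosolake") and file system ("raw").
uri = abfss_uri("contosolake", "raw", "/sales/2024/orders.csv")
print(uri)  # abfss://raw@contosolake.dfs.core.windows.net/sales/2024/orders.csv
```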
Azure Files
Azure Files is essentially a way to create cloud-based network shares, such as you typically find in on-premises organizations to make documents and other files available to multiple users. By hosting file shares in Azure, organizations can eliminate hardware costs and maintenance overhead, and benefit from high availability and scalable cloud storage for files.
- Azure Files enables you to share up to 100 TB of data in a single storage account.
- The maximum size of a single file is 1 TB.
- Azure File Storage supports up to 2000 concurrent connections per shared file.
- You can use the Azure File Sync service to synchronize locally cached copies of shared files with the data in Azure File Storage.
Azure File Storage offers two performance tiers. The Standard tier uses hard disk-based hardware in a datacenter, and the Premium tier uses solid-state disks. The Premium tier offers greater throughput, but is charged at a higher rate.
Azure Files supports two common network file sharing protocols:
- Server Message Block (SMB) file sharing is commonly used across multiple operating systems (Windows, Linux, macOS).
- Network File System (NFS) shares are used by some Linux and macOS versions. To create an NFS share, you must use a premium tier storage account and create and configure a virtual network through which access to the share can be controlled.
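The NFS prerequisites can be expressed as a simple pre-flight check on a planned share. This is a sketch of the rules stated above, not an Azure API; `validate_share` and its parameters are hypothetical.

```python
def validate_share(protocol, tier, has_virtual_network):
    """Return a list of problems with a planned file share (empty if none)."""
    problems = []
    protocol = protocol.upper()
    if protocol not in ("SMB", "NFS"):
        problems.append(f"unsupported protocol: {protocol}")
    if protocol == "NFS":
        # NFS needs both a premium tier account and a configured virtual network.
        if tier.lower() != "premium":
            problems.append("NFS shares require a premium tier storage account")
        if not has_virtual_network:
            problems.append("NFS shares require a configured virtual network")
    return problems


print(validate_share("SMB", "standard", False))  # []
print(validate_share("NFS", "premium", True))    # []
print(len(validate_share("NFS", "standard", False)))  # 2
```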
Azure Tables
Azure Table Storage is a NoSQL storage solution that makes use of tables containing key/value data items. Each item is represented by a row that contains columns for the data fields that need to be stored.
It enables you to store semi-structured data.
All rows in a table must have a unique key (composed of a partition key and a row key), and when you modify data in a table, a timestamp column records the date and time the modification was made. Other than that, the columns in each row can vary.
Azure Table Storage tables have no concept of foreign keys, relationships, stored procedures, views, or other objects you might find in a relational database.
Data in Azure Table Storage is usually denormalized, with each row holding the entire data for a logical entity.
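The row rules above (a unique partition key plus row key, an automatic timestamp, and otherwise free-form columns) can be modeled with an ordinary dictionary. The class below is a toy in-memory stand-in written for these notes, not the Azure Tables SDK; `ToyTable`, its methods, and the sample data are hypothetical, and in the real service the timestamp is recorded by the service rather than by client code.

```python
from datetime import datetime, timezone


class ToyTable:
    """In-memory model of a table keyed by (partition key, row key)."""

    def __init__(self):
        self._rows = {}  # (PartitionKey, RowKey) -> entity dict

    def upsert(self, partition_key, row_key, **columns):
        # Columns are free-form: each row may carry a different set of fields.
        entity = {"PartitionKey": partition_key, "RowKey": row_key, **columns}
        # Record when the modification was made.
        entity["Timestamp"] = datetime.now(timezone.utc)
        # Writing to an existing key replaces the row, which keeps keys unique.
        self._rows[(partition_key, row_key)] = entity
        return entity

    def get(self, partition_key, row_key):
        return self._rows[(partition_key, row_key)]

    def __len__(self):
        return len(self._rows)


customers = ToyTable()
customers.upsert("UK", "001", name="Joe", email="joe@example.com")
customers.upsert("UK", "002", name="Samir", phone="555-0100")       # different columns
customers.upsert("UK", "001", name="Joe", email="joe@example.org")  # replaces row 001
print(len(customers))                       # 2
print(customers.get("UK", "001")["email"])  # joe@example.org
```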
To help ensure fast access, Azure Table Storage splits a table into partitions.
Partitioning is a mechanism for grouping related rows, based on a common property or partition key.
Rows that share the same partition key are stored together.
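Grouping rows by partition key can be sketched as a dictionary of dictionaries: the partition key selects the group, and the row key selects the row within it. This is an illustration of the idea only, not how the service is implemented; the function names and sample entities are hypothetical.

```python
from collections import defaultdict


def group_by_partition(entities):
    """Group entities so that rows sharing a partition key are held together."""
    partitions = defaultdict(dict)
    for entity in entities:
        partitions[entity["PartitionKey"]][entity["RowKey"]] = entity
    return partitions


def point_lookup(partitions, partition_key, row_key):
    """Find one row by going to its partition first, then to its row key."""
    return partitions.get(partition_key, {}).get(row_key)


# Hypothetical entities, partitioned by country.
entities = [
    {"PartitionKey": "UK", "RowKey": "001", "name": "Joe"},
    {"PartitionKey": "UK", "RowKey": "002", "name": "Samir"},
    {"PartitionKey": "US", "RowKey": "001", "name": "Ana"},
]

partitions = group_by_partition(entities)
print(sorted(partitions))                             # ['UK', 'US']
print(len(partitions["UK"]))                          # 2
print(point_lookup(partitions, "US", "001")["name"])  # Ana
```

With this layout, a lookup that supplies both keys only has to search one partition rather than the whole table.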