What PartitionKey and RowKey are for in Windows Azure Table Storage

For the past few months, I’ve been coaching a “Microsoft Student Partner” (who has a great blog on Kinect for Windows by the way!) on Windows Azure. One of the questions he recently had was around PartitionKey and RowKey in Windows Azure Table Storage. What are these for? Do I have to specify them manually? Let’s explain…

Windows Azure storage partitions

All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (described in a whitepaper). While there’s much more to tell about it, the reason why it scales is its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system. Partitions are used for scale out in the system.

Imagine that there are only 3 physical machines that are used for storing data in Windows Azure storage. Based on the size and load of a partition, partitions are fanned out across these machines. Whenever a partition gets a high load or grows in size, the Windows Azure storage management can kick in and move a partition to another machine. By doing this, Windows Azure can ensure a high throughput as well as its storage guarantees. If a partition gets busy, it’s moved to a server which can support the higher load. If it gets large, it’s moved to a location where there’s enough disk space available.

Partitions are different for every storage mechanism:

- In blob storage, each blob is in a separate partition. This means that every blob can get the maximal throughput guaranteed by the system.
- In queues, every queue is a separate partition.
- In tables, it’s different: you decide how data is co-located in the system.

In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput as there are lots of entities in the same partition.

Should you set the PartitionKey to the same value for every entity stored? No. You’ll end up with scaling issues at some point.

Should you set the PartitionKey to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult. And that’s where our next concept kicks in…

RowKey in Table Storage

A RowKey in Table Storage is a very simple thing: it’s your “primary key” within a partition. PartitionKey + RowKey form the composite unique identifier for an entity. Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition. So in essence, a RowKey is just the identifier of an entity within a partition.

PartitionKey and RowKey and performance

Before building your code, it’s a good idea to think about both properties. Don’t just assign them a guid or a random string, as they do matter for performance.

The fastest way of querying? Specifying both PartitionKey and RowKey. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition.

Less fast but still fast enough will be querying by specifying PartitionKey: table storage will know which partition to query.

Less fast again: querying on RowKey only. Doing this will give table storage no pointer on which partition to search in, resulting in a query that possibly spans multiple partitions, possibly multiple storage nodes as well. Within a partition, searching on RowKey is still pretty fast as it’s a unique index.

Slow: searching on other properties (again, this spans multiple partitions, and every property value has to be inspected).

Note that Windows Azure storage may decide to group partitions in so-called "Range partitions".

In order to improve query performance, think about your PartitionKey and RowKey upfront, as they are the fast way into your datasets.

Here’s an exercise: say you want to store customers, orders and orderlines. What will you choose as the PartitionKey (PK) / RowKey (RK)? Let’s use three tables: Customer, Order and Orderline.

An ideal setup may be this one, depending on how you want to query everything:

- Customer (PK: sales region, RK: customer id): it enables fast searches on region and on customer id
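
To make the difference between querying on PartitionKey, RowKey and other properties a bit more tangible, here is a minimal sketch in Python. It is a toy in-memory model and not the Windows Azure storage API: the `ToyTable` class, its method names and the sample customers (with made-up sales regions as PartitionKey) are all assumptions for illustration. It only mimics how much data each kind of query has to touch.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table: {PartitionKey: {RowKey: entity}}."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, **properties):
        # PartitionKey + RowKey must be unique; a RowKey may repeat across partitions.
        partition = self.partitions[partition_key]
        if row_key in partition:
            raise KeyError(f"({partition_key}, {row_key}) already exists")
        partition[row_key] = {
            "PartitionKey": partition_key,
            "RowKey": row_key,
            **properties,
        }

    def get(self, partition_key, row_key):
        # Fastest: go straight to one partition, then do a key lookup inside it.
        return self.partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key):
        # Fast: only a single partition is touched.
        return list(self.partitions.get(partition_key, {}).values())

    def query_row_key(self, row_key):
        # Slower: every partition is visited, but each visit is still a key lookup.
        return [p[row_key] for p in self.partitions.values() if row_key in p]

    def query_property(self, name, value):
        # Slow: every entity in every partition has to be inspected.
        return [
            entity
            for partition in self.partitions.values()
            for entity in partition.values()
            if entity.get(name) == value
        ]


# Sample data is invented: PartitionKey = sales region, RowKey = customer id.
customers = ToyTable()
customers.insert("EMEA", "1", name="Contoso")
customers.insert("EMEA", "2", name="Fabrikam")
customers.insert("APAC", "1", name="Northwind")  # same RowKey, other partition

point_lookup = customers.get("EMEA", "2")
region_scan = customers.query_partition("EMEA")
row_key_scan = customers.query_row_key("1")
property_scan = customers.query_property("name", "Northwind")
```

In this sketch, `get` touches one partition and one key, `query_partition` touches one partition, and the last two walk over every partition. The real service adds storage nodes and network hops on top of that, so treat this purely as a way to reason about key design.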