Azure Read-Access Geo-Redundant Storage (RA-GRS)


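When you create an Azure Storage Account, the Replication setting now offers an additional option: "Read-Access Geo-Redundant". This option gives users read-only access to the copy of their data held in the remote (secondary) data center; Azure performs the replication automatically in the background. This is a step forward from Geo-Redundant, which also replicates to a remote secondary data center but does not let users access the remote copy directly. Only Azure can manage and use it.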

The figure below compares the four replication options: Locally Redundant Storage (LRS), Zone-Redundant Storage (ZRS), Geo-Redundant Storage (GRS), and Read-Access Geo-Redundant Storage (RA-GRS). For details, see Azure Storage Replication.

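A common question with RA-GRS storage accounts concerns synchronization between the primary and the secondary: how can you tell that blob data uploaded to the primary has already been replicated to the secondary? Replication between the primary and the secondary is not strictly first-in, first-replicated, so you cannot rely on the blobs themselves to determine this. Instead, Azure provides the Last Sync Time to help gauge replication progress, as described in "Windows Azure Storage Redundancy Options and Read Access Geo Redundant Storage". Note that this time is in UTC.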

" However, since transactions across Partition Keys can happen out of order, we introduce a new term called “Last Sync Time” which acts as the conservative RPO time. All primary updates preceding the Last Sync Time (defined in UTC) are guaranteed to be available for read operations at the secondary. Primary updates after this point in time may or may not be available for reads. There is a separate Last Sync Time value provided for Blobs, Tables and Queues for a storage account. The Last Sync Time is calculated by tracking the geo replicated sync time for each partition and then reporting the minimum time for blobs, tables and queues. "

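As a minimal sketch of how the Last Sync Time could be used, the snippet below parses a service-stats response and checks whether a write made at a known UTC time is guaranteed to be readable from the secondary. The XML shape (a `StorageServiceStats` root with `GeoReplication/Status` and `GeoReplication/LastSyncTime`, the latter in RFC 1123 format) is an assumption based on the storage service stats response, the sample values are invented for illustration, and the function names `parse_last_sync_time` and `guaranteed_on_secondary` are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Illustrative sample only; the element names are an assumption about the
# stats response returned by the secondary endpoint of an RA-GRS account.
SAMPLE_STATS_XML = """
<StorageServiceStats>
  <GeoReplication>
    <Status>live</Status>
    <LastSyncTime>Tue, 19 Jan 2021 22:28:43 GMT</LastSyncTime>
  </GeoReplication>
</StorageServiceStats>
"""


def parse_last_sync_time(stats_xml: str) -> Optional[datetime]:
    """Return the Last Sync Time as an aware UTC datetime, or None if absent."""
    root = ET.fromstring(stats_xml.strip())
    text = root.findtext("GeoReplication/LastSyncTime")
    if text is None or not text.strip():
        # The value can be empty, for example before replication has caught up.
        return None
    return parsedate_to_datetime(text.strip())


def guaranteed_on_secondary(write_time_utc: datetime,
                            last_sync_time: Optional[datetime]) -> bool:
    """True only if the write strictly precedes the Last Sync Time.

    Writes at or after the Last Sync Time may or may not be readable from
    the secondary, so the conservative answer for them is False.
    """
    if last_sync_time is None:
        return False
    return write_time_utc < last_sync_time


if __name__ == "__main__":
    last_sync = parse_last_sync_time(SAMPLE_STATS_XML)
    upload_time = datetime(2021, 1, 19, 22, 0, 0, tzinfo=timezone.utc)
    print(last_sync, guaranteed_on_secondary(upload_time, last_sync))
```

Both timestamps must be timezone-aware UTC values for the comparison to be meaningful. A `False` result does not mean the blob is missing on the secondary, only that its presence is not guaranteed yet.
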
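Another common question when using RA-GRS storage: how long does it take for new data on the primary to be replicated to the remote secondary, and is there an SLA? According to "Windows Azure Storage Redundancy Options and Read Access Geo Redundant Storage", the answer is that there is none: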

“Recover Point Objective (RPO): In GRS and RA-GRS the storage service asynchronously geo-replicates the data from the primary to the secondary location. If there was a major regional disaster and a failover had to be performed, then recent delta changes that had not been geo-replicated could be lost. The number of minutes of potential data lost is referred to as RPO (i.e., the point in time to which data can be recovered to). We typically have a RPO less than 15 minutes, though there is currently no SLA on how long geo-replication takes.”
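
Because there is no SLA, one option is to monitor the replication lag yourself. The sketch below treats the gap between the current time and the Last Sync Time as an upper bound on the window of writes that could be lost in a failover, and flags it when it exceeds the "typically less than 15 minutes" figure quoted above. The names `replication_lag` and `exceeds_typical_rpo` are hypothetical, and the 15-minute threshold is only the typical value from the quote, not a guarantee.

```python
from datetime import datetime, timedelta, timezone

# Typical RPO mentioned in the quoted text; this is NOT an SLA.
TYPICAL_RPO = timedelta(minutes=15)


def replication_lag(last_sync_time: datetime, now: datetime) -> timedelta:
    """Upper bound on the window of primary writes not yet guaranteed on the secondary."""
    # Clamp to zero in case of small clock differences between client and service.
    return max(now - last_sync_time, timedelta(0))


def exceeds_typical_rpo(last_sync_time: datetime, now: datetime,
                        threshold: timedelta = TYPICAL_RPO) -> bool:
    """True if the observed lag is larger than the chosen threshold."""
    return replication_lag(last_sync_time, now) > threshold


if __name__ == "__main__":
    now = datetime(2014, 5, 6, 12, 0, 0, tzinfo=timezone.utc)
    last_sync = now - timedelta(minutes=20)
    print(replication_lag(last_sync, now), exceeds_typical_rpo(last_sync, now))
```

Both arguments are expected to be timezone-aware UTC datetimes, matching the fact that the Last Sync Time is reported in UTC.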

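Scalability also needs to be considered when using RA-GRS storage, and naming conventions in particular deserve attention. The Microsoft Azure Storage Performance and Scalability Checklist and the SOSP paper "Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency" are both worth reading.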
  • “Examine the naming convention you use for accounts, containers, blobs, tables and queues, closely. Consider prefixing account names with a 3-digit hash using a hashing function that best suits your needs.
  • If you organize your data using timestamps or numerical identifiers, you have to ensure you are not using an append-only (or prepend-only) traffic patterns. These patterns are not suitable for a range-based partitioning system, and could lead to all the traffic going to a single partition and limiting the system from effectively load balancing. For instance, if you have daily operations that use a blob object with a timestamp such as yyyymmdd, then all the traffic for that daily operation is directed to a single object which is served by a single partition server. Look at whether the per blob limits and per partition limits meet your needs, and consider breaking this operation into multiple blobs if needed. Similarly, if you store time series data in your tables, all the traffic could be directed to the last part of the key namespace. If you must use timestamps or numerical IDs, prefix the id with a 3-digit hash, or in the case of timestamps prefix the seconds part of the time such as ssyyyymmdd. If listing and querying operations are routinely performed, choose a hashing function that will limit your number of queries. In other cases, a random prefix may be sufficient.”
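
The two naming suggestions above can be sketched as follows. This is only an illustration under stated assumptions: the guidance does not prescribe a hash function, so SHA-256 reduced modulo 1000 is an arbitrary choice here, and the helper names `hash_prefixed` and `seconds_prefixed_date` are hypothetical.

```python
import hashlib
from datetime import datetime


def hash_prefixed(key: str) -> str:
    """Prefix a key with a deterministic 3-digit hash so keys spread across the range.

    SHA-256 modulo 1000 is an assumption for illustration; any stable hash
    that suits your listing and querying needs would do.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    prefix = int(digest, 16) % 1000
    return f"{prefix:03d}{key}"


def seconds_prefixed_date(ts: datetime) -> str:
    """Format a timestamp as ssyyyymmdd, with the seconds part in front."""
    return ts.strftime("%S%Y%m%d")


if __name__ == "__main__":
    # Sequential IDs no longer sort next to each other once prefixed.
    for order_id in ("20140506-0001", "20140506-0002", "20140506-0003"):
        print(hash_prefixed(order_id))
    print(seconds_prefixed_date(datetime(2014, 5, 6, 12, 30, 45)))
```

The same key always maps to the same prefix, so a reader can recompute the full name from the original identifier. The trade-off is that range listings by date or ID now require one query per prefix, which is why the quoted guidance suggests choosing the hash with your query patterns in mind.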