Choose the appropriate virtual machine size

May include but is not limited to: local size storage, memory, raw processing power, bandwidth

Windows Azure offers several different virtual machine sizes from small 1 GHz machines to multi core power houses. Since you choose, it’s important to be able to make an educated decision. Currently, the following machines are available (the source of the data is this MSDN article):

Name CPU (core/GHz) Memory Disk space Bandwidth
ExtraSmall Shared 1GHz 768 MB RAM 20 GB 5 Mbps
Small 1 / 1.6 GHz 1.75 GB RAM 225 GB 100 Mbps
Medium 2 / 1.6 GHz 3.5 GB RAM 490 GB 200 Mbps
Large 4 / 1.6 GHz 7 GB RAM 1,000 GB 400 Mbps
ExtraLarge 8 / 1.6 GHz 14 GB RAM 2,040 GB 800 Mbps

There’s not much I can say here, I think the numbers speak for themselves. Just memorize that there are four real VM sizes (ExtraSmall is never meant to be used in a production environment), and they start from 1 core, 1.75 GB RAM, 100 Mbps, and each VM size doubles the resources of the previous one.

Design a reliable data access layer to access SQL Azure

May include but is not limited to: define client data access standards, connection timeout scenarios

There’s a nice article on TechNet on this topic and the next one, Design an efficient strategy to avoid data access throttling. I won’t publish another post for it, because there’s such a huge overlap between these subjects. The linked article above explains both of them pretty well.

As I have stated numerous times in these few posts (I think I’m constantly repeating myself) SQL Azure has the unique feature of killing your connections if you don’t play by the rules and try to monopolize all resources of the server where your database server resides. This feature is called throttling, and you should be aware of it and code your data access logic accordingly. This means that you should build in retry logic (the most preferred way for this is to use extension methods on the basic ADO.NET classes, such as SqlConnection and SqlCommand, according to the online examples floating around). Typically (a Microsoft recommendation) is to wait ten seconds and try again. Of course you can increase this time limit if it needed.

Trying to force something that fails constantly isn’t such a bright idea. So on the second or third approach, you could as well check out what the real problem is, and change your workload accordingly – if all else fails, then of course you could partition your database horizontally, but that’s a whole different topic.

Microsoft says you should do these three things in your data access layer: execute transactions in a continuous loop, catch connection termination errors and wait a little to reconnect. Now let’s see some possible reasons of database throttling. I remind you again to check out the TechNet article that is the basis of this post for a complete reference. So the following conditions result in throttling:

  • Consuming more than a million locks.
  • Uncommitted transactions.
  • Locking a system resource for more than 20 seconds.
  • A single transaction’s log file size exceeds 1 GB.
  • Using more than 5 GB tempdb space.
  • Consuming more than 16 MB of memory for more than 20 seconds when memory is limited.
  • Exceeding maximum database size.
  • Idle connections longer than 30 minutes.
  • A transaction running for more than 24 hours.
  • DoS attack (throttles connections from a particular (set of) IP address(es).
  • Network errors.
  • Failover errors.

These are the main reasons of connection throttling. Of course there’s a cure, and it comes in the form of some general best practices recommended by Microsoft. It makes such a perfect set of exam questions that I won’t leave out anything. So the best practices to avoid throttling:

  • Minimize network latency (choose the closest data center).
  • Reduce network usage: cache data and minimize round-trips to the server.
  • Keep your connections open as short as possible.
  • Set up short timeout durations in the connection string.
  • Use connection pooling.
  • Wrap all database operations in transactions with TRY CATCH blocks.
  • Fine tune your T-SQL.
  • Keep (or push) business logic in SQL Azure.
  • Use stored procedures and batching.

Optimize a data access strategy

May include but is not limited to: batch operations and performance techniques, data latency due to location, saving bandwidth cost

These topics were mentioned in some of my posts previously published, but a little review and repeat couldn’t hurt. This topic is about improving performance – both “real” performance, the raw speed of retrieving data, and both perceived performance, such as not blocking your UI thread while you’re waiting for data to be received.

When dealing with the Azure environment you’d like to be cost effective (you should be cost effective in every case, but cloud computing is a bit special, since you literally have to pay for your laziness). This means that you should design your data access strategy in a chunky manner. Roundtrips to the database over encrypted connection tend to be slow, not to mention the possibility of failure. Because of this, you’d like to query bigger chunks of data, and let the SQL Azure engine process that for you to save computing costs, too. You’d better use views and stored procedures, and implement your business logic in them, very close to the data (an arguable design, I admit). Also, you’d like to submit your changes in bulks – a good ORM such as EF or Hibernate can do this for you.

Another great way of reducing bandwidth usage is to load the data that is absolute necessary, and not anything else. The concept of lazy loading is very useful here – and mature ORM tools provide this for you nowadays. Note that lazy loading and chunky data retrieval are contradictory at best, so you have to find the right balance between them.

You can do a lot by caching your results. If you deploy an ASP.NET application to the cloud, you get a great and easy-to-use caching-feature.

The concept of sharding (or horizontal partitioning, or federating…) can help a lot to scale an application horizontally. The idea is relatively simple: you have multiple databases with the exact same schema, and store data based on some criteria (geographic location, customer ID, etc.) in them. The benefits are smaller indexes and faster query results; the drawback is complication in data access code.

The techniques above are real performance tuners, but you can easily boost the perceived performance of your applications by not blocking the main (UI) thread. Make database calls asynchronously, show nice animations while loading, and your app will be judged faster.

Plan a cost-effective data architecture based on business requirements

May include but is not limited to: code migration cost, data migration cost, bandwidth cost

I’m a bit unsure of this section because prices change very often, and there’s little I can say here; I’ve developed solutions, but don’t pay the bill. So I guess this one won’t become the post of the year, but anyways, let’s get this over with.

When working with data in Azure (this holds for SQL Azure and the other data storage mechanisms discussed earlier) you pay for two things: storage and transfer. The storage of your data is relatively cheap, in the time I write this you pay $ 9.99 / month for a 1 GB SQL Azure database. There are various plans depending on what feature you’d like to use the most, and here’s a link for the Azure pricing calculator. Data transfer is relatively cheap, too. You have to pay $1.2 / 10 GB for outbound data in North America and Europe (Zone 1), and a little more, $1.9 / 10 GB for the rest of the world. Inbound data is free.

Storing blob data in Azure costs $7.01 / 50 GB nowadays for a month. But these prices can change every day, so let’s try to grab the concept behind them.

Let’s review a typical scenario: an SQL Azure database is used by an on-premise application. In this case, your primary concern is bandwidth usage. You wouldn’t like to have chatty database connections with Azure, you’d process data in SQL Azure, and send it back to the client in bigger chunks (possibly using views and other mechanisms to achieve this); remember: you don’t have to pay for CPU, memory and anything else when using SQL Azure. Of course you have to play nicely with others; otherwise your database connection will be terminated.

On the other hand, if you run your application in Azure along with your database, then you don’t have to be very concerned with bandwidth usage – traffic inside Azure is free. Because of the increased possibility of your connection being terminated, you shouldn’t be too chatty in this case either.

I can’t say much on code migration costs, but running code in Windows Azure is much more expensive than just storing data. Azure virtual machines come in five flavors, from $30 / month / instance to $720 / month / instance. You pay for compute hours – but this doesn’t mean that you only have to pay when your application is actually doing something. You get charged for the time of your code being deployed in Azure, on an hourly basis, but who removes their apps when the point is to ensure great availability and scalability?

Plan for media storage and accessibility

May include but is not limited to: media accessibility, global distribution with Content Delivery Network (CDN), blob storage

Delivering static content with Windows Azure is fairly easy. We’ve discussed blobs in the first Azure post recently. For a quick refresher blobs are (large) binary objects which come in two flavors: block blobs and page blobs. The former is of interest when dealing with media, since it supports streaming.

To create a blob, you need roughly the following code:

  1. CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
  2.       CloudBlobClient client = account.CreateCloudBlobClient();
  3.       //Notice the use of lowercase container reference
  4.       CloudBlobContainer container = client.GetContainerReference("mycontainer");
  5.       container.CreateIfNotExist();
  6.       //This is mandatory if you'd like to use your blob in CDN
  7.       container.SetPermissions(new BlobContainerPermissions() { PublicAccess = BlobContainerPublicAccessType.Blob });
  8.       
  9.       CloudBlob myBlob = container.GetBlobReference("myblob");
  10.       myBlob.UploadText("Hello Blob!");
* This source code was highlighted with Source Code Highlighter.

 

Now a little about the Content Delivery Network. This technology lets you provide content worldwide with great performance (I actually haven’t tested this). The idea is that content is cached and served by the most optimal server closest to your visitor.

CDN is optimized for static content; you can get your hands burned (financially) if you want to use dynamic content with it. To use it, you have to enable public access to your blobs and then, you have to enable CDN of course. Note that CDN isn’t used automatically. If you enable it, but use a Windows Azure Blob service URL (typically in the form of yourname.blob.core.windows.net) then nothing happens, the content will be served from the blob service. You have to specify the CDN URL, which is something like this: guid/vo.msecnd.net. Of course you can use a custom domain (exactly one per storage account) to access your CDN content.

Setting up and working with CDN is fairly easy, if you have access to the Windows Azure management portal you can surely figure it out yourself. More interesting are the facts from the end of Microsoft’s CDN article, because these little side notes have the nasty habit of turning into exam questions (sometimes keeping their wording, too). So blobs less than 10 GB tends to perform the best using CDN. Note that a custom domain can be used for one storage account at a time, so you cannot use the one domain for two (or more) endpoints, but you can use overlapping domain names (such as http://mydomain.com and http://mysubdomain.mydomain.com). Also note the absence of HTTPS; you can only use HTTP for CDN.

 

Design a database migration plan from SQL Server to SQL Azure

May include but is not limited to: differences between SQL Azure and SQL Server, concessions for unsupported features, schema, data, reporting and analytic tooling

Differences between SQL Azure and SQL Server

As I stated in the previous blog post, there are a lot of differences between SQL Azure and SQL Server. Let’s start at the beginning. When you create an SQL Azure database, you’ll have to provide the edition of SQL Azure and the size of database you wish to use. Here’s the list of database sizes available:

  • Web edition
    • 1 GB
    • 5 GB
  • Business edition
    • 10 GB
    • 20 GB
    • 30 GB
    • 40 GB
    • 50 GB
    • 100 GB
    • 150 GB

Note that the pricing varies – you pay $9.99 per database per month up to 1 GB, or $49.95 per database per month up to 5 GB database size in the Web Edition. The business edition costs $99.99 per 10 GB of database per month, and maxes out in $499.95 per database.

Continue reading “Design a database migration plan from SQL Server to SQL Azure”

Choose the appropriate data storage model based on technical requirements

May include but is not limited to: SQL Azure, Cloud drive, performance, scalability, accessibility from other applications and platforms, Windows Azure storage services: blobs, tables and queues

SQL Azure

SQL Azure in itself is big enough to fill a book (In fact, it does fill a book. Most if this section is based on the book Pro SQL Azure by Scott Klein and Herve Roggero) so this section is just a quick introduction. SQL Azure is a transactional database based on SQL Server 2008. It supports the T-SQL language and a limited set of functions from SQL Server. It also supports ADO.NET and ODBC data access. You can even use your favorite SSMS to connect and manage SQL Azure databases, but there’s an online solution, too.

You should be aware that SQL Azure runs in a multitenant environment. This means that you have restrictions on query time, CPU, etc. So if you have a long running query, massive CPU usage, or something similar that might affect another users’ databases on the server, your database connection can be (and will be) throttled (terminated).

Despite this fact you should be aware that using SQL Azure you pay for storage (GBs of database size)*, so you should perform some CPU intensive tasks within SQL Azure instead of your application. The benefit is that CPU usage in SQL Azure is free, while you have to pay for it on an hourly base in your app hosted in the cloud.

Scalability in SQL Azure is revolving around sharding. The design guidelines are explained here. Sharding is a kind of horizontal partitioning; you store rows separately instead of columns. I’ll explain the concept in another blog post later.

Last but not least, have a look at the (most important) limitations of SQL Azure:

  • No support for backing      up/restoring databases (there are workarounds, of course)
  • No USE statement, and you      cannot use database names (this ends cross-database queries)
  • No Windows Authentication
  • Setting server level collation      is disabled
  • No heap tables, clustered      indexes are a must
  • Maximal database size is 150 GB
  • No SQL Server Agent
  • Idle connection are terminated      (after 30 minutes)

For the full list of limitations, see http://msdn.microsoft.com/en-us/library/ee336245.aspx.

Windows Azure Storage Services

Blobs

Blobs – as their name shows are large binary objects stored in the cloud. At the time of this writing, their size maxes out at 200 GB in the case of a block blob and 1 TB when using page blobs. Usually you would store images or video/audio in blobs. Video usage is especially useful, because block blobs support streaming.

I introduced two different kinds of blobs, block and page blobs. Let’s elaborate further on them. If you need further info, refer to MSDN.

Block blobs

A block blob is built from blocks, which can have the maximum size of 4 MBs (the largest block supported in one operation). You are free to modify, delete and insert block of a block blob, commit or discard your changes as needed. The maximum size of a block blob is 200 GB, and it can contain a total of 50.000 blocks.

Page blobs

Page blobs are optimized for random access. They can be 1 TB large, and they are built from 512 byte pages. You cannot “version” your pages, so updates of one or more pages are immediately in effect.

A special subtype of page blobs is Azure Drive (or Cloud drive). This is a VHD mounted as a local drive letter. It was mostly used before the other APIs were available.

Queue storage

Windows Azure provides a queue-based messaging service that you can use for communication between Azure roles (more on them later). Your messages can be 64 KB in size, and generally they are FIFO, but no guarantee exists that they will be treated in this fashion. You can of course process messages bigger than 64 KBs, by using blobs.

Tables

Tables allow you to store entities of 1 MB up to 100 TB. An entity can have 255 “columns” with different data types. Unlike SQL Azure, there is no relational support, so you can’t have foreign keys, joins, etc. The best usage of these tables is for example a leader board for a game. Small in size, not complex, no relationships required.

Table entities have three reserved properties which define a key for the entity: a partition key for the table itself, a row key for the entity within the table, and a timestamp.

Tables come with a fairly limited type set, these are byte[], bool, DateTime, double, Guid, int, long and string. For more info on tables, refer to MSDN.

There is more info about the various storage options in Windows Azure in this Technet article.

 

*The full truth is that you pay for two things: storage and bandwidth, however, bandwidth within an SQL Azure database and an application running inside Windows Azure is free.