Hotmail receives billions of email messages each day, and these have to be stored safely and made readily available. Hotmail’s cloud-based storage system supports over one billion mailboxes and hundreds of petabytes of data (one petabyte is a million gigabytes). Handling hundreds of thousands of simultaneous transactions efficiently at this scale is an engineering challenge. The storage system is built using Microsoft technology, including Windows Server and Microsoft SQL Server.
Microsoft has been working on a major upgrade to this storage system. Since the start of this year, a new system based on technologies developed at Hotmail has been running on a pilot cluster using the personal accounts of Microsoft employees. After rigorous testing, the new system has been certified, and it provides better reliability at a significantly lower price.
Let us look at some of the new technologies used by Hotmail.
Replacing RAID with JBOD:
“RAID (Redundant Array of Inexpensive Disks) is a technology that allows several hard drives to be attached to a single controller board, which makes them look like a single larger and much more reliable hard drive (sometimes called a “Logical Unit”) to the software running the storage system.”
Hotmail has been using RAID for a long time. Emails were kept on multiple RAID groups so that messages could be restored even when an entire RAID group failed. When Hotmail evaluated RAID for drives larger than 1 terabyte, it found that RAID was not worth the money from a reliability point of view. RAID deals easily with problems affecting a single drive, but not with problems affecting the whole machine or the RAID controller. Hotmail found that keeping copies on a different machine that does not share the controller was both more reliable and less expensive than a RAID configuration.
So Hotmail moved to JBOD (Just a Bunch Of Disks), where copies of data reside on independent hard drives, controllers, and machines. This takes the hard drive controller almost completely out of the way and hands control to software. This JBOD system software was developed by Hotmail. It constantly monitors for failures and raises an alert when it finds one, which triggers a repair process. The repair can range from rebooting a machine or restarting a process to fixing data corruption, and can involve human intervention if required. The main advantage of doing this in software is that it can keep track of the good copies of each message and prioritize repair when it finds too few copies. With this software, replication itself was also simplified:
“The storage system consists of a set of machines, each of which has its copy of an email message and a journal recording messages that have arrived, organized by arrival date. The machines talk to each other from time to time, compare their journals, and copy any messages that they realize haven’t been copied to all machines.”
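The journal comparison described in the quote can be illustrated with a small sketch. This is not Hotmail's implementation, only a toy model under stated assumptions: the names StorageMachine, receive, and sync are hypothetical, journals are plain in-memory lists of (arrival date, message id) pairs, and the dates and message ids are made up.

```python
from dataclasses import dataclass, field


@dataclass
class StorageMachine:
    """One machine holding message copies plus an arrival journal (hypothetical model)."""
    name: str
    messages: dict = field(default_factory=dict)  # message_id -> message body
    journal: list = field(default_factory=list)   # (arrival_date, message_id) entries

    def receive(self, arrival_date, message_id, body):
        """Store a message and record it in the journal, ignoring duplicates."""
        if message_id not in self.messages:
            self.messages[message_id] = body
            self.journal.append((arrival_date, message_id))
            self.journal.sort()  # keep the journal organized by arrival date


def sync(a, b):
    """Compare two journals and copy over whatever each machine is missing."""
    for src, dst in ((a, b), (b, a)):
        missing = set(src.journal) - set(dst.journal)
        for arrival_date, message_id in sorted(missing):
            dst.receive(arrival_date, message_id, src.messages[message_id])


# Made-up example: two messages arrive on different machines.
m1, m2, m3 = (StorageMachine(n) for n in ("m1", "m2", "m3"))
m1.receive("2011-01-01", "msg-1", "hello")
m2.receive("2011-01-02", "msg-2", "world")

# The machines "talk to each other from time to time".
for a, b in ((m1, m2), (m2, m3), (m1, m3)):
    sync(a, b)

print(m3.journal)  # [('2011-01-01', 'msg-1'), ('2011-01-02', 'msg-2')]
```

After a few pairwise exchanges every machine holds every message, without any single machine having to coordinate the copying. A real system would also need to handle deletions, corruption checks, and machines that are offline, none of which this sketch attempts.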
Using Solid State Drives (SSDs) instead of hard drives:
SSDs are much faster than hard drives. Hard drives, though bigger and cheaper, are slow at handling a high rate of requests.
“A normal hard drive can perform a little more than one hundred read/write operations per second, whereas some of the fastest SSDs can do over one hundred thousand operations per second.”
This speed comes at a price, though: SSDs are much more expensive per gigabyte than hard drives.
SSDs handle an ever-changing load efficiently. Hotmail not only stores email messages but also keeps track of constantly changing metadata, such as the list of messages in the inbox, the read/unread status of messages, and conversation threading. This metadata occupies only a small fraction of the storage space, yet it puts the heaviest load on hard drives because it changes so often. So using SSDs for metadata and hard disks for messages makes a better and more efficient combination.
Apart from these improvements, Hotmail will reveal many more in future posts. The rollout of the new storage system has already begun, and new clusters will be based on JBOD. Already 30 million users are on JBOD, and another 100 million will be moved over time.
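To put the quoted figures in perspective, here is a rough back-of-envelope calculation. The per-drive rates are rounded from the quote above; the target load of 500,000 operations per second is a hypothetical number chosen only for illustration, and the function name drives_needed is made up. The sketch ignores capacity, redundancy, and cost.

```python
import math

# Rounded from the quoted figures: "a little more than one hundred" operations
# per second for a hard drive, "over one hundred thousand" for the fastest SSDs.
HDD_OPS_PER_SEC = 100
SSD_OPS_PER_SEC = 100_000


def drives_needed(target_ops_per_sec, ops_per_drive):
    """Minimum number of drives needed to sustain the target request rate."""
    return math.ceil(target_ops_per_sec / ops_per_drive)


target = 500_000  # hypothetical load, not a figure from Hotmail

print(drives_needed(target, HDD_OPS_PER_SEC))  # 5000
print(drives_needed(target, SSD_OPS_PER_SEC))  # 5
```

Under these assumptions, a request-heavy workload needs about a thousand times fewer SSDs than hard drives, which is why the higher price per gigabyte can still be worth paying for a small amount of frequently accessed data.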
If you have noticed, Hotmail has become really fast. If you haven’t – try it now.