Logging Formats and Standards
september 2010 by hanicker
I have discussed the topic of logging standards multiple times on this blog. Some recent developments in the logging space urged me to give an update and provide my opinion:
Yet another vendor just released a “standard” log format (note the quotes around standard). It’s called UCF, the Universal Collection Framework™. This is how the vendor describes it:
UCF is the first WAN-aware, store-and-forward, encrypted, compressed IT data transport. It allows customers to gather IT data, increase resilience, reduce network chatter and encrypt from almost any device, anywhere, quickly and easily. UCF leverages a new transport and store protocol that LogLogic intends to open source in the near future.
Sounds a whole lot like syslog. (syslog-ng and rsyslog seem to support exactly this!) Okay, let’s just look at this description: WAN-aware? What the heck is that supposed to mean? You mean it won’t work well on a LAN? Does that mean it knows the Internets? That’s just a strange description to start with. Oh, and it’s the first property mentioned! The rest of the description sounds like a transport protocol. Interesting. Why not stick with syslog, which is well known, has proven to work, and already has integration libraries built? I never understood why vendors implement their own transport protocols. They are hard (very hard) to implement and even harder for producers and consumers to adopt. Oh well.
When people talk about UCF, they keep bringing up ArcSight’s CEF. Well, I am greatly responsible for that specification. But guess what? It’s not a transport protocol! It’s a syntax definition. It tells a log producer how to format their log file, not how to transport it, because there is always syslog, which a lot of machines already have installed and which is easy to use. (And in newer versions you get encryption, caching, etc.)
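To make the syntax-versus-transport distinction concrete, here is a minimal sketch of a CEF formatter. It only builds the pipe-delimited header and key=value extension of a CEF:0 line; the function names and every event value are made up for illustration, and nothing here touches the network.

```python
def _escape_header(value):
    # In CEF header fields, backslashes and pipes must be backslash-escaped.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    # In extension values, backslashes and equals signs must be escaped.
    return str(value).replace("\\", "\\\\").replace("=", "\\=")


def format_cef(vendor, product, version, signature_id, name, severity, extensions=None):
    """Build one CEF:0 line; it says nothing about how the line is shipped."""
    header = "|".join(
        _escape_header(field)
        for field in (vendor, product, version, signature_id, name, severity)
    )
    ext = " ".join(
        f"{key}={_escape_extension(val)}" for key, val in (extensions or {}).items()
    )
    return f"CEF:0|{header}|{ext}"


# Hypothetical event: every value below is made up for illustration.
line = format_cef(
    "ExampleVendor", "ExampleIDS", "1.0", "100", "Port scan detected", 5,
    {"src": "10.0.0.1", "dst": "10.0.0.2", "spt": 1232},
)
print(line)
```

The resulting string can be handed to whatever syslog sender is already on the machine (in Python, for instance, a logger configured with `logging.handlers.SysLogHandler`), which is the point: the format and the transport stay separate.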
Now, my last point about standards. Why do vendors keep trying to come up with standards by themselves? It just doesn’t make any sense. Who is going to adopt it? At ArcSight, about 4 years ago, we came up with CEF because CEE didn’t move fast enough and we wanted something that our partners could easily use. An analyst wrote that ArcSight is planning to take CEF to the IETF. I hope they are not going to do that. I don’t have any control over that anymore, but that would be stupid. We would rather push CEE through the IETF. If you have a chance, compare the CEE syntax proposal with CEF. Notice something? Yes. It’s very similar. Again, I might have had something to do with that. Anyways. Vendors should not define logging standards!
On a good note: CEE is moving forward and just released the architecture overview for public commentary. Check it out!
Producing and consuming OData feeds: An end-to-end example
february 2010 by hanicker
Having waxed theoretical about the Open Data Protocol (OData), it’s time to make things more concrete. I’ve been adding instrumentation to monitor the health and performance of my elmcity service. Now I’m using OData to feed the telemetry into Excel. It makes a nice end-to-end example, so let’s unpack it.
Data capture
The web and worker roles in my Azure service take periodic snapshots of a set of Windows performance counters, and store those to an Azure table. Although I could be using the recently-released Azure diagnostics API, I’d already come up with my own approach. I keep a list of the counters I want to measure in another Azure table, shown here in Cerebrata’s viewer/editor:
When you query an Azure table like this one, the records come back packaged as content elements within Atom entries:
<entry m:etag="W/&quot;datetime'2010-02-09T00%3A00%3A53.7164253Z'&quot;">
<id>http://elmcity.table.core.windows.net/monitor(PartitionKey='ProcessMonitor',
RowKey='634012704503641218')</id>
<content type="application/xml">
<m:properties>
<d:PartitionKey>ProcessMonitor</d:PartitionKey>
<d:RowKey>634012704503641218</d:RowKey>
<d:HostName>RD00155D317B3F</d:HostName>
<d:ProcName>WaWorkerHost</d:ProcName>
<d:mem_available_mbytes m:type="Edm.Double">1320</d:mem_available_mbytes>
...snip...
<d:tcp_connections_established m:type="Edm.Double">24</d:tcp_connections_established>
</m:properties>
</content>
</entry>
This isn’t immediately obvious if you use the storage client library that comes with the Azure SDK, which wraps an ADO.NET Data Services abstraction around the Azure table service. But if you peek under the covers using a tool like Eric Lawrence’s astonishingly capable Fiddler, you’ll see nothing but Atom entries. In order to get direct access to them, I don’t actually use the storage client library in the SDK, but instead use an alternate interface that exposes the underlying HTTP/REST machinery.
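As a rough illustration of working with the raw entries, the sketch below flattens the properties of one Atom entry into a dictionary. The namespace declarations are not shown in the snippet above; the ones used here are the standard ADO.NET Data Services namespaces and are filled in as an assumption, and `entry_to_dict` is a hypothetical helper, not part of any SDK.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
}

# Trimmed copy of the entry above, with namespace declarations added (assumed).
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
       xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <content type="application/xml">
    <m:properties>
      <d:PartitionKey>ProcessMonitor</d:PartitionKey>
      <d:HostName>RD00155D317B3F</d:HostName>
      <d:ProcName>WaWorkerHost</d:ProcName>
      <d:mem_available_mbytes m:type="Edm.Double">1320</d:mem_available_mbytes>
      <d:tcp_connections_established m:type="Edm.Double">24</d:tcp_connections_established>
    </m:properties>
  </content>
</entry>
"""


def entry_to_dict(entry_xml):
    """Flatten the m:properties of one Atom entry into a plain dict."""
    entry = ET.fromstring(entry_xml)
    props = entry.find("atom:content/m:properties", NS)
    type_attr = "{%s}type" % NS["m"]
    record = {}
    for prop in props:
        name = prop.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        text = prop.text or ""
        # Only Edm.Double is converted here; other types stay as strings.
        record[name] = float(text) if prop.get(type_attr) == "Edm.Double" else text
    return record


record = entry_to_dict(SAMPLE_ENTRY)
print(record["ProcName"], record["mem_available_mbytes"])
```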
Exposing data services
If the Azure table service did not require special authentication, it would itself be an OData service that you could point any OData-aware client at. To fetch recent entries from my table of snapshots, for example, you could use this URL in any browser:
GET http://elmcity.table.core.windows.net/monitor?$filter=Timestamp+gt+datetime'2010-02-08'
(A table named ‘monitor’ is where the telemetry data are stored.)
The table service does require authentication, though, so in order to export data feeds I’m creating wrappers around selected queries. Until recently, I’ve always packaged the query response as a .NET List of Dictionaries. A record in an Azure table maps nicely to a Dictionary. Both are flexible bags of name/value pairs, and a Dictionary is easily consumed from both C# and IronPython.
To enable OData services I just added an alternate method that returns the raw response from an Azure table query. Then I extended the public namespace of my service, adding a /odata mapping that accepts URL parameters for the name of a table, and for the text of a query. I’m doing this in ASP.NET MVC, but there’s nothing special about the technique. If you were working in, say, Rails or Django, it would be just the same. You’d map out a piece of public namespace, and wire it to a parameterized service that returns Atom feeds.
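The wrapper idea can be sketched in a few lines. This is not the ASP.NET MVC code from the service; it is a hypothetical Python helper (`build_table_query_url`) showing how the table name, an hours_ago window and an optional query might be turned into a table-service URL with an OData `$filter`. A real request would still need the table service’s authentication headers, which are out of scope here.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

TABLE_HOST = "http://elmcity.table.core.windows.net"  # host taken from the post


def build_table_query_url(table, hours_ago=None, query=None, now=None):
    """Turn wrapper parameters into a table-service URL with an OData $filter."""
    clauses = []
    if hours_ago is not None:
        now = now or datetime.now(timezone.utc)
        since = now - timedelta(hours=hours_ago)
        clauses.append("Timestamp gt datetime'%s'" % since.strftime("%Y-%m-%dT%H:%M:%SZ"))
    if query:
        clauses.append("(%s)" % query)
    url = "%s/%s" % (TABLE_HOST, quote(table))
    if clauses:
        # Percent-encode the filter but keep quotes and parentheses readable.
        url += "?$filter=" + quote(" and ".join(clauses), safe="'()")
    return url


# Fixed "now" so the example is reproducible.
fixed_now = datetime(2010, 2, 10, tzinfo=timezone.utc)
print(build_table_query_url("monitor", hours_ago=48,
                            query="ProcName eq 'WaWebHost'", now=fixed_now))
```

Passing the caller’s query text straight through keeps the sketch short; a public endpoint would want to validate it first.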
Discovering data services
An OData-aware client can use an Atom service document to find out what feeds are available from a provider. The one I’m using looks kind of like this:
<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<service xmlns:atom='http://www.w3.org/2005/Atom'
xmlns:app='http://www.w3.org/2007/app' xmlns='http://www.w3.org/2007/app'>
<workspace>
<atom:title>elmcity odata feeds</atom:title>
<collection href='http://elmcity.cloudapp.net/odata?table=monitor&amp;hours_ago=48'>
<atom:title>recent monitor data (web and worker roles)</atom:title>
</collection>
<collection href="http://elmcity.cloudapp.net/odata?table=monitor&amp;hours_ago=48&amp;query=ProcName eq 'WaWebHost'">
<atom:title>recent monitor data (web roles)</atom:title>
</collection>
<collection href="http://elmcity.cloudapp.net/odata?table=monitor&amp;hours_ago=48&amp;query=ProcName eq 'WaWorkerHost'">
<atom:title>recent monitor data (worker roles)</atom:title>
</collection>
<collection href="http://elmcity.cloudapp.net/odata?table=counters">
<atom:title>performance counters</atom:title>
</collection>
</workspace>
</service>
PowerPivot is an Excel add-in that knows about this stuff. Here’s a picture of PowerPivot discovering those feeds:
It’s straightforward for any application or service, written in any language, running in any environment, to enable this kind of discovery.
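To back up the "any language" claim, here is a small sketch of the client side of discovery using only the Python standard library. The service document is a trimmed copy of the one above; `list_feeds` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"app": "http://www.w3.org/2007/app", "atom": "http://www.w3.org/2005/Atom"}

# Trimmed copy of the service document shown above.
SERVICE_DOC = """\
<service xmlns:atom='http://www.w3.org/2005/Atom' xmlns='http://www.w3.org/2007/app'>
  <workspace>
    <atom:title>elmcity odata feeds</atom:title>
    <collection href='http://elmcity.cloudapp.net/odata?table=monitor&amp;hours_ago=48'>
      <atom:title>recent monitor data (web and worker roles)</atom:title>
    </collection>
    <collection href='http://elmcity.cloudapp.net/odata?table=counters'>
      <atom:title>performance counters</atom:title>
    </collection>
  </workspace>
</service>
"""


def list_feeds(service_xml):
    """Return (title, href) pairs for every collection in the service document."""
    root = ET.fromstring(service_xml)
    return [
        (c.findtext("atom:title", default="", namespaces=NS), c.get("href"))
        for c in root.findall("app:workspace/app:collection", NS)
    ]


for title, href in list_feeds(SERVICE_DOC):
    print(title, "->", href)
```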
Using data services
In my case, PowerPivot — which is an add-in that brings some nice business intelligence capability to Excel — makes a good consumer of my data services. Here are some charts that slice my service’s request execution times in a couple of different ways:
Again, it’s straightforward for any application or service, written in any language, running in any environment, to do this kind of thing. It’s all just Atom feeds with data-describing payloads. There’s nothing special about it, which is the whole point. If things pan out as I hope, we’ll have a cornucopia of OData feeds — from our banks, from our Internet service providers, from our governments, and from every other source that currently publishes data on paper, or in less useful electronic formats like PDF and HTML. And we’ll have a variety of OData clients, on mobile devices and on our desktops and in the cloud, that enable us to work with those data feeds.
Breaking Password Based Encryption with Azure
january 2010 by hanicker
During a recent security review, we came across a .NET application that was encrypting query string data to thwart parameter based attacks. We had not been given access to the source code, but concluded this since each .aspx page was being passed a single Base64 encoded parameter which, when decoded, produced binary data with varying 16 byte blocks (likely AES considering it is the algorithm of choice for many .NET developers).
The Code
After doing some research (aka plugging the words “.NET”, “Query String” and “Encryption” into Google), we identified several references to a piece of code that had been written and published a few years back for encrypting query strings in .NET. The code we found even used the same parameter name as our application did to pass the encrypted query string data to each page, so we were fairly confident it was the code they were using.
Having written SPF, I am always interested to see how other applications implement cryptography since I know it is not always easy to do properly. In addition to the common problem of re-using the same IV for every encrypted query string, we noticed that the key was entirely derived from a static password embedded in the code (it was being derived using the .NET Framework PasswordDeriveBytes class directly from the literal string value “key”).
For reference, I’ve included the Decrypt method below:
private const string ENCRYPTION_KEY = "key";

public static string Decrypt(string inputText)
{
    RijndaelManaged rijndaelCipher = new RijndaelManaged();
    byte[] encryptedData = Convert.FromBase64String(inputText);
    byte[] salt = Encoding.ASCII.GetBytes(ENCRYPTION_KEY.Length.ToString());
    PasswordDeriveBytes secretKey = new PasswordDeriveBytes(ENCRYPTION_KEY, salt);
    using (ICryptoTransform decryptor = rijndaelCipher.CreateDecryptor(secretKey.GetBytes(32),
                                                                       secretKey.GetBytes(16)))
    {
        using (MemoryStream memoryStream = new MemoryStream(encryptedData))
        {
            using (CryptoStream cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read))
            {
                byte[] plainText = new byte[encryptedData.Length];
                int decryptedCount = cryptoStream.Read(plainText, 0, plainText.Length);
                return Encoding.Unicode.GetString(plainText, 0, decryptedCount);
            }
        }
    }
}
Password based encryption schemes like this are common in many applications, since the key can easily be represented by a word or passphrase. The nice thing from an attacker’s perspective is that regardless of how large the real encryption key is, the feasibility of a brute force attack is largely dependent on the length and complexity of the password used to derive the key and not the key itself. So for this example, even though they are using 256-Bit AES encryption (generally considered secure), the password used to generate the key is easily brute forced since it is only 3 characters.
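A quick sketch of why the password, not the key size, sets the bound. The derivation below uses PBKDF2-HMAC-SHA1 from Python’s hashlib as a stand-in; it is not the algorithm PasswordDeriveBytes implements, and the iteration count is an assumption. What carries over is the shape of the problem: the key is a deterministic function of the password, so there can never be more reachable keys than candidate passwords.

```python
import hashlib
import itertools
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols


def derive_key(password, key_bytes=32):
    # Mirrors the post's pattern of salting with the password length.
    # PBKDF2-HMAC-SHA1 and 100 iterations are stand-ins, not PasswordDeriveBytes.
    salt = str(len(password)).encode("ascii")
    return hashlib.pbkdf2_hmac("sha1", password.encode("ascii"), salt, 100, key_bytes)


# Every 2-character password maps to exactly one 256-bit key.
keys = {derive_key("".join(p)) for p in itertools.product(ALPHABET, repeat=2)}
print(len(keys), "reachable keys out of 2**256")
print(len(ALPHABET) ** 3, "candidates for a 3-character password")
```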
Given the code we found, the first and obvious test was to try decrypting our query string values with the same “key” string. Sadly that didn’t work. After trying several educated guesses at what we thought could be the password, I decided to clone the decryption logic into a .NET console utility and run a recursive alphanumeric brute force against the password. The approach was rather simple:
Take one of our encrypted samples
Loop through every alphanumeric character combination
Using the identical logic shown above, derive the key and decrypt
The caveat here is that we really don’t know what value to expect when it decrypts, but chances are it should be just ASCII text (and hopefully a query string name/value pair). The good news is that most of the wrong keys we try will throw a CryptographicException, so we can rule out any key value that results in this exception. For safety’s sake I decided to convert the results of every successful decrypt to ASCII and save them for further review if needed.
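The three steps and the filtering idea can be sketched as follows. This is not the console utility from the review: the decryption routine is passed in by the caller, and the `toy_decrypt` used to exercise the loop is a deliberately fake XOR scheme whose ValueError stands in for CryptographicException. All names are hypothetical.

```python
import itertools
import string

ALPHABET = string.ascii_lowercase + string.digits


def candidates(max_len):
    """Yield every alphanumeric (lowercase) password up to max_len characters."""
    for length in range(1, max_len + 1):
        for combo in itertools.product(ALPHABET, repeat=length):
            yield "".join(combo)


def looks_like_text(data):
    # Printable ASCII only; a cheap filter for "probably a query string".
    return len(data) > 0 and all(32 <= b < 127 for b in data)


def brute_force(ciphertext, try_decrypt, max_len):
    """Collect (password, plaintext) pairs that decrypt cleanly and look like text.

    try_decrypt is supplied by the caller and must raise ValueError on a bad key,
    playing the role CryptographicException plays in the post.
    """
    hits = []
    for password in candidates(max_len):
        try:
            plaintext = try_decrypt(ciphertext, password)
        except ValueError:
            continue  # wrong key: ruled out immediately
        if looks_like_text(plaintext):
            hits.append((password, plaintext))
    return hits


def toy_decrypt(ciphertext, password):
    # Toy stand-in for the real AES routine: XOR with the password, and a
    # leading marker byte standing in for the padding check. NOT real crypto.
    key = password.encode("ascii")
    out = bytes(b ^ key[i % len(key)] for i, b in enumerate(ciphertext))
    if out[:1] != b"\x01":
        raise ValueError("bad padding")
    return out[1:]


secret = "ab1"
sample = bytes(b ^ secret.encode()[i % 3] for i, b in enumerate(b"\x01id=42&page=7"))
hits = brute_force(sample, toy_decrypt, 3)
print(len(hits), "candidates kept for review")
```

The toy check lets some wrong passwords through alongside the right one, which is the same reason every successful decrypt gets saved for review rather than trusted outright.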
The Cloud
After running the utility for an hour or so I realized that a laptop Windows instance was not the optimal environment for running a brute force password crack (not to mention it rendered the machine pretty useless in the meantime). Having recently signed up for a test account on the Microsoft Azure cloud platform for some unrelated WCF testing, I thought this would be a great opportunity to test out the power of the Microsoft cloud. Even better, Azure is FREE to use until February 1, 2010.
The concept of using the cloud to crack passwords is not new. Last year, David Campbell wrote about how to use Amazon EC2 to crack a PGP passphrase. Having never really worked with the Azure platform (aside from registering for a test account), I first needed to figure out the best way to perform this task in the environment. Windows Azure has two main components, which both run on the Azure Fabric Controller (the hosting environment of Windows Azure):
Compute – Provides the computation environment. Supports “Web Roles” (essentially web services and web applications) and “Worker Roles” (services that run in the background)
Storage – Provides scalable storage (Blobs, Tables, Queue)
I decided to create and deploy a “Worker Role” to run the password cracking logic, and then log all output to a table in the storage layer. I’ll spare you the boring details of how to port a console utility to a Worker Role, but it’s fairly simple. The first run of the Worker Role was able to produce approximately 1,000,000 decryption attempts every 30 minutes, or about 555 tries/second. This was definitely faster than the speed I was getting on the laptop, but not exactly what I was hoping for from “the cloud”.
I did some research on how the Fabric Controller allocates resources to each application, and as it turns out there are 4 VM sizes available as shown below:
| Compute Instance Size | CPU | Memory | Instance Storage | I/O Performance |
|---|---|---|---|---|
| Small | 1.6 GHz | 1.75 GB | 225 GB | Moderate |
| Medium | 2 x 1.6 GHz | 3.5 GB | 490 GB | High |
| Large | 4 x 1.6 GHz | 7 GB | 1,000 GB | High |
| Extra large | 8 x 1.6 GHz | 14 GB | 2,040 GB | High |
The size of the VM used by the Worker Role is controlled through the role properties that get defined when the role is configured in Visual Studio. By default, roles are set to use the “small” VM, but this is easily changed to another size. The task at hand is all about CPU, so I increased the VM to “Extra Large” and redeployed the worker role.
Expecting significant performance gains, I was disappointed to see that the newly deployed role was running at the same exact speed as before. The code was clearly not taking full advantage of all 8 cores, so a little more research led me to the Microsoft Task Parallel Library (TPL). TPL is part of the Parallel Extensions, a managed concurrency library developed by Microsoft for .NET that was specifically designed to make running parallel processes in a multi-core environment easy. Parallel Extensions are included by default as part of the .NET 4.0 Framework release. Unfortunately Azure does not currently support .NET 4.0, but luckily TPL is supported on .NET 3.5 through the Reactive Extensions for .NET (Rx).
Once you install Rx, you can reference the System.Threading.Tasks namespace which includes the Parallel class. Of specific interest for our purpose is the Parallel.For method. Essentially, this method executes a for loop in which iterations may run in parallel. Best of all, the job of spawning and terminating threads, as well as scaling the number of threads according to the number of available processors, is done automatically by the library.
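A parallel for loop runs over an integer range, so the brute force first needs a way to turn a loop index into a candidate password. The sketch below shows one such mapping, plus the kind of range splitting that Parallel.For does for you. It is an illustration in Python with hypothetical names, not the Worker Role code; in Python the chunks could be handed to something like `concurrent.futures.ProcessPoolExecutor`.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # 36 symbols


def index_to_candidate(index, length):
    """Map an integer in [0, 36**length) to a fixed-length candidate password."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def split_range(total, workers):
    """Split [0, total) into contiguous (start, stop) chunks, one per worker."""
    base, extra = divmod(total, workers)
    chunks, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


print(index_to_candidate(0, 3), index_to_candidate(36 ** 3 - 1, 3))
print(split_range(36 ** 3, 8)[:2])
```

Because every index maps to exactly one candidate, the chunks never overlap and no worker needs to coordinate with another.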
As expected, this was the secret sauce I had been missing. Once re-implemented with a Parallel.For loop, the speed increased significantly to 7,500,000 decryption attempts every 30 minutes, or around 4,200 tries/second. That’s 1M tries every 4 minutes, meaning we can crack a 5-character alphanumeric (lowercase) password in about 4 hours, or the 6-character equivalent in about 6 days. This is still significantly slower than the speed obtained by Campbell’s experiment, but then again he was using a distributed program designed specifically for fast password cracking (as opposed to the proof of concept code we are using here), not to mention I am also logging output to a database in the storage layer. At the time of writing, the password hasn’t been cracked, but the worker process has only been running for about 24 hours (so there’s still plenty of time). What remains to be seen is how fast this same code would run in the Amazon EC2 cloud, which may be a comparison worth doing.
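The estimates above are easy to check. The short calculation below takes the reported throughput and a 36-symbol alphabet and works out the time to try every password of exactly a given length; on average the hit would come about halfway through.

```python
ATTEMPTS_PER_30_MIN = 7_500_000          # figure reported in the post
RATE = ATTEMPTS_PER_30_MIN / (30 * 60)   # tries per second
ALPHABET_SIZE = 36                       # lowercase letters plus digits


def worst_case_seconds(length, rate=RATE):
    """Time to try every fixed-length candidate at the given rate."""
    return ALPHABET_SIZE ** length / rate


print(round(RATE))                                  # 4167
print(round(worst_case_seconds(5) / 3600, 1))       # 4.0 hours
print(round(worst_case_seconds(6) / 86400, 1))      # 6.0 days
```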
The important takeaway here is not about the power of the cloud (since there’s nothing we can do to stop it), but rather about Password Based Encryption. Regardless of key length and choice of algorithm, the strength of your encryption always boils down to the weakest link…which in this case, is the choice of password.