rahuldave + cloud_computing   11

Video: How SimpleGeo Built a Scalable Geospatial Database with Apache Cassandra
SimpleGeo provides a platform for developers to build location-aware applications, and it has the distinction of being our Most Promising Company For 2011.

Spatial data is multidimensional, so SimpleGeo had to build its own indexing scheme for Apache Cassandra to handle its data. In this presentation, Mike Malone, an infrastructure engineer at SimpleGeo, explains how and why the company did it.
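
The talk covers SimpleGeo's actual index design. As a generic illustration only, one well-known way to turn two-dimensional coordinates into keys for a one-dimensional sorted store is a Z-order (Morton) curve, which interleaves the bits of the two coordinates. The sketch below assumes 16 bits per axis and uses hypothetical function names; it is not presented as SimpleGeo's scheme.

```python
def quantize(value, lo, hi, bits=16):
    """Map value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * cells)


def interleave(x, y, bits=16):
    """Interleave the bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


def z_key(lat, lon, bits=16):
    """Hypothetical one-dimensional key for a (lat, lon) point."""
    return interleave(quantize(lon, -180.0, 180.0, bits),
                      quantize(lat, -90.0, 90.0, bits), bits)
```

Points that are close together often end up with nearby keys, but not always, because the curve jumps at cell boundaries. A real index has to handle those jumps when it turns a bounding box into key ranges.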

Malone also shared a slide deck explaining all of this last year:

Scaling GIS Data in Non-relational Data Stores
View more presentations from Mike Malone.

If you want even more information on different types of consistency in NoSQL databases, check out this presentation from MongoDB's Roger Bodamer.

Cloud_Computing  from google
february 2011 by rahuldave
The History of Collaboration [Infographic]
The history of collaboration is the subject of another infographic from Socialcast. A previous infographic the company did late last year explored how to measure the effectiveness of social technologies inside the enterprise.

This one explores collaboration, starting with a quote from Charles Darwin about communications. It goes from there to explore the growth of collaboration, the types of workers in the U.S. workforce and the definition of a good worker.

The most compelling aspect of the infographic focuses on the ways collaboration can improve with ten common forms of recent effort.

Cloud_Computing  from google
february 2011 by rahuldave
Clustrix Builds the Webscale Holy Grail: A Database That Scales
Clustrix, a Y Combinator graduate from 2006, launched today with the claim that it’s built a transaction database with MySQL-like functionality and reliability that can scale to billions of entries. Clustrix plans to sell its appliance (which consists of more than a terabyte of memory and its proprietary software) to web firms that don’t want to take on the complicated task of sharding their data (partitioning it across multiple databases), or moving to less robust database options like Cassandra or a key-value store such as the one provided by Twitter.
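
To make concrete what sharding asks of application code, here is a minimal sketch that assumes hash-based routing of keys to a fixed list of databases. The shard names and the `shard_for` helper are hypothetical and are not taken from Clustrix or any other product mentioned here.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical database names


def shard_for(key, shards=SHARDS):
    """Pick the shard that owns `key` using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

Every read and write has to go through a routing step like this one. Queries that span many keys must fan out across shards, and with this simple modulo scheme, changing the number of shards remaps most keys. That is part of the complexity the article refers to.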

This is big stuff. Indeed, Paul Mikesell — CEO of Clustrix and the former co-founder of storage system success story Isilon — said the goal is to use its appliance to solve a growing problem for companies managing large amounts of data, such as big travel, e-commerce and social websites. As the web grows more social, companies are trying to keep track of more pieces of data about users and their relationships to other users. This creates complicated and large databases that can slow down access to user information, and thus the end user experience.

We’ve written about myriad attempts to solve these data scalability problems, attempts that have spawned appliance startups and whole branches of code designed to help sites scale their data, from Hadoop to Cassandra to Twitter’s Gizzard. Mikesell said the product could replace the need for caching appliances such as those offered by Schooner or Northscale, but could also work in conjunction with them.

As for some of the open source options, new programming languages like Bloom, or cloud-based scalable databases such as Microsoft’s SQL Azure or Rackspace’s partnership with FathomDB, Mikesell is confident that the ability to replicate the functionality of a relational database at webscale without sharding or tweaking the existing code is powerful enough that customers would pay $80,000 for a 3-node machine containing the software. There are plenty of companies reluctant to trust the open-source spin-outs from companies like Twitter and Facebook.

The market is clearly there for scalable relational database products (GigaOM Pro, sub req’d), so if Clustrix can take the $18 million invested in it from Sequoia, ATA Ventures and US Venture Partners and turn it into an Isilon-like exit, more power to it.
Cloud_Computing  Infrastructure  Startup_Strategy  Startups  Clustrix  webscale  from google
may 2010 by rahuldave
Bloom Won’t Micromanage Data So Apps Can Scale
Building webscale or cloud applications is hampered by figuring out ways to spread tasks out over thousands of computers without slowing things down, or requiring too many people to keep things running. Virtualization and faster storage help, as do new databases (GigaOM Pro sub req’d) and caching techniques, but right now folks are trying to adapt how they program computers to reflect that one has now become many.

Bloom, a programming language created at the University of California, Berkeley by a group led by Joseph Hellerstein, is one such effort. Bloom was profiled this week as one of the top 10 emerging technologies by MIT’s Technology Review, because it could help cloud computing continue to scale. Here’s how, according to Technology Review:

The challenge is that these languages process data in static batches. They can’t process data that is constantly changing, such as readings from a network of sensors. The solution, Hellerstein explains, is to build into the language the notion that data can be dynamic, changing as it’s being processed. This sense of time enables a program to make provisions for data that might be arriving later, or never.

Hellerstein also gave an extensive interview to HPC in the Cloud this week about what Bloom is and the problem it’s trying to solve. From that interview:

To put it simply, what our work is trying to do is start with the data itself and get people to talk about what should happen to the data step-by-step through a program without ever having them specify at all how many machines are involved. So, when you ask a query of a database you describe what data you want, not how to get it.

The interview lays out how this programming effort came about (building network protocols) and who might care most about using Bloom (Amazon, Google or anyone with big data needs), but for me the best part of it was how Hellerstein underscored that the ability to harness a heck of a lot of servers and treat them as a single computer is the next big shift in information technology.

We can call it cloud computing, webscale applications or merely bigger data centers, but the key element here is that the hardware has gone social in ways that require many-to-many ways of communication and delivering instructions to the processors — inside the servers, between the servers, and soon, between data centers. The exciting aspect of this shift is that while larger companies like Google, Yahoo and Amazon are innovating, there is plenty of room for startups with a new appliance, server, networking technology or chunk of code to make waves — and hopefully, money.

For more on the effort, please check out the FAQs Hellerstein has posted on his blog.

Image courtesy of Flickr user tibchris
@NYT  CNN_Big_Tech  Cloud_Computing  Infrastructure  SYN_Straight_News  Stacey's_Posts  innovation  Bloom  webscale  from google
april 2010 by rahuldave
Open vs. Closed: Ubuntu Walks the Line
Any debate over open vs. closed systems has to touch on open-source software and the ways in which companies are attempting to build code as a community effort, while still profiting from it in some way. So I chatted with Mark Shuttleworth, CEO of Canonical, the company that supports Ubuntu, about how it walks the line between spending to support open-source software and finding a business model that works.

Canonical’s 330 employees are responsible for maintaining, supporting and selling service for Ubuntu, an open-source version of the Linux operating system for servers, desktops and computer manufacturers. Some 120-150 of the Canonical employees contribute directly to the new releases of the software that come out every six months, and most of the company’s revenue comes from supporting enterprise server customers and makers of computers that want to put Ubuntu on desktops. Consumers also download the software, but few pay Canonical for support. The company is not yet profitable.

Shuttleworth believes that in order to develop a strong business model around an open approach, one has to create an open option early, ideally through a strong standardization process, and one also needs to have a lot of different open-source projects fighting it out. For example, in the operating system world there wasn’t a strong history of open alternatives, which meant that Ubuntu had to out-open its proprietary competition, which has high costs.

In that way it has pushed Canonical perhaps further out toward open on the spectrum. Shuttleworth calculates the direct costs of being so open as bringing people together in ways that empower them and make them feel like members of a community, as well as reaching out and putting in place the infrastructure to create a company. However, there are indirect costs as well.

“There is a myth that being open is necessarily more efficient and cheaper, but there are no hordes of people showing up to do the hard stuff,” Shuttleworth says. “Occasionally wonderful, magical things happen — really incredible things do happen, like people show up unexpectedly with brilliant ideas — but it’s still hard and expensive and you still have to be willing to do all the hard and expensive things and do it in an open fashion. And you’re still likely to be accused of being open only when it’s convenient.”

He points to the cloud computing market as one that tends to give a lot of lip service toward openness but where a lack of a big standardization effort and robust open source competition could lead to a relatively closed ecosystem.

“The basic story there is pretty bad at the moment,” Shuttleworth says. He notes that proprietary infrastructure, hypervisors and even the APIs and ways data is stored can lock folks into one cloud for life. “We need real open alternatives early in the process, making it possible for people to build their own cloud infrastructure that responds to the same APIs that Amazon’s do.”

He’s accepted that Amazon Web Services’ APIs for its web services, while not created through an open standards group, have become a de facto standard and said that it’s more efficient to build open-source code around Amazon APIs rather than try to develop new ones for accessing the cloud. Canonical has a partnership agreement with Eucalyptus, which offers open-source software to create an AWS-compatible cloud, where people can use Ubuntu and Eucalyptus to create their own cloud computing platform. But Shuttleworth would like to see more open-source options other than Eucalyptus for building out a cloud computing service of your own.

At the platform-as-a-service level, the issue around openness will be around moving data from cloud to cloud easily. There’s room there for an open standard or open databases, he said. But at every level, when considering building a business around open source software, he believes that “you want a common and clear standard with competing open source versions using that standard.”

That keeps proprietary vendors at bay, and gives the companies building a business around the open-source software a chance to decide where they want to be on the open-to-closed spectrum. But it also introduces the prospect of fragmentation, which we’ll leave for a later post.

Related content from GigaOM Pro (sub req’d):

For Open Cloud Computing, Look Inside Your Data Center
CNN_Big_Tech  Cloud_Computing  Infrastructure  NYT_Enterprise  SYN_Feature_Enterprise  Stacey's_Posts  Web  Canonical  Mark_Shuttleworth  Ubuntu  from google
april 2010 by rahuldave
SpringSource Buys Startup to Scale Messaging in the Cloud
SpringSource, a division of VMware, has purchased the open-source cloud messaging company behind the RabbitMQ software. The value of the deal was undisclosed, but the purchase of Rabbit Technologies Ltd. is yet another effort by VMware to become the operating system for enterprise clouds (GigaOM Pro, sub req’d) and add value to its commoditized hypervisor. It’s also the latest example of a company selling proprietary software buying up an open-source software company aimed at the cloud.

Cloud providers use RabbitMQ to create a messaging server allowing them to quickly manage the flow of messages between applications. It can also be used to notify users of a web service when content on the site has changed, such as when someone posts a Facebook photo and the service sends an email out notifying all a user’s friends.
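
The fan-out pattern described above (one event, many notifications) can be sketched with a toy in-memory broker. This uses only Python's standard library, and the class and method names are hypothetical. It is not the RabbitMQ client API or the AMQP wire protocol, only an illustration of the idea.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for a message broker with fan-out delivery."""

    def __init__(self):
        self.queues = {}                    # queue name -> pending messages
        self.bindings = defaultdict(list)   # topic -> bound queue names

    def bind(self, topic, queue_name):
        """Create the queue if needed and subscribe it to a topic."""
        self.queues.setdefault(queue_name, deque())
        self.bindings[topic].append(queue_name)

    def publish(self, topic, message):
        """Copy the message into every queue bound to the topic."""
        for queue_name in self.bindings[topic]:
            self.queues[queue_name].append(message)

    def consume(self, queue_name):
        """Pop the oldest message, or return None if the queue is empty."""
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None
```

The publisher only names a topic and never needs to know who is listening. That decoupling is what lets the sending and receiving applications scale separately.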

The RabbitMQ code was created by Cohesive FT and LShift based on the relatively young AMQP standards effort backed by major banks, Cisco and a handful of smaller companies. As hardware is virtualized, translating some of the network equipment like load balancers into software allows services running on the virtualized hardware to scale better. Hopefully we’ll learn more about SpringSource, RabbitMQ and VMware’s plans for becoming the cloud OS when VMware CEO Paul Maritz speaks at our Structure conference in June.

Introduction to AMQP Messaging with RabbitMQ
View more presentations from Dmitriy Samovskiy.

Image courtesy of Flickr user Joshua Davis
CNN_Big_Tech  Cloud_Computing  Infrastructure  NYT_Company_News  SYN_Straight_News  Stacey's_Posts  Startups  RabbitMQ  SpringSource  VMWare  from google
april 2010 by rahuldave
You’ve Got Mail! Amazon Creates Cloud Notification Service
Amazon Web Services has launched its Simple Notification Service (Amazon SNS), which allows developers to create a push notification system for applications. The service allows companies to deliver messages to customers of their applications or even to other applications in a couple of different formats, among them HTTP and email. Amazon SNS could be used by system administrators in an IT department (notifying clients if they’re hitting a certain limit on storage capacity or that latency on their service is too high), or it could be used to build out notifications for mobile applications, such as letting consumers know when friends check into a location, or when they have new email.

Developers using the service pay per instance, as with all Amazon cloud products. The price includes a per-request, notification delivery and data transfer fee, but developers can get started with Amazon SNS for free. Each month, Amazon SNS customers get the first 100,000 Amazon SNS Requests, the first 100,000 notifications over HTTP and the first 1,000 notifications over email free. After that, prices range from 6 cents to $2 per 100,000 messages sent for delivery and 8-15 cents per gigabyte of data transferred.
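
As a rough worked example of the delivery pricing above, the sketch below estimates a monthly delivery charge. It assumes that the 6-cent end of the quoted range applies to HTTP notifications and the $2 end to email, which the text does not state explicitly. It leaves out the per-request and data transfer fees, since the article gives no per-request rate.

```python
FREE_HTTP = 100_000          # free HTTP notifications per month (from the article)
FREE_EMAIL = 1_000           # free email notifications per month (from the article)
HTTP_RATE = 0.06 / 100_000   # assumed: the 6-cent end of the range is HTTP
EMAIL_RATE = 2.00 / 100_000  # assumed: the $2 end of the range is email


def delivery_cost(http_count, email_count):
    """Estimated monthly delivery charge in dollars, excluding request and transfer fees."""
    billable_http = max(0, http_count - FREE_HTTP)
    billable_email = max(0, email_count - FREE_EMAIL)
    return billable_http * HTTP_RATE + billable_email * EMAIL_RATE
```

Under those assumptions, 1.1 million HTTP notifications and 101,000 emails in a month would come to about $2.60 in delivery charges.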

Related GigaOM Pro content (sub. req’d): Report: Delivering Content in the Cloud

Image courtesy of Flickr user Ed Siacoso (aka SC fiasco)
CNN_Big_Tech  Cloud_Computing  Infrastructure  NYT_Company_News  SYN_Straight_News  Stacey's_Posts  Amazon  from google
april 2010 by rahuldave
Does the iPad App Give Rackspace An Advantage?
Rackspace launched an iPad app to manage a cloud infrastructure, one of the first to offer such a service.

Amazon Web Services (AWS) does not have apps for either the iPhone or the iPad. It has historically not offered mobile apps for AWS.

You can still access AWS on the iPad through the Safari browser. But is the experience as rich as what you would have on a native app?

Mike Mayo built Rackspace's iPad app. He says it is the functionality that gives apps their value. It's evident in both consumer and enterprise apps. Users get a rich user experience. You can see it in the Rackspace cloud app.

Mayo jokes that the app offers administrators "a life," meaning that you can go out for dinner without the anxiety of not knowing how the infrastructure is faring. If you see a problem, you can reboot directly from the device.

The app does have a new service not available on the iPhone version. You can delete your servers on it. Mayo kept the feature off the iPhone due to the concern that it's such a small device, easily left at a bar or restaurant. He feels people are less likely to leave an iPad due to its size. We're not so sure. People leave their laptops behind all the time.

We could go into details about the app and what it offers but Robert Scoble's video does a good job of that.

Mayo is currently developing a Rackspace cloud app for Android.

Disclosure: Rackspace is a sponsor of ReadWriteCloud's parent site, ReadWriteWeb.

Cloud_Computing  from google
april 2010 by rahuldave
Weekly Poll: How Will the iPad Affect Cloud Computing?
It is fairly evident that the iPad and cloud computing are deeply tied to each other. A selection of storage and cloud management apps are now available on the iPad. So, we want to know: "How will the iPad affect cloud computing?"

But before we get to that question, let's take a look at last week's poll. We asked: "Is Oracle a Cloud company?"

We had 125 people reply. In total, 68 respondents said: "None of the above. Oracle, like IBM and Microsoft, is a hybrid, with the majority of its revenue still with its on-premise offerings." Forty-four people agreed that Oracle missed the boat and won't leave its franchise.

This week we want to know: "How will the iPad affect cloud computing?"

It's evident that there are a number of new issues that bring cloud computing into the discussion. Edd Dumbill says the biggest issue is Apple's lack of cloud integration; the iPad does not integrate with MobileMe:

"However, the iPad is no more advanced than the iPhone in its cloud integration. I would have loved to have switched on the iPad, keyed in my MobileMe login, and automatically had my email, browser bookmarks, calendar and contacts set up for me, as well as the ability to load in ebooks through my iDisk, and have my photo galleries available.

Instead I was forced through the painfully overloaded iTunes application, and had to tether my device via USB to get all of my content on it. Setting things up was a crazy dance involving configuration in both iTunes and in the iPad's settings panel. To make matters worse, the iPad doesn't want to charge over USB. This means I need to plug it in twice: once to the charger, and then somewhere else to sync. Decent cloud access would have mitigated this a little."

What are your thoughts?

Cloud_Computing  from google
april 2010 by rahuldave
EMC’s Crazy Plan to Create a Worldwide Data Cloud
Pat Gelsinger, who moved to EMC late last year after 30 years at Intel, is stirring things up at the storage giant with a plan to virtualize and federate storage so data and compute can truly be linked together (hat tip The Register). The implication of this vision is that organizations will have the ability to keep constantly changing information up to date around the world in real time despite the challenges of moving huge amounts of data over networks that measure data in gigabytes rather than petabytes.

In a presentation on Thursday, Gelsinger pointed out that compute and storage are rapidly getting better about dealing with more information, while networks are trying to catch up. “Compute is doubling every two years. Storage doubles every 15 months, and networking is much much much slower, like every four years, so how do you deal with latency, bandwidth and consistency?” Gelsinger said.
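
To see how quickly those doubling periods diverge, here is a small calculation that takes Gelsinger's figures at face value. The ten-year horizon is an arbitrary choice for illustration.

```python
def growth_factor(years, doubling_years):
    """Capacity multiple after `years` if capacity doubles every `doubling_years`."""
    return 2 ** (years / doubling_years)


# Doubling periods in years, as quoted: 2 years, 15 months, 4 years.
DOUBLING_YEARS = {"compute": 2.0, "storage": 1.25, "network": 4.0}

for name, period in DOUBLING_YEARS.items():
    print(f"{name}: {growth_factor(10, period):.1f}x over 10 years")
```

At those rates, a decade brings 32 times the compute and 256 times the storage, but less than 6 times the network capacity. That gap is the one a caching layer would have to cover.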

Gelsinger’s answer is caching. Imagine a two-way content delivery network built on EMC appliances that tracks and replicates changes made to data at one node and then pushes them out to all the other nodes as quickly as possible. Gelsinger calls this freeing the information from physical storage, but it sounds more like making sure your information is in a bunch of different physical storage containers. He mentions EMC’s acquisition of intellectual property from Yotta Yotta as offering the breakthrough required to build this technology.

But at the end of the day, this is all a big if, not an actual product yet. If EMC can link storage and virtualized machines together, the data center that “follows the sun” (basically moving compute loads around the world to where it’s cheapest to run them) or automatic failover for cloud services become possible. However, it will be controlled by a proprietary hardware vendor, which certainly clouds its prospects a bit.
CNN_Big_Tech  Cloud_Computing  Infrastructure  NYT_Company_News  SYN_Straight_News  Stacey's_Posts  innovation  emc  INTC  Intel  VMWare  VMWR  from google
march 2010 by rahuldave
Why Digg Digs Cassandra
Digg, the San Francisco-based social media company, is dropping MySQL and instead betting its future on Cassandra, an open-source data store. It’s just the latest sign of the growing popularity of the software, which was developed (and open sourced) by Facebook to search through its inbox. While Facebook has since backed off Cassandra, Digg plans to open source all its work on Cassandra and champion the software’s development and adoption.

In a blog post on the Digg blog, John Quinn, Digg’s VP of engineering, writes:

Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who’s been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.

What’s Wrong with MySQL?

Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.

Digg is just the latest high-profile convert to the NoSQL world. Instead of using databases such as MySQL, many of the companies that deal in near-real-time information are opting for a new kind of data store, most of them open source, such as Cassandra and CouchDB.

Cassandra is roughly the open-source equivalent of Google’s Bigtable. It was intended by Facebook to solve the problem of inbox search; the company needed something that was fast, reliable and had the ability to handle read and write requests at the same time. Messaging in an environment as heavily used as Facebook requires a system that can not only store data but also provide results for search queries at blazing fast speeds.

Stu Hood, the technical lead for the search team in the Email & Apps division of Rackspace, recently said:

I think that distributed databases solve a problem that a lot of companies with large datasets have had to solve independently in the past…Cassandra has an approach that hybridizes the Bigtable and Dynamo models, where a lot of its competitors chose to take one path or the other. Over the Bigtable clones, Cassandra has huge high-availability advantages, and no single point of failure (possible because of the eventually consistent approach). When compared to the Dynamo adherents, Cassandra has the advantage of a more advanced datamodel, allowing for a single “row” to contain billions of column/value pairs: enough to fill a machine. You also get efficient range queries for the top level key, and even within your values.
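
The "wide row" idea Hood describes (one row holding many column/value pairs, with range queries inside it) can be pictured with a toy in-memory model. This is a sketch that assumes columns are kept sorted by name; the class and its methods are hypothetical and are not Cassandra's API.

```python
import bisect


class WideRow:
    """Toy model of one row whose columns are kept sorted by name."""

    def __init__(self):
        self._names = []    # sorted column names
        self._values = {}   # column name -> value

    def insert(self, name, value):
        """Add or overwrite a column, keeping names in sorted order."""
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]
```

With date strings as column names, for instance, a slice returns everything stored under one key for a given date range without scanning the rest of the row.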

Data Presentations Cassandra Sigmod
View more presentations from jhammerb.

In a post last year, contributing writer Gary Orenstein pointed out that thanks to these attributes, Cassandra has potential applications beyond inbox search that include “recommendation engines, targeted advertising, and content search, particularly when you combine many concurrent inputs and output requests to the same data set.”

Digg is a prototypical application. The company tells me that it gets:

40 million visitors a month, who in turn account for roughly 500 million page views a month.
20,000 daily submissions

It also generates:

170,000 daily Diggs
19,000 comments

As these numbers suggest, there is a high amount of interaction between the system and its users. No wonder Digg digs Cassandra!

Related content from GigaOM Pro (sub req’d):

What Cloud Computing Can Learn From NoSQL.
@Not_for_Syndication  Cloud_Computing  Infrastructure  Om's_Posts  Cassandara  Digg  MySQL  NoSQL  from google
march 2010 by rahuldave
