My MySQL keynote slides and video
april 2010 by rahuldave
Been asked a few times in the last few days about where my slides are from my MySQL keynote from *last* year.
Ooops.
Um, yeah. Sorry about that. Here’s a link to ‘The SmugMug Tale’ slides, and you can watch the video below:
Sorry for the extreme lag. I suck.
The important highlights go something like this:
Use transactional replication. Without it, you’re dead in the water. You have no idea where a crashed slave was.
Use a filesystem that lets you do snapshots. Easily the best way to do backups, spin up new slaves, etc. I love ZFS. You’ll need transactional replication to really make this painless.
Use SSDs if you can. We can’t afford to be fully deployed on SSDs (terabytes are expensive), but putting them in the write path to lower latency is awesome. The read path might help, too, depending on how much caching you’re already doing. Love hybrid storage pools.
Use Fishworks (aka Open Storage) if you can. The analytics are unbeatable, plus you get SSDs, snapshots, ZFS, and tons of other goodies.
Use transactional replication. This is so important I’m repeating it. Patch it into MySQL (Google, Facebook, and Percona have patches) or use XtraDB if you use replication. We use the Percona patch.
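To make the "you have no idea where a crashed slave was" point concrete, here is a toy sketch in Python. It is not MySQL code: the `Replica` class, the `crash_after` parameter, and the event strings are all made up for illustration. It assumes the only difference between the two modes is whether the replication position is saved in the same commit as the data.

```python
"""Toy model of a replica applying a replication stream (not MySQL code)."""


class Replica:
    def __init__(self, transactional):
        self.transactional = transactional
        self.rows = []      # stands in for durable table data
        self.position = 0   # stands in for the durable replication position

    def apply(self, events, crash_after=None):
        """Apply events from the saved position onward.

        If crash_after is set, simulate a crash while handling that event
        index, before its position has been durably recorded.
        """
        for index in range(self.position, len(events)):
            if self.transactional:
                # Data and position commit together, or not at all.
                if index == crash_after:
                    return  # crash before commit: nothing is persisted
                self.rows.append(events[index])
                self.position = index + 1
            else:
                # Data is written first; the position is saved separately.
                self.rows.append(events[index])
                if index == crash_after:
                    return  # crash: data persisted, position is stale
                self.position = index + 1


def run_with_crash(transactional):
    events = ["insert A", "insert B", "insert C"]
    replica = Replica(transactional)
    replica.apply(events, crash_after=1)  # crash while handling "insert B"
    replica.apply(events)                 # restart, resume from saved position
    return replica.rows


print(run_with_crash(transactional=True))
print(run_with_crash(transactional=False))
```

In this model the transactional replica ends up with each event exactly once, while the non-transactional one replays "insert B" because its saved position lagged behind the data. A real slave in that state either errors out on a duplicate key or silently diverges, which is why a snapshot of it is hard to trust.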
Holler in the comments if something in the presentation isn’t clear and I’ll answer. Apologies again.
Shameless plug - we’re hiring. And it’s a blast.
datacenter
MySQL
facebook
fishworks
flash
google
open_storage
percona
replication
smugmug
ssd
transactional_replication
zfs
Four short links: 17 March 2010
march 2010 by rahuldave
Common MySQL Queries -- a useful reference.
MySociety's Next 12 Months -- two new projects, FixMyTransport and "Project Fosbury". The latter is a more general tool to help people organise their own campaigns for change.
riak -- scalable key-value store with JSON interface. (via joshua on Delicious)
Notes from NoSQL Live Boston -- full of juicy nuggets of info from the NoSQL conference.
databases
events
gov20
mysociety
mysql
nosql
Why Digg Digs Cassandra
march 2010 by rahuldave
Digg, the San Francisco-based social media company, is dropping MySQL and instead betting its future on Cassandra, an open-source data store. It’s just the latest sign of the growing popularity of the software, which was developed (and open sourced) by Facebook to search through its inbox. While Facebook has since backed off Cassandra, Digg plans to open source all its work on Cassandra and champion the software’s development and adoption.
In a blog post on the Digg blog, John Quinn, Digg’s VP of engineering, writes:
Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who’s been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.
What’s Wrong with MySQL?
Our primary motivation for moving away from MySQL is the increasing difficulty of building a high performance, write intensive, application on a data set that is growing quickly, with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.
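As a rough illustration of the partitioning point in the quote above, the sketch below shards data by user ID across several in-memory "servers". It is not Digg's schema or code: the shard count, table names, and helper functions are hypothetical, and plain dictionaries stand in for separate MySQL instances. What it tries to show is that once rows live on different shards, a question that used to be a single SQL join becomes a scatter-gather loop in application code.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(user_id):
    """Pick a shard from a stable hash of the key."""
    return zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS


# Each dict stands in for a separate database server.
shards = [{"users": {}, "diggs": []} for _ in range(NUM_SHARDS)]


def add_user(user_id, name):
    shards[shard_for(user_id)]["users"][user_id] = name


def add_digg(user_id, story_id):
    # A digg is stored on the shard of the user who made it.
    shards[shard_for(user_id)]["diggs"].append((user_id, story_id))


def names_who_dugg(story_id):
    """A hand-written 'join': ask every shard, then merge in the app."""
    names = []
    for shard in shards:
        for user_id, dug_story in shard["diggs"]:
            if dug_story == story_id:
                names.append(shard["users"][user_id])
    return sorted(names)


add_user(1, "alice")
add_user(2, "bob")
add_user(3, "carol")
add_digg(1, "story-9")
add_digg(3, "story-9")
add_digg(2, "story-5")
print(names_who_dugg("story-9"))  # ['alice', 'carol']
```

Lookups by user ID stay cheap because they touch one shard, but anything keyed on the story has to visit all of them, and the database can no longer enforce constraints or transactions across shards.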
Digg is just the latest high-profile convert to the NoSQL world. Instead of using databases such as MySQL, many of the companies that deal in near-real-time information are opting for new kinds of data stores, most of them open source, such as Cassandra and CouchDB.
Cassandra is roughly the open-source equivalent of Google’s Bigtable. It was intended by Facebook to solve the problem of inbox search; the company needed something that was fast, reliable and had the ability to handle read and write requests at the same time. Messaging in an environment as heavily used as Facebook requires a system that can not only store data but also provide results for search queries at blazing fast speeds.
Stu Hood, the technical lead for the search team in the Email & Apps division of Rackspace, recently said:
I think that distributed databases solve a problem that a lot of companies with large datasets have had to solve independently in the past…Cassandra has an approach that hybridizes the Bigtable and Dynamo models, where a lot of its competitors chose to take one path or the other. Over the Bigtable clones, Cassandra has huge high-availability advantages, and no single point of failure (possible because of the eventually consistent approach). When compared to the Dynamo adherents, Cassandra has the advantage of a more advanced datamodel, allowing for a single “row” to contain billions of column/value pairs: enough to fill a machine. You also get efficient range queries for the top level key, and even within your values.
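The data model described in that quote, a single row holding a very large number of sorted column/value pairs that can be range-queried, can be sketched in a few lines of Python. This is a toy in-memory model, not the Cassandra API: `WideRow`, `insert`, and `slice` are hypothetical names, and timestamps are used as column names only as an example.

```python
import bisect


class WideRow:
    """One row whose columns are kept sorted by column name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# One row per user; each column is a message keyed by its timestamp.
inbox = WideRow()
inbox.insert("2010-03-01T09:00", "msg-17")
inbox.insert("2010-03-05T14:30", "msg-42")
inbox.insert("2010-02-11T08:15", "msg-03")

march = inbox.slice("2010-03-01", "2010-03-31")
print(march)
```

Because the column names are stored in sorted order, a slice is two binary searches plus a contiguous read, which is the property that makes range queries inside a row cheap.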
In a post last year, contributing writer Gary Orenstein pointed out that thanks to these attributes, Cassandra has potential applications beyond inbox search that include “recommendation engines, targeted advertising, and content search, particularly when you combine many concurrent inputs and output requests to the same data set.”
Digg is a prototypical application. The company tells me that it gets:
40 million visitors a month, who in turn account for roughly 500 million page views a month.
20,000 daily submissions
It also generates:
170,000 daily Diggs
19,000 comments
As these numbers suggest, there is a high amount of interaction between the system and its users. No wonder Digg digs Cassandra!
@Not_for_Syndication
Cloud_Computing
Infrastructure
Om's_Posts
Cassandara
Digg
MySQL
NoSQL