mcroydon + distributed   132

Google: Achieving Rapid Response Times in Large Online Services
"Today’s large-scale web services provide rapid responses to interactive requests by applying large amounts of computational resources to massive datasets. They typically operate in warehouse-sized datacenters and run on clusters of machines that are shared across many kinds of interactive and batch jobs. As these systems distribute work to ever larger numbers of machines and sub-systems in order to provide interactive response times, it becomes increasingly difficult to tightly control latency variability across these machines, and often the 95%ile and 99%ile response times suffer in an effort to improve average response times."
dist  google  latency  performance  scaling  distributed  architecture 
20 days ago by mcroydon
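A minimal sketch of the mean-versus-tail tradeoff the quote describes. The latency samples are invented for illustration, and `percentile` is a hypothetical nearest-rank helper, not anything taken from the Google paper.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return (mean, 95th percentile, 99th percentile) for a list of latencies."""
    mean = sum(samples) / len(samples)
    return mean, percentile(samples, 95), percentile(samples, 99)


# Made-up data: every request takes 20 ms.
before_ms = [20] * 100
# Made-up data: most requests got faster, but two now hit a slow path.
after_ms = [10] * 98 + [300] * 2

before = summarize(before_ms)
after = summarize(after_ms)

print("before: mean=%.1f p95=%d p99=%d" % before)
print("after:  mean=%.1f p95=%d p99=%d" % after)
```

With these numbers the mean drops while the 99th percentile gets much worse, which is why the average alone can hide tail latency.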
Apache Kafka
Design doc pointed out by Eric but interesting project as well.
analysis  distributed  logging  messaging 
november 2011 by mcroydon
lusis/Noah - GitHub
A Zookeeper-like system built with Ruby and Redis.
configuration  distributed  zookeeper 
june 2011 by mcroydon
Introducing Doozer
Looks like an interesting lower level replacement for things like zookeeper.
distributed 
april 2011 by mcroydon
About – beanstalkd
High performance but really lightweight queue provider. I wrote a backend for it in queues but keep forgetting about it.
beanstalkd  distributed  messaging  queue  ruby 
january 2011 by mcroydon
OpenTSDB - A Distributed, Scalable Monitoring System
OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.
analysis  architecture  bigdata  cloud  data  database  db  java  lgpl  hbase  hadoop  development  graph  distributed  monitoring  nosql  opensource  operations  scalability  scale  time  sysadmin  software  storage  series  opentsdb  rrd  stumbleupon  time-series  timeseries 
november 2010 by mcroydon
s4: distributed stream computing platform
"S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data."
apache  bigdata  cloud  cloudcomputing  cluster  computing  mapreduce  map  java  hadoop  framework  distributed  data  opensource  processing  platform  programming  real-time  streaming  stream  software  scalability  reduce  realtime  streamprocessing  yahoo  tool  s4  streams 
november 2010 by mcroydon
SHARD Triple-Store
"SHARD is a proof-of-concept use of high-performance, low-cost distributed computing technology to develop a highly scalable triple-store. SHARD is released as an open-source project on the BSD license."
database  db  cloud  distributed  hadoop  lubm  mapreduce  rdf  store  sparql  storage  shard  semweb  semanticweb  scalability  triple-store 
october 2010 by mcroydon
cages - Project Hosting on Google Code
"Cages is a Java library of distributed synchronization primitives that uses the Apache ZooKeeper system. If you can run a ZooKeeper machine or cluster, then you can use Cages to synchronize and coordinate data access, data manipulation and data processing, configuration change and more esoteric things like cluster membership across multiple machines."
dev  development  distributed  java  library  lock  opensource  zookeeper  synchronization  programming 
may 2010 by mcroydon
Principles for Standardized REST Authentication - O'Reilly Broadcast
I want to live in this fantasy world where RESTful authentication isn't so hard or repetitive.
api  architecture  auth  authentication  dev  cloud  development  restful  rest  read  programming  patterns  oauth  http  distributed  security  soa  soap  toread  webservices 
may 2010 by mcroydon
nvie.com » Blog Archive » A successful Git branching model
I really like the ability to push out hotfixes that this allows, though I'd probably squash master and develop to be the same branch.
agile  article  branch  branches  branching  deploy  deployment  git  environment  distributed  dvcs  development  dev  merge  model  programming  reference  scm  vcs  tutorial  tips  subversion  strategy  sourcecontrol  management  software  version-control  versioning  workflow  versioncontrol 
march 2010 by mcroydon
Cassandra Internals – Reading
The companion to his writing-oriented Cassandra tour.
architecture  cassandra  database  data  distributed  internals  java  nosql  sysadmin 
march 2010 by mcroydon
Lineland
Scroll through for lots and lots of HBase internals.
blog  distributed  hadoop  hbase  nosql  mapreduce  programming  systems  storage  reference 
march 2010 by mcroydon
cloudkick | blog: 4 Months with Cassandra, a love story
A very interesting look at Cassandra with an eye toward gotchas. Cloudkick are doing some interesting stuff with aggregation over time periods.
admin  via:jacobian  administration  architecture  article  cassandra  database  databases  opensource  nosql  mysql  monitoring  django  distributed  db  datawarehouse  python  scalability  scaling  storage  toread  webdev  programming  cloudkick  neat 
march 2010 by mcroydon
Features — execnet v1.0.5 documentation
"execnet provides carefully tested means to easily interact with Python interpreters across version, platform and network barriers." Data structure interop between CPython, Jython, and PyPy.
programming  python  software  development  code  library  opensource  network  distributed  framework  cluster  deployment  module  parallel  foss  pycon2010  cross  interpreter  ipc 
february 2010 by mcroydon
The Basho Blog: Why Vector Clocks are Easy
Straightforward but very powerful message/value versioning conflict avoidance. This reminds me of git in a way since, to avoid conflicts, each message must contain all predecessors in its vector mask.
programming  toread  tutorial  scalability  distributed  algorithm  concurrency  event  nosql  vector  versioning  via:chl  clock  dist  riak  basho  vectorclocks  distributed_systems  vector-clocks  clocks  vectorclock 
january 2010 by mcroydon
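A rough sketch of the vector clock idea from the note above, assuming clocks are plain dicts mapping a node name to a counter. The function names (`increment`, `descends`, `merge`, `compare`) and node names are hypothetical and are not Riak's API.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(a, b):
    """Combine two clocks by taking the larger counter for each node."""
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in a.keys() | b.keys()}


def compare(a, b):
    """Classify the relationship between two clocks."""
    a_over_b, b_over_a = descends(a, b), descends(b, a)
    if a_over_b and b_over_a:
        return "equal"
    if a_over_b:
        return "a descends b"
    if b_over_a:
        return "b descends a"
    return "conflict"  # concurrent updates, neither contains the other


# alice writes first; bob and carol then each update alice's version
# without seeing the other's change.
v1 = increment({}, "alice")
v2_bob = increment(v1, "bob")
v2_carol = increment(v1, "carol")

print(compare(v2_bob, v1))        # bob's version contains its predecessor
print(compare(v2_bob, v2_carol))  # siblings: neither contains the other
print(merge(v2_bob, v2_carol))    # clock for a version that resolves both
```

A version is only a safe replacement when its clock contains all predecessors. Otherwise the two values are siblings, and something has to reconcile them.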
Erlang for Skeptics rev 22
This seems like a gentler introduction to Erlang than most, taking advantage of the Erlang shell.
programming  software  development  howto  tutorial  erlang  free  book  books  distributed  language  ebook  tutorials  guide  concurrency  computerscience  ebooks  languages  read  concurrent  debug  function 
november 2009 by mcroydon
Avro: a Format for Big Data » Cloudera Hadoop & Big Data Blog
Another data interchange format (I think) like ProtocolBuffers and Thrift. I think one of the bigger problems that the Hadoop/big data community has is parallel internal implementations of building blocks that are later open-sourced.
data  database  storage  distributed  hadoop  apache  cloud  json  messaging  encoding  protocol  portable  cloudera  bigdata  data-structures  serialization  format  foss  thrift  buffers  introduction  avro 
november 2009 by mcroydon
Journal of Eivind Uggedal: NoSQL East 2009 - Summary of Day 1
Some interesting bits and more of the same but I really like the dark-launch approach that Scribe allows.
data  database  toread  blog  scalability  internet  distributed  article  hadoop  scaling  db  cloud  couchdb  conference  papers  keyvalue  nosql  links  cassandra  2009  mongodb  dynomite  riak 
november 2009 by mcroydon
LucidDB Home Page
"LucidDB is the first and only open-source RDBMS purpose-built entirely for data warehousing and business intelligence. It is based on architectural cornerstones such as column-store, bitmap indexing, hash join/aggregation, and page-level multiversioning."
programming  software  development  database  data  business  opensource  java  scalability  storage  distributed  cluster  databases  sql  db  datamining  olap  columndb  bi  datawarehouse  dbms  reporting  rdbms  luciddb  column  warehousing  column-store  data_warehouse  column-oriented  dwh 
october 2009 by mcroydon
Geeking with Greg: Advice from Google on large distributed systems
With links to slides from LADIS '09. This includes a refresh and update on how GFS, MapReduce, etc. are working in Google's fault-filled environment.
programming  google  blog  scalability  storage  performance  architecture  distributed  advice  scaling  infrastructure  systems  datacenters 
october 2009 by mcroydon
Why I like Redis
Redis is indeed awesome (and a little different) due to its support for rich primitive types.
python  programming  data  storage  database  dev  distributed  article  databases  cache  db  memcached  caching  articles  convert  cli  nosql  experiments  redis  schemaless  repl 
october 2009 by mcroydon
Riak - A Decentralized Database
"Riak combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications." Erlang under the hood.
programming  web  development  key-value  database  webdev  opensource  erlang  storage  scalability  distributed  rest  databases  http  mapreduce  json  db  couchdb  store  kvstore  datastore  keyvalue  nosql  document  cloudcomputing  riak  decentralized  basho  documentoriented  key-value-store 
october 2009 by mcroydon
NASA NEBULA - Cloud Computing
It looks like they've selected some good tech including Eucalyptus, Solr, and Varnish.
research  opensource  internet  distributed  computing  cloud  space  virtualization  nasa  computerscience  cloudcomputing  enterprise  platform  computer_science  capacity  nebula 
july 2009 by mcroydon
PubSub-over-Webhooks with RabbitHub « LShift Ltd.
"RabbitHub is our implementation of PubSubHubBub, a straightforward pubsub layer on top of plain old HTTP POST — pubsub over Webhooks."
programming  web  software  toread  erlang  opensource  tech  cool  distributed  http  rest  review  queue  rabbitmq  subscribe  po  pubsub  amqp  webhooks  pubsubhubbub 
july 2009 by mcroydon
Coding Horror: Scaling Up vs. Scaling Out: Hidden Costs
Food for thought with the caveat that scaling out is a lot easier if you don't have any per-server software costs. Big iron costs less to operate though.
programming  hardware  business  server  scalability  coding  networking  architecture  performance  distributed  scaling  web-development  cluster  hadoop  hosting  clustering  comparison  servers  distribution  it  codinghorror  2009  stackoverflow 
june 2009 by mcroydon
XCPU Project - Home
"The XCPU project comprises of a suite of tools for cluster management. It includes utilities for spawning jobs, management of cluster resources, scalable distribution of boot images across a cluster as well as tools for creation and control of virtual machines in a cluster environment."
plan9  distributed  computing  distributed-computing  clusters  linux 
june 2009 by mcroydon
katta - distributed lucene
"Katta serves large, replicated, Lucene indexes as shards to serve high loads and very large data sets."
software  development  java  search  scalability  performance  scaling  distributed  apache  hadoop  clustering  cloud  grid  lucene  tool  indexing  project  ir  searchengine  information-retrieval  dist  package  hdfs  katta 
june 2009 by mcroydon
Cloudera's Distribution for Hadoop | Cloudera
Includes lots of feature tickets that are pretty stable but not yet in a Hadoop release. It reminds me a lot of Debian unstable or Ubuntu a month or so before release. Good stuff indeed. Includes RPM and APT package management options.
software  data  linux  google  search  aws  distributed  computing  ec2  hadoop  cloud  mapreduce  cloudcomputing  distribution  clusters  packaging  cloudera  cloud-computing  rpm  apt 
june 2009 by mcroydon
Are Cloud Based Memory Architectures the Next Big Thing? | High Scalability
Quite a long and thoughtful post, worth skimming and pondering at the very least. This post is a little too enterprisey and not quite startups-in-the-trenches enough, but still worth thinking about.
programming  database  tools  scalability  storage  architecture  distributed  performance  clustering  collaboration  memcached  grid  cloud  caching  db  concurrency  articles  communication  cloudcomputing  semanticweb 
march 2009 by mcroydon