bpo + paper   34

On system rollback and totalised fields: An algebraic approach to system change
"In system operations the term rollback is often used to imply that arbitrary changes can be reversed i.e. ‘rolled back’ from an erroneous state to a previously known acceptable state. We show that this assumption is flawed and discuss error-correction schemes based on absolute rather than relative change."

"By formulating this problem algebraically, the discussion is distanced from the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible. The discussion about totalisation of fields is particularly useful, as it maps nicely to the flaws in this thinking. To deal with the inverse of a many-to-one map, one must invoke a policy or arbitrary selection."
paper  deployment  cfengine  automation 
february 2012 by bpo
Reverse electrowetting as a new approach to high-power energy harvesting : Nature Communications : Nature Publishing Group
Over the last decade electrical batteries have emerged as a critical bottleneck for portable electronics development. High-power mechanical energy harvesting can potentially provide a valuable alternative to the use of batteries, but, until now, a suitable mechanical-to-electrical energy conversion technology did not exist. Here we describe a novel mechanical-to-electrical energy conversion method based on the reverse electrowetting phenomenon. Electrical energy generation is achieved through the interaction of arrays of moving microscopic liquid droplets with novel nanometer-thick multilayer dielectric films. Advantages of this process include the production of high power densities, up to 10³ W m⁻²; the ability to directly utilize a very broad range of mechanical forces and displacements; and the ability to directly output a broad range of currents and voltages, from several volts to tens of volts. These advantages make this method uniquely suited for high-power energy harvesting from a wide variety of environmental mechanical energy sources.
cyborg  nature  paper  sensor 
september 2011 by bpo
[1107.3689] Edit wars in Wikipedia
We present a new, efficient method for automatically detecting severe conflicts ‘edit wars’ in Wikipedia and evaluate this method on six different language WPs. We discuss how the number of edits, reverts, the length of discussions, the burstiness of edits and reverts deviate in such pages from those following the general workflow, and argue that earlier work has significantly over-estimated the contentiousness of the Wikipedia editing process.
wikipedia  language  analysis  paper 
august 2011 by bpo
Cooley Venture Financing Report: Q2 2011 - A Robust Financing Environment
After seeing a slow and uneven recovery in general financing trends in late 2010 and Q1 2011, the second quarter of 2011 produced significant increases in both deal volume and dollars raised. In Q2 2011, we handled 112 deals representing more than $2 billion in invested capital, a dollar amount not seen since 2006. Additionally, the percentage of up rounds increased in Q2 2011 to approximately 72% of deals. Median pre-money valuations for all deal stages also increased in Q2. Most significantly, average valuations for Series A deals rose to $8 million, a level not seen since Q3 2010. In another signal of a strong environment, we saw a sharp increase in the number of deals with a pre-money valuation greater than $100 million from the prior quarter.

Second quarter deal terms also pointed to increased optimism on the part of investors. Liquidation preferences of greater than 1x decreased in all financing stages. Additionally, we observed a continuing trend of decreases in percentage of deals with participating preferred provisions. This decrease was most striking in Series B deals during the quarter.

Deal terms in other areas painted a mixed environment. We observed a decrease in the percentage of deals utilizing pay-to-play provisions to the lowest level seen since Q4 2007. This decrease was most pronounced in Series C deals. However, we also witnessed an increase in both tranched deals and the use of drag-along provisions from the prior quarter.
startup  funding  paper 
august 2011 by bpo
Priority Mechanisms for OLTP and Transactional Web Applications
Transactional workloads are a hallmark of modern OLTP and Web applications, ranging from electronic commerce and banking to online shopping. Often, the database at the core of these applications is the performance bottleneck. Given the limited resources available to the database, transaction execution times can vary wildly as they compete and wait for critical resources. As the competitor is “only a click away,” valuable (high-priority) users must be ensured consistently good performance via QoS and transaction prioritization.

This paper analyzes and proposes prioritization for transactional workloads in conventional DBMS. This work first conducts a detailed bottleneck analysis of resource usage by transactional workloads on commercial and noncommercial database systems (IBM DB2, PostgreSQL, Shore) under a variety of configurations. Our first contribution is a demonstration that for TPC-C workloads, under all of the above DBMS, transaction execution times are dominated by time spent waiting on locks, whereas for TPC-W workloads, CPU largely dominates transaction execution times. The second component of this work is an implementation and evaluation of several preemptive and non-preemptive prioritization algorithms in PostgreSQL and Shore. The primary contribution is a demonstration that transaction prioritization can provide 3x improvement for high-priority transactions in general-purpose DBMS. Furthermore, despite evaluating a wide range of scheduling algorithms, we find that particularly simple scheduling policies are most effective in improving high-priority without significantly penalizing low-priority transactions.
postgres  performance  analysis  paper 
june 2011 by bpo
SSRN-Detecting Deceptive Discussions in Conference Calls by David Larcker, Anastasia Zakolyukina
We estimate classification models of deceptive discussions during quarterly earnings conference calls. Using data on subsequent financial restatements (and a set of criteria to identify especially serious accounting problems), we label the Question and Answer section of each call as "truthful" or "deceptive". Our models are developed with the word categories that have been shown by previous psychological and linguistic research to be related to deception. Using conservative statistical tests, we find that the out-of-sample performance of the models that are based on CEO or CFO narratives is significantly better than random by 4%-6% (with 50%-65% accuracy) and provides a significant improvement to a model based on discretionary accruals and traditional controls. We find that answers of deceptive executives have more references to general knowledge, fewer non-extreme positive emotions, and fewer references to shareholders value and value creation.
linguistics  finance  paper  psychology  research 
september 2010 by bpo
Jeff's Search Engine Caffè: SIGIR 2010 Workshops: CrowdSourcing for Search Evaluation
A main highlight was the CrowdFlower keynote:
Better Crowdsourcing through Automated Methods for Quality Control
CrowdFlower provides commercial support for companies performing tasks on Mechanical Turk. Everyone had great things to say about this talk that kept people enthralled even though it was the end of the day; some said it was the best talk of the conference.
research  crowdsourcing  crowdflower  paper  conference 
july 2010 by bpo
Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge
Freebase is a practical, scalable tuple database used to structure general human knowledge. The data in Freebase is collaboratively created, structured, and maintained. Freebase currently contains more than 125,000,000 tuples, more than 4000 types, and more than 7000 properties. Public read/write access to Freebase is allowed through an HTTP-based graph-query API using the Metaweb Query Language (MQL) as a data query and manipulation language. MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.
paper  freebase  database 
july 2010 by bpo
The Anatomy of a Large-Scale Human Computation Engine
In this paper we describe RABJ (Redundant Array of Brains in a Jar), an engine designed to simplify collecting human input. We have used RABJ to collect over 2.3 million human judgments to augment data mining, data entry, data validation and curation tasks at Freebase over the course of a year. We illustrate several successful applications that have used RABJ to collect human judgment. We describe how the architecture and design decisions of RABJ are affected by the constraints of content agnosticity, data freshness, latency and visibility. We present work aimed at increasing the yield and reliability of human computation efforts. Finally, we discuss empirical observations and lessons learned in the course of a year of operating the service.
paper  computing 
july 2010 by bpo
Why Events are a Bad Idea (for high-concurrency servers)
Event-based programming has been highly touted in recent years as the best way to write highly concurrent applications. Having worked on several of these systems, we now believe this approach to be a mistake. Specifically, we believe that threads can achieve all of the strengths of events, including support for high concurrency, low overhead, and a simple concurrency model. Moreover, we argue that threads allow a simpler and more natural programming style.
architecture  concurrency  events  scalability  threads  paper 
april 2010 by bpo
Financial Incentives and the 'Performance of Crowds' | Yahoo! Research
"increased financial incentives increase the quantity, but not the quality, of work performed by participants, where the difference appears to be due to an “anchoring” effect: workers who were paid more also perceived the value of their work to be greater, and thus were no more motivated than workers paid less. In contrast with compensation levels, we find the details of the compensation scheme do matter—specifically, a “quota” system results in better work for less pay than an equivalent “piece rate” system. Although counterintuitive, these findings are consistent with previous laboratory studies, and may have real-world analogs as well"
mechanicalturk  crowdsourcing  research  paper 
february 2010 by bpo
Piece Rate Pay Design
Individual incentive plans offer the clearest link between a worker’s effort and the reward. Probably the best known individual or small group incentive pay plan is piece rate. Piece rate is more suited to repetitive crew work (e.g., boysenberry picking, vineyard pruning) than to precision planting, fertilizing, or irrigating. As the tie between individual work and results is diminished, so is the motivating effect of the incentive on the individual. My on-going research on piece-rate pay spans over two decades, beginning in 1985. The objective of this paper is to summarize much of this work, and give clear and precise suggestions for the effective design of piece rate pay. A number of serious challenges that threaten the effectiveness of this pay method are also included. While my work has been primarily in agriculture, the principles can be easily adapted to other types of work.
crowdsourcing  research  paper 
december 2009 by bpo
Understanding scam victims: seven principles for systems security
The success of many attacks on computer systems can be traced back to the security engineers not understanding the psychology of the system users they meant to protect. We examine a variety of scams and “short cons” that were investigated, documented and recreated for the BBC TV programme The Real Hustle and we extract from them some general principles about the recurring behavioural patterns of victims that hustlers have learnt to exploit.
We argue that an understanding of these inherent “human factors” vulnerabilities, and the necessity to take them into account during design rather than naïvely shifting the blame onto the “gullible users”, is a fundamental paradigm shift for the security engineer which, if adopted, will lead to stronger and more resilient systems security.
paper  security  toread 
december 2009 by bpo
The R Journal
The R Journal is the refereed journal of the R project for statistical computing.
stats  r  math  paper 
december 2009 by bpo
Bytecodes meet Combinators: invokedynamic on the JVM
The Java Virtual Machine (JVM) has been widely adopted in part because of its classfile format, which is portable, compact, modular, verifiable, and reasonably easy to work with. However, it was designed for just one language—Java—and so when it is used to express programs in other source languages, there are often “pain points” which retard both development and execution. The most salient pain points show up at a familiar place, the method call site.
To generalize method calls on the JVM, the JSR 292 Expert Group has designed a new invokedynamic instruction that provides user-defined call site semantics. In the chosen design, invokedynamic serves as a hinge-point between two coexisting kinds of intermediate language: bytecode containing dynamic call sites, and combinator graphs specifying call targets. A dynamic compiler can traverse both representations simultaneously, producing optimized machine code which is the seamless union of both kinds of input.
paper  programming  language  jvm 
november 2009 by bpo
Next Generation Connectivity: A review of broadband Internet transitions and policy from around the world
"The most surprising finding in our analysis is that open access policies contributed to the success of many of the highest performers during the first broadband transition, and as a result are now at the core of future planning processes in Europe and Japan. Contrary to perceptions in the United States, there is extensive evidence to support the position, adopted almost universally by other advanced economies, that open access policies, where undertaken with serious regulatory engagement, contributed to broadband penetration, capacity, and affordability in the first generation of broadband. We review the evidence here at length."
web  technology  research  internet  paper  culture 
october 2009 by bpo
Maximum Likelihood Estimation of Observer Error-rates using the EM Algorithm
In compiling a patient record many facets are subject to errors of measurement. A model is presented which allows individual error-rates to be estimated for polytomous facets even when the patient's "true" response is not available. The EM algorithm is shown to provide a slow but sure way of obtaining maximum likelihood estimates of the parameters of interest. Some preliminary experience is reported and the limitations of the method are described.
em  algorithms  stats  mechanicalturk  paper 
september 2009 by bpo
Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon’s Mechanical Turk, it often is possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results... The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
paper  mechanicalturk  stats  data 
september 2009 by bpo
Why Events Are A Bad Idea for High Concurrency Servers
Event-based programming has been highly touted in recent years as the best way to write highly concurrent applications. Having worked on several of these systems, we now believe this approach to be a mistake. Specifically, we believe that threads can achieve all of the strengths of events, including support for high concurrency, low overhead, and a simple concurrency model. Moreover, we argue that threads allow a simpler and more natural programming style.
programming  paper  performance  concurrency  threads 
august 2009 by bpo
Experimenting on Mechanical Turk: 5 How Tos - PARC blog
Performing human-subjects experiments on Amazon Mechanical Turk offers many benefits, including very low experiment costs, quick turn-around rates, and relatively simple approvals from human subjects boards. But you have to be careful to avoid bias and error; we describe some techniques...
mechanicalturk  paper 
july 2009 by bpo
Phosphorous, the Popular Lisp
We present Phosphorous, a programming language that draws on the power and elegance of traditional Lisps such as Common Lisp and Scheme, yet which brings those languages into the 21st century by ruthless application of our “popular is better” philosophy into all possible areas of programming language design.
programming  language  funny  lisp  java  paper 
july 2009 by bpo
Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria
In this paper, we consider the difficult problem of classifying sentiment in political blog snippets. Annotation data from both expert annotators in a research lab and non-expert annotators recruited from the Internet are examined. Three selection criteria are identified to select high-quality annotations: noise level, sentiment ambiguity, and lexical uncertainty. Analysis confirms the utility of these criteria in improving data quality. We conduct an empirical study to examine the effect of noisy annotations on the performance of sentiment classification models, and evaluate the utility of annotation selection on classification accuracy and efficiency.
crowdsourcing  paper  nlp  mechanicalturk 
june 2009 by bpo
Supervised Learning from Multiple Experts: Whom to Trust when Everyone Lies a Bit
We describe a probabilistic approach for supervised learning when we have multiple experts/annotators providing (possibly noisy) labels but no absolute gold standard. The proposed algorithm evaluates the different experts and also gives an estimate of the actual hidden labels. Experimental results indicate that the proposed method is superior to the commonly used majority voting baseline.
crowdsourcing  mechanicalturk  paper 
june 2009 by bpo
The Slab Allocator: An Object-Caching Kernel Memory Allocator - Bonwick (ResearchIndex)
Abstract: This paper presents a comprehensive design overview of the SunOS 5.4 kernel memory allocator. This allocator is based on a set of object-caching primitives that reduce the cost of allocating complex objects by retaining their state between uses.
kernel  memory  paper  solaris  linux  programming  memcache 
january 2008 by bpo
Empirical Analysis of Predictive Algorithms for Collaborative Filtering
An early example of model-based algorithms for CF, which proposes cluster models and Bayesian models.
cf  paper  algorithms  model 
october 2007 by bpo
Bootstrap Institute: Engelbart papers
Articles by Douglas C. Engelbart available in HTML
computing  engelbart  research  programming  history  toread  paper 
february 2007 by bpo
Papers... Your personal library of science
"Papers will revolutionize the way you deal with scientific papers. Search for papers using PubMed, directly retrieve and archive PDFs, and read and study them all from within Papers, your personal library of Science."
software  osx  mac  science  paper 
february 2007 by bpo
