Vaguery + open-science   32

Review of 2011 Data Scientist Summit | (R news & tutorials)
This was the first annual Data Scientist Summit, and I will no doubt be back. With that said, discussion of technical topics had a bit of an introductory flavor, which made the discussion of the technology seem dated. For example, “Vanilla” Hadoop was introduced as a tool for processing vast amounts of data. I would expect that most Data Scientists have worked with Hadoop, or at least know what it is. Hadoop is somewhat old news in terms of “cutting-edge technology.” Tools like Pig, Cascalog, HBase, Hive, Cascading, etc. would have been a better discussion topic. I was also disappointed with how little coverage of tools (except for Hadoop, NoSQL, and enterprise databases) there was. It seemed as if R had gone M.I.A. and I was surprised that there was so little discussion of visualization tools like Tableau, Processing, Gephi, D3, Polymaps, etc.
data-science  conference  academic-culture  cultural-assumptions  corporatism  open-science 
may 2011 by Vaguery
Walking Randomly » Natural Scientists: their very big output files – and a tale of diffs
"A few years back, when a user at the University of Manchester asked for help with the ‘diff – files too big/ out of memory’ problem, I wrote a modern version that I called idiffh (for Ian’s diffh). My ground rules were:
Work on any text files on any operating system with a C compiler
Have no limits on, e.g., line lengths or file size
Never ‘give up’ if the going gets tough (i.e. when the files are very different)"
diff  text-mining  dataset  open-science  tools  from delicious
april 2011 by Vaguery
Beekeeper Who Leaked EPA Documents: "I Don't Think We Can Survive This Winter" | Fast Company
""They told me that EPA scientists had reviewed the original lifecycle study and determined it wasn't scientifically sound, and I asked if it had been documented, if there was a hard copy," he says, "The [employee] said yes, and I asked if I could get a copy." And just like that, he had the proof he needed that the EPA had overlooked something that could be killing America's bees."
astroturf  corporatism  pesticides  ecology  science  open-science  lawsuit 
december 2010 by Vaguery
» Open Data citation advantage Circle of Complexity
"Because sharing data resulted in a citation, I wonder how long it will take for Open Data advocates to start using this “open data citation advantage” as an argument for sharing data."
citation-etiquette  economics  open-access  open-science  open-data  social-engineering  academic-culture 
august 2010 by Vaguery
Getting Started Guide - Google Prediction API - Google Code
"The Prediction API allows you to get more from your data and makes its patterns more accessible. Specifically, the Prediction API leverages Google's machine learning infrastructure to give you the tools to better analyze your data and reveal patterns that are often difficult to manually discover. The API also enables you to use those patterns to predict new outcomes, which facilitates the development of all types of software, from textual analysis systems to recommendation systems. Because the Prediction API is a RESTful HTTP service, you can easily access it from Google App Engine, Apps Script, and other Internet-connected desktop applications."
nudge  machine-learning  models  google  prediction  clustering  learning-from-data  AI  API  open-science 
may 2010 by Vaguery
An article attacking R gets responses from the R blogosphere – some reflections | (Articles about R)
"But Dr. De Mars’ post is (very) important for a different reason. Not because her claims are true or false, but because her writing angered people who love and care for R (whether legitimately or not, it doesn’t matter). Anger, being a very powerful emotion, can reveal interesting things. In our case, it just showed that R bloggers are connected to each other."
R  community  open-science  statistics  criticism-is-the-best-medicine 
april 2010 by Vaguery
Deluge of scientific data needs to be curated for long-term use
"Most organizations have serious problems with data management because it's expensive to do systematic curation, which includes documenting the context in which data were generated or derived, including the instruments involved, the protocols and such," Palmer said. "But that also requires caring for the data and making them available to other scientists. It takes serious commitment and investment."
curation  data  data-warehousing  openness  open-science  challenges 
february 2010 by Vaguery
Keeping computers from ending science's reproducibility
"The idea is that the researchers that rely on computational techniques as part of their day-to-day activities need an entire "reproducible research system" that will make it easier for them to document the sources of their data and the analyses performed on it. The system they've designed shares features with rapid application development environments, as it graphically represents modular computational tools, which can be ordered to create an analysis pipeline, and the individual settings for each can be tweaked. Once complete, the user can trigger the analysis to run; the system documents all of the relevant settings and software information."
agility  open-science  reproducibility  academic-culture  academics-shouldn't-design-interfaces  arguments-against-interns 
january 2010 by Vaguery
[0911.0454] The Financial Bubble Experiment: advanced diagnostics and forecasts of bubble terminations
"We continue this protocol until the future date (1 May 2010) at which time we upload our final version of the master document. For this final version, we include the URL of a web site where the .pdf documents of all of our past forecasts can be downloaded and independently checked for consistent MD5 and SHA-2 hashes. For convenience, we will include a summary of all of our forecasts in this final document."
prediction  economics  financial-crisis  finance  science  open-science  competition  public-policy 
december 2009 by Vaguery
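The hash check mentioned in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the helper names are hypothetical, and SHA-256 is assumed as the SHA-2 variant because the excerpt says only "SHA-2".

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return MD5 and SHA-256 hex digests for a block of bytes.

    SHA-256 is an assumed choice; the excerpt does not say which
    member of the SHA-2 family was published.
    """
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def file_digests(path: str) -> dict:
    """Compute both digests for a downloaded file, e.g. a forecast .pdf."""
    with open(path, "rb") as handle:
        return digests(handle.read())


def matches_published(path: str, published: dict) -> bool:
    """True only if both digests equal the previously published values."""
    return file_digests(path) == published
```

A reader would call `matches_published` with the downloaded file and the digests announced in advance; a mismatch in either digest means the file is not the one originally committed to.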
About the Open Cloud Consortium
"The Open Cloud Consortium (OCC) is a member driven organization that:

supports the development of standards for cloud computing and frameworks for interoperating between clouds;
develops benchmarks for cloud computing;
supports reference implementations for cloud computing, preferably open source reference implementations;
manages a testbed for cloud computing called the Open Cloud Testbed;
sponsors workshops and other events related to cloud computing."
cloud-computing  nudge  standards  openness  open-science  grid-computing 
october 2009 by Vaguery
"Essentials of Metaheuristics"
"About the Book: This is an open set of lecture notes on metaheuristics algorithms, intended for undergraduate students, practitioners, programmers, and other non-experts. It was developed as a series of lecture notes for an undergraduate course I taught at GMU. The chapters are designed to be printable separately if necessary. As it's lecture notes, the topics are short and light on examples and theory. It's best when complementing other texts. With time, I might remedy this."
metaheuristics  genetic-programming  book  open-source  open-science  creative-commons  computer-science  search  optimization  genetic-algorithm  stochastic 
august 2009 by Vaguery
Infochimps.org: Free Redistributable Data Sets of Every Kind
"There are many sources to find out something about everything. Until now, there’s been no good place for you to find out everything about something.
The infochimps.org community is assembling and interconnecting the world's best repository for raw data -- a sort of giant free almanac, with tables on everything you can put in a table. Built by data nerds, used by data nerds, it's a central source for the information you need to power the projects the world needs."
data  data-analysis  openness  open-science  public-domain  information  visualization  archive  database  free  raw-data-now 
april 2009 by Vaguery
myGrid » What is a workflow?
"In a scientific context what does this mean? The overall project referred to is your analysis. The activities are simple operations within your analysis. All these operations have a certain number of inputs and outputs. In the case of fetching a DNA sequence, an input may be an identifier of the sequence, whilst the output is a string representing the nucleotide sequence represented by this identifier.
The triggering of activities by other activities is where an operation feeds data into a subsequent operation. For example, the ‘fetch sequence’ operation may feed its output (the string containing sequence ‘ACTG’) into a ‘transcribe’ operation. This would subsequently change the DNA sequence into an RNA sequence. We would then have a simple workflow with one operation, and a link, which looks something like the following:..."
open-science  science  collaboration  modeling  work  communication  formalization 
january 2009 by Vaguery
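The ‘fetch sequence’ to ‘transcribe’ link in the myGrid excerpt above might be sketched like this. It is a toy illustration only: the function names, the in-memory lookup table, and the identifier "demo-id" are assumptions, and transcription is simplified to replacing T with U.

```python
# Hypothetical stand-in for a sequence database; a real workflow would
# call a remote service here.
SEQUENCES = {"demo-id": "ACTG"}


def fetch_sequence(identifier: str) -> str:
    """Operation 1: input is an identifier, output is a nucleotide string."""
    return SEQUENCES[identifier]


def transcribe(dna: str) -> str:
    """Operation 2: turn a DNA string into RNA (simplified: T becomes U)."""
    return dna.upper().replace("T", "U")


def run_workflow(identifier: str) -> str:
    """The link: the output of fetch_sequence feeds the input of transcribe."""
    return transcribe(fetch_sequence(identifier))


print(run_workflow("demo-id"))  # prints ACUG
```

Each operation has explicit inputs and outputs, so the workflow itself is just the wiring between them.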
The Back Page
"Wikipedia is a second example where scientists have missed an opportunity to innovate online. Wikipedia has a vision statement to warm a scientist’s heart: “Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.” You might guess Wikipedia was started by scientists eager to collect all of human knowledge into a single source. In fact, Wikipedia’s founder, Jimmy Wales, had a background in finance and as a web developer. In the early days few established scientists were involved. To contribute would arouse suspicion from colleagues that you were wasting time that could be spent writing papers and grants."
openness  open-science  publishing  cultural-norms  collaboration  transparency  wikinomics 
december 2008 by Vaguery
Opinion - My View: What's so wasteful about funding discovery? - sacbee.com
"Not all science needs to have a purpose. The nature of humans is that, sometimes, they simply want to know. Everything else is just a bonus.

Srinivasa Ramanujan and Albert Einstein, the two scientific geniuses of the 20th century, made their earliest discoveries while working as clerks, not as professors working on taxpayer-funded projects; but why risk, in the 21st century, that some diamond might remain forever unearthed for want of a government grant?"
science  politics  academia  basic-science  funding  government  grants  anti-intellectualism  open-science  cultural-norms 
october 2008 by Vaguery
Open Reading Frame
Discovery is the addiction that drives research -- it's the crackpipe hit, the rush, the thrill, that keeps us going through the down times and the plodding; but one of the best ways to alleviate the boredom and despondency that sets in between fixes is t
collaboration  science  open-access  open-science  academia  cultural-norms  learning-by-doing  blogs  community 
july 2007 by Vaguery
Open Reading Frame
"The real killer is ego: what if someone else gets there first?"
open-access  open-science  commentary  academia  cultural-norms  fear-uncertainty-doubt  FUD  blogging  competitiveness 
july 2007 by Vaguery
Open Reading Frame
Catching up on old posts of a newly discovered blog: open-access peer reviewers' comments. Good idea.
openness  open-science  collaboration  peer-review  academia  publishing  authority  comments 
july 2007 by Vaguery

