rahuldave + opensource   8

Four short links: 2 May 2012
Punting on SxSW (Brad Feld) -- I came across this old post and thought: if you can make money by being a dick, or make money by being a caring family person, why would you choose to be a dick? As far as I can tell, being a dick is optional. Brogrammers, take note. Be more like Brad Feld, who prioritises his family and acts accordingly.
Probabilistic Structures for Data Mining -- readable introduction to useful algorithms and data structures, showing the trade-offs between their performance, reliability, and resource use. (via Hacker News)
Dataset -- a Javascript library for transforming, querying, manipulating data from different sources.
Many HTTPS Servers are Insecure -- 75% still vulnerable to the BEAST attack.
algorithms  bigdata  bradfeld  cs  culture  javascript  machinelearning  math  opensource  security  ssl  worklifebalance  from google
29 days ago by rahuldave
Profile of the Data Journalist: The Human Algorithm
Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted a series of email interviews during the 2012 NICAR Conference.

Ben Welsh (@palewire) is a Web developer and journalist based in Los Angeles. Our interview follows.


Where do you work now? What is a day in your life like?

I work for the Los Angeles Times, a daily
newspaper and 24-hour Web site based in Southern California. I'm a member
of the Data Desk, a team of reporters and
Web developers that specializes in maps, databases, analysis and
visualization. We both build Web applications and conduct analysis for
reporting projects.

I like to compare The Times to a factory, a factory that makes information.
Metaphorically speaking, it has all sorts of different assembly lines. Just
to list a few, one makes beautifully rendered narratives, another makes battleship-like investigative projects.

A typical day involves juggling work on different projects, mentally
moving from one assembly line to the other. Today I patched an embryonic open-source release, discussed our next move on a pending public records request, guided the real-time publication of results from the GOP primaries in Michigan and Arizona, and did some preparation for how we'll present a larger dump of results on Super Tuesday.

How did you get started in data journalism? Did you get any special
degrees or certificates?

I'm thrilled to see new-found interest in "data journalism" online. It's
drawing young, bright people into the field and involving people from
different domains. But it should be said that the idea isn't new.

I was initiated into the field as a graduate student at the Missouri School
of Journalism. There I worked at the National Institute for Computer-Assisted Reporting, also known as NICAR. Decades before anyone called it "data journalism," a disparate group of misfit reporters discovered that the data analysis made possible by computers enabled them to do more powerful investigative reporting. In 1989, they founded NICAR, which has, for decades, been teaching data skills
to journalists and nurturing a tribe of journalism geeks. In the time since, computerized data analysis has become a dominant force in investigative reporting, responsible for a large share of the field's best work.

To underscore my point, here's a 1986 Time magazine article about how
"newsmen are enlisting the machine."

Did you have any mentors? Who? What were the most important resources they
shared with you?

My first journalism job was in Chicago. I got a gig working for two great people there, Carol Marin and Don Moseley, who have spent most of their careers as television journalists. I worked as their assistant. Carol and Don are warm people who are good teachers, but they are also excellent at what they do. There was a moment when I realized, "Hey, I can do this!" It wasn't just something I heard about in class, but something I could actually see myself doing.

At Missouri, I had a great classmate named Brian
Hamman, who is now at the New York Times. I remember seeing how invested Brian was in the Web, totally committed to Web development as a career path. When an opportunity opened up to be a graduate assistant at NICAR, Brian encouraged me to pursue it. I learned enough SQL to help do farmed-out investigative work for TV stations. And, more importantly, I learned that if you had technical skills you could get the job to work on a cool story.

After that I got a job doing data analysis at the Center for Public Integrity in Washington DC. I had the opportunity to work on investigative projects, but also the chance to learn a lot of computer programming along the way. I had the guidance of my talented coworkers, Daniel Lathrop, Agustin Armendariz, John Perry, Richard Mullins and Helena Bengtsson. I learned that computer programming wasn't impossible. They taught me that if you have a manageable task, a few friends to help you out and a door you can close, you can figure out a lot.

What does your personal data journalism "stack" look like? What tools
could you not live without?

I do my daily development in gedit text editor, Byobu's slick implementation of the screen terminal and the Chromium browser. And, this part may be hard to believe, but I love Ubuntu
Unity. I don't understand what everybody is complaining about.

I do almost all of my data management in the Python Web development
framework Django and
PostgreSQL's database, even if
the work is an exploratory reporting project that will never be published. I find that the structure of the framework can be useful for organizing just about any data-driven project.

I use GitHub for both version-control and
project management. Without it, I'd be lost.

What data journalism project are you the most proud of working on or
creating?

As we all know, there's a lot of data out there. And, as anyone who works
with it knows, most of it is crap. The projects I'm most proud of have
taken large, ugly data sets and refined them into something worth knowing:
a nut graf in an investigative story, or a
data-driven app that gives the reader some new
insight into the world around them. It's impossible to pick one. I like to
think the best is still, as they say in the newspaper business,
TK.

Where do you turn to keep your skills updated or learn new things?

Twitter is a great way to keep up with what is getting other programmers excited. I know a lot of people find social media overwhelming or distracting, but I feel plugged in and inspired by what I find there. I wouldn't want to live without it.

GitHub is another great source. I've learned so much just exploring other
people's code. It's invaluable.

Why are data journalism and "news apps" important, in the context of the
contemporary digital environment for information?

Computers offer us an opportunity to better master information, better
understand each other and better watchdog those who would govern us. At last
week's NICAR conference, in a talk called "Human-Assisted Reporting," I tried
to describe some of the ways that simply thinking about the process of
journalism as an algorithm can point the way. In my opinion, we should aspire to write code that embodies the idealistic principles and investigative methods of the previous generation. There's all this data out there now, and journalistic algorithms, "robot
reporters," can help us ask it tougher questions.
Data  Gov_2.0  Publishing  dataconference  datajournalism  dataproduct  datascience  nicarinterview  opensource  programming  from google
march 2012 by rahuldave
Feature: How Red Hat killed its core product—and became a billion-dollar business
A decade ago, Linux developer Red Hat faced a decision that would make or break the company: whether to stop producing the very product that gave Red Hat its name. The company was built on Red Hat Linux, but when Paul Cormier—now the head of Red Hat's technologies and products group—joined the company as vice president of engineering in 2001, he knew Red Hat's devotion to open source alone couldn't create a business model capable of standing up to the Microsofts and Oracles of the world. He pushed for drastic action.

To move from small player to big-time enterprise software competitor, Cormier argued that Red Hat had to ditch the freely downloadable Red Hat Linux. Instead, it should replace Red Hat Linux with a more robust enterprise software package that maintained the principles of free (as in freedom) software without actually being free (as in price) to customers.
News  Features  News  News  Business  Open-source  linux  opensource  redhat  from google
february 2012 by rahuldave
Report from HIMSS: health care tries to leap the chasm from the average to the superb
I couldn't attend the session today on StealthVest--and small surprise. Who wouldn't want to come see an Arduino-based garment that can hold numerous health-monitoring devices in a way that is supposed to feel like a completely normal piece of clothing? As with many events at the HIMSS conference, which has registered over 35,000 people (at least four thousand more than last year), the StealthVest presentation drew an overflow crowd.

StealthVest sounds incredibly cool (and I may have another chance to report on it Thursday), but when I gave up on getting into the talk I walked downstairs to a session that sounds kind of boring but may actually be more significant: Practical Application of Control Theory to Improve Capacity in a Clinical Setting.

The speakers in this session, from Banner Gateway Medical Center in Gilbert, Arizona, laid out a fairly standard use of analytics to predict when the hospital units are likely to exceed their capacity, and then to reschedule patients and adjust provider schedules to smooth out the curve. The basic idea comes from chemical engineering, and requires them to monitor all the factors that lead patients to come in to the hospital and that determine how long they stay. Queuing theory can show when things are likely to get tight. Hospitals care a lot about these workflow issues, as Fred Trotter and David Uhlman discuss in the O'Reilly book Beyond Meaningful Use, and they have a real effect on patient care too.
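
To make the queuing-theory remark concrete, here is a minimal sketch in Python. It is not the Banner Gateway model, which the session summary does not describe. It assumes the textbook Erlang B loss formula, treats each day as an independent steady state, and uses made-up admission forecasts, bed counts, and thresholds purely for illustration.

```python
def erlang_b(offered_load: float, beds: int) -> float:
    """Steady-state probability that all beds are full (Erlang B formula)."""
    blocking = 1.0  # with zero beds every arrival is blocked
    for n in range(1, beds + 1):
        # Standard recursion: B(n) = a*B(n-1) / (n + a*B(n-1))
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def tight_days(arrivals_per_day: dict, mean_stay_days: float,
               beds: int, threshold: float) -> list:
    """Return the days whose estimated full-unit probability exceeds threshold."""
    flagged = []
    for day, arrivals in arrivals_per_day.items():
        load = arrivals * mean_stay_days  # average number of beds in demand
        if erlang_b(load, beds) > threshold:
            flagged.append(day)
    return flagged


if __name__ == "__main__":
    # Hypothetical forecast of admissions per day for one unit.
    forecast = {"Mon": 5.0, "Tue": 6.5, "Wed": 8.0}
    print(tight_days(forecast, mean_stay_days=3.0, beds=24, threshold=0.10))
```

Days returned by tight_days are candidates for rescheduling elective admissions. A real model would also have to account for the other factors mentioned above that drive arrivals and length of stay.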

The reason I find this topic interesting is that capacity planning leads fairly quickly to visible cost savings. So hospitals are likely to do it. Furthermore, once they go down the path of collecting long-term data and crunching it, they may extend the practice to clinical decision support, public health reporting, and other things that can make a big difference to patient care.

A few stats about data in U.S. health care

Do we need a big push to do such things? We sure do, and that's why meaningful use was introduced into HITECH sections of the American Recovery and Reinvestment Act. HHS released mounds of government health data on Health.data.gov hoping to serve a similar purpose. Let's just take a look at how far the United States is from using its health data effectively.

Last November, a CompTIA survey (reported by Health Care IT News) found that only 28% of providers have comprehensive EHRs in use, and another 17% have partial implementations. One has to remember that even a "comprehensive" EHR is unlikely to support the sophisticated data mining, information exchange, and process improvement that will eventually lead to lower costs and better care.

According to a recent Beacon Partners survey (PDF), half of the responding institutions have not yet set up an infrastructure for pursuing health information exchange, although 70% consider it a priority. The main problem, according to a HIMSS survey, is budget: HIEs are shockingly expensive. There's more to this story, which I reported on from a recent conference in Massachusetts.

Stats like these have to be kept in mind when weighing the achievements that HIMSS board chair Charlene S. Underwood extolled in the morning keynote. HIMSS has promoted good causes, but only recently has it addressed cost, interoperability, and open source issues that can allow health IT to break out of the elite of institutions large or sophisticated enough to adopt the right practices.

As signs of change, I am particularly happy to hear of HIMSS's new collaboration with Open Health Tools and their acquisition of the mHealth summit. These should guide the health care field toward more patient engagement and adaptable computer systems. HIEs are another area crying out for change.

An HIE optimist

With the flaccid figures for HIE adoption in mind, I met Charles Parisot, chair of Interoperability Standards and Testing Manager for EHRA, which is HIMSS's Electronic Health Records Association. The biggest EHR vendors and HIEs come together in this association, and Parisot was just stoked with positive stories about their advances.

His take on the cost of HIEs is that most of them just do it in a brute force manner that doesn't work. They actually copy the data from each institution into a central database, which is hard to manage from many standpoints. The HIEs that have done it right (notably in New York state and parts of Tennessee) are sleek and low-cost. The solution involves:

Keeping the data at the health care providers, and storing in the HIE only some glue data that associates the patient and the type of data to the provider.

Keeping all metadata about formats out of the HIE, so that new formats, new codes, and new types of data can easily be introduced into the system without recoding the HIE.

Breaking information exchange down into constituent parts--the data itself, the exchange protocols, identification, standards for encryption and integrity, etc.--and finding standard solutions for each of these.
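
The first point in the list above, an HIE that stores only "glue data", can be sketched as a small index. This is an illustrative assumption about what such glue data might look like, not a description of the New York or Tennessee systems or of any EHRA profile. The class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Glue data only: where a kind of record lives, never the record itself."""
    provider_id: str  # institution that holds the data
    data_type: str    # e.g. "lab_result" or "discharge_summary"


class RecordLocator:
    """Hypothetical HIE index mapping a patient to record pointers."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, patient_id: str, provider_id: str, data_type: str) -> None:
        # A provider announces that it holds this type of data for the patient.
        self._index[patient_id].add(Pointer(provider_id, data_type))

    def locate(self, patient_id: str, data_type: str = None) -> list:
        # The requester learns where to pull from; the clinical data itself
        # stays at the provider and is fetched separately.
        pointers = self._index.get(patient_id, set())
        if data_type is not None:
            pointers = {p for p in pointers if p.data_type == data_type}
        return sorted(pointers, key=lambda p: (p.provider_id, p.data_type))


if __name__ == "__main__":
    hie = RecordLocator()
    hie.register("patient-001", "clinic-a", "lab_result")
    hie.register("patient-001", "hospital-b", "discharge_summary")
    print(hie.locate("patient-001", "lab_result"))
```

Because the index knows nothing about document formats, a provider can introduce a new data type by registering a new data_type string, without any change to the index code.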

So EHRA has developed profiles (also known by their ONC term, implementation specifications) that indicate which standard is used for each part of the data exchange. Metadata can be stored in the core HL7 document, the Clinical Document Architecture, and differences between implementations of HL7 documents by different vendors can also be documented.

A view of different architectures in their approach can be found in an EHRA white paper, Supporting a Robust Health Information Exchange Strategy with a Pragmatic Transport Framework. As testament to their success, Parisot claimed that the interoperability lab (a huge part of the exhibit hall floor space, and a popular destination for attendees) could set up the software connecting all the vendors' and HIEs' systems in one hour.

I asked him about the simple email solution promised by the government's Direct project, and whether that may be the path forward for small, cash-strapped providers. He accepted that Direct is part of the solution, but warned that it doesn't make things so simple. Unless two providers have a pre-existing relationship, they need to be part of a directory or even a set of federated directories, and assure their identities through digital signatures.

And what if a large hospital receives hundreds of email messages a day from various doctors who don't even know to whom their patients are being referred? Parisot says metadata must accompany any communications--and he's found that it's more effective for institutions to pull the data they want than for referring physicians to push it.

Intelligence for hospitals

Finally, Parisot told me EHRA has developed standards for submitting data to EHRs from 350 types of devices, and has 50 manufacturers working on devices with these standards. I visited the iSirona booth as an example. They accept basic monitoring data such as pulses from different systems that use different formats, and translate over 50 items of information into a simple text format that they transmit to an EHR. They also add networking to devices that communicate only over cables. Outlying values can be rejected by a person monitoring the data. The vendor pointed out that format translation will be necessary for some time to come, because neither vendors nor hospitals will replace their devices simply to implement a new data transfer protocol.
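
As a rough illustration of this kind of format translation, here is a sketch in Python. Everything in it is assumed for the example: the two vendor input formats, the pipe-delimited output line, and the pulse range that triggers review are hypothetical and do not describe iSirona's product or any EHRA standard.

```python
# Assumed range outside which a reading is held for human review.
# Illustrative numbers only, not clinical guidance.
PLAUSIBLE_PULSE_BPM = (30, 220)


def parse_vendor_a(raw: dict) -> int:
    """Hypothetical format A: a dict such as {"hr": "72"}."""
    return int(raw["hr"])


def parse_vendor_b(raw: str) -> int:
    """Hypothetical format B: a string such as "PULSE,72"."""
    label, value = raw.split(",")
    if label.strip().upper() != "PULSE":
        raise ValueError("unexpected reading type: " + label)
    return int(value)


def normalize(device_id: str, pulse_bpm: int) -> tuple:
    """Return (text line for the EHR, flag asking a person to review it)."""
    low, high = PLAUSIBLE_PULSE_BPM
    needs_review = not (low <= pulse_bpm <= high)
    line = f"{device_id}|pulse|{pulse_bpm}|bpm"
    return line, needs_review


if __name__ == "__main__":
    print(normalize("bed-12", parse_vendor_a({"hr": "72"})))
    print(normalize("bed-07", parse_vendor_b("PULSE, 400")))
```

The sketch only flags an outlying value rather than dropping it, which leaves the decision to reject it with the person monitoring the data.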

For more about devices, I dropped by one of the most entertaining parts of the conference, the Intelligent Hospital Pavilion. Here, after a badge scan, you are somberly led through a series of locked doors into simulated hospital rooms where you get to watch actors in nursing outfits work with lifesize dolls and check innumerable monitors. I think the information overload is barely ameliorated and may be worsened by the arrays of constantly updated screens.

But the background presentation is persuasive: by attaching RFIDs and all sorts of other devices to everything from people to equipment, and basically making the hospital more like a factory, providers can radically speed up responses in emergency situations and reduce errors. Some devices use the ISM "junk" band, whereas more critical ones use dedicated spectrum. Redundancy is built in throughout the background servers.

Waiting for the main event

The US health care field held its breath most of last week, waiting for Stage 2 meaningful use guidelines from HHS. The announcement never came, nor did it come this morning as many people had hoped. Because meaningful use is the major theme of HIMSS, and many sessions were planned on helping providers move to Stage 2, the delay in the announcement put the conference in an awkward position.

HIMSS is also nonplussed over a delay in another initiative, the adoption of a new standard in the classification of disease and procedures. ICD-10 is actually pretty old, having been standardized in the 1980s, and the U.S. lags decades behind other countries in adopting it. Advantages touted for ICD-10 are:

It incorporates newer discoveries in medicine than the dominant standard in the U.S., ICD-9, and therefore permits better disease tracking and treatment.

Additionally, it's much more detailed than ICD-9 (with an order of magnitude more classifications). This allows the recording of more information but complicates the job of classifying a patient correctly.

ICD-10 is rather controversial. Some people would prefer to base clinical decisions on SNOMED, a standard described in the Beyond Meaningful Use book mentioned earlier. Ultimately, doctors lobbied hard against the HHS timeline for adopting ICD-10 because providers are so busy with meaningful use. (But of course, the goals of adopting meaningful use are closely tied to the goals of adopting ICD-10.) It was the pushback from these institutions that led HHS to accede and announce a delay. HIMSS and many of its members were disap[…]
Data  Gov_2.0  americanrecoveryandreinvestmentact  arra  ehrs  electronichealthrecords  freesoftware  healthcare  healthit  himss  hitech  interoperability  meaningfuluse  medical  opensource  from google
february 2012 by rahuldave
Four short links: 28 December 2011
Terrier IR -- open source (Mozilla) text search engine, now with Hadoop support.
s3ql -- open source (GPLv3) Linux filesystem which stores its data on Google Storage, Amazon S3, or OpenStack. (via Adam Shand)
Esprima -- open source (BSD) fast Javascript parser in Javascript. (via Javascript Weekly)
Hogan.js -- open source (Apache) Javascript templating engine from Twitter. If it proves anywhere near as good as Bootstrap, it'll be heavily used.
cloud  javascript  opensource  programming  search  storage  textanalysis  web  from google
december 2011 by rahuldave
Google Cloud Print: coming to a wireless device near you
The question of how to print from wireless devices has been thrust once again into the limelight recently thanks to the printing-anemic iPad. Longtime notebook and mobile device users are quite familiar with the printing conundrum—cables, drivers and all.

Google has announced that it's looking to address this problem in the form of Cloud Print. Part of the Chromium and Chromium OS projects, Cloud Print aims to allow any type of application to print to any printer. This includes Web, desktop, and mobile apps from any kind of device—potentially, this could be used on a BlackBerry, Windows machines, Macs, or even the iPad. (That is in addition to Google's own offerings: "Google Chrome OS will use Google Cloud Print for all printing. There is no print stack and there are no printer drivers on Google Chrome OS!" says the company.)
News  News  News  News  Gadgets  Open-source  Web  api  cloud  cloudprint  google  internet  network  opensource  printing  from google
april 2010 by rahuldave
Four short links: 5 April 2010
Wrong about the iPad (Tim Bray) -- I am actively ignoring the iPad drivel, but this line caught my eye: Intelligence is a text-based application.
Fertile Medium -- online community consultancy, from the first and former Flickr community coordinator. One to watch: Heather and Derek really know their community. Again I say it: understanding of how open source and other collaborative communities can function is rare and valuable. (via waxy)
pigz -- parallel gzip implementation. Voom voom, so fast! (via kellan on Delicious)
Prefab: What If We Could Modify Any Interface? -- screen-scraping for GUIs to bolt on new functionality to user interfaces. This is incredible. Watch the demo, it's impressive!
brains  community  hacks  opensource  programming  ui  from google
april 2010 by rahuldave
