Vaguery + text-mining   4

ashleyw/phrasie - GitHub
Determines important terms within a given piece of content. It uses linguistic tools such as Parts-Of-Speech (POS) and some simple statistical analysis to determine the terms and their strength.
Ruby  library  tagging  natural-language-processing  NLP  statistics  text-mining 
may 2011 by Vaguery
Walking Randomly » Natural Scientists: their very big output files – and a tale of diffs
"A few years back, when a user at the University of Manchester asked for help with the ‘diff – files too big/ out of memory’ problem, I wrote a modern version that I called idiffh (for Ian’s diffh). My ground rules were:<br />
Work on any text files on any operating system with a C compilerHave no limits on, e.g., line lengths or file sizeNever ‘give up’ if the going gets tough (i.e. when the files are very different)"
diff  text-mining  dataset  open-science  tools  from delicious
april 2011 by Vaguery
CASS
"In the social sciences, it is useful to understand the relative similarities of concepts that are embedded in a particular text (from a particular group or a particular person). For example, in trying to estimate conservative bias in FoxNews, one might estimate its tendency to associate conservative concepts (conservative, republican) and good concepts (good, positive, etc.), compared to conservative and bad concepts. The output would indicate conservative favoritism. This comparison could be further refined by taking into account important "baseline" information about the valences associated with liberal, namely liberal and good in comparison to liberal and bad.…"
text-mining  natural-language-processing  data-mining  machine-learning  Ruby  library 
june 2010 by Vaguery
[1005.5516] On the Fly Query Entity Decomposition Using Snippets
"One of the most important issues in Information Retrieval is inferring the intents underlying users' queries. Thus, any tool to enrich or to better contextualized queries can proof extremely valuable. Entity extraction, provided it is done fast, can be one of such tools. Such techniques usually rely on a prior training phase involving large datasets. That training is costly, specially in environments which are increasingly moving towards real time scenarios where latency to retrieve fresh informacion should be minimal. In this paper an `on-the-fly' query decomposition method is proposed. It uses snippets which are mined by means of a na\"ive statistical algorithm. An initial evaluation of such a method is provided, in addition to a discussion on its applicability to different scenarios."
search-engines  natural-language-processing  algorithms  nudge-targets  text-mining 
june 2010 by Vaguery
