Cold-start-simulator - Apache OpenOffice.org Wiki
sync ; echo 3 | sudo tee /proc/sys/vm/drop_caches
linux  flush  drop  cache 
25 days ago
Mosh: the mobile shell
Remote terminal application that allows roaming, supports intermittent connectivity, and provides intelligent local echo and line editing of user keystrokes.

Mosh is a replacement for SSH. It's more robust and responsive, especially over Wi-Fi, cellular, and long-distance links.

Mosh is free software, available for GNU/Linux, FreeBSD, and Mac OS X.
shell  ssh  terminal  linux  roaming  mobile  local  echo  line  editing 
25 days ago
Learning to Classify Text using Support Vector Machines
Learning To Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers, including training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. In addition, it includes an overview of the field of text classification, making it self-contained even for newcomers to the field. This book gives a concise introduction to SVMs for pattern recognition, and it includes a detailed description of how to formulate text-classification tasks for machine learning.
joachims  svm  book  toc 
25 days ago
Rada Mihalcea: Downloads
Texts semantically annotated with WordNet 1.6 senses (created at Princeton University), and automatically mapped to WordNet 1.7, WordNet 1.7.1, WordNet 2.0, WordNet 2.1, WordNet 3.0
semcor  brown  wordnet  semantic  sense  annotated  corpus  download 
25 days ago
RIES - Find Algebraic Equations, Given Their Solution at MROB
ries (or RIES, an acronym for RILYBOT Inverse Equation Solver) takes any number and produces a list of equations that approximately solve to that number, like the following example
equation  formula  approximation  solve  solution  xkcd 
25 days ago
Research: Text Segmentation and Classification in Email Messages
Our dataset consists of 11881 annotated lines from almost 400 email messages drawn at random from the Enron email corpus. We use the database dump of the Enron corpus (219Mb) released by Andrew Fiore and Jeff Heer. This version of the corpus has been processed to remove duplicate messages and to normalise sender and recipient names, resulting in just over 250,000 email messages. No attachments are included. Our annotations are made by a single annotator.
email  functional  zones  data  dataset 
25 days ago
Analyzing Microtext: Related Work
This is a partial list of work related to the AAAI-11 Workshop on Analyzing Microtext.
microtext  twitter  articles  index  list  bibliography 
25 days ago
ESSLLI X course on Inductive Language Learning
The roots of ILL can be traced back to De Saussure (1916) and Bloomfield (1933): language tasks can be learned and performed by employing analogy and induction on relations between language elements. Chomsky strongly criticised the bluntness of the analogy/induction approach, capitalising on its inability to capture relations involving meaning. In ILL, the pre-Chomskyan ideas on analogy and induction are implemented on present-day computer technology, using general-purpose inductive-learning tools developed in machine learning to explore the range of language tasks that can be learned successfully. After giving the historical background and an introduction to supervised inductive machine learning, the course will provide an overview of methods, techniques, and empirical results showing that ILL is successful in morpho-phonology, and (more surprisingly) successful in several higher-level language tasks (e.g., POS tagging, PP-attachment).
language  induction 
6 weeks ago
Chinese Grammar Wiki
Resources for Chinese grammar have been scattered, incomprehensible, and hidden behind paywalls for way too long. Wikipedia has shown us a better way, and we have taken up the challenge of making Chinese grammar learner-friendly and accessible to all. We've already got 506 articles online, and we're growing!
chinese  grammar  language  wiki 
6 weeks ago
Aaron's Twitter Viewer
I made this little program so you can view and link to a whole conversation from Twitter in context. Here's an example! Want one for your conversation? Just enter the URL of an individual tweet (it should contain a long number):
twitter  conversation  thread  view  link 
6 weeks ago
SVMsequel
SVMsequel is a complete environment for training and using support vector machines. Some familiarity with kernel methods will be helpful (see here for class notes I've used for using SVMs for natural language processing if you need a refresher).
svm  string  kernel  opensource 
6 weeks ago
Hangul - Wikipedia, the free encyclopedia
Numerous linguists have praised Hangul for its featural design, describing it as "remarkable", "the most perfect phonetic system devised", and "brilliant, so deliberately does it fit the language like a glove."[23] The principal reason Hangul has attracted this praise is its partially featural design: The shapes of the letters are related to the features of the sounds they represent: The letters for consonants pronounced in the same place in the mouth are built on the same underlying shape. In addition, vowels are made from vertical or horizontal lines so that they are easily distinguishable from consonants.
korean  alphabet  letter  design  feature  featural  syllable  orthography 
6 weeks ago
Latino sine Flexione - Wikipedia, the free encyclopedia
Latino sine flexione ("Latin without inflections"), or Peano’s Interlingua (abbreviated as IL), is an international auxiliary language invented by the Italian mathematician Giuseppe Peano (1858–1932) in 1903. It is a simplified version of Latin, and retains its vocabulary. It was published in the journal Revue de Mathématiques, in an article entitled De Latino Sine Flexione, Lingua Auxiliare Internationale,[1] which explained the reason for its creation. The article argued that other auxiliary languages were unnecessary, since Latin was already established as the world’s international language. The article was written in classical Latin, but it gradually dropped its inflections until there were none.
latin  language  grammar  gradual  change  transition  alter  inflections 
6 weeks ago
TEL :: [tel-00192620, version 1] Gouttes rebondissantes: une association onde-particule à échelle macroscopique
A drop placed at the surface of the same liquid coalesces within a few tenths of a second. Vibrating the bath of liquid on which the drop is placed can inhibit this process. The drop will then be able to bounce at the surface of the liquid for an unlimited time. In this thesis, we use liquids of medium to low viscosity. A bouncing drop then emits a wave at the surface of the liquid at each bounce. Those drops spontaneously organize themselves in bound states or in clusters. Just below the Faraday instability threshold, a remarkable phenomenon occurs when the drop undergoes a drift bifurcation and starts moving horizontally at the surface of the liquid, acquiring a constant horizontal velocity. We call such drops walkers. We have studied this transition from a steady bouncing drop to a walker and described it theoretically. A walker never collides directly with one of the cell's walls but, via its own waves and the waves emitted at the boundaries, is repelled and undergoes a reflection. Thus in certain situations the drop can have a billiard-like motion in the cell. We have also observed the various collisions (always via their waves) of several walkers moving across the cell. The attractive collision of two walkers leads to the orbiting motion of the two drops. The size of the orbits can take a series of discrete values, which can be explained by the interaction of the drops via the interferences created by their associated waves. We also discuss the differences and similarities between these new objects and localized structures observed in various 2D dissipative systems such as oscillons in fluids and granular materials or cavity solitons in optics.
quantum  macroscopic  classical  drop  wave  interference 
6 weeks ago
HaLVM
The Haskell Lightweight Virtual Machine (or, informally, the HaLVM) is a port of the GHC runtime system for Haskell to barebones Xen. This means that Haskell programs written for the HaLVM run natively on Xen, without any intervening operating system, which allows them to boot quickly and use very little space.
haskell  vm  xen  kernel  virtualization 
8 weeks ago
Introduction — gensim
Gensim is a free Python framework designed to automatically extract semantic topics from documents, as efficiently (computer-wise) and painlessly (human-wise) as possible.

Gensim aims at processing raw, unstructured digital texts (“plain text”). The algorithms in gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation or Random Projections, discover semantic structure of documents, by examining word statistical co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised, which means no human input is necessary – you only need a corpus of plain text documents.

Once these statistical patterns are found, any plain text documents can be succinctly expressed in the new, semantic representation, and queried for topical similarity against other documents.
python  library  lsi  lsa  lda  topic  analysis  model  modelling  latent  semantic  dirichlet 
8 weeks ago
SimString - A fast and simple algorithm for approximate string matching/retrieval
SimString is a simple library for fast approximate string retrieval. Approximate string retrieval finds strings in a database whose similarity with a query string is no smaller than a threshold. Finding not only identical but similar strings, approximate string retrieval has various applications including spelling correction, flexible dictionary matching, duplicate detection, and record linkage.
library  text  python  approximate  fuzzy  string  match  matching  search  similar  edit  distance 
8 weeks ago
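Note on the SimString entry above: the idea of returning every string whose similarity to a query is no smaller than a threshold can be sketched in a few lines of Python. This is a brute-force illustration only, not SimString's algorithm or API. The function names (`char_ngrams`, `cosine`, `retrieve`), the padded character trigrams, and the cosine measure over n-gram sets are all assumptions made for the example.

```python
import math


def char_ngrams(s, n=3):
    """Return the set of character n-grams of s, padded with spaces (assumed scheme)."""
    padded = " " * (n - 1) + s + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(a, b):
    """Cosine similarity between two n-gram sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def retrieve(query, database, threshold=0.6, n=3):
    """Return every database string whose similarity to query is >= threshold.

    Scans the whole database; a real library would use an index instead.
    """
    q = char_ngrams(query, n)
    return [s for s in database if cosine(q, char_ngrams(s, n)) >= threshold]


if __name__ == "__main__":
    print(retrieve("apple", ["apple", "aple", "banana"]))
```

A linear scan like this is only practical for small word lists; it is meant to show the thresholded-similarity query, not how to make it fast.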
He Sells Shell Scripts to Intersect Sets
In this article we shall demonstrate how simple shell scripts can be used to implement sets, providing one line recipes for set creation, set union, set intersection and more. Having explored the power of the Unix shell we’ll consider its limitations, before finally discussing the more general lessons we can learn from the Unix tools.
unix  shell  set  sets  intersection  comm  common  bash  cli  scripting 
8 weeks ago
Silvia Quarteroni - Resources
YourQA Corpora

These are the TREC'01 non-factoid answer corpora used in the following papers
question  answering  data  dataset 
8 weeks ago
Training the Cloud with the Crowd: Training A Google Prediction API Model Using CrowdFlower’s Workforce | Dialogue Earth
Can a machine be taught to determine the sentiment of a Twitter message about weather? The goal was to use data from over 1 million crowdsourced human judgements to train a predictive model, and then to use that machine learning system to make judgements. Below are the highlights from the research and development of a machine learning model in the cloud that predicts the sentiment of text regarding the weather. The following are the major technologies used in this research: Google Prediction API, CrowdFlower, Twitter, Google Maps.
twitter  sentiment  analysis  crowd  sourcing  crowdflower  weather  google  prediction  api  machine-learning 
8 weeks ago
Bayesian parameter tuning for SVMs
We interpret the SVM algorithm as the maximum a posteriori solution to a Bayesian inference problem. It is then natural to select hyperparameters to maximize the evidence, i.e. the overall likelihood of the observed data. The key advantage is that the evidence is a continuous function of the hyperparameters, and so can be optimized by e.g. gradient ascent. We have tested this method on a number of standard data sets and found very encouraging results. For details and background references, see the papers on SVMs in my publications list.
bayesian  svm  parameter  hyperparameter  tuning  software 
10 weeks ago
Eli Bendersky's website » Python internals: adding a new statement to Python
This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I’m taking a hands-on approach here: I’m going to add an until statement to Python.
python  internals  add  new  statement  syntax  howto  parser  grammar 
10 weeks ago
Pronunciation Guide to Mathematicians
These are mathematicians frequently encountered by undergraduate students. For a more complete listing, A History of Mathematics: An Introduction, by Victor Katz (Harper Collins) has a wonderful pronunciation guide as part of the index. Interested persons are urged to consult this text.
mathematics  language  pronunciation  names  lists  cauchy  dirichlet 
12 weeks ago
Corpus Resources
On-line corpora with concordancers
Free corpora for downloading
Subscription corpora and tools
On-line corpora of web texts
Transcribed spoken English (some audio)
Web as Corpus (corpora on the fly)
Concordancers and Tools
corpus  linguistics  tools  analysis  software  list  index  language  english  annotated  bnc 
12 weeks ago
David M. Blei
Much of my research is in topic models, which are a suite of algorithms to uncover the hidden thematic structure of a collection of documents. These algorithms help us develop new ways to search, browse and summarize large archives of texts.

Below, you will find links to introductory materials, corpus browsers based on topic models, and open source software (from my research group) for topic modeling.
machinelearning  topic  modelling  papers  software  implementation  list  index 
12 weeks ago
PDF bookmarks with Ghostscript
I've bundled the whole pdfmarks-generation bit into a script, pdf-merge.py, which generates the pdfmark file and runs Ghostscript automatically. Think of it as a bookmark-preserving version of pdftk's cat. The script uses pdftk internally to extract bookmark information from the source PDFs.
pdf  merge  join  cat  concatenate  combine  bookmarks  contents  python  script  pdftk  ghostscript 
february 2012
FedoraForum.org - View Single Post - [SOLVED] internal microphone not working
It's probably muted. Open the terminal and start up the alsamixer:

Code:
alsamixer -c0 -Vcapture
Select the recording input device - choose between Mic and Front Mic to get sound in Skype. Use up and down arrows...
fedora  microphone  mic 
february 2012
Appendix:English irregular nouns - Wiktionary
The table below lists English words that have irregular plurals.
english  language  linguistics  word  list  examples  noun  plural  irregular  inflection 
february 2012
Appendix:English irregular verbs - Wiktionary
This is a list of irregular verbs in the English language. The citation form (the infinitive) comes first (with a link to the Wiktionary article on the verb), together with the present tense forms when they are different, then the preterite or simple past, and finally the past participle. The right hand column notes whether they are weak or strong and whether they belong to a subclass, and links to discussions elsewhere. Typical irregularities in weak verbs are the assimilation of dentals (bended → bent) and vowel reduction (*keeped → kept).
english  language  linguistics  word  list  examples  verb  irregular  inflection 
february 2012
Appendix:List of English copulae - Wiktionary
This is a list of English copulae. Because many of these copulative verbs may be used non-copulatively, examples are provided.
english  language  linguistics  word  list  examples  copula 
february 2012
Appendix:English catenative verbs - Wiktionary
Catenative verbs are verbs which can be followed directly by another verb — variously in the to-infinitive, bare infinitive or present participle/gerund forms. For example He deserves to win the cup, where deserve is a catenative verb which can be followed directly by another verb, in this case in the to-infinitive form.
Most of these verbs demand that the following verb be in one or the other form only. A few can take both forms, but sometimes there is a difference in meaning.
They are called catenative from their ability to form chains. We promised to agree to try practicing playing tennis more often.
english  language  linguistics  word  list  verb  type  catenative  examples 
february 2012
Duke University | Duke University Slavic Centers : Reference Grammars
This set of reference grammars has been designed for advanced-level language users and linguists to compare semantic categories across languages. Each grammar also provides background information about the language and its speakers. In some cases the author has included a topic which provides greater illumination of the language (e.g. tongue twisters, slang/profanity, or a set of exemplary texts). These sections are indicated by a shaded background in the left-side navigation pane.
reference  grammar  linguistics 
february 2012
List of symbols - Apertium
Eventually this will be a glossary of symbol names in alphabetical order with notes. Some of these names are specific to particular packages or language pairs, as not all languages have the same grammatical features (most don't have spatial distinction in articles for example).
apertium  morphology  symbols  abbreviations  gloss 
february 2012
Dept. of Linguistics | Resources | Glossing Rules
The Leipzig Glossing Rules have been developed jointly by the Department of Linguistics of the Max Planck Institute for Evolutionary Anthropology (Bernard Comrie, Martin Haspelmath) and by the Department of Linguistics of the University of Leipzig (Balthasar Bickel). They consist of ten rules for the "syntax" and "semantics" of interlinear glosses, and an appendix with a proposed "lexicon" of abbreviated category labels. The rules cover a large part of linguists' needs in glossing texts, but most authors will feel the need to add (or modify) certain conventions (especially category labels). Still, it will be useful to have a standard set of conventions that linguists can refer to, and the Leipzig Rules are proposed as such to the community of linguists. The Rules are intended to reflect common usage, and only very few (mostly optional) innovations are proposed.
language  linguistics  interlinear  gloss 
february 2012
WALS - The World Atlas of Language Structures
The World Atlas of Language Structures (WALS) is a large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of 55 authors (many of them the leading authorities on the subject).
language  linguistics  reference 
february 2012
Tatoeba: Collecting example sentences
At its core, Tatoeba is a large database of example sentences translated into several languages. But as a whole, it is much more than that.
language  translation  parallel  corpus  sentences  linguistics 
january 2012
Wiktionary/Look Up tool - Meta
The Wiktionary Look Up tool is a javascript which allows readers to simply double-click on a term to receive a pop-up with the top definition for the term, without leaving the page they are reading. It's particularly useful when visiting a site not in your first language, but great whenever you run across a term you're not quite certain about.
wiktionary  dictionary  lookup  tool  bookmarklet  javascript  popup  definition  translation 
january 2012
Graphical Function Explorer grapher (GFE) - Math Open Reference
GFE is a free web-based function graphing tool that allows you to plot up to three functions on the same set of axes. In the functions you can refer to up to four independent variables that are controlled by sliders. This allows you to easily see the effect of changes since the graphs change in real time as you drag the sliders.
plot  function  free  variable  slider  manipulate 
january 2012
SpecGram—New speech disorder linguists contracted discovered!—Yreka Bakery
An apparently new speech disorder a linguistics department our correspondent visited was affected by has appeared. Those affected our correspondent a local grad student called could hardly understand apparently still speak fluently. The cause experts the LSA sent investigate remains elusive. Frighteningly, linguists linguists linguists sent examined are highly contagious. Physicians neurologists psychologists other linguists called for help called for help called for help didn’t help either. The disorder experts reporters SpecGram sent consulted investigated apparently is a case of pathological center embedding.
centre  embedding  linguistics  grammar  humour  satire  pathological 
november 2011
Golomb ruler - Wikipedia, the free encyclopedia
In mathematics, a Golomb ruler is a set of marks at integer positions along an imaginary ruler such that no two pairs of marks are the same distance apart.
distinct  unique  distance 
november 2011
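Note on the Golomb ruler entry above: the defining property (no two pairs of marks the same distance apart) is easy to check directly. The following is a minimal sketch; the function name `is_golomb_ruler` is made up for this note, and the check is a plain pairwise comparison rather than anything taken from the linked article.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if all pairwise distances between the marks are distinct."""
    # Marks form a set, so repeated positions are rejected outright.
    if len(set(marks)) != len(marks):
        return False
    distances = [abs(a - b) for a, b in combinations(marks, 2)]
    return len(distances) == len(set(distances))


if __name__ == "__main__":
    print(is_golomb_ruler([0, 1, 4, 6]))  # distances 1, 2, 3, 4, 5, 6
    print(is_golomb_ruler([0, 1, 2]))     # distance 1 occurs twice
```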
Novikov self-consistency principle - Wikipedia, the free encyclopedia
Time loop logic, coined by the roboticist and futurist Hans Moravec,[11] is the name of a hypothetical system of computation that exploits the Novikov self-consistency principle to compute answers much faster than possible with the standard model of computational complexity using Turing machines. In this system, a computer sends a result of a computation backwards through time and relies upon the self-consistency principle to force the sent result to be correct.
computation  physics  time  travel  polynomial  p  np  prime  factorisation  paradox 
november 2011
Collatz conjecture - Wikipedia, the free encyclopedia
The conjecture has been checked by computer for all starting values up to 20 × 2^58 ≈ 5.764 × 10^18.[9] All initial values tested so far eventually end in the repeating cycle {4,2,1}, which has only three terms. It is also known that {4,2,1} is the only repeating cycle possible with fewer than 35400 terms.[10]
Such computer evidence is not a proof that the conjecture is true. As shown in the cases of the Pólya conjecture, the Mertens conjecture and Skewes' number, sometimes a conjecture's only counterexamples are found when using very large numbers. Since sequentially examining all natural numbers is a process which can never be completed, such an approach can never demonstrate that the conjecture is true, merely that no counterexamples have yet been discovered.
mathematics  induction  empirical  counterexample  large 
november 2011
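Note on the Collatz entry above: the kind of computer check the excerpt describes can be sketched for small starting values. The sketch assumes the standard Collatz map (halve an even number, send an odd number n to 3n + 1); the function name `collatz_steps` is made up for this note. As the excerpt says, running it over any finite range shows only that no counterexample was found there.

```python
def collatz_steps(n):
    """Count the steps the Collatz map takes to bring n down to 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Even: halve. Odd: triple and add one.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


if __name__ == "__main__":
    # Every start value below 1000 reaches 1, so this loop terminates.
    longest = max(range(1, 1000), key=collatz_steps)
    print(longest, collatz_steps(longest))
```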
Fedora: Rebuild A Source Package | moosechips
General instructions to rebuild a Fedora source rpm.
fedora  redhat  source  src  rpm  rebuild  modify  howto  reference 
november 2011
The Linux Cookbook: Tips and Techniques for Everyday Use - Analyzing Text - Making a Concordance of a Text
A concordance is an index of all the words in a text, along with their contexts. A concordance-like functionality -- an alphabetical listing of all words in a text and their frequency -- can be made fairly easily with some basic shell tools: tr, sort, and uniq.
concordance  text  word  frequency  table  tr  sort  uniq  unix  shell 
november 2011
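Note on the concordance entry above: the cookbook builds its alphabetical word-frequency listing from tr, sort, and uniq. A rough Python analogue is sketched below. The function name `word_frequencies` and the tokenisation rule (lower-case everything, treat runs of ASCII letters as words) are assumptions for this note, not the cookbook's recipe.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs in alphabetical order."""
    # Assumed tokenisation: lower-case, words are runs of a-z.
    words = re.findall(r"[a-z]+", text.lower())
    return sorted(Counter(words).items())


if __name__ == "__main__":
    for word, count in word_frequencies("The cat and the hat"):
        print(count, word)
```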
Inversion (discrete mathematics) - Wikipedia, the free encyclopedia
In computer science and discrete mathematics, an inversion in a sequence of numbers is a pair of numbers in the sequence that are "out of order" with respect to an ascending or descending order.
sequence  order  greater  less  preceding  elements  sort  running  max  min 
november 2011
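Note on the inversion entry above: taking ascending order as the reference, an inversion is a pair of positions i < j whose values satisfy a[i] > a[j]. One common way to count such pairs is a merge sort that tallies how many left-half elements each right-half element jumps over. This is a sketch of that technique; the function name `count_inversions` is made up for this note.

```python
def count_inversions(seq):
    """Count pairs (i, j) with i < j and seq[i] > seq[j]."""

    def sort_count(items):
        # Returns (sorted copy of items, number of inversions in items).
        if len(items) <= 1:
            return items, 0
        mid = len(items) // 2
        left, left_inv = sort_count(items[:mid])
        right, right_inv = sort_count(items[mid:])
        merged, cross = [], 0
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining left element.
                merged.append(right[j])
                j += 1
                cross += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, left_inv + right_inv + cross

    return sort_count(list(seq))[1]


if __name__ == "__main__":
    print(count_inversions([2, 4, 1, 3, 5]))
```

Equal elements are not counted as inversions here, which is why the merge step uses `<=`.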
UbuntuDevelopment/Ports - Ubuntu Wiki
QEMU can launch Linux processes compiled for one CPU on another CPU, translating syscalls on the fly.
ubuntu  arm  arch  architecture  foreign  qemu  user  mode  emulation 
october 2011
ARM/BuildEABIChroot - Ubuntu Wiki
The binfmt-misc module in the kernel makes it possible to execute binaries of foreign arches under Linux. QEMU can use this fact to enable several architecture-specific execution environments in userspace without the need to run a kernel of the target architecture. ARM EABI support is included in the 0.11.x release used in Ubuntu Karmic today.
ubuntu  arm  arch  architecture  foreign  qemu  user  mode  emulation 
october 2011