Overview: Extracting article text from HTML documents | My tech blog.
march 2011 by jpfinley
In the world of web scraping, text mining, and article-reading utilities (such as the Readability bookmarklet), there is an ever-growing demand for tools that can distinguish the parts of an HTML document that make up the article from other common page building blocks such as menus, headers, footers, and ads.
In the following chapters I’ll try to review some article text extraction methods that are applicable to today’s websites. They mostly rely on machine learning, statistics, and a wide range of heuristics.
html
scrape
scraping
extraction
text
scripting
Old School Color Cycling with HTML5 | EffectGames.com
july 2010 by jpfinley
DAMN. 90s-era color cycling with JavaScript.
html
html5
javascript
games
colorcycling
animation
art
color
lucasarts
Chicago Deep Dish
october 2009 by jpfinley
For those who couldn’t be there, and for those who were there and seek to savor the memories, here is An Event Apart Chicago, all wrapped up in a pretty bow:
AEA Chicago – official photo set
By John Morrison, subism studios llc. See also (and contribute to) An Event Apart Chicago 2009 Pool, a user group on Flickr.
A Feed Apart Chicago
Live tweeting from the show, captured forever and still being updated. Includes complete blow-by-blow from Whitney Hess.
Luke W’s Notes on the Show
Smart note-taking by Luke Wroblewski, design lead for Yahoo!, frequent AEA speaker, and author of Web Form Design: Filling in the Blanks (Rosenfeld Media, 2008):
Jeffrey Zeldman: A Site Redesign
Jason Santa Maria: Thinking Small
Kristina Halvorson: Content First
Dan Brown: Concept Models - A Tool for Planning Websites
Whitney Hess: DIY UX - Give Your Users an Upgrade
Andy Clarke: Walls Come Tumbling Down
Eric Meyer: JavaScript Will Save Us All (not captured)
Aaron Gustafson: Using CSS3 Today with eCSStender (not captured)
Simon Willison: Building Things Fast
Luke Wroblewski: Web Form Design in Action (download slides)
Dan Rubin: Designing Virtual Realism
Dan Cederholm: Progressive Enrichment With CSS3 (not captured)
Three years of An Event Apart Presentations
Note: Comment posting here was a bit wonky for a while. Normal commenting has been restored. Thank you, Noel Jackson.
Short URL: zeldman.com/?p=2695
A_List_Apart
An_Event_Apart
Appearances
Authoring
Browsers
CSS
Career
Chicago
Code
Community
Compatibility
DOM
Design
Education
Fonts
Formats
HTML
HTML5
Happy_Cog™
Information_architecture
Jason_Santa_Maria
Markup
Real_type_on_the_web
Scripting
Search
Standards
State_of_the_Web
architecture
art_direction
bugs
cities
conferences
content
content_strategy
creativity
development
downloads
editorial
engagement
eric_meyer
events
flickr
glamorous
industry
javascript
photography
social_networking
speaking
spec
from google
StaticMatic
september 2007 by jpfinley
Static HTML with Haml
ruby
html
haml
code
development
programming
CSS Layouts
april 2007 by jpfinley
These CSS Layouts offer full Grade-A browser support.
css
layout
templates
html
Beginner's guide from a seasoned CSS designer ~ Authentic Boredom
september 2006 by jpfinley
Massive collection of links and resources for standards-based web design
css
design
webdesign
web
reference
links
development
mobile
html
typography
Coding a Layout
january 2006 by jpfinley
So, you’ve designed your next site but you’re having a little trouble turning your lovely PSD into a coded layout. This tutorial should help you learn how to analyze either a new template, or even your current layout to find the best way to code it.
html
webdesign
css