Hitting a link in an HTML email in mutt

Add macro attach V |'lynx -stdin'\n to .muttrc. Then, when you are in the attachment menu, select the HTML content and hit V. This launches a lynx process in which you can view the content, follow links, etc. This is on top of having text/html ; lynx -dump -force_html %s ; copiousoutput or similar set in your …
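Laid out as they might appear on disk, the two settings look roughly like the sketch below. The file locations (~/.muttrc and ~/.mailcap) and the alternative <pipe-entry> spelling of the macro are assumptions, not something the post specifies. The quotes must be plain ASCII quotes, since typographic quotes pasted from a web page will not work.

```
# ~/.muttrc (assumed location)
# In the attachment menu, V pipes the selected part to an interactive lynx.
macro attach V |'lynx -stdin'\n

# Equivalent spelling with a named function instead of the | key
# (assumption: use one form or the other, not both):
# macro attach V "<pipe-entry>lynx -stdin<enter>" "view HTML part in lynx"

# ~/.mailcap (assumed location)
# Renders HTML parts to plain text for display inside mutt's pager.
text/html; lynx -dump -force_html %s; copiousoutput
```

The mailcap entry only produces a static text dump, which is why the macro is needed when you actually want to follow a link.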


boilerpipe – Project Hosting on Google Code

The boilerpipe library provides algorithms to detect and remove the surplus “clutter” (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings. Extracting content is very fast (milliseconds), just needs …


Scrappy, Simple Stupid Spidering

Scrappy is an easy (and hopefully fun) way of scraping, spidering, and/or harvesting information from web pages. Internally, Scrappy uses the awesome Web::Scraper and WWW::Mechanize modules, and so inherits their awesomeness. Scrappy is inspired by the fun and easy-to-use Dancer API. Beyond being a pretty API for WWW::Mechanize::Plugin::Web::Scraper, Scrappy also has its own …
