Welcome to the IdeaMatt blog!

My rebooted blog on tech, creative ideas, digital citizenship, and life as an experiment.

Wednesday, Sep 30, 2015

Notes From a Conversation With Political Activist Josh Silver (Represent.us)

Since leaving my research programmer position at UMass (funding was drying up), I've been reaching out to leaders to explore contributions I could make in the progressive or educational spaces [1]. Here I'd like to briefly share a few notes from my conversation with one of them, Josh Silver. (NB: Any errors are my own - please send corrections!)

And a quick pitch: If you know someone who's hiring in those areas then please give me a ring. (I'm looking for the intersection of progressive work, software tool development, and solid funding.)

Josh Silver (Represent.us)

I had the good fortune to meet Josh Silver, who has made contributions in media reform and campaign finance reform (think Citizens United). He currently runs Represent.Us and, before that, ran Free Press. Josh summarized what he sees as the two main challenges to policy change (an informed public, and government representatives who actually represent people rather than big money or corporations) and gave me a précis of how money is injected into politics: directly, to campaigns and super PACs, which can be tracked, and indirectly, via 501(c)(4) and 501(c)(6) organizations, which are not directly tracked (more here). Josh said there's a lot of activity in data and politics, for example:

He also showed me the Greenhouse browser extension, which uses information extraction to rewrite web pages, popping up campaign finance information for the people mentioned on the page.

We talked a little about the fallibility of human nature and how emotion and stories often trump facts. ("It's a 'basic human survival skill,' explains political scientist Arthur Lupia of the University of Michigan. We push threatening information away; we pull friendly information close. We apply fight-or-flight reflexes not only to predators, but to data itself.") I tend to arrogantly think I'm immune, but I'm not; see Thinking, Fast and Slow and You Are Not So Smart (Amazon links).

Regarding funding for software development, Josh suggested either contacting one of the above organizations or proposing a grant to a funder like the Knight Foundation.

I finished up by talking a bit with one of his research assistants about possible tools that could help folks like him and reporters collect campaign contribution information more easily. My takeaway was that there's no good, central aggregator; information is disparate and spans multiple sites.

Great stuff! Stay tuned...

Areas of Interest

  • Skepticism, critical thinking
  • Journalism, media reform
  • Education and science
  • Fact checking
  • Environment, climate change
  • Freethought
  • Politics, digital government, campaign finance reform
  • Citizenship, participation, digital government
Friday, Sep 25, 2015

Brief report: Boston Python User Group: Favorite Libraries Meetup

Last night I attended a "favorite libraries" Python meetup organized by the Boston Python User Group (@bostonpython) and hosted by Akamai. Here I'll give a brief summary of the libraries covered, along with a newcomer's perspective on the event. The presentations were great overall: focused, not too long, and full of code examples and demos. And of course the theme was a winner; who doesn't love learning about a tasty library that may come in handy? Fun demos make it even better. I'm looking forward to attending more of these meetups.

Neil Tenenholtz: mrjob

mrjob is a Python library for writing MapReduce jobs (Hadoop tutorial [here](https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html), original Google paper here). Neil did a live demo showing how little code is needed to express parallelizable tasks: at the simplest, just a mapper() and a reducer() function. He demo'd the classic word count example and showed how to run it on different clusters, including simulated, local, and Hadoop or Amazon Elastic MapReduce installations. At one of the UMass labs I worked in, we used Hadoop in a number of ways, including batch data processing and cleanup (i.e., ETL) for the machine learning algorithms, and parallelized SQL via Cloudera Impala. Yummy stuff!
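To make the mapper()/reducer() idea concrete, here's a dependency-free sketch of word count in that shape. The run_locally() helper is hypothetical: it stands in for the map, shuffle (group by key), and reduce steps that mrjob's runners handle for you. With mrjob installed, you'd instead write mapper() and reducer() as methods on a subclass of mrjob.job.MRJob and call its run() method.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")


def mapper(_, line):
    """Emit (word, 1) for every word in one line of input (the key is unused)."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all of the counts emitted for a single word."""
    yield word, sum(counts)


def run_locally(lines):
    """Hypothetical stand-in for a MapReduce runner: map, shuffle, then reduce."""
    grouped = defaultdict(list)
    # Map phase: each input line is processed independently (so it can be parallelized).
    for line in lines:
        for key, value in mapper(None, line):
            # Shuffle phase: collect every value emitted for the same key.
            grouped[key].append(value)
    # Reduce phase: each key's group is also reduced independently.
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    print(run_locally(["The quick brown fox", "jumps over the lazy dog", "the end"]))
```

The point of the shape is that neither function knows anything about the cluster; the runner decides where each piece executes.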

Lindsay Raymond: Funcy

Funcy adds some functional tools to Python, "inspired by clojure, Underscore.js, and [his] own abstractions." (They reminded me a bit of the Scala idioms I saw during my brief exposure to it on a Spark GraphX project, where I implemented fast graph path searches.) Ask me sometime about my take on Scala compared to Python ;-) Lindsay gave some examples, and it was clear that there's a lot more to the library than can be shown in a short demo, but she touched on how readable elegantly composed functions can be, and how breaking computation down into stand-alone pieces makes testing easier. (As an XP fan, I really appreciated this point.) While I've done less functional programming than I'd like, funcy got me interested in learning more.
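Here's a small sketch of the composition idea using only the standard library. The compose() below is my own minimal version (funcy ships its own compose() and rcompose(), among many other tools), and the three helper functions are hypothetical examples. Each piece is tiny and can be unit-tested on its own, which is the testing benefit Lindsay pointed out.

```python
from functools import reduce


def compose(*fns):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns, lambda x: x)


def words(text):
    """Split text on whitespace."""
    return text.split()


def lowercase_all(items):
    """Lowercase every string in a list."""
    return [item.lower() for item in items]


def unique_sorted(items):
    """Drop duplicates and return the remaining items in sorted order."""
    return sorted(set(items))


# Reads right to left: split into words, lowercase them, then dedupe and sort.
vocabulary = compose(unique_sorted, lowercase_all, words)

if __name__ == "__main__":
    print(vocabulary("The cat saw the Dog"))  # ['cat', 'dog', 'saw', 'the']
```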

Scott Sanderson: Click

Click is a package that simplifies writing command-line interfaces. Scott demonstrated the main features with humor :-) and showed the decorator-based style it uses. I often end up writing command-line tools for applications that don't need a UI or that can be scripted, and I'll definitely have a look at Click the next time the need comes up. You can read the author's motivation for writing an alternative to the built-in argparse module here, and learn how it compares to others at Comparing Python Command-Line Parsing Libraries - Argparse, Docopt, and Click.
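For a taste of the decorator-based style, here's a minimal sketch of a Click command. The command, its --count option, and its NAME argument are hypothetical; click.command(), click.option(), click.argument(), and click.echo() are Click's standard building blocks.

```python
import click


@click.command()
@click.option("--count", default=1, show_default=True, help="Number of greetings.")
@click.argument("name")
def hello(count, name):
    """Greet NAME the given number of times."""
    # Click converts --count to an int because its default is an int.
    for _ in range(count):
        click.echo(f"Hello, {name}!")
```

Saved as, say, hello.py with an `if __name__ == "__main__": hello()` guard at the bottom, running `python hello.py --count 2 Matt` prints the greeting twice, and `--help` output is generated from the decorators and docstring. Click's CliRunner (in click.testing) can also invoke the command in-process, which is handy for tests.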

Amandalynne Paullada: NLTK

Natural Language Toolkit (NLTK) is a new favorite of mine, and I was excited to see Amandalynne's presentation. The library is rich in features, including word and sentence segmentation (see https://en.wikipedia.org/wiki/Text_segmentation), part-of-speech tagging, classifiers, information extraction, and more. Amandalynne demo'd a clever nickname generator which, as you'd expect, was a hit, even (or especially) when it surprised. I'm currently evaluating NLTK for a skeptical toolbox idea I have, for example using it for named-entity recognition. (I learned a little about this area in my last contract; it's surprisingly difficult.) There's a very well done NLTK book, and I came away from reading it thinking it would be a great introduction to Python for newbies, because learning the language is set in the context of text processing, which is inherently cool. Good stuff.

Ned Jackson Lovely: itsdangerous and bcrypt

itsdangerous is a library for cryptographic signing, and can be applied to cases like secure "amnesia" ("forgotten password") emails or saving protected cookies. Using it this way can, in some cases, eliminate the need to store hashed data in server-side tables. bcrypt is a tool for password hashing. Not having done work in this area, I can't say much about these libraries, but I fully believed Ned's point that DIY cryptography is hard to get right, and that you should get professionals involved for anything important, such as protecting credit card information.
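To illustrate what signing buys you, here's a minimal standard-library sketch of the idea behind a signed "forgotten password" token: the server signs a value with a secret key and later verifies the signature, so it doesn't need a server-side table of tokens. This is not itsdangerous's API (it handles encoding, timestamps, and more; URLSafeTimedSerializer is one of its classes for this kind of use), and in keeping with Ned's advice, you'd reach for a vetted library rather than hand-rolled code like this in production. SECRET_KEY, the function names, and the payload.signature token format are all hypothetical.

```python
import base64
import hashlib
import hmac

# Hypothetical secret; a real app would load this from configuration, not source code.
SECRET_KEY = b"change-me"


def _b64(data: bytes) -> str:
    """URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def _signature(payload: str) -> str:
    """HMAC-SHA256 of the payload, keyed with SECRET_KEY."""
    return _b64(hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).digest())


def sign(value: str) -> str:
    """Return a 'payload.signature' token for value."""
    payload = _b64(value.encode("utf-8"))
    return f"{payload}.{_signature(payload)}"


def unsign(token: str) -> str:
    """Return the original value, or raise ValueError if the token was tampered with."""
    payload, _, signature = token.rpartition(".")
    expected = _signature(payload)
    # compare_digest avoids leaking how many characters matched through timing.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        raise ValueError("bad signature")
    padding = "=" * (-len(payload) % 4)
    return base64.urlsafe_b64decode(payload + padding).decode("utf-8")
```

A real reset token would also embed an expiry time; itsdangerous's timed serializers take care of that.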

Tuesday, Sep 08, 2015

An Arduino Relaxation Tool Using Two Cell Phone Motors

Here's a fun little project I did a while back that I thought I'd share. The idea was to have two small motors (like the ones in cell phones) that you hold, one in each hand. The motors alternately pulse back and forth between the left and right hands, with two potentiometers controlling how long each motor stays on and the delay between switching. I thought it might be relaxing (and fun to build), and it actually is! It's also kind of a treat at parties (at least the geeky ones I'm invited to) because you get to see how differently people set the pots to get their own relaxation level. I like mine relatively slow, but maybe it could be cranked up to work as a stimulator.

Anyway, the project is on GitHub at http://github.com/matthewcornell/arduino-relaxation-tool. Below I've included the readme along with some pics. Enjoy!

Circuit

This is basically a mashup of two standard Arduino projects - reading analog values from a pot, and controlling a piezo motor. The circuits are duplicated for left and right sides. The circuit diagram is here, and a picture of the assembled board is here.

Here are examples of the individual projects:

Parts

Assembly

I used a basic Arduino breadboard; it was pretty straightforward. I didn't have any 33 Ohm resistors, so I used three 10 Ohm ones in series (30 Ohms, which was close enough). For the motors, I simply applied some heat shrink tubing to give my hands something to hold onto. I looked into more durable cases (I tried ping pong balls), but this worked OK for the prototype. See the pic here.

Code

The code manages five pins: two analog inputs for reading the pots, two digital outputs for controlling the motors, and the built-in LED's digital output, which I pulse for effect. The pots are read by adjustPeriodAndDurationBasedOnPots() and saved into two variables: vibePeriod (how often the motor vibrates, in ms, with a range of ~[100 ms, 2000 ms]) and vibeDuration (how long the motor vibrates, in the same range). The last variable is isCycleLedOn, which alternates between 0 and 1 with each cycle. After setup() initializes the pins, the standard Arduino loop() function just reads the pots (which sets the two control variables), cycles the first motor, and repeats for the second one. cycleMotor() turns on the motor, cycles the LED, waits the appropriate duration, then repeats to turn the motor off. Finally, adjustPeriodAndDurationBasedOnPots() reads the analog value from each pot (a value between 0 and 1023) and scales it to a range of ~100 ms to 2000 ms, which seemed to work pretty well. (Any faster was irritating, and any slower would have been boring :-)
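The pot-to-milliseconds scaling is just linear interpolation. Here's the arithmetic sketched in Python (to keep this blog's examples in one language) using the same integer formula as Arduino's built-in map() function. The constant and function names are mine, and this is one plausible way to do the scaling, not necessarily what the code on GitHub does. On the board itself, something like map(analogRead(pin), 0, 1023, 100, 2000) would do the same job.

```python
POT_MIN, POT_MAX = 0, 1023   # analogRead() range described above
MS_MIN, MS_MAX = 100, 2000   # approximate timing range described above


def arduino_map(x, in_min, in_max, out_min, out_max):
    """Integer linear rescale using the same formula as Arduino's map().

    Arduino's C division truncates toward zero while Python's // floors; for
    in-range readings every intermediate value is non-negative, so they agree.
    """
    return (x - in_min) * (out_max - out_min) // (in_max - in_min) + out_min


def pot_to_ms(reading):
    """Scale a raw pot reading (0..1023) to milliseconds (100..2000)."""
    return arduino_map(reading, POT_MIN, POT_MAX, MS_MIN, MS_MAX)


if __name__ == "__main__":
    for reading in (0, 512, 1023):
        print(reading, "->", pot_to_ms(reading), "ms")  # 100, 1050, and 2000 ms
```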

 

Monday, Aug 31, 2015

A simple Fabric.js-based web app for annotating SVG files

In the check-it-out category, I want to share a web app prototype I wrote recently for a client. It was my first experience writing non-trivial JavaScript (peepweather.com uses a little; its source is on GitHub at github.com/matthewcornell/peepweather), and it was a lot of fun to write. The app has no polish, as you can see from the screenshots, because all they wanted was for me to quickly write a solid starting point for an upcoming fall project of theirs.

Following is some detail from the README on the GitHub fork I created. You can play with it at annotator-demo.herokuapp.com.

Description

This is a simple Flask-based web app that demonstrates using Fabric.js to implement a direct-manipulation SVG annotation tool. It has a bare-bones UI and layout (i.e., no Bootstrap or equivalent) and no production-level features such as authentication, authorization, document owners and assignments, error handling, concurrent access, deployment, etc. However, it does support most of the desired front end features, plus the connection to the back end. The original prototype supported adding new SVG files by dropping them into the repository directory, where a DAO would pick them up, creating corresponding JSON files as needed. This demo uses a fixed repository of a few files to keep it simple. The point is mainly to show the front end proof-of-concept using Fabric.js.

Motivation

The fuller version of this tool was used to help a group of people annotate thousands of PDF files converted to SVG format by Mozilla Labs' pdf.js. The SVG retains formatting and so is browser-renderable, but its markup is more useful for information extraction (IE) than PDF's. The ultimate goal of the project was to create a gold standard against which to compare the IE output. Along with saving a document's annotations, users could ask the server for the text bounded by a particular annotation rectangle, a non-trivial problem with these complex SVG files (the text elements generated are quite chopped up).
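One way to approach that text-bounding problem, sketched under heavy assumptions: suppose the SVG's text has already been flattened into positioned fragments. The Fragment class, both functions, and the tolerance parameters are hypothetical, not the project's code. The sketch keeps fragments whose centers fall inside the rectangle, groups them into lines by vertical position, and re-joins chopped-up runs by inserting a space only where there's a visible horizontal gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A hypothetical positioned run of text pulled out of an SVG page."""
    x: float       # left edge
    y: float       # top edge (SVG's y axis grows downward)
    width: float
    height: float
    text: str


def _join_line(group, space_gap):
    """Join fragments left to right, adding a space only where there's a visible gap."""
    parts, prev_end = [], None
    for frag in sorted(group, key=lambda f: f.x):
        if prev_end is not None and frag.x - prev_end > space_gap:
            parts.append(" ")
        parts.append(frag.text)
        prev_end = frag.x + frag.width
    return "".join(parts)


def text_in_rect(fragments, left, top, right, bottom, line_tolerance=2.0, space_gap=2.0):
    """Return the text whose fragment centers fall inside the rectangle, in reading order."""
    inside = [
        f for f in fragments
        if left <= f.x + f.width / 2 <= right and top <= f.y + f.height / 2 <= bottom
    ]
    # Group into lines: fragments whose tops are within line_tolerance share a line.
    lines = []  # each entry is [first_y, [fragments on that line]]
    for frag in sorted(inside, key=lambda f: f.y):
        if lines and abs(frag.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag.y, [frag]])
    return "\n".join(_join_line(group, space_gap) for _, group in lines)


if __name__ == "__main__":
    page = [
        Fragment(10, 10, 20, 10, "Know"),
        Fragment(30, 10.5, 25, 10, "ledge"),   # chopped-up run of the same word
        Fragment(60, 10, 20, 10, "Based"),
        Fragment(10, 30, 30, 10, "Trust"),
        Fragment(300, 10, 40, 10, "outside"),  # center falls outside the rectangle
    ]
    print(text_in_rect(page, 0, 0, 200, 50))  # prints "Knowledge Based", then "Trust"
```

The hard part in practice is choosing the tolerances, since real SVG output varies in font size and spacing.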

The decision to write a tool was made after I researched existing annotation tools, including some amazing ones:

Unfortunately they lacked the customization and power user features we needed for the high-throughput workflow necessary to annotate 1000s of SVG files. I then surveyed JavaScript graphics libraries (both Canvas- and SVG-based) to find one to support the rectangle-based UI we wanted. I settled on the excellent Fabric.js (the demos are awesome) after having looked at general purpose ones (of which there are many) including:

I also looked at diagram-oriented libraries, but they felt like too much work compared to a straightforward canvas wrapper.

In the end I liked Fabric best for its built-in support for handles and grouping, and its solid level of activity. And it worked pretty well, modulo a number of gotchas. It's a really nice piece of work, as are many of these. The JavaScript graphics library scene is definitely alive and well.

Code tour

While I don't have my client's permission to share the code, I'll sketch out the implementation for the curious.

Routes

At the top level of this standard Flask MVC app is app/routes.py, which has four URIs/controllers:

  • /: app/templates/index.html template. Lists the SVG files in the fake DAO.
  • /docs/<fileName>: app/templates/edit.html template. Described below.
  • /docs/<fileName>/annotations: REST API endpoint for getting a document's annotations as a JSON array. Finds and reads the JSON file corresponding to the SVG file and returns it as 'application/json'.
  • /docs/<fileName>/text: placeholder endpoint for calculating the text bounded by a rectangle. Pulls the rectangle bounds from the query parameters and returns a fake string.
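Since the client's code isn't shareable, here's a hypothetical minimal sketch of what routes like these can look like in Flask. The in-memory FAKE_DAO dict, the file names, the query parameter names (left, top, right, bottom), and the returned strings are my assumptions; the real app rendered the index.html and edit.html templates, which this sketch replaces with plain text to stay self-contained.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the fake DAO: SVG file name -> list of annotation dicts.
FAKE_DAO = {
    "paper1.svg": [{"type": "Title", "left": 10, "top": 20, "width": 300, "height": 40}],
    "paper2.svg": [],
}


@app.route("/")
def index():
    # The real app renders app/templates/index.html; plain text keeps this self-contained.
    return "\n".join(sorted(FAKE_DAO))


@app.route("/docs/<fileName>")
def edit(fileName):
    if fileName not in FAKE_DAO:
        abort(404)
    # The real app renders app/templates/edit.html with fileName injected.
    return f"editing {fileName}"


@app.route("/docs/<fileName>/annotations")
def annotations(fileName):
    if fileName not in FAKE_DAO:
        abort(404)
    # jsonify() returns the list with the 'application/json' mimetype.
    return jsonify(FAKE_DAO[fileName])


@app.route("/docs/<fileName>/text")
def text(fileName):
    if fileName not in FAKE_DAO:
        abort(404)
    # Pull the rectangle bounds from the query string, e.g. ?left=0&top=0&right=100&bottom=50
    try:
        bounds = [float(request.args[name]) for name in ("left", "top", "right", "bottom")]
    except (KeyError, ValueError):
        abort(400)
    # Placeholder, like the original endpoint: a fake string instead of real extraction.
    return f"fake text for {fileName} within {bounds}"
```

Pointing Flask's development server at this module (for example with the flask run command) lets you try the endpoints in a browser, and Flask's test_client() works for quick checks.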

edit.html

This view has a simple vanilla HTML editing section at the top with three hard-coded annotation types, and the editing area underneath. The latter layers the Fabric canvas on top of the SVG file, whose SVG element is inserted dynamically via loadSvg() in app/static/edit-document.js (see below). Finally, edit.html sets the variables needed by edit-document.js, based on the Flask-injected fileName variable, and loads Fabric.js and edit-document.js.

edit-document.js

This file does the heavy lifting. On load it links up the button and shortcut actions to their methods and then calls loadSvg() to load the SVG document and its annotations. It does so using an AJAX call because of a browser SVG loading bug where the SVG size is reported incorrectly until it's completely loaded. Once it's loaded, loadSvg() inserts the SVG element behind the Fabric canvas, sets its size, and then initializes the Fabric canvas, saving it as an application property on the canvas element for easy access. Once that's all done, the function calls loadAnnotations(), which performs the AJAX call to /docs/<fileName>/annotations and then creates a Fabric Rect object for each annotation, using the correct color for its type.

One tricky bit was handling the lines connecting linked Rects. Lines are managed explicitly (there is no built-in 'connector' feature in Fabric), so their endpoints must be dynamically adjusted during rectangle moves and resizes, and pointers to and from them must be saved as properties on the Fabric objects (i.e., a Rect needs to know all of its Lines, and a Line needs to know its two endpoints' Rects).
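The real app does this in JavaScript with Fabric objects; here's the same bookkeeping sketched in Python, with hypothetical Rect and Line classes standing in for Fabric's. Each Rect keeps a list of its Lines, each Line keeps its two endpoint Rects, and moving or resizing a Rect re-anchors its Lines (here to the rectangles' centers, which is an assumption). add_link() also enforces the app's linking rule: exactly two rectangles with the same label and no existing link between them.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality and hashing, like distinct canvas objects
class Rect:
    label: str
    left: float
    top: float
    width: float
    height: float
    lines: list = field(default_factory=list, repr=False)  # every Line touching this Rect

    def center(self):
        return (self.left + self.width / 2, self.top + self.height / 2)

    def move_to(self, left, top):
        self.left, self.top = left, top
        self._update_lines()

    def resize(self, width, height):
        self.width, self.height = width, height
        self._update_lines()

    def _update_lines(self):
        # Nothing re-anchors lines automatically, so every move/resize must do it.
        for line in self.lines:
            line.update_endpoints()


@dataclass(eq=False)
class Line:
    a: Rect
    b: Rect

    def __post_init__(self):
        self.update_endpoints()

    def update_endpoints(self):
        self.x1, self.y1 = self.a.center()
        self.x2, self.y2 = self.b.center()


def add_link(selection):
    """Link exactly two same-label, unlinked Rects; return the new Line or None."""
    if len(selection) != 2:
        return None
    r1, r2 = selection
    if r1.label != r2.label or any({ln.a, ln.b} == {r1, r2} for ln in r1.lines):
        return None
    line = Line(r1, r2)
    r1.lines.append(line)  # the Rect -> Line pointers...
    r2.lines.append(line)  # ...mirror the Line -> Rect pointers stored in the Line
    return line


def remove_link(selection):
    """Remove the Line between two selected Rects; return True if one was removed."""
    if len(selection) != 2:
        return False
    r1, r2 = selection
    for line in list(r1.lines):
        if {line.a, line.b} == {r1, r2}:
            r1.lines.remove(line)
            r2.lines.remove(line)
            return True
    return False
```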

The rest of the code manages all the fiddly aspects of the app - resizing, duplicating, linking, etc. And of course button state must be updated based on selection changes. The only other mildly interesting thing is the AJAX call to get an annotation rect's text - a straightforward call to the /docs/<fileName>/text endpoint.

User Documentation

The UI is a straightforward direct manipulation one where users work with rectangle objects. Click to select, drag to move, drag resize handles to resize, click the delete rectangle button to remove, etc. The only non-obvious feature is how to add and remove links between rectangles. To add a link, select exactly two rectangles with the same label and no existing link, and then click the add link button. To remove a link, select two rectangles with an existing link and then click the remove link button. Keystroke shortcuts are supported for power users:

Keystroke shortcuts

Types

        1: set current type to Title
shift + 1: filter all but Title
        2: set current type to Abstract
shift + 2: filter all but Abstract
        3: set current type to Author
shift + 3: filter all but Author
   Escape: reset filter

Create

+: create new annotation using current type

Move & Resize

                 arrow key: move selection 1px
         shift + arrow key: move selection 10px
        option + arrow key: resize selection 1px
shift + option + arrow key: resize selection 10px

Delete/Duplicate

backspace|delete: delete selection
     control + d: duplicate selection

Select

        tab: select next
shift + tab: select previous

Text Feedback

x: display text for selection
Saturday, Aug 29, 2015

Proposal: An online skeptical toolbox

For some time I've been collecting project ideas to further my overall goal of "computing in the service of humanity" (a phrase I recently picked up). While I definitely want to find consulting work in this area, starting up a personal project in the meantime is important to me. Out of a bunch of ideas, I've decided to start with an online skeptical toolbox.

What finally kicked me in the butt were two things: 1) the publication of Google's paper Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources (which got a good bit of coverage), and 2) the recent Committee for Skeptical Inquiry article Online Tools for Skeptical Fact Checking by Tamar Wilner. To that end I've created a GitHub project that at the moment has two files: the proposal itself and a somewhat organized XMind mind map file, skeptical-toolbox.xmind, that lists some detail.

As I said in the proposal's Implementation section, the tools cover a range of complexity, which lets us quickly roll out something useful using the simpler ones and progressively introduce more sophisticated tools as the toolbox develops. I plan to write it in Python + Flask (same as I used for PeepWeather), but I'm quite open to other languages and frameworks, such as Ruby on Rails, if someone steps up and convinces me. One thing in Python's favor is the popular Natural Language Toolkit with its information extraction tools, especially around entity recognition.

What do you think? It is very early days (day zero, I suppose), but I hope that sharing this will generate some thoughts. If you're interested in helping build this - awesome! Just comment here or drop me a line.

(Image: Memory Belt)