Welcome to the IdeaMatt blog!

My rebooted blog on tech, creative ideas, digital citizenship, and life as an experiment.

Monday
Aug 31, 2015

A simple Fabric.js-based web app for annotating SVG files

In the check-it-out category, I want to share a web app prototype I wrote recently for a client. It was my first experience with writing non-trivial JavaScript (peepweather.com uses a little - its source is on GitHub at github.com/matthewcornell/peepweather) and was a lot of fun to write. The app had no polish, as you can see from the screen shots, because all they wanted was for me to quickly write a solid starting point for an upcoming fall project of theirs.

Following is some detail from the README on the GitHub fork I created. You can play with it at annotator-demo.herokuapp.com.

Description

This is a simple Flask-based web app that demonstrates using Fabric.js to implement a direct-manipulation SVG annotation tool. It has a bare-bones UI and layout (i.e., no Bootstrap or equivalent) and no production-level features such as authentication, authorization, document owners and assignments, error handling, concurrent access, deployment, etc. However, it does support most of the front end features desired, plus the connection to the back end. The original prototype supported adding new svg files by dropping them into the repository directory, which a DAO would pick up, creating corresponding json files as needed. This demo uses a fixed repository of a few files to keep it simple. The point is mainly to show the front end proof-of-concept using Fabric.js.

Motivation

The fuller version of this tool was used to help a group of people annotate thousands of PDF files in SVG format as generated by Mozilla Labs' pdf.js. The SVG retains formatting and so is browser-renderable, but the markup is more useful for information extraction than that of PDF. The ultimate goal of the project was to create a gold standard for comparing the IE output to. Along with saving a document's annotations, users could ask the server for the text bounded by a particular annotation rectangle - a non-trivial problem with these complex SVG files (the text elements generated are quite chopped up).

The decision to write a tool was made after I researched existing annotation tools including some amazing ones:

Unfortunately they lacked the customization and power user features we needed for the high-throughput workflow necessary to annotate 1000s of SVG files. I then surveyed JavaScript graphics libraries (both Canvas- and SVG-based) to find one to support the rectangle-based UI we wanted. I settled on the excellent Fabric.js (the demos are awesome) after having looked at general purpose ones (of which there are many) including:

I also looked at diagram-oriented libraries, but they felt like too much work compared to a straightforward canvas wrapper.

In the end I liked Fabric best for its inbuilt support of handles and grouping, and its solid level of activity. And it worked pretty well, modulo a number of gotchas. It's a really nice piece of work, as are many of these. The JavaScript graphics library scene is definitely alive and well.

Code tour

While I don't have my client's permission to share the code, I'll sketch out the implementation for the curious.

Routes

At the top level of this standard Flask MVC app is app/routes.py, which has four URIs/controllers:

  • /: app/templates/index.html template. Lists the SVG files in the fake DAO.
  • /docs/<fileName>: app/templates/edit.html template. Described below.
  • /docs/<fileName>/annotations: REST API endpoint for getting a document's annotations as a JSON array. Finds and reads the JSON file corresponding to the SVG file and returns it as 'application/json'.
  • /docs/<fileName>/text: placeholder endpoint for calculating the text bounded by a rectangle. Pulls the rectangle bounds from the query parameters and returns a fake string.
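To make the last two endpoints concrete, here is a minimal, framework-independent Python sketch of the logic behind them. It is not the client's code: the function names, the "same base name with a .json extension" mapping, the empty-list fallback, and the left/top/width/height query parameter names are all assumptions made for illustration.

```python
import json
from pathlib import Path
from urllib.parse import parse_qs


def annotations_path(repo_dir: Path, file_name: str) -> Path:
    """Map 'paper.svg' to its sibling annotation file 'paper.json' (assumed naming)."""
    # Using only the final path component keeps lookups inside repo_dir.
    return repo_dir / Path(file_name).with_suffix(".json").name


def load_annotations(repo_dir: Path, file_name: str) -> list:
    """Return the document's annotations, or an empty list if none are saved yet."""
    path = annotations_path(repo_dir, file_name)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def rect_from_query(query_string: str) -> dict:
    """Pull rectangle bounds out of a query string (parameter names are hypothetical)."""
    params = parse_qs(query_string)
    return {key: float(params[key][0]) for key in ("left", "top", "width", "height")}
```

In a Flask app, each route function would call one of these helpers and return the result as 'application/json' (annotations) or as a string (text).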

edit.html

This view has a simple vanilla HTML editing section at the top with three hard-coded annotation types, and the editing area underneath. The latter layers the Fabric canvas on top of the SVG file, whose element is inserted dynamically via loadSvg() in app/static/edit-document.js (see below). Finally, edit.html sets the variables needed by edit-document.js, based on the Flask-injected fileName variable, and loads Fabric.js and edit-document.js.

edit-document.js

This file does the heavy lifting. On load it links up the button and shortcut actions to their methods and then calls loadSvg() to load the SVG document and its annotations. It does so using an AJAX call because of a browser SVG loading bug where the SVG size is reported incorrectly until it's completely loaded. Once it's loaded, loadSvg() inserts the SVG element behind the Fabric canvas, sets its size, and then initializes the Fabric canvas, saving it as an application property on the canvas element for easy access. Once that's all done, the function calls loadAnnotations(), which performs the AJAX call to /docs/<fileName>/annotations and then creates Fabric Rect objects for each, using the correct color for the annotation type.

One tricky bit was handling the line connecting linked Rects. Lines are managed explicitly (there is no built-in 'connector' feature in Fabric), so their endpoints must be dynamically adjusted during rectangle moves and resizes, and pointers to/from them must be saved as properties on the Fabric objects (i.e., a Rect needs to know all of its Lines, and a Line needs to know its two endpoints' Rects).
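The two-way bookkeeping can be sketched with plain Python stand-ins for the Fabric objects. This is only an illustration of the data structure, not Fabric's API or the app's JavaScript: the class and method names are hypothetical, and attaching each line to the rectangles' centers is an assumption.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so objects can reference each other safely
class Rect:
    left: float
    top: float
    width: float
    height: float
    lines: list = field(default_factory=list)  # every Line attached to this Rect

    def center(self):
        return (self.left + self.width / 2, self.top + self.height / 2)

    def move_to(self, left, top):
        """Move the rectangle, then re-anchor every line attached to it."""
        self.left, self.top = left, top
        for line in self.lines:
            line.update_endpoints()


@dataclass(eq=False)
class Line:
    start: Rect  # a Line knows its two endpoint Rects
    end: Rect
    x1: float = 0.0
    y1: float = 0.0
    x2: float = 0.0
    y2: float = 0.0

    def __post_init__(self):
        # Register with both rectangles so each Rect knows all of its Lines.
        self.start.lines.append(self)
        self.end.lines.append(self)
        self.update_endpoints()

    def update_endpoints(self):
        self.x1, self.y1 = self.start.center()
        self.x2, self.y2 = self.end.center()

    def detach(self):
        """Remove the link and clean up the back-pointers on both rectangles."""
        self.start.lines.remove(self)
        self.end.lines.remove(self)
```

A resize would follow the same pattern as move_to(): change the geometry, then call update_endpoints() on each attached line.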

The rest of the code manages all the fiddly aspects of the app - resizing, duplicating, linking, etc. And of course button state must be updated based on selection changes. The only other mildly interesting thing is the AJAX call to get an annotation rect's text - a straightforward call to the /docs/<fileName>/text endpoint.

User Documentation

The UI is a straightforward direct manipulation one where users work with rectangle objects. Click to select, drag to move, drag resize handles to resize, click the delete rectangle button to remove, etc. The only feature that's non-obvious is how to add and remove links between rectangles. To add a link, select exactly two rectangles with the same label and no existing link and then click the add link button. To remove a link, select two rectangles with an existing link and then click the remove link button. Keystroke shortcuts are supported for power users:

Keystroke shortcuts

Types

          1: set current type to Title
 shift  + 1: filter all but Title
          2: set current type to Abstract
 shift  + 2: filter all but Abstract
          3: set current type to Author
 shift  + 3: filter all but Author
     Escape: reset filter

Create

+: create new annotation using current type

Move & Resize

                     arrow key: move selection 1px
            shift  + arrow key: move selection 10px
           option  + arrow key: resize selection 1px
 shift  +  option  + arrow key: resize selection 10px

Delete/Duplicate

backspace|delete: delete selection
    control  + d: duplicate selection

Select

          tab: select next
 shift  + tab: select previous

Text Feedback

x: display text for selection
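
As a rough illustration of the Move & Resize shortcuts above, the modifier logic boils down to a small lookup. This Python sketch is hypothetical (the real handlers are JavaScript in edit-document.js); the function name, the dict-based rectangle, resizing from the right and bottom edges, and the 1px minimum size are all assumptions.

```python
# Unit direction for each arrow key, in canvas coordinates (y grows downward).
ARROWS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}


def apply_arrow_key(rect, key, shift=False, option=False):
    """Return a copy of rect moved (or, with option, resized) by 1px, or 10px with shift."""
    dx, dy = ARROWS[key]
    step = 10 if shift else 1
    updated = dict(rect)
    if option:
        # Assumed behavior: resize by moving the right/bottom edge, never below 1px.
        updated["width"] = max(1, rect["width"] + dx * step)
        updated["height"] = max(1, rect["height"] + dy * step)
    else:
        updated["left"] = rect["left"] + dx * step
        updated["top"] = rect["top"] + dy * step
    return updated
```

For example, apply_arrow_key(rect, "up", shift=True) moves the selection up 10px, and apply_arrow_key(rect, "left", option=True) narrows it by 1px.
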
Saturday
Aug 29, 2015

Proposal: An online skeptical toolbox

For some time I've been collecting project ideas to help my overall goal of "computing in the service of humanity" (a phrase I recently picked up). While I definitely want to find consulting work in this area, starting up a personal project in the meantime is important to me. Out of a bunch of ideas I've decided to start with an online skeptical toolbox.

What finally kicked me in the butt were two things: 1) The publishing of Google's paper Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources (which got a good bit of coverage), and 2) the recent Committee for Skeptical Inquiry article Online Tools for Skeptical Fact Checking by Tamar Wilner. To that end I've created a GitHub project that at the moment has two files: the proposal itself and a somewhat-organized XMind mind map file skeptical-toolbox.xmind that lists some detail.

As I said in the Implementation section, the tools cover a range of complexity, which lets us quickly roll out something useful using the simpler ones and progressively introduce more sophisticated tools as the toolbox develops. I plan on using Python + Flask to write it in (same as I used for PeepWeather), but I'm quite open to other languages and frameworks, such as Ruby on Rails, if someone steps up and convinces me. One thing in Python's favor is the popular Natural Language Toolkit with its information extraction tools, especially around entity recognition.

What do you think? It is very early days (day zero, I suppose) but I hope that sharing this will generate some thoughts. If you're interested in helping build this - awesome! Just comment here or send me a line.

(Image: Memory Belt)

Tuesday
Aug 11, 2015

Available! Versatile Software Engineer, 20+ Years Experience, Object-Oriented & Extreme Programming

After taking a break from computing in 2009 and creating a new kind of social platform - one based on treating life as an experiment - I happily returned to CS research in 2011. I was asked to re-join the AI group at UMass where I previously helped build Proximity, a platform for machine learning research. However, that position's funding has ended (as is the way of research's ebb and flow) and I'm excited to move on to the next adventure.

In those four years I got a lot done, including exploring ways to scale up the Ph.D. students' algorithms [1], coaching students in the Extreme Programming agile software development methodology that I love so much, implementing lab infrastructure improvements (I consolidated multiple aging servers into a single modern cluster, and got the lab using the commercial wiki, Confluence), and of course collaboratively designing and coding the lab's new causal learning in Python (a language I love, BTW, though I still enjoy Java).

As a pleasant surprise, before the funding ended in May, at the last minute another CS professor asked me to help his lab get some urgent work done during the summer, which I was happy to do. While there I benchmarked and started an optimization effort for their new information extraction pipeline, led the team in meeting a crucial grant deadline, and prototyped a web-based PDF annotation tool [2] for creating a gold standard to evaluate their algorithms' performance against. I was introduced to Node.js, continued learning Scala (I wrote some for my GraphX work), and picked up JavaScript for the web app.

Looking forward, I've started my job search for my next project, with the goal of finding work that's challenging, engaging, and meaningful. I've not had to do one in a while, so the process itself is a kind of experiment. What approaches will work? What is out there? Who is doing cool work? Exciting, and a little scary.

So: if you know someone looking for a versatile software engineer with lots of experience who writes excellent code, then please - drop me a line. My LinkedIn profile is linkedin.com/in/matthewcornell.

  • [1] I explored multiple MPP approaches including single-node SQL using PostgreSQL, distributed SQL using Impala on the Hadoop ecosystem and Vertica, and graph databases such as GraphX, Neo4J, and Giraph.
  • [2] The prototype allows users to load a PDF file as a browser-renderable SVG, draw and edit rectangular regions overlaid on the text, and save and load them from a server. I used the standard web technologies of JavaScript and JSON, all running on Play Framework and its Java API. I used Fabric.js for the direct manipulation UI, which I really enjoyed working with (check out the awesome demos).

(Image from Fountain on Boston Common.)

Wednesday
Apr 22, 2015

Smith College Guest Lecture, "How to build a simple production web site in Python"

Last week I had a fun gig giving a one-hour guest lecture for Smith College's freshman Intro Computer Science through Programming class. The topic was "How to build a simple production web site in Python", in which I was invited to share the process and technology that went into building PeepWeather.com. The rationale was that it would give the students a taste of what is possible to create using some relatively straightforward Python frameworks and libraries, hopefully providing some motivation for those wondering what good their current and (necessarily) bounded assignments are.

I had a great time meeting the challenge of putting together a presentation that had breadth and was motivating, but at the same time understandable to the audience (with the right amount of stretching to keep them engaged). It went well, with some excellent questions during and after the talk. Here I want to share a little about the concepts and technology I covered in case you're considering giving a similar presentation of your own site. For the curious, here's the PowerPoint as a PDF (unfortunately the video did not work out).

Technologies

I named these technologies, describing each pretty briefly, except for diving deeper into Flask:

Concepts

I covered the following at various levels of detail:

The Code

I jumped into the code at the appropriate points, folding as needed so I didn't overwhelm. I stuck to high level basics, highlighting the concepts they currently know including classes and methods. Mainly the focus was on how the concepts got implemented in Python, but no significant code detail; there simply wasn't time. (Note that IntelliJ IDEA's presentation mode worked great for the projector.)

Motivation

I tried to make two points to get the students excited. First, they are programmers and so they have a powerful skill to create web sites! I suggested they pay attention to any thoughts that come up like "Wouldn't it be great if you could __?" or "Why isn't there a site to __?" Second, I pointed out that putting together even a small site is a fantastic portfolio piece that impresses prospective employers. It shows that they can have ideas and, better yet, bring them into reality on their own. I wrapped up by saying that I had a ton of fun writing PeepWeather and learning the various technologies. It's extremely gratifying to bring something into reality (if that's the right word for something as virtual as a web site :-)

Overall I enjoyed both the talk's preparation and presentation, and I hope to give it at the other Five Colleges.

Friday
Mar 27, 2015

Loving the wonderful book Code Complete 2

After two false starts [1], I am working my way through Steve McConnell's book Code Complete: A Practical Handbook of Software Construction, Second Edition (links: author, book, amazon). He describes the topic as "an extended description of the programming process for creating classes and routines." Extended is right!

There are very good reasons why it is in the top 10 in software design at Amazon: halfway through, I've found the book unique and very well written. (I am not alone in this assessment [2].)

It is unique because it goes into a level of detail about coding that is specific and really deep. McConnell has given incredibly clear thought to the minutiae of programming, and then shared it with fine writing. My boss chuckled when I showed him that there are four chapters (table of contents here) on variables, one dedicated solely to naming them. Awesome! I've been writing software for a long time, and it was gratifying to find that McConnell has put into words things learned intuitively, such as Think about efficiency:

... it's usually a waste of effort to work on efficiency at the level of individual routines. The big optimizations come from refining the high-level design, not the individual routines. You generally use micro-optimizations only when the high-level design turns out not to support the system's performance goals, and you won't know that until the whole program is done. Don't waste time scraping for incremental improvements until you know they're needed.

But beyond being reminded of these (and there are many, many of them, for example: "Initialize each variable close to where it's first used"), a big payoff is the new-to-me ones, of which there are many. The book is simply too detailed to summarize the main points, so I'll just share some tidbits that jumped out at me:

  • "Upstream Prerequisites" ("programmers are at the end of the food chain. The architect consumes the requirements, the designer consumes the architecture, and the coder consumes the design.") His point: "the overarching goal of preparation is risk reduction. By far the most common project risks in software development are poor requirements and poor project planning." (Note: I found the StackOverflow question Software design vs. software architecture answered terminology confusion I had about those terms: "I think we should use the following rule to determine when we talk about Design vs. Architecture: If the elements of a software picture you created can be mapped one to one to a programming language syntactical construction, then is Design, if not is Architecture.")
  • Programming IN a language (limit thoughts to constructs the language directly supports) vs. INTO a language (decide what thoughts you want to express then determine how to do so using the language's tools).
  • "Wicked problems" (like software design): can be clearly defined only by solving it or solving part of it.
  • "Design is a sloppy process": is about tradeoffs and priorities; involves restrictions; is nondeterministic; is a heuristic process; is emergent.
  • Software's Primary Technical Imperative: the importance of managing complexity by dividing a system into subsystems. "The goal of all software-design techniques is to break a complicated problem into simple pieces. The more independent they are the better."
  • For the sake of controlling complexity, you should maintain a heavy bias against inheritance (and prefer containment): Inherit when you want the base class to control your interface. Contain when you want to control your interface.
  • Aim for loose coupling (the manner and degree of interdependence between software modules) and strong cohesion (the degree to which the elements of a module belong together).
  • Routine length (he generalizes procedures, functions, and methods as "routines"): Allow them to grow organically up to 100-200 lines (excluding comments & blank lines). (My routines tend to be smaller - I'll have to give this more thought.)
  • Routine parameters: pass <= ~7 of them. More implies too much coupling.
  • Design Practices - Capturing Your Design Work: code documentation, wiki, email summaries, camera (whiteboard) vs. a drawing tool, flip charts, CRC cards (I love 'em), UML diagrams.
  • The Pseudocode Programming Process: It was helpful to see this named and formalized. "Once the pseudocode is written, you build the code around it and the pseudocode turns into programming-language comments. This eliminates most commenting effort. If the pseudocode follows the guidelines, the comments will be complete and meaningful." Cool!
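
To illustrate the inheritance-versus-containment tidbit above, here is a small Python sketch. It is not an example from the book; the stack classes are hypothetical and chosen only because the difference in exposed interface is easy to see.

```python
class InheritedStack(list):
    """Inheritance: the base class (list) controls the interface.

    Callers get push(), but also insert(), sort(), and everything else a list
    offers, so nothing stops them from breaking the stack discipline.
    """

    def push(self, item):
        self.append(item)


class ContainedStack:
    """Containment: this class controls its own interface.

    The list is a private detail, and only stack operations are exposed.
    """

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)
```

The contained version has the smaller surface area, which is one way to read the advice about controlling complexity.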

And many more. Bottom line: If you are serious and committed about improving your craft, read the book! How about you? What coding books have helped you be a better programmer?

[1] There were two reasons my first two attempts to read this hefty 900-pager failed. 1) I wasn't fully committed, and 2) I didn't have a concrete plan to read it. I resolved the former when I realized I'd been focusing on my work projects to the exclusion of professional development [3] (other than reading a handful of blog posts a week) and needed to shake things up. The second block was simpler to address - break the book up into manageable chunks and commit to reading one every day. I summarized this generally in Reading Books The GTD Way, but in this case I did a back-of-the-napkin analysis, working backward from an estimated chapter velocity:

900 pages, 35 chapters

  • 1 chapter/hr -> 35 hours
  • 1 hour/workday, 3 workdays/week -> 3 hours/week
  • -> ~12 weeks (~2.8 months)

I've been recording my minutes/page progress, which ranges between 1 and 2 1/2. With an average chapter being about 30 pages, this works out to be about 60 minutes/chapter. Good estimate! (Actually, I'm reading more chapters than three per week, and at my current rate I will be done at the end of April.)
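
For anyone who wants to redo this estimate for another book, the arithmetic is easy to script. The sketch below uses only the numbers given above; the function names are made up for illustration, and rounding up to whole weeks and using 52/12 weeks per month are assumptions.

```python
import math


def reading_plan(chapters, hours_per_chapter, hours_per_week):
    """Return (total hours, whole weeks, approximate months) for a reading plan."""
    total_hours = chapters * hours_per_chapter
    weeks = math.ceil(total_hours / hours_per_week)  # round up to whole weeks
    months = round(weeks / (52 / 12), 1)             # about 4.33 weeks per month
    return total_hours, weeks, months


def minutes_per_chapter(minutes_per_page, pages_per_chapter=30):
    """Convert a measured reading speed into time per (roughly 30-page) chapter."""
    return minutes_per_page * pages_per_chapter


# 35 chapters at 1 chapter/hr, reading 1 hour/workday on 3 workdays/week.
print(reading_plan(chapters=35, hours_per_chapter=1, hours_per_week=3))
# The measured range of 1 to 2.5 minutes/page, for a 30-page chapter.
print(minutes_per_chapter(1), minutes_per_chapter(2.5))
```

At the measured 1 to 2.5 minutes/page, a 30-page chapter takes between 30 and 75 minutes, which brackets the one-hour-per-chapter assumption.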

[2] A helpful review was Matt Grover's blog: Code Complete Review, which pointed me to Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin (Amazon link here). That book looks excellent too, though it apparently has less detail, weighing in at a svelte 500 pages.

[3] I was surprised to find few articles on the Web on the importance of professional development for software engineers, at least via a quick Google search. The main one was 10 Professional-Development Tips for Programmers. It is structured annoyingly as a slide show, meaning you have to click through to see the content (an SEO and advertising gambit?), so I've collected the titles for you:

  • Staying Current Requires Continuous Learning
  • Problem-Solving Skills
  • Communication and People Skills
  • Networking and Personal Branding
  • Code Documentation and Neatness
  • Master Naming Functions
  • Get Familiar With Agile
  • Get Familiar With a Native Mobile Platform
  • Project Management Skills
  • JavaScript, CSS and HTML5 Skills

Searching the two programming-specific sites I like (Hacker News and reddit.com/r/programming - what are your favorites, BTW?) was more fruitful. Here are a few: