.:aaron.helton:.

Google hasn’t mapped my thoughts just yet, so don’t get lost

PMF RSS Redux

with one comment

When I wrote the original RSS feed for the Projected Position System a few days ago, I was not satisfied with it. It had a number of limitations that I suggested could only be fixed if I did the parsing myself. Well, after a good deal of time figuring out how to do just that, I have created a new RSS feed that works better. I still use Yahoo! Pipes to serve it up, but the data now comes from a parser running on my home server (incidentally, I would love to have a mirror for this if anyone is interested).

Enjoy: http://pipes.yahoo.com/aaronhelton/pmfrss

Oh, and for anyone who is interested in my parser, read on.

Parsing the PMF’s PPS Site

For things like this, I use Ruby more often than not. It is very powerful and yet concise, plus it has a ton of third-party libraries (gems) that make a number of tasks easier.

I had tried screen-scraping the PPS before, but had met with errors due to some badly written HTML. In the header HTML, for example, there was a closing STYLE tag with no opening STYLE tag. Further down I found an illegal SPAN tag within a TABLE element. These two issues caused me a great deal of grief. The first meant that very strict parsers were unable to process the document; the second made finding the desired elements very difficult (as you will see below).

After some effort getting the right set of parsing tools for the job, I settled on Hpricot, since I really didn’t need to interact with the page in any meaningful way.  My basic search looks like this (including the beginning of the file):

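require 'rubygems'
require 'open-uri'
require 'hpricot'

# The full, unfiltered job results page
url = "https://www.pmf.opm.gov/JobSearch/results.aspx"

# Will hold the cleaned-up job listings
jobList = Array.new

# Fetch and parse the page, then remove the lblJobsList SPAN that gets in the way
doc = Hpricot(open(url))
doc.search("//span[@id='lblJobsList']").remove

# Find the small-font cells, climb three levels up to the table, and take its rows
list = doc.search("//font[@SIZE='-2']../../../tr")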

That snippet is sufficient to open the PPS page, sort through the HTML, and return the section of the document that includes the job listings. From there it was a matter of limiting the output (20 rows), cleaning up the data, and grabbing the elements that would appear in the RSS feed. I have it set to run every hour from my home machine. With any luck, it will run there for a while, but with even MORE luck, PMF will obviate this with a feed of their own.
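For the curious, the "limit, clean up, and emit" step could look roughly like the sketch below. This is not the code from my script, just a minimal illustration: the Job struct, the xml_escape and build_rss helpers, and the channel title and description are all made up for the example, and it builds the XML by hand so it needs nothing beyond plain Ruby.

```ruby
# Hypothetical row type; in the real script the values would come from the parsed table.
Job = Struct.new(:title, :agency, :link, :posted)

# Characters that must be escaped inside XML text.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Build an RSS 2.0 document from at most `limit` jobs (20, as in the post).
def build_rss(jobs, limit = 20)
  items = jobs.first(limit).map do |job|
    <<~ITEM
      <item>
        <title>#{xml_escape(job.title)}</title>
        <link>#{xml_escape(job.link)}</link>
        <description>#{xml_escape(job.agency)}</description>
        <pubDate>#{job.posted.strftime('%a, %d %b %Y %H:%M:%S %z')}</pubDate>
      </item>
    ITEM
  end

  <<~RSS
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
    <channel>
    <title>PMF Projected Positions (unofficial)</title>
    <link>https://www.pmf.opm.gov/JobSearch/results.aspx</link>
    <description>Recent PPS postings</description>
    #{items.join}</channel>
    </rss>
  RSS
end
```

Writing the result of build_rss to a file that the web server exposes would be enough for Pipes (or any reader) to pick it up.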

If you want the code for yourself: http://heltons.mooo.com/pmf/pmf.rb.txt

Written by aaronhelton

May 4, 2009 at 7:35 pm

Posted in development, pmf, ruby

Adding RSS to the PMF PPS

with one comment

[Update: I had to revise the feed links below since I could not rename the improved Pipes feed.]

I decided to take matters into my own hands and drag some portion of the PMF program kicking and screaming into the 21st Century: I made an RSS feed for the Projected Positions System.  OK, so that’s a link to the Yahoo! Pipes application, but it does have its own feed.

What follows is my reasoning and methodology, the feed’s limitations (due to technical constraints), and a how-to on RSS for those who aren’t familiar.

Reasoning

I puzzled over (well, ranted about) what I considered missing functionality in my recent PMF Thoughts posting: the lack of RSS for the Projected Position System. It seems that the federal government is only now getting the idea when it comes to providing data that can be consumed in a variety of ways, and the PPS is far behind any modern technology for job listings. It is cumbersome to have to visit the PMF web site and go through their search options just to get a list of newly posted jobs; this is even more tedious when you are actually looking at specific agencies, but my tool doesn’t really address this directly.

What I set out to create was a simple RSS feed of the jobs that had been posted in the PPS to date.  The nature of RSS is that new postings show up at the top of the list in whatever program you have that reads the items (see the quick RSS primer at the end of this post if you haven’t used feeds before), and it updates the list by periodically polling for changes.  That means you no longer have to visit the site just to see if something new is there; you just point a feed reader at it and let it do the checking for you.

Methodology

My first impulse back in mid-March was to build a screen scraper that would load the positions page and parse out all the listings.  Of course, there is nothing technically limiting me from taking this approach (time notwithstanding), but my initial attempts did meet with some technology hurdles that I never really felt motivated to overcome.  So I let the project sit around a while, until someone mentioned Yahoo! Pipes.

I had come across Yahoo! Pipes shortly after its initial launch in 2007, but I just regarded it as a plaything that didn’t look very useful.  When I looked at it again yesterday, however, I immediately thought of what I could try with it.  Pipes has a function that lets you parse regular web pages for content, then use that to build RSS (or other output type) feeds.  I tried to use it, but it does not include a way to manipulate page elements (like clicking buttons and submitting forms).

Then I tried something: by going to the plain old job results page directly, bypassing the search filters, I got the ENTIRE listing.  But I still ran into a problem.  The Pipes module that fetches an HTML page is limited to 200KB, and the results page is over 400KB.  Nothing I tried could knock that down in a way that Pipes would accept, and so for a bit I believed the whole thing would be impossible.

I don’t give up easily though. Some searching around turned up another screen-to-data service that ended up working: Dapper. Dapper allowed me to load the entire page, then select the elements I was looking to include in my feed, name each one, and pass it out into a data format of my choice. Right up front, though, I noticed some limitations (all of these are detailed below), such as feed items that consisted entirely of part of an agency name, some jumbled and munged links, and the like.

I wasn’t entirely satisfied with the output from Dapper, but I knew I could use Dapper’s output as input to Pipes, so I went back to work.  Initially, I chose CSV for Dapper’s output, thinking that CSV would be easy for Pipes to work with.  When that proved fruitless, I went home and thought about it some more, hopped on this morning, and decided to try RSS for the Dapper output.  Bingo!

With RSS out from Dapper, I could use Pipes to parse the feed and Filter certain items, such as those with no post dates (that is, I wanted to get rid of the orphaned items that included only the pre-linefeed portions of the agency names).  I did this by looking to see if the feed’s posted date included either AM or PM, since all of the posted positions included a time.  That seemed to work, and what we have is the output you see if you follow the above link.
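Pipes does this with a graphical Filter module, but the same idea is easy to express in Ruby. The sketch below is only an illustration under my own assumptions: items are plain hashes, the :posted key holds the raw date text, and the method names are invented for the example.

```ruby
# Real postings carry a time of day, so their date text contains AM or PM.
def posted_with_time?(item)
  item[:posted].to_s.match?(/\b(AM|PM)\b/)
end

# Keep only items that look like real postings; orphaned fragments
# (agency-name pieces with no usable date) are dropped.
def drop_orphans(items)
  items.select { |item| posted_with_time?(item) }
end
```

An item whose date field is empty or missing fails the check and is filtered out, which is the behavior I wanted for the orphaned agency-name fragments.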

Limitations

The feed is not without limitations. I worked around as many as I could, but short of writing a custom script to parse the page, there’s really no way I can fix the ones that remain.

  1. Missing Agency Names: Some of the agency names were cut off due to the presence of newline or linefeed characters in between the agency and sub-agency names.  So, for instance, “Department of the Interior [line break] Bureau of Land Management” shows up in the feed as “Bureau of Land Management.”  In most cases, I think this is a minor issue.
  2. Bad Job Title Links: In cases where the feed title contains multiple job title listings, the feed item link does not work correctly.  This is a result of (3), where some positions were merged together in the parsing (it could be missing or improperly-terminated HTML elements for all I know).  The links in the body of these feed items do work, however.
  3. Merged Job Postings: Again, this could have a number of causes, but without writing my own parsing script, I can’t fix it.  One of the side effects of this, incidentally, is that a few of the job postings have been lumped together under the wrong agency.  I would consider this the most severe of the three issues, but any confusion can be cleared up by following that job title’s link back to the PPS.
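A custom parser could sidestep the first issue by joining the pieces of an agency name before building the feed item. Here is a minimal sketch of that idea; the method name and the ", " separator are my own choices for illustration, not anything Dapper or Pipes provides.

```ruby
# Collapse an agency name that was split across lines into a single line.
# Blank fragments and stray whitespace around each piece are discarded.
def join_agency_lines(raw, separator = ', ')
  raw.to_s.split(/[\r\n]+/).map(&:strip).reject(&:empty?).join(separator)
end
```

With this, "Department of the Interior" followed by a line break and "Bureau of Land Management" would come out as a single agency string instead of losing its first half.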

RSS Primer

In case you aren’t familiar with RSS, let me provide a bit of info.  RSS is a syndication format that pipes machine-readable data around the Web.  What it’s really good for (so far) is providing lists of updates to frequently updated sites, such as news sites, blogs, and the like.  RSS is an XML format that is readable by programs known as feed readers.  There are many readers available, but my favorite is Netvibes (technically Netvibes is not a reader all by itself, it just contains a way to gather feeds into separate feed-reading widgets).  You can also use iGoogle if you are so inclined.  Anyway, the beauty of RSS is that the feed reader tells you when you have something new, usually by making unread items appear in bold face; it knows the items are new because all items have a posted date that should correspond with the actual publication date of the items in question.
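To make the "how does it know what is new" part concrete, here is a rough sketch of the comparison a reader might perform. It is an illustration only: real feed readers differ (many also track item IDs), and the hash layout and the new_items name are assumptions of mine.

```ruby
# Return the items published after the reader last checked, newest first.
def new_items(items, last_checked)
  items.select { |item| item[:published] > last_checked }
       .sort_by { |item| item[:published] }
       .reverse
end
```

Anything returned by new_items is what the reader would show in bold as unread.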

Each feed reader has its own method of adding feeds to check, so you would have to try them out to understand the steps for that particular reader.  I haven’t found RSS very difficult once you have a feed address.

Written by aaronhelton

May 1, 2009 at 5:02 pm

Posted in pipes, pmf, technology

To Thomas: Your Third Birthday


First off, yes, this is late.  Your birthday was on Friday, April 24th, and while we did not get to observe it on that day with a great deal of fanfare, you may rest easy knowing that we did have a celebration (several, actually) for you.  Your actual party was the weekend before your birthday, where you got to play on one of those big jumping balloons and eat cupcakes (plus you got to open your presents).  Some time before your birthday (in the same month at least), we bought you a bicycle.  I can see it will take you some time before you can figure out how to pedal it, but you have plenty of time to learn.

Your third year has been a joy to watch.  You are conversant in English, Spanish, and whatever you call that other set of vocalizations :) and you know most of your alphabet.  With some effort, we potty trained you this year, and you are mostly accident free.  For now, you still sleep with training pants, but that’s just fine.  All of this means that you are finding your way in the world, gaining your independence, and leaving behind your baby days.

Your attention span has greatly increased (naturally), and so you will sit longer for stories now.  You like to look at your books, when Daniel isn’t taking them away, and you have a few favorite toys that you latch on to, when you can keep them from your brother.  For a while, you were very attached to a toy backhoe, but you’ve alternated between various vehicles, including tractors and bikes.  You got a stuffed yellow ducky from Ms. Tina, and you seem to really like it as well.

Speaking of your brother, he does tend to pester you a lot.  When you were younger, you put up with it, but you have been asserting yourself more and more, often quite aggressively.  Most times, though, the two of you play well together, and we can tell there’s a brotherly bond between you.

There are changes coming over the next year, and they are big, but I am confident you will adjust to them quickly.

Always know that I love you, and everything that I do is for the betterment of you and your brother.

Written by aaronhelton

April 29, 2009 at 4:58 pm

Walk Score Ubiquity Command


I’ve found myself interested in the Walk Score results for all of the potential locations I am looking at online for when I relocate. Incidentally, I have been using Ubiquity, Mozilla’s handy extension that puts semantic-web-like capabilities in the hands of end users. One of the features I use the most is the “map” command, which takes your highlighted text as the input for a Google Maps query. This is great for just displaying a map, but I also want to check the Walk Score without having to open a new tab, browse to the Walk Score web site, then paste in the address. So I created my own Ubiquity command that does this for me. All I have to do now is highlight an address, bring up the Ubiquity interface (see the Ubiquity site for more info on that), and start typing “walkscore.” Once the preview area shows me the command to be executed and the selection it has, I press Enter, and I’m taken straight to the Walk Score page for that address.

Installation

If you haven’t already done so, you’ll need to install Ubiquity.  Once you do that, the walkscore command is available automatically from here, but you can also just copy and paste my code into your Ubiquity command editor.

Let’s see if this works:

CmdUtils.CreateCommand({
  name: "walkscore",
  homepage: "http://heltons.mooo.com/ubiq/",
  author: {name: "Aaron Helton", email: "mariusagricola@gmail.com"},
  license: "GPL",
  description: "For any address you can map in Google Maps, you can " +
    "also get your walk score",
  help: "Finds the Walk Score for any address you can find in Google " +
    "Maps.  Try using a full address for the narrowest and most accurate " +
    "result.",
  takes: {"input": /.*/},
  preview: function(pblock, input) {
    pblock.innerHTML = "Get the Walk Score for <b>" + input.text +
    "</b>.";
  },
  execute: function(input) {
    //displayMessage("Get the Walk Score for: " + input.text);
    var wsloc = input.text;
    // encodeURIComponent handles "+" and non-ASCII characters, unlike escape()
    var wsurl = "http://www.walkscore.com/get-score.php?street=" +
     encodeURIComponent(wsloc) + "&go=Go";
    Utils.openUrlInBrowser( wsurl );
  }
});

It’s short and sweet, but quite useful for me.  If you find it similarly useful, let me know.  Also, I’ll be happy to help work out any issues you find (especially if Walk Score changes their query structure).  Oh, and if the link to my home server doesn’t work, just get the code from this page.  I will most likely be relocating soon, and the server could be offline for a bit.

Written by aaronhelton

April 3, 2009 at 6:37 pm

PMF Program Thoughts, Part 1: Nomination to Job Fair

with 9 comments

Overview

The PMF program is designed to source the next generation of Federal Government managers from the nation’s top graduate students.  I have documented the steps from Nomination to Job Fair (which could include appointment).  Subsequent parts will look at appointment, background checks, rotational opportunities, and training, as well as (if I can get it) some compiled and anonymized feedback on some of the participating agencies.

Nomination Process

While the nomination process is quite familiar to schools with a history of participation in the PMF program, it was difficult to convey to a school that had no such history. I mentioned in an earlier posting [link] that finding someone to act as the nominating official was no easy task, in part because nobody at the school clearly understood the requirements that the school would need to meet in order to nominate anyone. In the future, I would like to see a better-organized set of information for schools to draw from, such that an interested student need not do all the footwork. Perhaps with my ultimate acceptance and placement into the PMF program, I can help St. Edward’s in particular to establish a more formal program for this. I am convinced that many schools (and students) do not participate in the program either because they are unaware of its existence, or because even if they have heard of it, they might assume that the only people the Federal Government would be interested in are those with a background in law, public policy and administration, or another closely related field. The fact that the vast majority of applicants have these backgrounds and come from schools that have strong programs in these fields seems to support this conclusion. Taking this to its (in my mind) natural conclusion, it makes sense that a number of finalists ultimately fail to find a position that interests them. I will discuss what I perceive to be the mechanics of the placement process in more detail in the Finalist Pool section. Interestingly, the Federal Government, or its primary recruiting partner, OPM, does partner with schools in meaningful ways via StudentJobs.gov. I don’t have data, of course, on the effectiveness of these partnerships in promoting Federal employment in universities, but even so, the effort appears to me to be a somewhat siloed approach. 
In any event, my own discovery of the PMF program was not through StudentJobs.gov, and I think there is a real opportunity for the Federal Government to further leverage the relationships it has developed so far with colleges and universities. Communication is the biggest factor here, in terms of making good information both easily accessible and easy to find. Whatever the reason, I do not see that this has been achieved so far, but that does not mean we can’t hope for better.

PMF Assessment

Quite some time after the nomination comes the assessment. Those who were nominated by the schools to participate in the program converge on a number of testing facilities to take a standardized exam. For me, it was a simple drive to a location in the same town where I am (was) living. The exam itself did not seem very difficult, and had three sections: one for scenario-based reasoning, one for open-ended personality questions, and one for editing and proofreading. The only meaningful way to prepare for this test is to use the sample questions provided on the PMF web site. What is interesting about the exam is that, subsequent to nomination, it is the only item used to score an applicant’s package and ultimately determine whether a nominee becomes a finalist. This means that the background of the test-taker is completely disregarded, with one important exception: veterans’ preference. A PMF program official at the job fair rattled off an estimate of the share of veterans who participate: 10%. In a standard distribution model, we could estimate that the percentage of veterans in the finalist pool matches the percentage in the nomination pool, except that it might not, since the preference adds points to a veteran’s score. Either way, the fact remains that for the majority of applicants, the only factor that determines a nominee’s finalist status is the assessment.

NORs and General Communication

NOR stands for Notice of Results, and it is the primary communication channel for the PMF program to notify applicants of both their nomination status and their finalist status (including job fair information). Beyond these two very specific uses, however, it conveys nothing else. Unverified anecdotal evidence exists of the mishandling of finalist NORs this year, as a number of people whose last names fall between P and Z reported not receiving their notifications when they should have gone out (and indeed the other name groups, mine included, received them very quickly). Combined with this is the dismal news page link presented on the PMF web site. Sure, there is a picture of a newspaper there, but it is off to the side of the other categorical links. It takes you to a page in which context is severely lacking. We have an excellent model for how this SHOULD look (you’re reading a perfect example). Even if the PMF crew found it necessary to completely replace the previous year’s news with a fresh set (I’m not convinced this is necessary; if you’re reading this, you’re web-savvy enough to know what date this was posted), the format should ideally be blog-like. Can I get some RSS? At a minimum, PMF nominees should be able to opt in to email notifications when such news is posted. Overall, I would give the PMF program a rating of 50% on the communication front. I can’t verify the P-Z issue beyond the anecdotes that have already been shared, but I can point to the other listed shortcomings in my assessment. Based on this, the PMF program is sitting at 1999 technology at best.

PPS

I’m sure much could be written about the woeful state of the Projected Position System, but I will try to summarize the shortcomings I see in it. Quite frankly, I think the PPS is the worst feature of the program and its site. First off, it lacks any way to search positions that have been posted. Even the most basic job board has this feature, so the fact that the PPS does not makes it harder to use than probably most finalists these days would be accustomed to. Second, the approach to sorting the listed positions is just awful. There is no reason in this modern age to separate the sorting options into four different pages that only offer the four ways of sorting. Finally, it would be nice to see some other basic details about the positions listed, such as the total openings for each posting, as well as information that only appears on the other views. You know what would be a vast improvement? An AJAX-ified sortable grid like what you get with the Rails plugin ActiveScaffold. It has the advantage of including a built-in free-text search, which solves all of the inadequacies I already listed. Further, because it’s just so cool ALL BY ITSELF, you don’t even have to dress it up. Come on, PMF program, enter the 21st Century sometime before we reach the 22nd. Oh, and can we get RSS on the postings? Pretty please??

Lest you think I’m all complaints here, I do think the PPS contains a wealth of information; it’s just not intuitively easy to sort through.

Finalist Pool and Competition for Positions

I admit I had done some research into the academic backgrounds of previous years’ finalists, and so I knew going into this that Information Technology was highly underrepresented as compared to Law, Public Administration, and the like (in fact, IT is lumped together in “Other,” just to provide some perspective). Nevertheless, I am going to stretch a bit and suggest that this program need not attract such a homogenous group necessarily. The current makeup of the finalists for this and previous years is evidence of both a great deal of experience within certain schools and academic programs with respect to their participation in PMF and of the natural inclination of those programs to lead to public service, of which PMF is but one possibility. What I would like to see is the targeting of schools that don’t produce Law and Public Administration graduates to try to bring up the participation of those in the “Other” category, IT included. What this homogeneity means for participating agencies and finalists is that some positions will receive the bulk of finalist applications, while others will scarcely draw attention or, if they do, that attention will only come from underrepresented groups. In short, the power dynamic is different depending on which group you are in. I hate to suggest that those with underrepresented but in-demand backgrounds can have their pick of positions, but it certainly doesn’t hurt to be in a field that everyone wants and few finalists have. For those in backgrounds that are well-represented, the finalists themselves often will have to compete with one another for what they perceive to be desirable positions. In all, this means that there could very well be a difference between the academic fields of finalists and the needs of the agencies recruiting. This alone, however, would not account for the 2/7ths of the finalists who have failed in the past to secure positions. 
I suspect there are a number of reasons why a finalist ultimately does not secure an appointment. First, they may withdraw at any point, which as of this writing has been the case for at least three or four finalists on the list for this year. Second, no suitable match might be available between the finalist’s background and the participating agencies. This could certainly be the case, for instance, for engineers, but it could also be the case for the JDs and Public Administration folks. Third, finalists may simply pursue other interests during the year in which they might have otherwise secured an appointment.

PMF Job Fair

And that brings us to the job fair. There is a fair amount of jockeying that goes on between the NOR and the actual job fair, in which finalists attempt to secure one of the precious few interview slots with the agencies they are interested in, and the agencies attempt to fill the available slots with finalists they are interested in. It’s a fine dance, I can imagine. The job fair itself is a whirlwind of activity, and I am certain it is stressful for those who did not secure interviews before arriving. I know it was stressful for me, and I had all but two of my 10 interviews arranged before arriving. Nevertheless, the job fair was fun and interesting, and despite the stress, I had a pretty good time meeting with other agencies and talking to them about whether we could come to any agreement. The biggest surprise for me was receiving an offer on the spot, in my first interview no less! My advice for any finalist who is unsure whether to attend? GO! It is worth the time to talk to agencies face to face, because you may think you have an idea what you want to do or whom you want to work for, but until you let them describe what it is they do, you might have no real idea what it is you would be getting into, or whether the program is ultimately right for you.

Conclusions

Overall, my experience in this program has been overwhelmingly positive. There were some issues that I pointed out, but I do not consider them a reason to abstain from the program. I would do this all over again, and I hope it’s an option when my kids grow up.

Written by aaronhelton

March 31, 2009 at 12:41 pm

Posted in pmf