Friday, January 30, 2015

Thoughts on Interviewing

I recently moved from San Francisco to Seattle, for personal reasons, and for a misguided attempt at getting involved in a research project. I am thankfully no longer involved, but that also means I've had to look for a job in a new city with relatively few contacts at the worst time of the year: mid-December. I could say a lot about an employer that would let go of an employee in such circumstances, but that's a different topic altogether.

What I've come to appreciate during this journey to find a new job is the difference a good interview process can make in finding and hiring a good candidate. This does not necessarily have anything to do with Seattle, because the companies I targeted included some relatively large companies based outside of Seattle.

For the record, I consider myself to be a good generalist engineer capable of working at a senior level in a technical leadership role. Hiring into a senior-level role is tricky for an employer: at that level a bad hire can have disproportionate negative effects.

Notes to the Employer

  • Know that you can't possibly screen a candidate sufficiently during the interview. You will necessarily have to balance speed and coverage.
  • Smaller employers have the option of being more agile in everything they do. They can take a chance on a hire who might not work out, and let that hire go if things go badly. They should use this to their advantage.
  • Be as clear as possible on what you want from the candidate. If you need a particular specialized skill, make that clear to candidates before the interview process commences. Job postings are notoriously poor at making this clear, as their quality and specificity vary greatly.
  • Be clear on how you will interview. Explicitly lay out the process up front, and try to follow it as best you can. Be clear on what you want to achieve from the interviews.
  • Focus the interview on the right level of detail. For a senior position, focus on the types of architectural issues that you have encountered. Use whiteboard coding sparingly; it is the least useful indicator of being able to think at the architectural level.
  • Especially for senior engineers, be aware that you're unlikely to find a candidate with the exact skills you require. Most companies appear to be pretty good about this, but there are still several that fall into this trap.
  • Keep the interview process as short as possible, and come to a decision as quickly as possible. A long interview process takes up employee time, and could very well leave the candidate with a bad taste.
  • Wrap up each interview session with a meeting with the hiring manager, or at least the involved recruiter. The candidate has spent a whole day being interviewed, and giving reasonable closure and a sense of the next steps is important.

On Take Home Problems

Take home problems, code portfolios, etc. are great ways of seeing how the candidate codes. But keep a few things in mind if you use a take home problem:

  • The evaluation of the solution to a problem should include a code review with the developer. Otherwise you're applying an arbitrary evaluation function on the outcome that requires a successful candidate to read your mind.
  • Keep the problem to a reasonable length. Try to limit it to four hours of work; candidates have their day job or other interviews to work on.
  • Focus the problem on key issues of interest. Consider providing them with a partial implementation. After all, being able to read others' code is just as important as writing something from scratch.
  • Be explicit on how the program will be evaluated. Saying "production quality" is rarely sufficient, since this can mean different things for different development circumstances.
  • Do the review sooner rather than later. The ability to develop software is an essential skill for an engineer, and a prompt review also tells candidates you care about the effort they have already put into the interview process.
  • If you do a take home problem, don't spend time on whiteboard coding. You will not learn anything useful from this.
  • Don't use algorithmically complex problems for evaluating coding ability. Evaluate algorithmic thinking separately. The two skills are quite distinct. An exception is if knowing certain classes of algorithms is a prerequisite for the position.

Notes for the Candidate

Preparing for an interview is tricky. I can't claim to be particularly good at this, but I can claim to have reflected sufficiently on my performance to point out obvious don'ts.

  • Be prepared to reflect on your performance on all your past projects. Do this for yourself, as honestly as possible, with particular attention to if and how you might have done better. I find that employers often want to take me back to my earliest projects.
  • Know what aspect of the job excites you the most. Explicitly discuss these particulars with the employer, and evaluate if you have correctly understood the job.
  • If you are not getting what you want, either in terms of compensation or responsibility, walk away. It is rarely the case that you won't find a more suitable situation in short order.
  • Interview with as many employers as you can in parallel, especially if this is your sole task. This is a great chance to dream about what you might possibly want to do; there might just be a situation out there which fulfills that wish.
  • Keep interviewing until you have an acceptable offer in hand. Sometimes even the most promising situation falls apart at the final stages. It is essential to have other situations available that you can continue working on.
  • At the same time, selling yourself every day is emotionally exhausting. Balance time spent looking for a job with other activities you find relaxing.
  • Managing your emotions is essential. There will be highs and lows in the process. Rejections are not negative assessments about your abilities and career choices.
  • Finally, keep a continual focus on your career growth. Keep an eye on the rest of the world and keep asking if there's something you'd rather be doing out there. Not to switch jobs on a whim, but to see if there's something fresh you can bring into your current job and your life and career.

Saturday, January 3, 2015

Graph Databases and Sharding

Lately I've become interested in graph databases. And over the past week I've been looking at graph database sharding.

tl;dr It is a difficult problem, one that appears to be more complex than RDBMS sharding. A few nice links on sharding relational databases:

Sharding graph databases is still an active research topic. Below I've included a couple of papers and a Stack Overflow discussion on Neo4J sharding:

In both relational and graph databases, the sharding strategy chosen must reflect the use case being addressed. For relational databases the most general solution involves implementing a "shared-nothing" strategy. We split the table that requires sharding into subsets that can be independently manipulated. We should be able to execute all join operations the application requires within a single shard. Sharding is, in other words, necessarily tied to the requirements of the application.
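To make the "joins stay inside a shard" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the customers/orders schema, the shard count, and the names `shard_for`, `add_customer`, `add_order` and `orders_with_names` are all made up, and plain in-memory dictionaries stand in for database servers. The point is only that both tables are placed by the same key, so the join never needs a row from another shard.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(customer_id: int) -> int:
    """Map a customer to a shard with a stable hash (crc32, not Python's salted hash())."""
    return zlib.crc32(str(customer_id).encode("utf-8")) % NUM_SHARDS

# Each shard holds its own slice of both tables and shares nothing with the others.
shards = [{"customers": {}, "orders": []} for _ in range(NUM_SHARDS)]

def add_customer(customer_id, name):
    shards[shard_for(customer_id)]["customers"][customer_id] = name

def add_order(order_id, customer_id, amount):
    # Orders are placed by customer_id, not order_id, so they land next to their customer row.
    shards[shard_for(customer_id)]["orders"].append((order_id, customer_id, amount))

def orders_with_names(shard):
    """Join orders to customers using only data held by this one shard."""
    return [(oid, shard["customers"][cid], amount)
            for oid, cid, amount in shard["orders"]]

for cid, name in [(1, "ada"), (2, "grace"), (3, "edsger")]:
    add_customer(cid, name)
for oid, cid, amount in [(100, 1, 25.0), (101, 2, 40.0), (102, 1, 5.0), (103, 3, 12.5)]:
    add_order(oid, cid, amount)

# The full result is just the concatenation of the per-shard joins.
joined = [row for shard in shards for row in orders_with_names(shard)]
```

Had orders been sharded by `order_id` instead, the local join would fail with a missing customer whenever the two hashes disagreed, which is exactly the cross-shard join we are trying to avoid.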

Graph databases should be approached in a similar manner to relational databases: avoid sharding if you can. And if sharding is unavoidable, the sharding strategy should be based on the requirements of the application. The most extreme situation arises with scale-free graphs, where a few vertices are much more highly connected than the rest of the vertices in the graph. Even here, unless we're doing graph analytics (map-reduce style computations over the whole graph), we should be able to formulate a sharding strategy that works for the given application.
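One rough way to compare candidate strategies is to count the edges that end up crossing shard boundaries (the edge cut). The sketch below uses a made-up eight-vertex graph and two hypothetical placement functions, `naive_shard` and `group_shard`; it is not how any particular graph database partitions data, just an illustration of why a placement that follows the application's natural grouping tends to cut fewer edges.

```python
# Toy undirected graph: two tightly knit groups joined by a single bridge edge.
EDGES = [
    (0, 1), (0, 2), (0, 3), (1, 2),  # group A: vertices 0-3
    (4, 5), (4, 6), (4, 7), (5, 6),  # group B: vertices 4-7
    (3, 4),                          # bridge between the groups
]

def naive_shard(v):
    """Application-agnostic placement: alternate vertices between two shards."""
    return v % 2

def group_shard(v):
    """Application-aware placement: keep each group on its own shard."""
    return v // 4

def edge_cut(edges, shard_of):
    """Count the edges whose endpoints live on different shards."""
    return sum(1 for u, v in edges if shard_of(u) != shard_of(v))

naive_cut = edge_cut(EDGES, naive_shard)
group_cut = edge_cut(EDGES, group_shard)
print(f"naive placement cuts {naive_cut} of {len(EDGES)} edges")
print(f"group placement cuts {group_cut} of {len(EDGES)} edges")
```

On this toy graph the naive placement cuts 7 of the 9 edges, while the group-aware placement cuts only the bridge.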

But one of the big draws of graph databases is the promise of efficiently traversing arbitrary paths through the graph. For instance one might be interested in the shortest path between two arbitrary vertices, where the vertices reside in different shards. A different type of problem might occur with graphs so large that all the edges for some of the vertices cannot be maintained on a single machine. This is admittedly far-fetched, given the compute power available today. In these instances we are necessarily executing computations distributed across shards, where network communication might become the limiting factor. There is substantial algorithmic complexity in deciding how to shard such databases. But then such computations are equivalent to executing arbitrary joins on a sharded relational database, which will necessarily spill over the shard boundaries.
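As a small illustration of how a traversal spills over shard boundaries, the sketch below runs a breadth-first search for the shortest path and counts how many adjacency lists it has to fetch from a shard other than the one where the search started. The graph, the `shard_of` placement, and the cost model (one remote fetch per expanded remote vertex) are all assumptions made up for this example; a real system would batch requests and behave differently.

```python
from collections import deque

# Toy undirected graph as adjacency lists; vertices 0-3 and 4-7 form two groups.
ADJ = {
    0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4],
    4: [3, 5, 6, 7], 5: [4, 6], 6: [4, 5], 7: [4],
}

def shard_of(v):
    """Hypothetical placement: vertices 0-3 on shard 0, vertices 4-7 on shard 1."""
    return v // 4

def shortest_path(src, dst):
    """BFS from src to dst. Returns (hops, remote_fetches), where remote_fetches
    counts expanded vertices whose adjacency list lives outside src's shard."""
    home = shard_of(src)
    remote_fetches = 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return dist[v], remote_fetches
        if shard_of(v) != home:
            remote_fetches += 1  # this adjacency list must come over the network
        for w in ADJ[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None, remote_fetches  # dst is unreachable from src

print(shortest_path(0, 2))  # both endpoints on the same shard
print(shortest_path(0, 7))  # endpoints on different shards
```

A query that stays within one shard needs no remote fetches, while the cross-shard query pays for every vertex it expands on the far side, even on a graph this small.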