This is one of my absolute favourite questions to ask during a face-to-face interview. Right now in your current role you’ll be working on a system (maybe more than one), and you’ve put hours of your life into it. You probably have a bunch of guys and girls you’ve worked with to put the system live and deliver some awesome value to someone. This should be the one thing that you know really, really well. Nonetheless, the number of candidates who get caught out by this question is incredible. You wouldn’t believe how much some people struggle with it.
Why are you getting asked this question?
For the interviewer this is a great way to gauge a candidate’s experience. Some developers just use the technologies put in front of them and neither understand nor care why. This should send up a big red flag. Being a Java developer is about much more than knowing the syntax. You need to understand why systems are put together the way they are and what the pros and cons are.
I expect this from candidates irrespective of their experience. It also doesn’t matter whether you were involved in architecting the system or not. You should be able to take a view on the decisions made, whether it’s positive or negative. Remember, the architecture of the system itself is not on trial; if you think it’s terrible then you can say as much. Very few people get to work on true greenfield projects and build their solutions up from the start, and in reality every system is going to have flaws, and that’s OK. The important thing is that you know the weaknesses in your system and you can explain how you would do things differently.
Take advantage of the opportunity
If you are lucky enough to get this question then seize the chance with both hands. More than any other part of an interview this gives you the opportunity to step up and show yourself as an excellent candidate. If you can crush it then you’re well on your way to a job offer.
Sit down with a pen and a piece of paper and draw your system out. If you think the system you’re working on currently isn’t interesting then use something else you’ve worked on. As a rule of thumb the interviewer shouldn’t care specifically about the project you’re working on right now; they just want you to talk them through a system.
Draw the relevant components. What languages are they written in and what technologies and frameworks are used? Do you agree with those choices or do you wish there was something different? Draw and label the communications between them. Does it use files? Bus? Sockets? Pigeons? Write it down.
Now explain exactly why each thing does what it does. Why was it chosen? It doesn’t matter whether you were involved in the decision, or if you agree with it. Explain the reasoning (or what you think the reasoning was), and outline whether this is a good or a bad thing. If you have a way you would prefer to do it, then say that as well. Most choices in reality boil down to some sort of non-functional requirement. “This is very latency sensitive so we chose this middleware to fit that requirement.” “The traders don’t really care so we made this a batch process as it’s easier to support.” Your job with this question is to show that you understand the requirements of each component and how they relate to the technology choices. Don’t forget that requirements are never purely functional.
How do you support the application? What is the mean time to recovery? Is the system global or local? What happens if your datacenter floods?
This interview is fictional, but should be representative of what you could expect in a real interview. Apply this sort of questioning to your own system.
Can you please tell me about your system?
Sure. It takes a bunch of transaction data from downstream systems and stores it. It performs some complex transformation on it and makes it available for end users and further upstream systems to use.
Ok, could you draw it for me please?
Sure. So first of all we have the importer. This slurps in the data from a bunch of our downstream systems which have information about orders and reporting in. It’s written in Java. It takes files once a day from 3 systems, and also has the option to manually import data using some web screens we threw together.
Why did you choose to take files and not have a continuous update or something else?
Legacy reasons. These systems were built many years ago and there’s a reluctance to invest any money in them as they basically work. This limits us as we can’t create things like customer reports in real time, only once a day. We are currently midway through a transformational project to allow real time flows; although some systems will never be upgraded, there are some brand new downstreams that need to move onto our system and want a real time interface. That’s the bulk of my current job. The target state is to have a real time flow of data so that the business can see the orders as soon as they’re put in. I’ve put an HTTP interface in there to allow intraday updates as a tactical fix. That won’t cope with the data loads we’re targeting, but we haven’t concluded how to handle those yet.
Ok, and is this system global?
Yes, we have an importer in New York and one in London. The source systems are slightly different in each region so we’ve had to tweak the code slightly.
Do you have 1 code base then or multiple?
We used to have multiple code bases. It was a nightmare patching fixes across them so we’ve recently merged into a single Git repo and we’ve added feature toggles based on each location. I really enjoyed learning Git, it’s much better than SVN, and it was nice to put feature toggles into practice.
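An aside for readers rather than part of the interview: if you mention feature toggles, be ready to sketch one. Below is a minimal version of location-based toggles, assuming a fixed table of features per region. Every name here (`Region`, `Feature`, `FeatureToggles` and the individual features) is hypothetical and invented for illustration; a real system would more likely load the table from configuration.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical names throughout; none of these come from a real code base.
enum Region { LONDON, NEW_YORK }

enum Feature { MANUAL_IMPORT_SCREENS, INTRADAY_HTTP_UPDATES, LEGACY_DATE_FORMAT }

final class FeatureToggles {
    // One code base, one table saying which features are on in which region.
    private static final Map<Region, Set<Feature>> ENABLED = Map.of(
        Region.LONDON,
        EnumSet.of(Feature.MANUAL_IMPORT_SCREENS, Feature.INTRADAY_HTTP_UPDATES),
        Region.NEW_YORK,
        EnumSet.of(Feature.MANUAL_IMPORT_SCREENS, Feature.LEGACY_DATE_FORMAT));

    private final Region region;

    FeatureToggles(Region region) {
        this.region = region;
    }

    // Region-specific behaviour is guarded by this check instead of a fork.
    boolean isEnabled(Feature feature) {
        return ENABLED.getOrDefault(region, EnumSet.noneOf(Feature.class))
                      .contains(feature);
    }
}
```

The importer would then wrap region-specific behaviour in a check such as `toggles.isEnabled(Feature.LEGACY_DATE_FORMAT)`, so a fix made once applies to both locations.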
And what about failover? What happens if something goes down?
To be honest it doesn’t happen very often, but until recently there was no failover. As it’s currently a batch system this has never been a problem, but as we’re trying to move to real time updates it means if we do get problems we get very angry users. As a result we’ve started running 2 instances. The second instance only exposes the real time features though and doesn’t touch the batches. This is because we’d have had to introduce complex failover mechanics to stop the batch running twice. It’s a cost/risk analysis; if something goes wrong with batch it’s OK if it’s delayed for a bit whilst we fix it, which isn’t the case with real time.
Ok, so what does the importer do then?
It takes the files and updates, turns them into messages and puts them on our global bus. We have a bunch of components that need this data so they hang off this bus. It’s using IBM MQ: not my choice, it’s mandated by the company.
What would you rather use? Why choose a bus?
I’d rather use ActiveMQ. It’s much easier for testing and I’ve had great experience with it when I’ve played with it before. The bus was my idea. It removes a lot of issues with regard to failover. Before, we had a massive web of point to point connections which were difficult to manage from a support perspective and meant that if a single component went down the system would collapse. Having a bus breaks the tight coupling between components and has seen our overall stability increase significantly; we used to have 1 outage a month last year but since introducing this we’ve had no full blackouts.
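An aside for readers: the decoupling argument is easier to make if you can show it. The sketch below is an in-memory stand-in for a bus, not the IBM MQ or ActiveMQ API. The class name `InMemoryBus` and the topic strings are assumptions made up for this example. It shows only the property the candidate describes: the publisher doesn’t know who is listening, and one broken consumer doesn’t stop the others.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory bus; a real deployment would use a message broker.
final class InMemoryBus {
    private final Map<String, List<Consumer<String>>> subscribers =
        new ConcurrentHashMap<>();

    // Components register interest in a topic, not in each other.
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>())
                   .add(handler);
    }

    // The publisher hands the message to every subscriber of the topic.
    void publish(String topic, String message) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            try {
                handler.accept(message);
            } catch (RuntimeException e) {
                // A failing consumer must not take down the publisher or its peers.
                System.err.println("consumer failed: " + e.getMessage());
            }
        }
    }
}
```

With point to point connections the importer would have to hold a connection to each consumer. Here it only calls `publish`, and adding a new consumer is one more `subscribe` call.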
Are there any issues using a bus?
It’s extra middleware which can be a nightmare to manage, particularly if you have a centralised team like we do. It also decreases the visibility of what’s going on. We had some problems with messages going missing and had to very quickly become experts at config tuning. I think overall it was a good choice.
Do you know any other Bus technologies?
There’s HornetQ and ZeroMQ. I’ve not had the chance to play with them but I hear HornetQ is really fast.
What format do you put on the wire?
We’re using XML. I wanted to use JSON but there was pushback from the upstream teams. JSON is much nicer: it’s more readable and much less verbose, so it’s smaller and quicker on the wire. However, the other teams are heavily invested in XML so we compromised.
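An aside for readers: a claim like “smaller on the wire” is more convincing with a quick measurement behind it. The sketch below encodes the same made-up order record both ways and counts UTF-8 bytes. The record, its field names and the class name `WireFormatSize` are all hypothetical, and the size difference will vary with the real schema.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical order record, written by hand in both formats for comparison.
final class WireFormatSize {
    static final String XML =
        "<order><id>42</id><symbol>VOD.L</symbol>"
        + "<quantity>100</quantity><side>BUY</side></order>";

    static final String JSON =
        "{\"id\":42,\"symbol\":\"VOD.L\",\"quantity\":100,\"side\":\"BUY\"}";

    // Size of the payload as it would travel over the wire in UTF-8.
    static int bytes(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Comparing `WireFormatSize.bytes(WireFormatSize.XML)` with `WireFormatSize.bytes(WireFormatSize.JSON)` shows the JSON form is smaller for this record, mostly because XML repeats every field name in a closing tag.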
Ok. What next?
We have a database that hangs off the bus and records all the data imported onto it. We need a historical record of data so this stores it for up to 7 years. It’s a basic MySQL instance.
Why did you choose to hang it on the bus? Why not have it attached to the importer? Couldn’t you have a case where the bus collapses and you’ll lose a message?
We make sure to log stuff as it comes through the importer so we can manually reconcile messages in a worst case scenario. If we had to do a DB transaction for every new record it would be really slow in the importer. We felt it was best to decouple this completely. It also made failover easier, as the importers just need to worry about who’s publishing to the bus, and don’t need to mess around with database connections.
Some of our upstreams consume this raw feed, but for others we have an enhanced feed. This transformer enriches the data with some information from a different database and puts it back onto the bus.
How quick is that?
Not very, it takes a few seconds. Right now that’s not a problem but as we move to real time it needs to be fixed. I’d like to look at some near caching technologies for it.
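An aside for readers: if you name a technique like near caching, expect to be asked what it means. The sketch below is a minimal version, assuming the enrichment data can be looked up by key and changes rarely. `NearCache` and its methods are hypothetical names, and this is not the API of any caching product. It has no eviction or expiry, which a real cache would need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical near cache: keeps looked-up values in the local process
// so repeat lookups skip the slow remote database call.
final class NearCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Function<K, V> slowLookup;

    NearCache(Function<K, V> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // The first call for a key pays for the slow lookup; later calls are served locally.
    V get(K key) {
        return local.computeIfAbsent(key, slowLookup);
    }

    // Drop a cached entry when the underlying data is known to have changed.
    void invalidate(K key) {
        local.remove(key);
    }
}
```

The trade-off to talk about is staleness: the transformer gets faster, but it needs some way of learning that the reference data changed so it can call `invalidate`.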
We also have a set of user screens that feed off the data. These talk directly to the transformer.
Why did you choose to directly connect it and not use the bus?
It’s another piece of legacy we haven’t gotten around to fixing. Ideally it would hang off the bus too but for now it has a direct connection.
What connection are you using between them?
Plain sockets. We’ve got an embedded Netty server in the transformer. I’m a big fan of Netty.
What about failover?
We run the transformers in London only, but we run them hot/warm. We have some basic heartbeating on the bus so they know which is alive. If the heartbeats stop then the secondary takes over.
Have you had any issues with this? What about split brain?
Yes, we’ve had split brain happen a couple of times. We were too aggressive with the heartbeat timeout so we’ve pushed that back. It’s an acceptable compromise; if the transformer gets delayed for a few seconds during failover it’s OK.
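An aside for readers: the heartbeat logic the candidate describes can be sketched in a few lines, which makes the timeout trade-off easier to explain. This is a minimal illustration, assuming the secondary records each heartbeat it sees and polls to decide whether to take over. `HeartbeatMonitor` and its methods are hypothetical names, and the clock is passed in so the behaviour can be tested without waiting.

```java
import java.util.function.LongSupplier;

// Hypothetical monitor run by the warm secondary.
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private long lastHeartbeatMillis;

    HeartbeatMonitor(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastHeartbeatMillis = clock.getAsLong();
    }

    // Called whenever a heartbeat from the primary arrives.
    synchronized void onHeartbeat() {
        lastHeartbeatMillis = clock.getAsLong();
    }

    // True once the primary has been silent for longer than the timeout.
    synchronized boolean shouldTakeOver() {
        return clock.getAsLong() - lastHeartbeatMillis > timeoutMillis;
    }
}
```

A short timeout makes failover fast but treats a slow primary as dead, which is how both instances can end up active. A longer timeout reduces that risk at the cost of a few seconds’ delay.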
Hopefully you get the idea. For every system you need to be able to explain:
- The technology choice
- The failover strategy
- The transport choice
You can also use this as an opportunity to show your knowledge of other technologies. In the interview there are references to Netty, ZeroMQ, HornetQ, IBM MQ, ActiveMQ, Git, feature toggles, heartbeating, XML vs JSON and MySQL. That’s a lot of technologies and design patterns. This is an effective way to show that you have a broad knowledge of your domain. It’s impossible to know everything, which is why when hiring it’s good to look for people who can introduce new ideas and technologies into the organisation. And if you don’t know any alternatives then spend the time to look on Google. Type in “Alternatives to” followed by the technology and you’ll have a ton of articles laying the arguments out for you.
Did you find this useful? What sort of questions have you been asked when getting grilled about your system? Let me know in the comments.