Archive for the ‘Questions’ Category

Is Polyglot Programming a good thing?

Is Polyglot Programming a good thing-

You have a lot of technologies on your CV in your last role. Java, Python, Scala, Node. What are your thoughts on Polyglot programming? Is it a good or a bad thing?

TL;DR: Right tool for the right job, but chances are the right tool is Java. Don’t choose a language because you want to try something new: it’s going to be a maintenance nightmare.

I’m not sure if you’ve noticed, but recently there seems to be an awful lot of noise about the death of Java and the rise of the JVM. Scala, Clojure, Groovy, Jython have all recently gone through being the language choice du jour (Although this great article shows how Java still rules the roost by a long way). More and more projects in these languages are appearing on the job boards and in projects at bigger and bigger companies.

But, is this a good thing or not?

As a massive nerd, I love trying new programming languages. New things are shiny. I like shiny. Last year when I started doing the “functional programming principals in Scala” course on coursera (it’s free!) run by Martin Odersky, the creator of Scala, I was convinced Scala was going to be the answer to all of my Java based problems. “Look ma, the syntax is so much more concise! And now I understand why immutable is so awesome!”. It’s this kind of hedonism which is why more and more projects are getting started in these languages. Bored programmers want to try new things out, and so a new language is brought into the company.

This is a terrible idea.

The sales pitch is “right tool for the job”. And that’s absolutely correct. Much like how in the early days of Java, if you needed really low latency you had to go to C++. It was the right tool for the job. There are absolutely valid cases for where functional programming is a better fit than object-oriented. But you need to be able to properly justify this up front with strong reasoning. The answer shouldn’t be “I fancied trying something new”. There’s a great post on Yegor256 called “It is not a school” which goes into depth about the fact that projects should hire people who already have the skills for the job:

However, if you make your projects spend their money on your education, that’s theft. And a good project manager should stop you, saying “This is not a school!”

This is a hard truth to swallow because I want to learn new stuff on the job, but the brutal fact is that it’s not in the best interests of the firm to have to spend time and money to teach me new skills.

Even if you already know the language in question it’s still probably a bad idea to use it. Your code will outlive your stay at an organisation. Irrelevant of language, people are going to curse your name when they have to maintain it, because as programmers it’s much easier to blame the person that came before us. Our jobs are hard enough when we have to take over Java code. Taking over an old codebase in a brand new language is going to be that much harder. If this is on a big project, then the organisation is going to have to hire people that specifically know the language. For now at least, these people are few and far between, and often expensive because of their niche.

As if that weren’t reason enough to think twice, you need to consider tooling. Java has been around for a really long time and has an incredible set of tools and libraries around it. IntelliJ is all powerful; the verbosity of Java is irrelevant thanks to Keyboard shortcuts and intellisense which make me immensely productive. Tooling like this doesn’t exist in a mature fashion in any other language (if you’ve ever tried to debug a gradle script you will know my pain). There’s also a lack of documentation and online support; it’s safe to say that most common Java problems have been answered on StackOverflow. Newer languages are improving, but they’re not there yet.

This isn’t to disparage polyglot programmers. I think being able to program in multiple languages is an absolute must and is something I look for on a CV. It shows a wider interest in programming and it’s a great way to change your mindset around your coding style. Spending the time to try out functional programming has absolutely changed my approach to Java and I write better code for it. I feel educated enough in the languages I know so that when the right use case comes up, I can and will use the right tool for the right job. But it won’t just be because I fancy giving Clojure a go (which I really do).

Now, I’m sure this is going to draw a lot of ire from the community. Which is why it’s important to remember that in the context of interviews, you need to present whatever your own opinion is but justify it in the context of the arguments above. It’s perfectly valid to argue that Java is a ridiculously verbose language and it’s not as suitable for a lot of tasks; highly parallelised data processing or complex multi threading could be better suited to Scala and Akka for example. You can argue the gains in using other languages outweigh the costs of introducing new languages to be maintained, particularly as developers we should be willing and able to retrain ourselves in those languages. Just make sure you can articulate your viewpoint and the arguments against it properly.

The Idiots Guide to Big O

idiots guide to big o notation


I hate big O notation. For as long as I can remember it’s been my biggest achilles heel (of which I have many). It’s just something I’ve never managed to successfully motivate myself to learn about despite knowing it’s going to come up in every single interview. I’ll get asked to implement an algorithm which I’ll do via brute force to start with: “What is the Big O of that?”. “I don’t know Big O but I know it’s slow”.

This is the wrong attitude and the wrong approach, and I have finally decided it is time to face my demons. All the resources I’ve found online I’ve sent me quickly to sleep, so I am determined to create an accessible Big O mastery page.

What on earth is Big O?

Big O is the way of measuring the efficiency of an algorithm and how well it scales based on the size of the dataset.  Imagine you have a list of 10 objects, and you want to sort them in order.  There’s a whole bunch of algorithms you can use to make that happen, but not all algorithms are built equal.  Some are quicker than others but more importantly the speed of an algorithm can vary depending on how many items it’s dealing with. Big O is a way of measuring how an algorithm scales.  Big O references how complex an algorithm is.

Big O is represented using something like O(n).  The O simply denoted we’re talking about big O and you can ignore it (at least for the purpose of the interview).  n is the thing the complexity is in relation to; for programming interview questions this is almost always the size of a collection. The complexity will increase or decrease in accordance with the size of the data store.

 






Download the PDF of The Idiots Guide To Big-O now for free!


Below is a list of the Big O complexities in order of how well they scale relative to the dataset.

O(1)/Constant Complexity: Constant.  This means irrelevant of the size of the data set the algorithm will always take a constant time. 1 item takes 1 second, 10 items takes 1 second, 100 items takes 1 second. It always takes the same amount of time.

O(log n)/Logarithmic Complexity: Not as good as constant, but still pretty good.  The time taken increases with the size of the data set, but not proportionately so. This means the algorithm takes longer per item on smaller datasets relative to larger ones.   1 item takes 1 second, 10 items takes 2 seconds, 100 items takes 3 seconds. If your dataset has 10 items, each item causes 0.2 seconds latency. If your dataset has 100, it only takes 0.03 seconds extra per item. This makes log n algorithms very scalable.

O(n)/Linear Complexity: The larger the data set, the time taken grows proportionately.  1 item takes 1 second, 10 items takes 10 seconds, 100 items takes 100 seconds.

O(n log n): A nice combination of the previous two.  Normally there’s 2 parts to the sort, the first loop is O(n), the second is O(log n), combining to form O(n log n) 1 item takes 2 seconds, 10 items takes 12 seconds, 100 items takes 103 seconds.

O(n^2)/Quadratic Complexity: Things are getting extra slow. 1 item takes 1 second, 10 items takes 100, 100 items takes 10000.  

O(2^n): Exponential Growth! The algorithm takes twice as long for every new element added.  1 item takes 1 second, 10 items takes 1024 seconds, 100 items takes 1267650600228229401496703205376 seconds.

It is important to notice that the above is not ordered by the best to worst complexity. There is no “best” algorithm, as it completely hinges on the size of the dataset and the task at hand.  It is also important to remember the code cost; a more complex algorithm may result in an incredibly quick sort, but if the code has become unmaintainble and difficult to debug is that the right thing to do?  It probably depends on your requirements.

There is also a variation in complexity within the above complexities.  Imagine an algorithm which loops through a list exactly two times.  This would be O(2n) complexity, as it’s going through your lists length (n) twice!

Why does this matter?

Simply put: an algorithm that works on a small dataset is not guaranteed to work well on a large dataset.  Just because something is lightening fast on your machine doesn’t mean that it’s going to work when you scale up to a serious dataset.  You need to understand exactly what your algorithm is doing, and what it’s big O complexity is, in order to choose the right solution.

There are three things we care about with algorithms: best case, worst case and expected case. In reality we only actually care about the latter two, as we’re a bunch of pessimists. If you ask an algorithm to sort a pre-sorted list it’s probably going to do it much faster than a completely jumbled list. Instead we want to know what the worst case (the absolutely maximum amount of steps the algorithm could take) and the expected case (the likely or average number of steps the algorithm could take).  Just to add to the fun, these can and often are different.

Examples

Hopefully you’re with me so far, but let’s dive into some example algorithms for sorting and searching.  The important thing is to be able to explain what complexity an algorithm is.  Interviewers love to get candidates to design algorithms and then ask what the complexity of it is.

O(1)

Irrelevant of the size, it will always return at constant speed. The javadoc for Queue states that it is  “constant time for the retrieval methods (peek, element, and size)”. It’s pretty clear why this is the case.  For peek, we are always returning the first element which we always have a reference to; it doesn’t matter how many elements follow it.  The size of the list is updated upon element addition/removal, and referencing this number is just one operation to access no matter what the size of the list is.

O(log n)

The classic example is a Binary search.  You’re a massive geek so you’ve obviously alphabetised your movie collection. To find your copy of “Back To The Future”, you first go to the middle of the list. You discover the middle film is “Meet The Fockers”, so you then head to the movie in between the start and this film.  You discover this is “Children of Men”. You repeat this again and you’ve found back to the future.   There’s a great interactive demo of binary search here.

Although adding more elements will increase the amount of time it takes to search, it doesn’t do so proportionally. Therefore it is O(log n).

O(n)

As discussed in my post on collections, LinkedLists are not so good (relatively speaking) when it comes to retrieval.  It actually has a Big O of O(n) as in the worst case, to find an element T which is the last element in the list, it is necessary to navigate the entire list of n elements to find T. As the number of elements increases so does the access time in proportion.

O(n log n)

The best example of O(n log n) is a merge sort. This is a divide and conquer algorithm. Imagine you have a list of integers. We divide the list in two again and again until we are left with with a number of lists with 1 item in: each of these lists is therefore sorted.We then merge each list with it’s neighbour (comparing the first elements of each every time). We repeat this with the new composite list until we have our sorted result.Merge-sort-example-300px(Image from wikipedia)

To explain why this is O(n log n) is a bit more complex.  In the above example of 8 numbers, we have 3 levels of sorting:

  • 4 list sorts when the list sizes are 2
  • 2 list sorts when the list sizes are 4
  • 1 list sort when the list size is 8

Now consider if I were to double the number of elements to 16: this would only require one more level of sorting. Hopefully your recognise this is a log n scale.

However, on each level of sorting a total of n operations takes place (look at the red boxes in the diagram above).  this results in (n * log n) operations, e.g. O(n log n).

O(n^2)

The Bubble Sort algorithm is everyone’s first algorithm in school, and interestingly it is quadratic complexity. If you need a reminder; we go through the list and compare each element with the one next to it, swapping the elements if they are out of order. At the end of the first iteration, we then start again from the beginning, with the caveat that we now know the last element is correct.

Bubble-sort-example-300px

Image from Wikipedia.

Imagine writing the code for this; it’s two loops of n iterations.

public int[] sort(int[] toSort){
        for (int i = 0; i < toSort.length -1; i++) {
            boolean swapped = false;
            for (int j = 0; j < toSort.length - 1 - i; j++) {
                if(toSort[j] > toSort[j+1]){
                    swapped = true;
                    int swap = toSort[j+1];
                    toSort[j + 1] = toSort[j];
                    toSort[j] = swap;
                }
            }
if(!swapped){
    break;
}
} return toSort; }

This is also a good example of best vs worst case. If the list to sort is already sorted, then it will only take n to sort; a single iteration will do.  However, in the worst case where we have to go through the list n times and each time looping another n – (first loop) which is slow.

You may notice that it’s technically less than n2 as the second loop decreases each time. This gets ignored because as the size of the data set increases this impact of this becomes more and more marginal and tends towards quadratic.

O(2^n)

Exponential growth! Any algorithm where adding another element dramatically increases the processing time. Take for example trying to find combinations; if I have  a list of 150 people and I would like to find every combination of groupings; everyone by themselves, all of the groups of 2 people, all of the groups of 3 people etc. Using a simple program which takes each person and loops through the combinations,  if I add one extra person then it’s going to increase the processing time exponentially. Every new element will double processing time.

In reality O(2^n) are in no way scalable.

How to figure out Big O in an interview

This is not an exhaustive list of Big O.  Much as you can O(n2), you can also have O(n^3) (imagine throwing an extra loop into your bubble sort for no apparent reason).  What the list on this page should allow you to do is have a stab in the dark at figuring out what the big O of an algorithm is.  If someone is asking you this during an interview they probably want to see how you try and figure it out. Break down the loops and processing.

  • Does it have to go through the entire list? There will be an n in there somewhere.
  •  Does the algorithms processing time increase at a slower rate than the size of the data set? Then there’s probably a log n in there.
  • Are there nested loops? You’re probably looking at n^2 or n^3.
  • Is access time constant irrelevant of the size of the dataset?? O(1)

Sample Questions

I have an array of the numbers 1 to 100 in a random number. One of the numbers is missing. Write an algorithm to figure out what the number is and what position is missing.

There are many variations of this question all of which are very popular.  To calculate the missing number we can simply add up all the numbers we do have, and subtract this from the expected answer of the sum of all numbers between 1 and 100.  To do this we have to iterate the list once. Whilst doing this we can also note which spot has the gap.

   public void findTheBlank(int[] theNumbers) {
        int sumOfAllNumbers = 0;
        int sumOfNumbersPresent = 0;
        int blankSpace = 0;

        for (int i = 0; i < theNumbers.length; i++) {
            sumOfAllNumbers += i + 1;
            sumOfNumbersPresent += theNumbers[i];
            if (theNumbers[i] == 0)
                blankSpace = i;
        }
        System.out.println("Missing number = " + (sumOfAllNumbers - sumOfNumbersPresent) + " at location " + blankSpace +" of the array");
    }

new BubbleSort().findTheBlank(new int[]{7,6,0,1,3,2,4});

//Missing number = 5 at location 2 of the array

Caveat: you can also calculate sumOfAllNumbers using (theNumbers.length+1) * (theNumbers.length) / 2.0). I would never remember that in an interview though.

What is the big O of your algo?

Our algorithm iterates through our list once, so it’s simply O(n).

 

 

JVM and Garbage Collection Interview Questions: The Beginners Guide

java garbage collection interview questions

java garbage collection interview questions

The Java Virtual Machine is the achilles heel of most developers and can cause even the most seasoned developers to be come unstuck.  The simple fact is that unless something is going wrong, we don’t normally care about it.  Maybe we tune it a little when the application goes live but after that it remains untouched until something goes wrong.  This makes it a very difficult subject to excel in during interviews.  Even worse, interviewers love to ask questions about it.  Everyone should have a basic knowledge of the JVM to be able to do their job but often people recruiting are looking for someone who knows how to fix a problem like a memory leak when it happens.

In this guide we take a ground up approach to the JVM and garbage collection so you can feel some level of confidence going into your big day.

Java Question: What is the JVM? Why is it a good thing? What is “write once, run anywhere”? Are there negatives?

JVM stands for Java Virtual Machine.  Java code is compiled down into an intermediary language called byte code. The Java Virtual Machine is then responsible for executing this byte code.  This is unlike languages such as C++ which are compiled directly to native code for a specific platform.

This is what gives Java its ‘Write once, run anywhere’ ability.  In a language which compiles directly to platform you would have to compile and test the application separately on every platform on which you wish it to run. There would likely me several issues with libraries, ensuring they are available on all of the platforms for example. Every new platform would require new compilation and new testing.  This is time consuming and expensive.

On the other hand a java program can be run on any system where a Java Virtual Machine is available.  The JVM acts as the intermediary layer and handles the OS specific details which means that as developers we shouldn’t need to worry about it. In reality there are still some kinks between different operating systems, but these are relatively small.  This makes it quicker and easier to develop, and means developers can write software on windows laptops that may be destined for other platforms. I only need to write my software once and it is available on a huge variety of platforms, from android to Solaris.

In theory, this is at the cost of speed.  The extra layer of the JVM means it is slower than direct-to-tin languages like C.  However java has been making a lot of progress in recent years, and given the many other benefits such as ease of use, it is being used more and more often for low latency applications.

The other benefit of the JVM is that any language that can compile down to byte code can run on it, not just java.  Languages like Groovy, Scala and Clojure are all JVM based languages.  This also means the languages can easily use libraries written in other languages. As a scala developer I can use java libraries in my applications as it all runs on the same platform.

The separation from the real hardware also means the code is sandboxed, limiting the amount of damage it can do to a host computer.  Security is a great benefit of the JVM.

There is another interesting facet to consider; not all JVMs are built equal.  There are a number of different implementations beyond the standard JVM implementation from Oracle.  JRockit is renowned for being an exceptionally quick JVM.  OpenJDK is an open source equivalent. There are tons of JVM implementations available.  Whilst this choice is ultimately a good thing, all of the JVMs may behave slightly differently.  A number of areas of the Java specification are left intentionally vague with regards to their implementation and each VM may do things differently.  This can result in a bug which only manifests in a certain VM in a certain platform.  These can be some of the hardest bugs to figure out.

From a developer perspective, the JVM offers a number of benefits, specifically around memory management and performance optimisation.

Java Interview Question: What is JIT?

JIT stands for “Just in Time”.  As discussed, the JVM executes bytecode.  However, if it determines a section of code is being run frequently it can optionally compile a section down to native code to increase the speed of execution. The smallest block that can be JIT compiled is a method.  By default, a piece of code needs to be excuted 1500 times for it to be JIT compiled although this is configurable.  This leads to the concept of “warming up” the JVM.  It will be at it’s most performant the longer it runs as these optimisations occur.  On the downside, JIT compilation is not free; it takes time and resource when it occurs.

Java Garbage Collection Interview Questions

Java Interview Question: What do we mean when we say memory is managed in Java? What is the Garbage Collector?

In languages like C the developer has direct access to memory.  The code literally references memory space addresses. This can be difficult and dangerous, and can result in damaging memory leaks.  In Java on the other hand all memory is managed.  As a programmer we deal exclusively in objects and primitives and have no concept of what is happening underneath with regards to memory and pointers.  Most importantly, Java has the concept of a garbage collector.  When objects are no longer needed the JVM will automatically identify and clear the memory space for us.

Java Interview Question: What are the benefits and negatives of the Garbage Collector?

On the positive side:

  • The developer can worry much less about memory management and concentrate on actual problem solving. Although memory leaks are still technically possible they are much less common.
  • The GC has a lot of smart algorithms for memory management which work automatically in the background. Contrary to popular belief, these can often be better at determining when best to perform GC than when collecting manually.

On the negative side

  • When a garbage collection occurs it has an effect on the application performance, notably slowing it down or stopping it.  In so called “Stop the world” garbage collections the rest of the application will freeze whilst this occurs. This is can be unacceptable depending on the application requirements, although GC tuning can minimise or even remove the impact.
  • Although it’s possible to do a lot of tuning with the garbage collector, you cannot specify when or how the application performs GC.

Java Interview Question: What is “Stop the World”?

When a GC happens it is necessary to completely pause the threads in an application whilst collection occurs. This is known as Stop The World.  For most applications long pauses are not acceptable.  As a result it is important to tune the garbage collector to minimise the impact of collections to be acceptable for the application.

Java Interview Question: How does Generational GC work? Why do we use generational GC? How is the Java Heap structured?

It is important to understand how the Java Heap works to be able to answer questions about GC.  All objects are stored on the Heap (as opposed to the Stack, where variables and methods are stored along with references to objects in the heap). Garbage Collection is the process of removing Objects which are no longer needed from the Heap and returning the space for general consumption. Almost all GCs are “generational”, where the Heap is divided into a number of sections, or generations.  This has proven significantly more optimal which is why almost all collectors use this pattern.

New Generation

Most applications have a high volume of short lived objects.  Analyzing all objects in an application during a GC would be slow and time consuming, so it therefore makes sense to separate the shortlived objects so that they can be quickly collected. As a result all new Objects are placed into the new generation. New gen is split up further:

  • Eden Space: all new objects are placed in here.  When it becomes full, a minor GC occurs.  all objects that are still referenced are then promoted to a survivor space
  • Survivor spaces: The implementation of survivor spaces varies based on the JVM but the premise is the same.  Each GC of the New Generation increments the age of objects in the survivor space.  When an object has survived a sufficient number of minor GCs (Defaults vary but normally start at 15) it will then be promoted to the Old Generation.  Some implementations use two survivor spaces, a From space and a To space. During each collection these will swap roles, with all promoted Eden objects and surviving objects move to the To space, leaving From empty.

A GC in the NewGen is known as a minor GC.

One of the benefits of using a New Generation is the reduction of the impact of fragmentation. When an Object is garbage collected, it leaves a gap in the memory where it was.  We can compact the remaining Objects (a stop-the-world scenario) or we can leave them and slot new Objects in.   By having a Generational GC we limit the amount that this happens in the Old Generation as it is generally more stable which is good for improving latencies by reducing stop the world.  However if we do not compact we may find Objects cannot just fit in the spaces inbetween, perhaps due to size concerns.  If this is the case then you will see objects failing to be promoted from New Generation.

Old Generation

Any objects that survive from survivor spaces in the New Generation are promoted to the Old Generation. The Old Generation is usually much larger than the New Generation.  When a GC occurs in old gen it is known as a full GC.  Full GCs are also stop-the-world and tend to take longer, which is why most JVM tuning occurs here.  There are a number of different Algorithms available for Garbage Collection, and it is possible to use different algorithms for new and old gen.

Serial GC

Designed when computers only had one CPU and stops the entire application whilst GC occurs.  It uses mark-sweep-compact. This means it goes through all of the objects and marks which objects are available for Garbage Collection, before clearing them out and then copying all of the objects into contiguous space (so therefore has no fragmentation).

Parallel GC

Similar to Serial, except that it uses multiple threads to perform the GC so should be faster.

Concurrent Mark Sweep

CMS GC minimises pauses by doing most of the GC related work concurrently with the processing of the application.  This minimises the amount of time when the application has to completely pause and so lends itself much better to applications which are sensitive to this.  CMS is a non compacting algorithm which can lead to fragmentation problems. The CMS collector actually uses Parallel GC for the young generation.

G1GC (garbage first garbage collector)

A concurrent parallel collector that is viewed as the long term replacement for CMS and does not suffer from the same fragmentation problems as CMS.

PermGen

The PermGen is where the JVM stores the metadata about classes.  It no longer exists in Java 8, having been replaced with metaspace. Generally the PermGen doesn’t require any tuning above ensuring it has enough space, although it is possible to have leaks if Classes are not being unloaded properly.

Java Interview Question: Which is better? Serial, Parallel or CMS?

It depends entirely on the application.  Each one is tailored to the requirements of the application. Serial is better if you’re on a single CPU, or in a scenario where there are more VMs running on the machine than CPUs.  Parallel is a throughput collector and really good if you have a lot of work to do but you’re ok with pauses.  CMS is the best of the three if you need consistent responsiveness with minimal pauses.

From the code

Java Interview Question: Can you tell the system to perform a garbage collection?

This is an interesting question.  The answer is both yes and no.  We can use the call “System.gc()” to suggest to the JVM to perform a garbage collection. However, there is no guarantee this will do anything. As a java developer, we don’t know for certain what JVM our code is being run in. The JVM spec makes no guarantees on what will happen when this method is called. There is even a startup flag, -XX:+DisableExplicitGC, which will stop this from doing anything.

It is considered bad practice to use System.gc().

Java Interview Question: What does finalize() do?

finalize() is a method on java.lang.Object so exists on all Objects.  The default implementation does nothing.  It is called by the garbage collector when it determines there are no more references to the object.  As a result there are no guarantees the code will ever be executed and so should not be used to execute actual functionality.  Instead it is used for clean up, such as file references.  It will never be called more than once on an object (by the JVM).

Tuning

Java Interview Question: What flags can I use to tune the JVM and GC?

There are textbooks available on tuning the JVM for optimal Garbage Collection.  Nonetheless it’s good to know a few for the purpose of interview.

-XX:-UseConcMarkSweepGC: Use the CMS collector for the old gen.

-XX:-UseParallelGC: Use Parallel GC for New Gen

-XX:-UseParallelOldGC: Use Parallel GC for Old and New Gen.

-XX:-HeapDumpOnOutOfMemoryError: Create a thread dump when the application runs out of memory. Very useful for diagnostics.

-XX:-PrintGCDetails: Log out details of Garbage Collection.

-Xms512m: Sets the initial heap size to 512m

-Xmx1024m: Sets the maximum heap size to 1024m

-XX:NewSize and -XX:MaxNewSize: Specifically set the default and max size of the New Generation

– XX:NewRatio=3: Set the size of the Young Generation as a ratio of the size of the Old Generation.

-XX:SurvivorRatio=10: Set the size of Eden space relative to the size of a survivor space.

Diagnosis

Whilst all of the questions above are very good to know to show you have a basic understanding of how the JVM works, one of the most standard questions during an interview is this: “Have you ever experience a memory leak? How did you diagnose it?”.  This is a difficult question to answer for most people as although they may have done it, chances are it was a long time ago and isn’t something you’ve done recently. The best way to prepare is to actually try and write an application with a memory leak and attempt to diagnosis it.  Below I have created a ridiculous example of a memory leak which will allow us to go step by step through the process of identifying the problem.  I strongly advise you download the code and follow through this process.  It is much more likely to be committed to your memory if you actually do this process.

import java.util.ArrayDeque;
import java.util.Deque;

public class Main {
    public static void main(String[] args) {
        TaskList taskList = new TaskList();
        final TaskCreator taskCreator = new TaskCreator(taskList);
        new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    taskCreator.createTask();
                }
            }
        }).start();
    }

    private static class TaskCreator {
        private TaskList taskList;

        public TaskCreator(TaskList taskList) {
            this.taskList = taskList;
        }

        public void createTask() {
            taskList.addTask(new Task());
        }
    }


    private static class TaskList {
        private Deque<Task> tasks = new ArrayDeque<Task>();

        public void addTask(Task task) {
            tasks.add(task);
            tasks.peek().execute();//Memory leak!
        }
    }

    private static class Task {
        private Object[] array = new Object[1000];

        public void execute() {
           //dostuff
        }
    }
}

In the above very contrived example, the application executes tasks put onto a Deque.  However when we run this we get an out of memory! What could it possibly be?

Memory Leak

To find out we need to use a profiler. A profiler allows us to look at exactly what is going on the VM.  There are a number of options available. VisualVM (https://visualvm.java.net/download.html) is free and allows basic profiling.  For a more complete tool suite there are a number of options but my personal favourite is Yourkit.  It has an amazing array of tools to help you with diagnosis and analysis. However the principles used are generally the same.

I started running my application locally, then fired up VisualVM and selected the process.  You can then watch exactly what’s going on in the heap, permgen etc.

visual vm

You can see on the heap (top right) the tell tail signs of a memory leak.  The application sawtooths, which is not a problem per se, but the memory is consistently going up and not returning to a base level. This smells like a memory leak.  But how can we tell what’s going on?  If we head over to the Sampler tab we can get a clear indication of what is sitting on our heap.

java heap snapshot

Those Object arrays look a bit odd. But how do we know if that’s the problem?  Visual VM allows us to take snapshots, like a photograph of the memory at that time.  The above screenshot is a snapshot from after the application had only been running for a little bit.  The next snapshot a couple of minutes later confirms this:

java heap snapshot 2

We can actually compare these directly by selecting both in the menu and selecting compare.

compare heap snapshots

compare items from heap snapshots

There’s definitely something funky going on with the array of objects.  How can we figure out the leak though? By using the profile tab.  If I go to profile, and in settings enable “record allocations stack traces”  we can then find out where the leak has come from.

visual vm profile

By now taking snapshot and showing allocation traces we can see where the object arrays are being instantiated.

memory leak source

Looks like there are thousands of Task objects holding references to Object arrays! But what is holding onto these Task items?

If we go back to the “Monitor” tab we can create a heap dump.  If we double click on the Object[] in the heap dump it will show us all instances in the application, and in the bottom right panel we can identify where the reference is.

memory leak object references

It looks like TaskList is the culprit!  If we take a look at the code we can see what the problem is.

tasks.peek().execute();

We’re never clearing the reference after we’ve finished with it! If we change this to use poll() then the memory leak is fixed.

 

Whilst clearly this is a very contrived example, going through the steps will refresh your memory for if you are asked to explain how you would identify memory leaks in an application.  Look for memory continuing to increase despite GCs happening, take memory snapshot and compare them to see which Objects may be candidates for not being released, and use a heap dump to analyze what is holding references to them.

 

 

Core Java Interview Questions Part II

Core Java Interview Questions

Core Java Interview QuestionsThe name of the game is core java interview questions and I will be your quiz master. Each week I will publish 10 new quick fire questions and answers. If you have any questions you’d like to suggest then drop me an emailat hello@corejavainterviewquestions.com.

1. What is a shutdown hook?

A shutdown hook allows you to start and run a Thread as the JVM is shutting down.

Runtime.getRuntime().addShutdownHook(new Thread() {

});

There are a number of reasons you may want to use one, such as to stop accepting new clients, or to correctly close up connections.  It cannot be relied on to execute, for example if there is a forced shutdown such as kill -9.

2. What rule are there around naming variables?

Variables must begin with a letter, $ (dollar sign) or _ (underscore). As a result a variable cannot begin with a number, however subsequent characters can include numbers.

3. Does Java support goto?

goto is a reserved keyword in java, however it is marked as not used.  As a result you cannot directly do goto statements in java.

4. What is the contract between hashcode and equals?

Two objects which are equal must have the same hashcode.  Two objects with the same hashcode may or may not be equal.

if a.equals(b), then a.hashCode() == b.hashCode().
if a.hashCode() == b.hashCode(), a may or may not equal() b

In practice this means when you override one you should override the other.

5. Can you override a static method?

No.  Static methods belong to the class itself, not to an object instance.

6. What do we mean when we say java supports covariant return type

When implementing/overriding a method, we can return a class which is a subtype of the original return type.  For example, the parent returns a Car, it is ok for the overriding class to return a Tesla.

public interface CarFactory {
    Car makeCar();
}
interface Car {
}

class TeslaFactory implements CarFactory{

    @Override
    public Tesla makeCar() { //Returns a Tesla, a subtype of Car, wheras the parent returns a car
        return null;
    }

    private class Tesla implements Car {
    }
}

This has been the case since Java 5.

7. What is the difference between final, finally and finalize?

final is a keyword that can be used on classes, methods or variables to denote that the value cannot be changed or overridden.  finally is excuted at the end of a try catch block and is guaranteed to execute under normal circumstances. finalize is a method called on an object when it is being garbage collected. There is no guarantee that it will be ever executed so it’s best not to rely on it.

8. What is the difference between HashMap and HashTable?

HashTable is syncronized and will not allow null values.  HashMap is not syncronized and will allow 1 null key and all null values.  HashTable is generally discouraged now, with Collections.synchronizedMap(Map) and ConcurrentHashMap as preferred alternatives.

9. What is the difference between Collections.synchronizedMap and ConcurrentHashMap?

ConcurrentHashMap is significantly more performant and allows parallel access for reads and writes from multiple threads.  However, it makes no guarantee that the iterator will or will not include updates made during the iteration.  There is therefore the potential for a Thread to have an out of date view.  Collections.synchronizedMap locks the entire map for reads and writes and is therefore unperformant but has guaranteed safety and data visibility.  ConcurrentHashMap also does not allow null keys or values.

10. What is a shallow copy vs. a deep copy?

A shallow copy will keep all of the same references as the original object.  For example, if a Person object has a reference to a Car, let’s call it Car A, and we do a shallow copy of the Person then the new Person will point to Car A. If we do a deep copy we will navigate the entire object graph from that Person and copy every other object. In this case we would make a copy of Car A to belong to our new person.

Java Threading Interview Questions: The Complete Guide

Java threading interview questions

As a developer the single best thing we can do for producing clean, understandable code that won’t fall over is to avoid putting any multi threading into your code base. The best programmers can break most problems down to be solved without complex threading. Poor programmers will throw threading in to fix a problem they don’t understand or to prematurely optimise. Threading questions are exceptionally popular for interviews so it’s important to be able to demonstrate a sound understanding of how threads work in Java, when and why we use them, and how to actually program them in a safe manner. In this article we will tackle threading from the ground up and cover what you need to know to be able to answer any questions that come up.

Introduction to Threading

Java Interview Questions: 

  • What is a Thread?
  • How do we use Threads in java?

Threads allow us to do more than one thing at once. Without Threads code is executed in a linear fashion, one command after another. By using threads we can perform multiple actions at the same time. This takes advantage of the fact that computers now have multiple processors as well as the ability to time slice tasks, so multiple threads can even be run on a single processors.

There are 2 main ways to create a new Thread in java.  Firstly, you can extend Java Thread and override the run method to execute your code.

public class SayHello extends Thread {

    public static void main(String[] args) {
        new SayHello().start();
    }

    @Override
    public void run() {
        for (int i = 0; i < 100; i++) {
            System.out.println("Hi there!");
        }
    }
}

Generally we avoid this. Java doesn’t have multiple inheritance so this option will limit your ability to extend anything else. More importantly, it’s just not a very pretty way of doing it. As good developers we aim to favour composition over inheritance.

Option two is much nicer, allowing us to implement the Runnable interface and pass this to a new Thread object. The interface has one method on it, run.

public class SayHelloRunner implements Runnable {
    public static void main(String[] args) {
        new Thread(new SayHelloRunner()).start();    
    }
    
    @Override
    public void run() {
        for (int i = 0; i < 100; i++) {
            System.out.println("Hi there!");
        }
    }
}

We pass the new Runnable to a Thread to start it. We always use start() to run the thread; using run() will run execute in the same thread as the caller.

Java Interview Question: How do we stop a Thread in java? What is the volatile keyword?

There is actually no API level way to stop a Thread reliably in java. The only guarantee is that when the code being executed in the thread completes, the thread will finish. It is therefore down to the developer to implement a way of telling a Thread to stop.

public class SayHelloRunner implements Runnable {
    private volatile boolean running = true;

    public void stopIt(){
        running = false;
    }

    @Override
    public void run() {
        while(running)
            System.out.println("Hi there!");
    }
}

As you can see in the above example the flag to check if the thread is running or not has the keyword volatile.

Java Interview Question: : What is the volatile keyword?

In modern processors a lot goes on to try and optimise memory and throughput such as caching. This is where regularly accessed variables are kept in a small amount of memory close to a specific processor to increase processing speed; it reduces the amount the processor has to go to disk to get information (which can be very slow). However, in a multithreaded environment this can be very dangerous as it means that two threads could see a different version of a variable if it has been updated in cache and not written back to memory; processor A may think int i = 2 whereas processor B thinks int i = 4. This is where volatile comes in. It tells Java that this variable could be accessed by multiple threads and therefore should not be cached, thus guaranteeing the value is correct when accessing. It also stops the compiler from trying to optimise the code by reordering it.

The downside is that there is a performance penalty on using volatile variables. There is a greater distance to travel to get access to the information as it cannot be stored in the cache. However volatile is usually a much better option speed wise than using a synchronize block as it will not cause threads to block and as such is much faster than other options for synchronisation.

 

Java Interview Question: : Explain the synchronized keyword. What does it do? Why do we need it?

Synchronized is a keyword in java that can be used as a block around a piece of code or as part of the method signature.

public class Bank {
    private int funds = 100;
    
    public void deposit(int money){
        synchronized (this){
            funds += money;
        }
    }
    
    public synchronized void withdraw(int money){
       if(funds > money) 
          funds -= money;
    }
}

Synchronized exists in Java to allow multiple threads which can both see an object to safely access it to ensure that the data is correct. The classic example is that of the bank account. Imagine if you remove the synchronisation from the above example. Thread one attempts to remove 100 from the account. At the exact same time, Thread two attempts to remove 100 from the account. For both threads when checking if there are sufficient funds the if statement returns true, resulting in them both withdrawing and a resultant balance of -100 which is not allowed. By using synchronized only a single thread can access the section of code in the synchronized block, which can help us to ensure correct state.

When a synchronized keyword is placed on a method such as on withdraw(int money), it has the same effect as wrapping all of a method’s code in synchronized(this) (which is the case in the deposit method in the example). If any Thread tries to execute any synchronized method on the object it will be blocked.

Java Interview Question:  What are the downsides of using synchronized? How can we avoid or mitigate them?

Locks are slow. Really slow. Particularly when using the synchronized keyword, it can have a huge effect on performance. The reason for this is explained wonderfully in a paper LMAX created on how they built Disruptor, a super low latency concurrency component. It is a brilliant read for anyone who wants to get into the fine details of threading and mechanical sympathy.

” Locks are incredibly expensive because they require arbitration when contended. This arbitration is achieved by a context switch to the operating system kernel which will suspend threads waiting on a lock until it is released. During such a context switch, as well as releasing control to the operating system which may decide to do other house-keeping tasks while it has control, execution context can lose previously cached data and instructions. This can have a serious performance impact on modern processors.”

You don’t need to learn the contents of the details of the quote; it should be sufficient to say that along with the impact of threads being blocked whilst they wait for a resource, there is an impact at an OS level which causes huge performance damage. In the same paper an experiment is carried out to see the impact on latency of different types of lock; without locking the process took 300ms, whereas with 2 threads contesting a lock it took 224,000ms. Getting threading right is hard but getting it wrong has huge consequences.

If you insist on using synchronized then minimising the size of the code block that is synchronized is a good start. The smaller this is will help to minimise impact. Even better, you can lock on specific objects, or use a lock object if using a primitive, so that the entire object does not need to be contended.

public class Bank {
    private int funds = 100;
    private Object fundLock;

    private UserDetails details = new UserDetails();

    public void deposit(int money){
        synchronized (fundLock){ //Lock Object needs to be acquired to update funds 
            funds += money;
        }
    }

    public void createUsername(){
        synchronized (details){ //Lock on the details object mean that the details object will block, but Bank will not.
            details.setName("Bob");
        }
        System.out.println("His name is Bob");
    }

}

Ideally though we want to avoid using synchronized where possible.

Java Interview Question: What other options do we have for creating Thread safe code?

As previously discussed, making a variable volatile is more performant and less complex. However Java has a number of classes that aid us with threading so that we can do it in an efficient way.

The Atomic Classes

Java Interview Question: What is CAS? Why is it good?

Java has a number of Atomic classes including AtomicInteger, AtomicDouble and AtomicBoolean. (see more here). These are completely thread safe and do not involve locking, but instead use Compare and Swap (CAS) operations. A CAS operation is an atomic one (it happens as one single operation) where during an update we check that the field value has not changed from when we decided to update it (it hasn’t been modified by another thread) and if so it sets the value to the new amount. This is exceptionally quick; CAS is actually an operation available on most processors and no blocking is required. It is still slower than single threaded (the LMAX experiment concluded it was 100x slower) but it is the best available away of ensuring thread safe access to values.

It is important to understand the difference between volatile and CAS. if you try and perform any operation that uses the original value it will render the volatile keyword useless. For example, i++ breaks down into i=i + 1. In the time between i being read (i + 1) and it being written (i = result) another thread can jump in between and update the value. Because the Atomic classes checks and writes values as an atomic operation, it is a much safer option.

Immutability

Immutable objects are brilliant. An immutable object is an object whose state cannot be changed after creation. By this definition all immutable objects are thread-safe. Programming to use im- mutable objects can be difficult and is a paradigm shift from standard object oriented programming. Functional languages like Scala make heavy use of immutable objects which allows them to scale well and be highly concurrent. The more we can rely on immutable objects the less threading we will need.

Thread methods

Java Interview Questions

  • What does yield() do?
  • What does interrupt() do?
  • What does join() do?

The Thread class has a whole bunch of methods on it which you need to know for your interview. Interviewers love testing people on their ability to remember the details around these methods. It is best to read up on them before an interview.

Yield gives up the processor. Imagine you have a single CPU only. If a thread refuses to allow other threads CPU time they will remain permanently blocked. By calling yield() the thread is generously saying “who else wants a turn?”. In reality it is unreliable and isn’t normally used. The implementation can be different on any system or JVM, and can often be different between Java versions. There is no guarantee of what other thread will get the next access, nor when your thread will next get an opportunity to run.

You may have noticed that a number of Thread related methods (most commonly Thread.sleep()) force you to catch an InterruptedException. If we need to suggest to a Thread it needs to stop for whatever reason we can do this by calling interrupt() on it. This sets the “interrupted” flag on the target thread. If the target thread chooses to check this flag (using the interrupted() or isInterrupted()) then it can optionally throw an Exception (usually Interrupted exception). It is predicated on the target Thread actually checking this. However there are a number of methods, such as Thread.sleep() which automatically poll for this flag. If the target thread is running one of these then an InterruptedException will be thrown straight away.

join() will simply tell the currently running thread to wait until completion of whichever thread join() is called on completes.

public class SayHelloRunner implements Runnable {

    public static void main(String[] args) throws InterruptedException {
        Thread thread = new Thread(new SayHelloRunner());
        thread.start();
        thread.join();
        System.exit(3);
    }

    @Override
    public void run() {
        int i = 0;
        while(i < 10000){
            System.out.println(i);
            i++;
        }
    }
}

In this example if we did not have the thread.join() the application will exit without printing anything out. The join() call effectively makes the call synchronous; it wants the Thread to finish before proceeding.

Object methods

Java Interview Questions:

  • What do wait(), notify() and notifyAll() do?
  • Why must they be called from a synchronised block?
  • What alternatives are there to these?

wait(), notify() and notifyAll()  are used as a means of inter-thread communication. When acquiring a lock on an object it may not be in the required state; perhaps a resource isn’t set yet, a value isn’t correct. We can use wait() to put the thread to sleep until something changes. When that something does change, the awaiting clients can be notified by calling notify() or notifyAll() on the object that is being waited for. If all of your waiting threads could in theory take action with the new information then use notifyAll(). However if there is a new lock to be acquired (so only one waiting Thread can take action), then call just notify().

Example:

public class Example {

    public static void main(String[] args) {

        ResourceCarrier carrier = new ResourceCarrier();
        ThingNeedingResource thingNeedingResource = new ThingNeedingResource(carrier);
        ThingNeedingResource thingNeedingResource2 = new ThingNeedingResource(carrier);
        ThingNeedingResource thingNeedingResource3 = new ThingNeedingResource(carrier);
        ResourceCreator resourceCreator = new ResourceCreator(carrier);

        new Thread(thingNeedingResource).start();
        new Thread(thingNeedingResource2).start();
        new Thread(thingNeedingResource3).start();
        new Thread(resourceCreator).start();
    }

}public class ResourceCarrier {
    private boolean resourceReady;



    public boolean isResourceReady() {
        return resourceReady;
    }

    public void resourceIsReady() {
        resourceReady = true;

    }
}
public class ResourceCreator implements Runnable {
    private ResourceCarrier carrier;

    public ResourceCreator(ResourceCarrier carrier) {

        this.carrier = carrier;
    }

    @Override
    public void run() {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        synchronized (carrier) {
            carrier.resourceIsReady();
            carrier.notifyAll();
        }
    }
}
package com.core.interview.thread;


public class ThingNeedingResource implements Runnable {

    private ResourceCarrier carrier;

    public ThingNeedingResource(ResourceCarrier carrier){

        this.carrier = carrier;
    }
    @Override
    public void run() {
        synchronized (carrier){
            while(!carrier.isResourceReady()){
                try {
                    System.out.println("Waiting for Resource");
                    carrier.wait();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            System.out.println("haz resource");
        }
    }
}

Sample output:
Waiting for Resource
Waiting for Resource
Waiting for Resource
haz resource
haz resource
haz resource

In this example we have a wrapper around a resource called “Resource Carrier” and 3 threads that want access to its resource. When it acquires the lock it sees the resource isn’t available and goes into wait mode by calling wait() on the ResourceCarrier object. The ResourceCreator swoops in later to create the resource and calls notify on the ResourceCarrier, at which point the three threads spring back to life.

 

Deadlock

Java Interview Questions:

  • What is a deadlock?
  • How do we prevent deadlocks?

Many candidates completely fluff the answer to deadlock questions. It’s a very common interview question, and it’s an easy enough concept to understand, but it can be tricky to explain (particularly over the phone).

A deadlock occurs when two or more threads are awaiting the actions of each other which prevents any further processing from happening. For example, Thread A has the lock on Object 1 and attempts to require a lock on Object 2 to continue. Simultaneously, Thread B has a lock on Object 2 and attempts to acquire a lock on Object 1. As a result both Threads are kept busy waiting for the other one, which will never release. This is a deadlock.

The easiest way to prevent deadlock is to ensure that the ordering of acquisition for locks is consistent. If both Threads try to acquire Object 1 then Object 2 in that order then deadlock will not occur. It is important to code defensively when acquiring locks. Even better is to make synchronized blocks as small as possible and where possible avoid locking multiple objects.

Conclusion

What a ton of information! We’ve still barely scratched the surface of Threading, but if you can get your head around everything on this page then you’ll be well on your way to answering the majority of interview questions on the subject.  If you’re keen to read more, there is a lot more information and questions available in Java Interview Bootcamp, my interview prep book available now.

Tell me about your system

Tell me about your system

Tell me about your system

This is one of my absolute upmost favourite questions to ask during a face to face interview. Right now in your current role you’ll be working on a system (maybe more than one), and you’ve put hours of your life into it. You probably have a bunch of guys and girls you’ve worked with to put the system live and deliver some awesome value to someone. This should be the one thing that you know really really well.  Nonetheless the number of candidates that get caught out by this question is incredible. You wouldn’t believe how much some people struggle with this.

Why are you getting asked this question?

For the interviewer this is a great way to gauge a candidates experience. Some developers will often just use the technologies put in front of them and neither understand nor care why. This should send up a big red flag. Being a java developer is about much more than knowing the syntax. You need to understand why systems are put together the way they are and what the pros and cons are.

I expect this from candidates irrelevant of their experience. It also doesn’t matter whether you were involved in architecting the system or not. You should be able to take a view on the decisions made, whether it’s positive or negative. Remember, the architecture of the system itself is not on trial; if you think it’s terrible then you can say as much. Very few people get to work on true greenfield projects and build their solutions up from the start, and in reality every system is going to have flaws; and that’s ok. The important thing is that you know the weaknesses in your system and you can explain how you would do things differently.

Take advantage of the opportunity

If you are lucky enough to get this question then seize the chance with both hands. More than any other part of an interview this gives you the opportunity to step up and show yourself as an excellent candidate. If you can crush it then you’re well on your way to a job offer.

Preparation

Sit down with a pen and a piece of paper and draw your system out. If you think the system you’re working on currently isn’t interesting then use something else you’ve worked on. As a rule of thumb the interviewer shouldn’t care specifically about the project you’re working on right now, they just want you to talk them through a system.

Draw the relevant components. What languages are they written in and what technologies are frameworks used? Do you agree with those choices or do you wish there was something different? Draw and label the communications between them. Does it use files? Bus? Sockets? Pigeons? Write it down.

Now explain exactly why each thing does what it does. Why was it chosen?  It doesn’t matter whether you were involved in the decision, or if you agree with it. Explain the reasoning (or what you think the reasoning was), and outline if this is a good or a bad thing.  If you have a way you would prefer to do it, then say that as well.  Most choices in reality boil down to some sort of non functional requirement.  “This is very latency sensitive so we chose this middleware to fit that requirement”. “The traders don’t really care so we made this a batch process as it’s easier to support”.   Your job with this questions is to show that you understand the requirements of each component and how that relates to the technology choices. Don’t forget that requirements are never purely functional.

How do you support the application?  What is the mean time to recovery? Is the system global or local? What happens if your datacenter floods?

Example Interview

This interview is fictional, but should be representative of what you could expect in a real interview.  Apply this sort of questioning to your own system.

Can you please tell me about your system?

Sure. It takes  a bunch of transaction data from downstream systems and stores it. It performs some complex transformation on it and makes it available for end users and further up stream systems to use.

Ok, could you draw it for me please?

Sure.  So first of all we have the importer. This slurps in the data from a bunch of our downstream systems which have information about orders and reporting in. It’s written in Java.  It takes files once a day from 3 systems, and also has the option to manually import data using some web screens we threw together.example architecture

Why did you choose to take files and not have a continuous update or something else?

Legacy reasons. These systems were built many years ago and there’s a reluctance to invest any money in them as they basically work. This limits us as we can’t create things like customer reports in real time, only once a day.  We are currently mid way through a transformational project to allow real time flows; although some systems will never be upgraded there are some brand new downstreams that need to move onto our system and want a real time interface.  That’s the bulk of my current job. The target state is to have a real time flow of data so that the business can see the orders as soon as they’re put in. I’ve put an HTTP interface in there to allow intraday updates as a tactical fix.  That won’t cope with the data loads we’re targeting but we haven’t concluded how yet.

Ok, and is this system global?

Yes, we have an importer in New York and London. The systems in each are slightly different in each region so we’ve had to tweak the code slightly.

Do you have 1 code base then or multiple?

We used to have multiple code bases. It was a nightmare patching fixes across them so we’ve recently merged into a single GIT repo and we’ve added feature toggles based on each location.  I really enjoyed learning GIT, it’s much better than SVN, and it was nice to put feature toggles into practice.

And what about failover?  What happens if something goes down?

To be honest it doesn’t happen very often, but until recently there was no failover. As it’s currently a batch system this has never been a problem but as we’re trying to move to realtime updates it means if we do get problems we get very angry users.  As a result we’ve started running 2 instances.  The second instance only exposes the real time features though and doesn’t touch the batches. This is because we’d have had to introduce complex failover mechanics to stop 2 batches occurring.  It’s a cost risk analysis; if something goes wrong with batch it’s ok if it’s delayed for a bit whilst we fix it, which isn’t the case with real time.

Ok, so what does the importer do then?

It takes the files and updates and turns them into messages and puts them on our global bus.  We have a bunch of components that need this data so hang it off this bus. It’s using IBMMQ: not my choice, it’s mandated by the company.

What would you rather use? Why choose a bus?

I’d rather use ActiveMQ. It’s much easier for testing and I’ve had great experience with it when I’ve played with it before.  The bus was my idea. It removes a lot of issues with regards to failover.  Before we had a massive web of point to point connections which were difficult to manage from a support perspective and meant that if a single component went down the system would collapse.  Having a bus breaks the tight coupling between components and has seen our overall stability increase significantly; we used to have 1 outage a month last year but since introducing this we’ve had no full blackouts.

Are there any issues using a bus?

It’s extra middleware which can be a nightmare to manage, particularly if you have a centralised team like we do. It also decreases the visibility of what’s going on.  We had some problems with messages going missing and had to very quickly become experts at config tuning.   I think overall it was a good choice.

Do you know any other Bus technologies?

There’s HornetQ and ZeroMQ.  I’ve not had chance to play with them but I hear hornetQ is really fast.

What format do you put on the wire?

We’re using XML.  I wanted to use JSON but there was push back from the upstream teams.  JSON is much nicer, it’s more readable, it’s much less verbose and so smaller and quicker on the wire. However the other teams are heavily invested in XML so we compromised.

Ok. What next?

We have a database that hangs off that records all the data imported onto the bus. We need a historical record of data so this stores it for up to 7 years.  It’s a basic MySQL instance.

Why did you choose to hang it on the bus? Why not have it attached to the importer? Couldn’t you have a case where the bus collapses and you’ll lose a message?

We make sure to log stuff as it comes through the importer so we can manually reconcile messages in a worst case scenario.  If we had to do a DB transaction every new record it would be really slow in the importer. We felt it was best to decouple this completely. It also made failover easier, as the importers just need to worry about who’s publishing to the bus, and don’t need to mess around with database connections.

Some of our upstreams consume this raw feed, but for others we have an enhanced feed.  This transformer enriches the data with some information from a different database and puts it back onto the bus.

How quick is that?

Not very, it takes a few seconds. Right now that’s not a problem but as we move to real time it needs to be fixed. I’d like to look at some near caching technologies for it.

We also have a set of user screens that feed of the data.  This talks directly to the transformer.

Why did you choose to directly connect it and not use the bus?

It’s another piece of legacy we haven’t gotten around to fixing.  Ideally it would hang off the bus too but for now it has a direct connection.

What connection are you using between them?

Plain sockets.  We’ve got an embedded Netty server in the transformer. I’m a big fan of netty.

What about failover?

We run the Transformers in London only, but we run them hot warm.  We have some basic heartbeating on the bus so they know which is alive. If the heartbeats stop then the secondary takes over.

Have you had any issues with this? What about split brain?

Yes, we’ve had split brain happen a couple of times.  We were too aggressive with the heartbeat timeout so we’ve pushed that back.  It’s an acceptable compromise, if the transformer gets delayed for a few seconds during failover it’s ok.


 

Hopefully you get the idea.  For every system you need to be able to explain

  • The technology choice
  • The failover strategy
  • The transport choice

You also can use this as an opportunity to express you knowledge of other technologies.  In the interview there’s references to Netty, ZeroMQ, HornetQ, IBMMQ, ActiveMQ, Git, Feature Toggles, heartbeating, XML vs JSON, MySQL and Oracle. That’s a lot of technologies and design patterns. This is an effective way to show that you have a broad knowledge of your domain.  It’s impossible to know everything which is why when hiring it’s good to look for people who can introduce new ideas and technologies into the organisation.  And if you don’t know any then spend the time to look on google.  Type in “Alternatives to ” and the technology and you’ll have a ton of articles laying the arguments out for you.

Did you find this useful? What sort of questions have you been asked when getting grilled about your system? Let me know in the comments.


Why not sign up to the mailing list for weekly updates?
[mc4wp_form]


Image credit Marie Mosley

Design Patterns Part I: The Singleton Design Pattern

Singleton Design Pattern

Singleton Design PatternIntroduction

Possible Questions

  • Tell me a bit about design patterns
  • What is good about design patterns?

Design Patterns are another oft used area during interviews.  Originally pioneered by the “Gang of Four” (GoF) in one of the more dry textbooks you’ll read in your life Design Patterns: Elements of Reusable Object-Oriented Software. A design pattern is a way of structuring or shaping the design of your code which is well tested and proven. You cannot “invent” a design pattern and call it a design pattern.   A design pattern establishes itself over time.

Importantly, they (normally) work across languages, particularly OO.  The GoF book was written in C++ but the ideals apply equally to C# and Java.  The reason why Design Patterns are so brilliant are that they provide a common language between developers. If you wanted to created a singleton (more on that shortly) then as opposed to having to explain to the other developer what one is, how it would be implemented etc., design patterns mean you can say “this should be a singleton” and everyone understands what you mean.

Design patterns are an important toolkit to building systems.

Singletons

A Singleton is used when you want a single instance of an object in your entire application. You want to literally remove the ability to create multiple instances of something.  This can be useful for things like data storage, where you want a single go-to place in your application to get data on something.  The singleton pattern governs the creation and access of an object in order to create this restriction.

Possible Questions:

  • How do you create a Singleton in Java?

Objects are created using constructors. In order to be create the restriction you need to remove this ability. This can be done in Java using a private constructor.  This means that only static methods in the class itself can create the object.  An accessor method is then provided which makes the single instance of the object available.

public class SingletonExample {
    private static SingletonExample singleton;

    private SingletonExample(){ //Hides constructor from public view
            
    }
    
    public static SingletonExample getInstance(){
        if(singleton==null){
            singleton = new SingletonExample();
        }
        return singleton;
    }
}

 

This now means that anything trying to access SingletonExample will have to go through our getInstance method().

Possible Questions:

  • How do you cope with thread safety when creating a Singleton?
  • What is the difference between lazy and eager instantiation? Why does it matter?

Due to the lazy instantiation, that is, the fact the object is not created until it is first needed, the code sample above is not threadsafe. If multiple threads try to simultaneously access the Singleton it can result in 2 objects getting created, thus defeating the purpose of the pattern.  There are two ways around this:

  1. Make the getInstance() block synchronized.  This ensures that only one thread can access the method at once.  This however will make all calls to getInstance() syncronized which could result in performance degredation
  2. Use Eager Initialisation.  By initialising on the field the object is guaranteed to be ready before access, eliminating threading and performance issues.
public class SingletonExample {
    private static SingletonExample singleton =  new SingletonExample();;

    private SingletonExample(){ //Hides constructor from public view

    }

    public static  synchronized SingletonExample getInstance(){
        return singleton;
    }
}

But when do you choose between lazy and eager initialisation?  Lazy initialisation defers creation of an object until you absolutely need it. If it’s an “expensive” object like a database connection for example, it could in theory be good to use lazy initialisation if there’s a chance the connection won’t be needed.   However, the benefit of lazy is simply that you will speed up your start up time.  If startup times are not a concern for you (and generally they aren’t), go with eager initialisation as it solves a lot of problems with threading.  The simple fact is Threading is hard and if you can avoid things like syncronized then the trade off is worth it.

Possible Questions

  • Are singletons a good pattern to use?
  • How do singletons affect testing?

Singletons are not a good coding practice as they cause high coupling and violate dependency injection principles.  This makes code really hard to test.  Any object making use of a singleton needs to use a static method to get access to it.  This means it is not being “injected” in and so cannot be stubbed out for use in testing.  This can cause real problems as Singletons are often used to manage resources.

 private final SingletonExample instance;

 public ThingThatUsesSingleton(){
    instance = SingletonExample.getInstance();

 }
 public int doSomethingThatICanUnitTest(){
    instance.doSomething();
    return 17;
 }

In the above example I want to unit test the doSomethingThatICanUnitTest() method to ensure it returns 17. However, when I try to create this class in my test it will force the creation of a SingletonExample object and any expense that it has.  If it’s a database connection for example, that means my unit test will try and connect to the database as part of my unit test which is terrible.  Dependencies should be injected in so that I can mock them out and test the behaviour I actually care about.  Singletons break this.  As a result they are often avoided in reality, although they are the most well known pattern and so are often used as an interview question.

 

What do you think? Have you had these questions in an interview before? Why not put a comment below and let us know your experience.

If you’d like more questions like this into your inbox then sign up for the mailing list below!

[mc4wp_form]

How to handle Java Exception Interview Questions

Java exception interview questions

Java exception interview questionsYes, the title is a pun on handling exceptions. Sorry. Couldn’t resist.

Java Exceptions are one of my favourite questions to ask during telephone interviews (which I recently covered in some depth here). They’re one of the simpler parts of Java so you should expect most candidates to be able to give good answers.  It’s normally easy to weed out those who don’t as not worth your time.  However what I really like about Exceptions as interview fodder is that they allow the combination of a technical core java question with an opinion piece. Checked Exceptions are actually a really controversial part of Java and it’s good to see which side of the fence a candidate falls on, or whether they are even aware that the fence is there.

Core Questions

  • What is an Exception in Java?

I don’t normally ask this question as it’s so basic but it’s good to make sure you’ve got a standard answer in your pocket.  Exceptions are a way to programmatically convey system and programming errors. All exceptions inherit from Throwable.  When something goes wrong you can use the throw keyword to fire an exception.

  • There are 2 types of exception in Java, what are they and what’s the difference?

It still amazes me how much the “2 types” question throws people (yep, throws, I did it again, another pun).  This is as basic as core java gets. If you don’t get this then it tells me that you didn’t learn core java properly (or at all) which will probably act as a shaky base for the rest of your knowledge.  Get this answer right.

The two types of exception are checked and unchecked.  I’ve heard some strange answers in my time including “Runtime and NullPointerException”! A checked exception is one that forces you to catch it. It forms part of the API or contract. Anyone using code that throws a checked Exception can see that as it is declared on the method and they are forced to handle it using a try/catch block.  Unchecked on the other hand does not need to be caught and does not notify anyone using the code that it could be thrown.

  • How do I know if an Exception class is checked or unchecked?

All exceptions are checked exceptions except those that inherit from java.lang.RuntimeException.

  • What does “finally” do?

Exceptions are caught using a try-catch block.  However following the catch you can have a finally block. This block is guaranteed to execute no matter what happens (short of someone pulling out the plug), such as if an Exception is thrown in the catch block.  This is really useful to ensure the clean up of resources such as Connections.

  • Can I throw multiple Exceptions from the same method? How do I catch them?

Yes you can, they just need listing after the “throws” in the method signature.  You can catch them simply using multiple catch blocks, or by catching a parent exception class e.g. you can just catch Exception which will catch all Exceptions. This is in theory bad practice as you lose the nuance of what went wrong and specifically the ability to recover.  You can also catch multiple exceptions in a single catch block as of Java 7.

When catching exceptions it is important to do so in order from most specific to least specific.  If your catch block catches Exception first and then IOException second, the first catch block will always catch any Exception coming through and the IOException will be rendered useless (and in fact it will cause a compiler error saying the Exception has already been caught)

Discussion questions

  • What do you think of Checked Exceptions, are they a good thing?
  • When you’re writing your own code and you need to throw a custom exception, do you create a checked or unchecked exception?
  • C sharp does not have Checked Exceptions.  Can you tell me why this might be? Who do you think was right, Java or C sharp?

I love these questions so much.  It’s a really good way to stretch a candidate and to see if they understand how to architect systems.  Having an appropriate policy on Exceptions is key to maintaining a clean code base, and you’ll quickly figure out who’s thought about the issue and who hasn’t.

A must read before your interview is this interview with the creator of C# and why he didn’t include checked exceptions in the language. The argument in general is that Checked Exceptions cause ugly code in Java. Handling is incredibly verbose, and normally most developers don’t know how to handle them or are too lazy to.  This results in empty catch blocks or just log statements.  This results in a brittle system where Exceptions get swallowed never to be heard from again and the system can get into a brittle state without telling anyone.

On the positive side for checked, it declares on the API what it’s going to do so people can handle it. This is good if you’re handing out a library. However a lot of developers use this for evil, and control the flow of their system using exceptions. Exceptions are for exceptional circumstances only.

As with most discussion questions the important part is not that the developer agrees with your opinion but that they are capable of forming an opinion and explaining it. I personally fall heavily on the side that Checked Exceptions are terrible and developers misuse them. In my code all checked exceptions get wrapped in an application specific RuntimeException and thrown up the stack where I’ll normally have an Exception Guard to catch and log the issue.   However if a candidate say they use Checked Exceptions in their code and can explain why with an understanding of the problems then they get a pass.

Just make sure you’ve thought about where you lie on the issue before you go in for your interview so you can articulate it to the interviewer.  And if you’ve never thought about it before, then remember checked exceptions are evil :).

Java Data Structures Interview Questions

T H A N K  Y O U (3)

Java data structures interview questions come up in almost all java interviews.  At it’s nexus being a programmer is about handling data, transforming and outputting it from one format to another which is why it is so important to understand what’s going on under the hood and how it can affect your application. It never fails to confound me how many people can’t tell what collection is used to back an ArrayList (hint: it’s in the name).

From the interviewer’s perspective, questions on data structures reveal a lot of information about the candidate.  They show if the candidate has a core understanding of how Java works.  Even better it provides a platform to lead to a wide ranging set of questions on design, algorithms and a whole lot more.  In day to day java programming, you’re going to spend most of your time with ArrayLists, LinkedLists and HashMaps.  As a result it makes sense to review collections and algorithms you haven’t looked at for a while as, in my experience, companies love asking this sort of question.






Download the Data Structures Interview PDF with bonus material now

The Basics

Possible Interview Questions

  • Q: What is the Java Collection Framework?
  • Q: What Collection types are there in Java?
  • Q: What are the basic interface types?
  • Q: What are the differences between List, Set and Map?

This is the bread and butter stuff that everyone who uses Java should know. The collections framework is simply the set of collection types that are included as part of core java. There are three types of collection you will deal with in Java; Set and List (both of which implement Collection) and Maps (which implement Map and technically aren’t part of core Collections). Each type of collection exhibits certain characteristics which define it’s usage. The key features of a collection are:

  • Elements can be added or removed
  • The collection will have a size that can be queried
  • The collection may or may not contain duplicates
  • It may or may not provide ordering
  • It may or may not provide positional access (e.g. get me item at location 6)

The behaviour of these methods varies based on implementation (as you would expect from an interface). What happens on if you call remove() on a collection for an object that doesn’t exist? It depends on the implementation.

 

The Collection Types

Set: A set has no duplicates and no guaranteed order. Because of this, it does not provide positional access. Implements Collection. Example implementations include TreeSet and HashSet.

List: A list may or may not contain duplicates and also guarantees order, allowing positional access. Implements Collection. Example implementations include ArrayList and LinkedList.

Map: A map is slightly different as it contains key-value pairs as opposed to specific object. The key of a map may not contain duplicates. A map has no guaranteed order and no positional access. Does not implement Collection. Example implementations include HashMap and TreeMap.

But what does this mean with regards to interviews? You’re almost certainly going to get a question about the differences between the types so make an effort to learn them. More importantly the smart interviewee understands what the implication of these features are. Why would you choose one over the other? What are the implications?

 

Which to choose?

In reality, whether you choose a Set, List or Map will depend on the structure of your data. If you won’t have duplicates and you don’t need order, then your choice will lie with a set. If you need to enshrine order into your data, then you will choose List. If you have key-value pairs, usually if you want to associate two different objects or an object has an obvious identifier, then you will want to choose a map.

Examples

  • A collection of colours would be best put into a set. There is no ordering to the elements, and there are no duplicates; You can’t have two copies of “Red”!
  • The batting order of a cricket team would be good to put in a list; you want to retain the order of the objects.
  • A collection of web sessions would be best in a map; the unique session ID would make a good key/reference to the real object.
Understanding why you choose a collection is vitally important for giving the best impression in interviews. I’ve had candidates come in and tell me they just use ArrayList and HashMap because that’s just the default implementation they choose to use. In reality, this maybe isn’t a bad thing. Most times we need to use collections there are no major latency or access requirements. It doesn’t matter. However, people who are asking you these questions in interviews want to know that you understand how things work under the covers.

What are the main concerns when choosing a collection?

When you’re handling a collection, you care about speed, specifically

  • Speed of access
  • Speed of adding a new element
  • Speed of removing an element
  • Speed of iteration

On top of this, it isn’t just a case of “which is faster”. There is also consistency. Some implementations will guarantee the speed of access, whereas others will have variable speed. How that speed is determined depends on the implementation. The important thing to understand is the speed of the relative collection implementations to each other, and how that influences your selection.

Collection Implementations

Possible Interview Questions

  • Why would I choose LinkedList over an ArrayList?
  • Which is faster, TreeSet or HashSet?
  • How does a LinkedList work?

The collection implementations tend to be based on either a Linked implementation, a Tree implementation or a Hash implementation. There are entire textbooks based on this, but what you need to know is how it affects your collection choices.

Linked implementation

Example: LinkedList, LinkedHashSet

Under the covers, each item is held in a “node”. Each node has a pointer to the next item in the collection like so.

linkedlist

What does this mean for speed?

When adding an item into a linked connection it’s quick, irrelevant of where you’re adding the node. If you have a list of 3 items and want to add a new one in position 1, the operation is simply to point position 2 to the new item, and the new item to the old position 2.

new

That’s pretty fast! No copying collections, no moving things, no reordering. The same is true for removals; simply point node 0 to node 2 and you’ve removed node 1.

Conversely when accessing objects it is relatively slow. If you have a list of 1000 items, and you want item 999, you need to traverse every single node to get to that item. This also means that the access speed is not consistent. It’s quicker to access element 1 than it is element 1000. Bear in mind that when adding a new object you need to first traverse to the node before, so this can have some impact on speed.

So linked collections such as LinkedList are great when you need fast addition/removal and access time is less of a concern. If you’re going to be adding/remove items from your collection a lot, then this is the option for you.

 

Array implementation

Example: ArrayList

ArrayList is the only Array based implementation in the collection classes, but is often used to compare to LinkedList so it’s important to understand. As alluded to previously, ArrayLists are backed by an Array. This has interesting implications.

When adding an item into the middle of an ArrayList, the structure needs to copy all of the items to shift them down the Array. This can be slow, and is not guaranteed time (depending on how many elements need to be copied.

Add Item To Arraylist

The same applies when removing objects; all object references after it have to be copied forward one space. There is an even worse case. On creating an ArrayList it starts with a fixed length Array (whose length you can set in the constructor, or the default is 10). If the capacity needs to increase over this, the collection has to create a brand new array of greater length and copy all the references to it which can be really slow.

Where ArrayList succeeds is access speed; as the array is in contiguous memory it means that if you request object 999 of 1000, you can calculate exactly where in memory the reference is with no need to traverse the full collection. This gives constant time access which is fast.

So, ArrayLists make a great choice if you have a set of data that is unlikely to be modified significantly, and you need speedy read access.

 

Hash Implementation

Examples: HashSet, HashMap

The HashMap is a complicated beast in itself and is a good choice for an interview question as it is easy to add extending questions to.

 

Possible Interview Questions:

  • How does a HashMap work?
  • Why does the hashcode of the key matter?
  • What happens if two objects have the same HashCode?
  • How does the Map select the correct object if there is a clash?

Interestingly HashSet is backed by HashMap, so for the purpose of this I will just discuss the latter. A HashMap works like this: under the covers a HashMap is an array of references (called the Entry Table, which defaults to 16 entries) to a group of LinkedLists (called Buckets) where HashEntries are stored. A HashEntry is an object containing the key and associated object.

Empty Hashmap

All instances of Object have a method, hashCode(), which returns an int. The hashcode is usually a value derived from the properties of an object. The hashcode returned from an object should be consistent and equal objects must return the same hashcode. This property can then be used to determine which bucket to put it in based on the index of the entry table; for example, in the above example with 4 spaces in the entry table, if I have a key in with HashCode 123456, I can do 12345 % 4 = 1, so I would place my object in bucket 1.

This makes object retrieval very quick. In my example, if I try to retrieve the object for my key then all I need to do is the same mod calculation to find the bucket and it’s the first entry. Exceptionally quick.

This can get more complicated. What if I have two more objects, HashCode of “5” and “9”? They both have a (modulus 4) of 1 so would both be put into bucket 1 too. Now when I use my “9” key to retrieve my object the HashMap has to iterate through a LinkedList, calling .equals() on the keys of each HashEntry.

HashMap Retrieval

 

This is slow! Fortunately, this shouldn’t happen often. Although collisions will naturally occur, the intention is that objects will be spread evenly across the entry table with minimal collisions. This relies on a well implemented hashcode function. If the hashcode is poor (e.g., always return 0) then you will hit collisions.

To further try and reduce the chance of this, each HashMap has a load factor (which defaults to 0.75). If the entry table is more full than the load factor (e.g. if more than 75% of the entry table has entries) then, similar to the ArrayList, the HashMap will double the size of the entry table and reorder the entries. This is known as rehashing.

 

Tree Implementation

Examples: TreeMap, TreeSet

The Tree implementation is completely different to any of the data structures we’ve considered before and is considerably more complicated. It is based on a Red Black tree, a complicated algorithm which you’re unlikely to get asked about (if you’re really keen then watch this). Values are stored in an ordered way, determined either by the natural ordering of the object or using a Comparator passed in as part of the constructor. This means on all implementations the collection will re-sort itself to maintain this natural order. When retrieving items it is necessary to navigate the tree, potentially traversing a large number of nodes for sizeable collections. The important thing to note from your perspective: this is slower than HashMap. Considerably slower. If speed is the goal then always use a HashMap or HashSet. However, if ordering is important then TreeMap/TreeSet should be your collection of choice.

 

Conclusion

That’s a lot of information to take in, but it’s no where near complete. There is a lot more in the CJIQ book such as Queues and multithreading with Collections, Java Interview Bootcamp, available here.