Spark streaming reduceByKeyAndWindow unstable application

I wanted a job that runs 24×7 and reports when certain keywords occur more than N times in the stream. Spark Streaming looked like an ideal candidate for this task. Spark has a reduceByKeyAndWindow function, which was exactly what I was looking for. I decided to use a window length of 1 minute and […]
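To make the idea concrete, here is a minimal plain-Python sketch of what a windowed count by key computes. It is not Spark code and does not use Spark's API; the function name `keywords_over_threshold` and its parameters are hypothetical, and the window is measured in batches rather than in minutes.

```python
from collections import Counter, deque


def keywords_over_threshold(batches, window_batches, threshold):
    """For each incoming batch, yield the keywords whose count over the
    last `window_batches` batches is greater than `threshold`.

    Hypothetical helper for illustration only; not part of Spark.
    """
    window = deque()    # per-batch counts currently inside the window
    totals = Counter()  # running counts over the whole window
    for batch in batches:
        counts = Counter(batch)
        window.append(counts)
        totals.update(counts)  # add the batch entering the window
        if len(window) > window_batches:
            totals.subtract(window.popleft())  # remove the batch leaving it
        yield sorted(k for k, c in totals.items() if c > threshold)


batches = [["a", "b"], ["a"], ["a", "b"], ["b"]]
alerts = list(keywords_over_threshold(batches, window_batches=2, threshold=1))
```

The sketch keeps a running total and subtracts the batch that slides out of the window, which avoids recounting the whole window on every batch.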

Spark streaming: Fixing all executors not getting jobs

I was recently working on a feature that needed a streaming job that runs 24×7 and processes 100 million rows per day. The Spark web UI is a wonderful tool for seeing how things are running internally. While debugging, I noticed that the streaming jobs were being allocated to only one machine. Spark has […]

My 2x estimation rule

Estimation is an important part of software development, and it's incredibly difficult to estimate correctly. Over time, I have realized that on most occasions my estimates were very aggressive despite including some buffer time. Then one day I decided to try something new. I divided the feature into small tasks and then […]

Mutex Vs Semaphore

The question in the title bugged me a lot until today, when I found a really good explanation. If you google the difference between a mutex and a semaphore, most places will tell you that a mutex is a binary semaphore. But a mutex is not actually a binary semaphore. See these links […]
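One commonly cited difference is ownership: a mutex must be released by the thread that acquired it, while a semaphore can be signalled by any thread. The sketch below illustrates that in Python, and it rests on an assumption: `threading.RLock` stands in for an owned mutex, because Python's plain `threading.Lock` does not enforce ownership. The helper `release_from_other_thread` is hypothetical.

```python
import threading


def release_from_other_thread(primitive):
    """Acquire in the calling thread, then try to release from a worker.

    Returns "released" if the worker was allowed to release the primitive
    and "refused" if the release raised RuntimeError.
    """
    primitive.acquire()
    outcome = []

    def worker():
        try:
            primitive.release()
            outcome.append("released")
        except RuntimeError:
            # An owned lock rejects a release from a non-owning thread.
            outcome.append("refused")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if outcome[0] == "refused":
        primitive.release()  # clean up from the owning thread
    return outcome[0]


# Assumption: RLock plays the role of a mutex with ownership.
mutex_result = release_from_other_thread(threading.RLock())
semaphore_result = release_from_other_thread(threading.Semaphore(1))
```

Under that assumption, the lock refuses the foreign release while the semaphore accepts it, even though both start with a count of one.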

Coin Denomination Problem

The coin denomination problem is a very common interview question. It involves the greedy approach, dynamic programming and recursion. You can find a lot of material on the internet if you just google “Coin denomination problem”. A link that was really helpful for me to analyse the problem is Some other helpful text […]
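As a rough illustration of why both the greedy approach and dynamic programming come up, here is a minimal sketch. It assumes the "fewest coins to make an amount" variant of the problem, and the function names are hypothetical.

```python
def min_coins_dp(denominations, amount):
    """Fewest coins that sum to `amount`, or None if it cannot be made.

    Bottom-up dynamic programming: best[v] is the fewest coins for value v.
    """
    inf = float("inf")
    best = [0] + [inf] * amount
    for value in range(1, amount + 1):
        for coin in denominations:
            if coin <= value and best[value - coin] + 1 < best[value]:
                best[value] = best[value - coin] + 1
    return None if best[amount] == inf else best[amount]


def min_coins_greedy(denominations, amount):
    """Always take the largest coin that still fits.

    Fast, but not optimal for every set of denominations.
    """
    count = 0
    for coin in sorted(denominations, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None


dp_answer = min_coins_dp([1, 3, 4], 6)          # uses 3 + 3
greedy_answer = min_coins_greedy([1, 3, 4], 6)  # uses 4 + 1 + 1
```

With denominations 1, 3 and 4 and an amount of 6, the greedy choice uses three coins while the dynamic programming table finds two, which is why the greedy approach alone is not enough in general.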