4 days before delivering the Thesis Report

Right now, I should not be writing here, but only in my report :p But hey! I will be fast 😀

The day I have been waiting for so long is approaching! 4 days until I deliver the final thesis report. (teeth grinding, tears rolling, and a secret smile waiting to give its huge finale to this 6-month performance)

I have so many words, definitions and numbers going around in my head. And all this “jungle bubble” is taking structure in the form of sentences, but getting restricted and limited to some lines of (somehow) academic writing.

I implemented 3 algorithms; all of them are Pregel-based, implemented on top of Giraph. They are iterative, vertex-centric and scalable.
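For readers who have not met the vertex-centric model before, here is a minimal sketch of the idea in plain Python. It is not Giraph code and not one of the thesis algorithms; it is the classic toy example where every vertex ends up knowing the largest value in the graph. The function name `pregel_max` and the dict-based graph are assumptions made up for this illustration.

```python
def pregel_max(graph, values):
    """Toy vertex-centric computation: spread the maximum value.

    graph:  dict mapping each vertex to a list of its neighbours
            (every neighbour must also be a key of the dict)
    values: dict mapping each vertex to its starting number
    """
    values = dict(values)  # do not modify the caller's dict

    # Superstep 0: every vertex is active and sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    supersteps = 1

    # Later supersteps: a vertex only does work if it received messages.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            if not messages:
                continue
            best = max(messages)
            if best > values[v]:
                # The vertex learned something new, so it tells its neighbours.
                values[v] = best
                for n in graph[v]:
                    outbox[n].append(best)
            # Otherwise it stays silent, which plays the role of "vote to halt".
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(pregel_max(graph, {"a": 3, "b": 6, "c": 2}))
```

The computation stops when no vertex has anything left to say, which is the same stopping idea Pregel-style systems use. Each vertex only looks at its own value and its incoming messages, and that is what lets a real system spread the vertices over many machines.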

The first two come from Collaborative Filtering, a technique used by Recommendation Systems. Think of a website with movies, where you can rate the movies you have watched. After you rate some, the website starts recommending movies that you may like to watch. And sometimes you follow its recommendations and the movies are actually good. Magic! …Or just math. Recommendation systems use different algorithms for predicting what rating a user may give to a movie he/she has not watched. These algorithms are usually based on the ratings the user has given in the past and on the ratings of other users. Imagine you and another guy (or girl) from the other side of the planet have the same taste in movies and give the same ratings. It’s reasonable that if he (or she) watches a movie and gives it a good rating, you should receive a recommendation from the system about this movie. Of course there are exceptions to the rule and other factors affect our preferences, but something is better than nothing. 😉  So the two algorithms do some math (or magic :p) and improve the predictions! Which ones they are, how they do it and what is going on will come in another post. 😉
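To make the "same taste, same ratings" idea concrete, here is a minimal sketch of a neighbourhood-style prediction in Python. It is not one of the two thesis algorithms (those are left for another post); it only illustrates the intuition from the paragraph above. The function names `cosine_similarity` and `predict`, the sample users and movies, and the assumption that ratings are positive numbers (say 1 to 5) are all made up for this example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Similarity of two users, using only the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, movie):
    """Guess the user's rating for a movie as a similarity-weighted
    average of the ratings other users gave to that movie.

    ratings: dict user -> dict movie -> positive rating
    Returns None when nobody comparable has rated the movie.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[movie]
        total_weight += sim
    return weighted_sum / total_weight if total_weight else None


if __name__ == "__main__":
    ratings = {
        "you": {"Alien": 5, "Heat": 4},
        "twin": {"Alien": 5, "Heat": 4, "Up": 5},
    }
    print(predict(ratings, "you", "Up"))
```

In the example, "twin" rates the shared movies exactly like "you", so the predicted rating for the unseen movie simply follows the twin's rating. With more users, people with closer taste get a bigger say in the prediction.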

The third algorithm deals with Graph Partitioning. Another fascinating area! Just imagine you are working at Facebook, and you have all these millions of addicts (I do not accuse anybody and I do not exclude myself ;p) posting, commenting, poking, and sending messages to their friends. We, the users, create a graph; we are the nodes, and by becoming friends with someone we create the edges. The graph created by all this mess is huge. There are a lot of machines serving our requests and storing all our information, and partitioning the data across these machines is not a simple problem. It’s easier though if my friends and I get served from the same machine (or the same cluster), so whenever we exchange messages, or post and comment on each other’s posts, these operations will be executed faster. The algorithm I implemented is one of the many out there, but one of the few that I can implement in Pregel-mode 🙂 More info to come in another post. 😀
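One common way to judge a partitioning is to count how many friendships cross machines, often called the edge cut. The sketch below only computes that number for a given placement; it is not the partitioning algorithm from the thesis, and the function name `edge_cut`, the users and the machine numbers are invented for illustration.

```python
def edge_cut(edges, placement):
    """Count the friendships whose two users live on different machines.

    edges:     list of (user, user) pairs, one per friendship
    placement: dict user -> machine id
    """
    return sum(1 for u, v in edges if placement[u] != placement[v])


if __name__ == "__main__":
    friendships = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "cat")]

    # The three mutual friends share machine 0; only dan is elsewhere.
    friends_together = {"ann": 0, "bob": 0, "cat": 0, "dan": 1}
    # Users are spread without looking at who is friends with whom.
    friends_scattered = {"ann": 0, "bob": 1, "cat": 0, "dan": 1}

    print(edge_cut(friendships, friends_together))
    print(edge_cut(friendships, friends_scattered))
```

A lower count means fewer messages and comments have to travel between machines, which is the intuition behind keeping friends together. A real partitioner also has to keep the machines roughly equally loaded, which this sketch ignores.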

Before this thesis I had never had any experience with recommendation systems; now I can say that this is a huge area of business and of great importance. Digging deeper into this area has a place on my ToDo list!

As for graph partitioning, this is my second time. The first was a project last semester at KTH about Scaling Online Social Networks. Hmm, I should write about this as well! 😀

Well, I had my dose of funny/relaxed/non-academic writing.

Back to my serious/academic/thesis writing. 🙂


One thought on “4 days before delivering the Thesis Report”

  1. After reading some of your posts about Giraph, I sent the following mail to the Giraph community.
    A few days ago, I wrote a MapReduce job for iterative graph processing. In my job, the map tasks do the computation and the reduce tasks create some nodes dynamically based on the computation in the map phase. The final output is then fed back as the input of the map tasks, and the whole procedure repeats n times.
    After running the job on a cluster, I found that the job completion time is much higher than expected. So I decided to write the code using Giraph. I have successfully built the framework and studied some of the example source code. Now I am unsure whether it is possible to implement my Hadoop job in the Giraph framework.

    As my problem involves creating nodes/vertices at run time, I am a little confused. There is no example that creates a node dynamically. Also, I do not understand how I can do the iteration and the backtracking of the source node. I am desperately seeking suggestions on this matter. Also, I am interested in contributing as a developer. Is that possible?

    A. Sarker

    But nobody has replied yet. Can you give me any suggestions? Thanks!

