Right now, I should not be writing here, but only in my report :p But hey! I will be fast 😀
The day I have been waiting for so long is approaching! 4 days until I deliver the final thesis report. (teeth grinding, tears rolling, and a secret smile waiting to give its huge finale to this 6-month performance)
I have so many words, definitions and numbers going around in my head. And all this “jungle bubble” is taking shape as sentences, but it gets squeezed into lines of – somehow – academic writing.
I implemented 3 algorithms; all of them are Pregel-based, implemented on top of Giraph. They are iterative, vertex-centric and scalable.
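To give a feeling for what "iterative and vertex-centric" means, here is a tiny single-machine sketch in plain Python. It is only an illustration of the general idea and not Giraph code: the function name `pregel_max` and the toy graph are made up for this post. Every vertex keeps a value, reads the messages it got in the previous round (a "superstep"), and sends messages only when its value changed. In this example every vertex ends up with the largest value in its neighbourhood.

```python
def pregel_max(values, edges, max_supersteps=30):
    """Toy imitation of the vertex-centric model (hypothetical helper).

    values: dict vertex -> initial number
    edges:  dict vertex -> list of out-neighbours (all must be keys of values)
    Every vertex ends up holding the largest value that can reach it.
    """
    values = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbours.
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for n in edges.get(v, []):
            inbox[n].append(val)

    for _ in range(max_supersteps):
        outbox = {v: [] for v in values}
        changed = False
        for v, messages in inbox.items():
            # The "compute" of one vertex: it sees only its value and its messages.
            best = max(messages, default=values[v])
            if best > values[v]:
                values[v] = best
                changed = True
                for n in edges.get(v, []):
                    outbox[n].append(best)
            # Otherwise the vertex stays quiet (it "votes to halt").
        if not changed:
            break  # nobody changed, so the computation is over
        inbox = outbox
    return values


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = pregel_max({"a": 3, "b": 6, "c": 2}, graph)
```

In a real Pregel-style system the vertices are spread over many machines and the messages travel over the network, which is where the "scalable" part comes from.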
The first two come from Collaborative Filtering, a technique used by Recommendation Systems. Think of a website with movies, where you can rate the movies you have watched. After you rate a few, the website starts recommending movies that you may like to watch. And sometimes you follow its recommendations and the movies are actually good. Magic! …Or just math. Recommendation systems use different algorithms to predict what rating a user would give to a movie he/she has not watched. These algorithms are usually based on the ratings the user has given in the past and on the ratings of other users. Imagine you and another guy (or girl) from the other side of the planet have the same taste in movies and give the same ratings. It’s reasonable that if he (or she) watches a movie and gives it a good rating, you should receive a recommendation from the system about this movie. Of course there are exceptions to the rule and other factors affect our preferences, but something is better than nothing. 😉 So the two algorithms do some math (or magic :p) and improve the predictions! Which ones they are, how they do it and what is going on will come in another post. 😉
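For the curious, the "same taste, other side of the planet" idea can be sketched in a few lines of Python. To be clear, this is not one of my two thesis algorithms. It is just about the simplest neighbour-based predictor I can think of, and the names (`cosine_similarity`, `predict_rating`) and the ratings are invented for the example. It assumes ratings are positive numbers, like 1 to 5 stars.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two users, over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[movie]
        weight_total += sim
    if weight_total == 0:
        return None  # nobody comparable has rated this movie
    return weighted_sum / weight_total


# Made-up ratings: "twin" agrees with "me", "opposite" does not.
ratings = {
    "me": {"A": 5, "B": 1},
    "twin": {"A": 5, "B": 1, "C": 4},
    "opposite": {"A": 1, "B": 5, "C": 1},
}
prediction = predict_rating(ratings, "me", "C")
```

The prediction for movie C lands closer to the twin's 4 than to the opposite's 1, because the twin's opinion gets the bigger weight.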
The third algorithm deals with Graph Partitioning. Another fascinating area! Just imagine you are working at Facebook, and you have all these millions of addicts (I do not accuse anybody and I do not exclude myself ;p) posting, commenting, poking, sending messages to their friends. We – the users – create a graph; we are the nodes and by becoming friends with someone we create the edges. The graph created by all this mess is huge. There are a lot of machines serving our requests and storing all our information, and partitioning the data across these machines is not a simple problem. Things get easier though if my friends and I are served from the same machine (or same cluster), so whenever we exchange messages, or post and comment on each other’s stuff, these operations are executed faster. The algorithm I implemented is one of the many existing out there, but one of the few that can be implemented in Pregel-mode 🙂 More info to come in another post. 😀
Before this thesis I never had any experience with recommendation systems; now I can say that this is a huge area of business and of great importance. Digging deeper into this area has earned a place on my ToDo list!
As for graph partitioning, this is my second time. The first was last semester at KTH, where we did a project on Scaling Online Social Networks. Hmm I should write about this as well! 😀
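A quick way to see why "friends on the same machine" matters is to count how many friendships cross machines (often called the edge cut). The Python sketch below is only that counting step, not the partitioning algorithm from my thesis, and the people, friendships and machine numbers are invented for the example.

```python
def edge_cut(friendships, placement):
    """Count friendships whose two users live on different machines."""
    return sum(1 for u, v in friendships if placement[u] != placement[v])


# Two tight groups of friends, joined by a single friendship (cat, dan).
friendships = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
    ("dan", "eve"), ("eve", "fay"), ("dan", "fay"),
    ("cat", "dan"),
]

# Placement 1: each group of friends shares a machine.
by_community = {"ann": 0, "bob": 0, "cat": 0, "dan": 1, "eve": 1, "fay": 1}

# Placement 2: users are spread over the machines without looking at friendships.
alternating = {"ann": 0, "bob": 1, "cat": 0, "dan": 1, "eve": 0, "fay": 1}

good_cut = edge_cut(friendships, by_community)
bad_cut = edge_cut(friendships, alternating)
```

Both placements put three users on each machine, so the load is balanced either way, but the first one leaves far fewer friendships crossing between machines. A partitioning algorithm tries to find placements like the first one without anyone drawing the groups by hand.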
Well, I had my dose of funny/relaxed/non-academic writing.
Back to my serious/academic/thesis writing. 🙂