Run Example in Giraph: Shortest Paths

Before I run code in Giraph, I ask myself some questions. Once I have answered all of them, I move on to actually implementing and running the code (so I kinda discuss a lot with myself :p). Let’s have a look at this inner discussion while running the Shortest Paths example.

~~~~~ Q#1: What’s the Shortest Path problem?

Problem Description: Find the shortest path between 2 vertices in a graph, i.e. the path that minimizes the sum of the weights of its edges. The example given in Giraph is the single-source version: it computes, for every vertex, the length of the shortest path between that vertex and a given source vertex.

~~~~~ Q#2: How can this be implemented in Giraph?

Think “Pregely”: Since in Pregel the same code is executed in all vertices at the same time, we need to think as if we were a vertex.

So, I am a vertex. I should receive messages, do some computation and send messages. If I am the source vertex, no edges are needed to reach myself, so my shortest path has weight zero. Otherwise, I want to find the shortest path to the given source vertex. The only information I have is: (i) my Id (not so useful in this example), (ii) my value (initialized to a maximum constant at the beginning), (iii) the values on the edges going out from me (read from the input file) and (iv) the messages I receive from my neighbors (sent during the previous superstep).

Since I do not know what’s going on between me and the source (how many other vertices exist, how much the edges weigh), each message I receive should carry the sum of the edge weights along some path from the source up to me. From all the messages I receive, I choose the minimum sum (the shortest path I have heard of so far) and make it my value only if it’s smaller than my current value. If my value changed, I prepare the messages for my neighbors: for each of my edges, I add the edge weight to my value and send the new sum to the destination vertex of this edge.
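To make the vertex point of view concrete, here is a minimal plain-Java sketch that simulates the supersteps on a single machine. It does not use the Giraph API: the class name PregelShortestPathsSketch, the run method and the in-memory edge arrays are all made up for illustration, and the graph is the small example file used later in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-machine simulation of the vertex logic; this is not Giraph code.
class PregelShortestPathsSketch {

    // edges[v] holds {destination, weight} pairs for the out-edges of vertex v.
    static double[] run(double[][][] edges, int sourceId) {
        int n = edges.length;
        double[] value = new double[n];
        Arrays.fill(value, Double.MAX_VALUE);      // every vertex starts at MAX

        // inbox.get(v) = messages delivered to vertex v in the current superstep
        Map<Integer, List<Double>> inbox = new HashMap<>();
        boolean firstSuperstep = true;

        while (firstSuperstep || !inbox.isEmpty()) {
            Map<Integer, List<Double>> outbox = new HashMap<>();
            for (int v = 0; v < n; v++) {
                List<Double> messages = inbox.get(v);
                // After the first superstep, a vertex without messages stays idle.
                if (!firstSuperstep && messages == null) {
                    continue;
                }
                // The source reaches itself with cost 0; the others know nothing yet.
                double minDist = (v == sourceId) ? 0.0 : Double.MAX_VALUE;
                if (messages != null) {
                    for (double m : messages) {
                        minDist = Math.min(minDist, m);
                    }
                }
                // Only an improvement changes the value and triggers new messages.
                if (minDist < value[v]) {
                    value[v] = minDist;
                    for (double[] edge : edges[v]) {
                        int dest = (int) edge[0];
                        outbox.computeIfAbsent(dest, k -> new ArrayList<>())
                              .add(minDist + edge[1]);
                    }
                }
            }
            inbox = outbox;
            firstSuperstep = false;
        }
        return value;
    }

    public static void main(String[] args) {
        double[][][] edges = {
            {{1, 1}, {3, 3}},
            {{0, 1}, {2, 2}, {3, 1}},
            {{1, 2}, {4, 4}},
            {{0, 3}, {1, 1}, {4, 4}},
            {{3, 4}, {2, 4}}
        };
        double[] dist = run(edges, 1);
        for (int v = 0; v < dist.length; v++) {
            System.out.println(v + " " + dist[v]);
        }
    }
}
```

The loop stops when no vertex sends a message anymore, which plays the role of all vertices having halted. With vertex 1 as the source, vertices 0 to 4 end up with the values 1.0, 0.0, 2.0, 1.0 and 5.0.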


~~~~~ Q#3: How should the main code of Shortest Paths be?

The above computation is the compute() method of the algorithm. This algorithm is one of the Giraph examples. Directory: giraph-examples/src/main/java/. Package: org.apache.giraph.examples. Name: SimpleShortestPathsVertex.java


~~~~~ Q#4: What is the input file? What information do I need in order to run the algorithm, and in what format am I gonna receive this information? Is any of the existing Java files for I/O capable of reading my input file?

  • The Input File should contain vertices and the edges between them, with weights on the edges. Let’s say each line should have the SourceVertexId and tuples of DestinationVertexId and EdgeWeight.
  • The VertexId and EdgeWeight should be of some type: int, double, whatever.
  • Yes, there is an existing input format, i.e. JsonLongDoubleFloatDoubleVertexInputFormat. This file is a Vertex Reader. It expects to receive information for a vertex: a Long Vertex Id, a Double Vertex Value and a Float Edge Value, and it expects the messages sent/received to carry Double values. It takes lines in JSON format. A very good explanation of the JSON format is given in the comments of the file: “The files should be in the following JSON format:
    JSONArray(<vertex id>, <vertex value>, JSONArray(JSONArray(<dest vertex id>, <edge value>), …)).”
    Here is a simple input file I have created. I give 0 to all vertex values because the code does not take them into consideration anyway; it initializes them to a MAX constant.
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
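To see what a reader has to pull out of each line, here is a small plain-Java sketch. It is only an illustration: the class VertexLineSketch and its fields are hypothetical, and instead of a real JSON parser (which Giraph's own reader uses) it simply reads the numbers of a line in order, which is enough for this particular format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; Giraph's own reader uses a real JSON parser.
class VertexLineSketch {
    final long id;
    final double value;
    final List<Long> targets = new ArrayList<>();   // destination vertex ids
    final List<Float> weights = new ArrayList<>();  // edge weights, same order as targets

    private static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?");

    private VertexLineSketch(long id, double value) {
        this.id = id;
        this.value = value;
    }

    // Parses one line such as [1,0,[[0,1],[2,2],[3,1]]] by reading its numbers in order.
    static VertexLineSketch parse(String line) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            numbers.add(m.group());
        }
        // Expect id, value, then (destination, weight) pairs: an even count of at least 2.
        if (numbers.size() < 2 || numbers.size() % 2 != 0) {
            throw new IllegalArgumentException("Malformed vertex line: '" + line + "'");
        }
        VertexLineSketch v = new VertexLineSketch(
                Long.parseLong(numbers.get(0)), Double.parseDouble(numbers.get(1)));
        for (int i = 2; i < numbers.size(); i += 2) {
            v.targets.add(Long.parseLong(numbers.get(i)));
            v.weights.add(Float.parseFloat(numbers.get(i + 1)));
        }
        return v;
    }

    public static void main(String[] args) {
        VertexLineSketch v = parse("[1,0,[[0,1],[2,2],[3,1]]]");
        System.out.println("vertex " + v.id + " value " + v.value
                + " targets " + v.targets + " weights " + v.weights);
    }
}
```

Note that this sketch rejects an empty line, so a stray blank line in the input file shows up as an error immediately.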


~~~~~ Q#5: What is the output file? What do I want to print in the output file? Is any of the existing Java files for I/O capable of generating the desired output file?

  • The Output File should contain all vertices with the value of their shortest path to the source vertex. Let’s say each line should have the VertexId and the sum of the weights along its shortest path.
  • The VertexId and the Sum should be of some type, int, double, whatever.
  • Yes, there is an existing output format, i.e. IdWithValueTextOutputFormat. This writes out vertices’ IDs and values.
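As a tiny illustration of the output described above, each line is just the vertex Id followed by its value. The sketch below is hypothetical plain Java, not the Giraph class, and the tab used as separator is an assumption on my side.

```java
// Hypothetical formatter for illustration; the tab delimiter is an assumption.
class IdWithValueLineSketch {

    // Builds one output line: vertex id, a tab, then the shortest path value.
    static String format(long vertexId, double vertexValue) {
        return vertexId + "\t" + vertexValue;
    }

    public static void main(String[] args) {
        System.out.println(format(4, 5.0));
    }
}
```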


~~~~~ Q#6: How do I run the algorithm?

Start HDFS and MapReduce, then run the Giraph command line.

1. In the hadoop directory run:

bin/start-dfs.sh
bin/start-mapred.sh

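2. Create the input folder in the HDFS and copy the input file there. The name of the file in the HDFS must match the -vip path used in step 3 (/in/input-json).

hadoop fs -mkdir /in
hadoop fs -put /local-directory-to-input-file/input_file /in/input-json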

3. Run the command, in which you should include: (i) the jar file generated when installing Giraph, followed by the runner class org.apache.giraph.GiraphRunner, (ii) the fully qualified class name of the main code, (iii) the class that reads the input file and the path to the input file, (iv) the class that generates the output file and the path to the output folder, (v) the number of workers. Below I give all these parameters in the same order.

hadoop jar /directory-to-giraph/giraph-core/target/giraph-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /in/input-json -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /outShortest -w 1


~~~~~ Q#7: What results should I expect?

If it runs successfully, then an output folder named outShortest is created with the output file. Open the file:

hadoop fs -cat /outShortest/*

The result should be this:

0 1.0
2 2.0
1 0.0
3 1.0
4 5.0

Here the source vertex was the vertex with Id 1. This is given as a constant at the beginning of the code and can of course be changed.


Observation: I have noticed that this algorithm works only when the input file gives an undirected graph, i.e. when both edges a → b and b → a are included in the input file.
Why? At Superstep 0, the vertices get initialized with a MAX value. If a vertex does not exist at Superstep 0 and gets created later because it receives a message, its value is not initialized to MAX like the others (since we are not at Superstep 0 anymore). Therefore, its behavior gets a bit random, with no value initialization.
What should we do? I submitted my first patch!
And the conclusion is…? I am learning new stuff! One of the Giraph members suggested using the VertexValueFactory.java* to fix the problem. We are in the process of fixing it, and in the meantime I’m learning more cool stuff 😀

* The VertexValueFactory is responsible for initializing the vertex value when the vertex is created, either by reading the input file or by receiving a message.
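Here is a minimal sketch of that initialization problem in plain Java. It is a simulation, not Giraph code: the class LateVertexInitSketch and the deliver method are made up, the Supplier only plays the role of a value factory, and I assume that a vertex created without initialization starts with the value 0.0.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical simulation for illustration; it does not use the Giraph API.
class LateVertexInitSketch {

    // Delivers one distance message to a vertex, creating the vertex first if it is
    // unknown. The supplier plays the role of a value factory for new vertices.
    static double deliver(Map<String, Double> values, String vertexId,
                          double message, Supplier<Double> initialValue) {
        if (!values.containsKey(vertexId)) {
            values.put(vertexId, initialValue.get());
        }
        // Same rule as in the algorithm: keep the message only if it improves the value.
        if (message < values.get(vertexId)) {
            values.put(vertexId, message);
        }
        return values.get(vertexId);
    }

    public static void main(String[] args) {
        // Directed edge a -> b with weight 3; a is the source, b has no line in the input.
        // Assumption: without initialization, the new vertex starts at 0.0.
        double withoutInit = deliver(new HashMap<>(), "b", 3.0, () -> 0.0);
        double withFactory = deliver(new HashMap<>(), "b", 3.0, () -> Double.MAX_VALUE);
        System.out.println("b without initialization: " + withoutInit);
        System.out.println("b with MAX from a factory: " + withFactory);
    }
}
```

Under that assumption, the late vertex never accepts the message, because 3.0 is not smaller than 0.0, while a vertex that starts at MAX takes 3.0 as its value.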


39 thoughts on “Run Example in Giraph: Shortest Paths”

    • Not yet :p My initial patch was the addition of an if-clause to initialize the vertices created after Superstep 0. Then I made a second patch using the VertexValueFactory to automatically initialize all vertices independently of the way they are created. I’m waiting for the greater forces (Giraph members and advanced contributors) to check it 😀

  1. Hi Marsty5,
    Thank you for great instructions on how to install Giraph and run the shortest path example.

    Unfortunately I can’t get the shortest path example above to run on my Hadoop instance. I have a single node instance running on Ubuntu 12.04

    This is the error I am getting:
    Exception in thread “main” java.lang.ClassNotFoundException: org.apache.giraph.examples.SimpleShortestPathsVertex

    This is the command I am running:
    hadoop jar /home/ubuntu/giraph/giraph-core/target/giraph-1.1.0-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar
    org.apache.giraph.GiraphRunner
    org.apache.giraph.examples.SimpleShortestPathsVertex
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
    -vip /in/SimpleJsonArray2
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
    -op /outShortest1
    -w 1

    If you have any ideas how I could get this example to work I would really appreciate it.
    Thank you
    Peter

    • Hello Peter,

      I’m glad you found these posts helpful!
      The ClassNotFoundException occurs when the class cannot be found in the jar file (maybe something went wrong in the compilation).
      1. Did you run ‘mvn compile’ in giraph/ and not in giraph/giraph-core? If you ran it in giraph/giraph-core, then the examples are not compiled.
      2. Check if both jar files exist in giraph/giraph-core/target/ and giraph/giraph-examples/target/. If not, then do ‘mvn compile’ again and be sure that you get a SUCCESS message for the packages ‘Apache Giraph Core’ and ‘Apache Giraph Examples’.
      3. If it still does not work, then run the command using directly the jar file from the giraph-examples instead of the giraph-core, which is: giraph/giraph-examples/target/giraph-examples-0.2-SNAPSHOT-for-hadoop-0.20.203.0-jar-with-dependencies.jar

      Please, tell me if this works! 🙂
      You can also post this question, exactly as you wrote it here, to the mailing list user@giraph.apache.org. People are more than willing to help!

  2. Hi Maria,
    Thank you for replying to my question.

    The mvn compile command was not completing correctly. It was failing when it got to the Giraph examples. I ran the below command instead and Giraph compiled correctly

    mvn -Phadoop_1.0 -DskipTests clean install

    I then tried to run the Shortest Path command but I don’t have the SimpleShortestPathsVertex file so I used the SimpleShortestPathsComputation file instead, see below.

    bin/hadoop jar /home/ubuntu/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
    -vip /in
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
    -op /outShortest
    -w 1

    I am getting warnings on screen when I run the code but the command is completing and the output file is correct.

    13/05/23 22:31:50 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
    13/05/23 22:31:50 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
    13/05/23 22:31:50 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
    13/05/23 22:31:50 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
    13/05/23 22:31:50 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)

    I will post the warning up to the mailing list and see if anybody can explain it.
    Thank you,
    Peter

    • Interesting!
      1. With the command ‘mvn -Phadoop_1.0 -DskipTests clean install’ you specify which version of hadoop to use. I am using hadoop 0.20.203.0, maybe you installed a previous version and there was a conflict?

      2. When you say you don’t have the SimpleShortestPathsVertex, where did you look for it? It must be – specifically – in the path /giraph-folder/giraph-examples/src/main/java/org/apache/giraph/examples/.
      If it is there but you still get the not found exception, then it’s not inside the jar.
      (a) Check if the jar file from giraph/giraph-examples/target/ has the SimpleShortestPathsVertex (in terminal run: jar tf snapshot-bla-bla.jar | grep SimpleShortestPathsVertex).
      If true –> Try step 3 from my previous comment.
      else –> mvn compile inside the giraph/giraph-examples and repeat 2(a).

      3. Where did you find the SimpleShortestPathsComputation? I cannot find it anywhere. Do you have the latest release of Giraph (May 6, 2013) ?

      4. As long as they are warnings and not errors, we are ok 😉

      5. Please do post it on the mailing list! (I’m waiting to see your e-mail) I would like to read how the members explain these issues!

  3. Pingback: Run example in Giraph: PageRank | In a distributed manner... or not!

  4. Hi Maria,

    nice documentation. I would add something: why don’t you put a small section related to how to test it with a different starting vector. I guess it would be very helpful.

    I checked the current implementation and I found out that there is the need for the giraph-site.xml configuration file placed in the hadoop/conf/ directory with the following content:

    <configuration>
      <property>
        <name>SimpleShortestPathsVertex.sourceId</name>
        <value>0</value>
      </property>
    </configuration>

    of course the name of the property can change depending on the release of the code 🙂

    Cheers,
    Armando

    • Hi Armando,

      You are right that I should have mentioned this! There is no reason to modify the giraph-site.xml.
      In the end of the command line (question 6, point 3) you can add the following:
      -ca SimpleShortestPathsVertex.source=2 if you want node 2 to be the source node.

      Cheers

      • I tried that previously and it did not work.. are you sure that it should work? I checked the source code and it explicitly works on a Configuration object.

      • Yes, I tested it and then posted the comment 🙂 Yes, you are right that it works on a Configuration object. GiraphRunner (it is included in the command line) creates a configuration object. It also calls the ConfigurationUtils, which reads and translates the parameters given in the command line. Have a look at the source code of org.apache.giraph.GiraphRunner.java and org.apache.giraph.utils.ConfigurationUtils.java.
        I’ve been told that I can use -D instead of the -ca, but it never worked for me. maybe you found the alternative way. 😉

  5. Pingback: HowTo: The Shortest Path example | A great computer science adventure

  6. Hi Maria,

    ===========
    I got stuck at map 100%, reduce 0% for half an hour
    =====
    The output is:

    [hadoop@Giraffe giraph-1.0.0]$ hadoop jar /home/hadoop/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hadoop/input_file -of org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexOutputFormat -op /user/hadoop/output_file -w 1
    13/06/27 13:53:51 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
    13/06/27 13:53:51 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
    13/06/27 13:53:52 INFO mapred.JobClient: Running job: job_201306271352_0001
    13/06/27 13:53:53 INFO mapred.JobClient: map 0% reduce 0%
    13/06/27 13:54:07 INFO mapred.JobClient: map 50% reduce 0%
    13/06/27 13:54:08 INFO mapred.JobClient: map 100% reduce 0%
    ALWAYS STUCK HERE FOR LONG TIME

    Have you ever encountered this problem?

    Thanks,

    Hang

    • Hey Hang,

      Some comments: a Giraph job is a map-only job. The reduce phase never runs, so you will never see any growing percentage in the reduce phase. The fact that it is stuck at 100% means that it is actually running and never reaches a halting condition. So my questions are:
      1. which input file are you using? mine?
      2. which giraph version are you using?

      • Hi Maria,

        I’m new to both Hadoop and Giraph. I also encounter exactly the same problem as Hang.
        1. The input file is tiny_graph.txt
        [0,0,[[1,1],[3,3]]]
        [1,0,[[0,1],[2,2],[3,1]]]
        [2,0,[[1,2],[4,4]]]
        [3,0,[[0,3],[1,1],[4,4]]]
        [4,0,[[3,4],[2,4]]]
        2. Version of Giraph is 1.1.0 while Hadoop is 1.2.1

        Thanks,

        Alfred

    • I had the same problem. The issue is that although the Hadoop job may encounter errors (e.g. an exception), it does not stop the execution or output the error for us. I’m new to Hadoop, so I don’t know if this issue pertains only to Giraph.

      So if Hadoop gets stuck, check the status using the JobTracker (localhost:50030) to see if there’s any problem. This is what I did, and I found out that the error was due to the reading of the input file (an extra blank line at the end of the file). If that’s the case, just delete the last line of the input file.
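The blank-line problem described in this comment can also be avoided by cleaning the lines of the input file before uploading it. A minimal sketch in plain Java, where the class BlankLineFilterSketch and the method dropBlankLines are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it works on lines already read into memory.
class BlankLineFilterSketch {

    // Keeps only the lines that contain something other than whitespace.
    static List<String> dropBlankLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("[3,0,[[0,3],[1,1],[4,4]]]", "[4,0,[[3,4],[2,4]]]", "");
        System.out.println(dropBlankLines(lines).size() + " lines kept");
    }
}
```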

  7. I am having some trouble getting these examples running. I’m using your input file and using giraph version 1.1.0. I see the following output.

    13/07/29 14:36:06 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
    13/07/29 14:36:06 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
    13/07/29 14:36:20 INFO mapred.JobClient: Running job: job_201307232135_0588
    13/07/29 14:36:21 INFO mapred.JobClient: map 0% reduce 0%
    13/07/29 14:36:52 INFO mapred.JobClient: map 50% reduce 0%
    13/07/29 14:47:24 INFO mapred.JobClient: map 0% reduce 0%
    13/07/29 14:47:39 INFO mapred.JobClient: Job complete: job_201307232135_0588
    13/07/29 14:47:39 INFO mapred.JobClient: Counters: 6
    13/07/29 14:47:39 INFO mapred.JobClient: Job Counters
    13/07/29 14:47:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=670508
    13/07/29 14:47:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    13/07/29 14:47:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    13/07/29 14:47:39 INFO mapred.JobClient: Launched map tasks=2
    13/07/29 14:47:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
    13/07/29 14:47:39 INFO mapred.JobClient: Failed map tasks=1

    When I check the job tracker, I see that two map jobs were killed, with the following errors:

    java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
    Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)

    java.lang.IllegalStateException: run: Caught an unrecoverable exception exists: Failed to check /_hadoopBsp/job_201307232135_0588/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
    Caused by: java.lang.IllegalStateException: exists: Failed to check /_hadoopBsp/job_201307232135_0588/_applicationAttemptsDir/0/_superstepDir/-1/_addressesAndPartitions after 3 tries!
    at org.apache.giraph.zk.ZooKeeperExt.exists(ZooKeeperExt.java:369)
    at org.apache.giraph.worker.BspServiceWorker.s
    Task attempt_201307232135_0588_m_000001_0 failed to report status for 600 seconds. Killing!

    Any idea what the problem is?
    Thanks in advance.

    • Which Hadoop version do you use? if you run on Hadoop 1.0.3, try to build the Giraph with this command: mvn -Phadoop_1.0 clean package
      If this is not the case, post your issue in the mailing list to see what the rest of the guys will say about it 🙂

  8. I am new to Apache Giraph. I am just starting my experiments with giraph example (shortest path)..

    From your post, I can find ”
    Observation: I have noticed that this algorithm works only when the input file gives an undirected graph, i.e. when both edges a → b and b → a are included in the input file.
    Why? At Superstep 0, the vertices get initialized with a MAX value. If a vertex does not exist at superstep 0, when it gets created because of receiving a message, its value will not be initialized to MAX like the others (since we are not at superstep 0 anymore). Therefore, its behavior get a bit random with no value initialization.”

    I am able to see this incorrect behavior with my input file having included only edge a->b and not b->a. But, I am not able to understand the reason why it is not behaving correctly. Could you please help me to understand the reason?

    • Let’s say we included a->b and not b->a.
      At Superstep 0, a is created and sets its value to MAX. b is not created yet!
      At Superstep 1, there is a message from a to b. By definition, a vertex is created if a message is sent to it. So b gets created, but it does not set its value to MAX because we are not at Superstep 0 anymore. The algorithm does not (at least did not, last time I checked) handle the initialization of vertex values if the vertices are created after Superstep 0.

      Supersteps are global and are the same for all vertices no matter when they get created; thus the thought “when a vertex is created, it begins from Superstep 0” is not valid.

  9. Hello Maria,

    Thank you for your installation guide for Giraph, it was very useful and well presented! I am having trouble running this SimpleShortestPathsVertex example. I use the data you provided, with the following command:

    hadoop jar /home/ghufran/Downloads/giraph-folder/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/ghufran/input/input-json -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/ghufran/output1 -w 1

    but for some reason my map job fails. This is displayed after I run the command:

    14/02/13 18:51:14 INFO mapred.JobClient: Running job: job_201402131818_0004
    14/02/13 18:51:15 INFO mapred.JobClient: map 0% reduce 0%
    14/02/13 18:51:33 INFO mapred.JobClient: map 50% reduce 0%
    14/02/13 18:54:18 INFO mapred.JobClient: map 0% reduce 0%
    14/02/13 18:54:21 INFO mapred.JobClient: map 50% reduce 0%
    14/02/13 19:05:01 INFO mapred.JobClient: map 0% reduce 0%
    14/02/13 19:05:06 INFO mapred.JobClient: Job complete: job_201402131818_0004
    14/02/13 19:05:06 INFO mapred.JobClient: Counters: 6
    14/02/13 19:05:06 INFO mapred.JobClient: Job Counters
    14/02/13 19:05:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=653402
    14/02/13 19:05:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    14/02/13 19:05:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    14/02/13 19:05:06 INFO mapred.JobClient: Launched map tasks=2
    14/02/13 19:05:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
    14/02/13 19:05:06 INFO mapred.JobClient: Failed map tasks=1

    Any help on this would be much appreciated.

    Thank you

  10. Sorry I also forgot to include this (which comes first):

    14/02/13 20:16:13 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
    14/02/13 20:16:13 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
    14/02/13 20:16:13 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
    14/02/13 20:16:13 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
    14/02/13 20:16:13 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)

    In my above question.

    • Hi Ghufran, thank you for your kind words.
      From the error messages we can see “Output format vertex index type is not known”, which means it doesn’t recognize the output format code. Can you try -vof instead of -of before “org.apache.giraph.io.formats.IdWithValueTextOutputFormat” ? Let me know.

      • Nps! It says that it does not recognise the -vof option. The full command and output is below:

        hadoop jar /home/ghufran/Downloads/giraph-folder/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/ghufran/input/input-json -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/ghufran/outShortest -w 1

        Exception in thread “main” org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -vof
        at org.apache.commons.cli.Parser.processOption(Parser.java:363)
        at org.apache.commons.cli.Parser.parse(Parser.java:199)
        at org.apache.commons.cli.Parser.parse(Parser.java:85)
        at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:129)
        at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

  11. OK, for some reason it seems to have magically fixed itself, and is working now :s? I still get the WARN messages and I still use -of.

  12. hello,
    I am working on Hadoop Giraph for a recommender system. I tested your example on Hadoop Giraph, and now I want to implement depth-first search using Giraph. Can you tell me what to refer to for that?

  13. Hi,

    I am trying to run a giraph job (to compute Multi-source shortest paths) on my cloudera cluster (3 nodes) . Here is my command:
    hadoop jar $OKAPI_JAR org.apache.giraph.GiraphRunner \
    -libjars $OKAPI_JAR \
    ml.grafos.okapi.graphs.MultipleSourceShortestPaths\$InitSources \
    -mc ml.grafos.okapi.graphs.MultipleSourceShortestPaths\$MasterCompute \
    -eif ml.grafos.okapi.io.formats.LongFloatTextEdgeInputFormat \
    -eip $INPUT_PATH/inputg-edges \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op $OUTPUT/multiSSP \
    -w 2 \
    -ca sources.list=2

    The job execution gives the following error:
    14/06/06 17:27:50 ERROR yarn.GiraphYarnClient: Giraph: ml.grafos.okapi.graphs.MultipleSourceShortestPaths$InitSources reports FAILED state, diagnostics show: Application application_1401922662003_0008 failed 2 times due to AM Container for appattempt_1401922662003_0008_000002 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
    Container exited with a non-zero exit code 1
    .Failing this attempt.. Failing the application.

    Please let me know if I am missing any configuration settings here.
    All I did was to place the okapi jar (okapi-giraph-yarn-0.3.3.jar) in all my nodes and set the $OKAPI_JAR variable.

    Thank you. I appreciate your time.

  14. Hi ,

    I have been trying to run the “SimpleShortestPathsVertex” program from Eclipse and I am facing the error shown below:
    java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!

    So I wanted to check whether it is possible to run the “SimpleShortestPathsVertex” program from Eclipse, or whether Eclipse is only for developing the code, which then has to be packaged as a jar file and run from the command line.

    Is there a way to run the program using Hadoop from Eclipse?

    REGARDS,
    SATYAJIT.

  15. Hi
    I am facing this exception when i am running the shortestpath job..can anyone help me please??

    15/05/25 18:43:53 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    15/05/25 18:43:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    15/05/25 18:43:55 INFO mapreduce.JobSubmitter: number of splits:1
    15/05/25 18:43:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1432551428094_0007
    15/05/25 18:43:55 INFO impl.YarnClientImpl: Submitted application application_1432551428094_0007
    15/05/25 18:43:55 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1432551428094_0007/
    15/05/25 18:43:55 INFO job.GiraphJob: Tracking URL: http://localhost:8088/proxy/application_1432551428094_0007/
    15/05/25 18:43:55 INFO job.GiraphJob: Waiting for resources… Job will start only when it gets all 2 mappers
    15/05/25 18:44:08 INFO mapreduce.Job: Running job: job_1432551428094_0007
    15/05/25 18:44:08 INFO mapreduce.Job: Job job_1432551428094_0007 running in uber mode : false
    15/05/25 18:44:08 INFO mapreduce.Job: map 100% reduce 0%
    15/05/25 18:44:08 INFO mapreduce.Job: Job job_1432551428094_0007 failed with state FAILED due to: Task failed task_1432551428094_0007_m_000000
    Job failed as tasks failed. failedMaps:1 failedReduces:0

    15/05/25 18:44:08 INFO mapreduce.Job: Counters: 8
    Job Counters
    Failed map tasks=1
    Launched map tasks=1
    Other local map tasks=1
    Total time spent by all maps in occupied slots (ms)=5795
    Total time spent by all reduces in occupied slots (ms)=0
    Total time spent by all map tasks (ms)=5795
    Total vcore-seconds taken by all map tasks=5795
    Total megabyte-seconds taken by all map tasks=5934080

    Thanks
    Laxmi

  16. I am getting an error after running this command:
    hadoop jar /home/lakshay/giraph-folder/giraph/giraph-core/target/giraph-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /in/input-json -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /outShortest -w 1

    This is the error:

    Exception in thread “main” org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -of
    at org.apache.commons.cli.Parser.processOption(Parser.java:363)
    at org.apache.commons.cli.Parser.parse(Parser.java:199)
    at org.apache.commons.cli.Parser.parse(Parser.java:85)
    at org.apache.giraph.utils.ConfigurationUtils.parseArgs(ConfigurationUtils.java:197)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:74)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

    why is it showing unrecognized option: -of ??
    P.S. – My build was not successful
    this is the summary of my build failure

    Reactor Summary:
    [INFO]
    [INFO] Apache Giraph Parent ………………………… SUCCESS [3.423s]
    [INFO] Apache Giraph Core ………………………….. SUCCESS [7:38.494s]
    [INFO] Apache Giraph Blocks Framework ……………….. SUCCESS [29.505s]
    [INFO] Apache Giraph Examples ………………………. SUCCESS [10:02.845s]
    [INFO] Apache Giraph Accumulo I/O …………………… SUCCESS [32.021s]
    [INFO] Apache Giraph HBase I/O ……………………… FAILURE [15.444s]
    [INFO] Apache Giraph HCatalog I/O …………………… SUCCESS [13:01.196s]
    [INFO] Apache Giraph Gora I/O ………………………. FAILURE [5:30.401s]
    [INFO] Apache Giraph Rexster I/O ……………………. SUCCESS [0.721s]
    [INFO] Apache Giraph Rexster Kibble …………………. SUCCESS [1:34.204s]
    [INFO] Apache Giraph Rexster I/O Formats …………….. FAILURE [9:35.679s]
    [INFO] Apache Giraph Distribution …………………… SKIPPED
    [INFO] ————————————————————————
    [INFO] BUILD FAILURE

    Please help me out on how to run this command and how to proceed further with apache giraph?

  17. Hi, thanks for the detailed explanation, but I didn’t find the path. I mean, finding the shortest path value is good, but there is no way to see the path itself. How can we show the path? Example: 4 5.0

    But i need to display the path as well like 4->3->1
