Commit 9bf86c5

Merge pull request #36 from kbastani/1.1.1-RELEASE

Merging 1.1.1-RELEASE into master

2 parents: 2b39b6b + fe63f36

File tree: 18 files changed, +1475 -161 lines changed

README.md

Lines changed: 18 additions & 5 deletions
```diff
@@ -8,6 +8,8 @@ This docker image adds high-performance graph analytics to a [Neo4j graph databa
 
 *Closeness Centrality*
 
+*Betweenness Centrality*
+
 *Triangle Counting*
 
 *Connected Components*
@@ -23,13 +25,13 @@ The Neo4j Mazerunner service in this image is a [unmanaged extension](http://neo
 Installation requires 3 docker image deployments, each containing a separate linked component.
 
 * *Hadoop HDFS* (sequenceiq/hadoop-docker:2.4.1)
-* *Neo4j Graph Database* (kbastani/docker-neo4j:2.2.0)
+* *Neo4j Graph Database* (kbastani/docker-neo4j:2.2.1)
 * *Apache Spark Service* (kbastani/neo4j-graph-analytics:1.1.0)
 
 Pull the following docker images:
 
     docker pull sequenceiq/hadoop-docker:2.4.1
-    docker pull kbastani/docker-neo4j:2.2.0
+    docker pull kbastani/docker-neo4j:2.2.1
     docker pull kbastani/neo4j-graph-analytics:1.1.0
 
 After each image has been downloaded to your Docker server, run the following commands in order to create the linked containers.
@@ -43,13 +45,13 @@ After each image has been downloaded to your Docker server, run the following co
     # Create Neo4j database with links to HDFS and Mazerunner
    # Replace <user> and <neo4j-path>
    # with the location to your existing Neo4j database store directory
-    docker run -d -P -v /Users/<user>/<neo4j-path>/data:/opt/data --name graphdb --link mazerunner:mazerunner --link hdfs:hdfs kbastani/docker-neo4j:2.2.0
+    docker run -d -P -v /Users/<user>/<neo4j-path>/data:/opt/data --name graphdb --link mazerunner:mazerunner --link hdfs:hdfs kbastani/docker-neo4j:2.2.1
 
 ### Use Existing Neo4j Database
 
 To use an existing Neo4j database, make sure that the database store directory, typically `data/graph.db`, is available on your host OS. Read the [setup guide](https://github.com/kbastani/docker-neo4j#start-neo4j-container) for *kbastani/docker-neo4j* for additional details.
 
-> Note: The kbastani/docker-neo4j:2.2.0 image is running Neo4j 2.2.0. If you point it to an older database store, that database may become unable to be attached to a previous version of Neo4j. Make sure you back up your store files before proceeding.
+> Note: The kbastani/docker-neo4j:2.2.1 image is running Neo4j 2.2.1. If you point it to an older database store, that database may become unable to be attached to a previous version of Neo4j. Make sure you back up your store files before proceeding.
 
 ### Use New Neo4j Database
 
@@ -69,6 +71,7 @@ Replace `{analysis}` in the endpoint with one of the following analysis algorith
 
 - pagerank
 - closeness_centrality
+- betweenness_centrality
 - triangle_count
 - connected_components
 - strongly_connected_components
@@ -98,7 +101,7 @@ To begin graph analysis jobs on a particular metric, HTTP GET request on the fol
 
 * PageRank is used to find the relative importance of a node within a set of connected nodes.
 
-### Closeness Centrality (New)
+### Closeness Centrality
 
     http://172.17.0.21:7474/service/mazerunner/analysis/closeness_centrality/FOLLOWS
 
@@ -108,6 +111,16 @@ To begin graph analysis jobs on a particular metric, HTTP GET request on the fol
 
 * A key node centrality measure in networks is closeness centrality (Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994). It is defined as the inverse of farness, which in turn, is the sum of distances to all other nodes.
 
+### Betweenness Centrality
+
+    http://172.17.0.21:7474/service/mazerunner/analysis/betweenness_centrality/FOLLOWS
+
+* Gets all nodes connected by the `FOLLOWS` relationship and updates each node with the property key `betweenness_centrality`.
+
+* The value of the `betweenness_centrality` property is a float data type, ex. `betweenness_centrality: 20.345`.
+
+* Betweenness centrality is an indicator of a node's centrality in a network. It is equal to the number of shortest paths from all vertices to all others that pass through that node. A node with high betweenness centrality has a large influence on the transfer of items through the network, under the assumption that item transfer follows the shortest paths.
+
 ### Triangle Counting
 
     http://172.17.0.21:7474/service/mazerunner/analysis/triangle_count/FOLLOWS
```
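The betweenness definition added in the README above (the number of shortest paths between all pairs of nodes that pass through a given node) can be sanity-checked with a tiny brute-force sketch. This is plain Java on a toy graph for illustration only; it is not Mazerunner's GraphX implementation, and the class and method names are hypothetical:

```java
import java.util.*;

public class Betweenness {
    // Adjacency list for a small, illustrative directed graph.
    static Map<Integer, List<Integer>> adj = new HashMap<>();

    static void edge(int a, int b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>());
    }

    // Enumerate every shortest path from s to t: BFS for distances,
    // then a DFS that only follows distance-increasing edges.
    static List<List<Integer>> shortestPaths(int s, int t) {
        Map<Integer, Integer> dist = new HashMap<>();
        dist.put(s, 0);
        Deque<Integer> queue = new ArrayDeque<>(List.of(s));
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int w : adj.get(u))
                if (!dist.containsKey(w)) { dist.put(w, dist.get(u) + 1); queue.add(w); }
        }
        List<List<Integer>> paths = new ArrayList<>();
        if (dist.containsKey(t)) collect(s, t, dist, new ArrayDeque<>(List.of(s)), paths);
        return paths;
    }

    static void collect(int u, int t, Map<Integer, Integer> dist,
                        Deque<Integer> path, List<List<Integer>> out) {
        if (u == t) { out.add(new ArrayList<>(path)); return; }
        for (int w : adj.get(u))
            if (dist.containsKey(w) && dist.get(w) == dist.get(u) + 1) {
                path.addLast(w);
                collect(w, t, dist, path, out);
                path.removeLast();
            }
    }

    // Betweenness as described in the README: count shortest paths between
    // every ordered pair (s, t) that pass through v as an intermediate node.
    static int betweenness(int v) {
        int count = 0;
        for (int s : adj.keySet())
            for (int t : adj.keySet()) {
                if (s == t || s == v || t == v) continue;
                for (List<Integer> p : shortestPaths(s, t))
                    if (p.subList(1, p.size() - 1).contains(v)) count++;
            }
        return count;
    }

    // Path graph 1 -> 2 -> 3: the only 1-to-3 shortest path passes through 2.
    static int demoBetweenness() {
        adj.clear();
        edge(1, 2); edge(2, 3);
        return betweenness(2);
    }

    public static void main(String[] args) {
        System.out.println(demoBetweenness()); // prints 1
    }
}
```

For a real graph, Mazerunner delegates this computation to the Spark service via the HTTP endpoint shown above rather than computing it on the Neo4j side.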
Binary file (88.1 MB) not shown.

src/extension/pom.xml

Lines changed: 3 additions & 3 deletions
```diff
@@ -6,10 +6,10 @@
 
     <groupId>org.mazerunner</groupId>
     <artifactId>extension</artifactId>
-    <version>1.1.0-RELEASE</version>
+    <version>1.1.1-RELEASE</version>
 
     <properties>
-        <neo4j.version>2.2.0</neo4j.version>
+        <neo4j.version>2.2.1</neo4j.version>
         <joda.version>2.3</joda.version>
         <guava.version>17.0</guava.version>
         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
@@ -30,7 +30,7 @@
         <dependency>
             <groupId>org.neo4j</groupId>
             <artifactId>neo4j-kernel</artifactId>
-            <version>2.2.0</version>
+            <version>2.2.1</version>
             <type>test-jar</type>
             <scope>test</scope>
         </dependency>
```

src/spark/pom.xml

Lines changed: 14 additions & 3 deletions
```diff
@@ -6,7 +6,7 @@
 
     <groupId>org.mazerunner</groupId>
     <artifactId>spark</artifactId>
-    <version>1.1.0-RELEASE</version>
+    <version>1.1.1-RELEASE</version>
 
     <properties>
         <jetty.version>7.6.9.v20130131</jetty.version>
@@ -75,7 +75,7 @@
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-core_2.10</artifactId>
-            <version>1.3.0</version>
+            <version>1.3.1</version>
             <exclusions>
                 <exclusion>
                     <groupId>ch.qos.logback</groupId>
@@ -86,7 +86,7 @@
         <dependency>
             <groupId>org.apache.spark</groupId>
             <artifactId>spark-graphx_2.10</artifactId>
-            <version>1.3.0</version>
+            <version>1.3.1</version>
             <exclusions>
                 <exclusion>
                     <groupId>ch.qos.logback</groupId>
@@ -172,6 +172,17 @@
             <artifactId>gson</artifactId>
             <version>2.3</version>
         </dependency>
+        <dependency>
+            <groupId>org.scalatest</groupId>
+            <artifactId>scalatest_2.10</artifactId>
+            <version>2.0</version>
+            <scope>test</scope>
+        </dependency>
+        <dependency>
+            <groupId>com.github.mdr</groupId>
+            <artifactId>ascii-graphs_2.10</artifactId>
+            <version>0.0.6</version>
+        </dependency>
     </dependencies>
     <build>
         <resources>
```

src/spark/src/main/java/org/mazerunner/core/config/ConfigurationLoader.java

Lines changed: 2 additions & 2 deletions
```diff
@@ -108,8 +108,8 @@ public void initialize() throws IOException {
 
     public void initializeTest()
     {
-        hadoopSitePath = "/Users/kennybastani/hadoop-1.0.4/conf/core-site.xml";
-        hadoopHdfsPath = "/Users/kennybastani/hadoop-1.0.4/conf/hdfs-site.xml";
+        hadoopSitePath = "/hadoop-2.4.1/conf/core-site.xml";
+        hadoopHdfsPath = "/hadoop-2.4.1/conf/hdfs-site.xml";
         hadoopHdfsUri = "hdfs://0.0.0.0:9000";
         mazerunnerRelationshipType = "CONNECTED_TO";
         rabbitmqNodename = "localhost";
```

src/spark/src/main/java/org/mazerunner/core/messaging/Worker.java

Lines changed: 2 additions & 2 deletions
```diff
@@ -40,10 +40,10 @@ public class Worker {
     private String sparkAppName = "mazerunner";
 
     @Option(name="--spark.executor.memory",usage="Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g). ", metaVar = "<string>")
-    private String sparkExecutorMemory = "512m";
+    private String sparkExecutorMemory = "4092m";
 
     @Option(name="--spark.master",usage="The Spark master URL (e.g. spark://localhost:7077).",metaVar="<url>")
-    private String sparkMaster = "local";
+    private String sparkMaster = "local[8]";
 
     @Option(name="--hadoop.hdfs",usage="The HDFS URL (e.g. hdfs://0.0.0.0:9000).", metaVar = "<url>")
     private String hadoopHdfs = "hdfs://0.0.0.0:9000";
```
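The diff above hard-codes new defaults of `4092m` executor memory and `local[8]`, where the `[8]` asks Spark's local mode for eight worker threads. As a hedged illustration only (a hypothetical helper, not part of Mazerunner or Spark's API), a local master URL could instead be derived from the machine's core count:

```java
public class SparkDefaults {
    // Hypothetical helper: build a local-mode Spark master URL that uses
    // every available core, instead of a fixed "local[8]".
    static String localMaster() {
        int cores = Runtime.getRuntime().availableProcessors();
        return "local[" + cores + "]";
    }

    public static void main(String[] args) {
        System.out.println(localMaster()); // e.g. local[8] on an 8-core machine
    }
}
```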

src/spark/src/main/java/org/mazerunner/core/processor/GraphProcessor.java

Lines changed: 6 additions & 1 deletion
```diff
@@ -34,6 +34,7 @@ public class GraphProcessor {
     public static final String PAGERANK = "pagerank";
     public static final String STRONGLY_CONNECTED_COMPONENTS = "strongly_connected_components";
     public static final String CLOSENESS_CENTRALITY = "closeness_centrality";
+    public static final String BETWEENNESS_CENTRALITY = "betweenness_centrality";
 
     public static JavaSparkContext javaSparkContext = null;
 
@@ -63,9 +64,13 @@ public static void processEdgeList(ProcessorMessage processorMessage) throws IOE
                 results = algorithms.stronglyConnectedComponents(javaSparkContext.sc(), processorMessage.getPath());
                 break;
             case CLOSENESS_CENTRALITY:
-                // Route to StronglyConnectedComponents
+                // Route to ClosenessCentrality
                 results = algorithms.closenessCentrality(javaSparkContext.sc(), processorMessage.getPath());
                 break;
+            case BETWEENNESS_CENTRALITY:
+                // Route to BetweennessCentrality
+                results = algorithms.betweennessCentrality(javaSparkContext.sc(), processorMessage.getPath());
+                break;
             default:
                 // Analysis does not exist
                 System.out.println("Did not recognize analysis key: " + processorMessage.getAnalysis());
```
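The switch above routes each analysis key (`pagerank`, `closeness_centrality`, the new `betweenness_centrality`, ...) to the matching algorithm, falling through to an error message for unknown keys. The same routing pattern can be sketched with a key-to-function map; the class name and stub results below are illustrative, not Mazerunner's code, and the real processor calls Spark GraphX with an HDFS edge-list path:

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class AnalysisRouter {
    // Each analysis key maps to a (stubbed) algorithm over an edge-list path.
    static final Map<String, UnaryOperator<String>> ROUTES = Map.of(
        "pagerank", path -> "pagerank results for " + path,
        "closeness_centrality", path -> "closeness results for " + path,
        "betweenness_centrality", path -> "betweenness results for " + path
    );

    static String process(String analysis, String path) {
        UnaryOperator<String> algo = ROUTES.get(analysis);
        // Mirrors the switch's default branch for unrecognized keys.
        if (algo == null) return "Did not recognize analysis key: " + analysis;
        return algo.apply(path);
    }

    public static void main(String[] args) {
        System.out.println(process("betweenness_centrality", "/tmp/edges"));
    }
}
```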
Lines changed: 45 additions & 0 deletions
```diff
@@ -0,0 +1,45 @@
+package org.mazerunner.core.abstractions
+
+import org.apache.spark.graphx._
+
+import scala.reflect.ClassTag
+
+/**
+ * The [[PregelProgram]] abstraction wraps Spark's Pregel API implementation from the [[GraphOps]]
+ * class into a model that is easier to write graph algorithms.
+ * @tparam VertexState is the generic type representing the state of a vertex
+ */
+abstract class PregelProgram[VertexState: ClassTag, VD: ClassTag, ED: ClassTag] protected () extends Serializable {
+
+  @transient val graph: Graph[VD, ED]
+
+  /**
+   * The vertex program receives a state update and acts to update its state
+   * @param id is the [[VertexId]] that this program will perform a state operation for
+   * @param state is the current state of this [[VertexId]]
+   * @param message is the state received from another vertex in the graph
+   * @return a [[VertexState]] resulting from a comparison between current state and incoming state
+   */
+  def vertexProgram(id : VertexId, state : VertexState, message : VertexState) : VertexState
+
+  /**
+   * The message broker sends and receives messages. It will initially receive one message for
+   * each vertex in the graph.
+   * @param triplet An edge triplet is an object containing a pair of connected vertex objects and edge object.
+   *                For example (v1)-[r]->(v2)
+   * @return The message broker returns a key value list, each containing a VertexId and a new message
+   */
+  def messageBroker(triplet :EdgeTriplet[VertexState, ED]) : Iterator[(VertexId, VertexState)]
+
+  /**
+   * This method is used to reduce or combine the set of all state outcomes produced by a vertexProgram
+   * for each vertex in each superstep iteration. Each vertex has a list of state updates received from
+   * other vertices in the graph via the messageBroker method. This method is used to reduce the list
+   * of state updates into a single state for the next superstep iteration.
+   * @param a A first [[VertexState]] representing a partial state of a vertex.
+   * @param b A second [[VertexState]] representing a different partial state of a vertex
+   * @return a merged [[VertexState]] representation from the two [[VertexState]] parameters
+   */
+  def combiner(a: VertexState, b: VertexState) : VertexState
+
+}
```
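The new Scala abstraction above exposes three hooks: `messageBroker` sends states across edges, `combiner` reduces a vertex's inbox to one message, and `vertexProgram` folds that message into the vertex's state, repeated once per superstep. A minimal single-machine sketch of the same superstep loop (plain Java, no Spark, illustrative toy graph; the three hooks are marked in comments), here propagating the minimum vertex id to label connected components:

```java
import java.util.HashMap;
import java.util.Map;

public class MiniPregel {
    // Edges of a small undirected graph: two components, {1,2,3} and {4,5}.
    static int[][] edges = {{1, 2}, {2, 3}, {4, 5}};

    public static Map<Integer, Integer> run() {
        // State: each vertex starts with its own id as its component label.
        Map<Integer, Integer> state = new HashMap<>();
        for (int[] e : edges) { state.put(e[0], e[0]); state.put(e[1], e[1]); }

        boolean changed = true;
        while (changed) {                          // one pass = one superstep
            // messageBroker: each edge sends both endpoints' states across.
            Map<Integer, Integer> inbox = new HashMap<>();
            for (int[] e : edges) {
                // combiner: merge messages per vertex by taking the minimum.
                inbox.merge(e[0], state.get(e[1]), Math::min);
                inbox.merge(e[1], state.get(e[0]), Math::min);
            }
            // vertexProgram: keep the smaller of current state and message.
            changed = false;
            for (Map.Entry<Integer, Integer> m : inbox.entrySet()) {
                int next = Math.min(state.get(m.getKey()), m.getValue());
                if (next != state.get(m.getKey())) { state.put(m.getKey(), next); changed = true; }
            }
        }
        return state; // each vertex labeled with its component's minimum id
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

GraphX's `Pregel` operator does the same thing in distributed form, which is why `PregelProgram` only has to declare these three functions.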

0 commit comments