- Chen_Gu_poster.pdf: The final poster
- Chen_Gu_poster.zip: The original source files of the poster, which we made in Overleaf
- Chen_Gu_abstract.pdf: The abstract of the final project.
- CompareDifVersion.ipynb: The Jupyter notebook used to compare the different versions of the paraclique algorithm
  - Figure 3 was generated by this code
  - The running time can be long!
  - The first part of the code analyzes the data distribution and generates Figure 3 in the poster
  - The second part records the running time of the three different implementations on different samples
  - The output of the second part is a set of tables/files with the running times
  - Changing samplesize, samplenum, and inputfile in the code changes the samples. In our project, we manually changed these three parameters to get the results for different samples.
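To illustrate what samplesize and samplenum control, here is a minimal sketch of timing an algorithm on repeated random samples. It is not the notebook's code: `time_on_samples`, `toy_cliques`, and `placeholder_algorithm` are hypothetical names, and the toy data stands in for cliques that would be read from inputfile.

```python
import random
import time


def time_on_samples(cliques, algorithm, samplesize, samplenum, seed=0):
    """Time `algorithm` on `samplenum` random samples of `samplesize` cliques each."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    timings = []
    for _ in range(samplenum):
        sample = rng.sample(cliques, samplesize)  # draw without replacement
        start = time.perf_counter()
        algorithm(sample)
        timings.append(time.perf_counter() - start)
    return timings


# Toy stand-ins (assumptions): 50 fake cliques as sets of vertex ids,
# and a placeholder that only unions them instead of running paraclique.
toy_cliques = [set(range(i, i + 5)) for i in range(50)]


def placeholder_algorithm(sample):
    return set().union(*sample)


timings = time_on_samples(toy_cliques, placeholder_algorithm, samplesize=10, samplenum=3)
print(len(timings), "timings recorded")
```

Looping over several (samplesize, samplenum, inputfile) combinations in this way would replace the manual parameter changes, at the cost of one longer run.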
- time\time_resluts_plot.ipynb: Visualizes the time tables
  - Figure 4 was generated by this code
  - After we got the time tables for the different samples, we moved the files into the time directory and plotted them with time_resluts_plot.ipynb
- CompareDifVersion.py: The same code as CompareDifVersion.ipynb
  - Because the computation takes a long time, what we actually did was run this Python script in the background
  - To run this code, set PYSPARK_PYTHON to /opt/anaconda3/bin/python in .bashrc
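As a possible alternative to editing .bashrc, the variable can also be set from Python before Spark starts, since PySpark reads PYSPARK_PYTHON from the environment when the context is created. This is a sketch of that alternative, not what the project did; the path is the one given above and may differ on other machines.

```python
import os

# Path taken from the note above; adjust it to the local Anaconda install.
os.environ["PYSPARK_PYTHON"] = "/opt/anaconda3/bin/python"

# A SparkContext or SparkSession created after this point in the same
# process sees the variable; one created earlier does not.
print(os.environ["PYSPARK_PYTHON"])
```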
- The 5 input files listed below are downloaded by gdown when the Jupyter notebook runs. If the download fails for any reason, the files can be downloaded manually from the links in fdownload:
  - large.clique: https://drive.google.com/file/d/1BKjc9we7qCoJ5lFAoY5DCYXxbZIrNG4A/view?usp=sharing
  - middle.clique: https://drive.google.com/file/d/1i24rvEufCwibeDh9RzqUEigZHxSv4KcG/view?usp=sharing
  - small.clique: https://drive.google.com/file/d/1phbG6V8Mx2Dk30rfAL2gI0xYCZIYhlHc/view?usp=sharing
  - C14.280.edgelist: https://drive.google.com/file/d/1p7wPv4CxtGJ9y8GQ7kXsNpM89xxwbpPE/view?usp=sharing
  - C14-280-cliques.clique: https://drive.google.com/file/d/1OZTfCtZpvNXvQjEpWjZjNhCm01eqbBSd/view?usp=sharing
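If the automatic download needs to be retried by hand, something along the following lines may work. It is a sketch, not the notebook's code: `drive_file_id`, `direct_url`, and `fetch` are hypothetical helpers that turn a share link like the ones above into the `uc?id=` form and pass it to `gdown.download`. `fetch` needs gdown installed and network access, so it is defined but not called here.

```python
import re


def drive_file_id(share_url):
    """Extract the file id from a .../file/d/<id>/view share link."""
    match = re.search(r"/file/d/([^/]+)", share_url)
    if match is None:
        raise ValueError(f"not a Drive share link: {share_url}")
    return match.group(1)


def direct_url(share_url):
    """Build the uc?id=<id> form of the link."""
    return "https://drive.google.com/uc?id=" + drive_file_id(share_url)


def fetch(share_url, output):
    """Download one file with gdown (imported lazily; requires network access)."""
    import gdown
    return gdown.download(direct_url(share_url), output, quiet=False)


# The small.clique link from the list above.
small_url = "https://drive.google.com/file/d/1phbG6V8Mx2Dk30rfAL2gI0xYCZIYhlHc/view?usp=sharing"
print(direct_url(small_url))
```

Calling `fetch(small_url, "small.clique")` would then save the file under the name the notebook expects.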
- C14.280.edgelist: The original graph
- C14-280-cliques.clique: The maximal cliques of the graph C14.280.edgelist
- small.clique: The cliques in C14-280-cliques.clique with fewer than 100 vertices
- middle.clique: The cliques in C14-280-cliques.clique with between 100 and 250 vertices
- large.clique: The cliques in C14-280-cliques.clique with more than 250 vertices
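The split into small, middle, and large can be sketched as a simple size check. The function name `size_bucket` is hypothetical, and treating exactly 100 and exactly 250 vertices as "middle" is an assumption, since the descriptions above do not say which side the boundaries fall on.

```python
def size_bucket(clique):
    """Return which of the three files a clique would belong to, by vertex count."""
    n = len(clique)
    if n < 100:
        return "small"
    if n <= 250:  # assumption: both boundary values count as "middle"
        return "middle"
    return "large"


print(size_bucket(range(42)), size_bucket(range(180)), size_bucket(range(300)))
```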
- time: Some output of our code.
  - For instance, the large800-10.time file contains the timing statistics of a test on the large clique set. The test includes 10 samples, each with 800 large cliques
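Going by that one example, the file names appear to follow the pattern clique set, sample size, a hyphen, then the number of samples. A small sketch for reading those values back from a name is shown below; `parse_time_name` is a hypothetical helper, and the pattern is inferred from large800-10.time only, so other files in the time directory may not match it.

```python
import re


def parse_time_name(filename):
    """Split a name like 'large800-10.time' into (clique set, sample size, sample count)."""
    match = re.fullmatch(r"([a-z]+)(\d+)-(\d+)\.time", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    clique_set, samplesize, samplenum = match.groups()
    return clique_set, int(samplesize), int(samplenum)


print(parse_time_name("large800-10.time"))  # ('large', 800, 10)
```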