tahoe01/Passionfruit

Passionfruit

A Hadoop-like cloud computing framework for distributed storage and big data processing using the MapReduce programming model.

Project Setup Requirements

Install the following tools first:

  1. Java 8 (OpenJDK)
  2. Apache Maven

Steps to build & run project

  1. Change to the root of the project directory.

  2. Start the server process by running: ./start.sh <node number> <deployment mode>

  • 1st argument: node number (range: [1, 10], matching the VM number).
  • 2nd argument: deployment mode. Use 1 to run the process in your local environment, or 2 to run it on our group's VMs.
  3. After the server process starts, you can test it with the following commands, which the command-line interface (CLI) supports:
  • maple [maple_exe] [num_maples] [sdfs_intermediate_filename_prefix] [sdfs_src_directory]

    • maple_exe: a user-specified executable that takes one file as input and outputs a series of (key, value) pairs. maple_exe is a file name local to the file system of the machine where the command is executed (alternatively, store it in SDFS).

    • num_maples: number of maple tasks.

    • sdfs_intermediate_filename_prefix: the prefix of the maple output files. For a key K, all (K, any_value) pairs output by any Maple task must be appended to the file sdfs_intermediate_filename_prefix_K.

    • sdfs_src_directory: a directory that specifies the location of the input files.
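As an illustration of what a maple_exe might do, here is a minimal word-count sketch. The framework's actual record format and calling convention are not specified here, so the `key,value` line format, the function names, and the idea of passing the input file path as the only argument are all assumptions made for this example.

```python
import re
import sys
from collections import defaultdict


def maple(lines):
    """Yield one (word, 1) pair per word found in the input lines."""
    for line in lines:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def intermediate_filename(prefix, key):
    """Name of the SDFS file that collects every pair emitted for `key`."""
    return f"{prefix}_{key}"


def group_by_file(pairs, prefix):
    """Group 'key,value' records (assumed format) by intermediate file name."""
    files = defaultdict(list)
    for key, value in pairs:
        files[intermediate_filename(prefix, key)].append(f"{key},{value}")
    return dict(files)


def run(path, out=sys.stdout):
    """Read one input file and print its (key, value) pairs, one per line."""
    with open(path, encoding="utf-8") as f:
        for key, value in maple(f):
            out.write(f"{key},{value}\n")
```

A script built on this sketch would call `run(sys.argv[1])`. `group_by_file` only illustrates the naming rule above: every pair for key K ends up in the file named prefix_K.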

  • juice <juice_exe> <num_juices> <sdfs_intermediate_filename_prefix> <sdfs_dest_filename> delete_input={0,1} shuffle_option={1,2}

    • juice_exe: a user-specified executable that takes multiple (key, value) input lines, processes groups of (key, any_values) input lines together (those sharing the same key, just like Reduce), and outputs (key, value) pairs. juice_exe is a file name local to the file system of the machine where the command is executed (we also store it in SDFS).

    • num_juices: the number of juice tasks (typically the same as the number of machines).

    • sdfs_intermediate_filename_prefix: the file name prefix that identifies which intermediate files to read and process.

    • sdfs_dest_filename: the name of the juice output file.

    • delete_input: 0 keeps the input files; 1 deletes them.

    • shuffle_option: 1 selects hash partitioning; 2 selects range partitioning.
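The sketch below illustrates two ideas from the juice arguments: a juice_exe that sums values per key, and the difference between hash and range partitioning of keys across num_juices tasks. It is not the framework's implementation. The `key,value` line format, the use of CRC32 as the hash, and the equal-sized sorted ranges are assumptions chosen for clarity.

```python
import zlib
from collections import defaultdict


def juice(lines):
    """Sum the values of all lines that share a key ('key,value' format assumed)."""
    totals = defaultdict(int)
    for line in lines:
        key, value = line.rstrip("\n").rsplit(",", 1)
        totals[key] += int(value)
    return sorted(totals.items())


def hash_partition(keys, num_juices):
    """Assign each key to task crc32(key) mod num_juices."""
    tasks = [[] for _ in range(num_juices)]
    for key in keys:
        tasks[zlib.crc32(key.encode("utf-8")) % num_juices].append(key)
    return tasks


def range_partition(keys, num_juices):
    """Sort the keys and cut them into num_juices contiguous, near-equal ranges."""
    ordered = sorted(keys)
    base, extra = divmod(len(ordered), num_juices)
    tasks, start = [], 0
    for i in range(num_juices):
        end = start + base + (1 if i < extra else 0)
        tasks.append(ordered[start:end])
        start = end
    return tasks
```

Hash partitioning spreads keys without regard to order. Range partitioning keeps neighbouring keys on the same task, which helps when the output should be sorted by key.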

  • put [localfilename] [sdfsfilename]

    Insert the local file localfilename into the SDFS file system as sdfsfilename, or update sdfsfilename if it already exists.

  • get [sdfsfilename] [localfilename]

    Fetch a file named sdfsfilename from the SDFS file system to the local directory, and name it localfilename.

  • delete [sdfsfilename]

    Delete the file named sdfsfilename from the SDFS file system.

  • ls [sdfsfilename]

    List all machine (VM) addresses where this file is currently being stored.

  • store

    List all files currently being stored at the current machine.

  • global

    List all files currently stored on all machines.

  • info

    Print basic information about the current running node: NodeId, Heartbeat Count, Unix Timestamp when node was created, Converted timestamp (CDT), Status.

  • ls

    Print information about all nodes in the membership list: NodeId, Heartbeat Count, Latest timestamp (when the last heartbeat was received), Converted timestamp (CDT), Status. If the current node is not in the group, the membership list is not printed. The introducer is always in the group automatically.
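To make the membership fields above concrete, here is a rough sketch of how a heartbeat-based membership list is commonly maintained. It is a generic illustration, not this project's code. The `Member` class, the status strings, the merge rule, and the timeout parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    node_id: str
    heartbeat: int           # heartbeat count reported by the node
    last_seen: float         # local time when the count last advanced
    status: str = "ACTIVE"   # hypothetical status name


def merge(local, remote, now):
    """Adopt any remote entry whose heartbeat count is newer than ours."""
    for node_id, entry in remote.items():
        known = local.get(node_id)
        if known is None or entry.heartbeat > known.heartbeat:
            local[node_id] = Member(node_id, entry.heartbeat, now, entry.status)


def mark_failures(local, now, timeout):
    """Mark active members whose heartbeat has not advanced within `timeout`."""
    failed = []
    for member in local.values():
        if member.status == "ACTIVE" and now - member.last_seen > timeout:
            member.status = "FAILED"
            failed.append(member.node_id)
    return failed
```

The timestamp stored on merge is the receiver's local time, so failure detection does not depend on synchronized clocks.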

  • join

    Join the current running node to the group. Optional argument: 1 (All-to-All), 2 (Gossip).
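The two modes differ mainly in who receives each heartbeat. The sketch below shows the general idea under stated assumptions: the function name, the `fanout` parameter, and the mapping of mode numbers to behaviour are illustrative and not taken from the project's source.

```python
import random


def heartbeat_targets(self_id, members, mode, fanout=2, rng=random):
    """Pick the peers that receive this round's heartbeat.

    mode 1 (All-to-All): every other member.
    mode 2 (Gossip): a random subset of at most `fanout` other members.
    """
    peers = [m for m in members if m != self_id]
    if mode == 1:
        return peers
    if mode == 2:
        return rng.sample(peers, min(fanout, len(peers)))
    raise ValueError(f"unknown mode: {mode}")
```

All-to-All sends a number of messages per round that grows with group size. Gossip sends a fixed number per node and spreads membership updates over several rounds.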

  • leave

    Let the node leave the group.

  • switch

    Switch the failure detection protocol.

  • stop

    Stop the SDFS server and membership server processes. You can also press Ctrl+C to kill the server process.

About

Distributed Storage & Big Data Processing Framework
