The Metanome project is a joint project between the Hasso-Plattner-Institut (HPI) and the Qatar Computing Research Institute (QCRI). Metanome provides a fresh view on data profiling by developing and integrating efficient algorithms into a common tool, expanding on the functionality of data profiling, and addressing performance and scalability issues for Big Data. A vision of the project appears in SIGMOD Record: "Data Profiling Revisited".
The Metanome tool is supplied under the Apache License. You can use and extend the tool to develop your own profiling algorithms. The profiling algorithms contained in our downloadable Metanome build are copyrighted by HPI. You are free to use and distribute them for research purposes.
Metanome is a Maven project, which can be built by executing:
mvn verify
Use the verify phase rather than an earlier one, because the GWTTests are executed in this phase of the build.
Metanome can be packaged together with a Jetty web server and profiling algorithms. To speed up builds, this package is not created in the default Maven profile. The deployment package can be created by executing the build with the deployment profile:
mvn verify -P deployment
or by executing the package phase on the deployment project directly (if Metanome has not been installed, its dependencies will be retrieved online):
mvn -f deployment/pom.xml package
Metanome releases can be found on the download page at:
https://hpi.de/naumann/sites/metanome/files
Documentation of the Metanome tool, as well as information for algorithm developers and contributors to the project, can be found in the GitHub wiki.
Javadocs for the project can be found at https://hpi.de/naumann/sites/metanome/javadoc.
The Metanome modules are continuously deployed to Sonatype and can be used by adding the following repository to your pom.xml:
<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
  </repository>
</repositories>
The project follows the Google style guide. Please make sure that all contributions adhere to the correct format. Formatting settings for common IDEs can be found at: http://code.google.com/p/google-styleguide/
All files should contain the Apache copyright header. The header can be found in the COPYRIGHT_HEADER file.
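One way to check the header rule before submitting a contribution is a small shell helper like the sketch below. It is not part of Metanome. The function name check_headers is hypothetical, and the sketch assumes that the contents of COPYRIGHT_HEADER must appear verbatim at the very top of every .java file, which may be stricter than what the project actually requires.

```shell
#!/bin/sh
# Sketch only: report .java files that do not begin with the header text.
# Assumptions (not from the Metanome docs): the header must appear verbatim
# at the very top of each file; check_headers is a hypothetical helper.
# Note: head -c is not POSIX, but GNU and BSD userlands provide it.

check_headers() {
  header_file="$1"  # path to the header file, e.g. COPYRIGHT_HEADER
  src_dir="$2"      # directory tree to scan

  # Header size in bytes; $(( )) strips the padding some wc versions print.
  header_bytes=$(( $(wc -c < "$header_file") ))

  # Paths containing newlines are not supported by this simple loop.
  find "$src_dir" -type f -name '*.java' | while IFS= read -r file; do
    # Compare the first header_bytes bytes of the file with the header.
    if ! head -c "$header_bytes" "$file" | cmp -s - "$header_file"; then
      echo "missing header: $file"
    fi
  done
}
```

After sourcing the function, it could be run from the repository root as `check_headers COPYRIGHT_HEADER .`; it prints one line per file that does not start with the header and prints nothing when every file passes.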