This repository was archived by the owner on Jan 10, 2025. It is now read-only.

Commit 3129d4d

Peter Kim committed
Merge pull request #7 from peterskim12/master
Add campaign finance demo
2 parents 0e10d85 + 78d5a29 commit 3129d4d

File tree

481 files changed

+115151
-0
lines changed


Diff for: usfec/README.md

+90
@@ -0,0 +1,90 @@
US FEC Campaign Contributions Demo: 2013-2014 US Election cycle
=====

For some background information for this demo, please see the blog post here:
[Kibana 4 for investigating PACs, Super PACs, and who your neighbor might be voting for](http://www.elasticsearch.org/blog/kibana-4-for-investigating-pacs-super-pacs-and-your-neighbors/)

# Installation

This demo consists of the following:

* Instructions for restoring index snapshot with pre-indexed campaign contributions data
* Python script for joining normalized files and outputting JSON
* Elasticsearch index template
* Logstash config


## Restoring index snapshot

After downloading and installing the ELK stack, download the index snapshot file for the campaign contributions data from the URL below. (FYI, it’s a 1.4GB file; we take no responsibility for this download eating up your monthly mobile tethering quota.)

http://download.elasticsearch.org/demos/usfec/snapshot_demo_usfec.tar.gz

Create a folder somewhere on your local drive called “snapshots” and uncompress the .tar.gz file into that directory. For example:
```
# Create snapshots directory
mkdir -p ~/elk/snapshots
# Copy snapshot download to your new snapshots directory
cp ~/Downloads/snapshot_demo_usfec.tar.gz ~/elk/snapshots
# Go to snapshots directory
cd ~/elk/snapshots
# Uncompress snapshot file
tar xf snapshot_demo_usfec.tar.gz
```
Once you have Elasticsearch running, restoring the index is a two-step process:

1) Register a file system repository for the snapshot (change the value of the “location” parameter below to the location of your usfec snapshot directory):
```
curl -XPUT 'http://localhost:9200/_snapshot/usfec' -d '{
    "type": "fs",
    "settings": {
        "location": "/tmp/snapshots/usfec",
        "compress": true,
        "max_snapshot_bytes_per_sec": "1000mb",
        "max_restore_bytes_per_sec": "1000mb"
    }
}'
```
2) Call the Restore API endpoint to start restoring the index data into your Elasticsearch instance:
```
curl -XPOST "localhost:9200/_snapshot/usfec/1/_restore"
```
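
If you would rather script the two steps above than paste curl commands, the following is a minimal Python 3 sketch that builds the same two requests using only the standard library. The function names (`build_register_request`, `build_restore_request`) and the `ES_URL` constant are illustrative and not part of this demo; the `Content-Type` header is an addition that newer Elasticsearch versions expect.

```python
import json
import urllib.request

# Assumption: Elasticsearch listens on the same host/port as in the curl examples.
ES_URL = "http://localhost:9200"


def build_register_request(location, repo="usfec"):
    """Build (but do not send) the PUT request that registers an fs repository."""
    body = {
        "type": "fs",
        "settings": {
            "location": location,
            "compress": True,
            "max_snapshot_bytes_per_sec": "1000mb",
            "max_restore_bytes_per_sec": "1000mb",
        },
    }
    return urllib.request.Request(
        f"{ES_URL}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_restore_request(repo="usfec", snapshot="1"):
    """Build (but do not send) the POST request that starts the restore."""
    return urllib.request.Request(
        f"{ES_URL}/_snapshot/{repo}/{snapshot}/_restore",
        data=b"",
        method="POST",
    )


register = build_register_request("/tmp/snapshots/usfec")
restore = build_restore_request()
print(register.get_method(), register.full_url)
print(restore.get_method(), restore.full_url)
```

To actually send a request against a running instance, pass it to `urllib.request.urlopen(...)`; nothing is sent by the sketch itself.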
At this point, go make yourself a [coffee](https://bluebottlecoffee.com/preparation-guides). When your delicious cup of single-origin, direct trade coffee has finished brewing, you can check to see if the restore operation is complete by calling the cat recovery API:
```
curl -XGET 'localhost:9200/_cat/recovery?v'
```
Or get a count of the documents in the expected indexes:
```
# Quote the URL so the shell does not try to expand the * wildcard
curl -XGET 'localhost:9200/usfec*/_count' -d '{
    "query": {
        "match_all": {}
    }
}'
```
which should return a count of approximately 4250251.
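
As a rough way to automate that check, here is a small Python 3 sketch that reads the `count` field from a `_count` response body and compares it with the figure quoted above. The function name, the 1% tolerance, and the sample response are assumptions made for illustration.

```python
import json

EXPECTED_COUNT = 4250251  # approximate figure quoted in this README


def restore_looks_complete(count_response, expected=EXPECTED_COUNT, tolerance=0.01):
    """Return True if the _count response is within `tolerance` of `expected`."""
    count = json.loads(count_response)["count"]
    return abs(count - expected) <= expected * tolerance


# Illustrative response body; a real one comes from the _count call above.
sample = '{"count": 4250251, "_shards": {"total": 5, "successful": 5, "failed": 0}}'
print(restore_looks_complete(sample))
```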

## Python script

The raw FEC data is provided in 7 files. To query the data usefully in a search engine / NoSQL store like Elasticsearch, you typically have to go through a data modeling process of identifying how to join data from the various tables.

The Python script (in scripts/process_camfin.py) takes care of some of the obvious ways to join the various data files and produces four .json files, which can then be loaded into Elasticsearch using Logstash. The script requires Python 3.

You don't need to run the Python script, but it's here in case you want to modify how the data is joined, perform additional data cleansing/enrichment, re-process the latest raw data set from the FEC, etc.
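
To give a feel for the kind of join described above, here is a minimal Python 3 sketch that denormalizes contribution rows by looking up committee details and emits one JSON document per line. It is not the logic of scripts/process_camfin.py: the two-file layout, the column order, the field names, and the sample rows are all hypothetical.

```python
import csv
import io
import json

# Hypothetical pipe-delimited inputs; the real FEC files have many more columns.
COMMITTEES = "C001|Example Committee A\nC002|Example Committee B\n"
CONTRIBUTIONS = "C001|SMITH, JANE|94105|250\nC002|DOE, JOHN|10001|500\n"


def join_contributions(committees_file, contributions_file):
    """Yield one denormalized dict per contribution, with committee details inlined."""
    # Build an in-memory lookup table keyed by committee ID.
    committees = {
        row[0]: row[1] for row in csv.reader(committees_file, delimiter="|")
    }
    for cmte_id, name, zip_code, amount in csv.reader(contributions_file, delimiter="|"):
        yield {
            "committee_id": cmte_id,
            "committee_name": committees.get(cmte_id),
            "contributor_name": name,
            "zip_code": zip_code,
            "amount": int(amount),
        }


docs = list(join_contributions(io.StringIO(COMMITTEES), io.StringIO(CONTRIBUTIONS)))
for doc in docs:
    print(json.dumps(doc))  # one JSON document per line
```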

## Elasticsearch index template config

The Elasticsearch mapping configuration is defined in the index template file: index\_template.json. Documentation:

* [Mapping documentation](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping.html)
* [Index templates documentation](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html)

## Logstash config

The Logstash configuration is defined in the file: logstash.conf. Documentation for Logstash plugins: [http://www.elasticsearch.org/guide/en/logstash/current/index.html](http://www.elasticsearch.org/guide/en/logstash/current/index.html).

## Miscellaneous

There are a few other files in this directory which probably deserve explanation:

* data/US.txt, data/zip_codes.csv: These are two zip code to lat/long mapping files which the Python script uses to enrich zip codes in the raw data with a lat/long that Elasticsearch can use for geo queries. If you run the Python script, make sure these two files are in the current working directory at the time of execution.
* Vagrant/Puppet files: The first demo released in this demo repo, the NYC traffic accidents demo, included these Vagrant/Puppet files to programmatically instantiate a virtual machine that installs the ELK stack and restores the index snapshot with a simple 'vagrant up' command. While you are still free to use these files, we chose not to recommend them for this demo because the index snapshot is large enough to cause problems if people's internet connections are slow, laptops don't have sufficient resources for running a larger VM, etc.
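
The zip code enrichment mentioned in the first bullet can be pictured roughly as follows. This is only a sketch: the `zip,lat,lon` column layout, the `location` field name, and the coordinates are made up for illustration, and the real data/zip_codes.csv and script may differ. The `{"lat": ..., "lon": ...}` object is one of the forms Elasticsearch accepts for a geo_point field.

```python
import csv
import io

# Hypothetical layout: zip,lat,lon. Coordinates are illustrative values only.
ZIP_CSV = "zip,lat,lon\n94105,37.7898,-122.3942\n10001,40.7506,-73.9972\n"


def load_zip_lookup(csv_file):
    """Map each zip code string to a {"lat": ..., "lon": ...} dict."""
    return {
        row["zip"]: {"lat": float(row["lat"]), "lon": float(row["lon"])}
        for row in csv.DictReader(csv_file)
    }


def enrich(doc, lookup):
    """Return a copy of doc with a "location" field when the zip code is known."""
    enriched = dict(doc)
    # Use only the first five digits so ZIP+4 values still match.
    coords = lookup.get(doc.get("zip_code", "")[:5])
    if coords:
        enriched["location"] = coords
    return enriched


lookup = load_zip_lookup(io.StringIO(ZIP_CSV))
print(enrich({"zip_code": "941051234", "amount": 250}, lookup))
```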

Diff for: usfec/Vagrantfile

+139
@@ -0,0 +1,139 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # All Vagrant configuration is done here. The most common configuration
  # options are documented and commented below. For a complete reference,
  # please see the online documentation at vagrantup.com.

  # Every Vagrant virtual environment requires a box to build off of.
  config.vm.box = "puppetlabs/ubuntu-14.04-64-puppet"

  config.vm.provision :shell, path: "bootstrap.sh"
  config.vm.provision :puppet do |puppet|
    puppet.manifests_path = "puppet/manifests"
    puppet.module_path = "puppet/modules"
    # puppet.options = "--verbose --debug"
  end

  config.vm.network "forwarded_port", guest: 9200, host: 9200
  config.vm.network "forwarded_port", guest: 5601, host: 5601

  config.vm.provider "virtualbox" do |vb|
    # Use VBoxManage to customize the VM. For example to change memory:
    vb.customize ["modifyvm", :id, "--memory", "2048"]
    vb.customize ["modifyvm", :id, "--ioapic", "on"]
    vb.cpus = 2
  end

  # Disable automatic box update checking. If you disable this, then
  # boxes will only be checked for updates when the user runs
  # `vagrant box outdated`. This is not recommended.
  # config.vm.box_check_update = false

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine. In the example below,
  # accessing "localhost:8080" will access port 80 on the guest machine.
  # config.vm.network "forwarded_port", guest: 80, host: 8080

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  # config.vm.network "private_network", ip: "192.168.33.10"

  # Create a public network, which generally matched to bridged network.
  # Bridged networks make the machine appear as another physical device on
  # your network.
  # config.vm.network "public_network"

  # If true, then any SSH connections made will enable agent forwarding.
  # Default value: false
  # config.ssh.forward_agent = true

  # Share an additional folder to the guest VM. The first argument is
  # the path on the host to the actual folder. The second argument is
  # the path on the guest to mount the folder. And the optional third
  # argument is a set of non-required options.
  # config.vm.synced_folder "../data", "/vagrant_data"

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant. These expose provider-specific options.
  # Example for VirtualBox:
  #
  # config.vm.provider "virtualbox" do |vb|
  #   # Don't boot with headless mode
  #   vb.gui = true
  #
  #   # Use VBoxManage to customize the VM. For example to change memory:
  #   vb.customize ["modifyvm", :id, "--memory", "1024"]
  # end
  #
  # View the documentation for the provider you're using for more
  # information on available options.

  # Enable provisioning with CFEngine. CFEngine Community packages are
  # automatically installed. For example, configure the host as a
  # policy server and optionally a policy file to run:
  #
  # config.vm.provision "cfengine" do |cf|
  #   cf.am_policy_hub = true
  #   # cf.run_file = "motd.cf"
  # end
  #
  # You can also configure and bootstrap a client to an existing
  # policy server:
  #
  # config.vm.provision "cfengine" do |cf|
  #   cf.policy_server_address = "10.0.2.15"
  # end

  # Enable provisioning with Puppet stand alone. Puppet manifests
  # are contained in a directory path relative to this Vagrantfile.
  # You will need to create the manifests directory and a manifest in
  # the file default.pp in the manifests_path directory.
  #
  # config.vm.provision "puppet" do |puppet|
  #   puppet.manifests_path = "manifests"
  #   puppet.manifest_file = "site.pp"
  # end

  # Enable provisioning with chef solo, specifying a cookbooks path, roles
  # path, and data_bags path (all relative to this Vagrantfile), and adding
  # some recipes and/or roles.
  #
  # config.vm.provision "chef_solo" do |chef|
  #   chef.cookbooks_path = "../my-recipes/cookbooks"
  #   chef.roles_path = "../my-recipes/roles"
  #   chef.data_bags_path = "../my-recipes/data_bags"
  #   chef.add_recipe "mysql"
  #   chef.add_role "web"
  #
  #   # You may also specify custom JSON attributes:
  #   chef.json = { :mysql_password => "foo" }
  # end

  # Enable provisioning with chef server, specifying the chef server URL,
  # and the path to the validation key (relative to this Vagrantfile).
  #
  # The Opscode Platform uses HTTPS. Substitute your organization for
  # ORGNAME in the URL and validation key.
  #
  # If you have your own Chef Server, use the appropriate URL, which may be
  # HTTP instead of HTTPS depending on your configuration. Also change the
  # validation key to validation.pem.
  #
  # config.vm.provision "chef_client" do |chef|
  #   chef.chef_server_url = "https://api.opscode.com/organizations/ORGNAME"
  #   chef.validation_key_path = "ORGNAME-validator.pem"
  # end
  #
  # If you're using the Opscode platform, your validator client is
  # ORGNAME-validator, replacing ORGNAME with your organization name.
  #
  # If you have your own Chef Server, the default validation client name is
  # chef-validator, unless you changed the configuration.
  #
  #   chef.validation_client_name = "ORGNAME-validator"
end

Diff for: usfec/bootstrap.sh

+3
@@ -0,0 +1,3 @@
#!/usr/bin/env bash

apt-get update
