Which one should you choose:
- individual inserts,
- a multi-row insert,
- a COPY statement?
The tests were performed with 100 simulated clients. Each client sends a request every 5 milliseconds, and every request posts 100 messages (100 items to insert).
For every test, I report the number of requests per second (RPS) the service is able to handle, along with the PostgreSQL CPU usage.
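To make the workload concrete, here is a minimal sketch of one request body. It assumes the service accepts JSON shaped like the `items` dictionary used in the insert code (`items["messages"]`, each entry carrying a `message` key). The helper name `build_payload` and the message text are hypothetical.

```python
import json


def build_payload(count=100):
    """Build one request body holding `count` messages (hypothetical helper)."""
    # Shape inferred from the insert code: items["messages"][i]["message"].
    return {"messages": [{"message": "message %d" % i} for i in range(count)]}


payload = build_payload()
body = json.dumps(payload)  # what a simulated client would send in one request
```

Each request built this way leads to 100 rows being inserted, whichever strategy is used.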
Every item is inserted by an individual INSERT statement.
for item in items["messages"]:
    cursor.execute(
        """
        INSERT INTO items (
            message
        ) VALUES (
            %(message)s
        );
        """,
        {
            "message": item["message"]
        }
    )
80 RPS, PostgreSQL CPU 25%
A multi-row INSERT statement is built and executed once.
# Build one statement with a named placeholder per row.
sql = "INSERT INTO items(message) VALUES "
params = {}
messages = items["messages"]
amount = len(messages)
for counter, item in enumerate(messages):
    key = "message_" + str(counter)
    sql += "(%(" + key + ")s)"
    if counter != amount - 1:
        sql += ","
    params[key] = item["message"]
sql += ";"
with self.db:
    with self.db.cursor(
        cursor_factory=psycopg2.extras.RealDictCursor
    ) as cursor:
        cursor.execute(
            sql,
            params
        )
235 RPS, PostgreSQL CPU 30%
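The statement-building step can be isolated into a pure function, which makes it easy to check without a database. This is only a sketch: `build_multirow_insert` is a hypothetical helper name, and it assumes the same `items(message)` table and the same named-placeholder style as the code above.

```python
def build_multirow_insert(messages):
    """Return (sql, params) for one multi-row INSERT (hypothetical helper)."""
    if not messages:
        # "VALUES ;" would not be valid SQL, so refuse an empty batch.
        raise ValueError("at least one message is required")
    placeholders = []
    params = {}
    for counter, item in enumerate(messages):
        key = "message_%d" % counter
        # One "(%(message_N)s)" group per row, filled in by the driver.
        placeholders.append("(%(" + key + ")s)")
        params[key] = item["message"]
    sql = "INSERT INTO items(message) VALUES " + ",".join(placeholders) + ";"
    return sql, params


sql, params = build_multirow_insert([{"message": "a"}, {"message": "b"}])
```

The returned pair can be passed to `cursor.execute(sql, params)` exactly as before. Joining a list of placeholders avoids the special case for the trailing comma.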
A data string is generated and copied into the table.
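# One message per line. COPY's text format treats tab, newline and
# backslash specially, so raw messages containing them need escaping.
values = ""
for item in items["messages"]:
    values += item["message"] + "\n"
with self.db:
    with self.db.cursor(
        cursor_factory=psycopg2.extras.RealDictCursor
    ) as cursor:
        cursor.copy_from(
            StringIO(values),
            "items",
            columns=('message',)
        )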
275 RPS, PostgreSQL CPU 30%
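The code above concatenates raw messages, which is fine for simple text. In COPY's text format, however, a tab is the default column delimiter, a newline ends the row, and a backslash starts an escape sequence. The following sketch shows one way to escape messages before building the buffer; the helper names `escape_copy_text` and `build_copy_buffer` are hypothetical.

```python
from io import StringIO


def escape_copy_text(value):
    """Escape one value for PostgreSQL's COPY text format."""
    # Backslash must be escaped first so later escapes are not doubled.
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_copy_buffer(messages):
    """Return a file-like object with one escaped message per line."""
    lines = [escape_copy_text(item["message"]) + "\n" for item in messages]
    return StringIO("".join(lines))


buffer = build_copy_buffer([{"message": "a\tb"}, {"message": "line1\nline2"}])
```

The resulting `buffer` can be passed to `cursor.copy_from(buffer, "items", columns=('message',))` in place of `StringIO(values)`.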
For massive inserts, use psycopg2's cursor.copy_from() method: it handled the most requests per second (275 RPS) in these tests.
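To put the three results side by side, the measured RPS can be converted to rows per second and to a speedup over individual inserts. This is plain arithmetic on the figures reported above, assuming every request inserts all of its 100 items.

```python
ITEMS_PER_REQUEST = 100  # every request posts 100 messages

# Requests per second measured for each strategy (figures from the tests).
measured_rps = {
    "individual inserts": 80,
    "multi-row insert": 235,
    "copy": 275,
}

baseline = measured_rps["individual inserts"]
summary = {
    name: {
        "rows_per_second": rps * ITEMS_PER_REQUEST,
        "speedup": round(rps / baseline, 2),
    }
    for name, rps in measured_rps.items()
}

for name, stats in summary.items():
    print("%-20s %6d rows/s  x%.2f" % (
        name, stats["rows_per_second"], stats["speedup"]))
```

In these tests, COPY inserts roughly 27,500 rows per second against 8,000 for individual inserts, while PostgreSQL CPU usage only rises from 25% to 30%.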
Start the container:
vagrant up
Connect to the container:
vagrant ssh
source /tmp/virtual_env35/bin/activate
Execute the tests:
py.test
Run the load test:
vagrant ssh
source /tmp/virtual_env35/bin/activate
locust --host="http://localhost:8080"