Database Benchmarking Criteria and Using HammerDB

Where the Problem Started

Database benchmarks are easy to misread when they become only a comparison of numbers. A high TPS value matters less than the schema and workload that produced it. This post records how I used HammerDB while thinking through which benchmark conditions make the result meaningful.

Choosing a database system has long-term consequences.

If you need to replace the database during operation after making the choice, migration is not easy, so you need to anticipate and detect problems early in development.

Things to think about during the selection process include the following.

Schema and record size
Number of clients
Query format and access patterns
Ratio of read and write queries
Range of variation in the variables above

After that, there is a process of building a test cluster and simulating workloads, which can be done with tools such as YCSB(serving benchmark) and hammerDB.

The following is the hammerDB configuration.

Create a hammerDB Dockerfile in the directory you want.

Implementation Path

cd <directory>
touch Dockerfile
vi Dockerfile

FROM  docker.io/tpcorg/hammerdb:oracle as oracle

FROM  docker.io/tpcorg/hammerdb:mssqls

# Install and configure IBM Db2 client libraries,
# You will need to pre-download IDB Db2 client libraries and place in the local folder
# RUN mkdir -p db2_cli_odbc_driver/odbc_cli
# ADD odbc_cli db2_cli_odbc_driver/odbc_cli/
# RUN apt update && \
#    echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections && \
#    apt -y install libxml2 && \
#    echo 'export DB2_CLI_DRIVER_INSTALL_PATH="/home/db2_cli_odbc_driver/odbc_cli/clidriver"' >> ~/.bashrc && \
#    echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/home/db2_cli_odbc_driver/odbc_cli/clidriver/lib"' >> ~/.bashrc && \
#    echo 'export LIBPATH="/home/db2_cli_odbc_driver/odbc_cli/clidriver/lib"' >> ~/.bashrc && \
#    echo 'export PATH="$PATH:/home/db2_cli_odbc_driver/odbc_cli/clidriver/bin"' >> ~/.bashrc && \
#    echo 'export PATH="$PATH:/home/db2_cli_odbc_driver/odbc_cli/clidriver/adm"' >>  ~/.bashrc

COPY --from=oracle /home/instantclient_21_5 /home/instantclient_21_5
ENV ORACLE_LIBRARY=/home/instantclient_21_5/libclntsh.so
RUN echo 'export LD_LIBRARY_PATH=/home/instantclient_21_5/:$LD_LIBRARY_PATH'  >> ~/.bashrc

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y libmariadb3 libpq-dev libmysqlclient21 && \
    rm -rf /var/lib/apt/lists/*

Build the image.

docker build -t hammerdb .

If you want to create a container,

docker run -it --name <container name> hammerdb bash

Database benchmarking starts from the HammerDB CLI. First, locate the CLI path.

docker exec -it hammerdb /bin/bash

Go in and find the path, then run

docker exec -it <container name> <path name>

and you will enter the cli.

Choose the database type for the benchmark.

ora mssqls db2 mysql pg maria

The options are oracle, MsSQL, IBM Db2, MySQL, PostgreSQL, and MariaDB.

For this run, I selected MySQL.

Choose the benchmark type according to the workload you want to model. HammerDB documents the available benchmark families here: https://www.hammerdb.com/benchmarks.html

For now, I will test with the TCP-C type.

hammerdb> dbset bm TPC-C

You need to check the dict(dictionary). This can differ depending on the hammerDB version, database type, and benchmark type, so the settings can vary. For example, in some situations you can set the host and other values with just the set keyword, while in others you may need to use diset to change the dictionary itself.

hammerdb>print dict
Dictionary Settings for MySQL
connection {
 mysql_host               = 127.0.0.1
 mysql_port               = 3306
 mysql_socket             = /tmp/mysql.sock
 mysql_ssl                = false
 mysql_ssl_two_way        = false
 mysql_ssl_linux_capath   = /etc/mysql/certs
 mysql_ssl_windows_capath = C:\mysql\certs
 mysql_ssl_ca             = ca-cert.pem
 mysql_ssl_cert           = client-cert.pem
 mysql_ssl_key            = client-key.pem
 mysql_ssl_cipher         = server
}
tpcc       {
 mysql_count_ware       = 1
 mysql_num_vu           = 1
 mysql_user             = root
 mysql_pass             = mysql
 mysql_dbase            = tpcc
 mysql_storage_engine   = innodb
 mysql_partition        = false
 mysql_prepared         = false
 mysql_total_iterations = 10000000
 mysql_raiseerror       = false
 mysql_keyandthink      = false
 mysql_driver           = timed
 mysql_rampup           = 2
 mysql_duration         = 5
 mysql_allwarehouse     = false
 mysql_timeprofile      = false
 mysql_async_scale      = false
 mysql_async_client     = 10
 mysql_async_verbose    = false
 mysql_async_delay      = 1000
 mysql_connect_pool     = false
}

This is the current dict structure. If you want to change the host, port, user, pass, and so on,

diset connection mysql_port <your port>
diset connection mysql_host <your host>
diset tpcc mysql_user <your user>
diset tpcc mysql_pass <your pass>

run these commands.

If you are trying to load an AWS RDS database, you need to open the IP in the Security Group.

Load the benchmark schema next.

vudestroy

You can think of this as a kind of clean step.

buildschema

This is the command for loading the schema. If the settings were completed correctly, the host and port will appear, and there will be some waiting time.

Vuser 1 created - WAIT IDLE
Vuser 1:RUNNING
Vuser 1:CREATING TPCC SCHEMA
Vuser 1:Ssl_cipher
Vuser 1:CREATING DATABASE tpcc
Vuser 1:CREATING TPCC TABLES
Vuser 1:Loading Item

Refer to https://stackoverflow.com/questions/73624954/how-long-it-takes-to-build-schema-or-my-hammerdb-not-working

Vuser 1:Orders Done
Vuser 1:End:Tue Oct 24 17:43:06 UTC 2023
Vuser 1:CREATING TPCC STORED PROCEDURES
Vuser 1:GATHERING SCHEMA STATISTICS
Vuser 1:TPCC SCHEMA COMPLETE
Vuser 1:FINISHED SUCCESS
ALL VIRTUAL USERS COMPLETE
Schema Build jobid=6537FC1D608703E253535343

After the load finishes, HammerDB shows the completed schema state.

From there, run the benchmark and inspect the performance metrics. On RDS, CloudWatch can provide additional database-level monitoring. Before executing the run, configure the virtual user count to match the scenario.

vurun

Benchmarking Takeaway

A benchmark never fully reproduces production. Still, when schema, client count, query mix, and read/write ratio are explicit, it becomes a comparison that can be reasoned about. The important part is not merely obtaining a number, but being able to explain the conditions behind it.