Development
The source code is available at https://github.com/azazellochg/em_health
Changing Dashboards
By default, the provisioned dashboards are read-only. If you set EMHEALTH_DEBUG=true in the docker/.env, you can modify and save changes via the Grafana UI. However, if you then update the provisioned dashboards (e.g. via pip install -U em_health), the changes made via UI will be lost. See details here. The workaround is the following:
Make changes to a dashboard via Grafana UI.
Save and export dashboard to JSON (DO NOT check Export the dashboard to use in another instance).
Overwrite existing dashboard file (they are in docker/grafana/provisioning/dashboards/) with the saved json file.
Any file changes in the provisioning folder are immediately picked up by Grafana. There’s no need to restart it.
There are a few limitations:
You cannot create nested folders for dashboards. Only single level depth is supported.
You should not rename dashboards or folders via GUI as this will conflict with provisioned files. Do it directly on the files if really needed.
Some provisioned resources (alerts, contact points, datasources) cannot be modified from the GUI. You can create new ones though.
DB performance metrics
After installation the DB performance monitoring is enabled by default. You can check the dashboards under DB performance folder.
Performance statistics is inspired by Pganalyze and includes:
database statistics (updated every 10 min)
tables statistics (updated every 10 min)
index statistics (updated every 10 min)
auto-VACUUM statistics (updated every 1 min)
query statistics (updated every 1 min)
CPU and RAM host statistics (updated every 1 min)
auto-EXPLAIN plans (for queries longer than 500ms)
Statistics retention time is 6 months.
SQL commands
Below are some frequently used commands for psql database client:
connect: psql -U postgres -h localhost -d tem
change db to sem: \c sem
list tables: \dt
list materialized views: dm
list table structure: \d data;
list table content: SELECT * FROM parameters;
disconnect: \q
For more examples refer to the command line cheetsheet
Using Grafana API
Grafana provides HTTP API that can be used once you create a service admin account
with an API token and save it to GRAFANA_API_TOKEN in the docker/.env. A simple Python client inside EMHealth can then access the API.
At the moment the client can only change the default organization preferences by running:
python em_health/grafana_client.py
Logs
All EMHealth application actions are saved into emhealth.log. PostgreSQL logs are in CSV format and can be accessed through:
docker exec -it timescaledb bash
cd /var/lib/postgresql/data/log
cat *.csv
Grafana logs are accessible via:
docker logs grafana
Database structure
We have two databases: tem and sem, both have the same structure at the moment. Each database has several schemas:
public - default schema for storing HM events data
schema_info - table to store the current schema version
instruments - global metadata for each microscope
enum_types - enumeration names for each instrument
enum_values - enumeration values for each enum
parameters - parameters metadata
enum_values_history - old/replaced enumeration values
parameters_history - old/replaced parameters
data - main events data table for all instruments
data_staging - staging table for bulk data inserts with COPY
uec - schema for storing UECs / Alarms. UEC codes are unified across different instruments
device_type
device_instance
error_code
subsystem
error_definitions
errors - main UEC data table for all instruments
fdw_ms_IID - foreign server schema for MSSQL with UECs (for each instrument ID)
error_definitions
error_notifications
fdw_pg_IID - foreign server schema for PostgreSQL with HM data (for each instrument ID)
event_property
event_property_type
event_type
parameter_type
instrument_event_config
pganalyze - schema to store database statistics
database_stats
table_stats
index_stats
vacuum_stats
stat_statements
stat_snapshots
queries
sys_stats
stat_explains
Measuring ingestion performance
These benchmarks compare different ingestion strategies for loading timeseries-like CSV data into TimescaleDB.
Workflow:
Generate test data with the desired number of rows.
Run one or more ingestion tests (COPY, EXECUTEMANY, UNNEST).
Compare performance metrics such as rows/s, query planning, and execution times.
Input dataset
The input is a simulated CSV file with N rows. Each row contains:
time (timestamp, millisecond precision)
instrument_id (integer)
param_id (integer)
value_num (float)
value_text (string, optional)
Data generation parameters:
30 days of data
10 instruments
500–1500 parameters per instrument
To generate 1,000,000 rows:
emhealth db test-data 1000000
Benchmarking COPY
The COPY test uses psycopg3 text-format COPY with a configurable chunk size. Each chunk is a Python string containing concatenated rows. This test allows tuning both chunk size and Postgres server settings.
Run with an 8 MB chunk size:
emhealth db test-copy 8388608
Benchmarking EXECUTEMANY
The EXECUTEMANY test uses cursor.executemany() in psycopg3. Internally this leverages libpq’s pipeline mode to run batched INSERT .. VALUES statements. We still commit transactions in batches.
Each run inserts batch_size × num_columns values.
Example with batch size 1000:
emhealth db test-execmany 1000
Benchmarking UNNEST
The UNNEST test uses cursor.execute() to run an INSERT .. UNNEST query. Instead of sending row-by-row inserts, this method sends arrays (one per column) and expands them into rows in PostgreSQL. This reduces query planning overhead compared to EXECUTEMANY.
Example with batch size 1000:
emhealth db test-unnest 1000
Example output
Each test is run 5 times. Results include raw wall times, throughput (rows/s), and query planning/execution stats from pg_stat_statements.
Example output (truncated):
Using insert_copy to insert 997,905 rows into data_staging table:
Batch size: 8000000
Raw run times: [0.8793832040391862, 0.903236785903573, 0.8884079209528863, 0.8673364277929068, 0.8408198338001966], rows/s: [1134778.3257815468, 1104809.9629840956, 1123250.903627322, 1150539.7075726986, 1186823.8115766563]
Avg time over 5 runs: 0.8758 s
Avg performance: 1,140,040.5423 rows/s
Calls per run: 1
Plan time per call: 0.0000 ms
Exec time per call: 865.8786 ms
Interpreting results
COPY is typically the fastest for bulk ingestion. Experiment with chunk sizes (e.g. 4 MB, 8 MB, 16 MB) to balance client/server memory usage.
EXECUTEMANY is slower but more flexible when UPSERTs are required.
UNNEST can outperform EXECUTEMANY for medium batch sizes, since fewer query plans are created.
Always run with different batch sizes (1,000, 5,000, 10,000) and average results across trials for reliable benchmarks.