Monitor Your SIEM Stack with InfluxDB
Monitoring system resource consumption, including CPU, memory, disk usage, and critical processes, is a fundamental aspect of maintaining the health, efficiency, and security of computing environments. This practice is crucial for several reasons:
1. Performance Optimization
Monitoring resources helps in identifying bottlenecks and inefficiencies within the system. By analyzing trends and patterns in resource usage, administrators can make informed decisions on upgrading hardware, optimizing applications, or redistributing workloads to improve overall performance.
2. Capacity Planning
Understanding the consumption patterns of resources over time aids in forecasting future needs. This proactive approach ensures that sufficient resources are available to meet demand without overprovisioning, which can be costly.
3. Reliability and Availability
High resource utilization can lead to system instability and downtime. Monitoring allows for the early detection of issues that could cause outages, enabling preventative measures to be taken to maintain service availability.
The Role of InfluxDB in Monitoring
InfluxDB, a time series database designed specifically for handling time-stamped data, is an excellent tool for monitoring system resources. It offers several features that make it well-suited for this purpose:
- High Performance: InfluxDB is optimized for fast, high-availability storage and retrieval of time series data, making it ideal for monitoring data that is generated in large volumes and at high velocity.
- Scalability: It can handle millions of data points per second, supporting the monitoring needs of both small systems and large enterprise environments.
- Flexible Query Language: InfluxQL, InfluxDB’s SQL-like query language, is accessible to those familiar with relational databases. InfluxDB 2.x, the version installed in this guide, uses Flux as its native query language and supports InfluxQL through its 1.x compatibility API. Either language can express complex queries for analyzing resource consumption patterns.
- Integrated Visualization and Analytics: With the built-in InfluxDB dashboards or tools like Grafana, users can visualize their data in real time and gain immediate insight into system performance.
- Interoperability: InfluxDB can easily integrate with a wide range of data collectors, such as Telegraf, and monitoring solutions, providing flexibility in how data is gathered and used for monitoring purposes.
Install Procedure
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt-get update && sudo apt-get install influxdb2
sudo systemctl enable influxdb
sudo systemctl start influxdb
sudo systemctl status influxdb
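Before moving on to the agent, it can help to confirm that the server answers on its HTTP port and to create the organization and bucket that Telegraf will write to. The following is a minimal sketch, not part of the original procedure. It assumes the server listens on localhost:8086, that the `/health` endpoint of InfluxDB 2.x returns JSON containing a `status` field, and that the `influx` CLI is installed (it may be packaged separately from the server). The `admin` username and the function names are hypothetical.

```shell
# Base URL of the InfluxDB server. localhost:8086 is an assumption (8086 is the default port).
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"

# Read a /health JSON response on stdin and print the value of its "status" field.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Ask the server for its health and print the reported status.
check_influx() {
  curl -fsS "${INFLUX_URL}/health" | health_status
}

# One-time initial setup through the influx CLI (assumed to be installed).
# The username is hypothetical; ORG and telegraf match the Telegraf output config.
# Usage: setup_influx 'a-long-password'
setup_influx() {
  influx setup --username admin --password "$1" --org ORG --bucket telegraf --force
}
```

Running `check_influx` once the service is up should print the status reported by the server. `setup_influx` only needs to be run once, and the API token it creates is the one Telegraf needs later.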
Install Telegraf Agent
Select a server you want to monitor and install the Telegraf agent on it. Telegraf collects system metrics and sends them to your InfluxDB server.
# influxdata-archive_compat.key GPG Fingerprint: 9D539D90D3328DC7D6C8D3B9D8FF8E1F7DF8B07E
wget -q https://repos.influxdata.com/influxdata-archive_compat.key
echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt-get update && sudo apt-get install telegraf
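Update /etc/telegraf/telegraf.conf, replacing *REPLACE* with your InfluxDB host, *PASS* with an API token, and ORG with your organization name: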
#### Telegraf Configuration - Linux Agents
#
# SOCFortress, LLP, info@socfortress.co
#
####
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
logtarget = "file"
logfile = "/var/log/telegraf/telegraf.log"
logfile_rotation_interval = "1d"
logfile_rotation_max_archives = 5
hostname = ""
omit_hostname = false
[[outputs.influxdb_v2]]
urls = ["http://*REPLACE*:8086"]
token = "*PASS*"
organization = "ORG"
bucket = "telegraf"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.net]]
[[inputs.procstat]]
pattern = ".*"
[[inputs.systemd_units]]
## Set timeout for systemctl execution
timeout = "1s"
# Filter for a specific unit type, default is "service", other possible
# values are "socket", "target", "device", "mount", "automount", "swap",
# "timer", "path", "slice" and "scope":
unittype = "service"
# Filter for a specific pattern, default is "" (i.e. all), other possible
# values are valid pattern for systemctl, e.g. "a*" for all units with
# names starting with "a"
pattern = ""
sudo systemctl enable telegraf
sudo systemctl restart telegraf
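A quick way to catch configuration mistakes is to run Telegraf once in test mode, which gathers metrics from the configured inputs and prints them to stdout without sending them to InfluxDB. The sketch below is illustrative only. It assumes the package default path /etc/telegraf/telegraf.conf, and the helper names are hypothetical. It first checks that the `*REPLACE*` and `*PASS*` placeholders from the configuration above have been filled in.

```shell
# Succeed if the config file still contains the *REPLACE* or *PASS* placeholders.
has_placeholders() {
  grep -qF -e '*REPLACE*' -e '*PASS*' "$1"
}

# Refuse to continue while placeholders remain, then run Telegraf once in test mode.
# The default path is an assumption based on the Debian package layout.
verify_telegraf() {
  conf="${1:-/etc/telegraf/telegraf.conf}"
  if has_placeholders "$conf"; then
    echo "placeholders still present in $conf" >&2
    return 1
  fi
  telegraf --config "$conf" --test
}
```

If `verify_telegraf` prints lines for measurements such as `cpu`, `mem`, and `disk`, the input side of the configuration is working. Delivery to InfluxDB still has to be confirmed separately, for example in the Telegraf log file configured above.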
Configure Alert Monitoring
Create the following checks in InfluxDB. The text under each check is its status message template.
1. CPU Check
${ r._level }: ${ r._check_name }
Host: ${r.host}
CPU Usage(%): ${100.0 - r.usage_idle}
2. Memory Check
${ r._level }: ${ r._check_name }
Host: ${r.host}
Percent Usage(%): ${r.used_percent}
3. Disk Space Check
${ r._level }: ${ r._check_name }
Host: ${r.host}
Percent Usage(%): ${r.used_percent} on device: ${r.device}
4. Critical Services Check
Critical Systemd Service ${r.name} is down. Host ${r.host}
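The CPU check message above reports usage as 100 minus the idle percentage collected by the Telegraf `cpu` input. To sanity-check that value outside the alerting UI, the latest idle reading can be pulled through the InfluxDB v2 query API. This is a rough sketch rather than the checks' actual query. It assumes `INFLUX_URL`, `INFLUX_ORG`, and `INFLUX_TOKEN` are environment variables you set yourself, that the bucket is named `telegraf` as in the configuration above, and that total CPU is tagged `cpu-total`. The function names are hypothetical.

```shell
# Connection settings. The defaults are placeholders, not values from a real deployment.
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"
INFLUX_ORG="${INFLUX_ORG:-ORG}"

# Convert an idle percentage into a usage percentage (100 - idle),
# mirroring the expression in the CPU check message.
cpu_usage_from_idle() {
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle }'
}

# Print a Flux query selecting the most recent total-CPU idle value for one host.
latest_idle_query() {
  cat <<EOF
from(bucket: "telegraf")
  |> range(start: -5m)
  |> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
  |> filter(fn: (r) => r.cpu == "cpu-total" and r.host == "$1")
  |> last()
EOF
}

# Send the query to the v2 query API and print the raw CSV response.
fetch_latest_idle() {
  latest_idle_query "$1" | curl -fsS -X POST \
    "${INFLUX_URL}/api/v2/query?org=${INFLUX_ORG}" \
    -H "Authorization: Token ${INFLUX_TOKEN}" \
    -H "Content-Type: application/vnd.flux" \
    -H "Accept: application/csv" \
    --data-binary @-
}
```

For example, an idle reading of 87.5 passed to `cpu_usage_from_idle` corresponds to the 12.5% usage figure that would appear in the alert message.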