Part 2. Graylog Install — Log Ingestion
Normalize and enrich Security logs with Graylog!
Graylog Documentation: https://docs.graylog.org/docs
PART ONE: Backend Storage
PART THREE: Log Analysis
Intro
With our backend storage in place, we now need a tool responsible for shipping our logs into the Wazuh-Indexer. Graylog is the perfect tool for this. Graylog will receive logs from our Wazuh-Manager, network devices, or any service that has a syslog forwarding option (many 3rd parties are beginning to offer this, but I always recommend pulling data via API when possible :) ).
Why Graylog?
With multiple log forwarders currently available, such as Filebeat, Logstash, NiFi, and Fluent Bit, why should I pick Graylog?
We first need a tool that can read from and write to our Wazuh-Indexer backend (a fork of OpenSearch 1.3). Graylog currently supports OpenSearch up to 1.3 (2.x is not supported yet) and Elasticsearch 7.10.x or below. THIS MUST BE FOLLOWED: LATER VERSIONS OF ELASTICSEARCH, OR OPENSEARCH 2.x, WILL BREAK GRAYLOG. You can confirm the backend version before installing anything, as shown below.
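A quick sanity check against the indexer's root endpoint (the hostname and credentials are placeholders for your environment):

# Confirm the backend version before installing Graylog.
curl -k -u admin:yourpassword https://wazuh-indexer:9200
# The "version" block in the response should report an OpenSearch 1.x
# distribution (or Elasticsearch 7.10.x or below).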
Multiple Inputs Supported
Perhaps we want to ingest AWS logs, firewall logs, and endpoint logs. Within Graylog we can instruct our node to accept data via inputs. An input opens a port that Graylog listens on to receive our logs. Even better, inputs of various types are supported (a quick way to test one is shown after this list), such as:
- AWS CloudTrail
- Beats
- GELF
- Syslog
- Raw/Plaintext
- And MUCH MORE
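Once an input is running, you can verify it end to end from any Linux host. A minimal sketch, assuming a Syslog UDP input listening on port 1514 (a common convention, since ports below 1024 require root; the hostname is a placeholder):

# Send a test syslog message over UDP with util-linux logger.
logger -d -n graylog.example.com -P 1514 "Test message from my first input"

The message should then show up under that input's received messages in the WebUI within a few seconds.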
Index Management
Indexes are the building blocks of how Elasticsearch/OpenSearch store data. Poor performance and disk capacity problems can occur if you do not enforce healthy index retention periods. For example, one index may hold data from 6 months ago. If I never delete that index, the data still resides on disk, which can impact the ability to write current and future logs.
Thus, we need a mechanism in place to roll off old data that we no longer care about storing to make room for new data. Graylog allows us to do exactly that!
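To see what is sitting on disk right now, the indexer's _cat API makes retention problems easy to spot (hostname and credentials are placeholders):

# List all indices sorted by size on disk, largest first.
curl -k -u admin:yourpassword "https://wazuh-indexer:9200/_cat/indices?v&h=index,docs.count,store.size&s=store.size:desc"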
Log Normalization
Normalizing our data is a must. We need to ensure common data fields that we receive from our logs (no matter the source) are universally mapped so we can create dashboard and alerting standards that apply to all log types. Your future self will love you!
For example, our firewall writes the source IP that triggered a connection to source_ip_ipv4, while the Sysmon events we collect from our endpoints store the source IP in the data_win_eventdata_sourceIp field. Because these fields contain the same metadata (a source IP), it is beneficial to write these values to a standard field such as src_ip. Now our SOC team can search for a source IP address with one query rather than two. Dashboards and alerts also become faster to configure, because we build them once against the standard field instead of recreating them for every new log source we ingest.
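To make this concrete, here is a minimal sketch of a Graylog pipeline rule that performs this mapping (field names taken from the example above; we will dig into pipelines in a later post):

rule "normalize sysmon source ip"
when
  has_field("data_win_eventdata_sourceIp")
then
  // Copy the Sysmon-specific field into the standardized src_ip field.
  set_field("src_ip", to_string($message.data_win_eventdata_sourceIp));
end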
API Enrichment
Enriching our ingested logs with threat feed analysis is a crucial piece of any SIEM stack. We can use Graylog to interact with MISP, VirusTotal, etc. to detect any known malicious IPs, domains, file hashes, and more that appear within our collected logs.
This enrichment happens in real time and occurs before the log is stored within the Wazuh-Indexer!
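As a sketch of what this can look like, here is a pipeline rule that checks our normalized src_ip field against a Graylog lookup table (the table name misp-ip-lookup is illustrative; you would first configure a lookup table backed by your MISP instance):

rule "threat intel lookup on src_ip"
when
  has_field("src_ip")
then
  // lookup_value returns a result only for IPs known to the threat feed.
  let intel = lookup_value("misp-ip-lookup", to_string($message.src_ip));
  set_field("threat_intel_match", intel);
end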
No Data Loss
Unfortunately, backend storage failures can happen. Thankfully, Graylog can detect when the backend is in an unhealthy state and write logs to disk until the backend is back online. We now have time to bring the backend back to a healthy state without fear of losing logs!
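This buffering is handled by Graylog's on-disk journal, which is enabled by default. The relevant server.conf settings look like this (the size shown is illustrative; tune it to the length of outage you want to survive):

# /etc/graylog/server/server.conf
message_journal_enabled = true
message_journal_max_size = 5gb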
Install
Let’s now install Graylog onto our Debian 11 machine!
1. PREREQUISITES
sudo apt update && sudo apt upgrade
sudo apt install apt-transport-https openjdk-11-jre-headless uuid-runtime pwgen dirmngr gnupg wget
2. MONGODB — Graylog uses MongoDB to store your configuration data, not your log data. Only metadata is stored, such as user information or stream configurations. This can be installed on a dedicated server or on the same host as Graylog.
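A sketch of a local install, following MongoDB's official instructions for Debian 11 (bullseye) at the time of writing (check the Graylog documentation for the MongoDB version your Graylog release supports):

# Add the MongoDB repository and signing key, then install and start mongod.
wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
echo "deb http://repo.mongodb.org/apt/debian bullseye/mongodb-org/5.0 main" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
sudo apt update
sudo apt install -y mongodb-org
sudo systemctl daemon-reload
sudo systemctl enable --now mongod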
3. GRAYLOG — Now install the Graylog repository configuration and Graylog itself with the following commands:
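(Graylog 4.3 is shown here, as it is the first release line with OpenSearch 1.x support; substitute the current release from the Graylog downloads page.)

wget https://packages.graylog2.org/repo/packages/graylog-4.3-repository_latest.deb
sudo dpkg -i graylog-4.3-repository_latest.deb
sudo apt update
sudo apt install -y graylog-server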
Add RootCA to Keystore if using HTTPS for Wazuh-Indexer
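If the Wazuh-Indexer is serving HTTPS with a self-signed certificate, Java must trust its root CA. A sketch (the JVM path matches Debian's openjdk-11 package; adjust the root CA path to wherever your certificate lives):

sudo mkdir -p /etc/graylog/server/certs
# Start from the JVM's default trust store.
sudo cp /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts /etc/graylog/server/certs/cacerts
# Import the Wazuh-Indexer root CA (the default keystore password is "changeit").
sudo keytool -importcert -keystore /etc/graylog/server/certs/cacerts -storepass changeit -alias wazuh_root_ca -file /path/to/root-ca.pem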
Change Default Java Options
Add our cacerts keystore that we copied over to our default Graylog Java options.
sudo nano /etc/default/graylog-server
Add the following line:
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Dlog4j2.formatMsgNoLookups=true -Djavax.net.ssl.trustStore=/etc/graylog/server/certs/cacerts -Djavax.net.ssl.trustStorePassword=changeit"
You can also set your Java heap in this file.
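For example, to give Graylog a fixed 2 GB heap (a common rule of thumb is up to half the machine's RAM, leaving the rest for the OS and the journal), adjust the -Xms/-Xmx flags. Appending works because the JVM honors the last occurrence of a flag:

GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Xms2g -Xmx2g"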
Edit the Configuration File
Read the instructions within the configuration file, located at /etc/graylog/server/server.conf, and edit as needed.
Additionally, add password_secret and root_password_sha2, as these are mandatory and Graylog will not start without them.
To create your password_secret, run the following command:
pwgen -N 1 -s 96
To create your root_password_sha2, run the following command:
echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1
Configure the Connection to your Wazuh-Indexer:
elasticsearch_hosts = https://user:pass@wazuh-indexerhostname:9200
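With the configuration in place, reload systemd and start Graylog:

sudo systemctl daemon-reload
sudo systemctl enable --now graylog-server
# Watch the startup log; a clean start ends with "Graylog server up and running."
sudo tail -f /var/log/graylog-server/server.log

If the WebUI is not reachable from other machines, check http_bind_address in server.conf; it defaults to 127.0.0.1:9000.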
Graylog’s WebUI should now be listening for connections on port 9000. Go try it out!
Next Steps
With Graylog now installed, we still need to configure our Inputs, Indices, Streams, and Pipelines. More to come on that in later posts!
Conclusion
In my opinion, Graylog is the best log ingestion, parsing, and enrichment tool within the open-source community. With so many built-in features, we are able to put some intelligence behind the logs we ingest before they are permanently written to our backend storage, and that is a big building block in ensuring a successful SIEM!
Need Help?
The functionality discussed in this post, and so much more, is available via SOCFortress's Professional Services. Let SOCFortress help you and your team keep your infrastructure secure.
Website: https://www.socfortress.co/
Professional Services: https://www.socfortress.co/ps.html