Part 2. Graylog Install — Log Ingestion
Normalize and enrich Security logs with Graylog!
Graylog Documentation: https://docs.graylog.org/docs
PART ONE: Backend Storage
PART THREE: Log Analysis
Intro
With our backend storage in place, we now need a tool responsible for shipping our logs into the Wazuh-Indexer. Graylog is the perfect tool for this. Graylog will receive logs from our Wazuh-Manager, network devices, or any service that has a syslog forwarding option (many 3rd parties are beginning to offer this, but I always recommend pulling data via API when possible :) ).
Why Graylog?
With multiple log forwarders currently available, such as Filebeat, Logstash, NiFi, and Fluent Bit, why should I pick Graylog?
We first need a tool that can read from and write to our Wazuh-Indexer backend (a fork of OpenSearch 1.3). Graylog currently supports OpenSearch up to 1.3 (2.x is not supported yet) and Elasticsearch 7.10.x or below. THIS MUST BE FOLLOWED: LATER VERSIONS OF ELASTICSEARCH, OR OPENSEARCH 2.x, WILL BREAK GRAYLOG. You can confirm the backend version before installing anything, as shown below.
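A quick sanity check against the indexer's root endpoint (the hostname and credentials are placeholders for your environment):

# Confirm the backend version before installing Graylog.
curl -k -u admin:yourpassword https://wazuh-indexer:9200
# The "version" block in the response should report an OpenSearch 1.x
# distribution (or Elasticsearch 7.10.x or below).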
Multiple Inputs Supported
Perhaps we want to ingest AWS logs, firewall logs, and endpoint logs. Within Graylog we can instruct our node to accept data via inputs. An input opens a port that Graylog listens on to receive our logs. Even better, inputs of various types are supported (a quick way to test one is shown after this list), such as:
- AWS CloudTrail
- Beats
- GELF
- Syslog
- Raw/Plaintext
- And MUCH MORE
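Once an input is running, you can verify it end to end from any Linux host. A minimal sketch, assuming a Syslog UDP input listening on port 1514 (a common convention, since ports below 1024 require root; the hostname is a placeholder):

# Send a test syslog message over UDP with util-linux logger.
logger -d -n graylog.example.com -P 1514 "Test message from my first input"

The message should then show up under that input's received messages in the WebUI within a few seconds.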
Index Management
Indexes are the building blocks of how Elasticsearch/OpenSearch store data. Poor performance and disk capacity problems can occur if you do not enforce healthy index retention periods. For example, one index may hold data from 6 months ago. If I never delete that index, the data still resides on disk, which can impact the ability to write current and future logs.
Thus, we need a mechanism in place to roll off old data that we no longer care about storing to make room for new data. Graylog allows us to do exactly that!
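To see what is sitting on disk right now, the indexer's _cat API makes retention problems easy to spot (hostname and credentials are placeholders):

# List all indices sorted by size on disk, largest first.
curl -k -u admin:yourpassword "https://wazuh-indexer:9200/_cat/indices?v&h=index,docs.count,store.size&s=store.size:desc"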
Log Normalization
Normalizing our data is a must. We need to ensure common data fields that we receive from our logs (no matter the source) are universally mapped so we can create dashboard and alerting standards that apply to all log types. Your future self will love you!
For example, our firewall writes the source IP that triggered a connection to source_ip_ipv4, while the Sysmon events we collect from our endpoints store the source IP in the data_win_eventdata_sourceIp field. Because these fields contain the same metadata (a source IP), it is beneficial to write these values to a standard field such as src_ip. Now our SOC team can search for a source IP address with one query rather than two. Dashboards and alerts also become faster to configure, because we build them once against the standard field instead of recreating them for every new log source we ingest.
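To make this concrete, here is a minimal sketch of a Graylog pipeline rule that performs this mapping (field names taken from the example above; we will dig into pipelines in a later post):

rule "normalize sysmon source ip"
when
  has_field("data_win_eventdata_sourceIp")
then
  // Copy the Sysmon-specific field into the standardized src_ip field.
  set_field("src_ip", to_string($message.data_win_eventdata_sourceIp));
end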
API Enrichment
Enriching our ingested logs with threat feed analysis is a crucial piece of any SIEM stack. We can use Graylog to interact with MISP, VirusTotal, etc. to detect any known malicious IPs, domains, file hashes, and more that appear within our collected logs.
This enrichment happens in real time and occurs before the log is stored within the Wazuh-Indexer!
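As a sketch of what this can look like, here is a pipeline rule that checks our normalized src_ip field against a Graylog lookup table (the table name misp-ip-lookup is illustrative; you would first configure a lookup table backed by your MISP instance):

rule "threat intel lookup on src_ip"
when
  has_field("src_ip")
then
  // lookup_value returns a result only for IPs known to the threat feed.
  let intel = lookup_value("misp-ip-lookup", to_string($message.src_ip));
  set_field("threat_intel_match", intel);
end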
No Data Loss
Unfortunately, backend storage failures can happen. Thankfully, Graylog can detect when the backend is in an unhealthy state and write logs to disk until the backend is back online. We now have time to bring the backend back to a healthy state without fear of losing logs!
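This buffering is handled by Graylog's on-disk journal, which is enabled by default. The relevant server.conf settings look like this (the size shown is illustrative; tune it to the length of outage you want to survive):

# /etc/graylog/server/server.conf
message_journal_enabled = true
message_journal_max_size = 5gb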
Install
Let’s now install Graylog onto our Debian 11 machine!
1. PREREQUISITES
sudo apt update && sudo apt upgrade
sudo apt install apt-transport-https openjdk-11-jre-headless uuid-runtime pwgen dirmngr gnupg wget
2. MONGODB — Graylog uses MongoDB to store your configuration data, not your log data. Only metadata is stored, such as user information or stream configurations. This can be installed on a dedicated server or on the same host as Graylog.
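A sketch of a local install, following MongoDB's official instructions for Debian 11 (bullseye) at the time of writing (check the Graylog documentation for the MongoDB version your Graylog release supports):

# Add the MongoDB repository and signing key, then install and start mongod.
wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -
echo "deb http://repo.mongodb.org/apt/debian bullseye/mongodb-org/5.0 main" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list
sudo apt update
sudo apt install -y mongodb-org
sudo systemctl daemon-reload
sudo systemctl enable --now mongod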
3. GRAYLOG — Now install the Graylog repository configuration and Graylog itself with the following commands:
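(Graylog 4.3 is shown here, as it is the first release line with OpenSearch 1.x support; substitute the current release from the Graylog downloads page.)

wget https://packages.graylog2.org/repo/packages/graylog-4.3-repository_latest.deb
sudo dpkg -i graylog-4.3-repository_latest.deb
sudo apt update
sudo apt install -y graylog-server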
Add RootCA to Keystore if using HTTPS for Wazuh-Indexer
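If the Wazuh-Indexer is serving HTTPS with a self-signed certificate, Java must trust its root CA. A sketch (the JVM path matches Debian's openjdk-11 package; adjust the root CA path to wherever your certificate lives):

sudo mkdir -p /etc/graylog/server/certs
# Start from the JVM's default trust store.
sudo cp /usr/lib/jvm/java-11-openjdk-amd64/lib/security/cacerts /etc/graylog/server/certs/cacerts
# Import the Wazuh-Indexer root CA (the default keystore password is "changeit").
sudo keytool -importcert -keystore /etc/graylog/server/certs/cacerts -storepass changeit -alias wazuh_root_ca -file /path/to/root-ca.pem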
Change Default Java Options
Add our cacerts keystore that we copied over to our default Graylog Java options.
sudo nano /etc/default/graylog-server
Add the following line:
GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Dlog4j2.formatMsgNoLookups=true -Djavax.net.ssl.trustStore=/etc/graylog/server/certs/cacerts -Djavax.net.ssl.trustStorePassword=changeit"
You can also set your Java heap in this file.
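For example, to give Graylog a fixed 2 GB heap (a common rule of thumb is up to half the machine's RAM, leaving the rest for the OS and the journal), adjust the -Xms/-Xmx flags. Appending works because the JVM honors the last occurrence of a flag:

GRAYLOG_SERVER_JAVA_OPTS="$GRAYLOG_SERVER_JAVA_OPTS -Xms2g -Xmx2g"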
Edit the Configuration File
Read the instructions within the configuration file, located at /etc/graylog/server/server.conf, and edit as needed.
Additionally, add password_secret and root_password_sha2, as these are mandatory and Graylog will not start without them.
To create your password_secret, run the following command:
pwgen -N 1 -s 96
To create your root_password_sha2, run the following command:
echo -n "Enter Password: " && head -1 </dev/stdin | tr -d '\n' | sha256sum | cut -d" " -f1
Configure the Connection to your Wazuh-Indexer:
elasticsearch_hosts = https://user:pass@wazuh-indexerhostname:9200
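With the configuration in place, reload systemd and start Graylog:

sudo systemctl daemon-reload
sudo systemctl enable --now graylog-server
# Watch the startup log; a clean start ends with "Graylog server up and running."
sudo tail -f /var/log/graylog-server/server.log

If the WebUI is not reachable from other machines, check http_bind_address in server.conf; it defaults to 127.0.0.1:9000.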
Graylog’s WebUI should now be listening for connections on port 9000. Go try it out!
Next Steps
With Graylog now installed, we still need to configure our Inputs, Indices, Streams, and Pipelines. More to come on that in later posts!
Conclusion
In my opinion, Graylog is the best log ingestion, parsing, and enrichment tool within the open-source community. With so many built-in features, we are able to put some intelligence behind the logs we ingest before they are permanently written to our backend storage, and that is a big building block in ensuring a successful SIEM!
Need Help?
The functionality discussed in this post, and so much more, is available via SOCFortress's Professional Services. Let SOCFortress help you and your team keep your infrastructure secure.
Website: https://www.socfortress.co/
Professional Services: https://www.socfortress.co/ps.html