Deploying OSSEC at scale

As part of a general effort to improve the security of our infrastructure, all of our AWS instances now have OSSEC installed on them to provide host-based intrusion detection - this includes instances in our autoscaling groups. In this post I'll explain how this is done.

Process overview

We use Ansible to both provision and configure each AWS instance in our VPCs.

After provisioning, a base Ansible playbook is run for each instance and configures things like authorized SSH keys, default software packages, and so on. When creating images that instances in our autoscaling groups can use, all that is required is to run the base Ansible playbook plus the specific playbook that specializes an instance. An AMI is then created from this.

An OSSEC playbook include was added to the base playbook so that every instance we spin up has OSSEC installed on it. That playbook installs the "local" OSSEC flavour, which means that each AWS instance monitors itself and sends alerts when appropriate. This differs from the traditional server/agent configuration, where the server receives information from each agent and decides whether or not to alert.

By using the "local" OSSEC flavour, instances brought up in an autoscaling group do not have to discover where the OSSEC central server is - the local OSSEC starts when the instance is booted, and starts monitoring right away. Likewise, when an instance in an autoscaling group is terminated, it does not have to unregister from the OSSEC server.

Ansible specifics

In our OSSEC playbook, a frozen checkout of OSSEC is pulled down from S3 onto the instance and installed by moving a static copy of preloaded-vars.conf into etc/ and running ./install.sh. This preloaded-vars.conf is fairly barebones, with the most important part being that local is set for USER_INSTALL_TYPE.
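
To make that concrete, here is a rough sketch of what a bare-bones preloaded-vars.conf could look like. The variable names come from the preloaded-vars.conf.example file that ships with OSSEC. Every value other than USER_INSTALL_TYPE is an assumption chosen for illustration, not a copy of our actual file.

```shell
# Sketch of a minimal preloaded-vars.conf (values are assumptions, not the post's actual file).
# install.sh sources this file as shell, so plain KEY="value" assignments are all it needs.
USER_LANGUAGE="en"                 # skip the language prompt
USER_NO_STOP="y"                   # do not pause for confirmation
USER_INSTALL_TYPE="local"          # the setting the post calls out
USER_DIR="/var/ossec"              # install prefix; matches the ossec-control path used at boot
USER_ENABLE_EMAIL="n"              # assumption: alerts leave via an integration, not email
USER_ENABLE_SYSCHECK="y"           # file integrity monitoring (source of rule 550)
USER_ENABLE_ROOTCHECK="y"          # anomaly detection (source of rule 510)
USER_ENABLE_ACTIVE_RESPONSE="n"    # assumption: detection only, no automatic blocking
```

With a file like this in etc/, ./install.sh should run without prompting, which is what lets Ansible drive it.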

An alert_rules_list variable holds every rule_id that should trigger an alert. Here is a subset of this list:

  • 510: Host-based anomaly detection event (rootcheck).
  • 533: Listened ports status (netstat) changed (new port opened or closed).
  • 550: Integrity checksum changed.
  • 5720: Multiple SSHD authentication failures.

I don't recommend alerting on 5710 (attempt to log in using a non-existent user) if your instance is accessible from the Internet...

An ossec.conf.j2 template then uses this rules list to fill in what should happen on each rule. For example, with the development version of OSSEC that supports Slack integration, you could use the following template loop to post alerts to a Slack channel:

{% for rule in alert_rules_list %}
   <integration>
     <name>slack</name>
     <rule_id>{{ rule }}</rule_id>
     <hook_url>{{ slack_ossec_url }}</hook_url>
   </integration>
{% endfor %}
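
To see what that loop expands to, the following shell sketch emulates the rendering for a handful of rule IDs. It is only an illustration: render_integrations is a hypothetical helper, the hook URL is a placeholder, and in practice Ansible's template module does this work.

```shell
# Illustration only: emulate what the Jinja2 loop renders for a given rule list.
# render_integrations is a hypothetical helper, not part of OSSEC or Ansible.
render_integrations() {
  hook_url="$1"
  shift
  # One <integration> block per remaining argument (each a rule ID).
  for rule in "$@"; do
    printf '  <integration>\n'
    printf '    <name>slack</name>\n'
    printf '    <rule_id>%s</rule_id>\n' "$rule"
    printf '    <hook_url>%s</hook_url>\n' "$hook_url"
    printf '  </integration>\n'
  done
}

# Placeholder URL; the real value would come from the slack_ossec_url variable.
render_integrations "https://hooks.example.invalid/placeholder" 510 533 550 5720
```

Each rule ID gets its own integration block, so adding an alert is just a matter of appending an ID to alert_rules_list.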

Finally, a static local_rules.xml is copied over. After that, OSSEC is started. During installation, OSSEC appends the following to /etc/rc.d/rc.local so that it starts at boot:

echo "Starting OSSEC HIDS"  
/var/ossec/bin/ossec-control start

If either our OSSEC version or the configuration files need to be updated, all that is required is to make those changes and rerun only the OSSEC playbook - it simply uninstalls the previous OSSEC before installing the new version. This way we can push updates without disrupting service, and the next time instances are provisioned or an AMI is made for our autoscaling instances, they will pick up the changes.
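
As a rough sketch of that uninstall-then-install flow, the steps might look like the shell below. This is not our playbook: reinstall_ossec, run, and the DRY_RUN switch are hypothetical, the paths are placeholders, and treating "uninstall" as stopping the daemons and removing the install directory is an assumption.

```shell
# Hypothetical reinstall helper; names, paths, and the DRY_RUN switch are assumptions.
OSSEC_DIR="${OSSEC_DIR:-/var/ossec}"
DRY_RUN="${DRY_RUN:-1}"   # default to printing commands instead of running them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

reinstall_ossec() {
  src_dir="$1"     # unpacked OSSEC source tree
  vars_file="$2"   # static preloaded-vars.conf
  # Stop the running daemons if a previous install exists.
  if [ -x "$OSSEC_DIR/bin/ossec-control" ]; then
    run "$OSSEC_DIR/bin/ossec-control" stop
  fi
  # Assumption: removing the install directory is enough to "uninstall".
  run rm -rf "$OSSEC_DIR"
  run cp "$vars_file" "$src_dir/etc/preloaded-vars.conf"
  run sh -c "cd \"$src_dir\" && ./install.sh"
}

# Dry run with placeholder paths: prints the commands that would be executed.
reinstall_ossec /tmp/ossec-src /tmp/preloaded-vars.conf
```

Because the sketch defaults to a dry run, it only prints the commands. Setting DRY_RUN=0 would execute them, so that should only be done as root on a host you intend to reinstall.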

Problems faced

The most difficult problem (that is still being tackled) is determining what is malicious behaviour and what is not. For example, running an Ansible playbook performs the following actions:

  1. ssh into the instance
  2. pipe a shell script into ~/.ansible/tmp/
  3. run that shell script, in some cases using sudo

Steps 2 and 3 are almost identical to what happens in a remote code execution exploit (see "Taking Control").

What if a server program is being deployed? That process involves copying that program to the instance and running it, which will open up ports - something that we have configured OSSEC to alert on.

It is important to tune which behaviour will generate alerts. OSSEC will reveal in detail what is happening on your instances, and this includes things that you may not have realized were happening in the first place. Early one morning, there was an alert storm from OSSEC when several of our instances started alerting that each binary on the system was being altered. It is very easy to panic if you were unaware that prelinking was enabled on your system, or that prelinking was even something that happened.
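
One way to avoid that kind of surprise is to check ahead of time whether prelinking is turned on. The sketch below assumes a RHEL-family layout, where /etc/sysconfig/prelink holds a PRELINKING=yes or PRELINKING=no setting. prelink_enabled is a hypothetical helper, and other distributions may keep this setting elsewhere.

```shell
# Returns 0 if the given prelink config file enables prelinking.
# Assumption: RHEL-family layout, where /etc/sysconfig/prelink holds PRELINKING=yes|no.
prelink_enabled() {
  conf="${1:-/etc/sysconfig/prelink}"
  [ -r "$conf" ] && grep -Eq '^[[:space:]]*PRELINKING=yes' "$conf"
}

if prelink_enabled; then
  echo "prelink is enabled: expect integrity checksum alerts after it runs"
else
  echo "prelink not enabled (or config file not found)"
fi
```

Knowing the answer before OSSEC goes live makes it easier to tell a scheduled prelink run apart from a real integrity problem.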

For each of the three scenarios above, we have ways of mitigating the problem, but a general solution has not been found yet.

Conclusions

By combining the local type of OSSEC install with Ansible's provisioning and configuration management capabilities, running and maintaining OSSEC on hundreds of AWS instances becomes straightforward.

Deploying OSSEC has already helped us make our systems more secure by pointing out a few places where improvements could be made. It has also given us greater insight into what is happening on our instances from day to day.
