Cloud Management With Prolog

23rd March 2019

Introduction

If you work on cloud infrastructure, the proliferation of tools and custom DSLs (domain-specific languages) starts to grate after a while. My main gripe is that none of these DSLs were designed with programmability in mind, so as the problem gets more complex, so do the workarounds needed to make it all work. I've seen enough Jinja and ERB templates that generate Terraform files to last me a lifetime.

With the mini-rant out of the way, I want to propose a better way: use Prolog as a general-purpose infrastructure management tool, because it is almost custom-made for the kinds of problems that come up in that domain. I'm going to outline a process I'm currently using to clean up stale/dangling resources. The concrete example is about security groups, but the idea generalizes to other use cases.

Of course, none of this is actually new: @hakmem pointed out that using Prolog for system administration goes back a few decades. I'm proposing we take this idea to the cloud and start thinking about how it can be used to manage global system state and policies.

Generating a Resource Graph

Systems by their nature are graph-like and the cloud is no exception. In the background there is a graph structure that links resources together: an EC2 instance uses a set of security groups, those security groups are linked to other security groups, the EC2 instance runs in a VPC, has associated IP addresses and potentially some load balancers that route traffic to it, and so on. Explicitly spelling out these relationships gives us a graph, so let's write some code to generate this graph as a set of facts in Prolog:

require 'aws-sdk'

# Generates facts of the form "type(v, t)" where
# "v" is a value and "t" is a symbol, e.g. ec2_instance, security_group, etc.
def typed_value(value, typ, context)
  context << "type(\"#{value}\", #{typ})."
end

# Links two values together and corresponds to the edges
# in the resource graph. You can think of "link(X, Y)" as a directed
# edge between resources that are identified by "X" and "Y", i.e.
# X -> Y. I think of it as "X" uses "Y" but this interpretation is not always
# consistent so thinking of it as an undirected edge is also valid.
def link(source, target, context)
  context << "link(\"#{source}\", \"#{target}\")."
end

# Grab all the security groups associated with this EC2 instance
# and generate the associated typed values and links between
# the security group and the EC2 instance.
def get_security_groups(instance_id, groups, context)
  groups.each do |group|
    typed_value(group.group_id, 'security_group', context)
    link(instance_id, group.group_id, context)
  end
end

# We will keep all the Prolog facts we are generating
# here before writing to disk.
context = []
ec2 = Aws::EC2::Resource.new(region: 'us-west-2')

# Grab all the EC2 instances.
ec2.instances.each do |i|
  instance_id = i.data.instance_id
  typed_value(instance_id, 'ec2_instance', context)
  get_security_groups(instance_id, i.data.security_groups, context)
end

# Grab all the security groups.
sgs = Aws::EC2::Client.new(region: 'us-west-2').describe_security_groups
sgs.security_groups.each do |security_group|
  group_id = security_group.group_id
  group_name = security_group.group_name
  typed_value(group_id, 'security_group', context)
  typed_value(group_name, 'sg_name', context)
  link(group_id, group_name, context)
  security_group.ip_permissions.each do |ip_permission|
    ip_permission.user_id_group_pairs.each do |group_pair|
      link(group_id, group_pair.group_id, context)
    end
  end
end

# Generate the Prolog file with all the facts.
File.open('graph.pl', 'w') { |f| f.puts context.sort.uniq.join("\n") }

I recommend running the script and looking at what ends up in graph.pl. It is a set of tuples describing an attribute graph: type facts give each node a type such as ec2_instance, security_group, or sg_name, and link facts connect nodes to other nodes and to their attributes. Read link(X, Y) as "X uses Y", although that reading is not consistent everywhere (a group is also linked to its name), so treating link as an undirected edge in the resource graph is also valid.
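
If you don't have AWS credentials handy, here is a minimal sketch of the same fact generation run against stubbed data. The Instance and Group structs, the IDs, and the prolog_string helper are all made up for illustration and are not part of the script above. The escaping is a defensive assumption on my part, in case a value ever contains a quote or a backslash.

```ruby
# Hypothetical stand-ins for the AWS SDK responses; all IDs are made up.
Instance = Struct.new(:instance_id, :group_ids)
Group = Struct.new(:group_id, :group_name, :referenced_group_ids)

# Escape backslashes and double quotes so the value stays a valid
# double-quoted Prolog string (an assumption, not in the original script).
def prolog_string(value)
  '"' + value.to_s.gsub(/[\\"]/) { |c| "\\#{c}" } + '"'
end

# Build the same "type" and "link" facts as the AWS script, from stubs.
def facts_for(instances, groups)
  facts = []
  instances.each do |i|
    facts << "type(#{prolog_string(i.instance_id)}, ec2_instance)."
    i.group_ids.each do |g|
      facts << "type(#{prolog_string(g)}, security_group)."
      facts << "link(#{prolog_string(i.instance_id)}, #{prolog_string(g)})."
    end
  end
  groups.each do |g|
    facts << "type(#{prolog_string(g.group_id)}, security_group)."
    facts << "type(#{prolog_string(g.group_name)}, sg_name)."
    facts << "link(#{prolog_string(g.group_id)}, #{prolog_string(g.group_name)})."
    g.referenced_group_ids.each do |ref|
      facts << "link(#{prolog_string(g.group_id)}, #{prolog_string(ref)})."
    end
  end
  facts.sort.uniq
end

instances = [Instance.new('i-1', ['sg-a'])]
groups = [
  Group.new('sg-a', 'web', []),
  Group.new('sg-b', 'db', ['sg-a']) # sg-b has a rule that references sg-a
]

facts = facts_for(instances, groups)
puts facts
```

Because the facts are sorted before printing, the link facts come first and the type facts follow, which is the same layout the real script writes to graph.pl.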

Querying the Graph

Now that we have the graph, we can query it for properties we are interested in. My current use case is cleaning up stale/dangling security groups. By "stale" I mean a group that is not used by any EC2 instance, either directly or through a chain of other security groups. This is a simple graph traversal problem and the implementation in Prolog is a few lines:

% live.pl
% Base case: there is an EC2 instance using this security group
live_security_group(X, _) :-
  type(X, security_group),
  type(E, ec2_instance),
  link(E, X).
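% Recursive case: There is another group Y that uses this group
% and Y is a live security group. The type check on X keeps
% attribute nodes (e.g. sg_name) out of the result when X is unbound.
live_security_group(X, Seen) :-
  type(X, security_group),
  link(Y, X),
  dif(Y, X),
  \+ member(Y, Seen),
  type(Y, security_group),
  live_security_group(Y, [X|Seen]).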

If you have SWI-Prolog installed, you can load graph.pl and live.pl and run the following query to get the set of "live" security groups:

setof(X, live_security_group(X, []), L).

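But we're really interested in the complement. There are a few ways to get it, but the easiest is probably negation as failure (keep in mind that setof/3 fails, rather than returning an empty list, when nothing is stale):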

setof(X, (type(X, security_group), \+live_security_group(X, [])), L).
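
As a sanity check on the Prolog answer, the same reachability can be computed in a few lines of Ruby. This is only a sketch under my own assumptions: the types hash and links array are made-up stand-ins for the facts in graph.pl, and live_security_groups is a hypothetical helper, not something from the scripts above. It seeds the search with groups attached directly to an instance and then follows link edges between security groups.

```ruby
require 'set'

# Made-up facts mirroring graph.pl: node types and directed links.
types = {
  'i-1'  => :ec2_instance,
  'sg-a' => :security_group,
  'sg-b' => :security_group,
  'sg-c' => :security_group,
  'sg-d' => :security_group
}
links = [
  ['i-1', 'sg-a'],  # the instance uses sg-a
  ['sg-a', 'sg-b'], # sg-a references sg-b
  ['sg-c', 'sg-a']  # sg-c references sg-a, but nothing uses sg-c
]

# A group is live if an instance links to it, or a live group links to it.
def live_security_groups(types, links)
  group = ->(id) { types[id] == :security_group }
  frontier = links.select { |s, t| types[s] == :ec2_instance && group.(t) }
                  .map(&:last)
  live = Set.new
  until frontier.empty?
    current = frontier.pop
    next unless live.add?(current) # skip groups we have already visited
    links.each { |s, t| frontier << t if s == current && group.(t) }
  end
  live
end

live = live_security_groups(types, links)
stale = types.select { |_, t| t == :security_group }.keys - live.to_a

puts "live:  #{live.to_a.sort.join(', ')}"
puts "stale: #{stale.sort.join(', ')}"
```

The visited set plays the same role as the Seen list in live.pl: it stops the traversal from looping when two groups reference each other. If the two approaches disagree on real data, that is usually a hint that a link fact points the other way than expected.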

Further Work

I'm not sure if this can be packaged as something re-usable but the idea is definitely re-usable and I plan to continue chipping away at this to see how far it can go before it breaks. I've gotten enough feedback from smart folks to know that it's not an insane idea. The next step will be setting up a GitHub repository so that the work can happen out in the open and anyone interested can contribute improvements and ideas.
