Over the summer, I worked with
Professor Albanese at the Center for Secure Information Systems to refine an
algorithm designed to take a set of observed events, put them through various models
to detect complex behaviors, and then probabilistically identify sequences of
events that aren't sufficiently explained by any of those behaviors. This
approach is intended for use in network security, drawing attention to
unexplained network activity simply by process of elimination. My research this
fall has been to find a way to convert this "offline" algorithm,
which requires all input data at the very start, to an "online"
algorithm, one that can be fed data piece by piece as it is recorded and is therefore far more useful in a network security environment.
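To give a sense of the difference, here's an illustrative Python sketch (with made-up names and a placeholder scoring rule, nothing like the real algorithm): the offline version has to wait for the whole capture, while the online version keeps pace with it as it arrives.

```python
# A minimal sketch of the offline/online distinction, not the actual algorithm.
# score() and the threshold are placeholders standing in for the behavior models.

def score(event):
    # Placeholder scoring rule: longer event names count as "less explained".
    return 1.0 / (1 + len(event))

def unexplained_offline(events, threshold=0.1):
    """Offline: needs every observed event up front before it can answer."""
    return [e for e in events if score(e) < threshold]

class UnexplainedOnline:
    """Online: accepts events piece by piece and can be queried at any time."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.flagged = []

    def observe(self, event):
        # Incorporate a single newly captured event as it is recorded.
        if score(event) < self.threshold:
            self.flagged.append(event)

    def unexplained(self):
        return list(self.flagged)

# The offline version waits for the full capture; the online version keeps
# pace with the capture and gives the same answer at the end.
capture = ["login", "port_scan_from_unknown_host", "logout"]
batch = unexplained_offline(capture)

stream = UnexplainedOnline()
for event in capture:
    stream.observe(event)

assert batch == stream.unexplained()
```

(The catch, and the real work, is that the behaviors span sequences of events, so a genuine online version can't just treat each event in isolation the way this toy does.)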
My interest in this was twofold. First,
the relatively simple (and, dare I say, elegant) math underlying the algorithm
means I don't need to be an expert in network security to make progress; my
job is simply to create a framework to be expanded upon by others. Second, the
level of planning, trial, and error inherent to a project this size has
necessarily made me a better programmer (a useful skill in this field). It's
nothing exciting to look at—the hardest part is simply finding the time between
studying and sleeping to sit down with my laptop and program for an hour or
two.
This week, I've been trying to
figure out which parts of the program should store data and when to add newly captured data. It's harder than it seems, because different parts of the program require the data to be in different states (old, new, or combined).
The solution? Have each function generate a list of changes and only apply them
once the program no longer needs the unchanged data. Easier said than done, of
course, but it's rewarding little victories like this that remind me why I
enjoy programming. Who knows, with enough little victories I may even be able
to accomplish a few big ones.
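For the curious, the change-list idea looks roughly like this in code. This is just an illustrative Python sketch with made-up names and a trivial data layout, not the actual program:

```python
# Rough sketch of the change-list idea; the names and data layout are invented.

class EventStore:
    def __init__(self):
        self.events = []    # the "old" data that every part of the program sees
        self.pending = []   # changes generated but not yet applied

    def stage(self, new_events):
        # Functions only record the changes they want to make; the old data
        # stays untouched for anything that still depends on it.
        self.pending.extend(new_events)

    def commit(self):
        # Apply the staged changes once nothing needs the unchanged data.
        self.events.extend(self.pending)
        self.pending = []

store = EventStore()
store.events = ["login", "file_read"]

# While analysis is still running against the old data, new captures are staged.
store.stage(["outbound_connection"])
old_view = list(store.events)       # still only the old events

# Once the old view is no longer needed, fold the changes in.
store.commit()
combined_view = list(store.events)  # old and new events together
```

The nice part of this arrangement is that each function can stay oblivious to what the rest of the program is doing with the data; it just records what it wants changed and lets the commit happen whenever it's safe.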