Over the summer, I worked with Professor Albanese at the Center for Secure Information Systems to refine an algorithm designed to take a set of observed events, put them through various models to detect complex behaviors, and then probabilistically identify sequences of events that aren't sufficiently explained by any of those behaviors. This approach is intended for use in network security, drawing attention to unexplained network activity simply by process of elimination. My research this fall has been to find a way to convert this "offline" algorithm, which requires all input data at the very start, into an "online" algorithm that can be fed data piece by piece as it is recorded, a design far more useful in a network security environment.
My interest in this was twofold. First, the relatively simple (and, dare I say, elegant) math underlying the algorithm means I don't need to be an expert in network security to make any progress; my job is simply to create a framework to be expanded upon by others. Second, the level of planning, trial, and error inherent to a project this size has necessarily made me a better programmer (a useful skill in this field). It's nothing exciting to look at—the hardest part is simply finding the time between studying and sleeping to sit down with my laptop and program for an hour or two.
This week, I've been trying to figure out which parts of the program should store data, and when to add newly captured data. This is harder than it seems because different parts of the program require the data to be in different states (old, new, or combined). The solution? Have each function generate a list of changes and apply them only once the program no longer needs the unchanged data. Easier said than done, of course, but it's rewarding little victories like this that remind me why I enjoy programming. Who knows, with enough little victories I may even be able to accomplish a few big ones.
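The deferred-update idea above can be sketched roughly as follows. This is a minimal illustration, not the actual research code; all class and method names here (`EventStore`, `stage`, `commit`, and so on) are hypothetical. Each function stages its changes in a buffer, readers can see either the old state or a combined view, and the staged changes are committed only once nothing still depends on the old state.

```python
class EventStore:
    """Toy sketch of deferred updates: stage changes now, apply later."""

    def __init__(self):
        self.events = []       # data in its "old" (stable) state
        self._pending = []     # newly captured data, staged but not applied

    def stage(self, event):
        """Record a new event without mutating the stable data."""
        self._pending.append(event)

    def combined_view(self):
        """Old and new data together, without committing anything."""
        return self.events + self._pending

    def commit(self):
        """Apply staged changes once no reader needs the old state."""
        self.events.extend(self._pending)
        self._pending.clear()


store = EventStore()
store.stage("login")
store.stage("file_read")
assert store.events == []                                 # old state untouched
assert store.combined_view() == ["login", "file_read"]    # combined state visible
store.commit()
assert store.events == ["login", "file_read"]             # changes now applied
```

The design choice mirrors the post: readers that need the old data keep getting it until an explicit commit point, which sidesteps the problem of different parts of the program wanting the data in different states at the same time.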