Tuesday, July 31, 2018

IDE approach to log analysis pt. 1

Intro

I think most software engineers understand the importance of logs; they have become an integral part of software development. If something doesn't work, we try to find the cause in the logs. This may be enough for simple cases, such as when a bug prevents an application from opening a window: you find the issue in the logs, look it up on Google, and apply the solution. But if you are fixing bugs in a large product with many components, analyzing logs becomes the main problem. Sustain engineers (who fix bugs rather than develop new features) usually need to work with many hundreds of megabytes of logs, typically split into separate files of 50-100 MB each and zipped.

There are several approaches to make this work easier. I'll describe some existing solutions and then explain a theoretical approach to this problem. This blog post will not discuss any concrete implementations. 

Existing Solutions

Text Editor

This is not really a solution; it is simply what most people do when they need to read a text file. Some text editors offer useful features like color highlighting and bookmarks, which can make the work easier, but a text editor still falls short of a decent solution.

Logsaw

This tool can use the log4j pattern to extract fields from your logs. That sounds good, but these fields are already obvious from the text, so the improvement over a simple text editor is insignificant.

LogStash

This project looks quite alive, but the approach is rather specific. Although I've never worked with this tool, from the description I understand that it uses Elasticsearch and plain text search to analyze logs. The logs must be uploaded somewhere and indexed; after that the tool can show the most common words, the user can run text searches, and so on. This sounds like an improvement, but unfortunately not much of one. Here are the cons:
  • Some setup time is required: the logs must be uploaded and indexed, and after the work is done they must be removed from the system. That looks like overkill if the logs are meant to be analyzed once and discarded.
  • Many components are involved, and a lot of configuration is necessary.
  • Full-text search is not very useful with logs. Usually the engineer is looking for something like "connection 2345 created with parameter 678678678". Searching for "created with parameter" returns all connections; searching for "connection 2345" returns all statements about that connection, but usually there is only one worth finding: the one where the connection was created.

Other Cloud-based Solutions

There are a lot of cloud-based solutions available. Most of them have commercial plans and some have free plans. They offer notifications, visualizations and other features but the main principles are the same as for LogStash. 

Log Analysis Explained 

To understand why these solutions do not perform well for analyzing complex issues, we need to look at the workflow. Here is a sample workflow with the text editor:
  1. An engineer receives 1 GB of logs along with the information that the bug happened at 23:00 with request ID 12345.
  2. First, he or she tries to find any errors or exceptions around that time.
  3. If that fails, the engineer has to reconstruct the flow of events for this request. He or she begins looking for statements like "connection created", "connection deleted", or "request moved to this stage", trying to narrow down the time frame of the issue.
  4. That is usually successful (even though it can take a lot of time); now it is clear that the issue happened after connection 111 was moved to state Q.
  5. After digging a little more, the engineer finds out that this coincides with connection 222 moving to state W.
  6. Finally, the engineer is delighted to see that the thread that moved connection 222 to the new state also modified another variable that affected connection 111. At last, the root cause.
In this workflow the engineer spends most of the time looking for standard strings with varying parameters. If only that could be simplified...

IDE approach

There are several parts to the IDE approach.
  1. Regular expressions. With regular expressions one can specify a template and search for it in the logs. Looking for standard strings is much simpler with regular expressions.
  2. Regular expression configuration. The idea here is that standard strings like "connection created \d{5}\w{2}", "connection \d{5}\w{2} moved to stage \w{7}", and "connection \d{5}\w{2} deleted" do not change often. Writing the regular expression anew every time is unwieldy, because such regexes can be really long and complicated. It is easier if they can be configured once and applied with the click of a button.
  3. The IDE. We need some kind of IDE to tie all this together: to read the configuration, show the log files and stored regexes, and display the text and search results. Preferably like this:
  4. Color features. From experience I know that log analysis is much easier when you can mark certain strings with color to spot them quickly in the logs. Most commercial log analyzer tools use color highlighting, and the IDE should help with that.
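As an illustration of points 1 and 2, here is a minimal sketch of a stored, reusable pattern with a capture group that narrows a search down to specific connection statements. The class name, pattern, and sample log lines are made up for this example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogPatternDemo {
    // A pre-configured pattern for a "standard string"; the capture
    // group pulls out the connection ID so results can be filtered by it.
    static final Pattern CONNECTION_CREATED =
            Pattern.compile("connection (\\d+) created");

    static List<String> findConnectionIds(List<String> logLines) {
        List<String> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = CONNECTION_CREATED.matcher(line);
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "23:00:01 connection 2345 created",
                "23:00:02 request 12345 moved to stage Q",
                "23:00:05 connection 2345 deleted");
        System.out.println(findConnectionIds(lines)); // prints [2345]
    }
}
```

An IDE would keep a library of such compiled patterns and run whichever one the user clicks, instead of making the engineer retype the regex every time.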

Pros and Cons

Pros of the IDE approach:

  1. No cloud service necessary. No uploading gigabytes of logs anywhere, no cloud configuration. One only has to open the IDE, open the log folder, and start analyzing.
  2. If the IDE is free, the whole process is completely free. In any case it should be cheaper than a log service.

Cons of the IDE approach:

  1. Most cloud services offer real-time notifications and log analysis "on the fly": as soon as a specified exception happens, the user is notified. The IDE approach cannot do that.
  2. The requirements for the user's PC are somewhat higher, because working with big strings in Java consumes a lot of memory; 8 GB is the minimum requirement from my experience.
The bottom line is that the IDE approach is suitable for analyzing complicated issues in logs. It cannot offer the real-time features of cloud services, but it should be much cheaper and easier for analyzing and fixing bugs.

Final Thoughts

It would be great if someone implemented this approach: create such an IDE with all these features and make log analysis easier for everyone! I know from experience that this can be tedious work that feels harder than it actually is. In the next post (part 2) I'll explain the difficulties and challenges of this approach and offer a working implementation based on the Eclipse framework.

Friday, July 27, 2018

Distribution System Pattern

Intro

Hi all! This blog post continues the "new pattern ideas" series, in which I suggest new patterns that utilize the latest technology available in computer programming. The well-known patterns from the Gang of Four book are all great and widely used, but they were created a long time ago, before concurrency, futures, and similar facilities were common. I call them "new pattern ideas" because at this point they are not actually patterns, only ideas. I'm sure some programmers have used similar structures in their programs to solve similar problems.

Patterns help us in many ways. For those who don't know how to solve a problem, they may show the way. For those who have already solved it, they help explain the solution to others. In my opinion, the best programs can be represented as a group of patterns linked together.

Motivation

The picture above shows the motivation for this pattern. You have a number of tasks that can be executed in parallel. The tasks must be distributed to workers, and the results the workers produce add up to one final result. All the tools for this are already available in Java: CompletableFuture and ExecutorService. This pattern introduces some clarity about how to use them.

Applicability

Use this pattern when you have a big task that can be decomposed into many small tasks executable in parallel. Each small task produces one small result, and all the small results add up to one final result.

Structure

Tasks

The tasks should be organized hierarchically like this:
Leaves contain the Task objects that do the actual work. The main idea is that task execution must return a result, and CompletableFuture uses a Supplier in its supplyAsync methods; consequently it is easiest for task objects to implement Supplier. Every leaf has an execute method that returns the CompletableFuture obtained by passing the task object to CompletableFuture.supplyAsync (possibly along with its own executor).
Folders allow grouping of tasks and ordered execution; for example, one group of tasks may be executed before another. It is important to remember, though, that all tasks must return results of one type, or at least of types inherited from one common type. Folders call execute on all of their children and return a CompletableFuture produced by CompletableFuture.allOf(...). If necessary, folders may contain other folders.
The root is used for administrative purposes. It may contain the executor service for the tasks and is also the main entry point of the distribution system. It should have something like an execute method that begins the whole execution process and calls execute on all folders.
Here is the call hierarchy:

execute always returns a CompletableFuture. 
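The Leaf/Folder part of this call hierarchy can be sketched in Java roughly like this; the class shapes and method signatures are assumptions for illustration, not a fixed API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Leaf: wraps a task (a Supplier) and runs it on the shared executor.
class Leaf<R> {
    private final Supplier<R> task;
    private final ExecutorService es;

    Leaf(Supplier<R> task, ExecutorService es) { this.task = task; this.es = es; }

    CompletableFuture<R> execute() {
        return CompletableFuture.supplyAsync(task, es);
    }
}

// Folder: calls execute on all children and combines them with allOf.
class Folder<R> {
    private final List<Leaf<R>> children;

    Folder(List<Leaf<R>> children) { this.children = children; }

    CompletableFuture<Void> execute() {
        return CompletableFuture.allOf(children.stream()
                .map(Leaf::execute)
                .toArray(CompletableFuture[]::new));
    }
}

public class HierarchyDemo {
    public static void main(String[] args) {
        ExecutorService es = Executors.newFixedThreadPool(2);
        Folder<Integer> folder = new Folder<>(List.of(
                new Leaf<>(() -> 1, es),
                new Leaf<>(() -> 2, es)));
        folder.execute().join(); // completes when both leaves have finished
        es.shutdown();
        System.out.println("all tasks done");
    }
}
```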

Accumulating Results

Accumulating results is a very important part of this pattern. The idea is to create an interface like this:

public interface IDistribAccumulator<R, F, E> {

    void addResult(R newResult);   // called by a leaf when a task succeeds

    List<R> getAllResults();       // individual results collected so far

    F getResult();                 // the final, combined result

    void addError(E newError);     // called by a leaf when a task fails

    List<E> getErrors();
}

We'll call this object the accumulator. Here R is the type of a task result, F the type of the final result of the whole operation, and E the type of an error. The leaf adds some handling to the task by means of CompletableFuture.handle: if the task completed without errors, the result is added to the accumulator; if an exception was thrown, an error is added instead. Here is sample code for the execute method of the Leaf:

public CompletableFuture<R> execute() {
    CompletableFuture<R> execFuture = CompletableFuture.supplyAsync(task, es);
    return execFuture.handle((R result, Throwable t) -> {
        if (result != null) {
            accumulator.addResult(result);
        }
        if (t != null) {
            // wrap(t) stands in for the implementation's Throwable-wrapper
            // constructor; the concrete error type E is up to the implementer
            accumulator.addError(wrap(t));
        }
        return result;
    });
}

Tasks remain decoupled from the leaves and the work is done. 

Users of this distribution system pattern will implement this interface and may provide a way to get the final result from the individual results when all the tasks have completed execution. 
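As a sketch of what such an implementation might look like, here is a possible accumulator assuming R = Integer, F = Integer (the sum of all individual results), and E = Throwable; the interface is repeated so the example compiles on its own, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The interface from above, repeated so this sketch is self-contained.
interface IDistribAccumulator<R, F, E> {
    void addResult(R newResult);
    List<R> getAllResults();
    F getResult();
    void addError(E newError);
    List<E> getErrors();
}

// Assumed type parameters: R = Integer (one task's result),
// F = Integer (the sum of all results), E = Throwable.
public class SumAccumulator implements IDistribAccumulator<Integer, Integer, Throwable> {
    private final List<Integer> results = new ArrayList<>();
    private final List<Throwable> errors = new ArrayList<>();

    @Override public synchronized void addResult(Integer newResult) {
        results.add(newResult);
    }

    // Returns an unmodifiable snapshot so callers cannot change internal state
    @Override public synchronized List<Integer> getAllResults() {
        return Collections.unmodifiableList(new ArrayList<>(results));
    }

    // The final result is derived from the individual results collected so far
    @Override public synchronized Integer getResult() {
        return results.stream().mapToInt(Integer::intValue).sum();
    }

    @Override public synchronized void addError(Throwable newError) {
        errors.add(newError);
    }

    @Override public synchronized List<Throwable> getErrors() {
        return Collections.unmodifiableList(new ArrayList<>(errors));
    }
}
```

Every method is synchronized because leaves call addResult and addError from multiple worker threads.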

In progress status

Another common issue is how to report status to the user while work is in progress: some tasks have completed, some are running, some are waiting in the queue, and the UI must show the current status. The solution is to use the accumulator: it can return the tasks that have already completed, and even their results. It is, of course, desirable that the accumulator return an unmodifiable list from getAllResults to avoid any issues.


How to create the structure

It makes sense to create a Builder for the distribution system. An instance of the Builder is created with some parameters, which it then uses to create first the root, then the folders, and then the tasks.
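A minimal sketch of such a builder might look like this. The participant names follow the list below, but the classes here are bare stand-ins without the execution logic, and the builder's parameters (a thread count and task groups) are assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Minimal stand-ins for the participants; real classes would also
// carry the execute logic described earlier.
record TaskLeaf(Supplier<Integer> task) {}
record TaskFolder(List<TaskLeaf> leaves) {}
record TaskRoot(ExecutorService es, List<TaskFolder> folders) {}

// Hypothetical builder: configured with parameters, then builds
// the root first, the folders next, and the leaves last.
class DistribBuilder {
    private final int threads;
    private final List<List<Supplier<Integer>>> groups = new ArrayList<>();

    DistribBuilder(int threads) { this.threads = threads; }

    DistribBuilder addGroup(List<Supplier<Integer>> tasks) {
        groups.add(tasks);
        return this;
    }

    TaskRoot build() {
        List<TaskFolder> folders = new ArrayList<>();
        for (List<Supplier<Integer>> group : groups) {
            List<TaskLeaf> leaves = new ArrayList<>();
            for (Supplier<Integer> task : group) {
                leaves.add(new TaskLeaf(task));
            }
            folders.add(new TaskFolder(leaves));
        }
        return new TaskRoot(Executors.newFixedThreadPool(threads), folders);
    }
}
```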

Structure mapped to motivation


Participants

TaskRoot - the entry point to the distribution system. Contains shared objects and begins the execution.
TaskFolder - contains individual tasks. Used for grouping tasks.
TaskLeaf - contains the actual task object. Executes it, adds the result or the error to the accumulator, and returns the CompletableFuture.
Task - the actual piece of work. All tasks are independent (at least within a folder) and can run in parallel.
Accumulator - accumulates the results from individual tasks and returns the final result.

Consequences

  1. Clear structure. After Java more or less streamlined parallel execution by adding ExecutorService, CompletableFuture, and other facilities, a lot of people began using them. But to implement a solution successfully, a clear structure is necessary, and this pattern offers one.
  2. Reduced coupling. The structure that executes the tasks and gathers the results is decoupled from the actual tasks. The only objects that must be implemented are the tasks themselves, the accumulator, and the builder.
  3. Code reuse. The main structure can be written once and used multiple times by supplying the task objects, the accumulator, and the builder.

Implementation Considerations

This is a concurrency pattern, so it is very important to synchronize all the methods that can be accessed by multiple threads. Mostly these are the methods of the accumulator; it may be best to synchronize all of them.
The parts that change in any concrete implementation are the tasks, the accumulator, and the builder. All other parts of the system are fairly general and can be reused.

Final Thoughts

There is an actual implementation of this pattern and it is working quite well. Maybe in one of the next posts I'll talk about it.

Thursday, July 26, 2018

On the question of licenses

Intro

Hi all. Today's topic is indirectly related to Java. I think it is not an overstatement that most people hate software licenses. I'm one of them. Every time I begin reading a license, my brain goes crazy. Take this excerpt from the GNU GPL:
You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you.
All these "covered but not limited but complying with..." and "on your behalf or on your protection or ..." constructions must take years to get comfortable with. 95% of the people in the world are not legal experts and cannot understand most of it. This is the main reason such unpleasant things happen: people agree to anything because most of them do not read the terms, and they do not read the terms because they are hard to understand.

I have read blog posts suggesting that AI may help with this, for example by scanning licenses and contracts for "bad" parts. The person who came up with this idea must really dislike AI and AI developers: even courts and judges have difficulty understanding contracts and licenses, and yet AI is supposed to handle them.

Solution

The funny thing is that this problem has a fairly easy and well-known solution, one that most programmers, Java programmers included, use every day. It is (tense music in the background) INHERITANCE. Yes, plain old inheritance; even Java-style inheritance will do. Licenses must abandon the old model in which every license contains all the provisions and information necessary to use it. Instead, licenses for concrete products should inherit from well-known licenses, like Java classes do. So instead of the brain-depressing text above, the GNU GPL could look like this:
Someone presented with this structure can clearly see everything he or she needs, and the mind-boggling text becomes unnecessary. Someone who wants to use the product sees that it is free (inherited from "Free Software License", which means no payment is necessary). Someone who wants to use it as part of a commercial product sees that this license is non-commercial (inherited from "Non-commercial Free Software License") and cannot be used for that purpose. Someone who wants the specifics can read any of these texts to find out.
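To make the analogy concrete, here is the same idea expressed as Java types. This is purely an illustration; the license names and provisions are made up and are not real legal instruments:

```java
// High in the hierarchy: basic terms that lower licenses cannot override
// (in the analogy, the methods of the abstract licenses are "final").
interface FreeSoftwareLicense {
    default boolean paymentRequired() { return false; }
}

// One level down: a narrower provision is added.
interface NonCommercialFreeSoftwareLicense extends FreeSoftwareLicense {
    default boolean commercialUseAllowed() { return false; }
}

// A concrete product license only adds its specifics; everything else
// is visible at a glance from what it inherits.
class MyProductLicense implements NonCommercialFreeSoftwareLicense { }
```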

Explanation

With this scheme everybody gets what they want. Legal experts keep their mind-boggling texts; the texts do not go away, and the current legal system doesn't need to change its ways much or incorporate AI to read the texts. Users get a clear explanation of the license that they can actually understand.

This scheme rests on some assumptions. First, there should be well-defined "abstract" licenses, like abstract Java classes, that define the basic terms; for example, there could be an abstract license for proprietary software and one for free software. These abstract licenses are organized into hierarchies. The provisions of licenses higher in the hierarchy cannot be overridden by licenses lower in the hierarchy; it is as if all the methods and fields of the abstract classes were final. If two provisions contradict each other, the provision from the license higher in the hierarchy wins.

Sounds really simple, so why has nobody used this solution before? I don't know. For some reason the judicial system uses the one-license-has-all approach. It is as if they worked in Java without inheritance, always rewriting the same code even when only one line needs to change. People in the software world at least understand why writing the same code many times is bad, which is why they came up with standard licenses like the GNU GPL and the Apache License that anyone can use. While that is a good idea, it falls short of a comprehensive solution to this problem.
Inheritance could be that solution.

Friday, July 20, 2018

Assembly Line Pattern

Intro

Hi all!
Important: This is my first attempt at a decent blog post about programming. Please go easy on me if I miss something or write something that doesn't look good.

So, to the point. Java's CompletableFuture has been available for quite some time, and it is a very useful tool indeed. But most well-known patterns from the famous "Design Patterns" book were created a looong time ago, when creating more than one thread was rare. The pattern that I call "Assembly Line" uses both some well-known patterns and CompletableFuture to solve one common task.

Motivation

The user presses a button and the program must execute some actions in response. The actions are executed sequentially, slowly building up the end result. Like this:
It seems like an easy task: create some functions and do it! Nothing could be easier; it is simple single-threaded programming! Unfortunately, it is not so easy. What if you have several similar tasks and in each of them the steps are a little different? One step is removed, another is added; for example, some actions do not need to show the user a dialog, while others require additional processing of the parameters. That means a lot of code must be written several times. And what if one or two of those steps are long-running operations? A special subsystem is called for to create and process such requests.

Applicability

Use this pattern when the process of handling user requests consists of similar steps with little variation between requests, and the steps are executed sequentially with no parallel execution necessary.

Structure

Simple CompletableFuture

To describe the structure we'll start with the simplest form possible:
This diagram clearly shows the idea, but it is very unwieldy. Usually the sequential tasks are not running in isolation; they need to get some information and add some information to the final result.

Context

We need a context:
This is much better, but here is the next hurdle. We want to reuse as much code as possible, which means the code for individual stages (we'll call them stages from now on) must be separated and combined as necessary. That means we also need a builder for the stages.

Builder

We pass some parameters to the builder, and the builder creates a chain of stages. This is not the same as the Chain of Responsibility pattern, because all the stages are always executed; it is more like filters in a servlet container.
Now that we have a builder we can take a look at the base class for all stages.

Stage Base Class

The idea is to use the "then" methods of CompletableFuture to chain executions and to return a CompletableFuture that completes after all the stages have finished. It is best to use an Executor to run the stages.
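A minimal sketch of such a stage base class, assuming the context is a plain Map (a real system would use a typed context object) and chaining is done with thenApplyAsync; the class names and the two sample stages are made up for this example:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Base class for all stages: each stage reads from the context,
// does its work, writes back, and hands the context on.
abstract class Stage {
    abstract Map<String, Object> run(Map<String, Object> ctx);

    // Chains this stage after the previous one on the given executor.
    CompletableFuture<Map<String, Object>> chain(
            CompletableFuture<Map<String, Object>> prev, ExecutorService es) {
        return prev.thenApplyAsync(this::run, es);
    }
}

public class AssemblyLineDemo {
    public static void main(String[] args) {
        ExecutorService es = Executors.newSingleThreadExecutor();
        List<Stage> stages = List.of(
                new Stage() {
                    Map<String, Object> run(Map<String, Object> ctx) {
                        ctx.put("validated", true); // hypothetical first step
                        return ctx;
                    }
                },
                new Stage() {
                    Map<String, Object> run(Map<String, Object> ctx) {
                        ctx.put("result", 42);      // hypothetical second step
                        return ctx;
                    }
                });

        CompletableFuture<Map<String, Object>> future =
                CompletableFuture.completedFuture(new HashMap<String, Object>());
        for (Stage stage : stages) {
            future = stage.chain(future, es);       // build the assembly line
        }

        Map<String, Object> ctx = future.join();    // final future returns the context
        es.shutdown();
        System.out.println(ctx.get("result"));      // prints 42
    }
}
```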

Complete Picture

The complete picture looks a little more complicated than the one we started with:
Everything is in place. The final CompletableFuture from the system entry point returns the context object.

Participants

Assembly Line system entry point - accepts requests and begins processing. Creates and executes the chain of stages.
Context - stores the information the stages need to execute. Stages may read information from the context and write information to it.
Stage - one piece of work executed as part of a chain. Could be a long-running task.
StageBuilder - creates the chain of stages based on the user request.

Consequences

  1. As usual: reduced coupling. More classes that know nothing about each other :-). Seriously, all the code that handles user requests can be divided into small, more or less independent classes and methods, and the builder then creates the correct chain for every user request. No code duplication, and loose coupling through the context object.
  2. The ability to execute long-running tasks as part of the chain - the main reason to use CompletableFuture in this pattern. In fact, some stages may even use their own thread pools internally, as long as they return a CompletableFuture.
  3. The structure is easy to understand and extend. Adding more stages is easy: just add the necessary classes and modify or add the builder. The ease of understanding should not be underestimated here; this simple model allows us to break a lot of sequential code into simpler pieces and combine them as necessary.

Final Thoughts

The assembly line has existed for more than a hundred years, so it must be very easy for humans to understand and use. Why not use it in programming as well?