If you want to know what’s cooking on the net, you will at times have to process a lot of measurements, such as webserver logs and network activity measurements. You need this ‘network business intelligence’ to figure out what the users are doing, where the capacity is going, where the delays are, and where the ‘funny stuff’ is.
I used to do that processing with Excel, or I had serious data warehouse and business intelligence solutions built. Excel brings a lot of convenience, but it hits its limits once the data gets large. Custom development around a big warehouse is not always a viable option.
Here is a tool that fills the gap: R. It is open source, runs on Windows and Linux, is extensible, and has lots of packages built around it. Project page: http://www.R-project.org/
Basically, it is a command-line table cruncher. It takes only a few seconds on a laptop to read in half a million records. You’ll be able to generate all kinds of statistics and graphics, and it will even output to PDF and Excel. You will also be able to document all your manipulations and replay them when the data is updated. For example, you can build a pivot table on the result of another pivot table and have both redone automatically.
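To make the pivot-of-a-pivot idea concrete, here is a minimal base-R sketch. It assumes a webserver log that has already been parsed into a data frame with hypothetical columns `host`, `hour` and `bytes`; the randomly generated data only stands in for your own records, and the output file names are placeholders.

```r
# Synthetic stand-in for a parsed webserver log: one row per request.
# With real data you would read your own file instead, e.g. read.csv(...).
set.seed(1)
n <- 500000
requests <- data.frame(
  host  = sample(c("web1", "web2", "web3"), n, replace = TRUE),  # hypothetical hosts
  hour  = sample(0:23, n, replace = TRUE),                       # hour of day
  bytes = rexp(n, rate = 1 / 20000)                              # response size
)

# First "pivot table": total bytes per host and hour.
per_host_hour <- aggregate(bytes ~ host + hour, data = requests, FUN = sum)

# Second pivot, built on the first: the busiest hour for each host.
busiest <- do.call(rbind, lapply(
  split(per_host_hour, per_host_hour$host),
  function(d) d[which.max(d$bytes), ]
))

# Spreadsheet-style cross-tab of the first pivot (hosts x hours).
wide <- xtabs(bytes ~ host + hour, data = per_host_hour)

# Export: a CSV file opens directly in Excel; the chart goes to a PDF.
out_dir <- tempdir()  # placeholder: point this at your own output folder
write.csv(busiest, file.path(out_dir, "busiest_hour_per_host.csv"), row.names = FALSE)
pdf(file.path(out_dir, "bytes_per_hour.pdf"))
barplot(colSums(wide) / 1e6, xlab = "Hour of day", ylab = "MB transferred")
invisible(dev.off())
```

Saved as a script (say `crunch.R`, a hypothetical name), the whole chain can be re-run with `Rscript crunch.R` whenever the log is refreshed, which is the replay described above. Writing native `.xlsx` files instead of CSV needs an add-on package.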
So, R is really a great take-away from the CMG conference I went to recently.
R rocks. Don’t leave home without it.
One Comment on “Power tools for logfile crunching”
pve, 7 January 2010 at 09:27
Link to R should be http://www.R-project.org/