AWK: Extract Logs for the Given Date(s) from a Log File

If your log file has entries like these:

2011-12-10T22:00:27.996+0000 [http-8080-1] INFO  my.package.MyClass Hello, I'm alive!
2011-12-11T17:05:46.811+0000 [http-8080-15] ERROR my.package.MyClass  - Error caught in DispatcherServlet
        at my.package.MyServiceClass(MyServiceClass.java:36)
...
2011-12-11T17:06:10.120+0000 [http-8080-14] DEBUG my.package.MyClass Whoo, that has been a long day!

Then you can use the following bash script snippet to extract logs only for a particular day or consecutive days, including everything – even lines not starting with the date such as stacktraces – between the first log of the date up to the first log of a subsequent date (default: yesterday):

LOGFILE_ORIG="$0"; LOGFILE="${LOGFILE_ORIG}.subset"
if [ -z "$LOGDAY" ]; then LOGDAY=$(date +%F -d "-1 days"); fi
if [ -z "$AFTERLOGDAY" ]; then AFTERLOGDAY=$(date +%F -d "$LOGDAY +1 days"); fi
echo "Extracting logs in the range (>= $LOGDAY && < $AFTERLOGDAY) into $LOGFILE ..." awk "/^$LOGDAY/,/^$AFTERLOGDAY/ {if(!/^$AFTERLOGDAY/) print}" $LOGFILE_ORIG > $LOGFILE

This date format works on Linux. Date is very flexible and can provide dates in any format, not only yyyy-mm-dd. You may also want to read more about Awk ranges and other tips.

You would run it in one of the following ways:

$ ./analysis.sh /path/to/logfile.log
$ LOGDAY=2011-12-12 AFTERLOGDAY=2011-12-17 ./analysis.sh /path/to/logfile.log

Published by Jakub Holý

I’m a JVM-based developer since 2005, consultant, and occasionally a project manager, working currently with Iterate AS in Norway.