PHP log parser - Parse a log while another process is modifying it

Asked by Armin

I am building a log parser in PHP. The parser runs in an infinite loop, scans through the log lines, and does some additional processing for each line.

The parser uses inotify to detect when the log file is modified. It then reopens the file, skips ahead to the last processed line number, and processes onward from there. The last processed line number is kept in a variable that is incremented each time a log line is processed. It is also saved to a status file, so if the parser crashes it can continue where it last stopped.
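The status-file bookkeeping described above can be sketched as two small helpers. This is a minimal sketch, assuming the status file holds the JSON object {"read_position": N} used in the example code; the names load_status() and save_status() are hypothetical, and the write-to-a-temp-file-then-rename() step is an addition meant to stop a crash from leaving a half-written status file behind.

```php
<?php
// Hypothetical helpers for the status file; not part of the question's code.

// Returns the saved read position, or 0 when there is no usable saved state.
function load_status(string $status_file): int
{
    if (!file_exists($status_file)) {
        return 0; // nothing saved yet: start from the beginning
    }
    $status = json_decode((string) file_get_contents($status_file));
    // Empty, malformed, or incomplete status files also fall back to 0.
    if (!is_object($status) || !isset($status->read_position)) {
        return 0;
    }
    return (int) $status->read_position;
}

// Saves the read position as {"read_position": N}.
function save_status(string $status_file, int $read_position): void
{
    // Write a temporary file first, then rename() it over the old one, so a
    // crash mid-write cannot truncate the real status file. rename() is atomic
    // on POSIX systems when both paths are on the same filesystem.
    $tmp = $status_file . '.tmp';
    if (file_put_contents($tmp, json_encode(['read_position' => $read_position])) === false) {
        throw new RuntimeException("unable to write $tmp");
    }
    if (!rename($tmp, $status_file)) {
        throw new RuntimeException("unable to replace $status_file");
    }
}
```

Calling save_status() after every processed line keeps the on-disk position current, at the cost of one small write per line, which is the same trade-off the example code makes with file_put_contents().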

My problem is that after the log is modified, the parser does not see the new contents through the file handle it opened before the modification. So once the loop reaches the end of the log, it waits for inotify to signal a modification (which is fine), but then it reopens the whole file and reads it line by line again up to the last processed line. This may be expensive if the log contains many lines. How can I avoid this and pick up the file updates immediately, without reopening the file and skipping N already-processed lines every time?

Example code:

<?php

$ftp_log_file = '/var/log/proftpd/my_log.log';
$ftp_log_status_file = '/var/log/proftpd/log_status.log';
if ( ! file_exists($ftp_log_status_file)) {
  die("failed to load the ftp log status file $ftp_log_status_file!\n");
}
$log_status = json_decode(file_get_contents($ftp_log_status_file));
if ( ! is_object($log_status)) {
  $log_status = new stdClass(); // empty or invalid status file
}

// Number of log lines already processed
if ( ! isset($log_status->read_position)) {
  $read_position = 0;
} else {
  $read_position = (int) $log_status->read_position;
}

// Open an inotify instance and watch the log file for modifications
$inoInst = inotify_init();
$watch_id = inotify_add_watch($inoInst, $ftp_log_file, IN_MODIFY);

while (true) {
  $current_read_index = 0;
  // Block until the log file is modified
  $events = inotify_read($inoInst);

  // Reopen the log; every line has to be read again just to skip it
  $fd = fopen($ftp_log_file, 'r');
  if ($fd === false)
    die("unable to open $ftp_log_file!\n");

  // Compare with false so an empty line does not end the loop early
  while (($line = fgets($fd)) !== false) {
    $line = trim($line);
    $current_read_index++;
    if ($current_read_index <= $read_position) {
      continue; // already processed before
    }

    // DO SOME LOG PROCESSING

    $read_position++;
    $log_status->read_position = $read_position;
    file_put_contents($ftp_log_status_file, json_encode($log_status));
  }
  fclose($fd);
}

// stop watching the log file (not reached while the loop above runs forever)
inotify_rm_watch($inoInst, $watch_id);

// close our inotify instance
fclose($inoInst);
Tags: php, logging
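A common way to avoid re-reading and skipping N lines is to remember a byte offset instead of a line number: ftell() reports where the last complete line ended, and fseek() jumps straight back there after reopening. The sketch below assumes the log is only ever appended to (no truncation or rotation); read_new_lines() is a hypothetical helper, and it deliberately leaves a trailing line without "\n" unconsumed because the writer may not have finished it yet.

```php
<?php
// Assumption: the log is append-only, so a saved byte offset always points
// at the start of a line that has not been processed yet.

/**
 * Hypothetical helper: returns [complete lines appended after $offset,
 * offset just past the last complete line]. A partial last line is left
 * in the file so the next call picks it up once it is finished.
 */
function read_new_lines(string $path, int $offset): array
{
    $fd = fopen($path, 'r');
    if ($fd === false) {
        throw new RuntimeException("unable to open $path");
    }
    fseek($fd, $offset); // jump straight to the first unprocessed byte
    $lines = [];
    while (($line = fgets($fd)) !== false) {
        if (substr($line, -1) !== "\n") {
            break; // incomplete line: the writer has not finished it
        }
        $lines[] = rtrim($line, "\r\n");
        $offset = ftell($fd); // byte position right after this newline
    }
    fclose($fd);
    return [$lines, $offset];
}
```

In the inotify loop, one would call read_new_lines() after each inotify_read() and persist the returned offset in the status file in place of the line count, so the cost of each wake-up depends only on the newly appended data.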

Answers

Answer by Peter (answered 5 days ago)

fgets() seems to remember that the end of the file was reached, so subsequent fgets() calls fail silently. An explicit fseek() before calling fgets() again seems to fix this.

<?php

$inoInst = inotify_init();
inotify_add_watch($inoInst, 'foo.txt', IN_MODIFY);

$f = fopen('foo.txt', 'r');

for (;;) {
    while (($line = fgets($f)) !== false) {
        echo $line;
    }

    inotify_read($inoInst);
    fseek($f, 0, SEEK_CUR); // make fgets work again
}
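The fseek() fix can be tried without inotify. In this sketch a second file handle plays the role of the writing process (proftpd in the question): the reader consumes the file to EOF, the writer appends a line, and a zero-length fseek($reader, 0, SEEK_CUR) precedes the next round of fgets() calls, as in the loop above. The temporary file and variable names are illustrative only.

```php
<?php
// A second handle stands in for the process that appends to the log.
$path = tempnam(sys_get_temp_dir(), 'tail');
$writer = fopen($path, 'a');
$reader = fopen($path, 'r');

fwrite($writer, "first\n");
fflush($writer);

$seen = [];
while (($line = fgets($reader)) !== false) {
    $seen[] = rtrim($line, "\n");
}
// The reader has now hit end of file.

fwrite($writer, "second\n");
fflush($writer);

// Zero-length seek: the position does not move, but PHP performs a real
// seek, which resets the stream's EOF state and discards its read buffer.
fseek($reader, 0, SEEK_CUR);
while (($line = fgets($reader)) !== false) {
    $seen[] = rtrim($line, "\n");
}

fclose($writer);
fclose($reader);
unlink($path);
```

Keeping one handle open this way also keeps the read position, so nothing has to be skipped after a wake-up.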

Note that there is still the issue of incomplete lines. The line you are currently reading may not be complete yet (e.g. proftpd will finish it with its next write() call).

fgets() does not report directly whether it stopped at a newline or at the end of the file (the only hint is whether the returned string ends in "\n"), so I don't see a convenient way to handle this off the top of my head. The only thing I can think of is to read N bytes at a time and split the lines yourself.
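The "read N bytes at a time and split the lines yourself" idea can be sketched as a small buffer that only releases newline-terminated lines and keeps any partial tail until more data arrives. LineSplitter and read_available_lines() are hypothetical names, the 8192-byte default chunk size is an arbitrary choice, and the typed property needs PHP 7.4 or later. The zero-length fseek() from the answer is applied before each read pass so that data appended after an earlier EOF is seen.

```php
<?php
// Hypothetical helper: buffers raw chunks and hands out only lines that
// are terminated by "\n"; an unfinished tail stays in the buffer.
final class LineSplitter
{
    private string $buffer = '';

    /** Adds a chunk (e.g. from fread()) and returns the lines it completes. */
    public function push(string $chunk): array
    {
        $this->buffer .= $chunk;
        $lines = [];
        while (($pos = strpos($this->buffer, "\n")) !== false) {
            $lines[] = rtrim(substr($this->buffer, 0, $pos), "\r");
            $this->buffer = substr($this->buffer, $pos + 1);
        }
        return $lines; // anything left in $buffer is an incomplete line
    }
}

/** Reads everything currently available from $fd in $n-byte chunks. */
function read_available_lines($fd, LineSplitter $splitter, int $n = 8192): array
{
    fseek($fd, 0, SEEK_CUR); // reset the EOF state, as in the answer
    $lines = [];
    while (($chunk = fread($fd, $n)) !== false && $chunk !== '') {
        $lines = array_merge($lines, $splitter->push($chunk));
    }
    return $lines;
}
```

Because the splitter lives across calls, a line that proftpd writes in two separate write() calls is returned only once its newline has arrived.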
