SYNOPSIS
use Mail::Log::Parse;
$object = Mail::Log::Parse->new({ log_file => '/path/to/logfile' });
%line_info = %{$object->next()};
$line_num = $object->get_line_number();
if ( $object->go_forward($amount) ) {
...
}
if ( $object->go_backward($amount) ) {
...
}
%line_info = %{$object->previous()};
DESCRIPTION
This is the root-level module for a generic mail log file parser. It is
capable of opening either a compressed or uncompressed logfile, and
either stepping through it line by line, or seeking around in it based
on the logical lines. (Lines not pertaining to the type of log currently
being searched are skipped, as if they don't exist.)
On its own it doesn't actually do much: You'll need a subclass that can
parse a particular program's log entries. But such subclasses are
designed to be easy to write and use.
USAGE
This is an object-oriented module. Available object methods are below.
In string context, an object returns a string giving the path to the
file and the current line number. In boolean context, it returns
whether it has been correctly initialized (that is, whether it has a
file). Numeric context throws an error.
Oh, and iterator context ('<>') returns the same as 'next'...
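As a rough illustration of how these contexts can be provided in Perl, here is a sketch using the core `overload' pragma. The class My::LogReader and its canned-lines `next' are hypothetical stand-ins, not this module's implementation.

```perl
use strict;
use warnings;

# Hypothetical reader class (not Mail::Log::Parse's own source) showing how
# string, boolean, numeric and iterator contexts can be wired up with overload.
package My::LogReader;
use Carp qw(croak);
use overload
    '""'     => sub { sprintf '%s: line %d', $_[0]{file} // '(no file)', $_[0]{line} },
    'bool'   => sub { defined $_[0]{file} },   # true only once a file is set
    '0+'     => sub { croak 'numeric context is not supported' },
    '<>'     => sub { $_[0]->next },           # <$reader> behaves like ->next
    fallback => 1;                             # build eq, +, etc. from the conversions above

sub new {
    my ($class, %args) = @_;
    return bless { file => $args{file}, lines => $args{lines} || [], line => 0 }, $class;
}

# Stand-in for real parsing: wraps each stored line in a hash ref and
# returns nothing (undef in scalar context) at end of data.
sub next {
    my ($self) = @_;
    return if $self->{line} >= @{ $self->{lines} };
    return +{ text => $self->{lines}[ $self->{line}++ ] };
}

package main;
```

With fallback set to 1, Perl builds operators such as `eq' and `+' from the conversion handlers, so adding a number to the object reaches the `0+' handler and dies.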
new (constructor)
The base constructor for the Mail::Log::Parse classes. It takes an
optional hash reference containing the path to the logfile (under the
'log_file' key) as an argument, and returns the new object.
Example:
$object = Mail::Log::Parse->new({ log_file => '/path/to/logfile' });
Note that it is an error to call any method other than `set_logfile' if
you have not passed a logfile to the constructor.
Optional keys in the hash are 'buffer_length' and 'debug'. The buffer
length is the number of lines to read at a time (and store in the
internal buffer). Default is 128. Setting debug to a true value will
result in some debugging information being printed to STDERR. (I reserve
the right to remove or change the debug info at any time.)
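A sketch of how a constructor could honour these options. My::Parser and its stub `set_logfile' are hypothetical; the real constructor may validate its arguments differently.

```perl
use strict;
use warnings;

# Hypothetical constructor following the documented options: an optional
# hash ref with 'log_file', 'buffer_length' (default 128) and 'debug'.
package My::Parser;

sub new {
    my ($class, $args) = @_;
    $args ||= {};
    my $self = bless {
        buffer_length => $args->{buffer_length} || 128,   # 0/undef fall back to 128
        debug         => $args->{debug} ? 1 : 0,
        log_file      => undef,
    }, $class;
    # Only open a file if one was given; otherwise set_logfile must be called later.
    $self->set_logfile($args->{log_file}) if defined $args->{log_file};
    return $self;
}

sub set_logfile {
    my ($self, $path) = @_;
    print STDERR "set_logfile: $path\n" if $self->{debug};   # debug output goes to STDERR
    $self->{log_file} = $path;   # a real parser would open (and maybe uncompress) it here
    return 1;
}

package main;
```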
set_logfile
Sets the logfile that this object will attempt to parse. It will throw
exceptions if it can't open the file for any reason, and will return
true on success.
Files can be compressed or uncompressed: If they are compressed, then
`IO::Uncompress::AnyUncompress' must be installed with the relevant
decompression libraries. (As well as version 0.17 or better of
File::Temp.) Currently only 'tgz', 'zip', 'gz', and 'bz2' archives are
supported, but there is no technical reason not to support more. (It
just keeps a couple of lines of code shorter.)
Note that to support seeking in the file, the log will be uncompressed to
disk before it is read: If there is insufficient disk space to do so,
reading the log will likely fail. It also means this method may take a
while to return for large compressed logs.
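The uncompress-to-disk step can be approximated with the two recommended modules. This is a sketch, not the module's code: the helper name `uncompress_to_temp' is hypothetical, and it relies on AnyUncompress detecting the format from the data itself.

```perl
use strict;
use warnings;
use File::Temp ();
use IO::Uncompress::AnyUncompress qw(anyuncompress $AnyUncompressError);

# Sketch of the "uncompress to disk first" approach: write the decompressed
# log to a temporary file that can then be opened and seeked like any other.
# Uncompressed input is copied through as-is (AnyUncompress's Transparent
# option defaults to on).
sub uncompress_to_temp {
    my ($path) = @_;
    my $tmp = File::Temp->new;   # file is deleted when $tmp is destroyed
    anyuncompress($path => $tmp->filename)
        or die "Cannot uncompress $path: $AnyUncompressError\n";
    return $tmp;                 # keep this object alive while reading the file
}
```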
Example:
$object->set_logfile('path/to/file');
next
Returns a reference to a hash of the next parsable line of the log, or
'undef' on end of file/failure.
There are a couple of required keys that any parser must implement:
timestamp, program, id, text.
Where `timestamp' must be the Unix timestamp, `program' must be the
name of the program that reported the logline (Sub-programs are
recommended to be listed, if possible), `id' is the tracking ID for that
message, as reported by the program, and `text' is the text following
any 'standard' headers. (Usually, minus those already required keys.)
This version is just a placeholder: It will return a
'Mail::Log::Exceptions::Unimplemented' exception if called. Subclasses
are expected to override the `_parse_next_line' method to get an
operable parser. (And that is the only method needed to be overridden
for a working subclass.)
Other 'standard' fields that are expected in a certain format (but are
not required to always be present) are 'from', 'to', 'size', 'subject',
and 'delay'. 'to' should be a reference to an array of addresses, as
listed in the log (which usually includes angle brackets).
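To make the required and optional keys concrete, here is a sketch that turns one Postfix-style syslog line into such a hash. The helper `parse_line', the assumed line layout, and the regular expressions are illustrative assumptions; a real subclass does this work inside `_parse_next_line'. Syslog lines carry no year, so the caller supplies one, and `timelocal' interprets the time in the local time zone.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my %month_num = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
                 Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: one "Mon DD HH:MM:SS host prog[pid]: QUEUEID: text"
# line in, a hash ref with timestamp/program/id/text (plus optional fields) out.
sub parse_line {
    my ($line, $year) = @_;
    my ($mon, $day, $hour, $min, $sec, $program, $id, $text) = $line =~
        m/^(\w{3})\s+(\d+)\s+(\d\d):(\d\d):(\d\d)\s+\S+\s+([\w\/.-]+)\[\d+\]:\s+(\w+):\s+(.*)$/
        or return;                      # not a line this sketch understands
    return unless exists $month_num{$mon};

    my %info = (
        timestamp => timelocal($sec, $min, $hour, $day, $month_num{$mon}, $year),
        program   => $program,
        id        => $id,
        text      => $text,
    );

    # 'to' is a list of addresses, kept with their angle brackets.
    my @to = $text =~ m/\bto=(<[^>]*>)/g;
    $info{to} = \@to if @to;

    # Other optional fields, only when present on the line.
    for my $key (qw(from size delay)) {
        my ($value) = $text =~ m/\b$key=([^,\s]+)/;
        $info{$key} = $value if defined $value;
    }
    return \%info;
}
```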
Example:
while ( my $hash_ref = $object->next() ) {
...
}
or...
while ( my $hash_ref = <$object> ) {
...
}
previous
Returns a reference to a hash of the previous line of the log, or undef
on failure/beginning of file.
See `next' for details: It works nearly exactly the same. (In fact, it
calls next as a parser.)
go_forward
Goes forward a specified number of (logical) lines, or 1 if unspecified.
It will throw an error if it fails to seek as requested.
Returns true on success.
Example:
$object->go_forward(4);
go_backward
Goes backward a specified number of (logical) lines, or 1 if
unspecified. It will throw an error if it fails to seek as requested.
If the seek would go beyond the beginning of the file, it will go to the
beginning of the file.
Returns true on success.
Example:
$object->go_backward(4);
go_to_beginning
Goes to the beginning of the file, no matter how far away that is.
Returns true on success.
go_to_end
Goes to the end of the file, no matter where it is.
This attempts to be efficient about it, skipping where it can.
Returns true on success.
get_line_number
Returns the current logical line number.
Note that line numbers start at zero, where 0 is the absolute beginning
of the file.
Example:
$line_num = $object->get_line_number();
go_to_line_number
Goes to a specific logical line number. (Preferably one that exists...)
SUBCLASSING
This class is useless without subclasses to handle specific file
formats. As such, attempts have been made to make subclassing as
painless as possible. In general, you should only ever have to implement
one method: `_parse_next_line'.
`_parse_next_line' will be called whenever another line of the log needs
to be read. Its responsibility is to identify the next line, report
where that is in the actual file, and to parse that line.
Specifically, it should *not* assume that every line in the input file
is a valid log line. It is expected to check first.
Mail::Log::Parse is (as of v1.3) a cached inside-out object. If you
don't know what that means, ignore it: just writing `_parse_next_line'
correctly is enough. However, if you find you need to store sub-class
object info for some reason, and want to use an inside-out object syntax
yourself, `$$self == refaddr $self'. Which is useful and fast.
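For subclass authors who want the same technique, a minimal inside-out class looks roughly like this. My::InsideOut is hypothetical and builds its own object so the sketch runs on its own; in a real subclass the base class has already set `$$self', so only the field hashes and their cleanup would be needed (check whether the parent defines a DESTROY you should chain to).

```perl
use strict;
use warnings;

# Minimal inside-out object (hypothetical class). The object is a blessed
# scalar holding its own refaddr, so $$self is a cheap key into lexical hashes.
package My::InsideOut;
use Scalar::Util qw(refaddr);

my %year_of;   # per-object field storage, keyed by $$self

sub new {
    my ($class, %args) = @_;
    my $self = bless \do { my $anon }, $class;
    $$self = refaddr $self;          # cache the ID inside the object itself
    $year_of{$$self} = $args{year};
    return $self;
}

sub year { my ($self) = @_; return $year_of{$$self} }

# Inside-out data is not freed automatically: remove it when the object dies.
sub DESTROY { my ($self) = @_; delete $year_of{$$self} }

package main;
```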
Speed *is* important. It is not unlikely for someone to try to parse
through a week's worth of logs from a dozen boxes, where each day's log
is hundreds of megabytes worth of data. Make your parser as fast as you
reasonably can.
One other thing: Realize that you may also be subclassed. Even if you
parse every possible option of some log format, someone somewhere will
probably have a customized version with a slightly different format. If
you've done your job well, they'll be able to use your parser and just
extend it slightly. Key to this is to leave the *unaltered* line in the
return hash under the 'text' key.
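As a sketch of that kind of extension: the classes below are hypothetical, and the parent hands back canned lines instead of reading a file. The child calls the parent's `_parse_next_line' through SUPER and then pulls one extra, made-up field (`tag=') out of the untouched 'text'.

```perl
use strict;
use warnings;

# Stand-in for an existing parser subclass (hypothetical): it returns canned
# lines instead of reading a file, but fills the usual keys.
package Hypothetical::Postfix;
sub new { my ($class, @lines) = @_; return bless { lines => [@lines] }, $class }
sub _parse_next_line {
    my ($self) = @_;
    my $line = shift @{ $self->{lines} } or return;
    my ($id, $text) = $line =~ m/^(\w+): (.*)$/ or return;
    return { timestamp => 0, program => 'postfix', id => $id, text => $text };
}

# Site-specific extension: reuse the parent's parsing, then read one extra
# field out of the unaltered 'text'.
package Hypothetical::Postfix::Custom;
our @ISA = ('Hypothetical::Postfix');
sub _parse_next_line {
    my ($self) = @_;
    my $info = $self->SUPER::_parse_next_line() or return;
    ($info->{site_tag}) = $info->{text} =~ m/\btag=(\w+)/;
    return $info;
}

package main;
```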
Suggested usage:
A suggestion on how to use `_get_data_line' (described under UTILITY
METHODS) to implement a `_parse_next_line' routine in a subclass:
sub _parse_next_line {
    my ($self) = @_;

    # The hash we will return.
    my %line_info = ( program => '' );

    # Some temp variables.
    my $line;

    # $program_name and $regex are placeholders: the name of the program
    # you parse, and a pattern that captures it from a raw line.

    # In a mixed-log environment, we can't count on any
    # particular line being something we can parse. Keep
    # going until we can.
    while ( $line_info{program} !~ m/$program_name/ ) {
        # Read the line, using the Mail::Log::Parse utility method.
        $line = $self->_get_data_line() or return undef;

        # Program name. (We trust the logs. ;) )
        # List assignment stores the capture, not just true/false;
        # fall back to '' so the loop test never sees undef.
        ($line_info{program}) = $line =~ m/$regex/;
        $line_info{program} //= '';
    }

    # Continue parsing
    ...

    return \%line_info;
}
UTILITY METHODS
The following methods are not for general consumption: They are
specifically provided for use in implementing subclasses. Using them
incorrectly, or outside a subclass, can get the object into an invalid
state.
ONLY USE IF YOU ARE IMPLEMENTING A SUBCLASS.
_set_current_position_as_next_line
Deprecated: No longer needed. An empty stub exists for
backwards-compatibility.
_get_data_line
Returns the next line of data, as a string, from the logfile. This is
raw data from the logfile, separated by the current input separator.
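A rough sketch of the buffering idea behind this method and the 'buffer_length' option: records are read up to N at a time, split on whatever $/ currently is. My::BufferedLines is hypothetical and leaves out the position tracking the real module needs for seeking.

```perl
use strict;
use warnings;

# Hypothetical buffered reader: refills an internal buffer with up to
# $size records at a time, split on the current input separator ($/).
package My::BufferedLines;

sub new {
    my ($class, $fh, $size) = @_;
    return bless { fh => $fh, size => $size || 128, buffer => [] }, $class;
}

sub get_data_line {
    my ($self) = @_;
    if (!@{ $self->{buffer} }) {
        my $fh = $self->{fh};
        while (@{ $self->{buffer} } < $self->{size}) {
            my $record = <$fh>;
            last unless defined $record;   # end of file
            push @{ $self->{buffer} }, $record;
        }
    }
    return shift @{ $self->{buffer} };     # undef once the file is exhausted
}

package main;
```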
_clear_buffer
Clears the internal buffer of any data that may have been read into it
so far. Normally you should never need to use this: It is provided only
for those rare cases where something that has already been read may be
changed because of outside input. (For instance: You can change the year
dates are assumed to be in during mid-read on Postfix.)
Avoid using unless actually needed.
BUGS
`go_forward' and `go_backward' at the moment don't test for negative
numbers. They may or may not work with a negative number of lines: It
depends where you are in the file and what you've read so far.
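Until that is addressed, a caller can sidestep the question by never passing a negative count. The helper below (`move_by', hypothetical) picks the direction from the sign; it also skips the call for zero rather than guess how a zero count is treated, since an unspecified count means 1.

```perl
use strict;
use warnings;

# Hypothetical helper: move by a signed number of logical lines without ever
# passing a negative count to go_forward or go_backward.
sub move_by {
    my ($parser, $lines) = @_;
    return 1 if $lines == 0;   # nothing to do; avoid relying on how 0 is treated
    return $lines > 0 ? $parser->go_forward($lines)
                      : $parser->go_backward(-$lines);
}
```

For example, `move_by($object, -4)' makes the same call as `$object->go_backward(4)'.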
Those two methods should do slightly better on 'success' testing, to
return better values. (They basically always return true at the moment.)
`get_line_number' will return one less than the true line number if you
are at the end of the file, and the buffer was completely filled. (So
that the end of the file is the last space of the buffer.) Changing the
buffer size or just going back and re-reading so that the buffer is
restarted at a different location will allow you to retrieve the correct
file length.
REQUIRES
Scalar::Util, File::Basename, IO::File, Mail::Log::Exceptions
RECOMMENDS
IO::Uncompress::AnyUncompress, File::Temp
AUTHOR
Daniel T. Staal
SEE ALSO
Parse::Syslog::Mail, which does some of what this module does. (This
module is a result of running into what that module *doesn't* support.
Namely seeking through a file, both forwards and back.)
COPYRIGHT and LICENSE
Copyright (c) 2008 Daniel T. Staal. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same
terms as Perl itself.
This copyright will expire in 30 years, or 5 years after the author's
death, whichever is longer.