Matthias Hecker edited this page Mar 9, 2015 · 1 revision

Filters can be briefly described as "means to transform data into other data". Yes, they are that generic, although of course there are a few common applications that are expected to obey some rules.

Common filter interface

A common (and currently the only) way to create filters is to do it in appropriate plugins. The method is

@bot.register_filter(fname, [gname]) { |s|
  ... stuff ...
}

where fname is the filter name, gname is an optional group name and the parameter s that gets passed to the block is a DataStream object, which is a glorified Hash whose :text key is considered its main key. The filter thus has a DataStream as input, and should return a DataStream, or at least a Hash, as output.

For more complex filters, it's better to write the actual filtering code in its own method, and call it from the block:

def fname_filter(s)
  ... stuff ...
end

@bot.register_filter(:fname, [:gname]) { |s| fname_filter(s) }

To use one or more filters, you can call

@bot.filter(:fname1, ..., :fnameN, ds)

where :fname1, … :fnameN are filter names, and ds is a DataStream. Alternatively, the last argument can be replaced by a String followed by a Hash, with either the String or the Hash (but not both) being optional. The String will become the DataStream :text value.

This call will filter ds through all of the specified filters, the output of each of them being passed through as input to the next one.
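
The chaining behaviour can be tried out without a running bot. The following is only a sketch and not rbot's implementation: the FilterRegistry class, its register and filter methods, and the use of plain Hashes in place of DataStream objects are all assumptions made for illustration. It accepts the same argument forms described above (a Hash, a String, or a String followed by a Hash) and feeds the output of each filter to the next one.

```ruby
# Hypothetical stand-in for the bot's filter registry; plain Hashes
# play the role of DataStream objects here.
class FilterRegistry
  def initialize
    @filters = {}
  end

  # Store a block under a filter name.
  def register(name, &block)
    @filters[name.to_sym] = block
  end

  # Accepts filter names followed by a Hash, a String, or a String and a Hash.
  def filter(*args)
    opts = args.last.is_a?(Hash) ? args.pop : {}
    text = args.last.is_a?(String) ? args.pop : nil
    ds = opts.dup
    ds[:text] = text if text
    # Feed the output of each filter to the next one, in order.
    args.inject(ds) { |stream, name| @filters.fetch(name.to_sym).call(stream) }
  end
end

registry = FilterRegistry.new
registry.register(:upcase)  { |s| s.merge(:text => s[:text].upcase) }
registry.register(:exclaim) { |s| s.merge(:text => s[:text] + "!") }

result = registry.filter(:upcase, :exclaim, "hello", :lang => "en")
puts result[:text]
```

In this sketch the String becomes the :text value and the extra :lang key travels through both filters untouched, because each filter returns a merged copy of its input.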

Filter groups

Filter groups are used to collect filters with similar or related purposes.

:htmlinfo

The :htmlinfo filter and its filter group is used to summarize web pages. Its output is typically used by the url plugin to display information on web pages linked in channels watched by the bot.

The input DataStream passed to an :htmlinfo filter might or might not have a :headers key. If it has, then the DataStream was created from a (partially downloaded) webpage, and the :headers value holds the HTTP response headers. The Utils.check_location method is used to check the location of the webpage against a given regular expression, and nil will be returned unless the location matches.

The input DataStream :text holds the webpage. Note that in general the amount of data passed on to :htmlinfo filters is not the entire webpage, but only the amount downloaded for htmlinfo purposes, which is set by the bot's http.info_bytes configuration option.

The DataStream returned by an :htmlinfo filter should contain at least two keys: :title, with the page title, and :content, with the summarized webpage content. Since currently all :htmlinfo filters are tried, nil should be returned when the filter is not able to handle a given page.

def fname_filter(s)
  loc = Utils.check_location(s, /site.regexp/)
  return nil unless loc
  ... stuff ... # retrieve the page title and summarize its content
  return {:title => title, :content => content}
end

@bot.register_filter(:fname, :htmlinfo) { |s| fname_filter(s) }
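
To make the "return nil when the page is not yours" rule concrete, here is a self-contained sketch that runs outside the bot. Everything bot-specific is replaced by assumptions: location_matches is a hypothetical stand-in for Utils.check_location, the page location is assumed to sit under a :location key, and a plain Hash is used in place of a DataStream. The title extraction and tag stripping are deliberately crude.

```ruby
# Hypothetical stand-in for Utils.check_location: the location is assumed
# to be stored under a :location key of a plain Hash.
def location_matches(s, rx)
  loc = s[:location].to_s
  loc =~ rx ? loc : nil
end

def example_filter(s)
  # Not our site: return nil so that other filters can handle the page.
  return nil unless location_matches(s, /example\.org/)
  title = s[:text][/<title>(.*?)<\/title>/mi, 1]
  return nil unless title
  # Crude summary: drop the title, strip tags, collapse whitespace.
  body = s[:text].sub(/<title>.*?<\/title>/mi, "")
  content = body.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
  { :title => title.strip, :content => content[0, 200] }
end

page = {
  :location => "http://example.org/post/1",
  :text => "<html><head><title> Hello </title></head>" \
           "<body><p>First   paragraph.</p></body></html>"
}
other = page.merge(:location => "http://elsewhere.test/")

# Try every filter in the group and keep the first non-nil result.
filters = [method(:example_filter)]
summary = filters.map { |f| f.call(page) }.compact.first
puts summary.inspect
```

Calling example_filter(other) returns nil in this sketch, which is what lets the caller fall through to the next filter in the group.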

:sendmsg

The :sendmsg filter group is used to manipulate messages that are to be sent by the bot. The filters are invoked by the sendmsg() method (and thus indirectly also by methods such as say() or reply()), with a DataStream with the following four keys:

  • :text: the message text
  • :type: the message type (usually PRIVMSG or NOTICE)
  • :dest: the message destination (usually an Irc::Channel or Irc::User)
  • :options: any options passed on to sendmsg(), including the default send options.

The output of each filter in the :sendmsg group is merged into the DataStream and passed on to the next. These filters can thus alter the message destination, or its type, or the options, and, of course, the message text itself. Beware that debugging errors in these filters is almost impossible to do from IRC itself.

def nobadword_filter(ds)
  ds[:text].gsub!(/badword/,'****')
  return ds
end

@bot.register_filter(:nobadword, :sendmsg) { |ds| nobadword_filter(ds) }
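
Because each output is merged into the DataStream, a :sendmsg filter may return only the keys it wants to change. The sketch below illustrates that merging with plain Hashes; run_group is a hypothetical helper rather than part of rbot, and the message type is shown as a plain String for simplicity.

```ruby
# Hypothetical helper: merge the output of each filter into the stream
# and hand the merged result to the next filter.
def run_group(filters, ds)
  filters.inject(ds) { |stream, f| stream.merge(f.call(stream)) }
end

# Changes only the message type; every other key is left alone.
to_notice = lambda { |ds| { :type => "NOTICE" } }
# Changes only the text, based on the text it receives.
shout = lambda { |ds| { :text => ds[:text].upcase } }

msg = {
  :text => "hello",
  :type => "PRIVMSG",
  :dest => "#channel",
  :options => {}
}

final = run_group([to_notice, shout], msg)
puts final.inspect
```

Since Hash#merge returns a new Hash, the original msg is left unmodified in this sketch, which makes it easy to compare the message before and after filtering when testing outside IRC.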

RSS types

RSS types are filters in the :"rss.out" group, and they are used to turn a DataStream representing the feed into the output to be provided on IRC.

The RSS plugin allows output customization by letting the user define new types. An RSS type is actually a filter that takes in a DataStream describing the feed and item to be processed, and whose output should be composed of up to two lines.

Custom RSS types are loaded from the files filters/rss.rb and rss/types.rb under the bot data dir (typically ~/.rbot). These files will typically contain one or more blocks in the following form:

rss_type :forum do |s|
  line1 = "%{handle}%{date}%{author} wrote about %{title} @ %{link}"
  line2 = "%{desc}" 
  #make_stream(line1, nil, s)
  make_stream(line1, line2, s,
              :handle_wrap => ["#{Reverse}[", "]#{Reverse} "],
              :author_wrap => Bold,
              :title_wrap => [Irc.color(:green), NormalText])
end

The example block defines a new type called forum. Each defined type must end with a make_stream command, where:

  • the first parameter is a string describing the first line
  • the second parameter is a string describing the second line, or nil if this format only provides one-liners
  • the third parameter is the same DataStream s that was passed as input to the filter
  • optionally, some overrides for the default DataStream fields

The commented #make_stream(line1, nil, s) line shows an example with a single line and no overrides, while the uncommented form shows the full command with two lines and overrides.

The Strings that define the lines (conventionally assigned to the local variables line1 and line2) are arbitrary, and can use the %{} variable substitution syntax to take values from the provided DataStream s, which contains the following elements:

  • :item : the item object
  • :handle : the handle used for the RSS feed
  • :date, :author, :title, :link, :desc, :category, :categories : the date, author, title, link, description/content and subject/category/categories of the item
  • :at : the string " @ " if both a title and link are present in the item, an empty string otherwise

For each key, there is also a corresponding _wrap key available that can be used to wrap the content of the respective element with given strings or formatting instructions. The value of a _wrap can be either a single String (which will be placed both before and after the corresponding value), an Array (whose first and second value will be put respectively before and after the corresponding value) or nil, meaning no wrapping should be done. By default, these are defined:

      :handle_wrap => ['::', ':: '],
      :date_wrap => [nil, ' :: '],
      :title_wrap => Bold,

These can be overridden using the final optional hash in the make_stream command. For example, to prevent the bold title one would use something like make_stream(line1, line2, s, :title_wrap => nil).
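
The wrapping rules and the %{} substitution can be tried in plain Ruby. This is a sketch under assumptions, not the code used by make_stream: apply_wrap is a hypothetical helper, "*" stands in for the Bold formatting constant, and the substitution relies on Ruby's own String#% with a Hash argument.

```ruby
# Hypothetical helper illustrating the three accepted _wrap values:
# nil (no wrapping), a String (used on both sides), or a two-element Array.
def apply_wrap(value, wrap)
  return value.to_s if wrap.nil?
  before, after = wrap.is_a?(Array) ? wrap : [wrap, wrap]
  "#{before}#{value}#{after}"
end

fields = {
  :handle => "news",
  :title  => "Release 1.0",
  :link   => "http://example.org/"
}
wraps = {
  :handle_wrap => ["::", ":: "],  # different text before and after
  :title_wrap  => "*",            # stand-in for Bold, same on both sides
  :link_wrap   => nil             # no wrapping
}

# Wrap every field with its matching _wrap value, if any.
wrapped = {}
fields.each do |key, value|
  wrapped[key] = apply_wrap(value, wraps[:"#{key}_wrap"])
end

line1 = "%{handle}%{title} @ %{link}"
output = line1 % wrapped
puts output
```

Setting :title_wrap to nil in the wraps Hash of this sketch leaves the title unwrapped, which mirrors the effect of the override shown above.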
