Jens Wille edited this page Oct 13, 2020 · 39 revisions

Please add recipes under 'Solutions'. If you have a problem in need of a solution, please add it under 'Open Problems'.

Solutions

The following sections present solutions to common problems.

Emitting value of A if A occurs, but 'x' if A is missing

Please note that this example is outdated: As of Metafacture version 1.2.0 you can use the new if-statement to conditionally emit a value. See this commit for an example.

Use <choose> to give preference to A. Add a data source for '_id' as fallback (the record id literal '_id' is guaranteed to occur in every record).

<choose name="out">
   <data source="A"/>
   <data source="_id">
      <constant value="x"/>
   </data>
</choose>
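
For orientation, the rule above can be placed in a complete Metamorph definition roughly as follows. This is a minimal sketch: the enclosing `<metamorph>` and `<rules>` elements, the namespace URI, and the `version` attribute are assumptions about the usual Metamorph file layout and are not taken from this page, so check them against the schema of the Metafacture version you use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed wrapper: namespace and version follow the common Metamorph layout. -->
<metamorph xmlns="http://www.culturegraph.org/metamorph" version="1">
   <rules>
      <!-- Prefer the value of A; fall back to the constant 'x' via '_id'. -->
      <choose name="out">
         <data source="A"/>
         <data source="_id">
            <constant value="x"/>
         </data>
      </choose>
   </rules>
</metamorph>
```

The other recipes on this page are fragments of the same kind and would go inside `<rules>` in the same way.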

A variation is to emit for every entity E the value of A if A occurs, but 'x' if A is missing in E:

<choose name="out" flushWith="E">
   <data source="E.A"/>
   <data source="E">
      <constant value="x"/>
   </data>
</choose>

In this case the fallback is E. Note that <choose> needs to be flushed with every occurrence of E.

Emitting value of A whenever B occurs

Please note that this example is outdated: As of Metafacture version 1.2.0 you can use the new if-statement to conditionally emit a value. See this commit for an example.

If A occurs only once, and before any B:

<combine name="" value="${a}" reset="false">
   <data source="A" name="a"/>
   <data source="B"/>
</combine>

Note that reset is set to false in order to retain the value of A.

If A occurs only once but after the Bs, the Bs must be delayed by buffering them:

<combine name="" value="${a}" reset="false">
   <data source="A" name="a"/>
   <data source="B">
      <buffer/>   
   </data>
</combine>

Emitting value of A whenever B does not occur

Please note that this example is outdated: As of Metafacture version 1.2.0 you can use the new if-statement to conditionally emit a value. See this commit for an example.

<combine name="" value="${a}" reset="false">
   <data source="A" name="a"/>
   <choose>
      <data source="B">
         <constant value="FOUND" />
      </data>
      <data source="_id">
         <constant value="NOTFOUND" />
      </data>
      <postprocess>
         <equals string="NOTFOUND" />
      </postprocess>
   </choose>
</combine>

The <choose> yields FOUND if B occurs and falls back to NOTFOUND otherwise. The <equals> filter in <postprocess> lets only NOTFOUND through, so the <combine> completes, and emits the value of A, only for records in which B does not occur.

Avoid reset of a single value in a collector

This is a variation of 'Emitting value of A whenever B occurs'.

Imagine a record which stores all variant names of a person in the entity '028@'. We want to combine each name with the person id, which occurs only once per record in '[email protected]'. We need to flush the collector to avoid mixing name parts between different names, but would like to retain the id. The solution is:

<combine name="name"
         value="${surname}${forename}, ${id}"
         flushWith="028@" reset="true">

   <combine name="id" value="${value}" reset="false">
      <data source="028@"/>
      <data source="[email protected]" name="value"/>
   </combine>

   <data source="[email protected]" name="surname"/>
   <data source="[email protected]" name="forename"/>
</combine>

The id is re-emitted for every entity '028@'. Setting reset="false" on the inner <combine> ensures that the id is retained for the next occurrence of entity '028@'.

Restrict firing to only when the collector isComplete()

This is another variation of 'Emitting value of A whenever B occurs'.

Restricting firing to complete collectors is primarily needed for EqualsFilter, but it may also be useful as a generic option for flushing collectors:

  • EqualsFilter: Use case described below.
  • Combine, Entity: No actual use case yet.
  • All: Already built-in (as an internal condition in emit()).
  • Choose, Concat, Group, Range, Square, Tuples: Inherently "incomplete" (isComplete() always false).

When transforming data, we need to select fields based on some other field's subfield value:

<combine name="date" value="${value}" sameEntity="true">
  <if>
    <equalsFilter name="" value="" flushWith="POC  .a" reset="true">
      <data name="" source="@portfolio" />
      <data name="" source="POC  .a" />
    </equalsFilter>
  </if>
  <data name="value" source="POC  .b" />
</combine>

However, this fires (calls emit()) on every flush, even if the target value (@portfolio) has not been received yet, because it is only populated at a later point in the stream.

The solution is to restrict firing to when the collector isComplete() by setting flushIncomplete="false":

<combine name="date" value="${value}" sameEntity="true">
  <if>
    <equalsFilter name="" value="" flushWith="POC  .a" flushIncomplete="false" reset="true">
      <data name="" source="@portfolio" />
      <data name="" source="POC  .a" />
    </equalsFilter>
  </if>
  <data name="value" source="POC  .b" />
</combine>

NOTE: The flushIncomplete option only applies in combination with flushWith.

Available since version 5.2.

Open Problems
