typo fixes
fab2s committed Oct 27, 2017
1 parent fcaa760 commit f753cd8
Showing 11 changed files with 29 additions and 25 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -3,10 +3,10 @@
[![Documentation Status](https://readthedocs.org/projects/yaetl/badge/?version=latest)](http://yaetl.readthedocs.io/en/latest/?badge=latest) [![Build Status](https://travis-ci.org/fab2s/YaEtl.svg?branch=master)](https://travis-ci.org/fab2s/YaEtl) [![SensioLabsInsight](https://insight.sensiolabs.com/projects/1f24395f-9b33-4d99-acc7-d286a5f54db4/mini.png)](https://insight.sensiolabs.com/projects/1f24395f-9b33-4d99-acc7-d286a5f54db4) [![Code Climate](https://codeclimate.com/github/fab2s/YaEtl/badges/gpa.svg)](https://codeclimate.com/github/fab2s/YaEtl) [![Codacy Badge](https://api.codacy.com/project/badge/Grade/aa2adb7aac514da497b154d6ad37db3c)](https://www.codacy.com/app/fab2s/YaEtl) [![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/fab2s/YaEtl/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/fab2s/YaEtl/?branch=master) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat)](http://makeapullrequest.com) [![License](https://poser.pugx.org/fab2s/nodalflow/license)](https://packagist.org/packages/fab2s/yaetl)

YaEtl ("Yay'TL", or YetAnotherEtl) is a PHP implementation of a widely extended Extract-Transform-Load (aka ETL) workflow based on [NodalFlow](https://github.com/fab2s/NodalFlow).
ETL workflows comes handy in numerous situations where a lot of records meet with various sources, format and repositories.
YaEtl widely extends this pattern allowing you to chain any number of E-T-L operation with an extra Join one allowing you to join records among extractors as you would do it with a DBMS. YaEtl can even just Extract and load with no transformation involved, or even just load or transform. If we where to acronym the workflow behind YaEtl, it could result in *NEJTL* for *Nodal-Extract-Join-Tranform-Load* workflow.
ETL workflow comes handy in numerous situations where a lot of records meet with various sources, format and repositories.
YaEtl widely extends this pattern allowing you to chain any number of E-T-L operation with an extra Join one allowing you to join records among extractors as you would do it with a DBMS. YaEtl can even just Extract and load with no transformation involved, or even just load or transform. If we where to acronym the workflow behind YaEtl, it could result in *NEJTL* for *Nodal-Extract-Join-Transform-Load* workflow.

> [NodalFlow](https://github.com/fab2s/NodalFlow) was written while YaEtl was already started as it became clear that the pure executable flow logic would better be separated from it. The principle behind NodalFlow is simple, it's a directed graph composed of nodes which are somehow executable, accept one parameter and may be set to return a value that will be used as argument to the next node, or not, in which case the previous and untouched argument will be passed to the next node. Nodes can also be traversable (data generators etc ...) in which case they will be iterated over each of their values in the flow until they run out. When a node is "travsersed", each of the values yielded will trigger the execution of the successor nodes with or without the yielded value as argument, depending on the traversable node properties.
> [NodalFlow](https://github.com/fab2s/NodalFlow) was written while YaEtl was already started as it became clear that the pure executable flow logic would better be separated from it. The principle behind NodalFlow is simple, it's a directed graph composed of nodes which are somehow executable, accept one parameter and may be set to return a value that will be used as argument to the next node, or not, in which case the previous and untouched argument will be passed to the next node. Nodes can also be traversable (data generators etc ...) in which case they will be iterated over each of their values in the flow until they run out. When a node is "traversed", each of the values yielded will trigger the execution of the successor nodes with or without the yielded value as argument, depending on the traversable node properties.
The major interest of such design is, in addition to organize complex task with ease, to create reusable and atomic tasks. Each node in the workflow will be reusable in any other workflow just and strictly as it is. And this can represent tremendous time saving along the way, actually, just more and more over time and as the code base grows.
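
As an illustration of the principle described above, here is a minimal sketch of such a flow in plain PHP. It does not use NodalFlow's actual API: `runFlow` and the three-element node arrays are hypothetical names invented for this example, assuming only that a node is a callable flagged as traversable or not, and as returning a value or not.

```php
<?php
// Hypothetical mini flow runner, NOT NodalFlow's real API.
// A node is [callable $payload, bool $isTraversable, bool $isReturningValue].
function runFlow(array $nodes, $param = null, $idx = 0)
{
    if (!isset($nodes[$idx])) {
        return; // no successor left
    }

    list($payload, $isTraversable, $isReturningValue) = $nodes[$idx];

    if ($isTraversable) {
        // each yielded value triggers the execution of the successor nodes
        foreach ($payload($param) as $value) {
            runFlow($nodes, $isReturningValue ? $value : $param, $idx + 1);
        }

        return;
    }

    $result = $payload($param);
    // a non returning node lets the previous argument through untouched
    runFlow($nodes, $isReturningValue ? $result : $param, $idx + 1);
}

$seen = [];
runFlow([
    [function () { yield 1; yield 2; yield 3; }, true, true],      // traversable node
    [function ($n) { return $n * 10; }, false, true],              // returning node
    [function ($n) { return 'ignored'; }, false, false],           // non returning node
    [function ($n) use (&$seen) { $seen[] = $n; }, false, false],  // collects what it receives
]);
```

In this sketch the third node's return value is dropped because the node is flagged as non returning, so the last node receives the value produced by the second one.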

@@ -156,7 +156,7 @@ As Extractors will have the up stream return value as argument, it is possible t

Some time, it could be required to extract data from several physical sources and / or shards at a low level, that is without any predefined and ready to use abstraction.

This kind of operation is easy with YaEtl as Extractors can be aggregated to each other when building the flow. You could for example wish to extract data spanning over several sources where each would only keep a specific time frame. The same extractor could then be instantiated for each shard with proper sorting to end up extracting all the data as if it was stored in a single repository. YaEtl would then internally consume each eaxtractor's records in the order they where added to the flow and provide them one by one to the remaining nodes strictly as if a single extractor was used.
This kind of operation is easy with YaEtl as Extractors can be aggregated to each other when building the flow. You could for example wish to extract data spanning over several sources where each would only keep a specific time frame. The same extractor could then be instantiated for each shard with proper sorting to end up extracting all the data as if it was stored in a single repository. YaEtl would then internally consume each extractor's records in the order they where added to the flow and provide them one by one to the remaining nodes strictly as if a single extractor was used.
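
The aggregation idea in this paragraph can be sketched with a plain PHP generator. This is only an assumption-laden illustration: `aggregateExtractors` and the two shard iterators are hypothetical and are not part of YaEtl, whose extractors would be aggregated when building the flow instead.

```php
<?php
// Hypothetical helper, not part of YaEtl: consume each source in the
// order it was added and expose the records as one single stream.
function aggregateExtractors(array $extractors)
{
    foreach ($extractors as $extractor) {
        foreach ($extractor as $record) {
            yield $record;
        }
    }
}

// Two made-up shards, each holding its own time frame, already sorted.
$shard2016 = new ArrayIterator([
    ['id' => 1, 'date' => '2016-11-02'],
    ['id' => 2, 'date' => '2016-12-24'],
]);
$shard2017 = new ArrayIterator([
    ['id' => 3, 'date' => '2017-01-15'],
]);

$ids = [];
foreach (aggregateExtractors([$shard2016, $shard2017]) as $record) {
    // downstream code sees one record at a time, as with a single extractor
    $ids[] = $record['id'];
}
```

Because the generator pulls one record at a time, the consumer never needs to know how many sources sit behind the stream.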

```bash
+-------------+ +-------------+ +-------------+
@@ -183,7 +183,7 @@ This kind of operation is easy with YaEtl as Extractors can be aggregated to eac

### Joins

YaEtl provides with all the necessary interfaces to implement Join operation in pretty much the same way a DBMS would (regular and left join). Under the hoods, this require to communicate some kind of record map for joiners to know what record to match in the process. YaEtl comes with a complete `PDO` implementation of a generic Joinable Extractor (against single unique key). Use cases of such feature are endless, especially when you start considering that all the above patterns are fully combinable and even branchable. It is also important to note that YaEtl extractors support extracting records by batches even for joiners which could (and most likely should) be smaller than the extractor joined against (eg smaller sets for `WHERE IN` query types).
YaEtl provides with all the necessary interfaces to implement Join operation in pretty much the same way a DBMS would (regular and left join). Under the hoods, this require to communicate some kind of record map for joiners to know what record to match in the process. YaEtl comes with a complete `PDO` implementation of a generic Joinable Extractor (against single unique key). Use cases of such feature are endless, especially when you start considering that all the above patterns are fully combineable and even branchable. It is also important to note that YaEtl extractors support extracting records by batches even for joiners which could (and most likely should) be smaller than the extractor joined against (eg smaller sets for `WHERE IN` query types).

```bash
+-----------+ +------------+
2 changes: 1 addition & 1 deletion composer.json
@@ -13,7 +13,7 @@
],
"require" : {
"php": ">=5.6.0",
"fab2s/nodalflow": "~1.0.0"
"fab2s/nodalflow": "~1.0.2"
},
"require-dev": {
"phpunit/phpunit": "~5.0",
2 changes: 1 addition & 1 deletion src/Extractors/DbExtractorAbstract.php
@@ -96,7 +96,7 @@ public function getTraversable($param = null)
*
* Now since using shift() will result in an empty
* SplDoublyLinkedList at the end of the extraction cycle,
* the ETL will end up using less RAM when using multiple froms.
* the ETL will end up using less RAM when using multiple forms.
* Otherwise, each extractor would keep its entire last
* extracted collection in RAM until the end of the whole ETL.
*
2 changes: 1 addition & 1 deletion src/Extractors/OnClause.php
@@ -54,7 +54,7 @@ class OnClause implements OnClauseInterface
protected $defaultRecord;

/**
* Instatiate an OnClose
* Instantiate an OnClose
*
* @param string $fromKeyAlias The from unique key name in record
* @param string $joinKeyAlias The join unique key name in record
2 changes: 1 addition & 1 deletion src/Extractors/PdoExtractorTrait.php
@@ -83,7 +83,7 @@ public function configurePdo(\PDO $pdo)
}

if ($this->dbDriverName === 'mysql') {
// buffered wueries can have great performance impact
// buffered queries can have great performance impact
// with large data sets
$this->driverBufferedQuery = $this->pdo->getAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY);

12 changes: 7 additions & 5 deletions src/Transformers/Arrays/ArrayWalkRecursiveTransformer.php
@@ -27,29 +27,31 @@ class ArrayWalkRecursiveTransformer extends TransformerAbstract
/**
* @var mixed
*/
protected $userdata;
protected $userData;

/**
* @param callable $callable Worth nothing to say that the first callback argument should
* be a reference if you want anything to append to the record
* @param null|mixed $userdata
* @param null|mixed $userData
*/
public function __construct(callable $callable, $userdata = null)
public function __construct(callable $callable, $userData = null)
{
$this->callable = $callable;
$this->userdata = $userdata;
$this->userData = $userData;
}

/**
* Execute the array_map call
*
* @param mixed $record
*
* @throws YaEtlException
*
* @return mixed
*/
public function exec($record)
{
if (!\array_walk_recursive($this->callable, $record, $this->userdata)) {
if (!\array_walk_recursive($record, $this->callable, $this->userData)) {
throw new YaEtlException('array_walk_recursive call failed', 1, null, [
'record' => $record,
]);
12 changes: 7 additions & 5 deletions src/Transformers/Arrays/ArrayWalkTransformer.php
@@ -27,29 +27,31 @@ class ArrayWalkTransformer extends TransformerAbstract
/**
* @var mixed
*/
protected $userdata;
protected $userData;

/**
* @param callable $callable Worth nothing to say that the first callback argument should
* be a reference if you want anything to append to the record
* @param null|mixed $userdata
* @param null|mixed $userData
*/
public function __construct(callable $callable, $userdata = null)
public function __construct(callable $callable, $userData = null)
{
$this->callable = $callable;
$this->userdata = $userdata;
$this->userData = $userData;
}

/**
* Execute the array_map call
*
* @param mixed $record
*
* @throws YaEtlException
*
* @return mixed
*/
public function exec($record)
{
if (!\array_walk($this->callable, $record, $this->userdata)) {
if (!\array_walk($record, $this->callable, $this->userData)) {
throw new YaEtlException('array_walk call failed', 1, null, [
'record' => $record,
]);
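
Both ArrayWalk hunks above move the record to the first position, which matches the argument order PHP documents for `array_walk` and `array_walk_recursive`: array, then callback, then optional extra argument. Here is a minimal standalone sketch of that call; the record and the `$trimmer` callback are made up for illustration and do not come from the YaEtl code base.

```php
<?php
// Made-up record and callback, for illustration only.
$record = ['name' => ' alice ', 'city' => ' paris '];

// The first callback argument is taken by reference so that the record
// is actually modified; the third one receives the extra user data.
$trimmer = function (&$value, $key, $chars) {
    $value = trim($value, $chars);
};

// array first, callable second, user data third
$ok = array_walk($record, $trimmer, ' ');
```

After the call, `$record` holds the trimmed values, which is why the docblock insists on a by-reference first callback argument.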
2 changes: 1 addition & 1 deletion src/Transformers/CallableTransformer.php
@@ -17,7 +17,7 @@
class CallableTransformer extends PayloadNodeAbstract implements TransformerInterface
{
/**
* Instantiate the transformmer
* Instantiate the transformer
*
* @param callable $payload
*/
2 changes: 1 addition & 1 deletion src/YaEtl.php
@@ -339,7 +339,7 @@ protected function aggregateTo(ExtractorInterface $extractor, ExtractorInterface
$aggregateNode = new AggregateNode(true);
$aggregateNode->addTraversable($this->nodes[$aggregateWithIdx])
->addTraversable($extractor);
// keep track of this extractor before we burry it in the aggregate
// keep track of this extractor before we bury it in the aggregate
$this->reverseAggregateTable[$this->nodes[$aggregateWithIdx]->getNodeHash()] = $aggregateWithIdx;
// now replace its slot in the main tree
$this->replace($aggregateWithIdx, $aggregateNode);
4 changes: 2 additions & 2 deletions tests/TestCase.php
@@ -13,7 +13,7 @@
use fab2s\YaEtl\Loaders\NoOpLoader;

// we need these two for phpunit to properly mock NoOpLoader
// doing this allows us to use phpunit awsome spies
// doing this allows us to use phpunit awesome spies
interface TestLoaderInterface extends NodeInterface, ExecNodeInterface, LoaderInterface
{
}
@@ -104,7 +104,7 @@ public function getPdo()
/**
* We mock loader to just gather all records than made
* their way up there and return it to have the whole
* Flow to return it and allow input / output commparison
* Flow to return it and allow input / output comparison
* The $spy will allow us to inspect invocations and arguments
*
* @return array
4 changes: 2 additions & 2 deletions tests/YaEtlTest.php
@@ -130,7 +130,7 @@ public function joinCasesProvider()
[
// test a join : success means that the to table ends up
// exactly like join table, that is, every join_id are set
// and missmatch are skipped
// and mismatch are skipped
'flow' => (new YaEtl)
->from($fullFrom1)
->join(new PdoUniqueKeyExtractor($this->getPdo(), $joinQuery, 'id'), $fullFrom1, $joinOnClause)
@@ -205,7 +205,7 @@ public function joinCasesProvider()
],
],
[
// same as left join test with unbalanced batchsizes
// same as left join test with unbalanced batchSizes
'flow' => (new YaEtl)
->from($fullFrom4)
->join($joiner2, $fullFrom4, $leftJoinOnClause)