Generators in PHP

The official PHP documentation says: generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface.

Before looking at generators, we should understand object iteration. We usually use foreach to traverse an array. If we want an object to control how foreach traverses it, its class must implement the Iterator interface, which declares five methods:

Iterator extends Traversable {
    /* Methods */
    abstract public mixed current ( void )   // Returns the current element
    abstract public scalar key ( void )      // Returns the key of the current element
    abstract public void next ( void )       // Moves forward to the next element
    abstract public void rewind ( void )     // Rewinds to the first element
    abstract public boolean valid ( void )   // Checks whether the current position is valid
}

These five methods are simple and explicit. During a foreach traversal they are invoked implicitly: next() moves to the next element, and current() returns the element at the current position. The Iterator interface extends Traversable, which is an empty marker interface, so every class that implements Iterator also implements Traversable. We can therefore use the following code to check whether a variable can be traversed by foreach:

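<?php

    // $items is whatever value is about to be passed to foreach.
    if (!is_array($items) && !$items instanceof Traversable) {
        throw new InvalidArgumentException('Expected an array or a Traversable');
    }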
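
To make the five methods concrete, here is a minimal sketch of a hand-written iterator. The class name NumberRange and its properties are invented for this illustration, and the typed properties assume PHP 7.4 or later.

```php
<?php

// Hypothetical example class: iterates over the integers $start..$end.
final class NumberRange implements Iterator
{
    private int $start;
    private int $end;
    private int $position;

    public function __construct(int $start, int $end)
    {
        $this->start    = $start;
        $this->end      = $end;
        $this->position = $start;
    }

    // foreach calls rewind() once, before the first iteration.
    public function rewind(): void
    {
        $this->position = $this->start;
    }

    // foreach calls valid() before each iteration and stops on false.
    public function valid(): bool
    {
        return $this->position <= $this->end;
    }

    public function current(): int
    {
        return $this->position;
    }

    // Zero-based key, like the keys of a list-style array.
    public function key(): int
    {
        return $this->position - $this->start;
    }

    // foreach calls next() after each iteration.
    public function next(): void
    {
        $this->position++;
    }
}

$seen = [];
foreach (new NumberRange(1, 3) as $key => $value) {
    $seen[$key] = $value;
}
```

That is roughly forty lines of boilerplate for a simple counting loop, which is the overhead the documentation quote refers to.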

Generator Object

A generator in PHP is produced by a function and can be traversed with foreach, so we can infer that Generator is a class that implements the Iterator interface. Let's look at a classic generator example:

function xrange($start, $end, $step = 1) {
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

foreach (xrange(1, 1000000) as $num) {
    echo $num, "\n";
}

The xrange() function shown above provides the same functionality as the built-in range() function. The only difference is that range() returns an array with one million numbers in the above case, whereas xrange() returns an iterator that emits these numbers one by one and never builds an array containing all of them, because what xrange() returns is a Generator object. Sticking to the above example, if you call xrange(1, 1000000), no code in the xrange() function is actually run. Instead, PHP just returns an instance of the Generator class, which implements the Iterator interface:

$range = xrange(1, 1000000);
var_dump($range);                       // object(Generator)#1
var_dump($range instanceof Iterator);   // bool(true)

The first var_dump outputs object(Generator)#1, which means that $range is a Generator object returned by xrange(). The object can be traversed with foreach, so it must implement the Iterator interface, as the second var_dump confirms. Let's look at the class synopsis of Generator:

Generator implements Iterator {
    /* Methods */
    public mixed current ( void )
    public mixed getReturn ( void )
    public mixed key ( void )
    public void next ( void )
    public void rewind ( void )
    public mixed send ( mixed $value )
    public mixed throw ( Exception $exception )
    public bool valid ( void )
    public void __wakeup ( void )
}

It implements the five Iterator methods and adds four more. __wakeup is a magic method related to serialization; Generator implements it to prevent generators from being serialized. getReturn (available since PHP 7.0) returns the value of the generator function's return statement. The remaining two are throw and send; we'll talk about send later. Now we know that a Generator object can be traversed because it implements the Iterator interface, but how is such an object created?
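
The earlier claim that calling a generator function runs none of its body can be checked directly. The following is a minimal sketch: loggedRange() and the ArrayObject used as a log are names invented for this illustration, and the scalar type declarations assume PHP 7.0 or later.

```php
<?php

// Hypothetical generator that records when its body actually runs.
function loggedRange(int $start, int $end, ArrayObject $log)
{
    $log[] = 'body started';
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
    $log[] = 'body finished';
}

$log = new ArrayObject();
$gen = loggedRange(1, 3, $log);

// Calling the function only created the Generator object.
$entriesAfterCall = count($log);

// The first access runs the body up to the first yield.
$first = $gen->current();
$entriesAfterCurrent = $log->getArrayCopy();

// foreach picks up at the first yield and runs the body to its end.
$values = [];
foreach ($gen as $n) {
    $values[] = $n;
}
```

After the call, the log is still empty; it gains its first entry only when current() forces the body to run as far as the first yield.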

yield

The yield keyword can only be used inside a function. Any function that contains yield returns a Generator object when it is called; we call such a function a generator function. Here is an example:

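<?php

    function gen() {
        yield 1;
    }

    $g = gen();
    echo $g->valid();    // 1 (true is printed as "1")
    echo $g->current();  // 1 (the yielded value)

    $g->next();          // returns nothing, so there is nothing to echo

    echo $g->valid();    // prints nothing (false)
    echo $g->current();  // prints nothing (null)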

Calling gen() creates a Generator object, which is assigned to $g. Because Generator implements the Iterator interface, we can call the Iterator methods on it directly. valid() returns true (printed as 1), which indicates that the generator still has a current element. current() also outputs 1: this is the value produced by yield, that is, the element at the current position, which here is the first element. next() then advances the generator to the next position. Since there is no further yield, the second call to valid() returns false (printed as an empty string), the iteration is finished, and current() now returns null.

In this example gen() is a generator function. Because it contains only one yield statement, $g produces only one element.

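<?php

    function gen() {
        yield 1;
        yield 2;
        yield 3;
    }

    $g = gen();
    echo $g->valid();    // 1
    echo $g->current();  // 1
    echo "\n";

    $g->next();          // advance to the second yield
    echo $g->valid();    // 1
    echo $g->current();  // 2
    echo "\n";

    $g->next();          // advance to the third yield
    echo $g->valid();    // 1
    echo $g->current();  // 3
    echo "\n";

    $g->next();          // no yield left: the generator finishes
    echo $g->valid();    // prints nothing (false)
    echo $g->current();  // prints nothing (null)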

The elements of the Generator object are the values produced by all the yield statements, in this case 1, 2 and 3. This looks like an array, but it is essentially different: the values are not stored anywhere in advance. Each step of the iteration executes only the code between the previous yield and the next one. When execution reaches a yield, the value is handed to the caller and the generator function is suspended, much like returning from the function, except that it can later resume from the same point.

Of course, we would not write code like the above in practice. We would use a loop inside the generator function and traverse the result with foreach, as with the xrange() function above.
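
As a rough illustration of what foreach does on our behalf, the sketch below traverses the same generator twice: once by calling the Iterator methods by hand and once with foreach. The variable names are chosen for this example only.

```php
<?php

function gen()
{
    yield 1;
    yield 2;
    yield 3;
}

// The traversal written out by hand with the Iterator methods.
$manual = [];
$g = gen();
for ($g->rewind(); $g->valid(); $g->next()) {
    $manual[$g->key()] = $g->current();
}

// The same traversal with foreach, which makes those calls implicitly.
$viaForeach = [];
foreach (gen() as $key => $value) {
    $viaForeach[$key] = $value;
}
```

Both arrays end up with the same contents. When yield is used without an explicit key, the generator numbers its values from 0, just like a list-style array.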

send

yield can also be used in the context of an expression, for example, to the right of an assignment statement:

$data = (yield $value);

Here yield acts as an expression and is used together with the send() method of the Generator object. send() takes one argument, passes it into the generator as the result of the current yield expression, and resumes execution of the generator function.

<?php

    function gen() {
        $ret = (yield 'yield1');
        var_dump($ret);
        $ret = (yield 'yield2');
        var_dump($ret);
    }

    $g = gen();
    var_dump($g->current());
    var_dump($g->send('ret1'));
    var_dump($g->send('ret2'));

The output of the code:

string(6) "yield1"
string(4) "ret1"
string(6) "yield2"
string(4) "ret2"
NULL

The code above first calls gen() to create a Generator object and then calls its current() method, which returns the first value, that of the first yield statement: 'yield1'. At this point execution of gen() is suspended, and the script goes on to run var_dump($g->send('ret1'));.

$g->send('ret1') passes 'ret1' into the generator, where it becomes the value of the first yield expression, that is, of the whole (yield 'yield1') expression (note that 'yield1' itself plays no part here). That value is assigned to $ret, so the second line of output, ret1, comes from the first var_dump inside gen(). The generator then keeps running, just as it would after a call to next(), until it reaches the next yield statement, yield 'yield2'. The yielded value 'yield2' becomes the return value of $g->send('ret1'), so the var_dump outside the function prints yield2 as the third line.

Finally, send() is called again with 'ret2'. The generator is currently suspended at the second yield in gen(), so 'ret2' becomes the value of (yield 'yield2') and is assigned to $ret. gen() resumes, the last var_dump inside it prints ret2, and the function ends. The traversal of $g is now over, so the second send() returns NULL, which is what the final var_dump outside the function prints.

As a statement, yield hands a value out of the generator function: the value of the expression after yield is what the caller receives from current() or as the return value of send(). If yield is not followed by an expression (a variable or a constant, for example), the caller receives NULL, which is consistent with a bare return statement.

yield is also an expression whose value is whatever was passed to send(). When send() is called and the generator has not finished, the yield at the current position evaluates to the argument given to send().

We can therefore think of yield as both a statement (it hands a value from the generator function to the caller) and an expression (it receives a value from the caller through the Generator object).
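
To see both roles of yield at once, here is a small sketch of a generator that keeps a running total. The function name runningTotal() and the convention of stopping when it receives null are assumptions made for this illustration.

```php
<?php

// Hypothetical coroutine: receives numbers through send() and
// yields the running total back to the caller.
function runningTotal()
{
    $total = 0;
    while (true) {
        // Statement role: hand $total to the caller.
        // Expression role: evaluate to whatever send() passes in.
        $received = (yield $total);
        if ($received === null) {
            return; // stop when nothing (or null) is sent
        }
        $total += $received;
    }
}

$acc        = runningTotal();
$initial    = $acc->current(); // runs to the first yield
$afterFive  = $acc->send(5);   // resumes, adds 5, yields the new total
$afterSeven = $acc->send(7);   // resumes, adds 7, yields the new total
$afterNull  = $acc->send(null); // the generator returns and finishes
```

Each send() both delivers a value into the suspended generator and returns the next value it yields, so $initial, $afterFive and $afterSeven are 0, 5 and 12. The last call ends the generator, so it returns null and valid() is false afterwards.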

Even leaving aside the use of generators to implement coroutines, their most obvious benefit is the memory they save when traversing large data sets. Let's write a simple benchmark to compare the time and space overhead of range() and xrange():

<?php

    $n           = 100000;
    $startTime   = microtime(true);
    $startMemory = memory_get_usage();
    $array       = range(1, $n);

    foreach ($array as $a) {
    }

    echo memory_get_usage() - $startMemory, " bytes\n";
    echo microtime(true) - $startTime, " s\n";   // microtime(true) returns seconds


    function xrange($start, $end, $step = 1) {
        // <= so that $end is included, as it is with range()
        for ($i = $start; $i <= $end; $i += $step) {
            yield $i;
        }
    }

    $startTime   = microtime(true);
    $startMemory = memory_get_usage();
    $g           = xrange(1, $n);

    foreach ($g as $i) {
    }

    echo memory_get_usage() - $startMemory, " bytes\n";
    echo microtime(true) - $startTime, " s\n";

The output of the code (PHP 5.5.38), with the times in seconds:

14649144 bytes
0.015892028808594 s
408 bytes
0.067026853561401 s

In this test, range() builds an array of 100000 integers before the data is traversed, which takes 14649144 bytes of storage, about 14 MB. The generator does not need to build an array of all the elements, so its space overhead is only 408 bytes. The saving is not free: in this run the generator version took about four times as long as the array version.

Reference: Cooperative multitasking using coroutines (in PHP!)
