The official PHP documentation says: "Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface."
Before looking at generators, we should understand object iteration. We usually use foreach to traverse an array. To traverse an object with foreach, the object must implement the Iterator interface, which declares five methods:
Iterator extends Traversable {
    /* Methods */
    abstract public mixed current ( void )   // Returns the current element
    abstract public scalar key ( void )      // Returns the key of the current element
    abstract public void next ( void )       // Moves forward to the next element
    abstract public void rewind ( void )     // Rewinds to the first element
    abstract public boolean valid ( void )   // Checks whether the current position is valid
}
These five methods are simple and explicit. During a foreach traversal they are invoked implicitly: next() moves to the next element, and current() returns the element at the current position. The Iterator interface extends the Traversable interface. Traversable is an empty marker interface, and every class that implements Iterator also implements Traversable, so we can use the following check to determine whether a variable can be traversed by foreach:
<?php
if (!is_array($items) && !$items instanceof Traversable) {
    // Throw exception here
}
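To make the implicit calls described above visible, here is a minimal sketch of a class that implements Iterator and records every method that foreach invokes. The class name LoggingCollection and its $log property are hypothetical names chosen for this illustration, and the typed signatures assume PHP 8.0 or later.

```php
<?php
// Hypothetical class for illustration; assumes PHP 8.0+ (mixed return type).
final class LoggingCollection implements Iterator
{
    private array $items;
    private int $position = 0;
    public array $log = [];   // names of the Iterator methods, in call order

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    public function rewind(): void
    {
        $this->log[] = 'rewind';
        $this->position = 0;
    }

    public function valid(): bool
    {
        $this->log[] = 'valid';
        return $this->position < count($this->items);
    }

    public function current(): mixed
    {
        $this->log[] = 'current';
        return $this->items[$this->position];
    }

    public function key(): mixed
    {
        $this->log[] = 'key';
        return $this->position;
    }

    public function next(): void
    {
        $this->log[] = 'next';
        $this->position++;
    }
}

$collection = new LoggingCollection(['a', 'b']);
$seen = [];
foreach ($collection as $key => $value) {
    $seen[$key] = $value;
}

// The recorded order shows which methods foreach called behind the scenes.
echo implode(', ', $collection->log), "\n";
```

Writing this much code for a simple traversal is the overhead that generators remove.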
Generator Object
A Generator in PHP is produced by a function and can be traversed with foreach, so we can infer that Generator is a class that implements the Iterator interface. Let's look at a classic example of a generator:
function xrange($start, $end, $step = 1) {
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

foreach (xrange(1, 1000000) as $num) {
    echo $num, "\n";
}
The xrange() function shown above provides the same functionality as the built-in range() function. The only difference is that range() returns an array with one million numbers in the above case, whereas xrange() returns an iterator that emits these numbers one at a time and never builds an array holding all of them, because xrange() returns a Generator object. Sticking to the above example, if you call xrange(1, 1000000), no code in the xrange() function is actually run. Instead, PHP just returns an instance of the Generator class, which implements the Iterator interface:
$range = xrange(1, 1000000);
var_dump($range);                     // object(Generator)#1
var_dump($range instanceof Iterator); // bool(true)
The first var_dump outputs object(Generator)#1, which means that the variable $range is a Generator object returned by the xrange() function. The second outputs bool(true), which confirms that the object implements the Iterator interface and can therefore be traversed by foreach. Let's look at the class synopsis of Generator:
Generator implements Iterator {
    /* Methods */
    public mixed current ( void )
    public mixed getReturn ( void )
    public mixed key ( void )
    public void next ( void )
    public void rewind ( void )
    public mixed send ( mixed $value )
    public mixed throw ( Exception $exception )
    public bool valid ( void )
    public void __wakeup ( void )
}
It implements the five Iterator methods and adds four more. __wakeup is a magic method related to serialization, and Generator implements it to prevent serialization. getReturn (available since PHP 7.0) returns the value of the generator function's return statement. The other two new methods are throw and send; we'll talk about the send method later. Now we know that a Generator object can be traversed because it implements the Iterator interface, but how does such an object come about?
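The two claims above, that calling a generator function runs none of its body and that Generator objects cannot be serialized, can be checked with a short sketch. The function name tracedRange and the ArrayObject used as a shared log are assumptions made for this illustration only.

```php
<?php
// Hypothetical generator for illustration. $log is an ArrayObject so the
// generator body and the caller share one list without references.
function tracedRange($start, $end, ArrayObject $log)
{
    $log[] = 'body started';
    for ($i = $start; $i <= $end; $i++) {
        yield $i;
    }
}

$log = new ArrayObject();
$gen = tracedRange(1, 3, $log);

// Creating the generator has not executed any of the function body yet.
$ranBeforeIteration = count($log) > 0;

// The first call to current() runs the body up to the first yield.
$first = $gen->current();
$ranAfterCurrent = count($log) > 0;

// Generator objects refuse to be serialized.
try {
    serialize($gen);
    $serializable = true;
} catch (Exception $e) {
    $serializable = false;
}

var_dump($ranBeforeIteration, $first, $ranAfterCurrent, $serializable);
```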
yield
The yield keyword in PHP can only be used inside functions. Any function that uses yield returns a Generator object when called, and we call such a function a generator function. Here is an example:
<?php
function gen() {
    yield 1;
}

$g = gen();
echo $g->valid();   // 1
echo $g->current(); // 1
$g->next();         // returns nothing, so there is nothing to echo
echo $g->valid();   // prints nothing (false)
echo $g->current(); // prints nothing (null)
Calling gen() creates a Generator object, which is assigned to the variable $g. Because the Generator object implements the Iterator interface, we can call the Iterator methods on it directly. The call to valid() prints 1 (true), which indicates that the generator still has a current element. The call to current() also prints 1, which is the value produced by yield, that is, the element at the current position of the iteration (the first element in this example). Calling next() then advances the generator to its next position. This time valid() prints nothing, because it returns false and the iteration has finished, and a further call to current() returns null.
In this example, gen() is a generator function: calling it returns a Generator object, which is assigned to the variable $g. Because gen() contains only one yield statement, $g produces a single element. A generator function with several yield statements produces several elements:
<?php
function gen() {
    yield 1;
    yield 2;
    yield 3;
}

$g = gen();
echo $g->valid();   // 1
echo $g->current(); // 1
echo "\n";

$g->next();
echo $g->valid();   // 1
echo $g->current(); // 2
echo "\n";

$g->next();
echo $g->valid();   // 1
echo $g->current(); // 3
echo "\n";

$g->next();
echo $g->valid();   // prints nothing (false)
echo $g->current(); // prints nothing (null)
The elements of the Generator object are the values produced by all of the yield statements, in this case [1, 2, 3]. This looks like an array, but it is fundamentally different from one. Each step of the iteration runs only the code between the previous yield statement and the next one. When execution reaches a yield statement, its value is handed to the caller and the generator function is suspended, much like returning from the function, except that its local state is kept so that it can resume later.
Of course, we would not write code like this in practice. Instead, we use a loop inside the generator function and traverse the result with foreach, as in the xrange() function above.
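The suspend-and-resume behaviour described above is easiest to see when the generator and the calling loop write to the same log. The following is a minimal sketch; the closure $steps and the $log array are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical example: $log records the order in which the generator body
// and the calling loop execute.
$log = [];

$steps = function () use (&$log) {
    $log[] = 'gen: start';
    for ($i = 1; $i <= 2; $i++) {
        $log[] = "gen: about to yield $i";
        yield $i;
        $log[] = "gen: resumed after $i";
    }
    $log[] = 'gen: done';
};

foreach ($steps() as $value) {
    $log[] = "caller: received $value";
}

echo implode("\n", $log), "\n";
// gen: start
// gen: about to yield 1
// caller: received 1
// gen: resumed after 1
// gen: about to yield 2
// caller: received 2
// gen: resumed after 2
// gen: done
```

The log lines alternate between the generator and the caller, which shows that the function body is paused at each yield and continued on the next step of the loop.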
send
yield can also be used in the context of an expression, for example on the right-hand side of an assignment:
$data = (yield $value);
Here yield acts as an expression, and it is used together with the send() method of the Generator object. send() takes one argument, passes it into the generator as the result of the current yield expression, and resumes execution of the generator function.
<?php
function gen() {
    $ret = (yield 'yield1');
    var_dump($ret);
    $ret = (yield 'yield2');
    var_dump($ret);
}

$g = gen();
var_dump($g->current());
var_dump($g->send('ret1'));
var_dump($g->send('ret2'));
The output of the code:
string(6) "yield1"
string(4) "ret1"
string(6) "yield2"
string(4) "ret2"
NULL
In the code above, we first call gen() to create a Generator object and then call its current() method to get the first value, which is the value of the first yield statement: 'yield1'. At this point execution of gen() is suspended, and the caller moves on to the statement var_dump($g->send('ret1'));.
Calling $g->send('ret1') passes 'ret1' in as the value of the first yield expression, that is, of the whole (yield 'yield1') expression (note that this has nothing to do with the yielded value 'yield1'). That value is assigned to $ret, so the second line of output, ret1, comes from the first var_dump inside gen(). The generator then keeps running, as if next() had been called, until it reaches the next yield statement, yield 'yield2'. That statement yields 'yield2', which becomes the return value of $g->send('ret1'), so the var_dump outside the function prints yield2.
Finally, send() is called again, this time with 'ret2'. The generator is currently suspended at the second yield in gen(), so 'ret2' becomes the value of the (yield 'yield2') expression and is assigned to $ret. gen() then resumes and runs its last var_dump, which prints ret2. At this point the generator $g has run to completion, so the second send() call returns NULL, which is what the last var_dump outside the function prints.
yield acts as a statement inside the generator function: the value of the expression after yield is what the caller receives from the generator, either through current() or as the return value of send(). If no expression (variable or constant) follows yield, the yielded value is NULL, which is consistent with a bare return statement.
yield is also an expression whose value is whatever was passed in by send(). When send() is called and the generator has not finished, the yield at which the generator is currently suspended evaluates to the value given to send().
We can think of yield as both a statement (handing a value from the generator function to the caller) and an expression (receiving a value sent in through the Generator object).
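As a small illustration of yield working in both directions at once, here is a sketch of a generator that keeps a running total: each send() passes a number in, and the same call returns the updated total that the generator yields back. The function name runningTotal is hypothetical and chosen only for this example.

```php
<?php
// Hypothetical coroutine for illustration: keeps a running total of the
// values passed in through send().
function runningTotal()
{
    $total = 0;
    while (true) {
        // Hand the current total to the caller, then wait for the next amount.
        $amount = (yield $total);
        $total += $amount;
    }
}

$acc = runningTotal();
$start  = $acc->current(); // 0: runs the body up to the first yield
$first  = $acc->send(5);   // 5: $amount becomes 5, the next yield returns the new total
$second = $acc->send(10);  // 15

var_dump($start, $first, $second);
```

Each send() call does two things: its argument becomes the value of the yield expression at which the generator is paused, and its return value is whatever the generator yields next.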
Even if we set aside the use of generators to implement coroutines, their obvious benefit is saving memory when traversing large data sets. Here is a simple benchmark that compares the time and memory overhead of the range() function with that of the xrange() function:
<?php
$n = 100000;

// range(): builds the whole array before the loop starts.
$startTime = microtime(true);
$startMemory = memory_get_usage();
$array = range(1, $n);
foreach ($array as $a) {
}
echo memory_get_usage() - $startMemory, " bytes\n";
echo microtime(true) - $startTime, " s\n"; // microtime(true) is in seconds

// xrange(): yields one value at a time (inclusive of $end, like range()).
function xrange($start, $end, $step = 1) {
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

$startTime = microtime(true);
$startMemory = memory_get_usage();
$g = xrange(1, $n);
foreach ($g as $i) {
}
echo memory_get_usage() - $startMemory, " bytes\n";
echo microtime(true) - $startTime, " s\n";
The output of the code (PHP 5.5.38):
14649144 bytes
0.015892028808594 s
408 bytes
0.067026853561401 s
In this test, range() generates an array containing 100000 integers, which is then traversed. This takes 14649144 bytes of memory, about 14 MB. The generator version never builds an array of all the elements, so its memory overhead is only 408 bytes. The generator loop took roughly four times as long as the array loop in this run, so the saving is in memory rather than in time.
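The same memory argument applies to any large input that can be consumed one item at a time. The sketch below reads a stream line by line through a generator, so only the current line is held in memory. The helper name readLines is hypothetical, and an in-memory stream stands in for a large file.

```php
<?php
// Hypothetical helper for illustration: yields one line at a time from an
// already opened stream, with the trailing newline removed.
function readLines($handle)
{
    while (($line = fgets($handle)) !== false) {
        yield rtrim($line, "\r\n");
    }
}

// An in-memory stream stands in for a large file here.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "alpha\nbeta\ngamma\n");
rewind($handle);

$lengths = [];
foreach (readLines($handle) as $line) {
    $lengths[$line] = strlen($line);
}
fclose($handle);

print_r($lengths);
```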
Reference: Cooperative multitasking using coroutines (in PHP!)