I fell for the oldest trick in the Scala book.
Iterators are Weird
An Iterator
is not a collection, but rather a way to access the elements of a collection one by one see documentation.
My Mistake
One day at work I was pulling messages from a Kafka queue.
We can create an Iterator
to pull messages from the consumer like so:
val someIterator = consumer.poll(java.time.Duration.ofMillis(100)).iterator().asScala
I was noticing some strange behavior. When I tried to debug, I noticed even stranger behavior. That’s all I’ll say for now.
Let’s Consider an Example
Let’s say we want to transform elements in an iterator to produce some final result:
val someTransformedCollection = someIterator.map(someFunction).map(anotherFunction).toSeq
Note, at the end, we actually have a Stream
.
Elements are lazily fetched because it’s probably not a good idea to materialize this entire collection even though we’ve finished transforming it.
But let’s say we made a mistake a happy, little accident along the way, and someFunction
does not work as intended.
We don’t know that the problem is in someFunction, so we wish to inspect the intermediate states, and the natural (and perfectly reasonable) first step is to log our results.
Let’s make this example more concrete:
def someFunction(x: Int): Int = x + 1
def anotherFunction(x: Int): Int = x + 2
val someIterator = Seq(1, 2, 3, 4, 5).toIterator
Now, we can inspect:
val someIntermediateCollection = someIterator.map(someFunction)
println(s"someIntermediateCollection: $someIntermediateCollection")
val someTransformedCollection = someIntermediateCollection.map(anotherFunction)
println(s"someTransformedCollection: $someTransformedCollection")
Great! Unfortunately, when we run this, we actually get something that looks like:
someIntermediateCollection: non-empty iterator
someTransformedCollection: non-empty iterator
Geez, that’s not helpful. Let’s dive in some more:
val someIntermediateCollection = someIterator.map(someFunction)
println("someIntermediateCollection")
for (someThing <- someIntermediateCollection) {
println(someThing)
}
val someTransformedCollection = someIntermediateCollection.map(anotherFunction)
println("someTransformedCollection")
for (someThing <- someIntermediateCollection) {
println(someThing)
}
And once again we inspect:
someIntermediateCollection
2
3
4
5
6
someTransformedCollection
Wait! Nothing printed in our second section of output! But we know from our function definition above that we have two functions:
someFunction: (x: Int)Int
anotherFunction: (x: Int)Int
Thus, our map operations are called on an Iterator[Int]
and produce an Iterator[Int]
.
That is, map
takes some Collection[A]
and transforms it to a Collection[B]
.
Further, for every A
in the starting collection, we should have some corresponding B
in the transformed collection.
So why are we receiving an empty collection?!
Simple:
An iterator is not a collection.
Boom!
An Iterator
can only be iterated on once!
It’s so easy to get side tracked that we forget the fundamentals.
By trying to inspect it, we used our one and only iteration!
If we try to continue iterating on it, then nothing is left over for us to iterate on.
Conclusion
Do NOT log iterators! It is a waste of your time! If you do log, and you can spare the resources, be sure to materialize the iterator before inspection. Otherwise, you will only lead yourself down the rabbit hole.
🐰