I'm obsessed with lazy collections.
lazy-collections is one of my all-time favorite open source projects. @RobinMalfait designed lazy-collections to process arrays, Maps, Sets, and other iterable objects with pipelines of super-efficient functions, transforming the data into new values.
When I first found the project, the benchmark and the functional programming style sold me on it instantly. Since then, I've been using lazy-collections functions (and contributing new ones!) for all of my array transformations.
import { pipe, map, filter, includes } from 'lazy-collections'

const oneMillionItems = new Array(1_000_000).fill(0)

const result = pipe(
  map((item, index) => index),
  filter(index => index % 2 === 0),
  includes(0)
)(oneMillionItems)
I didn't really understand what was going on under the hood, though, until I finally took the time to learn about JavaScript iterators and generators. That's a fun topic, and I might explore it deeply in a future post. But I think it's possible to understand the benefits of lazy-collections without doing the deep dive.
Let's take a look at some code, so we can compare array method chaining to lazy-collections functions.
Array method chaining vs. lazy-collections
import { pipe, map, filter, includes } from 'lazy-collections'

const oneMillionItems = new Array(1_000_000).fill(0)

/*
When you method chain on arrays, you always iterate over
every single item before moving to the next step, and every
method constructs a brand new array.
*/
const eagerAndSlow = oneMillionItems
  // `map` iterates over 1,000,000 items, and creates
  // a new array with 1,000,000 items
  .map((item, index) => index)
  // `filter` iterates over 1,000,000 items, and creates
  // a new array with 500,000 items
  .filter(index => index % 2 === 0)
  // `includes` returns `true` on the first item
  .includes(0)

/*
But when you compose lazy-collections functions in a pipeline,
you can lazily pass each individual item through the full
pipeline before moving to the next item. If any individual
item gives your program what it's looking for, the program
will end, and avoid processing all remaining items.
*/
const lazyAndFast = pipe(
  // `map` only transforms the first item, then passes it on
  map((item, index) => index),
  // `filter` only filters the first item, then passes it on
  filter(index => index % 2 === 0),
  // `includes` returns `true` on the first item, and ends the program
  includes(0)
  // The other 999,999 items never go through the pipeline.
  // Blazing fast! 🚀
)(oneMillionItems)
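If you're curious, you can get a rough feel for the difference with console.time. This is a crude measurement, and your exact numbers will vary, but the short-circuiting shows up clearly:

import { pipe, map, filter, includes } from 'lazy-collections'

const oneMillionItems = new Array(1_000_000).fill(0)

// Eager: two full passes, two intermediate arrays
console.time('eager')
oneMillionItems
  .map((item, index) => index)
  .filter(index => index % 2 === 0)
  .includes(0)
console.timeEnd('eager')

// Lazy: stops after the very first item satisfies `includes`
console.time('lazy')
pipe(
  map((item, index) => index),
  filter(index => index % 2 === 0),
  includes(0)
)(oneMillionItems)
console.timeEnd('lazy')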
This is a smart way to work with large arrays that get computed and processed frequently. Take the following UI pattern for example:
- An end user types in a search box
- On every keystroke, you want to filter an array of search candidates to find matches
- In the UI, you just want to display the first 10 matches
Using array method chaining, I'd have to iterate over the entire search candidates array on every keystroke, then slice the filtered result to get the first 10 matches. But with a lazy-collections pipeline, and virtually no change to my syntax, I can analyze search candidates one at a time, stopping as soon as I find my 10 results.
import { pipe, filter, slice, toArray } from 'lazy-collections'

function matches (candidate, value) {
  ...
}

function onInput (event) {
  /*
  This pipeline is virtually identical to what I'd write with
  array method chaining, but it almost never has to iterate
  over the full candidates array to give me the results I need.
  I never create a new array copy until that last step in the
  pipeline, where I call `toArray`.
  */
  const results = pipe(
    filter(candidate => matches(candidate, event.target.value)),
    slice(0, 9),
    toArray(),
  )(candidates)

  ...
}
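To see the idea run end to end, here's a minimal, self-contained sketch. The hardcoded candidates list and the naive substring matcher are just stand-ins for whatever data and matching logic a real app would have:

import { pipe, filter, slice, toArray } from 'lazy-collections'

// Hypothetical stand-ins for a real app's search data and logic
const candidates = ['apple', 'apricot', 'banana', 'blueberry', 'cherry']

function matches (candidate, value) {
  // Naive case-insensitive substring match, just for illustration
  return candidate.toLowerCase().includes(value.toLowerCase())
}

// Candidates get pulled through one at a time, and the pipeline
// stops as soon as `slice` has collected its 10 items
const results = pipe(
  filter(candidate => matches(candidate, 'ap')),
  slice(0, 9),
  toArray()
)(candidates)

console.log(results) // ['apple', 'apricot']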
With a lazy-collections pipeline, and virtually no change to your syntax, you can process array items one at a time, stopping as soon as you get the result you need.
Why does this work?
Wait, so the pipe function can pass individual array items through an entire pipeline of separately defined functions before moving on to the next item? And it accepts an arbitrary number of functions, and an arbitrary number of items? How the f🤬ck does that work?
It's hard to explain why this works the way it does, because at the end of the day, the explanation is: it's just a language feature of JavaScript. Asking why this works is like asking why 2 * 2 evaluates to 4 in JS—it's because multiplication is a feature of JavaScript, and the * operator is how you use that feature.
Maybe a closer analogy—it's like asking why async code works the way it does:
const foo = Promise.resolve().then(() => console.log('foo'))
console.log('bar')
// 'bar' logs first
// 'foo' logs second
If console.log('bar') is written after console.log('foo'), why does bar log before foo? Because we're using Promise. Why does Promise behave like that? It's just a language feature of JavaScript, which can run code asynchronously, if you wrap it in the right syntax.
Likewise, lazily processing array items one-by-one is a feature of JavaScript, and to use that feature, you write code with a special syntax.
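Here's a rough sketch of what that special syntax looks like: a hand-rolled lazy filter, written as a generator function. This is an illustration of the language feature, not lazy-collections' actual source:

// A hand-rolled lazy `filter`, written as a generator function
function* lazyFilter (predicate, iterable) {
  for (const item of iterable) {
    // Each item is tested one at a time, only when a consumer asks for it
    if (predicate(item)) yield item
  }
}

// Nothing runs yet: this just creates an iterator
const evens = lazyFilter(n => n % 2 === 0, [0, 1, 2, 3, 4])

// Items get pulled through on demand
for (const n of evens) console.log(n) // logs 0, then 2, then 4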
Luckily though, lazy-collections has done all the special syntax work for you under the hood. Instead of having to write wonky stuff like *[Symbol.iterator] and yield, you can just lightly refactor your existing array method chains, and reap the benefits.
What about generators?
Generators are wicked cool if you have the right problems to apply them to...and extremely confusing if you don't.
I'm currently tinkering with generators to lay the foundations for a Vue-powered rich text editor: built from scratch, optimized for accessibility and runtime performance, and designed to minimize complexity for people who want to extend it with custom features, even custom Vue components rendered inside the editor.
You can catch a glimpse of that work in my Baleada Logic repo, but that'll be a topic for a future post.
A quick and dirty explanation: a generator is kind of like an array, but instead of defining the entire array at once, you define a function that returns the next item every time it's called.
With an array, you would write myArray[1] to get the second item. But with a generator, you would do this:
function* toGenerator() {
  ...
}

const myGenerator = toGenerator()

// Generate the first item
myGenerator.next().value

// Call the same `next` function again to generate the second item
const secondItem = myGenerator.next().value
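To make that concrete, here's a filled-in version of that sketch, with three arbitrary example values in place of the elided body:

function* toGenerator() {
  yield 'first'
  yield 'second'
  yield 'third'
}

const myGenerator = toGenerator()

myGenerator.next().value // 'first'
myGenerator.next().value // 'second'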
lazy-collections comes to the rescue again here to make things a little easier:
import { at } from 'lazy-collections'

function* toGenerator() {
  ...
}

const secondItem = at(1)(toGenerator())
This toGenerator function might be set up to generate a list of a million items, but since it's a function instead of an array, those million items are never actually in memory. When we apply lazy-collections to the generator it returns, lazy-collections will use magical JS language features to generate exactly as many items as it needs, and no more.
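For instance, here's a sketch of that idea, reusing `at` from above with a for-loop generator (the same shape as the final example below):

import { at } from 'lazy-collections'

function* generateOneMillionItems () {
  // A million items, available on demand, never all in memory
  for (let i = 0; i < 1_000_000; i++) yield i
}

// Only the first two items ever get generated
const secondItem = at(1)(generateOneMillionItems()) // 1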
The end goal here is always the same: when working with large lists of data, or lists where each item is expensive to compute, try to interact with as few items as possible.
When you create an array, all items will be in memory, but you can still use lazy-collections to avoid iterating over irrelevant items and holding unnecessary data in memory. When you create a generator, which generates each item of a list on demand, lazy-collections helps you generate as few items as possible.
import { pipe, map, filter, includes } from 'lazy-collections'

function* generateOneMillionItems () {
  for (let i = 0; i < 1_000_000; i++) yield i
}

/*
One million items don't actually exist in memory here.
Instead, `oneMillionItems` is basically a function that
I can call one million times, and it returns the next item
each time I call it.
*/
const oneMillionItems = generateOneMillionItems()

/*
I can pass my generator to a lazy-collections pipeline, just
like I would do with an array.
*/
const result = pipe(
  // `map` generates the first item, putting it into memory and
  // transforming it with this callback function
  map((item, index) => index),
  // `filter` operates on the first item after `map` transforms it
  filter(index => index % 2 === 0),
  // `includes` returns `true` and ends the program.
  includes(0)
  /*
  The other 999,999 items never go through the pipeline, and
  they never exist in memory in the first place.
  Even more blazingly fast! 🚀
  */
)(oneMillionItems)
It's really wild stuff!
In a lot of UI code, it's probably a negligible performance optimization. But it's challenging and fun to learn about, and I think it'll come in handy for some future projects I have in mind...