Unix Pipelines and Java 8 Streams!

I remember watching an interesting movie named “Blue Crush” about surfing (the sea, not the internet!). I liked watching how the waves roll and sometimes form a beautiful pipeline. In the movie, Anne Marie scores a perfect 10 for surfing a wave pipeline. Programming with Unix pipelines is just as much fun, and just as hard to master. Here is a simple example:

$ cat limerick.txt

There was a young lady of Niger
Who smiled as she rode on a tiger.
They returned from the ride
With the lady inside
And a smile on the face of the tiger.

$ cat limerick.txt | tr -cs "[:alpha:]" "\n" | awk '{print length(), $0}' | sort | uniq

It prints:

1 a
2 as
2 of
2 on
3 And
3 Who
3 she
3 the
3 was
4 They
4 With
4 face
4 from
4 lady
4 ride
4 rode
5 Niger
5 There
5 smile
5 tiger
5 young
6 inside
6 smiled
8 returned

How does it work?

The cat limerick.txt command displays the contents of the given file. The tr -cs "[:alpha:]" "\n" command replaces every run of non-alphabetic characters with a single newline (-c complements the alphabetic character set and -s squeezes the repeats), so each word ends up on its own line. The awk '{print length(), $0}' command prints the length of each word followed by the word itself. The sort command sorts the lines, and finally, the uniq command removes the adjacent duplicates (which is why it comes after sort).

As a whole, these few commands transform the input to the desired output. We can easily treat such “pipes-and-filters” as a “tiny application”!

Now, one of the most powerful features added in Java 8 is lambda expressions. The stream library in Java 8 exploits lambda expressions, and apparently the inspiration for the design of Java streams comes from Unix pipelines! Java 8’s support for functional programming enabled the designers of the stream library to get by with a small set of composable operations. The result: insanely powerful functionality.
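To see what lambda expressions buy you, here is a small sketch of my own (the LambdaDemo class and the sample word list are made up for illustration, not part of the program below): it filters out the words longer than four letters, once with the old anonymous-inner-class style and once with a lambda expression.

import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class LambdaDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("lady", "tiger", "smile", "ride");

        // The filtering behaviour written as an anonymous inner class (the pre-lambda style).
        List<String> longWordsOld = words.stream()
                .filter(new Predicate<String>() {
                    @Override
                    public boolean test(String word) {
                        return word.length() > 4;
                    }
                })
                .collect(Collectors.toList());

        // The same behaviour as a one-line lambda expression.
        List<String> longWordsNew = words.stream()
                .filter(word -> word.length() > 4)
                .collect(Collectors.toList());

        System.out.println(longWordsOld);  // prints [tiger, smile]
        System.out.println(longWordsNew);  // prints [tiger, smile]
    }
}

Either version works with the stream library; the lambda simply lets you pass the behaviour without the ceremony.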

To help explain this important concept, here is the (somewhat equivalent) code using Java 8 streams:

import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Inside a method that declares "throws IOException":
List<String> lines =
    Files.readAllLines(Paths.get("./limerick.txt"), Charset.defaultCharset());

Map<Integer, List<String>> wordGroups =
    lines.stream()
         .map(line -> line.replaceAll("\\W", "\n").split("\n"))  // non-word chars become newlines, then split
         .flatMap(Arrays::stream)   // flatten each line's word array into one stream of words
         .sorted()
         .distinct()
         .collect(Collectors.groupingBy(String::length));

wordGroups.forEach((count, words) ->
    words.forEach(word -> System.out.printf("%d %s%n", count, word)));

The lines.stream() call creates a stream of the lines read from the file “limerick.txt”. The map() operation replaces the non-word characters in each line with newline characters and splits the line into an array of words. The flatMap() operation flattens the resulting stream of arrays into a single stream of words. The sorted() operation sorts the words in ascending order, and the distinct() operation removes the duplicates. Finally, the words are grouped by their lengths and printed to the console. The output is the same as that of the Unix command line.
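One subtle point: Collectors.groupingBy() with only a classifier makes no guarantee about the ordering of the keys in the returned map, so the ascending-by-length output is not strictly guaranteed. Here is a variant sketch of my own (the WordGroups class name is made up) that requests a TreeMap to keep the lengths in ascending order and splits each line directly with Pattern.splitAsStream():

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordGroups {
    public static void main(String[] args) throws IOException {
        Pattern nonWord = Pattern.compile("\\W+");

        Map<Integer, List<String>> wordGroups =
                Files.readAllLines(Paths.get("./limerick.txt"), Charset.defaultCharset())
                     .stream()
                     .flatMap(nonWord::splitAsStream)   // split each line into words
                     .filter(word -> !word.isEmpty())   // drop empty tokens from leading punctuation
                     .sorted()
                     .distinct()
                     .collect(Collectors.groupingBy(
                             String::length,            // group key: word length
                             TreeMap::new,              // keep the lengths in ascending order
                             Collectors.toList()));

        wordGroups.forEach((length, words) ->
                words.forEach(word -> System.out.printf("%d %s%n", length, word)));
    }
}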

The order of the operations may differ, and the syntax does not look like a Unix command-line pipeline at all, but you can easily see how powerful streams are.

Reflecting on this example, I am surprised how the decades-old Unix pipelines continue to inspire the designers of modern languages and libraries. If you are not familiar with Unix pipelines, perhaps it’s not too late to try them out (for fun and profit)!
