A Functional Has Subsequence in Scala

Determining if a subsequence exists inside of a larger sequence is common coding fare. Previously, I covered Kadane’s Algorithm and another problem using Java. The dynamic programming approach seen in Kadane’s is the most efficient way I know of to approach a problem like this. By that, I mean it is best to approach any problem dynamically where the problem basically consists in finding some type of sequence within another one.

index_card_13101_sm

Boiling it down, which of the following approaches seems better?

Approach 1 – Brute force
The first thing that might come to mind is to loop through the larger sequence and once you find a match with the first element of the search subsequence, begin a second loop that repeats as long as each successive element matches, until the subsequence is exhausted.

Here’s something you can run on your command line (save it as HasSub.java and compile). I bolded the relevant function demonstrating this approach.

import java.lang.Integer;
import java.util.Arrays;

class HasSub {

  public static boolean hasSubsequence(int[] sequence, int[] sub) {
    boolean found = false;
    for(int i = 0; i < sequence.length; i++) {
      int j = 0;
      while(sub[j] == sequence[i + j] && ++j < sub.length);
      if(j == sub.length) {
        found = true;
        break;
      }
    }
    return found;
  }

  private static int[] parseArrayFromStr(String s) {
    String[] sArr = s.split(",");
    int[] a = new int[sArr.length];
    int i = 0;
    for(String num : sArr) {
      a[i++] = Integer.valueOf(num);
    }
    return a;
  }


  public static void main(String[] args) {
    System.out.println();
    if(args.length != 2) {
      System.out.println("Please pass only two sequences. Example: HasSub 1,2,3,4,5,6,7,8,9 1,2,3");
    } else {
      int[] sequence = parseArrayFromStr(args[0]);
      int[] sub = parseArrayFromStr(args[1]);
      System.out.printf("Does sequence %s contain %s?%n", Arrays.toString(sequence), Arrays.toString(sub));
      System.out.printf("Result: %s %n", hasSubsequence(sequence, sub));
    }
  }
}

Approach 2 – Dynamic programming
The dynamic approach is “naive” but more efficient. It’s naive because it does not store nearly as much information. It is more efficient because it requires fewer computations. Rather than start a second loop from each successive element in the larger sequence, we simply track a single integer which represents how much of the subsequence we’ve matched so far. Thus, we only have to compare each item in the list once.

Again, in Java, with the relevant function bold-faced for easy identification. Save to a file called DynamicHasSub.java and compile for command-line usage:

import java.lang.Integer;
import java.util.Arrays;

class DynamicHasSub {

  public static boolean hasSubsequence(int[] sequence, int[] sub) {
    int matchIndex = 0;
    for(int i = 0; i < sequence.length; i++) {
      if(sequence[i] == sub[matchIndex]) {
        matchIndex++;
        if(matchIndex == sub.length) {
          break;
        }
      } else {
        matchIndex = 0;
      }
    }
    return matchIndex == sub.length;
  }

  private static int[] parseArrayFromStr(String s) {
    String[] sArr = s.split(",");
    int[] a = new int[sArr.length];
    int i = 0;
    for(String num : sArr) {
      a[i++] = Integer.valueOf(num);
    }
    return a;
  }


  public static void main(String[] args) {
    System.out.println();
    if(args.length != 2) {
      System.out.println("Please pass only two sequences. Example: HasSub 1,2,3,4,5,6,7,8,9 1,2,3");
    } else {
      int[] sequence = parseArrayFromStr(args[0]);
      int[] sub = parseArrayFromStr(args[1]);
      System.out.printf("Does sequence %s contain %s?%n", Arrays.toString(sequence), Arrays.toString(sub));
      System.out.printf("Result: %s %n", hasSubsequence(sequence, sub));
    }
  }
}

The benefit of the dynamic approach is more than simplicity. Efficiency is also greater. Now we complete the task of searching for the subsequence in but one pass through the larger sequence, rather than touching many elements multiple times due to false / incomplete matches.

Approach 3 – Functional dynamic
Now to demonstrate taking this concept a little further. The following is some code I wrote while reading Paul Chiusano and Rúnar Bjarnason’s excellent book Functional Programming in Scala. This actually comes from exercise 3.24 of the book.

This code demonstrates how Scala’s match/case syntax can be used to simply “resolve” the problem down into the result. It’s recursive, functional, and simple. In the interest of saving space, I’m leaving out the implementations of List.head. List.head returns the first element in the list.

@annotation.tailrec
  def hasSubsequence[A](l: List[A], sub: List[A], originalSub: List[A]): Boolean = {
    (l, sub) match {
      case (_,Nil) => true
      case (Cons(h,t), Cons(h2,t2)) if(h == h2) => hasSubsequence(t, t2, originalSub)
      case (Cons(h,t), Cons(h2,t2)) if(h != h2 && h2 == List.head(originalSub)) => hasSubseq(t, originalSub, originalSub)
      case (Cons(h,t), Cons(h2,t2)) if(h != h2 && h2 != List.head(originalSub)) => hasSubseq(l, originalSub, originalSub)
      case _ => false
    }
  }

Functional dynamic approach walkthrough
coil_24484_sm
Let’s say you want to see if {1,2,3} exists in {4,1,1,2,3,7,8,9}. The approach taken here is to call the function with the sequence, the sub, and the originalSub. Initially the subsequence is the same as the original subsequence so the call to the function looks like:

hasSubsequence(List(4,1,1,2,3,7,8,9), List(1,2,3), List(1,2,3))

Every time the function is called, we see if the two heads of each list match. The head of the sequence is 4 and the head of the sub is 1 on the first pass. When they don’t match, we simply call the function again using the tail (the rest) of the sequence, and pass the “originalSub” in both parameters.

It matches on this line:

case (Cons(h,t), Cons(h2,t2)) if(h != h2 && h2 == List.head(originalSub)) => hasSubsequence(t, originalSub, originalSub)
hasSubsequence(List(1,1,2,3,7,8,9), List(1,2,3), List(1,2,3))

Now the head of each list is “1” and they match.

case (Cons(h,t), Cons(h2,t2)) if(h == h2) => hasSubsequence(t, t2, originalSub)

So the function calls itself with the tail of each list. In case this doesn’t pan out to be a full match, originalSub is still passed along. The reason for doing this is to keep the function tail-recursive and allow the computer to not have to hold each call in memory until successive calls resolve. Translating the above parameters to their actual values at this point the call is equivalent to:

hasSubsequence(List(1,2,3,7,8,9), List(2,3), List(1,2,3))

Now the heads no longer match. Sequence has “1” as its head and sub has “2” as its head. So we found a partial match. Now it starts sub matching from the beginning by using the originalSub that was passed in:

case (Cons(h,t), Cons(h2,t2)) if(h != h2 && h2 != List.head(originalSub)) => hasSubseq(l, originalSub, originalSub)

Let me pause and explain this. The second condition in the if guard (h2 != list.head(originalSub)) caught that we were in the middle of a partial match. After a partial match is found, we shouldn’t advance to the next element of the sequence yet. We need to compare the current element we are on with the head of the original sub. If we advanced along to the tail at this point we would lose this important step. We need to pass the full existing sequence (held in the l variable) into the next call along with the full original subsequence (originalSub) so that we can have the right comparison.

The above translates to…

hasSubsequence(List(1,2,3,7,8,9), List(1,2,3), List(1,2,3))

When that is called, the heads of sequence and sub match (“1” matches “1”) so this line gets executed:

case (Cons(h,t), Cons(h2,t2)) if(h == h2) => hasSubsequence(t, t2, originalSub)

This translates to:

hasSubsequence(List(2,3,7,8,9), List(2,3), List(1,2,3))

And then the same process repeats with this being the penultimate call:

case (Cons(h,t), Cons(h2,t2)) if(h == h2) => hasSubsequence(t, t2, originalSub)

…translated:

hasSubsequence(List(3,7,8,9), List(3), List(1,2,3))

After this call is made and again we call:

case (Cons(h,t), Cons(h2,t2)) if(h == h2) => hasSubsequence(t, t2, originalSub)

…”sub” is now empty (or “Nil”). That is, the “tail” of List(3) is Nil. So on that final call this part of the function executes:

case (_,Nil) => true

And the results come back: yes, true, List(1,2,3) is found in List(4,1,1,2,3,7,8,9).

Until next time…

One thought on “A Functional Has Subsequence in Scala

Leave a Reply

Your email address will not be published. Required fields are marked *