How to use Java Scanner to parse content

Java Scanner is one of the most useful utility classes in the JDK; it parses input into tokens using regular expressions.

This tutorial takes a closer look at the Scanner class and how to work with it properly.

What is Java Scanner?

The official Java documentation defines Scanner as follows:

A simple text scanner which can parse primitive types and strings using regular expressions.

Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.

Some notable points from this definition:

  • Internally, Scanner matches delimiters and tokens with regular expressions, which makes parsing flexible.
  • It can parse primitive types and strings only. That means we cannot use Scanner to directly produce instances of custom Java classes.
  • Each match yields a token, and its value is retrieved via the nextXXX method for its type.
  • The delimiter pattern defaults to whitespace and can be changed.
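To make the idea of typed tokens concrete, here is a minimal sketch that reads three whitespace-separated tokens as a String, an int and a boolean. The class name TypedTokens, the method describe and the sample input are made up for illustration.

```java
import java.util.Scanner;

class TypedTokens {
    // Reads a name, an age and a flag from one whitespace-separated string.
    // (Hypothetical helper, not part of the Scanner API.)
    static String describe(String input) {
        try (Scanner scanner = new Scanner(input)) {
            String name = scanner.next();           // next token as a String
            int age = scanner.nextInt();            // next token converted to int
            boolean active = scanner.nextBoolean(); // next token converted to boolean
            return name + " is " + age + ", active=" + active;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("Pete 30 true")); // prints: Pete is 30, active=true
    }
}
```

Each call consumes exactly one token, so the order of the calls has to match the order of the values in the input.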

It looks pretty interesting, so let’s dig deeper into the technical parts.

Java Scanner constructors

To use Scanner, we first instantiate a Scanner object through one of its constructors.

The following constructors are available (Java 10 added further overloads that take a Charset object instead of a charset name):

  • Scanner(File source) : Constructs a new Scanner that produces values scanned from the specified file.
  • Scanner(File source, String charsetName) : Constructs a new Scanner that produces values scanned from the specified file.
  • Scanner(InputStream source) : Constructs a new Scanner that produces values scanned from the specified input stream.
  • Scanner(InputStream source, String charsetName) : Constructs a new Scanner that produces values scanned from the specified input stream.
  • Scanner(Path source) : Constructs a new Scanner that produces values scanned from the specified file.
  • Scanner(Path source, String charsetName) : Constructs a new Scanner that produces values scanned from the specified file.
  • Scanner(Readable source) : Constructs a new Scanner that produces values scanned from the specified source.
  • Scanner(ReadableByteChannel source) : Constructs a new Scanner that produces values scanned from the specified channel.
  • Scanner(ReadableByteChannel source, String charsetName) : Constructs a new Scanner that produces values scanned from the specified channel.
  • Scanner(String source) : Constructs a new Scanner that produces values scanned from the specified string.

As you can see, there are generally three ways to provide input for Scanner:

  • Input from a file, given as a File or Path object.
  • Input from string content, given as a String object.
  • Input from an InputStream, which includes the standard input stream System.in.
REMEMBER
In my experience, junior developers often know the Java Scanner class only as a way to get user input, and don’t realize that it can also process string and file contents.
So, if you’re new to this, keep all of the Scanner uses mentioned above in mind.
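The three kinds of input can be sketched side by side as follows. This is only an illustration: the class ScannerSources, its helper methods and the temporary file are assumptions, and the in-memory stream stands in for any InputStream such as System.in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class ScannerSources {
    // Input from a String.
    static String firstTokenOfString(String text) {
        try (Scanner scanner = new Scanner(text)) {
            return scanner.next();
        }
    }

    // Input from any InputStream (System.in would work the same way).
    static String firstTokenOfStream(InputStream in) {
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            return scanner.next();
        }
    }

    // Input from a file, given as a Path.
    static String firstTokenOfFile(Path file) throws IOException {
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            return scanner.next();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "hello from bytes".getBytes(StandardCharsets.UTF_8);

        // A temporary file created only for this demo.
        Path file = Files.createTempFile("scanner-demo", ".txt");
        Files.write(file, bytes);

        System.out.println(firstTokenOfString("hello from a string"));
        System.out.println(firstTokenOfStream(new ByteArrayInputStream(bytes)));
        System.out.println(firstTokenOfFile(file));

        Files.deleteIfExists(file);
    }
}
```

All three calls print hello, because the parsing logic is the same regardless of where the characters come from.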

Java Scanner usage

Okay, let’s get to the fun part: actual code that uses Scanner.

Capture user input

We begin with capturing user input from the command line.

Here is how it can be done.

import java.util.Scanner;

public class ScannerGuide {
    public static void main(String... args) {
        System.out.println("What's your name? ");
        // Read from the standard input stream
        Scanner scanner = new Scanner(System.in);
        // Get the next token typed by the user
        String name = scanner.next();
        System.out.println("Hi " + name + "!");
    }
}

This is the result of running the above code.

[Screenshot: Java Scanner to get user input]

Something went wrong here! Can you see it?

The problem lies in the scanner.next() method. By definition, it returns only the next token, so it stops at the first whitespace, right after the string Pete in the input.

To fix this, simply use scanner.nextLine(). This method ignores the delimiter and returns the rest of the current line; hence, it gets everything the user types until the Enter key is pressed.

Similarly, depending on the type of the input data, you might need the matching nextXXX() method to capture it, such as nextInt(), nextByte(), nextFloat(), nextDouble(), nextBoolean(), nextLong(), nextShort().
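The difference between next() and nextLine() is easy to check without typing anything, by feeding Scanner a string that imitates what the user entered. In this sketch the class NextVersusNextLine, its two helpers and the name Pete Smith are made up for illustration.

```java
import java.util.Scanner;

class NextVersusNextLine {
    // Returns only the first whitespace-delimited token.
    static String firstToken(String input) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.next();
        }
    }

    // Returns the whole first line, without the line separator.
    static String firstLine(String input) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.nextLine();
        }
    }

    public static void main(String[] args) {
        String typed = "Pete Smith\n"; // simulates the user typing a full name and pressing Enter
        System.out.println("Hi " + firstToken(typed) + "!"); // prints: Hi Pete!
        System.out.println("Hi " + firstLine(typed) + "!");  // prints: Hi Pete Smith!
    }
}
```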

But wait, there is still another problem! Can you find it? Keep reading to the end to find out what it is.

Parse content

To restate, the main purpose of the Scanner class is to parse data split by a specified delimiter. By default, the delimiter is whitespace.

To set a delimiter, use the useDelimiter() method.

Look at the following example:

import java.util.Scanner;

public class ScannerGuide {
    public static void main(String... args) {
        String input = "Hello Pete Hello John Hello Jack Hello Mark";
        // Split the content on the word "Hello" instead of whitespace
        Scanner scanner = new Scanner(input).useDelimiter("Hello");
        while (scanner.hasNext()) {
            System.out.println(scanner.next());
        }
        scanner.close();
    }
}

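The delimiter is set to the string Hello, so Scanner splits the content into tokens wherever it finds this delimiter, and each token can be retrieved via a nextXXX() method. Note that the spaces around each name remain part of the tokens, because only Hello is treated as the delimiter.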
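The string passed to useDelimiter() is treated as a regular expression, so the delimiter can also absorb surrounding whitespace. The following sketch splits a comma-separated string that way; the class DelimiterDemo, the method splitOnCommas and the sample data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Splits on commas and drops any whitespace around them.
    static List<String> splitOnCommas(String input) {
        List<String> tokens = new ArrayList<>();
        // "\\s*,\\s*" means: optional whitespace, a comma, optional whitespace
        try (Scanner scanner = new Scanner(input).useDelimiter("\\s*,\\s*")) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnCommas("apple, banana ,cherry")); // prints: [apple, banana, cherry]
    }
}
```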

To check whether any token remains in the Scanner, use the hasNext() method.

Note: after any nextXXX() method returns a token, Scanner advances past it and is ready to parse the next token.

Similarly, every nextXXX() method has a matching hasNextXXX() method, such as hasNext(), hasNextLine(), hasNextLong(), hasNextShort(), hasNextBoolean(), hasNextByte(), hasNextDouble().

GOOD PRACTICE
It is a good practice to call hasNextXXX() before calling nextXXX() to retrieve token values.
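As a small illustration of this practice, the sketch below collects only the integer tokens from a string and skips everything else. The class SafeIntReader and the method readInts are hypothetical names used for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SafeIntReader {
    // Collects the integer tokens and silently skips any other token.
    static List<Integer> readInts(String input) {
        List<Integer> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    values.add(scanner.nextInt()); // safe: the next token is an int
                } else {
                    scanner.next(); // consume the non-integer token and move on
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readInts("1 two 3")); // prints: [1, 3]
    }
}
```

Because hasNextInt() does not consume the token, the else branch must call next() itself; otherwise the loop would never advance.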

Handle errors and exceptions

The following problems will happen when you use Scanner improperly.

  • If you call a nextXXX() method whose type does not match the next token, it throws InputMismatchException.
  • If Scanner has reached the end of the input and you still call a nextXXX() method, it throws NoSuchElementException.
  • If you initialize Scanner with an InputStream or a Readable that implements the Closeable interface, you must call close() on the Scanner when you are done, which also closes the underlying source. Otherwise, the resource may stay open. Keep in mind that closing a Scanner that wraps System.in closes System.in itself, so do it only when the program no longer needs standard input. Remember the question from the first code example above? This is the answer.
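The three points above can be seen in one small sketch. The class ScannerErrors and the method tryNextInt are made up for illustration; the try-with-resources statement closes the Scanner automatically in every case.

```java
import java.util.InputMismatchException;
import java.util.NoSuchElementException;
import java.util.Scanner;

class ScannerErrors {
    // Tries to read one int and reports what happened.
    static String tryNextInt(String input) {
        // try-with-resources calls scanner.close() for us
        try (Scanner scanner = new Scanner(input)) {
            return "ok: " + scanner.nextInt();
        } catch (InputMismatchException e) {
            // The next token exists but is not an int.
            // This catch must come first: InputMismatchException
            // is a subclass of NoSuchElementException.
            return "mismatch";
        } catch (NoSuchElementException e) {
            // There is no token left at all.
            return "no more tokens";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNextInt("42"));  // prints: ok: 42
        System.out.println(tryNextInt("abc")); // prints: mismatch
        System.out.println(tryNextInt(""));    // prints: no more tokens
    }
}
```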

Pros and cons

Java Scanner has several pros:

  • It has been included in the JDK since Java 5.
  • It is easy to use.

For advanced usage, it has the following cons:

  • It should not be used for very large input.
  • It blocks while waiting for user input.
  • It is not safe for use by multiple threads.
  • To use it in a multi-threaded environment, you must apply external synchronization, for example an intrinsic lock.

As a result, you can use Scanner freely in single-threaded programs, but you should pay more attention when using it in multi-threaded programs.
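One possible way to apply an intrinsic lock is to wrap the Scanner in a small class whose only access method is synchronized. This is a sketch under that assumption, not the only approach; the class SharedScanner and its methods are hypothetical.

```java
import java.util.Scanner;
import java.util.concurrent.atomic.AtomicInteger;

class SharedScanner {
    private final Scanner scanner;

    SharedScanner(Scanner scanner) {
        this.scanner = scanner;
    }

    // hasNext() and next() run under the same lock, so no other
    // thread can take the token between the check and the read.
    synchronized String nextOrNull() {
        return scanner.hasNext() ? scanner.next() : null;
    }

    // Lets two threads drain the same input and counts the tokens they read.
    static int drainWithTwoThreads(String input) {
        SharedScanner shared = new SharedScanner(new Scanner(input));
        AtomicInteger count = new AtomicInteger();
        Runnable worker = () -> {
            while (shared.nextOrNull() != null) {
                count.incrementAndGet();
            }
        };
        Thread first = new Thread(worker);
        Thread second = new Thread(worker);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(drainWithTwoThreads("a b c d e")); // prints: 5
    }
}
```

Calling hasNext() and next() as two separately locked steps would not be enough, since another thread could consume the last token in between.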

In fact, a Scanner object is cheap to create, so developers often instantiate a new Scanner for quick use and close() it afterward.

In conclusion

The Java Scanner class is a very powerful utility class that helps parse small blocks of text quickly. In Java development, you can apply it to string input, direct terminal input from the user, and file content.

If you’re learning Java or new to this, don’t forget about Scanner; it can come in pretty handy in your development.
