Unix Commands: Let’s Build Cat
Introduction
Software engineers take advantage of built-in Unix commands to solve larger problems. Let’s implement cat, a command that prints the contents of a file to the command line and much more.
The Goal
The idea is pretty simple: take in a file as an argument, open the file, load the content of the file line-by-line into a buffer, and print that buffer to stdout. The Unix cat command is also equipped with different option flags that allow the user to manipulate the buffer before sending it to stdout. Before we start, let’s get a better understanding of the different ways that the cat command can be used.
Usage
Every Unix command is equipped with its own manual page. A manual page documents the different usages, option flags, examples, and other data important to the behavior of the program. In a new terminal window, run:
$ man cat
This will bring up the manual page for the cat command in your terminal. The page will look something like this…
Here we have everything we need to understand how the cat command functions. Let’s go through what we have learned so far…
Cat is short for “concatenate and print files” and its usage is as follows:
$ cat [-benstuv] [file ...]
The first argument shows the different option flags that can be applied to manipulate the buffer before printing it to stdout. Under the description section, we can see the intended behavior that each flag has on the buffer. We will try to add a couple of these flags into our own implementation. All additional arguments after any applied option flags are considered files that should be logged to stdout. The cat command is capable of reading multiple files and outputting their contents sequentially in the order that they were provided. If there are no files provided as arguments, the cat command will read from stdin. Finally, if the argument file is a Unix domain socket, the cat command will connect to the socket and read and output its contents until an EOF is reached.
Let’s Get Started
Now that we have a good understanding of how the cat command should behave, let’s incrementally implement some key features. In a new terminal window, let’s make a directory for our project and create a new C file and give it a basic main function capable of taking command-line arguments…
$ mkdir myCat
$ cd myCat
$ vim myCat.c
Printing To Stdout
Before we attempt to print a file to stdout, let’s first work on printing stdin inputs. Remember, if no file names are passed as arguments, cat will read directly from stdin. We can see an example of this by running…
$ cat
Here, we see that the cat program does not terminate, but instead, waits for input from stdin. Let’s type a few lines and see how it behaves…
As you can see, each time return is pressed, the line of text is sent to stdin. Cat then reads from stdin and writes everything to stdout. When cat runs without a filename argument and reads from the terminal, it never reaches an EOF on its own, so the program appears to run in an infinite loop. You can send an EOF through stdin by pressing control-d, or terminate the program with control-c.
Let’s implement this behavior in our own cat program. First, we need to create a buffer that will store the input from stdin before we print it to stdout. For this, we will use a 4K char array. Next, we need a way to read one line of stdin and store it in our buffer. We will do this by using the fgets function.
The fgets function reads at most one less than the number of characters specified by the size parameter from the given stream and stores them in the buffer. Reading stops when a newline character is found, at EOF, or when an error occurs. The fgets function also appends a ‘\0’ to the end of the buffer and retains any newline character that may have been read from the stream. Upon successful completion, fgets returns a pointer to the string. If an EOF occurs before any characters are read, fgets returns NULL, and the buffer content remains unchanged. More information on the fgets function can be found on the fgets manual page.
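To make that description concrete, here is a small sketch of how fgets treats a stream. The helper name read_and_classify is hypothetical and is not part of our cat program; it only reports what fgets did with a single line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets and reports the result.
   Returns -1 if fgets returned NULL (EOF or error before any characters),
   1 if the stored line ends with the retained '\n', and 0 otherwise. */
int read_and_classify(FILE *in, char *buffer, int size)
{
    if (fgets(buffer, size, in) == NULL)
        return -1;

    size_t length = strlen(buffer); /* fgets NUL-terminates on success */
    return (length > 0 && buffer[length - 1] == '\n') ? 1 : 0;
}
```

For a stream containing "ab\ncd", the first call stores "ab\n" (newline kept), the second stores "cd" (EOF reached after some characters), and the third returns NULL.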
Now that we know the behavior of fgets, let's use it to read from stdin and print to stdout. From the manual page, we learned that fgets will return NULL when an EOF occurs before any characters are read. This behavior is perfect for creating a loop that terminates when we are done reading lines from stdin. All we need to do now is wrap fgets in a while loop, pass it our buffer, buffer size, and stdin, and print the buffer to stdout in the body of the while loop. Let’s compile the program and see what we get…
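A minimal sketch of this first attempt might look like the following. The function name echo_lines is hypothetical, and printing with "%s\n" is an assumption on my part, chosen because it reproduces the output discussed next. Calling echo_lines(stdin, stdout) from main gives the stdin-to-stdout behavior.

```c
#include <stdio.h>

#define BUFFER_SIZE 4096 /* the 4K buffer described above */

/* Reads lines from `in` until fgets returns NULL and prints each one.
   Assumption: the buffer is printed with "%s\n". Because fgets keeps the
   newline it read, every input line comes out followed by a blank line. */
void echo_lines(FILE *in, FILE *out)
{
    char buffer[BUFFER_SIZE];

    while (fgets(buffer, BUFFER_SIZE, in) != NULL)
        fprintf(out, "%s\n", buffer);
}
```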
$ gcc myCat.c -o myCat
$ ./myCat
We are close, but our output isn’t quite right. Notice the extra spacing between lines that we did not see when using the real cat command. Recall from the manual page that fgets appends a ‘\0’ to the end of the buffer and retains any newline character that may have been read from the stream. When we typed “Let’s Build Cat” and pressed return, the whole string, including the ‘\n’ newline character, was sent to stdin. The fgets function then read what was sent to stdin and appended the ‘\0’ null terminator, making the final value stored in the buffer “Let’s Build Cat\n\0”. It’s this retained newline, on top of the one added when we print the buffer, that causes the unwanted spacing. This can be fixed by overwriting the trailing ‘\n’ in the buffer with a ‘\0’ null terminator. After implementing this change, let's see what we have now…
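One way to sketch that fix is shown below. The function name echo_lines_trimmed is hypothetical. Instead of blindly overwriting the last character, this version uses strcspn to locate the newline, so a final line that has no trailing newline does not lose a real character.

```c
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 4096

/* Strips the newline retained by fgets, then prints the line with a
   single newline of its own. */
void echo_lines_trimmed(FILE *in, FILE *out)
{
    char buffer[BUFFER_SIZE];

    while (fgets(buffer, BUFFER_SIZE, in) != NULL) {
        /* strcspn returns the index of the first '\n', or the string
           length if there is none, so this only overwrites a newline
           or the existing terminator. */
        buffer[strcspn(buffer, "\n")] = '\0';
        fprintf(out, "%s\n", buffer);
    }
}
```

One caveat of this approach: a final line that did not end in a newline will still be printed with one, which the real cat would not add.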
$ gcc myCat.c -o myCat
$ ./myCat
And there we go! A cat implementation that reads from stdin. Next, we work on adding the ability to read from files.
Open The File For Reading
Our next goal is to print the contents of a single file to stdout. To do this, we need to open the file passed as an argument using the fopen function. We will give fopen two arguments: the name of the file passed in as a command line argument, and a string literal telling fopen what we want to do with the file. The string literal, “rb”, stands for “read binary” and tells fopen that we are going to be reading from the file as opposed to writing to it. fopen returns a FILE * that we will read from, storing its contents in our buffer. After each fopen call, it is important to check for a NULL file pointer. If the file pointer is NULL after the fopen call, the file could not be opened. When you are done using your file pointer, remember to close it using fclose. More information on the fopen function can be found on the fopen manual page.
If we implement file reading without the ability to read from stdin, here is what we would have…
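As a rough sketch of the file-only version, the following function opens a path, checks for NULL, copies it to an output stream, and closes it. The name print_file and its return codes are hypothetical. For brevity it writes each line with fputs, which keeps the newline fgets retained, rather than stripping and re-adding it.

```c
#include <stdio.h>

#define BUFFER_SIZE 4096

/* Copies the file at `path` to `out` line by line.
   Returns 0 on success, 1 if the file could not be opened. */
int print_file(const char *path, FILE *out)
{
    char buffer[BUFFER_SIZE];
    FILE *fp = fopen(path, "rb"); /* "rb": open for reading */

    if (fp == NULL) {
        perror(path); /* report why fopen failed */
        return 1;
    }
    while (fgets(buffer, BUFFER_SIZE, fp) != NULL)
        fputs(buffer, out);

    fclose(fp);
    return 0;
}
```

From main, this would be called as print_file(argv[1], stdout) after checking that argc is at least 2.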
Although this implementation works, we want our cat command to know whether the user has passed a file as a command line argument or not. If they have, fgets should read from that file. If not, fgets should read from stdin. In order to read both from stdin and from files passed as command line arguments, we need to make a few modifications to our current work. First, let’s declare the FILE *fp before we initialize it with fopen. Next, we should surround the code that handles the opening of the file with an if statement. For now, let’s say that if the user passes a command line argument, our FILE *fp will be initialized. If not, this block of code will be skipped and the FILE *fp will remain NULL. These changes keep us from opening multiple files or taking option flags as command line arguments, but we will address those problems soon. Finally, instead of passing either stdin or our FILE *fp to fgets directly, we use a simple ternary operator that checks if our FILE *fp is NULL and chooses the right file stream for the job. Here’s what it looks like…
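The modifications described above could be sketched like this. The function name cat_argument_or_stdin is hypothetical; it takes argc and argv as main would receive them, and it uses fputs for output as a simplification.

```c
#include <stdio.h>

#define BUFFER_SIZE 4096

/* Reads from argv[1] if an argument was given, otherwise from stdin.
   Returns 0 on success, 1 if the file could not be opened. */
int cat_argument_or_stdin(int argc, char *argv[], FILE *out)
{
    char buffer[BUFFER_SIZE];
    FILE *fp = NULL; /* stays NULL when no file argument is passed */

    if (argc > 1) {
        fp = fopen(argv[1], "rb");
        if (fp == NULL) {
            perror(argv[1]);
            return 1;
        }
    }

    /* The ternary picks stdin when no file was opened. */
    while (fgets(buffer, BUFFER_SIZE, fp == NULL ? stdin : fp) != NULL)
        fputs(buffer, out);

    if (fp != NULL)
        fclose(fp);
    return 0;
}
```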
Now let’s test our cat command with and without arguments. First, I will run our program without any command line arguments and enter the familiar phrases we used in the previous section. Second, I will run our program by passing the myCat.c file in as a command line argument. Let’s see what we’ve got now…
$ gcc myCat.c -o myCat
$ ./myCat
$ ./myCat myCat.c
Pretty cool, right? Now we are getting close to implementing the full functionality of the Unix cat command. Next, we will look into adding the ability to read multiple files, and finally, adding a few option flags.
Reading Multiple Files
The trick here is to give our program the ability to read multiple files while making sure that it continues to read from stdin when no files are provided. For this, we will add a currentFile variable and iterate through the command line arguments. If file names were passed as arguments, we want to set the currentFile variable to the index of the first argument passed via the command line. If no arguments are passed, we'll just set the currentFile variable to zero. This will cause the loop to execute only once and will let us read from stdin. Doing that looks something like this…
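A sketch of that loop, under the assumption that currentFile is an index into argv, is shown below. The function name cat_files is hypothetical. With no file arguments, currentFile starts at 0 and argc is 1, so the body runs exactly once and reads stdin.

```c
#include <stdio.h>

#define BUFFER_SIZE 4096

/* Prints every file named in argv[1..argc-1] in order, or stdin if
   none were given. Returns 0 on success, 1 if any file failed to open. */
int cat_files(int argc, char *argv[], FILE *out)
{
    char buffer[BUFFER_SIZE];
    int status = 0;
    int currentFile = (argc > 1) ? 1 : 0; /* 0 means "read stdin once" */

    for (; currentFile < argc; currentFile++) {
        FILE *fp = NULL;

        if (currentFile > 0) {
            fp = fopen(argv[currentFile], "rb");
            if (fp == NULL) {
                perror(argv[currentFile]);
                status = 1;
                continue; /* move on to the next file */
            }
        }
        while (fgets(buffer, BUFFER_SIZE, fp == NULL ? stdin : fp) != NULL)
            fputs(buffer, out);

        if (fp != NULL)
            fclose(fp);
    }
    return status;
}
```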
I’ve gone ahead and created a file called testfile.txt and added our familiar test phrases to it. Let’s test our program with no command line arguments, passing one file name as an argument, and passing two file names as arguments.
$ gcc myCat.c -o myCat
$ ./myCat
$ ./myCat testfile.txt
$ ./myCat testfile.txt myCat.c
And there we have it: the ability to read multiple files. Finally, let's see if we can add a few option flags that allow us to manipulate the buffer before printing.
Adding Option Flags
To implement the option flag functionality, we will be using a function called getopt. The getopt function provides a simple way to parse option flags. It takes three arguments: argc, argv, and a string literal containing all of the option flags you choose to implement. On each call, getopt returns the next option character it finds, and it returns -1 once it has processed every command line option. Using a while loop, we can use this behavior to loop through every option flag and use a switch statement to set variables for each flag. Here is what our getopt loop will look like for our cat program…
In this case, I chose to implement the b, e, n, and s flags. For details regarding the behavior of these flags, refer back to the cat manual page. Now, besides a few changes to our previous program that will allow us to use option flags, the only thing left is to implement the logic that manipulates the buffer before printing, based on the option flags passed on the command line. I will not go into detail regarding the implementation of each flag. However, I will provide the final source code so you can see the logic involved.
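For readers who want a feel for that logic before reading the final source, here is one possible sketch. It is not the final implementation: cat_stream and struct cat_flags are hypothetical names, it assumes every line fits in the buffer, and it ignores the non-printing character display that the real -e also implies. Edge cases may therefore differ from the system cat.

```c
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 4096

struct cat_flags { int b, e, n, s; }; /* hypothetical flag container */

/* Copies `in` to `out`, applying the b, e, n, and s flags per line.
   Assumption: each line is shorter than BUFFER_SIZE. */
void cat_stream(FILE *in, FILE *out, const struct cat_flags *flags)
{
    char buffer[BUFFER_SIZE];
    int line_number = 0;
    int previous_blank = 0;

    while (fgets(buffer, BUFFER_SIZE, in) != NULL) {
        size_t length = strlen(buffer);
        int has_newline = (length > 0 && buffer[length - 1] == '\n');

        if (has_newline)
            buffer[--length] = '\0'; /* strip the retained newline */
        int blank = (has_newline && length == 0);

        /* -s: skip a blank line that directly follows another one */
        if (flags->s && blank && previous_blank)
            continue;
        previous_blank = blank;

        /* -b numbers only non-blank lines and overrides -n */
        if (flags->b ? !blank : flags->n)
            fprintf(out, "%6d\t", ++line_number);

        fputs(buffer, out);
        if (has_newline) {
            if (flags->e)
                fputc('$', out); /* -e: mark the end of the line */
            fputc('\n', out);
        }
    }
}
```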
A Quick Test of Our Final Program
Let’s see how well our cat program stands up to the real thing. For this, we will use the diff command to see if there are any differences in the outputs of our cat program and the real one. We can use diff to check the differences between the stdout of two programs like this…
$ diff <(program1) <(program2)
Let’s stack all of our implemented option flags, pass in our myCat.c file, and see if we find any differences…
$ diff <(cat -bens myCat.c) <(./myCat -bens myCat.c)
And there we have it. No differences!
Conclusion
I hope you enjoyed this implementation of the Unix cat command. Although our version does not implement every feature the real cat provides, it is very functional and has given us a better understanding of how something like this would be implemented. If you liked this tutorial, stay tuned for future Unix command implementations. Below is the final version of our implementation as well as a link to a GitHub repository so you can play with it yourself!