As discussed in part 1 of this series, we are trying to build an idiomatic way of accessing the database from the shell. Here is a quick snippet of how we want the code to look like:
## Setting the DSN opens a DB connection
DSN="host=localhost port=5432 user=postgres password=123456"
SELECT id, email from guest | while read id email
do
valid=$((is_email_valid $email))
echo "$id" "$valid"
done | UPDATE guest set is_valid = $2 where id = $1
## Unset the DSN to close the connection
unset DSN
Ideally, we would write custom shell built-ins in C, so we get full control over each db api invocation, but to keep things simpler, let's look at doing this purely within the shell using co-processes. All the samples here are written for ksh93
, but should be easily adaptable to any other shell.
To achieve this, we will be tackling the following:
- Implement discipline functions for the
DSN
variable, so the DB connection is opened and closed when this variable is set and unset. - Launch the
psql
utility as a co-process from these discipline functions, so we can communicate with it usingprint -p
andread -p
. - Define
SELECT
andINSERT
functions to use the co-process to execute the database operations. You can modelDELETE
andUPDATE
based on these examples.
|&
is the way to create a co-process. A co-process is just like any other background job, except it gets a set of i/o pipes that can be accessed from the current process using read -p
and print -p
.
To close the background co-process, we simply close the pipe p
for i/o using the >&-
and <&-
commands.
The key with the SELECT
function is to customize the behavior depending on where the output is going. For terminal output, we want to make it easier for the user to read. However, for a file or a pipe output, we want the data to be more machine-readable.
The key to note in the _dbq
function is that you do not want to get stuck in a read
from the co-process. For each print -p
, we need to make sure that we read -p
all the output, but not keep waiting for anything extra to avoid a deadlock.
Since each query can return an unknown number of records, we need a pre-defined marker to determine when the output of the query is complete. That is the purpose of the DNSSEP
command. We ask the psql
co-process to echo it after each query. Then, in the read loop we break when we find the $__DNSSEP
. This guarantees that we always read the necessary amount of data from the co-process.
Similar to the SELECT
function, we check if the input is from the terminal or a file/pipe. For terminal input, run the query as-is. For a file/pipe input, we want to read the values in from the input and process the records in batches using the COMMIT_LIMIT
.
Here is how this all works from the command-line.
$ DSN="host=localhost port=5432 user=postgres password=123456"
$ echo $DSN
( db=host\='localhost port=5432 user=postgres password=123456' pid=47325 )
$ SELECT \* from guest
-[ RECORD 1 ]------------
id | 10
email | xxx
is_valid | t
-[ RECORD 2 ]------------
id | 1
email | test@test.com
is_valid |
-[ RECORD 3 ]------------
id | 2
email | test@test.com
is_valid | f
(END)
$ SELECT \* from guest > guest.dat
$ cat guest.dat
10|xxx|t|
1|test@test.com||
2|test@test.com|f|
$ DELETE from guest
$ IFS=\| INSERT INTO guest < guest.dat
INSERT 0 3
$ SELECT count\(\*\) from guest
-[ RECORD 1 ]
count | 3
(END)
In the next post, we will see how we can implement these functions as C
builtins and enable a lot more sophisticated functionality.