As discussed in part 1 of this series, we are trying to build an idiomatic way of accessing the database from the shell. Here is a quick snippet of how we want the code to look like:
## Setting the DSN opens a DB connection DSN="host=localhost port=5432 user=postgres password=123456" SELECT id, email from guest | while read id email do valid=$((is_email_valid $email)) echo "$id" "$valid" done | UPDATE guest set is_valid = $2 where id = $1 ## Unset the DSN to close the connection unset DSN
Ideally, we would write custom shell built-ins in C, so we get full control over each db api invocation, but to keep things simpler, let's look at doing this purely within the shell using co-processes. All the samples here are written for
ksh93, but should be easily adaptable to any other shell.
To achieve this, we will be tackling the following:
- Implement discipline functions for the
DSNvariable, so the DB connection is opened and closed when this variable is set and unset.
- Launch the
psqlutility as a co-process from these discipline functions, so we can communicate with it using
INSERTfunctions to use the co-process to execute the database operations. You can model
UPDATEbased on these examples.
|& is the way to create a co-process. A co-process is just like any other background job, except it gets a set of i/o pipes that can be accessed from the current process using
read -p and
To close the background co-process, we simply close the pipe
p for i/o using the
The key with the
SELECT function is to customize the behavior depending on where the output is going. For terminal output, we want to make it easier for the user to read. However, for a file or a pipe output, we want the data to be more machine-readable.
The key to note in the
_dbq function is that you do not want to get stuck in a
read from the co-process. For each
print -p, we need to make sure that we
read -p all the output, but not keep waiting for anything extra to avoid a deadlock.
Since each query can return an unknown number of records, we need a pre-defined marker to determine when the output of the query is complete. That is the purpose of the
DNSSEP command. We ask the
psql co-process to echo it after each query. Then, in the read loop we break when we find the
$__DNSSEP. This guarantees that we always read the necessary amount of data from the co-process.
Similar to the
SELECT function, we check if the input is from the terminal or a file/pipe. For terminal input, run the query as-is. For a file/pipe input, we want to read the values in from the input and process the records in batches using the
Here is how this all works from the command-line.
$ DSN="host=localhost port=5432 user=postgres password=123456" $ echo $DSN ( db=host\='localhost port=5432 user=postgres password=123456' pid=47325 ) $ SELECT \* from guest -[ RECORD 1 ]------------ id | 10 email | xxx is_valid | t -[ RECORD 2 ]------------ id | 1 email | firstname.lastname@example.org is_valid | -[ RECORD 3 ]------------ id | 2 email | email@example.com is_valid | f (END) $ SELECT \* from guest > guest.dat $ cat guest.dat 10|xxx|t| firstname.lastname@example.org|| email@example.com|f| $ DELETE from guest $ IFS=\| INSERT INTO guest < guest.dat INSERT 0 3 $ SELECT count\(\*\) from guest -[ RECORD 1 ] count | 3 (END)
In the next post, we will see how we can implement these functions as
C builtins and enable a lot more sophisticated functionality.