Fun with Pipes: Copy Directories Across Servers With SSH

Many UNIX experts swear by pipes in their day-to-day work, but few think to combine them with ssh to perform operations across different servers.

The following command copies the files listed in $SOURCE_FILES to $TARGET_DIR on the remote machine (any directory structure in $SOURCE_FILES is recreated under $TARGET_DIR). It assumes $REMOTE_MACHINE already holds the ssh destination:

export SOURCE_FILES=file1
export TARGET_DIR=/copy/file/here

tar -cvf - $SOURCE_FILES | gzip -9 -c \
    | ssh $REMOTE_MACHINE "(cd $TARGET_DIR && gunzip -c | tar -xvf -)"
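
If you want to try the pipeline before pointing it at a real server, here is a minimal local sketch. It assumes a POSIX shell with tar, gzip and mktemp available, and it swaps the ssh hop for a plain `sh -c`, so nothing leaves your machine. The scratch directory and the file name run.sh are made up for the illustration.

```shell
#!/bin/sh
# Local dry run of the tar-over-a-pipe pattern; no remote machine is needed.
set -eu

WORK=$(mktemp -d)                       # scratch area, delete it when done
mkdir -p "$WORK/src/project/bin" "$WORK/dst"
echo 'echo hello' > "$WORK/src/project/bin/run.sh"
chmod 755 "$WORK/src/project/bin/run.sh"

cd "$WORK/src"
SOURCE_FILES=project
TARGET_DIR=$WORK/dst

# Same shape as the command above, with "sh -c" standing in for
# "ssh $REMOTE_MACHINE". Nothing is written to disk between the two tars.
tar -cf - $SOURCE_FILES | gzip -9 -c \
    | sh -c "cd $TARGET_DIR && gunzip -c | tar -xf -"

# The directory structure and the executable bit should both be intact.
ls -l "$TARGET_DIR/project/bin/run.sh"
```

The listing at the end should show project/bin/run.sh under the target directory, still marked executable.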

You might wonder why you should not just use sftp or scp for this. There are a few reasons:

  1. Not every server allows scp, since it traditionally relies on running a command through the account's login shell, and some service accounts are denied shell logins for security reasons. sftp is a subsystem built into sshd, so it is a lot more restrictive and secure.
  2. sftp, much like ftp, does not preserve file permissions by default, so executables can arrive marked as non-executable. tar records the permissions in the archive and restores them on extraction.
  3. If you are behind multiple servers, which is itself a network segmentation/security technique, there is an edge server between your laptop and the destination server. With scp or sftp, this means you need enough disk space on the edge server to stage the files before pushing them to the final destination.

The following example extends the pattern to hop across multiple servers without requiring any disk space on the intermediate servers:

export SOURCE_FILES=file1
export TARGET_DIR=/copy/file/here

tar -cvf - $SOURCE_FILES | gzip -9 -c \
    | ssh $HOP_MACHINE \
        "ssh $REMOTE_MACHINE '(cd $TARGET_DIR && gunzip -c | tar -xvf -)'"

This same pattern works with any command that reads from stdin and processes it. For example, I use this approach to dump data from an application server to a Hadoop cluster via an edge server, without having to stage the files on the edge server.
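
One way to package the idea is a small shell function. This is only a sketch: stream_to is a hypothetical helper name, not something that ships with ssh, and the remote command is passed as a plain string, so it is subject to the usual quoting rules on the remote side.

```shell
# stream_to HOST COMMAND: hypothetical helper, not part of ssh itself.
# Compresses whatever arrives on stdin, sends it over ssh to HOST, and
# feeds the decompressed stream to COMMAND on the remote side.
stream_to() {
    gzip -c | ssh "$1" "gunzip -c | $2"
}

# Usage sketch (app.log is a placeholder file name):
#   cat app.log | stream_to "$REMOTE_MACHINE" 'wc -l'
```

Since the remote command is just a string, it can itself be another ssh invocation, which gives the same multi-hop behaviour as the example above.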