Tuesday, June 4, 2013

Using 'find' to list files with multiple suffixes

I'm going through another data cleanup session in a (very) old work directory tree. I found myself examining/compressing/removing files with a recurring set of suffixes. The 'find' command can make this all a lot less painful.

Here's the command for matching a single suffix:
find . -name "*.csv" -exec ls -ls {} \;

But editing this command line to match the next suffix of interest becomes tedious very quickly. Thankfully, you can chain file tests together with '-or', like so (note the grouping):
find . \( -name "*.csv" -or -name "*.tsv" \) -exec ls -lh {} \;
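If the list of suffixes keeps growing, typing out each '-name' test by hand gets old too. Here is a minimal sketch of a shell function that builds the grouped expression from a list of suffixes. The function name find_suffixes is hypothetical, and it uses '-o', the POSIX spelling of '-or', so it should also work with non-GNU versions of find.

```shell
#!/bin/sh
# Hypothetical helper: find_suffixes DIR SUFFIX...
# Lists files under DIR whose names end in ".SUFFIX" for any given SUFFIX.
find_suffixes() {
  dir=$1
  shift
  # Refuse to run with no suffixes: "find DIR \( \)" is an invalid expression
  if [ "$#" -eq 0 ]; then
    echo "usage: find_suffixes DIR SUFFIX..." >&2
    return 1
  fi
  # Rewrite the positional parameters in place: each pass shifts one suffix
  # off the front and appends its -name test (joined with -o) to the end.
  # The "in" list was expanded before the loop started, so this is safe.
  first=yes
  for suffix in "$@"; do
    shift
    if [ "$first" = yes ]; then
      set -- "$@" -name "*.$suffix"
      first=no
    else
      set -- "$@" -o -name "*.$suffix"
    fi
  done
  # Group the -o chain so the explicit -print applies to every match
  find "$dir" \( "$@" \) -print
}
```

For example, `find_suffixes . csv tsv log` should list every .csv, .tsv and .log file under the current directory, and its output can be piped on to xargs like any other find output.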

Acting on these files is then easy: just change the -exec action to whatever you need (e.g. "-exec bzip2 {} \;"). For something like compression you probably want "-exec bzip2 {} +" or xargs instead, since both pass many file names to each invocation rather than running the command once per file.
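To see the difference in the number of invocations without touching real data, here is a small sketch that swaps bzip2 for an 'sh -c' stand-in that just reports how many file names it received. The temporary directory and file names are made up for the demo, '-o' is the POSIX spelling of '-or', and '-print0' / 'xargs -0' are long-standing GNU and BSD extensions.

```shell
#!/bin/sh
# Throwaway directory: three files that match, one that doesn't
tmp=$(mktemp -d)
touch "$tmp/a.csv" "$tmp/b.csv" "$tmp/c.tsv" "$tmp/notes.txt"

# Stand-in for bzip2: print how many file names this invocation received
report='echo "invoked with $# file(s)"'

# '\;' runs the command once per matching file
per_file=$(find "$tmp" \( -name "*.csv" -o -name "*.tsv" \) \
  -exec sh -c "$report" sh {} \;)

# '+' packs as many file names as fit onto each command line
batched=$(find "$tmp" \( -name "*.csv" -o -name "*.tsv" \) \
  -exec sh -c "$report" sh {} +)

# xargs does the same batching; -print0 / -0 keep odd file names intact
via_xargs=$(find "$tmp" \( -name "*.csv" -o -name "*.tsv" \) -print0 |
  xargs -0 sh -c "$report" sh)

printf '%s\n--\n%s\n--\n%s\n' "$per_file" "$batched" "$via_xargs"
rm -rf "$tmp"
```

With only three files, the '\;' form should report three separate invocations, while the '+' and xargs forms each report a single invocation with all three names.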

Note that the following command doesn't behave as you might expect: 'ls' is only executed on the *.tsv files. The expression is evaluated from left to right, but the implicit '-and' between the second '-name' and the '-exec' has higher precedence than the '-or' between the two '-name' tests. The *.csv files still match the first test, but because the expression contains an action (-exec), find doesn't add its implicit '-print', so nothing at all happens to them.
find . -name "*.csv" -or -name "*.tsv" -exec ls -lh {} \;
# this is actually executed as:
find . -name "*.csv" -or \( -name "*.tsv" -exec ls -lh {} \; \)
# when what you wanted was:
find . \( -name "*.csv" -or -name "*.tsv" \) -exec ls -lh {} \;
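# a common recommendation is to pipe to xargs in place of the -exec action;
# with no explicit action, find applies its implicit -print to the whole
# expression, so no grouping is needed here
find . -name "*.csv" -or -name "*.tsv" | xargs ls -lh
# if file names may contain spaces or newlines, use -print0 with xargs -0
# (the grouping is needed again, since -print0 is an explicit action)
find . \( -name "*.csv" -or -name "*.tsv" \) -print0 | xargs -0 ls -lh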
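The precedence surprise is easy to reproduce in a scratch directory. This sketch uses 'echo' in place of 'ls -lh' so the output is just the file names; the directory and file names are made up for the demo, and '-o' is the POSIX spelling of '-or'.

```shell
#!/bin/sh
# Scratch directory with one file of each suffix
tmp=$(mktemp -d)
touch "$tmp/a.csv" "$tmp/b.tsv"

# Ungrouped: parsed as -name "*.csv" -o ( -name "*.tsv" -exec ... ),
# and since -exec is an action, find adds no implicit -print for a.csv
ungrouped=$(cd "$tmp" && find . -name "*.csv" -o -name "*.tsv" -exec echo {} \;)

# Grouped: the -exec runs for files matching either suffix
grouped=$(cd "$tmp" && find . \( -name "*.csv" -o -name "*.tsv" \) -exec echo {} \; | sort)

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
rm -rf "$tmp"
```

Only ./b.tsv should show up in the ungrouped run, even though a.csv matches the first '-name' test; the grouped run lists both files.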
