This is mainly somewhere for me to dump interesting tidbits of tech info. Stuff that I may forget but will need again in the future. For a proper blog, visit my electronics blog (http://www.fangletronics.com) or my wife's pre-school crafty blog (http://www.filthwizardry.com).
I use tab-delimited files a lot. I hate comma-separated values (and if you've ever had to deal with fields that can contain commas, you'll agree). But if you rely on the standard Linux command-line tools, tabs sometimes cause problems, because a literal tab is awkward to type at the prompt.
One of the most viewed posts on this blog is the unix-join-with-tabs one, where I describe the "Ctrl-v <Tab>" method of inserting a tab character on the command line for things like cut -d':' --output-delimiter=<tab>, where the output delimiter needs to be a tab.
However, there's a much nicer/easier way of specifying a tab character in bash: $'\t'. Take a look at the QUOTING section of the bash manpage for all the details.
Here's a simple example of it in action:
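A minimal sketch, assuming GNU cut (for --output-delimiter) and some made-up colon-delimited data; the variable names are just for illustration:

```shell
# Hypothetical colon-delimited input (name:age:city).
input='alice:30:NY
bob:25:LA'

# $'\t' is bash ANSI-C quoting: the shell turns it into a literal tab
# before cut ever sees the argument.
converted=$(printf '%s\n' "$input" | cut -d':' -f1,3 --output-delimiter=$'\t')
printf '%s\n' "$converted"

# The same quoting works anywhere a tool wants a tab as its separator,
# e.g. sorting tab-delimited rows numerically on the second column.
by_age=$(printf '%s\n' "$input" | tr ':' '\t' | sort -t $'\t' -k2,2n)
printf '%s\n' "$by_age"
```

Note that $'...' is a bash feature (also supported by zsh and ksh), so it may not work in a plain POSIX sh script.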
I'm going through another data cleanup session in a (very) old work directory tree. I found myself examining/compressing/removing files with a recurring set of suffixes. The 'find' command can make this all a lot less painful.
Here's the command for matching a single suffix:
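A minimal sketch of a single-suffix match, run against a scratch directory so it is safe to try. The file names and the '*.txt' suffix are assumptions for illustration:

```shell
# Scratch directory with hypothetical files.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
touch "$workdir/a.txt" "$workdir/b.tsv" "$workdir/sub/c.txt" "$workdir/sub/d.log"

# Quote the pattern so the shell hands it to find unexpanded.
single=$(find "$workdir" -type f -name '*.txt' | sort)
printf '%s\n' "$single"

# The same match with an action attached, run once per matching file.
find "$workdir" -type f -name '*.txt' -exec ls -l {} \;

rm -rf "$workdir"
```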
But editing this command-line to match the next suffix of interest becomes tedious very quickly. Thankfully, you can chain together file tests like so (note the grouping):
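A minimal sketch of the chained form, again with made-up file names and suffixes. The escaped parentheses are what make the -o alternatives behave as a single test in front of the action; basename stands in for whatever action you actually want:

```shell
# Scratch directory with hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.tsv" "$workdir/c.log" "$workdir/d.bak"

# \( ... \) groups the -name tests; -o is "or".
# The -exec then applies to a file matching any of the three suffixes.
grouped=$(find "$workdir" -type f \( -name '*.txt' -o -name '*.tsv' -o -name '*.log' \) \
    -exec basename {} \; | sort)
printf '%s\n' "$grouped"

rm -rf "$workdir"
```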
Then acting on these files is easy: just update the -exec action to what you want, e.g. "-exec bzip2 {} \;". You probably want to use xargs or "-exec bzip2 {} +" instead, to reduce the number of command invocations.
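To see the difference in invocation counts without actually compressing anything, here is a rough sketch that uses echo as a stand-in for bzip2 (the file names are made up, and -print0 / xargs -0 assume GNU or BSD tools). Each line of output corresponds to one run of the command:

```shell
# Scratch directory with hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.tsv" "$workdir/c.log"

# "\;" runs the command once per matching file: two matches, two runs.
per_file=$(find "$workdir" -type f \( -name '*.txt' -o -name '*.tsv' \) \
    -exec echo run {} \; | wc -l)

# "+" packs as many file names as fit into a single run.
batched=$(find "$workdir" -type f \( -name '*.txt' -o -name '*.tsv' \) \
    -exec echo run {} + | wc -l)

# xargs batches the same way; -print0 / -0 keep odd file names intact.
via_xargs=$(find "$workdir" -type f \( -name '*.txt' -o -name '*.tsv' \) \
    -print0 | xargs -0 echo run | wc -l)

echo "per-file: $per_file  batched: $batched  xargs: $via_xargs"

rm -rf "$workdir"
```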
An interesting note here is that the following command isn't executed as you might expect. The 'ls' command is only executed on the *.tsv files because of the way the expression is evaluated: the implicit '-and' between the second '-name' and '-exec' has higher precedence than the '-or' between the two '-name' tests, so the '-exec' binds only to the second '-name'.
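A small sketch of that pitfall next to the grouped fix, using made-up file names and basename in place of 'ls' so the output is easy to compare:

```shell
# Scratch directory with hypothetical files.
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.tsv"

# Ungrouped: find parses this as
#   -name '*.txt' -o ( -name '*.tsv' -a -exec ... )
# so a.txt satisfies the left side of -o and the -exec never runs for it.
ungrouped=$(find "$workdir" -name '*.txt' -o -name '*.tsv' -exec basename {} \;)
echo "ungrouped: $ungrouped"

# Grouped: the parentheses make -exec apply to either suffix.
grouped=$(find "$workdir" \( -name '*.txt' -o -name '*.tsv' \) -exec basename {} \; | sort)
printf 'grouped:\n%s\n' "$grouped"

rm -rf "$workdir"
```

The .txt file still matches in the ungrouped form, but because the expression contains an explicit action, find does not fall back to its default -print, so nothing is shown for it.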
I'm a research scientist at a small start-up biotech company. I do a lot of programming at work and have been enjoying electronics at home when the kids are in bed.
Contact: paulbo@fangletronics.com