Thursday, April 23, 2009

Unix Join with tabs

This one had me scratching my head for a while yesterday. When you use the Unix 'join' command with its default separator (whitespace), you can join tab-delimited files, but the output is space-delimited.
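
A minimal sketch of the problem. It assumes bash and a join that follows the usual default, as GNU coreutils join does: input fields split on blanks, output fields separated by a single space. The files left.tsv and right.tsv are hypothetical sample data created on the fly:

```shell
#!/usr/bin/env bash
# Reproduce the problem: tab-separated input, space-separated output.
# left.tsv and right.tsv are hypothetical sample files.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# printf expands \t in its format string, so these files contain real tabs.
# Both files are already sorted on the first field, as join requires.
printf 'a\t1\nb\t2\nc\t3\n' > "$dir/left.tsv"
printf 'a\tx\nb\ty\nc\tz\n' > "$dir/right.tsv"

# No -t option: input fields are split on blanks (spaces or tabs)
# and output fields are separated by a single space.
default_out=$(join "$dir/left.tsv" "$dir/right.tsv")
printf '%s\n' "$default_out"
```

This prints "a 1 x", "b 2 y" and "c 3 z" on separate lines, with spaces where the input had tabs.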

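I tried specifying the separator with -t "\t", -t '\t' and even the desperate -t\t, but none of them work. In the two quoted forms the shell hands join a literal backslash followed by a t (two characters, not a tab), and in the unquoted form the shell strips the backslash, so join just gets the letter t. It turns out you have to use an *actual* quoted tab character (I found this through a quick Google search; the solution was on JJinuxLand, so thank you JJinuxLand).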
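
To see what join actually receives in each failed attempt, this sketch (assuming bash) stores each spelling in a variable and prints its length. The last check assumes a join, such as GNU coreutils join, that rejects a multi-character -t argument; /dev/null stands in for real input files:

```shell
#!/usr/bin/env bash
# What does the shell hand to join for each attempt?
set -eu

dq="\t"   # double quotes: the backslash is kept before t, so 2 characters
sq='\t'   # single quotes: everything is literal, also 2 characters
uq=\t     # unquoted: the shell removes the backslash, leaving just "t"

printf 'double-quoted: [%s] (%d chars)\n' "$dq" "${#dq}"
printf 'single-quoted: [%s] (%d chars)\n' "$sq" "${#sq}"
printf 'unquoted:      [%s] (%d chars)\n' "$uq" "${#uq}"

# A two-character separator is treated as an error, not as a tab.
if join -t '\t' /dev/null /dev/null 2>/dev/null; then rc=0; else rc=$?; fi
printf 'join -t %s exit status: %d\n' "'\t'" "$rc"
```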

You can insert a literal tab on the command line by pressing Ctrl-V and then Tab.

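Update: the easier way of doing this is to use bash's special quoting construct $'string', e.g. join -t $'\t'. Inside $'...' backslash escapes such as \t are expanded, so join receives a single real tab character.
See the "QUOTING" section of the bash man page for details.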
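
A sketch of the working version, assuming bash (ksh93 and zsh also understand $'...'). The sample files are hypothetical. For a script that has to run under a plain POSIX sh, the same tab can be captured in a variable with printf, much like the script approach in comment 9:

```shell
#!/usr/bin/env bash
# Join tab-separated files and keep tabs in the output.
# left.tsv and right.tsv are hypothetical sample files.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'a\t1\nb\t2\nc\t3\n' > "$dir/left.tsv"
printf 'a\tx\nb\ty\nc\tz\n' > "$dir/right.tsv"

# $'\t' is ANSI-C quoting: bash expands \t into one real tab character,
# and join then uses that tab for both input splitting and output.
bash_out=$(join -t $'\t' "$dir/left.tsv" "$dir/right.tsv")

# POSIX sh alternative: printf produces the tab, stored in a variable.
tab=$(printf '\t')
posix_out=$(join -t "$tab" "$dir/left.tsv" "$dir/right.tsv")

printf '%s\n' "$bash_out"
```

One side effect worth knowing: once -t is set, every tab counts as a separator, so empty fields (two adjacent tabs) are kept rather than collapsed the way the default blank splitting collapses runs of whitespace.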

14 comments:

  1. Great!
    You saved me. :D

  2. Thanks! I really didn't expect to find the answer to this problem in a 2009 post

  3. Thank you, I struggle with this every time it comes up. Which is often. It should've been simpler.

  4. Ack, doesn't work on a Mac! Ctrl-V, before or in conjunction with hitting the Tab key, doesn't produce the desired effect. Command-V (Command is sometimes used in place of Ctrl on Macs) is assigned to the Paste function.

    Help again! Please!

  5. Tim, I've not got a Mac (and have never used one), so I can't really help with the control key confusion. However, in this kind of situation I end up resorting to Perl (is Perl installed/available?).

    To split a tab-delimited file and re-join the first and second fields, I'd do this:

    perl -F'\t' -lane 'print join("\t", @F[0,1])'

    There's quite a lot going on in the background with this command. Perl auto-splits each line of the input file into the @F array (-a turns on autosplit, -n loops over the input lines, and the split delimiter is given by the -F flag). I'm then re-joining each line with Perl's join function (the first argument is the join string, and all the remaining arguments are joined together with it). The -l flag removes the newline from each input line but also adds one to each print statement, so you don't have to worry about them.

  6. Totally weird on a Mac with a PC keyboard - if you use ctrl-V and quickly hit the tab key, it works. However, if you hold down ctrl-V too long you get "^V" repeated until you release the keys.

  7. Cheers! Thanks for saving me that frustrating afternoon fixing the problem! Now to re-run my code on 7GB of files--argh!!

  8. Awesome tip, saved me hours of frustration! (AIX 6.1)

  9. Very nice post! It put me on the right track.
    I had the further problem of using join *in a script* with tab separators.
    Suppose your editor is set not to insert tabs...

    In bash you can do
    tab=$(printf '\t')
    join -t "$tab"

    but then I found an even smarter solution
    http://stackoverflow.com/questions/1722353/unix-join-separator-char

  10. Thanks. This saved me time.

  11. This blog post: http://www.52nlp.com/error-join-multi-character-tab-t-for-using-join-tab/

    says you can use: $'\t'

    1. I've been using $'\t' for a while as well:

      http://geekbraindump.blogspot.co.uk/2013/06/better-tab-use-with-bash.html

      Maybe I should update this post.
