Tuesday, October 18, 2011

Using Linux 'seq' with large numbers

Sometimes I deal with quite a lot of data (IMO), and occasionally it has to be split into smaller files before it can be processed. I use 'seq' a lot to generate the lists of commands that work through these files.
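For illustration, here is a minimal sketch of what such a generated command list might look like. It assumes the data sits in a single file called `data.txt` and that each chunk is pulled out with `sed -n`; the file names, the `gen_commands` helper and the chunk size are all hypothetical, not my actual setup. With small ranges like this, seq's default output is fine.

```shell
#!/bin/sh
# Sketch: print one sed command per chunk of lines.
# data.txt, chunk_N.txt and gen_commands are hypothetical names for illustration.
gen_commands() {
  chunk=$1   # lines per chunk
  total=$2   # total number of lines in data.txt (assumed known in advance)
  for start in $(seq 1 "$chunk" "$total"); do
    end=$((start + chunk - 1))
    # The line offset ends up in the output filename
    echo "sed -n '${start},${end}p' data.txt > chunk_${start}.txt"
  done
}

gen_commands 1000 5000
```

The generated list can then be piped into `sh` to actually do the work.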

I ran into an issue the other day: I wanted to include the line-number offset in the names of the files I was generating, but as soon as the numbers reached a million, 'seq' switched to scientific notation (printing 1e+06 rather than 1000000). Unfortunately, that wasn't compatible with some of the downstream processing.
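The scientific notation looks like what C's `%g` conversion produces: it switches to exponent form once the exponent reaches the precision (6 by default), which happens exactly at a million. Assuming seq's default format behaved like `%g` here (output may vary between seq versions), the shell's own printf shows the same effect:

```shell
#!/bin/sh
# %g keeps 999999 as-is but prints 1000000 and above in exponent form
printf '%g\n' 999999 1000000 10000000
# %.0f always prints the full integer with no decimal places
printf '%.0f\n' 1000000
```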

The 'seq' man page seemed to claim that it accepts printf formatting arguments, so I tried 'seq -f "%d" 0 10000 160000000' and 'seq -f "%i" 0 10000 160000000', but seq rejected both. It turns out that seq only accepts printf-style floating-point formats, so to get plain integer output you have to use "%.0f" instead:

> seq -f "%.0f" 1000000 1000000 10000000
1000000
2000000
3000000
4000000
5000000
6000000
7000000
8000000
9000000
10000000
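Since the goal was to get the offset into a filename, it may help to know that (at least with GNU seq) the format string can carry other text around its single `%.0f` directive, so seq can emit the filenames directly. A small sketch; the `part_` prefix and `.txt` suffix are made-up names:

```shell
#!/bin/sh
# The text around %.0f is copied verbatim; %.0f prints each offset
# as a whole number, never in scientific notation.
seq -f "part_%.0f.txt" 0 1000000 3000000
```

This should print `part_0.txt` through `part_3000000.txt`, one per line, in steps of a million.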