GeekBrainDump: Use csplit to split SDF files (or contextually split any file)

Thursday, February 11, 2010

Use csplit to split SDF files (or contextually split any file)

Say you want to split an SDF into individual molecules. You could write a Perl script or one-liner (which is what I've been doing for a long time), or you could just use csplit. Thanks to Pat and Jessen for pointing this one out.

For example, say you have an SDF, test_mols.sdf, with 8 molecules in it and you want individual mol files:

> csplit -kzsf "test_mols" -b '%02d.mol' test_mols.sdf '/^\$\$\$\$/+1' '{*}'

This would result in 8 files called test_mols00.mol through test_mols07.mol. Unfortunately, each of these would still contain the SDF delimiter at the end of the file (so technically they are still SDFs). That's pretty easy to clean up with something like:

> perl -ni -e 'print unless /\$\$\$\$/' *.mol

See the csplit manpage for more details.
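
A related need is splitting a large SDF into chunks of N molecules each rather than one molecule per file. csplit does not count records: the +1 in the pattern above is a line offset from the match, so changing it to +200 would break 200 lines after a delimiter, not after 200 molecules. Below is a minimal awk sketch of one way to do the chunking. The file names (big.sdf, chunk_NNNN.sdf) and the chunk size are placeholders, and the input is fake data generated just for the demonstration. Unlike the .mol files above, each chunk keeps its $$$$ delimiters, so it is still a valid SDF.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder input: 5 fake records, each ending with the $$$$ delimiter.
for i in 1 2 3 4 5; do
    printf 'mol_%s\n  fake molfile body\n$$$$\n' "$i"
done > big.sdf

per_file=2   # molecules per output file (hypothetical; e.g. 200 for real data)

awk -v n="$per_file" '
    BEGIN { part = 0; out = sprintf("chunk_%04d.sdf", part) }
    { print > out }                  # copy every line, delimiter included
    /^\$\$\$\$/ {                    # a record just ended
        count++
        if (count % n == 0) {        # chunk is full: switch to the next file
            close(out)
            part++
            out = sprintf("chunk_%04d.sdf", part)
        }
    }
' big.sdf

ls chunk_*.sdf
```

To get a fixed number of output files instead, count the records first (for example with grep -c '^\$\$\$\$' big.sdf), divide by the number of files you want, round up, and use that as per_file.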

5 comments:

plinehan, February 11, 2010 at 11:32 AM
I think Brandon deserves credit, too. Teamwork! :)
Anonymous, June 9, 2011 at 11:44 AM
Hi guys,

Thanks a lot for your post. I had taken the perl route and was happy to find an alternative.

For info, I had to double-escape the $. The following line worked for me (the suffix had to accommodate 4 digits):
csplit -kzsf "Prefix" -b %04d.mol ./Original.sdf /^\\$\\$\\$\\$/+1 {*}

Keep up the good work!
Anonymous, October 25, 2011 at 5:34 AM
Hi all,
I'm new to shell scripting. If I wish to split a big sdf file into smaller sdf files with ~200 molecules per file, should I be writing:
csplit -kzsf "test_mols" -b %0b.mol test_mols.sdf /\$\$\$\$/+200 {*}
?
(or
csplit -kzsf "Prefix" -b %04d.mol ./Original.sdf /^\\$\\$\\$\\$/+200 {*}
?)
Thanks!
PaulBo, October 25, 2011 at 11:25 AM
This comment has been removed by the author.
N@N!, September 21, 2012 at 1:46 AM
Hi,
I have a dataset of 200,000 compounds in SDF format and want to split it into 50 subfiles. Can you suggest a command for this task? I am new to programming.
Regards!