Tags
agile agile testing Android Apple bash BCS BCS South West browser BT Home Hub 2.0 development DNS email event Firefox geolocation HTC Hero learning linux location aware browsing mobile non-technical privacy Problem & Solution professional review scripting server terminal testing tips Wireless NetworkingTwitter
- Using Bash to Change the Delimiter in a CSV File: Â The Problem I recently had a situation where I had a comma s... http://t.co/WaoORVD5 2012/01/23
- @James_Hellyer Not something I can Ebay unfortunately, shame, as I'm sure there'd be a huge demand for black market liposuction ;) 2011/11/30
- Anyone want a brown Ben Sherman leather jacket? 'cus I have a jacket like that on Ebay, you should all go bid on it ;) http://t.co/X0CSVRa4 2011/11/30
- Hello Twitter... its been a long while... 2011/11/15
- Job opportunity: Graduate Developers - Java, Tcl, J2EE, C at OpenBet Ltd - London, United Kingdom #jobs http://t.co/3ptpTOBE 2011/11/15
Using Bash to Change the Delimiter in a CSV File
The Problem
I recently had a situation where I had a comma separated value (CSV) file that I wanted to easily parse within a shell script. Unfortunately the CSV data contained some double quoted strings with embedded commas, for example:
This made parsing the file quite painful, particularly as only the strings with an embedded comma were double quoted like this.
The Solution
I devised a utility script that can parse the data and replace the delimiter with a new character of the users choice, which makes CSV files far easier to work with if you pick a character you know won’t be in your data.
The Script
This script borrows the core CSV parsing from this rather good post on CSV parsing with awk, but I’ve edited it to allow the substitution of the delimiter and ensure it still outputs a single record per line.
#!/bin/bash input=$1 delimiter=$2 if [ -z "$input" ]; then echo "Input file must be passed as an argument!" exit 98 fi if ! [ -f $input ] || ! [ -e $input ]; then echo "Input file '"$input"' doesn't exist!" exit 99 fi if [ -z "$delimiter" ]; then echo "Delimiter character must be passed as an argument!" exit 98 fi gawk '{ c=0 $0=$0"," while($0) { delimiter="" if (c++ > 0) # Evaluate and then increment c { delimiter="'$delimiter'" } match($0,/ *"[^"]*" *,|[^,]*,/) # save what matched in f s=substr($0,RSTART,RLENGTH) # remove extra stuff gsub(/^ *"?|"? *,$/,"",s) printf (delimiter s) # "consume" what matched $0=substr($0,RLENGTH+1) } printf ("\n") }' $inputSample Input
$ cat testprovinces.csv
Province,ProvinceCode,CriteriaId,CountryCode,Country
Australian Capital Territory,AU-ACT,20034,AU,Australia
Piaui,BR-PI,20100,BR,Brazil
"Adygeya, Republic",RU-AD,21250,RU,Russian Federation
Bío-Bío,CL-BI,20154,CL,Chile
Sample Output
$ ./change-delimiter testprovinces.csv '^'
Province^ProvinceCode^CriteriaId^CountryCode^Country
Australian Capital Territory^AU-ACT^20034^AU^Australia
Piaui^BR-PI^20100^BR^Brazil
Adygeya, Republic^RU-AD^21250^RU^Russian Federation
Bío-Bío^CL-BI^20154^CL^Chile