How to clean up your data in the command line

Full article: Opensource.com, https://opensource.com/article/18/5/command-line-data-auditing

I work part-time as a data auditor. Think of me as a proofreader who works with tables of data rather than pages of prose. The tables are exported from relational databases and are usually fairly modest in size: 100,000 to 1,000,000 records and 50 to 200 fields.

I haven’t seen an error-free data table, ever. The messiness isn’t limited, as you might think, to duplicate records, spelling and formatting errors, and data items placed in the wrong field. I also find:
