Data Cleaning/Entry/Maintenance Best Practices

Latest post 09-18-2008 1:13 PM by MeganKeane. 8 replies.

Data Cleaning/Entry/Maintenance Best Practices

09-09-2008 7:35 AM

Hi All,

I work for a nonprofit that coordinates a number of agencies statewide, each of which has recently transitioned from keeping paper records to tracking clients in a database.

The database managers, however, are not necessarily very tech-savvy people and are having some difficulty keeping their data clean. I'm trying to find resources on what steps they can take to help maintain clean databases (we're not looking for automated software cleanup, just processes that they can follow to keep their data clean).

Anyone have resources on data maintenance/clean up best practices?

Thanks!
Janaki

RE: Data Cleaning/Entry/Maintenance Best Practices

09-09-2008 2:39 PM

Hi, Janaki,

What a great question! The fact that you are giving it attention means that you already know how important it is to keep your data clean.

Make sure that your database managers also know this. Do they understand the consequences of 'dirty data'? They may not, yet.

From talking to people who do a good job at keeping their databases clean, here are some of the best practices that I have heard:


  • Have one person in charge of overseeing database entries. That person is responsible for making sure that data is entered consistently. This works well if it is also the person who does most of the data entry.

  • Limit the number of people who enter data. You might want to give some people access to view the database, but not to enter data.

  • Train each person who does database entry carefully on the correct procedures. The fewer people you have making entries, the easier this will be.

    Your software vendor may offer some kind of initial training. It's a good idea to make sure that everyone who is working in the database has taken at least this training.

    Be careful about using people who are unfamiliar with procedures. Temp workers or new volunteers are not good options for database entry. If you are really in a pinch, you might want to use them to record smaller donations, where entry is straightforward. In that case, it is good to have someone go back over the records later to verify entries and add notes.


  • Make sure that each person who does entry has a clearly defined job.

    For instance, the main person inputting data may also be responsible for all of the output except one particular aspect, like volunteer coordinating.

    Another example would be where one person is responsible for all database work that is not handled by someone else. In particular, that person's jobs might include appeal mailings and making notes about donor preferences. Another person might be in charge of recording donations, assigning corresponding motivation codes and sending thank-you letters. The development director may just access the program to view records and print out reports.



Hope that helps,



RE: Data Cleaning/Entry/Maintenance Best Practices

09-10-2008 6:31 AM

Janaki

Sasha has given you a lot of good advice. The only thing that I would add concerns features in data entry software, such as MS Access forms, that restrict data entry and exclude some obvious mistakes. If you use them, make sure that someone knows how to override the restrictions if they have to.

In my experience, it is very difficult to anticipate every eventuality. The name field of a database I was managing once had to cope with Patrick Gordon Duff Pennington, and I needed toes as well as fingers to count the characters.
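
To make the override idea concrete, here is a minimal Python sketch. It is not how MS Access forms work internally, only an illustration of the principle. The 25-character limit, the function name and the override flag are all assumptions made up for the example.

```python
MAX_NAME_LENGTH = 25  # hypothetical limit, chosen only for illustration


def check_name(name, override=False):
    """Return (accepted, warnings) for a name entry.

    The length rule rejects long names by default, but a person who is
    allowed to override it can still get the record in. The warning is
    kept either way so the entry can be reviewed later.
    """
    cleaned = " ".join(name.split())  # collapse stray whitespace
    warnings = []
    if len(cleaned) > MAX_NAME_LENGTH:
        warnings.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    accepted = override or not warnings
    return accepted, warnings
```

Calling `check_name("Patrick Gordon Duff Pennington")` rejects the entry, while passing `override=True` accepts it and still returns the warning for whoever reviews the data.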

Peter

RE: Data Cleaning/Entry/Maintenance Best Practices

09-10-2008 7:25 AM

Take a look at the United States Postal Service web site for their address and city formatting guidelines (USPS addressing standards). If you do snail mail, the cleaner the addresses, the better your mailings will be. You can also use those guidelines as a framework for the standards on the rest of your data fields.
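
For anyone who ends up scripting part of this, a rough Python sketch of applying such a guideline might look like the following. The abbreviation table is only a small illustrative subset and the function name is made up; check the full lists and rules against the USPS addressing standards themselves.

```python
import re

# Small illustrative subset of abbreviations; the complete tables are in
# the USPS addressing standards and should be checked there.
ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "APARTMENT": "APT",
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
}


def standardize_address(raw):
    """Uppercase, drop periods and commas, and apply the abbreviation table."""
    text = re.sub(r"[.,]", " ", raw.upper())
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)
```

With this table, "123 N. Maple Street, Apartment 3" and "123 north maple st apt 3" both come out in the same form, which is the point of agreeing on one standard.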

Dave

RE: Data Cleaning/Entry/Maintenance Best Practices

09-12-2008 6:17 AM

Thank you all for your great advice!

We try as much as possible to limit the number of people entering data, but for some of our agencies having one designated database manager is difficult, as each employee is forced to wear a number of hats. Clearly defined jobs aren't yet a possibility for us (well, maybe if we get more funding!) Since they are resource & referral agencies, often the referral specialists are also the ones entering the data.

The concept of having one person responsible for a particular type of output is one that could be very useful for us.

Since we cannot do too much with limiting access to the databases right now, I think that we will have to shore up our training as much as possible.

Thanks again for all your thoughts!
Janaki

RE: Data Cleaning/Entry/Maintenance Best Practices

09-12-2008 8:05 AM

Hi Janaki,

Good luck with the project. If you run into any specific problems, let us know. Maybe we can help sort them out.

Best wishes,

RE: Data Cleaning/Entry/Maintenance Best Practices

09-17-2008 8:59 AM

A standards guide can be helpful too, covering abbreviations, formatting, etc. Though this takes time to develop, it will ultimately limit the need for clean-up and make text searches easier.

Furthermore, as the database is developed, consider using limited-response fields, such as uneditable dropdowns and radio buttons, to keep data formatted uniformly wherever possible.
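
As a rough illustration of what a limited-response field does behind the scenes, here is a small Python sketch. The field name, the option list and the shorthand table are all hypothetical; a real standards guide would supply its own.

```python
# Hypothetical option list, standing in for the choices in a dropdown.
REFERRAL_STATUS_OPTIONS = ("Open", "Referred", "Closed")

# Hypothetical standards-guide entries: accepted shorthand -> standard form.
STANDARD_FORMS = {"ref": "Referred", "ref'd": "Referred", "cls": "Closed"}


def to_standard_status(entry):
    """Return the standard option for an entry, or raise if it is not recognised."""
    cleaned = entry.strip().lower()
    for option in REFERRAL_STATUS_OPTIONS:
        if cleaned == option.lower():
            return option
    if cleaned in STANDARD_FORMS:
        return STANDARD_FORMS[cleaned]
    raise ValueError(f"{entry!r} is not one of {REFERRAL_STATUS_OPTIONS}")
```

Anything outside the list is refused rather than stored, so every record ends up with one of three spellings and text searches stay reliable.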

RE: Data Cleaning/Entry/Maintenance Best Practices

09-17-2008 10:25 AM

As regards clean data, there are several checks that may be useful to others who want to use GIS to map member/donor locations and/or eliminate duplicate records to make mailings and phone contacts more efficient.

1) Have a person print/list a count of zip codes by city for all (or just new) entries. Use the United States Postal Service web site to get a list of valid zip codes and official cities (versus local names for suburbs).
Enter any missing zip codes and change obvious common local names to Post Office standards.

2) As noted in one post, use data validation routines to check the spelling of common city names. In Access or another database program, print/list cities by zip, with a count of occurrences. To start, just get a count by city, then a count by zip, to note and correct obvious typos.

3) Do a count of addresses by donor/member name to find and eliminate duplicates (e.g. 123 Maple Street #3 vs. 123 Maple Street, Apt #3; 123 N. Maple vs. 123 N Maple).

4) Do a count of member/donor names for each address and/or phone number (e.g. eliminate Bob and Robert as separate entries for the same person).

5) In Access or another database program, get a count of each unique name and address. You probably don't want duplicates.

6) If using phone numbers, do all have area codes? List name and address for any without an Area Code.

7) Count names/addresses for each phone number using Access or other database program. Be sure to include cross-checking for cell/voice/work/home/etc. phones to eliminate duplicate calls to the same number (or recognize this is a call to multiple people (?)).
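
For those who would rather script these counts than run them in Access, the checks above can be sketched in Python roughly as follows. The sample rows are made up, the matching key is deliberately crude, and the area-code test assumes 10-digit US phone numbers.

```python
import re
from collections import Counter, defaultdict

# Made-up sample rows: (name, address, city, zip, phone)
records = [
    ("Robert Smith", "123 Maple Street #3", "Springfield", "12345", "555-555-0101"),
    ("Bob Smith", "123 Maple Street, Apt #3", "Springfeild", "12345", "555-0101"),
    ("Ann Lee", "9 Oak Ave", "Springfield", "12345", "555-555-0199"),
]


def address_key(address):
    """Crude matching key: lowercase, no punctuation, no 'apt' marker."""
    text = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(word for word in text.split() if word != "apt")


# Steps 1-2: count city spellings per zip code; a spelling that appears
# only once next to a common one is a likely typo.
city_counts = Counter((zip_code, city) for _, _, city, zip_code, _ in records)

# Steps 3-5: group names by address key to surface likely duplicates.
names_by_address = defaultdict(list)
for name, address, _, _, _ in records:
    names_by_address[address_key(address)].append(name)
possible_duplicates = {
    key: names for key, names in names_by_address.items() if len(names) > 1
}

# Step 6: list name and address where the phone has fewer than 10 digits,
# which is taken here to mean the area code is missing.
missing_area_code = [
    (name, address)
    for name, address, _, _, phone in records
    if len(re.sub(r"\D", "", phone)) < 10
]
```

A person still has to decide whether Bob and Robert at the same address are one donor or two; the script only narrows down which records to look at.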

Properly entered, addresses can be geo-coded to lat/long for mapping, or the address zip codes can be used to plot member locations.

RE: Data Cleaning/Entry/Maintenance Best Practices

09-18-2008 1:13 PM

Hi, Robert_H, welcome to TechSoup! Thanks for the informative post--this is really useful--don't know why I never thought about using GIS for this purpose before, but it could really help with cleanup!

Best,

Megan