Following several recent conversations, I feel inspired to remind everyone of the facts behind business data. This is not a criticism, but it is a realistic view, because I sometimes think that with all the smoke and mirrors we use, we can forget what we are working with here. Data quality is relative. If you build a shelf with a warped piece of wood, it will always be warped, even if the shelf itself does its job perfectly well.
Business data is sourced from two major feeds. Firstly, Companies House data, which comprises all the things limited companies have to register, and secondly, directory data. The major players in the industry match and merge these two sources, for the following major reasons.
1) Neither source is a fully mailable list. CH data has registered office addresses, not trading addresses, for instance, and no usable telephone numbers. Directory data does not have a list of company directors, or any idea of ownership.
2) Not every company is limited, therefore the CH list is by no means complete. Directory data picks up a lot of sole traders, partnerships etc.
3) Not every company registered at CH is trading.
4) Directory data is enhanced by adding some idea of turnover, SIC codes, lists of directors and ownership details.
There is more, but I want to stick to top line stuff here.
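For anyone who has never seen a match and merge up close, here is a minimal sketch in Python. It is illustrative only: the field names, the sample records and the crude name-matching rule are all my assumptions, and real suppliers match on far more than a tidied-up company name.

```python
# Hypothetical sketch of a Companies House / directory match and merge.
# Field names and the matching rule are assumptions for illustration.

LEGAL_SUFFIXES = {"ltd", "limited", "plc"}


def normalise(name):
    """Build a crude match key: lowercase, no punctuation, no legal suffix."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    words = [w for w in cleaned.split() if w not in LEGAL_SUFFIXES]
    return " ".join(words)


def match_and_merge(ch_records, directory_records):
    """Merge two lists of dicts into one universe keyed on the match key."""
    merged = {}
    # Directory data supplies trading addresses and telephone numbers.
    for rec in directory_records:
        merged[normalise(rec["name"])] = {
            "name": rec["name"],
            "trading_address": rec.get("trading_address"),
            "phone": rec.get("phone"),
            "source": "directory",
        }
    # Companies House data supplies directors and SIC codes.
    for rec in ch_records:
        key = normalise(rec["name"])
        if key in merged:
            merged[key].update(
                directors=rec.get("directors", []),
                sic_code=rec.get("sic_code"),
                source="both",
            )
        else:
            # Registered but not in the directory: possibly not trading.
            merged[key] = {
                "name": rec["name"],
                "registered_address": rec.get("registered_address"),
                "directors": rec.get("directors", []),
                "sic_code": rec.get("sic_code"),
                "source": "companies_house",
            }
    return merged


ch = [
    {"name": "Acme Widgets Ltd", "registered_address": "1 Law Street",
     "directors": ["A. Smith"], "sic_code": "25990"},
    {"name": "Dormant Holdings Limited", "registered_address": "2 Law Street"},
]
directory = [
    {"name": "Acme Widgets", "trading_address": "Unit 4, Mill Road",
     "phone": "01234 567890"},
    {"name": "Joe's Plumbing", "trading_address": "12 High Street",
     "phone": "01234 098765"},
]

universe = match_and_merge(ch, directory)
for key, record in universe.items():
    print(key, "->", record["source"])
```

The three outcomes map onto the list above: a matched record gets the best of both sources, a directory-only record is typically a sole trader or partnership, and a Companies House-only record may not be trading at all.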
Once this match and merge is done, you will have a business universe of about 1.8m. There are more businesses out there, and many suppliers will claim to have them, but there are several problems with them. The data in excess of 1.8m is often on very small businesses, nearly always out of date, and this is the section of the universe with the most churn. I would also argue that it contains the businesses of least interest to DM...very small businesses, one man bands etc.
Having got to this point, industry players start to 'add value'. This can be done in the following ways:
1) Match universes with each other to create a mega-verse. In my experience, this always throws up a few hundred thousand names that are not common between any two datasets...you then have to decide whether these names are valuable or not. Mostly not, I fear, as they represent the old, the gone and the never were.
2) Add other data sets. Such as credit data. This can work well, and add some interesting information.
3) Tele-marketing to enhance data. Good tele-marketing adds names, responsibilities and key bits of information that can be used to predict buying activity, which end up being selections on databases. Expensive to research upwards of 2m, so no one does...unless you know different! In practice, sections of the universe are well researched by some, but watch out for recency. Anything approaching a year can be 40% inaccurate, especially in an economic climate like this one!
4) Modelling and analysis. Modelling involves taking a cross section of businesses and modelling them against certain criteria, then applying the results to the entire universe. Thus it works like an opinion poll. If you believe them, buy on a model. If you don't, don't! Analysis is the appliance of brain power. I know and have worked with some very clever people who can do wonderful things, but I point you back to the shelf example. If you analyse dog poo, it will still stink to high heaven, no matter how hard you think about it.
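To make point 1 above concrete, here is a rough sketch of what matching two universes against each other throws up. The match keys are made up, and it assumes both lists have already been reduced to comparable keys, which is the hard part in real life.

```python
# Hypothetical sketch: compare two supplier universes by match key.
# The keys below are invented; real matching is far messier.

def compare_universes(universe_a, universe_b):
    """Split two sets of match keys into common and unique names."""
    return {
        "common": universe_a & universe_b,
        "only_a": universe_a - universe_b,
        "only_b": universe_b - universe_a,
    }


supplier_a = {"acme widgets", "joes plumbing", "old mill bakery"}
supplier_b = {"acme widgets", "joes plumbing", "never was trading"}

result = compare_universes(supplier_a, supplier_b)
for bucket, names in result.items():
    print(bucket, len(names), sorted(names))
```

The "only_a" and "only_b" buckets are the names you then have to make a decision about. The code cannot tell you whether they are valuable, only that nobody else has them.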
I am going to stop here for now. I hope this prompts some sort of debate about data quality.
Just a snippet on the Today programme this morning, but it made me sit up and take notice. Apparently old Golden and his crew have suggested that the organs of us all should be available for donor transplants, unless we opt out. Presumably before we die. It had all the interest groups up in arms, and is unlikely to get through, but it is another example of political double standards.
The theory was that it would solve the lack of suitable organs available for life-saving transplants - undoubtedly a laudable objective, but...
The same government would like all marketing to be opt-in, but when it suits them they are happy to impose opt-out.
It must be quite hard trying to face both ways at once!
Now here is a thought. Andy from Infouk replied to my last blog in defence of his new website and data quality. Good for him. Nice to see someone who is prepared to stand up and be counted. So why not challenge everyone to face a data MOT, to drive the systematic failures out of our B2B datasets?
Obviously for some, that would be like turkeys voting for Christmas, but one can dream. Anyhoo, here are a few simple tests we could apply to see that each dataset at least comes up to basic scratch. I'll also add a dodge I know has happened to show you how easy it is.
1) How big is your universe? Most people will claim +2m, but in reality most of the big datasets are really around 1.8m. The rest is old, suspect and unconfirmed for a considerable period of time, and will never appear in counts.
2) How do you verify your database and how often do you reverify? In practice, very little proactive verification goes on. Companies House data is matched against directory data to provide trading addresses, then maybe credit data is added. Sometimes this happens the other way around, starting with credit data. Telephone research is commonly only done on a cross-section of that...and sometimes this research is an internal sales team, with another purpose entirely, using the database and just confirming that contacts and numbers are still alive by default.
3) What do you do with goneaways? Nothing is quite often the answer. One major supplier had, up until the first quarter of 2008, not done a deep cleanse of their universe in FIVE years.
4) When you reverify your data, do you speak to each individual or do you just confirm with the person who answers the phone? The latter is by far the most practical solution, and therefore the most common. As a result, the individual concerned has rarely opted in to anything at all.
5) How do you confirm job titles and responsibilities on the contacts you have? On small businesses, it is nearly always the same person! Try asking them to de-duplicate their total contacts universe, and watch it shrink before your eyes!
6) How big is your telephone research team? Then do the maths yourself. In my experience, a good researcher, chasing not only company details but multiple contacts as well, will speak to between 6 and 8 people an hour...and they might need to ring back to get the full information, because these calls can drag on. A database of 2m, fully verified, would need 3m+ phone calls a year.
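If you want to do the maths in point 6 yourself, here is a back-of-envelope sketch. The 3m calls a year and the 6 to 8 conversations an hour come from the text above. The 1,500 phone hours per researcher per year is purely my assumption, and treating every call as one completed conversation is a simplification that flatters the supplier.

```python
import math

# Back-of-envelope research capacity. PHONE_HOURS_PER_YEAR is an
# assumption, not an industry figure; change it to suit.
CALLS_PER_YEAR = 3_000_000
PHONE_HOURS_PER_YEAR = 1_500  # assumed productive hours per researcher


def researchers_needed(calls_per_year, conversations_per_hour, hours_per_year):
    """Round up the headcount needed to make every call within the year."""
    capacity_per_researcher = conversations_per_hour * hours_per_year
    return math.ceil(calls_per_year / capacity_per_researcher)


best_case = researchers_needed(CALLS_PER_YEAR, 8, PHONE_HOURS_PER_YEAR)
worst_case = researchers_needed(CALLS_PER_YEAR, 6, PHONE_HOURS_PER_YEAR)

print(f"Researchers needed: {best_case} to {worst_case}")
# prints "Researchers needed: 250 to 334"
```

On those assumptions, a fully verified universe needs a research floor of a few hundred people. Compare that with the size of the team your supplier actually admits to.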
That's enough to be going on with for now...I wonder who would pass?
By the way, a personal aside. Did you see the future of English football last night? Go Gunners, go!
Infouk.com was not a disappointment...I am sure it works just fine. However, it is more of the same. Golly, an online counting tool...why didn't someone else think of that? Actually they did...they built it, put it out there, and discovered that hardly anyone used it.
Maybe it is intended as a shop window, but if that is all it is, why bother?
Using a search engine of any kind takes some skill. Using a search engine on a fairly limited universe, where it is very easy to be way too specific, usually means that punters end up with a mailing list of three...and two of them will probably be the same business. That is why the likes of marketfile invest in a telephone team to track website users, and call them if they think they need help. Very few people place an order without assistance off line.
Even list brokers, God bless the few of them that are left, need help and advice to get the right data. InfoUK would have been better off investing in some experienced sales people to offer the right advice. I am sure they have them, but they should be up front, not hidden behind the technology.
InfoUK is an interesting exercise, with bad timing. Not a great time to enter the market. Their universe is not complete yet, and as I predicted when they announced their arrival on the scene, there is no differentiation. It is also interesting to read on their website that the DMA somehow back their email lists. That is news to me. And if they really have 1,200,000 email addresses double opted in, from their own sources rather than licensed from someone else...
Smoke and mirrors...once again, we cannot see the wood for the trees.
By heck, this is bad. I know we don't want to admit it, but for the time being marketing is about as dead as an estate agent's inbox. It is one of business life's true truisms, that when recession bites marketers are driven out of the city. If one in ten people are going to lose their jobs in the next year, as some portions of the media are predicting, then about half of them will be from the sales and marketing departments.
You can tell already. The news bulletins feel a little filled to me. One data supplier taking on six new people having recently got rid of three times that is fairly desperate PR. Lots of clients moving around, slicing a few percentages off the price every time.
No big surprise. It happens every time, but to me, this feels different. A lot of the names being destroyed...and let's face it, they are...are big DM spenders. If consumer spending really does stop...if small businesses do start to fail at record levels...are we really all screwed?
Well, the answer for many of us is yes. Because there are a lot of me-too companies out there, and not all of them will survive. In the end...true romantic that I am...I believe that only the strong will survive. Not the good, but the strong. They are two different things. Consolidation will happen, but with little money slopping around it will be painful, unwelcome consolidation.
Big is not always better. No, be honest Hugh, big is hardly ever better. Data driven marketing is about innovation and insight, and that rarely comes from big organisations.
I give it until Christmas. Come January the blood will be all over the walls. Good luck to each and every one of you.
The most amazing throwaway line I've heard from a politician since Tony Blair said he wasn't lying in the Ecclestone affair. He can't promise to keep our data safe? Good grief, has anyone really considered what that means? He means he can't stop contractors losing memory sticks in pub car parks, or ministers going to sleep on the train with their laptops on, or civil servants getting off the train and leaving the contents of their briefcases behind with a dog-eared copy of the evening paper - but what he is saying is that it is impossible to keep data safe.
Utter claptrap, to put it politely. Good security procedures, proper encryption, extensive training for all personnel and immediate dismissal without pay-offs for the miscreants would go a long way. Why let ANYONE download sensitive data to a memory stick? In this day and age they should be able to log in remotely and do what they need to do, without storing any data locally. I wrote a blog months ago about being able to put the business in your pocket and walk out the door, so people should not be able to leave the building with CDs or memory sticks.
Obviously Golden Brown has some previous here. He hasn't managed to keep our pensions safe, our savings safe or our economy safe. Maybe his admission about data is not so much of a surprise...his record is remarkably consistent. Interestingly enough, the ICO won't fine the government like it would a commercial operation. So they will just sack some poor scapegoat and move on, and we cannot let them.
After all, Gordie and his frighteningly rotten chums intend to hold every scrap of data on everyone. We'll soon be able to find our entire identities by the bin in the pub car park, next to that discarded kebab and an empty packet of Silk Cut.
Here's a suggestion, Mr Brown. Make the CEO of any organisation personally responsible for data loss, including civil servants and ministers. Then watch how quickly you can come up with some effective safeguards, before you find yourself on the way to the Scrubs.
Hugh Bessant