Filling in the Empty Space – The Personal Data Store

I said here that at present there are very few genuine VRM tools available for use right now, and that the main reason for that is that the underlying plumbing is not yet in place at any kind of scale.

By ‘plumbing’, I mean that ‘personal data stores’ and all that they imply are not as yet deployed en masse, or with any degree of robust functionality.

Before we get into what it will take to change that, let’s take a look at what I mean by the term ‘personal data store’, because obviously that is open to interpretation, and indeed it has been the subject of much debate in the Project VRM community. To get to the heart of it, I think it is useful to draw a parallel with the deployment of data warehouses within organisations, a process which began some 30 years back and continues to evolve and extend today. The raison d’être for a data warehouse within an organisation is normally to pull together the data from multiple operational sources (silos), organise that data, enhance it and make it available for use – whether that be for analysis within the warehouse, or via applications that tap into it. Pulling data in from multiple operational systems is the key, because what is being acknowledged is that no one operational system can pull together a data set that is sufficiently rich, deep and broad to enable all of the functions required to run the organisation. That is to say, we need to distinguish between systems that are there to fulfill a specific task (an operational system such as a CRM application, an ERP instance, or a web site), and those whose main purpose is to generate knowledge, enable understanding, and share information across multiple business functions.

A further defining characteristic of the data warehouse is that it runs on ‘atomic level’ data, that is to say data stored at the lowest level of detail available from the feeder system (e.g. the line item of a receipt). When data is stored in this way, it can be aggregated and summarised where appropriate or necessary for use. This enables a further defining characteristic of a data warehouse: that one cannot predict in advance all of the uses to which the data might be put – uses which storing only at an aggregated level would close off. The same will apply in the personal data store.
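To make the atomic-level point concrete, here is a minimal sketch using SQLite. The table and column names are my own illustrative assumptions, not a proposed personal data store schema; the point is simply that because every receipt line is kept, aggregations can be derived later for uses nobody predicted at capture time.

```python
import sqlite3

# A sketch of atomic-level storage, assuming a SQLite-backed personal
# data store; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE receipt_lines (
        receipt_id TEXT,
        purchased_on TEXT,     -- ISO date
        merchant TEXT,
        item TEXT,
        category TEXT,
        amount_pence INTEGER   -- lowest level of detail available
    )
""")
conn.executemany(
    "INSERT INTO receipt_lines VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("r1", "2009-06-01", "Grocer", "milk", "food", 89),
        ("r1", "2009-06-01", "Grocer", "bread", "food", 120),
        ("r2", "2009-06-03", "Pharmacy", "plasters", "health", 250),
    ],
)

# Because the data is atomic, any summary can be derived on demand,
# including summaries nobody anticipated when the data was captured.
for row in conn.execute("""
    SELECT substr(purchased_on, 1, 7) AS month, category,
           SUM(amount_pence) / 100.0 AS total_pounds
    FROM receipt_lines
    GROUP BY month, category
"""):
    print(row)
```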

So what else is involved in data warehousing that might inform our thinking about personal data stores?

Firstly, I’d suggest there is a (mainly manual) ‘discovery’ phase in both, which is about identifying and engaging with valid data sources (i.e. inputs to the store). In practice the data to be sourced is driven by the prioritised functionality sought by the user. For example, if my main purpose for the personal data store is to help me manage my health, clearly I’m going to need my health and my health care supplier data, or links to it, in the store.

Next, we need to consider the personal equivalent of the ETL processes and tools deployed in data warehouses; ETL is short for Extract, Transform and Load. In recognising the likely need for ETL equivalents, we imply that:

a) the personal data store will have its own target data schema (design), with greater or lesser degrees of flexibility built in dependent on technical choices. I think there will necessarily be open standards around personal data store design. That’s not the case in the data warehousing world (Oracle, SAP, Teradata and IBM are all largely proprietary), but I don’t think that approach is sustainable for the personal variant, which needs to run at greater scale and much lower cost.

b) most/ all of the data sources will not hold data in precisely the same format/ design as the target data schema.

Extract, Transform and Load usually consists of a set-up phase followed by automation; many ETL tools exist in the data warehouse world, and it is reasonable to assume that the same will emerge in the individual space (indeed they already are, tactically, with data exchange formats like OFX in the banking world for moving transaction data around). Note that ETL may only be a precursor to a direct feed from a source system into the warehouse, whether that be a batch, trickle or real-time feed.
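As a sketch of what a personal-scale ETL step might look like – assuming a bank CSV export as the source and an invented target schema, rather than any real bank’s format or an agreed standard:

```python
import csv
import io
from datetime import datetime

# A minimal personal ETL sketch: extract transactions from a bank's CSV
# export, transform them into a hypothetical target schema, and load
# them into a list standing in for the personal data store. All source
# and target field names here are illustrative assumptions.

BANK_EXPORT = """Date,Description,Amount
01/06/2009,TESCO STORES,-23.50
03/06/2009,SALARY,1500.00
"""

def extract(raw: str):
    """Extract: read rows in the source system's export format."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(row: dict) -> dict:
    """Transform: map source fields onto the store's target schema."""
    amount = float(row["Amount"])
    return {
        "occurred_on": datetime.strptime(row["Date"], "%d/%m/%Y").date().isoformat(),
        "counterparty": row["Description"].title(),
        "direction": "in" if amount > 0 else "out",
        "amount_pence": round(abs(amount) * 100),
        "source": "my-bank-csv",  # provenance, useful for later maintenance
    }

def load(records, store: list):
    """Load: append the conformed records to the store."""
    store.extend(records)

personal_store: list = []
load([transform(r) for r in extract(BANK_EXPORT)], personal_store)
print(personal_store)
```

In practice the transform step is where most of the set-up effort lives; once the source-to-target mapping is written, the extract and load steps can be automated as batch, trickle or real-time feeds.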

Now that we have data in the warehouse/ store, the task lies in organising the data and preparing it for use; there are a range of technology candidates in this area, from standard RDBMS to NoSQL databases. At this point it may be worth diverting briefly to a harsh reality, because it is pretty certain that the same reality will apply within personal data stores: many data warehouses actually become ‘digital dumping grounds’ into which data is put ‘in case we need it later’ (note the clash with data minimisation principles in privacy law), and/ or the data is not organised/ optimised for use. That does not necessarily make them a complete waste of time, it just means that they are not providing maximum value; the well-worn phrase ‘Garbage In, Garbage Out’ springs to mind. My colleague John McKean tells this story much more eloquently in his first book, The Information Masters, which dates back to 1999 but is as valid today as it was back then. His research amongst the 30 or so ‘Information Master’ organisations sets out what differentiates the tiny percentage of organisations that get mega-returns on their information investments from those that just plod along or regularly fail to get a return on investment (hint: the masters don’t regard the issue as something that ‘the IT folks do’).

The further functions of the data warehouse/ personal data store beyond getting data in, and organising it are:

– Data maintenance, i.e. refreshing data as appropriate, and having processes to keep it up to date, whether that be static data, dynamic data, or reference data.

– Data enhancement, either through combining existing data via queries into new attributes or meta data, or by bringing in further external data (e.g. my credit rating, or verification via a third party that a data attribute is accurate, or otherwise, at that point in time). This verification piece is a key issue: if I can prove, for example, that I am a gold level flyer on British Airways, or that I’ve not had any speeding tickets in the last 5 years, or that I do have a specific illness to manage, then that ultimately takes a vast amount of guesswork and waste out of the current modus operandi (a toy sketch of the verification idea follows this list).

– Make available for use; i.e. providing a data access layer that enables the data to flow onward to those entitled to it, in the way that they wish to receive it.

– Archive; there comes a time in the life-cycle of a data attribute when it is no longer useful. This situation, which will certainly apply in a personal data store, can lead the database manager either to physically move the data elsewhere/ onto back-up media (usually after building summary histories that do remain), or just to leave it within the warehouse on the basis that storage costs may be less than removal costs.
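As promised above, here is a toy sketch of the verification idea: a third party attests that an attribute is accurate at a point in time, and a relying party checks that attestation rather than guessing. An HMAC over a shared secret stands in for a real digital signature scheme, and all names, keys and claim fields are invented for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical verifier key; a real scheme would use public-key signatures.
VERIFIER_KEY = b"airline-verifier-secret"

def attest(claim: dict) -> str:
    """The verifier signs a canonical form of the claim."""
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(VERIFIER_KEY, payload, hashlib.sha256).hexdigest()

def check(claim: dict, signature: str) -> bool:
    """A relying party checks the claim against the verifier's attestation."""
    return hmac.compare_digest(attest(claim), signature)

claim = {"subject": "me", "attribute": "frequent_flyer_tier",
         "value": "gold", "as_of": "2009-06-01"}
sig = attest(claim)

print(check(claim, sig))                          # True: claim verified
print(check({**claim, "value": "silver"}, sig))   # False: tampered claim
```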

Two other aspects of data warehousing are probably worth noting:

– whilst initially a data warehouse was most likely to be a single computer (perhaps costing £1m upwards to buy and install), these days the concept of a virtual warehouse is also a perfectly viable option, with data stored physically in different places and brought together as and when required.

– the concept of a data mart has emerged, meaning the carving off of a specific set of data to support a subsidiary warehouse tuned to a particular task (e.g. a retailer may choose to set up a mart for the team managing the loyalty scheme). Typically the link to the main warehouse remains in place for maintenance and update purposes, but the mart acts more independently in terms of access and use.

So what does all of that mean for the ‘personal data store’ then?

Firstly, I would contend that there is a terminology point to be taken on board. ‘Data warehouse’ is a short, fairly well understood term (perhaps because it is 30 years old). But it actually covers a lot of ground, and means much more than just a storage facility. It covers ‘identify relevant data types and sources, enable processes for bringing them into the storage facility, keep the data clean and up to date by looking back to the source and other cross-reference files, aggregate and summarise data where appropriate, enhance and add meta data where useful, and make available for use in a controlled, auditable manner via a range of output mechanisms and formats’. That’s a lot of functionality to pack into two words. I think that a personal data store will perform pretty much all of those same functions, so users of the term should ideally align with that description, or seek to agree different terms for each of the system components and functions.

Secondly, there should be a recognition that functionality will continue to emerge and evolve over time, rather than all turn up in one big bang deployment. That said, there is clearly a huge upside to deploying with the technology we have available now rather than that of 30 years ago: the cost of storage and back-up is very low, connectivity is solid, access routes/ devices are many, and the internet/ mobile internet will be the main place where this user-managed information is deployed.

Third, my working assumption is that there will be both self-managed and hosted options, and that people will choose those that best suit them and their likely uses. It is probable that stand-alone personal data stores will not be that common as the market evolves; more likely, the individual will buy into a wider set of personal information management capabilities (e.g. a personal data store, a set of key applications, and a hosting/ back-up service).

So, after all that, here’s my working definition of a personal data store:

A personal data store helps me gather, manage, enhance and use information from across multiple aspects of my life, and share that information under my control with other individuals, organisations, or with applications or subsidiary data stores that I wish to enable.

The key, as per the above, is that this is a multi-life-aspect data management platform that is infinitely extensible, and not constrained by the need to operate within a silo-ed context.

Here’s a diagram that seeks to illustrate the personal data store that I think will emerge over time.

One of the big issues around data warehousing is ‘the business case’ for what is typically regarded as a behind-the-scenes, not very sexy investment. I think the same will apply to the personal data store, but I’ll save that post for another day…

The Personal Data Eco-System

This post is a short(ish) summary of a working session led by Drummond Reed and me at the recent West Coast VRM Workshop, and also an introduction to the Kantara workgroup in which we are going to move this debate forward. It is also part of the thinking that will shortly emerge in a Mydex white paper.

At the VRM workshop, we discussed the need for the concept of the Personal Data Store, what it would do in practice, and what that will ultimately enable.

Why we need such things – because individuals have a complex need to manage personal information over a lifetime, and the tools they have at their disposal today to do so are inadequate. Existing tools include the brain (which is good but does not have enough RAM, onboard storage, or an ethernet socket… thankfully), stand-alone data stores (paper, spreadsheets, phones – good, but not connected in secure ways that enable user-driven data aggregation and sharing), and supplier-based data stores (which can be tactically good but are run under supplier-provided terms and conditions). NB Our current perception of ‘personal data stores’ is shaped by the good ones that are out there (e.g. my online bank, my online health vault); what we need is all of that functionality, and more – but working FOR ME.

What they will do/ enable – the term Personal Data Store is not an ideal term to describe a complex set of functions, but it is what it is until we get a better one (the analogy I’d use in more ways than one is the term ‘data warehouse’ – again a simplistic term that masks a lot of complex activity). A Personal Data Store can take two basic forms:

Operational Data Stores – that get things done, and only need to store sufficient breadth and depth of data to fulfill the operation they are built for (e.g. pay a credit card bill, book a doctor’s appointment, order my groceries).

Analytical Data Stores – that underpin and enable decision making, and which typically need a more tightly defined, but much deeper data-set that includes data from a range of aspects of life rather than just that from one specific operation (e.g. plan a home move, buy a car, organise an overseas trip).

A sub-set of the individual’s overall data requirement will lie in both of the above, this being the data that then integrates decision-making and doing.

In both cases, the functionality required is to source, gather, manage, enhance and selectively disclose data (to presentation layers, interfaces or applications).
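As a minimal sketch of the ‘selectively disclose’ part of that functionality – with invented life-aspect names and a deliberately naive entitlement model, not a proposed standard:

```python
from dataclasses import dataclass, field

# A sketch of selective disclosure: records are tagged by life aspect,
# and a recipient only receives the aspects the individual has entitled
# them to see. Aspect names and the entitlement model are assumptions.

@dataclass
class PersonalDataStore:
    records: list = field(default_factory=list)       # (aspect, data) pairs
    entitlements: dict = field(default_factory=dict)  # recipient -> aspects

    def gather(self, aspect: str, data: dict):
        self.records.append((aspect, data))

    def disclose(self, recipient: str):
        allowed = self.entitlements.get(recipient, set())
        return [data for aspect, data in self.records if aspect in allowed]

store = PersonalDataStore()
store.gather("health", {"condition": "hay fever"})
store.gather("finance", {"balance_pence": 102500})
store.gather("travel", {"frequent_flyer_tier": "gold"})

store.entitlements["my-doctor"] = {"health"}
store.entitlements["travel-agent"] = {"travel"}

print(store.disclose("my-doctor"))     # sees health data only
print(store.disclose("travel-agent"))  # sees travel data only
```

An operational store would expose only the narrow slice needed for a given task, while an analytical store would query across several life aspects at once; both would sit behind the same disclosure control.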

We also discussed ‘who has what data on you’ and introduced the following diagrams to explain current state and target state (post deployment of Volunteered Personal Information (VPI) tech and standards).

The key terms that require explanation are:

My Data – is the data that is undeniably within, and only within, the domain of an individual. Its defining characteristic is that it has demonstrably not been made available to any other party under a signed, binding agreement. This space has been increasingly encroached upon by technology and organisations in recent history (e.g. behavioural tracking tools like Phorm), and this encroachment will continue. Indeed, a general comment can be made that ‘My Data’ equates to privacy in the context of personal data; so the rise of the surveillance society and state is a direct assault on ‘My Data’. Management of ‘My Data’ can be run by the individual themselves, or outsourced to a ‘fourth party service’.

Your Data – is the data that is undeniably within the domain of an organisation, whether private, public or third sector. Proxy views of this data may exist elsewhere, but are only that. This data would include, for example, the organisation’s own master records of its product/ service range, its pricing, its costs, its sales outlets and channels. Customer-facing views of much of Your Data are made available for reproduction in the ‘Our Data’ intersect.

Our Data – is the data that is jointly accessible to both buyer and seller/ service provider, and also potentially to any other parties to an interaction, transaction or relationship. It is the data that is generated through engaging in interactions and transactions in and around a customer/ supplier relationship. Despite being ‘our’ data, it is probably technically owned, or at least provided under terms of service designed by the seller/ service provider; in practical terms this also means that the seller/ service provider dictates the formats in which this data exists/ is made available.

Their Data – is the data built/ owned/ sold by third party data aggregators, e.g. credit bureaux and marketing data providers in all their forms. Its defining characteristic is that it is only available/ accessible by buying/ licensing it from the owner.

Everybody’s Data – is the public domain data, typically developed/ run by large, public sector(ish) entities including local government (electoral roll), Post Offices (postal address files) and mapping bureaux (GIS). Typically this data is accessible under contract, but the barriers to accessing these contracts are set low – although often not low enough that an individual can engage with them easily.

The Basic Identifier Set/ Bit in the Middle – this is the core personal identity data which, like it or not, exists largely in the public domain – most typically (but not exclusively) as a result of electoral rolls being made available publicly, and specifically to service providers who wish to build things from them. It is this characteristic that enables the whole personal data eco-system, and its impact on data privacy, to exist – with the individual as the un-knowing ‘point of integration’ for data about them.
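A small worked example of why that matters: two otherwise unrelated datasets can be joined on the basic identifier set, silently merging into a richer profile without the individual knowing. The records below are invented.

```python
# The basic identifier set (here name + postcode + date of birth) acts
# as a de facto join key across unrelated data sources; all records are
# invented for illustration.

electoral_roll = [
    {"name": "A Smith", "postcode": "EH1 1AA", "dob": "1970-01-01"},
]
marketing_list = [
    {"name": "A Smith", "postcode": "EH1 1AA", "dob": "1970-01-01",
     "inferred_interest": "golf"},
]

def identifier(record: dict) -> tuple:
    """Build the join key from the basic identifier set."""
    return (record["name"], record["postcode"], record["dob"])

by_id = {identifier(r): r for r in electoral_roll}
for row in marketing_list:
    match = by_id.get(identifier(row))
    if match:
        # Two sources quietly merge into a richer profile ('Their Data'),
        # with the individual as the un-knowing point of integration.
        print({**match, **row})
```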

Propeller Current State

The ovals in the Venn diagram represent the static state, i.e. where data lives at a point in time. The flow arrows show where data flows to and from in this eco-system; I use red to signify data flowing under terms and conditions NOT controlled by the individual data subject.

Flow 1 (My Data to Your Data, and My Data to Our Data) – Individuals provide data to organisations under terms and conditions set by the organisation, with the individual being offered a ‘take it or leave it’ set of options. Some granularity is often offered around choices for onward data sharing and use, i.e. the ‘tick boxes’ we all know, which are one of the main bits of legacy CRM that VRM will fix.

Flow 2 (Your Data to Your Data, including Our Data) – Organisations share data with other organisations, usually through a back-channel, i.e. the details of the sharing relationship are typically not known to the data subject.

Flow 3 (Your Data, including Our Data to Their Data) – Organisations share data with a specific type of other organisation, data aggregators, under terms and conditions that enable onward sale. Typically the sharer is paid for this data/ has a stake in the re-sale value.

Flow 4 (Everybody’s Data to Their Data) – Data Aggregators use public domain data sources to initiate and extend their commercial data assets.

The target state is shown below – a different scenario altogether, and one which I believe will unfold incrementally over the next ten years or so: data attribute by data attribute, customer/ supplier management process by customer/ supplier management process, industry sector by industry sector. In this scenario, the individual and ‘My Data’ become the dominant source of many valuable data types (e.g. buying intentions, verified changes of circumstance), and in doing so eliminate vast amounts of guesswork and waste from existing customer/ citizen management processes.

The key new capabilities required to enable this to happen are those being worked on in the User Driven and Volunteered Personal Information work groups at Kantara (one tech group, one policy/ commerce one), and elsewhere within and around Project VRM. The new capabilities will consist of:

– personal data store(s), both operational and analytical

– data and technical standards around the sharing of volunteered personal information

– volunteered personal information sharing agreements (i.e. contracts driven by the individual perspective, with creative commons-like icons for VPI sharing scenarios; see the sketch after this list)

– audit and compliance mechanics
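As flagged above, here is a sketch of what a machine-readable VPI sharing agreement might look like: terms asserted by the individual, which the receiving entity must ‘sign’ before any data flows. The field names are invented for illustration and are not a proposed Kantara standard.

```python
import json

# A hypothetical individual-driven sharing agreement; every field name
# here is an illustrative assumption, not an agreed schema.
agreement = {
    "data_subject": "me",
    "recipient": "example-energy-supplier",
    "attributes": ["current_address", "moving_date"],  # buying intention data
    "purpose": "quote for home energy at my new address",
    "onward_sharing": "none",          # cf. creative commons-like icons
    "retention_days": 90,
    "audit_required": True,
}

def recipient_signs(agreement: dict, signatory: str) -> dict:
    """The data flow only becomes active once the recipient accepts the terms."""
    return {**agreement, "signed_by": signatory, "status": "active"}

active = recipient_signs(agreement, "example-energy-supplier")
print(json.dumps(active, indent=2))
```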

Around those capabilities, we will need to build a compelling story that clearly articulates, in a shared lexicon (thanks to Craig Burton for reminding us of the importance of this – watch this space), the benefits of the approach – for both individuals and organisations.

The target state that will emerge once these capabilities begin to impact will include 4 additional individual-driven information flows over and above the current ones. The defining characteristic of these new flows is that they can only be initiated by the data subject themselves, and most will only occur when the receiving entity has ‘signed’ the terms and conditions asserted by the individual/ data subject. The new flows are:

Flow 5 (My Data to Your Data, inc. Our Data) – Individuals will share more high-value, volunteered information with their existing and potential suppliers, eliminating guesswork and waste from many customer management processes. In turn, organisations will share their own expertise/ data with individuals, adding value to the relationship.

Flow 6 (Everybody’s Data to My Data) – With their new, more sophisticated personal information management tools, individuals will be able to take direct feeds from public domain sources for use in their own mashups and applications (e.g. crime maps covering where I live/ travel).

Flow 7 (My Data to (someone else’s) My Data) – An enhanced version of ‘peer to peer’ information sharing.

Flow 8 (My Data to Their Data) – The (currently) unlikely concept of the individual making their volunteered information available to/ through the data aggregators. Indeed we are already starting to see the plumbing for this new flow being put in place with the launch of the Acxiom Identity Card.

Propeller Target State

The implications of the above are enormous, my projection being that over time some 80% of customer management processes will be driven from ‘My Data’. I’m pretty confident about that: a) because we are already seeing the beginning of the change in the current rush for ‘user generated content’ (VPI without the contract), and b) because the economics will stack up. Organisations need data to run their operations – they don’t really mind where it comes from. So, if a new source emerges that is richer, deeper, more accurate, less toxic – and all at lower cost than existing sources – then organisations will use this source.

It won’t happen overnight, obviously; as mentioned above, specific tools, processes and commercial approaches need to emerge before this information begins to flow – and even then the shift will be slow but steady, probably beginning with Buying Intention data, as it is the most obvious entry point with enough impact to trigger the change. That said, the Mydex social enterprise already has a working proof of concept that shows much of the above in action. A technical write-up of the proof of concept build can be found here. And the market implications are explored in more detail in new research on the market value of VPI shortly to be published by Alan Mitchell at Ctrl-Shift.

The two hour session at the VRM workshop was barely enough to scratch the surface of the above issues, so the plan is to continue the dialogue and begin specifying the capabilities required in detail in the User Driven and Volunteered Personal Information (technology) workgroup at The Kantara Initiative. The workgroup charter can be found here. A parallel workgroup focused on business and policy aspects will also be launched in the next few weeks. Anyone wishing to get involved in the workgroup can sign up to the mailing list here and we’ll get started with the work in the next couple of weeks.


“The Personalisation of Today is Like Lipstick on a Pig….”

I just love that quote from James Gardner of LloydsTSB, who goes on to say…

‘No, the only way to get to markets of one is if customers make the products themselves. This is where the “mash up” I spoke of in my last post comes in. Customers, who are able to throw together bits of offers in unique ways, and then share them with other like-minded customers, are the way things will eventually pan out. These are crowds at the centre of the financial services value chain, which will be highly distributed, highly chaotic, but not subject to the system risks of a centralised banking system.’

Spot on, I’d say, and that’s where we’re looking to get to with Mydex – allowing the individual to genuinely be the point of integration for their personal data, and the processes/ applications/ mashups that engage with it. I don’t think banking will be the first to engage, but it will probably be a fast follower.

Scotweb 2

I’ll be speaking at this event next week in Edinburgh about VRM and the Mydex initiative.

Also, moves are afoot to get a Scotland-based ‘chapter’ up and running to do some local pushing forward on VRM initiatives.

CRM… meet VRM, the ‘three meeting theory’

I’ve had a couple more validations of this theory in the last few weeks, so thought I’d best write it up. My hope is that we can use the upcoming VRM Workshop to get the VRM story refined and presented so that we can reduce the number of meetings required to get to the detail of why an organisation should consider ‘VRM enabling’ itself.

So, here’s my theory:

It takes three fairly in-depth meetings for a smart, typically senior CRM/ Customer Management/ Customer Experience executive in a large customer-facing organisation to genuinely ‘get’ VRM and where we are coming from with the project and mind-set – and thus what’s in it for them.

Here’s how it usually pans out in my experience:

Meeting One: This usually happens on the back of an existing contact who has heard/ read some snippet about ‘VRM’, or it can be one of the more in-depth, small-group presentations that I and others have run in the last 12 months or so (mainly in the UK).

The outcome of this meeting, from the perspective of the CRM/ CM exec, is usually along the lines of ‘these people are well meaning, and obviously committed to their hobby, but a bit mad and naive as to what us big organisations have to deal with; but at least I’ve done my bit for keeping an eye on innovation in my space’. Alternatively, the shorter meetings can be driven by ‘don’t these people realise that we’ve just spent a zillion pounds on our CRM application and need to get that to work because we’ve told everyone it will’.

Most CRM… meet VRM discussions finish at this stage – for now, anyway.

Meeting Two: Let’s say that, at best, one in twenty of the above meetings ends up with a follow-up meeting, and that many of these come through ongoing personal contacts (where CRM/ CM work is going on in parallel), or happen because sufficient time has passed since meeting one for an update to be of possible value.

This is the meeting during which ‘the penny drops’ – but typically only in connection with a very small nugget of opportunity, often one which is front of mind for the exec at that point in time. Examples would include:

– yes, I know our data quality is shockingly bad… you mean we could work with our customers to fix that? Or

– so you mean we could accept these highly qualified leads into our existing CRM system with hardly any tweaks? Or

– so our customers can help us refine/ define our new products if we engage in the right way?

The outcome of this second meeting is usually ‘let me think about that’, and ‘is there anything up and running as a genuine VRM application that I can have a look at?’

Meeting Three: So now we’re down to a very small number of ‘almost converts’. These third meetings are typically much more ‘CRM/ CM/ CE Exec driven’ and are about:

– where do I see this stuff? (i.e. we are usually showing some of the behind the scenes development projects at this stage)

– how can I access it to play around with it, prototype it and build proofs of concept in my domain?

– can you meet up with our innovation folks to talk about a possible pilot?

Underpinning these third meetings is usually the realisation that what we VRM folks are talking about actually has a very sound economic argument, and also that we are about ‘win-win’ rather than consumer activism for the sake of it.

What happens after meeting three? I don’t know, to be honest – we’ve not had any yet that I’d count as such, although there are a couple lined up for June and July. I think for those meetings the challenge falls back onto the VRM community, or those of us building VRM-type solutions – we need to be able to answer the ‘meeting three challenges’ loud and clear.

What does that mean for Project VRM and our workshop this week? I think we need to get better at telling our big and complex story, probably in bite-sized chunks and in accessible ways – a good web site, for example. I think we also need to focus on getting some real, live pilots and proofs of concept out there to be engaged with. Let’s pick up on that on Friday.

Lastly, I’d have to add that the record for ‘getting it’ is actually nothing like my three meeting theory – it was about twenty minutes, and the only question at the end was ‘where do we sign up?’