I will be speaking with my colleague Stephan Mitchev on Big Data and its application at a presentation titled “Unity in Diversity: Towards Unified Data Future” at the Big Data & Analytics in Government Summit.
As the title suggests, we will focus not on the size aspect of “big data” but on its diversity — the fact that we work at an organization that deals with data of varied systems and formats and yet we need to put it to good use.
Starting with the notion that, as Kenneth Cukier eloquently explains, we have arrived at the era of “datafication”, we need to consider what we do with the massive volumes of data we collect.
Is there value to data that is collected but not used? Shouldn’t we make it a rule that:
- If we collect data, we need to analyze it
- Unless we analyze data, we should not collect it
Establishing these basic principles, we can then approach the architectural challenges of processing and analyzing massive volumes of data in order to gain insight from it.
What challenges do you face? How do you approach and resolve them?
Blogs throughout the world are reporting that there is an on-going and highly-distributed, global attack on WordPress installations across virtually every web host in existence.
HostGator and LunarPages hosting both posted on what to do to protect your WordPress-based site:
DELETE THE ‘ADMIN’ USER FOR YOUR WP SITE
Before you do that, make sure to create a new administrator account, log out from the original admin account, log into the new account and only then attempt to delete the old admin account.
CHANGE YOUR PASSWORDS REGULARLY
That should be a no-brainer but it is surprising how many sites get hacked because of simple passwords being used. The Geek Stuff offers some ideas for creating strong passwords but if your WordPress is updated, it will tell you if the new password is strong enough.
INSTALL SECURITY PLUGINS ON YOUR WP SITE
Limit Login Attempts, a terrific WordPress plugin, is a good start.
PASSWORD PROTECT YOUR WP-LOGIN PAGE
Your hosting company should offer this and if not, you should perhaps change your web hosting company. I can highly recommend LunarPages! Use code “aff15off” for 15% off of a new shared hosting account if you sign up today!
Stay safe!
Just after yesterday’s data visualization of the average commute time in the U.S., we now get another powerful data visualization tool courtesy of the USDA, this one mapping food deserts and the average time we commute to get to our food.
I am grateful to have grown up in a family which continues to produce quite a bit of its own fruits and vegetables in addition to my dad’s beekeeping, back in the Bulgarian village where my parents live and where I spent every weekend and vacation as a child. Here in the U.S., it is a very different story for the majority of people.
For a country as vast as the U.S., it is not surprising that there are massive areas where getting to food requires a long commute. The problem, I am sure, is multi-dimensional and is partially rooted in the way cities in this country are built but also in the frontier culture which pushes many people to sacrifice the convenient proximity to food and work for the independence of living on their own piece of land.
I am personally lucky to live within walking distance of the Giant, Harris Teeter and, most importantly, Trader Joe’s grocery stores. Occasionally I drive to Costco for some big purchases, but on the whole, if I needed to, I could walk or bike for my groceries every day, just like I did early this morning when I needed yogurt and bananas.
In the spirit of the Slow Food movement and Michael Pollan’s call to know where our food comes from, more and more people demand to know the origin of their food and the way it travels to their tables. Hence the emergence of search engines like BuyLocal.com.
The new Food Access Research Atlas should help with this noble endeavor as well!
When Kate Crawford of Microsoft Research presented at the 2013 Strata Conference, she gave powerful examples of how big data analysis and visualization can be skewed unless coupled with depth and context.
As NPR reported about the Food Access Research Atlas:
The atlas, which is a big upgrade from the USDA’s two-year-old Food Desert Locator, is intended as a tool for state policymakers, local planners, and nonprofit groups concerned about food access.
The team working on the Atlas has made this powerful data visualization tool doubly useful by mashing up data on the distance to food sources with data about car ownership. They admit to regretting not being able to add information about public transportation, which would have made the tool even greater by providing contextual depth, but such data is apparently not available on a national level.
Just as many of the presenters at the Strata Conference illustrated, when data is beautiful, we are more willing and able to consume it — not unlike healthy, organic food: if it is accessible and affordable, we will gladly opt to take advantage of it.
I wish the Atlas were not Flash-based. I wish it were built on a more open, flexible platform, Google Maps perhaps? I would have loved to be able to move from address to address quicker. But these are minor complaints. The Food Access Research Atlas is a welcome and powerful tool and its authors should be proud!
The DataNews team at WNYC has put together a stunning data visualization of the average commute time in this great country. According to the U.S. Census Bureau, whose data the talented data scientists and data artists up in New York used:
About 8.1 percent of U.S. workers have commutes of 60 minutes or longer, 4.3 percent work from home, and nearly 600,000 full-time workers had “megacommutes” of at least 90 minutes and 50 miles. The average one-way daily commute for workers across the country is 25.5 minutes, and one in four commuters leave their county to work.
This makes me appreciate the fact that most days I bike to work which is a good 30 min workout downhill and another 35-40 min really good workout uphill.
So much food for thought but nothing beats a beautiful picture:
“The programmers of tomorrow are the wizards of the future. You’re going to look like you have magic powers compared to everybody else.” – Gabe Newell
A friend today raised the valid question of why everybody should learn to code. It is a matter of competitiveness, I think. I sat today at a fascinating presentation with the Guardian Data team at the Strata Conference and it is clear that the immense data and the data analysis and visualization tools available today are enabling the type of journalism that a few years ago would have been impossible, ignored, or at best stumbled upon by luck. Moreover, as my professor of global business used to joke: nowadays only your local barbershop is truly local, and even this might be disputed (the ladies who cut my hair are all Vietnamese). So, put globalization and data overflow together, and you arrive at a world that is inherently more complex than the one inhabited by our grandparents. For that reason, the basic skill of pattern recognition (which my daughters study in elementary school) should be augmented by the equally basic skill of algorithm building and programming, a logical process, as my friend rightly noted. I see it also in the context of consuming vs. co-creating. A few years ago not many people would have considered computer skills essential; now they are the norm. But we should not stop at using computers only to consume. Once the kids learn how to co-create using computers, many of the current challenges will meet their undoubtedly unexpected solutions.
Yes, it is Cyber Monday and there is a handy website which in one convenient list will give you the available discounts from 800 online retailers.
And of course, my favorite Amazon has its own list of Cyber Monday deals.
And if you wondered how much business was done on Black Friday via Twitter referrals, Big Blue has provided a report:
What to make of it all? As Business Insider’s Henry Blodget analyzed:
So much for the power of social media. Now back to work!
I attended a breakfast discussion on data governance this morning. I was given a copy of the book “Data Governance For The Executive” by James Orr, who was one of the presenters.
Here are a few brief, albeit disjointed, notes on what I found most interesting:
When looking at the evolution of IT systems from the mainframe days on, one can see a shift in perception: in the beginning code was deemed to be more important than data, while nowadays data is clearly more important than code.
Studies show that the growth in unstructured data is not matched by a growth in the management of that data.
To ensure semantic interoperability, some organizations focus on the desired outcomes which in turn dictate the activities, which in turn define the resources necessary. Others start with defining a common business glossary.
A main question discussed was how to infuse data quality into agile development. Data management needs to become critical to the mission of the organization and needs to be embedded into the software development processes, not treated as an afterthought. Without top-down support and sufficient failure, any such effort is doomed to fail.
Another important question discussed was what counts as core data: what data is important to key business processes on a departmental level, and what is critical to the enterprise. The flip side is that if some data is not important, we might not need to collect it.
Always start with why? What is the benefit? What is the compelling rationale for the initiative? Be mindful of the fact that business processes are tightly linked with the creation of data.
Tie operational incidents to data quality and data governance. Tie data governance to the prevention of exposure.
Crisis is a terrible thing to waste; a disaster – even more so.
Be mindful when to use tech language and when to use business language during the necessary translation between business and IT.
Dux Raymond Sy presented a very informative webinar organized by O’Reilly on the “5 deadly sins of SharePoint in the Enterprise”. The topic is of interest to me because, as I was discussing with my colleagues, our company is guilty of all of the sins Dux listed:
Here are my notes from the webinar, presented here with gratitude to Dux!
The SharePoint implementation continuum ranges from draconian IT control to the wild, wild west, but you want to aim for governed empowerment.
SharePoint Governance is more than just a document:
Dux provided a Sample governance plan!
See IA Design Guidance from Microsoft for example.
It is common to see IT take a technology-first approach and roll out SharePoint with little consideration for the user impact:
See training and adoption resources:
Dux showed us a very handy spreadsheet for analyzing the scope of effort for each functionality of SharePoint implementation based on business priorities.
A common theme among IT departments is that SharePoint doesn’t get enough executive attention and support. Executives want the benefits but fail to make the investments that are necessary.
Lack of understanding of how SharePoint can deliver business value is the culprit:
When executives understand the Return portion of the equation, it is much easier to get them to commit to the Investment side.
Dux clearly knows SharePoint!
Today I attended a fabulous session of the Web Managers Roundtable at AARP which featured a presentation by Dan Siroker, the former Director of Analytics for the Obama campaign, and co-founder of the A/B testing company Optimizely.
As the group of 50 or so professionals were introducing themselves, I couldn’t help but notice how for so many organizations analytics falls in the same category as social media. Then again, as Dan mentioned in his presentation, on the Obama campaign anything that was not well understood went under the New Media umbrella.
Titles aside, the Obama campaign clearly did many things right, as Dan demonstrated with a couple of data points:
In his presentation, subtitled “Lessons from Obama to Haiti”, Dan Siroker, who had also been approached by the Clinton Bush Fund for Haiti to help them optimize their fund-raising campaign, shared five lessons:
Define quantifiable success metrics, for example:
Website click thru rate = # of clicks / # of impressions
Email signup rate = # of signups / # of pageviews
Raised money per recipient = $ amount raised / # of recipients
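Each of these metrics is a simple ratio, so they are easy to compute once you have the raw counts. Here is a minimal sketch in Python; the helper name `rate` and all of the counts are made up for illustration and are not figures from Dan’s presentation:

```python
def rate(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, invented purely for illustration.
clicks, impressions = 120, 4000
signups, pageviews = 90, 1000
amount_raised, recipients = 2500.0, 10000

click_through_rate = rate(clicks, impressions)          # clicks / impressions
email_signup_rate = rate(signups, pageviews)            # signups / pageviews
raised_per_recipient = rate(amount_raised, recipients)  # dollars / recipients

print(f"Website click-through rate: {click_through_rate:.2%}")
print(f"Email signup rate: {email_signup_rate:.2%}")
print(f"Raised money per recipient: ${raised_per_recipient:.2f}")
```

Guarding against a zero denominator is a design choice for this sketch: a brand new page with no impressions yet reports a rate of zero instead of crashing.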
This was one of the most fun parts of the presentation because Dan engaged the audience in a live multivariate test. He showed us several options, for the media and for the button on the splash page of the Obama campaign website, that were considered two nights before the Iowa primaries, and had us all vote for what we felt would be best. Very few of us guessed what the data had shown to work best: the “Learn More” button and an “Obama family photo”. And that was exactly the point: by questioning all assumptions and relying on data, you can arrive at gradual improvements that lead to real results.
Selecting the “Learn More” button for the email signup over the previous “Join Us Now” button had increased signup rate by 18.6%.
Choosing the “Obama family photo” for the media choice on the splash page had brought an additional 13.3% improvement.
The majority of the participants, myself included, had chosen a rousing, inspiring video from the Springfield conference, but as Dan explained, visitors want to skip a splash page quickly and get into the site, so the media choice had to be something simple. A long video had its place, but not on the splash page.
The combination of these two optimization factors led to a 40.6% combined improvement, which led to approximately 2.88 million additional email subscriptions, 288,000 additional volunteers and $57 million in additional contributions. That is real money!
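For anyone who wants to run the same kind of comparison on their own pages, here is a minimal sketch of how a percentage improvement can be computed. It assumes the percentages are relative lifts, meaning the change in signup rate divided by the original rate. The function name `relative_lift` and the rates below are hypothetical and are not the campaign’s actual data:

```python
def relative_lift(baseline_rate: float, variant_rate: float) -> float:
    """Relative improvement of a variant over the baseline, as a fraction."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (variant_rate - baseline_rate) / baseline_rate


# Hypothetical signup rates, invented purely for illustration.
original_rate = 0.10  # signup rate of the original page
variant_rate = 0.12   # signup rate of the page being tested

lift = relative_lift(original_rate, variant_rate)
print(f"Relative improvement: {lift:.1%}")
```

Measuring the lift against the original rate is what makes results from different pages comparable: two extra signups per hundred visitors is a much bigger win on a page that started at 5% than on one that started at 50%.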
This lesson was illustrated by another demonstration of a multivariate test, this time indicating that audience matters: what works for one audience, or time, or place, might not work for another.
You can never predict life, so with all the data and powerful tools you might have at your disposal, you still need to have the flexibility to adjust course depending on what circumstances present themselves to you. A couple of well-known examples illustrated this point well. Being present in the moment is important for all of us but probably even more valuable to a numbers person!
Indeed, with tools like Google Website Optimizer and Optimizely (whose public beta I hope to join soon), there is no need to postpone starting to optimize your website! Shall we start? Let’s do it!
Today, after a very fruitful collaboration with a talented team of volunteers, I launched the newly redesigned website for Slow Food DC, which is the Washington, DC metropolitan area chapter of Slow Food. Its mission is very worthwhile, “Supporting Good, Clean, and Fair Food”, and I was delighted to assist them.
I got involved with this project after an old friend of mine, Alexandra Greeley, suggested my services to the chairwoman of Slow Food DC, Kati Gimes. Before I knew it, I was consulting with a group of young, dedicated volunteers, many of whom have themselves done wonderful work in website redesign or social media management for other DC-based non-profits.
During our meetings, we discussed the content strategy, the target audiences, the information architecture, and the technological platform. A young Photoshop aficionado took it upon herself to design the banner incorporating the new Slow Food DC logo.
The site uses the new WordPress theme twentyone which is extremely powerful. It utilizes plug-ins for Google Analytics, Twitter and RSS feeds. It truly is a marvelous content management system which will remove the burden of needing a webmaster any time a new content update is needed.
Congratulations, Slow Food DC! Cheers!