WebSage - Web Intelligence, Web Analytics, Web Marketing, Data Visualization

Questions for Big Data in Government Summit

Big Data and Analytics in Government

I will be speaking with my colleague Stephan Mitchev on Big Data and its application in a presentation titled “Unity in Diversity: Towards Unified Data Future” at the Big Data & Analytics in Government Summit.

As the title suggests, we will focus not on the size aspect of “big data” but on its diversity — the fact that we work at an organization that deals with data of varied systems and formats and yet we need to put it to good use.

The Era of “Datafication”

Starting with the notion that, as Kenneth Cukier eloquently explains, we have arrived at the era of “datafication”, we need to consider what we do with the massive volumes of data we collect.

Unless we make use of data, should we collect it?

Is there value to data that is collected but not used? Shouldn’t we make it a rule that:

- If we collect data, we need to analyze it

- Unless we analyze data, we should not collect it

Establishing these basic principles, we can then approach the architectural challenges of processing and analyzing massive volumes of data in order to gain insight from it.

What challenges do you face? How do you approach and resolve them?

Bulletproofing your WordPress site against a brute force attack

Blogs throughout the world are reporting that there is an ongoing, highly distributed global attack on WordPress installations across virtually every web host in existence.

HostGator and LunarPages hosting both posted on what to do to protect your WordPress-based site:

DELETE THE ‘ADMIN’ USER FOR YOUR WP SITE

Before you do that, make sure to create a new administrator account, log out from the original admin account, log into the new account and only then attempt to delete the old admin account.

CHANGE YOUR PASSWORDS REGULARLY

That should be a no-brainer, but it is surprising how many sites get hacked because of simple passwords. The Geek Stuff offers some ideas for creating strong passwords, but if your WordPress installation is up to date, it will tell you whether the new password is strong enough.

INSTALL SECURITY PLUGINS ON YOUR WP SITE

Limit Login Attempts, a terrific WordPress plugin, is a good start.

PASSWORD PROTECT YOUR WP-LOGIN PAGE

Your hosting company should offer this and if not, you should perhaps change your web hosting company. I can highly recommend LunarPages! Use code “aff15off” for 15% off of a new shared hosting account if you sign up today!

Stay safe!

Data Visualization of the Food Commute

Food Access Research Atlas

Just after yesterday’s data visualization of the average commute time in the U.S., we now get another powerful data visualization tool courtesy of the USDA, this one mapping food deserts and the average time we commute to get to our food.

Food and Commute

I am grateful to have grown up in a family which continues to produce quite a bit of its own fruits and vegetables, in addition to my dad’s beekeeping, back in the Bulgarian village where my parents live and where I spent every weekend and vacation as a child. Here in the U.S., it is a very different story for the majority of people.

For a country as vast as the U.S., it is not surprising that there are massive areas where getting to food requires a long commute. The problem, I am sure, is multi-dimensional. It is partially rooted in the way cities in this country are built, but also in the frontier culture which pushes many people to sacrifice convenient proximity to food and work for the independence of living on their own piece of land.

I am personally lucky to live within walking distance of Giant, Harris Teeter and, most importantly, Trader Joe’s grocery stores. Occasionally I drive to Costco for some big purchases, but on the whole, if I needed to, I could walk or bike for my groceries every day, just like I did early this morning when I needed yogurt and bananas.

In line with the Slow Food movement and Michael Pollan’s call to know where our food comes from, more and more people demand to know the origin of their food and the way it travels to their tables. Hence the emergence of search engines like BuyLocal.com.

The new Food Access Research Atlas should help with this noble endeavor as well!

Data Is Contextual, Powerful and Beautiful

When Kate Crawford of Microsoft Research presented at the 2013 Strata Conference, she gave powerful examples of how big data analysis and visualization can be skewed unless coupled with depth and context.

As NPR reported about the Food Access Research Atlas:

The atlas, which is a big upgrade from the USDA’s two-year-old Food Desert Locator, is intended as a tool for state policymakers, local planners, and nonprofit groups concerned about food access.

The team working on the Atlas has made this powerful data visualization tool doubly useful by combining data on the distance to food sources with data about car ownership. They admit regretting not being able to add information about public transportation, which would have made the tool even greater by providing contextual depth, but such data is apparently not available at the national level.

Accessible Data is Usable Data

As many of the presenters at the Strata Conference illustrated, when data is beautiful, we are more willing and able to consume it. It is not unlike healthy, organic food: if it is accessible and affordable, we will gladly opt to take advantage of it.

I wish the Atlas were not Flash-based. I wish it were built on a more open, flexible platform (Google Maps, perhaps?). I would have loved to be able to move from address to address more quickly. But these are minor complaints. The Food Access Research Atlas is a welcome and powerful tool and its authors should be proud!

Stunning Data Visualization of the Average Commute Time

The DataNews team at WNYC has put together a stunning data visualization of the average commute time in this great country. According to the U.S. Census Bureau, whose data the talented data scientists and data artists up in New York used:

About 8.1 percent of U.S. workers have commutes of 60 minutes or longer, 4.3 percent work from home, and nearly 600,000 full-time workers had “megacommutes” of at least 90 minutes and 50 miles. The average one-way daily commute for workers across the country is 25.5 minutes, and one in four commuters leave their county to work.

This makes me appreciate the fact that most days I bike to work which is a good 30 min workout downhill and another 35-40 min really good workout uphill.

So much food for thought but nothing beats a beautiful picture:

Strata Conference – Data Journalism and Coding

“The programmers of tomorrow are the wizards of the future. You’re going to look like you have magic powers compared to everybody else.” – Gabe Newell

A friend today raised the valid question of why everybody should learn to code. It is a matter of competitiveness, I think. Today I sat in on a fascinating presentation by the Guardian Data team at the Strata Conference, and it is clear that the immense data and the data analysis and visualization tools available today enable a type of journalism that a few years ago would have been impossible, ignored, or at best stumbled upon by luck.

Moreover, as my professor of global business used to joke, nowadays only your local barbershop is truly local, and even this might be disputed (the ladies who cut my hair are all Vietnamese). So, put globalization and data overflow together, and you arrive at a world that is inherently more complex than the one inhabited by our grandparents. For that reason, the basic skill of pattern recognition (which my daughters study in elementary school) should be augmented by the equally basic skill of algorithm building and programming, a logical process, as my friend rightly noted.

I also see it in the context of consuming vs. co-creating. A few years ago not many people considered computer skills essential; now they are the norm. But we should not stop at using computers only to consume. Once kids learn how to co-create using computers, many of the current challenges will meet their undoubtedly unexpected solutions.

Cyber Monday deal list and Black Friday report

Yes, it is Cyber Monday and there is a handy website which in one convenient list will give you the available discounts from 800 online retailers.

And of course, my favorite, Amazon, has its own list of Cyber Monday deals.

And if you wondered how much business was done on Black Friday via Twitter referrals, Big Blue has provided a report:

  • Consumer Spending Increases: Online sales on Thanksgiving grew by 17.4 percent followed by Black Friday where sales increased 20.7 percent over last year.
  • Mobile Shopping: Mobile purchases soared with 24 percent of consumers using a mobile device to visit a retailer’s site, up from 14.3 percent in 2011. Mobile sales exceeded 16 percent, up from 9.8 percent in 2011.
  • The iPad Factor: The iPad generated more traffic than any other tablet or smart phone, reaching nearly 10 percent of online shopping. This was followed by iPhone at 8.7 percent and Android 5.5 percent. The iPad dominated tablet traffic at 88.3 percent followed by the Barnes and Noble Nook at 3.1 percent, Amazon Kindle at 2.4 percent and the Samsung Galaxy at 1.8 percent.
  • Multiscreen Shopping: Consumers shopped in store, online and on mobile devices simultaneously to get the best bargains. Overall 58 percent of consumers used smartphones compared to 41 percent who used tablets to surf for bargains on Black Friday.
  • The Savvy shopper: While consumers spent more overall, they shopped with greater frequency to take advantage of retailer deals and free shipping. This led to a drop in average order value by 4.7 percent to $181.22. In addition, the average number of items per order decreased 12 percent to 5.6.
  • Social Media Sentiment Index: Shoppers expressed positive consumer sentiment on promotions, shipping and convenience as well as the retailers themselves at a three to one ratio.
  • Social Sales: Shoppers referred from Social Networks such as Facebook, Twitter, LinkedIn and YouTube generated .34 percent of all online sales on Black Friday, a decrease of more than 35 percent from 2011.

What to make of it all? As Business Insider’s Henry Blodget noted in his analysis:

  • The average Black Friday online shopper bought 5.6 items per order. That’s down 13% from last year. It’s also down 40% from Friday, November 16th, a week earlier. Hard to know what to make of that.
  • The average shopping “session” length was 6 minutes and 39 seconds. That’s down about 10% from last year. Compare that to the average hellish shopping session in a physical store, and you’ll see why ecommerce is continuing to grow as a percent over overall retail sales.
  • Only 0.68% of Black Friday online sales came from Facebook referrals–two-thirds of one percent. That was a decline of 1% from last year.
  • Commerce site traffic from Twitter accounted for exactly 0.00% of Black Friday traffic. That was down from 0.02% last year.

So much for the power of social media. Now back to work!

Data Governance for Executives – Notes from the Roundtable at the National Press Club

I attended a breakfast discussion on data governance this morning. I was given a copy of the book “Data Governance For The Executive“ by James Orr who was one of the presenters.

Here are a few brief, albeit disjointed, notes on what I found most interesting:

Looking at the evolution of IT systems from the mainframe days on, one can see a shift in perception: in the beginning code was deemed more important than data, while nowadays data is clearly more important than code.

Studies show that the growth in unstructured data is not matched by a growth in the management of that data.

To ensure semantic interoperability, some organizations focus on the desired outcomes which in turn dictate the activities, which in turn define the resources necessary. Others start with defining a common business glossary.

A main question discussed was how to infuse data quality into agile development. Data management needs to become critical to the mission of the organization and needs to be embedded into the software development processes, not treated as an afterthought. Without top-down support and sufficient failure, any such effort is doomed to fail.

Another important question discussed was what counts as core data: what data is important to key business processes at the departmental level, and what is critical to the enterprise. The flip side is that if some data is not important, we might not need to collect it.

Always start with why? What is the benefit? What is the compelling rationale for the initiative? Be mindful of the fact that business processes are tightly linked with the creation of data.

Tie operational incidents to data quality and data governance. Tie data governance to the prevention of exposure.

Crisis is a terrible thing to waste; a disaster – even more so.

Be mindful when to use tech language and when to use business language during the necessary translation between business and IT.

5 deadly sins of SharePoint in the Enterprise

Dux Raymond Sy presented a very informative webinar organized by O’Reilly on the “5 deadly sins of SharePoint in the Enterprise”. The topic is of interest to me because, as I was discussing with my colleagues, our company is guilty of all of the sins Dux listed:

  • Treating Governance as a one-time event
  • IT leadership abdicating responsibility for Information Architecture and Roadmap
  • Treating user adoption & training as an afterthought
  • Underestimating Human Resource implications
  • Failing to educate and engage executives

Here are my notes from the webinar, presented here with gratitude to Dux!

The SharePoint implementation continuum ranges from draconian IT control to the Wild Wild West, but you want to aim for governed empowerment.

Sin #1: Governance as a One Time Event:

SharePoint Governance is more than just a document:

  • Importance of SharePoint governance plan in addition to general IT governance
  • Balance of collaboration capabilities and their benefits vs. organizational requirements
  • Ensuring the plan evolves with the business
  • Who creates the plan? Who owns the plan? Who enforces the plan?
  • Defined roles and responsibilities
  • Making governance part of the training process
  • Enforcement vs. empowerment approach

Dux provided a Sample governance plan!

Sin #2: Leadership Not Involved in IA

  • Inability to find information quickly and easily is one of the biggest frustrations.
  • Involving business decision makers in information architecture design is critical
  • Establishing a common language structure can greatly improve findability
  • Having a defined taxonomy strategy and ongoing management of metadata
  • Understanding how to tag information correctly
  • Will out-of-the-box provide sufficient search capabilities or is a custom solution necessary?

See IA Design Guidance from Microsoft for example.

Sin #3: Training & Adoption an Afterthought

It is common to see IT take a technology-first approach and roll out SharePoint with little consideration for the user impact:

  • Engage the business early on – identify and prioritize business solutions
  • Always remember the WIIFM user mindset
  • Relevant training approach (Process vs technology focused)
  • Support mechanism

See training and adoption resources:

Sin #4: Underestimate Human Resources Requirements for SharePoint

  • Management overlooks how broad SharePoint is and relies on existing personnel with little expertise to deliver platform and business solutions
  • Various skillsets are needed — not just developer and / or admin
  • Having a SharePoint business analyst is key

Dux showed us a very handy spreadsheet for analyzing the scope of effort for each functionality of SharePoint implementation based on business priorities.

Sin #5:  Failed Exec Education & Engagement

A common theme among IT departments is that SharePoint doesn’t get enough executive attention and support. Executives want the benefits but fail to make the investments that are necessary.

Lack of understanding of how SharePoint can deliver business value is the culprit:

  • What are you trying to accomplish and why?
  • What is the value? If it is quantifiable, then half the battle is won.

When executives understand the Return portion of the equation, it is much easier to get them to commit to the Investment side.

Dux clearly knows SharePoint!

How Can Website Optimization Help You?

Today I attended a fabulous session of the Web Managers Roundtable at AARP which featured a presentation by Dan Siroker, the former Director of Analytics for the Obama campaign, and co-founder of the A/B testing company Optimizely.

As the group of 50 or so professionals were introducing themselves, I couldn’t help but notice that for so many organizations analytics falls into the same category as social media. Then again, as Dan mentioned in his presentation, on the Obama campaign anything that was not well understood went under the New Media umbrella :-)

Titles aside, the Obama campaign clearly did many things right, as Dan demonstrated with a few data points:

  • Facebook friends – 2.4 million for Obama vs 0.6 million for McCain
  • YouTube video views – ~100 million for Obama vs ~20 million for McCain
  • Unique website visitors – ~130 million for Obama vs ~30 million for McCain
  • ~$500 million raised online

In his presentation, subtitled “Lessons from Obama to Haiti”, Dan Siroker, who had also been approached by the Clinton Bush Fund for Haiti to help optimize their fundraising campaign, shared five lessons:

1. Define Success

Define quantifiable success metrics, for example:

Website click-through rate = # of clicks / # of impressions

Email signup rate = # of signups / # of pageviews

Raised money per recipient = $ amount raised / # of recipients
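These three ratios are simple enough to compute in a few lines. Here is a minimal sketch in Python; the helper name `rate` and all of the counts are hypothetical, chosen only to illustrate the formulas above, and are not figures from Dan's presentation.

```python
def rate(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, for illustration only (not campaign data).
clicks, impressions = 120, 4_000
signups, pageviews = 90, 1_500
amount_raised, recipients = 2_500.0, 10_000

click_through_rate = rate(clicks, impressions)          # clicks / impressions
email_signup_rate = rate(signups, pageviews)            # signups / pageviews
raised_per_recipient = rate(amount_raised, recipients)  # dollars / recipients

print(f"Click-through rate: {click_through_rate:.1%}")
print(f"Email signup rate: {email_signup_rate:.1%}")
print(f"Raised per recipient: ${raised_per_recipient:.2f}")
```

The zero check in `rate` is a design choice for this sketch: a page with no impressions yet reports 0.0 rather than raising an error.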

2. Question Assumptions

This was one of the most fun parts of the presentation because Dan engaged the audience in a live multivariate test. He showed us several options for the media and for the button on the splash page of the Obama campaign website that were considered two nights before the Iowa primaries, and had us all vote for what we felt would work best. Very few of us guessed what the data had shown to work best: the “Learn More” button and an “Obama family photo”. And that was exactly the point: by questioning all assumptions and relying on data, you can arrive at gradual improvements that lead to real results.

Selecting the “Learn More” button for the email signup over the previous “Join Us Now” button increased the signup rate by 18.6%.

Choosing the “Obama family photo” as the media on the splash page brought an additional 13.3% improvement.

The majority of the participants, myself included, had chosen a rousing, inspiring video from the Springfield conference, but as Dan explained, the point of a splash page is for visitors to skip past it quickly and get into the site, so the media choice had to be something simple. A long video had its place, but not on the splash page.

The combination of these two optimization factors led to a 40.6% combined improvement, which led to approximately 2.88 million additional email subscriptions, 288,000 additional volunteers and $57 million in additional contributions. That is real money!
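For readers who want to run the same kind of comparison on their own site, here is a minimal sketch in Python. It assumes the percentages quoted above are relative improvements of one variation's signup rate over a baseline rate. The function names and the visitor and signup counts are hypothetical and are not the campaign's data.

```python
def signup_rate(signups: int, visitors: int) -> float:
    """Fraction of visitors who signed up."""
    return signups / visitors


def relative_improvement(baseline_rate: float, variant_rate: float) -> float:
    """Relative change of the variant's rate compared with the baseline's."""
    return (variant_rate - baseline_rate) / baseline_rate


# Hypothetical test results, for illustration only.
baseline = signup_rate(signups=800, visitors=10_000)   # original page
variant = signup_rate(signups=1_000, visitors=10_000)  # new variation

lift = relative_improvement(baseline, variant)
print(f"Baseline {baseline:.1%}, variant {variant:.1%}, improvement {lift:.1%}")
```

Note that this sketch only compares observed rates. It says nothing about whether the difference is statistically significant, which a real test would also need to check.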

3. Divide & Conquer

This lesson was illustrated by another demonstration of multivariate testing, this time indicating that audience matters: what might work for one audience, or time, or place, might not work for another.
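One way to picture the idea is to break test results down by audience segment and pick the best variation for each, instead of one overall winner. The following is a minimal Python sketch of that approach; the segment names, variation labels and counts are all hypothetical, and this is not a description of how the campaign's tooling worked.

```python
# Hypothetical results: (segment, variation, visitors, conversions).
results = [
    ("new visitors", "A", 1_000, 50),
    ("new visitors", "B", 1_000, 70),
    ("returning visitors", "A", 1_000, 90),
    ("returning visitors", "B", 1_000, 60),
]


def best_variation_per_segment(rows):
    """Map each segment to its (variation, conversion rate) with the highest rate."""
    best = {}
    for segment, variation, visitors, conversions in rows:
        conversion_rate = conversions / visitors
        if segment not in best or conversion_rate > best[segment][1]:
            best[segment] = (variation, conversion_rate)
    return best


winners = best_variation_per_segment(results)
for segment, (variation, conversion_rate) in winners.items():
    print(f"{segment}: variation {variation} wins at {conversion_rate:.1%}")
```

In this made-up data set, variation B wins with new visitors while variation A wins with returning visitors, which is exactly the kind of difference an overall average would hide.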

4. Take Advantage of Circumstances

You can never predict life, so with all the data and powerful tools you might have at your disposal, you still need the flexibility to adjust course depending on what circumstances present themselves to you. A couple of well-known examples illustrated this point well. Being present in the moment is important for all of us but probably even more valuable to a numbers person!

5. Start Today

Indeed, with tools like Google Website Optimizer and Optimizely (whose public beta I hope to join soon), there is no need to postpone starting to optimize your website! Shall we start? Let’s do it!

New website for Slow Food DC

Today, after a very fruitful collaboration with a talented team of volunteers, I launched the newly redesigned website for Slow Food DC, the Washington, DC metropolitan area chapter of Slow Food. Its mission, “Supporting Good, Clean, and Fair Food”, is very worthwhile, and I was delighted to assist them.

Slow Food DC

I got involved with this project after an old friend of mine, Alexandra Greeley, recommended my services to the chairwoman of Slow Food DC, Kati Gimes. Before I knew it, I was consulting with a group of dedicated young volunteers, many of whom have themselves done wonderful work in website redesign or social media management for other DC-based non-profits.

During our meetings, we discussed the content strategy, the target audiences, the information architecture, and the technological platform. A young Photoshop aficionado took it upon herself to design the banner incorporating the new Slow Food DC logo.

The site uses the new WordPress theme twentyone, which is extremely powerful. It uses plug-ins for Google Analytics, Twitter and RSS feeds. It truly is a marvelous content management system, which will lift the burden of needing a webmaster any time a content update is needed.

Congratulations, Slow Food DC! Cheers!