Power BI and Azure as the future of enterprise analytics


Combine Power BI with the various Azure data processing services and you get the next generation of business intelligence and analytics

It’s not surprising that many of Microsoft’s own services are built on Azure, but increasingly Microsoft is also offering Azure services as a way for customers to extend and customize products.

When you use dataflows to extract, clean, and transform data that you’re loading into Power BI, that data is stored in Azure Data Lake. From there you can also use it in Azure Databricks, analyze it through Azure SQL Data Warehouse via the Azure portal, or make it interactive using the Power BI Desktop app.

The automated machine learning in Power BI is the AutoML feature from Azure Machine Learning, which looks at what you’re trying to predict and what data you have available, and iterates through multiple machine-learning algorithms to discover which gets the best score. Or you can take advantage of Azure Cognitive Services to analyze the data in images and text, or build your own machine-learning models and run them.
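
The search loop behind this kind of automated model selection can be sketched in a few lines. The dataset and the candidate "models" below are hypothetical toy stand-ins (simple decision rules rather than real algorithms); Azure AutoML iterates over real learning algorithms and hyperparameters, but the try-score-keep-the-winner shape is the same.

```python
# Minimal sketch of an AutoML-style search: score every candidate model
# on the same data and keep whichever scores best.

# Hypothetical labelled data: (nights_stayed, used_spa) -> returned (1) or not (0)
data = [((1, 0), 0), ((2, 0), 0), ((3, 1), 1), ((5, 1), 1), ((4, 0), 1), ((1, 1), 0)]

# Toy candidate "models": simple rules standing in for real algorithms.
candidates = {
    "rule_spa": lambda x: 1 if x[1] == 1 else 0,
    "rule_long_stay": lambda x: 1 if x[0] >= 3 else 0,
    "rule_always_no": lambda x: 0,
}

def accuracy(model):
    # Fraction of rows the candidate predicts correctly.
    return sum(model(x) == y for x, y in data) / len(data)

scores = {name: accuracy(fn) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```

On this toy data the "stayed three nights or more" rule wins outright; a real AutoML run reports the leaderboard of scores rather than just the winner.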

Power BI also now has built-in AI-powered visualizations like Key Influencers, which runs different statistical analyses, like logistic regression or classification, on the data to extract the key factors associated with a particular outcome. You drag the factors you think are important into the visualization and Power BI ranks them. As you add more factors that you think might be relevant or drill into a specific segment, it keeps re-running the model to see if more information reveals anything new.
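
The core idea of ranking factors by their association with an outcome can be sketched with a much cruder proxy than the statistical models Power BI actually fits: compare the outcome rate with and without each factor, and rank factors by the size of the gap. The guest records below are hypothetical.

```python
# Toy proxy for Key Influencers: rank factors by how much the outcome
# rate differs when the factor is present vs absent. (Power BI fits
# real models such as logistic regression; this is only the intuition.)

guests = [  # hypothetical hotel-guest records
    {"restaurant": 1, "spa": 1, "returned": 1},
    {"restaurant": 1, "spa": 0, "returned": 1},
    {"restaurant": 0, "spa": 1, "returned": 1},
    {"restaurant": 0, "spa": 0, "returned": 0},
    {"restaurant": 1, "spa": 0, "returned": 1},
    {"restaurant": 0, "spa": 0, "returned": 0},
]

def influence(rows, factor, outcome="returned"):
    # Outcome rate with the factor minus the rate without it.
    with_f = [r[outcome] for r in rows if r[factor] == 1]
    without = [r[outcome] for r in rows if r[factor] == 0]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(with_f) - rate(without)

ranked = sorted(["restaurant", "spa"], key=lambda f: influence(guests, f), reverse=True)
print(ranked)
```

Drilling into a segment, as the article describes, corresponds to filtering `guests` down to a slice and re-running the same ranking on just those rows.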

So if you are analyzing which visitors come back to your hotel and stay again, the Key Influencer might be which country they’re from. But if you select visitors in a certain age group, the model runs on just that slice of data, where the Key Influencer might be whether they ate in the hotel restaurant or had a spa treatment. If you’re looking at shipping delays, you can add factors like which division sent the delivery, what factory it came from, or what area it was being sent from to see what has the most effect on what arrives on time and what’s delivered late.

There are two new AI visualizations. Distribution Change looks for what makes one data distribution different from another. The Decomposition Tree sends multiple queries to the Power BI model and then links them together, so you can click on a metric in a visualization to see what’s behind it, and then keep clicking down through the different levels of data to understand it in depth. That way, you can see if those 500 sales in one city are driven by a particular group of customers or by many different customers who still have something in common.
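
The drill-down mechanic behind the Decomposition Tree amounts to issuing one aggregation query per level, each scoped to the slice the user clicked. A minimal sketch, using hypothetical sales rows in place of a Power BI data model:

```python
# Sketch of Decomposition Tree drill-down: start from a total, break it
# down by one field, then break a chosen slice down by the next field.
from collections import defaultdict

sales = [  # hypothetical rows: (city, customer_group, amount)
    ("Cape Town", "corporate", 300),
    ("Cape Town", "corporate", 150),
    ("Cape Town", "retail", 50),
    ("Durban", "retail", 200),
]

def drill(rows, field_index):
    # One aggregation query: total amount grouped by the chosen field.
    totals = defaultdict(int)
    for row in rows:
        totals[row[field_index]] += row[2]
    return dict(totals)

level1 = drill(sales, 0)                       # by city
cape = [r for r in sales if r[0] == "Cape Town"]
level2 = drill(cape, 1)                        # within Cape Town, by group
print(level1, level2)
```

Each click in the visual corresponds to another filtered `drill` call, which is why the visual can keep descending through as many levels as the model has fields.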

All of this can feed into the visualizations, dashboards, and natural-language Q&A features that Power BI is known for, as well as the new paginated reports that previously required SQL Server. For example, when you use the automated machine learning, the prediction for each row includes details of what contributed to it, so you could include the explanation in a report to clarify where the figures come from and what factors appear to be involved.


Data pros

Power BI has different paths to these AI capabilities, depending on whether you’re a data scientist who wants to make their work available to the rest of the business or an analyst who wants to use machine learning but doesn’t have the skills to build it themselves.

Data scientists can add steps to a dataflow to extract information from unstructured data like images or text from tweets or reviews, by extracting keywords, doing sentiment analysis or detecting what’s in a photograph. That’s powered by Cognitive Services, but without the usual steps of writing the code to call the API – you can just add the image and text analytics to the dataflow.
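
The enrich-each-row pattern of such a dataflow step can be illustrated offline. The lexicon scorer below is a hypothetical stand-in for the real scoring, which Cognitive Services performs behind the dataflow step; only the shape of the transformation (row in, row plus sentiment out) is the point.

```python
# Toy stand-in for a sentiment-analysis dataflow step. The word lists
# are hypothetical; in Power BI the actual scoring is done by Azure
# Cognitive Services, not by a lexicon like this.

POSITIVE = {"great", "lovely", "clean"}
NEGATIVE = {"noisy", "dirty", "broken"}

def sentiment(text):
    # Count positive vs negative words and map the balance to a label.
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

reviews = ["Great clean room", "Air conditioning was noisy", "An average stay"]
enriched = [{"review": r, "sentiment": sentiment(r)} for r in reviews]
print([row["sentiment"] for row in enriched])
```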

As new Cognitive Services come out, Power BI will add more of these features. The latest services are extracting text from images, handwriting recognition, and entity recognition – not just extracting keywords, but classifying what they refer to. If you’re a hotel owner looking at reviews on the internet, entity recognition can tell you whether ‘cycling’ in a review means a happy guest who stayed when they were on a cycling trip or an unhappy guest complaining about the air conditioning cycling on and off all night.

If you’re creating your own machine-learning models in Azure Machine Learning and publishing them as a web service, you can give Power BI analysts in your organization role-based access to them through the Azure portal, and then they’ll show up as models they can use in the same way as Cognitive Services. If you want to analyze the photos in those hotel reviews, you might need to train a custom image recognition model to understand pictures of the things you find in a hotel. Photos of air conditioners, light bulbs, windows and lifts in a hotel review are probably a bad sign, and the standard image recognition model might not highlight them as being important objects.
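
Under the hood, scoring rows against a published model is a request/response exchange: build a JSON payload of rows, post it to the scoring endpoint, parse the predictions back out. The payload shape, the dummy model, and the stub standing in for the web service below are all hypothetical, so the sketch runs offline; a real Azure ML endpoint is called over HTTPS with its own schema and an auth key.

```python
# Sketch of calling a model published as a web service, with a local
# stub playing the service so the example is self-contained.
import json

def build_payload(rows):
    # Hypothetical request schema: rows of feature values under "data".
    return json.dumps({"data": rows})

def fake_endpoint(payload):
    # Stand-in for the deployed service; the "model" just sums features.
    rows = json.loads(payload)["data"]
    return json.dumps({"predictions": [sum(r) for r in rows]})

payload = build_payload([[1, 2], [3, 4]])
preds = json.loads(fake_endpoint(payload))["predictions"]
print(preds)  # -> [3, 7]
```

Once the endpoint is registered for role-based access, Power BI handles this exchange for the analyst; the sketch only shows what travels over the wire.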

And if you’re building your own machine-learning model and using Python or R to integrate it into Power BI, or using AutoML in Power BI to have it discover which machine-learning algorithm works best with your data, you can now upload those models to Azure Machine Learning to manage them or tune them further. That means business analysts could use the automated option, and if it proves useful a data scientist could pick it up and develop it further.

And all of these insights are available to use in a range of ways. Powerful as the interactive dashboards and visualizations in Power BI are, sometimes what business users want is the familiar report that they can print out and read, or email to a customer or supplier. Power BI now supports the same paginated reports with headers and footers and table, chart or matrix layouts as SQL Server Reporting Services (with a new Report Builder tool to create them). Paginated reports are part of Power BI Premium, but they’re also compatible with the on-premises Power BI Report Server.

So if you want to move your analytics from SQL Server Reporting Services to Power BI, you can create an enterprise business intelligence system that gives you the full range of business analytics, from the reports your organization probably already depends on, to machine learning that tries to automatically find insights in data that isn’t necessarily structured or numerical. If Power BI doesn’t fit your needs on its own, the idea is to make it so easy to extend with Azure that business users can do it themselves.


Resource Credit | TechRepublic 

Data Lakes: What are they and who needs them?


The sheer scale of data being captured by the modern enterprise has necessitated a monumental shift in how that data is stored.

From humble databases through to data warehouses, data stores have grown in both scale and complexity to keep pace with the businesses they serve and the data analysis now required to remain competitive. What was at first a data stream has morphed into a data river, as enterprise businesses harvest reams of data from every conceivable input across every conceivable business function.

To address the flood of data and the needs of enterprise businesses to store, sort, and analyze that data, a new storage solution has evolved: the data lake.

What’s in a Data Lake?

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state.” – James Dixon.

Enterprise businesses run on a foundation of tools and functions that provide valuable data, but rarely in a standardized format. While your accounting department is using its preferred billing and invoicing software, your warehouse is relying on a completely different inventory management system. All the while, your marketing team is relying on the marketing automation or CRM software it finds most productive. These systems rarely communicate directly with each other – and while they can be cobbled together to support business processes or workflows via integrations, there is still no standard output for the data being generated.

Data warehouses do a great job of standardizing data from disparate sources for analysis. In fact, by the time data is loaded into a data warehouse, the decision about how that data is going to be used or how it needs to be processed has already been made.

Data lakes, however, are bigger, dirtier, more unwieldy beasts – taking all of the data an enterprise business has access to, whether structured, semi-structured, or unstructured, and storing it in its raw format for further exploration and querying. Remember that data stream/river analogy earlier? All data sources within your enterprise are tributaries for your data lake, which will collect all of your data, regardless of form, function, size, or speed. This is particularly useful when capturing event tracking or IoT data, though the uses of data lakes extend well beyond those scenarios.
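
The "store everything raw" idea can be sketched as a landing routine: each source writes its data unchanged under source/date partitions, and schema decisions are deferred until read time. The folder layout, source names, and files here are hypothetical, with a local temp directory standing in for Azure Data Lake storage.

```python
# Sketch of lake-style landing: heterogeneous raw files sit side by
# side, untransformed, partitioned only by source and date.
import json, os, tempfile

lake = tempfile.mkdtemp()  # local stand-in for data lake storage

def land(source, day, name, raw_bytes):
    # Write the bytes exactly as received under source/day partitions.
    path = os.path.join(lake, source, day)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, name), "wb") as f:
        f.write(raw_bytes)

# JSON from a CRM and CSV from sensors land unchanged, side by side.
land("crm", "2024-05-01", "leads.json", json.dumps([{"id": 1}]).encode())
land("sensors", "2024-05-01", "readings.csv", b"device,temp\nA,21.5\n")

files = sorted(os.path.join(dp, f) for dp, _, fs in os.walk(lake) for f in fs)
print([os.path.relpath(p, lake) for p in files])
```

Contrast this with a warehouse load, where both files would have to be parsed, standardized, and conformed to a schema before being stored at all.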

Taking a Dip

Once the data has been collected in the lake, organizations can query and analyze the data, as well as utilize it as a data source for their data warehouse.

For example, Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed – and do all types of processing and analytics across platforms and languages. By removing the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics, Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance.

However, storage is only one component of a data lake, the other being the ability to run analysis on structured, unstructured, relational, and non-relational data to identify areas of opportunity or focus.

Analysis can be performed on data lake contents via Azure’s analytics job service or the HDInsight analytics service.

  • Analytics job service: Data lakes are particularly valuable in analytical scenarios where you don’t know what you don’t know – with unfiltered access to raw, pre-transformed data, data scientists and analysts can process petabytes of data for diverse workload categories such as querying, ETL, analytics, machine learning, machine translation, image processing, and sentiment analysis. And using Azure’s built-in U-SQL library allows businesses to write code once and have it automatically parallelized for the scale you need, whether in .NET languages, R, or Python.
  • HDInsight: When it comes to Big Data analysis, the open-source Hadoop framework remains one of the most popular options. With the Microsoft HDInsight platform, open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, HBase, Microsoft ML Server, and more can be applied to your data lakes via preconfigured clusters optimized for different big data scenarios.

Future-Proofing your Data

Data lakes represent a new frontier for businesses. By taking the entire sum of knowledge available to an enterprise and analyzing it in a raw, unfiltered state without expectation, incredible opportunities, insights, and optimizations can be unearthed.

Just like actual lakes, the long-term health of your organizational data lake depends on defending it from ‘pollution’ – data governance is critical to ensure your data lake doesn’t become a data swamp. Ungoverned or uncatalogued data can leave businesses vulnerable both in terms of data quality (and organizational trust in that data), as well as in terms of security, regulatory, and compliance risks. At the very worst, data lakes can provide a wealth of data that is impossible to analyze in a meaningful way due to incorrect metadata or cataloging.

For businesses to truly reap the rewards of data lakes, they’ll want to have a firm internal governance policy, used in conjunction with a data catalog (like Azure Data Catalog). A data catalog’s tagging system helps to unify data through the creation and implementation of a common language, which includes data and datasets, glossaries, definitions, reports, metrics, dashboards, algorithms, and models. This unifying language allows users to understand the data in business terms, while also establishing relationships and associations between datasets (once the data reaches the warehousing or relational stage).
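
The governance mechanics described above can be sketched as a tiny catalog: datasets are registered with tags drawn from a shared glossary, so that every dataset is described in the same business terms and can be found by them. The structure, dataset names, and glossary entries below are hypothetical; Azure Data Catalog provides this as a managed service.

```python
# Toy data-catalog sketch: registration is rejected unless every tag
# comes from the shared glossary, keeping the vocabulary consistent.

glossary = {  # hypothetical shared business vocabulary
    "revenue": "Income recognized from sales",
    "lead": "Prospective customer",
}

catalog = []

def register(name, location, tags):
    # Enforce the common language: unknown tags are refused.
    unknown = [t for t in tags if t not in glossary]
    if unknown:
        raise ValueError(f"tags not in glossary: {unknown}")
    catalog.append({"name": name, "location": location, "tags": tags})

def find(tag):
    # Discover datasets in business terms, not storage paths.
    return [d["name"] for d in catalog if tag in d["tags"]]

register("invoices_2024", "lake/finance/2024", ["revenue"])
register("crm_contacts", "lake/crm/contacts", ["lead"])
print(find("revenue"))
```

Refusing uncatalogued tags at registration time is exactly the kind of guardrail that keeps a lake from drifting into the "data swamp" the article warns about.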

Build your Business Intelligence Infrastructure on a Solid Foundation

By establishing a data lake alongside companion tools that allow for better organization and analysis, like Jet Reports, your data lake will remain a crystal-clear source of knowledge for your business for many years to come. For more information on organizing your data or running big data workloads effectively, please reach out to our talented team of reporting and analytics experts. Contact allonline365 at info@allonline365.com or +27 (21) 205 3650.


Resource Credit | Jet Global
