Sunday, May 1, 2016

The True OLAP

OLAP, or Online Analytical Processing, is the process that allows users to conduct multidimensional, interactive analysis of business data. This includes operations such as drilling, aggregating, pivoting, and slicing multidimensional data.

Topic-specific data cubes must be created in advance to support the above operations, so that users can inspect data in tables or graphs and pivot and drill in real time. Having said that, let’s consider whether OLAP is enough to cater for analysis and forecasting in the real world.

Figure 1: Information systems can be divided into transactional (OLTP) and analytical (OLAP) systems. In general, OLTP systems provide the source data for the data warehouse, whereas OLAP helps to analyze it.


A mature company may have accumulated a large amount of data about its operations. This data can be used to make guesses about the business it is engaged in. For example, a vehicle importer may guess which kinds of people tend to buy which kinds of vehicles. These guesses are just the basis for a forecast. The company can then use the accumulated data to evaluate them: a guess that holds up can be used in forecasting, while one that fails is revised and tested again.

The above process could be referred to as the evaluation process, whose purpose is to justify conclusions with evidence found in historical data. In business analysis, a query like “the first n customers who purchased a vehicle from the vehicle types that contributed half of the company’s sales volume in year x” is ubiquitous, and it requires some form of computation or querying with intermediate steps.
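To make those intermediate steps concrete, here is a rough SQL sketch of such a query. The sales table and its columns (customer_id, vehicle_type, amount, sale_date) are hypothetical, and n and x are fixed at 10 and 2015 for illustration.

-- Step 1: total sales volume per vehicle type for the year.
WITH type_totals AS (
    SELECT vehicle_type, SUM(amount) AS total
    FROM sales
    WHERE sale_date >= '2015-01-01' AND sale_date < '2016-01-01'
    GROUP BY vehicle_type
),
-- Step 2: the top types that together contribute half of the volume.
top_types AS (
    SELECT vehicle_type
    FROM type_totals t
    WHERE (SELECT COALESCE(SUM(total), 0)
           FROM type_totals s
           WHERE s.total > t.total) < (SELECT SUM(total) / 2 FROM type_totals)
)
-- Step 3: the first 10 customers who bought one of those types.
SELECT customer_id, MIN(sale_date) AS first_purchase
FROM sales
WHERE vehicle_type IN (SELECT vehicle_type FROM top_types)
GROUP BY customer_id
ORDER BY first_purchase
LIMIT 10;

Each step depends on the result of the previous one, which is exactly the kind of stepwise, data-driven computation a predefined cube cannot anticipate.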

The requirement to build the data cube in advance, and the limited set of actions available against a cube, constrain the analysis process in a situation like the above. The data model needs the capability to reconstruct the cube, or to build temporary cubes, to cater to diverse analysis demands. Furthermore, most OLAP products are better known for their rich user interfaces; only a few have powerful online analytical capabilities.

Figure 2: An OLAP database stores aggregated, historical data in multidimensional schemas (usually a star schema).

So what kind of online analytical tool could fulfill the evaluation process? Theoretically, the steps of an evaluation can be considered computations over the data. The computation must be definable by the user, who decides the next action based on intermediate results, without a model defined beforehand. Additionally, it should support operations on huge amounts of data, not just simple numeric computation. From this point of view SQL goes some way toward fulfilling the requirement, but its computational capability still has limits when solving problems like the one mentioned above. This leaves us to consider the limitations of SQL and to build a way through them to a new generation of computational system for the evaluation process; namely, the real OLAP.

I’m still completely a novice in Business Intelligence. :)

Saturday, April 23, 2016

What Makes Business Intelligence Better?

I’m completely a novice in Business Intelligence. :)

Business Intelligence is the process of giving businesses (technically, the people who steer the business) insights into their operations. This is done by processing data gathered during business transactions. This data is mostly unstructured. The unstructured data available in internal and external data sources is transformed into reportable forms such as graphs and tables. The end result of the business intelligence process is a set of reports and/or a dashboard that allows non-technical business operators to perform ad-hoc queries on their data.

When we hear the term Business Intelligence, more precisely Intelligence, we tend to imagine a huge amount of analytical tasks being performed to serve up a set of predictions. Bazinga! This is not the case in business intelligence at all. Yes, corporations need numbers and facts about how they are performing against their KPIs, but most importantly the data should make sense of operations to non-technical users. This is where a business intelligence tool succeeds, because not every user is going to be a Node.js geek waiting for npm to finish up his work.

Ideally the tool should support building data visualizations in a matter of minutes, maybe even seconds, without requiring technical report-development knowledge. For example, functionality like drag-and-drop can be quite compelling. In terms of reporting, the tool should require a minimal number of steps/clicks to build a report.

The ability to introduce new fields calculated from existing fields/columns, to filter data on predefined or auto-modeled parameters, to explore data from different angles, and to suggest data types, schemas, and hierarchies automatically is also important to the self-service aspects of the tool, as the sketch below illustrates.
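For instance, the simplest form of a calculated field could be expressed in SQL as follows; the orders table and its columns are hypothetical.

SELECT order_id,
       quantity * unit_price AS revenue,   -- new field derived from existing columns
       CASE WHEN quantity * unit_price > 1000
            THEN 'large' ELSE 'small'
       END AS order_size                   -- derived category usable as a filter
FROM orders;

A self-service tool builds expressions like these behind a point-and-click interface, so the user never has to write the SQL directly.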

Business intelligence also needs predictive analytics: the tool should be capable of analyzing historical and current data to make predictions about future events, and it should have functions that communicate large amounts of complex data clearly.
Furthermore, it is important that the tool can handle data sets of different sizes; the volume of data being consumed and the number of queries running simultaneously greatly affect the tool’s performance.



Ultimately, a business intelligence tool should support interrogating data and drawing conclusions that help people perform their jobs better. Having a trending software stack doesn’t make a product better. A sustainable set of technologies that helps the product get the job done is what lets it stand ahead in the game.

Wednesday, February 24, 2016

Eastern and Western Narratives of the Philosophy of Language

Language and linguistics are two interesting research areas that I’d like to focus on. These interests were stimulated by the work I have done in natural language processing, like unambiguously interpreting instructions from user manuals. After resigning from my previous workplace I had some time to restart blogging about the things I’m interested in. Here I have summed up a few facts that I have collected from time to time, and I hope to write about language and linguistics more often.

Language is one of the most expressive forms of communicating knowledge with the rest of the world; a master of this expressive form can share his wisdom with others. The philosophy of language can be described as a systematic study of the origins of language, the nature of meaning, the usage and cognition of language, and the relationship between language and reality.

The curious fate of humankind.
The philosophy of language has its roots in the final centuries BC and the early centuries AD. The Vedic Age in Indian history contributed to the philosophy of language at an early stage of this quest; to be precise, the philosophical schools of Nyaya and Mimamsa gave rise to linguistic philosophy. The Sanskrit grammatical tradition can be considered one such contribution.

Most of the early Western work in the philosophy of language was produced in ancient Greece, where Plato, Aristotle, and the Stoics made their contributions.

Plato believed that the names of things are based on their nature, with each smallest structural unit that distinguishes meaning representing basic ideas or sentiments. Aristotle considered that the way a subject is modified or described in a sentence is established through an abstraction of the similarities between various individual things. The Stoic philosophers separated five parts of speech: nouns, verbs, appellatives, conjunctions, and articles. Their contributions were mostly to the analysis of grammar.

In the late 19th century, language started playing a major role in Western philosophy. Publications like the “Cours de linguistique générale” made an impact on that development.

Saturday, November 28, 2015

I Took the Red Pill; Setting Up a Development Environment to Contribute to Wildfly (1)

I have been working with JBoss EAP and Wildfly for almost 2 years, and the experience has been quite interesting. JBoss EAP uses Wildfly as its core Java EE server. What I like most in Wildfly are its easy configuration and management interface, its modular design, and indeed its fast startup.


There were times I had to investigate class-loading mechanisms and dependency management in Wildfly extensively, which gave me a high level of confidence in its runtime environment. It’s just crazy, and it made me want to dig deep. Here I explain the steps required to set up your development environment to start contributing to Wildfly.

Prerequisites,
  • Java 8
  • Maven 3
  • GitHub account 

Setting up a personal repository to work,

1. Fork the Wildfly repository into your personal account.
2. Clone your newly forked repository to your local workspace.

$ git clone https://github.com/[your_username]/wildfly.git
$ cd wildfly

3. Add a remote reference to the upstream repository, for pulling future updates.

$ git remote add upstream git://github.com/wildfly/wildfly.git

4. Disable merge commits to your master.

$ git config branch.master.mergeoptions --ff-only

Build the Source Code,

Building WildFly requires Java 8 or newer; make sure you have JAVA_HOME set to point to the JDK 8 installation. The build uses Maven 3.

1. Run the build.sh script.

$ ./build.sh

The script is essentially a thin wrapper around Maven; basically, all it runs is

$ mvn clean install

or, if you don’t want to wait for the default test suites, you can run

$ mvn install -Dmaven.test.skip=true

Hakuna matata!


Launch Wildfly,

The built Wildfly distribution can be found in [repository_location]/build/target/

1. Navigate to the directory mentioned above.

$ cd build/target/wildfly-10.0.0.CR5-SNAPSHOT/bin

Wildfly comes with two operation modes: standalone and domain.
To run Wildfly in standalone mode,

2. Execute the standalone.sh script.

$ ./standalone.sh
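
For completeness, domain mode is started the same way, with the domain.sh script from the same bin directory.

$ ./domain.sh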



In another post I’ll explain how to load the Wildfly source into IntelliJ IDEA and start debugging.



Monday, November 16, 2015

Machine Learning for Fluid Rendering in Video Games


Video: “Data-driven Fluid Simulations using Regression Forests”, by SoHyeon Jeong on Vimeo.

Water is one of the elements that 3D animators put a lot of effort into getting perfect. It is really difficult to simulate the dynamic movements of millions of particles, and it requires a huge amount of computational resources, especially for real-time rendering in 3D games.

However, a group of researchers from the Swiss Federal Institute of Technology (ETH Zurich) and Disney Research has published a paper that sheds some light on the above problem with the aid of machine learning.

Rather than calculating the movements of each particle exactly, or reducing the particle count, this method predicts the particles’ dynamic movements. The presented algorithm needs to be trained with a set of animations whose fluid particles were simulated with accurate calculations. At run time the algorithm doesn’t calculate movements from scratch; rather, it predicts them from this prior knowledge.

According to the researchers, this algorithm can render fluid animations 3 times faster than existing methods and can animate nearly 2 million particles in real time.

Sunday, October 25, 2015

Puppet; the Continuous Delivery Tool

Puppet is a tool that supports automating application deployment and enables you to practice continuous delivery. In this post I provide an overview of the open source Puppet tool and outline the necessary configuration and installation instructions for a Linux CentOS environment, with recommended best practices. At the end of the post I show how to deploy a war file to JBoss Wildfly via its command line tool.

Puppet is automation software for IT system administrators and consultants. It allows you to automate repetitive tasks such as the installation of applications and services, patch management, and deployments. Configuration for all resources is stored in so-called "manifests" that can be applied to multiple machines or just a single server.

The open source Puppet tool has two major components: the Puppet Master and the Puppet Agent. They are intended to be hosted in separate locations; the Puppet Master keeps all the manifest scripts related to deployment automation, while Puppet Agents frequently (every 30 minutes by default) contact the Puppet Master to detect updates to configurations and deployment artifacts, and pull them into the agent’s environment to finish the deployment.
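
The polling interval is controlled by the runinterval setting in the [agent] section of the agent’s puppet.conf; for example, the default 30-minute interval can be stated explicitly as follows.

[agent]
    runinterval = 30m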



The Puppet Master is responsible for keeping agent-specific deployment scripts, while the Puppet Agent is responsible for contacting the Puppet Master and automating the deployment. First of all, port 8140 on the Puppet Master must be reachable from the Puppet Agent, and both the master and agent hosts need their FQDNs registered with a DNS server.

Master Configuration

On CentOS/RHEL 6, where iptables is used as the firewall, port 8140 must be opened for the agents. Open /etc/sysconfig/iptables.

#vim /etc/sysconfig/iptables

Add the following line (after the ":OUTPUT ACCEPT" line) to open port 8140.

-A INPUT -m state --state NEW -m tcp -p tcp --dport 8140 -j ACCEPT

Save and close the file.

Restart the iptables service.

# service iptables restart

Open the hosts file to add the FQDNs.

#vim /etc/hosts

Add the FQDNs of the Puppet Master and the agent to the file.

10.101.15.190 nexus-jenkins.abc.lk
10.101.15.197 dev-179.abc.lk

Save and close the file.

Agent Configuration

Puppet client nodes have to know where the Puppet Master server is located. The best practice is to use a DNS server on which a Puppet domain name can be configured. If a DNS server is not available, the /etc/hosts file can be modified as follows.

#vim /etc/hosts

Add the FQDN of the Puppet Master to the file.

10.101.15.197 nexus-jenkins.abc.lk

Save and close the file.

Installing Puppet Master

Since Puppet is not in the basic CentOS or RHEL distribution repositories, add the custom repository provided by Puppet Labs.

# rpm -ivh https://yum.puppetlabs.com/el/6.5/products/x86_64/puppetlabs-release-6-10.noarch.rpm

Install the "puppet-server" module on the master server.

#   yum install puppet-server

When the installation is done, set the Puppet Master service to start automatically on boot, and start it.

#   chkconfig puppetmaster on
#   service puppetmaster start


Installing Puppet Client

Since Puppet is not in the basic CentOS or RHEL distribution repositories, add the custom repository provided by Puppet Labs.

# rpm -ivh https://yum.puppetlabs.com/el/6.5/products/x86_64/puppetlabs-release-6-10.noarch.rpm

Install the Puppet agent service on the agent server.

#   yum install puppet

When the installation is done, set the Puppet agent to start automatically on boot.

#   chkconfig puppet on

Specify the Puppet Master server’s FQDN in the /etc/sysconfig/puppet file.

#   vim /etc/sysconfig/puppet

Add the following line to specify the FQDN of the puppet master.

PUPPET_SERVER=nexus-jenkins.abc.lk

The master server name also has to be defined in the agent’s Puppet configuration file, under the [agent] section.

# vim /etc/puppet/puppet.conf

Add the following line to specify the master server.

server=nexus-jenkins.abc.lk

Start the puppet client.

# service puppet start

Certificate Verification

Execute the command below on the Puppet Agent to generate a certificate request.

# puppet agent --test

The following error message will appear in the terminal.

Exiting; no certificate found and waitforcert is disabled
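
As the message hints, instead of re-running the command after the certificate is signed, the agent can be told to keep polling for the signed certificate; the interval below (60 seconds) is just an example.

# puppet agent --test --waitforcert 60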

Go back to puppet master server and list all certificate requests by executing the following command.

#   puppet cert list

Sign the certificate by executing the following command in puppet master's terminal.

#   puppet cert sign dev-86.abc.lk

Note: the argument is the Puppet Agent’s FQDN.

Deployment Orchestration

For deployment automation, make sure the site.pp file exists in the /etc/puppet/manifests directory.

The following instructions are to be carried out on the Puppet Master node.

Create the following directory structure using the mkdir command.
# mkdir -p /etc/puppet/modules/[project_name]/files/
Example:
# mkdir -p /etc/puppet/modules/xyz/files/

Open the /etc/puppet/manifests/site.pp file to configure the deployment plan.
# vim /etc/puppet/manifests/site.pp

Add the following content to the file.

node 'pqr.abc.lk' {
    # Copy the war file from the master's module files to the agent node.
    file { '/tmp/xyz/portal.war':
        ensure => 'present',
        mode   => '0755',
        owner  => 'abc',
        group  => 'abc',
        source => 'puppet:///modules/xyz/portal.war',
    }
    # Deploy through the Wildfly CLI once the file is in place.
    exec { 'deploy_portal':
        command => '/home/abc/wildfly/bin/jboss-cli.sh --connect --command="deploy --force /tmp/xyz/portal.war"',
        require => File['/tmp/xyz/portal.war'],
    }
}
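
With this manifest in place, the deployment happens on the agent’s next scheduled run. To trigger it immediately, run the agent manually on the target node.

# puppet agent --test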


References:
Puppet Labs. 2015. Installing Puppet: Red Hat Enterprise Linux (and Derivatives). [ONLINE] Available at: https://docs.puppetlabs.com/guides/install_puppet/install_el.html [Accessed 05 October 2015].

Puppet Labs. 2015. Installing Puppet: Post-Install Tasks. [ONLINE] Available at: https://docs.puppetlabs.com/guides/install_puppet/post_install.html [Accessed 05 October 2015].

Puppet Labs. 2015. Language: Node Definitions. [ONLINE] Available at: https://docs.puppetlabs.com/puppet/3.8/reference/lang_node_definitions.html [Accessed 05 October 2015].

Xmodulo. 2015. How to install Puppet server and client on CentOS and RHEL. [ONLINE] Available at: http://xmodulo.com/install-puppet-server-client-centos-rhel.html [Accessed 05 October 2015].

Saturday, October 24, 2015

Continuous Delivery

Continuous Delivery; a term quite new to me that I found very recently. I was working on a deployment automation task a few weeks ago, and here I summarize what I learnt about this practice and how it benefits developers.

Continuous deployment means deploying every change that passes automated tests to production; simply put, it is the practice of releasing every good build to users. Continuous delivery, by contrast, is about putting the release schedule in the hands of the business rather than the development team.

Introducing continuous delivery to a project means making sure the software is always production-ready throughout its entire lifecycle, and that any release version can be deployed to any environment through a fully automated process in a matter of seconds or minutes.

According to Martin Fowler, continuous delivery is when,
  • Software is deployable throughout its lifecycle.
  • The team prioritizes keeping the software deployable over working on new features.
  • Anybody can get fast, automated feedback on the production readiness of their systems any time somebody makes a change to them.
  • Anybody can perform push-button deployments of any version of the software to any environment on demand.

Incorporating automation, frequent code releases, testing at every stage of the process, and a pull-based architecture that permits only successful releases to move to the next stage reduces errors and makes it easier to improve the software delivery process.

Automation makes successful processes repeatable. When we introduce a new feature or change a service, its underlying system, or its infrastructure, automation lets us make the change quickly and safely, without the errors that would creep in if the process were repeated manually.

Releasing code frequently, rather than in one or two big releases, means testing the product more often. There’s less change in each release, so it’s easier to isolate and fix problems; it’s also easier to roll back when needed.

A pull-based architecture prevents code that fails automated tests from passing to the next stage of development. This stops errors from propagating and becoming harder to diagnose.

Software developers are rewarded for delivering quality software that addresses business needs, on schedule. Continuous delivery practices give developers the ability to provision themselves with production-like environments and automated deployments so they can run automated tests. Instead of standing in their way, the operations team helps developers get their work done. Continuous delivery depends on continuous integration: every change is merged into and tested against the main code base, reducing the opportunity for long-standing feature branches and the large merge windows that lead to serious errors. Deployment becomes much less stressful when changes are small and tested at every step.



The video above is a talk given by Martin Fowler about continuous delivery.