White paperBusiness intelligenceOpen source solutions2010 EditionFor further information: 33 (0)1 41 40 11 00Mailto: [email protected]

Page 2Business intelligence – Open Source is a company comprised of engineers specialising in theimplementation of open source solutions and the integrationof systems based on open source solutions. Smile is a memberof APRIL, an association focused on the promotion andprotection of free software.With over 290 employees in France, and 320 throughout theworld (September 2009), Smile is the leading French OpenSource solution company.Since around the year 2000, Smile has been activelymonitoring the technological market, allowing us to identify,to test and assess the most promising open source solutions.We can then present our clients with the strongest, mostsustainable, most efficient products available.This approach has given way to a whole range of white paperscovering various different application sectors. Contentmanagement (2004); portals (2005); business intelligence(2006); PHP frameworks (2007); virtualisation (2007); digitaldocument management (2008); and ERPs (2008). Among theworks published in 2009, the “Open Source VPNs”, and “OpenSource flow controls and Firewalls” articles, within the“Systems and Infrastructures” collection are also of interest.Each of these works offers a selection of the best open sourcesolutions in the relevant domain, their respective qualities,and feedback on operational use.As stable open source solutions slowly gain ground in newsectors, Smile will be present to offer customers the benefit ofthese solutions risk free. Smile appears in the French as the integration service provider of choice, to assistmajor companies in adopting the best open source solutions.Smile has also developed a range of service offers over the lastfew years. A consultancy department has assisted our clientssince 2005, through preproject phases, solution research, andproject support. In 2000, Smile created a graphics studiowhich in 2007 became known as The Interactive MediaAgency. This agency offers not only graphic design services, Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 3Business intelligence – Open Source solutionsbut also e-marketing, editorial, and rich interface expertise.Smile also has an agency specializing in Third-partyApplication Maintenance, application support and applicationprocessing. Smile offices can be found in Paris, Lyon, Nantes,Bordeaux and Montpellier, with presence in Spain,Switzerland, the Ukraine and Morocco.Some Smile sitesLaboratoires Boiron, Foncia, Crédit Coopératif, EMI Music, Salon de l’Agriculture,Mazars, Areva, Société Générale, Gîtes de France, Patrice Pichet, Groupama,Eco-Emballage, CFnews, CEA, Prisma Pub, Véolia, NRJ, JCDecaux, Larousse,01 Informatique, Spie, PSA, Boiron, Dassault-Systèmes, Action Contre la Faim,BNP Paribas, Air Pays de Loire, Forum des Images, IFP, BHV, ZeMedical,Gallimard, Cheval Mag, Afssaps, CNIL Portals and IntranetsEurosport, HEC, Bouygues Telecom, Prisma, Veolia, Arjowiggins, INA, Primagaz,Croix Rouge, Invivo, Faceo, Château de Versailles, Ipsos, VSC Technologies,Sanef, Explorimmo, Bureau Veritas, Région Centre, Dassault Systèmes,Fondation d’Auteuil, Korian, PagesJaunes Annonces, Primagaz Electronic Document Management and ECMAgefiph, Primagaz, UCFF, Apave, Géoservices, Renault F1 Team, INRIA, CIDJ,SNCD, Ecureuil Gestion, CS informatique, Serimax, Véolia Propreté, NetasQ,Corep, Packetis, Alstom Power Services, Mazars E-businessFuret du Nord, Camif Collectivité, La Halle, De Dietrich, Adenclassifieds, Macif,Gîtes de France, GPdis, Longchamp, Projectif, ETS, Bain & Spa, Yves Rocher,Bouygues Immobilier, Nestlé, Stanhome, AVF Périmédical, CCI, Pompiers deFrance, Commissariat à l’Energie Atomique Business Intelligence and ERPLafarge, Groupe Accueil, Anevia, Projectif, Xinek, Companeo, Advans, Point P,Mindscape, Loyalty Experts, Cecim, Espace Loggia, Nouvelles Frontières,France24, La Poste, HomeCineSolutions, Vocatis, Skyrock, France Domicile,Polyexpert, Cadremploi, Cmonjob, Infrastructure and HostingKantar, Pierre Audoin Consultants, Rexel, Motor Presse, OSEO, Sport24,SETRAG, Canal-U, Institut Mutualiste Montsouris, ETS, Ionis, Osmoz, SIDEL,Atel Hotels, Cadremploi, Institut Français du Pétrole, Mutualité Française Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 4Business intelligence – Open Source solutionsThis white paperHaving taken the lead in entire markets, such as contentmanagement, portals, or development frameworks, opensource is also gaining market share in the businessintelligence solutions sector, with solutions that are with the other white papers published by Smile, this workaims to bring together: A general Business Intelligence approach, concepts,fields of application, specific needs. A compilation of the best open source solutions in theintelligence domain. A relatively exhaustive presentation of these tools, theirstrengths, limits, maturity, ability to satisfy operationalneeds. A presentation on the existing professional aids toaccompany these tools.This study was based on several years of research and staffdeployment, and constantly improved upon by way ofoperational implementation for our clients. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 5Business intelligence – Open Source solutionsVersion 2010The first version of this white paper was edited in June 2006.As a result of our project work and monitoring of toolprogress, a new white paper version has been compiled everyyear since 2010 version includes information on what’s new in theopen source solution market. As such we have a 2.4 versionon SpagoBI, a 3.5.2 version on the Pentaho suite, and a 3.7version on JasperSoft. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden


Page 7Business intelligence – Open Source analysis tools and techniquesThere was a drop in turnover in October. Decisions must bemade to tackle the situation. However in order to make theright decision, research must be done to discover whyturnover had decreased and how it had decreased. Whatproduct range was affected? What countries, what regions? Inthe customer portfolios of which sales agent(s)? In whichdistribution segment? Do we have a similar drop everyOctober? There are a number of questions to be asked and itis important to know how to respond to these questions inorder to be in a position to make the right decision.We can use a number of terms “business analysis tools andtechniques”, or “decision support tools” or “businessintelligence”, but basically we are referring to a number of which allow us to analyze company data, in order toextract new reliable information on which decisions can bebased, be these tactical or strategic.A modern company stocks a huge volume of data on itsinformation systems, however quite often, this very volumemakes it difficult to make sense of the data, and to understandwhat the data represents: underlying trends, hidden strengthsor weaknesses, everything we need to know to be able to makean informed decision.As such, business analysis tools, before helping us to make adecision, help us to aboard data analysis in order to identifyrelevant macroscopic information hidden in the large volume ofdata.Report generator or ReportingA report is the presentation of data in a coherent and legiblemanner, generally for printing.The report displays number charts, managing headers,footers, subtotals etc. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 8Business intelligence – Open Source solutionsA report generator is a program which allows one to define areport based on certain elements, with a minimum ofprogramming, simply using an interactive interface. Oncedefined, the report can be generated at regular intervals.A report can depend on settings: the year, month, region,sector of activity, etc. As such a single report which has beendefined once can be generated using different variants,depending on the values of certain report settings can be entered by the user based on theircriteria, and the user can also vary these settings to narrowdown their search.In other cases, settings can be automatically defined bymanagement rules, for example to give each sales agent areport on their monthly sales. These are known as “burstreports”.There are two report generation phases: An initial conception phase, which usually requires specificexpertise, and which is not in the hands of the end user; A settings and production phase, which does not requirespecialist expertise and can be carried out by the end user.We would, of course, like to be able to limit the first phase,which is the most expensive, and be able to put theconception tools in the hands of the end user. But experiencehas shown that this is rarely possible: even though the usercould learn how the tools work, they have other priorities.In 2007 reporting tools for end users and “ad hoc” toolsappeared in open source B.I. solutions. This allows end userswith no specific technical capabilities, to generate simplereports easily and rapidly. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 9Business intelligence – Open Source solutionsRequest or Ad hoc analysisThere are two main business intelligence modes: the “strategicrelations” mode and the “interactive mode”, with an analysiswhich interactively responds to user each correspond to different needs and are sometimescomplementary: Obtain recurrent information, which corresponds topredefined analysis, for example the breakdown of sales byregion ; Searching for information by carrying out differentsuccessive analyses, the results of each lead to newquestions, which in turn require the selection of differentelements for analysis.These tools are the same in some ways, but are quite differentin others: ion can necessitate expert intervention. Onceprepared, they will be produced regularly, almostautomatically. On the other hand, interactive analyses must be under theend user’s control, who is no longer just an analysisconsumer, but the key player, yet someone who is not atechnical expert on business intelligence tools. This meansthat the tools used in these analyses must be very userfriendly, both in regards to functions and to technicalconfiguration. They must allow complete abstraction fromtechnical know-how, and SQL database language inparticular, as it must be assumed that users will not haveprior technical knowledge. Finally, the interactive or ad hoc analysis also requiresexcellent response times, to allow the user to experimentdirectly and narrow down their analysis. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 10Business intelligence – Open Source solutionsMultidimensional analysis or OLAPMultidimensional analysis is one of the most commonly usedmethods of analysis in business intelligence.Let’s try to outline the main have basic information sets, generally in large volumes,each line of information represents an event characterised by: A unique identifier Elements that describe the information Values which carry information regarding quantitiesTake the following standardsimplified for our purposes:example,whichwehaveTake the basic entity the invoice line item. It is often acompany’s key entity the most conclusive information it holdsin regards to the sales process.The invoice line item is part of an invoice and relates to thesale of a product, in a given quantity, at a given price, to agiven customer.It carries various different attributes: a product reference, aquantity, a unit price, a total price.We can access other information via the invoice that the lineitem belongs to: the client reference, date (year, month, day)etc.We also have access to other information on the customerthemselves: country, region, type of customer, sector ofactivity, etc. The customer may even have an assigned salesagent.We will leave it here for this example, but we could go a lotfurther and collect a great deal of information all stemming fromthis one little invoice line item. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 11Business intelligence – Open Source solutionscountryinvoicecustomersales lineitemThe first step is to gather all of the information required forour analysis. Here, for example, the information required isthe: Date (year, month, day) Product Product category Customer Customer country Sales agent Quantity TurnoverThe first information constitutes the axes for potentialanalyses, the last two constitute values.In multidimensional analysis, the initial relational model isnot the most pertinent or the most reliable. We generallyprefer to denormalize the database, i.e. to work on a singletable that all useful information has been gathered on. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 12Business intelligence – Open Source solutionsIn our example we would obtain the following ardeningCastorama FranceLeroyMerlin ItalyCcalLepaulQuantity T.O.5050 000LepaulLegrand120250Etc. is clearly a great deal of redundancy in this table, but inthe end it is easier to manage redundant but simpleinformation, and to avoid problems regarding consistency andintegrity which are an area for information systems.This next step involves creating the first level of aggregationi.e. uniting certain line items.In our case for example, we can assume that the data will notbe used at product reference level, but only by segment. In thiscase, we will regroup all of the identical lines for the key (date,segment, family, customer, country, sales agent), and add thequantity and turnover values.The last step is that of “multidimensional analysis” in the truesense of the term, which consists of selecting the axes ofanalysis.Among these axes, we can distinguish between: Discrete or discontinued value axes i.e. that contain alimited number of values, a postal code or CSP segment forexample. Continuous value axes, typically a date or a price. We canlead them to a discrete number of values by definingranges: a price range, an age range etc.We can also distinguish between: Cumulable values, an amount or a number of items forexample. Noncumulable values, age or date for example.Cumulable values are those which it is wise to combine, i.e. those we cancalculate the sum of, (or the average, or some other mathematical function of),for a sub-total of lines, for each theme for example. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden11 000230 000

Page 13Business intelligence – Open Source solutionsSection: Month 01Sample: month 06,Eastern 2030405Month06Turnover : 320KPeople : 12Margin : 15KCellMonth 04Tourism sectorEastern regionMultidimensional analysis consists in: Defining the axes of analysis that will be used, and theorder in which they will be used. E.g.: by region, then byyear, then by sales agent, then by product range. We do notalways use all possible axes. Define the value(s) to be studied, and the function to beapplied to these values. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 14Business intelligence – Open Source solutionsEach hierarchy of axes of analysis correspond to a questionthat we ask it.For example analysis by year, by sales agent aims atrepresenting the development of turnover, and comparingsales agents for each preceding year. Analysis by sales agent,by year aims to compare the sales agents first.Data information or data which is the basis of the businessintelligence system often comes from the company’s multipledifferent information systems. A business intelligenceapplication often constitutes a datawarehouse, drawing ondata from several sources within the company.There are a number of reasons for this consolidation stage: Centralisation: it would be very difficult, sometimes evenimpossible, to access data on the various systems in realtime: network interconnection problems, bandwidthproblems, not to mention protocol and interface disparityissues. Unification: the datawarehouse gathers data in a singlesystem, with a common repository, a unified model, andidentical interface accesses. It allows to create linksbetween data of mixed origins. Computer resources: business intelligence applicationscan take up a lot of CPU, disk space, memory, and thesystems in place are not set up to be able to manage thesenew processes. Specialisation: business intelligence applications havespecific needs which cannot be satisfied by the systemsthat companies have in place. The datawarehouse is a consolidation database, often arelatively standard relational database, but one whichcollects and manipulates large volumes of information.This could be an open source database (MySql, PostgreSql,etc.) or a commercial database (Oracle, SQL-Server). Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page intelligence – Open Source e datawarehouse centralises the business intelligenceinformation derived from the companies various differentinformation systems.OrganisationAs previously mentioned, this new database is not designed toenhance an operational application directly linked to thecompany’s activity (e.g. accounting system, H.R. database,sales agent database etc.). The datawarehouse is merely usedto stock key company information and will be used purely foranalysis and reporting query purposes.This allows the company to analyse this data withoutdiminishing the performance of common production tools.The business intelligence database is modelled to facilitate“queries”. Two common forms of dimensional models are thesnowflake and star schemas. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 16Business intelligence – Open Source solutionsThe tables are linked by only one key field in order to aidquery performance, certain “value list” types of informationwill be repeated several times in dimensional with the modelling of any database, this step is crucial tosuccess of the project. The model must be flexible enough toallow future “data marts” to be received and to ensure thecompany real durability by archiving a large volume ofinformation.The datawarehouse must have a longer life expectancy thanproduction tools. These tools can be enhanced or evenreplaced. In this case, the datawarehouse serves as thecompany’s memory, and only the load interfaces of thisdatawarehouse will be updated.ETL PrinciplesWe use a range of ETL “Extract, Transform, Load” tools to feedthe datawarehouse. As the name indicates, these tools allowto extract data from different sources, to transform this data(format, name), and to load this data on the target database,here the datawarehouse.The transformations entrusted to an ETL are often basic, butcan in some cases include the procedural processes of specificprograms.With an ETL we avoid creating repetitive batch programs,often similar, the maintenance of which must also be ensured.The principle is that the integration of a new data flow doesnot require any development, and is operated by simpleinteractive configuration: we choose the data elements in thesource repository, we specify the basic transformations to becarried out, and we indicate the data’s destination in thedatawarehouse.The ETL can manage different types of data sources, both atentry and exit point, the main ones being RDBMS and XMLflows, but fixed format files or with comma-separated value(CSV) files are also managed.Once an ETL flow has been defined, it is generally triggered ona regular basis; this is controlled by a task planning orscheduling tool. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 17Business intelligence – Open Source solutionsAn ETL generally treats point-to-point flows, i.e. from a singlesource to a single destination.ETLs are designed to function in deferred-processing mode,most often at night. A datawarehouse saves a succession of“photos” of the activity of the company; the ETL allows to feedthe datawarehouse with an additional “photo”. Once the feedis complete, the loaded data becomes statistics which areavailable to filesCSV filesSelection ofincoming fieldsDBMS (SQL)Formattransformationand other basicprocesingsLayout ofoutgoing fieldsDBMS (SQL)XMLXMLConnectorsConnectorsThe dashboardA dashboard is none other than a particular type of report.A dashboard is a report that: Summarises information: everything must fit on one A4page, or on one screenshot; Is most often a composite report, i.e. presents severalindictors, which together offer an overall view of companyactivity; Is often also personalised, that is to say each user consultsa dashboard that relates to his/her activity.Key Performance Indicators (KPI) are the indicators which arepresent from the home page of a business intelligence portalon.Depending on the user needs, the dashboard may be sufficienton its own, or be the entry point to more precise analyses, ofsecondary dashboards, or drill-down accesses. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 18Business intelligence – Open Source solutionsThe term Executive Information System or EIS is used inreference to a business intelligence system designed for use bythe General Management; it contains highly-consolidatedstrategic information.Data mining refers to the search for useful statisticalinformation in a large volume of data.In multidimensional analysis reporting, the user knows whatthey are looking for: e.g. the breakdown of turnover by region.They don’t yet know what the breakdown is, but they do knowwhat it consists of, and that this knowledge is relevant to theirwork.In data mining, the user is looking for hidden statisticalinformation which they haven’t yet been identified: trends,correlation, similarities, etc. Analysis of supermarket tillreceipts will typically show correlations between productpurchases. While correlations some will have a simpleexplanations (razors & shaving gel), other will be moreunexpected.Such results need to be verified as the correlation observedmay not be related to cause and effect but rather be just acoincidence, or the result of a hidden cause, both withphenomena’s such as conjoint consequences.Data mining tools semi-automatically search through largevolumes of data for these correlations & statistical rules,sometimes using a visual reproduction of these rules, whichwill reveal, for example, clusters of points on an axialrepresentation of data.TrendsUp to 2006 open source business intelligence was made up ofJava components focused on specific functions: report edits,OLAP analysis, ETL, etc. These open source components werewritten and maintained by communities which haveprogressively evolved into the professional arena. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 19Business intelligence – Open Source solutionsThe current trend is clearly to professionalize open sourcebusiness intelligence solutions.The open source approach has become a “business model”,both already in place and also very promising, one in whichinvestors are placing their money. Open source has made itpossible to rapidly create key players with global is quite remarkable when we see that in 2008 and 2009, ata time when financers were rare and demanding, bothPentaho and JasperSoft were able to raise highly significantfunds.These editors offer complete business intelligence suites, thetechnology of which is under their control: the open sourcebusiness intelligence environment is based on importantsuites which we analyse below. Copyright Smile - Open Source Solutions – All unauthorised reproduction is strictly forbidden

Page 20Business intelligence – Open Source solutionsBUSINESS turning towards the creation of complete businessintelligence solutions, open source projects each focus on aspecific area of business intelligence.As such, the BIRT and JasperReports projects allow togenerate reports, while the Mondrian and Jpivot projects allowto present data in multidimensional format. These projectswere and still are, designed to be integrated as “components”in specific developments.Some open source business intelligence platforms are basedon proven components and integrate these in order toconstitute a uniform solut

Business intelligence Open source solutions 2010 Edition For further information: Tel.: 33 (0)1 41 40 11 00 Mailto: [email protected] . . techniques", or "decision support tools" or "business intelligence", but basically we are referring to a number of I.T. solutions which allow us to analyze company data, in order to