Transcription

Building PetaByte Warehouses withUnmodified PostgreSQLEmmanuel CecchetMember of Research StaffMay 21st, 2009TopicsIntroduction to frontline data warehousesPetaByte warehouse design with PostgreSQLAster contributions to PostgreSQLQ&A2PGCon 2009, Ottawa 2009 Aster Data Systems

Enterprise Data Warehouse Under StressEnterpriseEnterpriseDataData WarehouseWarehouse3PGCon 2009, Ottawa 2009 Aster Data SystemsOffloading The Enterprise Data WarehouseFrontlineFrontline DataDataWarehouseWarehouse 4Petabytes source data24 x7 AvailabilityTB/Day Load capacityIn-database transformsRapid data accessPGCon 2009, Ottawa 2009 Aster Data SystemsArchivalArchival DataDataWarehouseWarehouseEnterpriseEnterprise DataDataWarehouseWarehouse Petabytes detailed dataLow cost/TBFlexible compressionOn-line accessAging out to off-line

Requirements for Frontline Data Warehouses ! " ! # % & ' & ( ) * , -. / 0 2 15PGCon 2009, Ottawa 2009 Aster Data SystemsWho is Aster Data Systems?Aster nCluster is a software-only RDBMS for large-scalefrontline data warehousing6High performance“Always Parallel” MPP architectureHigh availability“Always On” on-line operationsHigh value analyticsIn-Database MapReduceLow costPetabytes on commodity HWPGCon 2009, Ottawa 2009 Aster Data Systems

MySpace Frontline Data Warehouse 0 -./ %% 011**22%3%311 ! ! '',, #3#3 4 5677689 :; 9 ? @ABC5 D E FGHIJKLMKNJNO PJQRST UVWXYZ[ !"#! % & ' ()* ,\ ] a bc c dc e] f ghhi jklmno pqhh rs tuvutwrx7PGCon 2009, Ottawa 2009 Aster Data SystemsTopicsIntroduction to frontline data warehousesPetaByte warehouse design with PostgreSQLAster contributions to PostgreSQLQ&A8PGCon 2009, Ottawa 2009 Aster Data Systems

Petabyte Datawarehouse DesignPostgreSQL as a building blockDo not hack PostgreSQL to serve the distributed databaseBuild on top of mainline PostgreSQLUse standard Postgres APIsService Oriented ArchitectureHierarchical query planning and optimizationShared nothing replication treating Postgres and Linux as aserviceCompression at the OS level transparently to PostgreSQLIn-database Map-Reduce out-of-process9PGCon 2009, Ottawa 2009 Aster Data SystemsAster nCluster DatabaseQueen Node O O OL JK M N QLMJKO O Q J QKR MK IJJ RMKQL O N M OQKQQueries/AnswersQueries/AnswersQueen Server GroupQKRWorker NodeQueriesQueriesWorker Server GroupDataDataLoader/Exporter Server GroupAster nCluster Database10PGCon 2009, Ottawa 2009 Aster Data Systems ! "# % % & ' Loader Node( ) ) * , - ' . / % 01 2 3 % 01 & '

Query Processing: How It Works C F5 F5 %& & 5 C887 #, 32 %& & %& & . %& &A: Users send queries to the Queen (via ACT, ODBC/JDBC, BI tools, etc.)B: The Queen parses the query, creates an optimal distributed plan, sends subqueriesto the Workers, and supervises the processingC: Workers execute their subqueries (locally or via distributed communication)D: The Queen aggregates Worker results and returns the final result to the user11PGCon 2009, Ottawa 2009 Aster Data SystemsTable Compression ArchitectureRS [email protected] [email protected] [email protected] FLogical Database (eg. Schema)YWZRS [email protected] [email protected] [email protected][email protected]@QGDHIJA [email protected] [email protected] [email protected] [email protected][email protected] [email protected] [email protected][email protected] [email protected] KILMHIJGD [email protected] [email protected] [email protected]@[email protected] [email protected] KILMPhysical TablespacesHow It Works !"# # % # !"# # &#' ( ' ) ' & #* "# ,,#"# &'# ! # #!# ) !"# '#-#'Architecture Benefits:. )/ !"# " 0& ))#" &' 1 2# 3 #' "# - ) 4 & # " ! "# 0# # , ,5 5"# 6 )"# 5!)" # 47 5""# 3 0/ )/8!#", " # 5' 8 &'# !"# 46#", " #9 :;#(8" ( 5#" # ## ,5'' &'# # !"# 12QueryPGCon 2009, Ottawa 2009 Aster Data Systems

Table Compression Enables Powerful ArchivalCompressed - HighCompressed - MediumCompressed - LowOlder data accessed less frequentlyCompress to save space and costOldest data is compressed the most, recent iscompressed the leastCompressed tables are fully available forqueries (true online archival)Not Compressed13PGCon 2009, Ottawa 2009 Aster Data SystemsWhat is “MapReduce” and Why Should I Care?What is MapReduce?Popularized by Processes data in parallel across distributed clusterWhy is MapReduce significant?Empowers ordinary developersWrite application logic, not debug cluster communication codeWhy is In-Database MapReduce significant?Unites MapReduce with SQL: power invoked from SQLDevelop SQL/MR functions with common languages14PGCon 2009, Ottawa 2009 Aster Data Systems

Aster In-Database MapReduceUsers, analysts,applicationsSQL/MR FunctionsIn-Database MapReduceData Store EngineAster nCluster15PGCon 2009, Ottawa 2009 Aster Data SystemsSlide 15In-Database MapReduceExtensible framework (MapReduce SQL) Flexible: Map-Reduce expressiveness, languages, polymorphism Performance: Massively parallel, computational push-down Availability: Fault isolation, resource managementOut-of-process executables Does not use PL/* for custom code execution Can execute Map and Reduce functions in any language that has aruntime on Linux (e.g. Java, Python, C#, C/C , Ruby, Perl, R,etc ) Standard PostgreSQL APIs to send/receive data to executables Fault isolation, security and resource management for arbitraryuser code16PGCon 2009, Ottawa 2009 Aster Data Systems

Always On: Minimize Planned DowntimeRebalance DataAdd CapacityLiveLive QueriesQueriesDataDataBackupBackupBackup &RestoreLoad & Export17PGCon 2009, Ottawa 2009 Aster Data SystemsPrecision Scaling W [ [S [ [ V [ [ [ [ .When more CPU/memory/capacity are needed, new nodes can be added for scale-out.Precision Scaling uses standard PostgreSQL APIs to migrate vWorker partitions tonew nodes either for load balancing (more compute power) or capacity expansionExample: Assume Workers 1/2/3 are 100% CPU-bottlenecked. Incorporation adds anew Worker4 node and migrates over vWorker partitions D/H/L. As a result of loadbalancing, CPU-utilization drops to 75% per node, eliminating hotspots.18PGCon 2009, Ottawa 2009 Aster Data Systems

Replication 0 " 0 " 1 Q K L J O MO M QL R QL QOL JK JK QKJL KJR KL J KJR QM N M QLMJK JL LO Q QMKOL RQLQ JOO QKR O OL JNLQ Q LMLMJK O 0 QKO Q KL LJ JOL OM J P QKR QLQ O M MK PMKN Q M MLM O LJ Q LN RQLQ QK O19PGCon 2009, Ottawa 2009 Aster Data SystemsFault Tolerance & Automatic Online Failover* , -./012'(##)!"#! 3 3 !"#!% 3 3 !"#!& 3 3 !"#!4 3 3Replication FailoverAutomatic, non-disruptive, graceful performance impactReplication RestorationDelta-based (fast) and online (non-disruptive)20PGCon 2009, Ottawa 2009 Aster Data Systems !"#!5 3 3

Using Commodity Hardware [ V M Q ML L K L IMO J NKM L W X [ VW . [ [ X [ . . [ X [ WX [ V [ [X N LM IJ I O N LM I O MOL M NL RJ MO M O Q MK NK L 21 O Q Q ML W V X Q Q J QK MQKQ Q M QM Q PJ OL I PGCon 2009, Ottawa 2009 Aster Data SystemsScaling OnOn-Demand to a PetaByteCommodity HardwareMore Blocks More Power2 TB Building BlockDell, HP, IBM, Intel x8616 GB Memory2.4TB of Storage8 Disks 5k to 10k NodeMassive Power Per Rack 7! E"8# 160 Cores640 GB RAM48 TB SAS22PGCon 2009, Ottawa 2009 Aster Data Systems

Heterogeneous Hardware Support WV CPU Memory8E E [ [ [ [ [ [Disk [ [.Heterogeneous HW support enables customers to addcheaper/faster nodes to existing systems for investmentprotectionMix-n-match different servers as you grow (faster CPUs, morememory, bigger disk drives, different vendors, etc)23PGCon 2009, Ottawa 2009 Aster Data SystemsTopicsIntroduction to frontline data warehousesPetaByte warehouse design with PostgreSQLAster contributions to PostgreSQLQ&A24PGCon 2009, Ottawa 2009 Aster Data Systems

Error Logging in COPYSeparate good from malformed tuplesPer tuple error informationErrors to capture Type mismatch (e.g. text vs. int) Check constraints (e.g. int 0) Malformed chars (e.g. invalid UTF-8 seq.) Missing / extra columnLow-performance overheadActivated on-demand using environment variables25PGCon 2009, Ottawa 2009 Aster Data SystemsError Logging in COPYDetailed error context is logged along with tuple content26PGCon 2009, Ottawa 2009 Aster Data Systems

Error Logging Performance 271 million tuples COPY performancePGCon 2009, Ottawa 2009 Aster Data SystemsAuto-partitioning in COPYCOPY into a parent table route tuplesdirectly in the child table with matchingconstraintsCOPY y2008 FROM ‘data.txt’28PGCon 2009, Ottawa 2009 Aster Data Systems

Auto-partitioning in loadingActivated on-demand set tuple routing in copy 0/1;Configurable LRU cache size set tuple routing cache size 3;COPY performance into parent table similar todirect COPY in child table if data is sortedWill leverage partitioning information in thefuture (WIP for 8.5)29PGCon 2009, Ottawa 2009 Aster Data SystemsOther contributionsTemporary tables and 2PC transactions[Auto-]Partitioning infrastructureRegression test suiteLFI (http://lfi.sourceforge.net/) Fault injection at the library level or below out-of-memory conditions, network connection errors, interruptedsystem calls, data corruption, hardware failures, etc Lightning talk on Friday!30PGCon 2009, Ottawa 2009 Aster Data Systems

TopicsIntroduction to Aster and data warehousingPetaByte warehouse design with PostgreSQLAster contributions to PostgreSQLQ&A31PGCon 2009, Ottawa 2009 Aster Data SystemsPetaByte Warehouses with PostgreSQLUnmodified PostgreSQL“Always Parallel” MPP architecture“Always On” on-line operationsIn-Database MapReducePetaByte on commodity Hardware32PGCon 2009, Ottawa 2009 Aster Data Systems

Aster Data SystemsLearn morewww.asterdata.comFree TDWI report on advanced analytics: asterdata.com/mapreduceFree Gartner webcast on mission-critical DW: asterdata.com/gartnerContact [email protected] 2009, Ottawa 2009 Aster Data SystemsBonus slides34PGCon 2009, Ottawa 2009 Aster Data Systems

nCluster Components 5 C887 FG G 6969 DE8 EG % & ! , '( "& ) & , #%! / 1 / &' # )%&, nCluster 35PGCon 2009, Ottawa 2009 Aster Data SystemsAster Loader )8G# 5G7G9*69 ,-./01/2 34567 8494 :79: ;57497 4 8494 954 : 75 75 [email protected] ;7 A?99B7 7;CDEF.GHIF02 JKB9L B7 3?4875: 457 @4 78 9? MN?5C75: ? N?5C75 ?87:D O 975 459L9L? L 6P 3?4875QLBB 454BB7B B?48 K LRK7 8494 L 9? MN?5C 75: DS/0/TIH2 3L 745BU :;4B4AB7 B?48 75 [email protected] ;7 V [email protected] 3?4875: 9? N?5C 75:WScalable Partitioning ,-./01/2 X459L9L? L 6 @K:9 A7 :;4B4AB7 4 8 ?9 [email protected] 787 9Y7 B?48L 6 5?;7::EF.GHIF02 Z4; Y 3?4875 ; ? 94L : 4 [X459L9L? 75\P QYL; Y K:7: 4 [email protected] 9? 4:: L6 8494 L 9?[AK; C79:\D Z4; Y AK;C 79 L: K LRK7BU @4 78 9? QL9YL 9Y7 ; BK: 975DS/0/TIH2 ]4:9P L 97BBL67 9 459L9L? L 6 8K5L 6 @4::LM7 :;4B7 8494 B?48: G!7 #769 G9 ,-./01/2 ]4LB78 ?87: @4U B?:7 8494 ?5 :L6 L L;4 9BU 85? B?48 75 [email protected] ;7DEF.GHIF02 a 4 ?87 4LB:P ?9Y75 : [email protected]: ;? 9L K7 9? B?48 Vbcd 75 [email protected] ;7 YL9WD [email protected] : ;4 74:LBU57;?M75 B?:9 8494 AU 57 B?48L 6 eKBC ]77875 8494DS/0/TIH2 f494 B?:: 5?97; 9L? g 75 [email protected] ;7 ;? :L: 97 ;U 8K5L 6 ?87 4LBK57D36PGCon 2009, Ottawa 2009 Aster Data Systems

Unmodified PostgreSQL Emmanuel Cecchet Member of Research Staff May 21 st, 2009 Topics . 2 PGCon2009, Ottawa 2009 Aster Data Systems . Enterprise Data Warehouse Under Stress Enterprise Data Warehouse 3 PGCon2009, Ottawa 2009 Aster Data Systems Offloading The Enterprise Data Warehouse Enterprise Data