Course 1
Parallel and Distributed Programming
Curs 1 - PPD - 2
Course content (also based on http://grid.cs.gsu.edu/~tcpp/curriculum/?q=home)
Theoretical
• Introductory notions: architectures, concurrency, parallelism
• Stages in the development of parallel programs
• Evaluating the performance of parallel programs
• Parallel programming models
  – the difference between shared-memory-based and distributed-memory-based models
• Patterns
  – for parallel programming
  – for distributed programming
Practical
• Java threads (low-level API)
• C++ (C++11) threads
• High-level API: the Java java.util.concurrent packages
• Java streams
• OpenMP (C++)
• CUDA (C++)
• MPI - Message Passing Interface
  – examples in C and C++
Bibliography
• Ian Foster. Designing and Building Parallel Programs. Addison-Wesley, 1995.
• Berna L. Massingill, Timothy G. Mattson, and Beverly A. Sanders. A Pattern Language for Parallel Programming. Addison-Wesley Software Patterns Series, 2004.
• Michael McCool, Arch Robinson, James Reinders. Structured Parallel Programming: Patterns for Efficient Computation. Morgan Kaufmann, 2012.
• D. Culler, J. Pal Singh, A. Gupta. Parallel Computer Architecture: A Hardware/Software Approach. Morgan Kaufmann, 1998.
• A. Grama, A. Gupta, G. Karypis, V. Kumar. Introduction to Parallel Computing. Addison-Wesley, 2003.
• D. Grigoras. Calculul Paralel. De la sisteme la programarea aplicatiilor. Computer Libris Agora, 2000.
• V. Niculescu. Calcul Paralel. Proiectare si dezvoltare formala a programelor paralele. Presa Univ. Clujeana, 2006.
• B. Wilkinson, M. Allen. Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers. Prentice Hall, 2002.
• A. Williams. C++ Concurrency in Action: Practical Multithreading. Manning, 2012.
• Java tutorials: http://docs.oracle.com/javase/tutorial/essential/concurrency/further.html
• C++11: http://en.cppreference.com/w/
• OpenMP: http://openmp.org/
• MPI: http://www.mpi-forum.org/
Assessment
• Lab (25%)
  – lab exercises
  – programs using Java/C++ threads, OpenMP, MPI
• Seminar (10%)
  – attendance, active participation
• Practical tests – 3 × 10%
• Exam
  – written, in the exam session – 35%
• Course information
  – http://www.cs.ubbcluj.ro/~vniculescu/didactic/PPD/CursPPD.html
Parallel Processing
• A parallel computer is a computer (system) that uses multiple processing elements simultaneously, in a cooperative manner, to solve a computational problem.
• Parallel processing includes the techniques and technologies that make parallel computation possible
  – hardware, networks, operating systems, libraries, languages, compilers, algorithms, ...
• Parallelism is natural.
• PERFORMANCE – parallelism is very much about performance!
Serial vs. Parallel Computing (images from "Introduction to Parallel Computing", Blaise Barney)

"It would appear that we have reached the limits of what it is possible to achieve with computer technology, although one should be careful with such statements, as they tend to sound pretty silly in 5 years." (John von Neumann, 1949)
Limits of serial programming
• Transmission speed
  – the speed of light (30 cm/nanosecond) and the transmission limit in copper wire (9 cm/nanosecond).
• Limits of miniaturization – the number of transistors on a chip.
  – Moore's law: the number of transistors that can be placed on a single integrated circuit (per square inch of chip) doubles every 2 years.
  – But this imposes high costs.
• Economic limits
History
• Processor performance was raised by increasing the CPU clock frequency
  – "riding Moore's law"
• Problem: strong heating of the chips!
  – higher clock frequency ⇒ higher power consumption (the Pentium 4 heatsink; "frying an egg on a Pentium 4")
• Solution – add more cores to reach the desired performance
  – the clock frequency is kept the same or even lowered, so power consumption does not grow.
Levels of parallelism
1. Job-level parallelism:
   - between jobs;
   - between phases of a job;
2. Program-level parallelism:
   - between parts of a program;
   - within certain loops;
3. Instruction-level parallelism:
   - between the different execution phases of an instruction;
4. Arithmetic- and bit-level parallelism:
   - between the elements of a vector operation;
   - inside the arithmetic-logic circuits.
• Current architectures rely more and more on hardware-level parallelism to improve performance:
  – multiple execution units
  – pipelined instructions
  – multi-core
Parallelism <-> Concurrency
• Consider several tasks that must be executed on a computer.
• The tasks are considered purely parallel if:
  – they can execute at the same time (parallel execution)
    • which implies there are no dependencies between them.
• Dependencies => concurrent execution:
  – a task needs the results of other tasks;
  – a task must execute only after a certain condition is fulfilled;
  – several tasks try to use the same resource
  => some form of synchronization must be used to satisfy the dependencies.
• Concurrency is fundamental in computer science
  – operating systems, databases, networking, ...
Parallelism vs. Concurrency
Note:
– threads or processes can be used in both cases;
– if a parallel computation needs access to shared resources, then concurrency must be managed correctly
  => parallelism may involve concurrency.

Parallelism: several resources are used to solve a problem faster.
Concurrency: the correct and efficient management of access to shared resources.

[Figure: parallelism as work split across several resources; concurrency as several requests competing for one resource]
Concurrency and Parallelism
• Concurrent <> parallel!
• Parallel execution:
  – the tasks effectively execute at the same time;
  – multiple computing resources must be available.
• Parallelism = concurrency + "parallel" hardware
Parallelism
• There are several levels of parallelism:
  – processes, threads, routines, instructions, ...
• It must be supported by the hardware resources
  – processors, cores, ... (instruction execution)
  – memories, DMA, networks, ... (associated operations)
Why use parallel programming?
• Primary reasons:
  – faster computation (response time)
  – solving 'large' computational problems (in reasonable computing time)
• Secondary reasons:
  – effective use of the computing resources
  – reduced costs
  – reducing memory-related constraints
  – the limitations of serial machines
• Parallelism = concurrency + parallel hardware + performance
• Solving difficult, large problems:
  – "Grand Challenge" problems (en.wikipedia.org/wiki/Grand_Challenge) requiring PetaFLOPS and PetaBytes of computing resources;
  – web search engines / databases processing millions of transactions per second.
• Using non-local resources:
  – SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a compute power of over 528 TeraFLOPS (as of August 04, 2008);
  – Folding@home (folding.stanford.edu) uses over 340,000 computers for a compute power of 4.2 PetaFLOPS (as of November 4, 2008).
Directions in parallel processing
• Parallel architectures
  – hardware requirements
  – computer system design
• Operating systems (parallelism/concurrency)
• Managing the system-level aspects of a parallel computer
• Parallel programming
  – libraries (low-level, high-level)
  – languages
  – development environments
  – software
• Parallel algorithms
• Evaluating the performance of parallel programs
• Testing vs. ensuring correctness
• Parallel tools:
  – performance, analysis, visualization, ...
Why study parallel programming?
• Computer architectures
  – innovations lead to new programming models
• Technological convergence
  – the "killer micro" is everywhere
  – laptops and supercomputers are fundamentally similar
  – current trends lead to the convergence of diverse approaches
• Technology trends make parallel computing inevitable
  – multi-core processors!
  – nowadays every computing system is parallel
• Understanding the fundamental principles
  – programming, communication, memory, ...
  – performance
• "Parallelism is the future of computing" - Blaise Barney
  – M. Andrews, J. S. Walicki. "Concurrency and parallelism - future of computing", in Proceedings of the 1985 ACM Annual Conference on The Range of Computing: Mid-80's Perspective, pp. 224-231.
The inevitability of parallel processing
• Application demands
  – a huge need for computing cycles
• Technology trends
  – processing and memory
• Architectural trends
• Economic factors
• Current trends:
  – today's microprocessors have multiprocessor support
  – servers and workstations are available as multiprocessors
  – tomorrow's microprocessors are multiprocessors
  – multi-core is here to stay, and #cores/processor is growing
  – accelerators (GPUs, gaming systems)
High-performance computer applications
• Application performance demands high-performance hardware (fast, with multiple resources, etc.)
• Advanced hardware generates new applications
• The new applications have even higher performance demands
  – exponential growth of microprocessor performance
  – innovations in parallel architectures and in integration
• Performance demands =>
  – system performance must improve as a whole
  – changes/approaches/re-evaluations in software engineering
  – the costs of advanced technology

[Figure: the feedback cycle between application performance demands and hardware]
Parallel programming vs. distributed programming

[Figure: processors P1, P2, ..., Pn connected through an INTERCONNECTION NETWORK]
TYPES OF MULTIPROCESSING: PARALLEL vs. DISTRIBUTED

Technical aspects
• Parallel computers (usually) operate based on:
  – tight coupling,
  – in general, synchronicity,
  – a very fast and reliable communication system,
  – a single address space (to a large extent).
• Distributed computers are:
  – more independent,
  – with less frequent and slower (asynchronous) communication,
  – limited cooperation,
  – no global clock,
  – "independent failures".

Goals
• Parallel computers cooperate to solve difficult problems more efficiently.
• Distributed computers have individual goals and private activities; intercommunication is necessary only occasionally.
  – Parallel computers: cooperation in a positive sense.
  – Distributed computers: cooperation in a negative sense, only when necessary.
Parallel applications
We are interested in solving problems faster, in parallel.

Distributed applications
We are interested in solving certain specific problems:
• communication services: routing, broadcasting
• maintenance of control structures: topology update, leader election
• resource control activities: load balancing, managing global directories

In general...
Parallel vs. Distributed Systems (from M. Fukuda, CSS434 System Models)

Memory:
  Parallel systems:    tightly coupled shared memory (UMA, NUMA)
  Distributed systems: distributed memory; message passing, RPC, and/or use of distributed shared memory
Control:
  Parallel systems:    global clock control (SIMD, MIMD)
  Distributed systems: no global clock control; synchronization algorithms needed
Processor interconnection:
  Parallel systems:    order of Tbps; bus, mesh, tree, mesh of trees, and hypercube(-related) networks
  Distributed systems: order of Gbps; Ethernet (bus), token ring and SCI (ring), Myrinet (switching network)
Main focus:
  Parallel systems:    performance (e.g. scientific computing)
  Distributed systems: performance (cost and scalability), reliability/availability, information/resource sharing
One point of view...
Distributed systems
- can be used for -
• Inherently distributed applications
  – distributed databases, airplane/other ticket reservations, banking systems
• Information shared among users
• Resource sharing
• A better cost/performance ratio for parallel applications
  – they can be used efficiently for coarse-grained and/or embarrassingly parallel applications
• Reliability
• Scalability
  – loosely coupled connection; hot plug-in
• Flexibility
  – reconfiguring the system to meet the demands
Performance/Scalability
Unlike parallel systems, distributed systems involve:
- operating system intervention;
- a slower data-transfer medium (a slower network);
- heterogeneity.
Solutions:
- Batch processing of messages
  • avoids OS intervention for each message transfer.
- Caching data
  • avoids transferring the same data repeatedly.
- Avoiding centralized entities and algorithms
  • avoids saturating the network.
- Performing "post" operations at the client level
  • avoids heavy traffic between clients and servers.
- ...
Security
• There is no single point of control.
• Problems:
  – messages stolen, modified, copied, ...
    • solution: use cryptography
  – failures
    • solution: fault-tolerance solutions
Types of distributed systems (Munehiro Fukuda - PDP Fundamentals)
• Models
  – minicomputer
  – workstation
  – workstation-server
  – processor-pool
  – cluster
  – grid computing
The minicomputer model
• An extension of the time-sharing system
  – users must log in to their own minicomputer;
  – then they log in to a remote machine through telnet-like programs.
• Resource sharing
  – databases
  – high-performance devices

[Figure: several minicomputers connected through a network]
The workstation model
• Process migration
  – users first log in on their personal workstation;
  – if there are idle workstations, a "large" job can migrate to one of them.
• Problems:
  – how are the idle workstations identified?
  – how does a job migrate?
  – what happens if another user logs in to the machine in use?

[Figure: workstations connected by a 100 Mbps LAN]
The workstation-server model
• Client workstations
  – graphical/interactive applications are processed locally;
  – for other computation demands, requests are sent to servers.
• Servers (minicomputers)
  – each server is dedicated to one or more types of services.
• Communication model: the client-server model
  – RPC (Remote Procedure Call)
  – RMI (Remote Method Invocation)
  – a client process calls a function/method of the server process.
• No process migration is performed
• Example: NFS

[Figure: workstations and minicomputer servers (file server, http server, cycle server) on a 100 Gbps LAN]
The processor-pool model
• Clients:
  – log in from a terminal;
  – all services are managed by servers.
• Servers:
  – the necessary number of processors from the pool is allocated to each user.
• Good utilization, but poor interactivity.

[Figure: servers 1..N connected by a 100 Mbps LAN]
The cluster model
• Consists of several PCs/workstations connected to a high-speed network.
• The focus is on performance.

[Figure: workstations on a 100 Mbps LAN; a master node with slave nodes 1..N (e.g. http servers) connected by a 1 Gbps SAN]
Grid computing
• Goal
  – to gather the computing power of many geographically dispersed supercomputers and clusters
• Distributed supercomputing
  – for very large, difficult problems (CPU-intensive, memory-intensive)
• High-throughput computing
  – using the many resources that would otherwise stay unused
• On-demand computing
  – remote resources integrated into the local computation
• Data-intensive computing
  – distributed data
• Collaborative computing
  – support for communication between multiple parties

[Figure: supercomputers, clusters, minicomputers, and workstations connected through a high-speed information highway]