금융시스템을위한 재해복구솔루션 - Oracle...란? Instance X Node 1 Instance Y Node 2...

Disaster Recovery Solution for Financial Systems, Oracle iSD: 최태상/허은


Page 1

Disaster Recovery Solution for Financial Systems

Oracle iSD: 최태상/허은

Page 2

Maximum Availability System Architecture

Building High Availability with Oracle9i Real Application Clusters and Data Guard

Page 3

Agenda

Overview of Emergencies

Oracle Disaster Recovery Solutions

RAC and Data Guard Maximum Availability Architecture

Demo - Data Guard

Page 4

Overview of Emergencies

Emergency Response: respond quickly while minimizing data loss (Crisis Management)

Disaster Recovery: recover data in order to resume the business (Mainframe Recovery, Distributed Recovery)

Disaster: an unexpected event that leaves the business unable to provide service for a period of time (Business Interruption, Emergency, Crisis)

BCP

DR

Emergency Response

Disaster

Business Continuity Planning: assessing the impact of business interruption, formalizing recovery strategies, and making the advance preparations needed to keep the business running (Business Resumption, Contingency Planning, Business Continuity Management)

Page 5

The Top Concern of Enterprises in the Internet Era: System Reliability and Availability

High Availability (HA) is a top priority for major US organizations (IDC eWorld Survey 2001)

One minute of downtime can cost between $2,500 and $10,000 (Standish Group 2001)

Even 99.9% data availability can cost a company nearly $5 million a year (The Standish Group 2001)

Page 6

Business Interruption: Cost Perspective

Average hourly impact of downtime, by business:

Retail Brokerage: $6.5 million

Credit Card Sales Authorization: $2.6 million

Pay-per-View: $150,000

Home Shopping Channels: $113,000

Catalog Sales: $90,000

Airline Reservation Centers: $90,000

Tele-Ticket Sales: $69,000

Package Shipping Service: $28,000

ATM Fees: $15,000
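The hourly figures become more concrete when combined with an availability target: yearly cost is roughly hours of downtime per year times the average hourly impact. The following is a minimal sketch that assumes 7x24x365 operation and a constant hourly impact. The function name `annual_downtime_cost` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 7x24x365 operation, leap years ignored


def annual_downtime_cost(availability_pct: float, hourly_impact: float) -> float:
    """Estimate the yearly cost of downtime at a given availability level.

    Assumes every hour of outage costs the same average hourly impact.
    """
    downtime_hours = HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0
    return downtime_hours * hourly_impact


if __name__ == "__main__":
    # Retail brokerage ($6.5 million per hour) at 99.9% availability
    print(f"${annual_downtime_cost(99.9, 6_500_000):,.0f}")
```

At 99.9% availability (about 8.76 hours of downtime a year), this estimate puts a retail brokerage's loss at roughly $57 million a year.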

Page 7

Availability Levels

Availability: downtime per year (7x24x365)

95%: 18 days, 6 hours, 0 minutes

99%: 3 days, 15 hours, 36 minutes

99.9%: 0 days, 8 hours, 46 minutes

99.99%: 0 days, 0 hours, 53 minutes

99.999%: 0 days, 0 hours, 5 minutes

99.9999%: 0 days, 0 hours, 1 minute
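The table follows from a single calculation: the unavailable fraction of a 365-day, 24-hour year. The sketch below reproduces the rows, assuming no leap years and rounding to the nearest minute. The function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # 7x24x365, leap years ignored


def downtime_per_year(availability_pct):
    """Return yearly downtime as (days, hours, minutes), rounded to the nearest minute."""
    total_minutes = round(MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0)
    days, remainder = divmod(total_minutes, MINUTES_PER_DAY)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


if __name__ == "__main__":
    for level in (95, 99, 99.9, 99.99, 99.999, 99.9999):
        d, h, m = downtime_per_year(level)
        print(f"{level}%: {d} days, {h} hours, {m} minutes")
```

The 99.9999% row is really about 32 seconds a year, which rounds up to the 1 minute shown on the slide.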

Page 8

Goals of High Availability

Maximize MTTF (Mean Time To Failure): provide a reliable 24x7 business operating environment

Minimize MTTR (Mean Time To Recover): minimize the impact on the business

Keep the system accessible through every kind of disaster: data corruption, user errors, viruses, and so on
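The two goals above can be linked through the standard steady-state approximation, availability = MTTF / (MTTF + MTTR). The slide does not give this formula; it is added here for illustration. The sketch shows that cutting recovery time raises availability even when the failure rate does not change. The function name and sample numbers are hypothetical.

```python
def steady_state_availability(mttf_hours, mttr_hours):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Same MTTF (1000 h); recovery time cut from 4 hours to 5 minutes
    for mttr in (4.0, 5 / 60):
        a = steady_state_availability(1000.0, mttr)
        print(f"MTTR {mttr:.2f} h -> availability {a:.5%}")
```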

Page 9

Availability and Uptime

Availability: 99.82%, 16 hours of unplanned downtime (ignores planned downtime)

Uptime: 24 x 6.67, 8 hours of planned downtime (ignores unplanned downtime)
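The slide gives no time periods. One reading that matches both figures is that the 16 hours of unplanned downtime are per year and the 8 hours of planned downtime are per week. That reading is an assumption. Under it, the two metrics can be computed as follows. The function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_WEEK = 7 * 24


def availability_pct(unplanned_hours_per_year):
    """Availability counting only unplanned downtime (planned downtime ignored)."""
    return 100.0 * (HOURS_PER_YEAR - unplanned_hours_per_year) / HOURS_PER_YEAR


def uptime_days_per_week(planned_hours_per_week):
    """Uptime as '24 x N': N days per week the system runs (unplanned downtime ignored)."""
    return (HOURS_PER_WEEK - planned_hours_per_week) / 24


if __name__ == "__main__":
    print(f"Availability: {availability_pct(16):.2f}%")
    print(f"Uptime: 24 x {uptime_days_per_week(8):.2f}")
```

Each metric hides the other kind of outage, so a system can look good on one and poor on the other.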

Page 10

Event Timeline: Day 1: Morning of Tuesday, Sept 11, 2001

8:48 AM EST

Hijacked airliner crashes into WTC, North Tower

9:03 AM EST

Hijacked airliner crashes into WTC, South Tower

9:38 AM EST

Hijacked airliner crashes into Pentagon, Wash DC

9:59 AM EST

WTC South Tower collapses

10:05 AM EST

Security Operations Center established

10:28 AM EST

National Sales reports closing of east coast branches in hi-rises

10:29 AM EST

WTC North Tower collapses

10:29 AM EST

Employee messages posted on voicemail and 800-MER-HELP; m/f data centers confirmed operational; PWS begins setup at 450 Lexington Ave.

10:40 AM EST

Hijacked airliner crashes to ground in Somerset County, PA

11:40 AM EST

New York City orders evacuation of 570 Washington Street. (Reopens 1:00 PM)

11:45 AM EST

ML Canada reported the evacuation of all buildings

Xxx Street evacuated

Company Actions

Corporate Response Team at Hopewell, NJ

Impacting Events

9:15 AM EST

Merrill Lynch NYC buildings evacuated; CRT members notified by E-Mail and Paging

9:47 AM EST

TIS Command Center established at

10:00 AM EST

ML organizations report post evacuation updates to CRT; IPCG Executives contacted by group’s BCP Team

9:25 AM EST

CRT conference call line made available as firm’s single point of contact.

Page 11

Types of Emergencies

Human Errors: Drop Tables, Administrator Errors

Data Failures and Disasters: Terrorism, Data Corruption, Flood, Fire, Earthquakes

System Failures: Power Outages, System Crashes

Unplanned Outages

Page 12

Oracle Disaster Recovery Solutions

Unplanned Outages

System Failures: Real Application Clusters, continuous availability for all applications

Data Failure and Disasters: Real Application Clusters Guard, zero data loss

Human Errors: Data Guard and Flashback Query, enable users to correct their mistakes

Planned Outages

System Maintenance: Data Guard and Dynamic Reconfiguration, capacity on demand without interruption

Data Maintenance: Online Redefinition, adapt to changes online

Page 13

Oracle9i Real Application Clusters: Availability and Architectural Benefits

Page 14

What Is Oracle9i Real Application Clusters?

Diagram: Instance X (Node 1), Instance Y (Node 2), and Instance Z (Node 3), linked by a high-speed interconnect and sharing the database files on shared storage.

Page 15

Architecture of Real Application Clusters

Diagram: users connect over the network to clustered database servers. The servers form a shared cache over a low-latency interconnect (e.g. VIA or proprietary) through a high-speed switch or interconnect. They reach a mirrored disk subsystem through a hub or switch-fabric storage area network. The architecture has a centralized management console and no single point of failure.

Page 16

Benefits of RAC (Real Application Clusters): Speedup, Scalability, and Maximum Availability

Diagram: Instance X (Node 1), Instance Y (Node 2), Instance Z (Node 3), labeled with: larger data volumes; higher reliability and availability; non-user tasks; higher throughput rate; greater complexity; lower response times; less management overhead; greater user population.

Page 17

Advantages of Real Application Clusters

Efficient message exchange between nodes

Data remains accessible unless every node fails

Recovery cost grows with the number of system failures, not with the number of nodes

Cluster-aware optimizer

Workload load balancing

Page 18

Oracle9i Real Application Clusters: Optimal Configuration

Page 19

Optimal RAC Configuration

Diagram: a RAC database served by Instance 1 and Instance 2. Each instance's local disk holds the Oracle software and archive logs. The shared database storage holds the data files, control files, redo logs, and SPFILE.

Page 20

Data Guard: Configuration and Availability

Page 21

What Is Data Guard?

Automates the creation, management, and operation of copies of the primary database

When the primary database fails (disaster, maintenance), the standby database provides the data and runs the programs needed to resume the business

Page 22

Data Guard Architecture

Diagram: the primary database at the primary site sends data changes to the standby database at the standby site. Clients connect at both sites. The Data Guard Broker manages the configuration through broker agents.

Page 23

How Does It Work?

The primary database's redo logs are shipped to the standby database

The standby database is kept continuously in sync with the primary database

The primary database is active; the standby database is active in recovery mode or in read-only/read-write mode

If needed, the standby database can take over the primary role
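The mechanism described above can be modeled as a toy: the primary logs each change before applying it, and the standby stays in sync by applying the same redo records strictly in order. This is an illustrative sketch, not how Oracle implements redo transport or recovery. The class names, the `(seq, key, value)` record format, and the use of `None` to mark a delete are all assumptions.

```python
class Database:
    """Toy database: rows in a dict plus the last applied redo sequence number."""

    def __init__(self):
        self.rows = {}
        self.applied_seq = 0

    def apply(self, seq, key, value):
        # Redo is applied strictly in sequence order, as in recovery.
        if seq != self.applied_seq + 1:
            raise ValueError(f"redo gap: expected {self.applied_seq + 1}, got {seq}")
        if value is None:
            self.rows.pop(key, None)  # None marks a delete in this toy model
        else:
            self.rows[key] = value
        self.applied_seq = seq


class PrimaryDatabase(Database):
    def __init__(self):
        super().__init__()
        self.redo_log = []  # every change as (seq, key, value)

    def write(self, key, value):
        record = (self.applied_seq + 1, key, value)
        self.redo_log.append(record)  # log the change first...
        self.apply(*record)           # ...then change the data


def ship_and_apply(primary, standby):
    """Ship every redo record the standby has not applied yet and apply it."""
    for record in primary.redo_log[standby.applied_seq:]:
        standby.apply(*record)


if __name__ == "__main__":
    primary, standby = PrimaryDatabase(), Database()
    primary.write("acct:1", 100)
    primary.write("acct:2", 50)
    ship_and_apply(primary, standby)
    primary.write("acct:1", 80)
    primary.write("acct:2", None)
    ship_and_apply(primary, standby)
    print(standby.rows == primary.rows, standby.applied_seq)
```

Because the standby only ever replays the primary's log, it can be promoted at any point and will hold every change it has applied.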

Page 24

Data Guard Configuration

Managed as a single configuration

Primary and standby databases can be single-instance Oracle or Real Application Clusters

Up to nine standby databases supported in a single configuration

Diagram: a primary database at the primary site, with standby databases at standby sites A and B.

Page 25

Data Guard Configuration Details: Physical/Logical

Diagram components. Primary database: transactions, online redo logs, LGWR (synchronous/asynchronous), ARCH (synchronous), archived redo logs. Redo travels over Oracle Net to the standby database: RFS, standby redo logs, ARCH, archived redo logs, FAL, MRP/LSP, Affirm/NoAffirm, transform redo to SQL for SQL Apply, backup/reports.
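In Oracle9i, the transport choices shown in the diagram (LGWR or ARCH, SYNC or ASYNC, AFFIRM or NOAFFIRM) are set as attributes of the `LOG_ARCHIVE_DEST_n` initialization parameters. The sketch below generates such parameter lines. Several points are assumptions to verify against the Oracle9i reference: the ten-destination limit (one local destination plus up to nine standbys), the exact attribute spelling, the archive path, and the service names.

```python
MAX_DESTINATIONS = 10  # assumed: LOG_ARCHIVE_DEST_1 .. LOG_ARCHIVE_DEST_10


def archive_dest_params(local_dir, standby_services, transport="LGWR SYNC AFFIRM"):
    """Build LOG_ARCHIVE_DEST_n lines: dest 1 is local, the rest point at standbys.

    local_dir and standby_services are hypothetical; transport is an attribute
    string such as "LGWR SYNC AFFIRM" or "LGWR ASYNC NOAFFIRM".
    """
    if len(standby_services) > MAX_DESTINATIONS - 1:
        raise ValueError("at most nine standby destinations fit alongside the local one")
    lines = [f"LOG_ARCHIVE_DEST_1='LOCATION={local_dir}'"]
    for n, service in enumerate(standby_services, start=2):
        lines.append(f"LOG_ARCHIVE_DEST_{n}='SERVICE={service} {transport}'")
    return lines


if __name__ == "__main__":
    for line in archive_dest_params("/u01/arch", ["stby_a", "stby_b"]):
        print(line)
```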

Page 26

Data Guard: Redo Apply

Diagram: the primary database ships redo synchronously or asynchronously over the network to a physical standby database. The standby applies it with Redo Apply (optionally delayed) and can be backed up. The Data Guard Broker manages both databases.

Physical standby database is a block-for-block copy of the primary database

Uses the database recovery functionality to apply changes

Can be opened in read-only mode for reporting/queries

Can also perform backups, offloading the production database

Page 27

Data Guard: SQL Apply

Diagram: the primary database ships redo synchronously or asynchronously over the network to a logical standby database. The standby transforms the redo to SQL and applies it (optionally delayed) while staying continuously open for reports, with additional indexes and materialized views. The Data Guard Broker manages both databases.

Logical standby database is an open, independent, active database

Contains the same logical information (rows) as the production database

Physical organization and structure can be very different

Can host multiple schemas

Can be queried for reports while logs are being applied via SQL

Can create additional indexes and materialized views for better query performance
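A logical standby replays changes as SQL rather than as data blocks. That difference is why its physical layout can differ from the primary's, and why it can carry extra indexes. The toy function below turns a change record into a parameterized SQL statement to illustrate the idea. The change-record format, the `id` key column, and the Oracle-style `:n` bind placeholders are assumptions; this is not how LSP actually mines redo.

```python
def redo_to_sql(change):
    """Turn a toy change record into (sql, bind_values).

    change: dict with 'op' ('insert' | 'update' | 'delete'), 'table',
    'key' (value of the assumed primary key column 'id'), and 'row' (column -> value).
    """
    op, table = change["op"], change["table"]
    if op == "insert":
        cols = list(change["row"])
        placeholders = ", ".join(f":{i}" for i in range(1, len(cols) + 1))
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, tuple(change["row"][c] for c in cols)
    if op == "update":
        cols = list(change["row"])
        sets = ", ".join(f"{c} = :{i}" for i, c in enumerate(cols, start=1))
        sql = f"UPDATE {table} SET {sets} WHERE id = :{len(cols) + 1}"
        return sql, tuple(change["row"][c] for c in cols) + (change["key"],)
    if op == "delete":
        return f"DELETE FROM {table} WHERE id = :1", (change["key"],)
    raise ValueError(f"unknown operation: {op}")


if __name__ == "__main__":
    print(redo_to_sql({"op": "update", "table": "accounts", "key": 1, "row": {"balance": 80}}))
```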

Page 28

Protection from Human Errors and Data Corruption

The application of changes received from the primary can be delayed at the standby, so that user errors can be detected before they reach the standby

The apply process also revalidates the log records to prevent any log corruption from being applied

Diagram: the production database at the primary site feeds the standby database at the standby site through an optional delayed apply.
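Conceptually, delayed apply holds each archived log until a fixed interval has passed since it was archived. This leaves a window to stop apply before a mistake (such as a dropped table) reaches the standby. The following is a minimal sketch of that rule. The function name, the `(sequence, archived_at)` format, and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def logs_ready_to_apply(archived_logs, now, delay):
    """Return sequence numbers of archived logs whose apply delay has elapsed.

    archived_logs: list of (sequence, archived_at) pairs in sequence order.
    Stops at the first log still inside the delay window, so apply stays in order.
    """
    ready = []
    for seq, archived_at in archived_logs:
        if now - archived_at < delay:
            break  # this log, and every later one, is still being held back
        ready.append(seq)
    return ready


if __name__ == "__main__":
    logs = [
        (1, datetime(2002, 1, 1, 9, 0)),
        (2, datetime(2002, 1, 1, 9, 30)),
        (3, datetime(2002, 1, 1, 10, 0)),
    ]
    print(logs_ready_to_apply(logs, datetime(2002, 1, 1, 10, 15), timedelta(hours=1)))
```

A longer delay gives more time to catch human errors, but it also means more redo has to be applied before the standby can take over.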

Page 29

Switchover & Failover

Primary and standby role transitions:

– Switchover: planned role reversal; no database reinstantiation required

– Failover: unplanned failure (e.g. disasters) of the primary; the primary database must be reinstantiated

Initiated using SQL or a GUI interface

Data Guard automates the processes involved, including full support for RAC
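The difference between the two role transitions can be expressed as a small state model. A switchover simply swaps roles. A failover promotes the standby and leaves the old primary needing reinstantiation before it can serve again. This toy class is hypothetical and says nothing about the actual Data Guard commands; the site names are placeholders.

```python
class DataGuardConfig:
    """Toy model of role transitions between a primary and one standby site."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.needs_reinstantiation = set()

    def switchover(self):
        # Planned role reversal: both databases stay intact, so the roles just swap.
        if self.standby is None:
            raise RuntimeError("switchover needs a standby database")
        self.primary, self.standby = self.standby, self.primary

    def failover(self):
        # Unplanned loss of the primary: promote the standby. The failed primary
        # must be re-created before it can act as a standby again.
        if self.standby is None:
            raise RuntimeError("failover needs a standby database")
        self.needs_reinstantiation.add(self.primary)
        self.primary, self.standby = self.standby, None

    def reinstantiate(self, site):
        # Rebuild the old primary as a new standby of the current primary.
        if site not in self.needs_reinstantiation:
            raise ValueError(f"{site} is not waiting for reinstantiation")
        self.needs_reinstantiation.remove(site)
        self.standby = site


if __name__ == "__main__":
    cfg = DataGuardConfig("site_a", "site_b")
    cfg.switchover()
    print(cfg.primary, cfg.standby)
```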

Page 30

Flexible Data Protection Modes

Protection mode: redo shipment; risk of data loss

Maximum Performance: asynchronous redo shipping; minimal data loss, usually 0 to a few seconds

Maximum Availability: synchronous redo shipping; zero, single failure protection

Maximum Protection: synchronous redo shipping to 2 sites; zero, double failure protection

Balance cost, availability, performance, and transaction protection
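The slide's table can be encoded directly to reason about the trade-off: each mode tolerates a certain number of simultaneous failures without data loss. Picking a mode then means taking the first one that meets the requirement. The sketch follows the slide's summary only. The assumption that the modes are listed from cheapest and fastest to most protective, and the function names, are illustrative.

```python
# Slide summary: mode -> (redo shipment, simultaneous failures survived with zero data loss).
# Listed from fastest/cheapest to most protective (assumed ordering).
PROTECTION_MODES = {
    "MAXIMUM PERFORMANCE": ("asynchronous", 0),
    "MAXIMUM AVAILABILITY": ("synchronous", 1),
    "MAXIMUM PROTECTION": ("synchronous to 2 sites", 2),
}


def zero_data_loss(mode, failures):
    """True if, per the slide's summary, the mode loses no data with this many failures."""
    _, tolerated = PROTECTION_MODES[mode]
    return failures <= tolerated


def least_protective_mode(required_failures):
    """Return the first mode in the list that survives the required number of failures."""
    for mode, (_, tolerated) in PROTECTION_MODES.items():
        if tolerated >= required_failures:
            return mode
    raise ValueError("no protection mode survives that many failures")


if __name__ == "__main__":
    print(least_protective_mode(1))
```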

Page 31

Maximum Availability Architecture

Overview

Page 32

Maximum Availability Architecture

A reliable system architecture built on Oracle's disaster recovery solutions

A 3-tier configuration that maximizes MTTF while minimizing MTTR

Built from RAC & Data Guard; can be customized to meet customer requirements

Page 33

Maximum Availability Architecture: Components

Key MAA components:

• Redundant middle tiers and applications

• Redundant network infrastructure

• Redundant storage infrastructure

• Recovery from system and instance failures with Real Application Clusters (RAC)

• Recovery from human errors and datafile failures with Data Guard (DG)

• Documented operational processes

Page 34

Real Application Clusters Guard

Data Guard and Real Application Clusters complement each other

Building the disaster recovery system with RAC improves availability

Data Guard protects against disasters and data loss

RAC does not protect against data corruption or human error

Page 35

Maximum Availability Architecture

Page 36

Recovery in the Maximum Availability Architecture

Outage: MTTR; solution set

Hardware and software patches: <= 30 minutes; RAC or Data Guard

Human errors and data failures: <= 30 minutes; Data Guard

Host and instance failures: 1-5 minutes; RAC

Site disasters: <= 30 minutes; Data Guard on a secondary site

Page 37

Performance Considerations When Building a Disaster Recovery System

Build the primary and standby databases on identical specifications, in case production has to run on the standby

Network speed affects overall response time: network latency will increase response time

– Remote write = network round-trip time + local write I/O time

– Bandwidth > maximum redo generation rate
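The two rules above can be checked quickly. With synchronous shipping, each commit waits for a network round trip plus the remote write. The link must also carry redo faster than the peak rate at which it is generated. The sketch below assumes redo rates in megabytes per second and link speed in megabits per second (8 bits per byte), and it ignores protocol overhead. The numbers and function names are hypothetical.

```python
def remote_write_ms(network_rtt_ms, local_write_io_ms):
    """Synchronous remote write time = network round trip + local write I/O."""
    return network_rtt_ms + local_write_io_ms


def bandwidth_sufficient(link_mbit_per_s, peak_redo_mb_per_s):
    """True if the link exceeds the peak redo generation rate (MB/s -> Mbit/s, no overhead)."""
    return link_mbit_per_s > peak_redo_mb_per_s * 8


if __name__ == "__main__":
    print(remote_write_ms(network_rtt_ms=4.0, local_write_io_ms=2.0))
    print(bandwidth_sufficient(100, 10), bandwidth_sufficient(100, 15))
```

A 100 Mbit/s link can keep up with a 10 MB/s redo peak (80 Mbit/s) but not with a 15 MB/s one (120 Mbit/s). Real links also lose capacity to protocol overhead, so the margin should be larger in practice.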

Page 38

Efficiency Comparison: Data Guard vs. Remote Mirroring

Advantages of Oracle Data Guard:

– Better performance: a standby database receives only the changes, not every I/O

– Better resilience: changes are logically validated

– Better error tolerance: application of changes can be delayed to back out mistakes

– Better ROI: the standby allows read-only access

Advantages of remote mirroring:

– Useful for non-database files

Page 39

Performance Comparison: Data Guard vs. Remote Mirroring

If mirroring is used for a database, then the database files, the online logs, and the archive logs must all be mirrored, resulting in much worse performance

Remote mirroring compared with a standby database: 7x the network bandwidth and 27x the network I/Os

Page 40

Demo Environment

Page 41

Demo Scenario

Starting the primary and standby databases and checking the environment

Gap detection and resolution (on network failure)

Delayed apply of redo information (on human error)

Creating a configuration with DGM (Data Guard Manager)

Force logging on the primary database (when NOLOGGING statements are run)

Graceful switchover (for planned system downtime)
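After a network outage, the standby can be missing some archived logs. Gap resolution (the FAL component from the configuration diagram) has to fetch these from the primary. The toy function below only works out which sequence numbers are missing; it does not fetch anything. The function name and parameters are hypothetical.

```python
def find_gaps(received_sequences, first_needed, last_archived):
    """List archived-log sequence numbers in [first_needed, last_archived] not yet received."""
    received = set(received_sequences)
    return [seq for seq in range(first_needed, last_archived + 1) if seq not in received]


if __name__ == "__main__":
    # Logs 103 and 104 were lost during an outage; 107 has not arrived yet.
    print(find_gaps([101, 102, 105, 106], first_needed=101, last_archived=107))
```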

Page 42

Overall Considerations When Building a Disaster Recovery System

What does data loss or a system outage cost?

Which key data should be replicated to the recovery system, and with what options?

How much data loss, if any, can be tolerated?

Are there performance considerations?

Can a disaster recovery system be built without changing the existing infrastructure?

Page 43

QUESTIONS & ANSWERS