這張陰森魔幻的森林圖由Google Gemini生成。我網路上找了3種森林圖繪製案例(被我看上眼的,都有水準之上),都由SAS程式碼產生。文中會大概說明各段程式碼分別控制圖內哪些元素和區域。
第1種程式碼繪製成品(點圖可原圖呈現)
這是張經典的Meta Analysis統計圖,將各團隊研究結果綜整於這張表中。
案例出處: SAS官網 70217 - Create a forest plot with a YAXISTABLE
以下程式碼有被我小改
data forest;
input Study $1-16 grp OR LCL UCL Weight;
format Weight percent5. or lcl ucl 5.3;
format Q1 Q3 4.2 ;
or2=or;
if grp=1 then do;
%let wtFactor=0.008;
weight = weight*0.05;
Q1=or - or * weight; /* bar width for weight relative to OR */
Q3=or + or * weight;
or2=.;
end;
datalines;
Modano (1967) 1 0.590 0.096 3.634 1
Borodan (1981) 1 0.464 0.201 1.074 3.5
Leighton (1972) 1 0.394 0.076 2.055 2
Novak (1992) 1 0.490 0.088 2.737 2
Stawer (1998) 1 1.250 0.479 3.261 3
Truark (2002) 1 0.129 0.027 0.605 2.5
Fayney (2005) 1 0.313 0.054 1.805 2
Modano (1969) 1 0.429 0.070 2.620 2
Soloway (2000) 1 0.718 0.237 2.179 3
Adams (1999) 1 0.143 0.082 0.250 4
Truark2 (2002) 1 0.129 0.027 0.605 2.5
Fayney2 (2005) 1 0.313 0.054 1.805 2
Modano2 (1969) 1 0.429 0.070 2.620 2
Soloway2(2000) 1 0.718 0.237 2.179 3
Adams2 (1999) 1 0.143 0.082 0.250 4
Overall 2 0.328 . . .
;
run;
data bottomLabels;
study='Type'; lblX=0.1; lbl='Favors Treatment'; output;
study='Type'; lblX=10; lbl='Favors Placebo'; output;
run;
data forestFinal;
set forest bottomLabels;
run;
再拿此資料集來畫森林圖。森林圖程式碼如下:
ods _all_ close; 關閉所有ods作用
option missing=' '; 讓所有缺失值在輸出時顯示成空白
ods listing style=RTF image_dpi=300;
上例的analysis是一種報表風格,你也可以改成RTF風格、journal風格、...很多可選*/
ods graphics / imagename="第1案例";
title "Impact of Treatment on Mortality by Study";
title2 h=8pt 'Odds Ratio and 95% CL';
準備畫圖,故用ods graphics和ods graphics off將主語法區夾起來
設立標題和次標題
proc sgplot data=forestFinal noautolegend nocycleattrs nowall noborder;
上面4個NO就是調視覺效果的,把它拿掉跑一次就知效果
styleattrs axisextent=data;
可試著改用=full看看,控制軸是否延伸的指令
yaxistable study / y=study location=inside position=left labelattrs=(size=7);
Study欄位(由上至下排,也因如此,故是Y=,而不是X=)
上面補充說明: study欄位若要放右邊, left可改成right,size=7調整數字可控制study那欄字大小
highLow y=study low=lcl high=ucl / type=line;
Odds-ratio線, 那條短橫線,可做多種變化,例如雙箭頭,可用highlow statement去google查詢,
像這樣:
SAS Help Center: Syntax: PROC SGPLOT HIGHLOW Statement另外,也可加上lowlabel=lcl highlabel=ucl,就會變出這樣的效果,線端標出LCL和UCL值

在免費的SAS Studio上執行,一樣可以跑出來(只是某些路徑要額外調整,就能產生Log日誌零錯誤的完美作業)。
highLow y=study low=q1 high=q3 / type=bar barWidth=0.7;
權重方塊: 說明該研究的份量(簡言之),高是固定的(右列0.7就是定值,當然也可改),權重大寬就大(往左右擴)
scatter y=study x=or2 / markerattrs=graphdata2(symbol=diamondFilled);
標定Overall的odds ratio並設定樣式
yaxistable or lcl ucl weight / y=study location=inside position=right labelattrs=(size=7);
放入右邊4個欄位並設定位置和字大小
refline 1 / axis=x noclip;
refline 0.01 0.1 10 100 / axis=x lineattrs=(pattern=shortdash) transparency=0.5 noclip;
在X軸上各插上1條和4條參考線,因格式有兩版本,故分開寫
text y=study x=lblX text=lbl / position=center contributeoffsets=none;
依xy值放入Favors Treatment和Favors Placebo
xaxis type=log max=100 minor display=(nolabel) valueattrs=(size=7);
x軸設成LOG,即把跨距壓小,不顯示x軸名,設x軸值字大小
yaxis display=none fitpolicy=none reverse colorbands=even colorbandsattrs=Graphdatadefault(transparency=0.8);
y軸讓它整個消失,軸值反轉呈現,y軸每列間隔設立色帶及其格式
run;title;
ods graphics off;
第2種程式碼繪製成品
細節請參閱該網頁
程式碼如下:
data SG_EX;
input extra Study $3-18 res1 $19-30 res2 $31-41 res3 $43-60 per LCL UCL;
cards;
0 Overall 65.8(62/91) 66.6(67/89) 2.9(-6.31,10.81) 2.9 -6.31 10.81
0 Sex . . .
1 Male 67.5(27/44) 72.4(32/39) -1.7(-18.52,14.18) -1.7 -18.52 14.18
1 Female 64(23/41) 61.1(28/41) 6(-11.58,21.37) 6 -11.58 21.37
0 Age Group . . .
1 18-65 years 67.1(54/79) 67.3(57/75) 3(-9.72,13.64) 3 -9.72 13.64
1 >65 years 56.2(-4/6) 62.5(3/5) -3.3(-37.04,31.15) -3.3 -37.04 31.15
0 Race . . .
1 Asian 59.1(4/17) 59.5(6/10) 2.6(-24.71,28.29) 2.6 -24.71 28.29
1 White 63.5(25/44) 69.8(41/52) -3.2(-19.01,11.44) -3.2 -19.01 11.44
1 Other 77(9/18) 63.3(6/9) 16.8(-11.81,41.84) 16.8 -11.81 41.84
0 Region . . .
1 North America 56.2(-4/6) 82.8(4/8) -22.5(-58.3,16.31) -22.5 -58.3 16.31
1 Europe 68.5(37/56) 66.9(43/57) 4.8(-10.01,18.13) 4.8 -10.01 18.13
1 Japan 56.2(-8/0) 19.7(-5/-2) 39.6(-16.79,77.97) 39.6 -16.79 77.97
1 Asia Pacific 66(1/11) 75.7(4/4) -6.6(-37.87,25.36) -6.6 -37.87 25.36
;
run;
data SG_EX;
set SG_EX;
res_position=1;
referenceline_idx=Study;
if referenceline_idx in ('Overall' 'Age Group' '18-65 years' '>65 years'
'Region' 'North America' 'Europe' 'Japan' 'Asia Pacific' ) then referenceline_idx='';
run;
跑出的資料集如下:
畫圖語法如下:
proc template;
define statgraph ForestPlot_2Col_SAS94;
/*(1)畫布大小*/
begingraph / designwidth=800px designheight=410px;
layout lattice / columns=2 columngutter=0 columnweights=(.67 .33 );
layout overlay / walldisplay=(fill)
yaxisopts=(display=none reverse=true offsetmin=0.05 offsetmax=0.05)
/*(2)x軸的主刻度 與 顯示範圍,並設定顯示的 標籤label */
xaxisopts=( offsetmin=0.05 offsetmax=0.05 label="Favors TRTB Favors TRTA"
linearopts=(minorticks=true tickvaluelist=(-100 -80 -60 -40 -20 0 20 40 60 80 100)
viewmin=-100 viewmax=100) TICKVALUEATTRS=(SIZE=8pt family="Courier New")
labelattrs=(family="Courier New" weight=bold size=8pt));
/*(3)左邊大小分類的顯示&標題*/
Innermargin/align=left;
Axistable y=Study value=Study / INDENTWEIGHt=extra
valueattrs=(size=8pt weight=bold color=blue)
LABEL="Subgroup" LABELATTRS=(size=9pt weight=bold) LABELHALIGN=left;
Endinnermargin;
/*(4)整個圖表標題名稱 顯示位置 與 文字屬性*/
entry halign=center textattrs=(color=black weight=bold size=9pt family="Courier New")
" Treatment Difference (TRTA-TRTB)(95% CI)" / location=outside valign=top ;
/*(5)畫網底,根據 referenceline_idx 設定哪一行要網底,THICKNESS是寬度*/
referenceline y=referenceline_idx /lineattrs=(color=cxf0f0f7 THICKNESS=19) ;
/*(6)畫樹狀圖的 點 和 95%CI線段 ,"點"設定了顏色與樣式、大小*/
scatterplot x=per y=Study / xerrorlower=LCL xerrorupper=UCL markerattrs= (color=orange symbol=DiamondFilled size=8);
/*(7)畫垂直參考線*/
referenceline x=0 / lineattrs=(pattern=solid);
endlayout;
layout overlay / walldisplay=none border=false
yaxisopts=(reverse=true type=discrete display=none offsetmin=0.05 offsetmax=0.05)
xaxisopts=(display=none offsetmin=0.15 offsetmax=0.15);
referenceline y=referenceline_idx /lineattrs=(color=cxf0f0f7 THICKNESS=19);
/*(8)設定第二區塊 資料上方的標題,有兩行可以顯示,先設定的是第二行,再設第一行*/
entry halign=left textattrs=(color=black weight=bold size=8pt family="Courier New") " %(n/N) "
halign=center textattrs=(color=black weight=bold size=8pt family="Courier New") " %(n/N)"
halign=right " " / pad=(right=2.75% TOP = 0 ) location=outside valign=top;
entry halign=left textattrs=(color=black weight=bold size=8pt family="Courier New") " TRTA"
halign=center textattrs=(color=black weight=bold size=8pt family="Courier New") " TRTB"
halign=right textattrs=(color=black weight=bold size=8pt family="Courier New") " Diff(95% CI)"
/ pad=(right=2.75% TOP = 0.1% ) location=outside valign=top;
/*(9)列出第二區塊的 3欄資料 res1 res2 res3*/
scatterplot y=Study x=eval(res_position*100-20) / markercharacter=res1 MARKERCHARACTERPOSITION=center;
scatterplot y=Study x=eval(res_position*100+50) / markercharacter=res2 MARKERCHARACTERPOSITION=center;
scatterplot y=Study x=eval(res_position*100+120) / markercharacter=res3 MARKERCHARACTERPOSITION=center;
endlayout;
endlayout;
endgraph;
end;
run;
proc sgrender data=SG_EX template="ForestPlot_2Col_SAS94";
TITLE "SGRENDER: Forest Plot";
run;
第3種程式碼繪製成品(點圖可原圖呈現)
教學將留在實體課程或個人專書內(之後我會出一本專書,教如何用sas畫統計圖)說明。
以下程式碼有被我小改:
%let dpi=300;
%let w=9in;
%let h=6.5in;
data SubgroupData;
input Indent Subgroup $3-27 CountA EventA CountB EventB HR CIL CIU PValue;
format HR CIL CIU PValue 7.2 CountA EventA CountB EventB 5. PctA PctB 4.1;
format EventDisplayA CountDisplayA EventDisplayB CountDisplayB $4. ;
if CountA ne . then do;
PctA = (EventA/CountA)*100;
PctB = (EventB/CountB)*100;
EventDisplayA = right(put(EventA, 4.0));
CountDisplayA = left(put(CountA, 4.0));
EventCountA = right(EventDisplayA || "/" || CountDisplayA);
PctDisplayA = "(" || put(PctA, 4.1) || ")";
EventDisplayB = right(put(EventB, 4.0));
CountDisplayB = left(put(CountB, 4.0));
EventCountB = right(EventDisplayB || "/" || CountDisplayB);
PctDisplayB = "(" || put(PctB, 4.1) || ")";
HRCI = put(HR, 4.2) || " (" || put(CIL, 4.2) || ", " || put(CIU, 4.2) || ")";
/* Determine the marker size based on population size */
SquareSize = ((CountA + CountB)/3876) * 12 ;
end;
row = _n_;
datalines;
0 Age . . . . . . . 0.72
1 < 65 Yr 1077 81 1091 111 0.73 0.55 0.97 .
1 ≧ 65 Yr 862 94 846 117 0.79 0.60 1.03 .
0 Gender . . . . . . . 0.30
1 Male 1293 126 1245 147 0.81 0.64 1.03 .
1 Female 646 49 692 81 0.65 0.46 0.93 .
0 Race . . . . . . . 0.75
1 White 1600 144 1619 189 0.77 0.62 0.95 .
1 Black or African American 218 22 225 32 0.67 0.39 1.16 .
1 Other 88 8 60 5 1.08 0.35 3.29 .
0 Ethnicity . . . . . . . 0.51
1 Hispanic 75 7 72 12 0.58 0.23 1.46 .
1 Non-Hispanic 1852 168 1857 215 0.78 0.63 0.95 .
0 Body Mass Index . . . . . . . 0.14
1 < 30 1123 99 1096 142 0.67 0.52 0.87 .
1 ≧ 30 810 75 835 85 0.91 0.67 1.24 .
0 Coronary Artery Disease . . . . . . . 0.72
1 Yes 241 36 221 47 0.70 0.45 1.08 .
1 No 1697 139 1715 181 0.77 0.62 0.96 .
0 Hypertension . . . . . . . 0.94
1 Yes 1380 141 1390 186 0.76 0.61 0.95 .
1 No 558 34 546 42 0.77 0.49 1.21 .
0 HOMA . . . . . . . 0.25
1 < 4.6 933 76 948 114 0.67 0.50 0.90 .
1 ≧ 4.6 1006 99 989 114 0.85 0.65 1.11 .
0 HgbA1c (%) . . . . . . . 0.38
1 < 5.7 672 52 690 78 0.67 0.47 0.95 .
1 ≧ 5.7 1266 123 1247 150 0.81 0.64 1.02 .
0 HDL (mg/dL) . . . . . . . 0.17
1 < 40 787 77 785 84 0.90 0.66 1.23 .
1 ≧ 40 1147 98 1149 144 0.68 0.53 0.88 .
0 Fasting Glucose (mg/dL) . . . . . . . 0.48
1 < 100 1126 106 1137 132 0.81 0.62 1.04 .
1 ≧ 100 813 69 800 96 0.70 0.51 0.95 .
0 Triglycerides (mg/dL) . . . . . . . 0.22
1 < 150 1260 111 1293 137 0.83 0.65 1.07 .
1 ≧ 150 675 64 641 91 0.65 0.47 0.89 .
0 Medication Adherence . . . . . . . 0.79
1 < 80% 1071 111 834 123 0.69 0.53 0.89 .
1 ≧ 80% 861 61 1090 103 0.73 0.53 1.00 .
;run;
/*--Used for Subgroup labels in column 1--*/
data anno(drop=indent);
set SubgroupData(keep=row subgroup indent rename=(row=y1));
retain Function 'Text ' ID 'id1' X1Space 'DataPercent'
Y1Space 'DataValue ' x1 x2 2 TextSize 7 Width 100 Anchor 'Left ';
if indent;
label = tranwrd(subgroup, '>=', '(*ESC*){Unicode ''2265''x}');
run;
/*--Used for text under x axis of HR scatter plot in column 7--*/
data anno2;
retain Function 'Arrow' ID 'id2' X1Space X2Space 'DataValue'
FIllTransparency 0 Y1Space Y2Space 'GraphPercent'
Scale 1e-40 LineThickness 1 y1 y2 7.0 Width 100
FillStyleElement 'GraphWalls' LineColor 'Black';
x1 = 0.90; x2 = 0.25; output;
x1 = 2.10; x2 = 3.6; output;
function = 'Text'; y1 = 7.0; y2 = 7.0;
x1 = 0.92; anchor = 'Right'; label = 'Pioglitazone Better'; Textsize=8;
output;
x1 = 1.07; Anchor = 'Left '; label = 'Placebo Better'; Textsize=8; output;
run;
data anno; set anno anno2; run;
data forest2(drop=flag);
set SubgroupData nobs=nobs;
Head = not indent;
retain flag 0;
if head then flag = mod(flag + 1, 2);
if flag then ref=row;
if indent then subgroup = ' ';
run;
跑出的資料集如下:
畫圖語法如下:
proc template;
define statgraph Forest;
dynamic /*_show_bands*/ _color _thk;
begingraph;
discreteattrmap name='text';
value '1' / textattrs=(weight=bold); value other;
enddiscreteattrmap;
discreteattrvar attrvar=type var=head attrmap='text';
layout lattice /
columns=8 columnweights=(0.21 0.06 .07 0.06 .07 0.12 0.33 0.09);
/*--Column headers--*/
sidebar / align=top;
layout lattice /
rows=2 columns=5 columnweights=(0.21 0.12 0.13 0.40 0.14);
entry " ";
entry textattrs=(size=8) halign=left " Pioglitazone";
entry textattrs=(size=8) halign=left " Placebo";
entry " ";
entry " ";
entry textattrs=(size=8) halign=left "Subgroup";
entry textattrs=(size=8) halign=left "Events/N (%)";
entry textattrs=(size=8) halign=left " Events/N (%)";
entry textattrs=(size=8) halign=left " HR (95% CI)";
entry halign=center textattrs=(size=8) "P Value*" ;
endlayout;
endsidebar;
/*--First Subgroup column, shows only the Y2 axis--*/
layout overlay / walldisplay=none xaxisopts=(display=none)
yaxisopts=(reverse=true display=none
tickvalueattrs=(weight=bold));
annotate / id='id1';
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
axistable y=row value=subgroup /
display=(values) textgroup=type;
endlayout;
/*--Second column showing Pioglitizone Events/Count --*/
layout overlay / xaxisopts=(display=none)
yaxisopts=(reverse=true display=none) walldisplay=none;
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
axistable y=row value=EventCountA /display=(values)
valuejustify = center;
endlayout;
/*--Third column showing Pioglitizone Percent--*/
layout overlay / xaxisopts=(display=none)
yaxisopts=(reverse=true display=none) walldisplay=none;
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
axistable y=row value=PctDisplayA /display=(values)
valuejustify = left;
endlayout;
/*--Fourth column showing Placebo Events/Count --*/
layout overlay / xaxisopts=(display=none)
yaxisopts=(reverse=true display=none) walldisplay=none;
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
axistable y=row value=EventCountB /display=(values)
valuejustify = center;
endlayout;
/*--Fifth column showing Placebo Percent--*/
layout overlay / xaxisopts=(display=none)
yaxisopts=(reverse=true display=none) walldisplay=none;
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
axistable y=row value=PctDisplayB /display=(values)
valuejustify = left;
endlayout;
/*--Sixth column showing PCIGroup--*/
layout overlay / x2axisopts=(display=none)
yaxisopts=(reverse=true display=none) walldisplay=none;
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
axistable y=row value=HRCI / display=(values);
endlayout;
/*--Seventh column showing Hazard ratio graph with 95% error bars--*/
layout overlay / xaxisopts=(type=log
label=' '
labelattrs=(size=10)
logopts=(tickvaluepriority=true
tickvaluelist=(0.25 0.5 1.0 1.5 2.0 3.0)))
yaxisopts=(reverse=true display=none) walldisplay=none;
annotate / id='id2';
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
scatterplot y=row x=HR / xerrorlower=CIL xerrorupper=CIU
sizeresponse=SquareSize sizemin=4 sizemax=12
markerattrs=(symbol=squarefilled);
referenceline x=1;
endlayout;
/*--Eigth column showing P-Values--*/
layout overlay / x2axisopts=(display=none)
yaxisopts=(reverse=true display=none) walldisplay=none;
referenceline y=ref / lineattrs=(thickness=_thk color=_color);
axistable y=row value=pvalue / display=(values)
valuejustify = right
showmissing=false; /*false removes . for missing pvalues*/
endlayout;
endlayout;
entryfootnote halign=left textattrs=(size=7)
'* P-Value is the test of interaction between treatment and each
subgroup unadjusted for multiplicity. The adjusted '
'p-value for multiple testing is 0.94 for all subgroups.';
endgraph;
end;
run;
/*----Create Graph with PNG output: Raster graphics file format-----*/
ods html close;
ods listing gpath="C:\PharmaSUG2017\SAS" image_dpi=&dpi;
ods graphics / reset noscale IMAGEFMT=PNG
width=&w height=&h imagename='PharmaSUG2017_ForestPlot';
proc sgrender data=Forest2 template=Forest sganno=anno;
dynamic _color='white' _thk=12 ; /*-Essentially invisible band-*/
run;
/*甚至可以匯出.EPS圖檔,事後可用Adobe Illustrator開啟改圖內文字或數據*/
ods listing gpath="C:\PharmaSUG2017\SAS" image_dpi=&dpi;
ods graphics / reset noscale IMAGEFMT=EPS
width=&w height=&h imagename='PharmaSUG2017_ForestPlot';
proc sgrender data=Forest2 template=Forest sganno=anno;
dynamic _color='white' _thk=12 ;
run;
結語:
像下面這些繪圖用的資料集,都可以經由病人明細表一路資料清理計算統計而來(整個流程完全靠SAS程式碼,不須手動搬移和調整,得有數年SAS操作經驗,少則2年,多則4-5年)。
這樣的資料集完成後再接畫圖程式碼,所以只要你的最上游raw data增修,SAS程式一跑就可以幾秒內畫出一張新圖,這是SPSS軟體辦不到的地方。整個流程就是用Code寫故事,過程對功力尚不深厚的人來說很具挑戰性。然一旦完成,事後再增減或修改源頭資料重新繪圖,都只是幾秒的時間。
這類圖除可用於Meta analysis呈現結果,也可呈現Odd ratios (crude OR, adjusted OR)和Hazard ratio (crude HR, adjusted HR)等有估計範圍(i.g., 95%CI)的統計值。
另外,SAS現代化繪圖程序(註)主為(我常碰的)Proc sgplot、Proc sgpanel、Proc sgmap和(Proc template+Proc sgrender),後面的Proc template負責專業度極高的佈侷和設計而Proc sgrender負責引入數據,所以他們倆在程式碼堆裡常一前一後出現,最後產出複合式統計圖。還有一種極簡單的佈侷指做了好多表和圖,可以透過ODS LAYOUT帶出多個ODS REGION,然後每個ODS REGION會指定其中一欄一列(子框)來存放一個表或圖。你可以想像:數據從raw data,經清理、計算、統計而輸出一連串表和一堆圖,然後再依ODS LAYOUT和ODS REGION的設定(好幾個子框,每個子框長寬可能都不同)放進相應的子框中,整個流程可設排程完全自動化(前提是要把這個流程用Code寫出來)。
1.古老的繪圖程序則很多,如Proc plot、Proc gchart、...這些程序我較少用。
2.你可以把程序想成Excel函數,只是它複雜度極高,有一階參數、二階參數、三階參數、...,然後每階參數內又有很多子參數
若要copy the code,請至Github
如果本文章提供的資訊對你有幫助,可以考慮請我喝飲料(金額隨喜),謝謝。
台新銀行帳戶 (備註: forest1)










沒有留言:
張貼留言