Reports

AI Inference benchmark

Contents

AI Inference benchmark
- Comparison
- Test results

Comparison

Test results

NVIDIA GeForce RTX 4090

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	2129.87	2106.45	2221.51	2291.14	4600.03	6829.17
2	3374.72	3645.4	3526.23	3953.09	7710.19	13611.6
4	6658.29	6730.09	6844.24	7706.39	13914.4	19377.4
8	11621	11662	11720.7	12474.7	18961.9	20862.2
16	16213.5	16450.5	16045.4	16453.8	21021.3	21017.5
32	19708.1	19411.9	19818.9	20525.8	20874.9	20682.3
64	20937.1	19480.2	20950.4	20883	20888.5	20891.2
128	20985.5	20988.3	20992.5	20984.9	21014.9	21017.2
...
Note: * means run with CUDA Graphs enabled.
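To read a table like the one above as a ratio rather than raw figures, a minimal Python sketch follows. It is not part of the benchmark tooling; the helper names `parse_table` and `speedup` are hypothetical, and only two rows (batch 1 and batch 128) are copied from the RTX 4090 alexnet table for brevity.

```python
# Two rows copied from the RTX 4090 alexnet table (batch 1 and batch 128).
TABLE = """
Batch FP32 TF32 FP32* TF32* FP16 INT8
1 2129.87 2106.45 2221.51 2291.14 4600.03 6829.17
128 20985.5 20988.3 20992.5 20984.9 21014.9 21017.2
"""


def parse_table(text):
    """Parse a whitespace-separated table into {batch: {column: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = {}
    for ln in lines[1:]:
        cells = ln.split()
        rows[int(cells[0])] = dict(zip(header[1:], map(float, cells[1:])))
    return rows


def speedup(rows, column, baseline="FP32"):
    """Ratio of one column to a baseline column, per batch size."""
    return {batch: r[column] / r[baseline] for batch, r in rows.items()}


rows = parse_table(TABLE)
for batch, ratio in speedup(rows, "FP16").items():
    print(f"batch {batch}: FP16 is {ratio:.2f}x FP32")
```

Splitting on any whitespace means the same helper accepts both tab-separated and space-separated header lines.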

NVIDIA Tesla H100-PCIE

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1462.76	2329.64	1802.68	2464.44	3528.39	3848.93
2	2560.66	4185.54	2657.27	4322.58	6654.62	7445.68
4	4508.37	7664.33	4632.12	7792.58	11970.2	13886.6
8	6934.06	12729.8	6001.18	13195.9	19497.9	21975.7
16	9715.23	18922.6	9809.69	19381.1	28422.8	30702
32	12099.6	23616.5	11920.6	23857.1	37872.7	38199.4
64	13281.4	27940.1	13287.8	27922.4	40435.4	41758.9
128	14548.9	30301	14489.8	30447.6	42407.3	42349.2
...
Note: * means run with CUDA Graphs enabled.

NVIDIA Tesla A10G

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1123.65	1170.71	1187.35	1268.46	2180.29	2785.99
2	1714.78	2027.63	1790.78	2195.88	4070.68	4997.14
4	3063.01	3675.72	3204.78	4038.22	7127.37	9582.15
8	4881.74	5200.26	5100.62	5534.64	10979.1	13607.4
16	6678.75	7411.84	6749.67	7790.32	14331.2	18954.9
32	8302.84	9117.68	8419.56	9229.77	17655.9	21524.6
64	9394.92	10523.4	9425.21	10591.6	20207	21692.9
128	10288	11766.4	10323.8	11819.7	21263.6	21713.9
...
Note: * means run with CUDA Graphs enabled.

NVIDIA Tesla A10

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1091.61	1205.2	1139.77	1263.25	2259.18	3120.34
2	1618.06	2099.4	1681.74	2257.87	4235.52	5817.61
4	2733.42	3633.86	2860.66	3987.17	7199.53	10391.4
8	4077.4	5343.62	4190.22	5737.25	10796.5	12363.2
16	5352.06	7478.71	5363.57	7879.02	12392.5	12440.9
32	6316.06	9346.45	6288.33	9523.8	12641.7	12632.8
64	6675.61	10385.6	6699.9	10505.3	12862.9	12968.4
128	6968.84	11426.4	7026.39	11404	12700.2	12773.6
...
Note: * means run with CUDA Graphs enabled.

NVIDIA Tesla V100

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1219.53	1222.3	1341.86	1340.92	2365	1844.12
2	2222.15	2262.7	2254.87	2261.51	4044.98	3778.7
4	3446.41	3490.21	3755.27	3726	7036.72	6899.53
8	4987.35	4999.61	5324.77	5349.07	10418.2	10770.5
16	6696.8	6745.37	7129.59	7142.59	15055.9	14984.6
32	8391.61	8397.77	8550.06	8481.38	17513.8	17347.3
64	9202.78	9286.15	9381.99	9437.85	17799.2	17822.9
128	9609.89	9611.78	9691.47	9681.4	17811.1	17715.6
...
Note: * means run with CUDA Graphs enabled.
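One way to line the five cards up against each other is to normalize a single column at a single batch size. The sketch below is illustrative only: it takes the FP16 figures at batch 128 from the alexnet tables above and expresses each as a multiple of the V100 figure. The name `relative_to` and the choice of V100 as the baseline are assumptions made for this example.

```python
# FP16 figures at batch 128, copied from the alexnet tables above.
FP16_BATCH128 = {
    "RTX 4090": 21014.9,
    "H100-PCIE": 42407.3,
    "A10G": 21263.6,
    "A10": 12700.2,
    "V100": 17811.1,
}


def relative_to(results, baseline):
    """Express every entry as a multiple of the baseline entry."""
    base = results[baseline]
    return {gpu: value / base for gpu, value in results.items()}


# Sort from highest to lowest ratio for display.
ranking = sorted(
    relative_to(FP16_BATCH128, "V100").items(),
    key=lambda kv: kv[1],
    reverse=True,
)
for gpu, ratio in ranking:
    print(f"{gpu:10s} {ratio:.2f}x V100")
```

A single column at a single batch size is a narrow view; the same helper can be pointed at any other column or batch size from the tables.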

For the full report and reports on future hardware, please subscribe to our benchmark service (365 RMB/year, mailto://mojianhao@jiansuan.tech).

Written on October 23, 2022