Reports

AI Inference benchmark

Contents

Perf test from AutoDL

GPU ViT-Transformer-FP16 (images/s) Visualization
H800 5815.0 ████████████████████████
RTX PRO 6000 5276.0 ██████████████████████
5090 3806.0 ██████████████████
H20 2263.0 ████████████
4090 2027.0 ██████████
A100 1691.0 █████████
L20 1590.0 ████████
L40 1516.0 ████████
Ascend 910B2 1455.0 ████████
4080 1408.0 ████████
A40 1152.0 ███████
V100 1078.0 ██████
3090 1045.0 ██████
A5000 950.0 ██████
3080 917.0 █████
A10 868.5 █████
2080 Ti 807.0 █████
A4000 658.0 ████
3060 404.5 ██
T4 361.0 ██
1080 Ti 165.3
TITAN Xp 146.5
P40 142.2

Perf/Cost

GPUs ViT-Transformer-FP16 (images/s) VRAM(GB) RMB/Hour Perf/Cost
5090 * 3 11418.0 96 9.09 1256.1
4080s-32g * 3 5224.8 96 4.98 1049.2
4090 * 4 8108.0 96 7.92 1023.7
RTX PRO 6000 * 1 5276.0 96 6.29 838.8
4090-48g * 2 5032.6 96 6.06 830.5
H20 * 1 2263.0 96 7.35 307.9

Comparation

Test results

NVIDIA GeForce RTX 4090

===================alexnet====================
Batch   FP32	TF32	FP32*	TF32*	FP16	INT8
1	2129.87	2106.45	2221.51	2291.14	4600.03	6829.17
2	3374.72	3645.4	3526.23	3953.09	7710.19	13611.6
4	6658.29	6730.09	6844.24	7706.39	13914.4	19377.4
8	11621	11662	11720.7	12474.7	18961.9	20862.2
16	16213.5	16450.5	16045.4	16453.8	21021.3	21017.5
32	19708.1	19411.9	19818.9	20525.8	20874.9	20682.3
64	20937.1	19480.2	20950.4	20883	20888.5	20891.2
128	20985.5	20988.3	20992.5	20984.9	21014.9	21017.2
...
Note: * means with cudagraph

NVIDIA Tesla H100-PCIE

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1462.76	2329.64	1802.68	2464.44	3528.39	3848.93
2	2560.66	4185.54	2657.27	4322.58	6654.62	7445.68
4	4508.37	7664.33	4632.12	7792.58	11970.2	13886.6
8	6934.06	12729.8	6001.18	13195.9	19497.9	21975.7
16	9715.23	18922.6	9809.69	19381.1	28422.8	30702
32	12099.6	23616.5	11920.6	23857.1	37872.7	38199.4
64	13281.4	27940.1	13287.8	27922.4	40435.4	41758.9
128	14548.9	30301	14489.8	30447.6	42407.3	42349.2
...
Note: * means with cudagraph

NVIDIA Tesla A10G

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1123.65	1170.71	1187.35	1268.46	2180.29	2785.99
2	1714.78	2027.63	1790.78	2195.88	4070.68	4997.14
4	3063.01	3675.72	3204.78	4038.22	7127.37	9582.15
8	4881.74	5200.26	5100.62	5534.64	10979.1	13607.4
16	6678.75	7411.84	6749.67	7790.32	14331.2	18954.9
32	8302.84	9117.68	8419.56	9229.77	17655.9	21524.6
64	9394.92	10523.4	9425.21	10591.6	20207	21692.9
128	10288	11766.4	10323.8	11819.7	21263.6	21713.9
...
Note: * means with cudagraph

NVIDIA Tesla A10

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1091.61	1205.2	1139.77	1263.25	2259.18	3120.34
2	1618.06	2099.4	1681.74	2257.87	4235.52	5817.61
4	2733.42	3633.86	2860.66	3987.17	7199.53	10391.4
8	4077.4	5343.62	4190.22	5737.25	10796.5	12363.2
16	5352.06	7478.71	5363.57	7879.02	12392.5	12440.9
32	6316.06	9346.45	6288.33	9523.8	12641.7	12632.8
64	6675.61	10385.6	6699.9	10505.3	12862.9	12968.4
128	6968.84	11426.4	7026.39	11404	12700.2	12773.6
...
Note: * means with cudagraph

NVIDIA Tesla V100

===================alexnet====================
Batch	FP32	TF32	FP32*	TF32*	FP16	INT8
1	1219.53	1222.3	1341.86	1340.92	2365	1844.12
2	2222.15	2262.7	2254.87	2261.51	4044.98	3778.7
4	3446.41	3490.21	3755.27	3726	7036.72	6899.53
8	4987.35	4999.61	5324.77	5349.07	10418.2	10770.5
16	6696.8	6745.37	7129.59	7142.59	15055.9	14984.6
32	8391.61	8397.77	8550.06	8481.38	17513.8	17347.3
64	9202.78	9286.15	9381.99	9437.85	17799.2	17822.9
128	9609.89	9611.78	9691.47	9681.4	17811.1	17715.6
...
Note: * means with cudagraph

For the full report and future new hardware reports, please subscribe our benchmark service(365 RMB/year, mailto://mojianhao@jiansuan.tech)

Written on September 26, 2025