Reports
AI Inference benchmark
Contents
Comparation
Test results
NVIDIA GeForce RTX 4090
===================alexnet====================
Batch FP32 TF32 FP32* TF32* FP16 INT8
1 2129.87 2106.45 2221.51 2291.14 4600.03 6829.17
2 3374.72 3645.4 3526.23 3953.09 7710.19 13611.6
4 6658.29 6730.09 6844.24 7706.39 13914.4 19377.4
8 11621 11662 11720.7 12474.7 18961.9 20862.2
16 16213.5 16450.5 16045.4 16453.8 21021.3 21017.5
32 19708.1 19411.9 19818.9 20525.8 20874.9 20682.3
64 20937.1 19480.2 20950.4 20883 20888.5 20891.2
128 20985.5 20988.3 20992.5 20984.9 21014.9 21017.2
...
Note: * means with cudagraph
NVIDIA Tesla H100-PCIE
===================alexnet====================
Batch FP32 TF32 FP32* TF32* FP16 INT8
1 1462.76 2329.64 1802.68 2464.44 3528.39 3848.93
2 2560.66 4185.54 2657.27 4322.58 6654.62 7445.68
4 4508.37 7664.33 4632.12 7792.58 11970.2 13886.6
8 6934.06 12729.8 6001.18 13195.9 19497.9 21975.7
16 9715.23 18922.6 9809.69 19381.1 28422.8 30702
32 12099.6 23616.5 11920.6 23857.1 37872.7 38199.4
64 13281.4 27940.1 13287.8 27922.4 40435.4 41758.9
128 14548.9 30301 14489.8 30447.6 42407.3 42349.2
...
Note: * means with cudagraph
NVIDIA Tesla A10G
===================alexnet====================
Batch FP32 TF32 FP32* TF32* FP16 INT8
1 1123.65 1170.71 1187.35 1268.46 2180.29 2785.99
2 1714.78 2027.63 1790.78 2195.88 4070.68 4997.14
4 3063.01 3675.72 3204.78 4038.22 7127.37 9582.15
8 4881.74 5200.26 5100.62 5534.64 10979.1 13607.4
16 6678.75 7411.84 6749.67 7790.32 14331.2 18954.9
32 8302.84 9117.68 8419.56 9229.77 17655.9 21524.6
64 9394.92 10523.4 9425.21 10591.6 20207 21692.9
128 10288 11766.4 10323.8 11819.7 21263.6 21713.9
...
Note: * means with cudagraph
NVIDIA Tesla A10
===================alexnet====================
Batch FP32 TF32 FP32* TF32* FP16 INT8
1 1091.61 1205.2 1139.77 1263.25 2259.18 3120.34
2 1618.06 2099.4 1681.74 2257.87 4235.52 5817.61
4 2733.42 3633.86 2860.66 3987.17 7199.53 10391.4
8 4077.4 5343.62 4190.22 5737.25 10796.5 12363.2
16 5352.06 7478.71 5363.57 7879.02 12392.5 12440.9
32 6316.06 9346.45 6288.33 9523.8 12641.7 12632.8
64 6675.61 10385.6 6699.9 10505.3 12862.9 12968.4
128 6968.84 11426.4 7026.39 11404 12700.2 12773.6
...
Note: * means with cudagraph
NVIDIA Tesla V100
===================alexnet====================
Batch FP32 TF32 FP32* TF32* FP16 INT8
1 1219.53 1222.3 1341.86 1340.92 2365 1844.12
2 2222.15 2262.7 2254.87 2261.51 4044.98 3778.7
4 3446.41 3490.21 3755.27 3726 7036.72 6899.53
8 4987.35 4999.61 5324.77 5349.07 10418.2 10770.5
16 6696.8 6745.37 7129.59 7142.59 15055.9 14984.6
32 8391.61 8397.77 8550.06 8481.38 17513.8 17347.3
64 9202.78 9286.15 9381.99 9437.85 17799.2 17822.9
128 9609.89 9611.78 9691.47 9681.4 17811.1 17715.6
...
Note: * means with cudagraph
For the full report and future new hardware reports, please subscribe our benchmark service(365 RMB/year, mailto://mojianhao@jiansuan.tech)
Written on October 23, 2022