Daily News! [Scrapy framework] A Scrapy crawler case study for Qiushibaike

Published:   Source: CSDN

Environment

Architecture: arm64
Toolchain: gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu
Kernel: linux-5.4
Log file: generated on Windows 7
decodecode: run on Ubuntu

Background



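While analyzing a kernel oops, I came across a script called decodecode. Given the oops log as input, it disassembles the code around the faulting location without needing the source code or a symbol table. When I ran the script, however, it reported the following errors: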

    $ ARCH=arm64
    $ CROSS_COMPILE=gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-
    $ ./scripts/decodecode < panic_test.txt
    [ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
    aarch64-linux-gnu-strip: "/tmp/tmp.5Y9eybnnSi.o": No such file
    aarch64-linux-gnu-objdump: "/tmp/tmp.5Y9eybnnSi.o": No such file
    All code
    ========
       0:   d0002881        adrp    x1, 0x512000
       4:   912f9c21        add     x1, x1, #0xbe7
       8:   94067e68        bl      0x19f9a8
       c:   d2800001        mov     x1, #0x0                        // #0
      10:   b900003f        str     wzr, [x1]

    Code starting with the faulting instruction
    ===========================================

The contents of panic_test.txt:

    [    0.508246] Unable to handle kernel write to read-only memory at virtual address 0000000000000000
    [    0.517073] Mem abort info:
    [    0.519835]   ESR = 0x96000045
    [    0.522881]   EC = 0x25: DABT (current EL), IL = 32 bits
    [    0.528166]   SET = 0, FnV = 0
    [    0.531189]   EA = 0, S1PTW = 0
    [    0.534318] Data abort info:
    [    0.537169]   ISV = 0, ISS = 0x00000045
    [    0.540992]   CM = 0, WnR = 1
    [    0.543929] [0000000000000000] user address but active_mm is swapper
    [    0.550269] Internal error: Oops: 96000045 [#1] PREEMPT SMP
    [    0.555804] Modules linked in:
    [    0.558842] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.99-00006-g02e5b77f5cd8-dirty #14
    [    0.567069] Hardware name: sun50iw10 (DT)
    [    0.571059] pstate: 80400005 (Nzcv daif +PAN -UAO)
    [    0.575833] pc : reg_fixed_voltage_probe+0xdc/0x148
    [    0.580677] lr : reg_fixed_voltage_probe+0xd8/0x148
    [    0.585529] sp : ffffffc01002bb40
    [    0.588822] x29: ffffffc01002bb40 x28: 0000000000000000
    [    0.594109] x27: ffffffc01101c000 x26: ffffffc010faf000
    [    0.599395] x25: 0000000000000000 x24: 0000000000000000
    [    0.604682] x23: ffffffc011046000 x22: ffffff803c483400
    [    0.609968] x21: ffffffc010730000 x20: ffffffc011008000
    [    0.615255] x19: ffffffc010e88000 x18: 000000000000000a
    [    0.620542] x17: 00000000e45a70be x16: 00000000e90dbb24
    [    0.625829] x15: 000000000007a823 x14: ffffffc09002b877
    [    0.631115] x13: ffffffffffffffff x12: 0000000000000030
    [    0.636402] x11: 0000000000000004 x10: 0101010101010101
    [    0.641688] x9 : 0000000000000002 x8 : 0000000000000003
    [    0.646975] x7 : 0000000000000005 x6 : 00000000001b0b13
    [    0.652262] x5 : 130b1b0000000000 x4 : 0000000000000000
    [    0.657548] x3 : 0000000000000069 x2 : ffffff803df20040
    [    0.662835] x1 : 0000000000000000 x0 : 00000000ffffffea
    [    0.668122] Call trace:
    [    0.670551]  reg_fixed_voltage_probe+0xdc/0x148
    [    0.675060]  platform_drv_probe+0x54/0xa4
    [    0.679044]  really_probe+0x1d8/0x468
    [    0.682684]  driver_probe_device+0xec/0x12c
    [    0.686844]  device_driver_attach+0x54/0x78
    [    0.691004]  __driver_attach+0x130/0x148
    [    0.694907]  bus_for_each_dev+0x80/0xc8
    [    0.698717]  driver_attach+0x30/0x3c
    [    0.702270]  bus_add_driver+0x130/0x200
    [    0.706084]  driver_register+0xb0/0xfc
    [    0.709811]  __platform_driver_register+0x58/0x64
    [    0.714496]  regulator_pmc_voltage_init+0x20/0x28
    [    0.719173]  do_one_initcall+0xbc/0x224
    [    0.722985]  kernel_init_freeable+0x158/0x1f8
    [    0.727320]  kernel_init+0x18/0x108
    [    0.730785]  ret_from_fork+0x10/0x18
    [    0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
    [    0.740417] ---[ end trace f73e218fc7aa2872 ]---
    [    0.745016] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
    [    0.752626] SMP: stopping secondary CPUs
    [    0.756537] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

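Experiment

Since the script did not decode the log correctly, the only option was to read the script and add debug prints. Adding prints before and after the following line eventually pinned down the problem: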

    echo "code before:$code"
    code=`echo $code |sed -e "s/ [<(]>)] / /;s/ /,0x/g; s/[>)]$//"`
    echo "code after:$code"

The output:

    $ ARCH=arm64 CROSS_COMPILE=gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-  ./scripts/decodecode < panic_test.txt
    [ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
    code before:b900003f)
    code after:b900003f)
    aarch64-linux-gnu-strip: "/tmp/tmp.TxEaGdxHgA.o": No such file
    aarch64-linux-gnu-objdump: "/tmp/tmp.TxEaGdxHgA.o": No such file
    All code
    ========
       0:   d0002881        adrp    x1, 0x512000
       4:   912f9c21        add     x1, x1, #0xbe7
       8:   94067e68        bl      0x19f9a8
       c:   d2800001        mov     x1, #0x0                        // #0
      10:   b900003f        str     wzr, [x1]

    Code starting with the faulting instruction
    ===========================================

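The output shows that "code before" and "code after" are identical, even though the processing is presumably meant to strip the trailing ). The sed command between the two prints does contain an expression that handles ). I isolated that step and ran the following experiment: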

    $ code="b900003f)" && echo $code |sed -e "s/ [<(]>)] / /;s/ /,0x/g; s/[>)]$//"
    b900003f

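Run in isolation, the command strips the ) as expected. Could the difference be in the text itself? I created a test file containing b900003f) and ran another experiment: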

    cat temp.back |sed -e "s/ [<(]>)] / /;s/ /,0x/g; s/[>)]$//"
    b900003f

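Reading the input from a file still worked. At this point I shifted my suspicion to the file format. Text produced on Windows ends every line with CR LF, whereas Ubuntu ends lines with LF alone and treats the carriage return (CR) as an ordinary, unwanted character. Once the log is copied to Ubuntu, a stray CR therefore sits at the end of every line. That CR comes after the closing ), so s/[>)]$// no longer finds ) at the end of the line and leaves it in place. The CRs can be made visible in vim with :e ++ff=unix %, where each one shows up as ^M. With the root cause known, what remained was to get rid of the ^M. Here is the patch I submitted to the community after fixing it: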

    Date: Mon, 27 Sep 2021 15:41:34 +0800
    Subject: [PATCH] scripts/decodecode: fix faulting instruction no print when opps.file is DOS format

    If opps.file is in DOS format, faulting instruction cannot be printed:
    / # ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
    / # ./scripts/decodecode < oops.file
    [ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
    aarch64-linux-gnu-strip: "/tmp/tmp.5Y9eybnnSi.o": No such file
    aarch64-linux-gnu-objdump: "/tmp/tmp.5Y9eybnnSi.o": No such file
    All code
    ========
       0:   d0002881        adrp    x1, 0x512000
       4:   912f9c21        add     x1, x1, #0xbe7
       8:   94067e68        bl      0x19f9a8
       c:   d2800001        mov     x1, #0x0                        // #0
      10:   b900003f        str     wzr, [x1]

    Code starting with the faulting instruction
    ===========================================

    Background: The compilation environment is Ubuntu,
    and the test environment is Windows.
    Most logs are generated in the Windows environment.
    In this way, CR (carriage return) will inevitably appear,
    which will affect the use of decodecode in the Ubuntu environment.
    The repaired effect is as follows:
    / # ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
    / # ./scripts/decodecode < oops.file
    [ 0.734345] Code: d0002881 912f9c21 94067e68 d2800001 (b900003f)
    All code
    ========
       0:   d0002881        adrp    x1, 0x512000
       4:   912f9c21        add     x1, x1, #0xbe7
       8:   94067e68        bl      0x19f9a8
       c:   d2800001        mov     x1, #0x0                        // #0
      10:*  b900003f        str     wzr, [x1]               <-- trapping instruction

    Code starting with the faulting instruction
    ===========================================
       0:   b900003f        str     wzr, [x1]

    Cc: Borislav Petkov
    Cc: Andrew Morton
    Cc: Marc Zyngier
    Cc: Will Deacon
    Cc: Rabin Vincent
    Cc: lkml
    Signed-off-by: weidonghui
    ---
     scripts/decodecode | 2 +-
     1 file changed, 1 insertion(+), 1 deletion(-)

    diff --git a/scripts/decodecode b/scripts/decodecode
    index 31d884e35f2f..c711a196511c 100755
    --- a/scripts/decodecode
    +++ b/scripts/decodecode
    @@ ... @@ if [ $marker -ne 0 ]; then
     fi
     echo Code starting with the faulting instruction  > $T.aa
     echo =========================================== >> $T.aa
    -code=`echo $code | sed -e "s/ [<(]>)] / /;s/ /,0x/g; s/[>)]$//"`
    +code=`echo $code | sed -e "s/\r//;s/ [<(]>)] / /;s/ /,0x/g; s/[>)]$//"`
     echo -n "      .$type 0x" > $T.s
     echo $code >> $T.s
     disas $T 0
    --
    2.22.0.windows.1
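The root cause can be reproduced without a kernel tree. The snippet below is a minimal sketch, not part of scripts/decodecode: the variable names (cr, dos_code, before, after) are made up for illustration, and it applies only the last sed expression from the script, s/[>)]$//. It also deletes the CR with tr instead of sed's \r escape, on the assumption that tr behaves the same across more sed implementations; the patch itself uses s/\r//.

```shell
#!/bin/sh
# Illustrative only: these names are hypothetical and not taken from decodecode.

# Build the marked opcode as it looks in a DOS-format log: ")" followed by CR.
cr=$(printf '\r')
dos_code="b900003f)${cr}"

# Cleanup without CR handling: "$" anchors at the end of the line, but the
# last character is CR rather than ")", so nothing is removed.
before=$(printf '%s\n' "$dos_code" | sed -e 's/[>)]$//')

# Same cleanup after deleting CR bytes first: ")" is now last and is stripped.
after=$(printf '%s\n' "$dos_code" | tr -d '\r' | sed -e 's/[>)]$//')

# Print lengths instead of the raw strings, since CR is invisible on a terminal.
echo "without CR removal: ${#before} characters"
echo "with CR removal:    ${#after} characters"
```

The first result is still 10 characters long (the opcode, the ), and the CR), while the second is the bare 8-character opcode. The same idea should also work as a workaround on an unpatched script by cleaning the whole log first, for example tr -d '\r' < panic_test.txt | ./scripts/decodecode.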
