============================================
IBM MVS OS/390 - NOTES ON SPACE and BLOCKING
============================================

compiled by Kevin Solomon from various sources:

"I did a bit of editing to make it easier for beginners. Almost all of it is culled from Bulletin Board and Newsgroup postings. When I see something useful I make a note of it. I have no idea who the original authors were, with the exception that the note on SORT vs IDCAMS was 99% composed by Steve Wilke (who is the JCL Bulletin Board Moderator at mvshelp.com)."

General
-------

A 3390-n device has a capacity of 56,664 bytes per track, of which 55,996 bytes are accessible to applications programmers. The largest blocksize you can define is 32,760, which is fine for tapes, but would be quite wasteful on DASD: 55,996 - 32,760 = 23,236 bytes left over, and because tracks can't be shared between files, that leftover space would simply be wasted. Instead, use 55,996 / 2 = 27,998 - half-track blocking, the most space-efficient blocksize on 3390s. If you have 3380 device types, the maximum half-track blocksize is 23,476.

You don't need to specify an explicit blocksize in MVS JCL unless you need fine control over it. Instead, specify BLKSIZE=0 and let the operating system figure it out for you. [This optimization only works if you are running the IBM SMS subsystem. -IAN!]

When you have a good idea of how many tracks/cylinders/blocks the output dataset will require, and it will be consistent from day to day, use a large primary and a small secondary allocation. When you have no control over the number of records, use a small primary and a large secondary allocation. This reduces the chances of ABENDs. For example, SPACE=(CYL,(10,1)) initially reserves 10 cylinders, with a maximum possible allocation of 25 cylinders (the primary extent plus up to 15 secondary extents of 1 cylinder each). Specifying SPACE=(CYL,(1,3)) allows for a maximum of 46 cylinders, while not occupying space unless it's necessary.
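The maximum-allocation arithmetic above (one primary extent plus up to 15 secondary extents for a non-SMS sequential dataset on one volume) can be sketched in Python. The function name is my own for illustration, not part of any IBM utility:

```python
def max_allocation(primary, secondary, max_extents=16):
    """Maximum space a sequential dataset can grow to: one primary
    extent plus up to 15 secondary extents (16 extents total on a
    single volume for a non-SMS sequential dataset)."""
    return primary + (max_extents - 1) * secondary

# SPACE=(CYL,(10,1)): 10 + 15*1 = 25 cylinders maximum
print(max_allocation(10, 1))   # -> 25
# SPACE=(CYL,(1,3)):  1 + 15*3 = 46 cylinders maximum
print(max_allocation(1, 3))    # -> 46
```

The same arithmetic applies whether the units are tracks, cylinders, or blocks; only the extent count is fixed.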
Calculating Blocksize
---------------------

To calculate the most space-efficient blocksize, the formula is:

    bestblocksize = INT(half-track-blocksize / LRECL) * LRECL

Inter Block Gaps
----------------

It's important to use large blocksizes when writing "flat files", or purely sequential (QSAM) datasets, if you're writing a lot of data. Between each block of data there is what's known as an "interblock gap", or IBG, which contains various system-maintained information such as track number, cylinder number, CRCs, etc. Each of these IBGs uses up somewhere in the neighbourhood of 650 bytes. Therefore, if your blocksize is 80, each block is separated by an IBG of about 650 bytes, and your space efficiency is just under 11%.

I used ISPF 3.2 to allocate a test dataset with LRECL=1, BLKSIZE=1, RECFM=FB, and space in blocks. After allocating the test dataset, I checked to see how many blocks it was occupying. Result: 87 blocks. Here is my calculation:

    (MaxBytesPerTrack - MinBlockSize) / MinBlockSize = IBG
    (56,664 - 87) / 87 = 650.31+miscbits

Or, 87 bytes stored with an overhead of 56,577 bytes - a space efficiency of about 0.15%. I might be off a bit in my IBG calculation, but not by anything significant.

Simple Dataset Copy Jobstream
-----------------------------

This jobstream stub is a simple way to reblock a dataset.

//SORTCOPY EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=my.input.data,DISP=SHR
//SORTOUT  DD DSN=my.output.data,
//            DCB=(BLKSIZE=0,LRECL=80,RECFM=FB),
//            DISP=(,CATLG,DELETE),
//            SPACE=(TRK,(30,30),RLSE),
//            UNIT=SYSDA
//SYSIN    DD *
  SORT FIELDS=COPY
/*
//

SORT vs IDCAMS, IEBGENER etc.
-----------------------------

I've found that SORT products have much better performance when copying files than basic IBM utilities like IDCAMS or IEBGENER. The CPU and EXCP (EXecute Channel Program, the "driver programs" that communicate directly with the hardware) savings might be small, but small savings add up.
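Backtracking for a moment: the blocksize formula and the IBG figures above can be checked with a short Python sketch. The 27,998 half-track figure and the ~650-byte IBG estimate are taken from the text; the function names are mine:

```python
HALF_TRACK_3390 = 27_998   # half-track blocksize on a 3390, from the text
IBG_BYTES = 650            # approximate interblock-gap overhead, from the text

def best_blocksize(lrecl, half_track=HALF_TRACK_3390):
    """Largest multiple of LRECL that fits in a half track:
    INT(half-track-blocksize / LRECL) * LRECL."""
    return (half_track // lrecl) * lrecl

def space_efficiency(blksize, ibg=IBG_BYTES):
    """Fraction of each (block + gap) pair that is user data."""
    return blksize / (blksize + ibg)

print(best_blocksize(80))                 # -> 27920 for LRECL=80
print(round(space_efficiency(80), 3))     # -> 0.11, "just under 11%"
print(round(space_efficiency(27920), 3))  # -> 0.977
```

Note how efficiency climbs from about 11% to about 98% just by moving from 80-byte blocks to half-track blocks.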
I use a sort utility whenever I can to copy, sort, re-format or re-block datasets. For example...

If you used IDCAMS to copy and re-block a badly blocked dataset with BLKSIZE=80,LRECL=80, with no special JCL coding you would receive 5 buffers each for input and output. There are 77 records per track with this DCB. This is how IDCAMS executes...

IDCAMS opens the input file and asks the operating system for the first record. The OS does a check and finds no data in the buffers yet, so an EXCP (Execute Channel Program) is issued to fill the empty buffers, and your jobstep becomes eligible to be "swapped out" until the data is ready. The channel program is told where the buffers are in memory, and what device and block(s) it needs to fill the buffers with. The CP then tells the disk drive to seek the head over the cylinder that contains the data, and waits for the disk platter to spin around under the head so the data can be read. The CP then transfers the data to the waiting buffers and, when it's done, signals the OS that your buffers have been filled.

Back at the ranch, the OS has been busily doling out CPU time to other jobstreams, users, and started tasks. When it's taken care of things running at higher priority or using fewer resources, and there's enough storage to "swap in" your jobstream, your step gets loaded back into core storage, and IDCAMS goes merrily on its way - it now has the first record. What to do with it? Write it to the output DD. That occurs for the first 5 records - but then comes the request for the 6th input record, and we need more data from the file. So another EXCP is issued to fill the now-empty buffers, your jobstep gets swapped out, and the CP is told where to fetch the data from and where to put it.
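The "77 records per track" figure follows from the track capacity and the ~650-byte IBG quoted earlier. A quick sketch, reusing those constants from the text (the function name is my own):

```python
TRACK_BYTES_3390 = 56_664  # 3390 track capacity, from the text
IBG_BYTES = 650            # approximate interblock gap, from the text

def blocks_per_track(blksize, track=TRACK_BYTES_3390, ibg=IBG_BYTES):
    """Rough blocks-per-track count: each block costs blksize + one IBG."""
    return track // (blksize + ibg)

print(blocks_per_track(80))  # -> 77, matching the DCB example above
```

With RECFM=FB and BLKSIZE=80,LRECL=80 each block holds exactly one record, so 77 blocks per track means 77 records per track.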
But by this time the disk drive's head has been all over the place, so the CP has to put it right back where it was before and wait for the 6th through 10th records to spin by the head so it can grab the data and shove it into the buffers. After a seeming eternity the data is finally transferred to your buffers, and the OS is told your jobstep can be swapped back in. Once enough core is available, your step begins running again, and IDCAMS gets the 6th record and writes it to the output DD - but "there is no room at the inn": all the output buffers have been filled, so the jobstep that was just swapped in gets swapped out while the OS scrounges up yet another CP to scurry those filled buffers off to your output file.

If you think this was a lot of work to get this far, it was. And so far we've only transferred 5 records from the input file to the output file.

Your jobstep would run much faster if you allocated 77 buffers to the input file, which would allow an entire track to be read at one time, and 30 buffers to the output file, which (with half-track blocking on the output) would allow an entire cylinder - 2 blocks per track times 15 tracks - to be written at once.

But now comes the good news. SORT utilities do all of this optimization FOR you; you don't have to fiddle around with a calculator and the technical specifications of drive geometry, trying to determine the optimum number of buffers to use. And if the drive geometry changes in the future, your manually coded BUFNO= statements would have to be re-optimized for the new hardware. Why work that hard? Why leave a maintenance time bomb ticking somewhere?
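The payoff of the buffer arithmetic above can be illustrated with a hedged back-of-the-envelope sketch. Assuming one EXCP fills (or drains) a file's entire buffer pool at once - a simplification of real QSAM behaviour - the EXCP count for a copy is roughly the block count divided by BUFNO. The numbers below are illustrative, not measured:

```python
def approx_excps(total_blocks, bufno):
    """Rough EXCP estimate: one channel program per buffer-pool refill.
    Uses ceiling division so a partial final pool still costs one EXCP."""
    return -(-total_blocks // bufno)

blocks = 77_000  # e.g. 1,000 tracks of 80-byte blocks at 77 blocks/track

print(approx_excps(blocks, 5))   # default 5 buffers     -> 15400 EXCPs
print(approx_excps(blocks, 77))  # a full track per EXCP -> 1000 EXCPs
```

A fifteen-fold drop in channel programs (and the swap-in/swap-out cycles that go with them) is the kind of saving a SORT product's automatic buffer tuning buys you without any hand-coded BUFNO= at all.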