Вопрос по – Как разделить текстовый файл с помощью PowerShell?

59

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

Ваш Ответ

14   ответов
0

Error: User Rate Limit Exceeded

$sourceFile = "c:\myfolder\mylargeTextyFile.csv"
$partNumber = 1
$batchSize = 500000
$pathAndFilename = "c:\myfolder\mylargeTextyFile part $partNumber file.csv"

[System.Text.Encoding]$enc = [System.Text.Encoding]::GetEncoding(65001)  # utf8 this one

$fs=New-Object System.IO.FileStream ($sourceFile,"OpenOrCreate", "Read", "ReadWrite",8,"None") 
$streamIn=New-Object System.IO.StreamReader($fs, $enc)
$streamout = new-object System.IO.StreamWriter $pathAndFilename

$line = $streamIn.readline()
$counter = 0
while ($line -ne $null)
{
    $streamout.writeline($line)
    $counter +=1
    if ($counter -eq $batchsize)
    {
        $partNumber+=1
        $counter =0
        $streamOut.close()
        $pathAndFilename = "c:\myfolder\mylargeTextyFile part $partNumber file.csv"
        $streamout = new-object System.IO.StreamWriter $pathAndFilename

    }
    $line = $streamIn.readline()
}
$streamin.close()
$streamout.close()

Error: User Rate Limit ExceededStreamReaderError: User Rate Limit ExceededStreamWriterError: User Rate Limit Exceeded

48

Error: User Rate Limit Exceeded

Error: User Rate Limit ExceededError: User Rate Limit ExceededError: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

$from = "C:\temp\large_log.txt"
$rootName = "C:\temp\large_log_chunk"
$ext = "txt"
$upperBound = 100MB


$fromFile = [io.file]::OpenRead($from)
$buff = new-object byte[] $upperBound
$count = $idx = 0
try {
    do {
        "Reading $upperBound"
        $count = $fromFile.Read($buff, 0, $buff.Length)
        if ($count -gt 0) {
            $to = "{0}.{1}.{2}" -f ($rootName, $idx, $ext)
            $toFile = [io.file]::OpenWrite($to)
            try {
                "Writing $count to $to"
                $tofile.Write($buff, 0, $count)
            } finally {
                $tofile.Close()
            }
        }
        $idx ++
    } while ($count -gt 0)
}
finally {
    $fromFile.Close()
}
Error: User Rate Limit ExceededStreamReader? Так что вы можете разделить с новыми линиями?
Error: User Rate Limit Exceededgist.github.com/awayken/5861923
Error: User Rate Limit Exceeded
Если вы добавите эти строки в начало сценария, чтобы определить переменные и изменить их в соответствии с файлом, который вы пытаетесь разбить, у вас все будет готово! $ from = & quot; C: \ temp \ large_log.txt & quot; $ rootName = & quot; C: \ temp \ large_log_chunk & quot; $ ext = & quot; txt & quot;
@stej, основываясь на этом ответе, я добавил в свой ответ версию для чтения потокового видео.
3

Error: User Rate Limit Exceeded

$linecount=0; $i=0; Get-Content .\BIG_LOG_FILE.txt | %{ Add-Content OUT$i.log "$_"; $linecount++; if ($linecount -eq 3000) {$I++; $linecount=0 } }

Error: User Rate Limit Exceeded

2

Error: User Rate Limit Exceeded

##############################################################################
#.SYNOPSIS
# Breaks a text file into multiple text files in a destination, where each
# file contains a maximum number of lines.
#
#.DESCRIPTION
# When working with files that have a header, it is often desirable to have
# the header information repeated in all of the split files. Split-File
# supports this functionality with the -rc (RepeatCount) parameter.
#
#.PARAMETER Path
# Specifies the path to an item. Wildcards are permitted.
#
#.PARAMETER LiteralPath
# Specifies the path to an item. Unlike Path, the value of LiteralPath is
# used exactly as it is typed. No characters are interpreted as wildcards.
# If the path includes escape characters, enclose it in single quotation marks.
# Single quotation marks tell Windows PowerShell not to interpret any
# characters as escape sequences.
#
#.PARAMETER Destination
# (Or -d) The location in which to place the chunked output files.
#
#.PARAMETER Size
# (Or -s) The maximum size of each file. Size must be expressed in MB.
#
#.PARAMETER RepeatCount
# (Or -rc) Specifies the number of "header" lines from the input file that will
# be repeated in each output file. Typically this is 0 or 1 but it can be any
# number of lines.
#
#.EXAMPLE
# Split-File bigfile.csv -s 20 -rc 1
#
#.LINK 
# Out-TempFile
##############################################################################
function Split-File {

    [CmdletBinding(DefaultParameterSetName='Path')]
    param(

        [Parameter(ParameterSetName='Path', Position=1, Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [String[]]$Path,

        [Alias("PSPath")]
        [Parameter(ParameterSetName='LiteralPath', Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
        [String[]]$LiteralPath,

        [Alias('s')]
        [Parameter(Position=2,Mandatory=$true)]
        [Int32]$Size,

        [Alias('d')]
        [Parameter(Position=3)]
        [String]$Destination='.',

        [Alias('rc')]
        [Parameter()]
        [Int32]$RepeatCount

    )

    process {

  # yeah! the cmdlet supports wildcards
        if ($LiteralPath) { $ResolveArgs = @{LiteralPath=$LiteralPath} }
        elseif ($Path) { $ResolveArgs = @{Path=$Path} }

        Resolve-Path @ResolveArgs | %{

            $InputName = [IO.Path]::GetFileNameWithoutExtension($_)
            $InputExt  = [IO.Path]::GetExtension($_)

            if ($RepeatCount) { $Header = Get-Content $_ -TotalCount:$RepeatCount }

   Resolve-Path @ResolveArgs | %{

    $InputName = [IO.Path]::GetFileNameWithoutExtension($_)
    $InputExt  = [IO.Path]::GetExtension($_)

    if ($RepeatCount) { $Header = Get-Content $_ -TotalCount:$RepeatCount }

    # get the input file in manageable chunks

    $Part = 1
    $buffer = ""
    Get-Content $_ -ReadCount:1 | %{

     # make an output filename with a suffix
     $OutputFile = Join-Path $Destination ('{0}-{1:0000}{2}' -f ($InputName,$Part,$InputExt))

     # In the first iteration the header will be
     # copied to the output file as usual
     # on subsequent iterations we have to do it
     if ($RepeatCount -and $Part -gt 1) {
      Set-Content $OutputFile $Header
     }

     # test buffer size and dump data only if buffer is greater than size
     if ($buffer.length -gt ($Size * 1MB)) {
      # write this chunk to the output file
      Write-Host "Writing $OutputFile"
      Add-Content $OutputFile $buffer
      $Part += 1
      $buffer = ""
     } else {
      $buffer += $_ + "`r"
     }
    }
   }
        }
    }
}
15

Error: User Rate Limit Exceeded

##############################################################################
#.SYNOPSIS
# Breaks a text file into multiple text files in a destination, where each
# file contains a maximum number of lines.
#
#.DESCRIPTION
# When working with files that have a header, it is often desirable to have
# the header information repeated in all of the split files. Split-File
# supports this functionality with the -rc (RepeatCount) parameter.
#
#.PARAMETER Path
# Specifies the path to an item. Wildcards are permitted.
#
#.PARAMETER LiteralPath
# Specifies the path to an item. Unlike Path, the value of LiteralPath is
# used exactly as it is typed. No characters are interpreted as wildcards.
# If the path includes escape characters, enclose it in single quotation marks.
# Single quotation marks tell Windows PowerShell not to interpret any
# characters as escape sequences.
#
#.PARAMETER Destination
# (Or -d) The location in which to place the chunked output files.
#
#.PARAMETER Count
# (Or -c) The maximum number of lines in each file.
#
#.PARAMETER RepeatCount
# (Or -rc) Specifies the number of "header" lines from the input file that will
# be repeated in each output file. Typically this is 0 or 1 but it can be any
# number of lines.
#
#.EXAMPLE
# Split-File bigfile.csv 3000 -rc 1
#
#.LINK 
# Out-TempFile
##############################################################################
function Split-File {

    [CmdletBinding(DefaultParameterSetName='Path')]
    param(

        [Parameter(ParameterSetName='Path', Position=1, Mandatory=$true, ValueFromPipeline=$true, ValueFromPipelineByPropertyName=$true)]
        [String[]]$Path,

        [Alias("PSPath")]
        [Parameter(ParameterSetName='LiteralPath', Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
        [String[]]$LiteralPath,

        [Alias('c')]
        [Parameter(Position=2,Mandatory=$true)]
        [Int32]$Count,

        [Alias('d')]
        [Parameter(Position=3)]
        [String]$Destination='.',

        [Alias('rc')]
        [Parameter()]
        [Int32]$RepeatCount

    )

    process {

        # yeah! the cmdlet supports wildcards
        if ($LiteralPath) { $ResolveArgs = @{LiteralPath=$LiteralPath} }
        elseif ($Path) { $ResolveArgs = @{Path=$Path} }

        Resolve-Path @ResolveArgs | %{

            $InputName = [IO.Path]::GetFileNameWithoutExtension($_)
            $InputExt  = [IO.Path]::GetExtension($_)

            if ($RepeatCount) { $Header = Get-Content $_ -TotalCount:$RepeatCount }

            # get the input file in manageable chunks

            $Part = 1
            Get-Content $_ -ReadCount:$Count | %{

                # make an output filename with a suffix
                $OutputFile = Join-Path $Destination ('{0}-{1:0000}{2}' -f ($InputName,$Part,$InputExt))

                # In the first iteration the header will be
                # copied to the output file as usual
                # on subsequent iterations we have to do it
                if ($RepeatCount -and $Part -gt 1) {
                    Set-Content $OutputFile $Header
                }

                # write this chunk to the output file
                Write-Host "Writing $OutputFile"
                Add-Content $OutputFile $_

                $Part += 1

            }

        }

    }

}
Error: User Rate Limit Exceeded
Error: User Rate Limit Exceeded
Error: User Rate Limit Exceeded
6

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

Error: User Rate Limit ExceededError: User Rate Limit ExceededError: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

$sw = new-object System.Diagnostics.Stopwatch
$sw.Start()
Write-Host "Reading source file..."
$lines = [System.IO.File]::ReadAllLines("C:\Temp\SplitTest\source.sql")
$totalLines = $lines.Length

Write-Host "Total Lines :" $totalLines

$skip = 0
$count = 100000; # Number of lines per file

# File counter, with sort friendly name
$fileNumber = 1
$fileNumberString = $filenumber.ToString("000")

while ($skip -le $totalLines) {
    $upper = $skip + $count - 1
    if ($upper -gt ($lines.Length - 1)) {
        $upper = $lines.Length - 1
    }

    # Write the lines
    [System.IO.File]::WriteAllLines("C:\Temp\SplitTest\result$fileNumberString.txt",$lines[($skip..$upper)])

    # Increment counters
    $skip += $count
    $fileNumber++
    $fileNumberString = $filenumber.ToString("000")
}

$sw.Stop()

Write-Host "Split complete in " $sw.Elapsed.TotalSeconds "seconds"

Error: User Rate Limit Exceeded

Reading source file...
Total Lines : 910030
Split complete in  1.7056578 seconds

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded
см. мой ответ ниже для дружественного памяти, разделение на основе новой строки
Это действительно довольно быстро, мне пришлось разделить файл 740Mb, потребовалось19s бежать. Принятое решение побежало для73 (!) minutes для того же файла. Это определенно мой выбор.
Но это будет занимать много памяти. Я пытаюсь переписать с помощью потокового чтения / записи
0

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

Option Explicit

Private Const INPUT_TEXT_FILE = "c:\bigtextfile.txt"  
Private Const REPEAT_HEADER_ROW = True                
Private Const LINES_PER_PART = 500000                 

Dim oFileSystem, oInputFile, oOutputFile, iOutputFile, iLineCounter, sHeaderLine, sLine, sFileExt, sStart

sStart = Now()

sFileExt = Right(INPUT_TEXT_FILE,Len(INPUT_TEXT_FILE)-InstrRev(INPUT_TEXT_FILE,".")+1)
iLineCounter = 0
iOutputFile = 1

Set oFileSystem = CreateObject("Scripting.FileSystemObject")
Set oInputFile = oFileSystem.OpenTextFile(INPUT_TEXT_FILE, 1, False)
Set oOutputFile = oFileSystem.OpenTextFile(Replace(INPUT_TEXT_FILE, sFileExt, "_" & iOutputFile & sFileExt), 2, True)

If REPEAT_HEADER_ROW Then
    iLineCounter = 1
    sHeaderLine = oInputFile.ReadLine()
    Call oOutputFile.WriteLine(sHeaderLine)
End If

Do While Not oInputFile.AtEndOfStream
    sLine = oInputFile.ReadLine()
    Call oOutputFile.WriteLine(sLine)
    iLineCounter = iLineCounter + 1
    If iLineCounter Mod LINES_PER_PART = 0 Then
        iOutputFile = iOutputFile + 1
        Call oOutputFile.Close()
        Set oOutputFile = oFileSystem.OpenTextFile(Replace(INPUT_TEXT_FILE, sFileExt, "_" & iOutputFile & sFileExt), 2, True)
      ,  If REPEAT_HEADER_ROW Then
            Call oOutputFile.WriteLine(sHeaderLine)
        End If
    End If
Loop

Call oInputFile.Close()
Call oOutputFile.Close()
Set oFileSystem = Nothing

Call MsgBox("Done" & vbCrLf & "Lines Processed:" & iLineCounter & vbCrLf & "Part Files: " & iOutputFile & vbCrLf & "Start Time: " & sStart & vbCrLf & "Finish Time: " & Now())
40

Add-Content

$upperBound = 50MB # calculated by Powershell
$ext = "log"
$rootName = "log_"

$reader = new-object System.IO.StreamReader("C:\Exceptions.log")
$count = 1
$fileName = "{0}{1}.{2}" -f ($rootName, $count, $ext)
while(($line = $reader.ReadLine()) -ne $null)
{
    Add-Content -path $fileName -value $line
    if((Get-ChildItem -path $fileName).Length -ge $upperBound)
    {
        ++$count
        $fileName = "{0}{1}.{2}" -f ($rootName, $count, $ext)
    }
}

$reader.Close()
Error: User Rate Limit Exceeded
Error: User Rate Limit Exceeded Ralph Shillington
Для тех, кому лень читать следующий ответ, вы можете установить объект $ reader через $ reader = new-object System.IO.StreamReader ($ inputFile)
Error: User Rate Limit Exceeded
Error: User Rate Limit Exceeded
1

Error: User Rate Limit Exceeded

split MyBigFile.csv

Error: User Rate Limit Exceeded

Error: User Rate Limit ExceededError: User Rate Limit Exceeded

27

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded1.7 GB TXT fileError: User Rate Limit Exceededin 95 seconds).

#split test
$sw = new-object System.Diagnostics.Stopwatch
$sw.Start()
$filename = "C:\Users\Vincent\Desktop\test.txt"
$rootName = "C:\Users\Vincent\Desktop\result"
$ext = ".txt"

$linesperFile = 100000#100k
$filecount = 1
$reader = $null
try{
    $reader = [io.file]::OpenText($filename)
    try{
        "Creating file number $filecount"
        $writer = [io.file]::CreateText("{0}{1}.{2}" -f ($rootName,$filecount.ToString("000"),$ext))
        $filecount++
        $linecount = 0

        while($reader.EndOfStream -ne $true) {
            "Reading $linesperFile"
            while( ($linecount -lt $linesperFile) -and ($reader.EndOfStream -ne $true)){
                $writer.WriteLine($reader.ReadLine());
                $linecount++
            }

            if($reader.EndOfStream -ne $true) {
                "Closing file"
                $writer.Dispose();

                "Creating file number $filecount"
                $writer = [io.file]::CreateText("{0}{1}.{2}" -f ($rootName,$filecount.ToString("000"),$ext))
                $filecount++
                $linecount = 0
            }
        }
    } finally {
        $writer.Dispose();
    }
} finally {
    $reader.Dispose();
}
$sw.Stop()

Write-Host "Split complete in " $sw.Elapsed.TotalSeconds "seconds"

Error: User Rate Limit Exceeded

...
Creating file number 45
Reading 100000
Closing file
Creating file number 46
Reading 100000
Closing file
Creating file number 47
Reading 100000
Closing file
Creating file ,number 48
Reading 100000
Split complete in  95.6308289 seconds
Error: User Rate Limit Exceeded
Error: User Rate Limit Exceeded
2

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

Error: User Rate Limit Exceeded

    $linecount=0; $i=0; 
    Get-Content .\BIG_LOG_FILE.txt | %
    { 
      Add-Content OUT$i.log "$_"; 
      $linecount++; 
      if ($linecount -eq 3000) {$I++; $linecount=0 } 
    }

Error: User Rate Limit Exceeded

Get-Content C:\TEMP\DATA\split\splitme.txt | Select -First 5000 | out-File C:\temp\file1.txt -Encoding ASCII

Error: User Rate Limit Exceeded

Get-Content C:\TEMP\DATA\split\splitme.txt | Select -Skip 5000 | Select -First 5000 | out-File C:\temp\file2.txt -Encoding ASCII

Error: User Rate Limit Exceeded

Get-Content C:\TEMP\DATA\split\splitme.txt | Select -Skip 10000 | Select -First 5000 | out-File C:\temp\file3.txt -Encoding ASCII

Error: User Rate Limit Exceeded

спасибо, я в конечном итоге использовал это ... но не забудьте добавить -width для outfile, иначе он может обрезать ваш вывод до 80 символов ... также это работает по одной строке за раз ... быстрее использовать gc -readcount 1000 | select -first 5 ... это делает 1000 строк за раз ... наконец, gc прочитает весь файл и select проигнорирует большую его часть ... немного быстрее, чтобы включить параметр -totalcount с помощью gc для остановки после определенного числа из строк ... может сделать -tail для конца файла тоже
0

Error: User Rate Limit Exceeded

$infile = "D:\Malcolm\Test\patch6.txt"
$path = "D:\Malcolm\Test\"
$lineCount = 1
$fileCount = 1

foreach ($computername in get-content $infile)
{
    write $computername | out-file -Append $path_$fileCount".txt"
    $lineCount++

    if ($lineCount -eq 1000)
    {
        $fileCount++
        $lineCount = 1
    }
}
30

Error: User Rate Limit Exceeded

$i=0; Get-Content .....log -ReadCount 100 | %{$i++; $_ | Out-File out_$i.txt}
14

Error: User Rate Limit Exceeded

$reader = new-object System.IO.StreamReader("C:\Contacts.vcf")
$count = 1
$filename = "C:\Contacts\{0}.vcf" -f ($count) 

while(($line = $reader.ReadLine()) -ne $null)
{
    Add-Content -path $fileName -value $line

    if($line -eq "END:VCARD")
    {
        ++$count
        $filename = "C:\Contacts\{0}.vcf" -f ($count)
    }
}

$reader.Close()

Похожие вопросы