
Author Topic: speeding up big data projects  (Read 3031 times)

MarkMLl

  • Hero Member
  • *****
  • Posts: 8533
Re: speeding up big data projects
« Reply #15 on: September 21, 2022, 09:35:52 pm »
oh, you mean an external sorting in %temp%  - that would take time... On every such a query

No, I do /not/.

The Daily WTF etc. are full of stories of people who have ended up doing sequential scans of excessively large databases because they either lack the tools that show them they're doing something silly or lack the knowledge to apply them.
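As one illustration of the kind of tool meant here (the post names none, so this choice is an assumption), SQLite's EXPLAIN QUERY PLAN reports whether a query walks the whole table or uses an index. A minimal sketch with Python's standard sqlite3 module; the table "events", the index "idx_events_user" and the helper plan() are hypothetical names made up for the example:

```python
import sqlite3

# Hypothetical in-memory table standing in for a large data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(10000)],
)


def plan(sql, params=()):
    """Return the human-readable part of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the description text.
    return " | ".join(row[-1] for row in rows)


query = "SELECT * FROM events WHERE user_id = ?"

before = plan(query, (42,))  # no index yet: the plan reports a full SCAN
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(query, (42,))   # with the index: the plan reports a SEARCH

print(before)
print(after)
```

The exact wording of the plan text differs between SQLite versions, so only the SCAN versus SEARCH distinction should be relied on.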

Apart from that: I really don't care enough about this to argue. OK?

MarkMLl
MT+86 & Turbo Pascal v1 on CCP/M-86, multitasking with LAN & graphics in 128Kb.
Logitech, TopSpeed & FTL Modula-2 on bare metal (Z80, '286 protected mode).
Pet hate: people who boast about the size and sophistication of their computer.
GitHub repositories: https://github.com/MarkMLl?tab=repositories

KodeZwerg

  • Hero Member
  • *****
  • Posts: 2269
  • Fifty shades of code.
    • Delphi & FreePascal
Re: speeding up big data projects
« Reply #16 on: September 21, 2022, 09:45:35 pm »
-->> total offtopic to OP thread but i am curious
No, I do /not/.
Hello MarkMLl, why do you so often "escape" words in your text? What is the deeper meaning of that?

Arioch

  • Sr. Member
  • ****
  • Posts: 421
Re: speeding up big data projects
« Reply #17 on: September 21, 2022, 09:55:30 pm »
or lack the knowledge to apply them.

Bingo!!! That is exactly who we have here: a person with no exposure to SQL, and SQL is more than memorising new syntax. It is rewiring your brain for how you should think about data handling.
...and I was in those shoes. Luckily, I had no real data then, and I already had big performance problems with test data, so I started rewriting the system before I deployed it. And I managed to find a really good book about databases in Delphi that taught the way of thinking more than the peculiarities of specific libs.

Yet some DB-global operations still ended up taking about 1.5 hours, when 5 minutes would've been enough.
Granted, people from pre-computer times considered it good. They felt they were standing on solid ground, watching rrreal data being chewed by a rrreal program.

And since he already has real data on his hands and must keep it working while he learns, trial and error is a bad proposition for him now.
Yes, I am pessimistic, and I believe that in his situation he should be too. Better safe than sorry.

I don't mind him gaining some understanding of SQL and preparing a slow transition. Like, in a year or two. After he has had plenty of time to make a copy of the real data, pump it into a local database and break that copy time and again, learning in the process.

As of now, even casting performance aside, a single mistake in the WHERE clause of an UPDATE or DELETE would be, for him, tantamount to the infamous rm -rf / var/tmp
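To make the risk concrete, here is a minimal sketch of one possible safeguard, tried on a throwaway in-memory SQLite database rather than on real data: run the DELETE inside a transaction, check how many rows it touched, and roll back if that is more than expected. The table "user_data", the helper guarded_delete() and the max_rows threshold are all hypothetical names invented for this example, not anything from the topic starter's system.

```python
import sqlite3


def guarded_delete(conn, where_sql, params, max_rows):
    """Hypothetical helper: delete rows from user_data matching where_sql,
    but undo the delete if it touched more than max_rows rows.

    where_sql is pasted into the statement, so it must come from the
    developer, never from user input.
    """
    # With sqlite3's default settings the DELETE opens a transaction implicitly.
    cur = conn.execute(f"DELETE FROM user_data WHERE {where_sql}", params)
    if cur.rowcount > max_rows:
        conn.rollback()  # too many rows hit: the WHERE is probably wrong
        return 0
    conn.commit()
    return cur.rowcount


# Throwaway copy to practise on: 1000 rows, 100 rows per user_id 0..9.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_data (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO user_data (user_id, amount) VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(1000)],
)
conn.commit()

# Intended delete: one user's rows, well under the limit, so it is committed.
deleted_ok = guarded_delete(conn, "user_id = ?", (3,), max_rows=200)

# Mistaken condition: matches every remaining row, so it is rolled back.
deleted_bad = guarded_delete(conn, "user_id >= ?", (0,), max_rows=200)

remaining = conn.execute("SELECT COUNT(*) FROM user_data").fetchone()[0]
print(deleted_ok, deleted_bad, remaining)
```

A row-count check like this only catches mistakes that hit too many rows; a wrong condition that hits few rows still gets through, which is why practising on a copy first remains the safer route.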

I was in a situation much less pressing than his, and I know how thin the line sometimes was between yet another lesson and a disaster.

------

Have a nice time. In the end, whatever we believe and advocate, it will be the topic starter making the choice, not you and me.

Arioch

  • Sr. Member
  • ****
  • Posts: 421
Re: speeding up big data projects
« Reply #18 on: September 21, 2022, 09:57:26 pm »
No, I do not.
Hello MarkMLl, why do you so often "escape" words in your text? What is the deeper meaning of that?

I'd guess these are plain-text e-mail conventions of the 1980s or 1990s, maybe even FidoNet or BBSes, from before the internet was born, which later evolved into Markdown: https://en.wikipedia.org/wiki/Markdown

daringly

  • Jr. Member
  • **
  • Posts: 73
Re: speeding up big data projects
« Reply #19 on: September 22, 2022, 04:29:54 am »
I appreciate all the feedback. I ultimately took one of the suggestions in this thread, abandoned Lazarus, and got an SQL developer to help me create a procedure that summarizes the user data (which will then be an input into a Python-based neural net via xgboost to predict user behavior, and classify as smart/non-smart). Yeah, it sounds borderline ridiculous, but the amount of money involved makes it worth doing.

MarkMLl

  • Hero Member
  • *****
  • Posts: 8533
Re: speeding up big data projects
« Reply #20 on: September 22, 2022, 08:25:14 am »
I appreciate all the feedback. I ultimately took one of the suggestions in this thread, abandoned Lazarus, and got an SQL developer to help me create a procedure that summarizes the user data (which will then be an input into a Python-based neural net via xgboost to predict user behavior, and classify as smart/non-smart). Yeah, it sounds borderline ridiculous, but the amount of money involved makes it worth doing.

Well done, glad you got somewhere (and promptly :-)

I think it's important to let the requirements drive the selection of tool and idiom. For something like this there would be a temptation to misapply the tool one is most comfortable with (i.e. Delphi/Lazarus/Pascal in this community) leading to a lot of wheel reinvention in order to keep resource usage tolerable.

MarkMLl

PascalDragon

  • Hero Member
  • *****
  • Posts: 6322
  • Compiler Developer
Re: speeding up big data projects
« Reply #21 on: September 22, 2022, 08:49:48 am »
-->> total offtopic to OP thread but i am curious
No, I do /not/.
Hello MarkMLl, why do you so often "escape" words in your text? What is the deeper meaning of that?

MarkMLl doesn't use the BB codes of the forum, but designates them - as he wrote - in the ways that are used when formatting isn't available. I do that, too, when I write text mails:
Cursive text is done as /cursive/, bold text as *bold*, underlined as _underlined_ and strike out probably as -strike out- (I never use that one, so I don't know for sure :-[ ). Some of these are for example also recognized by TortoiseSVN's commit window. ;)

MarkMLl

  • Hero Member
  • *****
  • Posts: 8533
Re: speeding up big data projects
« Reply #22 on: September 22, 2022, 09:39:17 am »
MarkMLl doesn't use the BB codes of the forum, but designates them - as he wrote - in the ways that are used when formatting isn't available. I do that, too, when I write text mails:
Cursive text is done as /cursive/, bold text as *bold*, underlined as _underlined_ and strike out probably as -strike out- (I never use that one, so I don't know for sure :-[ ). Some of these are for example also recognized by TortoiseSVN's commit window. ;)

Sorry, habits of 40 years die hard. However I /do/ try to be careful if there's any chance of ambiguity, and that includes things like never putting punctuation immediately after a URL. And if I break my workflow to insert  >:( somebody'd better know I mean it :-)

Incidentally, this might be a good place to remind everybody of the [nobbc] markup, which makes it relatively easy to discuss things like [code] ... [/code] tags in the body of messages. This is one we have to memorise, I don't think SMF has a button for it.

And please could people /not/ put the version of Lazarus etc. they're using in their sig, since as soon as they change their sig the context of questions they've asked in the past gets messed up making forum (and Google) searches far less useful to anybody looking for "prior art".

MarkMLl

 
